
Conference clt::cma

Title: DECthreads Conference
Moderator: PTHRED::MARYSTEON
Created: Mon May 14 1990
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 1553
Total number of notes: 9541

1509.0. "set_noaccess bugcheck" by AZUR::ANTEUNIS (If it's possible it's not interesting) Mon Mar 24 1997 12:37

  I've encountered a problem similar to the ones mentioned in 1183.* and
  1012.*, but I am not sure the remedies mentioned there are still valid.

  The program reads requests (using DECmessageQ), then creates a thread
  to execute the request. Some requests are "long", i.e. generate many
  responses for some time. Some requests are "short", i.e. they generate 1
  response.

  Our dear customer of course wants 24 * 365 unattended operation....

  My question is: what can I do to achieve this?

  V3.2 of Digital UNIX is currently a must, since apart from DECmessageQ
  the program is also to be linked with TeMIP (ex MCC)


  The first lines of the bugcheck are

%DECthreads bugcheck (version V3.12-311), terminating execution.
% Running on DEC OSF/1 AXP [OSF1 alpha V3.2(62); cpu type 35, configured for 2
%  cpus, 2 cpus in box, 383Mb]
% Reason: set_noaccess: 12 protecting 0x27a000 to 0x27bfff


  Not a single mutex is locked. My program is trying to pthread_create
  with a stacksize of 192K (as required by DECmessageQ), a default guardsize
  (which is 2048), and the new thread should be created detached.
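
  For reference, a minimal sketch of the creation call described above,
  written against the final POSIX.1c pthread interface. That is an
  assumption: the DECthreads version shipped with V3.2 may expose an older
  draft interface with different names and signatures. work_routine and
  start_detached_worker are hypothetical names, and the guard size is
  simply left at its default.

```c
#include <pthread.h>
#include <stddef.h>

/* 192K, the stack size the post says DECmessageQ requires. */
#define REQUEST_STACK_SIZE (192 * 1024)

/* Hypothetical worker: stands in for the real request handler. */
static void *work_routine(void *arg)
{
    (void)arg;
    return NULL;
}

/* Create one detached worker with a 192K stack.
 * Returns 0 on success or the pthread error number on failure. */
int start_detached_worker(void *request)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setstacksize(&attr, REQUEST_STACK_SIZE);
    if (rc == 0)
        rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, work_routine, request);

    /* The attribute object is no longer needed once the thread exists. */
    pthread_attr_destroy(&attr);
    return rc;
}
```

  With the POSIX interface, a detached thread's resources are reclaimed by
  the library when the start routine returns, so no join is needed.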


  Further down, I find 



Current stacks:
  Thread 1 stack: 0x4000000 to 0x11fffffff (469762047 bytes)
  Thread 2 stack: 0x50000 to 0x84000 (212992 bytes); base is 0x84000, guard is
    0x51fff
  Thread 347654 stack: 0x152000 to 0x186000 (212992 bytes); base is 0x186000,
    guard is 0x153fff
  Thread 347655 stack: 0x1c6000 to 0x1fa000 (212992 bytes); base is 0x1fa000,
    guard is 0x1c7fff
Current memory:
 
  This makes a lot of sense to me, since it's a server process that
  creates a thread for each valid request it receives. When the request
  has been executed, the WorkRoutine returns. I hope that cleans up the
  thread's resources, as the threads are created detached.
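
  The sizes in the listing above can be checked by hand. A minimal sketch
  in C, assuming an 8 KB page size (the usual Alpha page size, not stated
  in the dump); the function names are hypothetical.

```c
/* Assumption: 8 KB pages, as is usual on Alpha systems. */
#define ALPHA_PAGE_BYTES 8192UL

/* Bytes in the half-open range [lo, hi), the way the dump lists stacks. */
unsigned long span_bytes(unsigned long lo, unsigned long hi)
{
    return hi - lo;
}

/* Bytes in the inclusive range [lo, hi], the way the dump lists the
 * guard end address and the range in the bugcheck reason line. */
unsigned long span_bytes_inclusive(unsigned long lo, unsigned long hi)
{
    return hi - lo + 1;
}

/* Whole pages contained in a byte count. */
unsigned long pages_in(unsigned long bytes)
{
    return bytes / ALPHA_PAGE_BYTES;
}
```

  span_bytes(0x50000, 0x84000) gives 212992, which is the requested 192K
  (196608 bytes) plus 16384 bytes, i.e. two 8 KB pages. The guard region
  0x50000 to 0x51fff is 8192 bytes, so the 2048-byte guardsize appears to
  be rounded up to one page. The range in the bugcheck reason, 0x27a000 to
  0x27bfff, is also exactly one page, which is consistent with the failure
  happening while a new stack's guard page was being protected. If the 12
  in the reason line is an errno value, it would correspond to ENOMEM.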

  My counter says read_counter = 347900, and there were a number of 
  invalid requests.

  From the core I learn that Thread 1 is the one executing main(),
  which waits for the reader thread.

  Thread 2 seems to be the reader

  Threads 3 and 4 are "long" requests

  I was expecting 1 or 2 more "short" requests


  I think the whole thing is reproducible, but it takes a whole day of
  running. This is the second time I have seen this, and the bugchecks are
  fairly similar.



  Many thanks in advance,

  Dirk
1509.1. "The story here hasn't changed." by WTFN::SCALES (Despair is appropriate and inevitable.) Mon Mar 24 1997 13:41 (16 lines)

I believe you are running into a limitation in the amount of memory available to
your process.  (I can't construct a good, solid justification for this, but I'm
fairly sure that this is the deal.)  

The reason why you are hitting this is either because your process is hitting a
situation where it requires more memory than is available or because the process
is leaking memory.

Fixing the first case should be as simple as reconfiguring your system (upping
the vm-vpagemax parameter).  If it's the second case, you need to locate the
memory leak (which might not even be in your code).  By the way, if you
reconfigure your system for more space, it will make the program fail less often
in either case (which might make it even more frustrating... :-} ).


				Webb

1509.2. by SMURF::DENHAM (Digital UNIX Kernel) Mon Mar 24 1997 14:56 (27 lines)

    RE: running out of memory. There should be a DECthreads patch for
    an apparent memory leak. This is from the V3.2C patch README file:
    
    
    PROBLEM:        (Case ID: MGO101698 )           (Patch ID: OSF350-105)
    ********
    Applications linked with DECthreads will behave as if they have no more
    memory available to them when they are not even close to the operating
    system limit.
    
    
    FILE(s):
    
    /usr/shlib/libpthreads.so               subset OSFBASE350
    CHECKSUM: 34727    568   RCS:   cma_vm.c            Revision: 4.2.20.2
    /usr/include/dce/exc_handling.h         subset OSFPGMR350
    CHECKSUM: 33034     30   RCS:   exc_handling.h      Revision: 4.2.16.2
    /usr/ccs/lib/libpthreads.a              subset OSFPGMR350
    CHECKSUM: 32274    438   RCS:   cma_vm.c            Revision: 4.2.20.2
    ----------------------------------------------------------------------
    
    As a refresher, this was the bug relating to the bad in/out argument
    to the vm_allocate call causing the address space to become
    highly fragmented.
    
    Certainly, any 3.2 system with unexplained memory failures should
    be running this patch.

1509.3. "Thank you very much" by AZUR::ANTEUNIS (If it's possible it's not interesting) Tue Mar 25 1997 04:46 (11 lines)

Thanks a lot for all the good recommendations. I'm already setting
up a memadvise run to see whether my code leaks.

I will make sure the patches are installed, we are running V3.2c of Digital UNIX.

I hope I can make the problems stay away long enough to make the software
look like 24 * 365, with "refreshers" overnight or during the weekend. By
that I mean shutting the program down and restarting it.
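
A variation on the "refresher" idea is a small supervisor that restarts the
server whenever it exits abnormally (for example after a bugcheck). A minimal
sketch in C under that assumption; run_supervised is a hypothetical name, and
a real deployment would also want logging and a delay between restarts.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] as a child process and restart it each time it exits
 * abnormally, at most max_restarts times. Returns the number of
 * restarts performed, or -1 if fork() or waitpid() fails. */
int run_supervised(char *const argv[], int max_restarts)
{
    int restarts = 0;
    int status;
    pid_t pid;

    for (;;) {
        pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            execvp(argv[0], argv);
            _exit(127);              /* only reached if exec failed */
        }

        if (waitpid(pid, &status, 0) < 0)
            return -1;

        /* A clean exit (status 0) ends supervision. */
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return restarts;

        /* Give up once the restart budget is used. */
        if (restarts >= max_restarts)
            return restarts;
        restarts++;
    }
}
```

This only hides the symptom; it does not replace finding the leak or raising
the limit discussed in 1509.1.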


Dirk

1509.4. "Patch did not help, here some more info" by AZUR::ANTEUNIS (If it's possible it's not interesting) Thu Mar 27 1997 04:32 (121 lines)

Together with our system manager I managed to get the patch mentioned
in 1509.2 installed. Because we are running V3.2G, he had to install the
patch on his workstation first and then manually extract the 3 files.

The problem reproduces itself, at a slightly different limit.


Here are the non-comment contents of /etc/sysconfigtab:

# OSF/1 1.2
proc:
max-proc-per-user = 200
vm:
vm-vpagemax = 32768
ipc:
msg-max = 32768
msg-mnb = 65535
sem-mni = 30
sem-msl = 150
sem-opm = 30
sem-ume = 30


setld tells me

OSFBASE350      installed       Base System (- Required -)
OSFBASE375      installed       Base System - V3.2G (- Required -)
OSFBIN350       installed       Standard Kernel Objects (Kernel Build Environment)
OSFBIN375       installed       Standard Kernel Objects - V3.2G (Kernel Build Environment)
OSFBINCOM350    installed       Kernel Header and Common Files (Kernel Build Environment)
OSFBINCOM375    installed       Kernel Header and Common Files - V3.2G (Kernel Build Environment)


uerf of the reboot we did after the 3 files were copied and making use of the
above sysconfigtab

********************************* ENTRY     1. *********************************

----- EVENT INFORMATION -----

EVENT CLASS                             OPERATIONAL EVENT
OS EVENT TYPE                  300.     SYSTEM STARTUP
SEQUENCE NUMBER                  0.
OPERATING SYSTEM                        DEC OSF/1
OCCURRED/LOGGED ON                      Wed Mar 26 17:28:49 1997
OCCURRED ON SYSTEM                      lipa1
SYSTEM ID                 x00060009     CPU TYPE:  DEC 2100
SYSTYPE                   x00000000
MESSAGE                                 PCXAL keyboard, language English
                                         _(American)

                                        Alpha boot: available memory from
                                         _0xe42000 to 0x17ffe000
                                        Digital UNIX V3.2G (Rev. 62); Thu Dec
                                         _26 10:47:06 MET 1996
                                        physical memory = 384.00 megabytes.
                                        available memory = 369.74 megabytes.
                                        using 1466 buffers containing 11.45
                                         _megabytes of memory
                                        Master cpu at slot 0.
                                        Firmware revision: 4.5
                                        PALcode: OSF version 1.45
                                        ibus0 at nexus
                                        AlphaServer 2100 4/275
                                        cpu 0 EV-45 4mb b-cache
                                        cpu 1 EV-45 4mb b-cache
                                        gpc0 at ibus0
                                        pci0 at ibus0 slot 0
                                        tu0: DECchip 21040-AA: Revision: 2.3
                                        tu0 at pci0 slot 0
                                        tu0: DEC TULIP Ethernet Interface,
                                         _hardware address: 08-00-2B-E5-1E-CB
                                        tu0: console mode: selecting 10Base5
                                         _(AUI) port
                                        psiop0 at pci0 slot 1
                                        Loading SIOP: script 1001b00, reg
                                         _81000000, data 406fdad0
                                        scsi0 at psiop0 slot 0
                                        rz0 at scsi0 bus 0 target 0 lun 0 (DEC
                                         _    RZ28     (C) DEC D41C)
                                        rz2 at scsi0 bus 0 target 2 lun 0 (DEC
                                         _    RZ28B    (C) DEC 0006)
                                        rz3 at scsi0 bus 0 target 3 lun 0 (DEC
                                         _    RZ28B    (C) DEC 0003)
                                        rz4 at scsi0 bus 0 target 4 lun 0 (DEC
                                         _    RRD43   (C) DEC  1084)
                                        rz5 at scsi0 bus 0 target 5 lun 0 (DEC
                                         _    RZ28M    (C) DEC 0466)
                                        rz6 at scsi0 bus 0 target 6 lun 0 (DEC
                                         _    RZ26     (C) DEC T386)
                                        tz1 at scsi0 bus 0 target 1 lun 0 (DEC
                                         _    TLZ06     (C)DEC 0491)
                                        eisa0 at pci0
                                        ace0 at eisa0
                                        ace1 at eisa0
                                        lp0 at eisa0
                                        fdi0 at eisa0
                                        fd0 at fdi0 unit 0
                                        ln0 at eisa0
                                        ln0: DEC LANCE Ethernet Interface,
                                         _hardware address: 08-00-2B-BE-F9-AB
                                        vga0 at eisa0
                                         1024x768 (QVision )
                                        lvm0: configured.
                                        lvm1: configured.
                                        dli: configured
                                        SuperLAT. Copyright 1993 Meridian
                                         _Technology Corp. All rights
                                         _reserved.



So now I go hunting for memory leaks and then we'll see.



Dirk

1509.5. "I think the patch should have just installed, if it was appropriate..." by WTFN::SCALES (Despair is appropriate and inevitable.) Thu Mar 27 1997 09:36 (15 lines)

Dirk,

I'm really nervous about your description of unpacking the kit to install the
patch -- that suggests to me that the patch was inappropriate for the version
that you're running (and maybe installing it was a bad idea -- you shouldn't
have to take the patch kit apart to install it!).

Jeff, if the patch was done on or for V3.2C, wouldn't it already be in V3.2G?
(I.e., didn't he just back out some stuff??)

(BTW, Dirk, a number of our high-performance computing folks have found that
a vm-vpagemax of even 32K is low for large data-sets.)


				Webb

1509.6. by SMURF::DENHAM (Digital UNIX Kernel) Thu Mar 27 1997 10:08 (3 lines)

    You're right, Webb. That allocation bug fix is in V3.2G.
    
    So it's probably the vpagemax issue...
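
    For scale, the vm-vpagemax value in the sysconfigtab from 1509.4 can be
    turned into bytes. A minimal sketch in C, assuming the parameter counts
    8 KB pages (an assumption; the thread spells out neither the unit nor
    exactly what the parameter limits). pages_to_bytes is a hypothetical
    name.

```c
/* Assumption: 8 KB pages, as is usual on Alpha systems. */
#define ALPHA_PAGE_BYTES 8192UL

/* Convert a page count, such as a vm-vpagemax setting, to bytes. */
unsigned long pages_to_bytes(unsigned long pages)
{
    return pages * ALPHA_PAGE_BYTES;
}
```

    Under that assumption, pages_to_bytes(32768) is 268435456 bytes, or
    256 MB, on a machine reporting 384 MB of physical memory.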