
Conference vaxaxp::vmsnotes

Title: VAX and Alpha VMS
Notice: This is a new VMSnotes, please read note 2.1
Moderator: VAXAXP::BERNARDO
Created: Wed Jan 22 1997
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 703
Total number of notes: 3722

390.0. "BUGCHECKFATAL and BADDALRQSZ help!" by ODIXIE::RREEVES () Wed Mar 26 1997 20:15

    I need a little advice. My customer continues to receive this
    type of crash. I thought that if BUGCHECKFATAL was set to 0,
    the system would continue after logging the event to
    the error log. Is this handled differently in Alpha
    VMS Version 6.2-1H2?
    Is there some way to have this exception handled as nonfatal?
    
    Thanks,
    
    Ray Reeves 
    
    OpenVMS (TM) Alpha System dump analyzer
    
    Dump taken on 24-MAR-1997 14:42:29.39
    BADDALRQSZ, Bad memory deallocation request size or address
    
    SDA> show crash
    
    System crash information
    ------------------------
    Time of system crash: 24-MAR-1997 14:42:29.39
    
    
    Version of system: OpenVMS (TM) Alpha Operating System, Version V6.2-1H2
    
    System Version Major ID/Minor ID: 3/0
    
    
    System type: AlphaServer 2100A 5/250
    
    Crash CPU ID/Primary CPU ID:  00/00
    
    Bitmask of CPUs active/available:  00000003/00000003
    
    
    CPU bugcheck codes:
            CPU 00 -- BADDALRQSZ, Bad memory deallocation request size or address
            1 other -- CPUEXIT, Shutdown requested by another CPU
    
    
    
    CPU 00 Processor crash information
    ----------------------------------
    
    
    CPU 00 reason for Bugcheck: BADDALRQSZ, Bad memory deallocation request
    size or address
    
    
    Process currently executing on this CPU: AAE
    
    
    Current IPL: 8  (decimal)
    
    
    CPU database address:  8100C000
    
    
    CPUs Capabilities:    PRIMARY,QUORUM,RUN
    
    
    
    
    
    General registers:
    
    R0  = 00000000 00000058 R1  = 00000000 00002780 R2  = 00000000 00000002
    R3  = 00000000 00002780 R4  = 00000000 00000001 R5  = FFFFFFFF 8B625ED0
    R6  = FFFFFFFF 819E3582 R7  = FFFFFFFF 8B624540 R8  = FFFFFFFF 8100FC00
    R9  = 00000000 001D7200 R10 = 00000000 7FF9D228 R11 = 00000000 7FFBE3E0
    R12 = 00000000 001E9A48 R13 = FFFFFFFF A4DDA8D8 R14 = 00000000 00063424
    R15 = 00000000 00035AB0 R16 = 00000000 0000005C R17 = 00000001 00000000
    R18 = FFFFFFFF 81014DF0 R19 = 00000000 00000002 R20 = 00000000 001B2780
    R21 = FFFFFFFF 819E358D R22 = FFFFFFFF 8B624544 R23 = 00000000 00000008
    R24 = FFFFFFFF 81014DF0 AI  = FFFFFFFF 819E3582 RA  = 00000000 00000010
    PV  = FFFFFFFF 8B625ED0 R28 = 00000000 0000005E FP  = 00000000 7FF91F60
    PC  = FFFFFFFF 8001C0C0 PS  = 30000000 00000804
    
    
    
    
    
T.R  Title  User  Personal Name  Date  Lines
390.1. "" by EVMS::MORONEY () Wed Mar 26 1997 21:15 (8 lines)
BUGCHECKFATAL determines whether to crash when you get a nonfatal bugcheck.
You always get a crash on a fatal bugcheck.

You have an IPL 8 kernel-mode bugcheck; these are always fatal.  I suggest you
find the problem and fix it (are they running some funny drivers?), or report
the problem if it is Digital code that's dying and the problem is new.

-Mike
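
The rule in this reply can be written down as a small truth function. This
is only a toy model of the behavior as described here, not VMS code; the
function and argument names are made up for illustration.

```python
def system_crashes(bugcheck_is_fatal: bool, bugcheckfatal: int) -> bool:
    """Toy model of the rule stated in the reply above (hypothetical names).

    bugcheck_is_fatal: True if the bugcheck itself is fatal (per the reply,
                       an IPL 8 kernel-mode bugcheck such as this one is).
    bugcheckfatal:     value of the BUGCHECKFATAL system parameter (0 or 1).
    """
    if bugcheck_is_fatal:
        # A fatal bugcheck always crashes, whatever the parameter says.
        return True
    # Only a nonfatal bugcheck is affected by BUGCHECKFATAL.
    return bugcheckfatal != 0


# The case in .0: a fatal bugcheck with BUGCHECKFATAL = 0 still crashes.
print(system_crashes(bugcheck_is_fatal=True, bugcheckfatal=0))   # True
# A nonfatal bugcheck with BUGCHECKFATAL = 0 does not crash the system.
print(system_crashes(bugcheck_is_fatal=False, bugcheckfatal=0))  # False
```

In other words, setting BUGCHECKFATAL to 0 cannot turn the BADDALRQSZ crash
in .0 into a logged-and-continue event.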
390.2. "" by EEMELI::MOSER (Orienteers do it in the bush...) Thu Mar 27 1997 06:32 (8 lines)
    re: .0
    
    Just out of curiosity: do you have Pathworks and/or Multinet
    running on this system? I have had a case open for over a year with
    BADDALRQSZ, and if I can find some other friends around the world with
    the same footprint, it would just help...
    
    /cmos
390.3. "Emulex product." by ODIXIE::RREEVES () Thu Mar 27 1997 06:43 (5 lines)
    No, not Pathworks or Multinet. We have a product from Emulex called
    Levereged Host. It is their Novell file server emulation software for
    VMS. What is really interesting about this crash is that it only started
    about one month ago. Everything had been working fine for the 8 months
    before that.
390.4. "SPINLOCK timer expiring related ?" by ODIXIE::RREEVES () Thu Mar 27 1997 06:51 (13 lines)
    Could this problem be related to a SPINLOCK that expires? We have also
    crashed because the SPINLOCK timer expired. I'm going to increase the
    spinlock wait SYSGEN parameters to attempt to solve that crash
    problem.
    
    Does anyone think that these could be related?
    
    Also, what value should I use for POOLCHECK to help debug this problem
    from the crash dump the next time it occurs?
    
    Thanks in advance.
    
    Ray
390.5. "" by EEMELI::MOSER (Orienteers do it in the bush...) Thu Mar 27 1997 08:40 (6 lines)
    POOLCHECK does not help identify the BADDALRQSZ bugcheck, since the
    size field is at offset 8, and POOLCHECK doesn't check that. Of course,
    you can enable POOLCHECK in order to try to detect any other pool
    corruption.
    
    /cmos
390.6. "" by LASSIE::CORENZWIT (stuck in postcrypt queue) Thu Mar 27 1997 10:12 (10 lines)
    It used to be possible, and I expect it still is, to get this from
    deallocating a non-paged pool block with garbage in the type field. 
    There is a range of type values that are interpreted as having
    something to do with VAX 782 shared memory.
    
    Oops, now I've either revealed my age by the accuracy of this
    incredibly dated information, or by the inaccuracy of my memory on this
    subject.  Oh well...
    
    Julie
390.7. "Start With CANASTA, Then Start Looking At The Dump..." by XDELTA::HOFFMAN (Steve, OpenVMS Engineering) Thu Mar 27 1997 11:09 (18 lines)
   You'll want to send the CLUE footprint to the CANASTA e-mail server.
   (See 233.* for further CANASTA-related information...)

   You'll want to determine what kernel-mode code is involved: check
   whether there is any consistency in the kernel code active at the
   bugcheck, and whether the problem is consistently in a single
   application or image or is specific to some kernel-mode code that is
   asynchronous to any of the application image(s) running "down in
   process space".

   Pool corruptors can be "fun" to find -- start by looking for any kernel
   code from third-parties, and inquire from their support organizations
   if this bugcheck is known.  (Some support organization might recognize
   it.)

   CANASTA will provide you with pointers to any previous reports, and --
   if recognized -- may provide you with a pointer to a solution.

390.8. "POOLCHECK value" by EVMS::GRANT () Thu Mar 27 1997 15:46 (3 lines)
    Set POOLCHECK to 1633943807 (decimal); that's 616400FF (hex). That will
    give you lowercase 'a' as the allocation pattern and lowercase 'd' as
    the deallocation pattern. It's not obvious this will help, but it can't hurt.
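
As a quick sanity check of the arithmetic above, here is a minimal Python
sketch that splits the value into bytes. It assumes only what this reply
implies: the top byte is the allocation pattern and the next byte is the
deallocation pattern. The function name is made up, and the low 16 bits are
shown as-is because their meaning is not stated here.

```python
def decode_poolcheck(value: int) -> dict:
    """Split a 32-bit POOLCHECK value into the pieces named in the reply.

    Assumed layout (inferred from 616400FF -> 'a' / 'd'):
      bits 31..24  allocation pattern byte
      bits 23..16  deallocation pattern byte
      bits 15..0   remaining bits, not interpreted here
    """
    return {
        "hex": f"{value:08X}",
        "alloc_pattern": chr((value >> 24) & 0xFF),
        "dealloc_pattern": chr((value >> 16) & 0xFF),
        "low_16_bits": f"{value & 0xFFFF:04X}",
    }


info = decode_poolcheck(1633943807)
print(info["hex"])              # 616400FF
print(info["alloc_pattern"])    # a
print(info["dealloc_pattern"])  # d
print(info["low_16_bits"])      # 00FF
```

With the patterns enabled, freshly allocated pool should show up filled with
'a' (hex 61) and freed pool with 'd' (hex 64) when you look at it in SDA.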
390.9. "Never tried CANASTA before...." by ODIXIE::RREEVES () Thu Mar 27 1997 21:23 (3 lines)
    Thank you all for your advice. I have never had the pleasure of
    using CANASTA in the past. I really don't know what it is, but I'll
    start with the note suggested.