
Conference vaxaxp::vmsnotes

Title:	VAX and Alpha VMS
Notice:	This is a new VMSnotes, please read note 2.1
Moderator:	VAXAXP::BERNARDO

Created:	Wed Jan 22 1997
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	703
Total number of notes:	3722

390.0. "BUGCHECKFATAL and BADDALRQSZ help!" by ODIXIE::RREEVES () Wed Mar 26 1997 20:15

    I need a little advice. My customer continues to receive this
    type of crash. I thought that if BUGCHECKFATAL was set to 0,
    the system would continue after logging the event to
    the error log. Is this handled differently in Alpha
    VMS Version 6.2-1H2?
    Is there some way to have this exception handled as nonfatal?
    
    Thanks,
    
    Ray Reeves 
    
    OpenVMS (TM) Alpha System dump analyzer
    
    Dump taken on 24-MAR-1997 14:42:29.39
    BADDALRQSZ, Bad memory deallocation request size or address
    
    SDA> show crash
    
    System crash information
    ------------------------
    Time of system crash: 24-MAR-1997 14:42:29.39
    
    
    Version of system: OpenVMS (TM) Alpha Operating System, Version V6.2-1H2
    
    System Version Major ID/Minor ID: 3/0
    
    
    System type: AlphaServer 2100A 5/250
    
    Crash CPU ID/Primary CPU ID:  00/00
    
    Bitmask of CPUs active/available:  00000003/00000003
    
    
    CPU bugcheck codes:
            CPU 00 -- BADDALRQSZ, Bad memory deallocation request size or address
            1 other -- CPUEXIT, Shutdown requested by another CPU
    
    
        Press RETURN for more.
    SDA> 
    
    CPU 00 Processor crash information
    ----------------------------------
    
    
    CPU 00 reason for Bugcheck: BADDALRQSZ, Bad memory deallocation request size or address
    
    
    Process currently executing on this CPU: AAE
    
    
    Current IPL: 8  (decimal)
    
    
    CPU database address:  8100C000
    
    
    CPUs Capabilities:    PRIMARY,QUORUM,RUN
    
    
    
    
        Press RETURN for more.
    SDA> 
    
    CPU 00 Processor crash information
    ----------------------------------
    
    General registers:
    
    R0  = 00000000 00000058 R1  = 00000000 00002780 R2  = 00000000 00000002
    R3  = 00000000 00002780 R4  = 00000000 00000001 R5  = FFFFFFFF 8B625ED0
    R6  = FFFFFFFF 819E3582 R7  = FFFFFFFF 8B624540 R8  = FFFFFFFF 8100FC00
    R9  = 00000000 001D7200 R10 = 00000000 7FF9D228 R11 = 00000000 7FFBE3E0
    R12 = 00000000 001E9A48 R13 = FFFFFFFF A4DDA8D8 R14 = 00000000 00063424
    R15 = 00000000 00035AB0 R16 = 00000000 0000005C R17 = 00000001 00000000
    R18 = FFFFFFFF 81014DF0 R19 = 00000000 00000002 R20 = 00000000 001B2780
    R21 = FFFFFFFF 819E358D R22 = FFFFFFFF 8B624544 R23 = 00000000 00000008
    R24 = FFFFFFFF 81014DF0 AI  = FFFFFFFF 819E3582 RA  = 00000000 00000010
    PV  = FFFFFFFF 8B625ED0 R28 = 00000000 0000005E FP  = 00000000 7FF91F60
    PC  = FFFFFFFF 8001C0C0 PS  = 30000000 00000804

T.R	Title	User	Personal Name	Date	Lines
390.1		EVMS::MORONEY		`Wed Mar 26 1997 21:15`	8
	BUGCHECKFATAL determines whether to crash when you get a nonfatal bugcheck. You always get a crash on a fatal bugcheck. You have an IPL 8 kernel-mode bugcheck; these are always fatal. I suggest you find the problem and fix it (are they running some funny drivers?), or report the problem if it is Digital code that's dying and the problem is new. -Mike
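The IPL 8 kernel-mode reading in .1 can be cross-checked against the PS value in the register dump of .0 (PS = 30000000 00000804). The sketch below is illustrative only: it assumes the OpenVMS Alpha processor status layout with the current mode in bits <4:3> and the IPL in bits <12:8>, and the names `decode_ps` and `MODES` are made up for this example.

```python
# Assumed OpenVMS Alpha PS layout: CM in bits <4:3>, IPL in bits <12:8>.
# decode_ps and MODES are hypothetical names used only for this sketch.
MODES = ("kernel", "executive", "supervisor", "user")


def decode_ps(ps):
    """Return the IPL and current access mode encoded in a PS value."""
    ipl = (ps >> 8) & 0x1F          # 5-bit interrupt priority level
    mode = MODES[(ps >> 3) & 0x3]   # 2-bit current mode
    return {"ipl": ipl, "mode": mode}


# PS from the crash in .0, written as one 64-bit value.
crash_ps = 0x3000000000000804
print(decode_ps(crash_ps))
```

Under that assumed layout, the value decodes to IPL 8 in kernel mode, which agrees with the "Current IPL: 8" line that SDA prints in .0.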
390.2		EEMELI::MOSER	Orienteers do it in the bush...	`Thu Mar 27 1997 06:32`	8
	re: .0 Just out of curiosity: do you have Pathworks and/or Multinet running on this system? I have had a case open for over a year with BADDALRQSZ, and if I can find some other friends around the world with the same footprint, it would help... /cmos
390.3	Emulex product.	ODIXIE::RREEVES		`Thu Mar 27 1997 06:43`	5
	No, not Pathworks or Multinet. We have a product from Emulex called Levereged Host. It is their Novell file server emulation software for VMS. What is really interesting about this crash is that it started only about one month ago. Everything had been working fine for the 8 months before that.
390.4	SPINLOCK timer expiring related ?	ODIXIE::RREEVES		`Thu Mar 27 1997 06:51`	13
	Could this problem be related to a SPINLOCK timer expiring? We have also crashed because the SPINLOCK timer expired. I'm going to increase the spinlock wait SYSGEN parameters to attempt to solve that crash problem. Does anyone think that these could be related? Also, what value should I use for POOLCHECK to help debug this problem in the crash dump the next time it occurs? Thanks in advance. Ray
390.5		EEMELI::MOSER	Orienteers do it in the bush...	`Thu Mar 27 1997 08:40`	6
	poolcheck does not help identify the BADDALRQSZ bugcheck, since the size field is at offset 8, and poolcheck doesn't check that. Of course you can enable poolcheck in order to try to detect any other pool corruption. /cmos
390.6		LASSIE::CORENZWIT	stuck in postcrypt queue	`Thu Mar 27 1997 10:12`	10
	It used to be possible, and I expect it still is, to get this from deallocating a non-paged pool block with garbage in the type field. There is a range of type values that are interpreted as having something to do with VAX 782 shared memory. Oops, now I've either revealed my age by the accuracy of this incredibly dated information, or by the inaccuracy of my memory on this subject. Oh well... Julie
390.7	Start With CANASTA, Then Start Looking At The Dump...	XDELTA::HOFFMAN	Steve, OpenVMS Engineering	`Thu Mar 27 1997 11:09`	18
	You'll want to send the CLUE footprint to the CANASTA e-mail server. (See 233.* for further CANASTA-related information...) You'll want to determine what kernel-mode code is involved -- determine if there is any consistency around the kernel code active at the bugcheck: whether the problem is consistently in a single application or image, or whether it is specific to some kernel-mode code that is asynchronous to any of the application image(s) running "down in process space". Pool corruptors can be "fun" to find -- start by looking for any kernel code from third parties, and ask their support organizations whether this bugcheck is known. (Some support organization might recognize it.) CANASTA will provide you with pointers to any previous reports, and -- if recognized -- may provide you with a pointer to a solution.
390.8	POOLCHECK value	EVMS::GRANT		`Thu Mar 27 1997 15:46`	3
	Set POOLCHECK to 1633943807 (decimal); that's 616400FF (hex). That will give you lowercase 'a' as an allocation pattern and lowercase 'd' as the deallocation pattern. Not obvious this will help but it can't hurt.
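The decimal-to-hex arithmetic in .8 is easy to verify by splitting the value into its four bytes. This is only a sketch of the byte layout as .8 describes it (allocation pattern in the high byte, deallocation pattern in the next byte, 00FF in the low half); the function name `split_poolcheck`, the dictionary keys, and the labels "reserved" and "flags" for the two low bytes are assumptions made for illustration, not names taken from VMS.

```python
# Byte meanings follow the description in .8; the key names and the
# function name split_poolcheck are hypothetical.
def split_poolcheck(value):
    """Split a 32-bit POOLCHECK value into its four bytes, high byte first."""
    raw = value.to_bytes(4, "big")
    return {
        "alloc_pattern": chr(raw[0]),    # fill character for allocated pool
        "dealloc_pattern": chr(raw[1]),  # fill character for deallocated pool
        "reserved": raw[2],              # assumed label for the third byte
        "flags": raw[3],                 # assumed label for the low byte
    }


poolcheck = 1633943807
print(hex(poolcheck))
print(split_poolcheck(poolcheck))
```

Running this confirms that 1633943807 decimal is 616400FF hex, with 61 and 64 being the ASCII codes for lowercase 'a' and 'd'.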
390.9	Never tried CANASTA before....	ODIXIE::RREEVES		`Thu Mar 27 1997 21:23`	3
	Thank you all for your advice. I have never had the pleasure of using CANASTA in the past. I really don't know what it is but I'll start with the note suggested.