
Conference spezko::cluster

Title:+ OpenVMS Clusters - The best clusters in the world! +
Notice:This conference is COMPANY CONFIDENTIAL. See #1.3
Moderator:PROXY::MOORE
Created:Fri Aug 26 1988
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:5320
Total number of notes:23384

5286.0. "invexceptn in dudriver 6.2" by CSC32::REIGELMAN () Thu Apr 17 1997 03:10

Has anyone seen this before? A customer had a system crash with an INVEXCEPTN,
but for reasons unknown to me they halted the machine before it finished
writing the dump file. Here is what I was able to gather from the VCS and
the live system. The crash appears to be in DUDRIVER, but I can't match the
code with the listing we have here at the center. Also, I can't find a patch
kit with a link date and file ID that match the version of DUDRIVER they are
running. Just prior to the crash, some shadow sets went into mount
verification.



        Image Identification Information

                image name: "DUDRIVER"
                image file identification: "X-96"
                link date/time:  6-FEB-1997 12:08:23.32
                linker identification: "05-13"

        Patch Information

                There are no patches at this time.

From the VCS:

 **** Fatal BUG CHECK, version = V6.2     INVEXCEPTN, Exception while above ASTDEL or on interrupt stack

     Crash CPU: 00        Primary CPU: 00

     Active/available CPU masks: 0000000F/0000000F

     Current process = NULL

     Register dump

        R0 = 00000008
        R1 = 04080000
        R2 = BD2BF6E0
        R3 = 00000000
        R4 = BBEF2790
        R5 = BBFD2ED8
        R6 = 00000000
        R7 = 000000CC
        R8 = BBEF3140
        R9 = BBFA8280
        R10= 7FE80F3E
        R11= 7FE80F0A
        AP = 7FE21970
        FP = 7FE2194C
        SP = CC621D34
        PC = B6357C7F
        PSL= 04080009

     Kernel/interrupt/boot stack

        CC621D3C  00000004
        CC621D40  7FE2194C	: FP of establisher
        CC621D44  FFFFFFFD	: depth scan
        CC621D48  00000001	: R0
        CC621D4C  00000000	: R1
        CC621D50  00000001	: flags
        CC621D54  00000005	: # of arguments
        CC621D58  0000000C	: 
        CC621D5C  00000000	: reason mask
        CC621D60  00000000	: failing VA
        CC621D64  BFFC914E	: failing PC 
        CC621D68  04080004	: PSL
        CC621D6C  00000001
        CC621D70  00000000
        CC621D74  BD2BF6E0
        CC621D78  BBE2C440
        CC621D7C  BBEF2790
        CC621D80  BBFD2ED8
        CC621D84  BFFC5804
        CC621D88  BBEEC85F
        CC621D8C  BBE2BEC0
        CC621D90  BBEF2790
        CC621D94  BBEED3DC
        CC621D98  BD2BF6C0
        CC621D9C  BD2BF709
        CC621DA0  00000000
        CC621DA4  BBEF3150
        CC621DA8  BBEEDD85
        CC621DAC  BBEED46C
        CC621DB0  BBEF3150
        CC621DB4  CC620210
        CC621DB8  00000034
        CC621DBC  00003768
        CC621DC0  7FE80E80
        CC621DC4  7FE80F3E
        CC621DC8  7FE80F0A
        CC621DCC  7FE21970
        CC621DD0  B648F318
        CC621DD4  BBEED445
        CC621DD8  00000001
        CC621DDC  00000000
        CC621DE0  00000008
        CC621DE4  CC620000
        CC621DE8  BD7D7240
        CC621DEC  CCCB5400
        CC621DF0  0000C418
        CC621DF4  00000000
        CC621DF8  B63DCD53
        CC621DFC  04C30004


     Loaded images

 [SYSMSG]SYSMSG.EXE                     B6225A00 B6265800
 [SYS$LDR]SYSLDR_DYN.EXE                B6495200 B6497200
 [SYS$LDR]DDIF$RMS_EXTENSION.EXE        B6497800 B6498A00
 [SYS$LDR]RECOVERY_UNIT_SERVICES.EXE    B6498C00 B6499400
 [SYS$LDR]RMS.EXE                       B6265C00 B6290E00
 VBSS.EXE                               B6308A00 B630A600
 VAXCLUSTER_CACHE.EXE                   B630AC00 B630F600
 SYS$NETWORK_SERVICES.EXE               B630FC00 B630FE00
 SYS$UTC_SERVICES.EXE                   B6310400 B6311200
 SYS$TRANSACTION_SERVICES.EXE           B6311800 B631D800
 SYS$IPC_SERVICES.EXE                   B631DC00 B6330000
 CPULOA.EXE                             B6330200 B6335200
 LMF$GROUP_TABLE.EXE                    B6337600 B6339000
 SYSLICENSE.EXE                         B6339400 B633B200
 SNAPSHOT_SERVICES.EXE                  B633B800 B633C400
 SYSGETSYI.EXE                          B633CA00 B633E200
 SYSDEVICE.EXE                          B633E600 B6340E00
 MESSAGE_ROUTINES.EXE                   B6341400 B6347400
 EXCEPTION.EXE                          B6357A00 B6362200
 LOGICAL_NAMES.EXE                      B6362A00 B6364E00
 SECURITY.EXE                           B6365800 B636F200
 LOCKING.EXE                            B636FC00 B6376A00
 PAGE_MANAGEMENT.EXE                    B6377400 B6381000
 WORKING_SET_MANAGEMENT.EXE             B63C1E00 B63C7A00
 IMAGE_MANAGEMENT.EXE                   B63C8400 B63CB800
 EVENT_FLAGS_AND_ASTS.EXE               B63CBE00 B63CDE00
 IO_ROUTINES.EXE                        B63CE800 B63DB000
 PROCESS_MANAGEMENT.EXE                 B63DCC00 B63E8200
 ERRORLOG.EXE                           B6487600 B6488200
 PRIMITIVE_IO.EXE                       B6488800 B6489A00
 SYSTEM_SYNCHRONIZATION_SPC.EXE         B6489E00 B648E200
 SYSTEM_PRIMITIVES_MIN.EXE              B648E800 B6492600

 **** Starting memory dump, writing dump to HBVS member with unit number of 95
     Header and error log buffers dumped...
     SPT & GPT dumped...
     System space dumped...
     Global pages dumped...
     AUDIT_SERVER dumped...
     NETACP dumped...
     REMACP dumped...
     CONFIGURE dumped...
     PO_RPT_PTR dumped...
     IPCACP dumped...
     ERRFMT dumped...
     CACHE_SERVER dumped...
     CLUSTER_SERVER dumped...
     OPCOM dumped...
     JOB_CONTROL dumped...
     SHADOW_SERVER dumped...
     SECURITY_SERVER dumped...
     SMISERVER dumped...
     TP_SERVER dumped...
     SYMBIONT_104 dumped...
     MULTINET_SERVER dumped...
     LATACP dumped...
     EVL dumped...
     H i t m a n     dumped...
     SYMBIONT_94 dumped...
     RDMS_MONITOR dumped...
     DDS$055_i1 dumped...
     SYMBIONT_97 dumped...
     SYMBIONT_106 dumped...
     LH_HPLAS_D dumped...
     PSDC$DC_SERVER dumped...
     MRLISTEN_8213 dumped...
     SYMBIONT_109 dumped...
     SYMBIONT_98 dumped...
     SYMBIONT_105 dumped...
     ROBO_SERVER dumped...
     ROBO_ACTION dumped...
     SYMBIONT_107 dumped...
     NSCHED dumped...
     LPS_CYLPSA dumped...
     SYMBIONT_99 dumped...
     ROBOCHG_COLLECT dumped...
     OPENV_Server dumped...
     SYMBIONT_96 dumped...
     LH_HPLAS_K dumped...
     RPC$SWL dumped...
     DCE$RPCD dumped...
     LH_HPLAS_C dumped...
     SYMBIONT_93 dumped...
     SYMBIONT_92 dumped...
     LH_HPLAS_B dumped...
     SYMBIONT_91 dumped...
     SYMBIONT_85 dumped...
     SYMBIONT_84 dumped...
     SYMBIONT_81 dumped...
     SYMBIONT_80 dumped...
     MRLOGGER dumped...
     SCHED_REMOTE dumped...
     LH_HPLAS_E dumped...
     SYMBIONT_82 dumped...
     MR$T_N_1 dumped...
     MR$T_N_2 dumped...
     SYMBIONT_78 dumped...
     SYMBIONT_79 dumped...
     DQS$NOTIFIER dumped...
     DENNY_DM dumped...
     SYMBIONT_103 dumped...
     DDS$055_lt2 dumped...
     OA$FCV dumped...
     DDS$055_lt1 dumped...
     SYMBIONT_182 dumped...
     DDS$LSTN_055_1 dumped...
     SYMBIONT_111 dumped...
     ALLIN1_103 dumped...

 CPU:0 Console entry reason: ^P or Node Halt

 Entry PC: BFFCE7F0     Entry PSL:041F8200
 P00>>>
 P00>>>


From the live system, using the failing PC from the console output:


SDA> exam/inst BFFC914E
DUDRIVER+0508E:  CMPF    #3C,#30510830
SDA> exam/inst BFFC914E-40;50
DUDRIVER+0504E:  MOVQ    (SP)+,R4
DUDRIVER+05051:  RSB
DUDRIVER+05052:  REMQUE  @00B8(R3),R5
DUDRIVER+05057:  BVS     DUDRIVER+05069
DUDRIVER+05059:  BISL3   60(R5),64(R5),R0
DUDRIVER+0505F:  BNEQ    DUDRIVER+05052
DUDRIVER+05061:  MOVAB   60(R5),R5
DUDRIVER+05065:  BSBB    DUDRIVER+05070
DUDRIVER+05067:  BRB     DUDRIVER+05052
DUDRIVER+05069:  RSB
DUDRIVER+0506A:  REMQUE  -60(R5),-(SP)
DUDRIVER+0506E:  TSTL    (SP)+
DUDRIVER+05070:  PUSHR   #3F
DUDRIVER+05072:  REMQUE  @-20(R5),R0
DUDRIVER+05076:  BVS     DUDRIVER+05087
DUDRIVER+05078:  PUSHL   R1
DUDRIVER+0507A:  MOVZWL  #0830,R1
DUDRIVER+0507F:  BSBW    DUDRIVER+0530A
DUDRIVER+05082:  MOVL    (SP)+,R1
DUDRIVER+05085:  BRB     DUDRIVER+05072
DUDRIVER+05087:  REMQUE  @-28(R5),R0
DUDRIVER+0508B:  BVS     DUDRIVER+0509C
DUDRIVER+0508D:  PUSHL   R1
DUDRIVER+0508F:  MOVZWL  #0830,R1
DUDRIVER+05094:  BSBW    DUDRIVER+0530A
DUDRIVER+05097:  MOVL    (SP)+,R1
DUDRIVER+0509A:  BRB     DUDRIVER+05087
DUDRIVER+0509C:  MOVAB   -60(R5),R0
DUDRIVER+050A0:  JSB     @#V_COM$DRVDEALMEM
SDA> sho sym/all mmg$gl_npag

Symbols sorted by name
----------------------
MMG$GL_NPAGEDYN                 = 80008460 : BBDB8000
MMG$GL_NPAGNEXT                 = 80008464 : BFFD2000
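
A rough way to sanity-check the addresses above is to look each one up against the "Loaded images" listing and the nonpaged pool bounds. The sketch below is only illustrative: it assumes the two columns in the image listing are the start and end address of each image, and that MMG$GL_NPAGEDYN and MMG$GL_NPAGNEXT delimit the current nonpaged pool. The function name `locate` is made up for this example, and only a subset of the image list is included.

```python
# Image ranges copied from the "Loaded images" console listing (subset).
# Assumption: the two columns are the start and end address of each image.
LOADED_IMAGES = [
    ("EXCEPTION.EXE", 0xB6357A00, 0xB6362200),
    ("IO_ROUTINES.EXE", 0xB63CE800, 0xB63DB000),
    ("PROCESS_MANAGEMENT.EXE", 0xB63DCC00, 0xB63E8200),
    ("SYSTEM_PRIMITIVES_MIN.EXE", 0xB648E800, 0xB6492600),
]

# Contents of MMG$GL_NPAGEDYN and MMG$GL_NPAGNEXT from the SDA output.
# Assumption: these bound the nonpaged pool as currently allocated.
NPAGEDYN = 0xBBDB8000
NPAGNEXT = 0xBFFD2000


def locate(addr):
    """Return 'IMAGE+offset', 'nonpaged pool', or 'unknown' for an address."""
    for name, start, end in LOADED_IMAGES:
        if start <= addr < end:
            return f"{name}+{addr - start:05X}"
    if NPAGEDYN <= addr < NPAGNEXT:
        return "nonpaged pool"
    return "unknown"


if __name__ == "__main__":
    print(locate(0xB6357C7F))  # PC from the register dump
    print(locate(0xBFFC914E))  # failing PC from the signal array
```

Under those assumptions, the PC in the register dump falls inside EXCEPTION.EXE (the bugcheck path itself), while the failing PC lies in nonpaged pool, which is where a loaded driver would be expected to live.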




5286.1. "It may not have been in DUDRIVER" by CSC32::B_HIBBERT (When in doubt, PANIC) Thu Apr 17 1997 11:19, 21 lines
    Hi Tim,
    
       Comparing the crash PC to the live system gives only a chance at
    finding the correct module where the crash occurred.  In this case I
    suspect that something other than DUDRIVER was loaded at BFFC914E.
    Note that this address is in the middle of an instruction on the live
    system.
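
The "middle of an instruction" observation can be checked mechanically against the SDA listing in .0. This is a minimal sketch, not a disassembler: the start offsets are copied by hand from that listing, and the helper names are invented for illustration.

```python
# Instruction start offsets near the failing PC, copied from the SDA
# listing of the live system in 5286.0 (DUDRIVER+xxxxx).
INSTRUCTION_STARTS = [0x05087, 0x0508B, 0x0508D, 0x0508F, 0x05094, 0x05097]


def on_boundary(offset):
    """True if the offset is the first byte of a listed instruction."""
    return offset in INSTRUCTION_STARTS


def enclosing_instruction(offset):
    """Start offset of the listed instruction containing offset, or None."""
    earlier = [s for s in INSTRUCTION_STARTS if s <= offset]
    return max(earlier) if earlier else None


if __name__ == "__main__":
    failing_offset = 0x0508E  # BFFC914E as reported by SDA on the live system
    print(on_boundary(failing_offset))
    print(f"{enclosing_instruction(failing_offset):05X}")
```

On the live system, offset 0508E sits one byte into the PUSHL R1 at 0508D, which is why EXAM/INST at that address decodes as a nonsensical CMPF.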
    
       You might try using CLUE to see if you can get anything out of the
    dump file (sometimes it works, other times it doesn't).  First check
    SYS$ERRORLOG to see if there is a CLUE output file for this crash, if
    not you can TRY the following:
    $ CLUE :== $CLUE  !define a foreign command
    $ CLUE /CANASTA SYS$SYSTEM:SYSDUMP.DMP
    
    You can specify an output file on the /CANASTA=file.name switch.
    If this works, it will give you a more accurate module and offset than
    trying to compare the fault address with the live system.
    
    Brian Hibbert
    
5286.2. "Get Latest Shadowing Patches..." by XDELTA::HOFFMAN (Steve, OpenVMS Engineering) Thu Apr 17 1997 11:55, 16 lines
   Get the CLUE output (if you can -- you will want to
   explain to the customer that they should not halt
   the system before the dump has completed), and send
   it to the CANASTA e-mail server -- see VMSNOTES 233.*
   for pointers. Then take a look at the CANASTA response.

   See if anything is turned up by the COMET search engine
   (at http://comet.alf.dec.com/).

   That 6-Feb-1997 date makes it look like there are some
   DUDRIVER-related patches installed, probably shadowing.
   (And I'd definitely get the latest shadowing patches,
   and I'd seriously consider upgrading to the "redhawk"
   V7.1 compatibility kit.)

5286.3. "" by VMSSG::FRIEDRICHS (Ask me about Young Eagles) Thu Apr 17 1997 12:41, 16 lines
   The driver is from the CLUSIO01_062 TIMA kit.  This is the most current
   V6.2 remedial kit for Shadowing and DUDRIVER and is also a superset
   of the "V7.1 Cluster Compatibility Kit".
   
   I took a look and the stack doesn't make a lot of sense given the code
   path.  R0 should have the address of the item removed from the
   CDRP$L_ABTDQFL queue.  Kevin suggested that if the queue was corrupt, 
   it might lead to an ACCVIO during the REMQUE.  
   
   Of course, all of this is based on the probability that DUDRIVER got 
   reloaded at the same base address.  If not, who knows where the PC
   was at the time of the crash.
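
To make the base-address caveat concrete, here is a small sketch of the arithmetic. The failing PC and the live-system offset come from .0; the shifted base is purely hypothetical and only shows how the apparent offset would move if the driver had been loaded somewhere else at crash time.

```python
FAILING_PC = 0xBFFC914E   # failing PC from the crash signal array
LIVE_OFFSET = 0x0508E     # DUDRIVER+0508E as shown by SDA on the live system


def implied_base(pc, offset):
    """Base address implied by an absolute PC and an image-relative offset."""
    return pc - offset


def offset_for_base(pc, base):
    """Image-relative offset of pc if the image were loaded at base."""
    return pc - base


if __name__ == "__main__":
    live_base = implied_base(FAILING_PC, LIVE_OFFSET)
    print(f"live DUDRIVER base: {live_base:08X}")

    # Hypothetical: suppose the driver had been loaded 0x200 bytes lower
    # on the boot that crashed. The same PC then maps to another offset.
    hypothetical_base = live_base - 0x200
    shifted = offset_for_base(FAILING_PC, hypothetical_base)
    print(f"DUDRIVER+{shifted:05X}")
```

Any difference in pool allocation order between the two boots changes the base, so the offset derived from the live system is only as good as the assumption that the load address did not move.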
   
   Cheers,
   jeff
   
5286.4. "thanks" by CSC32::REIGELMAN () Thu Apr 17 1997 21:23, 7 lines
    Thank you to all who replied. Yes, looking on the live system
    is a best guess, and we have told the customer several times,
    "don't touch that button." But do they listen? When I couldn't
    find DUDRIVER with that link date, I just wanted to be sure
    that I hadn't missed a patch.
    
    Tim