Conference spezko::cluster

Title: + OpenVMS Clusters - The best clusters in the world! +
Notice: This conference is COMPANY CONFIDENTIAL. See #1.3
Moderator: PROXY::MOORE
Created: Fri Aug 26 1988
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 5320
Total number of notes: 23384

5272.0. "crash in shdriver+9FB0 with vaxshad05_062" by CSC32::REIGELMAN () Tue Apr 01 1997 03:01

 
Problem: Customer had two nodes that crashed with the following bugcheck        
	 "INVEXCEPTN, Exception while above ASTDEL or on interrupt stack"              
	 at SHDRIVER+9FB0                                                              
                                                                                
Solution: NONE.                                                                 
                                                                                
Analysis: Bill, I will make an entry in the cluster notes file to see if
	  engineering has seen this before.

	  Customer has applied VAXSHAD05_062. There is no dump to look at;
	  here is what I was able to get from the VCS log. In both crashes
	  R0 has been corrupted.
                                                                                
from node CYV7KE                                                                
                                                                                
 **** Fatal BUG CHECK, version = V6.2     INVEXCEPTN, Exception while above ASTDEL or on interrupt stack
                                                                                
     Crash CPU: 00        Primary CPU: 00                                       
                                                                                
     Active/available CPU masks: 0000003F/0000003F                              
                                                                                
     Current process = OPCOM                                                    
                                                                                
     Register dump                                                              
                                                                                
        R0 = 00000008                                                           
        R1 = 04080000                                                           
        R2 = B6F7CC80                                                           
        R3 = B5942280                                                           
        R4 = B5FB8D80                                                           
        R5 = B67B8980                                                           
        R6 = C9212210                                                           
        R7 = 00000034                                                           
        R8 = B57114E0                                                           
        R9 = B57114C0                                                           
        R10= 00000000                                                           
        R11= 00000000                                                           
        AP = 7FE25668                                                           
        FP = 7FFE77E4                                                           
        SP = C9213D64                                                           
        PC = AB5D1E78                                                           
        PSL= 04080009                                                           
                                                                                
     Kernel/interrupt/boot stack                                                
                                                                                
        C9213D6C  00000004                                                      
        C9213D70  7FFE77E4                                                      
        C9213D74  FFFFFFFD                                                      
        C9213D78  00000020	saved R0                                             
        C9213D7C  00000000	saved R1                                             
        C9213D80  00000001                                                      
        C9213D84  00000005                                                      
        C9213D88  0000000C                                                      
        C9213D8C  00000004                                                      
        C9213D90  00000024	failing VA(R0 + 4)                                   
        C9213D94  B82DA930	PC                                                   
        C9213D98  04080004	PSL                                                  
        C9213D9C  B832964B                                                      
        C9213DA0  B29951C0                                                      
        C9213DA4  B2A39B90                                                      
        C9213DA8  B83254CD                                                      
        C9213DAC  AB6FECBA                                                      
        C9213DB0  B8325467                                                      
        C9213DB4  AB6FCC00                                                      
        C9213DB8  00000000                                                      
        C9213DBC  D294FD68                                                      
        C9213DC0  ECB1FCE0                                                      
        C9213DC4  B6A721C0                                                      
        C9213DC8  B64107C0                                                      
        C9213DCC  B69ABF48                                                      
        C9213DD0  0000013D                                                      
        C9213DD4  AB63F0EA                                                      
        C9213DD8  04040001                                                      
        C9213DDC  AB649FA4                                                      
        C9213DE0  00000000                                                      
        C9213DE4  00000006                                                      
        C9213DE8  AB701A50                                                      
        C9213DEC  00000000                                                      
        C9213DF0  B69ABF40                                                      
        C9213DF4  B57114C0                                                      
        C9213DF8  AB6FF7E6                                                      
        C9213DFC  00020000                                                      
                                                                                
                                                                                
     Loaded images                                                              
                                                                                
 ATK$APPLETALK_PROTOCOL_STACK           A969A000 A96B6800                       
 [SYSMSG]SYSMSG.EXE                     AB3DDE00 AB41D600                       
 [SYS$LDR]SYSLDR_DYN.EXE                AB704C00 AB706C00                       
 [SYS$LDR]DDIF$RMS_EXTENSION.EXE        AB707200 AB708400                       
 [SYS$LDR]RECOVERY_UNIT_SERVICES.EXE    AB708600 AB708E00                       
 [SYS$LDR]RMS.EXE                       AB41DA00 AB448C00                       
 VBSS.EXE                               AB582C00 AB584800                       
 VAXCLUSTER_CACHE.EXE                   AB584E00 AB589800                       
 SYS$NETWORK_SERVICES.EXE               AB589E00 AB58A000                       
 SYS$UTC_SERVICES.EXE                   AB58A600 AB58B400                       
 SYS$TRANSACTION_SERVICES.EXE           AB58BA00 AB597A00                       
 SYS$IPC_SERVICES.EXE                   AB597E00 AB5AA200                       
 CPULOA.EXE                             AB5AA400 AB5AF400                       
 LMF$GROUP_TABLE.EXE                    AB5B1800 AB5B3200                       
 SYSLICENSE.EXE                         AB5B3600 AB5B5400                       
 SNAPSHOT_SERVICES.EXE                  AB5B5A00 AB5B6600                       
 SYSGETSYI.EXE                          AB5B6C00 AB5B8400                       
 SYSDEVICE.EXE                          AB5B8800 AB5BB000                       
 MESSAGE_ROUTINES.EXE                   AB5BB600 AB5C1600                       
 EXCEPTION.EXE                          AB5D1C00 AB5DC200                       
 LOGICAL_NAMES.EXE                      AB5DCA00 AB5DEE00                       
 SECURITY.EXE                           AB5DF800 AB5E9200                       
 LOCKING.EXE                            AB5E9C00 AB5F0A00                       
 PAGE_MANAGEMENT.EXE                    AB5F1400 AB5FB000                       
 WORKING_SET_MANAGEMENT.EXE             AB63BE00 AB641A00                       
 IMAGE_MANAGEMENT.EXE                   AB642400 AB645800                       
 EVENT_FLAGS_AND_ASTS.EXE               AB645E00 AB647E00                       
 IO_ROUTINES.EXE                        AB648800 AB655000                       
 PROCESS_MANAGEMENT.EXE                 AB656C00 AB662200                       
 ERRORLOG.EXE                           AB6F7200 AB6F7C00                       
 PRIMITIVE_IO.EXE                       AB6F8200 AB6F9400                       
 SYSTEM_SYNCHRONIZATION_SPC.EXE         AB6F9800 AB6FDC00                       
 SYSTEM_PRIMITIVES_MIN.EXE              AB6FE200 AB702000                       
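
A minimal sketch of how an address from the register dump or stack can be matched against the loaded-image table above. Only a few rows of the table are copied in, the helper name find_image is made up for illustration, and treating the second column as an exclusive end address is an assumption.

```python
# Subset of the "Loaded images" table for node CYV7KE: (name, base, end).
LOADED_IMAGES = [
    ("EXCEPTION.EXE", 0xAB5D1C00, 0xAB5DC200),
    ("IO_ROUTINES.EXE", 0xAB648800, 0xAB655000),
    ("SYSTEM_SYNCHRONIZATION_SPC.EXE", 0xAB6F9800, 0xAB6FDC00),
    ("SYSTEM_PRIMITIVES_MIN.EXE", 0xAB6FE200, 0xAB702000),
]


def find_image(address):
    """Return (image name, offset from base), or None if no listed range matches."""
    for name, base, end in LOADED_IMAGES:
        # Assumption: the end address is exclusive.
        if base <= address < end:
            return name, address - base
    return None


if __name__ == "__main__":
    # PC from the register dump, a stack longword, and the faulting PC.
    for addr in (0xAB5D1E78, 0xAB6FECBA, 0xB82DA930):
        print(format(addr, "08X"), find_image(addr))
```

The PC in the register dump (AB5D1E78) lands in EXCEPTION.EXE, which is the bugcheck path itself. The faulting PC saved on the stack (B82DA930) matches none of the listed images, so it has to be resolved separately, as done with "eval shdriver" in the SDA session.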
                                                                                
 **** Starting memory dump, writing dump to HBVS member with unit number of 95  
     Header and error log buffers dumped...                                     
     SPT & GPT dumped...                                                        
                                                                                
CYV7KE=>anal/crash SYS$COMMON:[SYSEXE]SYSDUMP-COMMON.DMP;2                      
                                                                                
                                                                                
                                                                                
OpenVMS (TM) VAX System dump analyzer                                           
                                                                                
%SDA-F-DUMPINCOMPL, the dump file write was not completed                       
                                                                                
from the live system                                                            
SDA> exam/inst b82da930                                                         
B82DA930:  MOVL    24(R3),04(R0)                                                
SDA> eval shdriver                                                              
Hex = B82D0980   Decimal = -1205008000         SHDRIVER                         
SDA> eval b82da930-B82D0980                                                     
Hex = 00009FB0   Decimal = 40880                                                
                                                                                
SDA> exam/inst b82da930-20;30                                                   
%SDA-W-INSKIPPED, unreasonable instruction stream - 2 bytes skipped             
B82DA912:  BBC     #07,02FE(R2)[R0],B82DA940                                    
B82DA919:  MOVL    0360(R2)[R0],R1                                              
B82DA91F:  MOVZWL  5C(R3),R0                                                    
B82DA923:  MULW2   #08,R0                                                       
B82DA926:  ADDL2   R1,R0                                                        
B82DA929:  BITB    #01,00B3(R3)                                                 
B82DA92E:  BNEQ    B82DA939                                                     
B82DA930:  MOVL    24(R3),04(R0)                                                
B82DA935:  BISW2   #04,02(R0)                                                   
B82DA939:  BICB2   #02,02(R0)                                                   
B82DA93D:  INCW    -06(R1)                                                      
B82DA940:  TSTL    10(R3)                                                       
                                                                                
                                                                                
                                                                                
                                                                                
********************************************************************************
from node CYV7KD                                                                
                                                                                
 **** Fatal BUG CHECK, version = V6.2     INVEXCEPTN, Exception while above ASTDEL or on interrupt stack
                                                                                
     Crash CPU: 00        Primary CPU: 00                                       
                                                                                
     Active/available CPU masks: 0000000F/0000000F                              
                                                                                
     Current process = NULL                                                     
                                                                                
     Register dump                                                              
                                                                                
        R0 = 00000008                                                           
        R1 = 04080000                                                           
        R2 = BBFE9080                                                           
        R3 = BBE55300                                                           
        R4 = BB30B240                                                           
        R5 = BBCFC640                                                           
        R6 = 00000000                                                           
        R7 = 00000034                                                           
        R8 = BA6F1A80                                                           
        R9 = 0027F060                                                           
        R10= 0027AB38                                                           
        R11= 7FD12512                                                           
        AP = 7FD11D5C                                                           
        FP = 7FD11D34                                                           
        SP = CB75BD50                                                           
        PC = B47D4E78                                                           
        PSL= 04080009                                                           
                                                                                
     Kernel/interrupt/boot stack                                                
                                                                                
        CB75BD58  00000004                                                      
        CB75BD5C  7FD11D34                                                      
        CB75BD60  FFFFFFFD                                                      
        CB75BD64  00000008	saved R0                                             
        CB75BD68  00000000	saved R1                                             
        CB75BD6C  00000001                                                      
        CB75BD70  00000005                                                      
        CB75BD74  0000000C                                                      
        CB75BD78  00000004                                                      
        CB75BD7C  0000000C	failing VA(R0 + 4)                                   
        CB75BD80  BE4A3330	PC                                                   
        CB75BD84  04080004	PSL                                                  
        CB75BD88  BA6EB11F                                                      
        CB75BD8C  BA6071C0                                                      
        CB75BD90  BA6F1190                                                      
        CB75BD94  BA6EBC9C                                                      
        CB75BD98  BB4D5B80                                                      
        CB75BD9C  BB4D5BC8                                                      
        CB75BDA0  00000000                                                      
        CB75BDA4  BA6F1A90                                                      
        CB75BDA8  BA6EC645                                                      
        CB75BDAC  BA6EBD2C                                                      
        CB75BDB0  BA6F1A90                                                      
        CB75BDB4  CB75A210                                                      
        CB75BDB8  00000034                                                      
        CB75BDBC  7FD121F9                                                      
        CB75BDC0  0027F060                                                      
        CB75BDC4  0027AB38                                                      
        CB75BDC8  7FD12512                                                      
        CB75BDCC  7FD11D5C                                                      
        CB75BDD0  B490C118                                                      
        CB75BDD4  BA6EBD05                                                      
        CB75BDD8  00000001                                                      
        CB75BDDC  00000000                                                      
        CB75BDE0  80004BA0                                                      
        CB75BDE4  CB75A000                                                      
        CB75BDE8  BC0BC380                                                      
        CB75BDEC  E59E8A00                                                      
        CB75BDF0  7FD1251C                                                      
        CB75BDF4  00000000                                                      
        CB75BDF8  B4859D47                                                      
        CB75BDFC  04C30004                                                      
                                                                                
                                                                                
     Loaded images                                                              
                                                                                
 [SYSMSG]SYSMSG.EXE                     B46B1600 B46F0E00                       
 [SYS$LDR]SYSLDR_DYN.EXE                B4912200 B4914200                       
 [SYS$LDR]DDIF$RMS_EXTENSION.EXE        B4914800 B4915A00                       
 [SYS$LDR]RECOVERY_UNIT_SERVICES.EXE    B4915C00 B4916400                       
 [SYS$LDR]RMS.EXE                       B46F1200 B471C400                       
 VBSS.EXE                               B4785C00 B4787800                       
 VAXCLUSTER_CACHE.EXE                   B4787E00 B478C800                       
 SYS$NETWORK_SERVICES.EXE               B478CE00 B478D000                       
 SYS$UTC_SERVICES.EXE                   B478D600 B478E400                       
 SYS$TRANSACTION_SERVICES.EXE           B478EA00 B479AA00                       
 SYS$IPC_SERVICES.EXE                   B479AE00 B47AD200                       
 CPULOA.EXE                             B47AD400 B47B2400                       
 LMF$GROUP_TABLE.EXE                    B47B4800 B47B6200                       
 SYSLICENSE.EXE                         B47B6600 B47B8400                       
 SNAPSHOT_SERVICES.EXE                  B47B8A00 B47B9600                       
 SYSGETSYI.EXE                          B47B9C00 B47BB400                       
 SYSDEVICE.EXE                          B47BB800 B47BE000                       
 MESSAGE_ROUTINES.EXE                   B47BE600 B47C4600                       
 EXCEPTION.EXE                          B47D4C00 B47DF200                       
 LO                                                                             
                                                                                
from the live system                                                            
SDA> exam/inst be4a3330                                                         
BE4A3330:  MOVL    24(R3),04(R0)                                                
SDA> eval shdriver                                                              
Hex = BE499380   Decimal = -1102474368         SHDRIVER                         
SDA> eval BE4A3330-BE499380                                                     
Hex = 00009FB0   Decimal = 40880                                                
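
The two SDA "eval" subtractions can be cross-checked with a short sketch. The addresses are copied from the SDA output for each node; the helper names are made up for illustration. The signed conversion reproduces the negative "Decimal" values SDA prints for 32-bit addresses with the top bit set.

```python
def image_offset(pc, image_base):
    """Offset of a faulting PC from the base address of its image."""
    return pc - image_base


def as_signed32(value):
    """Interpret a 32-bit value as a signed longword, as SDA's Decimal column does."""
    return value - (1 << 32) if value & 0x80000000 else value


# node: (faulting PC, SHDRIVER base), both taken from the SDA sessions.
CRASHES = {
    "CYV7KE": (0xB82DA930, 0xB82D0980),
    "CYV7KD": (0xBE4A3330, 0xBE499380),
}

if __name__ == "__main__":
    for node, (pc, base) in CRASHES.items():
        print(node, "SHDRIVER+" + format(image_offset(pc, base), "X"),
              "base decimal", as_signed32(base))
```

Both nodes give the same offset, so the two crashes are at the same instruction even though SHDRIVER is loaded at different addresses.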
                                                                                
        Image Identification Information                                        
                                                                                
                image name: "SHDRIVER"                                          
                image file identification: "X-121B5A1A9"                        
                link date/time:  4-NOV-1996 11:27:38.31                         
                linker identification: "05-13"                                  
********************************************************************************
code that we are failing in.....                                                
                                                                                
;**************************************************************************
;                                        NOTE
;
; The following instruction was added to prevent crashes when running CTM
;  tests.  We should not have a clone IRP for a device that is no longer
; a member, but because of a synchronization problem, we do.  This causes
; the write log entry address to be calculated incorrectly resulting in an
; ACCVIO, usually at 110$.  This modification does not fix the problem but
; rather is here to allow the CTM work to continue.  The real fix requires
;       redesign of START_SEQ.  See DRAGON QAR #774.
;**************************************************************************
;
        BBC     #SHAD$V_MBR_VALID,-            ; Branch if no longer a
                SHAD$B_MEMBER_STATUS(R2)[R0],120$       ;  member.
        MOVL    SHAD$L_WLG(R2)[R0],R1         ; Get the write log table.
        MOVZWL  IRP$L_WLE_PTR(R3),R0         ; Get the index.
        MULW    #WLT$K_ENTRY_SIZE,R0         ; Get the offset into the t
        ADDL    R1,R0                       ; Get the address of the en
        BITB    #IRP$M_WLE_REUSE,IRP$B_WLG_FLAGS(R3)    ; Reuseable ?
        BNEQ    110$                                    ; Br if not
        ASSUME   WLT$W_INDX EQ <WLT$W_ENTRY+2>          ;
********************************************************************************
crash here R0 is corrupt

        MOVL    IRP$L_CLN_WLE(R3),WLT$W_ENTRY(R0)  ; Capture the new entry.
********************************************************************************
        BISW    #WLT$M_REUSE,-                          ; The next time
                WLT$W_WLE_STATUS(R0)         ;  the entry will be reused.

110$:   BICB    #WLT$M_INUSE,WLT$W_WLE_STATUS(R0)
        INCW    WLT$W_FREE(R1)            ; Increment free count.
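
As a rough sketch, the address calculation in the listing can be replayed against the saved registers from the two stacks. The entry size (8) and the destination displacement (4) are read from the disassembly (MULW2 #08,R0 and 04(R0)). The index values 4 and 1 are not in the logs; they are inferred here from saved R0 on the assumption that the write log table pointer was zero, which is what saved R1 = 00000000 suggests. This is one reading of the data, not a confirmed diagnosis.

```python
WLT_ENTRY_SIZE = 8  # from MULW2 #08,R0 in the disassembly
WLT_W_ENTRY = 4     # from the 04(R0) destination operand


def failing_va(table_base, index):
    """Replay MOVZWL / MULW2 / ADDL2 and return (R0, address written by the MOVL)."""
    offset = (index * WLT_ENTRY_SIZE) & 0xFFFF  # MULW2 is a 16-bit multiply
    r0 = (table_base + offset) & 0xFFFFFFFF     # ADDL2 R1,R0
    return r0, (r0 + WLT_W_ENTRY) & 0xFFFFFFFF


if __name__ == "__main__":
    # (node, saved R1 from the stack, inferred index)
    for node, saved_r1, index in (("CYV7KE", 0x0, 4), ("CYV7KD", 0x0, 1)):
        r0, va = failing_va(saved_r1, index)
        print(node, "R0 =", format(r0, "08X"), "failing VA =", format(va, "08X"))
```

With a zero table pointer the computed R0 values are 00000020 and 00000008, and the failing addresses are 00000024 and 0000000C, which match the saved R0 and failing VA entries on the two stacks.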

5272.1. "try VAXCLUSIO01_062" by HAN::HALLE (Volker Halle MCS @HAO DTN 863-5216) Tue Apr 01 1997 11:25

    Timothy,
    
    I've checked the CANASTA crash footprint database and I've not found
    any case with exactly the SAME footprint in all the 26000 VAX crashes.
    Only 2 crashes at SHDRIVER+9816 in V6.0, which could be the same
    problem, but there is NOT enough data in CANASTA to confirm that.
    
    The comment in the code seems to indicate that this problem is
    'somehow expected'.
    
    You should consider installing VAXCLUSIO01_062. This is the re-write of
    the SHADOW/MOUNT etc. code (called REDHAWK) back-ported to V6.2.
    
    Volker.

5272.2. "This should be solved with xxxCLUSIO kit" by VMSSPT::JENKINS (Kevin M Jenkins VMS Support Engineering) Wed Apr 02 1997 09:46
    
    This problem is expected to be solved with the VAXCLUSIO code and
    in V7.1 code. Part of the rewrite was to correct synchronization
    problems that allowed multiple incompatible code threads to run
    at the same time. In this case Write Logging has been or is being
    disabled, yet there are still outstanding IOs that require the
    Write Logging tables. This problem was somewhat reproducible in
    testing but not on demand. I believe that this problem has not
    been seen with the new code.

    Since this customer is installing the VAXCLUSIO kit, they should
    be all set. Also try having them run the revalidate code on
    the partial dump. You may then be able to get the CANASTA data
    out of the dump so you can enter it.

    Kevin

5272.3. "re: REVALIDATE_DUMP" by HAN::HALLE (Volker Halle MCS @HAO DTN 863-5216) Wed Apr 02 1997 10:56

    Timothy,
    
    the 'revalidate code' Kevin is referring to in .-1 is my
    REVALIDATE_DUMP program (see note HAN::ECSO_SUPPORT #120).
    
    Just installing VAXCLUSIO should be considered a 'solution' to this
    problem.
    
    Volker.

5272.4. "" by EVMS::MORONEY () Wed Apr 02 1997 13:01

The write logging and sync. code in SHDRIVER were entirely rewritten so
again, installing VAXCLUSIO will take care of this.  This very problem was
part of the reason for the rewrite. 

5272.5. "thank you" by CSC32::REIGELMAN () Thu Apr 03 1997 02:00

    Kevin, Volker,
          Thank you for pointing out the VAXCLUSIO01 kit. I will
    check to see if the customer has that installed. The reason
    they didn't get a good dump is that two nodes crashed at the
    same time and both tried to write to the same common dump file.
    Again, thank you for your replies.
    
    Tim Reigelman

5272.6. "CANASTA rule has been written: 009B238F-B2249099-31015A" by HAN::HALLE (Volker Halle MCS @HAO DTN 863-5216) Thu Apr 03 1997 02:20