Title: | + OpenVMS Clusters - The best clusters in the world! + |
Notice: | This conference is COMPANY CONFIDENTIAL. See #1.3 |
Moderator: | PROXY::MOORE |
|
Created: | Fri Aug 26 1988 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 5320 |
Total number of notes: | 23384 |
5272.0. "crash in shdriver+9FB0 with vaxshad05_062" by CSC32::REIGELMAN () Tue Apr 01 1997 03:01
Problem: Customer had two nodes that crashed with the following bugcheck
"INVEXCEPTN, Exception while above ASTDEL or on interrupt stack"
at SHDRIVER+9FB0
Solution: NONE.
Analysis: Bill, I will make an entry in the cluster notes file to see if
engineering has seen this before.
The customer has applied VAXSHAD05_062. There is no dump to look at; here is
what I was able to get from the VCS log. In both crashes R0 has
been corrupted.
from node CYV7KE
**** Fatal BUG CHECK, version = V6.2 INVEXCEPTN, Exception while above ASTDEL or on interrupt stack
Crash CPU: 00 Primary CPU: 00
Active/available CPU masks: 0000003F/0000003F
Current process = OPCOM
Register dump
R0 = 00000008
R1 = 04080000
R2 = B6F7CC80
R3 = B5942280
R4 = B5FB8D80
R5 = B67B8980
R6 = C9212210
R7 = 00000034
R8 = B57114E0
R9 = B57114C0
R10= 00000000
R11= 00000000
AP = 7FE25668
FP = 7FFE77E4
SP = C9213D64
PC = AB5D1E78
PSL= 04080009
Kernel/interrupt/boot stack
C9213D6C 00000004
C9213D70 7FFE77E4
C9213D74 FFFFFFFD
C9213D78 00000020 saved R0
C9213D7C 00000000 saved R1
C9213D80 00000001
C9213D84 00000005
C9213D88 0000000C
C9213D8C 00000004
C9213D90 00000024 failing VA(R0 + 4)
C9213D94 B82DA930 PC
C9213D98 04080004 PSL
C9213D9C B832964B
C9213DA0 B29951C0
C9213DA4 B2A39B90
C9213DA8 B83254CD
C9213DAC AB6FECBA
C9213DB0 B8325467
C9213DB4 AB6FCC00
C9213DB8 00000000
C9213DBC D294FD68
C9213DC0 ECB1FCE0
C9213DC4 B6A721C0
C9213DC8 B64107C0
C9213DCC B69ABF48
C9213DD0 0000013D
C9213DD4 AB63F0EA
C9213DD8 04040001
C9213DDC AB649FA4
C9213DE0 00000000
C9213DE4 00000006
C9213DE8 AB701A50
C9213DEC 00000000
C9213DF0 B69ABF40
C9213DF4 B57114C0
C9213DF8 AB6FF7E6
C9213DFC 00020000
Loaded images
ATK$APPLETALK_PROTOCOL_STACK A969A000 A96B6800
[SYSMSG]SYSMSG.EXE AB3DDE00 AB41D600
[SYS$LDR]SYSLDR_DYN.EXE AB704C00 AB706C00
[SYS$LDR]DDIF$RMS_EXTENSION.EXE AB707200 AB708400
[SYS$LDR]RECOVERY_UNIT_SERVICES.EXE AB708600 AB708E00
[SYS$LDR]RMS.EXE AB41DA00 AB448C00
VBSS.EXE AB582C00 AB584800
VAXCLUSTER_CACHE.EXE AB584E00 AB589800
SYS$NETWORK_SERVICES.EXE AB589E00 AB58A000
SYS$UTC_SERVICES.EXE AB58A600 AB58B400
SYS$TRANSACTION_SERVICES.EXE AB58BA00 AB597A00
SYS$IPC_SERVICES.EXE AB597E00 AB5AA200
CPULOA.EXE AB5AA400 AB5AF400
LMF$GROUP_TABLE.EXE AB5B1800 AB5B3200
SYSLICENSE.EXE AB5B3600 AB5B5400
SNAPSHOT_SERVICES.EXE AB5B5A00 AB5B6600
SYSGETSYI.EXE AB5B6C00 AB5B8400
SYSDEVICE.EXE AB5B8800 AB5BB000
MESSAGE_ROUTINES.EXE AB5BB600 AB5C1600
EXCEPTION.EXE AB5D1C00 AB5DC200
LOGICAL_NAMES.EXE AB5DCA00 AB5DEE00
SECURITY.EXE AB5DF800 AB5E9200
LOCKING.EXE AB5E9C00 AB5F0A00
PAGE_MANAGEMENT.EXE AB5F1400 AB5FB000
WORKING_SET_MANAGEMENT.EXE AB63BE00 AB641A00
IMAGE_MANAGEMENT.EXE AB642400 AB645800
EVENT_FLAGS_AND_ASTS.EXE AB645E00 AB647E00
IO_ROUTINES.EXE AB648800 AB655000
PROCESS_MANAGEMENT.EXE AB656C00 AB662200
ERRORLOG.EXE AB6F7200 AB6F7C00
PRIMITIVE_IO.EXE AB6F8200 AB6F9400
SYSTEM_SYNCHRONIZATION_SPC.EXE AB6F9800 AB6FDC00
SYSTEM_PRIMITIVES_MIN.EXE AB6FE200 AB702000
**** Starting memory dump, writing dump to HBVS member with unit number of 95
Header and error log buffers dumped...
SPT & GPT dumped...
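The PSL saved on the exception stack (04080004) can be decoded to check that it agrees with the bugcheck text. The sketch below is a Python illustration and not part of the original note. It assumes the standard VAX PSL layout (condition codes in bits 0-3, IPL in bits 16-20, current mode in bits 24-25, interrupt-stack flag in bit 26); the function and key names are hypothetical.

```python
# Minimal sketch: decode a few fields of a VAX PSL longword.
# Field positions are assumed from the VAX architecture PSL layout;
# decode_psl and its key names are illustrative only.

MODES = ("kernel", "executive", "supervisor", "user")

def decode_psl(psl: int) -> dict:
    return {
        "N": bool(psl & 0x8),
        "Z": bool(psl & 0x4),
        "V": bool(psl & 0x2),
        "C": bool(psl & 0x1),
        "ipl": (psl >> 16) & 0x1F,                 # bits 16-20
        "current_mode": MODES[(psl >> 24) & 0x3],  # bits 24-25
        "interrupt_stack": bool(psl & (1 << 26)),  # bit 26 (IS)
    }

saved_psl = decode_psl(0x04080004)  # PSL pushed by the exception on CYV7KE
print(saved_psl)
```

Under those assumptions the saved PSL decodes to kernel mode at IPL 8 with the interrupt-stack bit set, which is consistent with "Exception while above ASTDEL or on interrupt stack".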
CYV7KE=>anal/crash SYS$COMMON:[SYSEXE]SYSDUMP-COMMON.DMP;2
OpenVMS (TM) VAX System dump analyzer
%SDA-F-DUMPINCOMPL, the dump file write was not completed
from the live system
SDA> exam/inst b82da930
B82DA930: MOVL 24(R3),04(R0)
SDA> eval shdriver
Hex = B82D0980 Decimal = -1205008000 SHDRIVER
SDA> eval b82da930-B82D0980
Hex = 00009FB0 Decimal = 40880
SDA> exam/inst b82da930-20;30
%SDA-W-INSKIPPED, unreasonable instruction stream - 2 bytes skipped
B82DA912: BBC #07,02FE(R2)[R0],B82DA940
B82DA919: MOVL 0360(R2)[R0],R1
B82DA91F: MOVZWL 5C(R3),R0
B82DA923: MULW2 #08,R0
B82DA926: ADDL2 R1,R0
B82DA929: BITB #01,00B3(R3)
B82DA92E: BNEQ B82DA939
B82DA930: MOVL 24(R3),04(R0)
B82DA935: BISW2 #04,02(R0)
B82DA939: BICB2 #02,02(R0)
B82DA93D: INCW -06(R1)
B82DA940: TSTL 10(R3)
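The three instructions ahead of the failing MOVL build R0 as "write log table address + index * 8". The Python sketch below replays that arithmetic for CYV7KE. It is a hedged illustration, not driver code: it assumes the "saved R1" value of 00000000 on the exception stack is what was loaded as the table address, and the index of 4 is inferred from the saved R0 of 00000020 rather than read from the IRP. The function name is hypothetical.

```python
# Minimal sketch of the address arithmetic in the disassembly above.
# Inputs are taken or inferred from the CYV7KE exception stack; nothing
# here comes from the real SHDRIVER data structures.

def wle_address(wlg_table: int, wle_index: int, entry_size: int = 8) -> int:
    r0 = wle_index & 0xFFFF               # MOVZWL 5C(R3),R0 (zero-extended word)
    r0 = (r0 * entry_size) & 0xFFFF       # MULW2 #08,R0 (word result, upper word stays zero)
    return (r0 + wlg_table) & 0xFFFFFFFF  # ADDL2 R1,R0

r0 = wle_address(wlg_table=0x00000000, wle_index=4)
fault_va = r0 + 4                         # destination of MOVL 24(R3),04(R0)
print(f"R0 = {r0:08X}, failing VA = {fault_va:08X}")
```

With a table address of zero the store lands at 00000024, matching the failing VA on the stack. That would fit a write log table pointer that had already been cleared, though the partial dump cannot confirm it.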
********************************************************************************
from node CYV7KD
**** Fatal BUG CHECK, version = V6.2 INVEXCEPTN, Exception while above ASTDEL or on interrupt stack
Crash CPU: 00 Primary CPU: 00
Active/available CPU masks: 0000000F/0000000F
Current process = NULL
Register dump
R0 = 00000008
R1 = 04080000
R2 = BBFE9080
R3 = BBE55300
R4 = BB30B240
R5 = BBCFC640
R6 = 00000000
R7 = 00000034
R8 = BA6F1A80
R9 = 0027F060
R10= 0027AB38
R11= 7FD12512
AP = 7FD11D5C
FP = 7FD11D34
SP = CB75BD50
PC = B47D4E78
PSL= 04080009
Kernel/interrupt/boot stack
CB75BD58 00000004
CB75BD5C 7FD11D34
CB75BD60 FFFFFFFD
CB75BD64 00000008 saved R0
CB75BD68 00000000 saved R1
CB75BD6C 00000001
CB75BD70 00000005
CB75BD74 0000000C
CB75BD78 00000004
CB75BD7C 0000000C failing VA(R0 + 4)
CB75BD80 BE4A3330 PC
CB75BD84 04080004 PSL
CB75BD88 BA6EB11F
CB75BD8C BA6071C0
CB75BD90 BA6F1190
CB75BD94 BA6EBC9C
CB75BD98 BB4D5B80
CB75BD9C BB4D5BC8
CB75BDA0 00000000
CB75BDA4 BA6F1A90
CB75BDA8 BA6EC645
CB75BDAC BA6EBD2C
CB75BDB0 BA6F1A90
CB75BDB4 CB75A210
CB75BDB8 00000034
CB75BDBC 7FD121F9
CB75BDC0 0027F060
CB75BDC4 0027AB38
CB75BDC8 7FD12512
CB75BDCC 7FD11D5C
CB75BDD0 B490C118
CB75BDD4 BA6EBD05
CB75BDD8 00000001
CB75BDDC 00000000
CB75BDE0 80004BA0
CB75BDE4 CB75A000
CB75BDE8 BC0BC380
CB75BDEC E59E8A00
CB75BDF0 7FD1251C
CB75BDF4 00000000
CB75BDF8 B4859D47
CB75BDFC 04C30004
Loaded images
[SYSMSG]SYSMSG.EXE B46B1600 B46F0E00
[SYS$LDR]SYSLDR_DYN.EXE B4912200 B4914200
[SYS$LDR]DDIF$RMS_EXTENSION.EXE B4914800 B4915A00
[SYS$LDR]RECOVERY_UNIT_SERVICES.EXE B4915C00 B4916400
[SYS$LDR]RMS.EXE B46F1200 B471C400
VBSS.EXE B4785C00 B4787800
VAXCLUSTER_CACHE.EXE B4787E00 B478C800
SYS$NETWORK_SERVICES.EXE B478CE00 B478D000
SYS$UTC_SERVICES.EXE B478D600 B478E400
SYS$TRANSACTION_SERVICES.EXE B478EA00 B479AA00
SYS$IPC_SERVICES.EXE B479AE00 B47AD200
CPULOA.EXE B47AD400 B47B2400
LMF$GROUP_TABLE.EXE B47B4800 B47B6200
SYSLICENSE.EXE B47B6600 B47B8400
SNAPSHOT_SERVICES.EXE B47B8A00 B47B9600
SYSGETSYI.EXE B47B9C00 B47BB400
SYSDEVICE.EXE B47BB800 B47BE000
MESSAGE_ROUTINES.EXE B47BE600 B47C4600
EXCEPTION.EXE B47D4C00 B47DF200
LO
from the live system
SDA> exam/inst be4a3330
BE4A3330: MOVL 24(R3),04(R0)
SDA> eval shdriver
Hex = BE499380 Decimal = -1102474368 SHDRIVER
SDA> eval BE4A3330-BE499380
Hex = 00009FB0 Decimal = 40880
Image Identification Information
image name: "SHDRIVER"
image file identification: "X-121B5A1A9"
link date/time: 4-NOV-1996 11:27:38.31
linker identification: "05-13"
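Both SDA sessions reduce the failing PC to the same offset inside SHDRIVER even though the driver is loaded at a different base address on each node. The Python sketch below repeats the two EVAL subtractions using only the addresses shown in the sessions above; the dictionary and variable names are illustrative.

```python
# Minimal sketch: failing PC minus SHDRIVER base address, per node.
# All addresses are copied from the SDA output in this note.

crashes = {
    "CYV7KE": {"pc": 0xB82DA930, "shdriver": 0xB82D0980},
    "CYV7KD": {"pc": 0xBE4A3330, "shdriver": 0xBE499380},
}

offsets = {node: c["pc"] - c["shdriver"] for node, c in crashes.items()}

for node, offset in offsets.items():
    print(f"{node}: SHDRIVER+{offset:X} (decimal {offset})")
```

An identical offset on both nodes is what makes SHDRIVER+9FB0 usable as a crash footprint when the two nodes run the same SHDRIVER image.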
********************************************************************************
code that we are failing in.....
;**************************************************************************
; NOTE
;
; The following instruction was added to prevent crashes when running CTM
; tests. We should not have a clone IRP for a device that is no longer
; a member, but because of a synchronization problem, we do. This causes
; the write log entry address to be calculated incorrectly resulting in an
; ACCVIO, usually at 110$. This modification does not fix the problem but
; rather is here to allow the CTM work to continue. The real fix requires
; redesign of START_SEQ. See DRAGON QAR #774.
;**************************************************************************
;
BBC #SHAD$V_MBR_VALID,- ; Branch if no longer a
SHAD$B_MEMBER_STATUS(R2)[R0],120$ ; member.
MOVL SHAD$L_WLG(R2)[R0],R1 ; Get the write log table.
MOVZWL IRP$L_WLE_PTR(R3),R0 ; Get the index.
MULW #WLT$K_ENTRY_SIZE,R0 ; Get the offset into the t
ADDL R1,R0 ; Get the address of the en
BITB #IRP$M_WLE_REUSE,IRP$B_WLG_FLAGS(R3) ; Reuseable ?
BNEQ 110$ ; Br if not
ASSUME WLT$W_INDX EQ <WLT$W_ENTRY+2> ;
********************************************************************************
crash here R0 is corrupt
MOVL IRP$L_CLN_WLE(R3),WLT$W_ENTRY(R0) ; Capture the new entry.
********************************************************************************
BISW #WLT$M_REUSE,- ; The next time
WLT$W_WLE_STATUS(R0) ; the entry will be reused.

110$: BICB #WLT$M_INUSE,WLT$W_WLE_STATUS(R0)
INCW WLT$W_FREE(R1) ; Increment free count.
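Lining the listing up against the SDA disassembly from CYV7KE gives numeric values for the symbols involved. The Python sketch below records that pairing. The values are inferred purely by matching each source line to its disassembled instruction; they are not taken from the SHDRIVER symbol definitions and should be treated as assumptions for this image only.

```python
# Minimal sketch: symbol values inferred by pairing the listing with the
# disassembly at B82DA912..B82DA93D. Not taken from the real definitions.

INFERRED = {
    "SHAD$V_MBR_VALID":     0x07,   # BBC #07,...
    "SHAD$B_MEMBER_STATUS": 0x2FE,  # ...02FE(R2)[R0]
    "SHAD$L_WLG":           0x360,  # MOVL 0360(R2)[R0],R1
    "IRP$L_WLE_PTR":        0x5C,   # MOVZWL 5C(R3),R0
    "WLT$K_ENTRY_SIZE":     0x08,   # MULW2 #08,R0
    "IRP$M_WLE_REUSE":      0x01,   # BITB #01,...
    "IRP$B_WLG_FLAGS":      0xB3,   # ...00B3(R3)
    "IRP$L_CLN_WLE":        0x24,   # MOVL 24(R3),...
    "WLT$W_ENTRY":          0x04,   # ...04(R0)
    "WLT$M_REUSE":          0x04,   # BISW2 #04,...
    "WLT$W_WLE_STATUS":     0x02,   # ...02(R0)
    "WLT$M_INUSE":          0x02,   # BICB2 #02,02(R0)
    "WLT$W_FREE":           -0x06,  # INCW -06(R1)
}

# The ASSUME places WLT$W_INDX directly after WLT$W_ENTRY, so the single
# MOVL (4 bytes) fills both word fields.
wlt_w_indx = INFERRED["WLT$W_ENTRY"] + 2

for name, value in INFERRED.items():
    print(f"{name:22} = {value:#x}")
print(f"{'WLT$W_INDX':22} = {wlt_w_indx:#x}")
```

Read this way, the failing instruction at SHDRIVER+9FB0 is the MOVL marked "crash here" in the listing, and branch target 110$ corresponds to B82DA939.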
T.R | Title | User | Personal Name | Date | Lines |
---|
5272.1 | try VAXCLUSIO01_062 | HAN::HALLE | Volker Halle MCS @HAO DTN 863-5216 | Tue Apr 01 1997 11:25 | 14 |
| Timothy,
I've checked the CANASTA crash footprint database and I've not found
any case with exactly the SAME footprint in all the 26000 VAX crashes.
Only 2 crashes at SHDRIVER+9816 in V6.0, which could be the same
problem, but there is NOT enough data in CANASTA to confirm that.
The comment in the code seems to indicate that this problem is
'somehow expected'.
You should consider installing VAXCLUSIO01_062. This is the re-write of
the SHADOW/MOUNT etc. code (called REDHAWK) back-ported to V6.2.
Volker.
|
5272.2 | This should be solved with xxxCLUSIO kit | VMSSPT::JENKINS | Kevin M Jenkins VMS Support Engineering | Wed Apr 02 1997 09:46 | 17 |
|
This problem is expected to be solved with the VAXCLUSIO code and
in V7.1 code. Part of the rewrite was to correct synchronization
problems that allowed multiple incompatible code threads to run
at the same time. In this case Write Logging has been or is being
disabled, yet there are still outstanding IOs that require the
Write Logging tables. This problem was somewhat reproducible in
testing but not on demand. I believe that this problem has not
been seen with the new code.
Since this customer is installing the VAXCLUSIO kit they should
be all set. Also try having them run the revalidate code on
the partial dump. You may then be able to get the CANASTA data
out of the dump so you can enter it.
Kevin
|
5272.3 | re: REVALIDATE_DUMP | HAN::HALLE | Volker Halle MCS @HAO DTN 863-5216 | Wed Apr 02 1997 10:56 | 9 |
| Timothy,
the 'revalidate code' Kevin is referring to in .-1 is my
REVALIDATE_DUMP program (see note HAN::ECSO_SUPPORT #120).
Just installing VAXCLUSIO should be considered a 'solution' to this
problem.
Volker.
|
5272.4 | | EVMS::MORONEY | | Wed Apr 02 1997 13:01 | 3 |
| The write logging and sync. code in SHDRIVER were entirely rewritten, so
again, installing VAXCLUSIO will take care of this. This very problem was
part of the reason for the rewrite.
|
5272.5 | thank you | CSC32::REIGELMAN | | Thu Apr 03 1997 02:00 | 8 |
| Kevin, Volker,
Thank you for pointing out the VAXCLUSIO01 kit. I will
check to see if the customer has that installed. The reason
they didn't get a good dump is that two nodes crashed at the
same time and both tried to write to the same common dump file.
Again thank you for your replies.
Tim Reigelman
|
5272.6 | CANASTA rule has been written: 009B238F-B2249099-31015A | HAN::HALLE | Volker Halle MCS @HAO DTN 863-5216 | Thu Apr 03 1997 02:20 | 1 |
|
|