--------------------------------------------------------------------------------
Log No 01649.00-4C6-1UVO Desc type C
Sequence no 02 Authr badge no 064021
Creation D/T 3-JAN-1995 22:40
--------------------------------------------------------------------------------
Dump taken on 6-DEC-1994 16:40:39.06
SSRVEXCEPT, Unexpected system service exception
Version of system: VAX/VMS VERSION V5.5-2
VAXcluster node: OAK, a VAX 6000-410
Process currently executing on this CPU: BATCH_232
Current IPL: 0 (decimal)
CPU database address: 82320000
MPB address: 00000000
No spinlocks currently owned by CPU 01
7FFE77B8 00000004
7FFE77BC 7FFE9730
7FFE77C0 FFFFFFFD
7FFE77C4 816BDD00
7FFE77C8 801844BF
7FFE77CC 0000000B
7FFE77D0 00000005
7FFE77D4 0000000C
7FFE77D8 00000000
7FFE77DC 801844BF
7FFE77E0 802FFF52 EXE$ASTDEL
7FFE77E4 00000001
7FFE77E8 00000005
7FFE77EC 80181088
7FFE77F0 00000689 PDT$L_SNDDAT_OPER_SNT+00001
7FFE77F4 0073C67C
7FFE77F8 7FFEE3C6 SYS$ENQ+00006
7FFE77FC 01400000
Condition Handler 7FFE9730 00000000
SP Align Bits = 00 7FFE9734 2FFC0000
Saved AP 7FFE9738 7FFE97E8
Saved FP 7FFE973C 7FFE97AC
Return PC 7FFE9740 8020355A RMS+0D55A
R2 7FFE9744 80004BA0 SCH$GQ_LEFWQ
R3 7FFE9748 80887600
R4 7FFE974C 0073C640
R5 7FFE9750 00000000
R6 7FFE9754 00135B7C
R7 7FFE9758 0073A1B0
R8 7FFE975C 00522D30
R9 7FFE9760 0073C640
R10 7FFE9764 00739408
R11 7FFE9768 7FFDFE70 PIO$GW_IIOIMPA
Align Stack by 0 Bytes =>
Argument List 7FFE976C 0000000D
7FFE9770 0000001F efn
7FFE9774 00000000 lkmode
7FFE9778 0073C67C lksb lockid = 44000B19
7FFE977C 0000061B flags
7FFE9780 00000000 resnam
7FFE9784 00000000 parid
7FFE9788 00000000 astadr
7FFE978C 00000000 astprm
7FFE9790 00000000 blkast
7FFE9794 00000001 acmode
7FFE9798 00000000 rsdm_id
7FFE979C 00000000
7FFE97A0 00000000
0073C67C 00000001
0073C680 44000B19 lockid
0073C684 00000014 (lock value block)
0073C688 00016E00
0073C68C 00000000
0073C690 00000000
Flags 0000061B has bits 0,1,3,4,9,10 set:
LCK$V_VALBLK = 00000000
LCK$V_CONVERT = 00000001
LCK$V_SYNCSTS = 00000003
LCK$V_SYSTEM = 00000004
LCK$V_NODLCKWT = 00000009
LCK$V_NODLCKBLK= 0000000A
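As a cross-check on the bit list above, here is a minimal sketch that decodes the flags longword 0000061B from the argument list. The bit numbers are copied from the LCK$V_* list above; the helper names (set_bits, decode_flags) are made up for illustration and are not part of any VMS tool.

```python
# Bit numbers copied from the LCK$V_* list in this log entry.
LCK_BITS = {
    0: "LCK$V_VALBLK",
    1: "LCK$V_CONVERT",
    3: "LCK$V_SYNCSTS",
    4: "LCK$V_SYSTEM",
    9: "LCK$V_NODLCKWT",
    10: "LCK$V_NODLCKBLK",
}


def set_bits(value):
    """Return the positions of the bits set in value, lowest first."""
    return [n for n in range(value.bit_length()) if (value >> n) & 1]


def decode_flags(value):
    """Name each set bit; bits not in the table come back as 'bit N'."""
    return [LCK_BITS.get(n, f"bit {n}") for n in set_bits(value)]


if __name__ == "__main__":
    flags = 0x061B  # flags argument as seen at 7FFE977C
    print(set_bits(flags))
    for name in decode_flags(flags):
        print(name)
```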
RMS+0D53D: CLRQ -(SP)
RMS+0D53F: MOVQ #01,-(SP)
RMS+0D542: CLRQ -(SP)
RMS+0D544: CLRQ -(SP)
RMS+0D546: CLRL -(SP)
RMS+0D548: MOVZWL #061B,-(SP)
RMS+0D54D: PUSHAB 3C(R4)
RMS+0D550: MOVQ #1F,-(SP)
RMS+0D553: CALLS #0D,@#SYS$ENQ
RMS+0D55A:
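The push sequence above can be replayed to confirm it builds the 0D-longword argument list seen at 7FFE9770. This is only a rough model of the stack, not an emulator: the Stack class and build_enq_arglist are hypothetical names, and R4 is taken from the saved register value 0073C640.

```python
class Stack:
    """Minimal model of a downward-growing stack of 32-bit longwords."""

    def __init__(self):
        self.words = []  # words[0] is the current top of stack

    def pushl(self, value):
        self.words.insert(0, value & 0xFFFFFFFF)

    def pushq(self, value):
        # A quadword push leaves the low longword at the lower address,
        # so the high half goes on first.
        self.pushl(value >> 32)
        self.pushl(value)


def build_enq_arglist(r4):
    """Replay the pushes at RMS+0D53D..RMS+0D550; return args 1..13."""
    s = Stack()
    s.pushq(0)          # CLRQ   -(SP)         two trailing zero arguments
    s.pushq(1)          # MOVQ   #01,-(SP)     acmode = 1, rsdm_id = 0
    s.pushq(0)          # CLRQ   -(SP)         astprm, blkast
    s.pushq(0)          # CLRQ   -(SP)         parid, astadr
    s.pushl(0)          # CLRL   -(SP)         resnam
    s.pushl(0x061B)     # MOVZWL #061B,-(SP)   flags
    s.pushl(r4 + 0x3C)  # PUSHAB 3C(R4)        lksb
    s.pushq(0x1F)       # MOVQ   #1F,-(SP)     efn = 1F, lkmode = 0
    return s.words


if __name__ == "__main__":
    args = build_enq_arglist(0x0073C640)  # R4 from the saved registers
    print(len(args))                      # count pushed by CALLS #0D
    print([f"{a:08X}" for a in args])
```

The third entry comes out as 0073C67C, which agrees with the lksb address in the argument list and with the saved R1 on the kernel stack.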
SYS$ENQ+00002: CHMK #004F
SYS$ENQ+00006:
EXE$ENQ+00002: MOVZBL 04(AP),R3 !efn = 1F
EXE$ENQ+00006: CMPB R3,#3F !lower
EXE$ENQ+00009: BGTRU LCK$BREAK_DEADLOCK+000FC !drop through
EXE$ENQ+0000B: BICL3 #-00004000,10(AP),R9 !flags -
LCK$V_NODLCKBLK
EXE$ENQ+00014: MOVL 0C(AP),R8 !lksb
EXE$ENQ+00018: PROBEW #00,#18,(R8) !probe lksb
EXE$ENQ+0001C: BEQL EXE$ENQ+00032 !OK, branch
EXE$ENQ+00032: BLBS R9,EXE$ENQ+0003B !LCK$V_VALBLK set
EXE$ENQ+0003B: MOVZWL #0C,R0
EXE$ENQ+0003E: BRB EXE$ENQ+00043
EXE$ENQ+00043: JMP LCK$NOT_QUEUED+0003F
LCK$NOT_QUEUED+0003F: PUSHL R0
LCK$NOT_QUEUED+00041: MOVL @#CTL$GL_PCB,R4 !pcb
LCK$NOT_QUEUED+00048: MOVL 60(R4),R1 !pid
LCK$NOT_QUEUED+0004C: MOVZWL #02,R2
LCK$NOT_QUEUED+0004F: MOVZBL 04(AP),R3 !efn (1F)
LCK$NOT_QUEUED+00053: JSB @#V_SCH$POSTEF
V_SCH$POSTEF: JMP @#EVENT_FLAGS_AND_ASTS
... post event flag wait
Got lost somewhere here... suspect wrong branch taken
rsb back to here
LCK$NOT_QUEUED+00059: MOVL (SP)+,R0
LCK$NOT_QUEUED+0005C: RET !dismiss CHMK ??
Um, should be type=ts, not ds; force of habit.
R6 = lkb; this was queued to the pcb (the pcb queue is empty now, but the lkb points to
it). Other registers don't seem interesting.
We had an accvio in exe$astdel, trying to call an ast.
TDA> show stack
Process stacks (on CPU 01)
--------------------------
Current operating stack (KERNEL):
7FFE7774 0000061B PDT$L_BASEBL+00003
7FFE7778 00000689 PDT$L_SNDDAT_OPER_SNT+00001
7FFE777C 00000005
7FFE7780 7FFE77AC CTL$GL_KSTKBAS+005AC
7FFE7784 7FFE7794 CTL$GL_KSTKBAS+00594
7FFE7788 7FFE778C CTL$GL_KSTKBAS+0058C
7FFE778C 8000239E EXE$EXCPTN+00006
7FFE7790 00000000
SP => 7FFE7794 00000000
7FFE7798 00000000
7FFE779C 7FFE976C
7FFE77A0 7FFE9730
7FFE77A4 80000014 EXE$QIOW_3+00004
7FFE77A8 802BC4C4 EXE$CONTSIGNAL+0007C
7FFE77AC 00000002
7FFE77B0 7FFE77D0 CTL$GL_KSTKBAS+005D0
7FFE77B4 7FFE77B8 CTL$GL_KSTKBAS+005B8
7FFE77B8 00000004
7FFE77BC 7FFE9730
7FFE77C0 FFFFFFFD
7FFE77C4 816BDD00
7FFE77C8 801844BF
7FFE77CC 0000000B
7FFE77D0 00000005
7FFE77D4 0000000C
7FFE77D8 00000000
7FFE77DC 801844BF (R1, FR
7FFE77E0 802FFF52 EXE$ASTDEL CALLG (SP),(R1)
7FFE77E4 00000001
7FFE77E8 00000005
7FFE77EC 80181088 Astparam
7FFE77F0 00000689 saver0 PDT$L_SNDDAT_OPER_SNT+00001
7FFE77F4 0073C67C saved r1 (is lksb from call)
7FFE77F8 7FFEE3C6 pc SYS$ENQ+00006
7FFE77FC 01400000 psl
TDA> set output tt:
TDA> set nolog
I.e. we appear to have at the bottom of the stack an ast argument list and
nothing else.
If we were in a system service I would expect a regular (service_exit) call frame,
and if an ast happened, an ast dispatching call frame.
The only thing I think can have happened here is that we were exiting from the
system service when the ast went off (?), hence the lack of a sys_ser call frame.
The ast call frame isn't there as it won't have been built yet (we crashed in the
callg that builds it).
So we crashed on the callg (sp),(r1), which has a value in r1 of 801844BF.
(r1 should be loaded from the acb$l_ast field. This is the ast address we
should jump to, but this value is bad.)
Question is, where did this value come from?
Now r6, the lkb for the lock we are enqing, looks a likely candidate (see below):
it has already been queued at some point to the pcb, and has the pid;
there is an ast address in it, but it is nothing like the one we crashed at.
However, when we did the enq we never asked for an ast, and there is no blocking or
completion ast in the lkb, so I reckon this is stale contents.
SDA> form @r6
81988D00 LKB$L_ASTQFL 808875C0 PCB+00010
81988D04 LKB$L_ASTQBL 808875C0 PCB+00010
81988D08 LKB$W_SIZE 007C
81988D0A LKB$B_TYPE 35
81988D0B LKB$B_RMOD 31
81988D0C LKB$L_PID 0005008A
81988D10 LKB$L_AST 80203572 RMS+0D572
LKB$W_RQSEQNM
81988D14 LKB$L_ASTPRM 00000000
LKB$L_EPID
81988D18 LKB$L_DUETIME 802C9AF0 LCK$GRANT_REM+001A0
LKB$L_KAST
81988D1C LKB$L_CPLASTADR 00000000
81988D20 LKB$L_BLKASTADR 00000000
81988D24 LKB$L_DLCKPRI 0073C67C
LKB$L_LKSB
81988D28 LKB$W_FLAGS 061B
81988D2A LKB$W_STATUS 0000
81988D2C LKB$L_LKST1 00000001
81988D30 LKB$L_LKID 44000B19
LKB$L_LKST2
81988D34 LKB$B_RQMODE 05
81988D35 LKB$B_GRMODE 00
81988D36 LKB$B_STATE 01
81988D37 LKB$B_EFN 1F
81988D38 LKB$L_SQFL 81014640
81988D3C LKB$L_SQBL 81843238
81988D40 LKB$L_OWNQFL 8186B0C0
81988D44 LKB$L_OWNQBL 808876BC ARB+00084
81988D48 LKB$L_PARENT 81997300
81988D4C LKB$W_REFCNT 0000
81988D4E LKB$B_TSLT 8E
81988D4F 80
81988D50 LKB$L_RSB 81014630
81988D54 LKB$L_REMLKID 160016A6
81988D58 LKB$L_CSID 0073C640
LKB$L_OLDASTPRM
81988D5C LKB$L_OLDBLKAST 80203572 RMS+0D572
81988D60 LKB$L_LCKCTX 00000000
81988D64 LKB$W_PRIORITY 0000
81988D66 LKB$W_STAT2 0000
81988D68 LKB$L_RQSTSRNG 00000000
81988D6C LKB$L_RQSTERNG FFFFFFFF
81988D70 LKB$L_GRNTSRNG 00000000
81988D74 LKB$L_GRNTERNG FFFFFFFF
81988D78 LKB$L_TSKPID 0005008A
LKB$C_LENGTH
However, searching pool, I think this is what did it.
TDA> form 816bdd00
816BDD00 ACB$L_ASTQFL 8186B080
816BDD04 ACB$L_ASTQBL 808875C0 PCB+00010
816BDD08 ACB$W_SIZE 001C
816BDD0A ACB$B_TYPE 02
816BDD0B ACB$B_RMOD 20
816BDD0C ACB$L_PID 0005008A
816BDD10 ACB$L_AST 801844BF
816BDD14 ACB$L_ASTPRM 80181088
816BDD18 ACB$L_KAST 808E946D
ACB$C_LENGTH
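For picking ACBs like this one out of a pool image, a rough sketch follows. It assumes the pool has been captured as a little-endian byte string; the field layout follows the ACB$ offsets formatted above (001C bytes in total). find_acbs, the base address, and the zero padding around the sample block are all hypothetical and only there to make the example run.

```python
import struct

# Layout follows the ACB$ offsets formatted above: two queue longwords,
# size word, type byte, rmod byte, then pid, ast, astprm, kast.
ACB = struct.Struct("<IIHBBIIII")
FIELDS = ("astqfl", "astqbl", "size", "type", "rmod",
          "pid", "ast", "astprm", "kast")
ACB_TYPE = 0x02  # ACB$B_TYPE as shown in the formatted block


def find_acbs(pool, base, ast_pc, step=4):
    """Return (address, fields) for each ACB in pool whose AST PC matches."""
    hits = []
    for off in range(0, len(pool) - ACB.size + 1, step):
        acb = dict(zip(FIELDS, ACB.unpack_from(pool, off)))
        if (acb["size"] == ACB.size and acb["type"] == ACB_TYPE
                and acb["ast"] == ast_pc):
            hits.append((base + off, acb))
    return hits


if __name__ == "__main__":
    # Values copied from the block formatted at 816BDD00.
    sample = ACB.pack(0x8186B080, 0x808875C0, 0x001C, 0x02, 0x20,
                      0x0005008A, 0x801844BF, 0x80181088, 0x808E946D)
    # Hypothetical surrounding pool: 16 zero bytes either side.
    pool = bytes(16) + sample + bytes(16)
    for addr, acb in find_acbs(pool, 0x816BDCF0, 0x801844BF):
        print(f"{addr:08X} pid={acb['pid']:08X} kast={acb['kast']:08X}")
```

Matching on size and type as well as the AST PC keeps stray longwords that merely happen to equal 801844BF from being reported.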
Now this is not a freak of nature: there are many stale acbs with the same PC
(about 30 with loads of different pcs in)
I reckon this must be some sort of process monitoring s/w and that it had just
been turned off. I suspect that this must have been turned off just before this
crash and the memory of this code area was deallocated.
Also note that the current process was priority 1, so may not have got scheduled
to run this until a while after if the system was busy.
I rang the customer and he confirmed that perfect tune was turned off just
before on this and other previous occasions. (RAXCO). I asked him to tell us
this up front next time!
--------------------------------------------------------------------------------
Here's an example of the acbs.
Note that they all have address 808E946D (kast field). This is a bit of
code in pool that hadn't been deallocated; I looked through it, but I couldn't find
any text in the block of memory to identify it.
TDA> ex 81678300;80
00060074 2002001C 0027A680 002ABA00 ..*...'.... t... 81678300
00000000 808E946D 80180690 801844BF .D......m....... 81678310
81868380 00000000 807CD100 00000000 ......|......... 81678320
00000000 00000000 0000061B 3EFA52C4 .R.>............ 81678330
808EDE5E 00050090 00000000 00000000 ............^... 81678340
00060001 B34A0024 00000000 808EDDE5 ........$.J..... 81678350
00000010 80F40000 FFFFFFFF 00060001 ................ 81678360
00000041 41414100 00000000 80F4FCE0 .........AAAA... 81678370
--------------------------------------------------------------------------------
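To read ACB fields straight off a dump like the one above, note that the hex longwords run right to left, with the address on the far right belonging to the rightmost longword. A minimal sketch of that decoding follows. parse_dump_line and read_acb are hypothetical helpers, the field offsets are the ACB$ offsets shown by the earlier format of 816BDD00, and the ASCII column is shown with dots for non-printing bytes.

```python
def parse_dump_line(line):
    """Return {address: longword} for one 16-byte dump line."""
    tokens = line.split()
    base = int(tokens[-1], 16)                 # address of the rightmost longword
    longwords = [int(t, 16) for t in tokens[:4]]
    # Hex columns run right to left, so reverse them into address order.
    return {base + 4 * i: lw for i, lw in enumerate(reversed(longwords))}


def read_acb(mem, addr):
    """Pick the ACB fields out of a {address: longword} map."""
    w = mem[addr + 0x08]                       # size word, type byte, rmod byte
    return {
        "astqfl": mem[addr + 0x00],
        "astqbl": mem[addr + 0x04],
        "size": w & 0xFFFF,
        "type": (w >> 16) & 0xFF,
        "rmod": (w >> 24) & 0xFF,
        "pid": mem[addr + 0x0C],
        "ast": mem[addr + 0x10],
        "astprm": mem[addr + 0x14],
        "kast": mem[addr + 0x18],
    }


if __name__ == "__main__":
    # First two lines of the dump at 81678300 (ASCII column is ignored).
    lines = [
        "00060074 2002001C 0027A680 002ABA00 ..*...'.... t... 81678300",
        "00000000 808E946D 80180690 801844BF .D......m....... 81678310",
    ]
    mem = {}
    for line in lines:
        mem.update(parse_dump_line(line))
    for name, value in read_acb(mem, 0x81678300).items():
        print(f"{name:8s} {value:08X}")
```

Decoded this way, the block at 81678300 carries the same ast (801844BF) and kast (808E946D) values as the ACB at 816BDD00, under a different pid.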