Conference kernel::csguk_systems

Title:CSGUK_SYSTEMS
Notice:No restrictions on keyword creation
Moderator:KERNEL::ADAMS
Created:Wed Mar 01 1989
Last Modified:Thu Nov 28 1996
Last Successful Update:Fri Jun 06 1997
Number of topics:242
Total number of notes:1855

200.0. "raxco perfect tune induced @ do it all" by COMICS::GLEDHILL () Wed Dec 28 1994 13:40

T.R    Title  User  Personal Name  Date  Lines
200.1  COMICS::GLEDHILL  Fri Jan 13 1995 01:48  25
*****  THIS IS A READ ONLY COPY FROM NICE   -   please handle accordingly  *****
********************************************************************************

Log No            01649.00-4C6-1UVO          Queue      GLEDHILL       
Log D/T            6-DEC-1994 16:47          Owner      GLEDHILL                 
LSDT D/T           8-DEC-1994 12:00          Loc/Phone  UVO  3245
Status as at      13-JAN-1995 01:44 is OPEN                                    
EXT REQ Stat Code             Escalation Indicator  Y
Hold Indicator    N           Planned Indicator     Y

---------------------------------Customer---------------------------------------
Company          DO IT ALL LTD                                                  
Department       FALCON HOUSE                            
Street           THE MINORIES                            
City             DUDLEY WEST MIDLANDS          
Postal Code      DY2 8PG                     PO No                          

Caller           MARK COLLIER                Title      MR             
Phone            0384 842177                 Extension  D/L   
Service Wish     *** COMICS::DSA401:[000000.DO_IT_ALL_01649] ***                

Problem
Rusty/resp...vms v5.5-2                                                         
ts: System crashed 3 times in 3 weeks, has a crash dump for analysis            
bi: high                                                                        
200.2  COMICS::GLEDHILL  Fri Jan 13 1995 01:52  129
--------------------------------------------------------------------------------
Log No             01649.00-4C6-1UVO           Desc type      C 
Sequence no        02                          Authr badge no 064021
                                               Creation D/T    3-JAN-1995 22:40
--------------------------------------------------------------------------------
Dump taken on  6-DEC-1994 16:40:39.06
SSRVEXCEPT, Unexpected system service exception
Version of system: VAX/VMS VERSION V5.5-2
VAXcluster node: OAK, a VAX 6000-410
Process currently executing on this CPU: BATCH_232
Current IPL: 0  (decimal)
CPU database address:  82320000
MPB address:   00000000
                No spinlocks currently owned by CPU 01


                7FFE77B8  00000004
                7FFE77BC  7FFE9730
                7FFE77C0  FFFFFFFD
                7FFE77C4  816BDD00
                7FFE77C8  801844BF
                7FFE77CC  0000000B
                7FFE77D0  00000005
                7FFE77D4  0000000C
                7FFE77D8  00000000
                7FFE77DC  801844BF
                7FFE77E0  802FFF52      EXE$ASTDEL
                7FFE77E4  00000001
                7FFE77E8  00000005
                7FFE77EC  80181088
                7FFE77F0  00000689      PDT$L_SNDDAT_OPER_SNT+00001
                7FFE77F4  0073C67C
                7FFE77F8  7FFEE3C6      SYS$ENQ+00006
                7FFE77FC  01400000

Condition Handler       7FFE9730  00000000
SP Align Bits = 00      7FFE9734  2FFC0000
   Saved  AP            7FFE9738  7FFE97E8
   Saved  FP            7FFE973C  7FFE97AC
   Return PC            7FFE9740  8020355A      RMS+0D55A
        R2              7FFE9744  80004BA0      SCH$GQ_LEFWQ
        R3              7FFE9748  80887600
        R4              7FFE974C  0073C640
        R5              7FFE9750  00000000
        R6              7FFE9754  00135B7C
        R7              7FFE9758  0073A1B0
        R8              7FFE975C  00522D30
        R9              7FFE9760  0073C640
        R10             7FFE9764  00739408
        R11             7FFE9768  7FFDFE70      PIO$GW_IIOIMPA
Align Stack by 0 Bytes =>
Argument List           7FFE976C  0000000D
                        7FFE9770  0000001F	efn
                        7FFE9774  00000000	lkmode
                        7FFE9778  0073C67C	lksb	lockid = 44000B19
                        7FFE977C  0000061B      flags
                        7FFE9780  00000000	resnam
                        7FFE9784  00000000	parid
                        7FFE9788  00000000	astadr
                        7FFE978C  00000000	astprm
                        7FFE9790  00000000	blkast
                        7FFE9794  00000001	acmode
                        7FFE9798  00000000	rsdm_id
                        7FFE979C  00000000	
                        7FFE97A0  00000000	

                0073C67C  00000001
                0073C680  44000B19	lockid
                0073C684  00000014	(lock value block)
                0073C688  00016E00	
                0073C68C  00000000	
                0073C690  00000000	
0,1,3,4,9,10

LCK$V_VALBLK   = 00000000
LCK$V_CONVERT  = 00000001
LCK$V_SYNCSTS  = 00000003
LCK$V_SYSTEM   = 00000004
LCK$V_NODLCKWT = 00000009
LCK$V_NODLCKBLK= 0000000A
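
The bit list above can be checked mechanically. The following is a minimal sketch that expands the flags longword 0000061B into its set bits and maps them to the LCK$V_* names listed in this note. The helper names `set_bits` and `decode_flags` are hypothetical, and the table holds only the six bits named here, not the full set of $ENQ flags.

```python
# Bit positions as listed in the note above (LCK$V_* values are bit numbers).
LCK_BITS = {
    0x0: "LCK$V_VALBLK",
    0x1: "LCK$V_CONVERT",
    0x3: "LCK$V_SYNCSTS",
    0x4: "LCK$V_SYSTEM",
    0x9: "LCK$V_NODLCKWT",
    0xA: "LCK$V_NODLCKBLK",
}

def set_bits(value):
    """Return the positions of all set bits, lowest first."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]

def decode_flags(value):
    """Map each set bit to its name, or to a placeholder if it is not in the table."""
    return [LCK_BITS.get(bit, f"bit {bit} (not listed)") for bit in set_bits(value)]

flags = 0x061B  # flags argument from the $ENQ argument list
print(set_bits(flags))
print(decode_flags(flags))
```

Every bit set in 061B has an entry in the table, so no placeholder names should appear for this value.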


RMS+0D53D:  CLRQ    -(SP)
RMS+0D53F:  MOVQ    #01,-(SP)
RMS+0D542:  CLRQ    -(SP)
RMS+0D544:  CLRQ    -(SP)
RMS+0D546:  CLRL    -(SP)
RMS+0D548:  MOVZWL  #061B,-(SP)
RMS+0D54D:  PUSHAB  3C(R4)
RMS+0D550:  MOVQ    #1F,-(SP)
RMS+0D553:  CALLS   #0D,@#SYS$ENQ
RMS+0D55A:
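
The push sequence above can be replayed to confirm that it produces the 0D-longword argument list shown in the call frame. This is only a toy model: the `Stack` class is hypothetical, it assumes that a quadword push leaves the low longword at the lower address, and it takes R4 from the saved registers in the frame (0073C640). The argument names are the annotations used in this note.

```python
class Stack:
    """Toy model of a VAX stack: index 0 is the lowest address (top of stack)."""

    def __init__(self):
        self.longwords = []

    def push_long(self, value):
        self.longwords.insert(0, value & 0xFFFFFFFF)

    def push_quad(self, value):
        # Push the high longword first so the low longword ends up at the lower address.
        self.push_long(value >> 32)
        self.push_long(value)

R4 = 0x0073C640  # saved R4 from the call frame

stack = Stack()
stack.push_quad(0)          # CLRQ   -(SP)
stack.push_quad(1)          # MOVQ   #01,-(SP)
stack.push_quad(0)          # CLRQ   -(SP)
stack.push_quad(0)          # CLRQ   -(SP)
stack.push_long(0)          # CLRL   -(SP)
stack.push_long(0x061B)     # MOVZWL #061B,-(SP)
stack.push_long(R4 + 0x3C)  # PUSHAB 3C(R4)
stack.push_quad(0x1F)       # MOVQ   #1F,-(SP)

NAMES = ["efn", "lkmode", "lksb", "flags", "resnam", "parid", "astadr",
         "astprm", "blkast", "acmode", "rsdm_id", "", ""]
args = stack.longwords
print(f"argument count = {len(args):#04x}")
for name, value in zip(NAMES, args):
    print(f"{value:08X}  {name}")
```

The replay yields efn = 1F, lksb = 0073C67C, flags = 061B and acmode = 1, with every other argument zero, which agrees with the argument list at 7FFE976C.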

SYS$ENQ+00002:  CHMK    #004F
SYS$ENQ+00006:

EXE$ENQ+00002:  MOVZBL  04(AP),R3			!efn = 1F
EXE$ENQ+00006:  CMPB    R3,#3F				!lower
EXE$ENQ+00009:  BGTRU   LCK$BREAK_DEADLOCK+000FC	!drop through
EXE$ENQ+0000B:  BICL3   #-00004000,10(AP),R9		!flags - LCK$V_NODLCKBLK
EXE$ENQ+00014:  MOVL    0C(AP),R8			!lksb
EXE$ENQ+00018:  PROBEW  #00,#18,(R8)			!probe lksb
EXE$ENQ+0001C:  BEQL    EXE$ENQ+00032			!OK, branch

EXE$ENQ+00032:  BLBS    R9,EXE$ENQ+0003B		!LCK$V_VALBLK set 

EXE$ENQ+0003B:  MOVZWL  #0C,R0
EXE$ENQ+0003E:  BRB     EXE$ENQ+00043

EXE$ENQ+00043:  JMP     LCK$NOT_QUEUED+0003F

LCK$NOT_QUEUED+0003F:  PUSHL   R0
LCK$NOT_QUEUED+00041:  MOVL    @#CTL$GL_PCB,R4		!pcb
LCK$NOT_QUEUED+00048:  MOVL    60(R4),R1		!pid
LCK$NOT_QUEUED+0004C:  MOVZWL  #02,R2
LCK$NOT_QUEUED+0004F:  MOVZBL  04(AP),R3		!efn (1F)
LCK$NOT_QUEUED+00053:  JSB     @#V_SCH$POSTEF

V_SCH$POSTEF:  JMP     @#EVENT_FLAGS_AND_ASTS
... post event flag wait

Got lost somewhere here... suspect wrong branch taken 

rsb back to here

LCK$NOT_QUEUED+00059:  MOVL    (SP)+,R0
LCK$NOT_QUEUED+0005C:  RET				!dismiss CHMK ??

200.3  COMICS::GLEDHILL  Fri Jan 13 1995 01:53  178
Um, that should be type=ts, not ds; force of habit.

R6 = lkb. This was queued to the pcb (the pcb queue is empty now, but the lkb
still points to it). The other registers don't seem interesting.

We had an accvio in exe$astdel, trying to call an ast.
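
The access violation can be read straight off the kernel stack longwords at 7FFE77B8 to 7FFE77E4 shown in 200.2. The sketch below decodes them, assuming the standard VAX condition-handling layout, which this note does not state: a mechanism array (count, frame, depth, saved R0, saved R1) and a signal array (count, condition, reason mask, faulting address, PC, PSL). The function and field names are hypothetical.

```python
# Kernel stack longwords copied from the dump in note 200.2 (address -> value).
STACK = {
    0x7FFE77B8: 0x00000004, 0x7FFE77BC: 0x7FFE9730, 0x7FFE77C0: 0xFFFFFFFD,
    0x7FFE77C4: 0x816BDD00, 0x7FFE77C8: 0x801844BF, 0x7FFE77CC: 0x0000000B,
    0x7FFE77D0: 0x00000005, 0x7FFE77D4: 0x0000000C, 0x7FFE77D8: 0x00000000,
    0x7FFE77DC: 0x801844BF, 0x7FFE77E0: 0x802FFF52, 0x7FFE77E4: 0x00000001,
}

SS_ACCVIO = 0x0C  # access violation condition value

def read_array(base, names):
    """Read a counted array: the first longword is the count, then one longword per name."""
    count = STACK[base]
    values = [STACK[base + 4 * (i + 1)] for i in range(count)]
    return dict(zip(names, values))

def signed32(value):
    """Interpret a 32-bit longword as a signed integer."""
    return value - (1 << 32) if value & 0x80000000 else value

# Assumed layouts (standard VAX condition handling, not stated in the note).
signal = read_array(0x7FFE77D0, ["condition", "reason_mask", "fault_va", "pc", "psl"])
mech = read_array(0x7FFE77B8, ["frame", "depth", "saved_r0", "saved_r1"])

print("access violation:", signal["condition"] == SS_ACCVIO)
print(f"faulting VA {signal['fault_va']:08X} at PC {signal['pc']:08X}")
print(f"saved R0 {mech['saved_r0']:08X}, saved R1 {mech['saved_r1']:08X}, "
      f"depth {signed32(mech['depth'])}")
```

Under those assumptions the faulting address equals the saved R1 (801844BF), and the saved R0 (816BDD00) is the address formatted as an acb further down.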

TDA> show stack
Process stacks (on CPU 01)
--------------------------
Current operating stack (KERNEL):

		7FFE7774  0000061B	PDT$L_BASEBL+00003
		7FFE7778  00000689	PDT$L_SNDDAT_OPER_SNT+00001
		7FFE777C  00000005	
		7FFE7780  7FFE77AC	CTL$GL_KSTKBAS+005AC
		7FFE7784  7FFE7794	CTL$GL_KSTKBAS+00594
		7FFE7788  7FFE778C	CTL$GL_KSTKBAS+0058C
		7FFE778C  8000239E	EXE$EXCPTN+00006
		7FFE7790  00000000	

	 SP =>  7FFE7794  00000000	
		7FFE7798  00000000	
		7FFE779C  7FFE976C	
		7FFE77A0  7FFE9730	
		7FFE77A4  80000014	EXE$QIOW_3+00004
		7FFE77A8  802BC4C4	EXE$CONTSIGNAL+0007C
		7FFE77AC  00000002	
		7FFE77B0  7FFE77D0	CTL$GL_KSTKBAS+005D0


		7FFE77B4  7FFE77B8	CTL$GL_KSTKBAS+005B8
		7FFE77B8  00000004	
		7FFE77BC  7FFE9730	
		7FFE77C0  FFFFFFFD	
		7FFE77C4  816BDD00	
		7FFE77C8  801844BF	
		7FFE77CC  0000000B	
		7FFE77D0  00000005	

		7FFE77D4  0000000C	
		7FFE77D8  00000000	
		7FFE77DC  801844BF	(R1, FR
		7FFE77E0  802FFF52	EXE$ASTDEL CALLG (SP),(R1)
		7FFE77E4  00000001	

		7FFE77E8  00000005	
		7FFE77EC  80181088	Astparam
		7FFE77F0  00000689	saver0 PDT$L_SNDDAT_OPER_SNT+00001
		7FFE77F4  0073C67C	saved r1 (is lksb from call) 
		7FFE77F8  7FFEE3C6	pc SYS$ENQ+00006
		7FFE77FC  01400000	psl
TDA> set output tt:
TDA> set nolog


I.e. we appear to have an ast argument list at the bottom of the stack and
nothing else.

If we were in a system service I would expect a regular (service_exit) call
frame, and if an ast had happened, an ast dispatching call frame.

The only thing I think can have happened here is that we were exiting from the
system service when the ast went off (?), hence the lack of a sys_ser call
frame. The ast call frame isn't there because it won't have been built yet (we
crashed in the callg that builds it).

So we crashed on the callg (sp),(r1), which has a value in r1 of 801844BF.
r1 should be loaded from the acb$l_ast field. This is the ast address we
should jump to, but this value is bad.

The question is, where did this value come from?

Now r6, the lkb for the lock we are enqing, looks a likely candidate (see
below). It has already been queued at some point to the pcb and has the pid.
There is an ast address in it, but it is nothing like the one we crashed at.
However, when we did the enq we never asked for an ast, and there is no blocking
or completion ast in the lkb, so I reckon this is stale contents.

SDA> form @r6
81988D00   LKB$L_ASTQFL                    808875C0	PCB+00010
81988D04   LKB$L_ASTQBL                    808875C0	PCB+00010
81988D08   LKB$W_SIZE                          007C	 
81988D0A   LKB$B_TYPE                        35	 
81988D0B   LKB$B_RMOD                      31	 
81988D0C   LKB$L_PID                       0005008A	 
81988D10   LKB$L_AST                       80203572	RMS+0D572
           LKB$W_RQSEQNM                   
81988D14   LKB$L_ASTPRM                    00000000	 
           LKB$L_EPID                      
81988D18   LKB$L_DUETIME                   802C9AF0	LCK$GRANT_REM+001A0
           LKB$L_KAST                      
81988D1C   LKB$L_CPLASTADR                 00000000	 
81988D20   LKB$L_BLKASTADR                 00000000	 
81988D24   LKB$L_DLCKPRI                   0073C67C	 
           LKB$L_LKSB                      
81988D28   LKB$W_FLAGS                         061B	 
81988D2A   LKB$W_STATUS                    0000	 
81988D2C   LKB$L_LKST1                     00000001	 
81988D30   LKB$L_LKID                      44000B19	 
           LKB$L_LKST2                     
81988D34   LKB$B_RQMODE                          05	 
81988D35   LKB$B_GRMODE                        00	 
81988D36   LKB$B_STATE                       01	 
81988D37   LKB$B_EFN                       1F	 
81988D38   LKB$L_SQFL                      81014640	
81988D3C   LKB$L_SQBL                      81843238	
81988D40   LKB$L_OWNQFL                    8186B0C0	
81988D44   LKB$L_OWNQBL                    808876BC	ARB+00084
81988D48   LKB$L_PARENT                    81997300	
81988D4C   LKB$W_REFCNT                        0000	 
81988D4E   LKB$B_TSLT                        8E	 
81988D4F                                   80	 
81988D50   LKB$L_RSB                       81014630	
81988D54   LKB$L_REMLKID                   160016A6	 
81988D58   LKB$L_CSID                      0073C640	 
           LKB$L_OLDASTPRM                 
81988D5C   LKB$L_OLDBLKAST                 80203572	RMS+0D572
81988D60   LKB$L_LCKCTX                    00000000	 
81988D64   LKB$W_PRIORITY                      0000	 
81988D66   LKB$W_STAT2                     0000	 
81988D68   LKB$L_RQSTSRNG                  00000000	 
81988D6C   LKB$L_RQSTERNG                  FFFFFFFF	
81988D70   LKB$L_GRNTSRNG                  00000000	 
81988D74   LKB$L_GRNTERNG                  FFFFFFFF	
81988D78   LKB$L_TSKPID                    0005008A	 
           LKB$C_LENGTH                    

However, searching pool, I think this is what did it.

TDA> form 816bdd00
816BDD00   ACB$L_ASTQFL                    8186B080	
816BDD04   ACB$L_ASTQBL                    808875C0	PCB+00010
816BDD08   ACB$W_SIZE                          001C	 
816BDD0A   ACB$B_TYPE                        02	 
816BDD0B   ACB$B_RMOD                      20	 
816BDD0C   ACB$L_PID                       0005008A	 
816BDD10   ACB$L_AST                       801844BF	
816BDD14   ACB$L_ASTPRM                    80181088	
816BDD18   ACB$L_KAST                      808E946D	
           ACB$C_LENGTH                    
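
The match between this acb and the crash context can be checked field by field. The sketch below parses the FORMAT output above into a dictionary and compares it with values quoted earlier in this note. The parser and its regular expression are hypothetical helpers written for this layout of "address, field, value" lines, not part of SDA.

```python
import re

FORMAT_OUTPUT = """\
816BDD00   ACB$L_ASTQFL                    8186B080
816BDD04   ACB$L_ASTQBL                    808875C0   PCB+00010
816BDD08   ACB$W_SIZE                          001C
816BDD0A   ACB$B_TYPE                        02
816BDD0B   ACB$B_RMOD                      20
816BDD0C   ACB$L_PID                       0005008A
816BDD10   ACB$L_AST                       801844BF
816BDD14   ACB$L_ASTPRM                    80181088
816BDD18   ACB$L_KAST                      808E946D
"""

# address, field name, hex value; any trailing symbol is ignored
LINE = re.compile(r"^([0-9A-F]{8})\s+(\S+)\s+([0-9A-F]+)", re.MULTILINE)

def parse_format(text):
    """Turn 'address  field  value' lines into {field: value}."""
    return {field: int(value, 16) for _addr, field, value in LINE.findall(text)}

acb = parse_format(FORMAT_OUTPUT)

# Values taken from the kernel stack and lkb listings earlier in this note.
FAULT_ADDRESS = 0x801844BF   # r1 used by callg (sp),(r1)
STACK_ASTPRM = 0x80181088    # longword at 7FFE77EC, labelled Astparam
LKB_AST = 0x80203572         # LKB$L_AST from the lkb at r6

print("acb ast matches fault address:", acb["ACB$L_AST"] == FAULT_ADDRESS)
print("acb astprm matches stack:     ", acb["ACB$L_ASTPRM"] == STACK_ASTPRM)
print("lkb ast matches fault address:", LKB_AST == FAULT_ADDRESS)
```

The first two comparisons hold and the third does not, which is consistent with the acb, rather than the lkb, being the source of the bad ast address.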

Now this is not a freak of nature: there are many stale acbs with the same PC
(about 30, among loads with different pcs in).
I reckon this must be some sort of process monitoring software that had just
been turned off. I suspect it was turned off just before this crash and the
memory for this code area was deallocated.

Also note that the current process was priority 1, so it may not have got
scheduled to run this until a while afterwards if the system was busy.
I rang the customer and he confirmed that perfect tune (RAXCO) was turned off
just before this and on other previous occasions. I asked him to tell us this
up front next time!
--------------------------------------------------------------------------------
Here's an example of the acbs.

Note that they all have address 808E946D (kast field). This is a bit of code
in pool. I looked through it and it hadn't been deallocated, but I couldn't
find any text in the block of memory to identify it.


TDA> ex 81678300;80
00060074 2002001C 0027A680 002ABA00  ..*...'.... t...     81678300
00000000 808E946D 80180690 801844BF  .D......m.......     81678310
81868380 00000000 807CD100 00000000  ......|.........     81678320
00000000 00000000 0000061B 3EFA52C4  .R.>............     81678330
808EDE5E 00050090 00000000 00000000  ............^...     81678340
00060001 B34A0024 00000000 808EDDE5  ........$.J.....     81678350
00000010 80F40000 FFFFFFFF 00060001  ................     81678360
00000041 41414100 00000000 80F4FCE0  ........AAAA...     81678370
--------------------------------------------------------------------------------
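
To read the dump above as an acb, the first two lines can be decoded with the field offsets from the FORMAT output of the acb at 816BDD00. This is a rough sketch: it assumes each dump line lists four longwords with the highest address first, followed by the address of the lowest one, and it leaves out the ASCII column. `parse_dump` and `decode_acb` are hypothetical helpers.

```python
# First two lines of the dump, hex longwords and address only.
DUMP = """\
00060074 2002001C 0027A680 002ABA00  81678300
00000000 808E946D 80180690 801844BF  81678310
"""

def parse_dump(text):
    """Return {address: longword} for lines of four longwords plus a base address."""
    memory = {}
    for line in text.splitlines():
        *words, base = line.split()
        # The rightmost longword sits at the base address, so walk them in reverse.
        for offset, word in enumerate(reversed(words)):
            memory[int(base, 16) + 4 * offset] = int(word, 16)
    return memory

def decode_acb(memory, base):
    """Decode the acb fields, using the offsets seen in the FORMAT output."""
    size_type_rmod = memory[base + 0x08]
    return {
        "ACB$L_ASTQFL": memory[base + 0x00],
        "ACB$L_ASTQBL": memory[base + 0x04],
        "ACB$W_SIZE": size_type_rmod & 0xFFFF,
        "ACB$B_TYPE": (size_type_rmod >> 16) & 0xFF,
        "ACB$B_RMOD": (size_type_rmod >> 24) & 0xFF,
        "ACB$L_PID": memory[base + 0x0C],
        "ACB$L_AST": memory[base + 0x10],
        "ACB$L_ASTPRM": memory[base + 0x14],
        "ACB$L_KAST": memory[base + 0x18],
    }

acb = decode_acb(parse_dump(DUMP), 0x81678300)
for field, value in acb.items():
    print(f"{field:<14} {value:08X}")
```

Decoded this way, the block has the same size (001C), type (02), rmod (20), ast address (801844BF) and kast address (808E946D) as the acb at 816BDD00, with a different pid and astprm.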