Company TELEWEST - UNITED ARTISTS
Department COMMUNICATIONS (SCOTLAND)
Street 1 SOUTH GYLE CRESCENT
City EDINBURGH
Postal Code EH12 9EG PO No 26-MAY-1995 16:24
Caller ANDY THORN Title MR
Phone 01753 790 470 Extension D/L
Service Wish ** PLEASE SEE 'A' DESC **
---------------------------------Description------------------------------------
--------------------------------------------------------------------------------
Log No 70112.00-54B-1UVO Desc type TS
Sequence no 01 Authr badge no 064234
Creation D/T 3-JUN-1995 15:28
--------------------------------------------------------------------------------
THIS IS FOR THE FIRST DUMP
""""""""""""""""""""""""""
System crash information
------------------------
Time of system crash: 25-MAY-1995 23:35:15.27
Version of system: VAX/VMS VERSION V5.5-2H4
System Version Major ID/Minor ID: 1/0
VAXcluster node: EAGLE, a VAX 7000-640
Crash CPU ID/Primary CPU ID: 02/00
Bitmask of CPUs active/available: 0000000F/0000000F
CPU bugcheck codes:
CPU 02 -- PGFIPLHI, Pagefault with IPL too high
3 others -- CPUEXIT, Shutdown requested by another CPU
CPU 02 Processor crash information
----------------------------------
CPU 02 reason for Bugcheck: PGFIPLHI, Pagefault with IPL too high
Process currently executing on this CPU: FMHELP
Current IPL: 8 (decimal)
CPU database address: 804B4000
ISP = 804B513C
KSP = 0000000A ......... WHAT HAPPENED TO THE KSP?
ESP = 0010C8D7
SSP = 8050C020 ......... THIS DOES NOT LOOK RIGHT
USP = 804F7C3C ......... NOR DOES THIS
CPU 03 Processor crash information
----------------------------------
CPU 03 reason for Bugcheck: CPUEXIT, Shutdown requested by another CPU
Process currently executing on this CPU: II_DBMS_3322
Current image file: DSA100:[INGRES.BIN]IIDBMS.EXE;1
Current IPL: 31 (decimal)
CPU database address: 804B2000
** NO PROCESS ON CPU 00 OR 01 **
Current operating stack (INTERRUPT):
804B511C 00000030
804B5120 00000030
804B5124 803BBF8C
804B5128 804B5170
804B512C 804B515C
804B5130 804B5134
804B5134 8050C014 MMG$MODIFY_FAULT+000EC
804B5138 04080000
SP => 804B513C 00000000
804B5140 00000000
804B5144 00000000
804B5148 8740271B
804B514C 8037E435
804B5150 04080008
804B5154 00000064
804B5158 8037D8E0
804B515C 00000000
804B5160 20000000 CPU$M_CPUSPEC2
804B5164 804B51A8
804B5168 804B5190
804B516C 8037D981
804B5170 00000003
804B5174 803A3E0C
804B5178 00000008
804B517C 8037D8E0
804B5180 803A3E0C
804B5184 FFFFFFFF
804B5188 00000008
804B518C 00000001
804B5190 00000000
804B5194 28000000
804B5198 7FFE97B0
804B519C 7FFE77E4 CTL$GL_KSTKBAS+005E4
804B51A0 80384A01
804B51A4 0C040007
SDA> sho page/sys 8740271B;1
System page table
-----------------
ADDRESS   SVAPTE    PTE       TYPE   PROT  BITS  PAGTYP  LOC  STATE  TYPE  REFCNT  BAK  SVAPTE  FLINK  BLINK
87402600  BDE0404C  30000000  DZERO  ERKW  K
SDA> e/i 8037E435-30;30
8037E405: HALT
8037E406: MOVL 04(AP),R0
8037E40A: CLRL 5A(R0)
8037E40D: BICL3 #0000007F,04(AP),-(SP)
8037E416: CALLS #01,803777C0
8037E41D: BRB 8037E453
8037E41F: MOVL 04(AP),R0 ! this is where the original R0 came from <--+
8037E423: TSTL 10(R0) |
8037E426: BEQL 8037E43F |
8037E428: PUSHL 0C(AP) |
8037E42B: PUSHL #00000064 |
8037E431: MOVL 10(R0),R0 ! where did original come from <-------------+
8037E435: PUSHL 5A(R0) ! where did R0 come from? >-----------------+
whose code?
SDA> ex @ap+4
804B5174: 803A3E0C ".>:."
SDA> ex @.+10
803A3E1C: 874026C1 "Á&@."
SDA> ev @.+5a
Hex = 8740271B Decimal = -2025838821 ! failing VA
SDA> ex r0
R0: 874026C1 "Á&@."
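The pointer chase above can be cross-checked offline. A minimal Python sketch, assuming only the values SDA printed (R0 = 874026C1, the 5A displacement from the PUSHL at 8037E435) and the 512-byte VAX page size; the function names are made up for illustration and are not SDA commands.

```python
VAX_PAGE_BYTES = 512  # VAX pages are 512 bytes


def failing_va(base: int, displacement: int) -> int:
    """Effective address of a displacement operand such as 5A(R0), wrapped to 32 bits."""
    return (base + displacement) & 0xFFFFFFFF


def as_signed32(value: int) -> int:
    """Show a 32-bit value the way SDA's EVALUATE prints the decimal form."""
    return value - 0x100000000 if value & 0x80000000 else value


def page_base(va: int) -> int:
    """Start address of the page containing va."""
    return va & ~(VAX_PAGE_BYTES - 1)


r0 = 0x874026C1          # contents of 803A3E1C, also the saved R0
va = failing_va(r0, 0x5A)
print(f"failing VA = {va:08X} ({as_signed32(va)})")
print(f"page base  = {page_base(va):08X}")
```

The page base that comes out is the same page SDA reported as DZERO in the SHOW PAGE output.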
Process index: 0042 Name: FMHELP Extended PID: 20C09142
-----------------------------------------------------------
Process status: 02040001 RES,PHDRES
PCB address 823574A0 JIB address 83934220
PHD address ******* A482D600 Swapfile disk address 00000000
Master internal PID 00910042 Subprocess count 0
Internal PID 00910042 Creator internal PID 00000000
Extended PID 20C09142 Creator extended PID 00000000
State CUR 02 Termination mailbox 0000
Current priority 2 AST's enabled KESU
Base priority 1 AST's active NONE
UIC [00500,000430] AST's remaining 103
Mutex count 0 Buffered I/O count/limit 99/100
Waiting EF cluster 0 Direct I/O count/limit 100/100
Starting wait time 1B001E1D BUFIO byte count/limit 98848/98944
Event flag wait mask BFFFFFFF # open files allowed left 148
Local EF cluster 0 C0000001 Timer entries allowed left 40
Local EF cluster 1 80000000 Active page table count 0
Global cluster 2 pointer 00000000 Process WS page count 449
Global cluster 3 pointer 00000000 Global WS page count 117
SDA> show dev
I/O data structures
-------------------
DDB list
--------
Address Controller ACP Driver DPT DPT size
------- ---------- --- ------ --- --------
83F0D480 INET INETDRIVER 81EA69E0 0980
83F0D100 NTY NTYDRIVER 81EA6180 01B0
SDA> SEARCH/STEPS=BYTE/LENGTH=LONG MMG$A_SYS_END:@EXE$GL_RPB 54454E49
Searching from 8000AF92 to 805A1200 in BYTE steps for 54454E49...
Match at 80364A0C
Match at 80383B9F
SDA> ex 80364A0C;10
EF17003C 00015EE2 EF17083C 54454E49 INET<..ïâ^..<..ï 80364A0C
^
WOLLONGONG-----------------------------|
SDA> ex 80383B9F;10
00535245 56524553 5F54454E 490C0000 ...INET_SERVERS. 80383B9C
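For anyone wondering where the search pattern 54454E49 comes from: it is simply "INET" stored as a little-endian longword. A small Python sketch, assuming nothing beyond the longwords shown in the two EXAMINE lines; the helper name is hypothetical.

```python
import struct


def longword_to_text(value: int) -> str:
    """Render a 32-bit longword as the four bytes it occupies in VAX (little-endian) memory."""
    raw = struct.pack("<I", value)
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


print(longword_to_text(0x54454E49))

# The second match, longwords taken in ascending address order from 80383B9C.
second_match = [0x490C0000, 0x5F54454E, 0x56524553, 0x00535245]
print("".join(longword_to_text(lw) for lw in second_match))
```

In the second match, the 0C byte just in front of the text equals the length of "INET_SERVERS" (12 characters), which is consistent with it being a counted process-name string rather than code.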
SDA> SHOW PROCESS INET_SERVERS
Process index: 001A Name: INET_SERVERS Extended PID: 20C0011A
-----------------------------------------------------------------
Process status: 00148001 RES,NOACNT,PHDRES,LOGIN
--------------------------------------------------------------------------------
Log No 70112.00-54B-1UVO Desc type TS
Sequence no 02 Authr badge no 064234
Creation D/T 4-JUN-1995 09:04
--------------------------------------------------------------------------------
THIS IS FOR THE SECOND CRASH
""""""""""""""""""""""""""""
System crash information
------------------------
Time of system crash: 30-MAY-1995 11:04:36.46
Version of system: VAX/VMS VERSION V5.5-2H4
System Version Major ID/Minor ID: 1/0
VAXcluster node: EAGLE, a VAX 7000-640
Crash CPU ID/Primary CPU ID: 00/00
Bitmask of CPUs active/available: 0000000F/0000000F
CPU bugcheck codes:
CPU 00 -- KRNLSTAKNV, Kernel stack not valid
3 others -- CPUEXIT, Shutdown requested by another CPU
CPU 00 Processor crash information
----------------------------------
CPU 00 reason for Bugcheck: KRNLSTAKNV, Kernel stack not valid
Process currently executing on this CPU: II_GCC_1621
Current image file: DSA100:[INGRES.BIN]IIGCC.EXE;1
Current IPL: 31 (decimal)
CPU database address: 84398000
MPB address: 81DD57F0
Spinlocks currently owned by CPU 00
IOLOCK8 Address 8058BCF0
Owner CPU ID 00 IPL 08
Ownership Depth 0001 Rank 14
CPUs Waiting 0000 Index 34
CPU 01 Processor crash information
----------------------------------
CPU 01 reason for Bugcheck: CPUEXIT, Shutdown requested by another CPU
Process currently executing on this CPU: II_DBMS_E30
Current image file: DSA100:[INGRES.BIN]IIDBMS.EXE;1
Current IPL: 31 (decimal)
CPU database address: 804B6000
No spinlocks currently owned by CPU 01
*** NO PROCESS ON CPU 02 ***
CPU 03 Processor crash information
----------------------------------
CPU 03 reason for Bugcheck: CPUEXIT, Shutdown requested by another CPU
Process currently executing on this CPU: II_DBMS_E2F
Current image file: DSA100:[INGRES.BIN]IIDBMS.EXE;1
Current IPL: 31 (decimal)
CPU database address: 804B2000
No spinlocks currently owned by CPU 03
SDA> E/I @PC-10;10
%SDA-W-INSKIPPED, unreasonable instruction stream - 1 bytes skipped
EXE$COMPAT+00035: MOVZWL #042C,-(SP)
EXE$COMPAT+0003A: PUSHL #04
EXE$COMPAT+0003C: BRW EXE$EXCEPTION
EXE$COMPAT+0003F: HALT
EXE$KERSTKNV: BUGW #020C ! BUG$_KRNLSTAKNV
EXE$MCHECK: MOVZWL #02BC,-(SP)
SDA> sho sta
CPU 00 Processor stack
----------------------
Current operating stack (INTERRUPT):
843991D8 00000000
843991DC 83963394
843991E0 803A4680
843991E4 7FFE7224 CTL$GL_KSTKBAS+00024
843991E8 7FFE7208 CTL$GL_KSTKBAS+00008
843991EC 843991F0
843991F0 804F7D44 EXE$MCHECK
843991F4 041F0000
SP => 843991F8 823C8272
843991FC 00080000 UCB$M_MNTVERPND
SDA> sho sta/k
Process stacks (on CPU 00)
--------------------------
KERNEL stack:
7FFE7200 823C8272
7FFE7204 00080000 UCB$M_MNTVERPND
SP => 7FFE7208 00000000
7FFE720C 200A0000
7FFE7210 7FFE7250 CTL$GL_KSTKBAS+00050
7FFE7214 7FFE7230 CTL$GL_KSTKBAS+00030
7FFE7218 823C8548
7FFE721C C0C00000
7FFE7220 839632F0
7FFE7224 00000001
7FFE7228 7FFE722C CTL$GL_KSTKBAS+0002C
7FFE722C 00000000
7FFE7230 00000000
7FFE7234 200E0000
7FFE7238 7FFE72C8 CTL$GL_KSTKBAS+000C8
7FFE723C 7FFE72A8 CTL$GL_KSTKBAS+000A8
7FFE7240 81E00EC6 EXDRIVER+01006
7FFE7244 C0C00000
7FFE7248 0000007A
7FFE724C 839632F0
7FFE7250 00000004
7FFE7254 C0C00114
7FFE7258 00000007
7FFE725C 8000000F EXE$QIOW_2+00007
7FFE7260 00000000
7FFE7264 81E006BB EXDRIVER+007FB
7FFE7268 83963301
7FFE726C 83963304
7FFE7270 81E00DD5 EXDRIVER+00F15
7FFE7274 81EA8140 INETDRIVER+00F10
7FFE7278 00000009
7FFE727C 80381DFE
7FFE7280 81E00927 EXDRIVER+00A67
7FFE7284 81E028E2 EXDRIVER+02A22
7FFE7288 81E024A9 EXDRIVER+025E9
7FFE728C 839632F0
7FFE7290 80381E06
7FFE7294 0000001C
7FFE7298 00000000
7FFE729C 81EA8140 INETDRIVER+00F10
7FFE72A0 80385A4B
7FFE72A4 00000008
7FFE72A8 00000000
7FFE72AC 20380000
7FFE72B0 7FFE7320 CTL$GL_KSTKBAS+00120
7FFE72B4 7FFE72F4 CTL$GL_KSTKBAS+000F4
7FFE72B8 80369B20
7FFE72BC 83963394
7FFE72C0 00000000
7FFE72C4 00000000
7FFE72C8 00000002
7FFE72CC 80381E06
7FFE72D0 839632F0
7FFE72D4 00000000
7FFE72D8 000000B0
7FFE72DC 000000B0
7FFE72E0 00000008
7FFE72E4 80573980
7FFE72E8 80381E06
7FFE72EC 81EA8140 INETDRIVER+00F10
7FFE72F0 80381D88
7FFE72F4 00000000
7FFE72F8 2FC00000
7FFE72FC 7FFE736C CTL$GL_KSTKBAS+0016C
7FFE7300 7FFE7348 CTL$GL_KSTKBAS+00148
7FFE7304 80368C0F
7FFE7308 00000009
7FFE730C 80381D88
7FFE7310 803A46E0
7FFE7314 7FFE7382 CTL$GL_KSTKBAS+00182
7FFE7318 803A4680
7FFE731C 80381D88
7FFE7320 00000004
........ ........
SDA> eval ctl$gl_kstkbas
Hex = 7FFE7200 Decimal = 2147381760 CTL$GL_KSTKBAS
SDA> eval ctl$gl_kspini-1
Hex = 7FFE77FF Decimal = 2147383295 CTL$GL_KSTKBAS+005FF
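The two evaluations above give the bounds of the kernel stack, so the headroom at the time of the crash can be worked out. A rough Python sketch, assuming CTL$GL_KSTKBAS is the low end, CTL$GL_KSPINI-1 is the last byte, the stack grows downward, and the SP is the 7FFE7208 marked in the SHOW STACK/K listing; the function name is made up for illustration.

```python
VAX_PAGE_BYTES = 512  # VAX pages are 512 bytes


def stack_usage(low: int, last_byte: int, sp: int) -> dict:
    """Size of the stack region and how much of it lies below the SP (still unused)."""
    size = last_byte - low + 1
    free = sp - low
    return {
        "size_bytes": size,
        "size_pages": size // VAX_PAGE_BYTES,
        "used_bytes": size - free,
        "free_bytes": free,
    }


usage = stack_usage(low=0x7FFE7200, last_byte=0x7FFE77FF, sp=0x7FFE7208)
for name, value in usage.items():
    print(f"{name:>10} = {value}")
```

Under those assumptions the SP is within a couple of longwords of the bottom of the stack, which fits the KRNLSTAKNV bugcheck.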
SDA> sho proc
Process index: 0021 Name: II_GCC_1621 Extended PID: 21201621
----------------------------------------------------------------
Process status: 00140023 RES,DELPEN,RESPEN,PHDRES,LOGIN
PCB address 81EFD840 JIB address 8398AE30
PHD address 8CF38000 Swapfile disk address 00000000
Master internal PID 00160021 Subprocess count 0
Internal PID 00160021 Creator internal PID 00000000
Extended PID 21201621 Creator extended PID 00000000
State CUR 00 Termination mailbox 0000
Current priority 16 AST's enabled KESU
Base priority 16 AST's active U
UIC [00035,000001] AST's remaining 1371
Mutex count 1 Buffered I/O count/limit 1315/1450
Waiting EF cluster 0 Direct I/O count/limit 435/530
Starting wait time 1C001C1B BUFIO byte count/limit ******/1572584
Event flag wait mask DFFFFFFF # open files allowed left 728
Local EF cluster 0 E0000001 Timer entries allowed left 489
Local EF cluster 1 C8000000 Active page table count 0
Global cluster 2 pointer 00000000 Process WS page count 9539
Global cluster 3 pointer 00000000 Global WS page count 13
*** WHAT IS WRONG WITH BUFIO byte count/limit ***
SDA> sho proc/chan
Process index: 0021 Name: II_GCC_1621 Extended PID: 21201621
----------------------------------------------------------------
Process active channels
-----------------------
Channel Window Status Device/file accessed
------- ------ ------ --------------------
0010 00000000 DSA100:
0020 83FDB780 DSA100:(2636,2,0)
0030 83FC6800 DSA0:(2079,1,0) (section file)
0040 83FCA080 DSA0:(1434,1,0) (section file)
0050 83FC8180 DSA0:(143,1,0) (section file)
0060 00000000 Busy MBA9892:
0070 00000000 Busy MBA9897:
0080 00000000 NET710:
00B0 00000000 Busy INET2420:
00C0 00000000 Busy INET2608:
00D0 00000000 Busy INET2653:
00E0 00000000 Busy INET2574:
00F0 00000000 Busy INET2607:
0100 00000000 Busy INET2668:
0110 00000000 Busy INET2670:
0120 00000000 MBA8294:
0130 00000000 Busy INET2460:
0140 00000000 Busy INET2491:
0150 00000000 Busy INET2623:
0160 00000000 MBA7477:
0170 00000000 Busy MBA7478:
0180 00000000 Busy INET2461:
0190 00000000 MBA1851:
01A0 00000000 Busy MBA1852:
01B0 00000000 MBA1853:
01C0 00000000 Busy MBA1854:
01D0 00000000 MBA7479:
01E0 00000000 MBA1875:
.... ........ .......
*** THIS CONTINUES WITH SOMETHING LIKE 144 BUSY MBA CHANNELS ***
*** AND SOMETHING LIKE 114 INET CHANNELS ***
--------------------------------------------------------------------------------
Regarding the first dump: that seems to be a coding problem in the INETDRIVER
code, as we discussed on the phone.
--------------------------------------------------------------------------------
The second one has the following call sequence (I got this from a SHOW CALL,
SHOW CALL/NEXT sequence, searching on Saved PC to build this extract).
It shows all the return PCs, newest at the top...
823C8548
81E00EC6 EXDRIVER+01006
80369B20
80368C0F
8036AD78
8036AEBD
80368B02
80370478
80375CAA
80376BF8
80376756
80378676
80378357
81EA7AF3 INETDRIVER+008C3
804F8020 EXE$EXCEPTION+00225 ! system service despatch call frame.
8055C9B2 * EXE$RUNDWN+001D2 ! call sys$dassgn here.
804F8020 EXE$EXCEPTION+00225 ! system service despatch call frame.
8056DF85 EXE$CREPRC+00C95 ! call sys$rundwn.
8055DD55 EXE$ASTDEL+00003 ! delete ast queue to process.
8055A830 EXE$EXIT+00030 ! sys$exit calls sys$delprc.
804F8020 EXE$EXCEPTION+00225 ! system service despatch call frame.
The process calls sys$exit first of all...
This shows that the process is being deleted and is in the delete-process AST,
calling sys$dassgn (at stage *) for each channel the process has (it does them
one at a time, calling sys$dassgn with the channel number as the parameter). At
this point we are processing channel 1890 (show proc/chan doesn't show it, as it
has already been deleted; however, we can find what it was as follows).
SDA> form/type = ccb @ctl$gl_ccbbase-1890
7FF3D160 CCB$L_UCB 821E47E0 UCB
7FF3D164 CCB$L_WIND 00000000
7FF3D168 CCB$B_STS 00
7FF3D169 CCB$B_AMOD 00
7FF3D16A CCB$W_IOC 0000
7FF3D16C CCB$L_DIRP 00000000
CCB$C_LENGTH
Note that most of the fields in the CCB have been cleared by now; this stops
show proc/chan from recognising it as a valid channel.
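The FORMAT command above works because a channel number is a byte offset downward from CTL$GL_CCBBASE, with one 10 (hex) byte CCB per channel. A short Python sketch of that arithmetic; note that the CCBBASE value is not shown anywhere in this log and is back-computed here from the CCB address SDA printed, so treat it as an assumption, and the function name is hypothetical.

```python
CCB_LENGTH = 0x10  # the FORMAT output runs from 7FF3D160 up to CCB$C_LENGTH at 7FF3D170


def ccb_address(ccbbase: int, channel: int) -> int:
    """Address of the CCB for a channel: CCBs sit below the base, one per channel number."""
    return ccbbase - channel


# Back-computed from the FORMAT output (assumption: not displayed in the log).
ccbbase = 0x7FF3D160 + 0x1890

print(f"CTL$GL_CCBBASE (derived) = {ccbbase:08X}")
print(f"CCB for channel 1890     = {ccb_address(ccbbase, 0x1890):08X}")
# A channel that SHOW PROCESS/CHANNEL still lists, e.g. 00B0 (INET2420):
print(f"CCB for channel 00B0     = {ccb_address(ccbbase, 0x00B0):08X}")
```

The same offset arithmetic should let any channel from the SHOW PROCESS/CHANNEL listing be formatted as a CCB in the same way.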
SDA> show dev/addr =@(@ctl$gl_ccbbase-1890)
I/O data structures
-------------------
INET2667 Unknown UCB address: 821E47E0
Device status: 00010010 online,deleteucb
Characteristics: 0C140001 rec,avl,mbx,idv,odv
00000000
Owner UIC [000035,000001] Operation count 1 ORB address 821E4890
PID 00000000 Error count 0 DDB address 83F4FF00
Class/Type 00/00 Reference count 0 DDT address 81EA730C
Def. buf. size 65535 BOFF 0000 CRB address 83F4FE80
DEVDEPEND 00000000 Byte count 0000 I/O wait queue empty
DEVDEPND2 00000000 SVAPTE 00000000
FLCK index 34 DEVSTS 0002
DLCK address 8058BCF0
Charge PID 00160021
*** I/O request queue is empty ***
So by calling sys$dassgn for that device we end up calling the INETDRIVER code
(to be expected, as sys$dassgn calls the cancel routine for the device we are
deassigning). Once we go into INETDRIVER we seem to end up going through a
lot of call frames in allocatable system space. This turns out to be a block
of code that, if you examine the text right at the start of it, has the string
INET in it (see the STARS article by searching on INET allocatable system space).
However, all these frames are different, so it doesn't look as though we got
into an endless loop (the usual cause of these KRNLSTAKNV crashes); rather, it
looks like the kernel stack is not big enough for what we are trying to do.
(This is often a problem with C code that runs in kernel mode: the stack can
get used heavily if you call a lot of subroutines with lots of arguments, saved
registers, etc., and you can easily run out of kernel stack space.)
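The "all frames are different" observation can be checked by walking the frame pointers by hand. A minimal Python sketch, assuming the standard VAX CALLS/CALLG frame layout (saved FP at FP+0C, return PC at FP+10), that the newest frame starts at the SP shown in SHOW STACK/K (7FFE7208), and using only longwords copied from that listing; the names are made up for illustration.

```python
# Only the stack slots the walk reads, copied from the SHOW STACK/K listing.
KSTACK = {
    0x7FFE7214: 0x7FFE7230, 0x7FFE7218: 0x823C8548,
    0x7FFE723C: 0x7FFE72A8, 0x7FFE7240: 0x81E00EC6,
    0x7FFE72B4: 0x7FFE72F4, 0x7FFE72B8: 0x80369B20,
    0x7FFE7300: 0x7FFE7348, 0x7FFE7304: 0x80368C0F,
}


def walk_frames(memory: dict, fp: int) -> list:
    """Follow saved-FP links; return (frame address, return PC, bytes up to caller's frame)."""
    frames = []
    while fp + 0x0C in memory and fp + 0x10 in memory:
        saved_fp = memory[fp + 0x0C]
        return_pc = memory[fp + 0x10]
        if saved_fp <= fp:  # the chain must move up the stack, otherwise stop
            break
        frames.append((fp, return_pc, saved_fp - fp))
        fp = saved_fp
    return frames


frames = walk_frames(KSTACK, 0x7FFE7208)
for frame, pc, size in frames:
    print(f"frame {frame:08X}  return PC {pc:08X}  {size:3d} bytes")
```

The walk stops where the listing was cut off, and the return PCs it finds are the first four entries of the call sequence, each different, with each frame taking tens of bytes or more of kernel stack.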
On AXP there is a SYSGEN parameter for the kernel stack size (KSTACKPAGES), but
I am not aware of anything equivalent on VAX. I think it is fixed at 4 pages or
thereabouts. We will need to get the customer to go back to the INET vendors,
as they will need to modify their code to use less stack...