Title: | VAX and Alpha VMS |
Notice: | This is a new VMSnotes, please read note 2.1 |
Moderator: | VAXAXP::BERNARDO |
|
Created: | Wed Jan 22 1997 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 703 |
Total number of notes: | 3722 |
563.0. "Alpha V6.2, OPA0: hangs." by PRSSOS::MAILLARD (Denis MAILLARD) Mon May 05 1997 10:37
Strange OPA0: hang problem on AlphaServers. One of our customers
sells complete Alpha systems with dedicated applications to his own customers.
For over a year his customers have been experiencing hangs of the interactive
process connected to OPA0:. The problem is quite frequent: according to the
customer, up to once a day on some sites, and more commonly once or twice a
week. Once the hang occurs, the only way out is a reboot, which is
seldom practical as there are usually other users working on other terminals on
the system (these other terminals never hang). Another characteristic is
that once the process on OPA0: hangs, OPA0: starts experiencing errors at a
fast rate (often over ten per minute). These errors show up in the SHOW ERROR
command, but none of them is ever entered in the errorlog!
I've recently been able to obtain a forced crash dump of an AlphaServer 300
4/266 under V6.2 that was experiencing this problem. The process is in LEF
state, has channels assigned to OPA0: (one is busy), and is apparently under DCL
(no current image, current stack is Supervisor, the recall buffer is
unfortunately not available). There is one IRP in the I/O request queue of OPA0:
with a rather strange function code (xC000, i.e. IO$_NOP plus x4000 and x8000 as
modifiers). The error count for the device is 2200, but none of these errors is
in the errorlog, except that just before the crash entry in ANAL/ERROR, one gets
a message saying
%ERF-I-UNKENTRY, unknown entry type, 37
Has anybody any notion of what's happening there? Any hint would be
greatly appreciated.
Denis.
EVIDENCE:
Process index: 0011 Name: GUIOT Extended PID: 00000091
----------------------------------------------------------
Process status: 02040001 RES,PHDRES
Required capabilities: 0000000C QUORUM,RUN
PCB address 80D789C0 JIB address 80D3B400
PHD address 8115C000 Swapfile disk address 00000000
Master internal PID 00020011 Subprocess count 0
Internal PID 00020011 Creator internal PID 00000000
Extended PID 00000091 Creator extended PID 00000000
State LEF Termination mailbox 0000
Previous CPU Id 00000000 Current CPU Id 00000000
Previous ASNSEQ 0000000000000001 Previous ASN 0000000000000004
Current priority 8 # of threads 0000000000000000
Initial process priority 4 Delete pending count 0
Base priority 4 AST's active NONE
UIC [00200,000020] AST's remaining 247
Mutex count 0 Buffered I/O count/limit 149/150
Waiting EF cluster 0 Direct I/O count/limit 150/150
Abs time of last event 037EB5BC BUFIO byte count/limit 99424/99808
Event flag wait mask DFFFFFFF # open files allowed left 100
Swapped copy of LEFC0 00000000 Timer entries allowed left 10
Swapped copy of LEFC1 00000000 Active page table count 0
Global cluster 2 pointer 00000000 Process WS page count 36
Global cluster 3 pointer 00000000 Global WS page count 17
Process header
--------------
First free P0 address 00000000 Accumulated CPU time 00000105
Free PTEs between P0/P1 2370 CPU since last quantum 10E2
First free P1 address 7EE82000 Subprocess quota 10
P0 page table address 81164000 AST's enabled KESU
P1 page table address 81070000 ASN sequence # 0000000000000001
Free page file pages 3028 AST limit 250
Page fault cluster size 4 Process header index 0001
Page table cluster size 1 Backup address vector 00001000
Flags 00000080 WSL index save area 00001014
Direct I/O count 681 PTs having locked WSLs 2
Buffered I/O count 6249 PTs having valid WSLs 2
Limit on CPU time 00000000 Active page tables 2
Maximum page file count 3125 Maximum active PTs 5
Total page faults 2109 Guaranteed fluid WS pages 20
File limit 100 Extra dynamic WS entries 92
Process index: 0011 Name: GUIOT Extended PID: 00000091
----------------------------------------------------------
Timer queue limit 10 Locked WSLE counts array 4078
Current page file template 00000000 Valid WSLE counts array 4090
Local event flag cluster 0 C0000001 Local event flag cluster 1 E0000000
Process page file assignments
-----------------------------
PROCIDX SYSIDX REFCNT
0 3 46 Current assignment
1 0 0
2 0 0
3 0 0
Remaining reserved pages 114 Total reserved pages 114
Saved process registers
-----------------------
R0 = 00000000 00000001 R1 = 00000000 00000000 R2 = FFFFFFFF 80C66680
R3 = 00000000 7FFBF680 R4 = 00000000 0000001D R5 = 00000000 7FFBF680
R6 = 00000000 7FFBE4C0 R7 = 00000000 7FF91FC0 R8 = 00000000 7EE85EB8
R9 = 00000000 7FF9C400 R10 = 00000000 7FF9D228 R11 = 00000000 7FFBE3E0
R12 = 00000000 00000000 R13 = 00000000 7EF11DA0 R14 = 00000000 FDD04F5F
R15 = 00000000 7EF11DA0 R16 = FFFFFFFF 80C05528 R17 = FFFFFFFF 80D789C0
R18 = 00000000 00000002 R19 = 00000000 00000001 R20 = 00000000 00018009
R21 = 00000000 00018001 R22 = FFFFFFFF 80C331C0 R23 = FFFFFFFF 80D789C0
R24 = 00000000 00000002 R25 = 00000000 00000005 R26 = 00000000 00000FD2
R27 = FFFFFFFF 80C3BE08 R28 = 00000000 7EF11DA0 FP = 00000000 7FF9C2E0
PC = FFFFFFFF 801991E0 PS = 00000000 00000012
KSP = 00000000 7FF91EF0 ESP = 00000000 7FF96000 SSP = 00000000 7FF9C2E0
USP = 00000000 7EE7FD40 PTBR = 00000000 00000EEE
AST{SR/EN} = 0000000F ASN = 00000000 00000004
Working set information
-----------------------
First WSL entry 000000BE Current authorized working set size 250
First locked entry 000000C4 Default (initial) working set size 125
First dynamic entry 000000C6 Maximum working set allowed (quota) 250
Last entry replaced 0000012A
Last entry in list 00000262
Process index: 0011 Name: GUIOT Extended PID: 00000091
----------------------------------------------------------
Lock data:
Lock id: 33000595 PID: 00020011 Flags: VALBLK CONVERT
Par. id: 01000000 SUBLCKs: 0
LKB: 80DB8F40 BLKAST: 00000000
PRIORTY: 0000
Granted at NL 00000000-FFFFFFFF
Resource: 45504F5F 24464D4C LMF$_OPE Status:
Length 18 504C412D 534D564E NVMS-ALP
Exec. mode 00000000 00004148 HA......
System 00000000 00000000 ........
Local copy
Process index: 0011 Name: GUIOT Extended PID: 00000091
----------------------------------------------------------
Process active channels
-----------------------
Channel Window Status Device/file accessed
------- ------ ------ --------------------
0010 00000000 DKA0:
0040 00000000 Busy OPA0:
0060 00000000 OPA0:
0090 80D80EC0 DKA0:(422,1,0) (section file)
00A0 80D85F40 DKA0:(3214,2,0) (section file)
Process activated images
------------------------
IMCB Start End Sym Vect Type Image Name Major ID,Minor ID
-------- -------- -------- -------- ------------ -----------------------------
Total images = 0 Pages allocated = 0
OPA0 VT400_Series UCB address: 80C23BF8
Device status: 00000113 tim,int,online,bsy
Characteristics: 0C040007 rec,ccl,trm,avl,idv,odv
00000200 nnm
Owner UIC [000200,000020] Operation count 66326 ORB address 80D2AD80
PID 00020011 Error count 2200 DDB address 80C23A78
Class/Type 42/71 Reference count 2 DDT address 80C23AB8
Def. buf. size 80 BOFF 00000180 CRB address 80C23E00
DEVDEPEND 180891A0 Byte count 00000100 IRP address 80DE4780
DEVDEPND2 F9601400 SVAPTE 80D5DC40 Fork PC 80C5FC20
DEVDEPND3 00000000 DEVSTS 00000001 Fork R3 0000000D
FLCK index 3A Int. due time 0008F277 I/O wait queue 80C23C64
DLCK address 80C23F00
%SDA-W-NOREAD, unable to access location 00012000
%SDA-W-NOREAD, unable to access location 00902A20
I/O request queue
-----------------
STATE IRP PID MODE CHAN FUNC WCB EFN AST IOSB STATUS
C 80DE4780 00020011 E 0040 C000 00000000 29 80C67760 7EFB00E0 8203
nop bufio,func,termio
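A minimal sketch of how the FUNC value in the IRP display above can be taken
apart, assuming the usual VMS layout of a 16-bit I/O function value: the low
6 bits hold the function code and the upper 10 bits hold the modifiers. The
names FCODE_MASK, FMOD_MASK and split_func are made up for this illustration;
only the value C000 comes from the dump.

```python
# Assumed layout of a 16-bit VMS I/O function value:
#   bits 0-5  : function code (0 is IO$_NOP)
#   bits 6-15 : function modifiers
FCODE_MASK = 0x003F
FMOD_MASK = 0xFFC0


def split_func(func):
    """Return (function_code, [modifier_bit_values]) for a 16-bit value."""
    code = func & FCODE_MASK
    mod_bits = func & FMOD_MASK
    # List each modifier bit that is set, lowest bit first.
    modifiers = [1 << bit for bit in range(6, 16) if mod_bits & (1 << bit)]
    return code, modifiers


if __name__ == "__main__":
    code, mods = split_func(0xC000)  # FUNC column of the IRP on OPA0:
    print("function code:", code)
    print("modifier bits:", [hex(m) for m in mods])
```

For C000 this gives function code 0 (IO$_NOP) with the x4000 and x8000
modifier bits set. The sketch does not try to name the modifiers, since their
meaning depends on the driver.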
******************************* ENTRY 306. *******************************
ERROR SEQUENCE 992. LOGGED ON: CPU_TYPE 00000006
DATE/TIME 25-APR-1997 10:25:40.72 SYS_TYPE 0000000D
SYSTEM UPTIME: 6 DAYS 18:49:54
SCS NODE: ALPHA OpenVMS AXP V6.2
HW_MODEL: 00000639 Hardware Model = 1593.
TIME STAMP AlphaServer 300 4/266
%ERF-I-UNKENTRY, unknown entry type, 37 <<<<<<<<<<<<<<<<
******************************* ENTRY 307. *******************************
ERROR SEQUENCE 993. LOGGED ON: CPU_TYPE 00000006
DATE/TIME 25-APR-1997 10:28:22.95 SYS_TYPE 0000000D
SYSTEM UPTIME: 6 DAYS 18:52:37
SCS NODE: ALPHA OpenVMS AXP V6.2
HW_MODEL: 00000000 Hardware Model = 0.
FATAL BUGCHECK AlphaServer 300 4/266
OPERATOR, Operator requested system shutdown
PROCESS NAME SYSTEM
PROCESS ID 00020014
ERROR PC 00000000 000305B4
Process Status = 38000000 00001F03, SW = 03, Previous Mode = USER
System State = 00, Current Mode = KERNEL
VMM = 00 IPL = 31, SP Alignment = 56
STACK POINTERS
KSP 00000000 7FF91EF8 ESP 00000000 7FF96000 SSP 00000000 7FF9C100
USP 00000000 7EE83B80
GENERAL REGISTERS
R0 00000000 00000000 R1 FFFFFFFF 80000000 R2 00000000 7FF86040
R3 FFFFFFFF 80D5E640 R4 00000000 00000001 R5 00000000 00000001
R6 00000000 0003007C R7 00000000 7FF91FC0 R8 00000000 7FF9C1F8
R9 00000000 7FF9C400 R10 00000000 00000000 R11 00000000 7FFBE3E0
R12 00000000 00000000 R13 00000000 000100F0 R14 00000000 00000000
R15 00000000 00020000 R16 00000000 00000474 R17 00000000 00004000
R18 00000000 7FF91E58 R19 00000000 7FF91FC0 R20 FFFFFFFF 8191D97C
R21 20000000 00000003 R22 00000000 00000000 R23 00000000 00000000
R24 FFFFFFFF 80000000 R25 00000000 00000000 R26 FFFFFFFF 80C05E90
R27 00000000 7FF91E2C R28 00000000 00030334 FP 00000000 7FF91F00
SP 00000000 7FF91EF8 PC 00000000 000305B4 PS 38000000 00001F03
SYSTEM REGISTERS
PTBR 00000000 00000CD2
Page Table Base Register
PCBB 00000000 01322080
Privileged Context Block Base
PRBR FFFFFFFF 80D2A000
Processor Base Register
VPTB 00000002 00000000
Virtual Page Table Base Register
SCBB 00000000 000001A2
System Control Block Base
SISR 00000000 00000000
Software Interrupt Summary Register
ASN 00000000 00000001
Address Space Number
ASTSR_ASTEN 00000000 0000000F
AST Summary/AST Enable
FEN 00000000 00000000
Floating-Point Enable
IPL 00000000 0000001F
Interrupt Priority Level
MCES 00000000 00000008
Machine Check Error Summary
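The PCB and IRP displays in the SDA output above can be cross-checked against
each other. The sketch below assumes that the event flag wait mask is stored
complemented (a clear bit marks a flag the process is waiting for) and that the
EFN column of the IRP display is decimal. Both are assumptions about the SDA
output, not something stated in the note, and the helper names are
hypothetical.

```python
MASK32 = 0xFFFFFFFF


def waited_flags(wait_mask, cluster_base=0):
    """Event flag numbers whose bits are CLEAR in the wait mask.

    Assumes the mask is stored as the complement of the flags waited for.
    """
    wanted = ~wait_mask & MASK32
    return [cluster_base + bit for bit in range(32) if wanted & (1 << bit)]


def flag_is_set(cluster_value, efn):
    """True if event flag `efn` is set in the given 32-bit cluster value."""
    return bool(cluster_value & (1 << (efn % 32)))


if __name__ == "__main__":
    flags = waited_flags(0xDFFFFFFF)        # "Event flag wait mask"
    print("waiting on EFN:", flags)
    for efn in flags:
        # "Local event flag cluster 0" from the same display
        print(efn, "set" if flag_is_set(0xC0000001, efn) else "clear")
```

Under those assumptions the process is waiting on event flag 29 in cluster 0,
the same EFN as the IRP queued on OPA0:, and that flag is clear in local event
flag cluster 0 (C0000001). That would be consistent with a process sitting in
LEF until the outstanding terminal I/O completes.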
T.R | Title | User | Personal Name | Date | Lines |
---|
563.1 | | STAR::LEWIS | | Mon May 05 1997 10:48 | 8 |
| Errors not entered in the errorlog are usually timeouts. We've fixed
many problems in this area for OPDRIVER. You need to get the latest
patch kit -- sorry, I don't know the name (and I'm not certain that
the absolute latest code has been made into a TIMA kit yet).
I'm not sure I know what an AlphaServer 300 is; there may be
platform-specific code that would improve the behavior too.
Sue Lewis
|
563.2 | | PRSSOS::MAILLARD | Denis MAILLARD | Mon May 05 1997 11:25 | 6 |
| Re .1: Thanks for the info, Sue. Do you have any idea where this
latest OPDRIVER can be obtained? The latest TIMA kit for V6.2 is
ALPOPDR02_062 and it is nearly a year old (June 96). Should I raise an
IPMT to obtain the kit?
Thanks,
Denis.
|
563.3 | | STAR::LEWIS | | Mon May 05 1997 11:29 | 7 |
| >> The latest tima kit for V6.2 is
>> ALPOPDR02_062 and it is nearly a year old (June 96). Should I raise an
>> IPMT to obtain the kit?
That would be a good idea.
Thanks
Sue
|
563.4 | | AUSS::GARSON | DECcharity Program Office | Mon May 05 1997 20:13 | 7 |
| re .0
Try enabling error logging on OPA0. ($ SET DEV/ERROR OPA0)
I vaguely recall that the func gets changed as it wends its way into
the IRP. See whether you can locate the code that queued the I/O (could
be tricky if it's DCL via RMS) and check the caller specified func.
|
563.5 | | PRSSOS::MAILLARD | Denis MAILLARD | Tue May 06 1997 11:30 | 6 |
| Re .3, .4: Thanks for the tips. Actually SHOW CALL shows that the I/O
was generated by an RMS SYS$GET, and finding the original function code
might indeed get a bit tricky. Right now I'm in the process of getting
a connection to a V6.2 source CD-ROM, and also of writing an IPMT form
to get the latest version of SYS$OPDRIVER.EXE.
Denis.
|