T.R | Title | User | Personal Name | Date | Lines |
---|
1474.1 | | CSC32::BUTTERWORTH | Gun Control is a steady hand. | Tue Jan 28 1997 11:00 | 15 |
| Alain,
The forced crash is what we really need. This could be nothing more
than exhaustion of a system resource. Some things to keep an eye on:
Non-paged pool - it could be that non-paged pool needs to expand but
cannot because there are insufficient pages on the
free list.
Page-file space - if you have insufficient page-file space you have a
problem. Again, check the free memory list. If you
have insufficient memory for your environment then
the load is placed on the page files.
Regs,
dan
|
1474.2 | | BACHUS::BANKEN | | Wed Jan 29 1997 02:22 | 28 |
| Dan,
Thanks for your reply.
I checked the pool and page file at the very beginning and
there is no pool expansion nor page file problem on that
machine.
I monitored this system for about 1 day and there is absolutely
nothing to worry about. Anyway, I started PSDC so that we can
analyze the load at the time the problem occurs.
Other products behave normally at this moment.
The fact that processes are not removed by a CONSOLE SHUTDOWN
is, to my mind, something important.
They may have some outstanding I/O and/or a lack of
quota. In that case I would expect those processes to go into RWAST
(certainly after a stop/id).
As far as I know, the customer has never tried a stop/id nor
observed an RWxxx state.
Are there any quotas you recommend for a large site?
Is there anything special with VXT? Do you recommend a 'preferred' transport?
Is there any ongoing 'case' on hangs at this time?
Many thanks,
Alain.
|
1474.3 | | CSC32::BUTTERWORTH | Gun Control is a steady hand. | Wed Jan 29 1997 12:38 | 28 |
| >Are there any quotas you recommend for a large site?
Quotas on all the daemons are *quite* large and are specified on the
call to SYS$CREPRC that creates them. The point here is that
UAF changes for the system account do not affect the PCM daemons.
Note also that action routine processes are detached processes, so they
aren't sharing the ENS daemon's quotas.
Of course the C3 is affected by the UAF quotas of whatever account
is running it.
>Is there anything special with VXT? Do you recommend a 'preferred' transport?
Nope. Use whatever you want.
>Is there any ongoing 'case' on hangs at this time?
Nope. We had some ENS hangs a long time ago but they've all been fixed
since ECO 3. If your site is running V1.6-311 (312 may have been
announced as I haven't yet read all the notes from today) you're in good
shape.
If you can get the customer to force a crash with DUMPSTYLE = 0 then
we can figure out what's going on from the dump. That's our best action
plan.
Regs,
Dan
|
1474.4 | | BACHUS::BANKEN | | Tue Feb 04 1997 03:23 | 79 |
| Dan,
I got the forced crash and Console Notify process is in RWAST.
(should be analyzed under OpenVMS Alpha 6.2)
Quick overview :
SDA> read/exec
SDA> read sys$loadable_images:sysdef
SDA> sh sum => Console Notify in RWAST
SDA> set proc/id=3c
SDA> sh proc
Process index: 003C Name: Console Notify Extended PID: 0000023C
--------------------------------------------------------------------
Process status: 00140001 RES,PHDRES,LOGIN
Required capabilities: 0000000C QUORUM,RUN
PCB address 81930080 JIB address 817BE700
PHD address 8D4E4000 Swapfile disk address 00000000
Master internal PID 0001003C Subprocess count 0
Internal PID 0001003C Creator internal PID 00000000
Extended PID 0000023C Creator extended PID 00000000
State RWAST Termination mailbox 0037
Previous CPU Id 00000000 Current CPU Id 00000000
Previous ASNSEQ 0000000000006935 Previous ASN 000000000000001B
Current priority 6 # of threads 0000000000000000
Initial process priority 4 Delete pending count 0
Base priority 4 AST's active U
UIC [00001,000004] AST's remaining 970
Mutex count 0 Buffered I/O count/limit 36821/36864
Waiting EF cluster 0 Direct I/O count/limit 1023/1024
Abs time of last event 01C0306E BUFIO byte count/limit 370806/370806
Event flag wait mask 00000001 # open files allowed left 1022
Process index: 003C Name: Console Notify Extended PID: 0000023C
--------------------------------------------------------------------
Swapped copy of LEFC0 00000000 Timer entries allowed left 1024
Swapped copy of LEFC1 00000000 Active page table count 0
Global cluster 2 pointer 00000000 Process WS page count 194
Global cluster 3 pointer 00000000 Global WS page count 96
SDA> sh call indicates a EXE$DASSGN_C+0018C
Waiting for I/O request to complete
SDA> sh proc/channel => a lot of busy MBA devices
SDA> sh device/address=@@r6 => mba causing the RWAST
I/O data structures
-------------------
MBA2898 MBX UCB address: 817293C0
Device status: 08000010 online,exfunc_supp
Characteristics: 0C150001 rec,shr,avl,mbx,idv,odv
00000200 nnm
Owner UIC [000001,000004] Operation count 2 ORB address 81674440
PID 00000000 Error count 0 DDB address 8C1A4D80
Class/Type A0/01 Reference count 1 DDT address 8C1BC940
Def. buf. size 1049 BOFF 00000000 CRB address 8C1A4DC0
DEVDEPEND 00000001 Byte count 00000000 LNM address 95CA2B60
DEVDEPND2 00000000 SVAPTE 00000000 I/O wait queue 8172942C
DEVDEPND3 00000000 DEVSTS 00000002
FLCK index 2C
DLCK address 8C1AC280
Charge PID 0001003C
*** I/O request queue is empty ***
and now ????
If you are interested I can make the dump file available (500.000 blocks).
Best regards,
Alain
|
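The count/limit pairs in the SHOW PROCESS display in 1474.4 can be turned into in-use figures. What follows is only a minimal Python sketch. It assumes (the thread does not say so) that the first number of each pair is the quota still available, so that limit minus count is the amount currently in use. The function name `outstanding` is made up for illustration.

```python
import re

# Lines copied from the SHOW PROCESS output in 1474.4.
SDA_LINES = """\
Mutex count 0 Buffered I/O count/limit 36821/36864
Waiting EF cluster 0 Direct I/O count/limit 1023/1024
Abs time of last event 01C0306E BUFIO byte count/limit 370806/370806
"""

def outstanding(text):
    """Return {label: limit - count} for each 'count/limit N/M' pair.

    Assumption: N is the remaining quota, so M - N is the amount in use.
    """
    pattern = re.compile(
        r"(Buffered I/O|Direct I/O|BUFIO byte) count/limit (\d+)/(\d+)"
    )
    result = {}
    for label, count, limit in pattern.findall(text):
        result[label] = int(limit) - int(count)
    return result

if __name__ == "__main__":
    for label, used in outstanding(SDA_LINES).items():
        print(f"{label}: {used} in use")
```

Under that assumption the figures work out to 43 buffered I/Os and 1 direct I/O in use, with the byte-count quota untouched.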
1474.5 | | CSC32::BUTTERWORTH | Gun Control is a steady hand. | Tue Feb 04 1997 12:28 | 11 |
| Alain,
Format the UCB of the MBA device and post it. You can't trust the
"I/O request queue is empty" business on mailboxes. I'll bet we'll find an
IRP at UCB+UCB$L_MB_READQFL. If so, then SYS$CANCEL was never called,
which really bugs me. Also, do a SHOW CALL followed by four SHOW CALL/NEXT
commands.
Are you still running V1.6-310 or have you upgraded it to V1.6-311?
Regs,
Dan
|
1474.6 | | BACHUS::BANKEN | | Wed Feb 05 1997 02:57 | 214 |
| Dan,
Same problem with 1.6-311.
SDA> format 817293C0
817293C0 UCB$L_FQFL 816222C0
UCB$L_MB_MSGQFL
UCB$L_RQFL
UCB$W_MB_SEED
UCB$W_UNIT_SEED
817293C4 UCB$L_FQBL 816222C0
UCB$L_MB_MSGQBL
UCB$L_RQBL
817293C8 UCB$W_SIZE 0118
817293CA UCB$B_TYPE 10
817293CB UCB$B_FLCK 2C
817293CC UCB$L_ASTQFL 00000000
UCB$L_FPC
UCB$L_MB_W_AST
UCB$T_PARTNER
817293D0 UCB$L_ASTQBL 00000000
UCB$L_MB_R_AST
UCB$Q_FR3
817293D4 00000000
817293D8 UCB$L_FIRST 00010000
UCB$Q_FR4
UCB$W_MSGMAX
UCB$W_MSGCNT
817293DC 00000000
817293E0 UCB$W_BUFQUO 147C
UCB$W_DSTADDR
817293E2 UCB$W_INIQUO 147D
UCB$W_SRCADDR
817293E4 UCB$L_ORB 81674440 ORB
817293E8 UCB$L_CPID 0001003C
UCB$L_LOCKID
817293EC UCB$PS_CRAM 00000000
817293F0 UCB$L_CRB 8C1A4DC0 CRB
817293F4 UCB$L_DLCK 8C1AC280 SMP$GL_MAILBOX
817293F8 UCB$L_DDB 8C1A4D80 DDB
817293FC UCB$L_PID 00000000
81729400 UCB$L_LINK 8189AD00
81729404 UCB$L_VCB 00000000
81729408 UCB$L_DEVCHAR 0C150001
UCB$Q_DEVCHAR
8172940C UCB$L_DEVCHAR2 00000200
81729410 UCB$L_AFFINITY FFFFFFFF
81729414 UCB$L_ALTIOWQ 00000000
UCB$L_XTRA
81729418 UCB$B_DEVCLASS A0
81729419 UCB$B_DEVTYPE 01
8172941A UCB$W_DEVBUFSIZ 0419
8172941C UCB$B_LOCSRV 01
UCB$B_SECTORS
UCB$L_DEVDEPEND
UCB$Q_DEVDEPEND
UCB$R_DEVDEPEND_Q_BLOCK
UCB$R_DISK_DEVDEPEND
UCB$R_NET_DEVDEPEND
UCB$R_TERM_DEVDEPEND
8172941D UCB$B_REMSRV 00
UCB$B_TRACKS
8172941E UCB$W_BYTESTOGO 0000
UCB$W_CYLINDERS
UCB$B_VERTSZ
81729420 UCB$L_DEVDEPND2 00000000
UCB$L_TT_DEVDP1
UCB$W_TU_FORMENU
81729424 UCB$L_DEVDEPND3 00000000
UCB$Q_DEVDEPEND2
UCB$R_DEVDEPEND2_Q_BLOCK
UCB$R_TMV_BCNT
UCB$W_TMV_BCNT1
UCB$W_TMV_BCNT2
81729428 UCB$L_DEVDEPND4 00000000
UCB$W_TMV_BCNT3
UCB$W_TMV_BCNT4
8172942C UCB$L_IOQFL 8172942C UCB+0006C
81729430 UCB$L_IOQBL 8172942C UCB+0006C
81729434 UCB$W_UNIT 0B52
81729436 UCB$B_CM1 95
UCB$W_CHARGE
UCB$W_RWAITCNT
81729437 UCB$B_CM2 15
81729438 UCB$L_IRP 00000000
8172943C UCB$L_REFC 00000001
81729440 UCB$B_DIPL 0B
UCB$B_STATE
81729441 UCB$B_AMOD 00
81729442 UCB$W_FILL_0 0000
81729444 UCB$L_AMB 00000000
81729448 UCB$L_STS 08000010
8172944C UCB$L_DEVSTS 00000002
81729450 UCB$L_QLEN 00000000
81729454 UCB$L_DUETIM 00000000
81729458 UCB$L_OPCNT 00000002
8172945C UCB$L_SVPN 00000000
81729460 UCB$L_SVAPTE 00000000
81729464 UCB$L_BCNT 00000000
81729468 UCB$L_BOFF 00000000
8172946C UCB$L_SOFTERRCNT 00000000
81729470 UCB$L_ERTCNT 00000000
81729474 UCB$L_ERTMAX 00000000
81729478 UCB$L_ERRCNT 00000000
8172947C UCB$L_PDT 00000000
81729480 UCB$L_DDT 8C1BC940 MB$DDT
81729484 UCB$PS_ADP 00000000
81729488 UCB$PS_CRCTX 00000000
8172948C UCB$L_MEDIA_ID 00000000
81729490 UCB$PS_DTN 00000000
81729494 UCB$PS_DTN_LINK 00000000
81729498 UCB$PS_TOUTROUT 00000000
8172949C 00000000
817294A0 UCB$L_MB_READERREFC 00000000
817294A4 UCB$L_MB_WRITERREFC 00000000
817294A8 UCB$L_MB_READQFL 817294A8 UCB+000E8
817294AC UCB$L_MB_READQBL 817294A8 UCB+000E8
817294B0 UCB$L_MB_WRITERWAITQFL 817294B0 UCB+000F0
817294B4 UCB$L_MB_WRITERWAITQBL 817294B0 UCB+000F0
817294B8 UCB$L_MB_READERWAITQFL 817294B8 UCB+000F8
817294BC UCB$L_MB_READERWAITQBL 817294B8 UCB+000F8
817294C0 UCB$L_MB_NOWRITERWAITQFL 817294C0 UCB+00100
817294C4 UCB$L_MB_NOWRITERWAITQBL 817294C0 UCB+00100
817294C8 UCB$L_MB_NOREADERWAITQFL 817294C8 UCB+00108
817294CC UCB$L_MB_NOREADERWAITQBL 817294C8 UCB+00108
817294D0 UCB$L_MB_ROOM_NOTIFY 00000000
817294D4 UCB$L_MB_LOGADR 95CA2B60 LNM
SDA> sh call
Call Frame Information
----------------------
Stack Frame Procedure Descriptor
Flags: Base Register = FP, No Jacket, Native
Procedure Entry: FFFFFFFF 8001C7C0 SCH$RESOURCE_WAIT_PS_C
Return address on stack = FFFFFFFF 8006A64C EXE$DASSGN_C+0018C
Registers saved on stack
------------------------
7FF91F28 00000000 0000000C Saved R0
7FF91F30 00000000 00000003 Saved R1
7FF91F38 FFFFFFFF 81930080 Saved R4 PCB
7FF91F40 FFFFFFFF 8C1BA310 Saved R13 EXE$DASSGN
7FF91F48 00000000 7FF91F50 Saved R29
SDA> sh call/next
Call Frame Information
----------------------
Stack Frame Procedure Descriptor
Flags: Base Register = FP, No Jacket, Native
Procedure Entry: FFFFFFFF 8006A4C0 EXE$DASSGN_C
Return address on stack = FFFFFFFF 8004F184 EXE$CMODKRNL_C+000C4
Registers saved on stack
------------------------
7FF91F70 00000000 00000003 Saved R2
7FF91F78 FFFFFFFF 8C1B60E0 Saved R3 EXE$GR_CMOD_LINKAGE_SECT
7FF91F80 FFFFFFFF 81930080 Saved R4 PCB
7FF91F88 00000000 00000000 Saved R5
7FF91F90 00000000 00066444 Saved R6
7FF91F98 00000000 7FF91FC0 Saved R7
7FF91FA0 00000000 7FF9C1F8 Saved R8
7FF91FA8 FFFFFFFF 8C1C15D8 Saved R13 KERNEL_ASTDEL_C
7FF91FB0 00000000 7EEF5DA0 Saved R15
7FF91FB8 00000000 7EE497F0 Saved R29
SDA> sh call/next
Call Frame Information
----------------------
Stack Frame Procedure Descriptor
Flags: Base Register = FP, No Jacket, Native
Procedure Entry: 00000000 000662D0
Return address on stack = FFFFFFFF 8008E5A4 EXE$AST_RETURN
Registers saved on stack
------------------------
7EE49800 00000000 00000003 Saved R2
7EE49808 00000000 00020260 Saved R3 UCB$M_LCL_VALID+00260
7EE49810 00000000 00020280 Saved R4 UCB$M_LCL_VALID+00280
7EE49818 00000000 00020190 Saved R5 UCB$M_LCL_VALID+00190
7EE49820 00000000 7EE49840 Saved R29
SDA> sh call/next
Call Frame Information
----------------------
Stack Frame Procedure Descriptor
Flags: Base Register = FP, No Jacket, Native
Procedure Entry: FFFFFFFF 8008CDC0 SCH$ASTDEL_C
Return address on stack = FFFFFFFF 8008E81C SCH$KERNEL_ASTDEL+000DC
Registers saved on stack
------------------------
7EE49858 FFFFFFFF 8C1C16D8 Saved R3 USER_ASTDEL
7EE49860 00000000 7EEF6900 Saved R13
7EE49868 00000000 7EE49A00 Saved R29
SDA> sh call/next
Call Frame Information
----------------------
Stack Frame Procedure Descriptor
Flags: Base Register = FP, No Jacket, Native
Procedure Entry: 00000000 0004C690
Return address on stack = 00000000 0003096C
Registers saved on stack
------------------------
7EE49A10 00000000 00010208 Saved R2 SYS$K_VERSION_16+001C8
7EE49A18 00000000 00020040 Saved R3 UCB$M_LCL_VALID+00040
7EE49A20 00000000 7F3EC01C Saved R4
7EE49A28 00000000 00000000 Saved R5
7EE49A30 00000000 7EE49A70 Saved R29
|
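One way to read the FORMAT output in 1474.6: a queue header whose forward link holds its own address is empty. The Python sketch below applies that test to a few of the posted lines. The helper name `empty_queues` is hypothetical, and the rule assumed is the usual one for absolute doubly linked queues.

```python
# Lines copied from the FORMAT output in 1474.6:
# address, field name, contents (any trailing symbolization is ignored).
FORMAT_LINES = """\
817293C0 UCB$L_FQFL 816222C0
817294A8 UCB$L_MB_READQFL 817294A8 UCB+000E8
817294B0 UCB$L_MB_WRITERWAITQFL 817294B0 UCB+000F0
817294B8 UCB$L_MB_READERWAITQFL 817294B8 UCB+000F8
"""

def empty_queues(text):
    """Map each queue-header field to True if its forward link points at itself."""
    state = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip alias-only lines such as 'UCB$L_MB_MSGQFL'
        address, field, contents = parts[0], parts[1], parts[2]
        state[field] = int(address, 16) == int(contents, 16)
    return state

if __name__ == "__main__":
    for field, is_empty in empty_queues(FORMAT_LINES).items():
        print(f"{field}: {'empty' if is_empty else 'NOT empty'}")
```

For this dump the read queue (UCB$L_MB_READQFL) comes out empty, while the listhead at the start of the UCB (UCB$L_FQFL, which the FORMAT output also labels UCB$L_MB_MSGQFL) does not.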
1474.7 | | CSC32::BUTTERWORTH | Gun Control is a steady hand. | Wed Feb 05 1997 14:48 | 19 |
| Now this is really weird. Notice you have a refcount of 1 yet the
READERREFC and WRITERREFC fields are 0!!! That's *never* supposed to
happen!! This tells me that the $DASSGN service must have run
up to some point and then hung.
Also there are messages queued to the UCB. The first two
longwords of the UCB are a listhead. Do a VAL QUE @UCB to see how many
elements there are and then do:
SDA> EXAMINE @UCB;100
followed by the necessary iterations of the following command to get
to the end of the message list.
SDA> EXAMINE @.;100
Regs,
Dan
|
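The queue walk described in 1474.7 (validate the queue, then follow the forward links element by element) can be sketched in Python over a small model of memory. This is an illustration only, not what SDA does internally. The header links are the UCB$L_FQFL and UCB$L_FQBL values from 1474.6. The element's own links are an assumption, namely that the queue is a well-formed single-element queue. The name `walk_queue` is hypothetical.

```python
def walk_queue(memory, header):
    """Follow forward links from `header` until they return to it.

    `memory` maps an address to its (flink, blink) pair. Returns the list of
    element addresses; raises ValueError if a back link does not match.
    """
    elements = []
    previous = header
    current = memory[header][0]
    while current != header:
        flink, blink = memory[current]
        if blink != previous:
            raise ValueError(f"bad back link at {current:08X}")
        elements.append(current)
        previous, current = current, flink
        if len(elements) > len(memory):
            raise ValueError("queue does not return to its header")
    if memory[header][1] != previous:
        raise ValueError("header back link does not match last element")
    return elements

UCB = 0x817293C0
MEMORY = {
    UCB: (0x816222C0, 0x816222C0),  # UCB$L_FQFL / UCB$L_FQBL from 1474.6
    0x816222C0: (UCB, UCB),         # assumed: single well-formed element
}

if __name__ == "__main__":
    found = walk_queue(MEMORY, UCB)
    print(f"{len(found)} element(s):", [f"{a:08X}" for a in found])
```

Checking each back link against the previous element is what distinguishes a complete queue from a corrupted one.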
1474.8 | | BACHUS::BANKEN | | Fri Feb 07 1997 06:12 | 46 |
| Dan,
We had the problem again yesterday. I checked the dump and it is
exactly the same.
EXAMINE @UCB;100 displays xterm3; in the new dump I found xterm4.
Both are VXT 2000s.
SDA> val que @ucb
Queue is complete, total of 1 element in the queue
SDA> examine @ucb;100
00000000 28130040 817293C0 817293C0 À.r.À.r.@..(.... 816222C0
816222E4 00000000 00000000 00180145 E...........ä"b. 816222D0
00000000 00000000 009AF457 00010000 ....Wô.......... 816222E0
00000000 0001003C 0000003D 00000043 C...=...<....... 816222F0
00010008 A3750040 8162230C 81835F80 ._...#b.@.u£.... 81622300
00000000 336D7265 74780006 00670040 @.g...xterm3.... 81622310
00000000 00000000 00000000 00000000 ................ 81622320
00000000 00000000 00000000 00000000 ................ 81622330
00000000 00170040 81622358 81621280 ..b.X#b.@....... 81622340
00000000 336D7265 74780006 00670040 @.g...xterm3.... 81622350
00000000 00000000 00000000 00000000 ................ 81622360
00000000 00000000 00000000 00000000 ................ 81622370
001000B0 313500C0 81868530 81868530 0...0...À.51°... 81622380
FFFFFFE0 FFFFFFE0 7EFA00D0 8C1F5460 `T..Ð.ú~à...à... 81622390
7EFA0114 8C1F5460 00000000 8C1C7F38 8.......`T....ú~ 816223A0
330012A9 00000001 06000000 0000061B ............©..3 816223B0
SDA> examine @.;100
00000000 2C100118 816222C0 816222C0 À"b.À"b....,.... 817293C0
00000000 00010000 00000000 00000000 ................ 817293D0
00000000 0001003C 81674440 147D147C |.}.@Dg.<....... 817293E0
00000000 8C1A4D80 8C1AC280 8C1A4DC0 ÀM...Â...M...... 817293F0
00000200 0C150001 00000000 8189AD00 ................ 81729400
00000001 041901A0 00000000 FFFFFFFF ................ 81729410
8172942C 00000000 00000000 00000000 ............,.r. 81729420
00000001 00000000 15950B52 8172942C ,.r.R........... 81729430
00000002 08000010 00000000 0000000B ................ 81729440
00000000 00000002 00000000 00000000 ................ 81729450
00000000 00000000 00000000 00000000 ................ 81729460
00000000 00000000 00000000 00000000 ................ 81729470
00000000 00000000 00000000 8C1BC940 @É.............. 81729480
00000000 00000000 00000000 00000000 ................ 81729490
817294A8 817294A8 00000000 00000000 ........¨.r.¨.r. 817294A0
817294B8 817294B8 817294B0 817294B0 °.r.°.r.¸.r.¸.r. 817294B0
|
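The EXAMINE rows in 1474.8 print four longwords with the highest address on the left, then an ASCII column, then the row's starting address. A short Python sketch of how one row maps back to bytes in ascending address order follows. The layout is inferred from the output in this thread, and the function name `row_to_bytes` is made up for illustration.

```python
def row_to_bytes(row):
    """Convert one SDA EXAMINE row to (address, bytes in ascending order).

    Assumes the layout seen in this thread: four hex longwords printed
    highest address first, an ASCII column, and the row's starting address
    as the last field. Longwords are stored little-endian.
    """
    fields = row.split()
    longwords = fields[:4]
    address = int(fields[-1], 16)
    data = b"".join(
        int(lw, 16).to_bytes(4, "little") for lw in reversed(longwords)
    )
    return address, data

# Row copied from the first EXAMINE output in 1474.8.
ROW = "00000000 336D7265 74780006 00670040 @.g...xterm3.... 81622310"

if __name__ == "__main__":
    address, data = row_to_bytes(ROW)
    print(f"{address:08X}:", data.hex(" "))
    print("text at +6:", data[6:12].decode("ascii"))
```

For this row the bytes at offset 6 spell `xterm3`, and the 16-bit value just before them is 6, the same as the length of that string.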