Title: | VAX and Alpha VMS |
Notice: | This is a new VMSnotes, please read note 2.1 |
Moderator: | VAXAXP::BERNARDO |
|
Created: | Wed Jan 22 1997 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 703 |
Total number of notes: | 3722 |
342.0. "JOB_CONTROL hung in LEF state" by DECPRG::ZVONAR () Tue Mar 18 1997 04:53
The following problem recurs 2-3 times per week on a customer system (OpenVMS
Alpha 6.1, single AS2100, shadowed system disk):
A couple of days after boot, batch jobs hang in the starting state; sometimes
one batch job hangs in the executing state during exit. Restarting the queue
manager does not solve the problem: every job still hangs in the starting
state. The only way to recover the queue manager is a reboot. After the reboot
everything works, but later the problem appears again.
I found that the JOB_CONTROL process hangs in LEF state with a busy channel
open on SYS$SYSROOT:[SYSMGR]ACCOUNTNG.DAT. The situation appears to affect only
batch and print jobs. Interactive logins continue without problems (but
without any records written to ACCOUNTNG.DAT).
Installed ECOs: ALPSHAD09_061, AXPSCSI01_061, ALPQMAN03_070, AXPDRIV02_061.
It is the same problem as described in note 626 in VMSNOTES_V12, which has no
final solution.
The problem does not depend on the job entry number or the submit time. Disk
free space is sufficient and fragmentation is acceptable.
Thanks for any info or hints,
Karel
------------------------------------------------------------------------------
The JOB_CONTROL process looks like this:
Process index: 000A Name: JOB_CONTROL Extended PID: 0000008A
Process status: 00141003 RES,DELPEN,WAKEPEN,PHDRES,LOGIN
Required capabilities: 0000000C QUORUM,RUN
PCB address 80A29900 JIB address 80A29B80
PHD address 81972000 Swapfile disk address 00000000
Master internal PID 0001000A Subprocess count 0
Internal PID 0001000A Creator internal PID 00000000
Extended PID 0000008A Creator extended PID 00000000
State LEF Termination mailbox 0000
Previous CPU Id 00000000 Current CPU Id 00000000
Previous ASNSEQ 0000000000013DC4 Previous ASN 000000000000001B
Current priority 13 # of threads 0000000000000000
Initial process priority 8 Delete pending count 0
Base priority 8 AST's active NONE
UIC [00001,000004] AST's remaining 295
Mutex count 0 Buffered I/O count/limit 198/200
Waiting EF cluster 0 Direct I/O count/limit 199/200
Abs time of last event 02035FA1 BUFIO byte count/limit 1637440/1637696
Event flag wait mask BFFFFFFF # open files allowed left 197
Swapped copy of LEFC0 00000000 Timer entries allowed left 298
Swapped copy of LEFC1 00000000 Active page table count 0
Global cluster 2 pointer 00000000 Process WS page count 130
Global cluster 3 pointer 00000000 Global WS page count 0
Process active channels
-----------------------
Channel Window Status Device/file accessed
------- ------ ------ --------------------
0010 00000000 DSA0:
0020 809F6E40 DSA0:[VMS$COMMON.SYSEXE]JBC$JOB_CONTROL*
0030 00000000 Busy MBA1:
0050 809EDE00 DSA0:[VMS$COMMON.SYSEXE]QMAN$MASTER.DAT*
0060 80B64B00 Busy DSA0:[SYS0.SYSMGR]ACCOUNTNG.DAT;2
SDA> show call
Call Frame Information
----------------------
Stack Frame Procedure Descriptor
Flags: Base Register = FP, No Jacket, Native
Procedure Entry: FFFFFFFF 8008F8B0 SYS$WAITFR_C
Return address on stack = FFFFFFFF 801E135C RMS_NPRO+1535C
Registers saved on stack
------------------------
7FA5F9F0 FFFFFFFF 8086C480 Saved R2 RMS_NPRW+00080
7FA5F9F8 00000000 0005036C Saved R3
7FA5FA00 FFFFFFFF 836C3150 Saved R13 EXE$PRCDELMSG+00048
7FA5FA08 00000000 7FA5FA10 Saved R29
SDA> show call/next
Call Frame Information
----------------------
Stack Frame Procedure Descriptor
Flags: Base Register = FP, No Jacket, Native
Procedure Entry: FFFFFFFF 801E12D0 RMS_NPRO+152D0
Return address on stack = FFFFFFFF 801E210C SYS$FLUSH_C+0009C
Registers saved on stack
------------------------
7FA5FA20 FFFFFFFF 8086C720 Saved R2 SYS$FLUSH
7FA5FA28 00000000 00018001 Saved R3
7FA5FA30 00000000 0005031C Saved R4
7FA5FA38 00000000 0005036C Saved R5
7FA5FA40 00000000 7FA5FA50 Saved R29
SDA> show call/next
Call Frame Information
----------------------
Stack Frame Procedure Descriptor
Flags: Base Register = FP, No Jacket, Native
Procedure Entry: FFFFFFFF 801E2070 SYS$FLUSH_C
Return address on stack = 00000000 00034768
Registers saved on stack
------------------------
7FA5FA68 00000000 00010BB8 Saved R2 INI$LNM_OBJECT_REGISTRATION+00888
7FA5FA70 00000000 00050120 Saved R3
7FA5FA78 00000000 7FA5FA80 Saved R29
SDA> show call/next
Call Frame Information
----------------------
Stack Frame Procedure Descriptor
Flags: Base Register = FP, No Jacket, Native
Procedure Entry: 00000000 00034688
Return address on stack = 00000000 00033120
Registers saved on stack
------------------------
7FA5FA90 00000000 00010128 Saved R2 SYS$K_VERSION_16+000E8
7FA5FA98 00000000 0005091C Saved R3
7FA5FAA0 00000000 00000000 Saved R4
7FA5FAA8 00000000 00050000 Saved R5
7FA5FAB0 00000000 7FA5FAC0 Saved R29
SDA> show call/next
Call Frame Information
----------------------
Stack Frame Procedure Descriptor
Flags: Base Register = FP, No Jacket, Native
Procedure Entry: 00000000 000330A8
Return address on stack = 00000000 00032268
Registers saved on stack
------------------------
7FA5FAD0 00000000 000102E0 Saved R2 SYS$K_VERSION_16+002A0
7FA5FAD8 00000000 00000004 Saved R3
7FA5FAE0 00000000 7FA5FB20 Saved R29
SDA> show call/next
Call Frame Information
----------------------
Stack Frame Procedure Descriptor
Flags: Base Register = FP, No Jacket, Native
Procedure Entry: 00000000 00031C20
Handler at FFFFFFFF 8081CB60, Data = 00000000 00000018
Return address on stack = FFFFFFFF 836B3A24 EXE$PROC_IMGACT_C+003A4
Registers saved on stack
------------------------
7FA5FB50 00000000 7FFBF87C Saved R2 MMG$IMGHDRBUF+0007C
7FA5FB58 00000000 7FFBF960 Saved R3 MMG$IMGHDRBUF+00160
7FA5FB60 FFFFFFFF 80A28B00 Saved R4 PCB
7FA5FB68 00000000 7FF84000 Saved R5
7FA5FB70 FFFFFFFF 8322ADB0 Saved R6
7FA5FB78 00000000 7FA5FBA0 Saved R29
-----------------------------------------------------------------------
T.R | Title | User | Personal Name | Date | Lines |
---|
342.1 | | MOVIES::WIDDOWSON | Rod | Tue Mar 18 1997 05:33 | 4 |
| It might be interesting to see whether the XQP is active.
SDA> Show proc/lock ! and
SDA> CLUE XQP/ACT/FULL
|
342.2 | No active XQP processes | DECPRG::ZVONAR | | Tue Mar 18 1997 08:03 | 17 |
| Today I can only check the forced crash dump file. JOB_CONTROL is currently
running normally; the customer rebooted the system yesterday.
SDA> CLUE XQP/ACT/FULL
%CLUE-I-NOACTIVE, there are no active XQP processes
SDA> Show proc/lock
The output looks the same as on the running system.
I have the crash dump file and some SDA output taken from the running system
after the problem appeared.
Any further tips, please?
Thanks in advance,
Karel
|
342.3 | Check for "lost" Kmode AST | GIDDAY::GILLINGS | a crucible of informative mistakes | Tue Mar 18 1997 17:19 | 19 |
| Karel,
It's possible to "lose" K mode ASTs, often leaving a process in LEF
state. Typically there's a busy channel to a disk. Format the PCB of
the process and check the AST queue:
80B99528 PCB$L_ASTQFL_K 80B99528 PCB+00028
80B9952C PCB$L_ASTQBL_K 80B99528 PCB+00028
Here the QFL and QBL are the same => empty queue. If they're different
you're probably seeing the problem described. For some reason this
problem seems to show up more frequently if ALPSHAD09 is installed.
Solution is to install patch ALPSYS17_061. Indeed, I'd recommend you
make sure your system has all of the following patches installed:
ALPLIBR05_070, ALPF11X03_070, ALPSYS08_070, ALPRMS04_061,
ALPSYS17_061, ALPSMUP01_070, ALPSHAD09_061, ALPSHAD12_061
John Gillings, Sydney CSC
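
The queue check described above can be sketched in a few lines of Python. This is only an illustration, not an SDA feature: the function name `kernel_ast_queue_empty` and the line pattern are assumptions made for the example, and the input is expected to be the two PCB lines pasted from SDA. It applies a slightly stricter rule than "QFL equals QBL": it treats the queue as empty only when both links point back at the address of the PCB$L_ASTQFL_K field itself, which is what the sample above shows.

```python
import re

# Matches SDA format lines such as:
#   80B99528 PCB$L_ASTQFL_K 80B99528 PCB+00028
# An optional leading ">" (quoted reply text) is tolerated.
LINE = re.compile(
    r"^\s*>?\s*([0-9A-Fa-f]{8})\s+(PCB\$L_ASTQ[FB]L_K)\s+([0-9A-Fa-f]{8})"
)


def kernel_ast_queue_empty(sda_text):
    """Return True if the kernel AST queue looks empty, False otherwise.

    Empty is taken to mean that the forward and backward links both hold
    the address of the queue header (the PCB$L_ASTQFL_K field).
    """
    fields = {}
    for line in sda_text.splitlines():
        m = LINE.match(line)
        if m:
            addr, name, value = m.groups()
            fields[name] = (int(addr, 16), int(value, 16))
    if len(fields) != 2:
        raise ValueError("expected PCB$L_ASTQFL_K and PCB$L_ASTQBL_K lines")
    header = fields["PCB$L_ASTQFL_K"][0]   # address of the queue header
    flink = fields["PCB$L_ASTQFL_K"][1]    # forward link contents
    blink = fields["PCB$L_ASTQBL_K"][1]    # backward link contents
    return flink == header and blink == header
```

A result of False only says that something is queued; as the note says, it makes the lost kernel AST problem probable, not certain.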
|
342.4 | QFL and QBL are not the same | DECPRG::ZVONAR | | Wed Mar 19 1997 06:01 | 15 |
| John,
QFL and QBL are not the same:
> 80A28B28 PCB$L_ASTQFL_K 809C9E58
> 80A28B2C PCB$L_ASTQBL_K 80996E80 SISR+00A38
ECOs currently installed:
ALPSHAD09_061, AXPSCSI01_061, ALPQMAN03_070, AXPDRIV02_061.
As the next step I will install the ECOs from .3.
Thanks for your help,
Karel
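
To see which of the kits recommended in .3 are still missing, the two lists quoted in this thread can be compared mechanically. A minimal sketch in Python follows; the helper name `missing_ecos` is made up for the example, and the kit names are copied from the notes above, not read from the system.

```python
def missing_ecos(installed, recommended):
    """Return the recommended kits that are not installed, in listed order."""
    have = {name.strip().upper() for name in installed}
    return [name for name in recommended if name.strip().upper() not in have]


# Kits reported as installed on the customer system.
installed = [
    "ALPSHAD09_061", "AXPSCSI01_061", "ALPQMAN03_070", "AXPDRIV02_061",
]

# Kits recommended in reply .3.
recommended = [
    "ALPLIBR05_070", "ALPF11X03_070", "ALPSYS08_070", "ALPRMS04_061",
    "ALPSYS17_061", "ALPSMUP01_070", "ALPSHAD09_061", "ALPSHAD12_061",
]

for kit in missing_ecos(installed, recommended):
    print(kit)
```

Only ALPSHAD09_061 appears in both lists, so seven of the eight recommended kits remain to be installed, including ALPSYS17_061.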
|
342.5 | | VIRKE::GULLNAS | Olof Gullnäs, DTN 876-7997 | Sat Mar 22 1997 11:01 | 14 |
| To be really sure that you have the KAST disabled problem you
should also check the AST{SR/EN} register in SDA>show proc/ind=nn/reg.
This only works on crash dumps. ana/system always shows the
KASTs as enabled, even if they are disabled. When using
ana/system, a non-empty kernel AST queue for a process
is a good indicator.
The circumstances leading to KAST being disabled involve a kernel
mode AST returning at an elevated IPL. OpenVMS Posix code does
this quite frequently. Perhaps the patched shadowing code does
the same.
Olof
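
For reading the AST{SR/EN} value from a crash dump, a small decoder may help. The sketch below rests on an assumption that is not stated anywhere in this thread: that the combined value holds ASTEN in bits <3:0> and ASTSR in bits <7:4>, with bit 0 of each field being kernel mode, as in the Alpha hardware privileged context block. If SDA packs the value differently, the masks have to change. The function names are invented for the example.

```python
# Mode order assumed for both fields: bit 0 = kernel ... bit 3 = user.
MODES = ("kernel", "executive", "supervisor", "user")


def decode_ast_sr_en(value):
    """Split a combined AST{SR/EN} value into per-mode lists.

    ASSUMPTION: ASTEN is bits <3:0> and ASTSR is bits <7:4>.
    """
    asten = value & 0xF          # AST enable bits
    astsr = (value >> 4) & 0xF   # AST summary (pending) bits
    return {
        "enabled": [m for i, m in enumerate(MODES) if asten & (1 << i)],
        "pending": [m for i, m in enumerate(MODES) if astsr & (1 << i)],
    }


def kast_disabled(value):
    """True if the kernel-mode enable bit (assumed to be bit 0) is clear."""
    return (value & 0x1) == 0


if __name__ == "__main__":
    print(decode_ast_sr_en(0x1F))
    print(kast_disabled(0x1F))
```

Under that assumption, a value whose low bit is clear would indicate the KAST disabled state discussed here, while a value such as 0000001F would mean all four modes are enabled with a kernel AST pending.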
|
342.6 | Problem fixed | DECPRG::ZVONAR | | Tue Mar 25 1997 10:59 | 10 |
| Hello,
I installed the ECOs recommended in .3. The system has now been running for 6
days without problems (before the ECOs were installed the problem occurred
every 2-3 days).
Thanks all for help,
Karel
PS: AST{SR/EN} = 0000001F
|