T.R | Title | User | Personal Name | Date | Lines |
---|
2844.1 | Need more information | STAR::VATNE | Peter Vatne, VMS Development | Thu May 31 1990 18:04 | 5 |
| I'm not promising to look at this crash. However, if you want anyone to
be able to decode this crash, you will have to post the entire contents
of DECW$SERVER_0_ERROR.LOG. The stuff we need occurs before the ACCVIO
message. Everything after that is just the server attempting to continue
as best it can, which unfortunately for your customer is not too far.
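For anyone collecting this, here is a minimal sketch of pulling out just the part of the log that precedes the ACCVIO message. It is in Python purely for illustration. The function name and the sample lines are hypothetical stand-ins shaped like server log output; the real text would come from DECW$SERVER_0_ERROR.LOG.

```python
def lines_before_accvio(log_text):
    """Return the log lines that precede the first line mentioning ACCVIO."""
    kept = []
    for line in log_text.splitlines():
        if "ACCVIO" in line:
            break  # everything from here on is the server trying to continue
        kept.append(line)
    return kept


# Illustrative sample only, shaped like the server error log.
sample_log = "\n".join([
    "Hello, this is the X server",
    "scn$InitOutput address=12acd0",
    "Connection 99700 is accepted by Txport",
    "%SYSTEM-F-ACCVIO, access violation",
    "Destroying Loadable Microcode Context",
])

for line in lines_before_accvio(sample_log):
    print(line)
```

If the log has no ACCVIO line at all, the sketch returns every line unchanged.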
|
2844.2 | Another one | UTRTSC::HELDEN | | Fri Jun 01 1990 08:49 | 6 |
| My customer has the same problem with his 3100 SPX. Also, when his system
"hangs", he has TWA_ processes in PFW state with 2400000 page faults.
I hope this gives whoever solves it more information.
Mark
|
2844.3 | 3100 SPX - server crash | CSC32::JJONES | Jeff Jones | Fri Jun 01 1990 20:25 | 182 |
| Here is the full scoop, with work-around, but no solution. I believe the
solution will require a patch to the appropriate source code - MEMMGR.LIS.
[config]
vaxstation 3100 spx
16 mb main memory
rz55 (only disk)
tk50
vms 5.3-1
Local system disk
[symptom]
Periodically, if a user goes to quit out of a session, the session
manager goes bye-bye. You will get a black screen if you hit return.
Bottom half white, top half black. And you will get a USERNAME prompt.
He is running all applications locally.
He logs in interactively and does "@DECW$STARTUP RESTART"; logs out
and then the login box comes back.
There appears to be no pattern as to when it happens and when it doesn't.
I got the customer to recreate the problem (first try) and after
logging in we discovered that all decwindows processes except a
DECW$FD are gone. This process is running
DKB100:[SYS0.SYSCOMMON.][SYSEXE]DECW$DWT_FONT_DAEMON.EXE;1 .
A checksum of the server image is good.
There is a SERVER dump.
ACCVIO, PC=DC01, VA=20D, RM=04, PSL=03C00000
DBG> ex/i 0dc01-20:0dc01
SHARE$DECW$SERVER_DIX+99E1: BGTRU SHARE$DECW$SERVER_DIX+9A18
SHARE$DECW$SERVER_DIX+99E3: MOVZWL B^02(R1),R3
SHARE$DECW$SERVER_DIX+99E7: ADDL3 B^04(R1),R2,R0
SHARE$DECW$SERVER_DIX+99EC: ASHL R3,R0,R3
SHARE$DECW$SERVER_DIX+99F0: MOVAB B^14(R1),R0
SHARE$DECW$SERVER_DIX+99F4: MOVL (R0)[R3],R0
SHARE$DECW$SERVER_DIX+99F8: TSTL B^0C(R0)
SHARE$DECW$SERVER_DIX+99FB: BEQL SHARE$DECW$SERVER_DIX+9A10
SHARE$DECW$SERVER_DIX+99FD: MOVL B^0C(R0),R5
SHARE$DECW$SERVER_DIX+9A01: MOVW R4,B^04(R5)
DBG> ex r5
0\%R5: 00000209
DBG> EX/W 20D
0000020D: 4420
DBG> EX R4
0\%R4: 00000060
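As a cross-check of the debugger output above, the faulting virtual address should simply be the operand address of the last instruction, MOVW R4,B^04(R5): the contents of R5 plus the byte displacement 4. The sketch below (Python, for illustration only; the function names are hypothetical) does that arithmetic and also decodes the reason mask. The bit meanings in decode_reason_mask are an assumption based on the usual VAX access-violation convention (bit 0 length violation, bit 1 page table entry reference, bit 2 write or modify intent), so check them against the architecture documentation before relying on them.

```python
def displacement_operand_address(register_value, displacement):
    """Effective address of a displacement-mode operand such as B^04(R5)."""
    return (register_value + displacement) & 0xFFFFFFFF  # 32-bit wraparound


def decode_reason_mask(mask):
    """Decode an ACCVIO reason mask (bit layout is an assumption, see above)."""
    return {
        "length_violation": bool(mask & 0x1),
        "pte_reference": bool(mask & 0x2),
        "write_or_modify": bool(mask & 0x4),
    }


r5 = 0x209                       # from "ex r5"
fault_va = displacement_operand_address(r5, 0x04)
print(f"{fault_va:X}")           # compare with VA=20D in the ACCVIO line
print(decode_reason_mask(0x04))  # RM=04 from the ACCVIO line
```

If the arithmetic matches, R5 (loaded from 12(R0) one instruction earlier) held 209, which would suggest the pointer being followed was already bad before the store.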
The "C" code which this macro is emulating is the following:
([SERVER.DIXKERNEL.LIS]MEMMGR.LIS)
17029 2 ALLOCATE(amount, memtype, ptr);
56 D4 007C clrl r6
000001F4 8F 54 D1 007E cmpl r4,#500
59 18 0085 bgeq sym.8
52 D5 0087 tstl r2
55 13 0089 beql sym.8
51 6744 D0 008B movl (r7)[r4],r1
4F 13 008F beql sym.8
50 24 A1 3C 0091 movzwl 36(r1),r0
49 13 0095 beql sym.8
51 20 A1 D0 0097 movl 32(r1),r1
43 13 009B beql sym.8
0C A1 52 D1 009D cmpl r2,12(r1)
35 1A 00A1 bgtru sym.7
53 02 A1 3C 00A3 movzwl 2(r1),r3
50 52 04 A1 C1 00A7 addl3 4(r1),r2,r0
53 50 53 78 00AC ashl r3,r0,r3
50 14 A1 9E 00B0 movab 20(r1),r0
50 6043 D0 00B4 movl (r0)[r3],r0
0C A0 D5 00B8 tstl 12(r0)
13 13 00BB beql sym.6
55 0C A0 D0 00BD movl 12(r0),r5
04 A5 54 B0 00C1 movw r4,4(r5)
0C A0 08 A5 D0 00C5 movl 8(r5),12(r0)
55 2C A0 C0 00CA addl2 44(r0),r5
13 11 00CE brb sym.9
00D0 sym.6:
56 01 D0 00D0 movl #1,r6
0E 11 00D3 brb sym.9
50 D5 00D5 tstl r0
01 00D7 nop
00D8 sym.7:
56 01 D0 00D8 movl #1,r6
06 11 00DB brb sym.9
50 D5 00DD tstl r0
01 00DF nop
00E0 sym.8:
56 01 D0 00E0 movl #1,r6
00E3 sym.9:
56 D5 00E3 tstl r6
24 13 00E5 beql sym.11
52 DD 00E7 pushl r2
54 DD 00E9 pushl r4
00000000* EF 02 FB 00EB calls #2,ALLOCATE_SLOW
55 50 D0 00F2 movl r0,r5
******************************************************************
Using SDA to look at the running image (all code lines up)
Below is a format of the ICB's (Image Control Blocks) verifying the
offset of the shareable library.
******************************************************************
SDA> ex .;68
00200103 007F0068 7FFE2684 7FFCFCC0 .....&..h..... . 7FFCFB00
52455652 45532457 43454410 00000240 @....DECW$SERVER 7FFCFB10
00000000 00000000 0000004E 49414D5F _MAIN........... 7FFCFB20
00000000 00000000 00000000 00000000 ................ 7FFCFB30
000007FF 00000200 00000000 00000000 ................ 7FFCFB40
00000200 7FFE970C 00000000 00000000 ............... 7FFCFB50
00000000 00000000 00000000 00000000 ................ 7FFCFB60
SDA> ex @.;68
00400300 007F0068 7FFCFB00 7FFCFB70 p.......h.....@. 7FFCFCC0
52455652 45532457 4345440F 00000241 A....DECW$SERVER 7FFCFCD0
00000000 00000000 3130305F 5849445F _DIX_001........ 7FFCFCE0
00000000 00000000 00000000 00000000 ................ 7FFCFCF0
000553FF 00004200 04000005 00000002 .........B...S.. 7FFCFD00
00004200 7FFE967C 00000000 00000000 ........|...B.. 7FFCFD10
00000000 00000000 00000000 00000000 ................ 7FFCFD20
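To show how the pieces line up, here is a small sketch (Python, illustrative only) relating the failing PC to the shareable image and to the compiler listing. It assumes the 00004200 and 000553FF longwords in the second ICB are the start and end addresses of DECW$SERVER_DIX_001; that reading of the dump and the variable names are assumptions made for the example.

```python
IMAGE_BASE = 0x4200     # assumed start of DECW$SERVER_DIX_001 from the ICB
IMAGE_END = 0x553FF     # assumed end address from the same ICB
FAULT_PC = 0xDC01       # PC from the ACCVIO line

# Offset of the failing instruction inside the shareable image.
image_offset = FAULT_PC - IMAGE_BASE
print(f"SHARE$DECW$SERVER_DIX+{image_offset:X}")

# The same MOVW sits at offset 00C1 in the MEMMGR listing, so the difference
# is where the listing's offset 0 lands inside the image.
LISTING_OFFSET = 0x00C1
code_start_in_image = image_offset - LISTING_OFFSET
print(f"{code_start_in_image:X}")

# A second pair (the MOVL at +99FD and listing offset 00BD) should agree,
# and the PC should fall inside the image's address range.
assert 0x99FD - 0x00BD == code_start_in_image
assert IMAGE_BASE <= FAULT_PC <= IMAGE_END
```

The same subtraction can be repeated for any other instruction pair in the two listings as a sanity check that the running image matches the source listing.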
**********************************************************************
DECW$SERVER_ERROR.LOG
**********************************************************************
$ ty decw$server_0_error.log;
1-JUN-1990 11:29:56.2 Hello, this is the X server
Dixmain address=13074
Now attach all known txport images
%DECW-I-ATTACHED, transport DECNET attached to its network
in SetFontPath
Connection 99700 is accepted by Txport
out SetFontPath
ScanProc color support loaded
scn$InitOutput address=12acd0
Connection Prefix: len == 60
1-JUN-1990 11:30:40.9 Now I call scheduler/dispatcher
1-JUN-1990 11:30:42.9 Connection 99738 is accepted by Txport
1-JUN-1990 11:30:47.3 Connection 99700 is closed by Txport
1-JUN-1990 14:51:27.4 Connection 99738 is closed by Txport
1-JUN-1990 14:51:29.5 Connection 99700 is accepted by Txport
1-JUN-1990 14:51:32.2 Connection 99770 is accepted by Txport
1-JUN-1990 14:51:42.5 Connection 99738 is accepted by Txport
1-JUN-1990 14:52:03.8 Connection 9adf8 is accepted by Txport
1-JUN-1990 14:53:12.3 Connection 9adf8 is closed by Txport
1-JUN-1990 14:53:36.6 Client 3 resets the server
1-JUN-1990 14:53:36.9 Destroying Loadable Microcode Context
1-JUN-1990 14:53:37.2 ...Deallocating Ucode ROM
1-JUN-1990 14:53:37.4 ...Deallocating Ucode FIXED
1-JUN-1990 14:53:37.8 ScanProc color support loaded
scn$InitOutput address=12acd0
Connection 99700 is accepted by Txport
**********************************************************************
DECW$SERVER_OUTPUT.LOG
**********************************************************************
$ ty decw$server_0_output.log;
scn_FreeVisual
scn_closeScreen
[solution]
No solution at this point in time.
A workaround is to log in at the console under the SYSTEM account
and issue the following command:
$ @DECW$STARTUP RESTART
If the console appears hung and does not respond to a tap of the
RETURN key do the following:
1) Press the HALT button on the back of the cpu.
2) Type 'C' and RETURN
3) Press RETURN on the keyboard and log in.
jjjones/Colorado CSC
p.s. Should I SPR this?
|
2844.4 | | KLARGO::HECKM | | Mon Jun 04 1990 14:17 | 9 |
| I too am having the same problems. I have been rebooting after the
hang so this will help a lot.
Is it known what privs are needed to restart the server? I would like
the users to be able to do this, but I would rather not give them
all system privs.
Thanx
Mark
|
2844.5 | May already have been identified and fixed. | CSC32::M_MURRAY | | Mon Jun 04 1990 18:24 | 22 |
|
It might be worthwhile trying CSCPAT_0183, which includes
some stuff for 3100/SPX memory corruption and server hanging
\
\ ECO004 25-APR-1990 R.D.L
\ (actual patch written by someone else)
\ This patch fixes two problems with DECwindows on the VAXstation
\ 3100/SPX:
\
\ 1) Running certain applications and then quitting the session
\ cause memory corruption in the server. Once memory is
\ corrupted, the server may crash, or applications may see
\ XIO errors, or the SPX may hang.
\
\ 2) Logging into a session, AND quitting, four times causes
\ the SPX to hang on the fourth login.
Cheers,
Mike
|
2844.6 | Another person with the same problem | LAIDBK::ELLISON | That is truly a wetbrain concept. | Mon Jun 04 1990 19:05 | 15 |
|
My customer, McDonnell-Douglas, is having this same problem on their
primary DEMONSTRATION units...
While we've been using the DECW$STARTUP RESTART, it is now becoming a
major issue inside MD and I'm starting to get more heat.
In the meantime, is anything being done to get this resolved...
Jan Ellison
dtn: 533-7787
If it helps I also have SERVER PROCESS crash dumps available over the
network...
|
2844.7 | There's another patch | UTRTSC::HELDEN | | Tue Jun 05 1990 04:31 | 188 |
|
I received the following mail from Valbonne when I posted the problem
there.
I haven't tried it out yet, but it might be useful.
-Mark-
From: BEAGLE::AVIGDOR "Patricia Avigdor" 1-JUN-1990 17:29:39.52
To: UTRTSC::HELDEN
CC: AVIGDOR
Subj: Log# 01JunL01 - Decwindows crash when quitting session
Hello Mark,
The following note should help you.
+---------------------------+TM
| | | | | | | |
| d | i | g | i | t | a | l | INTEROFFICE MEMORANDUM
| | | | | | | |
+---------------------------+
TO: DISTRIBUTION DATE: April 10, 1990
FROM: Rodney Boyle
DEPT: CSSE TIMA Management
EXT: 276-8781
LOC/MAILSTOP: OGO1-2/E16
SUBJECT: TIME DEPENDENT INFORMATION
The attached information is from the CSSE/VMS Support Group.
It contains important information regarding: Workstation will crash
when quitting a Decwindows session.
Please distribute to all Branch and Support Engineers ASAP.
If you have any questions regarding technical issues, or content of the
article, please contact the author. If you have any questions regarding
distribution or administration of these articles please contact me
directly.
Regards,
Rodney Boyle
===========================================================================
+---------------------------+TM
| | | | | | | |
| d | i | g | i | t | a | l | TIME DEPENDENT CASE
| | | | | | | |
+---------------------------+
TITLE: Workstation will crash when quitting a Decwindows session
due to a corrupted Flink in the SRP lookaside list.
DATE: 10-APR-1990
AUTHOR: Paul Lacombe TD #: 000259
DTN: 381-1697
ENET: VMSSPT::LACOMBE CROSS REFERENCE #'s:
DEPARTMENT: CSSE/VMS Support Group (SPR's, CLD's, TD's)
CXO-04711
AKO-00981
MST-10351
INTENDED AUDIENCE: U.S./EUROPE/GIA PRIORITY LEVEL: 1
(1 = Time Critical,
2 = NON-Time Critical)
See attachment below
for additional info.
---------------------------------------------------------------------
Author Identification:
----------------------
Name : Tom Carr
DTN : 381-1964
Mail Stop : ZKO1-1/D19
E-net Address : VMSSPT::TCARR
Department : CSSE/VMS Support Group
Article Identification:
-----------------------
Title/Problem Summary : Workstation will crash when quitting
a Decwindows session due to a corrupted
Flink in the SRP lookaside list.
Operating System/Layered Product : VMS
Component/Utility : Decwindows
Version Information : VMS v5.2(Decwindows V1), v5.3(Decwindows V2)
Is the problem reproducible at will? : NO
DETAILED Problem Information:
-----------------------------
Problem Description/Symptoms :
The user logs out of the workstation by selecting 'quit' in
the Session Manager's 'session' menu option. The workstation
may crash and the analysis of the dump will show the SRP list
corruption. The corruption is caused when the 'FLINK' of
the SRP is decremented after it has been returned to the SRP
lookaside list.
Hardware configuration specifics :
Workstations running Decwindows
Software configuration specifics :
VMS v5.2(Decwindows V1) or v5.3(Decwindows V2)
Potential Impact on System Operation :
If all workstations in a LAVC log out at the same time and
some percentage of them crash, the system could become slow
due to workstations rebooting.
Frequency of Occurrence :
Unknown, this is a timing problem and a user may never
experience it
DETAILED Resolution Information:
--------------------------------
Problem Resolution/Workaround Description :
There is a patch that can be obtained by any CSC
from the CSSE VOID patch distribution system. The
patch name is
DECWTRNSPT$PATCH01_530 (for V5.3-n ONLY)
When is the final fix expected (Version/Timeframe)? :
VMS v5.4 (AETNA)
Can the fix be engineered/applied to any previous
versions? If so - when? : NO
Installation Instructions :
The kit that the CSC will provide when you request the
patch contains a .README file which has installation
instructions
Limiting Parameters on Hardware Environment :
NONE KNOWN
Limiting Parameters on Software Environment :
NONE KNOWN
Additional Comments :
*** DIGITAL INTERNAL USE ONLY ***
|
2844.8 | Patch, where? | ASDS::SYSTEM | | Tue Jun 05 1990 10:33 | 8 |
|
Ok, I'll plead ignorance. I need the patch mentioned in .7 but I have
no idea what the CSSE VOID Patch Distribution System is. Can anyone
help me out or point me to where I can obtain the patch over the net?
Thanks!
Jim C.
|
2844.9 | Please contact your CSC | VMSSPT::J_OTTERSON | | Tue Jun 05 1990 12:32 | 4 |
| Hi,
You can get the patch(es) for SPX workstations from your CSC.
Regards, Jeff.
|
2844.10 | | VMSSG::LEMBREE | Just do it. | Tue Jun 05 1990 12:40 | 3 |
| If the problem is specific to the SPX, you probably need the patch
DECWSERVER$PATCH02_053, and not the patch for the transport. This is also
available from the CSCs. Related notes are 2665 and 2824.
|
2844.11 | Am I a page fault | UTRTSC::HELDEN | | Wed Jun 06 1990 05:01 | 15 |
|
Well, I haven't much experience with DECwindows, so please correct me if
I'm wrong.
I thought that decw$transport_local.exe manages the transport
between the client and the server.
Because my customer has TWA processes in PFW state and lots of
page faults, couldn't there be a connection between PFW and
decw$transport_local.exe ?
Thanx in advance for help.
-Mark-
|
2844.12 | more info about crash SPX | EIGER::STACHER | | Wed Jun 06 1990 06:38 | 48 |
| Hope this is enough. I had only a hardcopy of the DECW$SERVER_ERROR.LOG file
Connection 26b230 is accepted by Txport
31-May-1990 17:24:22.0 Now I call scheduler/dispatcher
31-May-1990 17:24:25.5 Connection 26ac18 is accepted by Txport
31-May-1990 17:24:25.8 Connection 26b230 is closed by Txport
1-Jun-1990 14:05:04.7 Connection 26ac18 is closed by Txport
1-Jun-1990 14:05:08.6 Connection 26ac50 is accepted by Txport
1-Jun-1990 14:05:10.9 Connection 26ac88 is accepted by Txport
1-Jun-1990 14:05:21.5 Connection 26ac18 is accepted by Txport
1-Jun-1990 14:06:31.5 %SYSTEM-F-ACCVIO, access violation, reason mask=05,
virtual address=639C0062, PC=001317F2, PSL=03C00001
Unrecoverable server error(error code = 12) found, terminating all connections.
Exception Call stack dump follows:
8f8c5
1317f2
131218
12ab4f
12a94a
12ecf4
220aa
d5ee
1083d
10355
13343
********** marking the end of call stack dump **********
********************************************************
1-Jun-1990 14:06:32.3 Destroying Loadable Microcode Context
1-Jun-1990 14:06:32.8 ...Deallocating Ucode ROM
1-Jun-1990 14:06:33.1 ...Deallocating Ucode FIXED
1-Jun-1990 14:06:33.4 MEMMGR/XFREE - memory block 135244 has an invalid
header and is not freed
1-Jun-1990 14:06:33.8
Fatal server bug!
1-Jun-1990 14:06:34.1 Server runtime error limit exceeded - type
@SYS$MANAGER:DECW$STARTUP server to restart, or see your system manager.
1-JUN-1990 14:06:34.5
Thanks for analyzing
cheers
Christian
|
2844.13 | Sorry, not enough information | STAR::VATNE | Peter Vatne, VMS Development | Mon Jun 11 1990 20:21 | 1 |
| The key piece of information required is the InitOutput address.
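In case it saves a round trip, here is a minimal sketch (Python, illustrative only; the function name is hypothetical) of picking the InitOutput address out of the server error log text. The sample line is the one already posted in .3.

```python
import re

# Matches lines such as "scn$InitOutput address=12acd0" and captures the hex value.
INIT_OUTPUT_PATTERN = re.compile(r"InitOutput address=([0-9A-Fa-f]+)")


def init_output_addresses(log_text):
    """Return every InitOutput address found in the log, as integers."""
    return [int(value, 16) for value in INIT_OUTPUT_PATTERN.findall(log_text)]


sample_log = "\n".join([
    "ScanProc color support loaded",
    "scn$InitOutput address=12acd0",
    "Connection Prefix: len == 60",
])

for address in init_output_addresses(sample_log):
    print(f"{address:X}")
```

An empty result means the posted log excerpt does not include the line and more of the file is needed.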
|
2844.14 | more info from another customersites | UTOPIE::GAISHAUSER | She's always a VAX to me | Thu Aug 16 1990 10:14 | 22 |
|
Hi y'all !
Two customers are experiencing the same problem:
one only on /SPX systems, the other on both /GPX and /SPX.
The contents of the SERVER_0_ERROR.LOG file are the same
(apart from the connection numbers, of course).
The scn$InitOutput address is 12acd0
Hope this helps.
BTW: which patch should I apply for the customer with the /GPX and /SPX?
______________
\ /
\ Thanks /
|-\ Helmut /
|__\ Hg /
\____/
|