T.R | Title | User | Personal Name | Date | Lines |
---|
3243.1 | Need more information | STAR::VATNE | Peter Vatne, VMS Development | Wed Aug 22 1990 14:07 | 13 |
| I'd say that your initial reaction was correct -- if the server is running
out of memory, you should increase the number of virtual pages and the
page file quota. However, you have already increased these numbers to
a generous extent. The problem then is either a memory leak in the
server, or VCS is simply consuming too many resources in the server.
Some characterization of VCS would help. How long does it take before
the server runs out of memory? How many connections are there to the
server? How many windows are created? How many pixmaps are created?
Is there something that VCS is continually creating and destroying?
The idea behind the above questions is to narrow down which resource
is chewing up all of the server's memory.
|
3243.2 | exit | BLKWDO::MCAFOOS | Bob McAfoos - TFO MIS | Wed Aug 22 1990 20:03 | 40 |
|
I had the same problem on VMS 5.3 and VCS. I installed VMS 5.4-4G1 on
the VCS system the other day and the window manager still shuts down
at regular intervals.
I run 3 windows: VCS C3, VCS$ENS, and CLOCK. The window manager shuts
down at different times, but usually can be counted on to fail at least
twice a day.
The configuration is: VS3100, 24meg memory, 2 RZ23. This is the only
satellite on a uVAXII hosted LAVc.
Here is the DECW$SERVER_0_ERROR.LOG .....
21-AUG-1990 15:21:54.3 Hello, this is the X server
Dixmain address=13074
Now attach all known txport images
%DECW-I-ATTACHED, transport DECNET attached to its network
in SetFontPath
Connection b2500 is accepted by Txport
Connection b2538 is accepted by Txport
out SetFontPath
GPX color/monochrome support loaded
gpx$InitOutput address=162c30
Connection Prefix: len == 42
21-AUG-1990 15:23:21.8 Now I call scheduler/dispatcher
21-AUG-1990 15:23:23.7 Connection b2570 is accepted by Txport
21-AUG-1990 15:23:28.5 Connection b2500 is closed by Txport
21-AUG-1990 15:24:03.4 Connection b2570 is closed by Txport
21-AUG-1990 15:24:07.7 Connection b2500 is accepted by Txport
21-AUG-1990 15:24:12.0 Connection b3c18 is accepted by Txport
21-AUG-1990 15:24:36.3 Connection b2570 is accepted by Txport
21-AUG-1990 15:24:42.6 Connection b3c50 is accepted by Txport
21-AUG-1990 15:24:47.5 Connection b3c88 is accepted by Txport
21-AUG-1990 15:25:04.2 Connection 293650 is accepted by Txport
21-AUG-1990 18:19:16.4 Connection b2538 is closed by Txport (status = 20e4)
21-AUG-1990 18:19:59.8 Connection 293650 is closed by Txport
22-AUG-1990 05:15:40.8 %DECW-I-ATTACH_LOST, transport DECNET detected its network shutting down
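One way to approach the "how many connections are there to the server?" question from .1 is to tally the accept/close lines in a log like the one above. The following is only a minimal sketch in Python, not a DECwindows tool: the function name `open_connections` and the regular expression are assumptions made for illustration, and the sample text is just the connection lines copied from this log.

```python
import re

# Connection lines copied from the server log above.
LOG = """\
Connection b2500 is accepted by Txport
Connection b2538 is accepted by Txport
21-AUG-1990 15:23:23.7 Connection b2570 is accepted by Txport
21-AUG-1990 15:23:28.5 Connection b2500 is closed by Txport
21-AUG-1990 15:24:03.4 Connection b2570 is closed by Txport
21-AUG-1990 15:24:07.7 Connection b2500 is accepted by Txport
21-AUG-1990 15:24:12.0 Connection b3c18 is accepted by Txport
21-AUG-1990 15:24:36.3 Connection b2570 is accepted by Txport
21-AUG-1990 15:24:42.6 Connection b3c50 is accepted by Txport
21-AUG-1990 15:24:47.5 Connection b3c88 is accepted by Txport
21-AUG-1990 15:25:04.2 Connection 293650 is accepted by Txport
21-AUG-1990 18:19:16.4 Connection b2538 is closed by Txport (status = 20e4)
21-AUG-1990 18:19:59.8 Connection 293650 is closed by Txport
"""

# Hypothetical pattern matching the accept/close lines seen in this log.
EVENT = re.compile(r"Connection (\w+) is (accepted|closed) by Txport")


def open_connections(log_text):
    """Return the set of connection ids accepted but not closed afterwards."""
    open_ids = set()
    for line in log_text.splitlines():
        match = EVENT.search(line)
        if match is None:
            continue  # not a connection line
        conn_id, action = match.groups()
        if action == "accepted":
            open_ids.add(conn_id)
        else:
            open_ids.discard(conn_id)
    return open_ids


print(sorted(open_connections(LOG)))
```

Run against the lines above, this leaves five connections still open when the transport shut down. Note that connection ids are reused (b2500 is accepted, closed, and accepted again), so the tally tracks state rather than counting distinct ids.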
|
3243.3 | More Info | BLKWDO::MCAFOOS | Bob McAfoos - TFO MIS | Thu Aug 23 1990 16:04 | 48 |
| I am still getting the "Insufficient Virtual Memory" error also.
This is the DECW$SERVER_0_ERROR.LOG for last night's failure:
22-AUG-1990 13:22:49.7 Hello, this is the X server
Dixmain address=13074
Now attach all known txport images
%DECW-I-ATTACHED, transport DECNET attached to its network
in SetFontPath
Connection b2500 is accepted by Txport
out SetFontPath
GPX color/monochrome support loaded
gpx$InitOutput address=153230
Connection Prefix: len == 42
22-AUG-1990 13:24:10.1 Now I call scheduler/dispatcher
22-AUG-1990 13:24:12.5 Connection b2538 is accepted by Txport
22-AUG-1990 13:24:17.1 Connection b2500 is closed by Txport
22-AUG-1990 13:25:58.1 Connection b2500 is accepted by Txport
22-AUG-1990 13:28:15.4 Connection b2538 is closed by Txport
22-AUG-1990 13:28:19.7 Connection b2570 is accepted by Txport
22-AUG-1990 13:28:24.1 Connection b3c18 is accepted by Txport
22-AUG-1990 13:28:42.1 Connection b2538 is accepted by Txport
22-AUG-1990 13:28:48.7 Connection b3c50 is accepted by Txport
22-AUG-1990 13:28:52.9 Connection b3c88 is accepted by Txport
22-AUG-1990 13:29:09.1 Connection 293650 is accepted by Txport
22-AUG-1990 23:02:12.7 Using extra todo packet pool...
22-AUG-1990 23:43:15.0 %LIB-?-INSVIRMEM, insufficient virtual memory
-C74-W-NOMSG, Message number 801DB2F0
-C74
Request opcode 53 is ignored due to internal runtime error 158217 for client 3(#error = 1)
Exception Call stack dump follows:
a748d
eae2
df7c
154203
15a67b
15a6c3
20adf
d5ee
1083d
10355
13343
********** marking the end of call stack dump **********
********************************************************
22-AUG-1990 23:45:39.5 %LIB-?-INSVIRMEM, insufficient virtual memory
-C74-W-NOMSG, Message number 801DB2F0
-C74
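For the question in .1 about how long it takes before the server runs out of memory, the timestamps in this log give a rough answer. Below is a small Python sketch, assuming the stamp format shown in the log ("DD-MON-YYYY HH:MM:SS.f"); the helper name `parse_vms_time` is made up for this example.

```python
from datetime import datetime

MONTHS = {
    "JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12,
}


def parse_vms_time(stamp):
    """Parse a stamp such as '22-AUG-1990 13:22:49.7' as written in the log."""
    date_part, time_part = stamp.split()
    day, month, year = date_part.split("-")
    clock, _, fraction = time_part.partition(".")
    hour, minute, second = (int(field) for field in clock.split(":"))
    # Pad the fractional part to microseconds (".7" becomes 700000).
    micro = int(fraction.ljust(6, "0")) if fraction else 0
    return datetime(int(year), MONTHS[month.upper()], int(day),
                    hour, minute, second, micro)


# Server start and first INSVIRMEM message, taken from the log above.
start = parse_vms_time("22-AUG-1990 13:22:49.7")
first_failure = parse_vms_time("22-AUG-1990 23:43:15.0")
uptime = first_failure - start
print(uptime)
```

For this run the server lasted a little over ten hours between startup and the first INSVIRMEM message.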
|
3243.4 | Hello, Is Anyone Home???? | BLKWDO::MCAFOOS | Bob McAfoos - TFO MIS | Tue Aug 28 1990 14:15 | 11 |
| Is anyone out there??????? Or is this a case of ignore it and hope it
goes away???
Both DECWindows and VCS are very good products, but if they're
constantly crashing, who needs it?
If more info is required, let me know and I'll try to provide it.
Thanks,
|
3243.5 | Please QAR it | STAR::VATNE | Peter Vatne, VMS Development | Tue Aug 28 1990 15:06 | 8 |
| As has been remarked many times before, if you want to bring this
problem to the attention of developers, please QAR this. You may
wish to QAR it against both VCS and DECwindows, in case the problem
is simply one of excess resource usage on VCS' part. We know that
the server should handle insufficient virtual memory errors better,
but we are more concerned at the moment about possible memory leaks.
See note 2 on how to enter a DECwindows QAR.
|
3243.6 | MORE INFO | RUTILE::DC_BERETTA | | Wed Aug 29 1990 07:18 | 311 |
| Hello Peter
Yes, and thank you; I will QAR this problem against both VCS and DECWINDOWS.
I have been observing both systems and waiting for them to crash
again. One of them (FNYVCS) crashed again yesterday, and the other (FYVCS2)
looks like it is preparing to crash. I say this because, having run
"ANALYSE/ERROR" on both systems, I noticed that a non-fatal bugcheck was
recorded at approximately the time the first one crashed (i.e. 8 seconds
later; see below). On the second system a similar non-fatal bugcheck was
recorded yesterday evening, but WINDOWS and VCS are still up.
Error log report for node: FNYVCS
---------------------------------
V A X / V M S SYSTEM ERROR REPORT COMPILED 29-AUG-1990 09:21
PAGE 10.
******************************* ENTRY 809. *******************************
ERROR SEQUENCE 616. LOGGED ON: SID 0A000004
DATE/TIME 28-AUG-1990 14:24:44.24 SYS_TYPE 01120102
SCS NODE: FNYVCS VAX/VMS V5.3
NON-FATAL BUGCHECK KA650 CPU REV# 5. FW REV# 1.2
SSRVEXCEPT, Unexpected system service exception
PROCESS NAME VCS FNYVCS 0.0
PROCESS ID 00010013
ERROR PC 80002386
ERROR PSL 01400000
INTERRUPT PRIORITY LEVEL = 00.
PREVIOUS MODE = EXECUTIVE
CURRENT MODE = EXECUTIVE
STACK POINTERS
KSP 7FFE77B4 ESP 7FFE9794 SSP 7FFECBFC USP 7FF27C6C ISP 806EF200
GENERAL REGISTERS
R0 00000001 R1 80002380 R2 000001BC R3 7FF86E20 R4 00000002
R5 7FFECC44 R6 000001BC R7 7FF86E20 R8 7FF56D78 R9 7FFECA0C
R10 803E693A R11 7FFE2BDC AP 7FFE97AC FP 7FFE9794 SP 7FFE77F8
TODR 883636B5
CADR 000000FC
1ST LEVEL CACHE STATUS:
_D STREAM ENABLED
_I STREAM ENABLED
_SET 1 ENABLED
_SET 2 ENABLED
MSER 00000000
1ST LEVEL CACHE HIT
DSER 00000000
QBEAR 0000000A
DEAR 00000000
IPCR0 00000020
LOCAL MEMORY EXT ACCESS ENABLED
MEMCSR16 00000044
PHYSICAL PAGE ADDR = 00000(X)
ECC ERROR SYNDROME = 44(X)
MEMCSR17 00001005
ECC ENABLED
CRD INTERRUPT ENABLED
MAIN MEM CYCLE SELECT = 5/3
MEMCSR0 80000016
MEMORY MODULE TYPE = MS650 (8 MB)
SYSTEM BANK = 00.
MEMCON 00AA3333
MEMORY CONFIGURATION:
_BANKS ENABLED = 0011001100110011
_MEMORY MODULE #1. - MS650 (8 MB)
_MEMORY MODULE #2. - MS650 (8 MB)
_MEMORY MODULE #3. - MS650 (8 MB)
_MEMORY MODULE #4. - MS650 (8 MB)
MEMORY ERROR STATUS:
_MEMORY MODULE #1.
CACR FFB1F690
2ND LEVEL CACHE STATUS:
_ENABLED
CYCLE SPEED CODE = 2.
CBTCR C0000004
CDAL BUS T/O INTERVAL = 000004(X)
******************************* ENTRY 810. *******************************
ERROR SEQUENCE 730. LOGGED ON: SID 0A000004
DATE/TIME 29-AUG-1990 09:19:03.66 SYS_TYPE 01120102
SCS NODE: FNYVCS VAX/VMS V5.3
TIME STAMP KA650 CPU REV# 5. FW REV# 1.2
ANA/ERR/SINC=20-AUG-1990 00:00:00.00/EXCLUDE=(DISKS,TAPE)/OUT=ERR.TXT
____________________________________________________________________________
And the other node: FYVCS2
--------------------------
V A X / V M S SYSTEM ERROR REPORT COMPILED 29-AUG-1990 09:32
PAGE 5.
******************************* ENTRY 483. *******************************
ERROR SEQUENCE 721. LOGGED ON: SID 0A000004
DATE/TIME 28-AUG-1990 17:48:35.88 SYS_TYPE 01120102
SCS NODE: FYVCS2 VAX/VMS V5.3
NON-FATAL BUGCHECK KA650 CPU REV# 5. FW REV# 1.2
SSRVEXCEPT, Unexpected system service exception
PROCESS NAME FLOCON 093.1
PROCESS ID 00010032
ERROR PC 80002386
ERROR PSL 01400000
INTERRUPT PRIORITY LEVEL = 00.
PREVIOUS MODE = EXECUTIVE
CURRENT MODE = EXECUTIVE
STACK POINTERS
KSP 7FFE77B4 ESP 7FFE9794 SSP 7FFEC9F8 USP 7FF29150 ISP 806EF200
GENERAL REGISTERS
R0 00000001 R1 80002380 R2 00000002 R3 7FFC5AFC R4 803A0F50
R5 00000004 R6 803A0F60 R7 00000003 R8 7FFEDE00 R9 001E44B2
R10 7FFED7D4 R11 7FFE2BDC AP 7FFE97AC FP 7FFE9794 SP 7FFE77F8
TODR 89716BDC
CADR 000000FC
1ST LEVEL CACHE STATUS:
_D STREAM ENABLED
_I STREAM ENABLED
_SET 1 ENABLED
_SET 2 ENABLED
MSER 00000000
1ST LEVEL CACHE HIT
DSER 00000000
QBEAR 0000000A
DEAR 00000000
IPCR0 00000020
LOCAL MEMORY EXT ACCESS ENABLED
MEMCSR16 00000044
PHYSICAL PAGE ADDR = 00000(X)
ECC ERROR SYNDROME = 44(X)
MEMCSR17 00001005
ECC ENABLED
CRD INTERRUPT ENABLED
MAIN MEM CYCLE SELECT = 5/3
MEMCSR0 80000016
MEMORY MODULE TYPE = MS650 (8 MB)
SYSTEM BANK = 00.
MEMCON 00AA3333
MEMORY CONFIGURATION:
_BANKS ENABLED = 0011001100110011
_MEMORY MODULE #1. - MS650 (8 MB)
_MEMORY MODULE #2. - MS650 (8 MB)
_MEMORY MODULE #3. - MS650 (8 MB)
_MEMORY MODULE #4. - MS650 (8 MB)
MEMORY ERROR STATUS:
_MEMORY MODULE #1.
CACR FFA1E890
2ND LEVEL CACHE STATUS:
_ENABLED
CYCLE SPEED CODE = 2.
CBTCR C0000004
CDAL BUS T/O INTERVAL = 000004(X)
ANA/ERR/SINC=15-AUG-1990 00:00:00.00/INCLUDE=BUGCHECK/OUT=ERR.FYVCS2
RE: Note .1
===========
I apologise for not supplying you with more information sooner.
Can you suggest some ways for me to detect "memory leaks"?
Accounting reveals to me that there were no users logged in at the
time of the crash. A review of the log files does not indicate anything
obvious other than the normal events being logged. A "show error"
does not indicate any memory errors. Should I install VPA or SPM
or something similar, or use Monitor with record/playback ?
Part of the accounting report at about the time of the crash:
DETACHED Process Termination
----------------------------
Username: SYSTEM UIC: [SYSTEM]
Account: <start> Finish time: 28-AUG-1990 14:24:41.21
Process ID: 00000097 Start time: 24-AUG-1990 08:31:56.20
Owner ID: Elapsed time: 4 05:52:45.01
Terminal name: Processor time: 0 00:13:55.08
Remote node addr: Priority: 6
Remote node name: Privilege <31-00>: 17110805
Remote ID: Privilege <63-32>: 00000000
Queue entry: Final status code: 1000000C
Queue name:
Job name:
Final status text: %SYSTEM-F-ACCVIO, access violation, reason mask=!XB, virtual
Page faults: 28522 Direct IO: 24519
Page fault reads: 81 Buffered IO: 76625
Peak working set: 4000 Volumes mounted: 0
Peak page file: 27306 Images executed: 1
SUBPROCESS Process Termination
------------------------------
Username: SYSTEM UIC: [SYSTEM]
Account: <start> Finish time: 28-AUG-1990 14:24:41.67
Process ID: 0000009C Start time: 24-AUG-1990 08:49:18.23
Owner ID: 00000093 Elapsed time: 4 05:35:23.44
Terminal name: Processor time: 0 00:03:56.09
Remote node addr: Priority: 4
Remote node name: Privilege <31-00>: FFFFFFFF
Remote ID: Privilege <63-32>: FFFFFFFF
Queue entry: Final status code: 0BF79351
Queue name:
Job name:
Final status text: <no text>
Page faults: 1578 Direct IO: 42362
Page fault reads: 80 Buffered IO: 25363
Peak working set: 2511 Volumes mounted: 0
Peak page file: 6748 Images executed: 3
SUBPROCESS Process Termination
------------------------------
Username: SYSTEM UIC: [SYSTEM]
Account: <start> Finish time: 28-AUG-1990 14:24:42.92
Process ID: 00000099 Start time: 24-AUG-1990 08:32:37.07
Owner ID: 00000093 Elapsed time: 4 05:52:05.85
Terminal name: Processor time: 0 00:07:35.50
Remote node addr: Priority: 4
Remote node name: Privilege <31-00>: FFFFFFFF
Remote ID: Privilege <63-32>: FFFFFFFF
Queue entry: Final status code: 12DBA002
Queue name:
Job name:
Final status text: <no text>
Page faults: 1427 Direct IO: 31
Page fault reads: 81 Buffered IO: 97
Peak working set: 2582 Volumes mounted: 0
Peak page file: 5436 Images executed: 3
DETACHED Process Termination
----------------------------
Username: SYSTEM UIC: [SYSTEM]
Account: <start> Finish time: 28-AUG-1990 14:24:46.44
Process ID: 0000009A Start time: 24-AUG-1990 08:32:41.13
Owner ID: Elapsed time: 4 05:52:05.31
Terminal name: Processor time: 0 00:22:16.71
Remote node addr: Priority: 4
Remote node name: Privilege <31-00>: 00158820
Remote ID: Privilege <63-32>: 00000000
Queue entry: Final status code: 12DB821C
Queue name:
Job name:
Final status text: <no text>
Page faults: 3341239 Direct IO: 33
Page fault reads: 1275 Buffered IO: 82226
Peak working set: 2048 Volumes mounted: 0
Peak page file: 13238 Images executed: 1
I notice that page faults for this last detached process exceed 3
million over roughly four days (see its elapsed time above). I have
noticed previously that, before WINDOWS crashes, one of the processes
begins faulting like crazy. I can't remember which process it is, though.
(I think it was the "WINDOW 0" process, as this often seems to have a PID
similar to 0000009A.)
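To put the page-fault figure in perspective, it can be turned into a rate using the elapsed time from the same accounting record. This is only a back-of-the-envelope sketch in Python; the function names are invented for the example, and the "D HH:MM:SS.cc" layout is assumed from the accounting output above.

```python
def elapsed_seconds(elapsed):
    """Convert an accounting elapsed time such as '4 05:52:05.31' to seconds."""
    days, clock = elapsed.split()
    hours, minutes, seconds = clock.split(":")
    return (int(days) * 86400 + int(hours) * 3600
            + int(minutes) * 60 + float(seconds))


def faults_per_second(page_faults, elapsed):
    """Average page-fault rate over the life of the process."""
    return page_faults / elapsed_seconds(elapsed)


# Figures from the last DETACHED process record (PID 0000009A).
rate = faults_per_second(3341239, "4 05:52:05.31")
print(f"{rate:.1f} page faults per second on average")

# For comparison: the first DETACHED process record (PID 00000097).
baseline = faults_per_second(28522, "4 05:52:45.01")
print(f"{baseline:.2f} page faults per second on average")
```

The last process averages roughly nine page faults per second over its whole lifetime, about two orders of magnitude above the first detached process. An average hides bursts, so if the faulting really does spike shortly before the crash, sampling the counter at intervals would show it more clearly than this single figure.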
1. The server has taken approximately 4 days to crash. This is an
improvement, since before I increased VIRTUALPAGECNT and PGFLQUO
it would crash daily.
2. Connections to the server: Both nodes have 20 other nodes connected
to them.
3. Windows created: The C3 display, the EVENTS window and usually
only one other window monitoring a system.
4. Number of pixmaps: (excuse my ignorance: I know what a pixel
is and what a map is, I assume a pixmap is a map of pixels - how
do I know how many pixmaps there are ? )
5. "Is there something VCS is continually creating and destroying?"
I haven't noticed anything, except that previously, when VCS was
recording a large number of events, it caused windows to crash. The
VCS$IODL process was using up 100% of the CPU. However, in the most
recent crash it is not apparent that a large number of events were
being recorded.
Peter Beretta
|
3243.7 | WINDOW does leak | VINO::MCARLETON | Reality; what a concept! | Wed Aug 29 1990 11:09 | 15 |
|
If you are running the VCS with the ENS WINDOW action triggering on a
large number of events, it could be the source of a memory leak. The
process name would be "WINDOW 01". The WINDOW application uses the
VLIST widget to add a line to the window for each event. Since the
VLIST widget is never cleared out, the number of events and the size of
the pagefile space can grow very large over time. The workaround is
to quit the WINDOW application once in a while to allow ENS to start a
fresh one.
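The growth pattern described above can be illustrated outside of DECwindows. The Python sketch below is not the VLIST widget's API or the WINDOW application's code; both classes are hypothetical stand-ins, one that keeps every event line forever and one that caps the list so that old lines are dropped.

```python
from collections import deque


class UnboundedEventList:
    """Hypothetical stand-in for a list that is never cleared out."""

    def __init__(self):
        self.lines = []

    def add(self, text):
        self.lines.append(text)  # memory grows with every event


class BoundedEventList:
    """Hypothetical alternative that keeps only the most recent lines."""

    def __init__(self, max_lines):
        # A deque with maxlen silently discards the oldest entry when full.
        self.lines = deque(maxlen=max_lines)

    def add(self, text):
        self.lines.append(text)


unbounded = UnboundedEventList()
bounded = BoundedEventList(max_lines=500)
for number in range(10_000):
    line = f"event {number}"
    unbounded.add(line)
    bounded.add(line)

print(len(unbounded.lines), len(bounded.lines))
```

Quitting the WINDOW application from time to time has the same effect as the cap: it throws away the accumulated lines so the next instance starts from an empty list.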
The C3 application should not leak memory. If you find that it does,
let me know and I'll try to plug it. Most things in the C3 get reused
once they are created.
MJC
|
3243.8 | How to get a crash dump | STAR::VATNE | Peter Vatne, VMS Development | Wed Aug 29 1990 15:06 | 12 |
| Thanks for the additional information. The non-fatal bugcheck is a very
serious problem. It indicates that something is wrong in privileged
code, but the problem was confined to a single process in executive mode.
If you are brave, I would like you to crash your system to get a
crash dump of the problem. To turn a non-fatal bugcheck into a fatal
bugcheck, set the SYSGEN parameter BUGCHECKFATAL to 1. Then just
wait until the problem occurs again, and you will get a crash dump.
Once you get a dump, you can set BUGCHECKFATAL back to 0.
Once you get the dump file, please enter a QAR with a pointer to
your crash dump. Thanks very much!
|
3243.9 | CRASH DUMP | RUTILE::DC_BERETTA | | Tue Sep 04 1990 06:12 | 9 |
| I have a crash dump for you. Yes the system crashed this morning and
the crash dump is located in the default decnet account on node FNYVCS.
The dump file is called "FNYVCS.DMP". Output from analyse/error is in a
file called "FNYVCS.FATAL_BUGCHECK". All the current log files from VCS
and DECWINDOWS can also be found there.
Look forward to hearing from you,
Peter.
|
3243.10 | Send pointers via mail | BOMBE::MOORE | Eat or be eaten | Tue Sep 04 1990 21:31 | 6 |
| Please do not publish locations of world readable system dump files!
And don't put them in easily discovered places, like default network
directories.
A knowledgeable hacker could utilize information which may be contained
in the dump file to break into your system.
|
3243.11 | Protected dump file | RUTILE::DC_BERETTA | | Wed Sep 05 1990 06:07 | 3 |
| I have protected the dump file. Those requiring access should please send
me a mail message with their credentials and telephone number.
Peter.
|
3243.12 | | DECWIN::FISHER | Locutus: Fact or Fraud? | Mon Sep 10 1990 12:59 | 9 |
| Unfortunately for us (fortunately for him) Peter was at European DECUS last
week and is taking a well-deserved vacation in Europe for the next 3 weeks.
Please put the appropriate information in the QAR. Another useful bit of info
would be to do SHOW PROC/CONT on the server and include a snapshot of that. We
would be especially interested in the total virtual pages of the server process.
Thanks,
Burns
|
3243.13 | I've got the dump | VINO::MCARLETON | Reality; what a concept! | Mon Sep 10 1990 14:45 | 8 |
|
Re: .12
I have a copy of the crash dump from Peter's VCS system. The PHD
for the DECW$SERVER_0 process shows a REFCNT of 18558 for page file 0.
Contact me if you want a proxy to copy the dump.
MJC
|
3243.14 | work around | RUTILE::DC_BERETTA | | Mon Sep 24 1990 10:13 | 5 |
|
A workaround that I am using to stop VCS from crashing all the time is
to stop ENS from starting up. The two nodes have now been up for over 10
days. I assume from this that ENS is the cause of the "memory leaks",
as suspected. Any news from the developers?
|
3243.15 | VCS?? | REDBRD::COOLEY | | Mon Sep 24 1990 10:32 | 4 |
| Please pardon my ignorance: what is VCS? What is ENS? Is this perhaps
applicable to note 3249?
|
3243.16 | VCS and ENS | RUTILE::DC_BERETTA | | Mon Sep 24 1990 11:43 | 5 |
| It may sound inappropriate to be mentioning ENS and VCS in this
conference; however, they are actually relevant. VCS stands for "VAXcluster
Console System" and ENS is a utility within VCS called "Event
Notification System". The problems being experienced are related to
DECwindows, as version 1.3 of this application runs under DECwindows.
|