| Title: | DECWINDOWS 26-JAN-89 to 29-NOV-90 |
| Notice: | See 1639.0 for VMS V5.3 kit; 2043.0 for 5.4 IFT kit |
| Moderator: | STAR::VATNE |
| Created: | Mon Oct 30 1989 |
| Last Modified: | Mon Dec 31 1990 |
| Last Successful Update: | Fri Jun 06 1997 |
| Number of topics: | 3726 |
| Total number of notes: | 19516 |
A customer running VMS 5.3 and ADA multitasking is getting a
number of very strange errors in their (very large) application.
Some of them cause the server to croak, others just kill the
application - not sure if the process dies too - at least all of
the widgets disappear from the screen. Here are some brief
descriptions of the problems. Any pointers to some/any of the
solutions would be helpful... (Note - they may have some code that
"may" be trying to use the toolkit in a re-entrant manner - they
are checking on that).
Rick Beldin
Atlanta CSC
Here they come....
1. This one occurs shortly after ApplicationCreateShell, effect
is to 'hose' (what does that mean?) entire application. This
happens 1 time in 10.
X Toolkit Warning: Cannot convert string
"-*-MENU-MEDIUM-R-Normal--*-120-*-*-P-ISO8859-1"
to type FontList, using fixed font
2. These two sets of error messages have the effect of 'killing
the application' - again not sure if process is deleted or
just that windows disappear... They don't appear together,
but are symptomatic of separate crashes.
XIO: fatal IO error 65535 on XServer "MIP::0.0"
after 5053 request 61 events remaining
XLib: sequence lost ( 0x10020 > 0xb3e ) in reply 0x7!
3. This third problem appears to be around server memory deallocation.
All windows freeze, and it appears that the server has run out
of page file quota. The following error appears when running
their application:
X error event received from server: BadImplementation - server reported
implementation error
Failed request major opcode 53 (X_CreatePixmap)
Failed request minor opcode 0 (if applicable)
ResourceID 0x2011e6 in failed request (if applicable)
Serial number of failed request 44729
Current serial number in output stream 44730
XIO: fatal IO error 65535 on X server "MIP::0.0"
after 44732 requests (44730 known processed) with 0 events remaining
XIO: fatal IO error 65535 on X server "MIP::0.0"
after 44734 requests (44730 known processed) with 0 events remaining
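A note on the "sequence lost" message in item 2. In the MIT sample Xlib,
replies and events carry only the low 16 bits of the request sequence
number, and the library widens them against the last sequence number it
has read. The sketch below is a small Python model of that widening step.
It assumes the VMS Xlib behaves the same way, which is not confirmed
anywhere in this note. The last-read value 0xB00 is made up for
illustration; only 0x10020 and 0xb3e come from the message itself.

```python
def widen_sequence(last_read, wire_seq, last_sent):
    """Widen a 16-bit wire sequence number, modeled on MIT sample Xlib.

    Returns (accepted, complaint). 'accepted' is the sequence number the
    library keeps. 'complaint' is None, or the (candidate, last_sent)
    pair that would appear in a "sequence lost" message.
    """
    # Borrow the high bits from the last sequence number read.
    candidate = (last_read & ~0xFFFF) | (wire_seq & 0xFFFF)
    complaint = None
    if candidate < last_read:
        # Looks like the 16-bit counter wrapped, so assume the next epoch.
        candidate += 0x10000
        if candidate > last_sent:
            # The packet claims to answer a request never sent.
            complaint = (candidate, last_sent)
            candidate -= 0x10000
    return candidate, complaint


# Hypothetical state: 0xB00 already read, 0xB3E sent, packet stamped 0x0020.
accepted, complaint = widen_sequence(0xB00, 0x0020, 0xB3E)
print(hex(complaint[0]), hex(complaint[1]))  # prints: 0x10020 0xb3e
```

Under that assumption, the message says a packet stamped with sequence
0x0020 arrived after the client had already read well past it, which is
what a scrambled or interleaved connection would look like.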
Here are some stats gathered at the time of problem number 3.
VAX/VMS V5.3 on node DMIP 23-MAY-1990 16:07:12.45 Uptime 142 02:17:25
Pid Process Name State Pri I/O CPU Page flts Ph.Mem
00000021 SWAPPER HIB 16 0 0 00:03:22.71 0 0
000093E3 _VTA26: HIB 7 2039 0 00:03:01.61 20023 1773
00001B44 _VTA23: HIB 4 19157 0 00:52:24.65 247262 12000
00000026 ERRFMT HIB 8 6333 0 00:00:36.04 82 137
00000027 OPCOM HIB 7 1138 0 00:00:13.06 506 202
00000028 AUDIT_SERVER HIB 10 30 0 00:00:01.14 1340 243
00000029 JOB_CONTROL HIB 8 53212 0 00:01:53.49 132 310
0000002A CONFIGURE HIB 8 11 0 00:00:00.13 98 165
0000002B NETACP HIB 10 89697 0 00:19:39.23 291 484
0000002C EVL HIB 6 1506 0 00:00:12.56 189621 68
0000002D REMACP HIB 9 114 0 00:00:00.30 77 73
0000884E GRIFFIN HIB 7 1076 0 00:00:05.29 1425 658
00009A4F _RTA1: CUR 4 116 0 00:00:01.43 668 429
00005B51 Len Day LEF 4 421 0 00:00:05.09 1529 356
000047F6 DECW$SERVER_0 HIB 8 21296 0 00:18:19.35 43273 3872
00000057 Window Manager LEF 4 74 0 00:06:54.14 730331 500
$ sh proc /all /id=47f6
23-MAY-1990 16:07:25.06 User: RWP_OPS Process ID: 000047F6
Node: DMIP Process name: "DECW$SERVER_0"
Terminal:
User Identifier: [RWP,RWP_OPS]
Base priority: 6
Default file spec: Not available
Devices allocated: GAA0:
NET3772:
Process Quotas:
Account name:
CPU limit: Infinite Direct I/O limit: 100
Buffered I/O byte count quota: 47664 Buffered I/O limit: 60
Timer queue entry quota: 7 Open file quota: 81
Paging file quota: 0 Subprocess quota: 8
Default page fault cluster: 16 AST quota: 97
Enqueue quota: 28 Shared file limit: 0
Max detached processes: 0 Max active jobs: 0
Accounting information:
Buffered I/O count: 1775 Peak working set size: 4000
Direct I/O count: 19521 Peak virtual size: 27195
Page faults: 43273 Mounted volumes: 0
Images activated: 0
Elapsed CPU time: 0 00:18:19.35
Connect time: 6 00:19:49.88
Process privileges:
CMKRNL may change mode to kernel
SYSNAM may insert in system logical name table
PRMMBX may create permanent mailbox
WORLD may affect other processes in the world
NETMBX may create network device
PRMGBL may create permanent global sections
SYSGBL may create system wide global sections
PFNMAP may map to specific physical pages
SYSPRV may access objects via system protection
There is 1 process in this job:
DECW$SERVER_0 (*)
...from sda
SDA> set proc decw$server_0
SDA> sh proc
Process index: 0016 Name: DECW$SERVER_0 Extended PID: 000047F6
------------------------------------------------------------------
Process status: 00140011 RES,PSWAPM,PHDRES,LOGIN
PCB address 802ED330 JIB address 805387D0
PHD address 80D76000 Swapfile disk address 00000000
Master internal PID 023F0016 Subprocess count 0
Internal PID 023F0016 Creator internal PID 00000000
Extended PID 000047F6 Creator extended PID 00000000
State HIB Termination mailbox 0000
Current priority 11 AST's enabled KESU
Base priority 6 AST's active NONE
UIC [00400,000171] AST's remaining 97
Mutex count 0 Buffered I/O count/limit 58/60
Waiting EF cluster 0 Direct I/O count/limit 100/100
Starting wait time 19001919 BUFIO byte count/limit 47664/47664
Event flag wait mask 0000000C # open files allowed left 81
Local EF cluster 0 60000001 Timer entries allowed left 7
Local EF cluster 1 80000000 Active page table count 0
Global cluster 2 pointer 00000000 Process WS page count 3772
Global cluster 3 pointer 00000000 Global WS page count 100
Process index: 0016 Name: DECW$SERVER_0 Extended PID: 000047F6
------------------------------------------------------------------
Saved process registers
-----------------------
R0 = 0000000F R1 = 80193BA0 R2 = 80004BB8 R3 = 00009A24
R4 = 802ED330 R5 = 00008CC0 R6 = 0000600C R7 = 0004A5B0
R8 = 00008F14 R9 = 0002FA2C R10 = 0002C85C R11 = 0004A35C
AP = 7FF8C640 FP = 7FF8C61C PC = 7FFEDF8A PSL = 03C00000
KSP = 7FFE7800 ESP = 7FFE9800 SSP = 7FFED800 USP = 7FF8C61C
P0BR = 80D89400 P0LR = 00006662 P1BR = 80615E00 P1LR = 001FFC50
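The P0LR/P1LR values above give a rough size for the server's address
space. On a VAX, P0LR holds the number of page table entries in P0 space,
P1LR holds 2**21 minus the number in P1 space (P1 grows downward), and a
page is 512 bytes. A small Python sketch of the arithmetic follows. It
treats the two length registers as the whole story, which is an
assumption; SHOW PROCESS may count virtual size differently.

```python
PAGE_BYTES = 512        # VAX page size in bytes
P1_SLOTS = 0x200000     # 2**21 page slots in P1 space


def mapped_pages(p0lr, p1lr):
    """Return (P0 pages, P1 pages) implied by the two length registers."""
    p0_pages = p0lr                # P0 grows upward from page 0
    p1_pages = P1_SLOTS - p1lr     # P1 grows downward from the top
    return p0_pages, p1_pages


# Register values copied from the SDA display above.
p0_pages, p1_pages = mapped_pages(0x00006662, 0x001FFC50)
total_pages = p0_pages + p1_pages
total_bytes = total_pages * PAGE_BYTES
print(p0_pages, p1_pages, total_pages, total_bytes)
# prints: 26210 944 27154 13902848
```

That is 27154 pages, a few dozen short of the reported peak virtual size
of 27195. It fits the picture of a server sitting at its high-water mark
with no paging file quota left, but treat that reading as a guess rather
than a diagnosis.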
| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 2814.1 | ... | GSRC::WEST | Help stamp out and abolish redundancy ! | Thu May 24 1990 19:10 | 9 |
Are you doing Toolkit calls and X calls from more than one task? If so,
this is a definite no-no. The toolkit is most definitely NOT re-entrant,
and there is some question as to whether Xlib is either.

One recommendation is to not use tasking, but if you must, then only one
task should do the X and toolkit calls.

-=> Jim <=-
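The "only one task does the X and toolkit calls" arrangement can be
sketched as a single owner that every other task hands its requests to.
This is an illustration in Python, with threads standing in for Ada
tasks; in Ada the same shape would presumably be a task with entries.
The class name UiOwner and the function fake_toolkit_call are invented
for the example and are not part of any toolkit.

```python
import queue
import threading


class UiOwner:
    """Runs every submitted call on one dedicated thread, one at a time."""

    def __init__(self):
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._requests.get()
            if item is None:          # shutdown marker
                break
            func, args, reply = item
            try:
                reply.put((True, func(*args)))
            except Exception as exc:  # hand the failure back to the caller
                reply.put((False, exc))

    def call(self, func, *args):
        """Ask the owner thread to run func(*args); wait for the result."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((func, args, reply))
        ok, value = reply.get()
        if not ok:
            raise value
        return value

    def close(self):
        """Stop the owner thread after the requests already queued."""
        self._requests.put(None)
        self._thread.join()


def fake_toolkit_call(n):
    """Hypothetical stand-in for a toolkit call; reports who ran it."""
    return n * 2, threading.get_ident()
```

A caller in any other thread would write something like
owner.call(fake_toolkit_call, 21). The call blocks until the owner has
run it, so two toolkit calls can never overlap, whatever the priorities
of the callers.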
| 2814.2 | Xlib is re-entrant (except for one bug) | STAR::VATNE | Peter Vatne, VMS Development | Thu May 24 1990 20:33 | 11 |
Point of clarification: the Xlib is definitely designed to be
re-entrant. There was a bug with Xlib-DECwindows transport interaction
that occasionally messed things up. The bug is fixed in VMS V5.4.

However, the symptoms of that problem don't match your symptoms. What is
strange is that your application can't find the menu fonts. Are they
definitely displaying to a VMS workstation? Not finding the menu fonts is
symptomatic of displaying to other vendors' workstations.

Any server crashes should be immediately QARed. What would be of most
interest here is the contents of SYS$MANAGER:DECW$SERVER_0_ERROR.LOG.
| 2814.3 | Fully re-entrant? | LEOVAX::TREGGIARI | Fri May 25 1990 09:13 | 5 | |
I know VMS Xlib is AST re-entrant, but is it *fully* re-entrant? If not,
I'm not sure that AST re-entrancy will suffice for Ada multi-tasking.
Anyone know for sure?

Leo
| 2814.4 | QUARK::LIONEL | Free advice is worth every cent | Sat May 26 1990 16:38 | 6 | |
Re: .3
AST reentrancy is not sufficient for Ada multithread execution. I don't
know how reentrant Xlib is.
Steve
| 2814.5 | More info... Application architecture... | PEACHS::BELDIN | Thu May 31 1990 16:05 | 33 | |
Some more info on the customer's architecture.

The customer's code has a total of 33 different ADA tasks running,
without timeslice. There are a total of 6 tasks that execute Xlib and
X-toolkit calls, all at the same priority (8). One of these is an event
dispatcher that basically calls XtPending and XtProcessEvent. As I
understand it, under this strategy, none of these should be interrupted
by another task at the same priority; they should all run to completion.

There is a database processing task that runs at a lower priority, 7,
and a communications something-or-other that runs at priority 12.
Neither of these has any calls to Xlib or toolkit.

Does this kind of setup appear to violate any 'multi-tasking rules'? I
know it violates what appears to be a cardinal rule of having all the
X-toolkit stuff in one task...

There is also a problem that they call the 'sudden death syndrome.' It
appears that somewhere they come across some error that their ADA
handler can't handle (fatal NON-ADA error?) and then their process
appears to go into hibernation. Looking at the process, it appears that
user-mode ASTs are somehow disabled - which I think is kind of critical
to the way that ADA tasking works. He suspects that the ADA tasking
mechanism is somehow losing its mind - I don't know enough about ADA to
know... Maybe someone could shed some light on this as a possibility?

I should be getting a copy of the DECW$SERVER_0_ERROR.LOG file - these
guys are working fast and furious and purge things before I even get a
chance to look at it...

Rick Beldin
| 2814.6 | PEACHS::BELDIN | Fri Jun 01 1990 09:16 | 73 | ||
Following is the server error log.

-<DECW$SERVER_0_ERROR.LOG>-

30-MAY-1990 14:58:30.4 Hello, this is the X server
Dixmain address=127b0
Now attach all known txport images
%DECW-I-ATTACHED, transport DECNET attached to its network
in SetFontPath
out SetFontPath
GPX color/monochrome support loaded
gpx$InitOutput address=1397f4
Connection Prefix: len == 42
30-MAY-1990 14:59:41.8 Now I call scheduler/dispatcher
30-MAY-1990 15:03:25.3 Using extra todo packet pool...
30-MAY-1990 15:40:34.7 Connection 98d38 is closed by Txport
30-MAY-1990 15:48:36.8 Connection 1b3600 is closed by Txport
30-MAY-1990 16:14:41.7 Using extra todo packet pool...
30-MAY-1990 17:00:08.9 Connection 1b3600 is closed by Txport
30-MAY-1990 17:32:52.8 Connection 1b3600 is closed by Txport
30-MAY-1990 18:39:26.6 Connection 1b3600 is closed by Txport
31-MAY-1990 08:25:09.4 Connection 1b3600 is closed by Txport
31-MAY-1990 08:28:14.7 Connection 1b3600 is closed by Txport
31-MAY-1990 09:56:28.1 Connection 1b3600 is closed by Txport
31-MAY-1990 10:22:10.2 Connection 1b3600 is closed by Txport
31-MAY-1990 10:25:28.6 Connection 1b3600 is closed by Txport
31-MAY-1990 12:39:48.7 Connection 1b3600 is closed by Txport
31-MAY-1990 14:07:25.7 Connection 1b3600 is closed by Txport
31-MAY-1990 15:15:21.3 %LIB-?-INSVIRMEM, insufficient virtual memory
Request opcode 53 is ignored due to internal runtime error 158217 for client 2(# error = 1)
Exception Call stack dump follows:
8eec5
fb4d
dff4
dcaa
13a785
140bcf
140c17
202cf
d60a
10f5b
10a91
12a7f
41a
801bfad3
801bfa84
********** marking the end of call stack dump **********
********************************************************
31-MAY-1990 15:15:22.7 %LIB-?-INSVIRMEM, insufficient virtual memory
Client 2 has made too many runtime errors(2), its connection is marked for termination
Exception Call stack dump follows:
8eec5
fb4d
dff4
dcaa
13a785
140bcf
140c17
202cf
d60a
10f5b
10a91
12a7f
41a
801bfad3
801bfa84
********** marking the end of call stack dump **********
********************************************************
31-MAY-1990 15:15:23.0 ..ddx layer returns bad status(17)
31-MAY-1990 15:15:23.3 ..Dispatcher close down connection 2
31-MAY-1990 15:54:08.2 Connection 1b3600 is closed by Txport
| 2814.7 | PEACHS::BELDIN | Fri Jun 01 1990 09:45 | 23 | ||
One of the other errors the customer got - the XIO error (which, as I
found out, was also part of the pagefile quota error) - was followed
with:

X error event received from server: BadImplementation - server reported
implementation error.
Failed request major opcode 53 ( X_CreatePixmap )
Failed request minor opcode 0 (if applicable)
ResourceID 0x2011e6 in failed request (if applicable)
Serial number of failed request ...
XIO: fatal IO error 65535 on X server "MIP::0.0"
after ... request ...

Sounds like he was allocating a lot of pixmaps in the server and there
wasn't enough memory. My suspicion is that he should try and free them
or try and trap this error...

Rick Beldin
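One way to "try and free them" is to put a ceiling on how many pixmaps
the application keeps alive in the server at once. The sketch below is
Python and purely illustrative. PixmapCache and the create/free
callbacks are invented names; in the real application the callbacks
would presumably wrap XCreatePixmap and XFreePixmap. Whether the
customer's code actually leaks pixmaps is not known from this note.

```python
from collections import OrderedDict


class PixmapCache:
    """Keeps at most 'limit' pixmaps alive, freeing the least recently used."""

    def __init__(self, create, free, limit):
        self._create = create        # hypothetical: key -> pixmap id
        self._free = free            # hypothetical: pixmap id -> None
        self._limit = limit
        self._live = OrderedDict()   # key -> pixmap id, oldest first

    def get(self, key):
        """Return the pixmap for key, creating it if necessary."""
        if key in self._live:
            self._live.move_to_end(key)   # mark as recently used
            return self._live[key]
        # Make room before asking the server for more memory.
        while len(self._live) >= self._limit:
            _, oldest = self._live.popitem(last=False)
            self._free(oldest)
        pixmap = self._create(key)
        self._live[key] = pixmap
        return pixmap

    def clear(self):
        """Free everything, for example when a window is torn down."""
        while self._live:
            _, oldest = self._live.popitem(last=False)
            self._free(oldest)
```

The design choice here is to free before creating, so the server never
holds more than the limit. Trapping the error after the fact is the
other option Rick mentions; this sketch does not cover that.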