| Title: | DECWINDOWS 26-JAN-89 to 29-NOV-90 | 
|---|---|
| Notice: | See 1639.0 for VMS V5.3 kit; 2043.0 for 5.4 IFT kit | 
| Moderator: | STAR::VATNE | 
| Created: | Mon Oct 30 1989 | 
| Last Modified: | Mon Dec 31 1990 | 
| Last Successful Update: | Fri Jun 06 1997 | 
| Number of topics: | 3726 | 
| Total number of notes: | 19516 | 
	A customer running VMS 5.3 and Ada multitasking is getting a
	number of very strange errors in their (very large) application.
	Some of them cause the server to croak; others just kill the
	application (not sure whether the process dies too, but at least
	all of the widgets disappear from the screen).  Here are some brief
	descriptions of the problems.  Any pointers to some or all of the
	solutions would be helpful.  (Note: they suspect some of their code
	*may* be trying to use the toolkit in a re-entrant manner; they
	are checking on that.)
	Rick Beldin
	Atlanta CSC
	
	Here they come....
	1. This one occurs shortly after ApplicationCreateShell; the effect
	   is to 'hose' (what does that mean?) the entire application.  This
	   happens 1 time in 10.
		X Toolkit Warning: Cannot convert string
			 "-*-MENU-MEDIUM-R-Normal--*-120-*-*-P-ISO8859-1"
			 to type FontList, using fixed font
	2. These two sets of error messages have the effect of 'killing
	   the application' (again, not sure whether the process is deleted
	   or just the windows disappear).  They don't appear together;
	   each is the symptom of a separate crash.
		XIO: fatal IO error 65535 on XServer "MIP::0.0"
		  after 5053 request 61 events remaining
		XLib: sequence lost ( 0x10020 > 0xb3e ) in reply 0x7!
	3. This third problem appears to involve server memory deallocation.
	   All windows freeze, and it appears that the server has run out
	   of paging file quota.  The following error appears when running
	   their application:
	
X error event received from server: BadImplementation - server reported
  implementation error
   Failed request major opcode 53 (X_CreatePixmap)
   Failed request minor opcode 0 (if applicable)
   ResourceID 0x2011e6 in failed request (if applicable)
   Serial number of failed request 44729
   Current serial number in output stream 44730
XIO: fatal IO error 65535 on X server "MIP::0.0"
   after 44732 requests (44730 known processed) with 0 events remaining
XIO: fatal IO error 65535 on X server "MIP::0.0"
   after 44734 requests (44730 known processed) with 0 events remaining
	Here are some stats gathered at the time of problem number 3.
VAX/VMS V5.3  on node DMIP  23-MAY-1990 16:07:12.45   Uptime  142 02:17:25
  Pid    Process Name    State  Pri      I/O       CPU       Page flts Ph.Mem
00000021 SWAPPER         HIB     16        0   0 00:03:22.71         0      0
000093E3 _VTA26:         HIB      7     2039   0 00:03:01.61     20023   1773 
00001B44 _VTA23:         HIB      4    19157   0 00:52:24.65    247262  12000 
00000026 ERRFMT          HIB      8     6333   0 00:00:36.04        82    137 
00000027 OPCOM           HIB      7     1138   0 00:00:13.06       506    202 
00000028 AUDIT_SERVER    HIB     10       30   0 00:00:01.14      1340    243 
00000029 JOB_CONTROL     HIB      8    53212   0 00:01:53.49       132    310 
0000002A CONFIGURE       HIB      8       11   0 00:00:00.13        98    165 
0000002B NETACP          HIB     10    89697   0 00:19:39.23       291    484 
0000002C EVL             HIB      6     1506   0 00:00:12.56    189621     68 
0000002D REMACP          HIB      9      114   0 00:00:00.30        77     73 
0000884E GRIFFIN         HIB      7     1076   0 00:00:05.29      1425    658 
00009A4F _RTA1:          CUR      4      116   0 00:00:01.43       668    429 
00005B51 Len Day         LEF      4      421   0 00:00:05.09      1529    356 
000047F6 DECW$SERVER_0   HIB      8    21296   0 00:18:19.35     43273   3872 
00000057 Window Manager  LEF      4       74   0 00:06:54.14    730331    500 
$ sh proc /all /id=47f6
23-MAY-1990 16:07:25.06   User: RWP_OPS          Process ID:   000047F6
                          Node: DMIP             Process name: "DECW$SERVER_0"
Terminal:
User Identifier:    [RWP,RWP_OPS]
Base priority:      6
Default file spec:  Not available
Devices allocated:  GAA0:
                    NET3772:
Process Quotas:
 Account name:
 CPU limit:                      Infinite  Direct I/O limit:       100
 Buffered I/O byte count quota:     47664  Buffered I/O limit:      60
 Timer queue entry quota:               7  Open file quota:         81
 Paging file quota:                     0  Subprocess quota:         8
 Default page fault cluster:           16  AST quota:               97
 Enqueue quota:                        28  Shared file limit:        0
 Max detached processes:                0  Max active jobs:          0
Accounting information:
 Buffered I/O count:      1775  Peak working set size:       4000
 Direct I/O count:       19521  Peak virtual size:          27195
 Page faults:            43273  Mounted volumes:                0
 Images activated:           0
 Elapsed CPU time:      0 00:18:19.35
 Connect time:          6 00:19:49.88
Process privileges:
 CMKRNL               may change mode to kernel
 SYSNAM               may insert in system logical name table
 PRMMBX               may create permanent mailbox
 WORLD                may affect other processes in the world
 NETMBX               may create network device
 PRMGBL               may create permanent global sections
 SYSGBL               may create system wide global sections
 PFNMAP               may map to specific physical pages
 SYSPRV               may access objects via system protection
There is 1 process in this job:
  DECW$SERVER_0 (*)
	...and from SDA:
SDA> set proc decw$server_0
SDA> sh proc
Process index: 0016   Name: DECW$SERVER_0   Extended PID: 000047F6
------------------------------------------------------------------
Process status:  00140011   RES,PSWAPM,PHDRES,LOGIN
PCB address              802ED330    JIB address              805387D0
PHD address              80D76000    Swapfile disk address    00000000
Master internal PID      023F0016    Subprocess count                0
Internal PID             023F0016    Creator internal PID     00000000
Extended PID             000047F6    Creator extended PID     00000000
State                       HIB      Termination mailbox          0000
Current priority               11    AST's enabled                KESU
Base priority                   6    AST's active                 NONE
UIC                [00400,000171]    AST's remaining                97
Mutex count                     0    Buffered I/O count/limit       58/60
Waiting EF cluster              0    Direct I/O count/limit        100/100
Starting wait time       19001919    BUFIO byte count/limit      47664/47664
Event flag wait mask     0000000C    # open files allowed left      81
Local EF cluster 0       60000001    Timer entries allowed left      7
Local EF cluster 1       80000000    Active page table count         0
Global cluster 2 pointer 00000000    Process WS page count        3772
Global cluster 3 pointer 00000000    Global WS page count          100
Saved process registers
-----------------------
R0   = 0000000F    R1   = 80193BA0    R2   = 80004BB8    R3   = 00009A24
R4   = 802ED330    R5   = 00008CC0    R6   = 0000600C    R7   = 0004A5B0
R8   = 00008F14    R9   = 0002FA2C    R10  = 0002C85C    R11  = 0004A35C
AP   = 7FF8C640    FP   = 7FF8C61C    PC   = 7FFEDF8A    PSL  = 03C00000
KSP  = 7FFE7800    ESP  = 7FFE9800    SSP  = 7FFED800    USP  = 7FF8C61C
P0BR = 80D89400    P0LR = 00006662    P1BR = 80615E00    P1LR = 001FFC50
| T.R | Title | User | Personal Name | Date | Lines | 
|---|---|---|---|---|---|
| 2814.1 | ... | GSRC::WEST | Help stamp out and abolish redundancy ! | Thu May 24 1990 19:10 | 9 | 
| Are you doing toolkit calls and X calls from more than one task? If so, this is a definite no-no. The toolkit is most definitely NOT re-entrant, and it is questionable whether Xlib is either. One recommendation is not to use tasking at all, but if you must, then only one task should make the X and toolkit calls. -=> Jim <=- | |||||
| 2814.2 | Xlib is re-entrant (except for one bug) | STAR::VATNE | Peter Vatne, VMS Development | Thu May 24 1990 20:33 | 11 | 
| Point of clarification: Xlib is definitely designed to be re-entrant. There was a bug in the interaction between Xlib and the DECwindows transport that occasionally messed things up; it is fixed in VMS V5.4. However, the symptoms of that problem don't match yours. What is strange is that your application can't find the menu fonts. Are they definitely displaying to a VMS workstation? Not finding the menu fonts is symptomatic of displaying to other vendors' workstations. Any server crashes should be QARed immediately. What would be of most interest here is the contents of SYS$MANAGER:DECW$SERVER_0_ERROR.LOG. | |||||
| 2814.3 | Fully re-entrant? | LEOVAX::TREGGIARI | | Fri May 25 1990 09:13 | 5 |
| I know VMS Xlib is AST re-entrant, but is it *fully* re-entrant? If not, I'm not sure that AST re-entrancy will suffice for Ada multi-tasking. Anyone know for sure? Leo | |||||
| 2814.4 | | QUARK::LIONEL | Free advice is worth every cent | Sat May 26 1990 16:38 | 6 |
| Re: .3 AST reentrancy is not sufficient for Ada multithread execution. I don't know how reentrant Xlib is. Steve | |||||
| 2814.5 | More info... Application architecture... | PEACHS::BELDIN | | Thu May 31 1990 16:05 | 33 |
| Some more info on the customer's architecture. The customer's code has a total of 33 different Ada tasks running, without time slicing. A total of 6 tasks make Xlib and X toolkit calls, all at the same priority (8). One of these is an event dispatcher that basically calls XtPending and XtProcessEvent. As I understand it, under this strategy none of these should be interrupted by another task at the same priority; they should all run to completion. There is a database processing task that runs at a lower priority, 7, and a communications something-or-other that runs at priority 12. Neither of these makes any Xlib or toolkit calls. Does this kind of setup appear to violate any 'multi-tasking rules'? I know it violates what appears to be a cardinal rule of keeping all the X toolkit work in one task... There is also a problem that they call the 'sudden death syndrome.' It appears that somewhere they hit an error that their Ada handler can't handle (fatal NON-ADA error?), and then their process appears to go into hibernation. Looking at the process, it appears that user-mode ASTs are somehow disabled, which I think is critical to the way Ada tasking works. He suspects that the Ada tasking mechanism is somehow losing its mind; I don't know enough about Ada to say. Maybe someone could shed some light on this as a possibility? I should be getting a copy of the DECW$SERVER_0_ERROR.LOG file; these guys are working fast and furious and purge things before I even get a chance to look at it... Rick Beldin | |||||
| 2814.6 | | PEACHS::BELDIN | | Fri Jun 01 1990 09:16 | 73 |
| Following is the server error log. `-<DECW$SERVER_0_ERROR.LOG>- 30-MAY-1990 14:58:30.4 Hello, this is the X server Dixmain address=127b0 Now attach all known txport images %DECW-I-ATTACHED, transport DECNET attached to its network in SetFontPath out SetFontPath GPX color/monochrome support loaded gpx$InitOutput address=1397f4 Connection Prefix: len == 42 30-MAY-1990 14:59:41.8 Now I call scheduler/dispatcher 30-MAY-1990 15:03:25.3 Using extra todo packet pool... 30-MAY-1990 15:40:34.7 Connection 98d38 is closed by Txport 30-MAY-1990 15:48:36.8 Connection 1b3600 is closed by Txport 30-MAY-1990 16:14:41.7 Using extra todo packet pool... 30-MAY-1990 17:00:08.9 Connection 1b3600 is closed by Txport 30-MAY-1990 17:32:52.8 Connection 1b3600 is closed by Txport 30-MAY-1990 18:39:26.6 Connection 1b3600 is closed by Txport 31-MAY-1990 08:25:09.4 Connection 1b3600 is closed by Txport 31-MAY-1990 08:28:14.7 Connection 1b3600 is closed by Txport 31-MAY-1990 09:56:28.1 Connection 1b3600 is closed by Txport 31-MAY-1990 10:22:10.2 Connection 1b3600 is closed by Txport 31-MAY-1990 10:25:28.6 Connection 1b3600 is closed by Txport 31-MAY-1990 12:39:48.7 Connection 1b3600 is closed by Txport 31-MAY-1990 14:07:25.7 Connection 1b3600 is closed by Txport 31-MAY-1990 15:15:21.3 %LIB-?-INSVIRMEM, insufficient virtual memory Request opcode 53 is ignored due to internal runtime error 158217 for client 2(# error = 1) Exception Call stack dump follows: 8eec5 fb4d dff4 dcaa 13a785 140bcf 140c17 202cf d60a 10f5b 10a91 12a7f 41a 801bfad3 801bfa84 ********** marking the end of call stack dump ********** ******************************************************** 31-MAY-1990 15:15:22.7 %LIB-?-INSVIRMEM, insufficient virtual memory Client 2 has made too many runtime errors(2), its connection is marked for termination Exception Call stack dump follows: 8eec5 fb4d dff4 dcaa 13a785 140bcf 140c17 202cf d60a 10f5b 10a91 12a7f 41a 801bfad3 801bfa84 ********** marking the end of call stack dump ********** ******************************************************** 31-MAY-1990 15:15:23.0 ..ddx layer returns bad status(17) 31-MAY-1990 15:15:23.3 ..Dispatcher close down connection 2 31-MAY-1990 15:54:08.2 Connection 1b3600 is closed by Txport` | |||||
| 2814.7 | | PEACHS::BELDIN | | Fri Jun 01 1990 09:45 | 23 |
| On one of the other errors the customer got, the XIO error (which, as I found out, was also part of the paging file quota error) was accompanied by: `X error event received from server: BadImplementation - server reported implementation error. Failed request major opcode 53 ( X_CreatePixmap ) Failed request minor opcode 0 (if applicable) ResourceID 0x2011e6 in failed request (if applicable) Serial number of failed request ... XIO: fatal IO error 65535 on X server "MIP::0.0" after ... request ...` Sounds like he was allocating a lot of pixmaps in the server and there wasn't enough memory. My suspicion is that he should try to free them or try to trap this error... Rick Beldin | |||||