Title: | DECnet/OSI for OpenVMS |
Moderator: | TUXEDO::FONSECA |
Created: | Thu Feb 21 1991 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 3990 |
Total number of notes: | 19027 |
I've discovered a bug in DECnet/OSI, or something it calls, if MAXBUF is set to 65535 (and maybe other values).

Dave

<<< VAXAXP::NOTES$:[NOTES$LIBRARY]ALPHANOTES.NOTE;1 >>>
-< Alpha Support Conference - Digital Internal Use Only >-
================================================================================
Note 51.0  V6.1, nonpaged pool fills up with 65536 byte blocks  2 replies
SMAUG::GARROD "IBM Interconnect Engineering"  49 lines  25-FEB-1997 19:54
--------------------------------------------------------------------------------
We're running OpenVMS V6.1 on an AlphaServer 2000 node. Just lately the system has been having problems. Here are the symptoms.

Soon after boot the system consumes all of nonpaged pool, expanding up to NPAGEVIR. NPAGEVIR is 53 megabytes at present. Once it has done that, SET HOSTing into the node won't work, and quite understandably the whole system often hangs after that.

Doing a SHOW POOL/SUMMARY shows that most of the pool is consumed by "UNKNOWN" memory blocks, and that most of these are 65536 bytes long.

I've searched COMET looking for answers but have come up empty. There are plenty of customer reports of similar problems, but there never seems to be a concrete answer. I suspect something to do with DECnet/OSI, because if we don't run DECnet then the pool expansion doesn't seem to happen. The COMET searches also seem to implicate certain versions of PATHWORKS, but I've turned off PATHWORKS and the problem still occurs.

I've turned on the system parameter SYSTEM_CHECK to enable pool checking/poisoning, but have not managed to isolate the problem. A SHOW POOL/RING never seems to catch one of the 65536-byte memory allocations, so I suspect that somehow they're not actually being allocated, but that the free list is being corrupted.

I spent most of the afternoon working with the DECnet/OSI project leader on this problem because we suspected DNS and/or DECnet. I'm now up to the latest ECO of DECnet (V6.3 ECO6), but that didn't fix the problem.
We also installed the newest DECdns because we believed there were some memory problems with that in earlier kits.

Anybody got any ideas? It's made our node useless. It basically gets to the point where no pool is left, but limps along because there is enough pool for most allocation requests. But I guess REMACP needs a bigger block than is left.

What I really need to know is how I can find what's causing the 65536-byte blocks to appear in pool. By the way, looking at them, the first byte is always "01". The rest seems to be the pool poisoning pattern "a". And they seem to have a bunch of zeros at the end.

Any help would be most appreciated.

Dave

================================================================================
Note 51.2  V6.1, nonpaged pool fills up with 65536 byte blocks  2 of 2
EDSCLU::GARROD "IBM Interconnect Engineering"  23 lines  26-FEB-1997 09:24
-< Problem caused by MAXBUF being set to 65535 >-
--------------------------------------------------------------------------------
To answer my own question. I hope this is useful to others.

We've finally identified the problem. It was caused by a user changing the SYSGEN parameter MAXBUF from the default of 8192 to 65535. A WRITE CURRENT was done, and at the next reboot the problem occurred.

What was happening was that for every DECnet logical link, either DECnet or something it uses (maybe DECdns) was allocating a number of chunks of nonpaged pool of size 65536 and never freeing them. Thus each DECnet logical link was causing about half a megabyte of nonpaged pool to be depleted. It doesn't take long until you kill the system like this. I verified that an inbound MAIL link and an inbound SET HOST link both caused this pool usage. I'm not sure whether other sorts of DECnet links were also doing the same.

As I said in .0, we are running DECnet/OSI on the OpenVMS Alpha system. Now that we've fixed MAXBUF everything is working fine again.

Dave
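For anyone hitting the same symptoms, the diagnosis and the fix described above can be sketched as a condensed SDA/SYSGEN session. The command names are standard OpenVMS utilities, but the exact qualifiers and output vary by version, so treat this as an outline rather than a verified transcript:

```
$ ANALYZE/SYSTEM                ! examine the running system with SDA
SDA> SHOW POOL/SUMMARY          ! leaked 65536-byte blocks show up as UNKNOWN
SDA> SHOW POOL/RING             ! recent allocation history (needs SYSTEM_CHECK)
SDA> EXIT
$ MCR SYSGEN
SYSGEN> USE CURRENT             ! work on the parameter set used at boot
SYSGEN> SHOW MAXBUF             ! confirm it was raised to 65535
SYSGEN> SET MAXBUF 8192         ! restore the default cited in 51.2
SYSGEN> WRITE CURRENT           ! takes effect at the next reboot
SYSGEN> EXIT
```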
T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
3890.1 | Have you determined if this happens with VMS V6.2 and DNV V6.3? | STEVMS::PETTENGILL | mulp | Thu Feb 27 1997 00:25 | 10 |
V6 included some new pool allocation logic in VMS, and it took a while to tune it so that it didn't consume all of pool. I think that there were some consumers of pool that made bad assumptions about how pool was allocated, or had latent bugs that also caused problems. Since I have a V6.2/V6.3 system handy, I'm setting it to the maximum value. I note that for V6.2, the maximum value for MAXBUF is 64000. Perhaps that indicates that by the time the pool allocation header and rounding are added on to 65535, the resulting value cannot be handled by the code that uses MAXBUF. |
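The header-plus-rounding reasoning in the last sentence can be made concrete with a quick DCL calculation. The 64-byte figure for the combined pool header and rounding overhead is a guess for illustration, not a documented value:

```
$ overhead = 64                         ! assumed pool header + rounding overhead
$ WRITE SYS$OUTPUT 65535 + overhead     ! 65599: no longer fits in a 16-bit byte count
$ WRITE SYS$OUTPUT 64000 + overhead     ! 64064: still safely below 65536
```

This would explain why the V6.2 limit of 64000 leaves headroom that a MAXBUF of 65535 does not.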