Conference vaxaxp::alphanotes

Title:	Alpha Support Conference
Notice:	This is a new Alphanotes, please read note 2.2
Moderator:	VAXAXP::BERNARDO

Created:	Thu Jan 02 1997
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	128
Total number of notes:	617

51.0. "V6.1, nonpaged pool fills up with 65536 byte blocks" by SMAUG::GARROD (IBM Interconnect Engineering) Tue Feb 25 1997 19:54

    We're running OpenVMS V6.1 on an AlphaServer 2000 node.
    
    Just lately the system has been having problems. Here are the symptoms.
    
    Soon after boot the system consumes all of nonpaged pool, expanding up
    to NPAGEVIR. NPAGEVIR is 53 megabytes at present. Once it's done that,
    SET HOSTing into the node won't work and, quite understandably, the
    whole system often hangs after that.
    
    Doing a SHOW POOL/SUMMARY shows that most of the pool is
    consumed by "UNKNOWN" memory blocks and that most of these are 65536
    bytes long.
    
    I've searched COMET looking for answers but have come up empty. There
    are plenty of customer reports of similar problems but there never
    seems to be a concrete answer.
    
    I suspect it has something to do with DECnet/OSI, because if we don't
    run DECnet then the pool expansion doesn't seem to happen. The COMET
    searches also seem to implicate certain versions of Pathworks, but I've
    turned off Pathworks and the problem still occurs.
    
    I've turned on the system parameter SYSTEM_CHECK to enable pool
    checking/poisoning, but have not managed to isolate the problem.
    
    A SHOW POOL/RING never seems to catch one of the 65536-byte memory
    allocations. So I suspect that somehow they're not actually
    being allocated but the free list is being corrupted.
    
    I spent most of the afternoon working with the DECnet/OSI project
    leader on this problem because we suspected DNS and/or DECnet. I'm now
    up to the latest ECO of DECnet (V6.3 ECO6) but that didn't fix the
    problem. We also installed the newest DECdns because we believed there
    were some memory problems with that in earlier kits.
    
    Anybody got any ideas? It's made our node useless. It basically
    gets to the point where no pool is left but limps along because
    there is enough pool for most allocation requests. But I guess REMACP
    needs a bigger block than is left.
    
    What I really need to know is how I can find what's causing the 65536
    byte blocks to appear in pool. By the way, looking at them,
    the first byte is always "01". The rest seems to be the pool
    poisoning pattern "a". And they seem to have a bunch of zeros at
    the end.
    
    Any help would be most appreciated.
    
    Dave

T.R	Title	User	Personal Name	Date	Lines
51.1		EEMELI::MOSER	Orienteers do it in the bush...	`Wed Feb 26 1997 01:39`	85
	Do you have PCs running NT 3.51 on your LAN? If the uptime is long enough and the service pack is not installed, they occasionally send lots of large MAILSLOT/BROWSE multicast packets out on the wire, and the LAN drivers and upper protocols can't get rid of them fast enough. We had a customer with lots of CLUEXITs and pool expansion etc.

	/cmos

	------------
	PSS ID Number: Q136935
	Article last modified on 08-27-1996
	PSS database name: WINNT

	3.50 3.51

	WINDOWS

	--------------------------------------------------------------------
	The information in this article applies to:

	 - Microsoft Windows NT Server versions 3.5 and 3.51
	--------------------------------------------------------------------

	SYMPTOMS
	========

	Windows NT Server 3.51 starts a broadcast storm on the network with browser frames after the Windows NT Server Service has reached the System Up Time of 1193 hours, which means it has been running continuously for 1193 hours or multiples thereof. These broadcast frames are sent out on all installed protocols. The broadcast storm typically lasts less than 5 or 6 minutes and then stops by itself.

	The broadcast browser frame types that appear are the "Host Announcements" or the "Local Master Announcements" frames, which are typically sent out every 12 minutes.

	STATUS
	======

	Microsoft has confirmed this to be a problem in Windows NT versions 3.5 and 3.51. A supported fix is now available for Windows NT version 3.5, but has not been fully regression-tested and should be applied only to systems experiencing this specific problem. Unless you are severely impacted by this specific problem, Microsoft recommends that you wait for the next Service Pack that contains this fix. Contact Microsoft Product Support Services for more information.

	This problem has been corrected in the latest U.S. Service Pack for Windows NT version 3.51 and Windows NT 4.0. For information on obtaining the Service Pack, query on the following word in the Microsoft Knowledge Base (without the spaces):

	   S E R V P A C K

	KBCategory: kbnetwork kbbug3.50 kbbug3.51 kbfix3.51.sp2
	KBSubcategory: ntnetserv
	Additional reference words: prodnt 3.50 3.51
	=============================================================================
	Copyright Microsoft Corporation 1996.
51.2	Problem caused by MAXBUF being set to 65535	EDSCLU::GARROD	IBM Interconnect Engineering	`Wed Feb 26 1997 09:24`	23
	To answer my own question. I hope this is useful to others.

	We've finally identified the problem. It was caused by a user changing the SYSGEN parameter MAXBUF from the default of 8192 to 65535. A WRITE CURRENT was done, and at the next reboot the problem occurred.

	What was happening was that for every DECnet logical link, either DECnet or something it uses (maybe DECdns) was allocating a number of chunks of nonpaged pool of size 65536 and never freeing them. Thus each DECnet logical link was causing about 1/2 megabyte of nonpaged pool to be depleted. It doesn't take long to kill the system like this.

	I verified that an inbound MAIL link and an inbound SET HOST link both caused this pool usage. I'm not sure whether other sorts of DECnet links were doing the same. As I said in .0, we are running DECnet/OSI on the OpenVMS Alpha system.

	Now that we've fixed MAXBUF, everything is working fine again.

	Dave
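
	As a rough back-of-the-envelope check on the figures in .0 and .2, the sketch below estimates how many DECnet logical links it would take to exhaust pool. It assumes the "about 1/2 megabyte" per link is exactly 512 KB and that nothing else is using pool, so the result is an upper bound rather than a measurement; the function and constant names are made up for illustration.

```python
# Back-of-the-envelope estimate only. The per-link figure is the
# "about 1/2 megabyte" from .2, taken here as exactly 512 KB (an assumption).
BLOCK_BYTES = 65536                    # size of each lost block
NPAGEVIR_BYTES = 53 * 1024 * 1024      # pool ceiling reported in .0
LEAK_PER_LINK_BYTES = 512 * 1024       # assumed loss per logical link


def blocks_per_link(leak_bytes: int, block_bytes: int) -> int:
    """Number of whole blocks lost for each logical link."""
    return leak_bytes // block_bytes


def links_until_exhaustion(pool_bytes: int, leak_bytes: int) -> int:
    """Upper bound on links before pool is gone, ignoring all other pool users."""
    return pool_bytes // leak_bytes


if __name__ == "__main__":
    print(blocks_per_link(LEAK_PER_LINK_BYTES, BLOCK_BYTES))
    print(links_until_exhaustion(NPAGEVIR_BYTES, LEAK_PER_LINK_BYTES))
```

	Under those assumptions each link loses 8 blocks, and roughly a hundred links are enough to reach NPAGEVIR, which is consistent with the system dying soon after boot.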
51.3	Questionable Behaviour -- Sounds Like A Bug...	XDELTA::HOFFMAN	Steve, OpenVMS Engineering	`Wed Feb 26 1997 09:45`	2
	Please log a QAR against DECnet, this sounds like it might be a bug...
51.4		CLOUD::SHIRRON	Stephen F. Shirron, 223-3198	`Wed Feb 26 1997 10:56`	12
	I wonder if this is because 65536 doesn't fit in 16 bits...

	Normally after allocating non-paged pool, the size is stuffed into a word in the header of the packet. If you ask for 65535 bytes, you'll get 65536 bytes (due to the rounding to keep requests as multiples of the pool granularity). If you try to save this size (65536) in a word, you get 0.

	You can't really handle such a packet using the normal mechanisms; you have to remember the size somewhere else, and you have to use a special routine to deallocate this pool (a routine which takes the size as an argument). If that's not being done, then pool may be lost...

	stephen
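
	The arithmetic behind the suspicion in .4 can be sketched in a few lines. This is only an illustration of the rounding and the 16-bit truncation, not the actual pool allocation code; the 64-byte granularity is an assumption (any power-of-two granularity up to 65536 gives the same result), and the function names are hypothetical.

```python
# Illustration only: models the rounding and the 16-bit size field from .4.
POOL_GRANULARITY = 64   # assumed allocation granularity in bytes
WORD_MASK = 0xFFFF      # a "word" here is 16 bits


def round_up_to_granularity(request: int, granularity: int = POOL_GRANULARITY) -> int:
    """Round a request up to the next multiple of the pool granularity."""
    return (request + granularity - 1) // granularity * granularity


def size_as_stored_in_word(size: int) -> int:
    """Keep only the low 16 bits, as a word-sized header field would."""
    return size & WORD_MASK


if __name__ == "__main__":
    for maxbuf in (8192, 65535):
        allocated = round_up_to_granularity(maxbuf)
        stored = size_as_stored_in_word(allocated)
        print(f"request={maxbuf} allocated={allocated} size_in_header={stored}")
```

	With the default MAXBUF of 8192 the allocated size and the stored size agree. With 65535 the allocation is rounded to 65536 and the word-sized field holds 0, so any deallocation that trusts the header has no usable size to work with.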