Conference vaxaxp::alphanotes

Title:Alpha Support Conference
Notice:This is a new Alphanotes, please read note 2.2
Moderator:VAXAXP::BERNARDO
Created:Thu Jan 02 1997
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:128
Total number of notes:617

51.0. "V6.1, nonpaged pool fills up with 65536 byte blocks" by SMAUG::GARROD (IBM Interconnect Engineering) Tue Feb 25 1997 19:54

    We're running OpenVMS V6.1 on an AlphaServer 2000 node.
    
    Just lately the system has had problems. Here are the symptoms.
    
    Soon after boot the system consumes all of Non Paged Pool, expanding up
    to NPAGEVIR. NPAGEVIR is 53 Megabytes at present. Once it's done that
    SET HOSTing into the node won't work and quite understandably the
    whole system often hangs after that.
    
    Doing a SHOW POOL/SUMMARY shows that most of the pool is
    consumed by "UNKNOWN" memory blocks and that most of these are 65536
    bytes long.
    
    I've searched COMET looking for answers but have come up empty. There
    are plenty of customer reports of similar problems but there never
    seems to be a concrete answer.
    
    I suspect something to do with DECnet OSI. Because if we don't run
    DECnet then the pool expansion doesn't seem to happen. The COMET
    searches also seem to implicate certain versions of Pathworks but I've
    turned off Pathworks and the problem still occurs.
    
    I've turned on the system parameter SYSTEM_CHECK to turn on pool
    checking/poisoning. But have not managed to isolate the problem.
    
    A SHOW POOL/RING never seems to catch one of the 65536 byte memory
    allocations. So I suspect that somehow they're not actually
    being allocated but the free list is being corrupted.
    
    I spent most of the afternoon working with the DECnet OSI project
    leader on this problem because we suspected DNS and/or DECnet. I'm now
    up to the latest ECO of DECnet (V6.3 ECO6) but that didn't fix the
    problem. We also installed the newest DECdns because we believed there
    were some memory problems with that in earlier kits.
    
    Anybody got any ideas? It's made our node useless. It basically
    gets to the point where no pool is left but limps along because
    there is enough pool for most allocation requests. But I guess REMACP
    needs a bigger block than is left.
    
    What I really need to know is how I can find what's causing the 65536
    byte blocks to appear in pool. By the way, looking at them
    the first byte is always "01". The rest seems to be the pool
    poisoning pattern "a". And they seem to have a bunch of zeros at
    the end.
    
    Any help would be most appreciated.
    
    Dave
T.R  Title  User  Personal Name  Date  Lines
51.1  EEMELI::MOSER  Orienteers do it in the bush...  Wed Feb 26 1997 01:39  85
	Do you have PCs running NT 3.51 on your LAN? If the uptime is long
	enough and the service pack is not installed they occasionally send
	lots of large MAILSLOT/BROWSE multicast packets out on the wire, and
	the LAN drivers and upper protocols can't get rid of them fast enough.

	We had a customer with lots of CLUEXITs, pool expansion, etc.

	/cmos


------------

          PSS ID Number: Q136935
          Article last modified on 08-27-1996
          PSS database name: WINNT

          3.50 3.51

          WINDOWS


          --------------------------------------------------------------------
          The information in this article applies to:

           - Microsoft Windows NT Server versions 3.5 and 3.51
          --------------------------------------------------------------------

          SYMPTOMS
          ========

          Windows NT Server 3.51 starts a broadcast storm on the network with
          browser frames after the Windows NT Server Service has reached the
          System Up Time of 1193 hours, which means it has been running
          continuously for 1193 hours or multiples thereof. These broadcast
          frames are sent out on all installed protocols. The broadcast storm
          typically last less than 5 or 6 minutes and then stops by itself.
          The broadcast browser frame types that appear are the "Host
          Announcements" or the "Local Master Announcements" frames, which
          are typically sent out every 12 minutes.

          STATUS
          ======

          Microsoft has confirmed this to be a problem in Windows NT versions
          3.5 and 3.51.

          A supported fix is now available for Windows NT version 3.5, but
          has not been fully regression-tested and should be applied only to
          systems experiencing this specific problem. Unless you are severely
          impacted by this specific problem, Microsoft recommends that you
          wait for the next Service Pack that contains this fix. Contact
          Microsoft Product Support Services for more information.

          This problem has been corrected in the latest U.S. Service Pack for
          Windows NT version 3.51 and Windows NT 4.0.  For information on
          obtaining the Service Pack, query on the following word in the
          Microsoft Knowledge Base (without the spaces):

             S E R V P A C K

          KBCategory: kbnetwork kbbug3.50 kbbug3.51 kbfix3.51.sp2
          KBSubcategory: ntnetserv
          Additional reference words: prodnt 3.50 3.51
          =============================================================================
          Copyright Microsoft Corporation 1996.


    
51.2  Problem caused by MAXBUF being set to 65535  EDSCLU::GARROD  IBM Interconnect Engineering  Wed Feb 26 1997 09:24  23
    To answer my own question. I hope this is useful to others.
    
    We've finally identified the problem. The problem was caused by a user
    changing the SYSGEN parameter MAXBUF from the default of 8192
    to 65535. A WRITE CURRENT was done and at the next reboot the problem
    occurred.
    
    What was happening was that for every DECnet logical link either DECnet
    or something it uses (maybe DECdns) was allocating a number of chunks
    of non-paged pool of size 65536 and never freeing them. Thus each
    DECnet logical link was causing about 1/2 Megabyte of non-paged pool
    to be depleted. It doesn't take long until you kill the system like this.
    
    I verified that an inbound MAIL and an inbound SET HOST link caused
    this pool usage. I'm not sure whether other sorts of DECnet links were
    also doing the same. As I said in .0 we are running DECnet/OSI on
    the OpenVMS Alpha system.
    
    Now we've fixed MAXBUF everything is working fine again.
    
    Dave  
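
    As a rough check on the figures in .0 and .2, the arithmetic can be
    sketched in Python. This is only an estimate under stated assumptions:
    each link leaks exactly 512 KB (the note says "about 1/2 Megabyte"),
    the 53 Megabytes of NPAGEVIR are binary megabytes, and nothing else is
    competing for pool. The constant names are made up for illustration.

```python
# Back-of-the-envelope estimate; all three constants are assumptions
# derived from the figures quoted in notes .0 and .2.
BLOCK_BYTES = 65536                   # size of each lost pool block
LEAK_PER_LINK_BYTES = 512 * 1024      # "about 1/2 Megabyte" per logical link
NPAGEVIR_BYTES = 53 * 1024 * 1024     # 53 Megabytes, taken as binary megabytes

# How many 65536 byte blocks make up the loss for one logical link.
blocks_per_link = LEAK_PER_LINK_BYTES // BLOCK_BYTES

# How many links it would take to consume the whole of NPAGEVIR,
# ignoring every other consumer of non-paged pool.
links_to_exhaust = NPAGEVIR_BYTES // LEAK_PER_LINK_BYTES

print("blocks lost per link:", blocks_per_link)
print("links needed to exhaust pool:", links_to_exhaust)
```

    Under those assumptions about eight blocks go per link, and on the
    order of a hundred inbound links would use up the pool, which is
    consistent with the system dying soon after boot.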
    
    
51.3  Questionable Behaviour -- Sounds Like A Bug...  XDELTA::HOFFMAN  Steve, OpenVMS Engineering  Wed Feb 26 1997 09:45  2
   Please log a QAR against DECnet, this sounds like it might be a bug...
51.4  CLOUD::SHIRRON  Stephen F. Shirron, 223-3198  Wed Feb 26 1997 10:56  12
I wonder if this is because 65536 doesn't fit in 16 bits...

Normally after allocating non-paged pool, the size is stuffed into a word in the
header of the packet.  If you ask for 65535 bytes, you'll get 65536 bytes (due
to the rounding to keep requests as multiples of the pool granularity).  If you
try to save this size (65536) in a word, you get 0.  You can't really handle
such a packet using the normal mechanisms; you have to remember the size
somewhere else, and you have to use a special routine to deallocate this pool (a
routine which takes the size as an argument).  If that's not being done, then
pool may be lost...

stephen
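
The truncation suggested in .4 can be illustrated with a small Python
sketch. It is not the actual pool allocator: the 64-byte granularity is an
assumption, and the function names are hypothetical. It only shows that a
request of 65535 bytes rounds up to 65536, and that 65536 stored in a
16-bit word reads back as 0.

```python
# Illustration only: models the rounding and the 16-bit size field,
# not the real OpenVMS allocation routines.
POOL_GRANULARITY = 64  # assumed granularity in bytes (must be a power of two)


def round_to_granularity(request, granularity=POOL_GRANULARITY):
    """Round a request up to the next multiple of the pool granularity."""
    return (request + granularity - 1) // granularity * granularity


def size_word(allocated):
    """Return what survives when the size is stored in a 16-bit word."""
    return allocated & 0xFFFF


for maxbuf in (8192, 65535):
    allocated = round_to_granularity(maxbuf)
    print(maxbuf, allocated, size_word(allocated))
```

With the default MAXBUF of 8192 the stored size matches the allocation.
With 65535 the allocation is 65536 bytes but the stored size is 0, so any
code that relies on the header word to deallocate would not know how much
to return.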