
Conference noted::pwv50ift

Title:Kit: Note 4229; Please use NOTED::PWDOSWIN5 for V4.x server
Notice:Kit: Note 4229; Please use NOTED::PWDOSWIN5 for V4.x server
Moderator:CPEEDY::KENNEDY
Created:Fri Dec 18 1992
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:4319
Total number of notes:18478

4168.0. "V5.0E-ECO1 ?" by STKAI1::BLUNDBERG (It's the monster!!!) Mon Feb 24 1997 04:56

    Any news on V5.0E-ECO1? I heard that it was planned to be available
    at the end of March, but I have a customer who would appreciate seeing
    it released at the beginning of March.
    
    This is needed to solve a pool expansion problem; I've posted a
    STARS article about it in .1
    
    Best regards
    
    Bjorn Lundberg
    CSC Sweden
T.R  Title  User  Personal Name  Date  Lines
4168.1  Pool expansion problems.  STKAI1::BLUNDBERG  It's the monster!!!  Mon Feb 24 1997 04:57  234
[OpenVMS,PW-VMS] PATHWORKS V5.0 Periodically Consumes Nonpaged Pool


COPYRIGHT (c) 1988, 1993 by Digital Equipment Corporation.
ALL RIGHTS RESERVED. No distribution except as provided under contract.

Copyright (c) Digital Equipment Corporation 1996.  All rights reserved.

PRODUCT:    OpenVMS VAX, Versions 5.5-2 and above
            OpenVMS Alpha, Versions 1.5 and above
            PATHWORKS for VMS, Versions 5.0 through 5.0D

SOURCE:     Digital Equipment Corporation


SYMPTOM:

OpenVMS VAX and Alpha systems, running PATHWORKS, version 5.0 through
5.0D, can periodically consume nonpaged pool (NPAGEDYN) in large 
quantities.  This pool consumption occurs at periodic intervals of
approximately 24 days, 20 hours, and may cause a variety of symptoms:

  - Significant NPAGEDYN expansion, i.e., 20% or more, consuming
    memory resources, inducing extra swapping and paging, and
    degrading system performance.

  - The following system crashes may occur:

      o CPUSPINWAIT bugchecks with the POOL spinlock owned while 
        reclaiming the lookaside list.

      o CLUEXIT bugchecks, often with "maintenance timer expiration"
        errors in ERRLOG.SYS.

      o SSRVEXCEPT bugchecks, due to RFDRIVER exhausting its NPAGEDYN
        resources.

      o INVEXCEPTN bugchecks may also be possible, for products unable 
        to handle the shortage of NPAGEDYN.

  - System hangs with subsequent analysis revealing:

      o Many processes in FPG, RWMPB, PFW, or COLPG state, with
        pagefile resources consumed and NPAGEDYN expanded.

      o Errorlog entries indicating the inability of certain devices
        to initialize, due to insufficient NPAGEDYN.

      o SYS-W-POOLEXPF errors on the console.

  - On OpenVMS VAX systems you may encounter an oversized lookaside 
    list for 64 byte packets due to previous expansion caused by 
    PATHWORKS, and tracked in SYS$SYSTEM:LISTPREPOP.DAT.

      Note:
        Subsequent reboots use LISTPREPOP.DAT to pre-populate the 
        lookaside lists.  This effectively wastes pool space, which
        can cause NPAGEDYN expansion, and performance degradation,
        long after the problem has occurred.  If this occurs, it is
        recommended that LISTPREPOP.DAT;* be deleted just prior to
        a reboot.

  - On OpenVMS VAX systems, AUTOGEN runs can oversize NPAGEDYN and
    NPAGEVIR based on the expansion caused by PATHWORKS.  This wastes
    pool space and can cause performance degradation long after 
    the problem has occurred.

      Note:
        After this occurs on an OpenVMS VAX system, it's recommended 
        that you delete LISTPREPOP.DAT;* just prior to a reboot, 
        allow the system to run for 2-3 hours, and then re-run
        AUTOGEN.


SOLUTION:

Engineering has acknowledged this problem and plans to address it in
a future release.
\ 
\
\ CSC NOTE - 20-Dec-1996
\
\  All IPMT cases against this problem are being closed with the status 
\  'fixed-next-release'.
\
\  +-----------------+---------------+--------------------------------+
\  | PATHWORKS       |  TIMEFRAME    |   Approx IPMTs Addressed       | 
\  +-----------------+---------------+--------------------------------+
\  | ECO3 for V5.0D  | Late December | fixes most cases - but not all |
\  +-----------------+---------------+--------------------------------+
\  |    V5.0E        | Early-mid Jan | fixes most cases - but not all |
\  +-----------------+---------------+--------------------------------+
\  | ECO1 for V5.0E  | 2 weeks later | fixes all cases                |
\  +-----------------+---------------+--------------------------------+
\
\ The fix is to change the counter from a 1ms longword counter to a 10ms 
\ longword counter, and to start that counter when the PATHWORKS Server 
\ is started.  Thus it is very unlikely to recur, unless the system and 
\ the PATHWORKS server are up continuously for 497.1 days.
\
\ Please comment on this article when any of the above versions are
\ released.
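
The 497.1-day figure in the CSC note can be checked with a little
arithmetic.  The sketch below assumes the longword counter is an unsigned
32-bit value that wraps after 2**32 ticks; the article does not say how the
counter is interpreted, so that is an assumption, and the function and
constant names are illustrative only.

```python
# Time for a 32-bit (longword) tick counter to wrap, for a given tick length.
# Assumption: the counter is unsigned and wraps after 2**32 ticks.

LONGWORD_TICKS = 2**32
SECONDS_PER_DAY = 86_400


def days_until_wrap(tick_ms: float, ticks: int = LONGWORD_TICKS) -> float:
    """Return the number of days before a counter with `ticks` states wraps."""
    return ticks * tick_ms / 1000 / SECONDS_PER_DAY


if __name__ == "__main__":
    # 10 ms is the tick length described for the fixed counter.
    print(f"10 ms tick: {days_until_wrap(10):.1f} days")
```

With a 10 ms tick this gives about 497.1 days, matching the note.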


WORKAROUND:

Ensure PATHWORKS isn't running during the projected problem time, i.e.,
shut down PATHWORKS before the projected problem time and restart it
after the time has passed.

   Projected Problem Times
   -----------------------
   31-OCT-1996 11:52:21.01
   25-NOV-1996 07:35:08.31
   20-DEC-1996 03:17:55.61
   13-JAN-1997 23:00:42.91
    7-FEB-1997 18:43:30.21
    4-MAR-1997 14:26:17.51
   29-MAR-1997 10:09:04.81
   23-APR-1997 05:51:52.11
   18-MAY-1997 01:34:39.41
   11-JUN-1997 21:17:26.71
    6-JUL-1997 17:00:14.01
   31-JUL-1997 12:43:01.31
   25-AUG-1997 08:25:48.61
   19-SEP-1997 04:08:35.91
   13-OCT-1997 23:51:23.21
    7-NOV-1997 19:34:10.51
    2-DEC-1997 15:16:57.81
         .          .
         .          .
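
The list above can be extended by repeatedly adding the period quoted in
the ANALYSIS section ("24 19:42:47.30") to the first listed time.  The
following is a minimal sketch of that calculation; the helper names are
hypothetical, and it assumes the period stays fixed and anchored at the
first entry, as the table implies.

```python
from datetime import datetime, timedelta

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

# First projected time and overflow period, both taken from the article.
FIRST = datetime(1996, 10, 31, 11, 52, 21, 10_000)  # 31-OCT-1996 11:52:21.01
PERIOD = timedelta(days=24, hours=19, minutes=42, seconds=47,
                   milliseconds=300)                # "24 19:42:47.30"


def projected_times(count, first=FIRST, period=PERIOD):
    """Return `count` projected problem times, starting at `first`."""
    return [first + i * period for i in range(count)]


def vms_format(t):
    """Format a datetime in the VMS style used in the table above."""
    hundredths = t.microsecond // 10_000
    return (f"{t.day:2d}-{MONTHS[t.month - 1]}-{t.year} "
            f"{t:%H:%M:%S}.{hundredths:02d}")


if __name__ == "__main__":
    for t in projected_times(20):
        print(vms_format(t))
```

The first 17 values reproduce the table; later values continue the series
past 2-DEC-1997.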
\
\
\ Similar problems with NPAGEDYN expansion have been seen:
\
\ - When PATHWORKS is using an older version of DECnet/OSI, i.e., 
\   V5.6 with ECO 10 or earlier.  Upgrading to the latest version 
\   of PATHWORKS and DECnet/OSI should address this potential problem.
\
\     [PW-VMS] PWRKV50D_E03050 PATHWORKS V5.0D ECO3 (LAN Manager)
\
\ - With PATHWORKS and Cisco MultiNet 4.0A.  MultiNet reportedly has a 
\   patched PWIP driver to address this problem.
\
\     [PW-VMS]V5 NPAGEDYN Current at NPAGEVIR on V5.5-2 System/Multinet 4.0A


ANALYSIS:

The problem is caused by the overflow of a longword counter, which is 
incrementing at a 1 millisecond interval.  Due to the way PATHWORKS
implements the counter, the overflow appears at a periodic rate of
"24 19:42:47.30".

The problem only occurs if you have active NETBEUI connections during
the time frames that a PATHWORKS counter overflows.  
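
For comparison, a 1 ms longword counter read as a signed value would
change sign after 2**31 ms.  The sketch below converts that to days and
compares it with the period quoted above.  The signed interpretation is an
assumption, not something the article states, and the variable names are
illustrative.  The two values differ by a little under 49 minutes; the
article does not explain the difference beyond referring to the way
PATHWORKS implements the counter.

```python
from datetime import timedelta

# Assumption: the sign bit of a 32-bit counter flips after 2**31 ticks of 1 ms.
signed_wrap = timedelta(milliseconds=2**31)

# Period quoted in the article: "24 19:42:47.30" (days hh:mm:ss.cc).
quoted_period = timedelta(days=24, hours=19, minutes=42, seconds=47,
                          milliseconds=300)

# How far the quoted period falls short of the simple 2**31 ms estimate.
difference = signed_wrap - quoted_period

if __name__ == "__main__":
    print("2**31 ms      :", signed_wrap)
    print("quoted period :", quoted_period)
    print("difference    :", difference)
```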

An analysis of the system, or crash, typically shows significant
NPAGEDYN expansion, with the lookaside list for the smallest packets
(typically the first) encountering the majority of the growth, and 
being populated with PATHWORKS packets.

Following is an example of using SDA to determine the current, initial,
and maximum allowable size for NPAGEDYN:

  SDA> EVALUATE @MMG$GL_NPAGNEXT-@MMG$GL_NPAGEDYN   ! Current 
  SDA> EVALUATE @SGN$GL_NPAGEDYN                    ! Initial 
  SDA> EVALUATE @SGN$GL_NPAGEVIR                    ! Maximum

On OpenVMS VAX systems you can get an idea of the size of the first 
lookaside list with the following command:  

  SDA> VALIDATE QUEUE/SELF @EXE$AR_NPOOL_DATA+40+(8*0) 
  Queue is complete, total of 404667 elements in the queue

You could then walk the list to see if a lot of the packets contain
PATHWORKS text, e.g., mblk, buff, and dblk text:

  SDA> EXAMINE @EXE$AR_NPOOL_DATA+40      ! Front of queue
  SDA> EXAMINE .+@.;10                    ! Repeat to traverse the
                                            queue.

Example of the data indicating packets containing PATHWORKS text:

  6B6C6264 00000040 6B725750 81CFBB40  @...PWrk@...dblk     81CFBBC0
  6B6C626D 00000040 6B725750 81CFBBC0  ....PWrk@...mblk     81CFBB80
  66667562 00000040 6B725750 81CFBB80  ....PWrk@...buff     81CFBCC0 
  6B6C626D 00000040 00000040 00000080  ....@...@...mblk     88941300
  66667562 00000040 FFFFFF80 00000080  ........@...buff     889412C0
  6B6C6264 00000040 FFFFFF80 FFFFFFC0  ........@...dblk     88941340

  NOTE:  
    If an OpenVMS VAX system has been rebooted, and the LISTPREPOP.DAT 
    file was not deleted just prior to the reboot, the lookaside list 
    has been filled with prepopulated zeroed entries.

On an OpenVMS Alpha system you can see if one of the lookaside lists 
is large using the following commands:

  SDA> CLUE MEMORY/LOOKASIDE
  Listhead Addr: 81BA1180   Size:   64   Status: Invalid, possible loop
     Possible loop detected, after tracing 10000 elements

You could then walk the list to see if a lot of the packets contain
PATHWORKS text, e.g., mblk, buff, and dblk text:

  SDA> EXAMINE @EXE$AR_NPOOL_DATA+40      ! Front of queue
  SDA> EXAMINE @.;10                      ! Repeat to traverse the
                                            queue.
\
\
\ References:
\
\ VAXAXP::VMSNOTES #1841
\
\
\ CONTRIBUTORS:
\
\      Technical:
\           Mark Morris (140062)
\           Jeff Chisholm (184480)
\
\      Editorial:
\           Reg Hunter (172692)
\
\\ BUGCHK
\\ PROD=OPENVMS-VAX CAT=OPSYS GRP=OPENVMS-VAX OS=OPENVMS-VAX SPD=25.01
\\ PROD=OPENVMS-AXP GRP=OPENVMS-AXP OS=OPENVMS-AXP SPD=41.87
\\ PROD=PW-VMS CAT=COMM GRP=MULTIVENDOR GRP=PATHWORKS OS=OPENVMS-VAX
\\ SPD=30.50 VEND=DEC OFFER=DSKTP-HELP
\\ 140062 184480
\\ SRC961210003429 
\\ EDIT_SRQ=C961210-3429 EDIT_SRQ=C961217-2757 EDIT_SRQ=C961220-526
\\ TYPE=TECH_TIPS TYPE=SYMPTOM_SOLUTION
4168.2  CPEEDY::FLEURY  Mon Feb 24 1997 08:02  7
    Re: .0
    
    As stated previously in this conference, there is a patch available for
    this.  V50E-ECO1 is still slated for submission at the end of March. 
    Submit an IPMT case to get the patch.
    
    Dan
4168.3  IPMT on it's way!  STKAI1::BLUNDBERG  It's the monster!!!  Tue Feb 25 1997 05:09  1
    
4168.4  50e eco1  NETRIX::"[email protected]"  Steve Langstaff  Wed Mar 05 1997 06:08  8
Is 5.0E ECO1 available to download anywhere as yet ???  
I have got several customers awaiting this patch and they
don't want to install 5.0E then add the patch a week or so later !!

Cheers

Steve 
[Posted by WWW Notes gateway]
4168.5  Not available yet.  PATRLR::MCCUSKER  Wed Mar 05 1997 09:44  0
4168.6  try 5.0E  PHXSS1::HEISER  Maranatha!  Tue Mar 11 1997 15:39  7
    My clusters running 5.0E (topic 4091) never experienced this pool
    expansion problem on March 4th during primetime.  The article states it
    only applies to versions 5.0-5.0D.  You may want to upgrade to the kit
    in topic 4091.
    
    regards,
    Mike
4168.7  2 Problems; 1 fixed in v5.0E  VMSNET::P_NUNEZ  Wed Mar 12 1997 09:45  11
    
    v5.0e has the fix for the problem that was due to happen
    4-mar; it does not contain the fix for the other over-
    flow problem that is due to occur on 29-mar-1997.  For
    that fix, you need to do an IPMT to get new streamsos
    images...
    
    And to experience the problem you must have an IDLE
    NETBEUI session established with the server.
    
    Paul
4168.8  how many IPMTs do you want ?  LNZALI::BACHNER  Mouse not found. Click OK to continue  Fri Mar 14 1997 08:40  26
.1>   Projected Problem Times
.1>   -----------------------
.1>   ...
.1>   29-MAR-1997 10:09:04.81
.1>   ...

.2>    As stated previously in this conference, there is a patch available for
.2>    this.  V50E-ECO1 is still slated for submission at the end of March. 
.2>    Submit an IPMT case to get the patch.
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Unfortunately, the next problem time (two weeks from now) will come *before* the
ECO is even submitted, let alone distributed to the field.

I'm supporting PATHWORKS only as part of my job, and can easily name at least
five of my customers (without thinking much about it) who will suffer from this
problem. Other colleagues may have more customers who are interested in this
patch.

So how many IPMTs do you want ?  Wouldn't it be easier to put the patch into a
world-readable directory where MCS employees can pick it up and forward it to
customers ?  I don't even dare to think about putting this patch into the public
part of our web site...

Thanks for listening,
Hans.
4168.9  PATRLR::MCCUSKER  Fri Mar 14 1997 09:34  21
No.

There is a mechanism in place, that others are using quite well, to report 
problems and receive patches.  It is a set of checks and balances to ensure
that our customers get the appropriate patches for their problems.  It works
and I don't see a need to vary from it.  There were problems supporting the
product (from engineering's point of view) back in the V50C time frame, much
of which was caused by excessive numbers of patches floating around.  Since 
that time we have tried to adhere to a policy of a release per quarter,
providing patches only for those who couldn't wait, and who had reported the 
bug through IPMT and had engineering verify the bug and provide the appropriate
fix.

>Unfortunately, the next problem time (two weeks from now) will come *before* the
>ECO is even submitted, let alone distributed to the field.

Actually, it already went to TIMA.  So you may have it before the date in question.

>So how many IPMTs do you want ?  

1 per customer site should be sufficient.
4168.10  LNZALI::BACHNER  Mouse not found. Click OK to continue  Fri Mar 14 1997 12:53  12
> Actually, it already went to TIMA.  So you may have it before the date in 
> question.

Thanks, that's good news. My previous reply was written under the impression
that the kit is still planned for the end-of-March timeframe. In this case, I
thought that requiring the (costly) IPMT process for maybe a few hundred
customers was not an efficient way to deal with a known high impact problem.

I will pull the official kit next week.

Thanks for the clarification,
Hans.
4168.11  can't find the TIMA kit  LNZALI::BACHNER  Mouse not found. Click OK to continue  Fri Mar 21 1997 16:04  7
Well, .9 was written a week ago, and I still can't find the kit in TIMA. Was it
blocked by some unforeseeable event ?

What's the current status of V5.0E ECO 1 ?

Thanks, 
Hans.
4168.12  Still in the loop  CPEEDY::HUANG  Pro vs. Con ==> Progress vs. ?  Mon Mar 24 1997 09:54  10
    re: .-1
    
    The kit is still being processed in the TIMA system.
    
    Based on our experience, the normal turnaround time is about 2 weeks.
    Not 2 weeks yet.
    
    
    -Jim
    PATHWORKS Server Engineering
4168.13  it's here !  LNZALI::BACHNER  Mouse not found. Click OK to continue  Mon Mar 24 1997 12:13  5
I just got mail from TIMA that the kit has arrived.

Thank you for your effort.

Hans.
4168.14  How does a customer get it?  CHOWDA::GLICKMAN  writing from Newport,RI  Tue Mar 25 1997 12:01  2
    Will it be available from DSNlink?  Or how does a customer get it?