Conference noted::pwv50ift

Title:Kit: Note 4229; Please use NOTED::PWDOSWIN5 for V4.x server
Notice:Kit: Note 4229; Please use NOTED::PWDOSWIN5 for V4.x server
Moderator:CPEEDY::KENNEDY
Created:Fri Dec 18 1992
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:4319
Total number of notes:18478

4158.0. "v5.0E NETBIOS Names" by VMSNET::P_NUNEZ () Fri Feb 14 1997 10:37

    I've got a customer with very weird things going on since upgrading to
    v5.0e.   2-node VAX4000 cluster running 5.5-2.  
    
    This note will concentrate on a license server problem.
    
    Customer has several license server dump files.  The license server
    logs report the error:
    
    $ type pwrk$license_server_bigbrd.log
    14-FEB-1997 07:59:36.00 INIT_LOGFILE:
            OPEN: PATHWORKS for OpenVMS License Server Startup
    14-FEB-1997 07:59:36.00 MESSAGE:
            Image Identification: V5.0-500E
    14-FEB-1997 07:59:38.00 FATAL_ERROR:
            Name 'PWRK$LBIGBRD    ' is in use by Another License Server
    
    
    Note that the name of the license server is PWRK$LBIGBRD.
    
    Ok.  So it looks like someone's duplicated the license server name. 
    Yet SHOW ASTAT PWRK$LBIGBRD from a client with all transports returns
    a command timeout error.  
    
    However, depending on which transport returns the data, below is what
    we see from either the client (show astat bigbrd) or from the server
    (PCSA_CLAIM_NAME or NBSHOW).  
    
    From node BIGBRD:
    
    $ mc pcsa_claim_name /status
    
    NETBIOS Status
    --------------
            Adapter address : aa0004001968
            SW releaselevel : 00
            SW version numb : 0302
            Minutes up-time : 0000
            Nr CRC   errors : 0000
            Nr align errors : 0000
            Nr of collisn's : 0000
            Nr aborted xmts : 0000
            Nr succesf xmts : 00000382
            Nr succesf rcvs : 0000beec
            Nr retransmisns : 0000
            Nr resource err : 0000
            Nr free cmd blk : 0000
            Max nr cmd blks : 0000
            Max nr free cmd : 0000
            Nr pending sess : 0000
            Max nr sessions : 0000
            Nr possible ses : 0000
            Max messagesize : 0000
    DECnet NetBIOS Name Table Contents
            NetBIOS name    Last  Numb  Status
            BIGBRDCMTSERVER  20    03    04
            BIGBRD           20    04    04
            BIGBRD           00    05    04
            LANGROUP         00    06    84
            LANGROUP         1c    07    84
            BARNEY           00    08    84
            GRPWISE          00    09    84
            BARNEY           20    0a    8c
            PWRK$LBIGBRDR01  50    0b    04
            PWRK$LBIGBRDR04  50    0c    04
            PWRK$LBIGBRDR05  50    0d    04
            PWRK$LBIGBRDR06  50    0e    04
            PWRK$LBIGBRDR07  50    0f    04
            PWRK$LBIGBRDR08  50    10    04
            PWRK$LBIGBRDR09  50    11    04
            PWRK$LBIGBRDR0A  50    12    04
            PWRK$LBIGBRDR0B  50    13    04
            PWRK$LBIGBRDR0C  50    14    04
            PWRK$LBIGBRDR0D  50    15    04
            PWRK$LBIGBRDR0E  50    16    04
            PWRK$LBIGBRDR0F  50    17    04
            PWRK$LBIGBRDR0G  50    18    04
            PWRK$LBIGBRDR0H  50    19    04
            PWRK$LBIGBRDR0I  50    1a    04
            PWRK$LBIGBRDR0J  50    1b    04
            PW  00    00    00
              00    00    00
              00    00    00
    
    $ NBSHOW NBSTATUS
    
    Adapter status for * on
    ID: AA   0   4   0  19  68   Version  3.0
    Time up: 0 days 2 hours 29 minutes
    Packets sent:             115896        CRC errors:             0
    Packets received:          98121        Alignment errors:       0
    Retransmitted packets:         0        Collisions:             0
    Resources exhausted:           0        Aborted transmissions:  0
    Ncbs:      Free  : 65535 of 65535;  maximum configurable: 65535
    Sessions:  In use:    32 of   850;  maximum configurable:   850
    Adapter packet size:  1492
    Local name table (9 names):
    Name            Soc Num Status
    ^^^^^^^^^^^^^^^ x 0   0 Unique Registration pending
    BIGBRD          x20   1 Unique Registered
    LANGROUP        x 0   2 Group  Registered
    LANGROUP        x1c   3 Group  Registered
    BARNEY          x20   4 Unique Registered
    PWRK$LBIGBRDR02 x50   5 Unique Registered
    BARNEY          x 0   6 Group  Registered
    GRPWISE         x 0   7 Group  Registered
    ^^^^^^^^^^^^^^^ x68   8 Unique Registered
    Session status:
    name num 9, # sessions 21, rcv dg=7, rcv any=0
    LSN     State   Local Name      Soc  Remote Name     Soc  rcvs  sends
    0       LISTEN  BIGBRDCMTSERVER x20  ^^^^^^^^^^^^^^^ x 0     0     0
    1       ESTAB   BARNEY          x20  LADOSKIT        x 0     1     0
    0       LISTEN  BIGBRD          x20  ^^^^^^^^^^^^^^^ x 0     0     0
    1       ESTAB   BARNEY          x20  TRAINING10      x 0     1     0
    0       LISTEN  BARNEY          x20  ^^^^^^^^^^^^^^^ x 0     0     0
    1       ESTAB   BARNEY          x20  TRAINING6       x 0     1     0
    1       ESTAB   BARNEY          x20  TRAINING4       x 0     1     0
    1       ESTAB   BARNEY          x20  BOXEYK          x 0     1     0
    1       ESTAB   BARNEY          x20  HDQ343          x 0     1     0
    1       ESTAB   BARNEY          x20  RBLOCK          x 0     1     0
    1       ESTAB   BARNEY          x20  TRAINING5       x 0     1     0
    1       ESTAB   BIGBRD          x20  HMRS03          x 0     1     0
    
    And from node CLONE:
    
    NETBIOS Status
    --------------
            Adapter address : aa0004003268
            SW releaselevel : 00
            SW version numb : 0302
            Minutes up-time : 0000
            Nr CRC   errors : 0000
            Nr align errors : 0000
            Nr of collisn's : 0000
            Nr aborted xmts : 0000
            Nr succesf xmts : 000000c3
            Nr succesf rcvs : 00002694
            Nr retransmisns : 0000
            Nr resource err : 0000
            Nr free cmd blk : 0000
            Max nr cmd blks : 0000
            Max nr free cmd : 0000
            Nr pending sess : 0000
            Max nr sessions : 0000
            Nr possible ses : 0000
            Max messagesize : 0000
    DECnet NetBIOS Name Table Contents
           NetBIOS name    Last  Numb  Status
            CLONECMTSERVER   20    02    04
            CLONE            20    03    04
            CLONE            00    04    04
            LANGROUP         00    05    84
            LANGROUP         1c    06    84
            BARNEY           00    07    84
            GRPWISE          00    08    84
            BARNEY           20    09    84
            PWRK$LCLONE R01  50    0a    04
            PWRK$LCLONE R03  50    0b    04
            PWRK$LCLONE R05  50    0c    04
    
    
     On a system running v5.0E in our lab I see similar results from
    PCSA_CLAIM_NAME and NBSHOW:
    
    DECnet NetBIOS Name Table Contents
    
            NetBIOS name    Last  Numb  Status
            ALFPW1CMTSERVER  20    02    04
            ALFPW1           20    03    04
            ALFPW1           00    04    04
            ALFPWIFT         00    05    84
            ALFPWIFT         1c    06    84
            ALFPW1_ALIAS     00    07    84
            ALFPW1_ALIAS     20    08    8c
            PWRK$LALFPW1R01  50    11    04
    
    
     Any ideas on what's causing this?
    
    Paul

4158.1. "Dump file analysis" by VMSNET::P_NUNEZ () Fri Feb 14 1997 11:02

    if it helps, license server dump analysis is returning:
    
    Condition signalled to take dump:
    %SYSTEM-F-ABORT, abort
    %SYSTEM-F-ABORT, abort
    -SYSTEM-S-NOMSG, Message number 0000F941
    
    DBG> show calls
     module name     routine name                     line     rel PC   abs PC
    *LOGGING         PLog                             3998    00000299 0000F941
     IC                                                       00000000 0000E848
     IC                                                       00000000 0000BF7B
     SHARE$PWRK$CSSHR
                                                              00000000 00096085
     MTS$MAIN                                                 00000000 0000581F
                                                              00000000 8962038D
    
4158.2. "Questions" by CPEEDY::KENNEDY (Steve Kennedy) Fri Feb 14 1997 13:04

    Paul-
    
    .0> Customer has several license server dump files.  The license server
    .0> logs report the error:
    
    Is this an occasional occurrence or is this happening all the time?
    ("all the time" meaning that the license server won't run at all)
    
    Is the customer running PATHWORKS on both nodes of the cluster?  If so,
    is the license server configured to run on both nodes?  If so, does the
    license server always fail on both nodes?  Or does it fail sometimes?
    If sometimes, is it always/usually while trying to start up on one node
    as a result of failing over from the other node?  
    
    I know it's a lot of work, but has the customer tried to troubleshoot
    this by configuring for a single transport to see which one (or more)
    fails?
    
    \steve
    
    
    
    
    
4158.3. "update" by VMSNET::P_NUNEZ () Fri Feb 14 1997 14:53

    Steve,    
    
    .0> Customer has several license server dump files.  The license server
    .0> logs report the error:
    
    >Is this an occasional occurrence or is this happening all the time?
    >("all the time" meaning that the license server won't run at all)
    
    It appeared to be happening all the time.  But (1) with a version limit
    of 5 I only have stuff from today and (2) the customer just upgraded to
    v5.0E over the weekend.  He did note he's had problems with the license
    server since upgrading that required him to stop/restart it several
    times before it worked (but see below for how he was managing his
    license server in the cluster).  So, based on that, and the fact that
    I'm seeing one strange netbios license server name on a cluster running
    v5.0E in our lab that is similar to what I saw on the customer's, I
    gotta believe this is new to v5.0E.
    
>    Is the customer running PATHWORKS on both nodes of the cluster?  If so,
>    is the license server configured to run on both nodes?  If so, does the
>    license server always fail on both nodes?  Or does it fail sometimes?
>    If sometimes, is it always/usually while trying to start up on one node
>    as a result of failing over from the other node?  
    
    He has a DSSI cluster of 2 VAX 4000-500A, hardware model type 453
    (BIGBRD and CLONE) and is running PATHWORKS on them both.  Yes it was
    failing on both nodes.  
    
    I think your hunch about starting the license server on one node after
    it's failed on the other is a good one.  Due to misconceptions on their
    part, they thought they should only run pwrk$license_s on node BIGBRD
    (because that's the name the license server grabbed initially). 
    Because they weren't aware of the inhibit logical, they accomplished
    this by running pwrk$license_shutdown on CLONE after PATHWORKS was
    running on both nodes in the cluster.  They could start PATHWORKS on
    either node first (they didn't have a policy on this).  And I would
    think that if this is the issue, then the order that likely causes the
    problem is:
    
    start PATHWORKS on clone first (becomes active license server)
    start PATHWORKS on bigbrd
    stop license server on clone  (license server fails over to bigbrd)
    
    If it were the other way around, stopping the license server on clone
    wouldn't cause any failover to occur.  
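
    A minimal sketch of the start/stop ordering described above.  It
    assumes (as this reply suggests, not as confirmed product behavior)
    that the first node to start becomes the active license server and
    that stopping the active one hands the role to a surviving node.
    LicenseCluster and failovers_for are hypothetical names used only for
    illustration.

```python
class LicenseCluster:
    """Toy model of which cluster node holds the active license server."""

    def __init__(self):
        self.running = []     # nodes with a license server process, in start order
        self.active = None    # node currently acting as the license server
        self.failovers = []   # (from_node, to_node) pairs, in order

    def start(self, node):
        self.running.append(node)
        if self.active is None:      # assumption: first starter becomes active
            self.active = node

    def stop(self, node):
        self.running.remove(node)
        if node == self.active:      # assumption: role moves to a surviving node
            self.active = self.running[0] if self.running else None
            if self.active is not None:
                self.failovers.append((node, self.active))


def failovers_for(start_order, stopped):
    """Start the nodes in the given order, stop one, and report failovers."""
    cluster = LicenseCluster()
    for node in start_order:
        cluster.start(node)
    cluster.stop(stopped)
    return cluster.failovers
```

    In this model, failovers_for(["clone", "bigbrd"], "clone") reports one
    failover from clone to bigbrd, while starting bigbrd first and then
    stopping clone reports none.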
    
    >I know it's a lot of work, but has the customer tried to troubleshoot
    >this by configuring for a single transport to see which one (or more)
    >fails?
    
    By the time we figured out how to get the license server started, the
    customer wanted to leave it alone until Monday.  I've got a cluster I'm
    going to try to duplicate it on (in .0 I did show one strange netbios
    name related to the license server exists on our cluster already).
    
    We found our way around it when I noticed the pwrk$lbigbrd\20,
    pwrk$lbigbrd\43, and pwrk$ls\47 NETBIOS names still existed ($ mc
    pcsa_claim_name /status) on bigbrd after stopping the license server on
    bigbrd. 
    
    So I:
    
    $ mc pcsa_claim_name /delete pwrk$lbigbrd
    $ mc pcsa_claim_name /delete pwrk$lbigbrd\43
    $ mc pcsa_claim_name /delete pwrk$ls\47
    $ @sys$startup:pwrk$license_startup
    
    and it worked.  So it seems the netbios names (possibly just DECnet
    netbios names) aren't being deleted when pwrk$license_s is stopped.  
    This would explain the license server log error "Name 'PWRK$LBIGBRD    '
    is in use by Another License Server".  
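
    For anyone checking a long name table by hand, here is a rough sketch
    that parses rows in the "NetBIOS name / Last / Numb / Status" layout
    shown in .0 and formats candidate delete commands in the name\hex
    form used above.  The 15-character name column and the helper names
    are assumptions drawn from the listings; the sketch only builds
    strings for review and does not check that PCSA_CLAIM_NAME accepts
    every such name (names containing a space, for example).

```python
import re

# Assumption: a 15-character name column followed by three 2-digit hex fields.
ROW = re.compile(
    r"^(.{15})\s+([0-9a-fA-F]{2})\s+([0-9a-fA-F]{2})\s+([0-9a-fA-F]{2})$"
)


def parse_name_table(listing):
    """Return (name, last_byte, number, status) tuples for each table row."""
    entries = []
    for line in listing.splitlines():
        match = ROW.match(line.strip())
        if match:
            name, last, num, status = match.groups()
            entries.append(
                (name.rstrip(), int(last, 16), int(num, 16), int(status, 16))
            )
    return entries


def candidate_delete_commands(entries, prefix="PWRK$L"):
    """Format delete commands for names starting with the given prefix."""
    return [
        f"$ mc pcsa_claim_name /delete {name.lower()}\\{last:02x}"
        for name, last, _num, _status in entries
        if name.startswith(prefix)
    ]
```

    The output is meant to be read and pruned by a person before anything
    is run; which leftover names are actually safe to delete is not
    something this sketch decides.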
    
    Comments?
    
    I still don't understand how those odd netbios names are getting
    created?  I checked the customer's license server log and state file
    and they have the correct name of just BIGBRD.  Same on our cluster. 
    Here's the one odd name that existed on our internal cluster (which
    seemed to be running fine - no dumps/etc) that uses the license server
    name PWRK$LALFPW1:
    
                NetBIOS name    Last  Numb  Status
    
                PWRK$LALFPW1R01  50    11    04
    
    In all cases where "R0n" is appended to the name, the last byte is 50. 
    On the customer's system I saw it had names for PWRK$LBIGBRDR01 -
    PWRK$LBIGBRDR0M and all had a last byte of 50.  When he viewed these
    names from DOS with SHOW ASTAT BIGBRD, the names ended with a "P" (for
    example, PWRK$LBIGBRDR02P).  I don't see any names with a last byte of
    50 when things are "normal". 
    
    I'm still dialed in if you need more info (but things are "normal" at
    this point)...
    
    Paul

4158.4. "More weirdness" by VMSNET::P_NUNEZ () Fri Feb 14 1997 15:06
    
    Also note in .0 that we see on node CLONE that it's claimed the name
    PWRK$LCLONE R01 (and others).  But why isn't PWRK$LBIGBRD????
    
    Paul

4158.5. "Account issue?" by VMSNET::P_NUNEZ () Fri Feb 14 1997 15:17

    Possibly another factor.  The customer noted that it seemed he had to
    run the pwrk$license_startup from the SYSTEM account even though he has
    a fully privileged VMS account.  I was using the FIELD account (with all
    privs enabled) to stop/start the license server process.  Could this be
    a factor?
    
    
    paul

4158.6. "PWRK$L<name>\4c ?" by VMSNET::P_NUNEZ () Fri Feb 14 1997 15:21
    
    I'm trying to duplicate on our cluster.  I'm seeing an additional
    license server netbios name with a last byte of 4c.  I don't see this
    on the customer's systems?
    
    Paul

4158.7. "my thought is names aren't being deleted" by CPEEDY::KENNEDY (Steve Kennedy) Fri Feb 14 1997 18:51

    .3> So it seems the netbios names (possibly just DECnet
    .3> netbios names) aren't being deleted when pwrk$license_s is stopped.  
    .3> This would explain the license server log error "Name 'PWRK$LBIGBRD    '
    .3> is in use by Another License Server".  
    .3> 
    .3> Comments?

    This was my suspicion.  I remembered we ran into a problem like this in
    our test lab, but I couldn't remember if it was while testing shipping
    software or prototype software.  In either case it looks like the
    problem is now in the field.  FWIW: when we saw it before it was DECnet
    only.

    .3> I still don't understand how those odd netbios names are getting
    .3> created?  

    The odd names that you can now see were introduced recently as an
    optimization to the license components' "PING client" functions.
    Essentially these names are created and serve as a 'cache' of network
    names which the LS (or LR) use to ping clients for license information.
    Previously the license components created new names on the fly, which
    turns out to be very expensive (time-wise) - especially in the LR case
    where the client is waiting in the middle of trying to establish a
    connection with the file server while this is going on.

    The "R01P" ("R01" + ASCII hex 50) you see in the names is just a
    four-character tag appended to a "PWRK$Lname" name base to create a
    unique name (*). The first character of this tag indicates if the name
    is associated with the license registrar ("R") or license server ("S").
    The next two characters are actually an alpha-numeric counter used to
    create multiple unique names, where either character may be "0"-"9" or
    "A"-"Z". The last of the four characters is "P" (ASCII hex 50),
    indicating a "Ping" end-point.
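
    The tag layout described above can be sketched as a small parser.
    This assumes the name base occupies the first 12 characters (padded
    with spaces, as "PWRK$LCLONE R01" in .0 suggests) and that the 16th
    byte is reported separately, as in the name table listings.
    parse_ping_name is a hypothetical helper, not part of PATHWORKS.

```python
import string

# Counter characters per the description: "0"-"9" and "A"-"Z".
COUNTER_CHARS = set(string.digits + string.ascii_uppercase)
ROLES = {"R": "license registrar", "S": "license server"}
PING_TAG = 0x50  # ASCII "P"


def parse_ping_name(name15, last_byte):
    """Split a 15-character NetBIOS name plus its 16th byte into parts.

    Returns None when the name does not look like a cached ping name.
    """
    if len(name15) != 15 or last_byte != PING_TAG:
        return None
    # Assumption: 12-character name base, then role letter, then 2-char counter.
    base, role, counter = name15[:12], name15[12], name15[13:15]
    if not base.startswith("PWRK$L"):
        return None
    if role not in ROLES or not set(counter) <= COUNTER_CHARS:
        return None
    return {
        "name_base": base.rstrip(),
        "role": ROLES[role],
        "counter": counter,
        "tag": role + counter + chr(last_byte),
    }
```

    For example, parse_ping_name("PWRK$LBIGBRDR01", 0x50) yields the tag
    "R01P" with the role "license registrar", and names such as BIGBRD
    with a last byte of 20 are rejected.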


    .4> Also note in .0 that we see on node CLONE that it's claimed the name
    .4> PWRK$LCLONE R01 (and others).  But why isn't PWRK$LBIGBRD????
                   _^_

    (*) This is a registrar name, so the name base is formed using "PWRK$L"
        plus the node name (i.e., in this case CLONE), so as not to conflict 
        with other LRs in a cluster.  I believe the LS uses the LS name as 
        its name base when forming these names.
    
    
    .5> Possibly another factor.  The customer noted that it seemed he had to
    .5> run the pwrk$license_startup from the SYSTEM account even though he has
    .5> a fully privileged VMS account.  I was using the FIELD account (with all
    .5> privs enabled) to stop/start the license server process.  Could this be
    .5> a factor?

    I can't think of a reason why this is a factor, but I won't dismiss it
    as a possibility. 

    I did notice on my system that the LS group names, "PWRK$LS...G" and
    "PWRK$Lname...L", are not cleaned up when the license server is shut
    down using PWRK$LICENSE_SHUTDOWN (though these leftovers shouldn't
    cause the conflict the customer is seeing). I wonder if it might be a
    timing thing where the failover happens too quickly and the name on the
    other node of the cluster isn't cleaned up?  That said, I would only
    expect this to be a possibility if there were changes in this
    area, since we haven't seen this type of problem before with cluster
    configurations.

    .6> I'm trying to duplicate on our cluster.  I'm seeing an additional
    .6> license server netbios name with a last byte of 4c.  I don't see this
    .6> on the customer's systems?

    Ascii(4c) = "L".  The "L" is a tag which the license server uses (in
    addition to the other tags listed in Note 2479.4).  I can't remember its
    exact use off the top of my head, but I think this indicates some sort
    of listener thread for the license server.

    Let us know the results or any info you glean from your testing.  

    Also, this seems to be a problem which will require a code change
    solution - probably should escalate.

    \steve

4158.8. "New Features, eh?" by VMSNET::P_NUNEZ () Mon Feb 17 1997 09:39

    Steve,
    
>    Let us know the results or any info you glean from your testing.  

    Judging from your reply, our cluster is working normally and I was
    unable to duplicate the customer's "duplicate name" problems by
    stopping/starting license server many times...
    
>    Also, this seems to be a problem which will require a code change
>    solution - probably should escalate.
    
    I had the customer run the gather info procedure and ftp the saveset to
    me last Friday, but it didn't make it intact; I'll have him send it on
    tape, but is there anything else I should get?

    Appreciate the help,
    
    Paul

4158.9. "feature? we think so ;-)" by CPEEDY::KENNEDY (Steve Kennedy) Mon Feb 17 1997 12:50

    Paul-

    re: "New Features, eh?"

    We thought so ;-)  Here's why: When server-based licensing is being
    used, caching NETBIOS names for use by the license registrar in pinging
    the client will save ~3 seconds in the turnaround time back to the
    client (the three seconds it takes to claim a new network name that the
    LR used to ping the client). In V6 things potentially get worse because
    the 3+ second delay will turn into ~5 seconds if WINS is being used
    (due to the extra time to go to the name server).  Caching network
    names for this purpose allows us to eliminate this very long delay in
    most cases.
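
    As a rough illustration of the trade-off (not the actual LR code), the
    sketch below models a pool of pre-claimed ping names.  A ping that
    finds a cached name pays no claim delay; one that finds the pool empty
    pays the ~3 seconds quoted above.  PingNamePool, counter, and the
    fall-back-to-claiming behavior are all assumptions for illustration,
    as is the base-36 counter ordering (inferred from the R01..R0J names
    in .0).

```python
import string

ALPHABET = string.digits + string.ascii_uppercase  # "0"-"9", then "A"-"Z"
CLAIM_SECONDS = 3.0  # approximate time to claim a new name, per the figure above


def counter(n):
    """Two-character counter in the assumed base-36 order."""
    if not 0 <= n < 36 * 36:
        raise ValueError("counter out of range")
    return ALPHABET[n // 36] + ALPHABET[n % 36]


class PingNamePool:
    """Hypothetical cache of pre-claimed registrar ping tags."""

    def __init__(self, size):
        self._next = 1
        self._free = [self._claim() for _ in range(size)]

    def _claim(self):
        tag = "R" + counter(self._next) + "P"
        self._next += 1
        return tag

    def acquire(self):
        """Return (tag, delay_seconds) for one ping."""
        if self._free:
            return self._free.pop(0), 0.0       # cached name: no claim delay
        return self._claim(), CLAIM_SECONDS     # assumed fallback: claim on the fly

    def release(self, tag):
        """Put a tag back in the cache once the ping is finished."""
        self._free.append(tag)
```

    With a pool of two names, the first two concurrent pings cost nothing
    extra and a third pays the full claim time, which is the "most cases"
    caveat in a nutshell.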
    
    Feature? ;-}

    .8> [...] and I was unable to duplicate the customer's "duplicate name"
    .8> problems by stopping/starting license server many times...

    Someone will try to reproduce this here once we get a CLD.

    I'm now wondering if this isn't just a timing issue during failover in
    a cluster, where the conflict is caused by the license server's name
    not being cleaned up quickly enough on one node before the other node
    tries to claim it. The reason I'm leaning this way now is that the
    "PWRK$Lname" didn't show up in the PCSA_CLAIM_NAME list, so it's not
    like something just lost track and didn't clean-up the name.  Since the
    name isn't "hanging around" in the name tables, I'm assuming there must
    have been some intermittent conflict.

    .8> [...] anything else I should get?
    
    I can't think of any other info that the customer's going to have that
    you can ask for.
    
    thanks,
    \steve