Conference help::decnet-osi_for_vms

Title:DECnet/OSI for OpenVMS
Moderator:TUXEDO::FONSECA
Created:Thu Feb 21 1991
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:3990
Total number of notes:19027

3929.0. "nsp retransmission time" by CSC32::J_RYER (MCI Mission Critical Support Team) Fri Apr 11 1997 19:06

    Customer has a time critical application which sends data between
    Alphas (running VMS V6.2, OSI V6.3 with ECO 6) at two sites over 
    multiplexed T1's (between DECnis routers).  The typical roundtrip 
    delay estimate on logical links between the two systems is 
    100 milliseconds (as observed via ncl> "sho nsp port * roundtrip
    delay estimate" ).
    
    Occasionally (several times a day), a packet has to be re-transmitted
    because it apparently just got dropped down at the data link layer.
    Customer would like this retransmission to happen as quickly as
    possible.   He has the NSP delay factor parameter set to 2, which
    is as low as it's allowed to be set.  However, he sometimes observes
    the roundtrip delay estimate on one of the NSP ports suddenly jump 
    from 100 milliseconds up to three or four seconds.  (He sees no
    intermediate values when repeating the ncl> show command with
    up-arrow and carriage return.)
    
    In an attempt to prevent this, he set delay weight to its maximum
    allowable value (255).  In our reading of the NSP specification, we
    assumed that this would mean that new values for round-trip time
    would get incorporated into the rolling average very slowly, so the
    variability in the estimate should be minimal.  However, when he did
    this, what he observed was that the estimate would suddenly jump to
    five or six seconds (even longer than the previous observed value of
    three-to-four seconds).
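
A minimal sketch of the smoothing behaviour we expected, assuming the estimate is updated as estimate + (sample - estimate) / (delay weight + 1). That update rule is our reading of the NSP specification, not something confirmed against the DECnet/OSI implementation, and the function names and the 3.5-second sample are hypothetical.

```python
def update_estimate(estimate, sample, delay_weight):
    """One smoothing step: move the estimate toward the sample by 1/(weight+1)."""
    return estimate + (sample - estimate) / (delay_weight + 1)


def samples_to_reach(target, start, sample, delay_weight, limit=100000):
    """Count how many identical samples are needed before the estimate reaches target."""
    estimate, count = start, 0
    while estimate < target and count < limit:
        estimate = update_estimate(estimate, sample, delay_weight)
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical scenario: estimate is 0.1 s, then every new sample is 3.5 s.
    for weight in (3, 255):
        first_step = update_estimate(0.1, 3.5, weight)
        needed = samples_to_reach(3.0, 0.1, 3.5, weight)
        print(f"weight={weight}: after one sample {first_step:.3f} s, "
              f"{needed} samples to reach 3.0 s")
```

Under this assumed rule, a delay weight of 255 should make the estimate move far more slowly than a weight of 3, so a sudden jump from 100 milliseconds to several seconds is not what the formula alone would predict.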
    
    What are we misunderstanding?  Are we backwards on what high and low
    values for delay weight mean?  Is something broken in the OSI
    implementation of NSP retransmission of packets which have not been
    acknowledged in twice the roundtrip delay estimate?  Customer does
    have some CTF traces which show that the packet doesn't get 
    re-transmitted for much longer than twice the roundtrip delay estimate.
    
    Thanks in advance for any comments/advice,
    Jane Ryer
    MCI Mission Critical Support Team
    
    
    Here are his NSP parameter settings:
    
    ncl> sho node noas00 nsp all
    
    Node noas00 NSP
    AT 1997-04-11-21:41:31.750+00:00I45.540
    
    Status
    
        UID                               = 05818060-ABE3-11D0-8003-AA0004008C64
        State                             = On
        Currently Active Connections      = 13
    
    Characteristics
    
        Maximum Transport Connections     = 200
        Maximum Receive Buffers           = 4000
        Delay Weight                      = 3
        Delay Factor                      = 2
        Maximum Window                    = 20
        DNA Version                       = T4.2.1
        Acknowledgement Delay Time        = 3
        Maximum Remote NSAPS              = 201
        NSAP Selector                     = 32
        Keepalive Time                    = 60
        Retransmit Threshold              = 12
        Congestion Avoidance              = False
        Flow Control Policy               = Segment Flow Control
    
    
    Comment:  could the fact that flow control policy is set to "segment"
    rather than "no" flow control be affecting the retransmission
    algorithm?  I'll have the customer re-test with that changed to
    "no flow control" and the delay weight at 3 and then at 255 (the
    two extremes of its allowable range).
3929.1. "Lower Weight, use OSI TP if possible" by HELP::TAYLOR Thu Apr 17 1997 11:14
    Hi Jane,
    
    Yes, they should keep the delay weight as low as
    possible to get a smaller incremental change.
    
    Retransmits are bad because even in Phase V you
    are going to get a delay in seconds.  Also, the
    routing end node cache is going to be flushed.
    
    If they have 2 Phase V systems then they should
    use OSI Transport.
    
    Cheers,
    
    Pat
    
3929.2. "need more details . . ." by CSC32::J_RYER (MCI Mission Critical Support Team) Mon Apr 21 1997 17:30
    Hi, Pat,
    
    Why is the delay before retransmission in seconds rather
    than milliseconds?  Doesn't that go against the NSP spec?
    (which I think says that the retransmission timer will be 
    delay factor times the roundtrip delay estimate)
    
    Is the extra delay related to the second part of what you said 
    (that the entry gets flushed from the end-system cache) ?
    
    Thanks,
    Jane
3929.3. "traces confirm three-second pause" by CSC32::J_RYER (MCI Mission Critical Support Team) Thu May 29 1997 18:52
    Still looking for answers.  Here's a more detailed description
    of my customer's situation . . .
    
MCI runs an application which uses Digital's RTR (Reliable Transaction Router)
product to communicate between two sites (North Royalton, Ohio and Sacramento, 
California) via NSP logical links.  MCI has experienced numerous problems with
the RTR product over the last few years, and they have been told by the RTR
engineers that at least one of the triggering factors is relatively "long" 
pauses in data arriving over the NSP logical links.  

To test exactly what the delay was (and how often it occurred), one of MCI's
software engineers wrote an application which brings up an NSP logical link to
the MIRROR session control application and sends a packet once a minute.
Typical return times for the packet are around a tenth of a second.
(Even though this is a wide-area link, there are multiple T1's between the
sites.)  The program keeps a log of any packet for which the mirrored response 
is not received within a second.  On average, this occurs about twenty to 
thirty times a day out of 60 x 24 = 1440 attempts, so it affects roughly
1.4% to 2.1% of the transmitted packets.
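
The logging rule described above can be sketched as follows. This is not MCI's test program: the function names and the sample response times are hypothetical, and only the one-second threshold and the 1440 probes per day come from the description.

```python
def slow_probes(response_times, threshold=1.0):
    """Return (probe index, seconds) for every probe slower than the threshold.

    A response time of None stands for a probe that never got a reply.
    """
    return [(i, t) for i, t in enumerate(response_times)
            if t is None or t > threshold]


def slow_fraction(slow_count, total_probes):
    """Fraction of probes that were logged as slow."""
    return slow_count / total_probes


if __name__ == "__main__":
    # Hypothetical sample: mostly ~0.1 s, one retransmitted packet, one lost reply.
    times = [0.10, 0.09, 3.4, 0.11, None]
    print(slow_probes(times))
    # Range implied by twenty to thirty slow probes out of 1440 per day.
    print(f"{slow_fraction(20, 1440):.1%} to {slow_fraction(30, 1440):.1%}")
```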

MCI did some tracing of the LAN at one of the two sites and was able to 
correlate the retransmissions with packets that never showed up on the LAN
trace.  They are not disputing that packets will sometimes be dropped or
corrupted on the LAN.  Their concern is the length of time that the sending
NSP waits before retransmitting the packet when no acknowledgement arrives 
from the receiving end.  It seems to never be less than three seconds.

We've all read the NSP spec until we're blue in the face and tried to correlate
what it says about when to retransmit a not-yet-acknowledged data packet with
what we're observing in MCI's actual network.  As I read the NSP spec, the
retransmission should occur at delay factor (2 in MCI's case) times the
roundtrip delay estimate (reported as around 100 milliseconds by
NCL> SHO NSP PORT * ALL).  So MCI believes that a packet which just
"disappears" en route to the other site should be retransmitted 200
milliseconds after the original transmission.  Instead, the values they
typically see for the "echo" are in the 3.2 to 3.6 second range.

It certainly appears that NSP is adding 3 seconds to the value for
retransmission timer that would be implied by the NSP functional spec.
The only parameter that I can find that looks like it might have an effect on
this is "acknowledgment delay time" which is hard-coded at three seconds and
apparently can not be modified.
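
The arithmetic behind that observation can be sketched as below, assuming the retransmission timer is simply delay factor times the roundtrip delay estimate (our reading of the spec) and that the echo time of a retransmitted packet is the wait before retransmission plus one normal round trip. The function names are hypothetical.

```python
def expected_retransmit_timer(delay_factor, roundtrip_estimate):
    """Timer implied by our reading of the spec: factor times the estimate."""
    return delay_factor * roundtrip_estimate


def unexplained_delay(observed_echo, delay_factor, roundtrip_estimate):
    """Observed echo time minus the expected timer and one normal round trip."""
    expected = expected_retransmit_timer(delay_factor, roundtrip_estimate)
    return observed_echo - (expected + roundtrip_estimate)


if __name__ == "__main__":
    delay_factor = 2    # customer's Delay Factor setting
    roundtrip = 0.1     # seconds, as reported by NCL
    print(f"expected timer: {expected_retransmit_timer(delay_factor, roundtrip):.1f} s")
    for observed in (3.2, 3.6):  # observed echo range in seconds
        extra = unexplained_delay(observed, delay_factor, roundtrip)
        print(f"observed {observed} s -> unexplained {extra:.1f} s")
```

With these inputs the unexplained portion comes out close to three seconds at both ends of the observed range, which is what points at the three-second acknowledgement delay time.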

I found a note in the DECNETVAX conference (5589), entered a couple of years
ago, which seems to describe a similar situation.  The last entry on that
note reads:
-------------------------------------------------------------------------------
    Can somebody verify that the following info given to a customer from
    the CSC is correct?

                   ... question that you have regarding the long time that
decnet takes to retransmit a packet.  The fact that you can only get it
down to about 4 seconds, is because there is a hardcoded 3 second delay, and
whatever is calculated using delay weight and delay factor is added on to
that value.  This is a leftover from when DECnet was primarily a WAN protocol.
This has been addressed in DECnet/osi (phase V decnet), and the value can
be as low as 1 second.

    Why is this so?  Is it that once NSP creates a link transparently,
    the retransmit timer isn't recalculated?  Do you somehow have to
    force the transmission of packets that have the Delay ACK bit clear?
--------------------------------------------------------------------------------

but was never responded to by "people in the know".

On a previous reply (.1) to this note, Pat Taylor said "even in Phase V
you are going to get a delay in seconds".

What I am hoping to have answered here is :

a) why does my customer never see NSP retransmissions in less than three 
seconds, even though the estimated round-trip time for the link is less than a
tenth of a second, delay factor is set to 2 and delay weight is set to 3?

b) is there some sort of "hardcoded" value that affects NSP retransmission?

c) if there is a fixed delay prior to retransmission, is that considered an
extension to the NSP functional specification, or am I missing something in my
reading of said spec?

d) any suggestions on anything the customer can change (he's currently running
OSI V6.3 with eco 6) to get NSP to retransmit those packets in less than a
second?

If an IPMT case is needed to answer these questions, I can do that.
(I'm gathering CTF traces at the moment and actually intend to log the
IPMT case on Friday afternoon unless replies to this note produce an
explanation prior to then.)

Thanks,
Jane Ryer
MCI Mission Critical Support Team
3929.4. "Possible workaround" by OZROCK::HARTWIG (Arthur Hartwig, TaN Engineering-Australia) Fri May 30 1997 10:32
    Question:
    	Would using OSI Transport instead of NSP be a suitable work-around?
    
    (I don't know, but maybe it's worth a try.)
    
3929.5. "trivial change or more involved?" by CSC32::J_RYER (MCI Mission Critical Support Team) Fri May 30 1997 11:24
    Would that be as simple as just changing the session control transport 
    precedence, or would more work be required?  
    
    I don't know if Bruce (the MCI employee investigating this problem,
    he's well known to you, Arthur!) has discussed that possibility with
    the Digital RTR engineers or not.  Might there be anything in their
    code that could break when using OSI Transport rather than NSP for its
    node-to-node connections?
    
    Jane Ryer
3929.6. by RMULAC::S_WATTUM (Scott Wattum - FTAM/VT/OSAK Engineering) Fri May 30 1997 12:50
All you should need to do is change the transport precedence.  A DNA Session
Control application should not be able to tell what transport it is running
over, however, be aware that RTR Engineering may claim "no support" for this
configuration simply because they haven't tested it.
3929.7. by OZROCK::HARTWIG (Arthur Hartwig, TaN Engineering-Australia) Tue Jun 03 1997 10:46
    Maybe someone with more knowledge about the transport protocol
    specifics could comment on whether or not OSI Transport is also likely
    to show these delays on timeout recovery.
3929.8. "OSITP - Local retransmission time (T1)" by BIKINI::DITE (John Dite@RTO DTN 865-4065) Thu Jun 05 1997 05:28
I don't know if this helps:

Extract out of ISO/IEC 8073:1992 
--------------------------------------------------------------------------------
12.2.1.1.4 Local retransmission time (T1)
The local transport entity is assumed to maintain a bound on the time 
it will wait for an acknowledgement before retransmitting the TPDU.

The value is given by

T1=Elr+Erl+Ar+x

where
Elr is the expected maximum transit delay local-to-remote;
Erl is the expected maximum transit delay remote-to-local;
Ar  is the remote acknowledgement time;
x   is the local processing time for a TPDU
--------------------------------------------------------------------------------

As far as I'm aware there is no additional 'hard coded value' that is added 
during this calculation. 

Please be aware that DECnet/OSI systems have a default value of Ar (NCL
OSI Transport Template characteristic attribute 'Acknowledgement Delay Time')
of 1 second. Ar is passed at connection establishment time. 

So if you want to ensure that every TPDU is acknowledged as quickly as possible
then set 'Acknowledgement Delay Time' to 0.

John
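
As a rough illustration of the T1 formula quoted above, the sketch below plugs in numbers from this thread. Splitting the 100 millisecond round trip evenly into Elr and Erl and taking x as zero are assumptions made only for the example; the standard speaks of expected maximum transit delays, which may well be larger.

```python
def local_retransmission_time(elr, erl, ar, x):
    """T1 = Elr + Erl + Ar + x, all values in seconds."""
    return elr + erl + ar + x


if __name__ == "__main__":
    elr = erl = 0.05   # assumed: half of the ~100 ms round trip each way
    x = 0.0            # assumed: negligible local processing time
    for ar in (1.0, 0.0):  # default Acknowledgement Delay Time, then 0
        t1 = local_retransmission_time(elr, erl, ar, x)
        print(f"Ar={ar} s -> T1={t1:.2f} s")
```

Under those assumptions the remote acknowledgement time dominates T1, so lowering 'Acknowledgement Delay Time' is the change that would shorten it most.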