T.R | Title | User | Personal Name | Date | Lines |
---|
3929.1 | Lower Weight, use OSI TP if possible | HELP::TAYLOR | | Thu Apr 17 1997 11:14 | 16 |
| Hi Jane,
Yes, they should keep the delay weight as low as
possible to get a smaller incremental change.
Retransmits are bad because even in Phase V you
are going to get a delay in seconds. Also, the
routing end node cache is going to be flushed.
If they have 2 Phase V systems then they should
use OSI Transport.
Cheers,
Pat
|
3929.2 | need more details . . . | CSC32::J_RYER | MCI Mission Critical Support Team | Mon Apr 21 1997 17:30 | 12 |
| Hi, Pat,
Why is the delay before retransmission in seconds rather
than milliseconds? Doesn't that go against the NSP spec?
(which I think says that the retransmission timer will be
delay factor times the estimated roundtrip delay)
Is the extra delay related to the second part of what you said
(that the entry gets flushed from the end-system cache) ?
Thanks,
Jane
|
3929.3 | traces confirm three-second pause | CSC32::J_RYER | MCI Mission Critical Support Team | Thu May 29 1997 18:52 | 92 |
| Still looking for answers. Here's a more detailed description
of my customer's situation . . .
MCI runs an application which uses Digital's RTR (Reliable Transaction Router)
product to communicate between two sites (North Royalton, Ohio and Sacramento,
California) via NSP logical links. MCI has experienced numerous problems with
the RTR product over the last few years, and they have been told by the RTR
engineers that at least one of the triggering factors is relatively "long"
pauses in data arriving over the NSP logical links.
To test exactly what the delay was (and how often it occurred), one of MCI's
software engineers wrote an application which brings up an NSP logical link to
the MIRROR session control application and sends a packet once a minute.
Typical return times for the packet are around a tenth of a second.
(Even though this is a wide-area link, there are multiple T1's between the
sites.) The program keeps a log of any packet for which the mirrored response
is not received within a second. On average, this occurs about twenty to
thirty times a day (out of 60 x 24 = 1440 attempts), so it occurs on roughly
1.4% to 2.1% of the transmitted packets.
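The rate quoted above can be checked with a short Python sketch. The name `slow_reply_rate` is hypothetical and only restates the figures in this note (one probe per minute, twenty to thirty slow replies a day); it is not part of MCI's test program.

```python
# Hypothetical helper: share of once-a-minute probes whose mirrored
# reply took longer than the one-second logging threshold.
PROBES_PER_DAY = 60 * 24  # one probe per minute


def slow_reply_rate(slow_replies_per_day: int,
                    probes_per_day: int = PROBES_PER_DAY) -> float:
    """Return the share of probes logged as slow, as a percentage."""
    return 100.0 * slow_replies_per_day / probes_per_day


if __name__ == "__main__":
    # The two ends of the range reported in this note.
    for count in (20, 30):
        print(f"{count} slow replies/day: {slow_reply_rate(count):.2f}% of probes")
```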
MCI did some tracing of the LAN at one of the two sites and was able to
correlate the retransmissions with packets that never showed up on the LAN
trace. They are not disputing that packets will sometimes be dropped or
corrupted on the LAN. Their concern is the length of time that the sending
NSP waits before retransmitting the packet when no acknowledgement arrives
from the receiving end. It seems to never be less than three seconds.
We've all read the NSP spec until we're blue in the face and tried to correlate
what it says about when to retransmit a not-yet-acknowledged data packet with
what we're observing in MCI's actual network. As I read the NSP spec, the
retransmission should occur at delay factor (2 in MCI's case) times the
estimated roundtrip delay (reported as around 100 milliseconds by
NCL> SHO NSP PORT * ALL). So MCI believes that a packet which just
"disappears" en route to the other site should be retransmitted 200
milliseconds after the original transmission. Instead, the values they
typically see for the "echo" are in the 3.2 to 3.6 seconds range.
It certainly appears that NSP is adding 3 seconds to the value for
retransmission timer that would be implied by the NSP functional spec.
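To put numbers on that gap, here is a minimal Python sketch. It assumes the reading of the spec given in this note (timer = delay factor times estimated roundtrip delay) and models the suspected extra delay as a hypothetical `fixed_delay_s` term. Neither function is taken from any DECnet implementation.

```python
def spec_retransmit_timer(delay_factor: float, est_round_trip_s: float) -> float:
    """Timer as this note reads the NSP spec: factor * estimated delay."""
    return delay_factor * est_round_trip_s


def suspected_retransmit_timer(delay_factor: float,
                               est_round_trip_s: float,
                               fixed_delay_s: float = 3.0) -> float:
    """Assumption only: a fixed delay is added on top of the spec value."""
    return fixed_delay_s + spec_retransmit_timer(delay_factor, est_round_trip_s)


if __name__ == "__main__":
    factor = 2      # delay factor in MCI's case, as stated in this note
    rtt_s = 0.100   # estimated roundtrip delay, about 100 ms
    print("spec reading  :", spec_retransmit_timer(factor, rtt_s), "s")
    print("with fixed 3 s:", suspected_retransmit_timer(factor, rtt_s), "s")
```

If the fixed-delay assumption holds, the second value plus one normal round trip would land at the low end of the 3.2 to 3.6 second echo times MCI reports.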
The only parameter that I can find that looks like it might have an effect on
this is "acknowledgment delay time" which is hard-coded at three seconds and
apparently can not be modified.
I found a note in the DECNETVAX conference (5589), entered a couple of years
ago, which seems to describe a similar situation. The last entry on that
note reads:
-------------------------------------------------------------------------------
Can somebody verify that the following info given to a customer from
the CSC is correct?
... question that you have regarding the long time that
decnet takes to retransmit a packet. The fact that you can only get it
down to about 4 seconds, is because there is a hardcoded 3 second delay, and
whatever is calculated using delay weight and delay factor is added on to
that value. This is a leftover from when DECnet was primarily a WAN protocol.
This has been addressed in DECnet/osi (phase V decnet), and the value can
be as low as 1 second.
Why is this so? Is it that once NSP creates a link transparently,
the retransmit timer isn't recalculated? Do you somehow have to
force the transmission of packets that have the Delay ACK bit clear?
--------------------------------------------------------------------------------
but was never responded to by "people in the know".
On a previous reply (.1) to this note, Pat Taylor said "even in Phase V
you are going to get a delay in seconds".
What I am hoping to have answered here is :
a) why does my customer never see NSP retransmissions in less than three
seconds, even though the estimated round-trip time for the link is less than a
tenth of a second, delay factor is set to 2 and delay weight is set to 3?
b) is there some sort of "hardcoded" value that affects NSP retransmission?
c) if there is a fixed delay prior to retransmission, is that considered an
extension to the NSP functional specification, or am I missing something in my
reading of said spec?
d) any suggestions on anything the customer can change (he's currently running
OSI V6.3 with eco 6) to get NSP to retransmit those packets in less than a
second?
If an IPMT case is needed to answer these questions, I can do that.
(I'm gathering CTF traces at the moment and actually intend to log the
IPMT case on Friday afternoon unless replies to this note produce an
explanation prior to then.)
Thanks,
Jane Ryer
MCI Mission Critical Support Team
|
3929.4 | Possible workaround | OZROCK::HARTWIG | Arthur Hartwig, TaN Engineering-Australia | Fri May 30 1997 10:32 | 5 |
| Question:
Would using OSI Transport instead of NSP be a suitable work-around?
(I don't know, but maybe it's worth a try.)
|
3929.5 | trivial change or more involved? | CSC32::J_RYER | MCI Mission Critical Support Team | Fri May 30 1997 11:24 | 10 |
| Would that be as simple as just changing the session control transport
precedence, or would more work be required?
I don't know if Bruce (the MCI employee investigating this problem,
he's well known to you, Arthur!) has discussed that possibility with
the Digital RTR engineers or not. Might there be anything in their
code that could break when using OSI Transport rather than NSP for its
node-to-node connections?
Jane Ryer
|
3929.6 | | RMULAC::S_WATTUM | Scott Wattum - FTAM/VT/OSAK Engineering | Fri May 30 1997 12:50 | 4 |
| All you should need to do is change the transport precedence. A DNA Session
Control application should not be able to tell what transport it is running
over. However, be aware that RTR Engineering may claim "no support" for this
configuration simply because they haven't tested it.
|
3929.7 | | OZROCK::HARTWIG | Arthur Hartwig, TaN Engineering-Australia | Tue Jun 03 1997 10:46 | 3 |
| Maybe someone with more knowledge about the transport protocol
specifics could comment on whether or not OSI Transport is also likely
to show these delays on timeout recovery.
|
3929.8 | OSITP - Local retransmission time (T1) | BIKINI::DITE | John Dite@RTO DTN 865-4065 | Thu Jun 05 1997 05:28 | 30 |
| I don't know if this helps:
Extract out of ISO/IEC 8073:1992
--------------------------------------------------------------------------------
12.2.1.1.4 Local retransmission time (T1)
The local transport entity is assumed to maintain a bound on the time
it will wait for an acknowledgement before retransmitting the TPDU.
The value is given by
T1=Elr+Erl+Ar+x
where
Elr is the expected maximum transit delay local-to-remote;
Erl is the expected maximum transit delay remote-to-local;
Ar is the remote acknowledgement time;
x is the local processing time for a TPDU
--------------------------------------------------------------------------------
As far as I'm aware there is no additional 'hard coded value' that is added
during this calculation.
Please be aware that DECnet/OSI systems have a default value of Ar (NCL
OSI Transport Template characteristic attribute 'Acknowledgement Delay Time')
of 1 second. Ar is passed at connection establishment time.
So if you want to ensure that every TPDU is acknowledged as quickly as possible
then set 'Acknowledgement Delay Time' to 0.
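As a rough illustration of how much Ar dominates T1 on a fast path, here is a small Python sketch of the formula quoted above. The transit delays (50 ms each way) and the zero local processing time are illustrative assumptions, not measured values, and `local_retransmission_time` is a hypothetical name.

```python
def local_retransmission_time(elr_s: float, erl_s: float,
                              ar_s: float, x_s: float) -> float:
    """T1 = Elr + Erl + Ar + x, all in seconds (formula quoted above)."""
    return elr_s + erl_s + ar_s + x_s


if __name__ == "__main__":
    elr_s = erl_s = 0.050  # assumed one-way transit delays
    x_s = 0.0              # assumed negligible local processing time
    for ar_s in (1.0, 0.0):  # default Ar versus Ar set to 0
        t1 = local_retransmission_time(elr_s, erl_s, ar_s, x_s)
        print(f"Ar = {ar_s:.1f} s gives T1 = {t1:.3f} s")
```

Under these assumptions the default Ar of 1 second accounts for most of T1, which is why setting it to 0 matters for a link with a roundtrip time of about a tenth of a second.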
John
|