
Conference spezko::cluster

Title: + OpenVMS Clusters - The best clusters in the world! +
Notice: This conference is COMPANY CONFIDENTIAL. See #1.3
Moderator: PROXY::MOORE
Created: Fri Aug 26 1988
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 5320
Total number of notes: 23384

5241.0. "ANAL/SYS SHOW PORT/VC=VC_node ReXmt -- retransmit counter reasonable values?" by AMCFAC::RABAHY (dtn 471-5160, outside 1-810-347-5160) Tue Mar 04 1997 17:19

Although the attached example doesn't show it, is it reasonable to get 1,000's
of ReXmt's?

What do the two numbers mean?

$ ANAL/SYS
SDA> show port
...
SDA> show port /vc=vc_node

VMScluster data structures
--------------------------
                 --- Virtual Circuit (VC) 812147C0 ---
Remote System Name:  WOODYS (1:ALPHA)   Remote SCSSYSTEMID:  34830
Local System ID:  220 (DC)              Status: 0005 open,path
------ Transmit -------  ------ VC Closures ----  ---- Congestion Control ----
Msg Xmt          177838  SeqMsg TMO            0  Pipe Quota/Slo/Max  16/ 2/31
  Unsequence         12  CC DFQ Empty          0  Pipe Quota Reached       147
  Sequence       169657  Topology Change       0  Xmt C/T             372/1024
  ReXmt           24/24  NPAGEDYN Low          0  RndTrp uS        23107+34993
  Lone ACK         8145                           UnAcked Msgs               0
Bytes Xmt      99123393                           CMD Queue Len/Max        0/5
------- Receive -------  - Messages  Discarded -  ----- Channel Selection ----
Msg Rcv          237606  No Xmt Chan           0  Preferred Channel   81216440
  Unsequence          7  Rcv Short Msg         0  Delay Time          FFD1B1CD
  Sequence       206276  Illegal Seq Msg       0  Buffer Size             1412
  ReRcv               5  Bad Checksum          0  Channel Count              1
  Lone ACK        31323  TR DFQ Empty          0  Channel Selections        29
  Cache               0  TR MFQ Empty          0  Protocol               1.4.0
  Ill ACK             0  CC MFQ Empty          0  Open  4-MAR-1997 00:33:08.17
Bytes Rcv      58926718  Cache Miss            0  Cls   4-MAR-1997 00:33:06.25

5241.1. by UTRTSC::thecow.uto.dec.com::JurVanDerBurg (Change mode to Panic!) Wed Mar 05 1997 01:24 (6 lines)

1000's of ReXmt's must be judged relative to the time the connection has been
up. If it's not more than, let's say, 0.1% then no worry. If it's significant
then you have some network problem which may affect overall performance.

Jur.

5241.2. by AMCFAC::RABAHY (dtn 471-5160, outside 1-810-347-5160) Wed Mar 05 1997 08:54 (18 lines)

In 24 days, give or take, the worst case was 21387 ReXmt's.  There seems to be a
load-dependent steady increase over time.  In just a couple of hours one node had
83 ReXmt's to a single partner after a reboot.

The network is comprised of dedicated point-to-point MMF FDDI GIGAswitch/FDDI
links.  So, they do operate in full duplex mode.  6 nodes, 4 with 1 link, 2 with
2 links.  All using DEFPA-DA.

One node sticks out with especially large values of ReXmt.  This node is the
only node which had NISCS_MAX_PKTSZ still at the default 1498 instead of being
increased to 4468.  The next reboot will correct this and we'll see if it comes
into line.  The other difference about this node is it is the only one in a
different data center.  Longer fibers are required through the walls.  Naturally
this involves connecting at ST-style LIU's.

I am most concerned about performance impact.  My gut is telling me these
numbers are way too small to adversely affect performance noticeably -- still it
would be nice to have confirmation.  Also, what do the two numbers mean?

5241.3. by AMCFAC::RABAHY (dtn 471-5160, outside 1-810-347-5160) Wed Mar 05 1997 14:58 (10 lines)

From section F.2.4 of the OpenVMS Cluster Systems manual:

	A well-configured OpenVMS Cluster system should
	not perform excessive retransmissions between nodes.
	Retransmissions between any nodes that occur more
	frequently than once every few seconds deserve network
	investigation.

Every few seconds is too vague and too lenient, yes?  It should be guided by
load.
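
To put the counts from .2 on the same footing as the manual's "once every few
seconds", they can be converted to an average spacing between retransmits. This
is only a rough sketch: the function name is made up for illustration, and
"a couple of hours" is assumed to mean two hours.

```python
SECONDS_PER_DAY = 86_400


def mean_seconds_between(events: int, elapsed_seconds: float) -> float:
    """Average spacing, in seconds, between events over the elapsed time."""
    if events <= 0:
        raise ValueError("need at least one event")
    return elapsed_seconds / events


# Worst case reported in .2: 21387 ReXmt's in roughly 24 days.
worst = mean_seconds_between(21387, 24 * SECONDS_PER_DAY)

# After a reboot: 83 ReXmt's in "a couple of hours" (ASSUMED to be 2 hours).
fresh = mean_seconds_between(83, 2 * 3600)

print(f"worst case:   one ReXmt every {worst:.0f} s on average")
print(f"after reboot: one ReXmt every {fresh:.0f} s on average")
```

Both cases work out to roughly one retransmit every minute and a half on
average, which is far less frequent than "once every few seconds". An average
says nothing about bursts under load, so it is at best a first sanity check.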

5241.4. by AMCFAC::RABAHY (dtn 471-5160, outside 1-810-347-5160) Wed Mar 05 1997 15:38 (18 lines)

re .3:

Section F.3.3, Table F-5 does much better:

The leftmost number (128) indicates the number of packets actually
retransmitted. For example, if the network loses two packets at the same time,
one timeout is counted but two packets are retransmitted. A retransmission
occurs when the local node does not receive an acknowledgment for a transmitted
packet within a predetermined timeout interval.

Although you should expect to see a certain number of retransmissions
(especially in heavily loaded networks), an excessive number of retransmissions
wastes network bandwidth and indicates excessive load or intermittent hardware
failure. If the leftmost value in the ReXmt field is greater than about 0.01% to
0.05% of the total number of the transmitted messages shown in the Msg Xmt
field, the OpenVMS Cluster system probably is experiencing excessive network
problems or local loss from congestion.
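
The Table F-5 guideline can be checked against the display in .0 with a few
lines of arithmetic. This is a sketch only: the function names and the three
labels are invented here, and the 0.01% and 0.05% bounds are taken straight
from the text quoted above.

```python
# Bounds quoted from Table F-5, as a percentage of Msg Xmt.
LOW_PCT = 0.01
HIGH_PCT = 0.05


def rexmt_percent(rexmt_packets: int, msg_xmt: int) -> float:
    """Leftmost ReXmt value as a percentage of Msg Xmt."""
    if msg_xmt <= 0:
        raise ValueError("Msg Xmt must be positive")
    return 100.0 * rexmt_packets / msg_xmt


def classify(pct: float) -> str:
    """Place a percentage relative to the quoted 0.01% to 0.05% band."""
    if pct > HIGH_PCT:
        return "above the quoted range"
    if pct >= LOW_PCT:
        return "inside the quoted range"
    return "below the quoted range"


# Values from the display in .0: ReXmt 24/24, Msg Xmt 177838.
pct = rexmt_percent(24, 177838)
print(f"ReXmt is {pct:.4f}% of Msg Xmt: {classify(pct)}")
```

For the VC in .0 this comes to about 0.0135%, at the low end of the quoted
band. That VC had been open for less than a day when the display was taken, so
the sample is small.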

5241.5. by AMCFAC::RABAHY (dtn 471-5160, outside 1-810-347-5160) Thu Mar 06 1997 10:22 (31 lines)

re .4:

Again, from section F.3.3, Table F-5:

>The rightmost number (106) in the ReXmt field indicates the number of times a
>timeout occurred.

My question is, how can there be more timeouts than retransmitted packets per
the following actual example??

VMScluster data structures
--------------------------
                 --- Virtual Circuit (VC) 82028D00 ---
Remote System Name:  SPARTY (1:ALPHA)   Remote SCSSYSTEMID:  26725
Local System ID:  218 (DA)              Status: 0005 open,path
------ Transmit -------  ------ VC Closures ----  ---- Congestion Control ----
Msg Xmt       356679359  SeqMsg TMO            1  Pipe Quota/Slo/Max   5/ 3/31
  Unsequence          5  CC DFQ Empty          0  Pipe Quota Reached   1611311
  Sequence    298467108  Topology Change       0  Xmt C/T                6/320
  ReXmt       3082/3106  NPAGEDYN Low          0  RndTrp uS          8700+9756
  Lone ACK     58209164                           UnAcked Msgs              13
Bytes Xmt    3211438740                           CMD Queue Len/Max      0/121
------- Receive -------  - Messages  Discarded -  ----- Channel Selection ----
Msg Rcv       273590792  No Xmt Chan           0  Preferred Channel   82023AC0
  Unsequence          6  Rcv Short Msg         0  Delay Time          0A0A2747
  Sequence    267286043  Illegal Seq Msg       0  Buffer Size             4382
  ReRcv            7576  Bad Checksum          0  Channel Count              2
  Lone ACK      6295562  TR DFQ Empty          0  Channel Selections    227776
  Cache            1609  TR MFQ Empty          0  Protocol               1.4.0
  Ill ACK             0  CC MFQ Empty          0  Open 11-FEB-1997 08:28:01.80
Bytes Rcv    2485526100  Cache Miss            0  Cls  11-FEB-1997 08:27:52.17
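
For anyone wanting to pull these counters out of captured SDA output rather
than read them by eye, a minimal sketch follows. It assumes the display layout
is exactly as pasted above, reads the ReXmt field the way Table F-5 describes
it (leftmost = packets retransmitted, rightmost = timeouts), and uses a
made-up function name.

```python
import re

# Two lines copied from the display above.
SAMPLE = """\
Msg Xmt       356679359  SeqMsg TMO            1  Pipe Quota/Slo/Max   5/ 3/31
  ReXmt       3082/3106  NPAGEDYN Low          0  RndTrp uS          8700+9756
"""


def parse_transmit_counters(text: str) -> dict:
    """Extract Msg Xmt and both ReXmt values from SHOW PORT/VC output."""
    msg = re.search(r"^Msg Xmt[ \t]+(\d+)", text, re.MULTILINE)
    rex = re.search(r"^[ \t]*ReXmt[ \t]+(\d+)/(\d+)", text, re.MULTILINE)
    if msg is None or rex is None:
        raise ValueError("Msg Xmt or ReXmt line not found")
    return {
        "msg_xmt": int(msg.group(1)),
        "rexmt_packets": int(rex.group(1)),   # leftmost, per Table F-5
        "rexmt_timeouts": int(rex.group(2)),  # rightmost, per Table F-5
    }


c = parse_transmit_counters(SAMPLE)
pct = 100.0 * c["rexmt_packets"] / c["msg_xmt"]
extra_timeouts = c["rexmt_timeouts"] - c["rexmt_packets"]

print(f"ReXmt packets are {pct:.5f}% of Msg Xmt")
print(f"timeouts exceed retransmitted packets by {extra_timeouts}")
```

On this VC the retransmitted packets are under 0.001% of Msg Xmt, well below
the 0.01% to 0.05% guideline, and the timeout count is 24 higher than the
packet count. The script only makes that discrepancy explicit; it does not
explain it.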

5241.6. by AMCFAC::RABAHY (dtn 471-5160, outside 1-810-347-5160) Thu Mar 06 1997 10:26 (3 lines)

What does it mean when the Preferred Channel under the channel selection portion
is 00000000 momentarily?  I'm guessing it is just a fluke that I caught it
changing between channels.