
Conference spezko::cluster

Title: + OpenVMS Clusters - The best clusters in the world! +
Notice: This conference is COMPANY CONFIDENTIAL. See #1.3
Moderator: PROXY::MOORE
Created: Fri Aug 26 1988
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 5320
Total number of notes: 23384

5311.0. "SEVERAL ERRORS IN PEA0 in a SCSI Openvms cluster" by LATINA::GREGORIO (MCS Madrid) Fri May 16 1997 09:07


	Hello,


	I have a customer with a SCSI OpenVMS Cluster running V6.2-1H3.
	When he runs a backup from node 1 to a tape on node 2, he
	gets many errors on PEA0 and node 1 becomes very slow.

	Node 1: BUNNY
	Node 2: DUFFY

Tape on node 2:

Magtape $1$MKF500: (DUFFY), device type TZ88, is online, record-oriented device,
    file-oriented device, available to cluster, error logging is enabled,
    controller supports compaction (compaction  enabled,).

    Error count                    0    Operations completed              13120
    Owner process                 ""    Owner UIC              [SYSTEST,SYSTEM]
    Owner process ID        00000000    Dev Prot            S:RWPL,O:RWPL,G:R,W
    Reference count                0    Default buffer size                 512
    Density                  unknown    Format                        Normal-11
    Host name                "DUFFY"    Host type, avail AlphaServer 8400 Model 5/300, yes
    Allocation class               1

  Volume status:  no-unload on dismount, odd parity.

Errors in Node 1:
$ sh err
Device                           Error Count
PEA0:                                  205

Now, the customer has two Ethernet connections available, EWC and EWD.


VMScluster data structures
--------------------------

                  --- Port Descriptor Table (PDT) 8251AE88 ---

Type: 03 pe
Characteristics: 0000

Msg Header Size           32  Flags               0000  Port Map        00000000
Max Xfer Bcnt       FFFFFFFF  Counter CDRP    00000000
Poller Sweep              30  Load Vector     8246B04C
Fork Block W.Q.     8251AF60  Load Class            10
UCB Address         8251AA80  Connection W.Q. 82537654
ADP Address         00000000  Yellow Q.       8251AFB8
Max VC timeout            16  Red Q.          8251AFC0
SCS Version                2  Disabled Q.     82484DA0
--------------------------

                 --- Port Block 8251B840 ---

Status: 0001 authorize
VC Count: 2
Secs Since Last Zeroed: 34689

SBUF Size             484     LBUF Size         1816     Fork Count     8211851
SBUF Count             10     LBUF Count           1     Refork Count         8
SBUF Max              768     LBUF Max           384     Last Refork   00215EF9
SBUF Quo               10     LBUF Quo             1     SCS Messages   7848730
SBUF Miss             121     LBUF Miss        28322     VC Queue Cnt    407722
SBUF Allocs       6275760     LBUF Allocs     436682     TQE Received    346898
SBUFs In Use            0     LBUFs In Use         0     Timer Done      346898
Peak SBUF In Use       29     Peak LBUF In Use    23     RWAITQ Count     80462
SBUF Queue Empty        0     LBUF Queue Empty     0     LDL Buf/Msg      14807
TR SBUF Queue Empty     0     Ticks/Second        10     ACK Delay      1000000
No SBUF for ACK         0     Listen Timeout       8     Hello Interval      30
VMScluster data structures
--------------------------
Bus Addr  Bus     LAN Address    Error Count Last Error   Time of Last Error
--------  ---  ----------------- ----------- ---------- -----------------------
82465000  LCL  00-00-00-00-00-00           0
8255BD80  EWC  00-00-F8-31-1C-78        3924  0000204C  16-MAY-1997 12:02:00.33
84ADD6C0  EWD  00-00-F8-31-12-00          61  0000204C  16-MAY-1997 12:01:36.31
                 --- Virtual Circuit (VC) Summary ---

VC Addr     Node    SCS ID  Lcl ID    Status Summary        Last Event Time
--------  --------  ------  ------  -----------------   -----------------------
82592740  BUNNY       1055  223/DF  open,path           16-MAY-1997 03:54:05.78
826D3800  DUFFY       1054  222/DE  open,path           16-MAY-1997 12:02:01.00

SDA> show port/add=8251AE88/vc=826D3800/chann/dev
VMScluster data structures
--------------------------

                  --- Port Descriptor Table (PDT) 8251AE88 ---

Type: 03 pe
Characteristics: 0000

Msg Header Size           32  Flags               0000  Port Map        00000000
Max Xfer Bcnt       FFFFFFFF  Counter CDRP    00000000
Poller Sweep              30  Load Vector     8246B04C
Fork Block W.Q.     8251AF60  Load Class            10
UCB Address         8251AA80  Connection W.Q. 82537654
ADP Address         00000000  Yellow Q.       8251AFB8
Max VC timeout            16  Red Q.          8251AFC0
SCS Version                2  Disabled Q.     82484DA0
 
                 --- Port Block 8251B840 ---

Status: 0001 authorize
VC Count: 2
Secs Since Last Zeroed: 34787

SBUF Size             484     LBUF Size         1816     Fork Count     8251707
SBUF Count             10     LBUF Count           1     Refork Count         8
SBUF Max              768     LBUF Max           384     Last Refork   00215EF9
SBUF Quo               10     LBUF Quo             1     SCS Messages   7888695
SBUF Miss             121     LBUF Miss        28322     VC Queue Cnt    407921
SBUF Allocs       6304893     LBUF Allocs     436682     TQE Received    347879
SBUFs In Use            0     LBUFs In Use         0     Timer Done      347879
Peak SBUF In Use       29     Peak LBUF In Use    23     RWAITQ Count     80464
SBUF Queue Empty        0     LBUF Queue Empty     0     LDL Buf/Msg      14849
TR SBUF Queue Empty     0     Ticks/Second        10     ACK Delay      1000000
No SBUF for ACK         0     Listen Timeout       8     Hello Interval      30
Bus Addr  Bus     LAN Address    Error Count Last Error   Time of Last Error
--------  ---  ----------------- ----------- ---------- -----------------------
82465000  LCL  00-00-00-00-00-00           0
8255BD80  EWC  00-00-F8-31-1C-78        3924  0000204C  16-MAY-1997 12:02:00.33
84ADD6C0  EWD  00-00-F8-31-12-00          61  0000204C  16-MAY-1997 12:01:36.31

                 --- Virtual Circuit (VC) 826D3800 ---
Remote System Name:  DUFFY  (1:ALPHA)   Remote SCSSYSTEMID:  1054
Local System ID:  222 (DE)              Status: 0005 open,path
------ Transmit -------  ------ VC Closures ----  ---- Congestion Control ----
Msg Xmt         6766349  SeqMsg TMO          200  Pipe Quota/Slo/Max  16/ 8/16
  Unsequence        403  CC DFQ Empty          0  Pipe Quota Reached     17750
  Sequence      6444945  Topology Change       0  Xmt C/T               0/1024
  ReXmt        851/7046  NPAGEDYN Low          0  RndTrp uS          4363+3095
  Lone ACK       320150                           UnAcked Msgs               1
Bytes Xmt    1642636555                           CMD Queue Len/Max       0/31
------- Receive -------  - Messages  Discarded -  ----- Channel Selection ----
Msg Rcv         6803466  No Xmt Chan           0  Preferred Channel   84C0DA00
  Unsequence        403  Rcv Short Msg         0  Delay Time          00F2CE4A
  Sequence      6669065  Illegal Seq Msg     482  Buffer Size             1412
  ReRcv             752  Bad Checksum          0  Channel Count              2
  Lone ACK       133162  TR DFQ Empty          0  Channel Selections      6961
  Cache             285  TR MFQ Empty          0  Protocol               1.4.0
  Ill ACK             0  CC MFQ Empty          0  Open 16-MAY-1997 12:02:01.00
Bytes Rcv    2079725596  Cache Miss            0  Cls  16-MAY-1997 12:01:53.36

 -- Preferred Channel (CH:84C0DA00) for Virtual Circuit (VC:826D3800) DUFFY  --
State: 0004 open                Status: 0B path,open,rmt_hwa_valid
BUS: 84ADD6C0  (EWD)  Lcl Device: EW_TULIP  Lcl LAN Address: 00-00-F8-31-12-00
Rmt Name: EWD         Rmt Device: EW_TULIP  Rmt LAN Address: 00-00-F8-31-16-3C
Rmt Seq #: 001A   Open:16-MAY-1997 12:01:54.76  Closed:16-MAY-1997 12:01:53.36
------- Transmit ------  ------- Receive -------  ----- Channel Selection ----
Lcl CH Seq #       001A  Msg Rcv          792478  Average Xmt Time    00F2CF62
Msg Xmt          673621    Mcast Msgs       2958  Remote Buffer Size      1412
  Ctrl Msgs          69    Mcast Bytes    289884  Max Buffer Size         1412
  Ctrl Bytes       6762    Ctrl Msgs          30  Best Channel             601
Bytes Xmt     115673308    Ctrl Bytes       2940  Preferred Channel        725
Rmt Ring Size        16  Bytes Rcv     134334772  Retransmit Penalty       266
---------------  Channel Errors  ---------------  Xmt Error Penalty          0
Handshake TMO         0  Short CC Msgs         0  ------- Channel Timer ------
Listen TMO            0  Incompat Chan         0  Timer Entry Flink   9431F074
Bad Authorize         0  No MSCP Srvr          0              Blink   9431F074
Bad ECO               0  Disk Not Srvd         0  Last Ring Index           11
Bad Multicast         0  Old TR Msgs           0  Protocol               1.4.0
Topology Change       0                           Supported Services  00000000
 -- Active Channel (CH:82741E00) for Virtual Circuit (VC:826D3800) DUFFY  --
State: 0004 open                Status: 0B path,open,rmt_hwa_valid
BUS: 8255BD80  (EWC)  Lcl Device: EW_TULIP  Lcl LAN Address: 00-00-F8-31-1C-78
Rmt Name: EWC         Rmt Device: EW_TULIP  Rmt LAN Address: 00-00-F8-31-15-71
Rmt Seq #: 00C9   Open:16-MAY-1997 12:02:00.95  Closed:16-MAY-1997 12:01:53.36
------- Transmit ------  ------- Receive -------  ----- Channel Selection ----
Lcl CH Seq #       00C9  Msg Rcv         4727525  Average Xmt Time    00F2CFC6
Msg Xmt         4216468    Mcast Msgs      15004  Remote Buffer Size      1412
  Ctrl Msgs         752    Mcast Bytes   1470392  Max Buffer Size         1412
  Ctrl Bytes      73696    Ctrl Msgs         266  Best Channel            2813
Bytes Xmt    1182346528    Ctrl Bytes      26068  Preferred Channel       3456
Rmt Ring Size        16  Bytes Rcv    1712190923  Retransmit Penalty      4381
---------------  Channel Errors  ---------------  Xmt Error Penalty          0
Handshake TMO         0  Short CC Msgs         0  ------- Channel Timer ------
Listen TMO            0  Incompat Chan         0  Timer Entry Flink   9431F014
Bad Authorize         0  No MSCP Srvr          0              Blink   8252BF00
Bad ECO               0  Disk Not Srvd         0  Last Ring Index           05
Bad Multicast         0  Old TR Msgs           0  Protocol               1.4.0
Topology Change       0                           Supported Services  00000000
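The two channel displays above can be compared directly to see how the DUFFY virtual circuit's traffic is split between EWC and EWD. The sketch below is only arithmetic on the printed counters; the names CHANNELS and channel_share are made up for illustration, and it assumes each channel's counters run from when that channel was last opened or zeroed, so they need not add up to the VC totals.

```python
# Channel counters copied from the SDA channel displays for VC 826D3800 (DUFFY).
# Assumption: each channel's counters start when that channel was opened/zeroed,
# so they are compared with each other only, not against the VC totals.
CHANNELS = {
    "EWC": {"msg_xmt": 4216468, "bytes_xmt": 1182346528,
            "msg_rcv": 4727525, "bytes_rcv": 1712190923},
    "EWD": {"msg_xmt": 673621, "bytes_xmt": 115673308,
            "msg_rcv": 792478, "bytes_rcv": 134334772},
}


def channel_share(counter: str) -> dict:
    """Return each channel's fraction of the combined value of one counter."""
    total = sum(ch[counter] for ch in CHANNELS.values())
    return {name: ch[counter] / total for name, ch in CHANNELS.items()}


if __name__ == "__main__":
    for counter in ("msg_xmt", "bytes_xmt", "msg_rcv", "bytes_rcv"):
        shares = channel_share(counter)
        print(counter, {name: f"{frac:.1%}" for name, frac in shares.items()})
```

With these numbers EWC accounts for roughly 86% of the messages and 91% of the bytes transmitted on the two channels.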

LAN Data Structures
-------------------
              -- EWC Counters Information 16-MAY-1997 13:47:33 --

Octets received           1798172886    Octets sent               1262759314
PDUs received                4778978    PDUs sent                    4296028
Mcast octets received        1732892    Mcast octets sent            1692454
Mcast PDUs received            15202    Mcast PDUs sent                14821
Unrec indiv dest PDUs              0    PDUs sent, deferred            50763
Unrec mcast dest PDUs              0    PDUs sent, one coll             4686
Data overruns                      0    PDUs sent, mul coll             4846
Unavail station buffs              0    Excessive collisions               0
Unavail user buffers               0    Late collisions                    0
CRC errors                         0    Carrier check failure           3924
Alignment errors                   0    Last carrier failure 16-MAY 12:02:00
Rcv data length err                0    Coll detect chk fail               0
Frame size errors                  0    Short circuit failure              0
Frames too long                    0    Open circuit failure               0
Seconds since zeroed           35573    Transmits too long                 0
Station failures                   0    Send data length err               0
           -- EWC Counters Information (cont) 16-MAY-1997 13:47:33 --

No work transmits             384321    Ring avail transitions             0
Buffer_Addr transmits              0    Ring unavail transitions           0
SVAPTE/BOFF transmits              0    Loopback sent                      0
Global page transmits              0    System ID sent                   118
Bad PTE transmits                  0    ReqCounters sent                   0
Restart pending counter            0    Internal counters size            88
+00 Transmit underflows            0    +2C LW align (map buffer)          0
+04 Transmit length err            0    +30 LCarrier workarounds           0
+08 Receive overflows              0    +34 Xmt 2-adr requests        158279
+0C Receive collision              0    +38 Device interrupts        9091644
+10 Device startups              127    +3C BNC selections done            0
+14 Setup buffers                128    +40 FDX selection changes          0
+18 CSR6 changes                 254    +44 Init nomap/map          00010000
+1C PTEtoPFN translations          0    +48 Xmt segments mapped        10083
+20 CSR13 (AUI=bit3 set)H   FFFFFFFF    +4C Rcv buffers mapped             0
+24 LW align (1 segment)           0    +50 Soft errors handled          126
+28 LW align (2 segment)           0    +54 NWAY resets handled            0
         -- EWC1 60-07 (SCA) Counters Information 16-MAY-1997 13:48:34 --

Octets received           1790288142    Octets sent               1324649660
PDUs received                4792340    PDUs sent                    4304742
Mcast octets received        1705312    Mcast octets sent            1884928
Mcast PDUs received            15226    Mcast PDUs sent                14726
Unavail user buffer                0    Multicast not enabled              0
Last UUB time                   None    User buffer too small              0
              -- EWD Counters Information 16-MAY-1997 13:49:14 --

Octets received            159359849    Octets sent                134448240
PDUs received                 858901    PDUs sent                     717848
Mcast octets received         364914    Mcast octets sent             366388
Mcast PDUs received             3201    Mcast PDUs sent                 3208
Unrec indiv dest PDUs              0    PDUs sent, deferred             1841
Unrec mcast dest PDUs              0    PDUs sent, one coll               72
Data overruns                      0    PDUs sent, mul coll               60
Unavail station buffs              0    Excessive collisions               0
Unavail user buffers               0    Late collisions                    0
CRC errors                         0    Carrier check failure         235986
Alignment errors                   0    Last carrier failure 16-MAY 12:01:36
Rcv data length err                0    Coll detect chk fail               0
Frame size errors                  0    Short circuit failure              0
Frames too long                    0    Open circuit failure               0
Seconds since zeroed           35674    Transmits too long                 0
Station failures                   0    Send data length err               0
           -- EWD Counters Information (cont) 16-MAY-1997 13:49:14 --

No work transmits               2672    Ring avail transitions             0
Buffer_Addr transmits              0    Ring unavail transitions           0
SVAPTE/BOFF transmits              0    Loopback sent                      0
Global page transmits              0    System ID sent                   118
Bad PTE transmits                  0    ReqCounters sent                   0
Restart pending counter            0    Internal counters size            88
+00 Transmit underflows            0    +2C LW align (map buffer)          0
+04 Transmit length err            0    +30 LCarrier workarounds           0
+08 Receive overflows              0    +34 Xmt 2-adr requests           179
+0C Receive collision              0    +38 Device interrupts        1815879
+10 Device startups                8    +3C BNC selections done            0
+14 Setup buffers                 11    +40 FDX selection changes          0
+18 CSR6 changes                  16    +44 Init nomap/map          00010000
+1C PTEtoPFN translations          0    +48 Xmt segments mapped          627
+20 CSR13 (AUI=bit3 set)H   FFFFFFFF    +4C Rcv buffers mapped             0
+24 LW align (1 segment)           0    +50 Soft errors handled            7
+28 LW align (2 segment)           0    +54 NWAY resets handled            0
         -- EWD4 60-07 (SCA) Counters Information 16-MAY-1997 13:50:10 --

Octets received            158574793    Octets sent                145314215
PDUs received                 864042    PDUs sent                     721934
Mcast octets received         361200    Mcast octets sent             410368
Mcast PDUs received             3225    Mcast PDUs sent                 3206
Unavail user buffer                0    Multicast not enabled              0
Last UUB time                   None    User buffer too small              0
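The raw LAN counters are easier to compare between EWC and EWD when expressed as fractions of PDUs sent. This is a rough sketch: LAN_COUNTERS and transmit_ratios are hypothetical names, and it assumes each counted event (deferral, collision, carrier check failure) affected one transmitted PDU, which may not be exactly how the driver counts.

```python
# Transmit-side counters copied from the EWC and EWD "Counters Information" displays.
LAN_COUNTERS = {
    "EWC": {"pdus_sent": 4296028, "deferred": 50763, "one_coll": 4686,
            "mul_coll": 4846, "carrier_fail": 3924, "seconds": 35573},
    "EWD": {"pdus_sent": 717848, "deferred": 1841, "one_coll": 72,
            "mul_coll": 60, "carrier_fail": 235986, "seconds": 35674},
}


def transmit_ratios(c: dict) -> dict:
    """Express transmit-side problem counters as fractions of PDUs sent.

    Assumption: one counted event per affected PDU; treat results as rough indicators.
    """
    sent = c["pdus_sent"]
    return {
        "deferred": c["deferred"] / sent,
        "collided": (c["one_coll"] + c["mul_coll"]) / sent,
        "carrier_fail": c["carrier_fail"] / sent,
        "carrier_fail_per_sec": c["carrier_fail"] / c["seconds"],
    }


if __name__ == "__main__":
    for dev, counters in LAN_COUNTERS.items():
        print(dev, {k: f"{v:.4f}" for k, v in transmit_ratios(counters).items()})
```

On these figures about 1.2% of EWC transmits were deferred, while EWD logged a carrier check failure for roughly a third of its PDUs sent; part of that may date from before EWD was cabled at 11:30, since the counters were zeroed about ten hours earlier.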


I connected EWD with a cable at 11:30.

I think the Ethernet interfaces may be saturated.

Any suggestions are welcome.
If you need more information, let me know.

Thanks and regards
Goyo Fdez
  
    

5311.1. "A TurboLaser BACKUP Over Ethernet? Ye-hah!" by XDELTA::HOFFMAN (Steve, OpenVMS Engineering) Fri May 16 1997 12:20

   If the customer is using a VMScluster with multi-host SCSI, this
   indicates multiple Alpha systems are in use.  And all available
   Alpha systems have network controllers that can each easily operate
   an Ethernet network right at the rated per-node performance limit.

   When someone here in OpenVMS Engineering mistakenly starts using
   a local Ethernet LAN segment heavily, performance seen by all users
   on that segment suffers, and users see noticeable delays in DECnet
   activity due to retransmits.

   At least one of the systems is an AlphaServer 8400.  You will want
   a faster multi-access interconnect network for this VMScluster than
   Ethernet -- memory channel, FDDI, fast Ethernet, CI, etc.  (Classic
   Ethernet is nearly insufficient for a quiescent TurboLaser.  :-)

   One can look at the deferrals counters in SCS or in DECnet, but one
   needs to zero them to make sure they reflect the current situation,
   and not some historical event.  I strongly suspect you'll find the
   counters will start to climb rapidly when you start using the Ethernet
   for a TurboLaser BACKUP...
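   The point about zeroing can also be handled by differencing two
   snapshots instead. The original note happens to contain two SDA
   port-block displays taken about 98 seconds apart ("Secs Since Last
   Zeroed" 34689 and 34787), so current rates can be computed from them.
   A minimal sketch; BEFORE, AFTER and rates are illustrative names, and
   "Secs Since Last Zeroed" is assumed to be a usable clock for the
   interval:

```python
# Two port-block snapshots from the SDA displays in 5311.0, about 98 s apart.
BEFORE = {"secs_since_zeroed": 34689, "scs_messages": 7848730,
          "sbuf_allocs": 6275760, "lbuf_allocs": 436682,
          "fork_count": 8211851, "vc_queue_cnt": 407722}
AFTER = {"secs_since_zeroed": 34787, "scs_messages": 7888695,
         "sbuf_allocs": 6304893, "lbuf_allocs": 436682,
         "fork_count": 8251707, "vc_queue_cnt": 407921}


def rates(before: dict, after: dict, clock: str = "secs_since_zeroed") -> dict:
    """Per-second rate of each counter between two snapshots."""
    elapsed = after[clock] - before[clock]
    if elapsed <= 0:
        # A non-increasing clock means the counters were zeroed in between.
        raise ValueError("snapshots out of order or counters zeroed in between")
    return {name: (after[name] - before[name]) / elapsed
            for name in before if name != clock}


if __name__ == "__main__":
    for name, per_sec in rates(BEFORE, AFTER).items():
        print(f"{name:14s} {per_sec:10.1f}/s")
```

   For this window the SCS message rate works out to about 408 per
   second, a figure the cumulative totals alone would not reveal.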

   It is also possible that there is some other problem on the LAN
   segment or the local Ethernet; the error log entries would be
   interesting.  But -- given the high I/O load -- I'd expect this
   was due simply to overloading the segment. 

	--

   Beware: tape devices are not allowed on shared SCSI busses.

5311.2. "100 Line speed (megabits/second)" by LATINA::GREGORIO (MCS Madrid) Mon May 19 1997 07:13

Hello,


There are two AlphaServer 8400 systems with five Ethernet boards.


	The customer has two direct connections using DE500-AX adapters:

	------- Fast Ethernet     -------
ewc	|     |-------------------|     |
	|     |                   |     |
	|     | Fast Ethernet     |     |
ewd	|     |-------------------|     |
	|     |                   |     |
	-------                   -------


	Only SCA communication software runs over these interfaces.

	Each pair of interfaces is connected directly by a single cable, with no other network equipment in between.

LANCP> sh dev ewc/para

Device Parameters EWC0:
             Value  Parameter
             -----  ---------
            Normal  Controller mode
          External  Internal loopback mode
 00-00-F8-31-1C-78  Hardware LAN address
           CSMA/CD  Communication medium
                32  Minimum receive buffers
                32  Maximum receive buffers
                No  Full duplex enable
                No  Full duplex operational
       TwistedPair  Line media
               100  Line speed (megabits/second)
LANCP> sh dev ewd/para

Device Parameters EWD0:
             Value  Parameter
             -----  ---------
            Normal  Controller mode
          External  Internal loopback mode
 00-00-F8-31-12-00  Hardware LAN address
           CSMA/CD  Communication medium
                32  Minimum receive buffers
                32  Maximum receive buffers
                No  Full duplex enable
                No  Full duplex operational
       TwistedPair  Line media
               100  Line speed (megabits/second)

The connections run at 100 megabits/second. Is saturation possible?
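One way to put a number on the saturation question is the average load since the counters were last zeroed, using the octet counters from 5311.0 and the 100 Mbit/s line speed above. This is only a lower bound: BACKUP traffic is bursty, and if the octet counters are 32-bit (an assumption) they can hold only about 344 seconds of continuous 100 Mbit/s traffic, so a wrapped or latched counter would understate the load. average_utilization is a hypothetical helper name.

```python
LINE_SPEED_BPS = 100_000_000  # LANCP: "100  Line speed (megabits/second)"


def average_utilization(octets_rcv: int, octets_sent: int, seconds: int) -> float:
    """Average fraction of a half-duplex link used since the counters were zeroed.

    LANCP shows "Full duplex operational: No", so transmit and receive share one
    100 Mbit/s medium and are added together. Assumes the octet counters have not
    wrapped or latched; if they have, the result understates the real load.
    """
    bits = (octets_rcv + octets_sent) * 8
    return bits / seconds / LINE_SPEED_BPS


# Octets received, octets sent, seconds since zeroed: EWC and EWD displays in 5311.0.
ewc = average_utilization(1798172886, 1262759314, 35573)
ewd = average_utilization(159359849, 134448240, 35674)
print(f"EWC {ewc:.2%}, EWD {ewd:.3%}")
```

A long-run average well under 1% does not rule out saturation during the backup itself; sampling the counters before and after a short BACKUP run would say more.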

When we run a backup over Ethernet, we get this error and the system
becomes very, very slow:
 ERROR SEQUENCE 5830.                            LOGGED ON:  CPU_TYPE 00000007
 DATE/TIME 16-MAY-1997 01:18:52.89                            SYS_TYPE 0000000C
 SYSTEM UPTIME: 0 DAYS 00:20:01
 SCS NODE: BUNNY                                            OpenVMS AXP V6.2-1H3

 HW_MODEL: 00000621 Hardware Model = 1569.

 ERL$LOGMESSAGE AlphaServer 8400 Model EV56/440

 NI-SCS SUB-SYSTEM, _BUNNY$PEA0:

       PORT HAS CLOSED VIRTUAL CIRCUIT

       LOCAL STATION ADDRESS, FFFFFFFFFF00(X)
       LOCAL SYSTEM ID, 00000000041F(X)

       REMOTE STATION ADDRESS, 0000000000DE(X)
       REMOTE SYSTEM ID, 00000000041E(X)

       UCB$L_ERTCNT    00000032
                                       50. RETRIES REMAINING
       UCB$L_ERTMAX    00000032
                                       50. RETRIES ALLOWABLE
       UCB$L_ERRCNT    00000001
                                          1. ERRORS THIS UNIT
       PPD$B_PORT            00
                                       REMOTE NODE # 0.
       PPD$B_STATUS          00
       PPD$B_OPC             00
                                       UNKNOWN OPCODE
       PPD$B_FLAGS           00
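The hex fields in this error log entry can be cross-checked against the SDA virtual circuit summary in 5311.0: 41F and 41E are the SCS IDs of BUNNY (1055) and DUFFY (1054). The sketch below decodes them; ERRLOG, VC_SUMMARY, decode and node_with are made-up names. That the REMOTE STATION ADDRESS (DE, 222) equals DUFFY's Lcl ID 222/DE is only an observation, and it assumes the local VC index did not change between the 01:18 error and the later SDA display.

```python
# Hex fields from the PEA0 "PORT HAS CLOSED VIRTUAL CIRCUIT" entry.
ERRLOG = {
    "LOCAL SYSTEM ID": "00000000041F",
    "REMOTE SYSTEM ID": "00000000041E",
    "REMOTE STATION ADDRESS": "0000000000DE",
}
# SCS ID and local VC index from the SDA "Virtual Circuit (VC) Summary" in 5311.0.
VC_SUMMARY = {"BUNNY": {"scs_id": 1055, "lcl_id": 223},
              "DUFFY": {"scs_id": 1054, "lcl_id": 222}}


def decode(fields: dict) -> dict:
    """Convert hexadecimal error-log fields to integers."""
    return {name: int(value, 16) for name, value in fields.items()}


def node_with(key: str, value: int):
    """Return the node whose VC-summary entry has the given value, or None."""
    for node, info in VC_SUMMARY.items():
        if info[key] == value:
            return node
    return None


decoded = decode(ERRLOG)
print("local :", node_with("scs_id", decoded["LOCAL SYSTEM ID"]))
print("remote:", node_with("scs_id", decoded["REMOTE SYSTEM ID"]))
print("Lcl ID match:", node_with("lcl_id", decoded["REMOTE STATION ADDRESS"]))
```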



The customer runs the BACKUP command with /BLOCKSIZE=65534:

$  mount /foreign /cache=tape_data/blocksize=65534/noassist  'p22'
$cualifiers:=/SAVE_SET/OWNER=ORIGINAL/IGNORE=(INTERLOCK,LABEL)/blocksize=65534
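To get a feel for what one 65534-byte tape block means on the cluster LAN, it can be divided by the 1412-byte "Buffer Size" shown in the SDA channel data. This is a sketch under a stated assumption: that the data for each tape block crosses the LAN in pieces no larger than that buffer size, ignoring protocol headers; actual PEDRIVER framing may differ. segments_per_block is a hypothetical helper.

```python
def segments_per_block(block_size: int, buffer_size: int) -> tuple:
    """Number of buffer-sized pieces needed for one block, and the size of the last one."""
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")
    full, remainder = divmod(block_size, buffer_size)
    if remainder:
        return full + 1, remainder
    return full, buffer_size


# 65534 = /BLOCKSIZE used by the customer; 1412 = channel "Buffer Size" in SDA.
print(segments_per_block(65534, 1412))
```

Under that assumption, each tape block becomes 47 LAN messages (46 full ones plus a final 582-byte piece).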

In July, the customer will upgrade these systems to V7.1 and install Memory Channel,
but for now he needs to run this backup over Ethernet.

Any suggestions are welcome.


Thanks and regards.
Goyo Fdez.

    
5311.3. by UTRTSC::16.198.64.201::JurVanDerBurg (Change mode to Panic!) Mon May 19 1997 09:28

The counters indicate a lot of "Carrier check failures" which may be
due to hardware problems.

Jur.

5311.4. "Carrier check failures" by LATINA::GREGORIO (MCS Madrid) Mon May 19 1997 10:44

    Hello Jur,
    
    The "Carrier check failures" are due to loss of connection on the cable.
    
    I chose these interfaces because they were not in use, and I connected
    these boards with a cable.
    
    Thanks and regards
    Goyo Fdez.