Title: | + OpenVMS Clusters - The best clusters in the world! + |
Notice: | This conference is COMPANY CONFIDENTIAL. See #1.3 |
Moderator: | PROXY::MOORE |
|
Created: | Fri Aug 26 1988 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 5320 |
Total number of notes: | 23384 |
5311.0. "SEVERAL ERRORS IN PEA0 in a SCSI Openvms cluster" by LATINA::GREGORIO (MCS Madrid) Fri May 16 1997 09:07
Hello,
I have a customer with a SCSI OpenVMS Cluster running V6.2-1H3.
When he runs a backup from node 1 to a tape on node 2, he
gets several errors on PEA0 and node 1 runs very slowly.
Node 1: BUNNY
Node 2: DUFFY
Tape on node 2:
Magtape $1$MKF500: (DUFFY), device type TZ88, is online, record-oriented device,
file-oriented device, available to cluster, error logging is enabled,
controller supports compaction (compaction enabled,).
Error count 0 Operations completed 13120
Owner process "" Owner UIC [SYSTEST,SYSTEM]
Owner process ID 00000000 Dev Prot S:RWPL,O:RWPL,G:R,W
Reference count 0 Default buffer size 512
Density unknown Format Normal-11
Host name "DUFFY" Host type, avail AlphaServer 8400 Model
5/300, yes
Allocation class 1
Volume status: no-unload on dismount, odd parity.
Errors in Node 1:
$ sh err
Device Error Count
PEA0: 205
The customer currently has two Ethernet connections available, on EWC and EWD.
VMScluster data structures
--------------------------
--- Port Descriptor Table (PDT) 8251AE88 ---
Type: 03 pe
Characteristics: 0000
Msg Header Size 32 Flags 0000 Port Map 00000000
Max Xfer Bcnt FFFFFFFF Counter CDRP 00000000
Poller Sweep 30 Load Vector 8246B04C
Fork Block W.Q. 8251AF60 Load Class 10
UCB Address 8251AA80 Connection W.Q. 82537654
ADP Address 00000000 Yellow Q. 8251AFB8
Max VC timeout 16 Red Q. 8251AFC0
SCS Version 2 Disabled Q. 82484DA0
--------------------------
--- Port Block 8251B840 ---
Status: 0001 authorize
VC Count: 2
Secs Since Last Zeroed: 34689
SBUF Size 484 LBUF Size 1816 Fork Count 8211851
SBUF Count 10 LBUF Count 1 Refork Count 8
SBUF Max 768 LBUF Max 384 Last Refork 00215EF9
SBUF Quo 10 LBUF Quo 1 SCS Messages 7848730
SBUF Miss 121 LBUF Miss 28322 VC Queue Cnt 407722
SBUF Allocs 6275760 LBUF Allocs 436682 TQE Received 346898
SBUFs In Use 0 LBUFs In Use 0 Timer Done 346898
Peak SBUF In Use 29 Peak LBUF In Use 23 RWAITQ Count 80462
SBUF Queue Empty 0 LBUF Queue Empty 0 LDL Buf/Msg 14807
TR SBUF Queue Empty 0 Ticks/Second 10 ACK Delay 1000000
No SBUF for ACK 0 Listen Timeout 8 Hello Interval 30
VMScluster data structures
--------------------------
Bus Addr Bus LAN Address Error Count Last Error Time of Last Error
-------- --- ----------------- ----------- ---------- -----------------------
82465000 LCL 00-00-00-00-00-00 0
8255BD80 EWC 00-00-F8-31-1C-78 3924 0000204C 16-MAY-1997 12:02:00.33
84ADD6C0 EWD 00-00-F8-31-12-00 61 0000204C 16-MAY-1997 12:01:36.31
--- Virtual Circuit (VC) Summary ---
VC Addr Node SCS ID Lcl ID Status Summary Last Event Time
-------- -------- ------ ------ ----------------- -----------------------
82592740 BUNNY 1055 223/DF open,path 16-MAY-1997 03:54:05.78
826D3800 DUFFY 1054 222/DE open,path 16-MAY-1997 12:02:01.00
SDA> show port/add=8251AE88/vc=826D3800/chann/dev
VMScluster data structures
--------------------------
--- Port Descriptor Table (PDT) 8251AE88 ---
Type: 03 pe
Characteristics: 0000
Msg Header Size 32 Flags 0000 Port Map 00000000
Max Xfer Bcnt FFFFFFFF Counter CDRP 00000000
Poller Sweep 30 Load Vector 8246B04C
Fork Block W.Q. 8251AF60 Load Class 10
UCB Address 8251AA80 Connection W.Q. 82537654
ADP Address 00000000 Yellow Q. 8251AFB8
Max VC timeout 16 Red Q. 8251AFC0
SCS Version 2 Disabled Q. 82484DA0
--- Port Block 8251B840 ---
Status: 0001 authorize
VC Count: 2
Secs Since Last Zeroed: 34787
SBUF Size 484 LBUF Size 1816 Fork Count 8251707
SBUF Count 10 LBUF Count 1 Refork Count 8
SBUF Max 768 LBUF Max 384 Last Refork 00215EF9
SBUF Quo 10 LBUF Quo 1 SCS Messages 7888695
SBUF Miss 121 LBUF Miss 28322 VC Queue Cnt 407921
SBUF Allocs 6304893 LBUF Allocs 436682 TQE Received 347879
SBUFs In Use 0 LBUFs In Use 0 Timer Done 347879
Peak SBUF In Use 29 Peak LBUF In Use 23 RWAITQ Count 80464
SBUF Queue Empty 0 LBUF Queue Empty 0 LDL Buf/Msg 14849
TR SBUF Queue Empty 0 Ticks/Second 10 ACK Delay 1000000
No SBUF for ACK 0 Listen Timeout 8 Hello Interval 30
Bus Addr Bus LAN Address Error Count Last Error Time of Last Error
-------- --- ----------------- ----------- ---------- -----------------------
82465000 LCL 00-00-00-00-00-00 0
8255BD80 EWC 00-00-F8-31-1C-78 3924 0000204C 16-MAY-1997 12:02:00.33
84ADD6C0 EWD 00-00-F8-31-12-00 61 0000204C 16-MAY-1997 12:01:36.31
--- Virtual Circuit (VC) 826D3800 ---
Remote System Name: DUFFY (1:ALPHA) Remote SCSSYSTEMID: 1054
Local System ID: 222 (DE) Status: 0005 open,path
------ Transmit ------- ------ VC Closures ---- ---- Congestion Control ----
Msg Xmt 6766349 SeqMsg TMO 200 Pipe Quota/Slo/Max 16/ 8/16
Unsequence 403 CC DFQ Empty 0 Pipe Quota Reached 17750
Sequence 6444945 Topology Change 0 Xmt C/T 0/1024
ReXmt 851/7046 NPAGEDYN Low 0 RndTrp uS 4363+3095
Lone ACK 320150 UnAcked Msgs 1
Bytes Xmt 1642636555 CMD Queue Len/Max 0/31
------- Receive ------- - Messages Discarded - ----- Channel Selection ----
Msg Rcv 6803466 No Xmt Chan 0 Preferred Channel 84C0DA00
Unsequence 403 Rcv Short Msg 0 Delay Time 00F2CE4A
Sequence 6669065 Illegal Seq Msg 482 Buffer Size 1412
ReRcv 752 Bad Checksum 0 Channel Count 2
Lone ACK 133162 TR DFQ Empty 0 Channel Selections 6961
Cache 285 TR MFQ Empty 0 Protocol 1.4.0
Ill ACK 0 CC MFQ Empty 0 Open 16-MAY-1997 12:02:01.00
Bytes Rcv 2079725596 Cache Miss 0 Cls 16-MAY-1997 12:01:53.36
-- Preferred Channel (CH:84C0DA00) for Virtual Circuit (VC:826D3800) DUFFY --
State: 0004 open Status: 0B path,open,rmt_hwa_valid
BUS: 84ADD6C0 (EWD) Lcl Device: EW_TULIP Lcl LAN Address: 00-00-F8-31-12-00
Rmt Name: EWD Rmt Device: EW_TULIP Rmt LAN Address: 00-00-F8-31-16-3C
Rmt Seq #: 001A Open:16-MAY-1997 12:01:54.76 Closed:16-MAY-1997 12:01:53.36
------- Transmit ------ ------- Receive ------- ----- Channel Selection ----
Lcl CH Seq # 001A Msg Rcv 792478 Average Xmt Time 00F2CF62
Msg Xmt 673621 Mcast Msgs 2958 Remote Buffer Size 1412
Ctrl Msgs 69 Mcast Bytes 289884 Max Buffer Size 1412
Ctrl Bytes 6762 Ctrl Msgs 30 Best Channel 601
Bytes Xmt 115673308 Ctrl Bytes 2940 Preferred Channel 725
Rmt Ring Size 16 Bytes Rcv 134334772 Retransmit Penalty 266
--------------- Channel Errors --------------- Xmt Error Penalty 0
Handshake TMO 0 Short CC Msgs 0 ------- Channel Timer ------
Listen TMO 0 Incompat Chan 0 Timer Entry Flink 9431F074
Bad Authorize 0 No MSCP Srvr 0 Blink 9431F074
Bad ECO 0 Disk Not Srvd 0 Last Ring Index 11
Bad Multicast 0 Old TR Msgs 0 Protocol 1.4.0
Topology Change 0 Supported Services 00000000
-- Active Channel (CH:82741E00) for Virtual Circuit (VC:826D3800) DUFFY --
State: 0004 open Status: 0B path,open,rmt_hwa_valid
BUS: 8255BD80 (EWC) Lcl Device: EW_TULIP Lcl LAN Address: 00-00-F8-31-1C-78
Rmt Name: EWC Rmt Device: EW_TULIP Rmt LAN Address: 00-00-F8-31-15-71
Rmt Seq #: 00C9 Open:16-MAY-1997 12:02:00.95 Closed:16-MAY-1997 12:01:53.36
------- Transmit ------ ------- Receive ------- ----- Channel Selection ----
Lcl CH Seq # 00C9 Msg Rcv 4727525 Average Xmt Time 00F2CFC6
Msg Xmt 4216468 Mcast Msgs 15004 Remote Buffer Size 1412
Ctrl Msgs 752 Mcast Bytes 1470392 Max Buffer Size 1412
Ctrl Bytes 73696 Ctrl Msgs 266 Best Channel 2813
Bytes Xmt 1182346528 Ctrl Bytes 26068 Preferred Channel 3456
Rmt Ring Size 16 Bytes Rcv 1712190923 Retransmit Penalty 4381
--------------- Channel Errors --------------- Xmt Error Penalty 0
Handshake TMO 0 Short CC Msgs 0 ------- Channel Timer ------
Listen TMO 0 Incompat Chan 0 Timer Entry Flink 9431F014
Bad Authorize 0 No MSCP Srvr 0 Blink 8252BF00
Bad ECO 0 Disk Not Srvd 0 Last Ring Index 05
Bad Multicast 0 Old TR Msgs 0 Protocol 1.4.0
Topology Change 0 Supported Services 00000000
LAN Data Structures
-------------------
-- EWC Counters Information 16-MAY-1997 13:47:33 --
Octets received 1798172886 Octets sent 1262759314
PDUs received 4778978 PDUs sent 4296028
Mcast octets received 1732892 Mcast octets sent 1692454
Mcast PDUs received 15202 Mcast PDUs sent 14821
Unrec indiv dest PDUs 0 PDUs sent, deferred 50763
Unrec mcast dest PDUs 0 PDUs sent, one coll 4686
Data overruns 0 PDUs sent, mul coll 4846
Unavail station buffs 0 Excessive collisions 0
Unavail user buffers 0 Late collisions 0
CRC errors 0 Carrier check failure 3924
Alignment errors 0 Last carrier failure 16-MAY 12:02:00
Rcv data length err 0 Coll detect chk fail 0
Frame size errors 0 Short circuit failure 0
Frames too long 0 Open circuit failure 0
Seconds since zeroed 35573 Transmits too long 0
Station failures 0 Send data length err 0
-- EWC Counters Information (cont) 16-MAY-1997 13:47:33 --
No work transmits 384321 Ring avail transitions 0
Buffer_Addr transmits 0 Ring unavail transitions 0
SVAPTE/BOFF transmits 0 Loopback sent 0
Global page transmits 0 System ID sent 118
Bad PTE transmits 0 ReqCounters sent 0
Restart pending counter 0 Internal counters size 88
+00 Transmit underflows 0 +2C LW align (map buffer) 0
+04 Transmit length err 0 +30 LCarrier workarounds 0
+08 Receive overflows 0 +34 Xmt 2-adr requests 158279
+0C Receive collision 0 +38 Device interrupts 9091644
+10 Device startups 127 +3C BNC selections done 0
+14 Setup buffers 128 +40 FDX selection changes 0
+18 CSR6 changes 254 +44 Init nomap/map 00010000
+1C PTEtoPFN translations 0 +48 Xmt segments mapped 10083
+20 CSR13 (AUI=bit3 set)H FFFFFFFF +4C Rcv buffers mapped 0
+24 LW align (1 segment) 0 +50 Soft errors handled 126
+28 LW align (2 segment) 0 +54 NWAY resets handled 0
-- EWC1 60-07 (SCA) Counters Information 16-MAY-1997 13:48:34 --
Octets received 1790288142 Octets sent 1324649660
PDUs received 4792340 PDUs sent 4304742
Mcast octets received 1705312 Mcast octets sent 1884928
Mcast PDUs received 15226 Mcast PDUs sent 14726
Unavail user buffer 0 Multicast not enabled 0
Last UUB time None User buffer too small 0
-- EWD Counters Information 16-MAY-1997 13:49:14 --
Octets received 159359849 Octets sent 134448240
PDUs received 858901 PDUs sent 717848
Mcast octets received 364914 Mcast octets sent 366388
Mcast PDUs received 3201 Mcast PDUs sent 3208
Unrec indiv dest PDUs 0 PDUs sent, deferred 1841
Unrec mcast dest PDUs 0 PDUs sent, one coll 72
Data overruns 0 PDUs sent, mul coll 60
Unavail station buffs 0 Excessive collisions 0
Unavail user buffers 0 Late collisions 0
CRC errors 0 Carrier check failure 235986
Alignment errors 0 Last carrier failure 16-MAY 12:01:36
Rcv data length err 0 Coll detect chk fail 0
Frame size errors 0 Short circuit failure 0
Frames too long 0 Open circuit failure 0
Seconds since zeroed 35674 Transmits too long 0
Station failures 0 Send data length err 0
-- EWD Counters Information (cont) 16-MAY-1997 13:49:14 --
No work transmits 2672 Ring avail transitions 0
Buffer_Addr transmits 0 Ring unavail transitions 0
SVAPTE/BOFF transmits 0 Loopback sent 0
Global page transmits 0 System ID sent 118
Bad PTE transmits 0 ReqCounters sent 0
Restart pending counter 0 Internal counters size 88
+00 Transmit underflows 0 +2C LW align (map buffer) 0
+04 Transmit length err 0 +30 LCarrier workarounds 0
+08 Receive overflows 0 +34 Xmt 2-adr requests 179
+0C Receive collision 0 +38 Device interrupts 1815879
+10 Device startups 8 +3C BNC selections done 0
+14 Setup buffers 11 +40 FDX selection changes 0
+18 CSR6 changes 16 +44 Init nomap/map 00010000
+1C PTEtoPFN translations 0 +48 Xmt segments mapped 627
+20 CSR13 (AUI=bit3 set)H FFFFFFFF +4C Rcv buffers mapped 0
+24 LW align (1 segment) 0 +50 Soft errors handled 7
+28 LW align (2 segment) 0 +54 NWAY resets handled 0
-- EWD4 60-07 (SCA) Counters Information 16-MAY-1997 13:50:10 --
Octets received 158574793 Octets sent 145314215
PDUs received 864042 PDUs sent 721934
Mcast octets received 361200 Mcast octets sent 410368
Mcast PDUs received 3225 Mcast PDUs sent 3206
Unavail user buffer 0 Multicast not enabled 0
Last UUB time None User buffer too small 0
I connected EWD with a cable at 11:30.
I think the Ethernet interfaces may be saturated.
Any advice is welcome.
If you need more information, let me know.
Thanks and regards
Goyo Fdez
T.R | Title | User | Personal Name | Date | Lines |
---|
5311.1 | A TurboLaser BACKUP Over Ethernet? Ye-hah! | XDELTA::HOFFMAN | Steve, OpenVMS Engineering | Fri May 16 1997 12:20 | 30 |
|
If the customer is using a VMScluster with multi-host SCSI, this
indicates multiple Alpha systems are in use. And all available
Alpha systems have network controllers that can each easily operate
an Ethernet network right at the rated per-node performance limit.
When someone here in OpenVMS Engineering mistakenly starts using
a local Ethernet LAN segment heavily, performance seen by all users
on that segment suffers, and users see noticeable delays in DECnet
activity due to retransmits.
At least one of the systems is an AlphaServer 8400. You will want
a faster multi-access interconnect network for this VMScluster than
Ethernet -- memory channel, FDDI, fast Ethernet, CI, etc. (Classic
Ethernet is nearly insufficient for a quiescent TurboLaser. :-)
One can look at the deferrals counters in SCS or in DECnet, but one
needs to zero them to make sure they reflect the current situation,
and not some historical event. I strongly suspect you'll find the
counters will start to climb rapidly when you start using the Ethernet
for a TurboLaser BACKUP...
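
As a rough illustration of watching counters climb, two snapshots can be differenced instead of zeroing. The sketch below uses values copied from the two Port Block displays in 5311.0, which were taken 98 seconds apart. The function name `counter_rates` and the dictionary keys are made up for this example; they are not SDA names.

```python
def counter_rates(before, after, seconds_key="secs_since_zeroed"):
    """Return per-second rates for every counter present in both snapshots."""
    elapsed = after[seconds_key] - before[seconds_key]
    if elapsed <= 0:
        raise ValueError("snapshots must be in increasing time order")
    return {
        name: (after[name] - before[name]) / elapsed
        for name in before
        if name != seconds_key and name in after
    }


# Values copied from the two Port Block displays in note 5311.0.
first = {
    "secs_since_zeroed": 34689,
    "scs_messages": 7848730,
    "lbuf_miss": 28322,
    "rwaitq_count": 80462,
}
second = {
    "secs_since_zeroed": 34787,
    "scs_messages": 7888695,
    "lbuf_miss": 28322,
    "rwaitq_count": 80464,
}

rates = counter_rates(first, second)
for name, rate in sorted(rates.items()):
    print(f"{name:15s} {rate:10.2f} per second")
```

Between those two displays SCS Messages rose by 39965 while LBUF Miss did not move, so the snapshots alone do not show buffer misses accumulating during that interval.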
It is also possible that there is some other problem on the LAN
segment or the local Ethernet; the error log entries would be
interesting. But -- given the high I/O load -- I'd expect this
was due simply to overloading the segment.
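
A quick sanity check on the overload theory is to turn the LAN octet counters into an average utilization. This is only a sketch: the octet counts and "Seconds since zeroed" are the EWC values from 5311.0, the line speed is a parameter that must be supplied (an assumption, not something the counters report), and an average over roughly ten hours says nothing about the peak load while BACKUP is running.

```python
def average_utilization(octets_rx, octets_tx, seconds, line_speed_mbps):
    """Average fraction of a shared (half-duplex) link used over the interval."""
    if seconds <= 0 or line_speed_mbps <= 0:
        raise ValueError("seconds and line_speed_mbps must be positive")
    bits_moved = (octets_rx + octets_tx) * 8
    capacity_bits = line_speed_mbps * 1_000_000 * seconds
    return bits_moved / capacity_bits


# EWC counters from note 5311.0.
EWC_OCTETS_RX = 1798172886
EWC_OCTETS_TX = 1262759314
EWC_SECONDS = 35573

for speed in (10, 100):  # assumed line speeds in megabits/second
    util = average_utilization(EWC_OCTETS_RX, EWC_OCTETS_TX, EWC_SECONDS, speed)
    print(f"{speed:3d} Mb/s: {util:.2%} average utilization")
```

To judge saturation, the same calculation would have to be applied to counters zeroed just before the BACKUP starts and read again while it runs.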
--
Beware: tape devices are not allowed on shared SCSI busses.
|
5311.2 | 100 Line speed (megabits/second) | LATINA::GREGORIO | MCS Madrid | Mon May 19 1997 07:13 | 106 |
|
Hello,
There are two AlphaServer 8400 systems with five Ethernet boards.
The customer has two direct connections using DE500-AX adapters:
      -------    Fast Ethernet    -------
 ewc  |     |-------------------|     |
      |     |                   |     |
      |     |    Fast Ethernet  |     |
 ewd  |     |-------------------|     |
      |     |                   |     |
      -------                   -------
Only SCA communication software runs on these interfaces.
Each pair of interfaces is connected directly by a cable, with no other elements in between.
LANCP> sh dev ewc/para
Device Parameters EWC0:
Value Parameter
----- ---------
Normal Controller mode
External Internal loopback mode
00-00-F8-31-1C-78 Hardware LAN address
CSMA/CD Communication medium
32 Minimum receive buffers
32 Maximum receive buffers
No Full duplex enable
No Full duplex operational
TwistedPair Line media
100 Line speed (megabits/second)
LANCP> sh dev ewd/para
Device Parameters EWD0:
Value Parameter
----- ---------
Normal Controller mode
External Internal loopback mode
00-00-F8-31-12-00 Hardware LAN address
CSMA/CD Communication medium
32 Minimum receive buffers
32 Maximum receive buffers
No Full duplex enable
No Full duplex operational
TwistedPair Line media
100 Line speed (megabits/second)
The connections run at 100 megabits/second; is saturation possible?
When we run a backup over Ethernet, we get this error and the system
runs very slowly:
ERROR SEQUENCE 5830. LOGGED ON: CPU_TYPE 00000007
DATE/TIME 16-MAY-1997 01:18:52.89 SYS_TYPE 0000000C
SYSTEM UPTIME: 0 DAYS 00:20:01
SCS NODE: BUNNY OpenVMS AXP V6.2-1H3
HW_MODEL: 00000621 Hardware Model = 1569.
ERL$LOGMESSAGE AlphaServer 8400 Model EV56/440
NI-SCS SUB-SYSTEM, _BUNNY$PEA0:
PORT HAS CLOSED VIRTUAL CIRCUIT
LOCAL STATION ADDRESS, FFFFFFFFFF00(X)
LOCAL SYSTEM ID, 00000000041F(X)
REMOTE STATION ADDRESS, 0000000000DE(X)
REMOTE SYSTEM ID, 00000000041E(X)
UCB$L_ERTCNT 00000032
50. RETRIES REMAINING
UCB$L_ERTMAX 00000032
50. RETRIES ALLOWABLE
UCB$L_ERRCNT 00000001
1. ERRORS THIS UNIT
PPD$B_PORT 00
REMOTE NODE # 0.
PPD$B_STATUS 00
PPD$B_OPC 00
UNKNOWN OPCODE
PPD$B_FLAGS 00
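
The system IDs in this entry are hexadecimal, as the "(X)" suffix suggests and as the 00000032 / "50. RETRIES" pair in the same entry confirms. A small sketch that converts them and matches them against the SCS IDs shown in the VC summary in 5311.0; the helper name `decode_system_id` and the lookup table are made up for this example.

```python
# SCS IDs as listed in the Virtual Circuit summary in note 5311.0.
KNOWN_NODES = {1055: "BUNNY", 1054: "DUFFY"}


def decode_system_id(field):
    """Convert an error-log value such as '00000000041F(X)' to a decimal ID."""
    text = field.strip()
    if text.endswith("(X)"):
        text = text[:-3]
    return int(text, 16)


for label, raw in (("LOCAL", "00000000041F(X)"), ("REMOTE", "00000000041E(X)")):
    scs_id = decode_system_id(raw)
    print(label, scs_id, KNOWN_NODES.get(scs_id, "unknown"))
```

This gives 1055 (BUNNY) for the local system and 1054 (DUFFY) for the remote one, so the entry records BUNNY closing its virtual circuit to DUFFY.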
The customer runs the BACKUP command with /blocksize=65534:
$ mount /foreign /cache=tape_data/blocksize=65534/noassist 'p22'
$cualifiers:=/SAVE_SET/OWNER=ORIGINAL/IGNORE=(INTERLOCK,LABEL)/blocksize=65534
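
For a sense of scale, here is a rough lower bound on how many LAN packets one 65534-byte block needs. It assumes that no packet carries more than the 1412-byte Buffer Size shown for the virtual circuit in 5311.0, and that each BACKUP block crosses the wire as one served tape transfer; both are assumptions for illustration, and protocol headers would push the real count higher.

```python
import math


def min_packets_per_block(block_bytes, buffer_bytes):
    """Lower bound on packets needed if each carries at most buffer_bytes."""
    if block_bytes <= 0 or buffer_bytes <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(block_bytes / buffer_bytes)


BLOCK_SIZE = 65534      # from /blocksize=65534
VC_BUFFER_SIZE = 1412   # "Buffer Size" in the VC display, assumed per-packet limit

print(min_packets_per_block(BLOCK_SIZE, VC_BUFFER_SIZE))  # prints 47
```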
In July, the customer will upgrade these systems to V7.1 and install Memory Channel,
but for now he needs this backup over Ethernet.
Any advice is welcome.
Thanks and regards.
Goyo Fdez.
|
5311.3 | | UTRTSC::16.198.64.201::JurVanDerBurg | Change mode to Panic! | Mon May 19 1997 09:28 | 5 |
| The counters indicate a lot of "Carrier check failures" which may be
due to hardware problems.
Jur.
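
To put "a lot" in proportion, the carrier check failures can be compared with the PDUs sent on each device. The figures are from the EWC and EWD counters in 5311.0. Treating the counter as one failure per transmitted PDU is an assumption, so the ratio is only a rough indicator.

```python
# (carrier check failures, PDUs sent) from the LAN counters in note 5311.0.
COUNTERS = {
    "EWC": (3924, 4296028),
    "EWD": (235986, 717848),
}


def failure_ratio(failures, pdus_sent):
    """Carrier check failures as a fraction of PDUs sent."""
    if pdus_sent <= 0:
        raise ValueError("pdus_sent must be positive")
    return failures / pdus_sent


for device, (failures, sent) in COUNTERS.items():
    print(f"{device}: {failure_ratio(failures, sent):.3%}")
```

EWC comes out below a tenth of a percent and EWD at roughly a third. The EWC carrier check failure count (3924) also equals the error count SDA reports for the EWC bus under PEA0.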
|
5311.4 | "Carrier check failures" | LATINA::GREGORIO | MCS Madrid | Mon May 19 1997 10:44 | 10 |
| Hello Jur,
The "Carrier check failures" are due to loss of connection on the cable.
I selected these interfaces because they were not in use, and I connected
the boards directly with a cable.
Thanks And Regards
Goyo Fdez.
|