I have read nearly half of this and it seems to have something to do
with tapes. I think!! In any case it's extremely interesting.
I think!! Isn't it? (TFB)
***************************************************************
Dear Colleagues....
As a result of a telesupport query, and some probing into the ol'
STARS database, I found the name of someone who seemed to know something about
it, so I asked him!... The result was a blown disk quota, so I thought I'd
share it with you!!
Please observe the point about CSSE needing to manage the restricted
release, and report all cases to Bob Brassard/George White.
Now read on!!
I have been discussing a problem with a field engineer, which appears
to have all the hallmarks of the scenario outlined in a STARS article, for
which various workarounds are indicated.
The "cluster" is a single 8800 with a CIBCI; the hw/sw revision reported by
"show cluster" is 80007, and the HSC RP_REV is 236. CRONIC is 394, VMS is 4.7.
*(HSC RP_REV 236 indicates K.CI microcode 54, which is the latest following
installation of the L0107-YA FCO.)
The HSC supports 4 x RA82s, 1 x TA81, and 7 x S.I. 97Cs on DEC
requestors.
The "problem" is two-edged:
(1) The customer is complaining about long backup times.
(2) Two months ago the HSC started "dropping off line" (Online light going out,
all drive port lights going out, tape would rewind and restart backups).
After a short period it would recover the V.C. (online light came on again,
drive port lights came on again).
This has become more frequent, causing much loss of performance. This morning
there were six observed events in one hour, with no errlog/hsc-console reports.
There were no identifiable changes to either software or hardware corresponding
to the start of the problem, although I haven't yet been able to rule out such
things as changes to backup command files or trainee operators, etc.
There are frequent indications of clock dropouts from the S.I. drives, both
on the HSC console and as HSC datagrams, but these do not correspond in time to
the HSC events, and (as the STARS scenario suggests......)
***** No PAA0, or HSC errors on either VMS or HSC consoles. ****
NO ERRORLOG ENTRIES FOR THE HSC GOING AWAY!!
These S.I. dropouts were occurring before this problem started.
The ERROR threshold on the HSC was FATAL. This has been reset to "INFO",
which immediately gave "vc closure" events corresponding to the HSC online light
going out; however, these didn't indicate the reason (e.g. VC closed due to
timeout of RTNDAT/CNF, etc.).
I have a feeling that I may be missing out on some information which
may have already been published on this topic; if so, I apologise. However, I
would be grateful if you could maybe enlarge on this problem and any fixes
(VMS 5.2??) or point me in the right direction.
Thanks in anticipation......
Dave Clark
UK-CSC
Devices Support (Disks)
*******************************************************************************
From: VOLKS::BRASSARD "Bob B., VAX CSSE, 240-6492, AET 1-1/6" 20-SEP-1989 14:49:22.21
To: BRASSARD
CC:
Subj: FAS UPDATE
7.10 TITLE: CIXXX HSCXX RTNDAT/CNF TIMEOUT VC-CLOSURE: NEW PROBLEMS
DEVICE: HSC50,HSC70,CIXXX (Added: 16-DEC-1988)
CLD #: CXO02335,CXO02677 * PRISM #: (Update: 16-JAN-1989)
(Updated 20-SEP-1989)
*** UPDATE SEP-89 *** VMS-5.2 "BACKUP" UTILITY I/O buffering
enhancements will cause many more/new sites to be impacted
by this VC-CLOSURE (CI CMDQ-0 STARVATION) problem, AS SOON
AS THEY UPGRADE TO VMS-5.2 !! Although an "official" CRONIC
new patch-version fix to the underlying Tape-write "CI buffer-data
priority" problem was intended to be released in NOV-89,
EVAL TESTING errors with this CRONIC (tentatively V-39A) fix are
requiring the solution strategy to be re-evaluated by CI/HSC/VMS
Engineering and CSSE.
!!!! THERE IS CURRENTLY NO ENGINEERING APPROVED WORKAROUND !!!!
!!!! OR PATCH (VMS-PADRIVER, HSC-CRONIC, OR OTHERWISE) !!!!
!!!! FOR VMS-5.2 CUSTOMERS EXPERIENCING THIS PROBLEM; BUT !!!!
!!!! SOME FORM OF WORKAROUND WILL SOON (by NOV-89) BE !!!!
!!!! AVAILABLE. UNTIL THEN, ONLY THE FOLLOWING TWO (2) !!!!
!!!! SUGGESTIONS ARE AVAILABLE FOR VMS-5.2 CUSTOMERS: !!!!
1. DELAY VMS-5.2 UPGRADE UNTIL CSSE PUBLISHES AND OFFERS
A WORKAROUND PATCH (PADRIVER, CRONIC, etc.);
2. REDUCE VMS-5.2 "BACKUP" PROCESS I/O PERFORMANCE BY
LOWERING BACKUP-USER/ACCOUNT "UAF-RECORD" (AUTHORIZE
FILE) I/O QUOTAS SUCH AS "DIOLM, BYTLM". THIS WILL
HAVE TO BE CAREFULLY MANAGED TO AVOID REDUCING VMS-5.2
BACKUP PERFORMANCE TO LEVELS UNACCEPTABLE TO THE CUSTOMER.
!!!! CUSTOMER SITUATIONS WHERE THE ABOVE IS NOT SATISFACTORY !!!!
!!!! SHOULD BE ESCALATED VIA CLD-PROCESS TO VAX CSSE (George !!!!
!!!! White, Bob Brassard) FOR EXCEPTION HANDLING ADVICE. !!!!
!!!! VMS-4.7, 5.0, & 5.1 SITES SHOULD BE MANAGED AS DICTATED !!!!
!!!! BELOW: NO CLD IS REQUIRED; BUT VAX CSSE MUST BE !!!!
!!!! CONTACTED TO MANAGE THE "RESTRICTED DISTRIBUTION" !!!!
!!!! VMS-4.7/5.0/5.1 "PADRIVER" PATCH. !!!!
SYMPTOMS:
*** CUSTOMER SYMPTOM: DURING BACKUP, TAPES REWIND/RESTART,
*** SHADOW-SET COPIES INITIATED, OR DISK-MOUNT-VERIFY !
During heavy file-transfer activity between a CIxxx (CIBCA-A, CIBCA-B,
CIBCI, or CI7x0) and an HSC50/70, such as during disk-tape or disk-disk
BACKUPs, the HSC may initiate a "Virtual-Circuit" (VC) closure with the
CIxxx/VAX-host. Unless HSC "ERROR or OUTBAND-LEVEL" is set to "INFO"
(default is "ERROR"), the "RTNDAT/CNF TIMEOUT" causing the VC-Closure
and the "VC-CLOSURE" itself *** WILL NOT BE REPORTED BY THE HSC *** !
By reducing the HSC ERROR-LEVEL to "INFO", the following general error
messages will be seen:
HSC ERROR-MESSAGES
------------------
Path A has gone from good to bad.
Path B has gone from good to bad.
HOST-W-SEQ 100. xx:xx:xx (time-stamp)
VC closed with node-5 (node-name) due to request from K.CI
DISK-I-SEQ 101. xx:xx:xx
VC closed due to timeout of RTNDAT/CNF from host node-5.
HOST-I-SEQ 102. xx:xx:xx
VC opened with node-5 (node-name).
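To check a saved HSC console log for these events, a small filter can pair each SEQ header with the message line that follows it and count RTNDAT/CNF timeouts per host node. This is a sketch in Python that assumes the log keeps exactly the layout of the sample above (with the xx:xx:xx placeholders replaced by made-up times); the names HscEvent, parse_events and rtndat_timeouts are hypothetical and not part of any HSC or VMS utility.

```python
import re
from dataclasses import dataclass

# Header line, e.g. "HOST-W-SEQ 100. 10:15:02": class, severity letter, sequence, time.
HEADER = re.compile(r"^(?P<cls>[A-Z]+)-(?P<sev>[A-Z])-SEQ\s+(?P<seq>\d+)\.\s+(?P<time>\S+)")
# Message text naming the RTNDAT/CNF timeout and the host node.
RTNDAT_CNF = re.compile(r"timeout of RTNDAT/CNF from host (?P<node>[\w-]+)")


@dataclass
class HscEvent:
    cls: str       # message class, e.g. HOST or DISK
    severity: str  # W = warning, I = informational
    seq: int
    time: str
    text: str = ""


def parse_events(lines):
    """Pair each SEQ header with the first non-blank line that follows it."""
    events = []
    for raw in lines:
        line = raw.strip()
        m = HEADER.match(line)
        if m:
            events.append(HscEvent(m["cls"], m["sev"], int(m["seq"]), m["time"]))
        elif events and line and not events[-1].text:
            events[-1].text = line
    return events  # lines before the first header (e.g. "Path A ...") are ignored


def rtndat_timeouts(events):
    """Count RTNDAT/CNF-timeout VC closures per host node."""
    counts = {}
    for ev in events:
        m = RTNDAT_CNF.search(ev.text)
        if m:
            counts[m["node"]] = counts.get(m["node"], 0) + 1
    return counts


SAMPLE = """\
Path A has gone from good to bad.
Path B has gone from good to bad.
HOST-W-SEQ 100. 10:15:02
VC closed with node-5 (node-name) due to request from K.CI
DISK-I-SEQ 101. 10:15:02
VC closed due to timeout of RTNDAT/CNF from host node-5.
HOST-I-SEQ 102. 10:15:09
VC opened with node-5 (node-name).
"""

SAMPLE_EVENTS = parse_events(SAMPLE.splitlines())
print(rtndat_timeouts(SAMPLE_EVENTS))  # {'node-5': 1}
```

Correlating these counts with the times tapes restarted is one way to confirm the scenario when VMS itself logs nothing.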
VMS on the affected VAX-host typically *** WILL NOT REPORT ANY
PAA0/CIxxx or VIRTUAL-CIRCUIT STATUS CHANGES *** either !!
Normally, the most obvious symptom visible to the customer is
simply tapes rewind/restart during BACKUP (due to lack of VMS
TA/TUxx TAPE-MOUNT-VERIFICATION support); or DISK SHADOW-SETS
begin "COPYING" (if only mounted from 1 node); or unexplained
DISK MOUNT-VERIFY messages. These symptoms are simply the
result of DISK/TAPE MSCP-CLASS-DRIVER automatic fault recovery
from the VC-CLOSURE with the HSCxx.
The HSCxx and CIxxx/PAA0 VMS CI-port software will automatically
recover the virtual-circuit within 5-10 seconds. Although this
recovery is automatic, significant BACKUP time is wasted in
re-writing TA/TUxx tapes from the beginning on each VC-CLOSURE;
and significant SHADOW-SET performance is lost during a "COPY".
The customer may be legitimately and justifiably concerned...
*** DO NOT REPLACE CIxxx OR HSCxx HARDWARE FOR THIS PROBLEM !! ***
PROBLEM DESCRIPTION:
An intensive cross-functional CI/HSC Engineering and CSSE investigation
has isolated 3 separate causes for this HSC "RTNDAT/CNF TIMEOUT"
VC-CLOSURE problem. To understand the 3 causes, a definition of
"RTNDAT/CNF TIMEOUT" is required. The HSCxx K.CI uses a 3-second
timer on each of its oldest CI data-transfers (usually a 4-8 block
fragment SNDDAT or DATREQ to VAX-host CI-PORT); any requiring
more than 3 seconds to complete causes the "RTNDAT/CNF TIMEOUT",
implying that the VAX CI-PORT has not returned the expected
"RETDAT" (for DATREQ) or "CNF" (Confirmation after SNDDAT LAST-
PACKET) packet within 3 seconds.
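The timeout rule can be illustrated with a minimal Python model: the delay is measured on the oldest outstanding transfer, and the VC is closed once that delay exceeds the limit. Transfer and check_oldest are hypothetical names, and the model ignores everything else K.CI tracks; the 45-second value in the last call corresponds to the proposed CRONIC V370 timeout patch.

```python
from dataclasses import dataclass

# K.CI host data-transfer timeout quoted in the focus report.
RTNDAT_CNF_TIMEOUT = 3.0  # seconds


@dataclass
class Transfer:
    kind: str       # "DATREQ" (awaits RETDAT) or "SNDDAT" (awaits CNF after LAST-PACKET)
    started: float  # seconds, time the transfer was issued to the host


def check_oldest(outstanding, now, timeout=RTNDAT_CNF_TIMEOUT):
    """Return a closure reason if the oldest outstanding transfer has run
    longer than `timeout` seconds without its RETDAT/CNF, else None."""
    if not outstanding:
        return None
    oldest = min(outstanding, key=lambda t: t.started)
    waited = now - oldest.started
    if waited > timeout:
        awaited = "CNF" if oldest.kind == "SNDDAT" else "RETDAT"
        return f"VC closed: no {awaited} for {oldest.kind} after {waited:.1f}s"
    return None


pending = [Transfer("DATREQ", 0.0), Transfer("SNDDAT", 2.0)]
print(check_oldest(pending, 2.5))        # None
print(check_oldest(pending, 3.5))        # VC closed: no RETDAT for DATREQ after 3.5s
print(check_oldest(pending, 3.5, 45.0))  # None (45-second timeout)
```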
1. HSCxx K.CI (L0107) V2.43 FIRMWARE "SNDDAT STALL" BUG:
A microcode bug in the KCI supervisor loop stalls SNDDAT
packet processing when DMA_CREDITS for DATREQ are exhausted.
SNDDAT and DATREQ processing should be independent. This
bug is corrected by KCI V2.54 (L0107 REV-E*) firmware,
soon to be released as FCO for RA70 support.
2. CI-PORT COMMAND-QUEUE PRIORITIZATION "RESOURCE STARVATION":
Current CI-PORT command-queue prioritization may cause excessive
COMQL (CI-CMD.-QUEUE-0 / COMQ0) service latencies resulting in
HSCxx "RTNDAT/CNF TIMEOUT" VC-closure on disk-writes, during
heavy BACKUP-applic. tape-write/disk-read activity. CI
processing of Tape-writes/DATREQ2/COMQ2, VMS MSCP message-
commands/SNDMSG/COMQ1, and received-packets (SNDDAT, RECMSG)
can pre-empt servicing of COMQ0, thus indefinitely delaying
DATREQ0 HSC disk-write data-requests and resulting in HSC data-
transfer timeout. A "VMS SUPPORTED RESTRICTED DISTRIBUTION"
PADRIVER.EXE patch is available from VAX CSSE as a short-term
(6 month) workaround to this problem: THE PREDOMINANT
CAUSE OF HSC "RTNDAT/CNF TIMEOUT" VC-CLOSURES !!
3. HSCxx K.CI "SNDDAT SEQUENTIALITY" PROBLEM: K.CI may transmit
SNDDAT "LAST-PACKET" out-of-sequence, due to performance
optimization which allows KCI firmware to packetize up to
3 different SNDDAT operations at 1 time on a single VC.
The SNDDAT packet queue-manipulation can result in indefinite
pre-emption of the oldest SNDDAT/LP packet, depending on
SNDDAT "CNF" credit-return timing and queue-position.
This is a low-frequency/low-impact problem, likely only
occurring once or twice per year! There is currently no
KCI firmware fix, since the optimization is desirable in
most cluster CI traffic situations.
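Cause 3 can be pictured with a toy scheduler. Assuming, purely for illustration, that K.CI rotates round-robin among up to 3 active SNDDAT operations on a VC, an operation issued first but needing more packets sends its LAST-PACKET after newer ones, so its CNF is the last to return while the timer runs on it. The sketch is Python; the round-robin policy and the function name are assumptions, not the actual firmware behavior.

```python
from collections import deque

MAX_ACTIVE = 3  # SNDDAT operations K.CI may packetize at once on one VC


def last_packet_order(ops):
    """ops: (name, packet_count) pairs in issue order.

    Returns names in the order their LAST-PACKET goes out, assuming
    round-robin packetization of up to MAX_ACTIVE operations (a hypothetical
    policy, not the actual K.CI queue handling).
    """
    pending = deque(ops)
    active = deque()
    order = []
    while pending or active:
        while pending and len(active) < MAX_ACTIVE:
            name, count = pending.popleft()
            active.append([name, count])
        op = active.popleft()
        op[1] -= 1               # transmit one packet of this operation
        if op[1] == 0:
            order.append(op[0])  # that packet was its LAST-PACKET
        else:
            active.append(op)    # rejoin the rotation behind newer operations
    return order


# "A" was issued first, but its LAST-PACKET goes out last.
print(last_packet_order([("A", 8), ("B", 2), ("C", 3), ("D", 1)]))  # ['B', 'D', 'C', 'A']
```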
STATUS:
VAX CSSE has short-term workarounds for each of the above problems,
intended only for critical customer situations at this time.
1. HSCxx K.CI (L0107) V2.43 FIRMWARE "SNDDAT STALL" BUG:
The KCI V2.54 firmware fix will soon be released as HSCxx
FCO to L0107 module, but CSSE can supply preliminary parts
for critical sites.
2. CI-PORT COMMAND-QUEUE PRIORITIZATION "RESOURCE STARVATION":
VAX CSSE has a PADRIVER.EXE patch for VMS-4.7 and 5.0/5.1.
THIS PATCH is "RESTRICTED DISTRIBUTION", REQUIRING VAX CSSE
APPROVAL AND AUTHORIZATION !! SIGNIFICANT CLUSTER PERFORMANCE
(LOCK_MGR) DEGRADATION MAY OCCUR UNDER CERTAIN APPLICATION
CI-TRAFFIC LOADS, thus requiring careful characterization of
CI-traffic at candidate sites. This patch is only a short-
term (6-month) workaround.
*** NOTE: VAX CSSE SITE QUALIFICATION IS CRITICAL
*** DUE TO THE POTENTIAL PERFORMANCE IMPACT OF CURRENT
*** PADRIVER.EXE PATCH TO CUSTOMER APPLICATION !!
A cross-functional CI/VMS/HSC Engineering team is actively
considering and investigating an appropriate long-term fix,
which will not jeopardize performance. Pending results,
VMS Engineering will formally adopt an optimized-patch or
any team-recommended CI/PA-architecture changes into the
next VMS major release (V5.2) and a retrofittable patch.
3. HSCxx K.CI "SNDDAT SEQUENTIALITY" PROBLEM: An HSC CRONIC
V370 patch is available from VAX CSSE to extend the "RTNDAT/
CNF TIMEOUT" from 3 to 45 seconds, an effective workaround.
This is a low-risk/impact patch, and is normally not required,
but advised as a guarantee to avoid VC-Closure on political
sites. HSC Engineering is developing a KCI firmware fix,
likely to be included in an HSC/KCI future upgrade product.
SOLUTION/WORK-AROUND:
CONTACT VAX CSSE FOR CUSTOMER SITE QUALIFICATION, AND TO OBTAIN
CURRENT WORKAROUNDS:
- Bob Brassard, VOLKS::BRASSARD, DTN 240-6492,
DDD 508-474-6492;
- George White, VOLKS::WHITE, DTN 240-6490,
DDD 508-474-6492.
The following immediate workarounds will lessen or completely avoid
problem impact while awaiting VAX CSSE qualification, approval, and
workarounds; or while awaiting formal Engineering release of
solutions. Note that each measure will lengthen the time required
for customer's daily/weekly BACKUP procedures !!
+ BACKUP COMMAND FILES: REDUCE "BACKUP/BUFFER=xxxxx" command
buffer-count parameter to default of "3 buffers" or less
in customer BACKUP command files, or BACKUP procedures.
+ CONCURRENT TA/TUXX TAPE OPERATION: Incrementally reduce the
number of concurrently running TA/TUxx BACKUP tape-drives/jobs,
to a number avoiding or limiting HSC VC-CLOSURE to an
acceptable level.
INTERIM STATUS: (16 Jan 89) Field tests of the patched PA driver at two
customer sites appear to have been completely successful thus far.
From: VOLKS::BRASSARD "Bob B., VAX CSSE, 240-6492, AET 1-1/6" 23-JAN-1989 20:02:25.06
To: MYFILE
CC:
Subj: F.A.S. HSC-VC-CLOSE CSSE-PROBLEM-DESC. & VMS-4.7/5.X PADRIVER PATCH...
(EXTRACT OF VAX-CSSE DEC-88 FOCUS-PRODUCT-REPORT)
-------------------------------------------------
7.10 TITLE: CIXXX HSCXX RTNDAT/CNF TIMEOUT VC-CLOSURE: NEW PROBLEMS
DEVICE: HSC50,HSC70,CIXXX (Added: 16-DEC-1988)
CLD #: CXO02335,CXO02677 * PRISM #:
From: VOLKS::BRASSARD "Bob B., VAX CSSE, 240-6492, AET 1-1/6" 16-DEC-1988 19:39
To: NM%VOLKS::WHITE,MYFILE
Subj: CONTENTS OF FAS$PADRIVER DIRECTORY
Below are all the files necessary to implement the HSC VC-CLOSURE
workarounds at any site.
Regards, Bob Brassard
Directory VOLKS::FAS$PADRIVER:
($1$DUA1:[FAS_PADRIVER])
HSC70_R002_KCI_V254.FCO;1
25 16-DEC-1988 16:16:39.40 (RE,RWED,RE,RE)
HSC K.CI V2.54 FIRMWARE FCO: L0107 REV-E*
HSC_KCI_TIMEOUT.PATCH;1
3 16-DEC-1988 19:34:37.14 (RE,RWED,RE,RE)
HSC CRONIC V370 PATCH TO EXTEND HOST-TIMEOUT TO 45-SECONDS
HSC_VC_CLOSE.FOCUS;1
16 16-DEC-1988 19:33:17.98 (RE,RWED,RE,RE)
HSC VC-CLOSURE FOCUS-REPORT ENTRY/PROBLEM DESCRIPTION
PADRIVER_V47_MSG0.COM;2
11 16-DEC-1988 15:16:28.37 (RE,RWED,RE,RE)
VMS-4.7 PADRIVER.EXE PATCH COMMAND FILE & PATCH DESCRIPTION
PADRIVER_V47_MSG0.EXE;2
40 15-DEC-1988 19:49:53.25 (RE,RWED,RE,RE)
VMS-4.7 PATCHED PADRIVER.EXE IMAGE
PADRIVER_V50_MSG0.COM;2
11 16-DEC-1988 15:15:20.76 (RE,RWED,RE,RE)
VMS-5.0 PADRIVER.EXE PATCH COMMAND FILE & PATCH DESCRIPTION
PADRIVER_V50_MSG0.EXE;2
46 15-DEC-1988 19:50:01.32 (RE,RWED,RE,RE)
VMS-5.0 (ALSO 5.0-1, 5.0-2, 5.1 FT) PATCHED PADRIVER.EXE IMAGE
Total of 7 files, 152 blocks.
! VMS-5.0-x PADRIVER.EXE "COMQ0 MESSAGE" PATCH FOR HSC VC-CLOSURE
! -------------------------------------------------------------
! Created by: Bob Brassard, VAX CSSE, VOLKS::BRASSARD, 15-DEC-88
!
! WARNING !!!: PATCH is "RESTRICTED DISTRIBUTION", REQUIRING
! VAX CSSE APPROVAL AND AUTHORIZATION !! SIGNIFICANT CLUSTER
! PERFORMANCE (LOCK_MGR) DEGRADATION MAY OCCUR UNDER CERTAIN
! APPLICATION CI-TRAFFIC LOADS !!
!
! SUPPORT: VMS-supported RESTRICTED-DISTRIBUTION patch.
! Call VAX CSSE (Bob Brassard, VOLKS::BRASSARD, DTN 240-6492,
! DDD 508-474-6492; or George White) with any problems.
!
! VERSION APPLICABILITY: This patch *** ONLY *** applies
! to VMS-5.0 distributed PADRIVER.EXE (also used for V5.0-1
! V5.0-2, and current V5.1 FT sites) with this "image
! ident & link-date" (ANAL/IMAGE PADRIVER.EXE):
!
! Image Identification Information
!
! image name: "PADRIVER"
! image file identification: "X-9"
! link date/time: 8-APR-1988 05:41:19.56
! linker identification: "04-92"
!
! ECO50 RRB0050 (R.R.Brassard, CSSE) 15-DEC-88
! MODULE: SCSXPORT.MAR of PADRIVER.EXE
!
! PROBLEM: Current CI-PORT command-queue prioritization
! may cause excessive COMQL (CI-CMD.-QUEUE-0 / COMQ0)
! service latencies, resulting in HSCxx "RTNDAT/CNF TIMEOUT"
! VC-closure on disk-writes, during heavy BACKUP-applic.
! tape-write/disk-read activity. CI processing of Tape-
! writes/DATREQ2/COMQ2, VMS MSCP message-commands/SNDMSG/
! COMQ1, and received-packets (SNDDAT, RECMSG) can pre-empt
! servicing of COMQ0, thus indefinitely delaying DATREQ0
! HSC disk-write data-requests and resulting in HSC data-
! transfer timeout: currently defined in V370 CRONIC at
! 3 seconds.
!
! SYMPTOM: HSC "RTNDAT/CNF TIMEOUT" VIRTUAL-CIRCUIT (VC)
! closures are only reported with HSC "OUTBAND & ERROR"
! level at "INFO" (default = ERROR). The first customer
! indication may only be "tapes rewinding/restarting",
! "shadow-set copying", or "mount verification" messages
! during heavy multiple concurrent disk/tape BACKUP
! activity.
!
! FIX: Modify SCS$FPC_SENDMSG routine to direct all CI SYSAP-
! MESSAGES on low-priority COMQL (CI COMQ0) CI-COMMAND-
! QUEUE, instead of current COMQH (CI COMQ1). Therefore,
! new MSCP command messages (and unintentionally all SYSAP
! MSGs) will only be sent if CI can service COMQ0, effectively
! throttling CI data-transfer work to the rate at which CI
! can send new MSCP commands to HSCxx; thus guaranteeing
! reasonable COMQ0 service latency.
!
! **** PERFORMANCE IMPLICATIONS ****
! WARNING: This patch requires VAX CSSE authorization for
! implementation, due to cluster performance risks.
! Significant reduction of CI's sequenced-message I/O
! (SYSAP MESSAGEs sent) performance, of up to 65%, will
! occur under CI-port data-transfer saturation: approx.
! 1.2 Mb/sec for CIBCA-A on 85/87/88xx, 2.2 Mb/sec for
! CIBCA-B on 85/87/88xx, 1.5-1.8 Mb/sec for other CIxxx/
! CPU combinations. DCL "$ MONITOR SCS" (KB_MAP) provides
! an instantaneous CI data-transfer measurement; VPA and
! MONITOR/RECORD can be used for long-term monitoring.
!
! Sequenced messages are used by VMS for LOCK_MGR, CLUSTER
! CONNECTION_MGR, and MSCP Command functions, with LOCK_MGR
! issuing most of these messages. Increased LOCK_MGR "lock
! granting" latencies will directly impact cluster-wide
! file/record/database I/O applications, since LOCK "MASTERing"
! and LOCK "DIRECTORYing" is a distributed function within a
! cluster. In other words, even with this patch on only 1/offline
! node, message slowdown will impact MASTER/DIRECTORY functions
! performed on behalf of other cluster nodes.
!
! Sequenced-message I/O reduction is especially dependent on
! disk-write (DATREQ0) data-transfers, which also use COMQ0.
! This patch moves SYSAP SNDMSG from COMQ1 (also used by DECNET
! datagrams) to COMQ0, used by CI to service DATREQ0 (disk-write
! HSC data-requests) and used by VMS for CI-polling. Therefore,
! SYSAP-MESSAGEs (SNDMSG) will now be serviced "FIFO" with DATREQ0
! (from HSC) and VMS polling, instead of ahead of this activity
! (at higher priority) on COMQ1, as they are without this patch.
!
! Under non-saturated CI-port data-transfer conditions, this
! patch should only result in a 5% sequenced-message rate
! reduction. Of benefit, this patch may significantly improve
! disk-write performance during heavy mass-storage I/O activity.
! Datagrams (used mostly for DECNET) will also benefit.
!
! INSTALLATION:
! 1. COPY this PATCH command file (PADRIVER_V50_MSG0.COM) to
! work-directory.
! 2. COPY SYS$LOADABLE_IMAGES:PADRIVER.EXE to work area.
! 3. APPLY PATCH: "$ @PADRIVER_V50_MSG0.COM" or type in below
! patch-commands. Verify patch correctly installed: use
! ANAL/IMAGE PADRIVER.EXE, examining PATCH info & text.
! 4. COPY PADRIVER.EXE SYS$COMMON:[SYS$LDR]PADRIVER.EXE. If
! patch only intended for 1 system, copy to SYS$SPECIFIC:
! [SYS$LDR]PADRIVER.EXE.
! 5. REBOOT SYSTEM, coordinating with customer.
!
! BEGINNING OF PATCH COMMANDS....
! -------------------------------
!
$ PATCH PADRIVER.EXE
SET ECO 50
VERIFY/INSTRUCTION 1FA8
"MOVW #04,B^0F2(R2)"
EXIT
REPLACE/INSTRUCTION 5147
"BSBW 1F3B"
EXIT
"BSBW 1FA8"
EXIT
UPDATE
EXIT
$ EXIT
$ !
$ ! END OF PADRIVER PATCH FILE
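The PERFORMANCE IMPLICATIONS notes in the patch file give approximate CI-port saturation rates, which is what site qualification has to compare against. As a rough aid, a measured rate (for example a MONITOR SCS KB_MAP reading) can be placed against those figures. This Python sketch assumes the quoted "Mb/sec" means megabytes per second, that the reading is in KB per second, and that 1 MB = 1024 KB; the function name is hypothetical and this is not a CSSE qualification procedure.

```python
# Saturation points quoted in the patch notes as "Mb/sec" (low, high).
# Treating them as megabytes per second is an assumption.
SATURATION_MB_PER_SEC = {
    "CIBCA-A on 85/87/88xx": (1.2, 1.2),
    "CIBCA-B on 85/87/88xx": (2.2, 2.2),
    "other CIxxx/CPU": (1.5, 1.8),
}


def classify_load(combo, kb_per_sec):
    """Place a measured CI data-transfer rate (assumed KB/s) against the
    quoted saturation range for the given CI adapter/CPU combination."""
    low, high = SATURATION_MB_PER_SEC[combo]
    mb_per_sec = kb_per_sec / 1024.0  # assumed 1 MB = 1024 KB
    if mb_per_sec >= high:
        return "saturated: sequenced-message rate may drop by up to 65%"
    if mb_per_sec >= low:
        return "near saturation: characterize CI traffic further"
    return "unsaturated: roughly 5% sequenced-message reduction expected"


print(classify_load("CIBCA-A on 85/87/88xx", 1300))
print(classify_load("other CIxxx/CPU", 1600))
```

Sustained readings (MONITOR/RECORD or VPA, as the notes suggest) matter more than a single instantaneous sample.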
! VMS-4.7 PADRIVER.EXE "COMQ0 MESSAGE" PATCH FOR HSC VC-CLOSURE
! -------------------------------------------------------------
! Created by: Bob Brassard, VAX CSSE, VOLKS::BRASSARD, 15-DEC-88
!
! WARNING !!!: PATCH is "RESTRICTED DISTRIBUTION", REQUIRING
! VAX CSSE APPROVAL AND AUTHORIZATION !! SIGNIFICANT CLUSTER
! PERFORMANCE (LOCK_MGR) DEGRADATION MAY OCCUR UNDER CERTAIN
! APPLICATION CI-TRAFFIC LOADS !!
!
! SUPPORT: VMS-supported RESTRICTED-DISTRIBUTION patch.
! Call VAX CSSE (Bob Brassard, VOLKS::BRASSARD, DTN 240-6492,
! DDD 508-474-6492; or George White) with any problems.
!
! VERSION APPLICABILITY: This patch *** ONLY *** applies
! to VMS-4.7 distributed PADRIVER.EXE with this "image
! ident & link-date" (ANAL/IMAGE PADRIVER.EXE):
!
! Image Identification Information
!
! image name: "PADRIVER"
! image file identification: "X-3"
! link date/time: 22-MAY-1987 23:50:26.53
! linker identification: "04-00"
!
! ECO50 RRB0050 (R.R.Brassard, CSSE) 15-DEC-88
! MODULE: PAFPCALL.MAR of PADRIVER.EXE
!
! PROBLEM: Current CI-PORT command-queue prioritization
! may cause excessive COMQL (CI-CMD.-QUEUE-0 / COMQ0)
! service latencies, resulting in HSCxx "RTNDAT/CNF TIMEOUT"
! VC-closure on disk-writes, during heavy BACKUP-applic.
! tape-write/disk-read activity. CI processing of Tape-
! writes/DATREQ2/COMQ2, VMS MSCP message-commands/SNDMSG/
! COMQ1, and received-packets (SNDDAT, RECMSG) can pre-empt
! servicing of COMQ0, thus indefinitely delaying DATREQ0
! HSC disk-write data-requests and resulting in HSC data-
! transfer timeout: currently defined in V370 CRONIC at
! 3 seconds.
!
! SYMPTOM: HSC "RTNDAT/CNF TIMEOUT" VIRTUAL-CIRCUIT (VC)
! closures are only reported with HSC "OUTBAND & ERROR"
! level at "INFO" (default = ERROR). The first customer
! indication may only be "tapes rewinding/restarting",
! "shadow-set copying", or "mount verification" messages
! during heavy multiple concurrent disk/tape BACKUP
! activity.
!
! FIX: Modify FPC$SENDMSG routine to direct all CI SYSAP-
! MESSAGES on low-priority COMQL (CI COMQ0) CI-COMMAND-
! QUEUE, instead of current COMQH (CI COMQ1). Therefore,
! new MSCP command messages (and unintentionally all SYSAP
! MSGs) will only be sent if CI can service COMQ0, effectively
! throttling CI data-transfer work to the rate at which CI
! can send new MSCP commands to HSCxx; thus guaranteeing
! reasonable COMQ0 service latency.
!
! **** PERFORMANCE IMPLICATIONS ****
! WARNING: This patch requires VAX CSSE authorization for
! implementation, due to cluster performance risks.
! Significant reduction of CI's sequenced-message I/O
! (SYSAP MESSAGEs sent) performance, of up to 65%, will
! occur under CI-port data-transfer saturation: approx.
! 1.2 Mb/sec for CIBCA-A on 85/87/88xx, 2.2 Mb/sec for
! CIBCA-B on 85/87/88xx, 1.5-1.8 Mb/sec for other CIxxx/
! CPU combinations. DCL "$ MONITOR SCS" (KB_MAP) provides
! an instantaneous CI data-transfer measurement; VPA and
! MONITOR/RECORD can be used for long-term monitoring.
!
! Sequenced messages are used by VMS for LOCK_MGR, CLUSTER
! CONNECTION_MGR, and MSCP Command functions, with LOCK_MGR
! issuing most of these messages. Increased LOCK_MGR "lock
! granting" latencies will directly impact cluster-wide
! file/record/database I/O applications, since LOCK "MASTERing"
! and LOCK "DIRECTORYing" is a distributed function within a
! cluster. In other words, even with this patch on only 1/offline
! node, message slowdown will impact MASTER/DIRECTORY functions
! performed on behalf of other cluster nodes.
!
! Sequenced-message I/O reduction is especially dependent on
! disk-write (DATREQ0) data-transfers, which also use COMQ0.
! This patch moves SYSAP SNDMSG from COMQ1 (also used by DECNET
! datagrams) to COMQ0, used by CI to service DATREQ0 (disk-write
! HSC data-requests) and used by VMS for CI-polling. Therefore,
! SYSAP-MESSAGEs (SNDMSG) will now be serviced "FIFO" with DATREQ0
! (from HSC) and VMS polling, instead of ahead of this activity
! (at higher priority) on COMQ1, as they are without this patch.
!
! Under non-saturated CI-port data-transfer conditions, this
! patch should only result in a 5% sequenced-message rate
! reduction. Of benefit, this patch may significantly improve
! disk-write performance during heavy mass-storage I/O activity.
! Datagrams (used mostly for DECNET) will also benefit.
!
! INSTALLATION:
! 1. COPY this PATCH command file (PADRIVER_V47_MSG0.COM) to
! work-directory.
! 2. COPY SYS$SYSTEM:PADRIVER.EXE to work area.
! 3. APPLY PATCH: "$ @PADRIVER_V47_MSG0.COM" or type in below
! patch-commands. Verify patch correctly installed: use
! ANAL/IMAGE PADRIVER.EXE, examining PATCH info & text.
! 4. COPY PADRIVER.EXE SYS$COMMON:[SYSEXE]PADRIVER.EXE. If
! patch only intended for 1 system, copy to SYS$SPECIFIC:
! [SYSEXE]PADRIVER.EXE.
! 5. REBOOT SYSTEM, coordinating with customer.
!
! BEGINNING OF PATCH COMMANDS....
! -------------------------------
!
$ PATCH PADRIVER.EXE
SET ECO 50
VERIFY/INS 2485
"SUBL2 W^0B4(R4),R2"
EXIT
REPLACE/INSTRUCTION 1627
"BSBW 2450"
EXIT
"BSBW 2485"
EXIT
UPDATE
EXIT
$ EXIT
$ !
$ ! END-OF-PATCH
$ !=======================================================================
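Both patch files apply ONLY to a PADRIVER.EXE with the exact identification listed under VERSION APPLICABILITY. A check of those fields can be sketched in Python; it assumes the ANALYZE/IMAGE output (or the patch-file comment text) has been saved as "name: value" lines in the layout quoted in the patch files, which may not match the real listing exactly. The helper names are hypothetical, and passing the check does not replace VAX CSSE authorization.

```python
# Identification fields from the VERSION APPLICABILITY notes of the two patch files.
EXPECTED_IDENT = {
    "V4.7": {"image file identification": "X-3",
             "link date/time": "22-MAY-1987 23:50:26.53",
             "linker identification": "04-00"},
    "V5.0": {"image file identification": "X-9",
             "link date/time": "8-APR-1988 05:41:19.56",
             "linker identification": "04-92"},
}


def parse_ident(text):
    """Collect 'name: value' lines, dropping leading '!' comment marks and quotes."""
    fields = {}
    for line in text.splitlines():
        # partition() splits at the first colon, so times like 05:41:19.56 survive.
        key, sep, value = line.lstrip("! \t").partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip().strip('"')
    return fields


def matching_patch(text):
    """Return the patch version whose identification matches exactly, or None."""
    fields = parse_ident(text)
    if fields.get("image name") != "PADRIVER":
        return None
    for version, expected in EXPECTED_IDENT.items():
        if all(fields.get(k) == v for k, v in expected.items()):
            return version
    return None


sample = '''image name: "PADRIVER"
image file identification: "X-9"
link date/time: 8-APR-1988 05:41:19.56
linker identification: "04-92"'''
print(matching_patch(sample))  # V5.0
```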
From: CVG::BRASSARD 19-DEC-1988 12:11
To: VOLKS::BRASSARD
Subj: HSC Timeout
From: SSDEVO::ENGLUND "Glenn Englund, HSC Engineering Manager" 16-DEC-1988 19:13:15.83
To: CVG::TOMASWICK,KOLLER,MOE,SHIVELY,BEAN,LARY,NM%VOLKS::WHITE,CVG::BRASSARD
CC:
Subj: No change to HSC host timeout - it should remain at 3 seconds
Unfortunately the suggested change to raise the HSC host timeout value from 3
seconds to 45 seconds was never tested (so I am told). I guess it fell through
the cracks out here.
Since it was not tested, it seems that the right thing to do is to leave it
unchanged, rather than delay the release of this patch in order to test the
change.
I would recommend changing the following note from George White to remove
any reference to a change to the HSC's host timer.
- Glenn
From: 27054::WHITE "VAX CSSE SUPPORT 12-Dec-1988 1303" 12-DEC-1988 11:10:20.30
To: @FAS
CC:
Subj: FAS,FORD,IRVING TRUST VC CLOS. STATUS
-----------------------------
! d ! i ! g ! i ! t ! a ! l ! I N T E R O F F I C E M E M O
-----------------------------
TO: DISTRIBUTION DATE: 12 DEC 88
FROM: GEORGE WHITE
DEPT: MID-RANGE VAX CSSE
DTN: 240-6490
LOCN: AET 1-1/6
ENET: VOLKS::WHITE
DECMAIL: WHITE @VOLKS @AET
SUBJECT: FAS - (CXO2335), FORD - (CXO2677), IRVING TRUST VC CLOSURE STATUS
9 DEC 88, FROM BOB BRASSARD
The 2nd cause of the HSC VC Closure (RTNDAT/CNF TIMEOUT during heavy
BACKUP between a single 85/87/88xx and multiple HSC disks/tapes) was
isolated about 3 weeks ago. As you will remember, the 1st bug was in the
HSC KCI ucode: SNDDAT packets would not be sent if DATREQs were in
DMA_CREDIT stall... essentially a KCI supervisor loop (scheduler) bug.
The 2nd problem involves the use of the CI's 4 prioritized command
queues: COMQ0 (low), COMQ1, COMQ2, and COMQ3 (highest). VMS normally
sends messages (including MSCP) on COMQ1, except for VC-Closure, which
uses COMQ0; disk writes use DATREQ0 (COMQ0), initiated by the HSC; tape
writes use DATREQ2 (COMQ2). If the CI port is transferring data at its
limit (1.2 Mb for CIBCA on 85/87/88xx), DATREQ2/COMQ2 and MSCP-MSG/COMQ1
activity will prevent the CI port from ever servicing COMQ0 (disk-write
DATREQ0); COMQ0 latencies as high as 90 seconds were observed.
The short-term solution will be a PADRIVER patch to put all
messages on COMQ0. This way, if the CI port is too busy to look at COMQ0,
the HSC will run out of work (reads/writes), thus throttling data transfers
until the CI port works on more messages from COMQ0.
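The starvation mechanism described above can be illustrated with a toy model. The sketch below assumes a port that, on every service opportunity, takes one entry from the highest-numbered non-empty command queue. The one-entry-per-tick service rate, the arrival pattern and all function names are assumptions for illustration, not a description of the CIBCA microcode, and the model does not include the HSC-side throttling that the patch relies on.

```python
from collections import deque

def run_strict_priority(queues, ticks, arrivals):
    """Toy strict-priority port: on each tick, service one entry from the
    highest-numbered non-empty queue (COMQ3 highest, COMQ0 lowest).

    queues:   dict mapping queue number -> deque of pending entries
    arrivals: function(tick) -> list of (queue_number, entry) arriving that tick
    Returns a list of (tick, queue_number, entry) in service order.
    """
    served = []
    for tick in range(ticks):
        for qnum, entry in arrivals(tick):
            queues[qnum].append(entry)
        # Lower-numbered queues are only examined when every higher one is empty.
        for qnum in sorted(queues, reverse=True):
            if queues[qnum]:
                served.append((tick, qnum, queues[qnum].popleft()))
                break
    return served

def saturated_tape_load(tick):
    # Hypothetical load: a tape DATREQ2 (COMQ2) arrives on every tick, and
    # one disk-write DATREQ0 (COMQ0) is queued at tick 0.
    arriving = [(2, f"DATREQ2#{tick}")]
    if tick == 0:
        arriving.append((0, "DATREQ0#disk-write"))
    return arriving

queues = {qnum: deque() for qnum in range(4)}
served = run_strict_priority(queues, ticks=1000, arrivals=saturated_tape_load)
print(any(qnum == 0 for _, qnum, _ in served))  # COMQ0 never serviced
print(list(queues[0]))                          # the disk write is still queued

# As soon as the tape traffic pauses, COMQ0 is finally examined.
drained = run_strict_priority(queues, ticks=1, arrivals=lambda tick: [])
print(drained)
```

The memo reports COMQ0 latencies of up to 90 seconds under this kind of load. The toy model has no timers, so it shows only the starvation and not the resulting timeout.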
The VMS PADRIVER patch is only a short-term solution. The CI Architecture
committee is re-investigating CI-PORT prioritization algorithms, with
possible major scheduling changes for future CI products.
The PADRIVER patch was just tested during the past 2 weeks for performance
impact on message rates: negligible except for data-xfer saturated
CI-ports where message rates dropped 60%. I will be generating a
work-around package/procedures/documentation for the 3 required fixes:
PADRIVER patch, KCI V2.54 ucode (L0107-YA @ Rev-E2/3/4), CRONIC V370 patch
to extend host data-xfer timeout from 3 to 45 seconds (workaround for HSC
SNDDAT pipelining/sequencing problem: finishes SNDDATs out of order sometimes).
BTW, KCI V2.54 will soon be released as HSC50/70 FCO required for RA70 drives;
initial RA70s will include 2 sets of 12-PROMS each.
Best Regards, Bob Brassard
! CVG FAS-TESTING INTEREST DISTRIBUTION LIST: CVG_FAS.DIS
! =======================================================
NM%SSDEVO::LARY
NM%SSDEVO::BEAN
NM%SSDEVO::SHIVELY
NM%SSDEVO::MOE
NM%SSDEVO::KOLLER
NM%SSDEVO::ENGLUND
NM%SSDEVO::REPKA
NM%SSDEVO::ELMER
NM%HYEND::BLYONS
NM%CVG::TODHUNTER
NM%ACTIVE::GOELZ
NM%CSSE32::GOELZ
NM%VCSESU::TODHUNTER
NM%HYEND::WERTH
NM%HYEND::HJAKIELA
NM%HYEND::AVERY
NM%INANNA::BALKOVICH
NM%HYDRA::BOAEN
NM%HYDRA::NIELSEN
NM%HYDRA::HAYAKAWA
NM%FROBUS::CONNOR
NM%CVG::TOMASWICK
NM%CVG::VIEIRA
NM%CVG::BAKER
NM%VOLKS::FREEMAN
NM%VOLKS::WHITE
NM%VOLKS::BRASSARD
NM%CVG::BRASSARD
NM%PYONS::BRANNON
NM%CSSE::MILLER
NM%CSSE::HOWINGTON
NM%SUPVAX::BLENDINGER
NM%PTOVAX::PEARLMAN
MTS$"FHO::BILL NOSEWORTHY"
MTS$"OHF::RICH LYONS"
MTS$"CYO::ROBERT B LEWIS"
MTS$"PTO::STEPHEN STEVENS"
MTS$"PTO::BILL REIGHT"
From: STAR::OSHAUGHNESSY "Dan, ZKO3-4/U14, DTN 381-1268, pole T/B8" 16-DEC-1988 11:26
To: VOLKS::BRASSARD,VOLKS::WHITE,CHIN,FOX,THIEL
Subj: VMS SUPPORT OF RESTRICTED DISTRIBUTION OF FAS PATCH
DIGITAL INTEROFFICE MEMORANDUM
TO: Bob Brassard DATE: December 15, 1988
George White FROM: Dan O'Shaughnessy
DEPT: 354
EXT: 381-1268
LOC: ZK03-4/U14
ENET: STAR::OSHAUGHNESSY
cc: T. Chin
M. Fox
D. Thiel
SUBJECT: VMS Support of FAS Patch
VMS supports the restricted distribution of the "FAS" patch
written by Bob Brassard. Suitable warnings concerning the
impact on a system's sequenced message I/O performance
(connection manager and lock manager traffic) will accompany
the patch. Bob Brassard will manage the distribution of the
patch to ensure that the performance impact on a candidate
site has been carefully considered. The patch should not
be published or made generally available for at least 6 months.
This time period should provide us with sufficient information
on how often the problem occurs on customer sites and on any
unintended side effects the patch may have.
A long-term solution should be provided by the SCA and CI
Architecture groups. Another meeting including VMS, CSSE, SASE
and architecture representatives should be planned in 3 months,
March 1989, to discuss and reevaluate the situation. At this
time a decision should be made whether to allow the general
(unrestricted) release of the "FAS" patch in June 1989
or whether some other "midterm solution" is needed before
a "long-term" architected solution is available.
From: SSDEVO::ELMER "Randy Elmer MLDS CSSE CESG MGR. 522-3874 Being flexible means never being bent out of shape 25-Oct-1989 1614" 25-OCT-1989 18:24:12.77
To: MOE
CC: RON,GARY,VOLKS::BRASSARD
Subj: V39A access over the net
Karen
The 4x4 today agreed that when we ship V39A to SSB saying this is good code,
we should also put it on the net and make it available to all internal customers
for early exposure to V39A. We also agreed that for the handful of customers
that may have a specific CLD open that V39A will fix, we will hand-manage those
sites and provide an early release across the NET as well.
Can we get Stacy to place V39A into HSC$ENETKITS with the release notes and
remove V390/V394?
Thanks
Randy
From: GENRAL::SSDEVO::ELMER "Randy Elmer MLDS CSSE CESG MGR. 522-3874 Being flexible means never being bent out of shape" 9-NOV-1989 18:34:24.98
To: GENRAL::VOLKS::BRASSARD
CC: RON
Subj: RE: HSC CRONIC V39A AVAILABILITY FOR FIELD TEST ?? FT-AGREEMENT ? ENET LOCATION ? RELEASE NOTES ?
Bob
I have answered your questions below.
Randy
=============================================
From: GENRAL::VOLKS::BRASSARD "Bob B., VAX CSSE, 240-6492, AET 1-1/6 09-Nov-1989 1813" 9-NOV-1989 16:15:30.05
To: GENRAL::ELMER,SSDEVO::REPKA,MYFILE
CC:
Subj: HSC CRONIC V39A AVAILABILITY FOR FIELD TEST ?? FT-AGREEMENT ? ENET LOCATION ? RELEASE NOTES ?
Hi Randy & Ron,
I have not seen any announcement on ENET availability of V39A.
Is this now copyable on ENET ?
>>> Yes, but we need to hand-manage it until the SSB release date of 18
>>> December, in case we find a bug that needs to have the code recalled. Only
>>> provide this code to sites that are of a political nature or that we feel
>>> would be a good field test.
>>> ENET location is SSDEVO::HSC$FIELDTEST:
Are there release notes to copy with it ?
>>> Yes in the same location.
Do we still need Field Test Agreement ?
>>> No. The 4X4 agreed that because it did go to SSB and we will hand
>>> manage the code, no field test agreement is needed. We just need to track
>>> it and ensure it does not become public.
Neither have I seen Bob Lyons' FAS meeting minutes with the approval status
on the release/SDC-submission of CRONIC V39A with the FAS fix. Have you seen
any status? Rumor has it approved.
>>> The code was submitted to SSB with the FAS fix. I too did not see the
>>> minutes.
BTW, I am currently on CLD in Hartford, Ct.; so mail response may be slow.
Best Regards, Bob Brassard
From: SSDEVO::REPKA "RON REPKA HSC CSSE 522-6195" 14-NOV-1989 14:05:33.92
To: VOLKS::BRASSARD
CC:
Subj: V39A Release Notes
HSC VERSION V3.9A
SOFTWARE RELEASE NOTES
Order Number: AA-GMFAH-TK
These release notes contain a summary of the features in the V3.9A
software.
digital equipment corporation maynard, massachusetts
January, 1990
The information in this document is subject to change without
notice and should not be construed as a commitment by Digital
Equipment Corporation. Digital Equipment Corporation assumes no
responsibility for any errors that may appear in this document.
The software described in this document is furnished under a
license and may be used or copied only in accordance with the
terms of such license.
No responsibility is assumed for the use or reliability of soft-
ware on equipment that is not supplied by Digital Equipment
Corporation or its affiliated companies.
Copyright (c)1990 by Digital Equipment Corporation
All Rights Reserved.
Printed in U.S.A.
The postpaid READER'S COMMENTS form on the last page of this
document requests the user's critical evaluation to assist in
preparing future documentation.
The following are trademarks of Digital Equipment Corporation:
DEC DIBOL UNIBUS
DEC/CMS EduSystem VAX
DEC/MMS IAS VAXcluster
DECnet MASSBUS VMS
DECsystem-10 PDP VT
DECSYSTEM-20 PDT
DECUS RSTS
DECwriter RSX DIGITAL
This document was prepared using VAX DOCUMENT, Version 1.0
Contents
1 INTRODUCTION
2 PREINSTALLATION CONSIDERATIONS
2.1 Software Restrictions
2.1.1 HSC50 Restricted to One Operator-Loaded Utility
2.1.2 Maximum of 12 Tape Drives and 12 Formatters
3 HSC VERSION 3.9A SOFTWARE INSTALLATION
3.1 Preinstallation Backup
3.2 Software Installation Procedure
4 FEATURES IN HSC SOFTWARE VERSION 3.90
4.1 Disk Server
4.2 Utilities
5 MISCELLANEOUS ENHANCEMENTS
6 MAINTENANCE CHANGES IN HSC SOFTWARE VERSION 3.9A
6.1 Disk Server
6.2 Tape Server
6.3 Block Size Recommendation For Non-TA90 Tape Drives
6.4 Utilities
6.5 Miscellaneous Changes
7 HSC VERSION 3.90 SOFTWARE EXCEPTION CODES AND ERROR MESSAGES
7.1 HSC Version 3.90 Software Exception Codes
7.2 HSC Version 3.90 Software Error Messages
7.3 Operator Control Panel Fault Codes
8 TOPICS FROM PREVIOUS HSC SOFTWARE RELEASE NOTES
8.1 VTDPY Operation
8.1.1 Using the VTDPY Display
8.1.2 VTDPY Error Messages
8.2 Volume Shadowing
8.3 Exception Codes
1 INTRODUCTION
The HSC Version 3.9A software release package contains these HSC
Version 3.9A Software Release Notes and the Version 3.9A software
distribution media. The software for the HSC70 is distributed on
diskette. The software for the HSC50 is distributed on two TU58
cassettes.
These release notes document the following:
o All information in the HSC Version 3.90 Software Release Notes.
o Maintenance changes provided in Version 3.9A software to cor-
rect identified problems in the Version 3.90 software.
2 PREINSTALLATION CONSIDERATIONS
This section contains information you should consider before
upgrading your HSC to Version 3.9A software.
2.1 Software Restrictions
This section describes some restrictions of the HSC Version 3.9A
software.
2.1.1 HSC50 Restricted to One Operator-Loaded Utility
Because of the smaller memory size of the HSC50, Version 3.9A
software allows you to run only one utility at a time. This limi-
tation ensures that sufficient memory is available to run Device
Integrity Tests when scheduled by the CRONIC executive. If you at-
tempt to run a second utility, the following message is displayed:
KMON-F All Utility Partitions in Use
Start the second utility when the first utility has completed, or
press CTRL/C to terminate the first utility.
This change does not affect the operation of Version 3.9A software
on the HSC70, which retains its ability to run two utilities
simultaneously.
2.1.2 Maximum of 12 Tape Drives and 12 Formatters
HSC Version 3.9A software supports a maximum of 12 tape drives
and 12 formatters on each HSC. For example, 3 TA78 formatters
with 4 tape drives on each formatter reach the 12-tape drive
configuration limit. If more than 12 formatters or drives are
configured on an HSC, one of the following messages is displayed:
No tape formatter structures available for Requestor x Port y
or:
No tape drive structures available for Requestor x Port y
When the HSC boots, resources are allocated to formatters on
requestors in ascending requestor priority order until the limit
of 12 tape drives and 12 formatters is reached. Resources are
allocated among tape drives on the same formatter according to the
arbitrary order in which the drives became known to the HSC.
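The allocation rule above can be sketched as follows. The sketch is hypothetical, not HSC code. It assumes that requestors are visited in ascending numeric order as a stand-in for "ascending requestor priority". It also assumes that a formatter receives its structure before its drives are counted and that each formatter's drive list is given in discovery order. It prints one message per refused drive; the notes do not say whether the HSC repeats the message for each drive.

```python
def allocate_tape_resources(formatters, max_formatters=12, max_drives=12):
    """Hypothetical model of the boot-time tape structure allocation.

    formatters: list of (requestor, port, [drive names in discovery order]).
    Returns (allocated, messages): the (requestor, port, drive) triples that
    received structures, and the console messages for those that did not.
    """
    allocated, messages = [], []
    n_formatters = n_drives = 0
    # ASSUMPTION: requestors are visited in ascending numeric order; sorted()
    # is stable, so formatters on the same requestor keep their given order.
    for requestor, port, drives in sorted(formatters, key=lambda f: f[0]):
        where = f"Requestor {requestor} Port {port}"
        if n_formatters >= max_formatters:
            messages.append(f"No tape formatter structures available for {where}")
            continue
        n_formatters += 1
        for drive in drives:
            if n_drives >= max_drives:
                messages.append(f"No tape drive structures available for {where}")
                continue
            n_drives += 1
            allocated.append((requestor, port, drive))
    return allocated, messages

# Four TA78 formatters with 4 drives each: one formatter more than the
# 3 x 4 = 12-drive example in the text (drive names are made up).
config = [(r, 0, [f"MU{r}{d}" for d in range(4)]) for r in (7, 6, 5, 8)]
allocated, messages = allocate_tape_resources(config)
print(len(allocated))  # 12 drives receive structures
print(messages[0])
```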
3 HSC VERSION 3.9A SOFTWARE INSTALLATION
Use the following procedure to install the software supplied in
this kit.
3.1 Preinstallation Backup
Before installing the software, use a blank diskette or cassette
to make a backup copy of the software. Instructions for the copy
procedure are in Chapter 10 of the HSC User Guide.
If you need additional backup copies, order blank, formatted
RX33 diskettes from the Software Distribution Center. Extra TU58
cassettes can be ordered from the DECdirect catalog.
3.2 Software Installation Procedure
NOTE
If your HSC cluster has RA90 disk drives connected to it,
use the SHOW DISKS command to verify that the RA90 drives
report a minimum software revision level of MC = 10.
If the drives do not report the minimum software revi-
sion, ask your Digital Field Service Representative to
install FCO RA90X-O001 prior to installing HSC Version 3.9A
software.
Use the following procedure to install the software:
1. On each HSC being upgraded to Version 3.9A software:
o Press CTRL/C.
o Enter the SHOW SYSTEM command.
This produces a hard copy of system parameters, as shown in the
following example:
<CTRL/C>
HSCxx>SHOW SYSTEM <RETURN>
17-JUL-1988 14:42:43.41 Boot: 17-JUL-1988 11:31:11.41 Up: 3:11
Version: V39A System ID: %X0000000000B7 Name: HSC006
Front Panel: Enabled HSC Type: HSC70
Console Dump: Enabled Load Dump: Disabled
Automatic DITs: Enabled
Periodic DITs: Enabled, Interval = 1
Disk Allocation Class: 0 Tape Allocation Class: 0
Start-up Command File: Disabled
Disk Drive Controller Timeout: 2 seconds
Maximum Tape Drives: 12
Maximum Formatters: 12
SETSHO-I Program Exit
2. Print a hard copy of these system parameters if your system
does not automatically produce a copy. Use this copy later in
the procedure to reset your HSC's parameters.
3. If your cluster does not have failover capabilities, shut down
the cluster and perform Steps 6 through 17.
4. Failover all disk and tape drives to the alternate HSC. Make
sure none of the tape or disk drives are on line to the HSC you
are upgrading to Version 3.9A software. The failover procedure
is described in the Guide to VAXclusters.
5. After successful failover, set the Online button on the HSC
operator control panel to the out position.
6. Open the HSC front panel, remove all old load media, and in-
stall the new software system/utility media in the HSC load
device. The new software system/utility media must be write
enabled.
Instructions for loading the software are in the following
sections of your HSC User Guide:
o If you have an HSC70, refer to Section 4.2.
o If you have an HSC50, refer to Section 4.3.
7. Press and release the Init button as you hold in the Fault
button. Hold in the Fault button until the following message
appears:
INIPIO-I Booting
8. When booting has completed:
o Press CTRL/C.
o Enter the RUN SETSHO command at the HSC> prompt.
o If the SETSHO> prompt is not displayed, review the pre-
vious steps to ensure that you have properly installed
the software, booted the HSC, and entered the RUN SETSHO
command.
9. At the SETSHO> prompt, enter the SHOW SYSTEM command to print a
hard copy of the default parameters on the new load media.
10. Compare this list of default parameters to the list you made in
Step 1 for this HSC. At the SETSHO> prompt, use the following
commands to reset the parameters to their former values:
SETSHO> SET NAME HSCaaaaaa <RETURN>
SETSHO> SET ID %Xnnnnn <RETURN>
SETSHO> SET ALLOCATE DISK n <RETURN>
SETSHO> SET ALLOCATE TAPE n <RETURN>
SETSHO> SET SERVER DISK DRIVE_TIMEOUT=n <RETURN>
Chapter 6 of your HSC User Guide contains detailed descriptions
of how to set each of these parameters.
11. Set any other parameters required by your system configuration.
When all parameters are set, enter the EXIT command at the
SETSHO> prompt.
12. When the HSC asks whether you wish to reboot it, enter YES.
13. After the HSC reboots, press CTRL/C and enter the SHOW SYSTEM
command.
14. Compare the new parameters with the ones on the list you made
in Step 1 to verify that all parameters are the same. If the
parameters are not identical:
o Enter the RUN SETSHO command at the HSC> prompt.
o Return to Step 10 and set the parameters that need changing.
o Continue with this procedure from Step 11.
15. If all parameters are correct, press the operator control panel
Online button to the in position. This allows the cluster to
re-establish connections to this HSC.
16. Enter the SHOW VIRTUAL_CIRCUITS command to verify that all
connections have been made. This command lists nodes that
have established virtual circuits with the HSC. Check that
all active hosts have established virtual circuits to this HSC.
If they have not, reboot the HSC and repeat this step.
17. Failover the drives to the HSC on which you have just installed
the new software. After all units have failed over, install the
new software on the alternate HSC. After making the hard copy
list of system parameters in Step 1, go to Step 5 and complete
the software installation procedure.
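Steps 1, 10 and 14 amount to comparing two parameter listings. The following minimal sketch assumes that the SHOW SYSTEM values have been transcribed by hand into dictionaries keyed by the field labels shown in the Step 1 example. It emits the Step 10 SETSHO commands for the five parameters listed there and flags any other field whose value differs for manual review. The function name and the post-upgrade values in the usage example are hypothetical.

```python
# Step 10 lists these parameters and the SETSHO command that resets each one.
SETSHO_COMMANDS = {
    "Name": "SET NAME {}",
    "System ID": "SET ID {}",
    "Disk Allocation Class": "SET ALLOCATE DISK {}",
    "Tape Allocation Class": "SET ALLOCATE TAPE {}",
    "Disk Drive Controller Timeout": "SET SERVER DISK DRIVE_TIMEOUT={}",
}

def parameters_to_reset(before, after):
    """Compare pre-upgrade (Step 1) and post-upgrade SHOW SYSTEM values.

    Returns (commands, other_differences): SETSHO commands that restore the
    Step 10 parameters, plus the names of any other fields whose values differ.
    """
    commands = []
    for field, template in SETSHO_COMMANDS.items():
        old = before.get(field)
        if old is not None and old != after.get(field):
            # SHOW SYSTEM prints the timeout as "2 seconds"; assume SETSHO wants "2".
            value = old.split()[0] if field.endswith("Timeout") else old
            commands.append(template.format(value))
    other_differences = sorted(
        field for field in before
        if field not in SETSHO_COMMANDS and before[field] != after.get(field)
    )
    return commands, other_differences

# Values from the Step 1 example listing.
before = {
    "Name": "HSC006",
    "System ID": "%X0000000000B7",
    "Disk Allocation Class": "0",
    "Tape Allocation Class": "0",
    "Disk Drive Controller Timeout": "2 seconds",
    "Maximum Tape Drives": "12",
    "Maximum Formatters": "12",
}
# Hypothetical defaults read from the new load media in Step 9.
after = {**before, "Name": "HSC000", "System ID": "%X000000000000"}
commands, others = parameters_to_reset(before, after)
for command in commands:
    print("SETSHO>", command)
```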
__________________________________________________________
4 FEATURES IN HSC SOFTWARE VERSION 3.90
This section describes the features in the Version 3.90 software.
__________________________________________________________
4.1 Disk Server
The HSC disk environment has been improved in the following ways:
o Some potential causes of IOT 4076 crashes have been located and
resolved.
o A problem in the disk path that caused the HSC to report
databus overrun errors has been resolved.
o HSC V3.90 software supports the maximum number of shadow sets
allowed by your version of VMS. Refer to Section 8.2 for fur-
ther discussion of Volume Shadowing. Detailed information on
VMS shadowing support is found in the VAX/VMS Volume Shadowing
Manual.
__________________________________________________________
4.2 Utilities
DKCOPY
A new message for DKCOPY clears confusion when the requested tar-
get device is either in use or nonexistent. Refer to Section 7.2
for a description of this error message.
A new message for DKCOPY warns you when the target device of
a disk-to-disk copy is hardware write protected. Refer to
Section 7.2 for a description of this error message.
SETSHO
SETSHO has been updated as follows:
o SET MAX_FORMATTER -- Changed to improve performance.
o SET MAX_TAPES -- Changed to improve performance.
o SET SERVER -- Changed to improve performance.
o SET PROMPT -- Allows you to select your own prompt on the HSC.
o SET REQUESTOR -- Allows you to select the correct data channel
microcode.
o SET RESTART CLEAR -- Renamed to SET EXCEPTION CLEAR to more
closely describe the command function. This change deletes the
SET RESTART command.
o SET SECTOR_SIZE -- Deleted and the default sector size set to
512 bytes.
o SET OUTBAND/SHOW OUTBAND -- Merged with the SET ERROR and SHOW
ERROR commands.
o SET DEVICE [NO]HOST_ACCESS -- Changes the state of the requestor
when you exit SETSHO instead of when the command is entered.
This allows you to exit SETSHO with a CTRL/Y if you want the
requestor to remain in the previous state.
o SHOW REQUESTOR -- Displays information about each requestor
connected to the HSC.
o SHOW CONNECTIONS -- Displays information about all virtual
circuits and connections the HSC has with other nodes.
BACKUP/RESTOR
BACKUP/RESTOR has been updated as follows:
o A Write Ring Missing problem that caused BACKUP to abort has
been resolved. BACKUP now allows you to correct the problem and
continue without aborting the backup operation.
o BACKUP/RESTOR no longer supports 576 byte records.
o New features have been added to enhance tape unloading:
-- The tape drive will not unload if you press CTRL/C before a
tape drive has started reading or writing operations.
-- If a tape drive finishes a backup or restore operation
without using all of the mounted tapes, the extra tape is
not unloaded. Leaving the unused tape mounted identifies it
as empty, which reduces the amount of wasted tape.
o New prompts improve performance and reduce operator interven-
tion.
-- You may run the BACKUP and RESTOR utilities without operator
interaction. The HSC prompts with either of the following:
Would you like to run BACKUP with "NO OPERATOR"? [N]
or
Would you like to run RESTOR with "NO OPERATOR"? [N]
If you press RETURN or answer N, the utility continues
to prompt you for appropriate responses. If you leave the
terminal during a backup or restore operation, the utility
times out 5 minutes after a query and aborts.
If you answer Y, further system queries are disabled. The
utility bypasses all further prompts and uses the default
responses instead of your inputs. This feature allows you to
leave the terminal and perform other tasks without further
interaction with the utility.
NOTE
If you disable the queries, you may not know when the
volume to back up has expired or when the save-set of
a restore has expired. However, the warning message
for this condition appears almost immediately after
the NO OPERATOR query. When you see this message, you
may press CTRL/C to abort the operation. Otherwise,
the operation continues after a 10-second delay.
-- When BACKUP encounters a tape reel that has reached its
limit of media errors (hard write errors), it prompts you to
increase the error threshold with the following message:
Do you wish to increase the media error limit for this tape reel
and continue? (Y or N) [N]:
Press RETURN or answer N if you do not wish to change the
error threshold. You are then prompted to change the tape
reel.
Answer Y to increase the threshold and continue. You are
then prompted for the increased error limit. After you enter
the new limit, the operation continues.
-- HSC70 users may now perform a backup or restore operation
using a 16K-byte record size instead of the default 8K-byte
record size. This feature is described in Chapter 7 of your
updated HSC User Guide.
__________________________________________________________
5 MISCELLANEOUS ENHANCEMENTS
Node Lockout
A node lockout problem in the Diagnostics and Utilities Protocol
(DUP) Server is now resolved.
__________________________________________________________
6 MAINTENANCE CHANGES IN HSC SOFTWARE VERSION 3.9A
This section describes the maintenance changes in the Version 3.9A
software.
__________________________________________________________
6.1 Disk Server
The following disk server changes have been implemented in the
Version 3.9A software:
o A possible problem in which an MMU crash may result if a forced
error is detected on a 2- or 3-member shadow set is resolved.
This fix also corrects the possible problem in which a repair
operation may not be performed as expected on an LBN with
forced error set.
o A potential cause of excessive positioner errors and IOT 4076
crashes has been corrected.
o An extremely rare problem in which a primary revectored block
is not handled properly has been corrected.
o A possible problem of virtual circuit closures on disks during
extremely heavy tape activity has been corrected.
__________________________________________________________
6.2 Tape Server
The following tape server changes have been implemented in Version
3.9A software:
o A possible problem in which failover may not occur in the
unlikely event that an operator releases a selected port button
on a tape drive while the drive is transferring a heavy data
load has been resolved.
o The ILTAPE diagnostic now recognizes a TA90 tape drive in all
circumstances and prompts for write memory region parameters.
o The restriction of no more than one TA90 tape formatter on the
same requestor has been lifted.
o An error flag problem that caused VAXsimPLUS to erroneously
signal alarms on tape drives that were actually operative is
resolved.
o When you run heavy TA90 loads and either a Cache Data Lost or
Cache Busy condition occurs, an IOT 6037 crash will no longer
occur.
o The default drive timeout has been increased from 30 to 80
seconds to provide a workaround for the problem of drives
unexpectedly changing to the AVAILABLE state. This may cause
failover to take longer.
o The following changes have been made in pipeline error report-
ing:
- A problem that caused improper reporting of pipeline errors
has been corrected.
- The severity level of the message that reports a pipeline
error has been changed from ERROR to WARNING.
- When an application or operating system issues a tape com-
mand with the inhibit error recovery condition set, the HSC
treats a pipeline error as recoverable. For example, if a
pipeline error occurs when you are running VMS Backup, the
HSC recovers the error even though the default is to inhibit
the error recovery. A pipeline error is NOT a tape error.
__________________________________________________________
6.3 Block Size Recommendation For Non-TA90 Tape Drives
The HSC CI interface can be significantly faster than the host CI
adapter when performing multiple backups. Therefore, it is possi-
ble that all of the CI bandwidth can be used by tape traffic and
cause data timeouts and virtual circuit closures. To prevent this,
a change in the V39A software more evenly distributes the data
flow over the CI to the hosts without noticeably affecting the
overall data throughput. Because of this change, it is strongly
recommended that you use the following operational parameters if
you wish to use a block size greater than 24Kb with VMS Backup
(the default is 8Kb):
o If only one requestor is configured for tape in the HSC, the
maximum recommended block size is 48Kb.
o If two requestors are configured for tape in the HSC, the
maximum recommended block size is 32Kb.
o If more than two requestors are configured for tape, the maxi-
mum recommended block size is 24Kb.
NOTE
These guidelines apply to the number of requestors
configured for tape in the HSC (not the number of
requestors actively transferring data).
They DO NOT apply to the cached TA90 tape drive. It is
still recommended that a 64Kb block size be used with the
TA90.
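The three block size rules above reduce to a lookup on the number of requestors configured for tape. The following is a minimal sketch; the function name is hypothetical, and the notes' "Kb" figures are read here as thousands of bytes (K bytes), matching the 8K-byte and 16K-byte wording used for BACKUP record sizes elsewhere in these notes.

```python
def backup_block_size_limit_kb(tape_requestors, ta90=False):
    """Block size guidance (in K bytes) from Section 6.3 of the V3.9A notes.

    tape_requestors: requestors CONFIGURED for tape (shown as K.sti by
    SHOW REQUESTOR), not the number currently transferring data.
    For the cached TA90 the notes instead recommend a 64K-byte block size.
    """
    if ta90:
        return 64
    if tape_requestors < 1:
        raise ValueError("no requestors are configured for tape")
    if tape_requestors == 1:
        return 48
    if tape_requestors == 2:
        return 32
    return 24  # more than two tape requestors

for n in (1, 2, 3):
    print(n, backup_block_size_limit_kb(n))
```

Block sizes of 24K bytes or less, including the 8K-byte default, fall within every non-TA90 guideline, so the lookup matters only when a larger size is being considered.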
Use the following procedure to determine how many requestors are
configured for tape in your HSC:
1. Press CTRL/Y on the HSC console or terminal.
2. Type SHOW REQUESTOR. Each requestor displayed as type K.sti is
configured for tape.
Failure to follow these recommendations can result in pipeline and
drive-detected EDC errors when running multiple backup streams.
Pipeline errors are the result of the CI and host momentarily
not being able to supply data fast enough to the HSC during tape
writes. Due to the bursty nature of TA78, TA79, and TA81 tape
transfers, it is not uncommon to see an occasional pipeline error.
However, the more even distribution of data flow between tape
and disk in the V39A software will cause these errors to be seen
much more frequently. Following the recommended block size will
eliminate the possibility of these errors occurring and will have
minimal performance impact.
NOTE
Pipeline errors DO NOT indicate any hardware or software
fault in the HSC or host.
If a pipeline error occurs, the VMS Error Log prints the following
message for the MSLG$EVENT field in the error log entry:
Data OVRFLW due to pipeline error
The associated drive-detected EDC error can be recognized by a
code of 0440 in the ERRN1/ERRNUM field in the error log entry. It
will also have the same command reference number as the pipeline
error. To eliminate pipeline errors, reduce your block size ac-
cording to the recommendations provided. These errors are fully
recoverable.
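The correlation rule above, in which a 0440 EDC error shares its command reference number with a pipeline error, can be applied mechanically once error log entries have been extracted. The following minimal sketch assumes that each entry has already been reduced to a dictionary with the hypothetical keys `cmd_ref`, `event` (the MSLG$EVENT text) and `errnum` (the ERRN1/ERRNUM code as a string). It does not parse actual VMS error log output.

```python
PIPELINE_EVENT = "Data OVRFLW due to pipeline error"
EDC_ERRNUM = "0440"  # drive-detected EDC error code given in the release notes

def classify_edc_errors(entries):
    """Split 0440 EDC errors into those explained by a pipeline error
    (same command reference number) and those with no matching pipeline error.
    """
    entries = list(entries)
    pipeline_refs = {e["cmd_ref"] for e in entries if e.get("event") == PIPELINE_EVENT}
    edc = [e for e in entries if e.get("errnum") == EDC_ERRNUM]
    explained = [e for e in edc if e["cmd_ref"] in pipeline_refs]
    unexplained = [e for e in edc if e["cmd_ref"] not in pipeline_refs]
    return explained, unexplained

# Hypothetical extracted entries.
log = [
    {"cmd_ref": 0x1A2B, "event": PIPELINE_EVENT},
    {"cmd_ref": 0x1A2B, "errnum": "0440"},
    {"cmd_ref": 0x0042, "errnum": "0440"},  # no matching pipeline error
]
explained, unexplained = classify_edc_errors(log)
print(len(explained), len(unexplained))
```

The notes' statement that these errors are recoverable and do not indicate a fault covers only EDC errors paired with a pipeline error. The sketch separates out any unpaired entries, which this section does not address.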
__________________________________________________________
6.4 Utilities
BACKUP
You can now use the "NO OPERATOR" feature when you run BACKUP.
This feature is described in Chapter 7 of the HSC User Guide.
When you run BACKUP and reach the media error limit, the following
conditions occur:
o If you have chosen to run a backup operation without operator
interaction, the media error limit is automatically increased
and the backup operation continues.
o If you have chosen to run a backup operation with operator
interaction, BACKUP prompts you to increase the media error
limit.
__________________________________________________________
6.5 Miscellaneous Changes
Booting a System with a Shadowed System Disk
The HSC polling algorithm has been changed. This provides a
workaround to decrease the system boot time when booting from
a shadowed system disk when the virtual unit for the shadow set
has not yet been formed.
__________________________________________________________
7 HSC VERSION 3.90 SOFTWARE EXCEPTION CODES AND ERROR MESSAGES
This section lists the exception codes and error messages in the
Version 3.90 software.
__________________________________________________________
7.1 HSC Version 3.90 Software Exception Codes
4115
DCB address inconsistency
Facility: DISK, SDI
Explanation: While processing an error on a seek DCB, the facil-
ity found an inconsistency between the current seek DCB address
and the DCB address stored in the DRAT. This new crash code was
created in connection with the fixes for IOT 4076 crashes.
User Action: Submit an SPR with the crash dump.
4116
Bad error completion queue in DCB
Facility: DISK, MSCP
Explanation: The DCB error completion queue was not properly
restored during DCB completion.
User Action: Submit an SPR with the crash dump.
4117
No DRAT on DRAT list head when expected
Facility: DISK, ERROR
Explanation: No elements were found on the DRAT queue when the
error process tried to remove a DRAT from the head of the queue.
User Action: Submit an SPR with the crash dump.
__________________________________________________________
7.2 HSC Version 3.90 Software Error Messages
DKCOPY-E-INVALR--Invalid unit id. Valid range is 0 through 4094
Explanation: You have entered a unit identification number that is
not in the range of 0 through 4094.
User Action: Enter a unit identification number within the valid
range.
DKCOPY-E-OFFLINE--Specified unit is offline or nonexistent
Explanation: You have entered a unit identification number that is
not recognized by the system.
User Action: Check the unit identification number and enter the
command again.
DKCOPY-F-RUNSTOP--No volume mounted or drive disabled via RUN/STOP
Explanation: One or both of the drives that you are using to
perform a disk-to-disk copy does not have a volume mounted or
is spun down.
User Action: Check to see that a volume is mounted and that both
drives are spun up.
DKCOPY-F-WRITEPROTECT--Unit is write protected
Explanation: The target unit of a disk-to-disk copy is hardware
write protected.
User Action: Press and release the write protect button. Run
DKCOPY again.
KMON-F All Utility Partitions in Use
Explanation: You attempted to run more than one utility at a time.
User Action: Wait until the currently running utility has com-
pleted or terminate its operation.
VERIFY-E-INVALR-Invalid unit id. Valid range is 0 through 4094
Explanation: You have entered a unit identification number that is
not in the range of 0 through 4094.
User Action: Enter a unit identification number within the valid
range.
__________________________________________________________
7.3 Operator Control Panel Fault Codes
Your operator control panel may display the following fault code:
__________________________________________________________________
Status Code (Octal)   Description
__________________________________________________________________
33                    Invalid hardware configuration
__________________________________________________________________
This fault code indicates that the configuration of modules in
your HSC is not supported. Contact your Digital Field Service
Representative if this fault code is displayed on your operator
control panel.
Chapter 3 of your HSC User Guide contains a complete listing of
operator control panel fault codes.
8 TOPICS FROM PREVIOUS HSC SOFTWARE RELEASE NOTES
This section contains important topics that are carried forward
and updated from previous issues of the HSC Release Notes. You
will need this information if you are a new user of the HSC.
8.1 VTDPY Operation
VTDPY is a utility for gathering and displaying system statistics.
VTDPY can display system throughput, status of the disk and tape
drives, and utilities running on other terminals. This utility
also indicates which nodes have virtual circuits, connections, and
multiple connections to the HSC.
NOTE
Avoid running VTDPY using the VMS command SET HOST/HSC with
VMS versions prior to V4.6.
This utility requires a video terminal and does not display on
a hard-copy printer. Either a VT100, a VT220, or a VT320, set
at 9600 baud, must be attached to the EIA port on the HSC to run
VTDPY.
To run VTDPY, enter the following command at the HSC> prompt:
HSC> RUN [device]:VTDPY [update-interval]
Where device is the device holding the VTDPY program. For the
HSC50, the device is DD1:, and for the HSC70, the device is DX0:.
The update-interval is in seconds, from 2 to 420. If this update
interval is not provided, VTDPY prompts:
VTDPY-Q Interval (secs) ?
If the response is outside the allowable range, VTDPY displays an
error message. The higher the number for the update interval, the
less the performance impact on the HSC.
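The interval rule lends itself to a quick pre-check when VTDPY is started from a script or terminal emulator. The Python sketch below is illustrative only: `check_interval` and `vtdpy_command` are hypothetical helpers, not part of the HSC software, and the device table just restates the DD1: (HSC50) and DX0: (HSC70) locations given above.

```python
from typing import Optional

# Documented VTDPY limits for the update interval, in seconds.
MIN_INTERVAL_SECS = 2
MAX_INTERVAL_SECS = 420

# Devices holding VTDPY, as stated in these release notes.
VTDPY_DEVICE = {"HSC50": "DD1:", "HSC70": "DX0:"}


def check_interval(seconds: int) -> Optional[str]:
    """Return None if the interval is acceptable, else the VTDPY-E text."""
    if MIN_INTERVAL_SECS <= seconds <= MAX_INTERVAL_SECS:
        return None
    return "VTDPY-E Illegal Interval Value (2 to 420 seconds)"


def vtdpy_command(model: str, seconds: int) -> str:
    """Build the text to type at the HSC> prompt (hypothetical helper)."""
    error = check_interval(seconds)
    if error is not None:
        raise ValueError(error)
    return f"RUN {VTDPY_DEVICE[model]}VTDPY {seconds}"


if __name__ == "__main__":
    # A longer interval means less performance impact on the HSC.
    print(vtdpy_command("HSC70", 60))  # RUN DX0:VTDPY 60
```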
VTDPY terminates after you enter a CTRL/Y or a CTRL/C. The screen
is cleared upon termination.
The following control keys are used in VTDPY:
CTRL/E--Displays tape status on the next refresh. Thereafter,
the display alternates with disk status on subsequent
refreshes.
CTRL/D--Displays disk status on the next refresh. Thereafter,
the display alternates with tape status on subsequent
refreshes.
CTRL/V--Displays host path status information (i.e., A, B, or a
diamond) on the next refresh only.
CTRL/W--Refreshes the screen.
__________________________________________________________
8.1.1 Using the VTDPY Display
This section presents a sample VTDPY display and explains the
meaning of the fields in the display.
HSC70 V39A HSC001 Id 0000000000DD On 14-Apr-1988 12:28:13.12 UP: 113.49
42.9% Idle 39 Work Requests/Sec 40 Sectors/Sec 0 Records/Sec
Free Lists              Process Pr St Time%   Disk Status
 CTRL Blks   2269 +     Kernel        16.4%                1111111111
 SLCB/DCB      32 +   2 VTDPY   11 Rn 19.2%      +1234567890123456789
 Buffers      889 +  50 DEMON   11 Bl           0....................
                     52 PDEMON   7 Bl          20A.A..........A......
Pool Sizes           54 PSCHED  13 Rn 42.9%    40..........A.A.A.....
 SYSCOM      1800 +  72 DISK     9 Rn 16.0%    60.AA.......O..A..O...
 Kernel      6504 + 110 ECC      6 Bl          80....................
 Program   821120 + 120 TAPE     8 Bl         100....................
 Control    32436 + 122 TTRASH   7 Bl         120....................
                    124 HOST     4 Bl   .9%   140...........O........
Data B/W used: .0%  126 POLLER   5 Bl         160....................
                    130 SCSDIR   5 Bl   .9%   180..................A.
Host Connections                              200A...................
            111111111122222222223333333333    220....................
  0123456789012345678901234567890123456789    240....................
 0MM..C....V....M.........................
40........................................
The VTDPY display is continuously updated at the update interval
you have set and it changes as the internal state of the HSC
changes. These changes are made for all fields in the display,
except those fields relating to HSC memory. Memory statistics are
updated by pressing CTRL/W.
The major fields are explained in the following paragraphs. As
you read this section, refer to the VTDPY display to see where the
fields are located and to the paragraphs below the sample fields
to interpret the meaning of the fields.
HSC70 V39A HSC001 Id 0000000000DD On 14-Apr-1988 12:28:13.12 UP: 113.49
The top line, reading from left to right, shows the HSC model
number (HSC70); the baselevel of the operating software (V39A); the
system name (HSC001); the HSC id number, given as a hexadecimal
number unique in the cluster (in this case 0000000000DD); and the
system date and time. The last number on the right indicates the
hours and minutes the HSC has been running since the last boot or
reboot (113.49 means 113 hours and 49 minutes).
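If VTDPY screens are captured to a log, the top line can be split into its fields mechanically. This is a minimal sketch assuming the field order shown in the sample display; the regular expression and the `TopLine` and `parse_top_line` names are hypothetical. It reads the UP: value as hours.minutes, as described above, not as decimal hours.

```python
import re
from typing import NamedTuple


class TopLine(NamedTuple):
    model: str
    baselevel: str
    system_name: str
    hsc_id: str
    timestamp: str
    up_hours: int
    up_minutes: int


# Assumed layout: model, baselevel, name, "Id" <hex>, "On" <date> <time>, "UP:" <h>.<mm>
TOP_LINE = re.compile(
    r"(?P<model>HSC\d+)\s+(?P<base>\S+)\s+(?P<name>\S+)\s+"
    r"Id\s+(?P<id>[0-9A-Fa-f]+)\s+On\s+(?P<ts>\S+\s+\S+)\s+"
    r"UP:\s+(?P<hours>\d+)\.(?P<minutes>\d+)"
)


def parse_top_line(line: str) -> TopLine:
    m = TOP_LINE.search(line)
    if m is None:
        raise ValueError(f"not a VTDPY top line: {line!r}")
    return TopLine(m["model"], m["base"], m["name"], m["id"], m["ts"],
                   int(m["hours"]), int(m["minutes"]))


sample = "HSC70 V39A HSC001 Id 0000000000DD On 14-Apr-1988 12:28:13.12 UP: 113.49"
top = parse_top_line(sample)
print(top.up_hours, top.up_minutes)  # 113 49
```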
42.9% Idle 39 Work Requests/Sec 40 Sectors/Sec 0 Records/Sec
The second line of the display shows the percentage of current
P.io idle time, average number of work requests (i.e., MSCP and
TMSCP) per second, number of disk data sectors transferred per
second, and number of tape data records transferred per second.
These numbers are normalized to match the update interval.
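The second line can be decoded the same way. In this sketch (a hypothetical `parse_stats_line` helper, assuming the exact wording of the sample line), `busy_pct` is simply 100 minus the idle percentage; VTDPY itself does not print that figure.

```python
import re

STATS_LINE = re.compile(
    r"(?P<idle>\d*\.?\d+)%\s+Idle\s+"
    r"(?P<requests>\d+)\s+Work Requests/Sec\s+"
    r"(?P<sectors>\d+)\s+Sectors/Sec\s+"
    r"(?P<records>\d+)\s+Records/Sec"
)


def parse_stats_line(line: str) -> dict:
    m = STATS_LINE.search(line)
    if m is None:
        raise ValueError(f"not a VTDPY statistics line: {line!r}")
    idle = float(m["idle"])
    return {
        "idle_pct": idle,
        "busy_pct": round(100.0 - idle, 1),  # derived: P.io time not idle
        "work_requests_per_sec": int(m["requests"]),  # MSCP and TMSCP
        "sectors_per_sec": int(m["sectors"]),  # disk data sectors
        "records_per_sec": int(m["records"]),  # tape data records
    }


stats = parse_stats_line("42.9% Idle 39 Work Requests/Sec 40 Sectors/Sec 0 Records/Sec")
print(stats["busy_pct"])  # 57.1
```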
Free Lists
 CTRL Blks   2269 +
 SLCB/DCB      32 +
 Buffers      889 +
Pool Sizes
 SYSCOM      1800 +
 Kernel      6504 +
 Program   821120 +
 Control    32436 +
This field represents the quantity of available memory and memory
structures. The units used in the display are:
CTRL Blks -- Blocks
SLCB/DCB -- Number of structures
Buffers -- Number of buffers
Pool Sizes -- All are given in words of memory
The numbers are usually followed by plus signs. If the numbers are
followed by minus signs, the system is in memory deficit. During
memory deficit, the HSC slows down and, if the deficit lasts long
enough, the HSC could crash.
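When VTDPY output is captured for later review, scanning for the minus sign is easy to automate. The sketch below assumes each Free Lists or Pool Sizes entry is captured as name, count, and sign on one line, as in the excerpt above; `find_deficits` is a hypothetical name, and the lines marked with "-" are made up because the example display shows no deficit.

```python
import re
from typing import Iterable, List

# One Free Lists / Pool Sizes entry: a name, a count, then "+" or "-".
ENTRY = re.compile(r"^\s*(?P<name>[A-Za-z/ ]+?)\s+(?P<count>\d+)\s*(?P<sign>[+-])\s*$")


def find_deficits(lines: Iterable[str]) -> List[str]:
    """Return the names of entries whose count is followed by a minus sign."""
    deficits = []
    for line in lines:
        m = ENTRY.match(line)
        if m is not None and m["sign"] == "-":
            deficits.append(m["name"].strip())
    return deficits


# Hypothetical capture: two entries changed to "-" purely for illustration.
captured = [
    "CTRL Blks 2269 +",
    "SLCB/DCB 32 -",
    "Buffers 889 +",
    "Pool Sizes",
    "SYSCOM 1800 +",
    "Kernel 6504 -",
]
print(find_deficits(captured))  # ['SLCB/DCB', 'Kernel']
```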
Data B/W used: .0%
This display shows the percentage of HSC data bus bandwidth used.
This is an instantaneous display and may often show 0% when the
HSC is busy, because the bandwidth was zero at the instant the
sample was taken.
Host Connections
            111111111122222222223333333333
  0123456789012345678901234567890123456789
 0MM..C....V....M.........................
40........................................
This display indicates host connection status. The two horizontal
rows of numbers below the Host Connections heading represent host
node numbers 0 through 39. Each digit on the first line is read
with the digit directly below it to form the numbers 10 through
39.
The connection status for host node numbers 40 and above is read on
the last line of the display. Add the base number 40 at the far
left of the last line to the column number above it to derive
these host node numbers; for example, a letter in column 2 of that
line refers to node 42.
The next line indicates the status of the host connections. A
C on this line indicates one connection to that host, and an
M indicates multiple connections. Because each host can make a
separate connection to each of the disk, tape, and DUP servers,
this field frequently shows multiple connections. In the example,
nodes 0, 1, and 14 show multiple connections, and node 4 shows one
connection.
If no letter corresponds to the node number, that host does not
have any connection to the HSC. If a V appears on that line, only a
virtual circuit is open and no connection is present. This
usually means the host is in a transitional state. The example
shows node 9 with only a virtual circuit open.
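Reading the grid by eye is error-prone, so here is a small sketch that applies the rule above: the base number at the start of a row plus the column offset gives the node number. It assumes the row is captured as plain text, with the row label immediately followed by one character per node; `split_row` and `decode_host_row` are hypothetical names.

```python
from typing import Dict, Tuple

# Status letters from the Host Connections line; "." means no connection.
CONNECTION_CODES = {
    "C": "one connection",
    "M": "multiple connections",
    "V": "virtual circuit only",
}


def split_row(row: str) -> Tuple[int, str]:
    """Split a row such as ' 0MM..C' into its base node number and cells."""
    row = row.lstrip()
    digits = 0
    while digits < len(row) and row[digits].isdigit():
        digits += 1
    return int(row[:digits]), row[digits:]


def decode_host_row(row: str) -> Dict[int, str]:
    """Map each lettered cell to (base + column) -> meaning."""
    base, cells = split_row(row)
    return {
        base + column: CONNECTION_CODES[ch]
        for column, ch in enumerate(cells)
        if ch in CONNECTION_CODES
    }


row0 = "0" + "MM..C....V....M" + "." * 25
for node, meaning in sorted(decode_host_row(row0).items()):
    print(node, meaning)  # nodes 0, 1, 14: multiple; 4: one; 9: VC only
```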
Host Path Status
            111111111122222222223333333333
  0123456789012345678901234567890123456789
 0^A..^....B....A.........................
40........................................
When you press CTRL/V, the display toggles to an alternate Host
Path Status display for one refresh only. This display contains
CI path status information and each position can contain either a
diamond symbol, an A, or a B. If one path (A or B) goes down,
this display alternates on every other refresh with the Host
Connections display until that path comes back.
The meanings of the symbols are as follows:
o A solid diamond symbol means normal operation (both paths
operating). This symbol is represented in the example with
a caret (^).
o An A or B indicates only one CI path is operational. If an
A is displayed, path A is active, but path B is not; if a
B is displayed, path B is active, but path A is not. These
conditions indicate a probable hardware problem.
The example shows that nodes 0 and 4 have both paths operating.
Nodes 1 and 14 have only path A operating, and node 9 has only
path B operating.
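The same base-plus-column rule applies to the Host Path Status display. This sketch assumes the diamond is captured as a caret, as in the example, and flags the nodes that have only one working CI path; `decode_path_row` and `single_path_nodes` are hypothetical helper names.

```python
from typing import Dict

PATH_CODES = {
    "^": "both paths",  # the solid diamond, shown as a caret in captured text
    "A": "path A only",
    "B": "path B only",
}


def decode_path_row(row: str) -> Dict[int, str]:
    """Map each marked cell of a path-status row to its node number."""
    row = row.lstrip()
    digits = 0
    while digits < len(row) and row[digits].isdigit():
        digits += 1
    base, cells = int(row[:digits]), row[digits:]
    return {base + col: PATH_CODES[ch] for col, ch in enumerate(cells) if ch in PATH_CODES}


def single_path_nodes(row: str) -> Dict[int, str]:
    """Nodes running on one CI path only (a probable hardware problem)."""
    return {node: s for node, s in decode_path_row(row).items() if s != "both paths"}


path_row = "0" + "^A..^....B....A" + "." * 25
print(single_path_nodes(path_row))
```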
    Process Pr St Time%
    Kernel        16.4%
  2 VTDPY   11 Rn 19.2%
 50 DEMON   11 Bl
 52 PDEMON   7 Bl
 54 PSCHED  13 Rn 42.9%
 72 DISK     9 Rn 16.0%
110 ECC      6 Bl
120 TAPE     8 Bl
122 TTRASH   7 Bl
124 HOST     4 Bl   .9%
126 POLLER   5 Bl
130 SCSDIR   5 Bl   .9%
This portion of the display shows the active processes.
The columns in this display (from left to right) mean the
following:
o The first column is the process number.
o The Process column shows the name of the process running at the
time.
o The Pr column shows the priority of the process.
o The St column shows the status of the process, either running
(Rn) or blocked (Bl).
o The Time% column is the percentage of P.io time each currently
running process is using.
Names in the process column under Kernel (the operating system)
are defined as follows:
o VTDPY is the utility currently running. Another utility could
appear here instead, in which case the priority number might also change.
o DEMON indicates that demand and automatic device integrity
tests are running.
o PDEMON indicates that periodic device integrity tests are
running.
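o PSCHED is the scheduler for periodic device integrity tests.
It also serves as the HSC idle loop, which is why its Time%
(42.9% in the example) matches the Idle figure on the second line.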
o DISK is the disk server.
o ECC is the error correction code process and is displayed when
disk I/O is active.
o TAPE is the tape server.
o TTRASH is displayed when the tape server is active. This
process sends tape error logs to the host.
o HOST is the process that interfaces to the host. It is always
present.
o POLLER polls for the host processors and is present when a
connection is present.
o SCSDIR processes directory requests from the host.
Not all active processes are necessarily shown. Because of
limited space on the screen, the display of some processes may be
truncated, and the CPU time percentages may not total 100 percent,
depending on the polling interval of the data sample.
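Using the process lines from the example, the sketch below picks out the running (Rn) processes and totals the Time% column, which here comes to 96.3 percent rather than 100. It assumes whitespace-separated fields and treats a missing Time% (typical of blocked processes) as zero; `parse_process` and `summarize` are hypothetical names.

```python
from typing import List, Tuple

# Process lines as shown in the example display.
PROCESS_LINES = [
    "Kernel 16.4%",
    "2 VTDPY 11 Rn 19.2%",
    "50 DEMON 11 Bl",
    "52 PDEMON 7 Bl",
    "54 PSCHED 13 Rn 42.9%",
    "72 DISK 9 Rn 16.0%",
    "110 ECC 6 Bl",
    "120 TAPE 8 Bl",
    "122 TTRASH 7 Bl",
    "124 HOST 4 Bl .9%",
    "126 POLLER 5 Bl",
    "130 SCSDIR 5 Bl .9%",
]


def parse_process(line: str) -> Tuple[str, str, float]:
    """Return (name, status, time_pct); a missing Time% counts as 0.0."""
    fields = line.split()
    pct = float(fields[-1].rstrip("%")) if fields[-1].endswith("%") else 0.0
    if fields[0].isdigit():
        # "<number> <name> <priority> <status> [time%]"
        return fields[1], fields[3], pct
    return fields[0], "", pct  # the Kernel line has no number or status


def summarize(lines: List[str]) -> Tuple[List[str], float]:
    """Return the running process names and the Time% total."""
    procs = [parse_process(line) for line in lines]
    running = [name for name, status, _ in procs if status == "Rn"]
    total = round(sum(pct for _, _, pct in procs), 1)
    return running, total


running, total = summarize(PROCESS_LINES)
print(running, total)  # ['VTDPY', 'PSCHED', 'DISK'] 96.3
```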
Disk Status
             1111111111
   +1234567890123456789
  0....................
 20A.A..........A......
 40..........A.A.A.....
 60.AA.......O..A..O...
 80....................
100....................
120....................
140...........O........
160....................
180..................A.
200A...................
220....................
240....................
The last area in the display alternates between disk and tape
status displays when both device types are connected to the HSC.
The two horizontal rows of numbers under the Disk Status heading
represent the numbers 0 through 19. Each 1 on the first line is
read with the digit directly below it to form the numbers 10
through 19. This number is added to the numbers 0 through 240
given on the vertical axis of the display to derive the disk unit
number.
For example, the letter O in the approximate center of the display
refers to disk unit 151 because it is at the intersection of
the number 140 on the vertical axis and the number 11 in the
horizontal rows, and the sum of 140 and 11 is 151.
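The unit-number arithmetic can be written down directly. This sketch assumes rows are captured with the vertical-axis label immediately followed by 20 status cells, as in the example; `unit_number` and `decode_disk_row` are hypothetical helpers.

```python
from typing import Dict


def unit_number(row_base: int, column: int) -> int:
    """Disk unit = vertical-axis row base (0, 20, ..., 240) + column (0-19)."""
    if not 0 <= column <= 19:
        raise ValueError("column must be 0 through 19")
    if row_base % 20 != 0 or not 0 <= row_base <= 240:
        raise ValueError("row base must be a multiple of 20 from 0 to 240")
    return row_base + column


def decode_disk_row(row: str) -> Dict[int, str]:
    """Map each non-'.' cell in a row such as '140.....O...' to its unit."""
    row = row.lstrip()
    digits = 0
    while digits < len(row) and row[digits].isdigit():
        digits += 1
    base, cells = int(row[:digits]), row[digits:]
    return {unit_number(base, col): ch for col, ch in enumerate(cells) if ch != "."}


print(unit_number(140, 11))  # 151, the online drive in the example
```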
The drive status is coded as follows:
o The letter O indicates the drive is Online. That is, the drive
is in use by a host, an HSC utility, or an HSC device integrity
test. In the example, drive unit number 151 is on line.
o An A indicates the drive is Available but not mounted. Drive
unit number 62 is available.
o A D indicates the HSC is connected to Duplicate units (two or
more drives with the same unit number).
o A U indicates the drive is in an Undefined state.
The letters and method of determining the drive unit number are
the same when tape status is displayed. In the tape status
display, an additional letter, F, indicates that no tape is mounted
on the tape drive.
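Combining the status letters with the row layout gives a quick per-status count. This is a sketch under the same capture assumptions as the display above; the code table restates the letters listed here (plus F for tape), and `status_counts` is a hypothetical name. Letters outside the table are reported rather than guessed.

```python
from collections import Counter
from typing import Iterable

DISK_CODES = {
    "O": "online",
    "A": "available, not mounted",
    "D": "duplicate unit number",
    "U": "undefined state",
}
# Tape status uses the same letters plus F (no tape mounted).
TAPE_CODES = {**DISK_CODES, "F": "no tape mounted"}


def status_counts(rows: Iterable[str], tape: bool = False) -> Counter:
    """Count the drives in each status across the captured rows."""
    codes = TAPE_CODES if tape else DISK_CODES
    counts = Counter()
    for row in rows:
        cells = row.lstrip().lstrip("0123456789")  # drop the row label
        for ch in cells:
            if ch == ".":
                continue
            counts[codes.get(ch, f"unrecognized code {ch!r}")] += 1
    return counts


rows = ["60" + ".AA" + "." * 7 + "O..A..O...", "140" + "." * 11 + "O" + "." * 8]
print(status_counts(rows))
```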
__________________________________________________________
8.1.2 VTDPY Error Messages
This utility has two error messages, as follows:
VTDPY-E Illegal Interval Value (2 to 420 seconds)
Explanation: You have entered an update interval outside the range
permitted. VTDPY reprompts for the update interval.
User Action: Reenter a value within the correct range.
VTDPY-F Insufficient Common Pool
Explanation: This message indicates insufficient memory to run
VTDPY.
User Action: Retry VTDPY when the demands on memory are reduced.
__________________________________________________________
8.2 Volume Shadowing
This release supports VMS Volume Shadowing. HSC Version 3.9A
supports the maximum number of shadow sets specified in the VMS
Volume Shadowing Software Product Description (SPD).
When you run volume shadowing, adhere to the following rules:
o Use only identical disk types with the same geometry within a
shadow set.
o Do not attempt to dismount the source shadow member of a shadow
set while a VMS Shadow Copy operation is in progress. The VMS
command, SHOW DEVICE, indicates whether such an operation is
executing.
o Read Section 2.8 of the VAX/VMS Volume Shadowing Manual, which
describes a method of using a particular former shadow set member
as the source for all copy operations involved in rebuilding
the shadow set.
o Always include the device names for all shadow set members
in the shadowing MOUNT command. The VMS operating system will
correctly select between source and target members for you.
Note the following items specific to volume shadowing:
o During a copy operation, different VAXcluster members may have
different views of the shadow set's membership, as shown by the
SHOW DEVICE command. This situation corrects itself when the
copy operation completes. Differences appear when a shadow set
is first mounted and during the copy operations resulting from
shadow set failover processing.
Although this situation can be confusing, it is relatively
harmless. If the condition results from a MOUNT command, the
SHOW DEVICE output on the VAXcluster member where the MOUNT
command was executed is the most accurate view of the shadow
set.
o During a merge copy operation (initiated either by a MOUNT
command or as a result of a shadow set failover), only the
VAXcluster member executing the copy indicates a merge copy is
executing. All other VAXcluster members indicate a full copy is
being done. This is part of the volume shadowing design used by
the HSC controller and VMS operating system.
o Hardware write-protected shadow sets are not supported. If you
write protect the members of a shadow set, any data degradation
errors will be unrecoverable.
o Shadow set members with foreign file structures (that is, not
FILES-11 ODS 2) receive limited support. Full volume shadowing
support requires the ability to store shadow set context
somewhere on the shadow set member volumes. This is not possible
for volumes with a foreign file structure. Read the VAX/VMS
Volume Shadowing Manual carefully before attempting to use
volume shadowing on volumes with foreign file structures.
__________________________________________________________
8.3 Exception Codes
This section provides error codes and user actions.
004106
DRAT allocation failure
Facility: DISK, MSCP
Explanation: While preparing to read the Factory Control Table
(FCT) during online processing, the DRAT allocation subroutine
failed.
User Action: Submit an SPR with the crash dump.
004107
Command not completed after drive declared inoperative
Facility: DISK, MSCP
Explanation: Get Command Status processing declared the drive
inoperative, but the command still failed to complete within the
timeout period.
User Action: Submit an SPR with the crash dump. Note the type
of the drive identified in the error message. The error message
identifies the unit number; the drive type for the unit number may
be obtained from a SHOW DISKS display.
004110
GCS Status Overflow
Facility: DISK, MSCP
Explanation: Get Command Status processing determined that the
calculated status will result in an overflow.
User Action: Submit an SPR with the crash dump.
004111
A timer has link field values inconsistent with its current
operational state
Facility: DISK, many
Explanation: When a timer was added to or removed from the active
list, it was in a state that should not exist.
User Action: Submit an SPR with the crash dump.
004112
A unit is incorrectly marked as a shadow set member
Facility: DISK, many
Explanation: A unit was incorrectly marked as being a member of a
shadow set.
User Action: Submit an SPR with the crash dump.
004113
No DRAT list invalid
Facility: DISK, many
Explanation: During Fragment Request Block (FRB) retirement while
declaring a drive inoperative, the NO DRAT list was found to be
invalid.
User Action: Submit an SPR with the crash dump.
004114
Connection closed after delay in ATTN process
Facility: DISK, ATTN
Explanation: While the disk server was waiting to acquire
resources to send an attention message to the host, the connection
closed.
User Action: Submit an SPR with the crash dump.
007022
Invalid BMB address
Facility: CIMGR, CIMISCPRC
Explanation: A Host Message Block (HMB) arrived at the resource
collector with an invalid Big Message Block (BMB) address attached
to it.
User Action: Note the K.pli microcode revision level with a SETSHO
SHOW REQUESTORS command. The K.ci MC version reported by this
command is the K.pli microcode revision level. If the revision
level is less than revision 45, contact your Digital Field Service
Representative for a K.pli microcode update. Also, note the
current disk configuration. If the K.pli microcode revision level is
greater than or equal to 45, submit an SPR with the crash dump and
the noted disk configuration.
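The decision in this user action reduces to a single comparison. The sketch below takes the revision number as an integer read manually from the K.ci MC version field of SETSHO SHOW REQUESTORS; it does not parse that output, whose exact format is not given here. `action_for_007022` is a hypothetical name.

```python
# Threshold from the 007022 user action: K.pli microcode revision 45.
MIN_KPLI_REVISION = 45


def action_for_007022(kpli_revision: int) -> str:
    """Return the documented next step for exception code 007022."""
    if kpli_revision < MIN_KPLI_REVISION:
        return "Contact Digital Field Service for a K.pli microcode update."
    return "Submit an SPR with the crash dump and the noted disk configuration."


print(action_for_007022(54))
```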
007023
SCS buffer retrieval failure
Facility: CIMGR, CISUBRS
Explanation: When changing the status of the virtual circuit, the
CIMGR tried to retrieve the SCS buffer from the K.ci .KHSRR queue.
This buffer should have been on the queue because it was not in
use at the time of the crash. If no elements have been queued to
the .KHSRR queue, CIMGR would have forced a crash.
User Action: Submit an SPR with the crash dump.
062002
Common Pool memory returned twice
Facility: Many
Explanation: A process attempted to return a memory segment that
was already in the Common Pool.
User Action: Submit an SPR with the crash dump.
********************************************************************************
From: GENRAL::FIALA "Eschew Obfuscation." 17-NOV-1989 10:45:48.95
To: VOLKS::BRASSARD
CC: FIALA
Subj: CLD CX4373..
Hi.
Rone Repka forwarded your memo to me.
I am in the invidious position of providing remedial support for CX controllers.
You obviously need HSC V39A. I am hand-managing distribution of V39A
until SDC starts shipping in mid-January.
I have 3 bundled savesets with everything needed [including
instructions] inside.
1 for HSC50, 1 for HSC70, 1 for both.
Which kit do you want? Where do you want it copied to?
Let me know "where to stick it"!!!
Stefan Fiala
PS: Your phone # in your mail header still has 240-6492.
Elf lists your old number also.
From: GENRAL::FIALA "Eschew Obfuscation." 17-NOV-1989 17:24:16.82
To: KERNEL::CLARK
CC: FIALA
Subj: HSC's and dropping VC's.
Hi.
Bob Brassard forwarded your memo to me.
The problems you have are probably due to:-
a bad install of V390(4), not having KCI 2.54 and/or HSC V39A, or reasons unknown.
There is a buffer/credit starvation problem fixed in 2.54, and a
hack to the CI wire handling in 39A.
I believe there are other VMS things too.
A bad install of V390(4) can cause all these things and more...
By and large, if the HSC is reporting VC closed [info],
and VMS 5.2 is running, and Backup is involved [maybe with a large
blocksize or /nocrc], then the phenomena you outline fit.
Use "SHO REQ" on the HSC to check for 2.54. Fix this first.
Check the way V390(4) was installed:-
o Did they get no HSC prompt after installation? [yes=Bad]
o Did they follow the correct install procedure? [No=bad]
o How many tape formatters does "SHO TAPE" indicate?
  [MUST BE LESS THAN 24]. [>12 BAD]
o Suspect VMS credit starvation for HSC disk/tapes. [Difficult]
o Devices on that HSC "run slow". [Difficult]
o If in any doubt, reinstall V390(4): [Easy]
   o SHOW ALL
   o Check that disks will fail over.
   o <online> out.
   o Press <init> while holding in <fault>.
   o "Inipio-Booting..."
   o Let go of <fault>.
   o Use SETSHO to reset any parameters from SHO ALL.
   o Reboot as necessary.
   o <online> in.
I generated a Blitz about badly installing V390(4) some time ago.
[For VC closes you get a message "VC closed by request from KCI"
but no reason... No retndat/cnf timeout, for instance.]
I couldn't tell the whole story... But if in doubt, re-initialise it.
Submit a Prism/Cld for a pre-release of HSC V39A.
V39A won't exit SDC or SSB [new name?] till mid-January [in the USA].
Do me a favour and spread the word amongst the CSC folks about
the VC closures being "hidden" by the HSC error level.
And the bad-install of V390(4) issue.
If you see funnies like the HSC going offline, tapes mysteriously rewinding,
shadow copies starting spontaneously, drives dropping offline, etc.,
SET ERROR INFO immediately...
Stefan Fiala
CX CSSE Product Support.
|