
Conference kernel::csguk_systems

Title:	CSGUK_SYSTEMS
Notice:	No restrictions on keyword creation
Moderator:	KERNEL::ADAMS

Created:	Wed Mar 01 1989
Last Modified:	Thu Nov 28 1996
Last Successful Update:	Fri Jun 06 1997
Number of topics:	242
Total number of notes:	1855

60.0. "TAPES" by KERNEL::MOUNTFORD () Fri Aug 11 1989 14:44

    
    THIS TOPIC IS DEDICATED TO THE INTERCHANGE  OF INFORMATION & ASSISTANCE
    ------------------------------------------------------------------------                  
                       WITHIN THE DEVICE GROUP FOR TAPES
                       ----------------------------------

T.R	Title	User	Personal Name	Date	Lines
60.1	TA90 no power gotcha	KERNEL::ARCHER	Graham Archer Devices Diagnosis	`Sat Aug 19 1989 16:32`	32
	Here's a possible TA90 gotcha.

	Scenario
	--------
	Customer uses the "Unit Emergency" switch to power down the TA90, or powers the TA90 off from the mains breaker on the wall.

	Result
	------
	When either switch is turned back on, the TA90 fails to power up. No DC power is present.

	Solution
	--------
	With the power switches set back to "on", press the Local Power Enable button, located immediately to the left of the floppy drive inside the TA90. You will have to open the front door of the TA90 to do this. Press the right-hand side of the door hard with the palm of the hand to release the door catch and gain access to the inside of the TA90.

	Reason
	------
	The Unit Emergency switch kills the entire TA90 power system immediately. TA90 power is still held off, even when the AC input is restored, until the Local Power Enable switch is set.

	Graham Archer.
60.2	Tapes and disks going offline temporarily	KERNEL::BARTLEY		`Mon Nov 20 1989 09:22`	2520
	I have read nearly half of this and it seems to have something to do with tapes. I think!! In any case it's extremely interesting. I think!! Isn't it? (TFB)

***************************************************************

Dear Colleagues....

As a result of a telesupport query, and some probing into the ol' STARS database, I found the name of someone who seemed to know something about it, so I asked him!.... The result was a blown disk quota, so I thought I'd share it with you!! Please observe the point about CSSE needing to manage the restricted release, and report all cases to Bob Brassard/George White.

Now read on!!

I have been discussing a problem with a field engineer, which appears to have all the hallmarks of the scenario outlined in a STARS article, for which various workarounds are indicated.

The "cluster" is a single 8800/CIBCI with hw/sw revision from "show cluster" = 80007, HSC RP_REV for HSC = 236. CRONIC is 394, VMS is 4.7. (HSC = 236 indicates K.CI u-code 54, which is the latest following installation of the L0107-YA FCO.) The HSC supports 4 x RA82s, 1 x TA81, and 7 x S.I. 97Cs on DEC requestors.

The "problem" is two-edged:

(1) The customer is complaining about long backup times.

(2) Two months ago the HSC started "dropping off line" (online light going out, all drive port lights going out, tape would rewind and restart backups). After a short period it would recover the V.C. (online light came on again, drive port lights came on again). This has become more frequent, causing much loss of performance. This morning there were six observed events in one hour, with no errlog/HSC-console reports.

There were no identifiable changes to either software or hardware corresponding to the start of the problem, although I haven't yet been able to rule out such things as changes to backup command files or trainee operators, etc.

There are frequent indications of clock dropouts from the S.I.
drives, on the HSC console and as HSC datagrams, but not corresponding in time to the HSC events, and (as the STARS scenario suggests......)

** No PAA0 or HSC errors on either VMS or HSC consoles. NO ERRORLOG ENTRIES FOR THE HSC GOING AWAY!!

These S.I. dropouts were occurring before this problem started.

The ERROR threshold on the HSC was FATAL. This has been reset to "INFO" and immediately gave "vc closure" events corresponding to the HSC online light going out; however, these didn't indicate the reason (e.g. VC closed due to timeout of RTNDAT/CNF, etc.).

I have a feeling that I may be missing out on some information which may have already been published on this topic; if so, I apologise. However, I would be grateful if you could maybe enlarge on this problem and any fixes (VMS 5.2??) or point me in the right direction.

Thanks in anticipation......

Dave Clark
UK-CSC Devices Support (Disks)

*****************************************************************************

From: VOLKS::BRASSARD "Bob B., VAX CSSE, 240-6492, AET 1-1/6" 20-SEP-1989 14:49:22.21
To: BRASSARD
CC:
Subj: FAS UPDATE

7.10 TITLE: CIXXX HSCXX RTNDAT/CNF TIMEOUT VC-CLOSURE: NEW PROBLEMS
     DEVICE: HSC50,HSC70,CIXXX (Added: 16-DEC-1988)
     CLD #: CXO02335,CXO02677
   * PRISM #: (Update: 16-JAN-1989) (Updated 20-SEP-1989)

* UPDATE SEP-89 *

VMS-5.2 "BACKUP" UTILITY I/O buffering enhancements will cause many more/new sites to be impacted by this VC-CLOSURE (CI CMDQ-0 STARVATION) problem, AS SOON AS THEY UPGRADE TO VMS-5.2 !! Although an "official" CRONIC new patch-version fix to the underlying Tape-write "CI buffer-data priority" problem was intended to be released in NOV-89, EVAL TESTING errors with this CRONIC (tentatively V-39A) fix are requiring the solution strategy to be re-evaluated by CI/HSC/VMS Engineering and CSSE.

!!!! THERE IS CURRENTLY NO ENGINEERING APPROVED WORKAROUND !!!!
!!!! OR PATCH (VMS-PADRIVER, HSC-CRONIC, OR OTHERWISE) FOR !!!!
!!!! VMS-5.2 CUSTOMERS EXPERIENCING THIS PROBLEM; BUT       !!!!
!!!! SOME FORM OF WORKAROUND WILL SOON (by NOV-89) BE      !!!!
!!!! AVAILABLE. UNTIL THEN, ONLY THE FOLLOWING TWO (2)     !!!!
!!!! SUGGESTIONS ARE AVAILABLE FOR VMS-5.2 CUSTOMERS:      !!!!

1. DELAY VMS-5.2 UPGRADE UNTIL CSSE PUBLISHES AND OFFERS A WORKAROUND PATCH (PADRIVER, CRONIC, etc.);

2. REDUCE VMS-5.2 "BACKUP" PROCESS I/O PERFORMANCE BY LOWERING BACKUP-USER/ACCOUNT "UAF-RECORD" (AUTHORIZE FILE) I/O QUOTAS SUCH AS "DIOLM, BYTLM". THIS WILL HAVE TO BE CAREFULLY MANAGED TO AVOID REDUCING VMS-5.2 BACKUP PERFORMANCE TO LEVELS UNACCEPTABLE TO THE CUSTOMER.

!!!! CUSTOMER SITUATIONS WHERE THE ABOVE IS NOT SATISFACTORY !!!!
!!!! SHOULD BE ESCALATED VIA CLD-PROCESS TO VAX CSSE (George !!!!
!!!! White, Bob Brassard) FOR EXCEPTION HANDLING ADVICE.     !!!!

!!!! VMS-4.7, 5.0, & 5.1 SITES SHOULD BE MANAGED AS DICTATED !!!!
!!!! BELOW: NO CLD IS REQUIRED; BUT VAX CSSE MUST BE         !!!!
!!!! CONTACTED TO MANAGE THE "RESTRICTED DISTRIBUTION"       !!!!
!!!! VMS-4.7/5.0/5.1 "PADRIVER" PATCH.                       !!!!

SYMPTOMS:

* CUSTOMER SYMPTOM: DURING BACKUP, TAPES REWIND/RESTART,
* SHADOW-SETS COPIES INITIATED, OR DISK-MOUNT-VERIFY !

During heavy file-transfer activity between a CIxxx (CIBCA-A, CIBCA-B, CIBCI, or CI7x0) and an HSC50/70, such as during disk-tape or disk-disk BACKUPs, the HSC may initiate a "Virtual-Circuit" (VC) closure with the CIxxx/VAX-host. Unless HSC "ERROR or OUTBAND-LEVEL" is set to "INFO" (default is "ERROR"), the "RTNDAT/CNF TIMEOUT" causing the VC-Closure and the "VC-CLOSURE" itself * WILL NOT BE REPORTED BY THE HSC * ! By reducing the HSC ERROR-LEVEL to "INFO", the following general error messages will be seen:

HSC ERROR-MESSAGES
------------------
Path A has gone from good to bad.
Path B has gone from good to bad.

HOST-W-SEQ 100. xx:xx:xx (time-stamp)
VC closed with node-5 (node-name) due to request from K.CI

DISK-I-SEQ 101. xx:xx:xx
VC closed due to timeout of RTNDAT/CNF from host node-5.

HOST-I-SEQ 102. xx:xx:xx
VC opened with node-5 (node-name).
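For anyone wading through a captured HSC console log looking for these events, the following is a minimal Python sketch of how the three messages above could be picked out and paired up. It is only an illustration: the report gives the messages schematically ("xx:xx:xx"), so the exact layout assumed here (a header line followed by one text line), the sample log, and the names `HEADER`, `find_vc_events` and `SAMPLE` are all assumptions, not anything supplied by CSSE.

```python
import re

# Assumed layout, modelled on the schematic messages in the report: a header
# line "<FACILITY>-<SEVERITY>-SEQ <n>. <hh:mm:ss>" followed by one text line.
HEADER = re.compile(r"^(HOST|DISK)-([IWEF])-SEQ\s+(\d+)\.\s+(\S+)")


def find_vc_events(console_text):
    """Return (time, kind, text) tuples for VC close/timeout/open messages."""
    events = []
    lines = [ln.strip() for ln in console_text.splitlines()]
    for i, line in enumerate(lines):
        match = HEADER.match(line)
        if not match or i + 1 >= len(lines):
            continue
        text = lines[i + 1]
        if "timeout of RTNDAT/CNF" in text:
            kind = "timeout"
        elif text.startswith("VC closed"):
            kind = "closed"
        elif text.startswith("VC opened"):
            kind = "opened"
        else:
            continue
        events.append((match.group(4), kind, text))
    return events


# Hypothetical sample; node name and times are made up for illustration.
SAMPLE = """\
Path A has gone from good to bad.
Path B has gone from good to bad.
HOST-W-SEQ 100. 10:15:02
VC closed with node-5 (NODEA) due to request from K.CI
DISK-I-SEQ 101. 10:15:02
VC closed due to timeout of RTNDAT/CNF from host node-5.
HOST-I-SEQ 102. 10:15:09
VC opened with node-5 (NODEA).
"""

if __name__ == "__main__":
    for when, kind, text in find_vc_events(SAMPLE):
        print(when, kind, text)
```

Counting "timeout" events per hour from such a scan gives a rough measure of how often the VC is being dropped, which is the figure CSSE is likely to ask for when qualifying a site.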
VMS on the affected VAX-host typically * WILL NOT REPORT ANY PAA0/CIxxx or VIRTUAL-CIRCUIT STATUS CHANGES * either !!

Normally, the most obvious symptom visible to the customer is simply tapes rewind/restart during BACKUP (due to lack of VMS TA/TUxx TAPE-MOUNT-VERIFICATION support); or DISK SHADOW-SETS begin "COPYING" (if only mounted from 1 node); or unexplained DISK MOUNT-VERIFY messages. These symptoms are simply the result of DISK/TAPE MSCP-CLASS-DRIVER automatic fault recovery from the VC-CLOSURE with the HSCxx. The HSCxx and CIxxx/PAA0 VMS CI-port software will automatically recover the virtual-circuit within 5-10 seconds. Although this recovery is automatic, significant BACKUP time is wasted in re-writing TA/TUxx tapes from the beginning on each VC-CLOSURE; and significant SHADOW-SET performance is lost during a "COPY". The customer may be legitimately and justifiably concerned...

* DO NOT REPLACE CIxxx OR HSCxx HARDWARE FOR THIS PROBLEM !! *

PROBLEM DESCRIPTION:

An intensive cross-functional CI/HSC Engineering and CSSE investigation has isolated 3 separate causes for this HSC "RTNDAT/CNF TIMEOUT" VC-CLOSURE problem. To understand the 3 causes, a definition of "RTNDAT/CNF TIMEOUT" is required. The HSCxx K.CI uses a 3-second timer on each of its oldest CI data-transfers (usually a 4-8 block fragment SNDDAT or DATREQ to VAX-host CI-PORT); any requiring more than 3 seconds to complete causes the "RTNDAT/CNF TIMEOUT", implying that the VAX CI-PORT has not returned the expected "RETDAT" (for DATREQ) or "CNF" (Confirmation after SNDDAT LAST-PACKET) packet within 3 seconds.

1. HSCxx K.CI (L0107) V2.43 FIRMWARE "SNDDAT STALL" BUG:

   A microcode bug in the KCI supervisor loop stalls SNDDAT packet processing, when DMA_CREDITS for DATREQ are exhausted. SNDDAT and DATREQ processing should be independent. This bug is corrected by KCI V2.54 (L0107 REV-E) firmware, soon to be released as FCO for RA70 support.

2. CI-PORT COMMAND-QUEUE PRIORITIZATION "RESOURCE STARVATION":

   Current CI-PORT command-queue prioritization may cause excessive COMQL (CI-CMD.-QUEUE-0 / COMQ0) service latencies resulting in HSCxx "RTNDAT/CNF TIMEOUT" VC-closure on disk-writes, during heavy BACKUP-applic. tape-write/disk-read activity. CI processing of Tape-writes/DATREQ2/COMQ2, VMS MSCP message-commands/SNDMSG/COMQ1, and received-packets (SNDDAT, RECMSG) can pre-empt servicing of COMQ0, thus indefinitely delaying DATREQ0 HSC disk-write data-requests and resulting in HSC data-transfer timeout. A "VMS SUPPORTED RESTRICTED DISTRIBUTION" PADRIVER.EXE patch is available from VAX CSSE as a short-term (6 month) workaround to this problem: THE MOST PREDOMINANT CAUSE OF HSC "RTNDAT/CNF TIMEOUT" VC-CLOSURES !!

3. HSCxx K.CI "SNDDAT SEQUENTIALITY" PROBLEM:

   K.CI may transmit SNDDAT "LAST-PACKET" out-of-sequence, due to performance optimization which allows KCI firmware to packetize up to 3 different SNDDAT operations at 1 time on a single VC. The SNDDAT packet queue-manipulation can result in indefinite pre-emption of the oldest SNDDAT/LP packet, depending on SNDDAT "CNF" credit-return timing and queue-position. This is a low-frequency/low-impact problem, likely only occurring once or twice per-year ! There is currently no KCI firmware fix, since the optimization is desirable in most cluster CI traffic situations.

STATUS:

VAX CSSE has short-term workarounds for each of the above problems, intended only for critical customer situations at this time.

1. HSCxx K.CI (L0107) V2.43 FIRMWARE "SNDDAT STALL" BUG:

   The KCI V2.54 firmware fix will soon be released as HSCxx FCO to L0107 module, but CSSE can supply preliminary parts for critical sites.

2. CI-PORT COMMAND-QUEUE PRIORITIZATION "RESOURCE STARVATION":

   VAX CSSE has a PADRIVER.EXE patch for VMS-4.7 and 5.0/5.1. THIS PATCH is "RESTRICTED DISTRIBUTION", REQUIRING VAX CSSE APPROVAL AND AUTHORIZATION !! SIGNIFICANT CLUSTER PERFORMANCE (LOCK_MGR) DEGRADATION MAY OCCUR UNDER CERTAIN APPLICATION CI-TRAFFIC LOADS, thus requiring careful characterization of CI-traffic at candidate sites. This patch is only a short-term (6-month) workaround.

   NOTE: VAX CSSE SITE QUALIFICATION IS CRITICAL
   *     DUE TO THE POTENTIAL PERFORMANCE IMPACT OF CURRENT
   ***   PADRIVER.EXE PATCH TO CUSTOMER APPLICATION !!

   A cross-functional CI/VMS/HSC Engineering team is actively considering and investigating an appropriate long-term fix, which will not jeopardize performance. Pending results, VMS Engineering will formally adopt an optimized-patch or any team-recommended CI/PA-architecture changes into the next VMS major release (V5.2) and a retrofittable patch.

3. HSCxx K.CI "SNDDAT SEQUENTIALITY" PROBLEM:

   An HSC CRONIC V370 patch is available from VAX CSSE to extend the "RTNDAT/CNF TIMEOUT" from 3 to 45 seconds, an effective workaround. This is a low-risk/impact patch, and is normally not required, but advised as a guarantee to avoid VC-Closure on political sites. HSC Engineering is developing a KCI firmware fix, likely to be included in an HSC/KCI future upgrade product.

SOLUTION/WORK-AROUND:

CONTACT VAX CSSE FOR CUSTOMER SITE QUALIFICATION, AND TO OBTAIN CURRENT WORKAROUNDS:

- Bob Brassard, VOLKS::BRASSARD, DTN 240-6492, DDD 508-474-6492;
- George White, VOLKS::WHITE, DTN 240-6490, DDD 508-474-6492.

The following immediate workarounds will lessen or completely avoid problem impact while awaiting VAX CSSE qualification, approval, and workarounds; or while awaiting formal Engineering release of solutions. Note that each measure will lengthen the time required for customer's daily/weekly BACKUP procedures !!

+ BACKUP COMMAND FILES: REDUCE "BACKUP/BUFFER=xxxxx" command buffer-count parameter to default of "3 buffers" or less in customer BACKUP command files, or BACKUP procedures.
+ CONCURRENT TA/TUXX TAPE OPERATION: Incrementally reduce the number of concurrently running TA/TUxx BACKUP tape-drives/jobs, to a number avoiding or limiting HSC VC-CLOSURE to an acceptable level.

INTERIM STATUS: (16 Jan 89)

Field tests at two customer sites of the patched PA driver appear to have been completely successful thus far.

From: VOLKS::BRASSARD "Bob B., VAX CSSE, 240-6492, AET 1-1/6" 23-JAN-1989 20:02:25.06
To: MYFILE
CC:
Subj: F.A.S. HSC-VC-CLOSE CSSE-PROBLEM-DESC. & VMS-4.7/5.X PADRIVER PATCH...

(EXTRACT OF VAX-CSSE DEC-88 FOCUS-PRODUCT-REPORT)
-------------------------------------------------

7.10 TITLE: CIXXX HSCXX RTNDAT/CNF TIMEOUT VC-CLOSURE: NEW PROBLEMS
     DEVICE: HSC50,HSC70,CIXXX (Added: 16-DEC-1988)
     CLD #: CXO02335,CXO02677
   * PRISM #:

SYMPTOMS:

* CUSTOMER SYMPTOM: DURING BACKUP, TAPES REWIND/RESTART,
* SHADOW-SETS COPIES INITIATED, OR DISK-MOUNT-VERIFY !

During heavy file-transfer activity between a CIxxx (CIBCA-A, CIBCA-B, CIBCI, or CI7x0) and an HSC50/70, such as during disk-tape or disk-disk BACKUPs, the HSC may initiate a "Virtual-Circuit" (VC) closure with the CIxxx/VAX-host. Unless HSC "ERROR or OUTBAND-LEVEL" is set to "INFO" (default is "ERROR"), the "RTNDAT/CNF TIMEOUT" causing the VC-Closure and the "VC-CLOSURE" itself * WILL NOT BE REPORTED BY THE HSC * ! By reducing the HSC ERROR-LEVEL to "INFO", the following general error messages will be seen:

HSC ERROR-MESSAGES
------------------
Path A has gone from good to bad.
Path B has gone from good to bad.

HOST-W-SEQ 100. xx:xx:xx (time-stamp)
VC closed with node-5 (node-name) due to request from K.CI

DISK-I-SEQ 101. xx:xx:xx
VC closed due to timeout of RTNDAT/CNF from host node-5.

HOST-I-SEQ 102. xx:xx:xx
VC opened with node-5 (node-name).

VMS on the affected VAX-host typically * WILL NOT REPORT ANY PAA0/CIxxx or VIRTUAL-CIRCUIT STATUS CHANGES * either !!
Normally, the most obvious symptom visible to the customer is simply tapes rewind/restart during BACKUP (due to lack of VMS TA/TUxx TAPE-MOUNT-VERIFICATION support); or DISK SHADOW-SETS begin "COPYING" (if only mounted from 1 node); or unexplained DISK MOUNT-VERIFY messages. These symptoms are simply the result of DISK/TAPE MSCP-CLASS-DRIVER automatic fault recovery from the VC-CLOSURE with the HSCxx. The HSCxx and CIxxx/PAA0 VMS CI-port software will automatically recover the virtual-circuit within 5-10 seconds. Although this recovery is automatic, significant BACKUP time is wasted in re-writing TA/TUxx tapes from the beginning on each VC-CLOSURE; and significant SHADOW-SET performance is lost during a "COPY". The customer may be legitamately and justifiably concerned... * DO NOT REPLACE CIxxx OR HSCxx HARDWARE FOR THIS PROBLEM !! * PROBLEM DESCRIPTION: An intensive cross-functional CI/HSC Engineering and CSSE investigation has isolated 3 separate causes for this HSC "RTNDAT/CNF TIMEOUT" VC-CLOSURE problem. To understand the 3 causes, a definition of "RTNDAT/CNF TIMEOUT" is required. The HSCxx K.CI uses a 3-second timer on each of its oldest CI data-transfers (usually a 4-8 block fragment SNDDAT or DATREQ to VAX-host CI-PORT); any requiring more than 3 seconds to complete causes the "RTNDAT/CNF TIMEOUT", implying that the VAX CI-PORT has not returned the expected "RETDAT" (for DATREQ) or "CNF" (Confirmation after SNDDAT LAST- PACKET) packet within 3 seconds. 1. HSCxx K.CI (L0107) V2.43 FIRMWARE "SNDDAT STALL" BUG: A microcode bug in the KCI supervisor loop stalls SNDDAT packet processing, when DMA_CREDITS for DATREQ are exhausted. SNDDAT and DATREQ processing should be independent. This bug is corrected by KCI V2.54 (L0107 REV-E) firmware, soon to be released as FCO for RA70 support. 2. 
CI-PORT COMMAND-QUEUE PRIORITIZATION "RESOURCE STARVATION": Current CI-PORT command-queue prioritization may cause excessive COMQL (CI-CMD.-QUEUE-0 / COMQ0) service latencies resulting in HSCxx "RTNDAT/CNF TIMEOUT" VC-closure on disk-writes, during heavy BACKUP-applic. tape-write/disk-read activity. CI processing of Tape-writes/DATREQ2/COMQ2, VMS MSCP message- commands/SNDMSG/COMQ1, and received-packets (SNDDAT, RECMSG) can pre-empt servicing of COMQ0, thus indefinitely delaying DATREQ0 HSC disk-write data-requests and resulting in HSC data- transfer timeout. A "VMS SUPPORTED RESTRICTED DISTRIBUTION" PADRIVER.EXE patch is available from VAX CSSE as a short-term (6 month) workaround to this problem: THE MOST PREDOMINANT CAUSE OF HSC "RTNDAT/CNF TIMEOUT" VC-CLOSURES !! 3. HSCxx K.CI "SNDDAT SEQUENTIALITY" PROBLEM: K.CI may transmit SNDDAT "LAST-PACKET" out-of-sequence, due to performance optimization which allows KCI firmware to packetize up to 3 different SNDDAT operations at 1 time on a single VC. The SNDDAT packet queue-manipulation can result in indefinite pre-emption of the oldest SNDDAT/LP packet, depending on SNDDAT "CNF" credit-return timing and queue-position. This is a low-frequency/low-impact problem, likely only occurring once or twice per-year ! There is currently no KCI firmware fix, since the optimization is desirable in most cluster CI traffic situations. STATUS: VAX CSSE has short-term workarounds for each of the above problems, intended only for critical customer situations at this time. 1. HSCxx K.CI (L0107) V2.43 FIRMWARE "SNDDAT STALL" BUG: The KCI V2.54 firmware fix will soon be released as HSCxx FCO to L0107 module, but CSSE can supply preliminary parts for critical sites. 2. CI-PORT COMMAND-QUEUE PRIORITIZATION "RESOURCE STARVATION": VAX CSSE has a PADRIVER.EXE patch for VMS-4.7 and 5.0/5.1. THIS PATCH is "RESTRICTED DISTRIBUTION", REQUIRING VAX CSSE APPROVAL AND AUTHORIZATION !! 
SIGNIFICANT CLUSTER PERFORMANCE (LOCK_MGR) DEGRADATION MAY OCCUR UNDER CERTAIN APPLICATION CI-TRAFFIC LOADS, thus requiring careful characterization of CI-traffic at candidate sites. This patch is only a short- term (6-month) workaround. NOTE: VAX CSSE SITE QUALIFICATION IS CRITICAL * DUE TO THE POTENTIAL PERFORMANCE IMPACT OF CURRENT *** PADRIVER.EXE PATCH TO CUSTOMER APPLICATION !! A cross-functional CI/VMS/HSC Engineering team is actively considering and investigating an appropriate long-term fix, which will not jeopardize performance. Pending results, VMS Engineering will formally adopt an optimized-patch or any team-recommended CI/PA-architecture changes into the next VMS major release (V5.2) and a retrofittable patch. 3. HSCxx K.CI "SNDDAT SEQUENTIALITY" PROBLEM: An HSC CRONIC V370 patch is available from VAX CSSE to extend the "RTNDAT/ CNF TIMEOUT" from 3 to 45 seconds, an effective workaround. This is a low-risk/impact patch, and is normally not required, but advised as a guarantee to avoid VC-Closure on political sites. HSC Engineering is developing a KCI firmware fix, likely to be included in an HSC/KCI future upgrade product. SOLUTION/WORK-AROUND: CONTACT VAX CSSE FOR CUSTOMER SITE QUALIFICATION, AND TO OBTAIN CURRENT WORKAROUNDS: - Bob Brassard, VOLKS::BRASSARD, DTN 240-6492, DDD 508-474-6492; - George White, VOLKS::WHITE, DTN 240-6490, DDD 508-474-6492. The following immediate workarounds will lessen or completely avoid problem impact while awaiting VAX CSSE qualification, approval, and workarounds; or while awaiting formal Engineering release of solutions. Note that each measure will lengthen the time required for customer's daily/weekly BACKUP procedures !! + BACKUP COMMAND FILES: REDUCE "BACKUP/BUFFER=xxxxx" command buffer-count parameter to default of "3 buffers" or less in customer BACKUP command files, or BACKUP procedures. 
+ CONCURRENT TA/TUXX TAPE OPERATION: Incrementally reduce the number of concurrently running TA/TUxx BACKUP tape-drives/jobs, to a number avoiding or limiting HSC VC-CLOSURE to an acceptable level.

From: VOLKS::BRASSARD "Bob B., VAX CSSE, 240-6492, AET 1-1/6" 16-DEC-1988 19:39
To: NM%VOLKS::WHITE,MYFILE
Subj: CONTENTS OF FAS$PADRIVER DIRECTORY

Below are all the files necessary to implement the HSC VC-CLOSURE workarounds on any sites.

Regards,
Bob Brassard

Directory VOLKS::FAS$PADRIVER: ($1$DUA1:[FAS_PADRIVER])

HSC70_R002_KCI_V254.FCO;1    25  16-DEC-1988 16:16:39.40  (RE,RWED,RE,RE)
        HSC K.CI V2.54 FIRMWARE FCO: L0107 REV-E*
HSC_KCI_TIMEOUT.PATCH;1       3  16-DEC-1988 19:34:37.14  (RE,RWED,RE,RE)
        HSC CRONIC V370 PATCH TO EXTEND HOST-TIMEOUT TO 45-SECONDS
HSC_VC_CLOSE.FOCUS;1         16  16-DEC-1988 19:33:17.98  (RE,RWED,RE,RE)
        HSC VC-CLOSURE FOCUS-REPORT ENTRY/PROBLEM DESCRIPTION
PADRIVER_V47_MSG0.COM;2      11  16-DEC-1988 15:16:28.37  (RE,RWED,RE,RE)
        VMS-4.7 PADRIVER.EXE PATCH COMMAND FILE & PATCH DESCRIPTION
PADRIVER_V47_MSG0.EXE;2      40  15-DEC-1988 19:49:53.25  (RE,RWED,RE,RE)
        VMS-4.7 PATCHED PADRIVER.EXE IMAGE
PADRIVER_V50_MSG0.COM;2      11  16-DEC-1988 15:15:20.76  (RE,RWED,RE,RE)
        VMS-5.0 PADRIVER.EXE PATCH COMMAND FILE & PATCH DESCRIPTION
PADRIVER_V50_MSG0.EXE;2      46  15-DEC-1988 19:50:01.32  (RE,RWED,RE,RE)
        VMS-5.0 (ALSO 5.0-1, 5.0-2, 5.1 FT) PATCHED PADRIVER.EXE IMAGE

Total of 7 files, 152 blocks.

! VMS-5.0-x PADRIVER.EXE "COMQ0 MESSAGE" PATCH FOR HSC VC-CLOSURE
! -------------------------------------------------------------
! Created by: Bob Brassard, VAX CSSE, VOLKS::BRASSARD, 15-DEC-88
!
! WARNING !!!: PATCH is "RESTRICTED DISTRIBUTION", REQUIRING
! VAX CSSE APPROVAL AND AUTHORIZATION !! SIGNIFICANT CLUSTER
! PERFORMANCE (LOCK_MGR) DEGRADATION MAY OCCUR UNDER CERTAIN
! APPLICATION CI-TRAFFIC LOADS !!
!
! SUPPORT: VMS-supported RESTRICTED-DISTRIBUTION patch.
! Call VAX CSSE (Bob Brassard, VOLKS::BRASSARD, DTN 240-6492,
! DDD 508-474-6492; or George White) with any problems.
!
!
VERSION APPLICABILITY: This patch * ONLY * applies ! to VMS-5.0 distributed PADRIVER.EXE (also used for V5.0-1 ! V5.0-2, and current V5.1 FT sites) with this "image ! ident & link-date" (ANAL/IMAGE PADRIVER.EXE): ! ! Image Identification Information ! ! image name: "PADRIVER" ! image file identification: "X-9" ! link date/time: 8-APR-1988 05:41:19.56 ! linker identification: "04-92" ! ! ECO50 RRB0050 (R.R.Brassard, CSSE) 15-DEC-88 ! MODULE: SCSXPORT.MAR of PADRIVER.EXE ! ! PROBLEM: Current CI-PORT command-queue prioritization ! may cause excessive COMQL (CI-CMD.-QUEUE-0 / COMQ0) ! service latencies, resulting in HSCxx "RTNDAT/CNF TIMEOUT" ! VC-closure on disk-writes, during heavy BACKUP-applic. ! tape-write/disk-read activity. CI processing of Tape- ! writes/DATREQ2/COMQ2, VMS MSCP message-commands/SNDMSG/ ! COMQ1, and received-packets (SNDDAT, RECMSG) can pre-empt ! servicing of COMQ0, thus indefinitely delaying DATREQ0 ! HSC disk-write data-requests and resulting in HSC data- ! transfer timeout: currently defined in V370 CRONIC at ! 3 seconds. ! ! SYMPTOM: HSC "RTNDAT/CNF TIMEOUT" VIRTUAL-CIRCUIT (VC) ! closures are only reported with HSC "OUTBAND & ERROR" ! level at "INFO" (default = ERROR). The first customer ! indication may only be "tapes rewinding/restarting", ! "shadow-set copying", or "mount verification" messages ! during heavy multiple concurrent disk/tape BACKUP ! activity. ! ! FIX: Modify SCS$FPC_SENDMSG routine to direct all CI SYSAP- ! MESSAGES on low-priority COMQL (CI COMQ0) CI-COMMAND- ! QUEUE, instead of current COMQH (CI COMQ1). Therefore, ! new MSCP command messages (and unintentionally all SYSAP ! MSGs) will only be sent if CI can service COMQ0, effectively ! throttling CI data-transfer work to the rate at which CI ! can send new MSCP commands to HSCxx; thus guaranteeing ! reasonable COMQ0 service latency. ! ! ** PERFORMANCE IMPLICATIONS ! WARNING: This patch requires VAX CSSE authorization for ! 
implementation, due to cluster performance risks. ! Significant reduction of CI's sequenced-message I/O ! (SYSAP MESSAGEs sent) performance, of up to 65%, will ! occur under CI-port data-transfer saturation: approx. ! 1.2 Mb/sec for CIBCA-A on 85/87/88xx, 2.2 Mb/sec for ! CIBCA-B on 85/87/88xx, 1.5-1.8 Mb/sec for other CIxxx/ ! CPU combinations. DCL "$ MONITOR SCS" (KB_MAP) provides ! an instantaneous CI data-transfer measurement; VPA and ! MONITOR/RECORD can be used for long-term monitoring. ! ! Sequenced messages are used by VMS for LOCK_MGR, CLUSTER ! CONNECTION_MGR, and MSCP Command functions, with LOCK_MGR ! issuing most of these messages. Increased LOCK_MGR "lock ! granting" latencies will directly impact cluster-wide ! file/record/database I/O applications, since LOCK "MASTERing" ! and LOCK "DIRECTORYing" is a distributed function within a ! cluster. In other words, even with this patch on only 1/offline ! node, message slowdown will impact MASTER/DIRECTORY functions ! performed on behalf of other cluster nodes. ! ! Sequenced-message I/O reduction is especially dependent on ! disk-write (DATREQ0) data-transfers, which also use COMQ0. ! This patch moves SYSAP SNDMSG from COMQ1 (also used by DECNET ! datagrams) to COMQ0, used by CI to service DATREQ0 (disk-write ! HSC data-requests) and used by VMS for CI-polling. Therefore, ! SYSAP-MESSAGEs (SNDMSG) will now be serviced "FIFO" with DATREQ0 ! (from HSC) and VMS polling, instead of before (higher priority) ! this activity on COMQ1 without this patch. ! ! Under non-saturated CI-port data-transfer conditions, this ! patch should only result in a 5% sequenced-message rate ! reduction. Of benefit, this patch may significantly improve ! disk-write performance during heavy mass-storage I/O activity. ! Datagrams (used mostly for DECNET) will also benefit. ! ! INSTALLATION: ! 1. COPY this PATCH command file (PADRIVER_V50_MSG0.COM) to ! work-directory. ! 2. COPY SYS$LOADABLE_IMAGES:PADRIVER.EXE to work area. ! 3. 
APPLY PATCH: "$ @PADRIVER_V50_MSG0.COM" or type in below
! patch-commands. Verify patch correctly installed: use
! ANAL/IMAGE PADRIVER.EXE, examining PATCH info & text.
! 4. COPY PADRIVER.EXE SYS$COMMON:[SYS$LDR]PADRIVER.EXE. If
! patch only intended for 1 system, copy to SYS$SPECIFIC:
! [SYS$LDR]PADRIVER.EXE.
! 5. REBOOT SYSTEM, coordinating with customer.
!
! BEGINNING OF PATCH COMMANDS....
! -------------------------------
!
$ PATCH PADRIVER.EXE
SET ECO 50
VERIFY/INSTRUCTION 1FA8
"MOVW #04,B^0F2(R2)"
EXIT
REPLACE/INSTRUCTION 5147
"BSBW 1F3B"
EXIT
"BSBW 1FA8"
EXIT
UPDATE
EXIT
$ EXIT
$ !
$ ! END OF PADRIVER PATCH FILE

! VMS-4.7 PADRIVER.EXE "COMQ0 MESSAGE" PATCH FOR HSC VC-CLOSURE
! -------------------------------------------------------------
! Created by: Bob Brassard, VAX CSSE, VOLKS::BRASSARD, 15-DEC-88
!
! WARNING !!!: PATCH is "RESTRICTED DISTRIBUTION", REQUIRING
! VAX CSSE APPROVAL AND AUTHORIZATION !! SIGNIFICANT CLUSTER
! PERFORMANCE (LOCK_MGR) DEGRADATION MAY OCCUR UNDER CERTAIN
! APPLICATION CI-TRAFFIC LOADS !!
!
! SUPPORT: VMS-supported RESTRICTED-DISTRIBUTION patch.
! Call VAX CSSE (Bob Brassard, VOLKS::BRASSARD, DTN 240-6492,
! DDD 508-474-6492; or George White) with any problems.
!
! VERSION APPLICABILITY: This patch * ONLY * applies
! to VMS-4.7 distributed PADRIVER.EXE with this "image
! ident & link-date" (ANAL/IMAGE PADRIVER.EXE):
!
! Image Identification Information
!
! image name: "PADRIVER"
! image file identification: "X-3"
! link date/time: 22-MAY-1987 23:50:26.53
! linker identification: "04-00"
!
! ECO50 RRB0050 (R.R.Brassard, CSSE) 15-DEC-88
! MODULE: PAFPCALL.MAR of PADRIVER.EXE
!
! PROBLEM: Current CI-PORT command-queue prioritization
! may cause excessive COMQL (CI-CMD.-QUEUE-0 / COMQ0)
! service latencies, resulting in HSCxx "RTNDAT/CNF TIMEOUT"
! VC-closure on disk-writes, during heavy BACKUP-applic.
! tape-write/disk-read activity. CI processing of Tape-
! writes/DATREQ2/COMQ2, VMS MSCP message-commands/SNDMSG/
!
COMQ1, and received-packets (SNDDAT, RECMSG) can pre-empt ! servicing of COMQ0, thus indefinitely delaying DATREQ0 ! HSC disk-write data-requests and resulting in HSC data- ! transfer timeout: currently defined in V370 CRONIC at ! 3 seconds. ! ! SYMPTOM: HSC "RTNDAT/CNF TIMEOUT" VIRTUAL-CIRCUIT (VC) ! closures are only reported with HSC "OUTBAND & ERROR" ! level at "INFO" (default = ERROR). The first customer ! indication may only be "tapes rewinding/restarting", ! "shadow-set copying", or "mount verification" messages ! during heavy multiple concurrent disk/tape BACKUP ! activity. ! ! FIX: Modify FPC$SENDMSG routine to direct all CI SYSAP- ! MESSAGES on low-priority COMQL (CI COMQ0) CI-COMMAND- ! QUEUE, instead of current COMQH (CI COMQ1). Therefore, ! new MSCP command messages (and unintentionally all SYSAP ! MSGs) will only be sent if CI can service COMQ0, effectively ! throttling CI data-transfer work to the rate at which CI ! can send new MSCP commands to HSCxx; thus guaranteeing ! reasonable COMQ0 service latency. ! ! PERFORMANCE IMPLICATIONS ! WARNING: This patch requires VAX CSSE authorization for ! implementation, due to cluster performance risks. ! Significant reduction of CI's sequenced-message I/O ! (SYSAP MESSAGEs sent) performance, of up to 65%, will ! occur under CI-port data-transfer saturation: approx. ! 1.2 Mb/sec for CIBCA-A on 85/87/88xx, 2.2 Mb/sec for ! CIBCA-B on 85/87/88xx, 1.5-1.8 Mb/sec for other CIxxx/ ! CPU combinations. DCL "$ MONITOR SCS" (KB_MAP) provides ! an instantaneous CI data-transfer measurement; VPA and ! MONITOR/RECORD can be used for long-term monitoring. ! ! Sequenced messages are used by VMS for LOCK_MGR, CLUSTER ! CONNECTION_MGR, and MSCP Command functions, with LOCK_MGR ! issuing most of these messages. Increased LOCK_MGR "lock ! granting" latencies will directly impact cluster-wide ! file/record/database I/O applications, since LOCK "MASTERing" ! and LOCK "DIRECTORYing" is a distributed function within a ! cluster. 
In other words, even with this patch on only 1/offline
! node, message slowdown will impact MASTER/DIRECTORY functions
! performed on behalf of other cluster nodes.
!
! Sequenced-message I/O reduction is especially dependent on
! disk-write (DATREQ0) data-transfers, which also use COMQ0.
! This patch moves SYSAP SNDMSG from COMQ1 (also used by DECNET
! datagrams) to COMQ0, used by CI to service DATREQ0 (disk-write
! HSC data-requests) and used by VMS for CI-polling. Therefore,
! SYSAP-MESSAGEs (SNDMSG) will now be serviced "FIFO" with DATREQ0
! (from HSC) and VMS polling, instead of before (higher priority)
! this activity on COMQ1 without this patch.
!
! Under non-saturated CI-port data-transfer conditions, this
! patch should only result in a 5% sequenced-message rate
! reduction. Of benefit, this patch may significantly improve
! disk-write performance during heavy mass-storage I/O activity.
! Datagrams (used mostly for DECNET) will also benefit.
!
! INSTALLATION:
! 1. COPY this PATCH command file (PADRIVER_V47_MSG0.COM) to
! work-directory.
! 2. COPY SYS$SYSTEM:PADRIVER.EXE to work area.
! 3. APPLY PATCH: "$ @PADRIVER_V47_MSG0.COM" or type in below
! patch-commands. Verify patch correctly installed: use
! ANAL/IMAGE PADRIVER.EXE, examining PATCH info & text.
! 4. COPY PADRIVER.EXE SYS$COMMON:[SYSEXE]PADRIVER.EXE. If
! patch only intended for 1 system, copy to SYS$SPECIFIC:
! [SYSEXE]PADRIVER.EXE.
! 5. REBOOT SYSTEM, coordinating with customer.
!
! BEGINNING OF PATCH COMMANDS....
! -------------------------------
!
$ PATCH PADRIVER.EXE
SET ECO 50
VERIFY/INS 2485
"SUBL2 W^0B4(R4),R2"
EXIT
REPLACE/INSTRUCTION 1627
"BSBW 2450"
EXIT
"BSBW 2485"
EXIT
UPDATE
EXIT
$ EXIT
$ !
$ !
END-OF-PATCH
$ !=======================================================================

From:	CVG::BRASSARD 19-DEC-1988 12:11
To:	VOLKS::BRASSARD
Subj:	HSC Timeout

From:	SSDEVO::ENGLUND "Glenn Englund, HSC Engineering Manager" 16-DEC-1988 19:13:15.83
To:	CVG::TOMASWICK,KOLLER,MOE,SHIVELY,BEAN,LARY,NM%VOLKS::WHITE,CVG::BRASSARD
CC:
Subj:	No change to HSC host timeout - it should remain at 3 seconds

Unfortunately the suggested change to raise the HSC host timeout value
from 3 seconds to 45 seconds was never tested (so I am told). I guess it
fell through the cracks out here. Since it was not tested, it seems that
the right thing to do is to leave it unchanged, rather than delay the
release of this patch in order to test the change.

I would recommend changing the following note from George White to remove
any reference to a change to the HSC's host timer.

- Glenn

From:	27054::WHITE "VAX CSSE SUPPORT 12-Dec-1988 1303" 12-DEC-1988 11:10:20.30
To:	@FAS
CC:
Subj:	FAS,FORD,IRVING TRUST VC CLOS. STATUS

-----------------------------
! d ! i ! g ! i ! t ! a ! l !    I N T E R O F F I C E   M E M O
-----------------------------

TO:       DISTRIBUTION
DATE:     12 DEC 88
FROM:     GEORGE WHITE
DEPT:     MID-RANGE VAX CSSE
DTN:      240-6490
LOCN:     AET 1-1/6
ENET:     VOLKS::WHITE
DECMAIL:  WHITE @VOLKS @AET

SUBJECT: FAS - (CXO2335), FORD - (CXO2677), IRVING TRUST VC CLOSURE STATUS
         9 DEC 88, FROM BOB BRASSARD

The 2nd cause of the HSC VC Closure (RTNDAT/CNF TIMEOUT during heavy
BACKUP between 1 85/87/88xx and multi-HSC disk/tapes) was isolated about
3 weeks ago. As you will remember, the 1st bug was with HSC KCI ucode:
SNDDAT packets would not be sent if DATREQs were in DMA_CREDIT stall...
essentially KCI supervisor loop (scheduler) bug.

The 2nd problem involves the use of the CI's 4 prioritized command
queues: COMQ0 (low), COMQ1, COMQ2, and COMQ3 (highest).
VMS sends messages normally on COMQ1 (including MSCP) except for
VC-Closure on COMQ0; disk-writes use DATREQ0 (COMQ0), initiated by HSC;
tape-writes use DATREQ2 (COMQ2). If CI-port is transferring data at its
limit (1.2 Mb for CIBCA on 85/87/88xx), DATREQ2/COMQ2 and MSCP-MSG/COMQ1
activity will pre-empt CI ever looking at COMQ0 (disk-write DATREQ0);
COMQ0 latencies as high as 90-seconds were observed.

The short term solution will be a PADRIVER patch to put all messages on
COMQ0. This way, if CI is too busy to look at COMQ0, HSC will run out of
work (reads/writes), thus throttling data-transfers until CI works on
more messages from COMQ0.

The VMS PADRIVER patch is only a short-term solution. The CI-Architectural
committee is re-investigating CI-PORT prioritization algorithms, with
possible major scheduling changes for future CI products.

The PADRIVER patch was just tested during the past 2 weeks for
performance impact on message rates: negligible except for data-xfer
saturated CI-ports where message rates dropped 60%.

I will be generating a work-around package/procedures/documentation for
the 3 required fixes:

  - PADRIVER patch
  - KCI V2.54 ucode (L0107-YA @ Rev-E2/3/4)
  - CRONIC V370 patch to extend host data-xfer timeout from 3 to 45
    seconds (workaround for HSC SNDDAT pipelining/sequencing problem:
    finishes SNDDATs out of order sometimes)

BTW, KCI V2.54 will soon be released as HSC50/70 FCO required for RA70
drives; initial RA70s will include 2 sets of 12-PROMS each.

Best Regards,
Bob Brassard

! CVG FAS-TESTING INTEREST DISTRIBUTION LIST: CVG_FAS.DIS
!
======================================================= NM%SSDEVO::LARY NM%SSDEVO::BEAN NM%SSDEVO::SHIVELY NM%SSDEVO::MOE NM%SSDEVO::KOLLER NM%SSDEVO::ENGLUND NM%SSDEVO::REPKA NM%SSDEVO::ELMER NM%HYEND::BLYONS NM%CVG::TODHUNTER NM%ACTIVE::GOELZ NM%CSSE32::GOELZ NM%VCSESU::TODHUNTER NM%HYEND::WERTH NM%HYEND::HJAKIELA NM%HYEND::AVERY N%INANNA::BALKOVICH NM%HYDRA::BOAEN NM%HYDRA::NIELSEN NM%HYDRA::HAYAKAWA NM%FROBUS::CONNOR NM%CVG::TOMASWICK NM%CVG::VIEIRA NM%CVG::BAKER NM%VOLKS::FREEMAN NM%VOLKS::WHITE NM%VOLKS::BRASSARD NM%CVG::BRASSARD NM%PYONS::BRANNON NM%CSSE::MILLER NM%CSSE::HOWINGTON NM%SUPVAX::BLENDINGER NM%PTOVAX::PEARLMAN MTS$"FHO::BILL NOSEWORTHY" MTS$"OHF::RICH LYONS" MTS$"CYO::ROBERT B LEWIS" MTS$"PTO::STEPHEN STEVENS" MTS$"PTO::BILL REIGHT" From: STAR::OSHAUGHNESSY "Dan, ZKO3-4/U14, DTN 381-1268, pole T/B8" 16-DEC-1988 11:26 To: VOLKS::BRASSARD,VOLKS::WHITE,CHIN,FOX,THIEL Subj: VMS SUPPORT OF RESTRICTED DISTRIBUTION OF FAS PATCH DIGITAL INTEROFFICE MEMORANDUM TO: Bob Brasssard DATE: December 15, 1988 George White FROM: Dan O'Shaughnessy DEPT: 354 EXT: 381-1268 LOC: ZK03-4/U14 ENET: STAR::OSHAUGHNESSY cc: T. Chin M. Fox D. Thiel SUBJECT: VMS Support of FAS Patch VMS supports the restricted distribution of the "FAS" patch written by Bob Brassard. Suitable warnings concerning the impact to a system's sequenced message I/O performance (con- nection manager and lock manager traffic) will accompany the patch. Bob Brassard will manage the distribution of the patch to insure that the performance impact on a candidate site has been carefully considered. The patch should not be published or made generally available for at least 6 months. This time period should provide us with sufficient infor- mation on how often the problem occurs on customer sites and of any unintended side effects the patch may have. A longterm solution should be provided by the SCA and CI Architecture groups. 
Another meeting including VMS, CSSE, SASE and architecture
representatives should be planned in 3 months, March 1989, to discuss
and reevaluate the situation. At that time a decision should be made
whether to allow the general (unrestricted) release of the "FAS" patch
in June 1989, or whether some other "midterm solution" is needed before
a "longterm" architectured solution is available.

From:	SSDEVO::ELMER "Randy Elmer MLDS CSSE CESG MGR. 522-3874 Being flexible means never being bent out of shape 25-Oct-1989 1614" 25-OCT-1989 18:24:12.77
To:	MOE
CC:	RON,GARY,VOLKS::BRASSARD
Subj:	V39A access over the net

Karen

The 4x4 today agreed that when we ship V39A to SSB saying this is good
code we should also put it on the net and make it available to all
internal customers for early exposure to V39A. We also agreed that for a
handful of customers that may have a specific CLD open that V39A will
fix, we hand manage those sites and provide an early release as well
across the NET.

Can we get Stacy to place V39A into HSC$ENETKITS with the release notes
and remove V390/V394?

Thanks
Randy

From:	GENRAL::SSDEVO::ELMER "Randy Elmer MLDS CSSE CESG MGR. 522-3874 Being flexible means never being bent out of shape" 9-NOV-1989 18:34:24.98
To:	GENRAL::VOLKS::BRASSARD
CC:	RON
Subj:	RE: HSC CRONIC V39A AVAILABILITY FOR FIELD TEST ?? FT-AGREEMENT ? ENET LOCATION ? RELEASE NOTES ?

Bob

I have answered your questions below.

Randy

=============================================

From:	GENRAL::VOLKS::BRASSARD "Bob B., VAX CSSE, 240-6492, AET 1-1/6 09-Nov-1989 1813" 9-NOV-1989 16:15:30.05
To:	GENRAL::ELMER,SSDEVO::REPKA,MYFILE
CC:
Subj:	HSC CRONIC V39A AVAILABILITY FOR FIELD TEST ?? FT-AGREEMENT ? ENET LOCATION ? RELEASE NOTES ?

Hi Randy & Ron,

I have not seen any announcement on ENET availability of V39A.
Is this now copyable on ENET ?

>>> Yes, but we need to hand manage it until the SSB release date of 18
>>> December, in case we find a bug that needs to have the code recalled.
Only >>> provide this code to the sites that are of political nature or we feel would >>> be a good field test. >>> ENET location is SSDEVO::HSC$FIELDTEST: Are there release notes to copy with it ? >>> Yes in the same location. Do we still need Field Test Agreement ? >>> No the 4X4 agreed that because it did go to SSB and that we would hand >>> manage the code no field test agree in needed. We just need to track it >>> and ensure it does not become public. Neither have I seen Bob Lyons FAS meeting minutes with the approval status on the release/SDC-submission of CRONIC V39A with FAS fix. Have you seen any status ? Rumor has it approved. >>> The code was submitted to SSB with the FAS fix. I to did not see the >>> minutes. BTW, I am currently on CLD in Hartford, Ct.; so mail response may be slow. Best Regards, Bob Brassard From: SSDEVO::REPKA "RON REPKA HSC CSSE 522-6195" 14-NOV-1989 14:05:33.92 To: VOLKS::BRASSARD CC: Subj: V39A Release Notes HSC VERSION V3.9A SOFTWARE RELEASE NOTES Order Number: AA-GMFAH-TK These release notes contain a summary of the features in the V3.9A software. digital equipment corporation maynard, massachusetts January, 1990 The information in this document is subject to change without notice and should not be construed as a commitment by Digital Equipment Corporation. Digital Equipment Corporation assumes no responsibility for any errors that may appear in this document. The software described in this document is furnished under a license and may be used or copied only in accordance with the terms of such license. No responsibility is assumed for the use or reliability of soft- ware on equipment that is not supplied by Digital Equipment Corporation or its affiliated companies. Copyright (c)1990 by Digital Equipment Corporation All Rights Reserved. Printed in U.S.A. The postpaid READER'S COMMENTS form on the last page of this document requests the user's critical evaluation to assist in preparing future documentation. 
The following are trademarks of Digital Equipment Corporation: DEC DIBOL UNIBUS DEC/CMS EduSystem VAX DEC/MMS IAS VAXcluster DECnet MASSBUS VMS DECsystem-10 PDP VT DECSYSTEM-20 PDT DECUS RSTS DECwriter RSX DIGITAL This document was prepared using VAX DOCUMENT, Version 1.0 Contents 1 INTRODUCTION 1 2 PREINSTALLATION CONSIDERATIONS 1 2.1 Software Restrictions 1 2.1.1 HSC50 Restricted to One Operator-Loaded Utility 2 2.1.2 Maximum of 12 Tape Drives and 12 Formatters 2 3 HSC VERSION 3.9A SOFTWARE INSTALLATION 3 3.1 Preinstallation Backup 3 3.2 Software Installation Procedure 3 4 FEATURES IN HSC SOFTWARE VERSION 3.90 6 4.1 Disk Server 7 4.2 Utilities 7 5 MISCELLANEOUS ENHANCEMENTS 10 6 MAINTENANCE CHANGES IN HSC SOFTWARE VERSION 3.9A 10 6.1 Disk Server 10 6.2 Tape Server 11 6.3 Block Size Recommendation For Non-TA90 Tape Drives 12 6.4 Utilities 14 6.5 Miscellaneous Changes 14 7 HSC VERSION 3.90 SOFTWARE EXCEPTION CODES AND ERROR MESSAGES 15 7.1 HSC Version 3.90 Software Exception Codes 15 7.2 HSC Version 3.90 Software Error Messages 16 7.3 Operator Control Panel Fault Codes 17 8 TOPICS FROM PREVIOUS HSC SOFTWARE RELEASE NOTES 17 8.1 VTDPY Operation 18 8.1.1 Using the VTDPY Display 19 8.1.2 VTDPY Error Messages 27 8.2 Volume Shadowing 27 8.3 Exception Codes 29 1 INTRODUCTION The HSC Version 3.9A software release package contains these HSC Version 3.9A Software Release Notes and the Version 3.9A software distribution media. The software for the HSC70 is distributed on diskette. The software for the HSC50 is distributed on two TU58 cassettes. These release notes document the following: o All information in the HSC Version 3.90 Software Release Notes. o Maintenance changes provided in Version 3.9A software to cor- rect identified problems in the Version 3.90 software. 2 PREINSTALLATION CONSIDERATIONS This section contains information you should consider before upgrading your HSC to Version 3.9A software. 
2.1 Software Restrictions This section describes some restrictions of the HSC Version 3.9A software. 2.1.1 HSC50 Restricted to One Operator-Loaded Utility Because of the smaller memory size of the HSC50, Version 3.9A software allows you to run only one utility at a time. This limi- tation ensures that sufficient memory is available to run Device Integrity Tests when scheduled by the CRONIC executive. If you at- tempt to run a second utility, the following message is displayed: KMON-F All Utility Partitions in Use Start the second utility when the first utility has completed, or press CTRL/C to terminate the first utility. This change does not affect the operation of Version 3.9A software on the HSC70, which retains its ability to run two utilities simultaneously. 2.1.2 Maximum of 12 Tape Drives and 12 Formatters HSC Version 3.9A software supports a maximum of 12 tape drives and 12 formatters on each HSC. For example, 3 TA78 formatters with 4 tape drives on each formatter reach the 12-tape drive configuration limit. If more than 12 formatters or drives are configured on an HSC, one of the following messages is displayed: No tape formatter structures available for Requestor x Port y or: No tape drive structures available for Requestor x Port y When the HSC boots, resources are allocated to formatters on requestors in ascending requestor priority order until the limit of 12 tape drives and 12 formatters is reached. Resources are allocated among tape drives on the same formatter according to the arbitrary order in which the drives became known to the HSC. 3 HSC VERSION 3.9A SOFTWARE INSTALLATION Use the following procedure to install the software supplied in this kit. 3.1 Preinstallation Backup Before installing the software, use a blank diskette or cassette to make a backup copy of the software. Instructions for the copy procedure are in Chapter 10 of the HSC User Guide. 
If you need additional backup copies, order blank, formatted RX33
diskettes from the Software Distribution Center. Extra TU58 cassettes
can be ordered from the DECdirect catalog.

3.2 Software Installation Procedure

NOTE
If your HSC cluster has RA90 disk drives connected to it, use the SHOW
DISKS command to verify that the RA90 drives report a minimum software
revision level of MC = 10. If the drives do not report the minimum
software revision, ask your Digital Field Service Representative to
install FCO RA90X-O001 prior to installing HSC Version 3.9A software.

Use the following procedure to install the software:

1. On each HSC being upgraded to Version 3.9A software:

   o Press CTRL/C.
   o Enter the SHOW SYSTEM command.

   This produces a hard copy of system parameters, as shown in the
   following example:

   <CTRL/C>
   HSCxx>SHOW SYSTEM <RETURN>
   17-JUL-1988 14:42:43.41   Boot: 17-JUL-1988 11:31:11.41   Up: 3:11
   Version: V39A   System ID: %X0000000000B7   Name: HSC006
   Front Panel: Enabled   HSC Type: HSC70
   Console Dump: Enabled
   Load Dump: Disabled
   Automatic DITs: Enabled   Periodic DITs: Enabled, Interval = 1
   Disk Allocation Class: 0   Tape Allocation Class: 0
   Start-up Command File: Disabled
   Disk Drive Controller Timeout: 2 seconds
   Maximum Tape Drives: 12   Maximum Formatters: 12
   SETSHO-I Program Exit

2. Print a hard copy of these system parameters if your system does not
   automatically produce a copy. Use this copy later in the procedure to
   reset your HSC's parameters.

3. If your cluster does not have failover capabilities, shut down the
   cluster and perform Steps 6 through 17.

4. Fail over all disk and tape drives to the alternate HSC. Make sure
   none of the tape or disk drives are on line to the HSC you are
   upgrading to Version 3.9A software. The failover procedure is
   described in the Guide to VAXclusters.

5. After successful failover, set the Online button on the HSC operator
   control panel to the out position.

6.
Open the HSC front panel, remove all old load media, and in- stall the new software system/utility media in the HSC load device. The new software system/utility media must be write enabled. 4 Instructions for loading the software are in the following sections of your HSC User Guide: o If you have an HSC70, refer to Section 4.2. o If you have an HSC50, refer to Section 4.3. 7. Press and release the Init button as you hold in the Fault button. Hold in the Fault button until the following message appears: INIPIO-I Booting 8. When booting has completed: o Press CTRL/C. o Enter the RUN SETSHO command at the HSC> prompt. o If the SETSHO> prompt is not displayed, review the pre- vious steps to ensure that you have properly installed the software, booted the HSC, and entered the RUN SETSHO command. 9. At the SETSHO> prompt, enter the SHOW SYSTEM command to print a hard copy of the default parameters on the new load media. 10.Compare this list of default parameters to the list you made in Step 1 for this HSC. At the SETSHO> prompt, use the following commands to reset the parameters to their former values: SETSHO> SET NAME HSCaaaaaa <RETURN> SETSHO> SET ID %Xnnnnn <RETURN> SETSHO> SET ALLOCATE DISK n <RETURN> SETSHO> SET ALLOCATE TAPE n <RETURN> SETSHO> SET SERVER DISK DRIVE_TIMEOUT=n <RETURN> Chapter 6 of your HSC User Guide contains detailed descriptions of how to set each of these parameters. 11.Set any other parameters required by your system configuration. When all parameters are set, enter the EXIT command at the SETSHO> prompt. 5 12.The HSC prompts ask if you wish to reboot the HSC. Enter YES. 13.After the HSC reboots, press CTRL/C and enter the SHOW SYSTEM command. 14.Compare the new parameters with the ones on the list you made in Step 1 to verify that all parameters are the same. If the parameters are not identical: o Enter the RUN SETSHO command at the HSC> prompt. o Return to Step 10 and set the parameters that need changing. 
o Continue with this procedure from Step 11. 15.If all parameters are correct, press the operator control panel Online button to the in position. This allows the cluster to re-establish connections to this HSC. 16.Enter the SHOW VIRTUAL_CIRCUITS command to verify that all connections have been made. This command lists nodes that have established virtual circuits with the HSC. Check that all active hosts have established virtual circuits to this HSC. If they have not, reboot the HSC and repeat this step. 17.Failover the drives to the HSC on which you have just installed the new software. After all units have failed over, install the new software on the alternate HSC. After making the hard copy list of system parameters in Step 1, go to Step 5 and complete the software installation procedure. __________________________________________________________ 4 FEATURES IN HSC SOFTWARE VERSION 3.90 This section describes the features in the Version 3.90 software. 6 __________________________________________________________ 4.1 Disk Server The HSC disk environment has been improved in the following ways: o Some potential causes of IOT 4076 crashes have been located and resolved. o A problem in the disk path that caused the HSC to report databus overrun errors has been resolved. o HSC V3.90 software supports the maximum number of shadow sets allowed by your version of VMS. Refer to Section 8.2 for fur- ther discussion of Volume Shadowing. Detailed information on VMS shadowing support is found in the VAX/VMS Volume Shadowing Manual. __________________________________________________________ 4.2 Utilities DKCOPY A new message for DKCOPY clears confusion when the requested tar- get device is either in use or nonexistent. Refer to Section 7.2 for a description of this error message. A new message for DKCOPY warns you when the target device of a disk-to-disk copy is hardware write protected. Refer to Section 7.2 for a description of this error message. 
SETSHO SETSHO has been updated as follows: o SET MAX_FORMATTER -- Changed to improve performance. o SET MAX_TAPES -- Changed to improve performance. o SET SERVER -- Changed to improve performance. o SET PROMPT -- Allows you to select your own prompt on the HSC. o SET REQUESTOR -- Allows you to select the correct data channel microcode. 7 o SET RESTART CLEAR -- Renamed to SET EXCEPTION CLEAR to more closely describe the command function. This change deletes the SET RESTART command. o SET SECTOR_SIZE -- Deleted and the default sector size set to 512 bytes. o SET OUTBAND/SHOW OUTBAND -- Merged with the SET ERROR and SHOW ERROR commands. o SET DEVICE [NO]HOST_ACCESS -- Changes the state of the requestor when you exit SETSHO instead of when the command is entered. This allows you to exit SETSHO with a CTRL/Y if you want the requestor to remain in the previous state. o SHOW REQUESTOR -- Displays information about each requestor connected to the HSC. o SHOW CONNECTIONS -- Displays information about all virtual circuits and connections the HSC has with other nodes. BACKUP/RESTOR BACKUP/RESTOR has been updated as follows: o A Write Ring Missing problem that caused BACKUP to abort has been resolved. BACKUP now allows you to correct the problem and continue without aborting the backup operation. o BACKUP/RESTOR no longer supports 576 byte records. o New features have been added to enhance tape unloading: -- The tape drive will not unload if you press CTRL/C before a tape drive has started reading or writing operations. -- If a tape drive finishes a backup or restore operation without using all of the mounted tapes, the extra tape is not unloaded. This reduces the amount of wasted tape because it identifies an empty tape by not unloading it. 8 o New prompts improve performance and reduce operator interven- tion. -- You may run the BACKUP and RESTOR utilities without operator interaction. 
The HSC prompts with either of the following: Would you like to run BACKUP with "NO OPERATOR"? [N] or Would you like to run RESTOR with "NO OPERATOR"? [N] If you press RETURN or answer N, the utility continues to prompt you for appropriate responses. If you leave the terminal during a backup or restore operation, the utility times out 5 minutes after a query and aborts. If you answer Y, further system queries are disabled. The utility bypasses all further prompts and uses the default responses instead of your inputs. This feature allows you to leave the terminal and perform other tasks without further interaction with the utility. NOTE If you disable the queries, you may not know when the volume to back up has expired or when the save-set of a restore has expired. However, the warning message for this condition appears almost immediately after the NO OPERATOR query. When you see this message, you may press CTRL/C to abort the operation. Otherwise, the operation continues after a 10-second delay. -- When BACKUP encounters a tape reel that has reached its limit of media errors (hard write errors), it prompts you to increase the error threshold with the following message: Do you wish to increase the media error limit for this tape reel and continue? (Y or N) [N]: 9 Press RETURN or answer N if you do not wish to change the error threshold. You are then prompted to change the tape reel. Answer Y to increase the threshold and continue. You are then prompted for the increased error limit. After you enter the new limit, the operation continues. -- HSC70 users may now perform a backup or restore operation using a 16K-byte record size instead of the default 8K-byte record size. This feature is described in Chapter 7 of your updated HSC User Guide. __________________________________________________________ 5 Miscellaneous Enhancements Node Lockout A node lockout problem in the Diagnostics and Utilities Protocol (DUP) Server is now resolved. 
__________________________________________________________ 6 MAINTENANCE CHANGES IN HSC SOFTWARE VERSION 3.9A This section describes the maintenance changes in the Version 3.9A software. __________________________________________________________ 6.1 Disk Server The following disk server changes have been implemented in the Version 3.9A software: o A possible problem in which an MMU crash may result if a forced error is detected on a 2- or 3-member shadow set is resolved. This fix also corrects the possible problem in which a repair operation may not be performed as expected on an LBN with forced error set. 10 o A potential cause of excessive positioner errors and IOT 4076 crashes has been corrected. o An extremely rare problem in which a primary revectored block is not handled properly has been corrected. o A possible problem of virtual circuit closures on disks during extremely heavy tape activity has been corrected. __________________________________________________________ 6.2 Tape Server The following tape server changes have been implemented in Version 3.9A software: o A possible problem in which failover may not occur in the unlikely event that an operator releases a selected port button on a tape drive while the drive is transferring a heavy data load has been resolved. o The ILTAPE diagnostic, in all circumstances, recognizes a TA90 tape drive and prompts for write memory region parameters. o The restriction of no more than one TA90 tape formatter on the same requestor has been lifted. o An error flag problem that caused VAXsimPLUS to erroneously signal alarms on tape drives that were actually operative is resolved. o When you run heavy TA90 loads and either a Cache Data Lost or Cache Busy condition occurs, an IOT 6037 crash will no longer occur. o The default drive timeout has been increased from 30 to 80 seconds to provide a workaround for the problem of drives unexpectedly changing to the AVAILABLE state. This may cause failover to take longer. 
o The following changes have been made in pipeline error reporting:

  - A problem that caused improper reporting of pipeline errors has
    been corrected.

  - The severity level of the message that reports a pipeline error has
    been changed from ERROR to WARNING.

  - When an application or operating system issues a tape command with
    the inhibit error recovery condition set, the HSC treats a pipeline
    error as recoverable. For example, if a pipeline error occurs when
    you are running VMS Backup, the HSC recovers the error even though
    the default is to inhibit the error recovery. A pipeline error is
    NOT a tape error.

__________________________________________________________
6.3 Block Size Recommendation For Non-TA90 Tape Drives

The HSC CI interface can be significantly faster than the host CI
adapter when performing multiple backups. Therefore, it is possible that
all of the CI bandwidth can be used by tape traffic and cause data
timeouts and virtual circuit closures. To prevent this, a change in the
V39A software more evenly distributes the data flow over the CI to the
hosts without noticeably affecting the overall data throughput.

Because of this change, it is strongly recommended that you use the
following operational parameters if you wish to use a block size greater
than 24Kb with VMS Backup (the default is 8Kb):

o If only one requestor is configured for tape in the HSC, the maximum
  recommended block size is 48Kb.

o If two requestors are configured for tape in the HSC, the maximum
  recommended block size is 32Kb.

o If more than two requestors are configured for tape, the maximum
  recommended block size is 24Kb.

NOTE
These guidelines apply to the number of tape requestors configured for
tape in the HSC (not the number of tape requestors actively transferring
data). They DO NOT apply to the cached TA90 tape drive. It is still
recommended that a 64Kb block size be used with the TA90.
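The block-size guidelines above reduce to a small lookup. The following is a minimal Python sketch of that lookup, not anything shipped with the HSC software; the function name `recommended_block_limit_kb` and its parameters are hypothetical, and the values are simply those quoted in the release notes (in the notes' own "Kb" units).

```python
def recommended_block_limit_kb(tape_requestors: int, ta90: bool = False) -> int:
    """Return the recommended VMS Backup block size limit from section 6.3.

    tape_requestors is the number of requestors CONFIGURED for tape in the
    HSC (type K.sti in SHOW REQUESTOR), not the number currently active.
    The name and signature are illustrative assumptions.
    """
    if ta90:
        # The guidelines do not apply to the cached TA90; the notes still
        # recommend a 64Kb block size for it.
        return 64
    if tape_requestors < 1:
        raise ValueError("at least one requestor must be configured for tape")
    if tape_requestors == 1:
        return 48
    if tape_requestors == 2:
        return 32
    return 24  # more than two tape requestors


if __name__ == "__main__":
    for count in (1, 2, 3):
        print(count, "tape requestor(s):", recommended_block_limit_kb(count), "Kb")
```

The count passed in should come from the SHOW REQUESTOR display, since the limit depends on configured requestors rather than on how many are busy at the time of the backup.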
Use the following procedure to determine how many requestors are configured for tape in your HSC: 1. Press CTRL/Y on the HSC console or terminal. 2. Type SHOW REQUESTOR. Each requestor displayed as type K.sti is configured for tape. Failure to follow these recommendations can result in pipeline and drive- detected EDC errors when running multiple backup streams. Pipeline errors are the result of the CI and host momentarily not being able to supply data fast enough to the HSC during tape writes. Due to the bursty nature of TA78, TA79, and TA81 tape transfers, it is not uncommon to see an occasional pipeline error. However, the more even distribution of data flow between tape and disk in the V39A software will cause these errors to be seen much more frequently. Following the recommended block size will eliminate the possibility of these errors occurring and will have minimal performance impact. NOTE Pipeline errors DO NOT indicate any hardware or software fault in the HSC or host. If a pipeline error occurs, the VMS Error Log prints the following message for the MSLG$EVENT field in the error log entry: Data OVRFLW due to pipeline error 13 The associated drive-detected EDC error can be recognized by a code of 0440 in the ERRN1/ERRNUM field in the error log entry. It will also have the same command reference number as the pipeline error. To eliminate pipeline errors, reduce your block size ac- cording to the recommendations provided. These errors are fully recoverable. __________________________________________________________ 6.4 Utilities BACKUP You can now use the "NO OPERATOR" feature when you run BACKUP. This feature is described in Chapter 7 of the HSC User Guide. When you run BACKUP and reach the media error limit, the following conditions occur: o If you have chosen to run a backup operation without operator interaction, the media error limit is automatically increased and the backup operation continues. 
o If you have chosen to run a backup operation with operator interaction, BACKUP prompts you to increase the media error limit. __________________________________________________________ 6.5 Miscellaneous Changes Booting a System with a Shadowed System Disk The HSC polling algorithm has been changed. This provides a workaround to decrease the system boot time when booting from a shadowed system disk when the virtual unit for the shadow set has not yet been formed. 14 __________________________________________________________ 7 HSC VERSION 3.90 SOFTWARE EXCEPTION CODES AND ERROR MESSAGES This section lists the exception codes and error messages in the Version 3.90 software. __________________________________________________________ 7.1 HSC Version 3.90 Software Exception Codes 4115 DCB address inconsistency Facility: DISK, SDI Explanation: While processing an error on a seek DCB, the facil- ity found an inconsistency between the current seek DCB address and the DCB address stored in the DRAT. This new crash code was created in connection with the fixes for IOT 4076 crashes. User Action: Submit an SPR with the crash dump. 4116 Bad error completion queue in DCB Facility: DISK, MSCP Explanation: The DCB error completion queue was not properly restored during DCB completion. User Action: Submit an SPR with the crash dump. 4117 No DRAT on DRAT list head when expected Facility: DISK, ERROR Explanation: No elements were found on the DRAT queue when the error process tried to remove a DRAT from the head of the queue. User Action: Submit an SPR with the crash dump. 15 __________________________________________________________ 7.2 HSC Version 3.90 Software Error Messages DKCOPY-E-INVALR--Invalid unit id. Valid range is 0 through 4094 Explanation: You have entered a unit identification number that is not in the range of 0 through 4094. User Action: Enter a unit identification number within the valid range. 
DKCOPY-E-OFFLINE--Specified unit is offline or nonexistent Explanation: You have entered a unit identification number that is not recognized by the system. User Action: Check the unit identification number and enter the command again. DKCOPY-F-RUNSTOP--No volume mounted or drive disabled via RUN/STOP Explanation: One or both of the drives that you are using to perform a disk-to-disk copy does not have a volume mounted or is spun down. User Action: Check to see that a volume is mounted and that both drives are spun up. DKCOPY-F-WRITEPROTECT--Unit is write protected Explanation: The target unit of a disk-to-disk copy is hardware write protected. User Action: Press and release the write protect button. Run DKCOPY again. KMON-F All Utility Partitions in Use Explanation: You attempted to run more than one utility at a time. User Action: Wait until the currently running utility has com- pleted or terminate its operation. 16 VERIFY-E-INVALR-Invalid unit id. Valid range is 0 through 4094 Explanation: You have entered a unit identification number that is not in the range of 0 through 4094. User Action: Enter a unit identification number within the valid range. __________________________________________________________ 7.3 Operator Control Panel Fault Codes Your operator control panel may display the following fault code: __________________________________________________________________ Status_Code_(Octal)__Description__________________________________ 33 Invalid hardware configuration __________________________________________________________________ This fault code indicates that the configuration of modules in your HSC is not supported. Contact your Digital Field Service Representative if this fault code is displayed on your operator control panel. Chapter 3 of your HSC User Guide contains a complete listing of operator control panel fault codes. 
8 TOPICS FROM PREVIOUS HSC SOFTWARE RELEASE NOTES

This section contains important topics that are carried forward and updated from previous issues of the HSC Release Notes. You will need this information if you are a new user of the HSC.

8.1 VTDPY Operation

VTDPY is a utility for gathering and displaying system statistics. VTDPY can display system throughput, status of the disk and tape drives, and utilities running on other terminals. This utility also indicates which nodes have virtual circuits, connections, and multiple connections to the HSC.

   NOTE
   Avoid running VTDPY using the VMS command SET HOST/HSC with VMS versions prior to V4.6.

This utility requires a video terminal and does not display on a hard-copy printer. Either a VT100, a VT220, or a VT320, set at 9600 baud, must be attached to the EIA port on the HSC to run VTDPY.

To run VTDPY, enter the following command at the HSC> prompt:

   HSC> RUN [device]:VTDPY [update-interval]

Where device is the device holding the VTDPY program. For the HSC50, the device is DD1:, and for the HSC70, the device is DX0:. The update-interval is in seconds, from 2 to 420. If this update interval is not provided, VTDPY prompts:

   VTDPY-Q Interval (secs) ?

If the response is outside the allowable range, VTDPY displays an error message. The higher the number for the update interval, the less the performance impact on the HSC.

VTDPY terminates after you enter a CTRL/Y or a CTRL/C. The screen is cleared upon termination. The following control keys are used in VTDPY:

   CTRL/E--Displays tape status on the next refresh. Thereafter, the display alternates with disk status on subsequent refreshes.
   CTRL/D--Displays disk status on the next refresh. Thereafter, the display alternates with tape status on subsequent refreshes.
   CTRL/V--Displays host path status information (i.e., A, B, or a diamond) on the next refresh only.
   CTRL/W--Refreshes the screen.
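
The command syntax described above can be sketched as a small helper. This is only an illustrative Python sketch, not part of the HSC software: the function name build_vtdpy_command and the lookup table are hypothetical, and the only facts taken from the release notes are the device names (DD1: for the HSC50, DX0: for the HSC70) and the 2 to 420 second interval range.

```python
# Hypothetical helper: builds the text to type at the HSC> prompt.
# Only the device names and the 2..420 second range come from the release notes.
VTDPY_DEVICE = {"HSC50": "DD1:", "HSC70": "DX0:"}


def build_vtdpy_command(model, interval):
    """Return the RUN command that starts VTDPY on the given HSC model."""
    if model not in VTDPY_DEVICE:
        raise ValueError("unknown HSC model: %r" % (model,))
    if not 2 <= interval <= 420:
        # Same range that VTDPY itself enforces when it reprompts for the interval.
        raise ValueError("update interval must be 2 to 420 seconds")
    return "RUN %sVTDPY %d" % (VTDPY_DEVICE[model], interval)


print(build_vtdpy_command("HSC70", 30))  # RUN DX0:VTDPY 30
```

Choosing a larger interval in such a helper follows the advice above: the higher the update interval, the smaller the performance impact on the HSC.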
__________________________________________________________
8.1.1 Using the VTDPY Display

This section presents a sample VTDPY display and explains the meaning of the fields in the display.

   HSC70 V39A HSC001 Id 0000000000DD On 14-Apr-1988 12:28:13.12 UP: 113.49
   42.9% Idle   39 Work Requests/Sec   40 Sectors/Sec   0 Records/Sec

   Free Lists                Process  Pr  St  Time%     Disk Status
   CTRL Blks    2269 +       Kernel           16.4%               1111111111
   SLCB/DCB       32 +    2  VTDPY    11  Rn  19.2%     +1234567890123456789
   Buffers       889 +   50  DEMON    11  Bl           0.....................
                         52  PDEMON    7  Bl          20A.A..........A......
   Pool Sizes            54  PSCHED   13  Rn  42.9%   40..........A.A.A.....
   SYSCOM       1800 +   72  DISK      9  Rn  16.0%   60.AA.......O..A..O...
   Kernel       6504 +  110  ECC       6  Bl          80....................
   Program    821120 +  120  TAPE      8  Bl         100....................
   Control     32436 +  122  TTRASH    7  Bl         120....................
                        124  HOST      4  Bl    .9%  140...........O........
   Data B/W used: .0%   126  POLLER    5  Bl         160....................
                        130  SCSDIR    5  Bl    .9%  180..................A.
   Host Connections                                  200A...................
             111111111122222222223333333333          220....................
   0123456789012345678901234567890123456789          240....................
    0MM..C....V....M.........................
   40........................................

The VTDPY display is continuously updated at the update interval you have set and it changes as the internal state of the HSC changes. These changes are made for all fields in the display, except those fields relating to HSC memory. Memory statistics are updated by pressing CTRL/W.

The major fields are explained in the following paragraphs. As you read this section, refer to the VTDPY display to see where the fields are located and to the paragraphs below the sample fields to interpret the meaning of the fields.
   HSC70 V370 HSC001 Id 0000000000DD On 14-Apr-1988 12:28:13.12 UP: 113.49

The top line, reading from left to right, shows the HSC model number (HSC70); the baselevel of the operating software (V3.90); the system name (HSC001); the HSC id number, given as a hexadecimal number unique in the cluster (in this case 0000000000DD); and the system date and time. The last number on the right indicates the hours and minutes the HSC has been running since the last boot or reboot.

   42.9% Idle   39 Work Requests/Sec   40 Sectors/Sec   0 Records/Sec

This second line in the display shows the percentage of current P.io idle time, average number of work requests (i.e., MSCP and TMSCP) per second, number of disk data sectors transferred per second, and number of tape data records transferred per second. These numbers are normalized to match the update interval.

   Free Lists
   CTRL Blks    2269 +
   SLCB/DCB       32 +
   Buffers       889 +

   Pool Sizes
   SYSCOM       1800 +
   Kernel       6504 +
   Program    821120 +
   Control     32436 +

This field represents the quantity of available memory and memory structures. The units used in the display are:

   CTRL Blks  -- Blocks
   SLCB/DCB   -- Number of structures
   Buffers    -- Number of buffers
   Pool Sizes -- All are given in words of memory

The numbers are usually followed by plus signs. If the numbers are followed by minus signs, the system is in memory deficit. During memory deficit, the HSC slows down and, if the deficit lasts long enough, the HSC could crash.

   Data B/W used: .0%

This display shows the percentage of HSC data bus bandwidth used. This is an instantaneous display and may often show 0% when the HSC is busy, because the bandwidth was zero at the instant the sample was taken.

   Host Connections
             111111111122222222223333333333
   0123456789012345678901234567890123456789
    0MM..C....V....M.........................
   40........................................

This display indicates host connection status.
The two horizontal rows of numbers below the Host Connections heading represent host node numbers 0 through 39. Each digit on the first line is read with the digit directly below it to form the numbers 10 through 39. The connection status for host node numbers 40 and above is read on the last line of the display. Add the base number 40 at the far left of the last line to the number above the display to derive these host node numbers.

The next line indicates the status of the host connections. A C on this line indicates one connection to that host, and an M indicates multiple connections. Because each host can make a separate connection to each of the disk, tape, and DUP servers, this field frequently shows multiple connections. In the example, nodes 0, 1, and 14 show multiple connections, and node 4 shows one connection.

If no letter corresponds to the node number, that host does not have any connection to the HSC. If a V appears on that line, a Virtual Circuit only is open and no connection is present. This usually means the host is in a transitional state. The example shows node 9 with only a virtual circuit open.

   Host Path Status
             111111111122222222223333333333
   0123456789012345678901234567890123456789
    0^A..^....B....A.........................
   40........................................

When you press CTRL/V, the display toggles to an alternate Host Path Status display for one refresh only. This display contains CI path status information and each position can contain either a diamond symbol, an A, or a B. If one path (A or B) goes down, this display alternates on every other refresh with the Host Connections display until that path comes back. The meanings of the symbols are as follows:

o A solid diamond symbol means normal operation (both paths operating). This symbol is represented in the example with a caret (^).
o An A or B indicates only one CI path is operational.
  If an A is displayed, path A is active, but path B is not; if a B is displayed, path B is active, but path A is not. These conditions indicate a probable hardware problem.

The example shows that nodes 0 and 4 have both paths operating. Nodes 1 and 14 have only path A operating, and node 9 has only path B operating.

         Process  Pr  St  Time%
         Kernel           16.4%
      2  VTDPY    11  Rn  19.2%
     50  DEMON    11  Bl
     52  PDEMON    7  Bl
     54  PSCHED   13  Rn  42.9%
     72  DISK      9  Rn  16.0%
    110  ECC       6  Bl
    120  TAPE      8  Bl
    122  TTRASH    7  Bl
    124  HOST      4  Bl    .9%
    126  POLLER    5  Bl
    130  SCSDIR    5  Bl    .9%

The previous portion of the display shows the active processes. The columns in this display (from left to right) mean the following:

o The first column is the process number.
o The Process column shows the name of the process running at the time.
o The Pr column shows the priority of the process.
o The St column shows the status of the process, either running (Rn) or blocked (Bl).
o The Time% column is the percentage of P.io time each currently running process is using.

Names in the process column under Kernel (the operating system) are defined as follows:

o VTDPY is running. However, another utility could be running, in which case the priority number might change also.
o DEMON indicates that demand and automatic device integrity tests are running.
o PDEMON indicates that periodic device integrity tests are running.
o PSCHED is the scheduler for periodic device integrity tests. This is the HSC idle loop.
o DISK is the disk server.
o ECC is the error correction code process and is displayed when disk I/O is active.
o TAPE is the tape server.
o TTRASH is displayed when the tape server is active. This process sends tape error logs to the host.
o HOST is the process that interfaces to the host. It is always present.
o POLLER polls for the host processors and is present when a connection is present.
o SCSDIR processes directory requests from the host.

Not all active processes are necessarily shown.
Because of limited space on the screen, the display of some processes may be truncated and the CPU time percentages may not total 100 percent, depending on the polling interval of the data sample.

   Disk Status
                1111111111
      +1234567890123456789
     0....................
    20A.A..........A......
    40..........A.A.A.....
    60.AA.......O..A..O...
    80....................
   100....................
   120....................
   140...........O........
   160....................
   180..................A.
   200A...................
   220....................
   240....................

The last area in the display alternates between disk and tape status displays when both device types are connected to the HSC. The two horizontal rows of numbers under the Disk Status heading represent the numbers 0 through 19. Each 1 on the first line is read with the digit directly below it to form the numbers 10 through 19. This number is added to the numbers 0 through 240 given on the vertical axis of the display to derive the disk unit number. For example, the letter O in the approximate center of the display refers to disk unit 151 because it is at the intersection of the number 140 on the vertical axis and the number 11 in the horizontal rows, and the sum of 140 and 11 is 151.

The drive status is coded as follows:

o The letter O indicates the drive is Online. That is, the drive is in use by a host, an HSC utility, or an HSC device integrity test. In the example, drive unit number 151 is on line.
o An A indicates the drive is Available but not mounted. Drive unit number 62 is available.
o A D indicates the HSC is connected to Duplicate units (two or more drives with the same unit number).
o A U indicates the drive is in an Undefined state.

The letters and method of determining the drive unit number are the same when tape status is displayed. In the tape status display, an additional letter, F, indicates that no tape is mounted on the tape drive.
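
The row-plus-column arithmetic used by the Disk Status and Host Connections areas can be sketched in a few lines of Python. This is an illustration only, assuming a row has been copied from the screen as plain text; the function name decode_status_row is hypothetical, and the sketch relies on the assumption that status characters are never digits, so the leading digits of a row are always its base number.

```python
import re


def decode_status_row(row):
    """Map one VTDPY status row to {unit or node number: status letter}.

    The leading digits are the base number printed at the left of the row.
    Each following character is one column, counted from 0. Dots carry no
    status and are skipped.
    """
    match = re.match(r"(\d+)(.*)$", row.strip())
    if match is None:
        raise ValueError("row does not start with a base number: %r" % (row,))
    base = int(match.group(1))
    cells = match.group(2)
    return {base + col: ch for col, ch in enumerate(cells) if ch != "."}


# Disk Status row with base 140 and an O in column 11, as in the sample.
disk_row = "140" + "." * 11 + "O" + "." * 8
print(decode_status_row(disk_row))  # {151: 'O'}

# First Host Connections row from the sample (base 0).
host_row = "0MM..C....V....M"
print(decode_status_row(host_row))  # {0: 'M', 1: 'M', 4: 'C', 9: 'V', 14: 'M'}
```

The same function works for the Host Path Status rows, where the letters would be A, B, or the caret that stands in for the diamond symbol.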
__________________________________________________________
8.1.2 VTDPY Error Messages

This utility has two error messages, as follows:

VTDPY-E Illegal Interval Value (2 to 420 seconds)
   Explanation: You have entered an update interval outside the range permitted. VTDPY reprompts for the update interval.
   User Action: Reenter a value within the correct range.

VTDPY-F Insufficient Common Pool
   Explanation: This message indicates insufficient memory to run VTDPY.
   User Action: Retry VTDPY when the demands on memory are reduced.

__________________________________________________________
8.2 Volume Shadowing

This release supports VMS Volume Shadowing. HSC Version 3.9A supports the maximum number of shadow sets specified in the VMS Volume Shadowing Software Product Description (SPD).

When you run volume shadowing, adhere to the following rules:

o Use only identical disk types with the same geometry within a shadow set.
o Do not attempt to dismount the source shadow member of a shadow set while a VMS Shadow Copy operation is in progress. The VMS command, SHOW DEVICE, indicates whether such an operation is executing.
o Read Section 2.8 of the VAX/VMS Volume Shadowing Manual, which describes a method using a particular former shadow set member as the source for all copy operations involved in rebuilding the shadow set.
o Always include the device names for all shadow set members in the shadowing MOUNT command. The VMS operating system will correctly select between source and target members for you.

Note the following items specific to volume shadowing:

o During a copy operation, different VAXcluster members may have different views of the shadow set's membership, as shown by the SHOW DEVICE command. This situation corrects itself when the copy operation completes. Differences appear when a shadow set is first mounted and during the copy operations resulting from shadow set failover processing. Although this situation can be confusing, it is relatively harmless.
  If the condition results from a MOUNT command, the SHOW DEVICE output on the VAXcluster member where the MOUNT command was executed is the most accurate view of the shadow set.
o During a merge copy operation (initiated either by a MOUNT command or as a result of a shadow set failover), only the VAXcluster member executing the copy indicates a merge copy is executing. All other VAXcluster members indicate a full copy is being done. This is part of the volume shadowing design used by the HSC controller and VMS operating system.
o Hardware write-protected shadow sets are not supported. If you write protect the members of a shadow set, any data degradation errors will be unrecoverable.
o Shadow set members with foreign file structures (that is, not FILES-11 ODS 2) receive limited support. Full volume shadowing support requires the ability to store shadow set context somewhere on the shadow set member volumes. This is not possible for volumes with a foreign file structure. Read the VAX/VMS Volume Shadowing Manual carefully before attempting to use volume shadowing on volumes with foreign file structures.

__________________________________________________________
8.3 Exception Codes

This section provides error codes and user actions.

004106 DRAT allocation failure
   Facility: DISK, MSCP
   Explanation: While preparing to read the Factory Control Table (FCT) during online processing, the DRAT allocation subroutine failed.
   User Action: Submit an SPR with the crash dump.

004107 Command not completed after drive declared inoperative
   Facility: DISK, MSCP
   Explanation: Get Command Status processing declared the drive inoperative, but the command still failed to complete within the timeout period.
   User Action: Submit an SPR with the crash dump. Note the type of the drive identified in the error message. The error message identifies the unit number; the drive type for the unit number may be obtained from a SHOW DISKS display.
004110 GCS Status Overflow
   Facility: DISK, MSCP
   Explanation: Get Command Status processing determined that the calculated status will result in an overflow.
   User Action: Submit an SPR with the crash dump.

004111 A timer has link field values inconsistent with its current operational state
   Facility: DISK, many
   Explanation: When a timer was added or removed from the active list, it was in a state that should not exist.
   User Action: Submit an SPR with the crash dump.

004112 A unit is incorrectly marked as a shadow set member
   Facility: DISK, many
   Explanation: A unit was incorrectly marked as being a member of a shadow set.
   User Action: Submit an SPR with the crash dump.

004113 No DRAT list invalid
   Facility: DISK, many
   Explanation: During Fragment Request Block (FRB) retirement while declaring a drive inoperative, the NO DRAT list was found to be invalid.
   User Action: Submit an SPR with the crash dump.

004114 Connection closed after delay in ATTN process
   Facility: DISK, ATTN
   Explanation: While the disk server was waiting to acquire resources to send an attention message to the host, the connection closed.
   User Action: Submit an SPR with the crash dump.

007022 Invalid BMB address
   Facility: CIMGR, CIMISCPRC
   Explanation: A Host Message Block (HMB) arrived at the resource collector with an invalid Big Message Block (BMB) address attached to it.
   User Action: Note the K.pli microcode revision level with a SETSHO SHOW REQUESTORS command. The K.ci MC version reported by this command is the K.pli microcode revision level. If the revision level is less than revision 45, contact your Digital Field Service Representative for a K.pli microcode update. Also, note the current disk configuration. If the K.pli microcode revision level is greater than or equal to 45, submit an SPR with the crash dump and the noted disk configuration.
007023 SCS buffer retrieval failure
   Facility: CIMGR, CISUBRS
   Explanation: When changing the status of the virtual circuit, the CIMGR tried to retrieve the SCS buffer from the K.ci .KHSRR queue. This buffer should have been on the queue because it was not in use at the time of the crash. If no elements have been queued to the .KHSRR queue, CIMGR would have forced a crash.
   User Action: Submit an SPR with the crash dump.

062002 Common Pool memory returned twice
   Facility: Many
   Explanation: A process attempted to return a memory segment that was already in the Common Pool.
   User Action: Submit an SPR with the crash dump.

******************************************************************************

From: GENRAL::FIALA "Eschew Obfuscation." 17-NOV-1989 10:45:48.95
To: VOLKS::BRASSARD
CC: FIALA
Subj: CLD CX4373..

Hi. Rone Repka forwarded your memo to me. I have the invidious position of remedial support for CX controllers. You obviously need HSC V39A. I am hand managing distribution of 39A until SDC starts shipping mid January. I have 3 bundled savesets with everything needed [including instructions] inside. 1 for HSC50, 1 for HSC70, 1 for both. Which kit do you want? Where do you want it copied to? Let me know "where to stick it" !!!

Stefan Fiala

PS: Your phone # in your mail header still has 240-6492. Elf lists your old number also.

From: GENRAL::FIALA "Eschew Obfuscation." 17-NOV-1989 17:24:16.82
To: KERNEL::CLARK
CC: FIALA
Subj: HSC's and dropping VC's.

Hi. Bob Brassard forwarded your memo to me. The problems you have are probably to do with:- Bad install of V390(4), KCI 2.54 and/or HSC V39A or reasons unknown. There is a buffer/credit starvation fixed in 254 and a hack to the CI wire handling in 39A. I believe there are other VMS things too. Bad install of V390(4) can cause all these things and more...
By and large, if the HSC is reporting VC closed [info] and VMS 5.2 is running and Backup is involved [maybe with a large blocksize or /nocrc] then the phenomena you outline fit. Use "SHO REQ" on the HSC to check for 2.54. Fix this first.

Check the way V390(4) was installed:-

o Did they get no HSC prompt after installation? [yes=Bad]
o Did they follow the correct install procedure? [No=bad]
o How many tape formatters does "SHO TAPE" indicate? [MUST BE LESS THAN 24]. [>12 BAD]
o Suspect VMS credit starvation for HSC disk/tapes. [Difficult]
o Devices on that HSC "run slow". [Difficult]
o If in any doubt, reinstall V390(4):- [Easy]
   o SHOW ALL
   o Check disks will failover.
   o <online> out.
   o Press <init> hold in <fault>
   o "Inipio-Booting..."
   o Let go of <fault>.
   o Use SETSHO to reset any parameters from SHO ALL.
   o Reboot as necessary.
   o <online> in.

I generated a Blitz about badly installing V390(4) some time ago. [For VC closes you get a message "VC closed by request from KCI" but no reason... No retndat/cnf timeout, for instance.] I couldn't tell the whole story... But if in doubt re-initialise it.

Submit a Prism/Cld for a pre-release of HSC V39A. V39A won't exit SDC or SSB [new name?] till mid January [in the USA].

Do me a favour and spread the word amongst the CSC folks about the VC closures being "hidden" by the HSC error level. And the bad-install of V390(4) issue. If you have funnies like HSC goes offline/tapes mysteriously rewind/shadowcopies start spontaneously/drives drop offline/etc. SET ERROR INFO immediately...

Stefan Fiala
CX CSSE Product Support.