Conference kernel::csguk_systems

Title:CSGUK_SYSTEMS
Notice:No restrictions on keyword creation
Moderator:KERNEL::ADAMS
Created:Wed Mar 01 1989
Last Modified:Thu Nov 28 1996
Last Successful Update:Fri Jun 06 1997
Number of topics:242
Total number of notes:1855

60.0. "TAPES" by KERNEL::MOUNTFORD () Fri Aug 11 1989 15:44

    
    THIS TOPIC IS DEDICATED TO THE INTERCHANGE  OF INFORMATION & ASSISTANCE
    ------------------------------------------------------------------------                  
                       WITHIN THE DEVICE GROUP FOR TAPES
                       ----------------------------------

60.1. "TA90 no power gotcha" by KERNEL::ARCHER (Graham Archer Devices Diagnosis) Sat Aug 19 1989 17:32 (32 lines)
    
    Here's a possible TA90 gotcha,
    
    Scenario
    --------
    Customer uses the "Unit Emergency" switch to power down the TA90,
    or powers the TA90 off from the mains breaker on the wall.
    
    Result
    ------
    When either switch is turned back on, the TA90 fails to power up.
    No DC power is present.
    
    Solution
    --------
    With the power switches set back to "on" press the Local Power Enable
    button, located immediately to the left of the floppy drive inside
    the TA90. You will have to open the front door of the TA90 to do
    this. Press the right hand side of the door hard with the palm of
    the hand to release the door catch to gain access to the inside
    of the TA90.
    
    Reason
    ------
    The Unit Emergency switch kills the entire TA90 power system immediately.
    Even when the AC input is restored, TA90 power is still held off
    until the Local Power Enable button is pressed.
                                                 
    
    
    Graham Archer.
    
60.2. "Tapes and disks going offline temporarily" by KERNEL::BARTLEY () Mon Nov 20 1989 09:22 (2520 lines)

I have read nearly half of this and it seems to have something to do
    with tapes.  I think!!  In any case it's extremely interesting.
    I think!!  Isn't it?  (TFB)
    ***************************************************************
    
    Dear Colleagues....
	As a result of a telesupport query, and some probing into the ol'
STARS database, I found the name of someone who seemed to know something about 
it, so I asked him!....The result was a blown disk quota, so I thought I'd 
share it with you!!
	Please observe the point about CSSE needing to manage the restricted 
release, and report all cases to Bob Brassard/George White.

	Now read on!!



	I have been discussing a problem with a field engineer, which appears 
to have all the hallmarks of the scenario outlined in a STARS article, for 
which various workarounds are indicated.
	The "cluster" is a single 8800/cibci with hw/sw revision from 
"show cluster"=80007, HSC RP_REV for HSC=236. CRONIC is 394, VMS is 4.7.
*(hsc=236 indicates K.CI u-code 54 which is latest following installation of the
L0107-YA FCO .)
	The HSC supports 4 x RA82's, 1 x TA81, and 7 x S.I. 97C's on DEC 
requestors.
	The "problem" is two edged.
(1)The customer is complaining about long backup times.
(2)Two months ago the HSC started "dropping off line" (Online light going out, 
all drive port lights going out, tape would rewind and restart backups)
After a short period it would recover the V.C. (online light came on again, 
drive port lights came on again)
This has become more frequent, causing much loss of performance. This morning 
there were six observed events in one hour, with no errlog/hsc-console reports.

There were no identifiable changes to either software or hardware corresponding 
to the start of the problem, although I haven't yet been able to rule out such 
things as changes to backup command files or trainee operators, ...etc.
	There are frequent indications of clock dropouts from the S.I. drives, 
on the HSC console, and as HSC datagrams, but not corresponding in time to the 
HSC events, and  (as the STARS scenario suggests......) 

	*****	No PAA0, or HSC errors on either VMS or HSC consoles. ****
		NO ERRORLOG ENTRIES FOR THE HSC GOING AWAY!!

	These S.I. dropouts were occurring before this problem started.

	The ERROR threshold on the HSC was FATAL. This has been reset to "INFO"
and immediately gave "vc closure" events corresponding to the HSC online light 
going out, however these didn't indicate the reason (e.g. VC closed due to 
timeout of RTNDAT/CNF...etc).
	I have a feeling that I may be missing out on some information which 
may have already been published on this topic, if so I apologise. However, I 
would be grateful if you could maybe enlarge on this problem and any fixes 
(VMS 5.2??) or point me in the right direction.
	Thanks in anticipation......
				Dave Clark
				UK-CSC
				Devices Support (Disks)

*******************************************************************************

From:	VOLKS::BRASSARD     "Bob B., VAX CSSE, 240-6492, AET 1-1/6" 20-SEP-1989 14:49:22.21
To:	BRASSARD
CC:	
Subj:	FAS UPDATE

7.10 TITLE: CIXXX HSCXX RTNDAT/CNF TIMEOUT VC-CLOSURE: NEW PROBLEMS
    DEVICE: HSC50,HSC70,CIXXX                    (Added: 16-DEC-1988)
    CLD #: CXO02335,CXO02677  * PRISM #:         (Update: 16-JAN-1989)
						 (Updated 20-SEP-1989)

	*** UPDATE SEP-89 ***  VMS-5.2 "BACKUP" UTILITY I/O buffering
	enhancements will cause many more/new sites to be impacted
	by this VC-CLOSURE (CI CMDQ-0 STARVATION) problem, AS SOON
	AS THEY UPGRADE TO VMS-5.2 !!  Although an "official" CRONIC
	new patch-version fix to the underlying Tape-write "CI buffer-
	data priority" problem was intended to be released in NOV-89,
	EVAL TESTING errors with this CRONIC (tentatively V-39A) fix are
	requiring the solution strategy to be re-evaluated by CI/HSC/VMS
	Engineering and CSSE.

	!!!!  THERE IS CURRENTLY NO ENGINEERING APPROVED WORKAROUND !!!!
	!!!!  OR PATCH (VMS-PADRIVER, HSC-CRONIC, OR OTHERWISE) FOR !!!!
	!!!!  VMS-5.2 CUSTOMERS EXPERIENCING THIS PROBLEM; BUT      !!!!
	!!!!  SOME FORM OF WORKAROUND WILL SOON (by NOV-89) BE 	    !!!!
	!!!!  AVAILABLE.   UNTIL THEN, ONLY THE FOLLOWING TWO (2)   !!!!
	!!!!  SUGGESTIONS ARE AVAILABLE FOR VMS-5.2 CUSTOMERS:	    !!!!

		1. DELAY VMS-5.2 UPGRADE UNTIL CSSE PUBLISHES AND OFFERS
		   A WORKAROUND PATCH (PADRIVER, CRONIC, etc.);
		2. REDUCE VMS-5.2 "BACKUP" PROCESS I/O PERFORMANCE BY 
		   LOWERING BACKUP-USER/ACCOUNT "UAF-RECORD" (AUTHORIZE 
		   FILE) I/O QUOTAS SUCH AS "DIOLM, BYTLM".  THIS WILL
		   HAVE TO BE CAREFULLY MANAGED TO AVOID REDUCING VMS-5.2
		   BACKUP PERFORMANCE TO LEVELS UNACCEPTABLE TO CUSTOMER.

	!!!!  CUSTOMER SITUATIONS WHERE THE ABOVE IS NOT SATISFACTORY !!!!
	!!!!  SHOULD BE ESCALATED VIA CLD-PROCESS TO VAX CSSE (George !!!!
	!!!!  White, Bob Brassard) FOR EXCEPTION HANDLING ADVICE.     !!!!

	!!!!  VMS-4.7, 5.0, & 5.1 SITES SHOULD BE MANAGED AS DICTATED !!!!
	!!!!  BELOW: NO CLD IS REQUIRED; BUT VAX CSSE MUST BE 	      !!!!
	!!!!  CONTACTED TO MANAGE THE "RESTRICTED DISTRIBUTION"       !!!!
	!!!!  VMS-4.7/5.0/5.1 "PADRIVER" PATCH.			      !!!!

	
SYMPTOMS: 

	*** CUSTOMER SYMPTOM: DURING BACKUP, TAPES REWIND/RESTART,
 	*** SHADOW-SETS COPIES INITIATED, OR DISK-MOUNT-VERIFY !

    During heavy file-transfer activity between a CIxxx (CIBCA-A, CIBCA-B,
    CIBCI, or CI7x0) and an HSC50/70, such as during disk-tape or disk-disk 
    BACKUPs, the HSC may initiate a "Virtual-Circuit" (VC) closure with the 
    CIxxx/VAX-host.   Unless HSC "ERROR or OUTBAND-LEVEL" is set to "INFO"
    (default is "ERROR"), the "RTNDAT/CNF TIMEOUT" causing the VC-Closure
    and the "VC-CLOSURE" itself *** WILL NOT BE REPORTED BY THE HSC *** !
    By reducing the HSC ERROR-LEVEL to "INFO", the following general error 
    messages will be seen:

	HSC ERROR-MESSAGES
	------------------
	Path A has gone from good to bad.
	Path B has gone from good to bad.

	HOST-W-SEQ 100. xx:xx:xx  (time-stamp)
	VC closed with node-5 (node-name) due to request from K.CI
	DISK-I-SEQ 101. xx:xx:xx
	VC closed due to timeout of RTNDAT/CNF from host node-5.
	HOST-I-SEQ 102. xx:xx:xx
	VC opened with node-5 (node-name).
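
    A minimal sketch, in Python, of how a captured HSC console log could be
    scanned for these closures. The message wording is taken from the samples
    above; the capture layout, the function name, the sample time stamps and
    the node name EXAMPL are assumptions made for illustration only.

```python
import re

# Matches the informational line that names the cause of the closure.
# Wording copied from the sample messages; the console layout is an assumption.
TIMEOUT_RE = re.compile(
    r"VC closed due to timeout of RTNDAT/CNF from host (?P<node>\S+?)\.?\s*$"
)


def count_timeout_closures(log_text):
    """Return {host node: number of RTNDAT/CNF timeout VC closures}."""
    counts = {}
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            node = match.group("node")
            counts[node] = counts.get(node, 0) + 1
    return counts


# Hypothetical capture, modelled on the sample messages in the report.
SAMPLE_LOG = """\
HOST-W-SEQ 100. 09:14:02
VC closed with node-5 (EXAMPL) due to request from K.CI
DISK-I-SEQ 101. 09:14:02
VC closed due to timeout of RTNDAT/CNF from host node-5.
HOST-I-SEQ 102. 09:14:09
VC opened with node-5 (EXAMPL).
"""

if __name__ == "__main__":
    print(count_timeout_closures(SAMPLE_LOG))
```

    Counting per host node makes it easier to compare the closure rate before
    and after a workaround is applied.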

    VMS on the affected VAX-host typically *** WILL NOT REPORT ANY
    PAA0/CIxxx or VIRTUAL-CIRCUIT STATUS CHANGES *** either !!
    Normally, the most obvious symptom visible to the customer is 
    simply tapes rewind/restart during BACKUP (due to lack of VMS
    TA/TUxx TAPE-MOUNT-VERIFICATION support); or DISK SHADOW-SETS
    begin "COPYING" (if only mounted from 1 node); or unexplained
    DISK MOUNT-VERIFY messages.  These symptoms are simply the
    result of DISK/TAPE MSCP-CLASS-DRIVER automatic fault recovery
    from the VC-CLOSURE with the HSCxx.

    The HSCxx and CIxxx/PAA0 VMS CI-port software will automatically 
    recover the virtual-circuit within 5-10 seconds.  Although this
    recovery is automatic, significant BACKUP time is wasted in
    re-writing TA/TUxx tapes from the beginning on each VC-CLOSURE;
    and significant SHADOW-SET performance is lost during a "COPY".
    The customer may be legitimately and justifiably concerned...

      *** DO NOT REPLACE CIxxx OR HSCxx HARDWARE FOR THIS PROBLEM !!  ***

PROBLEM DESCRIPTION:
    An intensive cross-functional CI/HSC Engineering and CSSE investigation
    has isolated 3 separate causes for this HSC "RTNDAT/CNF TIMEOUT"
    VC-CLOSURE problem.  To understand the 3 causes, a definition of
    "RTNDAT/CNF TIMEOUT" is required.   The HSCxx K.CI uses a 3-second
    timer on each of its oldest CI data-transfers (usually a 4-8 block
    fragment SNDDAT or DATREQ to VAX-host CI-PORT); any requiring 
    more than 3 seconds to complete causes the "RTNDAT/CNF TIMEOUT",
    implying that the VAX CI-PORT has not returned the expected
    "RETDAT" (for DATREQ) or "CNF" (Confirmation after SNDDAT LAST-
    PACKET) packet within 3 seconds.
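
    The timer rule just described can be sketched as follows (Python). This is
    only an illustration of "time the oldest outstanding transfer against a
    3-second limit"; the function name and the use of plain floating-point
    seconds are assumptions, and the real check lives in K.CI firmware.

```python
TIMEOUT_SECONDS = 3.0  # limit quoted in the report for the K.CI timer


def oldest_transfer_timed_out(start_times, now, timeout=TIMEOUT_SECONDS):
    """Return True if the oldest outstanding transfer has waited longer than `timeout`.

    start_times: start times (seconds) of transfers still awaiting RETDAT or CNF.
    now: current time on the same clock.
    """
    if not start_times:
        return False  # nothing outstanding, so nothing can time out
    return (now - min(start_times)) > timeout


if __name__ == "__main__":
    outstanding = [10.0, 11.5, 12.0]
    # Oldest transfer started at 10.0 s.
    print(oldest_transfer_timed_out(outstanding, now=12.9))
    print(oldest_transfer_timed_out(outstanding, now=13.5))
```

    Only the oldest transfer matters in this model: newer transfers completing
    on time do not reset the clock for the one that is stuck.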

	1. HSCxx K.CI (L0107) V2.43 FIRMWARE "SNDDAT STALL" BUG:
	A microcode bug in the KCI supervisor loop stalls SNDDAT
  	packet processing, when DMA_CREDITS for DATREQ are exhausted.
 	SNDDAT and DATREQ processing should be independent.  This
	bug is corrected by KCI V2.54 (L0107 REV-E*) firmware,
	soon to be released as FCO for RA70 support.

	2. CI-PORT COMMAND-QUEUE PRIORITIZATION "RESOURCE STARVATION":
	Current CI-PORT command-queue prioritization may cause excessive 
	COMQL (CI-CMD.-QUEUE-0 / COMQ0) service latencies resulting in 
	HSCxx "RTNDAT/CNF TIMEOUT" VC-closure on disk-writes, during 
	heavy BACKUP-applic. tape-write/disk-read activity.   CI 
	processing of Tape-writes/DATREQ2/COMQ2, VMS MSCP message-
	commands/SNDMSG/COMQ1, and received-packets (SNDDAT, RECMSG) 
	can pre-empt servicing of COMQ0, thus indefinitely delaying 
	DATREQ0 HSC disk-write data-requests and resulting in HSC data-
	transfer timeout.  A "VMS SUPPORTED RESTRICTED DISTRIBUTION"
	PADRIVER.EXE patch is available from VAX CSSE as a short-term
	(6 month) workaround to this problem: THE MOST PREDOMINANT
	CAUSE OF HSC "RTNDAT/CNF TIMEOUT" VC-CLOSURES !!

	3. HSCxx K.CI "SNDDAT SEQUENTIALITY" PROBLEM: K.CI may transmit
	SNDDAT "LAST-PACKET" out-of-sequence, due to performance 
	optimization which allows KCI firmware to packetize up to
	3 different SNDDAT operations at 1 time on a single VC.
	The SNDDAT packet queue-manipulation can result in indefinite
	pre-emption of the oldest SNDDAT/LP packet, depending on
	SNDDAT "CNF" credit-return timing and queue-position.
	This is a low-frequency/low-impact problem, likely only
	occurring once or twice per-year !  There is currently no
	KCI firmware fix, since the optimization is desirable in
	most cluster CI traffic situations.
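
    Cause 2 can be illustrated with a toy model (Python). The sketch assumes a
    port that services exactly one command per tick and always prefers
    higher-priority work over COMQ0. The function name, the tick granularity
    and the arrival patterns are illustrative assumptions, not a description
    of the real CI port microcode.

```python
def comq0_service_tick(high_arrivals):
    """Return the tick at which a waiting COMQ0 entry is serviced, or None.

    high_arrivals: number of higher-priority commands arriving on each tick.
    The port services one command per tick and takes COMQ0 only when no
    higher-priority work is pending (strict priority).
    """
    high_pending = 0
    for tick, arrivals in enumerate(high_arrivals):
        high_pending += arrivals
        if high_pending:
            high_pending -= 1   # higher-priority work always wins the tick
        else:
            return tick         # port is idle enough to take the COMQ0 entry
    return None                 # COMQ0 entry still waiting at end of window


if __name__ == "__main__":
    # Light load: the backlog drains and COMQ0 is serviced.
    print(comq0_service_tick([2, 0, 0, 1, 0]))
    # Saturation: one new higher-priority command every tick, COMQ0 starves.
    print(comq0_service_tick([1] * 30))
```

    With a higher-priority command arriving on every tick the COMQ0 entry is
    never serviced within the window, which is the condition under which the
    3-second K.CI timer would expire.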

STATUS:
   VAX CSSE has short-term workarounds for each of the above problems,
   intended only for critical customer situations at this time.

	1. HSCxx K.CI (L0107) V2.43 FIRMWARE "SNDDAT STALL" BUG:
	The KCI V2.54 firmware fix will soon be released as HSCxx
	FCO to L0107 module, but CSSE can supply preliminary parts 
	for critical sites.

	2. CI-PORT COMMAND-QUEUE PRIORITIZATION "RESOURCE STARVATION":
	VAX CSSE has a PADRIVER.EXE patch for VMS-4.7 and 5.0/5.1.
	THIS PATCH is "RESTRICTED DISTRIBUTION", REQUIRING VAX CSSE 
	APPROVAL AND AUTHORIZATION !!  SIGNIFICANT CLUSTER PERFORMANCE 
	(LOCK_MGR) DEGRADATION MAY OCCUR UNDER CERTAIN APPLICATION 
	CI-TRAFFIC LOADS, thus requiring careful characterization of
	CI-traffic at candidate sites.  This patch is only a short-
	term (6-month) workaround.

		*** NOTE: VAX CSSE SITE QUALIFICATION IS CRITICAL 
		*** DUE TO THE POTENTIAL PERFORMANCE IMPACT OF CURRENT
		*** PADRIVER.EXE PATCH TO CUSTOMER APPLICATION !!

	A cross-functional CI/VMS/HSC Engineering team is actively 
	considering and investigating an appropriate long-term fix,
	which will not jeopardize performance.  Pending results,
	VMS Engineering will formally adopt an optimized-patch or
	any team-recommended CI/PA-architecture changes into the
	next VMS major release (V5.2) and a retrofittable patch.

	3. HSCxx K.CI "SNDDAT SEQUENTIALITY" PROBLEM: An HSC CRONIC
	V370 patch is available from VAX CSSE to extend the "RTNDAT/
	CNF TIMEOUT" from 3 to 45 seconds, an effective workaround.
	This is a low-risk/impact patch, and is normally not required,
	but advised as a guarantee to avoid VC-Closure on political 
	sites.  HSC Engineering is developing a KCI firmware fix,
	likely to be included in an HSC/KCI future upgrade product.

    
SOLUTION/WORK-AROUND:

    CONTACT VAX CSSE FOR CUSTOMER SITE QUALIFICATION, AND TO OBTAIN
    CURRENT WORKAROUNDS:

		- Bob Brassard, VOLKS::BRASSARD, DTN 240-6492,
		  DDD 508-474-6492;
		- George White, VOLKS::WHITE,   DTN 240-6490,
		  DDD 508-474-6492.

    The following immediate workarounds will lessen or completely avoid
    problem impact while awaiting VAX CSSE qualification, approval, and 
    workarounds; or while awaiting formal Engineering release of
    solutions.  Note that each measure will lengthen the time required
    for customer's daily/weekly BACKUP procedures !!

	+ BACKUP COMMAND FILES: REDUCE "BACKUP/BUFFER=xxxxx" command 
	  buffer-count parameter to default of "3 buffers" or less 
	  in customer BACKUP command files, or BACKUP procedures.

	+ CONCURRENT TA/TUXX TAPE OPERATION: Incrementally reduce the 
	  number of concurrently running TA/TUxx BACKUP tape-drives/jobs,
	  to a number avoiding or limiting HSC VC-CLOSURE to an 
	  acceptable level.
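
    For the first workaround, a site's BACKUP command procedures can be
    checked for large buffer counts. The Python sketch below is one possible
    approach, assuming the qualifier is written on the same line as its value
    and may be abbreviated (/BUFF..., /BUFFER_COUNT=n). The device and
    save-set names in the sample are made up, and DCL continuation lines are
    not handled.

```python
import re

# Matches /BUFFER=n, /BUFFER_COUNT=n and shorter abbreviations, any case.
BUFFER_RE = re.compile(r"/BUFF[A-Z_]*\s*=\s*(\d+)", re.IGNORECASE)
MAX_BUFFERS = 3  # the report suggests "3 buffers" or less


def find_large_buffer_counts(command_text, limit=MAX_BUFFERS):
    """Return (line number, buffer count) for each qualifier value above `limit`."""
    findings = []
    for number, line in enumerate(command_text.splitlines(), start=1):
        for match in BUFFER_RE.finditer(line):
            count = int(match.group(1))
            if count > limit:
                findings.append((number, count))
    return findings


# Hypothetical command procedure; names are illustrative only.
SAMPLE = """\
$ BACKUP/IMAGE/BUFFER_COUNT=5 DUA0: MUA0:FULL.BCK
$ BACKUP/IMAGE/BUFFER=3 DUA1: MUA0:FULL2.BCK
$ backup/buff=16 dua2: mua1:full3.bck
"""

if __name__ == "__main__":
    for line_number, count in find_large_buffer_counts(SAMPLE):
        print(f"line {line_number}: buffer count {count} exceeds {MAX_BUFFERS}")
```

    Lines reported by the scan are candidates for review with the customer,
    since lowering the count will lengthen BACKUP run times.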

   
INTERIM STATUS: (16 Jan 89) Field tests at two customer sites of the patched
	        PA driver appear to have been completely successful thus far.

From:	VOLKS::BRASSARD     "Bob B., VAX CSSE, 240-6492, AET 1-1/6" 23-JAN-1989 20:02:25.06
To:	MYFILE
CC:	
Subj:	F.A.S. HSC-VC-CLOSE CSSE-PROBLEM-DESC. & VMS-4.7/5.X PADRIVER PATCH...

(EXTRACT OF VAX-CSSE DEC-88 FOCUS-PRODUCT-REPORT)
-------------------------------------------------
7.10 TITLE: CIXXX HSCXX RTNDAT/CNF TIMEOUT VC-CLOSURE: NEW PROBLEMS
    DEVICE: HSC50,HSC70,CIXXX                    (Added: 16-DEC-1988)
    CLD #: CXO02335,CXO02677  * PRISM #: 

SYMPTOMS: 

	*** CUSTOMER SYMPTOM: DURING BACKUP, TAPES REWIND/RESTART,
 	*** SHADOW-SETS COPIES INITIATED, OR DISK-MOUNT-VERIFY !

    During heavy file-transfer activity between a CIxxx (CIBCA-A, CIBCA-B,
    CIBCI, or CI7x0) and an HSC50/70, such as during disk-tape or disk-disk 
    BACKUPs, the HSC may initiate a "Virtual-Circuit" (VC) closure with the 
    CIxxx/VAX-host.   Unless HSC "ERROR or OUTBAND-LEVEL" is set to "INFO"
    (default is "ERROR"), the "RTNDAT/CNF TIMEOUT" causing the VC-Closure
    and the "VC-CLOSURE" itself *** WILL NOT BE REPORTED BY THE HSC *** !
    By reducing the HSC ERROR-LEVEL to "INFO", the following general error 
    messages will be seen:

	HSC ERROR-MESSAGES
	------------------
	Path A has gone from good to bad.
	Path B has gone from good to bad.

	HOST-W-SEQ 100. xx:xx:xx  (time-stamp)
	VC closed with node-5 (node-name) due to request from K.CI
	DISK-I-SEQ 101. xx:xx:xx
	VC closed due to timeout of RTNDAT/CNF from host node-5.
	HOST-I-SEQ 102. xx:xx:xx
	VC opened with node-5 (node-name).

    VMS on the affected VAX-host typically *** WILL NOT REPORT ANY
    PAA0/CIxxx or VIRTUAL-CIRCUIT STATUS CHANGES *** either !!
    Normally, the most obvious symptom visible to the customer is 
    simply tapes rewind/restart during BACKUP (due to lack of VMS
    TA/TUxx TAPE-MOUNT-VERIFICATION support); or DISK SHADOW-SETS
    begin "COPYING" (if only mounted from 1 node); or unexplained
    DISK MOUNT-VERIFY messages.  These symptoms are simply the
    result of DISK/TAPE MSCP-CLASS-DRIVER automatic fault recovery
    from the VC-CLOSURE with the HSCxx.

    The HSCxx and CIxxx/PAA0 VMS CI-port software will automatically 
    recover the virtual-circuit within 5-10 seconds.  Although this
    recovery is automatic, significant BACKUP time is wasted in
    re-writing TA/TUxx tapes from the beginning on each VC-CLOSURE;
    and significant SHADOW-SET performance is lost during a "COPY".
    The customer may be legitimately and justifiably concerned...

      *** DO NOT REPLACE CIxxx OR HSCxx HARDWARE FOR THIS PROBLEM !!  ***

PROBLEM DESCRIPTION:
    An intensive cross-functional CI/HSC Engineering and CSSE investigation
    has isolated 3 separate causes for this HSC "RTNDAT/CNF TIMEOUT"
    VC-CLOSURE problem.  To understand the 3 causes, a definition of
    "RTNDAT/CNF TIMEOUT" is required.   The HSCxx K.CI uses a 3-second
    timer on each of its oldest CI data-transfers (usually a 4-8 block
    fragment SNDDAT or DATREQ to VAX-host CI-PORT); any requiring 
    more than 3 seconds to complete causes the "RTNDAT/CNF TIMEOUT",
    implying that the VAX CI-PORT has not returned the expected
    "RETDAT" (for DATREQ) or "CNF" (Confirmation after SNDDAT LAST-
    PACKET) packet within 3 seconds.

	1. HSCxx K.CI (L0107) V2.43 FIRMWARE "SNDDAT STALL" BUG:
	A microcode bug in the KCI supervisor loop stalls SNDDAT
  	packet processing, when DMA_CREDITS for DATREQ are exhausted.
 	SNDDAT and DATREQ processing should be independent.  This
	bug is corrected by KCI V2.54 (L0107 REV-E*) firmware,
	soon to be released as FCO for RA70 support.

	2. CI-PORT COMMAND-QUEUE PRIORITIZATION "RESOURCE STARVATION":
	Current CI-PORT command-queue prioritization may cause excessive 
	COMQL (CI-CMD.-QUEUE-0 / COMQ0) service latencies resulting in 
	HSCxx "RTNDAT/CNF TIMEOUT" VC-closure on disk-writes, during 
	heavy BACKUP-applic. tape-write/disk-read activity.   CI 
	processing of Tape-writes/DATREQ2/COMQ2, VMS MSCP message-
	commands/SNDMSG/COMQ1, and received-packets (SNDDAT, RECMSG) 
	can pre-empt servicing of COMQ0, thus indefinitely delaying 
	DATREQ0 HSC disk-write data-requests and resulting in HSC data-
	transfer timeout.  A "VMS SUPPORTED RESTRICTED DISTRIBUTION"
	PADRIVER.EXE patch is available from VAX CSSE as a short-term
	(6 month) workaround to this problem: THE MOST PREDOMINANT
	CAUSE OF HSC "RTNDAT/CNF TIMEOUT" VC-CLOSURES !!

	3. HSCxx K.CI "SNDDAT SEQUENTIALITY" PROBLEM: K.CI may transmit
	SNDDAT "LAST-PACKET" out-of-sequence, due to performance 
	optimization which allows KCI firmware to packetize up to
	3 different SNDDAT operations at 1 time on a single VC.
	The SNDDAT packet queue-manipulation can result in indefinite
	pre-emption of the oldest SNDDAT/LP packet, depending on
	SNDDAT "CNF" credit-return timing and queue-position.
	This is a low-frequency/low-impact problem, likely only
	occurring once or twice per-year !  There is currently no
	KCI firmware fix, since the optimization is desirable in
	most cluster CI traffic situations.

STATUS:
   VAX CSSE has short-term workarounds for each of the above problems,
   intended only for critical customer situations at this time.

	1. HSCxx K.CI (L0107) V2.43 FIRMWARE "SNDDAT STALL" BUG:
	The KCI V2.54 firmware fix will soon be released as HSCxx
	FCO to L0107 module, but CSSE can supply preliminary parts 
	for critical sites.

	2. CI-PORT COMMAND-QUEUE PRIORITIZATION "RESOURCE STARVATION":
	VAX CSSE has a PADRIVER.EXE patch for VMS-4.7 and 5.0/5.1.
	THIS PATCH is "RESTRICTED DISTRIBUTION", REQUIRING VAX CSSE 
	APPROVAL AND AUTHORIZATION !!  SIGNIFICANT CLUSTER PERFORMANCE 
	(LOCK_MGR) DEGRADATION MAY OCCUR UNDER CERTAIN APPLICATION 
	CI-TRAFFIC LOADS, thus requiring careful characterization of
	CI-traffic at candidate sites.  This patch is only a short-
	term (6-month) workaround.

		*** NOTE: VAX CSSE SITE QUALIFICATION IS CRITICAL 
		*** DUE TO THE POTENTIAL PERFORMANCE IMPACT OF CURRENT
		*** PADRIVER.EXE PATCH TO CUSTOMER APPLICATION !!

	A cross-functional CI/VMS/HSC Engineering team is actively 
	considering and investigating an appropriate long-term fix,
	which will not jeopardize performance.  Pending results,
	VMS Engineering will formally adopt an optimized-patch or
	any team-recommended CI/PA-architecture changes into the
	next VMS major release (V5.2) and a retrofittable patch.

	3. HSCxx K.CI "SNDDAT SEQUENTIALITY" PROBLEM: An HSC CRONIC
	V370 patch is available from VAX CSSE to extend the "RTNDAT/
	CNF TIMEOUT" from 3 to 45 seconds, an effective workaround.
	This is a low-risk/impact patch, and is normally not required,
	but advised as a guarantee to avoid VC-Closure on political 
	sites.  HSC Engineering is developing a KCI firmware fix,
	likely to be included in an HSC/KCI future upgrade product.

    
SOLUTION/WORK-AROUND:

    CONTACT VAX CSSE FOR CUSTOMER SITE QUALIFICATION, AND TO OBTAIN
    CURRENT WORKAROUNDS:

		- Bob Brassard, VOLKS::BRASSARD, DTN 240-6492,
		  DDD 508-474-6492;
		- George White, VOLKS::WHITE,   DTN 240-6490,
		  DDD 508-474-6492.

    The following immediate workarounds will lessen or completely avoid
    problem impact while awaiting VAX CSSE qualification, approval, and 
    workarounds; or while awaiting formal Engineering release of
    solutions.  Note that each measure will lengthen the time required
    for customer's daily/weekly BACKUP procedures !!

	+ BACKUP COMMAND FILES: REDUCE "BACKUP/BUFFER=xxxxx" command 
	  buffer-count parameter to default of "3 buffers" or less 
	  in customer BACKUP command files, or BACKUP procedures.

	+ CONCURRENT TA/TUXX TAPE OPERATION: Incrementally reduce the 
	  number of concurrently running TA/TUxx BACKUP tape-drives/jobs,
	  to a number avoiding or limiting HSC VC-CLOSURE to an 
	  acceptable level.

From:	VOLKS::BRASSARD     "Bob B., VAX CSSE, 240-6492, AET 1-1/6" 16-DEC-1988 19:39
To:	NM%VOLKS::WHITE,MYFILE
Subj:	CONTENTS OF FAS$PADRIVER DIRECTORY

Below are all the files necessary to implement the HSC VC-CLOSURE
workarounds on any sites.

	Regards, Bob Brassard

Directory VOLKS::FAS$PADRIVER:
	  ($1$DUA1:[FAS_PADRIVER])

HSC70_R002_KCI_V254.FCO;1
                         25  16-DEC-1988 16:16:39.40  (RE,RWED,RE,RE)
	HSC K.CI V2.54 FIRMWARE FCO: L0107 REV-E*

HSC_KCI_TIMEOUT.PATCH;1
                          3  16-DEC-1988 19:34:37.14  (RE,RWED,RE,RE)
	HSC CRONIC V370 PATCH TO EXTEND HOST-TIMEOUT TO 45-SECONDS

HSC_VC_CLOSE.FOCUS;1
                         16  16-DEC-1988 19:33:17.98  (RE,RWED,RE,RE)
	HSC VC-CLOSURE FOCUS-REPORT ENTRY/PROBLEM DESCRIPTION
	
PADRIVER_V47_MSG0.COM;2
                         11  16-DEC-1988 15:16:28.37  (RE,RWED,RE,RE)
	VMS-4.7 PADRIVER.EXE PATCH COMMAND FILE & PATCH DESCRIPTION

PADRIVER_V47_MSG0.EXE;2
                         40  15-DEC-1988 19:49:53.25  (RE,RWED,RE,RE)
	VMS-4.7 PATCHED PADRIVER.EXE IMAGE

PADRIVER_V50_MSG0.COM;2
                         11  16-DEC-1988 15:15:20.76  (RE,RWED,RE,RE)
	VMS-5.0 PADRIVER.EXE PATCH COMMAND FILE & PATCH DESCRIPTION

PADRIVER_V50_MSG0.EXE;2
                         46  15-DEC-1988 19:50:01.32  (RE,RWED,RE,RE)
	VMS-5.0 (ALSO 5.0-1, 5.0-2, 5.1 FT) PATCHED PADRIVER.EXE IMAGE

Total of 7 files, 152 blocks.

! VMS-5.0-x PADRIVER.EXE "COMQ0 MESSAGE" PATCH FOR HSC VC-CLOSURE
! -------------------------------------------------------------
! Created by: Bob Brassard, VAX CSSE, VOLKS::BRASSARD, 15-DEC-88
!
! 	WARNING !!!: PATCH is "RESTRICTED DISTRIBUTION", REQUIRING
!	VAX CSSE APPROVAL AND AUTHORIZATION !!  SIGNIFICANT CLUSTER
! 	PERFORMANCE (LOCK_MGR) DEGRADATION MAY OCCUR UNDER CERTAIN
!	APPLICATION CI-TRAFFIC LOADS !!
!
!	SUPPORT: VMS-supported RESTRICTED-DISTRIBUTION patch.
!	Call VAX CSSE (Bob Brassard, VOLKS::BRASSARD, DTN 240-6492,
!	DDD 508-474-6492; or George White) with any problems.
!
!	VERSION APPLICABILITY:  This patch *** ONLY *** applies
!	to VMS-5.0 distributed PADRIVER.EXE (also used for V5.0-1
!	V5.0-2, and current V5.1 FT sites) with this "image
!	ident & link-date" (ANAL/IMAGE PADRIVER.EXE):
!
!	Image Identification Information
!
!		image name: "PADRIVER"
!		image file identification: "X-9"
!		link date/time:  8-APR-1988 05:41:19.56
!		linker identification: "04-92"
! 
!  	ECO50   RRB0050 (R.R.Brassard, CSSE)	15-DEC-88
!	MODULE: SCSXPORT.MAR of PADRIVER.EXE
!
!	PROBLEM: Current CI-PORT command-queue prioritization
!	may cause excessive COMQL (CI-CMD.-QUEUE-0 / COMQ0)
!	service latencies, resulting in HSCxx "RTNDAT/CNF TIMEOUT"
!	VC-closure on disk-writes, during heavy BACKUP-applic.
!	tape-write/disk-read activity.   CI processing of Tape-
!	writes/DATREQ2/COMQ2, VMS MSCP message-commands/SNDMSG/
!	COMQ1, and received-packets (SNDDAT, RECMSG) can pre-empt
!	servicing of COMQ0, thus indefinitely delaying DATREQ0
!	HSC disk-write data-requests and resulting in HSC data-
!	transfer timeout: currently defined in V370 CRONIC at 
!	3 seconds.
!
!	SYMPTOM: HSC "RTNDAT/CNF TIMEOUT" VIRTUAL-CIRCUIT (VC)
!	closures are only reported with HSC "OUTBAND & ERROR"
!	level at "INFO" (default = ERROR).  The first customer
!	indication may only be "tapes rewinding/restarting",
!	"shadow-set copying", or "mount verification" messages
!	during heavy multiple concurrent disk/tape BACKUP 
!	activity.
!
!	FIX: Modify SCS$FPC_SENDMSG routine to direct all CI SYSAP-
!	MESSAGES on low-priority COMQL (CI COMQ0) CI-COMMAND-
!	QUEUE, instead of current COMQH (CI COMQ1). Therefore,
!	new MSCP command messages (and unintentionally all SYSAP 
!	MSGs) will only be sent if CI can service COMQ0, effectively
!	throttling CI data-transfer work to the rate at which CI
!	can send new MSCP commands to HSCxx; thus guaranteeing 
!	reasonable COMQ0 service latency.
!	
!		**** PERFORMANCE IMPLICATIONS ****
!	WARNING:  This patch requires VAX CSSE authorization for
!	implementation, due to cluster performance risks.  
!	Significant reduction of CI's sequenced-message I/O
!	(SYSAP MESSAGEs sent) performance, of up to 65%, will
!	occur under CI-port data-transfer saturation: approx.
!	1.2 Mb/sec for CIBCA-A on 85/87/88xx, 2.2 Mb/sec for
!	CIBCA-B on 85/87/88xx, 1.5-1.8 Mb/sec for other CIxxx/
!	CPU combinations.  DCL "$ MONITOR SCS" (KB_MAP) provides 
!	an instantaneous CI data-transfer measurement; VPA and
!	MONITOR/RECORD can be used for long-term monitoring.
!	
!	Sequenced messages are used by VMS for LOCK_MGR, CLUSTER
!	CONNECTION_MGR, and MSCP Command functions, with LOCK_MGR 
!	issuing most of these messages.  Increased LOCK_MGR "lock
!	granting" latencies will directly impact cluster-wide 
!	file/record/database I/O applications, since LOCK "MASTERing"
!	and LOCK "DIRECTORYing" is a distributed function within a 
!	cluster.  In other words, even with this patch on only 1/offline
!	node, message slowdown will impact MASTER/DIRECTORY functions
!	performed on behalf of other cluster nodes.
!
!	Sequenced-message I/O reduction is especially dependent on 
!	disk-write (DATREQ0) data-transfers, which also use COMQ0.
!	This patch moves SYSAP SNDMSG from COMQ1 (also used by DECNET
!	datagrams) to COMQ0, used by CI to service DATREQ0 (disk-write
!	HSC data-requests) and used by VMS for CI-polling.  Therefore,
!	SYSAP-MESSAGEs (SNDMSG) will now be serviced "FIFO" with DATREQ0
!	(from HSC) and VMS polling, instead of before (higher priority)
!	this activity on COMQ1 without this patch.
!
!	Under non-saturated CI-port data-transfer conditions, this
!	patch should only result in a 5% sequenced-message rate
!	reduction.  Of benefit, this patch may significantly improve
!	disk-write performance during heavy mass-storage I/O activity.
!	Datagrams (used mostly for DECNET) will also benefit.
!
!	INSTALLATION:
!	1. COPY this PATCH command file (PADRIVER_V50_MSG0.COM) to 
!	   work-directory.
!	2. COPY SYS$LOADABLE_IMAGES:PADRIVER.EXE to work area.  
!	3. APPLY PATCH: "$ @PADRIVER_V50_MSG0.COM" or type in below
!	   patch-commands.  Verify patch correctly installed: use
!	   ANAL/IMAGE PADRIVER.EXE, examining PATCH info & text.
!	4. COPY PADRIVER.EXE SYS$COMMON:[SYS$LDR]PADRIVER.EXE.  If 
!	   patch only intended for 1 system, copy to SYS$SPECIFIC:
!	   [SYS$LDR]PADRIVER.EXE.
!	5. REBOOT SYSTEM, coordinating with customer.
!
! BEGINNING OF PATCH COMMANDS....
! -------------------------------
!
$ PATCH PADRIVER.EXE
SET ECO 50
VERIFY/INSTRUCTION 1FA8
"MOVW #04,B^0F2(R2)"
EXIT
REPLACE/INSTRUCTION 5147
"BSBW 1F3B"
EXIT
"BSBW 1FA8"
EXIT
UPDATE
EXIT
$ EXIT
$ ! 
$ ! END OF PADRIVER PATCH FILE
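
    Because each patch applies only to one specific PADRIVER.EXE image, the
    "image ident & link-date" check in the patch headers can be automated.
    The Python sketch below is an illustration only: it assumes the
    ANAL/IMAGE output has been captured to text and contains the fields in
    the form quoted in the patch headers in this note. The function name and
    the version labels are hypothetical.

```python
import re

# Expected values as quoted in the two patch headers in this note.
EXPECTED = {
    "VMS-5.0": {"ident": "X-9", "link_time": "8-APR-1988 05:41:19.56"},
    "VMS-4.7": {"ident": "X-3", "link_time": "22-MAY-1987 23:50:26.53"},
}

IDENT_RE = re.compile(r'image file identification:\s*"([^"]*)"')
LINK_RE = re.compile(r"link date/time:\s*(\S+\s+\S+)")


def patch_applies(analyze_output, vms_version):
    """Return True if the captured image details match the patch header."""
    expected = EXPECTED.get(vms_version)
    if expected is None:
        return False  # no patch is listed for this version
    ident = IDENT_RE.search(analyze_output)
    link = LINK_RE.search(analyze_output)
    if not ident or not link:
        return False  # required fields missing from the capture
    # Compare date and time as tokens so spacing differences do not matter.
    return (ident.group(1) == expected["ident"]
            and link.group(1).split() == expected["link_time"].split())


# Sample text in the layout shown in the VMS-5.0 patch header.
SAMPLE = '''
        image name: "PADRIVER"
        image file identification: "X-9"
        link date/time:  8-APR-1988 05:41:19.56
        linker identification: "04-92"
'''

if __name__ == "__main__":
    print(patch_applies(SAMPLE, "VMS-5.0"))
    print(patch_applies(SAMPLE, "VMS-4.7"))
```

    A mismatch means the image on the system is not the one the patch was
    built against, and the patch should not be applied.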

! VMS-4.7 PADRIVER.EXE "COMQ0 MESSAGE" PATCH FOR HSC VC-CLOSURE
! -------------------------------------------------------------
! Created by: Bob Brassard, VAX CSSE, VOLKS::BRASSARD, 15-DEC-88
!
! 	WARNING !!!: PATCH is "RESTRICTED DISTRIBUTION", REQUIRING
!	VAX CSSE APPROVAL AND AUTHORIZATION !!  SIGNIFICANT CLUSTER
! 	PERFORMANCE (LOCK_MGR) DEGRADATION MAY OCCUR UNDER CERTAIN
!	APPLICATION CI-TRAFFIC LOADS !!
!
!	SUPPORT: VMS-supported RESTRICTED-DISTRIBUTION patch.
!	Call VAX CSSE (Bob Brassard, VOLKS::BRASSARD, DTN 240-6492,
!	DDD 508-474-6492; or George White) with any problems.
!
!	VERSION APPLICABILITY:  This patch *** ONLY *** applies
!	to VMS-4.7 distributed PADRIVER.EXE with this "image
!	ident & link-date" (ANAL/IMAGE PADRIVER.EXE):
!
!	Image Identification Information
!
!		image name: "PADRIVER"
!		image file identification: "X-3"
!		link date/time: 22-MAY-1987 23:50:26.53
!		linker identification: "04-00"
! 
!  	ECO50   RRB0050 (R.R.Brassard, CSSE)	15-DEC-88
!	MODULE: PAFPCALL.MAR of PADRIVER.EXE
!
!	PROBLEM: Current CI-PORT command-queue prioritization
!	may cause excessive COMQL (CI-CMD.-QUEUE-0 / COMQ0)
!	service latencies, resulting in HSCxx "RTNDAT/CNF TIMEOUT"
!	VC-closure on disk-writes, during heavy BACKUP-applic.
!	tape-write/disk-read activity.   CI processing of Tape-
!	writes/DATREQ2/COMQ2, VMS MSCP message-commands/SNDMSG/
!	COMQ1, and received-packets (SNDDAT, RECMSG) can pre-empt
!	servicing of COMQ0, thus indefinitely delaying DATREQ0
!	HSC disk-write data-requests and resulting in HSC data-
!	transfer timeout: currently defined in V370 CRONIC at 
!	3 seconds.
!
!	SYMPTOM: HSC "RTNDAT/CNF TIMEOUT" VIRTUAL-CIRCUIT (VC)
!	closures are only reported with HSC "OUTBAND & ERROR"
!	level at "INFO" (default = ERROR).  The first customer
!	indication may only be "tapes rewinding/restarting",
!	"shadow-set copying", or "mount verification" messages
!	during heavy multiple concurrent disk/tape BACKUP 
!	activity.
!
!	FIX: Modify FPC$SENDMSG routine to direct all CI SYSAP-
!	MESSAGES on low-priority COMQL (CI COMQ0) CI-COMMAND-
!	QUEUE, instead of current COMQH (CI COMQ1). Therefore,
!	new MSCP command messages (and unintentionally all SYSAP 
!	MSGs) will only be sent if CI can service COMQ0, effectively
!	throttling CI data-transfer work to the rate at which CI
!	can send new MSCP commands to HSCxx; thus guaranteeing 
!	reasonable COMQ0 service latency.
!	
!		**** PERFORMANCE IMPLICATIONS ****
!	WARNING:  This patch requires VAX CSSE authorization for
!	implementation, due to cluster performance risks.  
!	Significant reduction of CI's sequenced-message I/O
!	(SYSAP MESSAGEs sent) performance, of up to 65%, will
!	occur under CI-port data-transfer saturation: approx.
!	1.2 Mb/sec for CIBCA-A on 85/87/88xx, 2.2 Mb/sec for
!	CIBCA-B on 85/87/88xx, 1.5-1.8 Mb/sec for other CIxxx/
!	CPU combinations.  DCL "$ MONITOR SCS" (KB_MAP) provides 
!	an instantaneous CI data-transfer measurement; VPA and
!	MONITOR/RECORD can be used for long-term monitoring.
!	
!	Sequenced messages are used by VMS for LOCK_MGR, CLUSTER
!	CONNECTION_MGR, and MSCP Command functions, with LOCK_MGR 
!	issuing most of these messages.  Increased LOCK_MGR "lock
!	granting" latencies will directly impact cluster-wide 
!	file/record/database I/O applications, since LOCK "MASTERing"
!	and LOCK "DIRECTORYing" is a distributed function within a 
!	cluster.  In other words, even with this patch on only 1
!	node, message slowdown will impact MASTER/DIRECTORY functions
!	performed on behalf of other cluster nodes.
!
!	Sequenced-message I/O reduction is especially dependent on 
!	disk-write (DATREQ0) data-transfers, which also use COMQ0.
!	This patch moves SYSAP SNDMSG from COMQ1 (also used by DECNET
!	datagrams) to COMQ0, used by CI to service DATREQ0 (disk-write
!	HSC data-requests) and used by VMS for CI-polling.  Therefore,
!	SYSAP-MESSAGEs (SNDMSG) will now be serviced "FIFO" with DATREQ0
!	(from HSC) and VMS polling, instead of before (higher priority)
!	this activity on COMQ1 without this patch.
!
!	Under non-saturated CI-port data-transfer conditions, this
!	patch should only result in a 5% sequenced-message rate
!	reduction.  Of benefit, this patch may significantly improve
!	disk-write performance during heavy mass-storage I/O activity.
!	Datagrams (used mostly for DECNET) will also benefit.
!
!	INSTALLATION:
!	1. COPY this PATCH command file (PADRIVER_V47_MSG0.COM) to 
!	   work-directory.
!	2. COPY SYS$SYSTEM:PADRIVER.EXE to work area.  
!	3. APPLY PATCH: "$ @PADRIVER_V47_MSG0.COM" or type in below
!	   patch-commands.  Verify patch correctly installed: use
!	   ANAL/IMAGE PADRIVER.EXE, examining PATCH info & text.
!	4. COPY PADRIVER.EXE SYS$COMMON:[SYSEXE]PADRIVER.EXE.  If 
!	   patch only intended for 1 system, copy to SYS$SPECIFIC:
!	   [SYSEXE]PADRIVER.EXE.
!	5. REBOOT SYSTEM, coordinating with customer.
!
! BEGINNING OF PATCH COMMANDS....
! -------------------------------
!
$ PATCH PADRIVER.EXE
SET ECO 50
VERIFY/INS 2485
"SUBL2 W^0B4(R4),R2"
EXIT
REPLACE/INSTRUCTION 1627
"BSBW 2450"
EXIT
"BSBW 2485"
EXIT
UPDATE
EXIT
$ EXIT
$ !
$ ! END-OF-PATCH
$ !=======================================================================
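
The saturation figures quoted in the patch header can be turned into a rough
screening check for a candidate site. The sketch below is illustrative only.
It assumes the "Mb/sec" figures mean megabytes per second with 1 Mb = 1000 KB,
that the KB_MAP rate has been read by hand from "$ MONITOR SCS", and it takes
the low end (1.5) of the quoted 1.5-1.8 range for other adapters. The table
and function names are hypothetical.

```python
# Saturation rates from the patch header, expressed in KB/sec.
# Assumption: "Mb" means megabytes and 1 Mb = 1000 KB.
SATURATION_KB_PER_SEC = {
    "CIBCA-A": 1200,   # 1.2 Mb/sec on 85/87/88xx
    "CIBCA-B": 2200,   # 2.2 Mb/sec on 85/87/88xx
    "OTHER": 1500,     # low end of the 1.5-1.8 Mb/sec range
}

def saturation_fraction(adapter, kb_map_per_sec):
    """Return the observed KB_MAP rate as a fraction of the quoted limit."""
    return kb_map_per_sec / SATURATION_KB_PER_SEC[adapter]

# Hypothetical reading: 600 KB/sec on a CIBCA-A port.
print(saturation_fraction("CIBCA-A", 600))
```

A fraction close to 1.0 would suggest the port is near the saturated regime,
where the header warns of a sequenced-message reduction of up to 65%.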

From:	CVG::BRASSARD     19-DEC-1988 12:11
To:	VOLKS::BRASSARD
Subj:	HSC Timeout

From:	SSDEVO::ENGLUND      "Glenn Englund, HSC Engineering Manager" 16-DEC-1988 19:13:15.83
To:	CVG::TOMASWICK,KOLLER,MOE,SHIVELY,BEAN,LARY,NM%VOLKS::WHITE,CVG::BRASSARD
CC:	
Subj:	No change to HSC host timeout - it should remain at 3 seconds

Unfortunately the suggested change to raise the HSC host timeout value from 3
seconds to 45 seconds was never tested (so I am told).  I guess it fell through
the cracks out here.

Since it was not tested, it seems that the right thing to do is to leave it 
unchanged, rather than delay the release of this patch in order to test the
change.

I would recommend changing the following note from George White to remove 
any reference to a change to the HSC's host timer.

							- Glenn


From:	27054::WHITE "VAX CSSE SUPPORT  12-Dec-1988 1303" 12-DEC-1988 11:10:20.30
To:	@FAS
CC:	
Subj:	FAS,FORD,IRVING TRUST VC CLOS. STATUS



-----------------------------
! d ! i ! g ! i ! t ! a ! l !		I N T E R O F F I C E  M E M O
-----------------------------


TO: DISTRIBUTION			DATE:   12 DEC 88
					FROM:	GEORGE WHITE
					DEPT:	MID-RANGE VAX CSSE
					DTN:	240-6490
					LOCN:	AET 1-1/6
					ENET:	VOLKS::WHITE
					DECMAIL: WHITE @VOLKS @AET

SUBJECT: FAS - (CXO2335), FORD - (CXO2677), IRVING TRUST  VC CLOSURE STATUS


9 DEC 88, FROM BOB BRASSARD

The 2nd cause of the HSC VC Closure (RTNDAT/CNF TIMEOUT during heavy
BACKUP between 1 85/87/88xx and multi-HSC disk/tapes) was isolated about
3 weeks ago.  As you will remember, the 1st bug was with HSC KCI ucode:
SNDDAT packets would not be sent if DATREQs were in DMA_CREDIT stall...
essentially KCI supervisor loop (scheduler) bug.

The 2nd problem involves the use of the CI's 4 prioritized command
queues: COMQ0 (low), COMQ1, COMQ2, and COMQ3 (highest).  VMS sends messages
normally on COMQ1 (including MSCP) except for VC-Closure on COMQ0;
disk-writes use DATREQ0 (COMQ0), initiated by HSC; tape-writes use
DATREQ2 (COMQ2).  If CI-port is transferring data at its limit
(1.2 Mb for CIBCA on 85/87/88xx), DATREQ2/COMQ2 and MSCP-MSG/COMQ1
activity will pre-empt CI ever looking at COMQ0 (disk-write DATREQ0);
COMQ0 latencies as high as 90-seconds were observed.

The short term solution will be a PADRIVER patch to put all
messages on COMQ0.  This way, if CI is too busy to look at COMQ0,
HSC will run out of work (reads/writes), thus throttling data-transfers
until CI works on more messages from COMQ0.
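
A toy model may help show why strict priority servicing starves COMQ0, and
why moving messages to COMQ0 bounds the delay. It is a sketch under stated
assumptions, not a description of the real CI port microcode: one packet is
serviced per tick, the highest-numbered non-empty queue always wins, the host
always has another MSCP tape-write command ready, and each delivered command
makes the HSC issue three DATREQ2 requests. All names and numbers here are
hypothetical.

```python
from collections import deque

def datreq0_latency(msg_queue, ticks=200, reqs_per_cmd=3, arrival=5):
    """Ticks a disk-write DATREQ0 waits on COMQ0, or None if never serviced.

    msg_queue is the command queue the host uses for MSCP messages:
    1 models the unpatched driver, 0 models the patched one.
    """
    queues = {prio: deque() for prio in (3, 2, 1, 0)}
    for tick in range(ticks):
        if tick == arrival:
            queues[0].append("DATREQ0")       # HSC asks for disk-write data
        if "MSCP" not in queues[msg_queue]:
            queues[msg_queue].append("MSCP")  # host always has another command
        # Strict priority: the highest-numbered non-empty queue is serviced.
        for prio in (3, 2, 1, 0):
            if queues[prio]:
                item = queues[prio].popleft()
                break
        if item == "DATREQ0":
            return tick - arrival
        if item == "MSCP":
            # Assumed: each tape-write command produces reqs_per_cmd DATREQ2s.
            queues[2].extend(["DATREQ2"] * reqs_per_cmd)
    return None

print("messages on COMQ1:", datreq0_latency(msg_queue=1))
print("messages on COMQ0:", datreq0_latency(msg_queue=0))
```

With messages on COMQ1 the model never reaches the DATREQ0 within the run.
With messages on COMQ0 the request is serviced as soon as the outstanding
tape requests drain, because no new tape work can be started ahead of it.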


The VMS PADRIVER patch is only a short-term solution.  The CI-Architectural
committee is re-investigating CI-PORT prioritization algorithms, with
possible major scheduling changes for future CI products.

The PADRIVER patch was just tested during the past 2 weeks for performance
impact on message rates: negligible except for data-xfer saturated 
CI-ports where message rates dropped 60%.  I will be generating a 
work-around package/procedures/documentation for the 3 required fixes:
PADRIVER patch, KCI V2.54 ucode (L0107-YA @ Rev-E2/3/4), CRONIC V370 patch
to extend host data-xfer timeout from 3 to 45 seconds (workaround for HSC
SNDDAT pipelining/sequencing problem: finishes SNDDATs out of order sometimes). 
BTW, KCI V2.54 will soon be released as HSC50/70 FCO required for RA70 drives;
initial RA70s  will include 2 sets of 12-PROMS each.

Best Regards, Bob Brassard


! CVG FAS-TESTING INTEREST DISTRIBUTION LIST: CVG_FAS.DIS
! =======================================================
NM%SSDEVO::LARY
NM%SSDEVO::BEAN
NM%SSDEVO::SHIVELY
NM%SSDEVO::MOE
NM%SSDEVO::KOLLER
NM%SSDEVO::ENGLUND
NM%SSDEVO::REPKA
NM%SSDEVO::ELMER
NM%HYEND::BLYONS
NM%CVG::TODHUNTER
NM%ACTIVE::GOELZ
NM%CSSE32::GOELZ
NM%VCSESU::TODHUNTER
NM%HYEND::WERTH
NM%HYEND::HJAKIELA
NM%HYEND::AVERY
NM%INANNA::BALKOVICH
NM%HYDRA::BOAEN
NM%HYDRA::NIELSEN
NM%HYDRA::HAYAKAWA
NM%FROBUS::CONNOR
NM%CVG::TOMASWICK
NM%CVG::VIEIRA
NM%CVG::BAKER
NM%VOLKS::FREEMAN
NM%VOLKS::WHITE
NM%VOLKS::BRASSARD
NM%CVG::BRASSARD
NM%PYONS::BRANNON
NM%CSSE::MILLER
NM%CSSE::HOWINGTON
NM%SUPVAX::BLENDINGER
NM%PTOVAX::PEARLMAN
MTS$"FHO::BILL NOSEWORTHY"
MTS$"OHF::RICH LYONS"
MTS$"CYO::ROBERT B LEWIS"
MTS$"PTO::STEPHEN STEVENS"
MTS$"PTO::BILL REIGHT"



From:	STAR::OSHAUGHNESSY "Dan, ZKO3-4/U14, DTN 381-1268, pole T/B8" 16-DEC-1988 11:26
To:	VOLKS::BRASSARD,VOLKS::WHITE,CHIN,FOX,THIEL
Subj:	VMS SUPPORT OF RESTRICTED DISTRIBUTION OF FAS PATCH

          DIGITAL       INTEROFFICE MEMORANDUM


          TO:      Bob Brassard              DATE: December 15, 1988
                   George White              FROM: Dan O'Shaughnessy
                                             DEPT: 354
                                             EXT:  381-1268
                                             LOC:  ZK03-4/U14
                                             ENET: STAR::OSHAUGHNESSY


          cc:      T. Chin
                   M. Fox                               
                   D. Thiel


          SUBJECT: VMS Support of FAS Patch


          VMS supports the restricted distribution of the "FAS" patch
          written by Bob Brassard. Suitable warnings concerning the
          impact to a system's sequenced message I/O performance (con-
          nection manager and lock manager traffic) will accompany
          the patch. Bob Brassard will manage the distribution of the
          patch to ensure that the performance impact on a candidate
          site has been carefully considered. The patch should not
          be published or made generally available for at least 6 months.
          This time period should provide us with sufficient infor-
          mation on how often the problem occurs on customer sites
          and of any unintended side effects the patch may have.

          A longterm solution should be provided by the SCA and CI
          Architecture groups. Another meeting including VMS,CSSE,SASE
          and architecture representatives should be planned in 3 months,
          March 1989, to discuss and reevaluate the situation. At that
          time a decision should be made whether to allow the general
          (unrestricted) release of the "FAS" patch in June 1989
          or whether some other "midterm solution" is needed before
          a "longterm" architected solution is available.


From:	SSDEVO::ELMER "Randy Elmer MLDS CSSE CESG MGR. 522-3874					Being flexible means never being bent out of shape  25-Oct-1989 1614" 25-OCT-1989 18:24:12.77
To:	MOE
CC:	RON,GARY,VOLKS::BRASSARD
Subj:	V39A access over the net

Karen

The 4x4 today agreed that when we ship V39A to SSB, saying this is good code, 
we should also put it on the net and make it available to all internal customers 
for early exposure to V39A.  We also agreed that for a handful of customers 
that may have a specific CLD open that V39A will fix, we hand manage those 
sites and provide an early release as well across the NET.

Can we get Stacy to place V39A into HSC$ENETKITS with the release notes and 
remove V390/V394?

Thanks

Randy

From:	GENRAL::SSDEVO::ELMER        "Randy Elmer MLDS CSSE CESG MGR. 522-3874					Being flexible means never being bent out of shape"  9-NOV-1989 18:34:24.98
To:	GENRAL::VOLKS::BRASSARD
CC:	RON
Subj:	RE: HSC CRONIC V39A AVAILABILITY FOR FIELD TEST ?? FT-AGREEMENT ? ENET LOCATION ? RELEASE NOTES ?

Bob

I have answered your questions below.

Randy

=============================================

From:	GENRAL::VOLKS::BRASSARD "Bob B., VAX CSSE, 240-6492, AET 1-1/6  09-Nov-1989 1813"  9-NOV-1989 16:15:30.05
To:	GENRAL::ELMER,SSDEVO::REPKA,MYFILE
CC:	
Subj:	HSC CRONIC V39A AVAILABILITY FOR FIELD TEST ?? FT-AGREEMENT ? ENET LOCATION ? RELEASE NOTES ?

Hi Randy & Ron,

I have not seen any announcement on ENET availability of V39A.

Is this now copyable on ENET ?

>>> Yes, but we need to hand manage it until the SSB release date of 18 
>>> December, in case we find a bug that needs to have the code recalled.  Only 
>>> provide this code to the sites that are of political nature or we feel would 
>>> be a good field test.

>>> ENET location is SSDEVO::HSC$FIELDTEST:

Are there release notes to copy with it ?

>>> Yes in the same location.

Do we still need Field Test Agreement ?

>>> No, the 4X4 agreed that because it did go to SSB and that we would hand 
>>> manage the code, no field test agreement is needed.  We just need to track it 
>>> and ensure it does not become public.

Neither have I seen Bob Lyons FAS meeting minutes with the approval status 
on the release/SDC-submission of CRONIC V39A with FAS fix.  Have you seen
any status ?  Rumor has it approved.

>>> The code was submitted to SSB with the FAS fix.  I too did not see the 
>>> minutes.

BTW, I am currently on CLD in Hartford, Ct.; so mail response may be slow.

Best Regards, Bob Brassard

From:	SSDEVO::REPKA        "RON REPKA HSC CSSE 522-6195" 14-NOV-1989 14:05:33.92
To:	VOLKS::BRASSARD
CC:	
Subj:	V39A Release Notes

             HSC VERSION V3.9A
             SOFTWARE RELEASE NOTES

             Order Number: AA-GMFAH-TK

             These release notes contain a summary of the features in the V3.9A
             software.

             digital equipment corporation maynard, massachusetts

             January, 1990

             The information in this document is subject to change without
             notice and should not be construed as a commitment by Digital
             Equipment Corporation. Digital Equipment Corporation assumes no
             responsibility for any errors that may appear in this document.

             The software described in this document is furnished under a
             license and may be used or copied only in accordance with the
             terms of such license.

             No responsibility is assumed for the use or reliability of soft-
             ware on equipment that is not supplied by Digital Equipment
             Corporation or its affiliated companies.

             Copyright (c)1990 by Digital Equipment Corporation

             All Rights Reserved.
             Printed in U.S.A.

             The postpaid READER'S COMMENTS form on the last page of this
             document requests the user's critical evaluation to assist in
             preparing future documentation.

             The following are trademarks of Digital Equipment Corporation:

             DEC             DIBOL         UNIBUS
             DEC/CMS         EduSystem     VAX
             DEC/MMS         IAS           VAXcluster
             DECnet          MASSBUS       VMS
             DECsystem-10    PDP           VT
             DECSYSTEM-20    PDT
             DECUS           RSTS
             DECwriter       RSX           DIGITAL


             This document was prepared using VAX DOCUMENT, Version 1.0

                                                                       Contents

               1    INTRODUCTION

               2    PREINSTALLATION CONSIDERATIONS
                     2.1      Software Restrictions
                      2.1.1     HSC50 Restricted to One Operator-Loaded
                                 Utility
                      2.1.2     Maximum of 12 Tape Drives and 12 Formatters

               3    HSC VERSION 3.9A SOFTWARE INSTALLATION
                     3.1      Preinstallation Backup
                     3.2      Software Installation Procedure

               4    FEATURES IN HSC SOFTWARE VERSION 3.90
                     4.1      Disk Server
                     4.2      Utilities

               5    MISCELLANEOUS ENHANCEMENTS

               6    MAINTENANCE CHANGES IN HSC SOFTWARE VERSION 3.9A
                     6.1      Disk Server
                     6.2      Tape Server
                     6.3      Block Size Recommendation For Non-TA90 Tape
                             Drives
                     6.4      Utilities
                     6.5      Miscellaneous Changes

               7    HSC VERSION 3.90 SOFTWARE EXCEPTION CODES AND ERROR
                     MESSAGES
                     7.1      HSC Version 3.90 Software Exception Codes
                     7.2      HSC Version 3.90 Software Error Messages
                     7.3      Operator Control Panel Fault Codes

               8    TOPICS FROM PREVIOUS HSC SOFTWARE RELEASE NOTES
                     8.1      VTDPY Operation
                      8.1.1     Using the VTDPY Display
                      8.1.2     VTDPY Error Messages
                     8.2      Volume Shadowing
                     8.3      Exception Codes

          1  INTRODUCTION

             The HSC Version 3.9A software release package contains these HSC
             Version 3.9A Software Release Notes and the Version 3.9A software
             distribution media. The software for the HSC70 is distributed on
             diskette. The software for the HSC50 is distributed on two TU58
             cassettes.

             These release notes document the following:

              o  All information in the HSC Version 3.90 Software Release Notes.

              o  Maintenance changes provided in Version 3.9A software to cor-
                rect identified problems in the Version 3.90 software.

          2  PREINSTALLATION CONSIDERATIONS

             This section contains information you should consider before
             upgrading your HSC to Version 3.9A software.

          2.1  Software Restrictions

             This section describes some restrictions of the HSC Version 3.9A
             software.

          2.1.1  HSC50 Restricted to One Operator-Loaded Utility
             Because of the smaller memory size of the HSC50, Version 3.9A
             software allows you to run only one utility at a time. This limi-
             tation ensures that sufficient memory is available to run Device
             Integrity Tests when scheduled by the CRONIC executive. If you at-
             tempt to run a second utility, the following message is displayed:

               KMON-F All Utility Partitions in Use

             Start the second utility when the first utility has completed, or
             press CTRL/C to terminate the first utility.

             This change does not affect the operation of Version 3.9A software
             on the HSC70, which retains its ability to run two utilities
             simultaneously.

          2.1.2  Maximum of 12 Tape Drives and 12 Formatters
             HSC Version 3.9A software supports a maximum of 12 tape drives
             and 12 formatters on each HSC. For example, 3 TA78 formatters
             with 4 tape drives on each formatter reach the 12-tape drive
             configuration limit. If more than 12 formatters or drives are
             configured on an HSC, one of the following messages is displayed:

               No tape formatter structures available for Requestor x Port y

             or:

               No tape drive structures available for Requestor x Port y

             When the HSC boots, resources are allocated to formatters on
             requestors in ascending requestor priority order until the limit
             of 12 tape drives and 12 formatters is reached. Resources are
             allocated among tape drives on the same formatter according to the
             arbitrary order in which the drives became known to the HSC.
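
             The allocation rule described above can be sketched as follows.
             This is a minimal illustration, not the HSC's actual code. It
             assumes that "ascending requestor priority order" can be modelled
             by sorting on requestor number, and that a formatter is granted
             a structure even when no drive structures remain for it. The
             function name and input layout are hypothetical.

```python
MAX_FORMATTERS = 12
MAX_DRIVES = 12

def allocate(formatters):
    """formatters: list of (requestor, port, drive_count) tuples.

    Returns (formatters_allocated, drives_allocated, messages).
    """
    used_formatters = 0
    used_drives = 0
    messages = []
    # Assumption: sorting by requestor number models the boot-time order.
    for requestor, port, drives in sorted(formatters):
        if used_formatters >= MAX_FORMATTERS:
            messages.append(
                f"No tape formatter structures available for "
                f"Requestor {requestor} Port {port}")
            continue
        used_formatters += 1
        granted = min(drives, MAX_DRIVES - used_drives)
        used_drives += granted
        if granted < drives:
            messages.append(
                f"No tape drive structures available for "
                f"Requestor {requestor} Port {port}")
    return used_formatters, used_drives, messages

# Three TA78 formatters with four drives each reach the 12-drive limit.
print(allocate([(2, 0, 4), (3, 0, 4), (4, 0, 4)]))
# A fourth formatter with one more drive exceeds it.
print(allocate([(2, 0, 4), (3, 0, 4), (4, 0, 4), (5, 0, 1)]))
```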

          3  HSC VERSION 3.9A SOFTWARE INSTALLATION

             Use the following procedure to install the software supplied in
             this kit.

          3.1  Preinstallation Backup

             Before installing the software, use a blank diskette or cassette
             to make a backup copy of the software. Instructions for the copy
             procedure are in Chapter 10 of the HSC User Guide.

             If you need additional backup copies, order blank, formatted
             RX33 diskettes from the Software Distribution Center. Extra TU58
             cassettes can be ordered from the DECdirect catalog.

          
          3.2  Software Installation Procedure

                                            NOTE

                 If your HSC cluster has RA90 disk drives connected to it,
                 use the SHOW DISKS command to verify that the RA90 drives
                 report a minimum software revision level of MC = 10.

                 If the drives do not report the minimum software revi-
                 sion, ask your Digital Field Service Representative to
                 install FCO RA90X-O001 prior to installing HSC Version 3.9A
                 software.

             Use the following procedure to install the software:

              1. On each HSC being upgraded to Version 3.9A software:

                 o  Press CTRL/C.

                 o  Enter the SHOW SYSTEM command.

                This produces a hard copy of system parameters, as shown in the
                following example:

          <CTRL/C>

                   HSCxx>SHOW SYSTEM <RETURN>
                   17-JUL-1988 14:42:43.41  Boot:  17-JUL-1988 11:31:11.41  Up:  3:11
                   Version: V39A           System ID:  %X0000000000B7       Name:  HSC006
                   Front Panel: Enabled                                     HSC Type: HSC70
                   Console Dump: Enabled    Load Dump: Disabled
                   Automatic DITs:  Enabled
                   Periodic DITs:  Enabled,  Interval = 1
                   Disk Allocation Class:  0     Tape Allocation Class:    0
                   Start-up Command File:  Disabled
                   Disk Drive Controller Timeout: 2 seconds
                   Maximum Tape Drives:  12
                   Maximum Formatters:  12
                   SETSHO-I Program Exit

              2. Print a hard copy of these system parameters if your system
                does not automatically produce a copy. Use this copy later in
                the procedure to reset your HSC's parameters.

              3. If your cluster does not have failover capabilities, shut down
                the cluster and perform Steps 6 through 17.

              4. Failover all disk and tape drives to the alternate HSC. Make
                sure none of the tape or disk drives are on line to the HSC you
                are upgrading to Version 3.9A software. The failover procedure
                is described in the Guide to VAXclusters.

              5. After successful failover, set the Online button on the HSC
                operator control panel to the out position.

              6. Open the HSC front panel, remove all old load media, and in-
                stall the new software system/utility media in the HSC load
                device. The new software system/utility media must be write
                enabled.

                Instructions for loading the software are in the following
                sections of your HSC User Guide:

                 o  If you have an HSC70, refer to Section 4.2.

                 o  If you have an HSC50, refer to Section 4.3.

              7. Press and release the Init button as you hold in the Fault
                button. Hold in the Fault button until the following message
                appears:

                  INIPIO-I Booting

              8. When booting has completed:

                 o  Press CTRL/C.

                 o  Enter the RUN SETSHO command at the HSC> prompt.

                 o  If the SETSHO> prompt is not displayed, review the pre-
                   vious steps to ensure that you have properly installed
                   the software, booted the HSC, and entered the RUN SETSHO
                   command.

              9. At the SETSHO> prompt, enter the SHOW SYSTEM command to print a
                hard copy of the default parameters on the new load media.

              10.Compare this list of default parameters to the list you made in
                Step 1 for this HSC. At the SETSHO> prompt, use the following
                commands to reset the parameters to their former values:

                  SETSHO> SET NAME HSCaaaaaa <RETURN>
                  SETSHO> SET ID %Xnnnnn <RETURN>
                  SETSHO> SET ALLOCATE DISK n <RETURN>
                  SETSHO> SET ALLOCATE TAPE n <RETURN>
                  SETSHO> SET SERVER DISK DRIVE_TIMEOUT=n <RETURN>

                Chapter 6 of your HSC User Guide contains detailed descriptions
                of how to set each of these parameters.

              11.Set any other parameters required by your system configuration.
                When all parameters are set, enter the EXIT command at the
                SETSHO> prompt.

              12.The HSC asks whether you wish to reboot the HSC. Enter YES.

              13.After the HSC reboots, press CTRL/C and enter the SHOW SYSTEM
                command.

              14.Compare the new parameters with the ones on the list you made
                in Step 1 to verify that all parameters are the same. If the
                parameters are not identical:

                 o  Enter the RUN SETSHO command at the HSC> prompt.

                 o  Return to Step 10 and set the parameters that need changing.

                 o  Continue with this procedure from Step 11.

              15.If all parameters are correct, press the operator control panel
                Online button to the in position. This allows the cluster to
                re-establish connections to this HSC.

              16.Enter the SHOW VIRTUAL_CIRCUITS command to verify that all
                connections have been made. This command lists nodes that
                have established virtual circuits with the HSC. Check that
                all active hosts have established virtual circuits to this HSC.
                If they have not, reboot the HSC and repeat this step.

              17.Failover the drives to the HSC on which you have just installed
                the new software. After all units have failed over, install the
                new software on the alternate HSC. After making the hard copy
                list of system parameters in Step 1, go to Step 5 and complete
                the software installation procedure.
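
             The comparison in Steps 10 and 14 can be sketched as below. This
             is only an illustration of the bookkeeping, assuming the SHOW
             SYSTEM values have been transcribed by hand into dictionaries.
             The dictionary keys, the function name, and the "default" values
             shown for the new load media are hypothetical; the command forms
             are the ones listed in Step 10.

```python
# Command templates taken from Step 10; the keys are hypothetical labels.
COMMANDS = {
    "name": "SET NAME {}",
    "id": "SET ID {}",
    "disk_allocation_class": "SET ALLOCATE DISK {}",
    "tape_allocation_class": "SET ALLOCATE TAPE {}",
    "disk_drive_timeout": "SET SERVER DISK DRIVE_TIMEOUT={}",
}

def setsho_commands(saved, current):
    """List the SETSHO commands needed to restore the saved values."""
    return [
        template.format(saved[key])
        for key, template in COMMANDS.items()
        if key in saved and saved[key] != current.get(key)
    ]

# Values recorded in Step 1 (from the SHOW SYSTEM example above).
saved = {
    "name": "HSC006",
    "id": "%X0000000000B7",
    "disk_allocation_class": 0,
    "tape_allocation_class": 0,
    "disk_drive_timeout": 2,
}
# Hypothetical defaults reported by the new load media.
current = {
    "name": "HSC000",
    "id": "%X000000000000",
    "disk_allocation_class": 0,
    "tape_allocation_class": 0,
    "disk_drive_timeout": 3,
}
for command in setsho_commands(saved, current):
    print("SETSHO>", command)
```

             An empty list corresponds to the case in Step 15 where all
             parameters already match.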


          __________________________________________________________
          4  FEATURES IN HSC SOFTWARE VERSION 3.90

             This section describes the features in the Version 3.90 software.

          __________________________________________________________
          4.1  Disk Server

             The HSC disk environment has been improved in the following ways:

              o  Some potential causes of IOT 4076 crashes have been located and
                resolved.

              o  A problem in the disk path that caused the HSC to report
                databus overrun errors has been resolved.

              o  HSC V3.90 software supports the maximum number of shadow sets
                allowed by your version of VMS. Refer to Section 8.2 for fur-
                ther discussion of Volume Shadowing. Detailed information on
                VMS shadowing support is found in the VAX/VMS Volume Shadowing
                Manual.

          __________________________________________________________
          4.2  Utilities

                                          DKCOPY

             A new message for DKCOPY clears confusion when the requested tar-
             get device is either in use or nonexistent. Refer to Section 7.2
             for a description of this error message.

             A new message for DKCOPY warns you when the target device of
             a disk-to-disk copy is hardware write protected. Refer to
             Section 7.2 for a description of this error message.

                                          SETSHO

             SETSHO has been updated as follows:

              o  SET MAX_FORMATTER -- Changed to improve performance.

              o  SET MAX_TAPES -- Changed to improve performance.

              o  SET SERVER -- Changed to improve performance.

              o  SET PROMPT -- Allows you to select your own prompt on the HSC.

              o  SET REQUESTOR -- Allows you to select the correct data channel
                microcode.

              o  SET RESTART CLEAR -- Renamed to SET EXCEPTION CLEAR to more
                closely describe the command function. This change deletes the
                SET RESTART command.

              o  SET SECTOR_SIZE -- Deleted and the default sector size set to
                512 bytes.

              o  SET OUTBAND/SHOW OUTBAND -- Merged with the SET ERROR and SHOW
                ERROR commands.

              o  SET DEVICE [NO]HOST_ACCESS -- Changes the state of the requestor
                when you exit SETSHO instead of when the command is entered.
                This allows you to exit SETSHO with a CTRL/Y if you want the
                requestor to remain in the previous state.

              o  SHOW REQUESTOR -- Displays information about each requestor
                connected to the HSC.

              o  SHOW CONNECTIONS -- Displays information about all virtual
                circuits and connections the HSC has with other nodes.

                                      BACKUP/RESTOR

             BACKUP/RESTOR has been updated as follows:

              o  A Write Ring Missing problem that caused BACKUP to abort has
                been resolved. BACKUP now allows you to correct the problem and
                continue without aborting the backup operation.

              o  BACKUP/RESTOR no longer supports 576 byte records.

              o  New features have been added to enhance tape unloading:

                 --  The tape drive will not unload if you press CTRL/C before a
                   tape drive has started reading or writing operations.

                 --  If a tape drive finishes a backup or restore operation
                   without using all of the mounted tapes, the extra tape is
                   not unloaded. Leaving it loaded identifies it as an unused
                   tape, which reduces the amount of wasted tape.

              o  New prompts improve performance and reduce operator interven-
                tion.

                 --  You may run the BACKUP and RESTOR utilities without operator
                   interaction. The HSC prompts with either of the following:

                     Would you like to run BACKUP with "NO OPERATOR"? [N]

                   or

                     Would you like to run RESTOR with "NO OPERATOR"? [N]

                   If you press RETURN or answer N, the utility continues
                   to prompt you for appropriate responses. If you leave the
                   terminal during a backup or restore operation, the utility
                   times out 5 minutes after a query and aborts.

                   If you answer Y, further system queries are disabled. The
                   utility bypasses all further prompts and uses the default
                   responses instead of your inputs. This feature allows you to
                   leave the terminal and perform other tasks without further
                   interaction with the utility.

                                                  NOTE

                       If you disable the queries, you may not know when the
                       volume to back up has expired or when the save-set of
                       a restore has expired. However, the warning message
                       for this condition appears almost immediately after
                       the NO OPERATOR query. When you see this message, you
                       may press CTRL/C to abort the operation. Otherwise,
                       the operation continues after a 10-second delay.

                 --  When BACKUP encounters a tape reel that has reached its
                   limit of media errors (hard write errors), it prompts you to
                   increase the error threshold with the following message:

                     Do you wish to increase the media error limit for this tape reel
                     and continue? (Y or N) [N]:

                   Press RETURN or answer N if you do not wish to change the
                   error threshold. You are then prompted to change the tape
                   reel.

                   Answer Y to increase the threshold and continue. You are
                   then prompted for the increased error limit. After you enter
                   the new limit, the operation continues.

                 --  HSC70 users may now perform a backup or restore operation
                   using a 16K-byte record size instead of the default 8K-byte
                   record size. This feature is described in Chapter 7 of your
                   updated HSC User Guide.


          __________________________________________________________
          5  Miscellaneous Enhancements

                                       Node Lockout

             A node lockout problem in the Diagnostics and Utilities Protocol
             (DUP) Server is now resolved.

          __________________________________________________________
          6  MAINTENANCE CHANGES IN HSC SOFTWARE VERSION 3.9A

             This section describes the maintenance changes in the Version 3.9A
             software.

          __________________________________________________________
          6.1  Disk Server

             The following disk server changes have been implemented in the
             Version 3.9A software:

              o  A possible problem in which an MMU crash may result if a forced
                error is detected on a 2- or 3-member shadow set is resolved.
                This fix also corrects the possible problem in which a repair
                operation may not be performed as expected on an LBN with
                forced error set.


              o  A potential cause of excessive positioner errors and IOT 4076
                crashes has been corrected.

              o  An extremely rare problem in which a primary revectored block
                is not handled properly has been corrected.

              o  A possible problem of virtual circuit closures on disks during
                extremely heavy tape activity has been corrected.

          __________________________________________________________
          6.2  Tape Server

             The following tape server changes have been implemented in Version
             3.9A software:

              o  A possible problem in which failover may not occur in the
                unlikely event that an operator releases a selected port button
                on a tape drive while the drive is transferring a heavy data
                load has been resolved.

              o  The ILTAPE diagnostic, in all circumstances, recognizes a TA90
                tape drive and prompts for write memory region parameters.

              o  The restriction of no more than one TA90 tape formatter on the
                same requestor has been lifted.

              o  An error flag problem that caused VAXsimPLUS to erroneously
                signal alarms on tape drives that were actually operative is
                resolved.

              o  When you run heavy TA90 loads and either a Cache Data Lost or
                Cache Busy condition occurs, an IOT 6037 crash will no longer
                occur.

              o  The default drive timeout has been increased from 30 to 80
                seconds to provide a workaround for the problem of drives
                unexpectedly changing to the AVAILABLE state. This may cause
                failover to take longer.





              o  The following changes have been made in pipeline error
                reporting:

                 -  A problem that caused improper reporting of pipeline errors
                   has been corrected.

                 -  The severity level of the message that reports a pipeline
                   error has been changed from ERROR to WARNING.

                 -  When an application or operating system issues a tape
                   command with the inhibit error recovery condition set, the HSC
                   treats a pipeline error as recoverable. For example, if a
                   pipeline error occurs when you are running VMS Backup, the
                   HSC recovers the error even though the default is to inhibit
                   the error recovery. A pipeline error is NOT a tape error.

          __________________________________________________________
          6.3  Block Size Recommendation For Non-TA90 Tape Drives

             The HSC CI interface can be significantly faster than the host CI
             adapter when performing multiple backups. Therefore, it is
             possible that all of the CI bandwidth can be used by tape traffic and
             cause data timeouts and virtual circuit closures. To prevent this,
             a change in the V39A software more evenly distributes the data
             flow over the CI to the hosts without noticeably affecting the
             overall data throughput. Because of this change, it is strongly
             recommended that you use the following operational parameters if
             you wish to use a block size greater than 24Kb with VMS Backup
             (the default is 8Kb):

              o  If only one requestor is configured for tape in the HSC, the
                maximum recommended block size is 48Kb.

              o  If two requestors are configured for tape in the HSC, the
                maximum recommended block size is 32Kb.

              o  If more than two requestors are configured for tape, the
                maximum recommended block size is 24Kb.




                                            NOTE

                 These guidelines apply to the number of tape requestors
                 configured for tape in the HSC (not the number of tape
                 requestors actively transferring data).

                 They DO NOT apply to the cached TA90 tape drive. It is
                 still recommended that a 64Kb block size be used with the
                 TA90.
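
             The recommendations above can be written as a small lookup. The
             following is only a sketch in Python; the function and parameter
             names are illustrative assumptions and are not part of any HSC or
             VMS utility. Sizes are in Kb, as used in this section.

```python
def max_recommended_block_size_kb(tape_requestors: int, cached_ta90: bool = False) -> int:
    """Largest VMS Backup block size (in Kb) recommended in this section.

    tape_requestors is the number of requestors CONFIGURED for tape in the
    HSC, not the number actively transferring data.
    """
    if tape_requestors < 1:
        raise ValueError("at least one requestor must be configured for tape")
    if cached_ta90:
        return 64  # the requestor-based guidelines do not apply to the TA90
    if tape_requestors == 1:
        return 48
    if tape_requestors == 2:
        return 32
    return 24      # more than two requestors configured for tape


if __name__ == "__main__":
    for count in (1, 2, 3):
        print(count, "requestor(s):", max_recommended_block_size_kb(count), "Kb")
```

             A block size of 24Kb or less falls within the recommendation
             for any number of tape requestors.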

             Use the following procedure to determine how many requestors are
             configured for tape in your HSC:

              1. Press CTRL/Y on the HSC console or terminal.

              2. Type SHOW REQUESTOR. Each requestor displayed as type K.sti is
                configured for tape.

             Failure to follow these recommendations can result in pipeline and
             drive-detected EDC errors when running multiple backup streams.
             Pipeline errors are the result of the CI and host momentarily
             not being able to supply data fast enough to the HSC during tape
             writes. Due to the bursty nature of TA78, TA79, and TA81 tape
             transfers, it is not uncommon to see an occasional pipeline error.
             However, the more even distribution of data flow between tape
             and disk in the V39A software will cause these errors to be seen
             much more frequently. Following the recommended block size will
             eliminate the possibility of these errors occurring and will have
             minimal performance impact.

                                            NOTE

                 Pipeline errors DO NOT indicate any hardware or software
                 fault in the HSC or host.

             If a pipeline error occurs, the VMS Error Log prints the following
             message for the MSLG$EVENT field in the error log entry:

               Data OVRFLW due to pipeline error




             The associated drive-detected EDC error can be recognized by a
             code of 0440 in the ERRN1/ERRNUM field in the error log entry. It
             will also have the same command reference number as the pipeline
             error. To eliminate pipeline errors, reduce your block size
             according to the recommendations provided. These errors are fully
             recoverable.
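
             The pairing rule described above (a pipeline error and a
             drive-detected EDC error, code 0440, that share a command
             reference number) could be applied to already-extracted error
             log entries roughly as follows. This is a sketch only: the
             dictionary keys "event", "errnum", and "cmd_ref" and the sample
             reference values are hypothetical, because the actual error log
             layout is not shown here.

```python
PIPELINE_EVENT = "Data OVRFLW due to pipeline error"  # MSLG$EVENT text quoted above
EDC_CODE = "0440"                                     # ERRN1/ERRNUM code quoted above


def pair_pipeline_errors(entries):
    """Return (pipeline_entry, edc_entry) pairs with the same command reference number.

    Each entry is a dict; the key names are assumptions made for this sketch.
    """
    edc_by_ref = {}
    for entry in entries:
        if entry.get("errnum") == EDC_CODE:
            edc_by_ref.setdefault(entry["cmd_ref"], entry)

    pairs = []
    for entry in entries:
        if entry.get("event") == PIPELINE_EVENT and entry["cmd_ref"] in edc_by_ref:
            pairs.append((entry, edc_by_ref[entry["cmd_ref"]]))
    return pairs


if __name__ == "__main__":
    # Hypothetical entries, for illustration only.
    sample = [
        {"event": PIPELINE_EVENT, "cmd_ref": "REF-1"},
        {"errnum": EDC_CODE, "cmd_ref": "REF-1"},
        {"errnum": EDC_CODE, "cmd_ref": "REF-2"},
    ]
    for pipeline, edc in pair_pipeline_errors(sample):
        print("pipeline error and EDC error share", pipeline["cmd_ref"])
```

             An EDC entry with no matching pipeline error (REF-2 in the
             hypothetical sample) is left unpaired and would need to be
             examined separately.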

          __________________________________________________________
          6.4  Utilities

                                          BACKUP

             You can now use the "NO OPERATOR" feature when you run BACKUP.
             This feature is described in Chapter 7 of the HSC User Guide.

             When you run BACKUP and reach the media error limit, the following
             conditions occur:

              o  If you have chosen to run a backup operation without operator
                interaction, the media error limit is automatically increased
                and the backup operation continues.

              o  If you have chosen to run a backup operation with operator
                interaction, BACKUP prompts you to increase the media error
                limit.

          __________________________________________________________
          6.5  Miscellaneous Changes

                       Booting a System with a Shadowed System Disk

             The HSC polling algorithm has been changed. This provides a
             workaround to decrease the system boot time when booting from
             a shadowed system disk when the virtual unit for the shadow set
             has not yet been formed.







          __________________________________________________________
          7  HSC VERSION 3.90 SOFTWARE EXCEPTION CODES AND ERROR MESSAGES

             This section lists the exception codes and error messages in the
             Version 3.90 software.

          __________________________________________________________
          7.1  HSC Version 3.90 Software Exception Codes

          4115

             DCB address inconsistency

             Facility: DISK, SDI

             Explanation: While processing an error on a seek DCB, the
             facility found an inconsistency between the current seek DCB address
             and the DCB address stored in the DRAT. This new crash code was
             created in connection with the fixes for IOT 4076 crashes.

             User Action: Submit an SPR with the crash dump.

          4116

             Bad error completion queue in DCB

             Facility: DISK, MSCP

             Explanation: The DCB error completion queue was not properly
             restored during DCB completion.

             User Action: Submit an SPR with the crash dump.

          4117

             No DRAT on DRAT list head when expected

             Facility: DISK, ERROR

             Explanation: No elements were found on the DRAT queue when the
             error process tried to remove a DRAT from the head of the queue.

             User Action: Submit an SPR with the crash dump.



          __________________________________________________________
          7.2  HSC Version 3.90 Software Error Messages

          DKCOPY-E-INVALR--Invalid unit id. Valid range is 0 through 4094

             Explanation: You have entered a unit identification number that is
             not in the range of 0 through 4094.

             User Action: Enter a unit identification number within the valid
             range.

          DKCOPY-E-OFFLINE--Specified unit is offline or nonexistent

             Explanation: You have entered a unit identification number that is
             not recognized by the system.

             User Action: Check the unit identification number and enter the
             command again.

          DKCOPY-F-RUNSTOP--No volume mounted or drive disabled via RUN/STOP

             Explanation: One or both of the drives that you are using to
             perform a disk-to-disk copy does not have a volume mounted or
             is spun down.

             User Action: Check to see that a volume is mounted and that both
             drives are spun up.

          DKCOPY-F-WRITEPROTECT--Unit is write protected

             Explanation: The target unit of a disk-to-disk copy is hardware
             write protected.

             User Action: Press and release the write protect button. Run
             DKCOPY again.

          KMON-F All Utility Partitions in Use

             Explanation: You attempted to run more than one utility at a time.

             User Action: Wait until the currently running utility has
             completed or terminate its operation.




          VERIFY-E-INVALR--Invalid unit id. Valid range is 0 through 4094

             Explanation: You have entered a unit identification number that is
             not in the range of 0 through 4094.

             User Action: Enter a unit identification number within the valid
             range.

          __________________________________________________________
          7.3  Operator Control Panel Fault Codes

             Your operator control panel may display the following fault code:

             Status Code (Octal)  Description
             __________________________________________________________________

             33                   Invalid hardware configuration

             __________________________________________________________________

             This fault code indicates that the configuration of modules in
             your HSC is not supported. Contact your Digital Field Service
             Representative if this fault code is displayed on your operator
             control panel.

             Chapter 3 of your HSC User Guide contains a complete listing of
             operator control panel fault codes.

          __________________________________________________________
          8  TOPICS FROM PREVIOUS HSC SOFTWARE RELEASE NOTES

             This section contains important topics that are carried forward
             and updated from previous issues of the HSC Release Notes. You
             will need this information if you are a new user of the HSC.

          __________________________________________________________
          8.1  VTDPY Operation

             VTDPY is a utility for gathering and displaying system statistics.
             VTDPY can display system throughput, status of the disk and tape
             drives, and utilities running on other terminals. This utility
             also indicates which nodes have virtual circuits, connections, and
             multiple connections to the HSC.

                                            NOTE

                 Avoid running VTDPY using the VMS command SET HOST/HSC with
                 VMS versions prior to V4.6.

             This utility requires a video terminal and does not display on
             a hard-copy printer. Either a VT100, a VT220, or a VT320, set
             at 9600 baud, must be attached to the EIA port on the HSC to run
             VTDPY.

             To run VTDPY, enter the following command at the HSC> prompt:

                 HSC> RUN [device]:VTDPY [update-interval]

             Where device is the device holding the VTDPY program. For the
             HSC50, the device is DD1:, and for the HSC70, the device is DX0:.

             The update-interval is in seconds, from 2 to 420. If this update
             interval is not provided, VTDPY prompts:

                 VTDPY-Q Interval (secs) ?

             If the response is outside the allowable range, VTDPY displays an
             error message. The higher the number for the update interval, the
             less the performance impact on the HSC.
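
             The command form and interval range described above can be
             checked before the command is typed. The following Python sketch
             only builds the command text; the function name is an
             illustrative assumption, and the device names and the 2 to 420
             second range are those given in this section.

```python
VTDPY_DEVICE = {"HSC50": "DD1:", "HSC70": "DX0:"}  # device holding the VTDPY program
MIN_INTERVAL, MAX_INTERVAL = 2, 420                # update interval range, in seconds


def vtdpy_command(model: str, interval: int) -> str:
    """Build the RUN command entered at the HSC> prompt for the given model."""
    if model not in VTDPY_DEVICE:
        raise ValueError(f"unknown HSC model: {model}")
    if not MIN_INTERVAL <= interval <= MAX_INTERVAL:
        raise ValueError("Illegal Interval Value (2 to 420 seconds)")
    return f"RUN {VTDPY_DEVICE[model]}VTDPY {interval}"


if __name__ == "__main__":
    print("HSC>", vtdpy_command("HSC70", 30))
```

             A larger interval is the gentler choice on a busy HSC, because
             each refresh costs some HSC performance.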

             VTDPY terminates after you enter a CTRL/Y or a CTRL/C. The screen
             is cleared upon termination.

             The following control keys are used in VTDPY:

                CTRL/E--Displays tape status on the next refresh. Thereafter,
                the display alternates with disk status on subsequent
                refreshes.

                CTRL/D--Displays disk status on the next refresh. Thereafter,
                the display alternates with tape status on subsequent
                refreshes.


                CTRL/V--Displays host path status information (i.e., A, B, or a
                diamond) on the next refresh only.


                CTRL/W--Refreshes the screen.

          __________________________________________________________
          8.1.1  Using the VTDPY Display
             This section presents a sample VTDPY display and explains the
             meaning of the fields in the display.

    HSC70 V39A HSC001 Id 0000000000DD On 14-Apr-1988 12:28:13.12  UP: 113.49

    42.9% Idle    39 Work Requests/Sec     40 Sectors/Sec      0 Records/Sec

       Free Lists             Process Pr St  Time%         Disk Status
    CTRL Blks   2269 +          Kernel       16.4%                1111111111
    SLCB/DCB      32 +       2 VTDPY  11 Rn  19.2%      +1234567890123456789
    Buffers      889 +      50 DEMON  11 Bl            0....................
                            52 PDEMON  7 Bl           20A.A..........A......
       Pool Sizes           54 PSCHED 13 Rn  42.9%    40..........A.A.A.....
    SYSCOM      1800 +      72 DISK    9 Rn  16.0%    60.AA.......O..A..O...
    Kernel      6504 +     110 ECC     6 Bl           80....................
    Program   821120 +     120 TAPE    8 Bl          100....................
    Control    32436 +     122 TTRASH  7 Bl          120....................
                           124 HOST    4 Bl    .9%   140...........O........
    Data B/W used:    .0%  126 POLLER  5 Bl          160....................
                           130 SCSDIR  5 Bl    .9%   180..................A.
       Host Connections                              200A...................
                111111111122222222223333333333       220....................
      0123456789012345678901234567890123456789       240....................
     0MM..C....V....M.........................
    40........................................
             The VTDPY display is continuously updated at the update interval
             you have set and it changes as the internal state of the HSC
             changes. These changes are made for all fields in the display,
             except those fields relating to HSC memory. Memory statistics are
             updated by pressing CTRL/W.

             The major fields are explained in the following paragraphs. As
             you read this section, refer to the VTDPY display to see where the
             fields are located and to the paragraphs below the sample fields
             to interpret the meaning of the fields.

    HSC70 V39A HSC001 Id 0000000000DD On 14-Apr-1988 12:28:13.12  UP: 113.49

             The top line, reading from left to right, shows the HSC model
             number (HSC70); the baselevel of the operating software (V39A,
             that is, Version 3.9A); the system name (HSC001); the HSC id
             number, given as a hexadecimal number unique in the cluster (in
             this case 0000000000DD); and the system date and time. The last
             number on the right indicates the hours and minutes the HSC has
             been running since the last boot or reboot.

               42.9% Idle   39 Work Requests/Sec   40 Sectors/Sec   0 Records/Sec

             This second line in the display shows the percentage of current
             P.io idle time, average number of work requests (i.e., MSCP and
             TMSCP) per second, number of disk data sectors transferred per
             second, and number of tape data records transferred per second.
             These numbers are normalized to match the update interval.

                      Free Lists
                   CTRL Blks   2269 +
                   SLCB/DCB      32 +
                   Buffers      889 +

                      Pool Sizes
                   SYSCOM      1800 +
                   Kernel      6504 +
                   Program   821120 +
                   Control    32436 +

             This field represents the quantity of available memory and memory
             structures. The units used in the display are:

                CTRL Blks -- Blocks
                SLCB/DCB -- Number of structures
                Buffers -- Number of buffers
                Pool Sizes -- All are given in words of memory

             The numbers are usually followed by plus signs. If the numbers are
             followed by minus signs, the system is in memory deficit. During
             memory deficit, the HSC slows down and, if the deficit lasts long
             enough, the HSC could crash.
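
             The plus/minus convention can be checked mechanically when the
             display text has been captured. The following Python sketch
             assumes each captured line has the form name, count, sign, as in
             the sample display; the function name and the deficit line in
             the example are hypothetical.

```python
import re

# name (letters, "/" and spaces), a count, then a trailing "+" or "-"
ENTRY = re.compile(r"^\s*(?P<name>[A-Za-z/ ]+?)\s+(?P<count>\d+)\s*(?P<sign>[+-])\s*$")


def memory_deficits(lines):
    """Return the names of entries whose count is followed by a minus sign."""
    deficits = []
    for line in lines:
        match = ENTRY.match(line)
        if match and match.group("sign") == "-":
            deficits.append(match.group("name"))
    return deficits


if __name__ == "__main__":
    captured = [
        "   Free Lists",
        "CTRL Blks   2269 +",
        "SLCB/DCB      32 -",  # hypothetical deficit, not from the sample display
        "Buffers      889 +",
    ]
    print("In deficit:", memory_deficits(captured))
```

             Heading lines such as "Free Lists" carry no count and sign, so
             they are ignored.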

                 Data B/W used:    .0%

             This display shows the percentage of HSC data bus bandwidth used.
             This is an instantaneous display and may often show 0% when the
             HSC is busy, because the bandwidth was zero at the instant the
             sample was taken.

                      Host Connections
                              111111111122222222223333333333
                    0123456789012345678901234567890123456789
                   0MM..C....V....M.........................
                  40........................................

             This display indicates host connection status. The two horizontal
             rows of numbers below the Host Connections heading represent host
             node numbers 0 through 39. Each digit on the first line is read
             with the digit directly below it to form the numbers 10 through
             39.

             The connection status for host node numbers 40 and above is read
             on the last line of the display. Add the base number 40 at the
             far left of that line to the column number in the heading rows
             to derive these host node numbers.

             The line labeled 0 indicates the status of the connections for
             host nodes 0 through 39. A C on a status line indicates one
             connection to that host, and an M indicates multiple connections.
             Because each host can make a separate connection to each of the
             disk, tape, and DUP servers, this field frequently shows multiple
             connections. In the example, nodes 0, 1, and 14 show multiple
             connections, and node 4 shows one connection.

             If no letter corresponds to the node number, that host does not
             have any connection to the HSC. If a V appears on that line, a
             Virtual Circuit only is open and no connection is present. This
             usually means the host is in a transitional state. The example
             shows node 9 with only a virtual circuit open.
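
             The reading rule for the Host Connections field (base number at
             the left plus column position) can be sketched in Python as
             follows. The function name is an illustrative assumption; each
             status row is taken to be a base number followed by exactly 40
             status characters, as in the sample display.

```python
STATUS = {
    "C": "one connection",
    "M": "multiple connections",
    "V": "virtual circuit only",
}


def decode_host_rows(rows):
    """Map host node number -> connection status for the rows labeled 0 and 40."""
    result = {}
    for row in rows:
        row = row.strip()
        base, cells = int(row[:-40]), row[-40:]  # leading base number, then 40 columns
        for column, symbol in enumerate(cells):
            if symbol in STATUS:                 # "." means no connection
                result[base + column] = STATUS[symbol]
    return result


if __name__ == "__main__":
    # The two status rows of the sample display, built so that each has 40 columns.
    rows = ["0" + "MM..C....V....M" + "." * 25, "40" + "." * 40]
    for node, status in sorted(decode_host_rows(rows).items()):
        print(f"node {node}: {status}")
```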

                      Host Path Status
                              111111111122222222223333333333
                    0123456789012345678901234567890123456789
                   0^A..^....B....A.........................
                  40........................................

             When you press CTRL/V, the display toggles to an alternate Host
             Path Status display for one refresh only. This display contains
             CI path status information and each position can contain either a
             diamond symbol, an A, or a B. If one path (A or B) goes down,
             this display alternates on every other refresh with the Host
             Connections display until that path comes back.

             The meanings of the symbols are as follows:

              o  A solid diamond symbol means normal operation (both paths
                operating). This symbol is represented in the example with
                a caret (^).

              o  An A or B indicates only one CI path is operational. If an
                A is displayed, path A is active, but path B is not; if a
                B is displayed, path B is active, but path A is not. These
                conditions indicate a probable hardware problem.

             The example shows that nodes 0 and 4 have both paths operating.
             Nodes 1 and 14 have only path A operating, and node 9 has only
             path B operating.
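
             Because an A or a B in this display points to a probable
             hardware problem, it can be useful to list only those nodes. The
             Python sketch below does this for captured status rows; the
             function name is an illustrative assumption, and the rows are
             assumed to have a base number followed by 40 columns.

```python
def single_path_nodes(rows):
    """Return {node: surviving path} for nodes that show only path A or only path B."""
    degraded = {}
    for row in rows:
        row = row.strip()
        base, cells = int(row[:-40]), row[-40:]
        for column, symbol in enumerate(cells):
            if symbol in ("A", "B"):  # a diamond (shown as ^) means both paths are up
                degraded[base + column] = symbol
    return degraded


if __name__ == "__main__":
    rows = ["0" + "^A..^....B....A" + "." * 25, "40" + "." * 40]
    for node, path in sorted(single_path_nodes(rows).items()):
        print(f"node {node}: only path {path} is operating")
```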

                    Process Pr St  Time%
                      Kernel       16.4%
                   2 VTDPY  11 Rn  19.2%
                  50 DEMON  11 Bl
                  52 PDEMON  7 Bl
                  54 PSCHED 13 Rn  42.9%
                  72 DISK    9 Rn  16.0%
                 110 ECC     6 Bl
                 120 TAPE    8 Bl
                 122 TTRASH  7 Bl
                 124 HOST    4 Bl    .9%
                 126 POLLER  5 Bl
                 130 SCSDIR  5 Bl    .9%

             The previous portion of the display shows the active processes.
             The columns in this display (from left to right) mean the
             following:

              o  The first column is the process number.

              o  The Process column shows the name of the process running at the
                time.

              o  The Pr column shows the priority of the process.

              o  The St column shows the status of the process, either running
                (Rn) or blocked (Bl).

              o  The Time% column is the percentage of P.io time each currently
                running process is using.

             Names in the process column under Kernel (the operating system)
             are defined as follows:

              o  VTDPY is the utility running in this example. If another
                utility were running, its name would appear here instead, and
                the priority number might also differ.

              o  DEMON indicates that demand and automatic device integrity
                tests are running.

              o  PDEMON indicates that periodic device integrity tests are
                running.

              o  PSCHED is the scheduler for periodic device integrity tests.
                This is the HSC idle loop.

              o  DISK is the disk server.

              o  ECC is the error correction code process and is displayed when
                disk I/O is active.

              o  TAPE is the tape server.

              o  TTRASH is displayed when the tape server is active. This
                process sends tape error logs to the host.

              o  HOST is the process that interfaces to the host. It is always
                present.

              o  POLLER polls for the host processors and is present when a
                connection is present.

              o  SCSDIR processes directory requests from the host.

             Not all active processes are necessarily shown. Because of
             limited space on the screen, the display of some processes may be
             truncated, and the CPU time percentages may not total 100 percent,
             depending on the polling interval of the data sample.
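
             The sample display illustrates this. The Python sketch below
             adds up the Time% figures of the sample process lines; the
             function name is an illustrative assumption.

```python
def total_time_percent(process_lines):
    """Sum the Time% figures on the process lines that carry one."""
    total = 0.0
    for line in process_lines:
        fields = line.split()
        if not fields or not fields[-1].endswith("%"):
            continue                      # blocked processes show no Time% figure
        try:
            total += float(fields[-1][:-1])
        except ValueError:
            continue                      # for example, the "Time%" column heading
    return round(total, 1)


if __name__ == "__main__":
    sample = [
        "  Process Pr St  Time%",
        "    Kernel       16.4%",
        "  2 VTDPY  11 Rn  19.2%",
        " 50 DEMON  11 Bl",
        " 54 PSCHED 13 Rn  42.9%",
        " 72 DISK    9 Rn  16.0%",
        "124 HOST    4 Bl    .9%",
        "130 SCSDIR  5 Bl    .9%",
    ]
    print(total_time_percent(sample))     # 96.3, not 100
```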

                          Disk Status
                              1111111111
                    +1234567890123456789
                   0....................
                  20A.A..........A......
                  40..........A.A.A.....
                  60.AA.......O..A..O...
                  80....................
                 100....................
                 120....................
                 140...........O........
                 160....................
                 180..................A.
                 200A...................
                 220....................
                 240....................

             The last area in the display alternates between disk and tape
             status displays when both device types are connected to the HSC.

             The two horizontal rows of numbers under the Disk Status heading
             represent the numbers 0 through 19. Each 1 on the first line is
             read with the digit directly below it to form the numbers 10
             through 19. This number is added to the numbers 0 through 240
             given on the vertical axis of the display to derive the disk unit
             number.

             For example, the letter O in the approximate center of the display
             refers to disk unit 151 because it is at the intersection of
             the number 140 on the vertical axis and the number 11 in the
             horizontal rows, and the sum of 140 and 11 is 151.

             The drive status is coded as follows:

              o  The letter O indicates the drive is Online. That is, the drive
                is in use by a host, an HSC utility, or an HSC device integrity
                test. In the example, drive unit number 151 is online.

              o  An A indicates the drive is Available but not mounted. Drive
                unit number 62 is available.

              o  A D indicates the HSC is connected to Duplicate units (two or
                more drives with the same unit number).

              o  A U indicates the drive is in an Undefined state.

             The letters and method of determining the drive unit number are
             the same when tape status is displayed. In the tape status
             display, an additional letter, F, indicates that no tape is mounted
             on the tape drive.
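
             The unit-number rule (row base plus column position) and the
             status letters can be sketched in Python as follows. The
             function name is an illustrative assumption; each row is taken
             to be a base number followed by exactly 20 status characters, as
             in the sample display.

```python
STATES = {
    "O": "online",
    "A": "available",
    "D": "duplicate",
    "U": "undefined",
    "F": "no tape mounted",  # tape status display only
}


def decode_unit_rows(rows):
    """Map unit number -> state for rows of a Disk Status or tape status display."""
    units = {}
    for row in rows:
        row = row.strip()
        base, cells = int(row[:-20]), row[-20:]  # base number, then columns 0 through 19
        for column, letter in enumerate(cells):
            if letter in STATES:
                units[base + column] = STATES[letter]
    return units


if __name__ == "__main__":
    # Rows 60 and 140 of the sample display, built so that each has 20 columns.
    rows = ["60" + ".AA" + "." * 7 + "O..A..O...", "140" + "." * 11 + "O" + "." * 8]
    units = decode_unit_rows(rows)
    print(units[151])  # online: row 140, column 11
    print(units[62])   # available: row 60, column 2
```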

          __________________________________________________________
          8.1.2  VTDPY Error Messages
             This utility has two error messages, as follows:

          VTDPY-E Illegal Interval Value (2 to 420 seconds)

             Explanation: You have entered an update interval outside the range
             permitted. VTDPY reprompts for the update interval.

             User Action: Reenter a value within the correct range.

          VTDPY-F Insufficient Common Pool

             Explanation: This message indicates insufficient memory to run
             VTDPY.

             User Action: Retry VTDPY when the demands on memory are reduced.

          __________________________________________________________
          8.2  Volume Shadowing

             This release supports VMS Volume Shadowing. HSC Version 3.9A
             supports the maximum number of shadow sets specified in the VMS
             Volume Shadowing Software Product Description (SPD).

             When you run volume shadowing, adhere to the following rules:

              o  Use only identical disk types with the same geometry within a
                shadow set.

              o  Do not attempt to dismount the source shadow member of a shadow
                set while a VMS Shadow Copy operation is in progress. The VMS
                command, SHOW DEVICE, indicates whether such an operation is
                executing.

              o  Read Section 2.8 of the VAX/VMS Volume Shadowing Manual, which
                describes a method using a particular former shadow set member
                as the source for all copy operations involved in rebuilding
                the shadow set.

              o  Always include the device names for all shadow set members
                in the shadowing MOUNT command. The VMS operating system will
                correctly select between source and target members for you.

             Note the following items specific to volume shadowing:

              o  During a copy operation, different VAXcluster members may have
                different views of the shadow set's membership, as shown by the
                SHOW DEVICE command. This situation corrects itself when the
                copy operation completes. Differences appear when a shadow set
                is first mounted and during the copy operations resulting from
                shadow set failover processing.

                Although this situation can be confusing, it is relatively
                harmless. If the condition results from a MOUNT command, the
                SHOW DEVICE output on the VAXcluster member where the MOUNT
                command was executed is the most accurate view of the shadow
                set.

              o  During a merge copy operation (initiated either by a MOUNT
                command or as a result of a shadow set failover), only the
                VAXcluster member executing the copy indicates a merge copy is
                executing. All other VAXcluster members indicate a full copy is
                being done. This is part of the volume shadowing design used by
                the HSC controller and VMS operating system.

              o  Hardware write-protected shadow sets are not supported. If you
                write protect the members of a shadow set, any data degradation
                errors will be unrecoverable.

              o  Shadow set members with foreign file structures (that is, not
                FILES-11 ODS 2) receive limited support. Full volume shadowing
                support requires the ability to store shadow set context
                somewhere on the shadow set member volumes. This is not possible
                for volumes with a foreign file structure. Read the VAX/VMS
                Volume Shadowing Manual carefully before attempting to use
                volume shadowing on volumes with foreign file structures.

          __________________________________________________________
          8.3  Exception Codes

             This section provides error codes and user actions.


          004106

             DRAT allocation failure

             Facility: DISK, MSCP

             Explanation: While preparing to read the Factory Control Table
             (FCT) during online processing, the DRAT allocation subroutine
             failed.

             User Action: Submit an SPR with the crash dump.

          004107

             Command not completed after drive declared inoperative

             Facility: DISK, MSCP

             Explanation: Get Command Status processing declared the drive
             inoperative, but the command still failed to complete within the
             timeout period.

             User Action: Submit an SPR with the crash dump. Note the type
             of the drive identified in the error message. The error message
             identifies the unit number; the drive type for the unit number may
             be obtained from a SHOW DISKS display.

          004110

             GCS Status Overflow

             Facility: DISK, MSCP

             Explanation: Get Command Status processing determined that the
             calculated status will result in an overflow.

             User Action: Submit an SPR with the crash dump.

          004111

             A timer has link field values inconsistent with its current
             operational state

             Facility: DISK, many

             Explanation: When a timer was added or removed from the active
             list, it was in a state that should not exist.

             User Action: Submit an SPR with the crash dump.

          004112

             A unit is incorrectly marked as a shadow set member

             Facility: DISK, many

             Explanation: A unit was incorrectly marked as being a member of a
             shadow set.

             User Action: Submit an SPR with the crash dump.

          004113

             No DRAT list invalid

             Facility: DISK, many

             Explanation: During Fragment Request Block (FRB) retirement while
             declaring a drive inoperative, the NO DRAT list was found to be
             invalid.

             User Action: Submit an SPR with the crash dump.

          004114

             Connection closed after delay in ATTN process

             Facility: DISK, ATTN

             Explanation: While the disk server was waiting to acquire
             resources to send an attention message to the host, the connection
             closed.

             User Action: Submit an SPR with the crash dump.

          007022

             Invalid BMB address

             Facility: CIMGR, CIMISCPRC

             Explanation: A Host Message Block (HMB) arrived at the resource
             collector with an invalid Big Message Block (BMB) address attached
             to it.

             User Action: Note the K.pli microcode revision level with a SETSHO
             SHOW REQUESTORS command. The K.ci MC version reported by this
             command is the K.pli microcode revision level. If the revision
             level is less than revision 45, contact your Digital Field Service
             Representative for a K.pli microcode update. Also, note the
             current disk configuration. If the K.pli microcode revision level is
             greater than or equal to 45, submit an SPR with the crash dump and
             the noted disk configuration.
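
             The User Action for 007022 is a simple threshold decision on
             the K.pli microcode revision. A minimal sketch in Python,
             assuming the revision has already been read by hand from the
             SETSHO SHOW REQUESTORS display; the function name and the
             returned wording are hypothetical and only restate the text
             above:

```python
# Hypothetical helper: restates the 007022 User Action as a decision rule.
# The revision number must be read manually from SETSHO SHOW REQUESTORS
# (the "K.ci MC version" field); nothing here talks to an HSC.

KPLI_MIN_REVISION = 45  # threshold quoted in the User Action text


def action_for_007022(kpli_revision: int) -> str:
    """Return the recommended action for exception 007022."""
    if kpli_revision < KPLI_MIN_REVISION:
        return "contact Field Service for a K.pli microcode update"
    return "submit an SPR with the crash dump and the noted disk configuration"


if __name__ == "__main__":
    for rev in (44, 45):
        print(rev, "->", action_for_007022(rev))
```

             In either case the current disk configuration should be
             noted, as the User Action says.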

          007023

             SCS buffer retrieval failure

             Facility: CIMGR, CISUBRS

             Explanation: When changing the status of the virtual circuit, the
             CIMGR tried to retrieve the SCS buffer from the K.ci .KHSRR queue.
             This buffer should have been on the queue because it was not in
             use at the time of the crash. If no elements have been queued to
             the .KHSRR queue, CIMGR would have forced a crash.

             User Action: Submit an SPR with the crash dump.

          062002

             Common Pool memory returned twice

             Facility: Many

             Explanation: A process attempted to return a memory segment that
             was already in the Common Pool.

             User Action: Submit an SPR with the crash dump.
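
             For quick reference, the codes listed in this excerpt can be
             held in a small lookup table. This is only a sketch: the
             table and function names are hypothetical, the codes are kept
             as strings so the leading zeros survive, and only the entries
             quoted above are included.

```python
# Hypothetical lookup table built from the exception codes quoted in this
# excerpt of Section 8.3. Maps code -> (title, facility).
EXCEPTION_CODES = {
    "004106": ("DRAT allocation failure", "DISK, MSCP"),
    "004107": ("Command not completed after drive declared inoperative",
               "DISK, MSCP"),
    "004110": ("GCS Status Overflow", "DISK, MSCP"),
    "004111": ("A timer has link field values inconsistent with its "
               "current operational state", "DISK, many"),
    "004112": ("A unit is incorrectly marked as a shadow set member",
               "DISK, many"),
    "004113": ("No DRAT list invalid", "DISK, many"),
    "004114": ("Connection closed after delay in ATTN process", "DISK, ATTN"),
    "007022": ("Invalid BMB address", "CIMGR, CIMISCPRC"),
    "007023": ("SCS buffer retrieval failure", "CIMGR, CISUBRS"),
    "062002": ("Common Pool memory returned twice", "Many"),
}


def describe(code: str) -> str:
    """Return a one-line summary for a code, or note that it is not listed."""
    entry = EXCEPTION_CODES.get(code)
    if entry is None:
        return f"{code}: not in this excerpt"
    title, facility = entry
    return f"{code}: {title} (Facility: {facility})"


if __name__ == "__main__":
    print(describe("007022"))
```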

********************************************************************************

From:	GENRAL::FIALA        "Eschew Obfuscation." 17-NOV-1989 10:45:48.95
To:	VOLKS::BRASSARD
CC:	FIALA
Subj:	CLD CX4373..

Hi.
	Rone Repka forwarded your memo to me.
	I have the invidious position of remedial support for CX controllers.
	
	You obviously need HSC V39A. I am hand managing distribution of 39A
	until SDC starts shipping mid January.
	I have 3 bundled savesets with everything needed [including 
	instructions] inside.
	1 for HSC50, 1 for HSC70, 1 for both.
	Which kit do you want? Where do you want it copied to?
	Let me know "where to stick it" !!!.

				Stefan Fiala

PS:	Your phone # in your mail header still has 240-6492.
	Elf lists your old number also.	
	
From:	GENRAL::FIALA        "Eschew Obfuscation." 17-NOV-1989 17:24:16.82
To:	KERNEL::CLARK
CC:	FIALA
Subj:	HSC's and dropping VC's.

Hi.
	Bob Brassard forwarded your memo to me.
	The problems you have are probably to do with:-
	Bad install of V390(4), KCI 2.54 and/or HSC V39A or reasons unknown.
	There is a buffer/credit starvation fixed in 254 and a
	hack to the CI wire handling in 39A.
	I believe there are other VMS things too.
	Bad install of V390(4) can cause all these things and more...

	By and large, if the HSC is reporting VC closed [info]
	and VMS 5.2 is running and Backup is involved [maybe with a large
	blocksize or /nocrc] then the phenomenon you outline fits.

	Use "SHO REQ" on the HSC to check for 2.54. Fix this first.
	Check the way V390(4) was installed:-
	o	Did they get no HSC prompt after installation? [Yes=bad]
	o	Did they follow the correct install procedure? [No=bad]
	o	How many tape formatters does "SHO TAPE" indicate.
		[MUST BE LESS THAN 24]. [>12 BAD]
	o	Suspect VMS credit starvation for HSC disk/tapes. [Difficult]
	o	Devices on that HSC "run slow". [Difficult]
	o	If in any doubt, reinstall V390(4):- [Easy]
		o	SHOW ALL
		o	Check disks will fail over.
		o	<online> out.
		o	Press <init> hold in <fault>
		o	"Inipio-Booting..."
		o	Let go of <fault>.
		o	Use SETSHO to reset any parameters from SHO ALL.
		o	Reboot as necessary.
		o	<online> in.

	I generated a Blitz about badly installing V390(4) some time ago.
	[For VC closes you get a message "VC closed by request from KCI"
	but no reason... No retndat/cnf timeout, for instance.]

	I couldn't tell the whole story... But if in doubt re-initialise it.

	Submit a Prism/Cld for a pre-release of HSC V39A.
	V39A won't exit SDC or SSB [new name?] till mid January [in the USA].
	
	Do me a favour and spread the word amongst the CSC folks about
	the VC closures being "hidden" by the HSC error level.
	And the bad-install of V390(4) issue.

	If you have funnies like HSC goes offline/tapes mysteriously rewind/
	shadow copies start spontaneously/drives drop offline/etc.
	SET ERROR INFO immediately...

				Stefan Fiala
			 CX CSSE Product Support.