
Conference ssag::ask_ssag

Title:Ask the Storage Architecture Group
Notice:Check out our web page at http://www-starch.shr.dec.com
Moderator:SSAG::TERZAN
Created:Wed Oct 15 1986
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:6756
Total number of notes:25276

6592.0. "HSD10 Drive Number Issue" by DABEAN::REED () Thu Apr 17 1997 08:23

    This is a VERY BIG Conference. I have tried a few search commands but
    they take so long that I have decided to become part of the problem and
    write a new entry. Moderator, please move or rename if you can find a
    better place for my query.
    
    I have encountered a device numbering problem with the following (new)
    configuration:
    	SYSTEM: VAX 4000 Model 108 (49DCM)
    	CONTROLLER: HSD10-EB, a "skinless" variant unique to this type 
    system, resident within the system box, and connected to the system bus
    via a KFDDA DSSI adapter.
    	STORAGE: one RZ28D-E inside the system box connected to a daisy-
    chain ribbon cable which starts at the HSD10 and ends in an external
    SCSI connection (cabinet kit). BN21H-01 50-pin HD cable to a BA35X-MG
    8-bit personality card in a BA356. The BA356 has drives in slots
    1 through 4, 0 and 5 are empty, and there is a BA35X-HF 2.8A pwr supp
    in slot 6 and a BA35X-HA 2.0A P/S in slot 7. Drives are at F/W 0010.
    The revision of the flex strips is NOT visible, unlike the two spares
    I obtained which show a "c02" in the etch. All devices are seen by the
    system as DSSI drives; there are no devices on the "real" DSSI except a
    terminator (lit up) on the external port of the KFDDA. The HSD10 has
    been set up to pass $2$DIA0: thru $2$DIA4: to Open VMS (v6.2). Each
    Unit consists of a single Disk whose SCSI target corresponds to the 
    VMS device number for sanity's sake. The HSD10 is happy with this setup
    and will format, qualify, and test any combination of drives I have
    tried. 
    	PROBLEM: (and it took a while to define it) Whenever EVEN-NUMBERED
    shelves in the BA356 are used, bad stuff happens. The exact nature of
    the bad stuff varies with the activity attempted. The simplest case, a
    MOUNT/FOREIGN command, ultimately works most of the time but seems
    slow. Sometimes the slowness turns into a hang. Control-C'ing out of
    the MOUNT usually works, returning a message that the mount (with the
    correct OR sometimes a null label) has completed. Often the hang
    persists. I lose patience after 5 minutes or so and halt the machine.
    Any attempts to write data, and some mounts or reads, cause a timeout
    and errorlog entries against the system drive and usually the target
    drive also. This is true no matter which drive has been booted or is
    being accessed. DATA GETS CORRUPTED. "Invalid Alternate Home Block" , 
    "I/O Error in Storage Bitmap", and steady streams of
    "Mount-W-Bitmaperr" on the console have been observed when slots 2, 4,
    and/or 6 are occupied. 
    	WHAT WORKS: ANY combination of odd-numbered drives, with or without
    the "hard drive" (0) inside the cabinet. I used a DIA0-1-3-5 combination
    when I needed to create and verify Files-11 volumes for testing and to 
    date this configuration has generated no errors or hangs. I am using
    BACKUP/IMAGE/VERIFY to slop data back and forth among the drives. Early
    in the troubleshoot I built Standalone on one of the drives and used it
    to pull a clean image copy off DIA0:, which is now on all the disks.
    On Day 1, Remote Support had me "float" one drive through Slots 1-6 of
    the BA356 and a quick check test indicated that this minimum
    configuration will work even in slots 2-4-6. I have not re-tried this
    experiment since then, however. Maybe today.
    	WHAT HAS NO EFFECT ON THE SYMPTOM: 
    	-Different drives. I ordered 2 RZ28D-VA's and an RZ28D-E from stock
    and tried them. No change. After I had a bootable system built on each
    of the 5 disks, I removed DIA0 from the configuration by pulling the
    internal ribbon cable off it and disconnecting its power. No change.
    	-Different power supplies. I have had 2 different BA35X-HF's from
    logistics and have tried all combinations of 1-2 pwr supps. (That -HA
    variant came with the hardware as delivered; I never tried that one
    by itself due to Tech Tip warnings about 7200 RPM drives etc.)
    	-Different Personality cards and cables
    	-Different BA Boxes. We had both a BA356 tiebreaker and a BA350,
    which eliminates the personality board entirely; other folks checked
    my sanity on the respective terminators and jumpers. (Both boxes were
    removed from actual service in single narrow SCSI bus
    environments, anyhow.)
    	-We only have OPEN VMS v6.2 (dash nothing) available, but Software
    Support says the System, HSD, and KFDDA all are supported.
    
    This is a new install, completed by another engineer. I have sent him
    mail asking for any info/ early symptoms he may have noted. The
    customer is OK so far, but I know he had an application in mind or he
    wouldn't have bought it.
    
    [In case the 4000-108 is as new to you as it was to me, it looks like a
    mini-tower PC. It has a system board and pwr supp that look like PC
    pieces, and a SIMM carrier for memory, and a CPU card, and the
    controllers mentioned above plug into the backplane like EISA options.
    In my case, the system is on its left side on a rack-mount shelf with
    the BA356 rack-mounted above it. One non-PC feature is an M/F pair of 3-row
    "D" connectors which extend the Q-Bus to a B400 cabinet on the floor;
    the only option in this cabinet is a Q-Bus FDDI card.]
    
    ***What I really want is an HSD10-EB to troubleshoot with. I have had
    an URGENT order in for one, plus a ribbon cable and a DSSI card just in
    case, for nearly a week. I am entering this note in case it is a known
    problem or in case I missed something in the troubleshoot. ***
    
    Chris Reed   MCS E. Prov., RI     (PVO)     DTN: 322-4126  
                                                                 
T.R  Title  User  Personal Name  Date  Lines
6592.1. "Check if you used JA1." by SSDEVO::KOWALL (StorageWorks Engineering Support) Thu Apr 17 1997 17:08 (13 lines)
        
        Oh Boy a new product.

        What the problem looks like to me is that you have plugged the BN21H-01
into JB1 on the personality module instead of JA1 and therefore have not
terminated the beginning of the bus. Adding drives 0 2 4 6 is at the second
half of the bus. The more drives you add the worse the termination problem.
Probably why 1 drive works.

        If that's not it I have another idea for later.

        Regards
                        John Kowall
6592.2. "Also try SSDEVO::HSD05_PRODUCT" by SEDSWS::SIAREY () Fri Apr 18 1997 03:47 (168 lines)
    Chris,
    
    Probably not relevant to your current issues, but just in case you have
    not seen it here is a blitz about the VAX4000-108 and DSSI node ids.
    
    Also try SSDEVO::hsd05_product for HSD10 topics.
    
    Regards,
    Colin S.
    
    =========================================================================
    
Copyright (c) Digital Equipment Corporation 1997. All rights reserved.

+---------------------------+TM
|    |   |   |   |   |   |   |
|  d | i | g | i | t | a | l |           TIME   DEPENDENT   BLITZ
|    |   |   |   |   |   |   |
+---------------------------+


   BLITZ TITLE: Firmware Update for Vax 4000-108



   PRIORITY LEVEL:

   DATE:March 26,1997
   TD #: 2269

   AUTHOR:Heather Kane
   DTN:223-4712
   EMAIL:[email protected]
   DEPARTMENT:RSE

   =================================================================

   PRODUCT NAMES:Vax 4000 108

   PRODUCT FAMILY:

   Storage         ___
   Systems/OS      _X_
   Networks        ___
   PC/Peripherals  ___
   Software Apps.  ___


   BLITZ TYPE:

   Maintenance Tip           TIMA::INFO_X_
   Service Action Requested  ___


   IF SERVICE ACTION IS REQUESTED: (Check all that apply.)

   Labor Support Required     _X_
   Material Support Required  ___


   Estimated time to complete activity (in hours):
   Will this require a change in the field's inventory:  Yes __X_  No ___
   Will an FCO be associated with this advisory?  Yes ___  No _X_


   DESCRIPTION OF SERVICE ACTIVITY REQUESTED (if applicable):

	Firmware needs to be updated from V1.0 to V2.0 on VAX 4000-108

    *******************************************************************


   PROBLEM STATEMENT:

	When trying to install VAX 4000-108's in a DSSI cluster with other
	Vax4000-108's or any other clusterable system the cluster may not
	configure properly and may cause hangs when a second system is booted.

   SYMPTOM:

	No matter what DSSI ID is set at console, when OVMS boots the ID is
	always 7 so there is a conflict.

   SOLUTION:	

	This is caused by OVMS not using the ID configured by the console with
	the SET_DSSI ID command. This is caused by a legacy code issue,
	whereby OVMS will only use the DSSI ID value if Console firmware
	Version is above 2.X.	    		

	The firmware needs to be updated to V2.0.  The version is available
	by copying from :

  	may21::WRK:[MOPLOAD]kacat_v20_1.sys

   UPDATING PROCESS:

	The new firmware file needs to be copied to a mop$load area and then
	perform the update.  Most systems have the firmware enable jumper
	(W3) on the CPU modules installed, which allows Firmware updates.  If
	there is any problem with updating refer to the Vax4000-108 On-line
	Service Guide as a reference.

***** On Server System *****

$ MCR NCP

NCP>SET CIRCUIT ISA-0 STATE OFF
NCP>SET CIRCUIT ISA-0 SERVICE ENABLED
NCP>SET CIRCUIT ISA-0 STATE ON
NCP>EXIT

$
$ COPY kacat_v20_1.sys MOM$LOAD:*.*
$
***** On Client System *****
>>>b/100 eza0

 (BOOT/R5:100 EZA0)

  2..

Bootfile: kacat_v20_1

-EZA0

 1..0..

FEPROM update program
                        ---CAUTION---
--- Executing this program will change your current FEPROM ---

Do you want to continue [Y/N] ? : y

Blasting in V2.0-1.   The program will take at most several minutes.

DO NOT ATTEMPT TO INTERRUPT PROGRAM EXECUTION

Doing so may result in loss of operable state !!!
+----------------------------------------+

10...9...8...7...6...5...4...3...2...1...0

FEPROM Programming successful
?06 HLT INST
        PC = 00008E24

>>>

cycle power


   VERIFICATION:

	Set DSSI ID on multi-node cluster as required and verify there are
	no conflicts.


   LARS INFORMATION: (Supplied by MCS)

       Attention Service Personnel: Begin the comment field of your LARS
       with the word "BLITZ" when you perform an activity associated with a
       BLITZ Type "Service Action Requested".



                     *** DIGITAL INTERNAL USE ONLY ***
    
6592.3. "Parts on the Horizon!" by DABEAN::REED () Fri Apr 18 1997 07:01 (20 lines)
    Thanks, but I'm still looking. Re: .1, sorry, John, I'm in "JA1" thanks
    to the notes I saved from the LEX Storageworks clinic (the "Mike and
    Mike Show".) Also, you'll remember I tried a whole different cable and
    a BA350 with no change in the symptoms. I agree that it acts like a
    termination problem, and that's one of the reasons I'm so anxious to
    get hold of some spares. (I'm thinking of the other end of the bus.)
    
    Reply .2 also duly noted; this is a single-host system with no DSSI
    nodes beyond 7 and 0 since all the peripherals are SCSI. The HSD10
    hands the disks over to the O/S as "DIA" devices (type HSXX) but really
    they are all SCSI, and all on the same bus including the
    non-Storageworks internal RZ28D-E which even the Userguide calls the
    "hard drive". 
    
    Joni in Logistics tells me she has found the three major parts which go
    into the HSD10-EB, so by the time you read this it may be fixed. I
    hardly dare hope... I'm sure glad the customer isn't earning a living
    with this thing yet- those banks have NO sense of humor.
    
    Thanks to all-  CR
6592.4. "Again" by SSDEVO::KOWALL (StorageWorks Engineering Support) Fri Apr 18 1997 10:33 (15 lines)

        Yes, this is more a SCSI ID problem than a node ID problem.
    Here's my second thought.  This new box has three different SCSI cables.
    I'm figuring about 9 feet. The limit for the HSD10 is 10 feet. Because the
    three different sections present a different impedance to each other, add
    some more length.....  So my next step to try, if the HSD10 replacement
    does not fix this, is to lower the bus speed on the HSD10 SCSI bus from 
    10 MHz to 7.  This is done on a per-drive basis, but you 
    SHOULD do it for each drive on the bus, at least for this test. 

        Set UNIT (MSCP DEVICE)/sync=7

    Regards
                John Kowall     
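
    A minimal sketch of applying that per-drive setting to every unit,
    assuming the units are the DIA0 through DIA4 devices described in .0 and
    that the unit name simply takes the place of the "(MSCP DEVICE)"
    placeholder above. The exact unit-name syntax the HSD10 console expects is
    not confirmed in this topic, so treat the generated lines as a checklist
    rather than as verified commands.

```python
# Hypothetical helper: builds one "SET UNIT .../sync=7" line per unit so that
# no drive on the shared SCSI bus is left at the faster rate during the test.
# The unit names and the way they slot into the command are assumptions.

def sync_commands(unit_numbers, rate_mhz=7, prefix="DIA"):
    """Return one command string per unit, in ascending unit order."""
    return [f"SET UNIT {prefix}{n}/sync={rate_mhz}" for n in sorted(unit_numbers)]


if __name__ == "__main__":
    # Units 0 through 4, matching the five disks in the original configuration.
    for line in sync_commands(range(5)):
        print(line)
```

    Generating the list from the unit numbers makes it harder to skip a drive,
    which matters here because one drive left at the higher rate would muddy
    the result of the test.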
6592.5. "Well, it ain't the HSD..." by FREEBE::REED () Fri Apr 18 1997 12:11 (35 lines)
    Interesting! I'll try it next trip, since the 54-25703-01 HSD which
    I finally obtained this morning was no help. 
    
    I have other questions. Since I cannot obtain a 17-04440-01 internal
    drive daisy-chain cable, and since my problem is unaffected by the
    presence or absence of $2$DIA0: (d100), can I eliminate this cable?
    I didn't think of this until after I left today. If I can still see the
    external SCSI port of the HSD10 when the daisy chain cable is not
    present, it will be a simple matter to boot $2$DIA1: and test the
    array. (I have tested the system without drive zero, but never without 
    the daisy-chain cable.)
    
    Is Switch 7 on the HSD switchpack really unused? The spare came in with
    Sw 7 and 8 OPEN (up), the HSD10 User Guide shows both down, and the one
    in the system is set Sw 7 OPEN, Sw 8 CLOSED. (otherwise the same: DSSI
    node 0, SCSI node 7.)
    
    It's a drag hooking up the serial port on this machine to do the
    initial setups on a spare HSD. Spares ship with the DSSI port disabled,
    and the VAX 4000/108 assumes you'll manage the device in-band. It would
    not be such a big deal on a free-standing one, but the rack mount kit
    has some accessibility shortcomings. (Not to mention that when the
    system is on its side there is a subtle pressure on the front panel
    bezel which makes the on/off switch inop or intermittent. I have not
    addressed this problem yet; I just push down hard on the left upper
    corner of the case and click the switch.)
    
    I've escalated this through CAPE. Help is on the way through official 
    channels, but don't be shy if you have suggestions.
    
    OH: John, regarding your suggestion in .4-- the cable I used is only a
    meter long (BN21H-01), so I don't think that's the problem, although I
    will try slowing down the bus on my next visit.
    
    CR
6592.6. "" by SSDEVO::KOWALL (StorageWorks Engineering Support) Fri Apr 18 1997 14:57 (25 lines)
        I didn't think it was the HSD10 with the errors you are getting. 
    You are missing a great tool if you are not keeping a terminal tied to the
HSD10 and doing a MON DSSI. It should also be showing the SCSI errors.

        Internal, external and shelf should add up to about 9 feet I thought.
You have to count EVERYTHING.

        SW 7 is not used. SW 8 down is to terminate the SCSI bus when the HSD
is at one end of the bus. Not used when it's the second HSD in a BA35x for
dual-redundant configs. HSD10s can be on the same or different DSSI bus.

        If you can remove the internal cable for a test to shorten the total
length that would be a good idea. Really depends on being able to hook up the
rest.

        BTW how are they terminating the DSSI bus at the internal HSD10?  Does
the bus come to the bulkhead to terminate?

        HSD10s come with the DSSI port turned off so you can set the
configuration first before turning it on. Good thing to do if the new HSD has
some conflicting cluster information in it. 

        Cheers
                John
6592.7. "Begins to Look Like a Cable" by DABEAN::REED () Fri Apr 18 1997 15:50 (27 lines)
    re: -.1, yes, there is a lit up (=terminator power OK) terminator on the 
    bulkhead of the 54-24705-01 KFDDA-BB CDAL-to-DSSI adaptor. On the
    system end I'm not sure what they use. I get the impression from what
    I've read that it is bounded, I.E. permanently terminated, on the
    system end. (I could be wrong. I've read a lot, rather quickly.)
    
    I saw a second VT320 on the floor at the site. I'll hook it up on
    Monday and do the MON DSSI as John suggested. Logistics found me a
    drive cable in the pipeline (two, actually- a 17-04440-01 as in the
    machine now and a 17-04442-01 which has the same verbal description
    in QRL. If it is shorter, I'll use it.)
    
    The impression I get from the folks at Berkshire Computer (VAR #1; WYLE
    is VAR #2) is that the SCSI bus goes from the HSD daughter card via the 
    single inter-card ribbon cable to one of the Berg connectors on the HSD
    mother card, then out to the system-box drives via the 17-04440-01,
    back into the HSD mother board, and then finally off the end of the 
    mother board to the 50-pin HD external connector. In other words,
    whether or not there are external drives, the HSD is always at one end
    of the SCSI bus. No need to worry about double termination, or the HSD
    "seeing" itself at the middle of the bus if there are external drives.
    Is this correct? In a single-box configuration, a SCSI terminator would
    be required on the external port of the HSD, correct?
    
    Stay tuned- more on Monday.
    
    CR
6592.8. "" by SSDEVO::KOWALL (StorageWorks Engineering Support) Mon Apr 21 1997 11:17 (9 lines)
        It was hard to follow the cable connections. :-)  But the bottom line
is you must terminate both ends of the bus, and only the ends. If the HSD10 is
in the middle of the bus and the bus is terminated at both ends, then you must
put SW 8 up to disable termination.

        Later

                jk
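
        The rule above can be sketched as a small check. This is only an
illustration: the bus is modelled as an ordered list of devices along the
cable, and the device names and the list layout are made up for the example.

```python
# Sketch of the rule "terminate both ends of the bus, and only the ends".
# A bus is modelled as an ordered list of (device_name, terminated) pairs,
# listed in physical order along the cable. Names are illustrative only.

def termination_errors(bus):
    """Return a list of human-readable problems; empty means the rule holds."""
    if len(bus) < 2:
        return ["a bus needs at least two devices to have two ends"]
    errors = []
    last = len(bus) - 1
    for i, (name, terminated) in enumerate(bus):
        at_end = i in (0, last)
        if at_end and not terminated:
            errors.append(f"{name}: at an end of the bus but not terminated")
        elif not at_end and terminated:
            errors.append(f"{name}: terminated in the middle of the bus")
    return errors


if __name__ == "__main__":
    # HSD10 sitting mid-bus with its own termination still enabled.
    bus = [("terminator A", True), ("HSD10", True), ("terminator B", True)]
    for problem in termination_errors(bus):
        print(problem)
```

        In the mid-bus case the check flags the HSD10, which corresponds to
putting SW 8 up to disable its termination.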
6592.9. "My Kingdom for a 17-04440-01" by FREEBE::REED () Tue Apr 22 1997 07:21 (39 lines)
    Since Logistics LOST what was apparently the only 17-04440-01 internal
    drive cable in the entire world, I went onsite last night to re-try
    some of my earlier troubleshooting, mostly as a sanity check. I tried
    disconnecting both ends of the internal drive cable from the HSD10 I/F
    card, to see if I could work around this possibly bad ribbon cable. The
    HSD10 was able to AUTOCONFIGURE/LOG with only the external SCSI in
    place and the Termination switch DOWN (factory default). However,
    booting from the external drives was impossible. With the cable OFF at
    both ends and Switch 8 UP, the disk subsystem would not even configure
    correctly; the HSD saw phantom drives at targets 0 and 6, for example.
    I then re-connected the internal drive cable and verified that I had
    not corrupted or broken anything. Next I re-tried the full
    configuration with Switch 8 UP: no go. Same with drives 0-1-3-5 only
    and Switch 8 UP. The switch seems to have to be in the position shown
    in the user guide (DOWN) to get anywhere at all. This is good,
    actually, since it indicates that the documentation is correct. I also
    tried another station on the internal drive cable for the internal
    DIA0: no change. Likewise for removing internal Drive 0 from the
    configuration entirely and booting from DIA1. It seems I can reproduce
    my earlier experiments.
    
    All other considerations aside, I'm not too crazy about the rack mount
    kit we developed for this system. It consists of a dedicated shelf on a
    pair of slides, and works great except that tolerance pileup in the
    system enclosure results in downward pressure on the front panel bezel
    when the system is on the shelf. The mechanical portion of the power
    switch is in the bezel, and no longer engages the "real" switch
    correctly when the system is fully engaged in the shelf. You have to
    apply considerable downward force on the metal portion of the cabinet
    to get the switch to sequence on and off. (The whole system case is
    kind of willowy. Every time I remove the cover it seems I have to
    reseat the memory SIMM carrier in the backplane due to flexing in the
    sheet metal enclosure. Bodes ill for the long term.)
    
    So here I am, thumb inserted, still waiting for a 17-04440-01 drive
    cable. My manager has authorized the swapping of the system, but of
    course there are none out there. More later-
    
    CR
6592.10. "It Wasn't the Cable" by FREEBE::REED () Fri Apr 25 1997 07:21 (38 lines)
    Well, Support Engineer Dave Yatkola and I spent the day on site
    refining and verifying my diagnosis. A more accurate problem
    statement would be: the system will work with the internal RZ28
    plus up to three even numbered Storageworks drives OR up to three
    odd numbered storageworks drives OR SOME combinations of even/odd
    drives. 1-2-4 is no go, but 2-3-4 and 1-3-4 seem stable. We retried
    all the hardware configurations I had done solo, and we had the
    long-awaited 17-04440-01 internal SCSI cable plus a KFDDA. I had high
    hopes for the cable and none for the KFDDA, and neither had any effect
    on the problem.
    
    Dave suggested we use the spare RZ28D-E as "DIA1" inside the cabinet
    and 2-3-4 in the BA Box; this config tested OK, giving more and more
    weight to the bus length/termination scenario. Our theory is that this
    box may not have been thoroughly tested with high-performance SCSI
    options outside the cabinet, since the planners provided for up to
    five RZ drives inside the system box.
    
    The customer is not ready to accept that his BA356 is "full" with only
    three drives in it, so we may try a further workaround: a DWZZx pair
    between the HSD10 and the BA Box to stop the bus length count as seen
    by the HSD10 at the near-side DWZZx. I have to see if this is supported
    downstream from an HSD10. I have seen warnings about bus length on the
    SCSI side of the HSD, and I imagine this new type may be similarly 
    sensitive, especially since the cable characteristics change several 
    times, from etch to ribbon to etch through connectors, then through
    more connectors to the outside world and BN21H-xx, BAxxx, etc.
    
    More later. Drawings for John Kowall as soon as I can XEROX the
    customer's User Guide.
    
    *The User Guide DOES mention external connections to the bus in
    question, but not as to type, and arrays are not mentioned
    specifically. (This is just a caption to a picture in the Options
    and Add-Ons chapter.)
    
    cr
                
6592.11. "my thoughts...." by SUBSYS::VIDIOT::PATENAUDE (Ask your boss for ARRAY's...) Fri Apr 25 1997 09:40 (21 lines)
I've not been following this note so I had to go back and review a few things..

1. Don't mess with the internal KFDDA -> HSD cables or DSSI settings, you have a
back-end SCSI problem, not a DSSI problem. DSSI is packet protocol and if you
had a cable problem it would NOT show the symptom you see.

2. If you have a 1 meter external cable, the BA356 is .9m then you better look
close at the internal shelf cable length, if it's longer than 1.1 m you lose. As
John said, the even numbers are closer to the end of the bus and a
length/termination issue usually shows up as "devices on the end of the bus
having problems".

3. I see you've swapped a lot of parts and spent a lot of time and now are
talking system swaps. Have you ever thought of just escalating a case through
proper channels to the 4000 folk???? The folk in PKO are VERY good at what they
do and KNOW this system in and out and also know IF what you are doing is ever
qualified. If it ain't qualified and/or supported you are wasting a LOT of
company resources.

roger.
6592.12. ""Roger that" Roger!" by PCBUOA::WHITEC (Parrot_Trooper) Mon Apr 28 1997 10:09 (7 lines)
    
    Your thinking is pretty RADICAL Roger!!!  ;^)
    
    
    Keep up the good work/thoughts.
    
    Chet
6592.13. "" by SSDEVO::KOWALL (StorageWorks Engineering Support) Mon Apr 28 1997 12:33 (10 lines)
        After looking at a rough drawing of this system my guess is that you
cannot have both SCSI buses operating at the same time. Either/or. I think it
is a termination problem with the design, but I'd have to see a print of this
NEW mother board that allows adding an additional external SCSI bus to its
internal LOOP. 
        The 54-24703 is not up on any system for me to get the prints yet.

        Later
                John  
6592.14. "A (late) Update" by FREEBE::REED () Mon Apr 28 1997 14:16 (32 lines)
    Referring to .11 from Roger, YES, we are very close to the ragged edge
    on SCSI bus length. We've known this all along, and have been poring
    over the documentation to find some evidence that this configuration is
    unsupported, but to date with no success. The 17-04440-01 8-drop SCSI Cable
    inside the system box is 2 feet, 10 inches long- too close to call. Perhaps
    the combination of cable types and different connectors within the bus is
    the problem. I always thought that the SCSI bus length restrictions
    were "worst-case", but maybe we've discovered an even WORSE case.
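
    A rough tally of the lengths quoted so far, as a sketch only. It assumes
    the 1 m external cable and 0.9 m BA356 figures from .11, the 2 ft 10 in
    internal cable measured above, and a 3.0 m limit inferred from the numbers
    in .11 (note .4 quotes the HSD10 limit as 10 feet, which is about 3.05 m).
    Etch runs, connectors, and the HSD inter-card ribbon are not counted, so
    the real figure is somewhat higher.

```python
# Rough SCSI bus length budget using figures quoted in this topic.
# The 3.0 m limit is an assumption inferred from .11 (1 m + 0.9 m + 1.1 m).

INCH_TO_M = 0.0254


def feet_inches_to_m(feet, inches=0):
    """Convert a feet-and-inches length to metres."""
    return (feet * 12 + inches) * INCH_TO_M


SEGMENTS_M = {
    "external BN21H-01 cable": 1.0,                           # from .5 and .11
    "BA356 shelf": 0.9,                                       # from .11
    "internal 17-04440-01 cable": feet_inches_to_m(2, 10),    # measured above
}


def budget(segments, limit_m=3.0):
    """Return (total length, remaining margin) in metres."""
    total = sum(segments.values())
    return total, limit_m - total


if __name__ == "__main__":
    total, margin = budget(SEGMENTS_M)
    print(f"total {total:.2f} m, margin {margin:.2f} m")
```

    With these three segments the total comes to roughly 2.76 m, leaving
    under a quarter of a metre for everything that was not measured, which is
    consistent with "too close to call".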
    
    I have been corresponding off-line with John Kowall about this problem
    and putting out other little fires as well, so my apologies for not
    updating this Notesfile. We did take the IPMT route effective Friday, so
    we now have spun off a separate issue: what the company will do to help
    prevent this configuration from zapping someone else, as opposed to
    what the local FE can do to fix the current customer. Ernie Lyford, who
    is the Technical Manager (correct title?) for the 4000-108, was the one
    who told us to start the IPMT process. (This is the first one I've been
    involved with.) He also suggested what sounds like the most elegant fix
    for the current problem: another HSD10-AA for the top shelf of the BA356
    and a DSSI cable to connect it to the KFDDA in the system cabinet. Then we'd
    terminate the SCSI bus which used to go to the BA356 Personality Card.
    The personality card would stay in the BA356, with no connections, if I
    understand the HSD10 User Guide correctly. The 8-bit pers card would
    serve to terminate the end of the new SCSI, while the active
    termination on the HSD10-AA would take care of the beginning.
    
    As an aside, the KFDDA in the system came with no terminating resistor
    SIP's, and a DSSI terminator in its external connector. Is this a
    choice, or a requirement? The spare KFDDA I obtained had three terminating
    resistors installed, and I was wondering when they might be used.
    
    CR
6592.15. "Good boy Chris." by SUBSYS::VIDIOT::PATENAUDE (Ask your boss for ARRAY's...) Mon Apr 28 1997 16:01 (5 lines)
Now you're on the right track. Ernie is the man for that system. I did not want to
point him out directly, but as you did,,,,,


6592.16. "Separate HSD for BA356 = OK!" by DABEAN::REED () Tue Apr 29 1997 17:32 (50 lines)
    REFERENCE 6592.14: it seems that the terminators on the KFDDA module are 
    necessary, and the system should have been shipped that way. I'll
    confirm tomorrow, but I'm seeing DSSI bus inits when I run the "MONITOR
    DSSI" utility from the serial port of the HSD10. They occur from 30 sec
    to 2 minutes apart while a backup is running. I hadn't noticed it
    before, because system error logger/ SHO ERR are both quiet. 
    
    Other info gathered today: I installed an HSD10-AA in slot 0 of the
    BA356 and cabled it up to the DSSI port on the back of the KFDDA. Then
    I terminated the external (SCSI) port on the internal HSD10-EB and
    created some disks and mapped them to "DIA2" through "DIA5". I deleted
    the old same-numbered units on the HSD10-EB. The new
    HSD10 was renamed "HSD11" and switches were set for DSSI Node 1. SET
    PORT 0 /ENABLE and away we go. Remember, 2 drives are internal and 
    connected to the old HSD10-EB, while I had 4 in the Stgwks box now
    connected via the external port of the existing KFDDA. As mentioned
    above, the KFDDA was unterminated because it came that way. I also had
    a DSSI terminator on the unused leg of the HSD10-AA's tri-link. Everything
    worked fine! 12 minutes per BACKUP/IMAGE, with no difference between
    internal-to-internal, internal-to-external, or all external. So there
    is a viable configuration to give/sell to the customer, expandable both
    within and outside the system box. HOWEVER: since I did not have a
    narrow terminator for the backplane of the BA356, I had to use the
    8-Bit Personality Card with no cable connections, and in this
    configuration I got fault lights (but no other symptoms) on all four
    external drives. This is with both SHELF_OK jumpers OFF. I could make
    the fault lights go out by installing a SCSI terminator on JA1 (the
    input side) of the Personality Card. Whether it still functioned this
    way I don't know, because I needed the terminator on the external port
    of the HSD10-EB for any tests under VMS. Maybe tomorrow.
    
    The HSD MONITOR DSSI Utility shows some peculiarities. It put up
    statistics on DSSI nodes which were not configured, and vice versa.
    I saw numbers on Node 6 when I had Nodes 0,1, and 7 on the bus- maybe
    this has something to do with the spare HSD10-AA having firmware rev
    B259 versus the current B475 as in the HSD10-EB. After coming back to
    the office, I conducted some tests with a VT terminal and an empty
    BA350 just to make sure the HSD switch pack settings were correctly
    documented. They were, at least according to the User Guide packed with
    the spare HSD10. 
    
    I'll go through the Web tonight and see if I can find the source for
    the B475 firmware. I want to update the spare anyway, since I've never
    done one. We found B259 all over the place this afternoon but couldn't
    find the pointer to B475.
    
    More science tomorrow if my pager stays quiet. Dave Yatkola is busy
    elsewhere.
                                                               
    However, 
6592.17. "And more to come" by SSDEVO::KOWALL (StorageWorks Engineering Support) Wed Apr 30 1997 17:53 (83 lines)
                On going Saga
****
You did not say who is issuing the bus resets. If it is the host then that
number of resets during backup is OK. You should not be seeing any resets
issued by the HSD10 as normal.
******

                
    REFERENCE 6592.14: it seems that the terminators on the KFDDA module are 
    necessary, and the system should have been shipped that way. I'll
    confirm tomorrow, but I'm seeing DSSI bus inits when I run the "MONITOR
    DSSI" utility from the serial port of the HSD10. They occur from 30 sec
    to 2 minutes apart while a backup is running. I hadn't noticed it
    before, because system error logger/ SHO ERR are both quiet. 
    
*******
The DSSI terminator on the unused leg of the HSD10-AA's tri-link is correct.
This is the end of the DSSI bus.

I have a note in the HSD05/10 notes file that explains the HSD10-AA fault
lights. Basically the fault lights on an HSD10 are meaningless, and the customer
needs to buy the terminator if he can't stand to see them lit.

By putting the terminator in JA1 you have double terminated the SCSI bus
input. Not recommended. The correct terminator goes at the end of the bus (JB1)
to short the LEDs to ground.  
*****************
    Other info gathered today: I installed an HSD10-AA in slot 0 of the
    BA356 and cabled it up to the DSSI port on the back of the KFDDA. Then
    I terminated the external (SCSI) port on the internal HSD10-EB and
    created some disks and mapped them to "DIA2" through "DIA5". I deleted
    the old same-numbered units on the HSD10-EB. The new
    HSD10 was renamed "HSD11" and switches were set for DSSI Node 1. SET
    PORT 0 /ENABLE and away we go. Remember, 2 drives are internal and 
    connected to the old HSD10-EB, while I had 4 in the Stgwks box now
    connected via the external port of the existing KFDDA. As mentioned
    above, the KFDDA was unterminated because it came that way. I also had
    a DSSI terminator on the unused leg of the HSD10-AA's tri-link. Everything
    worked fine! 12 minutes per BACKUP/IMAGE, with no difference between
    internal-to-internal, internal-to-external, or all external. So there
    is a viable configuration to give/sell to the customer, expandable both
    within and outside the system box. HOWEVER: since I did not have a
    narrow terminator for the backplane of the BA356, I had to use the
    8-Bit Personality Card with no cable connections, and in this
    configuration I got fault lights (but no other symptoms) on all four
    external drives. This is with both SHELF_OK jumpers OFF. I could make
    the fault lights go out by installing a SCSI terminator on JA1 (the
    input side) of the Personality Card. Whether it still functioned this
    way I don't know, because I needed the terminator on the external port
    of the HSD10-EB for any tests under VMS. Maybe tomorrow.
***************
No, HSD10s do not see information from another strange host just because of the
two different software revs. The software revs only have to be the same if
you are running dual redundant. I would suspect a bus problem if the HSD10 is
seeing data from a port that does not exist.
*************** 
    
    The HSD MONITOR DSSI Utility shows some peculiarities. It put up
    statistics on DSSI nodes which were not configured, and vice versa.
    I saw numbers on Node 6 when I had Nodes 0,1, and 7 on the bus- maybe
    this has something to do with the spare HSD10-AA having firmware rev
    B259 versus the current B475 as in the HSD10-EB. After coming back to
    the office, I conducted some tests with a VT terminal and an empty
    BA350 just to make sure the HSD switch pack settings were correctly
    documented. They were, at least according to the User Guide packed with
    the spare HSD10. 
*************
Customers who want to upgrade their software because they would like to
use RAID 1 must order the upgrade from SSB. It's free. For this site you
don't really need the upgrade for what you are doing. Don't waste your time
upgrading for the troubleshooting you are now doing. The two HSDs will work
fine together.
                Regards

                                John Kowall

************   
    
    I'll go through the Web tonight and see if I can find the source for
    the B475 firmware. I want to update the spare anyway, since I've never
    done one. We found B259 all over the place this afternoon but couldn't
    find the pointer to B475.
    
6592.18. "More Answers, Fewer Questions" by DABEAN::REED () Thu May 01 1997 07:29
    With reference to the last few replies:
    
    __ I learned yesterday that by adding Single-Inline-Package terminator
    resistors to the KFDDA I made the HSD-initiated DSSI Bus Resets go
    away. The 60 or 70 per hour HOST-initiated resets were worrying me a
    little until John's previous reply.
    
    __ The continuous fault lights are on the DRIVES in the BA356. The
    HSD's show normal on/blink green LED's. My previous statement that
    adding a SCSI terminator to JA1 of the 8-bit Pers Card made the fault
    lights go out was a lie. They went out for me once but I haven't been
    able to duplicate the condition. (Perhaps if I had a proper 8-Bit
    backplane terminator for
    the BA356... our "shop dog" BA356 uses an 8-bit personality card for
    I/O, so it has no terminator either.) I have tried all combinations of
    SHELF_OK jumpers at the 4-pin field near Slot 1 on the backplane. How
    do these work with/against the jumpers on the cabinet jumper module at
    Slot 5? I forgot to check them when I was there.
    
    __ MONITOR DSSI on the HSD10s' serial ports still shows a flurry of
    activity on Node 6 (successful TX's and RX's) during bootup. At the
    halt prompt, the VAX shows DSSI Nodes at *6*, 0, and 1. By zeroing the
    counters on the HSD10 MONITOR Utility, I was able to determine that
    there was never any activity on the phony "Node 6" after bootup. I
    suspect that the SYSTEM console firmware, which is NOT up to rev, is
    messing with the imbedded DSSI controller's Node number during init
    time. I am no longer worried about this, having verified that both
    HSD10 Monitors agree with each other (interestingly, they don't see
    their own TX's and RX's but the numbers match up with the counts on the
    other HSD. Each node's "perception" of system DSSI adapter activity is
    with reference to itself, not global. This is my inference.)
    
    All further concerns for this customer are administrative: i.e., who
    buys what additional hardware for him to make a stable system with the
    growth potential he needs.
    
    If only I can make those Drive fault lights go out... I'll get a narrow
    terminator for the BA356 and await the decisions from the IPMT process
    and various customer reps and managers before I go back there to do the 
    final visit. (you STILL have to pull the system box out on its shelf
    and squash the enclosure with one hand while pressing the ON/OFF switch
    to get it to cycle, and the Memory SIMM carrier STILL works out of its
    seat every time you take the top/right side cover off. I'm not too
    crazy about this system enclosure.)
    
    Thanks to all who supplied info on this matter! I'll do one final
    wrap-up on it when it is resolved administratively.
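
    As a quick sanity check on the reset figures quoted in this thread
    (bus inits 30 sec to 2 minutes apart during a backup, and 60 or 70
    host-initiated resets per hour), here is a minimal arithmetic sketch.
    It assumes both figures describe the same kind of event, which the
    notes do not confirm; the function names are made up for illustration.

```python
# Convert between "seconds between resets" and "resets per hour".
# Hypothetical helper names; the only inputs are the figures quoted above.

SECONDS_PER_HOUR = 3600


def resets_per_hour(interval_seconds: float) -> float:
    """Rate implied by a fixed spacing between resets."""
    return SECONDS_PER_HOUR / interval_seconds


def mean_interval_seconds(per_hour: float) -> float:
    """Average spacing implied by an hourly reset count."""
    return SECONDS_PER_HOUR / per_hour


if __name__ == "__main__":
    # 30 sec to 2 minutes apart brackets the hourly rate.
    fastest = resets_per_hour(30)
    slowest = resets_per_hour(120)
    print(f"observed spacing implies {slowest:.0f} to {fastest:.0f} resets/hour")

    # 60 or 70 per hour, expressed as an average spacing.
    for count in (60, 70):
        print(f"{count}/hour is one reset every {mean_interval_seconds(count):.1f} s")
```

    Under that assumption the hourly count falls inside the range implied
    by the observed spacing, so the two observations do not contradict
    each other.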
6592.19. "LEDS :-(" by SSDEVO::KOWALL (StorageWorks Engineering Support) Thu May 01 1997 09:31
Chris....
        You missed what I said about the fault lights yesterday completely.
    Engineering knows the fault lights are intermittent with the HSD10-AA.
    There is no control over the LEDs. Putting a terminator on the rear of
    the BA356 will not short the fault bus to ground. Putting the correct
    terminator at JB1, as explained in a note in the HSD05/10 notes file, will
    short to ground the false signal that is generated by the powering up of
    the shelf. Once again, forget the fault LEDs and fix what you can fix.

    Regards
                John

6592.20. "" by HOUNDD::BASSETT (Bill) Thu May 01 1997 12:21
    re: last few....
    
    HUH? - putting an 8-bit terminator (BA35X-MB) on the back of your BA356
    won't do ANYTHING to the fault bus lines that the BA35X-MG personality
    card isn't doing.  It will ground your upper 8 data lines and parity, 
    that's it.  (and if you have wide drives on the bus, you'll disable
    these drives!)  Do what John suggests to eliminate the noise.
    
    Shelf_OK is NOT linked on the backplane to Fault_Bus.  The jumper (2x2) 
    header simply lets you pick the routing of the shelf_OK signal either 
    to the personality card or slot 0 (or to both when 2 jumpers are used.)
    Fault_bus is routed directly to all slots and the personality card slot.
    
    				Bill
6592.21. "Use tape" by SSDEVO::KOWALL (StorageWorks Engineering Support) Thu May 01 1997 14:46
BILL,

        You have to remove the shelf OK jumper, as Chris did, so that you don't
ground the signal in the power supply and turn off its power supply OK LED.

        So JUST put some dam green tape over the Amber LED and forget it. :-)

        John
6592.22. "" by HOUNDD::BASSETT (Bill) Thu May 01 1997 17:40
    John, 
    
     So the terminator you are using to fix the fault light is a "normal" 
    scsi terminator - as in it grounds ALL non-signal, non-control lines?  
    So since Fault_bus and Shelf_OK are not part of the SCSI-2 spec as 
    signal or control lines, they are grounded?  
    
     I take it the HSD10 does not support Shelf_OK?
    
    				Bill
6592.23. "" by SUBSYS::VIDIOT::PATENAUDE (Ask your boss for ARRAY's...) Fri May 02 1997 00:46
    
    Sounds like a good use for the grey part number stickers on the front
    bezel. Strip them off the left and stick them over the right and over
    the fault light. ;^)
    
    roger.
    
6592.24. "Lights OUT" by DABEAN::REED () Fri May 02 1997 07:38
    With a pointer from John Kowall, I found the answer to this last
    remaining problem with the "elegant" solution for my customer. 
    SSDEVO::HSD05_PRODUCT Note 201.x describes my problem and its solution.
    I am ordering a 12-44025-01 8-bit terminator to put on JB1 of the
    Narrow Personality Module. Now I have enough stuff in my loan file to
    cover ANY possible solution agreed upon by CUS/VAR/DEC managers.
    CR
6592.25. "Stock=0" by DABEAN::REED () Fri May 02 1997 07:42
    "None out there" per local Logistics Coordinator; went ahead
    and ordered it anyway. Hope it's not a Virtual Part.
6592.26. "Waiting for parts" by SSDEVO::KOWALL (StorageWorks Engineering Support) Fri May 02 1997 11:30
                                  -< Stock=0 >-

        Yes, our wonderful logistics dept will not stock or save anything that
does not have a crisp turnover of inventory. I don't know how you guys do it!

        Many times they go to the vendor and buy "one" after you order it. 

        
        Oh, I couldn't resist

                                jk