
Conference spezko::cluster

Title:+ OpenVMS Clusters - The best clusters in the world! +
Notice:This conference is COMPANY CONFIDENTIAL. See #1.3
Moderator:PROXY::MOORE
Created:Fri Aug 26 1988
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:5320
Total number of notes:23384

5220.0. "Question about Host Clear" by TKTVFS::AEBA (_Mamoru Aeba MCS/CHIBA/JAPAN) Wed Jan 29 1997 06:57

 Hello.

 I have a question about host clear.

 I think the host clear sequence is:

	1. The HOST (VAX, Alpha) cannot talk to an HSC over the CI bus.
	2. A timeout occurs.
	3. If the HOST still cannot talk to the HSC, the HOST resets the HSC.

 How many times (or how long) does the HOST wait for the HSC before it
 resets the HSC? What is the timeout value?

 Regards.

 Mamoru /MCS Japan

T.R  Title  User  Personal Name  Date  Lines
5220.1  check the errorlog  VMSSG::JENKINS  Kevin M Jenkins VMS Support Engineering  Wed Jan 29 1997 10:45  40
    
    Well that is sort of a tough question... I'll make some assumptions.

    We'll assume the Host Clear is a result of the DUDRIVER timing out
    an IO...

    Check the errorlog for MSCP Command Timeouts.

    What happens is that there is a timeout interval specified by the
    controller... for an HSJ it's 200 seconds. There is a timer routine
    that wakes up every 200 seconds and checks the CDDB for the HSJ. If
    it finds there is an IO outstanding, it issues a GET COMMAND STATUS
    to the controller and expects the controller to respond with a command
    status indicating it's making progress... if it does, the timer
    routine is happy. If it doesn't, the driver will issue a host clear.
    If there is no IO on the CDDB, we issue a Get Unit Status to
    unit zero... all we want to do is let the controller know we still
    exist.
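
    The decision the timer routine makes on each wakeup can be sketched
    roughly as below. This is only an illustration of the description
    above, not DUDRIVER code: the Cddb class, the timer_wakeup function,
    and the choice of the first queued IO as the one to ask about are all
    assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Cddb:
    """Hypothetical stand-in for the per-controller data block (CDDB)."""
    timeout_seconds: int = 200                 # controller-specified; 200 s for an HSJ
    outstanding_io: list = field(default_factory=list)


def timer_wakeup(cddb, controller_reports_progress):
    """Return (probe sent, outcome) for one wakeup of the timer routine."""
    if cddb.outstanding_io:
        # An IO is outstanding: ask the controller how it is getting on.
        # Assumption: we ask about the first queued IO.
        probe = ("GET COMMAND STATUS", cddb.outstanding_io[0])
        outcome = "ok" if controller_reports_progress else "HOST CLEAR"
    else:
        # Nothing outstanding: just remind the controller that we exist.
        probe = ("GET UNIT STATUS", 0)
        outcome = "ok"
    return probe, outcome


busy = Cddb(outstanding_io=["read #17"])
print(timer_wakeup(busy, controller_reports_progress=True))
print(timer_wakeup(busy, controller_reports_progress=False))
print(timer_wakeup(Cddb(), controller_reports_progress=False))
```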

    While the timer routine is doing this it sets a flag called IMPEND.
    If it finds this flag still set when it next starts, we issue
    a host clear and log an MSCP Immediate Mode command timeout... this
    is an indication that the controller has simply stopped talking to us.
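
    A minimal sketch of the IMPEND check, under the same caveat: the
    TimerState class and its method names are hypothetical, and clearing
    the flag after a host clear is an assumption made so the example can
    keep running. Only the flag name and the errorlog text come from the
    description above.

```python
class TimerState:
    """Hypothetical model of the IMPEND flag kept by the timer routine."""

    def __init__(self):
        self.impend = False   # set while a timer-issued command awaits its reply
        self.errorlog = []

    def wakeup(self):
        """Called once per timeout interval."""
        if self.impend:
            # The command sent on the previous wakeup was never answered.
            self.errorlog.append("MSCP Immediate Mode command timeout")
            self.impend = False   # assumption: start fresh after the host clear
            return "HOST CLEAR"
        self.impend = True        # command issued, reply pending
        return "PROBE SENT"

    def reply_received(self):
        """Called when the controller answers the timer-issued command."""
        self.impend = False


hung = TimerState()
print(hung.wakeup())    # first wakeup: command sent, IMPEND set
print(hung.wakeup())    # no reply in between: host clear
print(hung.errorlog)
```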


    Now, get the latest DUDRIVER kit for your VMS version and
    CPU type (VAX or Alpha). That DUDRIVER has enhanced error logging
    for the Command Timeout: we log the actual command that
    has timed out, which can often be helpful in figuring out what
    is going on...

    Also, if you already have the latest code... its timer code
    is designed to recognize an HSJ/HSD that is lying about making
    progress on a command and shoot it anyway... this situation used to
    result in a hang.

    So you need to check the version of your DUDRIVER and check your
    errorlog for evidence of why the host clear was issued.

    Kevin
5220.2  Thanks  KEIKI::WHITE  MIN(2¢,FWIW)  Wed Jan 29 1997 13:42  9
    
    	Kevin,
    
    	How many versions of OpenVMS did they retrofit these DUDRIVER
    changes into? Also, I have seen a lot of host clears from tape
    drivers to HSCs. Are the timeouts the same, or am I mistaken? Do
    we even issue host clears from the magtape drivers to HSJs/HSCs?
    
    					Bill
5220.3  (no title)  TKTVFS::AEBA  _Mamoru Aeba MCS/CHIBA/JAPAN  Thu Jan 30 1997 02:46  17
  Hello, Kevin.
  Thanks for your comment.

  Here is how I understood it. Is this correct?

	1. DUDRIVER's timer routine wakes up every 200 seconds (for HSJs).
	2. It checks the CDDB for the HSJ.
	3. If a command is outstanding, it issues "GET COMMAND
	   STATUS".
	4. If the HSJ does not respond, DUDRIVER sets the IMPEND flag.
	5. When the timer routine next wakes up, if the IMPEND flag is still
	   set, it issues a controller reset.
	       
  Thanks.

  /Mamoru
  
5220.4  (no title)  VMSSG::JENKINS  Kevin M Jenkins VMS Support Engineering  Fri Jan 31 1997 07:49  34
    
    .2
    	We backported the DUDRIVER changes through V5.5-2.
    
    The TUDRIVER has pretty much the same timer code, but we didn't
    do anything with it.
    
    .3
    

>	1. DUDRIVER's timer routine wakes up every 200 seconds (for HSJs).

    	Correct, though the actual timer value is controller specific.
    
>	2. It checks the CDDB for the HSJ.
>	3. If a command is outstanding, it issues "GET COMMAND
>	   STATUS".

    	Correct.
    
>	4. If the HSJ does not respond, DUDRIVER sets the IMPEND flag.
    
	Sort of... while the GET COMMAND STATUS is outstanding the IMPEND
    	bit remains set, so if it is still set 200 seconds later, the
    	controller is assumed to be hung and we shoot it. We also
    	shoot it if we don't like the command progress status that is
    	returned for the GET COMMAND STATUS.
    
>	5. When the timer routine next wakes up, if the IMPEND flag is still
>	   set, it issues a controller reset.
 
 	Right.
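
    On the original "how long" question, the steps above imply some
    back-of-the-envelope arithmetic. The sketch below is only a simplified
    model: it assumes wakeups land exactly on multiples of the interval,
    that a healthy controller answers instantly, and that the controller
    goes completely silent at hang_time. The host_clear_time function is
    hypothetical. Under those assumptions the host clear comes between one
    and two timeout intervals after the controller stops talking.

```python
import math


def host_clear_time(hang_time, interval=200):
    """Time (seconds on the timer's clock) at which the host clear is issued.

    Wakeups are assumed to occur at 0, interval, 2*interval, ...
    The controller is assumed to stop responding at hang_time.
    """
    # The first wakeup after the hang sends a command and sets IMPEND.
    first_wakeup_after_hang = (math.floor(hang_time / interval) + 1) * interval
    # One interval later IMPEND is still set, so the host clear goes out.
    return first_wakeup_after_hang + interval


for hang in (201, 250, 399):
    cleared = host_clear_time(hang)
    print(f"hang at {hang}s, host clear at {cleared}s, waited {cleared - hang}s")
```

    With a 200 second interval that is somewhere above 200 and up to 400
    seconds, depending on where in the timer cycle the controller hangs.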
    
Kevin    	
5220.5  Thanks.  TKTVFS::AEBA  _Mamoru Aeba MCS/CHIBA/JAPAN  Sun Feb 02 1997 03:06  0