
Conference spezko::cluster

Title: + OpenVMS Clusters - The best clusters in the world! +
Notice: This conference is COMPANY CONFIDENTIAL. See #1.3
Moderator: PROXY::MOORE
Created: Fri Aug 26 1988
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 5320
Total number of notes: 23384

5202.0. "cluster transition .vs. optical jukebox" by TAPE::SENEKER (Head banging causes brain mush) Thu Jan 09 1997 10:15

T.R. "Title" by User (Personal Name) Date (Lines)

5202.1. "MVerify strikes again" by helena.zko.dec.com::EVERHART Thu Jan 09 1997 13:56 (8 lines)
5202.2. "can you have your cake and it it too..." by TAPE::SENEKER (Head banging causes brain mush) Thu Jan 09 1997 17:49 (12 lines)
5202.3. by UTRTSC::jvdbu.uto.dec.com::JurVanDerBurg (Change mode to Panic!) Fri Jan 10 1997 01:36 (14 lines)
5202.4. "Here's another hint...." by STAR::CROLL Fri Jan 10 1997 10:27 (14 lines)
5202.5. "looking for "less obvious" stuff" by TAPE::SENEKER (Head banging causes brain mush) Fri Jan 10 1997 11:45 (37 lines)
5202.6. by UTRTSC::thecat.uto.dec.com::JurVanDerBurg (Change mode to Panic!) Mon Jan 13 1997 02:17 (33 lines)
5202.7. by STAR::CROLL Mon Jan 13 1997 11:08 (17 lines)
5202.8. "possible code changes" by TAPE::SENEKER (Head banging causes brain mush) Tue Feb 04 1997 10:54 (45 lines)

     tstl    g^scs$gl_mscp                   ; MSCP server running?
     MOVL    G^SCS$GL_MSCP_NEWDEV,R2         ; Get address of MSCP routine

     Will both of the above SCS addresses always be valid, or will
     both be zero?  Will there be a case where one has a value and
     the other doesn't?

-------------------------------------------------------------------------------
    My pseudo-driver sets the _CLU bit via a DPT_STORE and two code
    segments.  Based on the previous replies, I think the following code
    segment should be modified.  Please check and comment.
    
-------------------------------------------------------------------------------
     MOVB    #DT$_GENERIC_DK, -              ; Set device type to GENERIC_DK
             UCB$B_DEVTYPE(R5)               ;  to allow serving
     BISL2   #DEV$M_CLU,-                    ; Set device shareable
             UCB$L_DEVCHAR2(R5)              ;  cluster-wide
     MOVL    G^SCS$GL_MSCP_NEWDEV,R2         ; Get address of MSCP routine
     BGEQ    60$                             ; Branch if not available
     JSB     (R2)                            ; Make this unit available to
                                             ;  be served
60$: movl    ucb$l_vcb(r5),r0                ; Get VCB address
-------------------------------------------------------------------------------
	to:
-------------------------------------------------------------------------------
     MOVL    G^SCS$GL_MSCP_NEWDEV,R2         ; Get address of MSCP routine
     BGEQ    60$                             ; Branch if not available
     MOVB    #DT$_GENERIC_DK, -              ; Set device type to GENERIC_DK
             UCB$B_DEVTYPE(R5)               ;  to allow serving
     BISL2   #DEV$M_CLU,-                    ; Set device shareable
             UCB$L_DEVCHAR2(R5)              ;  cluster-wide
     JSB     (R2)                            ; Make this unit available to
                                             ;  be served
60$: movl    ucb$l_vcb(r5),r0
-------------------------------------------------------------------------------

     Also, the setting of the _CLU bit in the DPT_STORE should be removed.
     Do you agree?
-------------------------------------------------------------------------------

     Thanks for taking the time to investigate information related to
     this problem.

     Rob

5202.9. by UTRTSC::jvdbu.uto.dec.com::JurVanDerBurg (Change mode to Panic!) Wed Feb 05 1997 02:54 (40 lines)

>     tstl    g^scs$gl_mscp                   ; MSCP server running?
>     MOVL    G^SCS$GL_MSCP_NEWDEV,R2         ; Get address of MSCP routine
>
>     Will both of the above SCS addresses always be valid, or will
>     both be zero?  Will there be a case where one has a value and
>     the other doesn't?

In the normal case they are either both zero or both non-zero.

>     MOVL    G^SCS$GL_MSCP_NEWDEV,R2         ; Get address of MSCP routine
>     BGEQ    60$                             ; Branch if not available
>     MOVB    #DT$_GENERIC_DK, -              ; Set device type to GENERIC_DK
>             UCB$B_DEVTYPE(R5)               ;  to allow serving
>     BISL2   #DEV$M_CLU,-                    ; Set device shareable
>             UCB$L_DEVCHAR2(R5)              ;  cluster-wide
>     JSB     (R2)                            ; Make this unit available to
>                                             ;  be served
>60$: movl    ucb$l_vcb(r5),r0

If the only objective is to enable MSCP serving of your devices, then this
should do it:

      TSTL    G^SCS$GL_MSCP                   ; MSCP server loaded?
      BGEQ    60$                             ; Branch if not available
      BISL2   #DEV$M_CLU,-                    ; Set device shareable
              UCB$L_DEVCHAR2(R5)              ;  cluster-wide
60$:

That's all. The routine pointed at by SCS$GL_MSCP_NEWDEV only checks whether
a device is MSCP-served. It returns a status and otherwise does nothing.
The real enabling of MSCP serving is done via the SCS process poller, which
notifies the CONFIGURE process, which then enables serving of your device.

>     Also, the setting of the _CLU bit in the DPT_STORE should be removed.
>     Do you agree?

Yes, get rid of it.

Jur.

5202.10. "let configure do it...." by STAR::CROLL Wed Feb 05 1997 09:53 (41 lines)

SCS$GL_MSCP is the global cell used by everybody to determine if the MSCP server
is running.  It's either zero, or the base address of the base server data
structure.  It's explicitly initialized to zero by system initialization, and
set to non-zero by successful completion of the MSCP server initialization.

There are a few cases where SCS$GL_MSCP_NEWDEV is used to determine if the
server is running (PEDRIVER uses it) -- this cell is also explicitly initialized
to zero when the system is initialized and set to the address of the "is this
device served" routine during MSCP server initialization.

This routine returns a "yes, I'm serving it" or a "no, never heard of it"
indication.  It's used by PEDRIVER, to determine if it should respond to a
"Solicit Service" message from a satellite system attempting to do MOP booting.

I would do what Jur recommended in .9, except I'd make the branch a BEQL (and
the BGEQ should be BGEQU, anyway...) The CONFIGURE process periodically scans
the I/O database looking for new gizmos to serve, and when one is found, it
notifies the MSCP server and sets everything up properly.  I recommend this
'cause it'll be a bit less work for you when QIOserver comes along.
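
The difference between the two branch tests can be illustrated with a small Python model (not driver code). It is only a sketch, and it assumes the usual convention that system-space addresses have the high bit set, so a non-zero SCS$GL_MSCP looks negative when treated as a signed longword. The function names and sample values are made up for illustration.

```python
def as_signed_longword(value):
    """Interpret a 32-bit value the way TSTL sets the N condition code."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value

def bgeq_taken(cell):
    # BGEQ branches when N is clear, i.e. the signed value is >= 0
    return as_signed_longword(cell) >= 0

def beql_taken(cell):
    # BEQL branches when Z is set, i.e. the value is exactly zero
    return (cell & 0xFFFFFFFF) == 0

# zero cell, a hypothetical system-space address, a hypothetical low address
for cell in (0x00000000, 0x80A41200, 0x00041200):
    print(f"{cell:08X}  BGEQ skips: {bgeq_taken(cell)}  BEQL skips: {beql_taken(cell)}")
```

Both tests agree for a zero cell and for a system-space address. They differ only for a non-zero value with the high bit clear, which is why BEQL states the intent ("the cell is zero") more directly.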

The other alternative you have is to use the SCS$DISK_MSCP_NEWDEV routine.
This is a routine in the MSCP server that's called by DUDRIVER (see
[DRIVER]DUTUSUBS\DUTU$LINK_NEW_UCB) when a new device is found and it should be
served.  (You'd think that SCS$GL_MSCP_NEWDEV and SCS$DISK_MSCP_NEWDEV did
pretty much the same thing, wouldn't you?  It's clear they were added at
different times by different people -- they do two totally different things,
and, incidentally, use two completely different mechanisms for how they're
initialized and set up.  Standards are only any good when people know they
exist....)

Note that DKDRIVER doesn't call SCS$DISK_MSCP_NEWDEV.  And it's not documented,
and therefore isn't part of the standard driver interface we're all sworn to
uphold.

The SCS$DISK_MSCP_NEWDEV routine does essentially what CONFIGURE does, except
there's no wait for CONFIGURE's polling loop.  However, I'm going to take this
call out of DUDRIVER when QIOserver comes along, forcing all served device
discovery to take place in CONFIGURE, so I can apply cluster-wide configuration
rules to served devices.

John

5202.11. "Where do IO$_NOP's get issued?" by TAPE::SENEKER (Head banging causes brain mush) Mon Feb 24 1997 19:01 (31 lines)

Based on this note stream's input, and since most drivers didn't do any
of this, I just took all the code segments, including the _CLU setting
and JSB SCS$GL_MSCP_NEWDEV, out altogether.

Testing shows that the customer's problem is corrected but....

(What is the Alpha DRDRIVER doing?  It has some _CLU and MSCP_NEWDEV stuff.)

    and
    
It appears that there is an "undesirable feature" in VMS when MSCP_LOAD
is set to 1.

It appears that "available to cluster via MSCP Server" disk devices
(on Alphas) are sent IO$_NOP requests when a node joins the cluster.
While these extra I/O operations have very little impact on overall
performance for a magnetic disk, they have a large impact on the performance
of an OSMS JB type device.  That is, they force every mounted volume to be
placed in a physical drive just to service the IO$_NOP request.  This means
no useful work gets done, but some large jukeboxes swap platters for hours.

I can code around this, but is there a way to prevent these I/O requests
from being generated?

Does an IO$_NOP ring a bell with anybody in this type of situation?

Also remember, the customer said the jukebox did not have this problem when
it was connected to the VAX.  Could some piece of software be ohhhh soooooo
different on Alpha and be queuing these IO$_NOP's?

Rob

5202.12. "DRDRIVER sets _CLU and calls SCS$DISK_MSCP_NEWDEV" by STAR::CROLL Tue Feb 25 1997 10:24 (10 lines)

I just checked the V7.1 listings for DRDRIVER, and it sets the _CLU bit and
calls SCS$DISK_MSCP_NEWDEV in two different places, both in situations where
a new unit is being set up.

I haven't any idea about IO$_NOP, but I'll poke around some.  Are you sure it's
IO$_NOP and not IO$_PACKACK?  I'm pretty sure PACKACKs are sent by DUDRIVER when
it discovers a new unit, regardless of the state of that unit -- it's part of
getting the unit set up.  I don't know about IO$_NOP.  But, I'll poke around...

John

5202.13. by STAR::CROLL Tue Feb 25 1997 10:49 (23 lines)

Here's a bit more info about IO$_NOP.

DUDRIVER turns IO$_NOP requests into MSCP Set Unit Characteristics commands. 
This probably isn't relevant to your situation.

This part probably is.  If the MSCP server has a unit ONLINE to a host, and some
other host tries to put the unit online (as DUDRIVER in a new node booting into
the cluster and discovering units will attempt to do), the MSCP server issues an
IO$_NOP to the unit.  (See the code starting at label ONLINE: in MSCP.LIS).  The
comments say that the IO$_NOP is issued to set device characteristics in the
local UCB -- DUDRIVER turns it into a Set Unit Characteristics, which is what
accomplishes this.  This is skipped for non-MSCP devices.

So, what's going on appears to be this.  The unit is ONLINE to some client. 
Another client boots, discovers the unit, issues an ONLINE.  The MSCP server
sees that the unit is ONLINE already, and issues an IO$_NOP to the unit to
ensure the unit characteristics are up to date in the server's UCB.  The
MSCP server then copies the characteristics into the MSCP end message and
returns this to the new client.  This apparently saves a PACKACK and some
related processing.  In your case, however, it appears to force real I/O to the
device, thrashing the jukebox as a result.
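
The sequence described above can be sketched as a toy Python model. This is only an illustration of the described behaviour, not the MSCP server's actual logic: the class names, node names, and JKA unit names are all hypothetical, and the "naive" unit simply counts how often a platter would have to be fetched.

```python
IO_NOP = "IO$_NOP"  # symbolic stand-in, not the real function code value

class NaiveJukeboxUnit:
    """Pseudo-volume whose driver loads the platter for every request."""
    def __init__(self, name):
        self.name = name
        self.platter_loads = 0

    def start_io(self, func):
        self.platter_loads += 1      # volume is fetched before any request
        return "SS$_NORMAL"

class MscpServerModel:
    def __init__(self):
        self.online_to = {}          # unit -> set of client hosts

    def online(self, host, unit):
        hosts = self.online_to.setdefault(unit, set())
        if hosts:                    # already ONLINE to some other client
            unit.start_io(IO_NOP)    # refresh characteristics in server's UCB
        hosts.add(host)

units = [NaiveJukeboxUnit(f"JKA{n}") for n in range(200)]
server = MscpServerModel()
for u in units:
    server.online("NODE_A", u)       # first client: no IO$_NOP issued
for u in units:
    server.online("NODE_B", u)       # booting node: one IO$_NOP per unit
print(sum(u.platter_loads for u in units))
```

In this model every unit that is already online costs one platter load when a second client issues ONLINE, so the cost of a node joining grows with the number of mounted volumes.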

John

5202.14. "more thanks but why is vax different?" by TAPE::SENEKER (Head banging causes brain mush) Tue Feb 25 1997 12:01 (27 lines)

    Thanks for your investigation John.  I'll look at MSCP.LIS.
    
    Since my last reply, I have added code to stop the jukebox thrashing
    and to return success for the IO$_NOP.  Initially, all appears to be
    well, but I have only tested with two volumes so far.  
    
    I have a command file mounting 200 volumes as I enter this so I can
    test with a large number of volumes.  I'll put a reply in later with
    the result.
    
    If I understand this, the IO$_NOP has value because the MSCP
    server is able to validate the unit characteristics based on the fact
    that the IO$_NOP completed successfully.
    
    FYI:  The reason that this request was forcing real I/O (work) to the
    device is that the JKDRIVER is designed to ensure the proper volume is
    in a drive before the I/O is processed.  In this case, no real I/O
    exists, so the jukebox thrashes.  I have added some more code to deal
    with this special case along with the MVIRP IO$_PACKACK case.
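
    A minimal sketch of the special case, written as a Python model rather than driver code. It assumes only what is stated above: an IO$_NOP carries no data, so it can be completed with success without placing the volume in a drive. The class name and the string stand-ins for function codes and status are hypothetical, and the MVIRP IO$_PACKACK case is not modeled.

```python
IO_NOP = "IO$_NOP"            # symbolic stand-ins, not real function code values
IO_READ = "IO$_READPBLK"

class JukeboxUnitModel:
    def __init__(self):
        self.platter_loads = 0

    def start_io(self, func):
        if func == IO_NOP:
            return "SS$_NORMAL"   # nothing to transfer, so no platter needed
        self.platter_loads += 1   # real I/O: make sure the volume is in a drive
        return "SS$_NORMAL"

unit = JukeboxUnitModel()
statuses = [unit.start_io(f) for f in (IO_NOP, IO_READ, IO_NOP)]
print(statuses, unit.platter_loads)
```

    Only the read causes a platter load; both IO$_NOP requests still complete successfully, which is all the caller needs.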
    
    Any clues on why the VAX does not do the same thing?  Is the MSCP code
    the same for VAX in the V6.2 timeframe?  The reason I ask is that I
    will have to generate a patch for this fix, and I need to gather as
    many facts as possible to determine whether it will be only for the
    Alpha code stream.
    
    Rob

5202.15. by EVMS::MORONEY (UHF Computers) Tue Feb 25 1997 13:38 (4 lines)

The VAX and Alpha MSCP server code are similar but not the same.  The handling
of IO$_NOP doesn't seem to be one of the differences, however. 

-Mike

5202.16. "corruption avoidance" by STAR::EVERHART Tue Feb 25 1997 17:18 (19 lines)

    IO$_NOP gets sent to devices generally to force through any operations
    sequentially. It provides a guarantee that all other I/O has already
    taken place, so that nothing is left in the device itself. This is
    by way of ensuring that when other nodes start accessing the device,
    no old writes are "left lying around" to clobber later writes that
    may take place. If you have a way to be sure this is not going to
    happen you can of course just ignore them. I am referring to writes
    that may be in a device queue here, or somewhere along the way,
    not writes that are "at" a source node. The idea is that a new
    node sends this out to ensure that anything else from a failed node
    has already happened before any new activity starts. This is
    necessary to prevent data corruption in the odd rare case where
    a write was in progress just as a node died and the new node tries
    to write the same block, and has the block clobbered by the old
    write because something (e.g. the drive) reordered them.
    
    If you send a packack down the wire when you stick something new
    in the drive, this might not be an issue. (MY code did!)
    
5202.17. "current solution status" by TAPE::SENEKER (Head banging causes brain mush) Wed Feb 26 1997 16:01 (20 lines)

    The new code has handled 218 volumes correctly when a node was added
    to the cluster but the jukebox was idle.  No thrashing and the new
    node entered the cluster normally.
    
    Glenn, thanks for the info in .16.  I don't think it will matter for
    the JKDRIVER because it uses a simple, low-performance FIFO queue.  But
    to be sure, I moved the IO$_NOP check to a new place to ensure that
    all the pending I/O has completed.
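
    The ordering concern can be illustrated with a toy FIFO model in Python. This is a sketch under assumptions, not the JKDRIVER code: the function name and request strings are hypothetical, and the "early" variant is only a contrast case showing what could happen if the IO$_NOP were completed before requests already queued.

```python
from collections import deque

def run_queue(requests, nop_check_at_dequeue):
    """Return the order in which requests complete."""
    pending = deque()
    completed = []
    for req in requests:
        if req == "IO$_NOP" and not nop_check_at_dequeue:
            completed.append(req)     # short-circuited ahead of queued I/O
        else:
            pending.append(req)       # waits its turn in the FIFO
    while pending:
        completed.append(pending.popleft())
    return completed

requests = ["WRITE 1", "WRITE 2", "IO$_NOP"]
print(run_queue(requests, nop_check_at_dequeue=False))
print(run_queue(requests, nop_check_at_dequeue=True))
```

    With the check made as the request leaves the queue, the IO$_NOP completes only after the writes queued ahead of it, which preserves the guarantee described in .16.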
    
    Short summary:  The removal of incorrect usage of DEV$V_CLU and the
    addition of special processing of IO$_NOP requests in the JKDRIVER
    prevent two undesirable situations.  In both situations, each jukebox
    pseudo volume was being placed in a jukebox drive to service I/O
    requests, causing serious performance degradation for jukeboxes with
    as few as four mounted volumes.
    
    I am working at creating an ECO, my first one.  After this is created,
    I plan to add another reply with a pointer for the ECO.
    
    Rob