
Conference ssag::ask_ssag

Title: Ask the Storage Architecture Group
Notice: Check out our web page at http://www-starch.shr.dec.com
Moderator: SSAG::TERZAN
Created: Wed Oct 15 1986
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 6756
Total number of notes: 25276

6649.0. "Library robots and bus reset" by DECWET::KOWALSKI (Official Beer Test Dummie) Mon May 05 1997 10:08

    My understanding is that our tape library robots currently have no
    defined behavior on SCSI bus reset.  With regard to Digital hardware,
    what would be the appropriate group to approach to discuss a change to
    this?  I've looked through SEP's common technical requirements for SCSI
    devices and don't see what I'm looking for (defined behavior).
    
    Thanks/Mark 
T.R   Title   User   Personal Name   Date   Lines
6649.1. by LEFTY::CWILLIAMS (CD or not CD, that's the question) Mon May 05 1997 10:16 (14 lines)
    T. Tran owns the libraries... 
    Joe Smith is our apps engineer for integration issues.
    I seem to own global architecture issues, as of the last month or so.
    
    You can start a discussion here, in MAGTAPE, or SCSI, or contact us
    directly. Depending on what you want to see, it may or may not be
    possible, given the 3rd-party origins of the libraries.
    
    Yes, it's not defined. Yes, it needs to be. It won't happen instantly,
    as the behavior seems to truly be undefined. Sigh.
    
    Chris
    
    
6649.2. "Wobbly stake in the ground" by DECWET::KOWALSKI (Official Beer Test Dummie) Mon May 05 1997 13:25 (18 lines)
    Knowing there are more people interested, my feeling is that a more
    public discussion would be appropriate to get their input.  
    
    From a requirements standpoint, I would like to see a clear definition of
    what a robot does on bus or device reset.  Desirable behavior would be
    similar to that for a sequential device in terms of completion of
    operations in progress and return to a defined state, although I do not
    understand the full implications of that w.r.t. robotic instruction
    sets.
    
    For example, what are the complications of stating that on reset, the
    robot should complete the operation in progress, then return to the
    home position?  If that means that the robot ends up in home position
    with a tape in its jaws, is this a physical, logical, or operational
    problem for it?
    
    Mark
    
6649.3. by NABETH::alan (Dr. File System's Home for Wayward Inodes.) Mon May 05 1997 15:19 (32 lines)
	For "operations in progress" do you mean in the sense of the
	SCSI-2 Medium Changer spec. or in the sense of whatever components
	the robot decomposes such commands into?  From the standpoint of the
	SCSI-2 spec. something won't be left in the transport at the
	end of an operation unless the destination of the Move Medium
	was the transport.  Internally, a robot might break up a slot
	to slot move as:

		Position to X of slot
		Position to Y of slot
		Pick medium
		Position to X of destination
		Position to Y of destination
		Place medium

	If you allow the internal subset of commands to be interrupted
	by a device reset, then you should require that the robot be
	left in a state from which software implementing the SCSI-2
	spec. can recover.  If the particular robot allows element to
	transport moves (not all do), then leaving a tape in the transport
	can be corrected.  If the robot doesn't allow such moves and the
	reset leaves the medium in the transport, manual intervention
	is required, which is not desirable.

	My opinion is that the operation should complete or abort as
	though nothing had happened; complete the move or put it back
	where it came from.  The latter is hard to do on a TL820 Move
	Medium from the inport.  This might be even harder if you
	happen to abort an Initialize Element Status.  This causes a
	physical inventory of most medium changers and would require
	saving and restoring the starting inventory to roll it back
	to the starting point.
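
The "complete the move or put it back where it came from" behavior can be sketched as a toy model in Python. This is only an illustration under stated assumptions: the `Robot` class, the `BusReset` exception, the `reset_at_step` injection knob, and the element names are all hypothetical and do not describe any real library firmware. The step list simply mirrors the slot-to-slot decomposition given above.

```python
from dataclasses import dataclass, field
from typing import Optional


class BusReset(Exception):
    """Raised by this toy model when a reset interrupts an internal step."""


@dataclass
class Robot:
    # element name -> cartridge id (None if empty); "transport" is the picker
    elements: dict
    reset_at_step: Optional[int] = None  # hypothetical knob: inject a reset here
    log: list = field(default_factory=list)

    def _step(self, index, name):
        if self.reset_at_step == index:
            self.reset_at_step = None
            raise BusReset(name)
        self.log.append(name)

    def move_medium(self, src, dst):
        """Slot-to-slot move that never strands media in the transport."""
        picked = False
        try:
            self._step(0, f"position to X of {src}")
            self._step(1, f"position to Y of {src}")
            self._step(2, "pick medium")
            self.elements["transport"], self.elements[src] = self.elements[src], None
            picked = True
            self._step(3, f"position to X of {dst}")
            self._step(4, f"position to Y of {dst}")
            self._step(5, "place medium")
            self.elements[dst], self.elements["transport"] = self.elements["transport"], None
            return "completed"
        except BusReset:
            if picked:
                # Roll back: put the cartridge back where it came from.
                self.elements[src], self.elements["transport"] = self.elements["transport"], None
                return "rolled back"
            return "not started"
```

A real robot would of course have to decide between completing and rolling back per operation; as noted above, rolling back an Initialize Element Status or a move from the inport may not be practical.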
6649.4. by LEFTY::CWILLIAMS (CD or not CD, that's the question) Tue May 06 1997 08:43 (26 lines)
    Another issue: If the robot gets a reset during a move from a TZ8x to a
    slot, and puts the media back where it came from, the media reloads,
    and requires a mount/dismount cycle to get the drive to eject it again.
    
    Most SW is not very good at getting media left in a picker back to a
    spot where it can be used again - another issue.
    
    Also, not all Libraries/Jukeboxes have bar code readers, to figure out
    exactly what piece of media is where, after a "strange event", such as
    reset, happens.
    
    The RW5xx Optical Libraries always undo an operation interrupted by a
    Reset. The Read Element Status after the reset is cleared will reflect
    this. Because of the first issue above, this behavior is probably not
    appropriate for the TL8xx libraries, even if it could be implemented.
    
    Deterministic operation is required for error recovery. Note that this
    does not mean that all Libraries have to do it the same way - it just
    has to be documented and deterministic. In an ideal world, they would
    all work the same, but we are already at a point where they do not.
    The control SW is going to have to deal with that, unfortunately.
    
    Good discussion so far... Thanks.
    
    Chris
    
6649.5. "investigate problem, propose solution, take action" by TAPE::SENEKER (Head banging causes brain mush) Tue May 06 1997 09:03 (42 lines)
    What does "an interpretation" of the SCSI standard imply?
    
    I would suggest that if this problem truly wants to be fixed that a
    serious effort be made to characterize the impact of resets on all
    operations, determine which operations are impacted, and produce a
    report of suggested corrections.  This report could then be reviewed
    by the manufacturer(s) and software teams using the products, and they
    could respond.
    
    In the optical space, I have found that the integration of the drive
    and robots into a library make the understanding of the intent of the
    SCSI spec more difficult.  To me it shows that the spec attempts to
    define how the devices should act but being created by humans it doesn't
    always describe exactly what should take place.
    
    Example: I had a problem where an optical library would complete a
    command after a reset, return success to the host, but then "undo"
    the operation.  For this case the reset happened after the robot
    had placed the media into the transport.  This destroyed the software
    to hardware mapping because the host was informed that the operation
    completed and media was now in slot xxx but really it was back in the
    original location.  In this case we interpreted the SCSI spec as having
    two options, 1) complete the command, return success, then execute the
    reset or 2) complete the command, return failure (bus was reset status),
    execute the reset which causes the operation in progress to be undone.
    
    Either of these would allow the host software to work correctly.  After
    we provided traces to HP, they found a hole in the firmware and decided
    that option two was the best interpretation of the spec.
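
To make the mapping problem concrete, here is a minimal Python sketch of how a host might update its own view of media locations from the reported command status. Everything in it is an assumption for illustration: the function name, the `"SUCCESS"` and `"RESET_FAILURE"` labels (which are not real SCSI status codes), and the cartridge and element names. It compares the host's view with where the library actually left the media in the three cases described above.

```python
def host_view_after_move(host_map, cartridge, src, dst, status):
    """Return the host's updated cartridge -> element map after a move.

    "SUCCESS" means the library reported the move as done;
    "RESET_FAILURE" means it reported failure because the bus was reset.
    Both labels are illustrative only.
    """
    updated = dict(host_map)
    if status == "SUCCESS":
        updated[cartridge] = dst
    elif status == "RESET_FAILURE":
        updated[cartridge] = src
    else:
        raise ValueError(f"unexpected status: {status!r}")
    return updated


before = {"platter7": "slot3"}

# scenario name -> (status reported to host, where the library really left it)
scenarios = {
    "option 1: complete, report success": ("SUCCESS", "drive0"),
    "option 2: report failure, undo": ("RESET_FAILURE", "slot3"),
    "firmware hole: report success, then undo": ("SUCCESS", "slot3"),
}

# True where the host's view still matches the hardware.
results = {
    name: host_view_after_move(before, "platter7", "slot3", "drive0", status)["platter7"] == actual
    for name, (status, actual) in scenarios.items()
}
```

Only the third scenario leaves the host and the hardware disagreeing, which is the case that destroyed the software-to-hardware mapping.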
    
    I was frustrated that this problem was not found by the people in
    storage that verify SCSI operations of our storage devices.  To me
    the characterization of actions after a SCSI bus reset should be
    critical to that process.
    
    I hope if effort is made to clean up this issue for tape libraries
    that the work is done in a broader general purpose robotic mechanism
    context.  If it cannot be done in that context, I hope the reasons
    various actions are requested/required are well documented and recorded
    so others in the future don't have to revisit the same problems.
    
    Rob (from the optical jukebox world)
6649.6. by DECWET::KOWALSKI (Official Beer Test Dummie) Wed May 07 1997 10:24 (14 lines)
    Good discussion.  I need to go back to review the SCSI-2 spec wrt
    media changers.  I'll be back in a couple weeks after some travel
    and vacation.
    
    >I would suggest that if this problem truly wants to be fixed that a
    >serious effort be made to characterize the impact of resets on all
    >operations, determine which operations are impacted, and produce a
    >report of suggested corrections.  This report could then be reviewed
    >by the manufacturer(s) and software teams using the products, and they
    >could respond.
    
    Good idea.  Anyone in SSAG want to form a working group?
    
    Mark
6649.7. by TAPE::SENEKER (Head banging causes brain mush) Thu May 08 1997 08:18 (3 lines)
    I'll volunteer to represent software aspects for optical libraries.
    
    Rob
6649.8. "Valid Element Address is a legal holding place.." by SUBSYS::TRAN (Straight <Left> Hitter..) Thu May 08 1997 08:36 (36 lines)
    
    As Chris stated, I'm responsible for the tape libraries.
    
    In my view, if a transport is a legal element address then the robot
    should be allowed to store a cartridge there when a reset aborts a move,
    the same as a Move Medium to the transport. The recovery is then for the
    next command to succeed without hardware failure, and moving from the
    transport to any legitimate element is a legal operation.
    
    Transport is not a legal element address in the TL800 series or TZ8xx
    loader, but it is in the TL810 and TL820 series. Below is the list of
    DLT tape library series my group owns.
    
                DLT Tape Library Family Table
                =============================

-----------------------------------------------------------------------------
Family         Product  Vendor       DLT Drive  INQUIRY  Product Geometry
               Name     Equivalent              PID
-----------------------------------------------------------------------------
TL800 Series   TL891    LXB7110      TZ89       TL800    1 Drive/10 Slots
       (1)     TL891    LXB7210      TZ89       TL800    2 Drives/10 Slots

TL810 Series   TL810    ATL 4/52     TZ87       TL810    4 Drives/52 Slots
               TL812    ATL 4/52     TZ88       TL810    4 Drives/52 Slots
               TL894    ATL 4/52     TZ89       TL810    4 Drives/52 Slots

TL820 Series   TL820    ATL 2640     TZ87       TL820    3 Drives/264 Slots
               TL822    ATL 2640     TZ88       TL820    3 Drives/264 Slots
               TL826    ATL 6/176    TZ88       TL820    6 Drives/176 Slots
               TL893    ATL 2640     TZ89       TL820    3 Drives/264 Slots
               TL896    ATL 6/176    TZ89       TL820    6 Drives/176 Slots
-----------------------------------------------------------------------------
NOTE:  (1)  TL891 + Upgrade Drive
=============================================================================
    
6649.9. "Should go to a slot!" by LEFTY::CWILLIAMS (CD or not CD, that's the question) Thu May 08 1997 09:06 (19 lines)
    As often is the case, I disagree with T on this one. 
    
    If a move is executed with a source and destination, the cartridge
    should end up in either the source or destination if a reset occurs.
    Any other behavior could cause loss of context, and require manual 
    intervention to find and put the cartridge back where it is supposed
    to be (equivalent to Reset = failure!)
    
    In the case of a DLT or other tape library, I would want the cartridge
    to always end up in the storage slot, not the drive, due to the
    previously mentioned load/unload issues. I'd be more open to options on
    the optical disk side, though most of those have no media ID bar code
    reader, so it is really easy to "lose" a piece of media if weird things
    happen.
    
    If the Library breaks, all bets are off, of course.
    
    Chris
    
6649.10. "no special jukebox recovery needed by a properly-written application" by DECWET::TRESSEL (Pat Tressel) Fri May 09 1997 04:34 (99 lines)
I'm going to play devil's advocate here...

(This is a modified version of a note I just sent to the TCR folks for
comments.)

I'm in the process of making the Unix changer driver behave nicely on a
shared bus.  The Unix NetWorker folks and I talked about what sort of
recovery the jukebox and driver should do on a reset (or other disruptive
event), and concluded that...

...for the most part, the software needs *no* special recovery from either
the jukebox or the driver.  (There are some small amounts of polite behavior
required, but these tend to fall in the category of not being "broken".)

Ok, *why* not?

  -- A reset can mean that someone's been meddling with the media, e.g. they
     opened the door and added tapes, or moved some up to "fill in the holes".

     So, after a reset, it's unsafe for the application to assume that any
     media are where they were before the reset.  In particular, if the
     changer was about to grab something out of a slot and put it into a
     drive, the thing in that slot could be different now.  It would be a
     Very Bad Thing if someone's tape were to be overwritten because it was
     assumed that the media hadn't shifted position.

  -- Similarly, real recovery involves checking that the changer is
     operating  on the same media as it was before the reset.  But...the
     changer has no way to identify media -- only the application does.
     (Bar codes might help here, but because they're not universal, the
     application can't rely on their presence, so it'll have to be able
     to deal with this problem by itself, anyway.)

  -- There are other conditions besides resets that are associated with
     disarranged media (e.g. someone pushing the eject button on a drive not
     in a jukebox).  These should also already be handled by the application.

  -- Since there are already ways for media to move around behind the
     application's back, it should already be checking that a newly mounted
     medium is the right one.  For a medium that's already in a drive, upon
     receiving an error that could indicate meddling with the drive, the
     application should verify that the same medium is still present (and
     should reposition if needed).

This led us to conclude that *no matter what* sort of recovery the jukebox
or changer driver attempted, it would *not remove the requirement* that the
application verify the media identity.  We also couldn't see that having the
changer do any moving around after a reset would help the application to
recover, and it might make recovery more difficult.

Since the application should already be verifying media identity after *any*
load into a drive (NetWorker does do this), whether there was an error or
not, then the application should already be able to prevent data loss due
to overwriting the wrong medium, or confusion due to trying to read the
wrong medium.

Regarding the "lost tape" syndrome:  If the media are disarranged in the
slots, the application's bottom line recovery is to inventory all the slots.
This can be done in a "lazy" manner -- as long as the application is finding
the media it wants, it doesn't have to do an inventory.  It's only when the
medium it wants is *not* found in the expected slot that it will have to go
hunting.

In order to reduce the probability of disarrangement (other than by means of
some person opening the jukebox door and moving things around), we *don't*
want the jukebox to go squirreling things away by itself.  That is, if a
reset or other failure (e.g. power) occurs while medium is in the picker,
the medium should *stay* in the picker.  This allows the application to load
the medium in a drive, find out what it is, and store it in the correct slot,
or complete whatever operation it wanted done with that medium.  This is much
less labor-intensive for the application than recovering from a "lost tape"
by doing an inventory.  (Recall that the changer can't tell which medium it's
got hold of.  Someone could even have pulled the medium out of the picker and
substituted another one, so it can't assume it has the same one as it did
before the disruptive event.)

You may be saying to yourself just now, "But how does the application find
out there's medium in the picker?  And *which* picker, if it's a multi-tower,
and was in the middle of passing the medium along the row?"  Well, this is
the prime case of "polite behavior" that I mentioned above.  The application
does need a way to discover these things, and, because this is a form of
failure that occurs *only* in changers (i.e. doesn't have an analogue in a
bare drive), then applications may not already deal with this case.  But, as
long as the medium remains in the picker, and the application does get an
error (e.g. on the next operation involving that picker or the medium in it),
then it can assume the move failed, and check all pickers.  The delivery of
an error is likely to be dealt with in the driver -- it may happen "naturally"
as later operations fail, or as the driver becomes aware of the bus reset.

This is the other half of the polite behavior:  There must be a way to get
the picker to put the medium into a drive.  (I'd consider the jukebox to be
"broken" if there were no way to do this.)  This seems to be working now in
the tape jukeboxes we have -- NetWorker is already able (somehow) to get the
picker to relinquish its medium.  But it would be good to have a reasonably
small set of actions that (among them) would allow getting the medium out of
the picker and into a drive.  I don't know how to do this -- suggestions
would be appreciated.

-- Pat Tressel
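
The "verify after every load, inventory lazily" idea above can be sketched in Python as follows. This is a rough illustration, not NetWorker's actual logic: `find_and_verify`, the `slots` mapping, and the `read_label` callable (standing in for "load the medium into a drive and read its label") are all hypothetical names.

```python
def find_and_verify(wanted_label, expected_slot, slots, read_label):
    """Locate a medium by label, trying the expected slot first.

    slots:      dict of slot address -> opaque medium handle (None if empty)
    read_label: callable(handle) -> label; models a load-and-read in a drive
    Returns (slot_found_or_None, number_of_loads_performed).
    """
    loads = 0
    # Lazy inventory: only hunt through other slots if the expected one
    # turns out to be empty or to hold the wrong medium.
    search_order = [expected_slot] + [s for s in slots if s != expected_slot]
    for slot in search_order:
        handle = slots.get(slot)
        if handle is None:
            continue
        loads += 1
        if read_label(handle) == wanted_label:
            return slot, loads
    return None, loads
```

When nothing has been disturbed the cost is a single load and label check; the full hunt is paid only after media have actually moved.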
6649.11. by LEFTY::CWILLIAMS (CD or not CD, that's the question) Fri May 09 1997 09:02 (28 lines)
    Given that all elements in the JB, including the input element, picker,
    drives, slots, transfer elements, etc, are supposed to have a unique
    element address, and an element type reflecting what they are, it is
    indeed possible in the general case for the application to move media
    from any element to any other element. A move from a slot to a drive
    usually implies moving thru a transport element, like a picker.
    
    The problem comes in the fact that some JB's cannot determine whether
    there is media in some of their elements, due to lack of sensors, bad
    design, etc. This makes it difficult to deal with media left in a
    picker after a state change. Not impossible, just harder.
    
    If all JB vendors were religious about implementing all the allowed
    fields for media tracking, media detection, etc, it would be easier. 
    They are not today.
    
    If the applications are written to properly do all the required error
    recovery, then .10 has validity. Most of the apps I've seen are not
    very good at finding media in pickers. They do fairly well finding
    media lost in a slot or drive, but picker issues give them grief, as
    they do not seem to understand the concept of a picker as a separate
    element. Thus my recommendation to put the media back in the slot it
    came from, which almost all JB's have the intelligence to do.
    
    Chris
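
As a small sketch of what "understanding a picker as a separate element" could look like in an application, the Python below filters element status records for transport elements that hold, or might hold, media. The record layout and function name are assumptions for illustration; the numeric element type codes follow the SCSI-2 medium changer element types as I understand them (1 transport, 2 storage, 3 import/export, 4 data transfer), and `full=None` models a jukebox with no sensor for that element.

```python
from enum import IntEnum


class ElementType(IntEnum):
    MEDIUM_TRANSPORT = 1
    STORAGE = 2
    IMPORT_EXPORT = 3
    DATA_TRANSFER = 4


def pickers_to_check(elements):
    """Return addresses of transport elements that hold or may hold media.

    elements: iterable of (address, ElementType, full), where full is
    True, False, or None (None = the jukebox cannot tell).
    """
    return [
        address
        for address, element_type, full in elements
        if element_type == ElementType.MEDIUM_TRANSPORT and full is not False
    ]
```

Treating "unknown" the same as "full" is a deliberately cautious choice here: it costs an extra recovery attempt but avoids overlooking stranded media.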
    
    
    
6649.12. "need a common design goal" by TAPE::SENEKER (Head banging causes brain mush) Fri May 09 1997 09:45 (25 lines)
	RE: .8,.10

	I agree, conceptually.

	As Chris stated in .11, not all jukeboxes/libraries are created
	equal.  If the various software drivers and applications, that are
	used to control these beasts, do not take that into account then
	control problems are going to happen.

	What I am suggesting, is that an effort be made to define what
	the perfect jukebox/library world would look like. Then design
	software systems for use with "real-world" systems, documenting
	the limitations and differences from the "perfect-world".

	Maybe with enough information and a team Digital instead of team
	NetWorker, team MRU, team SEP tapes, team SEP optical, team OSMS, etc.,
	we could work together and get the manufacturers to build some really
	good jukebox/library systems.  If a small company like Perceptics
	can get jukebox manufacturers to make changes then Digital should
	have enough pull to get things changed.

	Maybe I am a dreamer, but if you don't strive for your dreams
	you sure as hell won't ever get to them.

	Rob
6649.13. by DECWET::RWALKER (Roger Walker - Media Changers) Fri May 09 1997 10:55 (29 lines)
	Assuming that the current move request will not ever report
	completion after a bus reset, the cleanest option is for
	the hardware to put the media back where it started.  This
	will eliminate any recovery action by an application that
	tracks the locations since they will not have changed their
	status yet.

	This isn't always possible so the second preferred action would
	be to complete the move.  This will cause a mismatch in state
	since the application will not receive the completion status.
	If the application logs the planned move to disk then it can
	quickly verify if the element state matches.

	The third is just to stop with the media in a valid location
	including the transport if the device allows moves to and
	from the transport.  If the device leaves the media somewhere
	else then it is broken.
	
	The worst case here is when a move is requested and the node making
	the move goes down.  The bus is reset and the application restarts
	on the other node.  If it only had in-memory tracking of the move
	request it will not know where the media was from or going to.
	If the jukebox was powered off it will not know where it came
	from either.

	Verifying that the proper media is loaded in the drive before use
	is a key safety factor here but it does not lead to easy recovery
	for the application without user intervention.  It would be better
	to avoid this if possible.
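
The "log the planned move to disk" idea can be sketched as a tiny intent log in Python. This is a minimal sketch under assumptions: the JSON file format, the function names, and the `element_has_media` callable (standing in for a query of the changer's element status) are all hypothetical. The write goes to a temporary file and is then renamed so that a crash does not leave a half-written record.

```python
import json
import os


def log_intent(path, cartridge, src, dst):
    """Record the planned move durably before issuing it to the changer."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"cartridge": cartridge, "src": src, "dst": dst}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic rename on the same filesystem


def clear_intent(path):
    """Call once the move's completion status has been received."""
    if os.path.exists(path):
        os.remove(path)


def recover(path, element_has_media):
    """After a reset or failover, compare the logged move with the hardware.

    element_has_media: callable(address) -> bool
    """
    if not os.path.exists(path):
        return "no move pending"
    with open(path) as f:
        intent = json.load(f)
    src_full = element_has_media(intent["src"])
    dst_full = element_has_media(intent["dst"])
    if dst_full and not src_full:
        return "completed"
    if src_full and not dst_full:
        return "undone"
    if not src_full and not dst_full:
        return "check pickers"
    return "needs inventory"
```

Because the log lives on disk rather than in memory, the node that takes over after a failover can run `recover` and learn both ends of the interrupted move.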
6649.14. "Need consistency in closed loop." by SUBSYS::TRAN (Straight <Left> Hitter..) Fri May 09 1997 12:00 (19 lines)
    
    As Rob stated, consistency is the key word here.
    
    As far as the current DLT tape library implementation goes:
    
    The TZ8xx loader and TL800 series will complete the move in case of a
    reset once it has started, since the transport (picker) is not a valid
    element.
    
    The TL810/TL820 series may end up with a cartridge in the picker,
    depending on how far along the move is; it may also complete the move,
    for the same reason.
    
    Asking the hardware to decide what to do based on conditions during reset
    is complicated. I'm talking about: if the destination is the drive then
    do one or two things, if it's a slot then do another, if it's a port
    then yet another. If we can sort these out and come up with some way
    that covers all conditions then I'm with it.
    
    T.
6649.15. "accepting our limitation as well" by TAPE::SENEKER (Head banging causes brain mush) Fri May 09 1997 16:26 (42 lines)
    RE: .13
    
    Roger, good points.  I would also like to point out that while we
    are critiquing the programmed actions of various jukebox/library
    robotic mechanisms that we need to distinguish between limitations
    of the hardware and limitations of our own software.
    
    Examples:
    
    1) OSMS could do moves to and from the robot but it doesn't. It
    always does moves to and from data transfer elements, data storage
    elements, or import/export elements.  Due to an "implementation"
    limitation there is no way to ask OSMS code to recover from media
    being stuck in a robot/picker without human assistance.  This
    limitation could be removed but then we are back to the "real-world"
    again with code changes, project schedules, impact, etc.
    
    2) OSMS could do volume (data set) validation upon media insertion
    into a data transfer element, but again it doesn't.  The initial
    product was designed when the customer demand was driven by
    requirements where minimal swap times were more important than data
    integrity.  Again this feature could be added.
    
    Rarely does the customer base see the impact of these implementation
    decisions since the hardware used with OSMS is very reliable.  But
    these are examples of cases that show a need to document how we
    accept the "real-world" while dreaming of how we would like to see the
    "perfect-world".
    
    Time to market considerations and most engineers' desire to just get
    something working make it difficult to see that these implementation
    decisions get documented.  But if it is to be done, the engineers are
    really the only ones that will get it done.  It would help a lot if
    management made this level of documentation a project requirement. I
    try from time to time but I am guilty of having more of this
    information in my head than down on paper or as part of some project
    documentation.
    
    As I ramble on I hear myself saying, "how much quality is good
    enough?".
    
    Rob
6649.16. "NetWorker for Unix requirements" by DECWET::TRESSEL (Pat Tressel) Fri May 09 1997 21:01 (13 lines)
NetWorker for Unix can live with any of the following recovery behaviors:

  -- Leave the tape in the picker.

  -- Complete the move.

  -- Undo the move.

The one thing we *don't* want is to have the medium put in some location
that was not part of the original move, i.e. it should not be parked in
some slot that was neither the source nor the destination.

-- Pat
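
Stated as code, the requirement above amounts to a one-line check. The Python sketch below is only a restatement of the three acceptable outcomes; the function names and element labels are hypothetical.

```python
def classify_reset_outcome(location, src, dst, pickers):
    """Classify where a cartridge ended up after an interrupted move."""
    if location == src:
        return "undone"
    if location == dst:
        return "completed"
    if location in pickers:
        return "left in picker"
    return "misplaced"  # parked somewhere that was not part of the move


def acceptable(location, src, dst, pickers):
    """True for any of the three recovery behaviors listed above."""
    return classify_reset_outcome(location, src, dst, pickers) != "misplaced"
```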
6649.17. "a plea for OSMS mount verification" by DECWET::TRESSEL (Pat Tressel) Fri May 09 1997 21:16 (30 lines)
Rob --

> 2) OSMS could do volume (data set) validation upon media insertion
>    into a data transfer element, but again it doesn't.

Ouch.

> Again this feature could be added.

Please!  People *do* open up their jukeboxes and rearrange media...
(Not necessarily the customer -- it can happen while the jukebox is being
serviced.)

What OSMS might do is to make sure the label name field in the disk label
is filled in when a new platter side is initialized, then read that name
when the platter is loaded.

> Rarely does the customer base see the impact of these implementation
> decisions since the hardware used with OSMS is very reliable.

What's saving OSMS is that the thing that's on the media is a filesystem,
not raw data.  So after the thing is mounted, it'll usually be quite
noticeable that the expected files and directories are not present.  But
if someone were *writing*, and the pathname didn't already exist, then
they wouldn't get an error, but their data would not be where they want
it.  And if they were writing to a pathname that happened to exist on
both the real and incorrectly loaded filesystems, they'd overwrite
something they didn't mean to.

-- Pat
6649.18. "see quoted string after my name" by TAPE::SENEKER (Head banging causes brain mush) Mon May 12 1997 11:20 (26 lines)
    Pat,
    
    Your assessments are all correct and your pleas have been made
    by others in the past.  OSMS is 9 years old (based on the age of
    its parent product LaserStar), and problem history, such as IPMT, has
    shown that good quality hardware, customer education, and software
    error detection equates to a well performing product.
    
    In OSMS's case, the consideration to improve data integrity involves
    these product aspects: product swap performance, product design and code
    changes, and customer problem reports.  OSMS is old, stable, and minimally
    funded.  The slight degradation in swap performance due to media
    validation, rare to non-existent problem reports, and the very real
    cost to modify the product have prevented any changes from occurring.
    
    Designs are on the shelf to implement these changes but business
    justification does not exist to fund the work.
    
    Also, in OpenVMS and OSMS terminology, validation of media after it has
    been placed in a drive is different than mount verification.  OSMS
    supports the needs of OpenVMS mount verification.
    
    Sigh.  If only the technical aspects of product functionality
    determined its next project definitions.
    
    Rob
6649.19. by DECWET::TRESSEL (Pat Tressel) Thu May 15 1997 02:40 (23 lines)
Rob --

> Also, in OpenVMS and OSMS terminology, validation of media after it has
> been placed in a drive is different than mount verification.

Right, but checking that the same medium is still in a drive after a bus
reset is the equivalent of mount verification, just as though the drive
had dropped offline.  This is not a changer issue, but is relevant to
the impact of bus resets on the jukebox as a whole, including drives.
Resets and offlines both can indicate that media have been meddled with.
(Or sometimes not...  I remember, back in the old days, having a TU78 that
would occasionally lose vacuum.  After I got the thing going again, VMS
would wind it aaaaall the way back to the beginning to read the label.)

> rare to non-existent problem reports

Right -- OSMS can get away with it because the user can recognize the
media.  NetWorker can't (get away with it, I mean), because the media
have no cues to their identity other than their labels.  We think we've
got a Plan that will protect NetWorker media across resets and failovers,
without too hideously much overhead...  ;-)

-- Pat
6649.20. "media verify .vs. VMS mount verify" by TAPE::SENEKER (OSDS/OSMS, 1992-1997, R.I.P.) Thu May 15 1997 09:27 (47 lines)
Pat,

RE: .19

I agree.  This reply is not meant to be argumentative but an explanation
of a technical subtlety.

> Right, but checking that the same medium is still in a drive after a bus
> reset is the equivalent of mount verification, just as though the drive
> had dropped offline.  This is not a changer issue, but is relevant to
> the impact of bus resets on the jukebox as a whole, including drives.
> Resets and offlines both can indicate that media have been meddled with.
    
OSDS/OSMS uses the OpenVMS "medium offline" status return mechanism in
response to SCSI bus resets to allow OpenVMS to initiate mount verification
for a file structured mounted optical disk drive, either standalone or as
part of a jukebox.  I agree, in this case it is not a changer issue.  Other
"offline" conditions also start the same mechanism.

For technical discussions I believe it is important to distinguish
between the above "OpenVMS mount verification" and a more generic "media
verification".  "Media verification" is the process of ensuring that a
particular media still matches the associated software data structures
that are used to identify the media to a software control system.  By this
definition "OpenVMS mount verification" is a "media verification" process
but this process is limited to the needs of OpenVMS for file structured
mounted volumes only.

OSMS uses a "trusted" partner/human assistance process to ensure the
association between physical media and software data structures is
maintained.  If the "trust" is broken, then the media to software association
is called into question and OSMS disallows access to the media until human
assistance verifies the association is valid.  The biggest problem with
this process is the detection of broken "trust".

OSMS's "trust" process works well but could be improved.  Areas of weakness
are:

o media transportation induced disassociations
o pro-active detection of media disassociations instead of re-active
  detection, example detecting a person exchanging slot 10 and slot 20
  media before the system/application made the next I/O request to either
  of those systems.

Please see the next note for the next step beyond "media verification".

Rob
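
The "trust" process described above can be modelled as a flag on each media-to-software association. The Python below is a hypothetical sketch of that idea only; the `MediaRegistry` class and its method names are invented for illustration and are not taken from OSMS.

```python
class MediaRegistry:
    """Tracks which volume is believed to be in which slot, and whether
    that belief is still trusted."""

    def __init__(self):
        self._entries = {}

    def register(self, volume, slot):
        self._entries[volume] = {"slot": slot, "trusted": True}

    def break_trust(self, volume=None):
        """Mark one volume, or all volumes if none is given, as unverified.

        This would be called when something suggests media may have been
        disturbed (for example, the jukebox door was opened).
        """
        targets = list(self._entries) if volume is None else [volume]
        for name in targets:
            self._entries[name]["trusted"] = False

    def operator_verify(self, volume, slot):
        """A human confirms where the volume really is, restoring trust."""
        self._entries[volume] = {"slot": slot, "trusted": True}

    def locate(self, volume):
        """Return the slot for a volume, refusing access if trust is broken."""
        entry = self._entries[volume]
        if not entry["trusted"]:
            raise PermissionError(f"{volume}: association not verified")
        return entry["slot"]
```

The hard part, as noted above, is not this bookkeeping but deciding when `break_trust` should be called in the first place.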
6649.21. "separate processes, verify and correction" by TAPE::SENEKER (OSDS/OSMS, 1992-1997, R.I.P.) Thu May 15 1997 09:59 (21 lines)
The next step beyond "media verification" is "media disassociation
correction".  This is the corrective action process started once a
disassociation between the software and the media is detected.

In the OpenVMS mount verification world this process involves the
OPCOM messages and the periodic retry of the mount verification induced
I/O operations.  This process relies on external intervention since it
cannot do anything more than put out the OPCOM messages, not being able
to take any corrective action itself.

In a jukebox environment, many software controlled corrective action
opportunities exist.

I hope this summarizes my ideas that mount verification is not the same
as media verification and media disassociation correction, and that some
discussions require the distinction before real communication takes place.

Thank you for your patience, I will step down off my high-horse now.

Rob