
Conference cookie::archive_backup

Title:Archive/Backup
Moderator:COOKIE::MHUAIG
Created:Wed Sep 08 1993
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:479
Total number of notes:2283

428.0. "Volume set creation in parallel save requests" by ATZIS3::PIEBER (chaos has many faces) Wed Apr 16 1997 08:45

    Hi,
    
    here is a summary entry about a phenomenon we are seeing at a customer
    site using six parallel ABS save requests (SR) into a TL826.
    
    We have one storage class (SC), one execution environment (EE) and 19 
    SR.
    
    Six out of 19 SRs start at 02:00 a.m. to grab all six drives, create or
    re-use volume sets (VS) and start to backup data. At 02:30 we start all
    the remaining 13 SRs and let ABS work out the schedule for them. All SR
    run on the CSD. Another six SRs, coming in from six VMS client systems,
    are not yet implemented but due soon. They will be started at
    approx. 04:00 a.m.
    
    We have set EE.Drives to 1 
            and SC.Streams to 6.
    
    What we expected was that the first six SRs would create six VS
    during their very first run and use them. All other SRs would use these
    VS as they became available, that is, as earlier SRs completed. The next
    time these SRs are executed, they would re-use their six VS, or create
    six more VS, depending on the consolidation interval. We further
    expected six VS names in the field
    SC.AFS.MDMS_Opt.Static.VS_Name = {list of six VS}
    
    We had everything defined from scratch: 
    
    - EPCOT_DB, 
    - MDMS_DB, 
    - Volume_Pool,
    - ABS Catalog.
    
    What we actually found:
    
    = Seven VS were created, instead of six.
    = Only four VS were listed in SC.AFS.MDMS_Opt.Static.VS_Name, so
      we cannot use three of the VS automatically. We can manually
      put six of them back into the SC.
    
    = There seems to be a mechanism that tries to re-use the VS in the
      list SC.AFS.MDMS_Opt.Static.VS_Name from the beginning of the list
      towards its end. This means that VS located near the beginning of
      the list will see more attempts to use them, and thus eventually get
      re-used more often, than VS near the end of the list. This might
      lead to a situation where the first VS are extended with extra
      tapes, while VS at the end of the list might have sufficient free
      space. 
      
      We will monitor the behavior of the SR, and the creation of VS
      for a while now, while we are running the phase-in tests.
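
The suspected front-to-back re-use can be illustrated with a small simulation. This is only a sketch under assumptions: the note says merely that there "seems to be" such a mechanism, so the first-fit rule, the VS names and the number of concurrent jobs per round below are all hypothetical and are not taken from ABS itself.

```python
def pick_first_fit(vs_names, busy):
    """Return the first volume set in list order that is not busy, or None."""
    for name in vs_names:
        if name not in busy:
            return name
    return None


def count_reuse(vs_names, concurrency_per_round):
    """Count how often each volume set is picked.

    Each entry of concurrency_per_round is the number of save requests
    that run at the same time in that round (an assumed workload).
    """
    counts = {name: 0 for name in vs_names}
    for jobs in concurrency_per_round:
        busy = set()
        for _ in range(jobs):
            name = pick_first_fit(vs_names, busy)
            if name is None:
                break  # every volume set is already in use
            busy.add(name)
            counts[name] += 1
    return counts


# Hypothetical volume set names and per-round job counts.
vs_list = ["VS1", "VS2", "VS3", "VS4", "VS5", "VS6"]
reuse_counts = count_reuse(vs_list, [6, 3, 2, 1, 1])
print(reuse_counts)
```

Under these assumptions the volume sets at the head of the list are picked in every round, while those at the tail are picked only when all six drives are busy, which matches the uneven filling described above.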
    
    Ewald.
T.R    Title                           User          Personal Name  Date                   Lines
428.1  research result and workaround  COOKIE::MHUA                 Wed Apr 16 1997 09:24  19
    
    After much investigation, engineering now knows why this was
    happening at Ewald's site. They run about 20 save jobs a night.  Since
    they have 6 drives in a tape library system, they can run up to 6 jobs
    concurrently.
    
    They start up to 4 jobs at the same time (02:00).  We have discovered 
    that ABS has a very small timing window in which the updates of the
    volume name used for the save can be written at the very same time
    and can overwrite each other's data. This is a bug that should be
    addressed in the next release of ABS.
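
The kind of overwrite described above can be sketched as a lost update. This is an illustration only: the note does not say how ABS stores the volume names internally, so the read-then-write-back model, the job names and the VS names below are assumptions.

```python
def run(schedule):
    """Replay read/write steps in the given order against a shared list."""
    stored = []     # stands in for the VS name list kept in the storage class
    snapshot = {}   # each job's private copy, taken when it reads
    for job, step in schedule:
        if step == "read":
            snapshot[job] = list(stored)
        else:
            # "write": store the private copy plus this job's new volume set
            stored = snapshot[job] + ["VS_" + job]
    return stored


# Both jobs read before either one writes: job A's entry is lost.
overlapping = run([("A", "read"), ("B", "read"), ("A", "write"), ("B", "write")])

# Job B reads only after job A has written: both entries survive.
staggered = run([("A", "read"), ("A", "write"), ("B", "read"), ("B", "write")])

print(overlapping)
print(staggered)
```

In this model a volume set that was created but whose name was overwritten would still exist without appearing in the list, which is consistent with seven VS created but only four listed.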
    
    To work around the problem, I advised Ewald to start the jobs at least 2
    minutes apart, so that they do not hit this timing window.
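
As a minimal sketch of the workaround, the start times for the first batch can be spaced out like this. The helper name and the choice of six jobs starting at 02:00 are illustrative; the times would still have to be entered into each save request's schedule by hand.

```python
from datetime import datetime, timedelta


def staggered_starts(first_start, job_count, gap_minutes=2):
    """Return HH:MM start times spaced gap_minutes apart."""
    base = datetime.strptime(first_start, "%H:%M")
    return [
        (base + timedelta(minutes=gap_minutes * i)).strftime("%H:%M")
        for i in range(job_count)
    ]


# Six save requests, the first at 02:00, each two minutes after the last.
start_times = staggered_starts("02:00", 6)
print(start_times)
```

With a two-minute gap the sixth job starts at 02:10, well before the remaining save requests are released at 02:30.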
    
    Ewald is going to try this workaround and let us know how it went.
    
    Masami