Conference cookie::archive_backup

Title: Archive/Backup
Moderator: COOKIE::MHUAIG
Created: Wed Sep 08 1993
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 479
Total number of notes: 2283

446.0. "disk skipped for a fatal error" by ROM01::MENCHICCHI () Thu Apr 24 1997 10:20

    Hi,
    
    The customer has a save request that backs up 4 disks.
    During the backup of the second disk there was a fatal error on the
    drive. ABS retired the current volume and, after some minutes,
    mounted another volume on another drive in order to continue
    the backup operation.
    
    That is OK, but ABS then went on to the next disk (the third), leaving
    the whole backup incomplete (the second disk was not saved).
    
     
    How can we tell ABS not to skip the current disk (the second) but to
    retry the backup of that same disk?
    
    
    Thanks, 
    Lorena         
     
T.R  Title  User  Personal Name  Date  Lines
446.1  COOKIE::MHUA  Thu Apr 24 1997 12:41  17
    
    If the disk has a fatal error and the same operation is retried
    a few minutes later, it will most likely fail again, won't it?
    If it's a fatal error, the right thing is to go on to the next
    operation, isn't it?
    
    Currently there is no way to have ABS redo the operation on only the
    failed disk in the scenario you have described.
    
    If you want to make it easy to redo the backup of a particular disk
    (this has to be done manually, since ABS will not know when you have
    fixed the disk problem), I recommend splitting the 4-disk backup
    into 4 separate save requests.
    
    Thanks,
    Masami
     
446.2  drive error and not disk error  ROM01::MENCHICCHI  Tue Apr 29 1997 05:17  14
    Sorry, Masami,
    but in the note I wrote that a fatal error occurred on the drive, NOT
    on the disk. So it is correct that ABS changes the drive, but it is not
    right that ABS goes on to the next operation.
    
    You recommend splitting the 4-disk backup into 4 separate save
    requests, but my 4 disks contain a single Oracle table, so to have a
    valid backup I must have all 4 disks saved at the same time.
    If ABS skips a disk, my backup is no good!
    
    The customer needs an answer.
    
    Thanks, Lorena
    
446.3  retry parameters in env  COOKIE::MHUA  Wed Apr 30 1997 11:29  10
    
    In that case, check the retry count and retry interval in the
    execution environment object that is used for the operation.
    
    The failed backup will be retried for the number of times specified
    before it goes on to the next disk save.  
    
    Will that help?
    
    Masami
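
    As a rough illustration of the behaviour described in this reply (this
    is not ABS code), here is a minimal Python sketch of "retry the failed
    save N times, then move on to the next disk". The names
    `run_with_retries`, `save_disks` and `save_disk` are hypothetical, and
    the sketch assumes that a retry count of 3 means up to 3 retries after
    the first attempt; the thread does not say how ABS actually counts.

```python
import time
from typing import Callable, Iterable, List, Tuple


def run_with_retries(
    operation: Callable[[], bool],
    retry_count: int,
    retry_interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[bool, int]:
    """Run `operation` once, then retry up to `retry_count` more times.

    Assumption: retry_count counts retries AFTER the first attempt.
    Returns (succeeded, attempts_made).
    """
    attempts = 0
    for attempt in range(retry_count + 1):
        attempts += 1
        if operation():
            return True, attempts
        # Wait before the next attempt, but not after the last one.
        if attempt < retry_count:
            sleep(retry_interval_s)
    return False, attempts


def save_disks(
    disks: Iterable[str],
    save_disk: Callable[[str], bool],
    retry_count: int,
    retry_interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[str, bool, int]]:
    """Save each disk in order; only move on once retries are exhausted."""
    results = []
    for disk in disks:
        ok, attempts = run_with_retries(
            lambda d=disk: save_disk(d), retry_count, retry_interval_s, sleep
        )
        results.append((disk, ok, attempts))
    return results
```

    With retry count = 3 and retry interval = 15 minutes (900 seconds), a
    disk whose save keeps failing would be attempted up to 4 times under
    this assumption before the next disk is started.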
446.4  retry parameters set  ROM01::MENCHICCHI  Tue May 06 1997 06:53  11
    
   > The failed backup will be retried for the number of times specified
   > before it goes on to the next disk save.  
    
    I have retry count = 3 and retry interval = 15 min in the EE.
    Nevertheless, ABS skips the current disk save and goes on to the next
    disk save.
    
    Any ideas, or is this the correct way for ABS to work?
    
    Thanks, Lorena
    
446.5  Should retry 3 times  COOKIE::HEISLER  Chris Heisler, ABS Engineering  Tue May 06 1997 09:14  11
    Lorena,
    
    	If you have the retry count set to 3, it should retry the
    	backup of the disk 3 times.  
    
    	Can you post the log of that save operation?  I would like
    	to see the sequence of the actions in the log file.
    
    	Thanks,
    
    	Chris
446.6  log of save operation  ROM01::MENCHICCHI  Tue May 06 1997 11:14  140
    
    This is the log of the save operation; I have included only the
    relevant section...
    
    
$ SET VERIFY
Executing ABS LOGIN.COM
Completed execution of ABS LOGIN.COM
"@abs_syste:coordinator.com A7A7B86A-C60E-11D0-A990-08002B9730F5" executing,
output follow :
____________________________________________________________________________
Starting New Request at 6-MAY-1997 12:54:36.47
   Name:   P_OUTP_SAVE
   UID:    A7A7B86A-C60E-11D0-A990-08002B9730F5

COORDINATOR: Created new volume set: ABZ026
COORDINATOR: Mounting volume set member: ABZ026 RVN 1
COORDINATOR:    (Selected drive $1$MUA10:)
COORDINATOR: Initializing scratch volume ABZ026
%MOUNT-I-MOUNTED, ABZ026 mounted on _$1$MUA10: (HSJ011)
COORDINATOR: Skipping $1$MUA10: to End of Tape...
THREAD #1:

Operation #1 starting at 6-MAY-1997 12:56:40.15

   Data Movement Type:		FULL_SAVE
   Incremental Level:		Full Operation

   Object Set:
      Object Type:		VMS_FILES
      Include List:		USER:
      Exclude List:	

   Archive Information:
      Storage Class Name:	P_OUTP_CLASS
      Saveset Location:		ABZ026
      Saveset Name:		6MAY199712543661.

   Execution Enviroment:
      Name:			P_OUT_ENV
      Number of retries:	3
      Retry Interval:		15 minute(s)

THREAD #1: $
THREAD #1: SET NOON
THREAD #1: $ version = F$EXTRACT(0,4,f$getsyi("VERSION"))
THREAD #1: $ IF (VERSION .eqs. "V6.1") then $DEFINE BACKUP ABS$SYSTEM:ALTERNATE_BACKUP.EXE
THREAD #1: $ DEFINE SYS$COMMAND SYS$INPUT:
THREAD #1: $ BACKUP /IMAGE USER: /LIST=_MBA7874:/FULL/RECORD/ -
THREAD #1: _$ IGNORE=(INTERLOCK)/NOCRC/NOVERIFY/VERIFY/CRC -
THREAD #1: _$ -
THREAD #1: _$ $1$MUA10:6MAY199712543661./SAVE/STOR=V2SLS/NOASSIST -
THREAD #1: _$ /EXACT_ORDER
THREAD #1: VOL1ABZ026
          3
THREAD #1: VOL1ABZ026
          3
THREAD #1: HDR16MAY199712543661.ABZ02600010001000100 97126 97126 000000DECVMSBACKUP
THREAD #1: HDR16MAY199712543661.ABZ02600010001000100 97126 97126 000000DECVMSBACKUP
THREAD #1: HDR2F0819208192		M		00

THREAD #1: HDR2F0819208192		M		00

THREAD #1: --- start the backup of USER disk
	.
	.  
	.
	. -- now the error
THREAD #1: %BACKUP-E-FATALERROR, fatal error on $1$MUA10:[]6MAY199712543661.;
THREAD #1: -SYSTEM-F-VOLIN, volume is not software enabled
THREAD #1: %BACKUP-I-SPECIFY, specify option (QUIT or CONTINUE)
THREAD #1: BACKUP>
CORRDINATOR: Retiring volume set ABZ026 (due to fatal error during save)
COORDINATOR: Dismounting volume set member: ABZ026 RVN 1
THREAD #1: Operation will be retried in 15 minutes...
COORDINATOR: Created new volume set: ABZ027
COORDINATOR: Mounting volume set member: ABZ027 RVN 1
COORDINATOR:     (Selected drive $1$MUA11:)
COORDINATOR: Initializing scratch volume ABZ027
%MOUNT-I-MOUNTED, ABZ027 mounted on -$1$MUA11: (HSJ011)
COORDINATOR: Skipping $1$MUA11: to End of Tape...
THREAD #1: SET NOON
THREAD #1: %BACKUP-I-SPECIFY, specify option (QUIT or CONTINUE)
THREAD #1: BACKUP>
THREAD #1: A fatal error was detected in data mover
COORDINATOR: Skipping $1$MUA11: to End of Tape...
THREAD #1:
THREAD #2:

Operation #2 starting at 6-MAY-1997 13:25:25.02

   Data Movement Type:		FULL_SAVE
   Incremental Level:		Full Operation

   Object Set:
      Object Type:		VMS Files
      Include List:		SPOOL:
      Exclude List:	

   Archive Information:
      Storage Class Name:	P_OUTP_CLASS
      Saveset Location:		ABZ026
      Saveset Name:		6MAY199713251966.

   Execution Enviroment:
      Name:			P_OUT_ENV
      Number of retries:	3
      Retry Interval:		15 minute(s)

THREAD #2: $
THREAD #2: SET NOON
THREAD #2: $ version = F$EXTRACT(0,4,f$getsyi("VERSION"))
THREAD #2: $ IF (VERSION .eqs. "V6.1") then $DEFINE BACKUP ABS$SYSTEM:ALTERNATE_BACKUP.EXE
THREAD #2: $ DEFINE SYS$COMMAND SYS$INPUT:
THREAD #2: $ BACKUP /IMAGE SPOOL: /LIST=_MBA8162:/FULL/RECORD/ -
THREAD #2: _$ IGNORE=(INTERLOCK)/NOCRC/NOVERIFY/CRC -
THREAD #2: _$ -
THREAD #2: _$ $1$MUA11:6MAY199713251966./SAVE/STOR=V2SLS/NOASSIST -
THREAD #2: _$ /EXACT_ORDER
THREAD #2: VOL1ABZ027
          3
THREAD #2: VOL1ABZ027
          3
THREAD #2: HDR16MAY199713251966.ABZ02700010001000100 97126 97126 000000DECVMSBACKUP
THREAD #2: HDR16MAY199713251966.ABZ02700010001000100 97126 97126 000000DECVMSBACKUP
THREAD #2: HDR2F0819208192		M		00

THREAD #2: HDR2F0819208192		M		00

THREAD #2: --- start the backup of SPOOL disk  
	.
        .
	.
ABS then continues backing up the other disks, with normal successful completion.
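
    For anyone who wants to check a long coordinator log like this one
    quickly, here is a small Python sketch that counts, per operation, how
    many retries were announced and whether the data mover reported a fatal
    error. It matches only the message texts that appear in the log above;
    the function name `summarize_log` and the result layout are made up for
    this example, and the message wording in other ABS versions may differ.

```python
import re
from typing import Dict, Iterable

# Message texts copied from the log excerpt in this note.
OP_START = re.compile(r"^Operation #(\d+) starting at")
RETRY_MSG = "Operation will be retried"
FATAL_MSG = "A fatal error was detected in data mover"


def summarize_log(lines: Iterable[str]) -> Dict[int, Dict[str, object]]:
    """Return {operation_number: {"retries_announced": int, "fatal": bool}}."""
    summary: Dict[int, Dict[str, object]] = {}
    current = None
    for line in lines:
        match = OP_START.match(line.strip())
        if match:
            # A new operation begins; later messages are attributed to it.
            current = int(match.group(1))
            summary[current] = {"retries_announced": 0, "fatal": False}
            continue
        if current is None:
            continue  # Ignore everything before the first operation.
        if RETRY_MSG in line:
            summary[current]["retries_announced"] += 1
        elif FATAL_MSG in line:
            summary[current]["fatal"] = True
    return summary
```

    Run over the excerpt above, operation #1 shows a single retry
    announcement followed by the data mover fatal error, and operation #2
    starts less than 30 minutes after operation #1, which fits the
    observation that the configured 3 retries at 15-minute intervals did
    not all take place.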
446.7  Thanks  COOKIE::HEISLER  Chris Heisler, ABS Engineering  Tue May 06 1997 15:09  9
    Lorena,
    
    	Thanks for posting the log file.
    
    	We'll take a look at the code to see if the state of the
    	process caused it to skip the retries.  We thought that
    	it should be retrying the first save.
    
    	Chris