
Conference decwet::networker

Title:NetWorker
Notice:kits - 12-14, problem reporting - 41.*, basics 1-100
Moderator:DECWET::RANDALL.com::lenox
Created:Thu Oct 10 1996
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:750
Total number of notes:3361

105.0. "Recover a Networker backup (was note 7693 in the DIGITAL_UNIX conference)" by DECWET::TRESSEL (Pat Tressel) Thu Oct 31 1996 17:45

105.1. by DECWET::ONO (The Wrong Stuff) Thu Oct 31 1996 20:13 (22 lines)
105.2. "Need to develop strategy for TruCluster recovery" by USPS::FPRUSS (Frank Pruss, 202-232-7347) Mon Apr 07 1997 15:15 (21 lines)
    Here is a wrinkle:
    
    We have a TruCluster running ORACLE OPS 7.3.2.3.  The only tape
    drives are in Exabyte 8mm libraries, and these libraries are
    attached only to the one node running the NSR server.
    
    Since the operator can use a library drive as a "single" drive, it
    should be possible to do a traditional level 0 backup of the
    system disk on this node.
    
    I believe we should also be able to use a tape drive remotely to
    back up the node that has no drive, or to dump partitions to a file
    on a disk NFS-served by the node with the tapes. 
    
    But I'm not sure how we could go about a restore of these level 0 dumps
    to bring back a TruCluster from a "disaster".
    
    NB ALL drives and partitions are LSM Mirrored for "safety".  But I fear
    that this will add complexity to the restore process!
    
    
105.3. by DECWET::FARLEE (Insufficient Virtual um...er....) Tue Apr 08 1997 14:27 (16 lines)
You'll have to fill in some gaps in your question here:

First off, where does NetWorker fit in your scenario?  Are you
talking about using NetWorker for your "level 0 dumps"?  If not, 
exactly what are you proposing?

Secondly, we have produced a disaster recovery manual.  If you consider
the cluster node with the NetWorker server as the server node, and the
other node(s) in the cluster as clients  (which is how NetWorker thinks of them)
then your problem is no different than any other disaster recovery
involving several systems.

Maybe you can state more clearly why you think that TruCluster makes
the disaster recovery more complex?

Kevin
105.4. "Here are the tricks ..." by BACHUS::DEVOS (Manu Devos DEC/SI Brussels 856-7539) Wed Apr 09 1997 01:47 (32 lines)
    Frank,
    
    If the problem is to take a "vdump -0" of the Cluster's systems without
    tapes (in anticipation of a disaster), then you can simply use this
    command. Let's say that SYSTEM-A has the tape and SYSTEM-B & SYSTEM-C
    have no tape device:
    
    SYSTEM-B # vdump -0 -f - / | rsh SYSTEM-A "dd of=/dev/nrmt0h bs=60k"
    SYSTEM-C # vdump -0 -f - / | rsh SYSTEM-A "dd of=/dev/nrmt0h bs=60k"
    
    The option "bs=60k" is needed to keep the vdump format (See vdump(8)).
    
    So, if SYSTEM-B is completely crashed, you can boot it from the CD, 
    create the root and/or /usr disk device files, mount the root disk
    device on /mnt, and now the trick:
    
    # hostname SYSTEM-B
    # ifconfig tu0 x.x.x.x netmask y.y.y.y
    # echo "w.w.w.w SYSTEM-A" > /etc/hosts
    # rsh SYSTEM-A "dd if=/dev/nrmt0h bs=60k" | vrestore -x -f - -D /mnt
    
    Replace x.x.x.x with the IP address of SYSTEM-B, y.y.y.y with its
    netmask, and w.w.w.w with the IP address of SYSTEM-A.  The "echo"
    command above creates a one-line /etc/hosts file so that SYSTEM-B
    can resolve "SYSTEM-A".
    
    Then, proceed similarly for /usr. You can now reboot the system and
    contact the NSR server to restore / and /usr to their last state...
    
    Easy, isn't it ?
    
    Manu.
    
105.5. "This looks good! Ever thought of moving to Missouri?" by USPS::FPRUSS (Frank Pruss, 202-232-7347) Wed Apr 09 1997 18:12 (18 lines)
    Manu,
    
    If you are saying that we can now get the network up after booting the
    UNIX CD in .4, then there is no problem!
    
    We knew how to take the appropriate vdumps, but weren't sure of the best
    way to restore.  We always had the last resort of moving "B-TAPELESS's"
    disks over to A for the restore, then putting them back on B, but want
    to avoid that.
    
    I did not catch the mandatory block size for vdump/vrestore before,
    thanks!
    
    Now to figure a way to model this on a one node system.  Shouldn't be
    too bad...
    
    FJP
    
105.6. "Pointer to Disaster Manual?" by USPS::FPRUSS (Frank Pruss, 202-232-7347) Wed Apr 09 1997 18:18 (16 lines)
    Re: .3 Disaster manual?
    
    Is this the addendum that I have copied but not yet printed?  
    
    Sorry if my questions cover stuff already documented.
    
    I have been spending energy to build up a UNIX 4.0b system with adequate
    resources to model stuff we'd like to recommend, and haven't had the
    time to review this addendum.  Customer is pressing us with questions
    faster than we can get answers.
    
    If there is a different "Disaster Recovery" manual than the Addendum,
    please provide pointer.
    
    FJP
    
105.7. by BRSSWS::DEVOS (Manu Devos DEC/SI Brussels 856-7539) Thu Apr 10 1997 08:49 (11 lines)
    Frank,
    
    >> If you are saying that we can now get the network up after booting
    >> the UNIX CD in .4, then there is no problem!
    
    The procedure I gave in .4 works.  I personally used it on V3.2D
    and on V4.0B. 
    
    Happy to help you, ... from Brussels :-)
    
    Manu.
105.8. "Memory Channel available?" by USPS::FPRUSS (Frank Pruss, 202-232-7347) Thu Apr 10 1997 21:57 (3 lines)
    I don't suppose booting from CD supports memory channel as the network?
    
    FJP
105.9. "Cute, but needs tweaking." by USPS::FPRUSS (Frank Pruss, 202-232-7347) Thu Apr 10 1997 22:29 (15 lines)
    I know we are straying a bit from NetWorker specifics here.
    
    I have played with this and find that in trying to use vrestore, I need
    to use dd to read the tape.
    
    If I try to use vrestore -i -f (tape) to inspect the backup, I get a
    core dump.  If I use dd if=(tape) bs=60k | vrestore -i -f - to look
    at the tape, it is fine.  I suspect that vrestore and dd do not agree
    on the meaning of 60k, or that because the initial vdump went to "-"
    instead of tape, vdump did not block the data at 60k.
    
    Specifically, vrestore -i -f (tape) complains that it only got 60k
    (61440 bytes) when it was looking for 64K (65536).
    
    It's time for a nap.  I'll check this tomorrow.
105.10. "We must use bs=64k" by BACHUS::DEVOS (Manu Devos DEC/SI Brussels 856-7539) Fri Apr 11 1997 04:52 (14 lines)
    Hello Frank,
    
    You're right... I composed note .4 from memory and didn't remember
    the exact block size, so I quickly checked the vdump(8) man page
    and typed 60k.  Today I ran the test on a real system, and it
    appears that vrestore wants to read 64K blocks.  So I changed to
    bs=64k for both the vdump and the vrestore, and all is OK. 
    
    Anyway, I am confused, because I am quite sure that I had previously
    used bs=60k!  (Maybe that was on an older version; today I tested
    with V4.0B.)  
    
    Regards, Manu.
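
    Pulling .4 and this correction together, the whole remote dump and
    restore could be sketched as a small script.  This is a sketch only:
    the host names, the tape device /dev/nrmt0h, and the dry-run wrapper
    are assumptions to be adapted per site.

```shell
#!/bin/sh
# Sketch: remote level-0 vdump/vrestore via the tape-owning node,
# using the 64 KiB records that vrestore expects (per .10).
# TAPEHOST, DEVICE, and the dry-run default are placeholder assumptions.
TAPEHOST=${TAPEHOST:-SYSTEM-A}    # node that owns the tape drive
DEVICE=${DEVICE:-/dev/nrmt0h}     # no-rewind tape device on that node
DRY_RUN=${DRY_RUN:-1}             # default: just print the commands

run() {
    # Print the pipeline in dry-run mode; otherwise execute it.
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else sh -c "$*"; fi
}

# Level-0 dump of the local root file system to the remote tape drive:
run "vdump -0 -f - / | rsh $TAPEHOST 'dd of=$DEVICE bs=64k'"

# Matching restore, run after booting from CD and configuring the network:
run "rsh $TAPEHOST 'dd if=$DEVICE bs=64k' | vrestore -x -f - -D /mnt"
```

    Setting DRY_RUN=0 would execute the pipelines for real; the default
    just prints them for review.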
    
105.11. by DECWET::RWALKER (Roger Walker - Media Changers) Fri Apr 11 1997 09:09 (7 lines)
	r.e. last few

	I'd like to thank you guys for these last few replies, they help
	us understand alternatives for quick disaster recovery.  Since
	it takes a lot to get a system up enough to run a NetWorker
	recovery we like to hear about every way to make this happen
	faster for the customer.  It really adds to the whole package.
105.12. by DECWET::FARLEE (Insufficient Virtual um...er....) Fri Apr 11 1997 10:23 (22 lines)
I've been mulling a few ideas around, and I'd like to get feedback
from you folks on how much demand you see:

1) NetWorker recovery from RIS boot...
	You can set up a system as a RIS server.  One of the options when
	booting RIS is to choose the "system management" option on the menu.
	If you put srecover (a statically linked version of recover) into
	the directory where these functions live, you can use it to recover
	all of your disks from a NetWorker server without ever installing
	the OS.

	This is not as simple a process as it sounds, and there are catches,
	but it is possible.  Would it be a desirable feature if  we were to 
	document the setup, and write scripts which would help to automate
	the tricky bits?

2) NetWorker recovery from bootable tape/CD
	This would be a bootable CD (or tape) with a script which would
	prompt you for info (node name/address, server name/address,
	routing info, etc.), and then run recover based on that.

Thoughts?  Feedback?
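
Proposal 2 might be sketched along these lines.  "recover -s <server>" is
the standard NetWorker client invocation, but everything else here (the
interface name tu0, the made-up addresses, and the idea of emitting a
plan rather than executing it) is an illustration-only assumption:

```shell
#!/bin/sh
# Hypothetical sketch: given the info the boot script would prompt for,
# emit the commands that would configure the network and start recovery.
recovery_plan() {
    node=$1 node_addr=$2 netmask=$3 server=$4 server_addr=$5
    echo "hostname $node"
    echo "ifconfig tu0 $node_addr netmask $netmask"
    echo "echo '$server_addr $server' >> /etc/hosts"
    echo "recover -s $server"
}

# Example with made-up addresses:
recovery_plan SYSTEM-B 16.1.1.2 255.255.255.0 SYSTEM-A 16.1.1.1
```

A real version would read these five values interactively and run the
commands instead of printing them.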
105.13. by KITCHE::schott (Eric R. Schott USG Product Management) Fri Apr 11 1997 12:39 (10 lines)
Does the procedure a few notes back work with LSM mirrored /, /usr,
/var?

I expect you have to change a few things after the restore to
turn off lsm long enough to re-encapsulate...

Also, what are the appropriate commands when init'ing the disks to
ensure they boot...


105.14. "Time to get serious!" by USPS::FPRUSS (Frank Pruss, 202-232-7347) Fri Apr 11 1997 18:18 (26 lines)
    LSM Note 643 has a lot of hints (and even details) on how to go about 
    recovering an LSM configuration.
    
    I am asking these questions in preparation for getting my customer
    (really a DIGITAL consultant working with the customer) ready to
    develop a detailed disaster recovery plan for a UNIX TruCluster/ASE
    supporting ORACLE OPS.  I expect this to be taken to the level that
    the system can be rebuilt from "new, un-used parts" at a different
    site, similar to the work Ron Ginger has been doing for his customer
    as described in the LSM conference.
    
    In 18 months, it is conceivable that this system will be hosting 0.5 to
    1.5 Tb of ORACLE data, including all development, training and QA
    instances.  (Right now it is only capable of 100 to 180 Gb, depending
    on whether they stay at LSM Mirrors or move to controller RAID-5)
    
    I expect the development of the plan to be performed by DIGITAL NSIS,
    and intend that it be delivered in the form of a WORD/PDF document,
    including any scripts to save and restore configuration data.
    
    Hopefully, when this is all done, it will be an "interesting" document
    to share internally.
    
    I don't have the equipment here in the home office to play with all
    aspects of the technical requirements _yet_.  But I will be scrounging
    around!
105.15. "Reply to .13 and .12" by BACHUS::DEVOS (Manu Devos NSIS Brussels 856-7539) Sat Apr 12 1997 10:24 (41 lines)
    R: .13                   
    
    Hi Eric,
    
    Yes, the procedure would have to be adapted for an LSM/AdvFS setup
    on the system disk, but my proposal was only aimed at showing that
    it is NOT necessary to re-install a whole UNIX from CD.  We can use
    the network from the standalone (SAS) UNIX with only the following
    three commands:
    
    	# hostname SYSTEM_NAME
    	# ifconfig net_device address netmask
    	# echo "server_address SERVER_NAME" > /etc/hosts
    	  (i.e. create a one-line hosts file)
    	
    And then your externally saved data (tape-disk-NSR) are accessible.
    
    --------------------------------------------------------------------------
    
    R: .12
    
    I think that the two proposals are good for us. The first for the big
    sites and the DEC sites, and the second  more specifically for the
    ordinary customers.
    
    But, the most interesting development in this area could be an LSM
    program which would "read" the LSM PRIVATE AREA of a rootdg disk and
    save the info in a file. That file could then be interpreted by a
    second LSM program at the recovery time to AUTOMATICALLY re-create the
    LSM private area on the new disk. 
    
    You will say that volsave/volrestore already exist, but they do
    NOT work for the system disk, and the main problem in a crash
    recovery is precisely the system disk.  So I am preaching for that.
    
    LSM already provides programs like "volprivutil dumpconfig", so the
    reverse should be possible.  Any takers in the LSM group?
    
    So, to summarize my point of view, again I think that we need a better
    integration of NSR-LSM-ADVFS-ASE.
    
    
    Regards, Manu.