T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---
105.1 | | DECWET::ONO | The Wrong Stuff | Thu Oct 31 1996 20:13 | 22 |
105.2 | Need to develop strategy for TruCluster recovery | USPS::FPRUSS | Frank Pruss, 202-232-7347 | Mon Apr 07 1997 15:15 | 21 |
| Here is a wrinkle:
We have a TruCluster that is running ORACLE OPS 7.3.2.3. The only
tape drives are in the Exabyte 8mm libraries, and these libraries are
attached only to the one node running the NSR server.
Now, the operator can use a library drive as a "single" drive, so it
should be easy to do a traditional level 0 backup of the system disk
on this node.
I believe we should also be able to use a tape drive remotely to
achieve a backup of the node that has no drive, or to dump partitions
to a file on a disk NFS-served by the node with the tapes.
But I'm not sure how we could go about a restore of these level 0 dumps
to bring back a TruCluster from a "disaster".
NB: ALL drives and partitions are LSM-mirrored for "safety", but I fear
that this will add complexity to the restore process!
|
105.3 | | DECWET::FARLEE | Insufficient Virtual um...er.... | Tue Apr 08 1997 14:27 | 16 |
| You'll have to fill in some gaps in your question here:
First off, where does NetWorker fit in your scenario? Are you
talking about using NetWorker for your "level 0 dumps"? If not,
exactly what are you proposing?
Secondly, we have produced a disaster recovery manual. If you consider
the cluster node with the NetWorker server as the server node, and the
other node(s) in the cluster as clients (which is how NetWorker thinks of them)
then your problem is no different than any other disaster recovery
involving several systems.
Maybe you can state more clearly why you think that TruCluster makes
the disaster recovery more complex?
Kevin
|
105.4 | Here the tricks ... | BACHUS::DEVOS | Manu Devos DEC/SI Brussels 856-7539 | Wed Apr 09 1997 01:47 | 32 |
| Frank,
If the problem is to take a "vdump -0" of the Cluster's systems without
tapes (in anticipation of a disaster), then you can simply use this
command. Let's say that SYSTEM-A has the tape and SYSTEM-B & SYSTEM-C
have no tape device:
SYSTEM-B # vdump -0 -f - / | rsh SYSTEM-A "dd of=/dev/nrmt0h bs=60k"
SYSTEM-C # vdump -0 -f - / | rsh SYSTEM-A "dd of=/dev/nrmt0h bs=60k"
The option "bs=60k" is needed to keep the vdump format (See vdump(8)).
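The pipe stage can be rehearsed locally before touching a real tape. A
minimal sketch, with a plain file under /tmp standing in for /dev/nrmt0h
(an assumption for illustration only), showing that the dd stage passes
the piped stream through intact:

```shell
# Local sketch: a plain file stands in for the tape device (an
# assumption, not from the note).  dd re-blocks the piped stream and
# the byte count is preserved end to end: 150 x 1 KiB in, 153600 out.
dd if=/dev/zero bs=1k count=150 2>/dev/null | dd of=/tmp/blocked.img bs=60k 2>/dev/null
wc -c < /tmp/blocked.img
```

On a real tape the block size also fixes the record length of each
write, which is why it must match what vrestore expects to read back.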
So, if SYSTEM-B is completely crashed, you can boot it from the CD,
create the root and/or /usr disk device files, mount the root disk
device on /mnt, and now the trick:
# hostname SYSTEM-B
# ifconfig tu0 x.x.x.x netmask y.y.y.y
# echo "w.w.w.w SYSTEM-A" > /etc/hosts
# rsh SYSTEM-A "dd if=/dev/nrmt0h bs=60k" | vrestore -x -f - -D /mnt
Replace x.x.x.x with the IP address of SYSTEM-B and y.y.y.y with its
netmask; the "echo" above creates a one-line /etc/hosts file so that
SYSTEM-B can resolve "SYSTEM-A". Replace w.w.w.w with the IP address
of SYSTEM-A.
Then, proceed similarly for /usr. You can now reboot the system and
contact the NSR server to restore / and /usr to their last state...
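The whole dump-and-restore round trip can also be dry-run on one machine.
A hedged sketch, with tar standing in for vdump/vrestore and a file under
/tmp standing in for the tape (both substitutions are assumptions for
illustration; the real commands are the ones above):

```shell
# Local dry run of the pipe pattern: archive a tree through dd into a
# "tape" file, then read it back through dd and extract it elsewhere.
mkdir -p /tmp/demo/src /tmp/demo/dst
echo "hello" > /tmp/demo/src/f
(cd /tmp/demo/src && tar cf - .) | dd of=/tmp/demo/tape.img bs=60k 2>/dev/null
dd if=/tmp/demo/tape.img bs=60k 2>/dev/null | (cd /tmp/demo/dst && tar xf -)
cat /tmp/demo/dst/f
```

If the extracted file matches the original, the plumbing is sound; only
the archiver and the device then differ from the real procedure.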
Easy, isn't it ?
Manu.
|
105.5 | This looks good! Ever thought of moving to Missouri? | USPS::FPRUSS | Frank Pruss, 202-232-7347 | Wed Apr 09 1997 18:12 | 18 |
| Manu,
If you are saying that we can now get the network up after booting the
UNIX CD in .4, then there is no problem!
We knew how to take the appropriate vdumps, but weren't sure of the
best way to restore. We always had the last resort of moving
"B-TAPELESS's" disks over to A for the restore, then putting them back
on B, but we want to avoid that.
I did not catch the mandatory block size for vdump/vrestore before,
thanks!
Now to figure out a way to model this on a one-node system. Shouldn't
be too bad...
FJP
|
105.6 | Pointer to Disaster Manual? | USPS::FPRUSS | Frank Pruss, 202-232-7347 | Wed Apr 09 1997 18:18 | 16 |
| Re: .3 Disaster manual?
Is this the addendum that I have copied but not yet printed?
Sorry if my questions cover stuff already documented.
I have been spending my energy building up a UNIX 4.0B system with
adequate resources to model what we'd like to recommend, and haven't
had time to review this addendum. The customer is pressing us with
questions faster than we can get answers.
If there is a different "Disaster Recovery" manual than the Addendum,
please provide pointer.
FJP
|
105.7 | | BRSSWS::DEVOS | Manu Devos DEC/SI Brussels 856-7539 | Thu Apr 10 1997 08:49 | 11 |
| Frank,
>> If you are saying that we can now get the network up after booting
>> the UNIX CD in .4, then there is no problem!
| The procedure I gave in .4 works. I personally used it on V3.2D
and on V4.0B.
Happy to help you, ... from Brussels :-)
Manu.
|
105.8 | Memory Channel available? | USPS::FPRUSS | Frank Pruss, 202-232-7347 | Thu Apr 10 1997 21:57 | 3 |
| I don't suppose booting from CD supports memory channel as the network?
FJP
|
105.9 | Cute, but needs tweaking. | USPS::FPRUSS | Frank Pruss, 202-232-7347 | Thu Apr 10 1997 22:29 | 15 |
| I know we are straying a bit from NetWorker specifics here.
I have played with this and found that, to use vrestore, I need
to use dd to read the tape.
If I try to use vrestore -i -f tape to inspect the backup, I get a core
dump. If I use dd if=(tape) bs=60K | vrestore -i -f - to look at the
tape, it is fine. I suspect that vrestore and dd do not agree on the
meaning of 60k, or that because the initial vdump went to "-" instead
of tape, vdump did not block the data at 60K.
Specifically, vrestore -i -f (tape) complains that it only got "60k"
(61440) when it was looking for 64K (65536).
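The two numbers in that error message line up exactly with the two block
sizes in play, as plain arithmetic:

```shell
# The record sizes behind the vrestore complaint:
echo $((60 * 1024))   # bytes written per record with bs=60k
echo $((64 * 1024))   # bytes per record that vrestore expects
```

So the tape records really are 4 KiB short of what vrestore wants,
which points at the bs= value rather than at dd or the pipe.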
It's time for a nap. I'll check this tomorrow.
|
105.10 | We must use bs=64k | BACHUS::DEVOS | Manu Devos DEC/SI Brussels 856-7539 | Fri Apr 11 1997 04:52 | 14 |
| Hello Frank,
| You're right... I composed note .4 from memory and didn't recall the
exact block size to use, so I quickly checked the vdump(8) man page
and typed 60K. Today I made the test in a real situation, and it
appears that vrestore wants to read 64K blocks. So I changed to
bs=64K for both the vdump and the vrestore, and all is OK.
Anyway, I am confused, because I am quite sure that I had previously
used bs=60K! (Maybe it was on an older version; today I made the test
with V4.0B.)
Regards, Manu.
|
105.11 | | DECWET::RWALKER | Roger Walker - Media Changers | Fri Apr 11 1997 09:09 | 7 |
| Re: the last few
I'd like to thank you guys for these last few replies, they help
us understand alternatives for quick disaster recovery. Since
it takes a lot to get a system up enough to run a NetWorker
recovery we like to hear about every way to make this happen
faster for the customer. It really adds to the whole package.
|
105.12 | | DECWET::FARLEE | Insufficient Virtual um...er.... | Fri Apr 11 1997 10:23 | 22 |
| I've been mulling a few ideas around, and I'd like to get feedback
from you folks on how much demand you see:
1) NetWorker recovery from RIS boot...
You can set up a system as a RIS server. One of the options when
booting RIS is to choose the "system management" option on the menu.
If you put srecover (a statically linked version of recover) into
the directory where these functions live, you can use it to recover
all of your disks from a NetWorker server without ever installing
the OS.
This is not as simple a process as it sounds, and there are catches,
but it is possible. Would it be a desirable feature if we were to
document the setup, and write scripts which would help to automate
the tricky bits?
2) NetWorker recovery from bootable tape/cd
This would be a bootable tape or CD with a script that prompts
you for info (node name/address, server name/address, routing info,
etc.) and then runs recover based on that.
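A minimal sketch of how such a script might collect and apply that info.
Every name here (the tu0 interface, the prompts, the recover invocation)
is an assumption for illustration; no such Digital tool exists in the
note:

```shell
# Hypothetical sketch for the bootable-CD idea in .12.  It only
# GENERATES the commands so they can be reviewed before being run as
# root; the tu0 interface name and the recover flags are assumptions.
net_setup_cmds() {
    # $1=node name  $2=node IP  $3=netmask  $4=server name  $5=server IP
    printf 'hostname %s\n' "$1"
    printf 'ifconfig tu0 %s netmask %s\n' "$2" "$3"
    printf 'echo "%s %s" > /etc/hosts\n' "$5" "$4"
}
net_setup_cmds SYSTEM-B 16.1.1.2 255.255.255.0 SYSTEM-A 16.1.1.1
# A real script would then run something like: recover -s SYSTEM-A ...
```

Generating rather than executing the commands keeps the sketch safe to
test and lets an operator sanity-check the addresses first.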
Thoughts? Feedback?
|
105.13 | | KITCHE::schott | Eric R. Schott USG Product Management | Fri Apr 11 1997 12:39 | 10 |
| Does the procedure a few notes back work with LSM mirrored /, /usr,
/var?
I expect you have to change a few things after the restore to
turn off lsm long enough to re-encapsulate...
Also, appropriate command when init'ing disks to ensure
they boot...
|
105.14 | Time to get serious! | USPS::FPRUSS | Frank Pruss, 202-232-7347 | Fri Apr 11 1997 18:18 | 26 |
| LSM Note 643 has a lot of hints (and even details) on how to go about
recovering an LSM configuration.
I am asking these questions in preparation for getting my customer
(really a DIGITAL consultant working with the customer) ready to
develop a detailed disaster recovery plan for a UNIX TruCluster/ASE
supporting ORACLE OPS. I expect this to be taken to the level that the
system can be rebuilt from "new, un-used parts" at a different site.
Similar to the work Ron Ginger has been doing for his customer as
described in the LSM conference.
In 18 months, it is conceivable that this system will be hosting 0.5 to
1.5 Tb of ORACLE data, including all development, training and QA
instances. (Right now it is only capable of 100 to 180 Gb, depending
on whether they stay with LSM mirrors or move to controller RAID-5.)
I expect the development of the plan to be performed by DIGITAL NSIS,
and intend that it be delivered in the form of a WORD/PDF document,
including any scripts to save and restore configuration data.
Hopefully, when this is all done, it will be an "interesting" document
to share internally.
I don't have the equipment here in the home office to play with all
aspects of the technical requirements _yet_. But I will be scrounging
around!
|
105.15 | Reply to .13 and .12 | BACHUS::DEVOS | Manu Devos NSIS Brussels 856-7539 | Sat Apr 12 1997 10:24 | 41 |
| R: .13
Hi Eric,
| Yes, the procedure should be adapted for an LSM/ADVFS setup on the
system disk, but my proposal was aimed only at showing that it is NOT
necessary to re-install a whole UNIX from CD. We can use the network
from the standalone (SAS) UNIX with only the following three commands:
# hostname SYSTEM_NAME
# ifconfig net_device address netmask
# echo "server_address SERVER_NAME" > /etc/hosts
And then your externally saved data (tape, disk, NSR) are accessible.
--------------------------------------------------------------------------
R: .12
I think that the two proposals are good for us. The first for the big
sites and the DEC sites, and the second more specifically for the
ordinary customers.
But the most interesting development in this area would be an LSM
program which would "read" the LSM PRIVATE AREA of a rootdg disk and
save the info in a file. That file could then be interpreted by a
second LSM program at recovery time to AUTOMATICALLY re-create the
LSM private area on the new disk.
You will say that volsave/volrestore already exist, but they do NOT
work for the system disk, and the main problem of a crash recovery is
the system disk. So I am preaching for that.
LSM already provides programs like "volprivutil dumpconfig", so the
reverse should be possible. Any taker in the LSM group?
So, to summarize my point of view: again, I think that we need better
integration of NSR-LSM-ADVFS-ASE.
Regards, Manu.
|