Title: | AdvFS Support/Info/Questions Notefile |
Notice: | note 187 is Freq Asked Questions;note 7 is support policy |
Moderator: | DECWET::DADDAMIO |
Created: | Wed Jun 02 1993 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 1077 |
Total number of notes: | 4417 |
I ran into an odd bug the other day. We were adding some new storage to our crash dump server at the Atlanta CSC. Several filesystems wound up migrating to new homes. I did the migration with addvol/rmvol, which worked beautifully in all cases except one.

A domain consisted of three volumes, each an RZ29B. It was about 50% full, with the three volumes being about 65%, 55%, and 30% full respectively. I was moving it to a single-volume 12-GB stripe set on an HSZ50. The domain was otherwise inactive at the time.

I addvol'd the stripe set -- no problem. I rmvol'd the first RZ29B -- no problem. I rmvol'd the second RZ29B, and got the error message E_CANT_MIGRATE_HOLE, complaining about one of the vmcore files I was moving. Since this is a crash dump server, there are lots of huge sparse files on it. However, the file it called out was by no means the largest sparse file.

I tried rmvol'ing the third RZ29B and it complained about the same file. I moved this file to another filesystem. I tried again to rmvol the second RZ29B and it gave me the same error message on another file. I removed this file and tried again. It complained about a third file. I removed that one too, and the rmvol succeeded. I then tried again to rmvol the third volume and it succeeded. The domain was entirely migrated to the new stripe set.

Any ideas on this? Unfortunately, I had to destroy the evidence (this was a production machine and I had to get it back up quickly).

Martin Moore
Digital UNIX Support
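For anyone trying to reproduce this, it may help to first list which files in the domain are sparse. The following is only a rough sketch, not an AdvFS tool: it assumes a POSIX system where `st_blocks` counts 512-byte units, and the names `is_sparse` and `find_sparse_files` are made up for illustration. It flags any regular file whose allocated space is smaller than its logical size.

```python
import os
import stat

def is_sparse(st_size, st_blocks, block_unit=512):
    """Return True if fewer bytes are allocated than the logical size.

    Assumption: st_blocks is reported in 512-byte units (POSIX convention).
    """
    return st_blocks * block_unit < st_size

def find_sparse_files(root):
    """Walk `root` and list (path, logical_bytes, allocated_bytes) for sparse files."""
    hits = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # only regular files can have holes
            if is_sparse(st.st_size, st.st_blocks):
                hits.append((path, st.st_size, st.st_blocks * 512))
    return hits

if __name__ == "__main__":
    import sys
    for path, logical, allocated in find_sparse_files(sys.argv[1]):
        print(f"{path}: logical={logical} allocated={allocated}")
```

This only tells you which files have holes; it says nothing about why rmvol objected to some of them and not others.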
T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
1058.1 | Try a balance - it seems to work for me | UNIFIX::HARRIS | Juggling has its ups and downs | Mon May 12 1997 06:46 | 16 |
As to why, I don't know, but I've experienced a similar problem in the past (in my case it was a domain where I analyzed CLD crash dumps when I worked in USEG). I was constantly adding and removing partitions from the domain, as I shifted from needing more space for crash dumps to needing more space for test file systems and back again.

Anyway, I got around the problem by doing a balance on the domain. It is my guess that the balance moved things around and, as a side effect, defragmented a few things. After that I was able to do my rmvol's without a problem. Maybe the same effect could have occurred if I had defragmented the domain. I don't know.

So maybe this can be a workaround for you. Someone else will have to try to answer the question of why.

Bob Harris
1058.2 | | DECWET::MARTIN | | Mon May 12 1997 16:10 | 22 |
Well, I've poked around and come up with a little bit of info. First, you should be able to use addvol, rmvol, defragment, and balance on file systems that are active and busy. So, in theory, you could have had this machine up and available to users while doing this transfer. Note the key words "in theory".

We had a problem with defragment where it would look up the extent table of a file, then a user would modify the file (truncation triggered the problem most easily), *then* defragment would attempt to move it, which would fail because it thought there was data there that was no longer there. We fixed this by just ignoring the error, on the assumption that it would be OK on the next pass, and besides, if the file didn't get defragged, it wasn't that big a deal.

I haven't investigated rmvol to see what its migration scheme is like. Since you say the file domain was quiescent while the rmvol was happening, that can't have been the problem. I would highly recommend opening a QAR against rmvol, so we can allocate engineering effort to fixing this. I'm really curious whether this might have worked had you just tried to run rmvol a second time.

--Ken
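The defragment race described in .2 can be illustrated with a toy model. This is a sketch under stated assumptions, not the AdvFS code: the `File` class, `StaleExtentError`, `migrate`, and `defragment_pass` are all hypothetical names, and extents are simplified to `(offset, length)` pairs. It shows a pass that snapshots extent maps, has a file truncated underneath it, and then skips the stale file instead of failing the whole pass.

```python
class StaleExtentError(Exception):
    """Raised when a snapshotted extent no longer exists in the file."""

class File:
    def __init__(self, name, extents):
        self.name = name
        self.extents = list(extents)  # [(offset, length), ...]

    def truncate(self, new_size):
        """Drop or shorten extents that lie beyond new_size."""
        kept = []
        for off, length in self.extents:
            if off >= new_size:
                continue
            kept.append((off, min(length, new_size - off)))
        self.extents = kept

def migrate(f, snapshot):
    """Pretend to move the snapshotted extents; fail if any has gone stale."""
    for ext in snapshot:
        if ext not in f.extents:
            raise StaleExtentError(f"{f.name}: extent {ext} is gone")
    return len(snapshot)  # a real mover would copy blocks here

def defragment_pass(files, between=None):
    """One pass: snapshot extent maps, then migrate, skipping stale files."""
    snapshots = {f.name: list(f.extents) for f in files}
    if between is not None:
        between()  # stands in for user activity between lookup and move
    moved, skipped = [], []
    for f in files:
        try:
            migrate(f, snapshots[f.name])
            moved.append(f.name)
        except StaleExtentError:
            skipped.append(f.name)  # ignore the error; retry on the next pass
    return moved, skipped
```

A second call to `defragment_pass` takes a fresh snapshot, so a file skipped in one pass is picked up in the next, assuming it is not modified again in the window. Whether rmvol uses a comparable scheme is not established in this thread.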
1058.3 | | RHETT::MOORE | | Tue May 13 1997 05:45 | 6 |
re .2 -- I actually did try the failing rmvol more than once (though I didn't mention that in the original post). It acted the same way each time.

Martin
1058.4 | | DECWET::MARTIN | | Tue May 13 1997 13:02 | 4 |
Interesting. Definitely a bug in rmvol, then. (Well, it's a bug either way, but that's more information to help solve it....) A QAR would be much appreciated.
1058.5 | | RHETT::MOORE | | Tue May 13 1997 13:43 | 3 |
Entered as QAR 52975. Martin |