Title: | RMS asks, 'R U Journaled?' |
Moderator: | STAR::TSPEER UVEL |
Created: | Tue Mar 11 1986 |
Last Modified: | Wed Jun 04 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 3031 |
Total number of notes: | 12302 |
<<< MOVIES::DISK$SYSDATA:[NOTES$LIBRARY]DECDTM-VMS.NOTE;1 >>>
-< DECDTM-VMS >-
================================================================================
Note 352.0 Error while trying to recover. 1 reply
BACHUS::LEEN "Jaak Leen, TP/IM Support Belgium 856-8738" 49 lines 25-APR-1997 09:26
--------------------------------------------------------------------------------
We had a strange problem at a customer's site a few days ago. They have an
application that is using 'recovery unit journaling' and there are several RMS
files participating in the transaction. Now, whenever they open one of the
files involved, they get the same error messages from OPCOM.

%%%%%%%%%%% OPCOM 17-APR-1997 10:55:50.05 %%%%%%%%%%%
Message from user FIELD on AGBVR2
%RMSREC-F-OPRSERVER, error occurred during detached recovery unit recovery;
process ID (PID) 00011463

%%%%%%%%%%% OPCOM 17-APR-1997 10:55:50.06 %%%%%%%%%%%
Message from user FIELD on AGBVR2
-RMSREC-F-FILE, file DISK$USER2:[IMPCON.MTRD3]WIPS.IMP;3

%%%%%%%%%%% OPCOM 17-APR-1997 10:55:50.06 %%%%%%%%%%%
Message from user FIELD on AGBVR2
-RMSREC-F-INVDDTM, error occurred processing prepare record

%%%%%%%%%%% OPCOM 17-APR-1997 10:55:50.07 %%%%%%%%%%%
Message from user FIELD on AGBVR2
-SYSTEM-F-NOSUCHPART, specified participant not found

%%%%%%%%%%% OPCOM 17-APR-1997 10:55:50.07 %%%%%%%%%%%
Message from user FIELD on AGBVR2
-SYSTEM-W-DEVOFFLINE, device is not in configuration or not available

I couldn't find anything that could explain the DEVOFFLINE. Via LMCP I found
an active transaction for that day that was COMMITTED because the problems
started around that time I advised them to delete the journal-file.

Record number 3 (00000003), 64 (0040) bytes
Transaction state (2): COMMITTED
Transaction ID: B2ABC8AF-B6D7-11D0-8CCB-414742565232 (17-APR-1997 04:04:32.55)
DECdtm Services Log Format V1.1
Type ( 3): LOCAL RM
Log ID: 037100C0-0029-0003-7C23-000000000000
Name (22): "RMS$USER2.......*.D..."
(0000 0144162A 00000000 00000032 52455355 24534D52)

The disk with label 'RMS$USERS2' was online. We saved the journal-file and the
transaction-log. Any idea where to look next?

Thanks in advance,
Jaak
================================================================================
Note 352.1 Error while trying to recover. 1 of 1
MOVIES::POTTER "http://www.vmse.edo.dec.com/~potter/" 7 lines 25-APR-1997 10:39
--------------------------------------------------------------------------------
Jaak,

I think this is more of an RMS-Journaling issue than DECdtm - have you asked
the RMS folk?

regards,
//alan
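
As a cross-check on the LMCP record in note 352.0, the hex line under the Name field can be decoded with a few lines of Python. This is only a sketch: it assumes the usual VMS dump convention, in which the groups are printed right to left and each group is a little-endian value, and the helper names `decode_vms_dump` and `printable` are hypothetical.

```python
def decode_vms_dump(hex_words: str) -> bytes:
    """Turn a right-to-left VMS-style hex dump line into bytes in memory order."""
    out = bytearray()
    # Assumption: the rightmost group holds the lowest addresses,
    # so the groups are walked in reverse order.
    for group in reversed(hex_words.split()):
        # Each group is printed as a little-endian value: reverse its bytes.
        out += bytes.fromhex(group)[::-1]
    return bytes(out)


def printable(data: bytes) -> str:
    """Render bytes the way the dump does: non-printable bytes become dots."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in data)


# Hex line copied from the LMCP record quoted in note 352.0.
dump = "0000 0144162A 00000000 00000032 52455355 24534D52"
name = decode_vms_dump(dump)
print(len(name), printable(name))  # 22 RMS$USER2.......*.D...
```

Under that assumption the line decodes to 22 bytes whose printable form matches the `Name (22)` field shown in the record.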
T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
3025.1 | STAR::TSPEER | Tue Apr 29 1997 10:15 | 26 | ||
Jaak, RMS is failing detached recovery because one of its calls to DECdtm is failing -- RMS simply passes on this error. I strongly suspect that the DECdtm call is $GETDTI, which RMS uses during recovery to request that DECdtm return the global status (committed or aborted) of a transaction in which RMS was a resource manager. NOSUCHPART from DECdtm means that for some reason DECdtm fails to recognize RMS as having been involved in the transaction. While there could be numerous theoretical reasons for this failure, including a bad $GETDTI call from RMS, the fact that this appears to be specific to certain files involved in a specific transaction makes me wonder whether DECdtm is having trouble accessing its log information for the data it must return in the $GETDTI call. Could DEVOFFLINE be referring to the device where the DECdtm transaction log was located? Does *any* attempt to access the affected file(s) -- e.g. a simple DCL OPEN -- cause the same error? Is only one file involved, or do all files involved in the transaction exhibit the same behavior? Before bouncing this entirely back on the DECdtm people, you might check the file SYS$MANAGER:RMSREC$SERVER_ERROR.LOG, which is created/appended to whenever detached RMS recovery encounters a fatal recovery error. There is a (slim) chance it may contain additional information. Tom Speer | |||||
3025.2 | MOVIES::POTTER | http://www.vmse.edo.dec.com/~potter/ | Wed May 07 1997 05:20 | 43 | |
>%%%%%%%%%%% OPCOM 17-APR-1997 10:55:50.05 %%%%%%%%%%% >Message from user FIELD on AGBVR2 >%RMSREC-F-OPRSERVER, error occurred during detached recovery unit recovery; >process ID (PID) 00011463 > >%%%%%%%%%%% OPCOM 17-APR-1997 10:55:50.06 %%%%%%%%%%% >Message from user FIELD on AGBVR2 >-RMSREC-F-FILE, file DISK$USER2:[IMPCON.MTRD3]WIPS.IMP;3 > >%%%%%%%%%%% OPCOM 17-APR-1997 10:55:50.06 %%%%%%%%%%% >Message from user FIELD on AGBVR2 >-RMSREC-F-INVDDTM, error occurred processing prepare record > >%%%%%%%%%%% OPCOM 17-APR-1997 10:55:50.07 %%%%%%%%%%% >Message from user FIELD on AGBVR2 >-SYSTEM-F-NOSUCHPART, specified participant not found > >%%%%%%%%%%% OPCOM 17-APR-1997 10:55:50.07 %%%%%%%%%%% >Message from user FIELD on AGBVR2 >-SYSTEM-W-DEVOFFLINE, device is not in configuration or not available Okay, let's see what we know here. RMS-Journaling is trying to recover the log, and is getting SS$_NOSUCHPART from the transaction manager. That means that DECdtm has given up knowledge of RMS-Journaling for this file being involved in the transaction, either because it never was involved, because it received a response from RMS-J saying that RMS-J was not going to ask about the outcome of the transaction again, or because of a DECdtm bug. No such DECdtm bug has been reported or observed previously. Can you explain to me what the last line of this sentence means? Have you advised the customer to change any records in the DECdtm log, or alter a DECdtm journal file in any way? >I couldn't find anything that could explain the DEVOFFLINE. Via LMCP I found >an active transaction for that day that was COMMITTED because the problems >started around that time I advised them to delete the journal-file. As for DEVOFFLINE, I have no idea where that is coming from... regards, //alan | |||||
3025.3 | A small window ... | BACHUS::LEEN | Jaak Leen, TP/IM Support Belgium 856-8738 | Thu May 08 1997 12:53 | 12 |
Thanks for the explanation. No, I didn't advise the customer to change the DECdtm logs, but because the one transaction I suspected was flagged as committed, the customer could remove the RMS journal file so RMS would not attempt to do the recovery. Is it possible that there's a small window between RMS telling DECdtm that it is no longer participating in the transaction and RMS actually deleting the file? Regards, Jaak | |||||
3025.4 | STAR::TSPEER | Thu May 08 1997 14:09 | 18 | ||
> > Thanks for the explanation. No, I didn't advise the customer to > change the DECdtm logs, but because the one transaction I suspected was > flagged as committed, the customer could remove the RMS journal file so > RMS would not attempt to do the recovery. I hope you were absolutely sure before removing the RMS journal that there were no uncommitted transactions in that journal; otherwise some transactional updates may be lost. > Is it possible that there's a small window between RMS telling DECdtm > that it is no longer participating in the transaction and RMS actually > deleting the file? I know of no windows in RMS journaling's use of DECdtm services which can lead to the problem you seem to be describing. Tom |
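
The OPCOM chain quoted in this thread follows the standard OpenVMS message layout, `%FACILITY-L-IDENT, text`, with linked messages starting with `-` instead of `%`. The Python sketch below splits such a chain into its parts so the facility reporting each message is easy to see. The names `VmsMessage` and `parse_chain` are hypothetical, and the regular expression is an assumption that covers only the message shapes seen here.

```python
import re
from typing import NamedTuple


class VmsMessage(NamedTuple):
    facility: str
    severity: str  # one of S, I, W, E, F
    ident: str
    text: str


# Assumed pattern: leading % (primary) or - (linked), then FACILITY-L-IDENT, text.
_MSG = re.compile(
    r"^[%-](?P<fac>[A-Z0-9$_]+)-(?P<sev>[SIWEF])-(?P<ident>[A-Z0-9$_]+),\s*(?P<text>.*)$"
)


def parse_chain(lines):
    """Return the messages found in lines, skipping anything that is not a message."""
    msgs = []
    for line in lines:
        m = _MSG.match(line.strip())
        if m:
            msgs.append(VmsMessage(m["fac"], m["sev"], m["ident"], m["text"]))
    return msgs


# Message lines copied from the OPCOM output in note 352.0.
opcom = [
    "%RMSREC-F-OPRSERVER, error occurred during detached recovery unit recovery;",
    "-RMSREC-F-FILE, file DISK$USER2:[IMPCON.MTRD3]WIPS.IMP;3",
    "-RMSREC-F-INVDDTM, error occurred processing prepare record",
    "-SYSTEM-F-NOSUCHPART, specified participant not found",
    "-SYSTEM-W-DEVOFFLINE, device is not in configuration or not available",
]

chain = parse_chain(opcom)
for msg in chain:
    print(f"{msg.facility:7} {msg.severity} {msg.ident}")
```

Read this way, the first three messages come from the RMSREC facility and the last two from SYSTEM, which fits Tom's point that RMS is passing on a status it received rather than raising NOSUCHPART itself.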