| Title: | File Shelving |
| Moderator: | COOKIE::HOLSINGER |
| Created: | Mon Mar 15 1993 |
| Last Modified: | Thu Jun 05 1997 |
| Last Successful Update: | Fri Jun 06 1997 |
| Number of topics: | 346 |
| Total number of notes: | 1204 |
Manual shelf commands are cancelled on one specific device, $2$DKF104. Anal/disk
and Anal/rms uncover no problems. The shelf error log contains entries like this:
********************************************************************************
** 1486 ** REQUEST ERROR REPORT
Error detected on request number 1486 on node BLUE
Entry logged at 22-MAY-1997 06:41:53.28
** Request Information:
Identifier: 0
Process: 21A00FC0
Username: SLS
Timestamp: 22-MAY-1997 06:38:04.19
Client Node: BLUE
Source: System
Type: File fault
Flags: FileID
State: Original Validated
Status: Error
** Request Parameters:
File: $2$DKF104:[BALTIMRE]BDBALTIMRE_70327_345406_351852.9703740_R;1
Volume: _$2$DKF104:
FileID: (4735,1,0,0)
** Error Information :
%HSM-E-OFFLINERROR, offline system error, function not performed
%SYSTEM-S-NORMAL, normal successful completion
** Request Disposition:
Non-fatal shelf handler error
Fatal request error
Operation was rolled back
** Exception Information:
Exception Module Line
SHP_OFFLINE_READ_ERROR SHP_OFFLINE 4510
Platform Status Message Text
00000001 %SYSTEM-S-NORMAL, normal successful completion
Exception Module Line
SHP_OFFLINE_READ_ERROR SHP_OFFLINE_VMS 1158
Platform Status Message Text
10A38012 %BACKUP-E-OPENOUT, error opening !AS as output
Exception Module Line
SHP_OFFLINE_ERROR SHP_OFFLINE_VMS 1126
Platform Status Message Text
10A38012 %BACKUP-E-OPENOUT, error opening !AS as output
%HSM-E-OFFREADERR, offline read error on drive _$3$MKA300:
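The hexadecimal Platform Status values in the report above are ordinary OpenVMS condition values, so the severity letter in each message (-S-, -W-, -E-) can be read straight from the low three bits. The following is a minimal Python sketch of that decoding, assuming the standard condition-value layout (severity in bits 0-2, message number in bits 3-15, facility number in bits 16-27, inhibit-message flag in bit 28). The function name `decode_condition` is hypothetical and is not part of HSM or VMS.

```python
# Sketch: split an OpenVMS condition value into its standard fields.
# decode_condition is an illustrative helper, not an HSM or VMS utility.

SEVERITY = {
    0: "W (warning)",
    1: "S (success)",
    2: "E (error)",
    3: "I (informational)",
    4: "F (severe/fatal)",
}


def decode_condition(hex_text):
    """Decode a condition value given as a hexadecimal string."""
    value = int(hex_text, 16)
    return {
        "severity": SEVERITY.get(value & 0x7, "reserved"),
        "message_number": (value >> 3) & 0x1FFF,
        "facility_number": (value >> 16) & 0xFFF,
        "inhibit_message": bool(value & 0x10000000),
    }


if __name__ == "__main__":
    # Status values copied from the error reports in this topic.
    for status in ("00000001", "10A38012", "00000800"):
        print(status, decode_condition(status))
```

Decoded this way, 00000001 is a success status, 10A38012 is an error, and 00000800 is a warning, which agrees with the message text logged beside each value.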
The customer is using his entire RW500 as a cache. From VMS we're able to access the
various optical platters, both the ones in one of the four drives and the ones that
were out of a drive.
Here's their HSM configuration:
Cache device _$2$ODA0: is disabled, Cache flush is held until after
6-OCT-1996 12:45:33.12, Backup is performed at flush intervals,
Cached files are not held on delete of online file
Block size: 0
Highwater mark: 100%
Flush interval: <none>
.
.
.
Cache device _$2$ODA162: is enabled, Cache flush is held until after
6-OCT-1996 17:15:58.47, Backup is performed at flush intervals,
Cached files are not held on delete of online file
Block size: 0
Highwater mark: 100%
Flush interval: <none>
Cache device _$2$ODA163: is enabled, Cache flush is held until after
12-MAY-1997 15:43:40.33, Backup is performed at flush intervals,
Cached files are held on delete of online file
Block size: 0
Highwater mark: 100%
Flush interval: 0 06:00:00.00
.
.
.
Cache device _$2$ODA163: is enabled, Cache flush is held until after
12-MAY-1997 15:43:40.33, Backup is performed at flush intervals,
Cached files are held on delete of online file
Block size: 0
Highwater mark: 100%
Flush interval: 0 06:00:00.00
HSM$ARCHIVE01 has been used
Identifier: 1
Media type: TK87K
Density: COMP
Label: S26959
Position: 1932
Device refs: 2
Shelf refs: 4
Current pool: HSM
Enabled pools: HSM
HSM drive HSM$DEFAULT_DEVICE is enabled.
Shared access: < shelve, unshelve >
Drive status: Not configured
Enabled archives: <none>
HSM drive _$3$MKA200: is enabled.
Shared access: < shelve, unshelve >
Drive status: Configured
Enabled archives: HSM$ARCHIVE01 id: 1
HSM drive _$3$MKA300: is enabled.
Shared access: < shelve, unshelve >
Drive status: Configured
Enabled archives: HSM$ARCHIVE01 id: 1
Policy AMA_ARCHIVE_OCC_POLICY is enabled for shelving
Policy History:
Created: 25-FEB-1997 16:57:35.29
Revised: 26-FEB-1997 19:56:26.95
Selection Criteria:
State: Enabled
Action: Shelving
File Event: Modification date
Elapsed time: 45 00:00:00
Before time: <none>
Since time: <none>
Lowwater mark: 77%
Primary Policy: Least Recently Used (LRU)
Secondary Policy: Space Time Working Set (STWS)
Verification:
Mail notification: <none>
Output file: <none>
Policy AMA_ARCHIVE_POLICY is enabled for shelving
Policy History:
Created: 25-FEB-1997 16:57:36.53
Revised: 25-FEB-1997 16:57:36.53
Selection Criteria:
State: Enabled
Action: Shelving
File Event: Modification date
Elapsed time: 90 00:00:00
Before time: <none>
Since time: <none>
Lowwater mark: 60%
Primary Policy: Least Recently Used (LRU)
Secondary Policy: Space Time Working Set (STWS)
Verification:
Mail notification: <none>
Output file: <none>
Policy HSM$DEFAULT_OCCUPANCY is enabled for shelving
Policy History:
Created: 25-FEB-1997 16:57:37.54
Revised: 25-FEB-1997 16:57:37.54
Selection Criteria:
State: Enabled
Action: Shelving
File Event: Expiration date
Elapsed time: 180 00:00:00
Before time: <none>
Since time: <none>
Lowwater mark: 80%
Primary Policy: Space Time Working Set (STWS)
Secondary Policy: Least Recently Used (LRU)
Verification:
Mail notification: <none>
Output file: <none>
Policy HSM$DEFAULT_POLICY is enabled for shelving
Policy History:
Created: 25-FEB-1997 16:57:38.56
Revised: 25-FEB-1997 16:57:38.56
Selection Criteria:
State: Enabled
Action: Shelving
File Event: Expiration date
Elapsed time: 180 00:00:00
Before time: <none>
Since time: <none>
Lowwater mark: 80%
Primary Policy: Space Time Working Set (STWS)
Secondary Policy: Least Recently Used (LRU)
Verification:
Mail notification: <none>
Output file: <none>
Policy HSM$DEFAULT_QUOTA is enabled for shelving
Policy History:
Created: 25-FEB-1997 16:57:39.57
Revised: 25-FEB-1997 16:57:39.57
Selection Criteria:
State: Enabled
Action: Shelving
File Event: Expiration date
Elapsed time: 180 00:00:00
Before time: <none>
Since time: <none>
Lowwater mark: 80%
Primary Policy: Space Time Working Set (STWS)
Secondary Policy: Least Recently Used (LRU)
Verification:
Mail notification: <none>
Output file: <none>
Shelf HSM$DEFAULT_SHELF is enabled for Shelving and Unshelving
Catalog File: DISK$DISK104:[HSM.CATALOG]HSM$CATALOG.SYS
Shelf History:
Created: 25-FEB-1997 16:57:29.00
Revised: 9-MAY-1997 15:04:09.64
Backup Verification: Off
Save Time: <none>
Updates Saved: All
Archive Classes:
Archive list: HSM$ARCHIVE01 id: 1
Restore list: HSM$ARCHIVE01 id: 1
Any assistance or thoughts would be greatly appreciated,
David
| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 341.1 | comments... | COOKIE::HOLSINGER | HSM Engineering, DTN 522-2843 | Wed May 28 1997 09:51 | 26 |
Hello David.
Thank you for providing detailed information about your HSM configuration.
Here are a couple of observations:
1. The error entry was logged for a file fault (auto unshelve). The
root error was %BACKUP-E-OPENOUT, which usually means that Backup
could not locate a specific saveset on the appropriate tape. This
means that the corresponding HSM catalog entry and tape contents
don't agree. One of the two (tape or catalog) has been modified
outside of HSM. The error is not associated with HSM caching.
You can use SMU LOCATE/FULL for the file in question, to display the
particular tape and saveset that HSM is trying to find. Then, mount
the tape /FOREIGN, and use BACKUP $1$MUA0:*.*/SAVE/LIST/OUT=TAPE.LIS
to get a list of the saveset files and members on the tape. We can
use this info to try and isolate where and how the discrepancy occurred.
2. The MO cache configuration looks OK. However, the default shelf is
configured to flush only to Archive class 1. I did not see any other
Archive class definitions. HSM should always be configured with
multiple redundant Archive classes. This should be corrected ASAP.
Regards,
/Paul
| |||||
| 341.2 | $2$DKF104: is also the catalog disk | CX3PST::WSC217::SWANK | David | Wed May 28 1997 10:41 | 33 |
Paul,
>Here are a couple of observations:
>
> 1. The error entry was logged for a file fault (auto unshelve). The
> root error was %BACKUP-E-OPENOUT, which usually means that Backup
> could not locate a specific saveset on the appropriate tape. This
> means that the corresponding HSM catalog entry and tape contents
> don't agree. One of the two (tape or catalog) has been modified
> outside of HSM. The error is not associated with HSM caching.
The catalog is on device $2$DKF104: and they're having some problems with it
as well from an SLS backup standpoint. I'm working with the customer on that
problem as well but suspect the catalog itself could be bad.
> You can use SMU LOCATE/FULL for the file in question, to display the
> particular tape and saveset that HSM is trying to find. Then, mount
> the tape /FOREIGN, and use BACKUP $1$MUA0:*.*/SAVE/LIST/OUT=TAPE.LIS
> to get a list of the saveset files and members on the tape. We can
> use this info to try and isolate where and how the discrepancy occured.
I'll recommend the above procedure to the customer. Is there a catalog "health
check" procedure to verify its internal structure and functionality?
> 2. The MO cache configuration looks OK. However, the default shelf is
> configured to flush only to Archive class 1. I did not see any other
> Archive class definitions. HSM should always be configured with
> multiple redundant Archive classes. This should be corrected ASAP.
I've already noted the lack of redundancy to the customer. Thanks for your
collaboration.
Regards,
David
| |||||
| 341.3 | shelf error log entry w/ SYSTEM-W-ACCONFLICT | CX3PST::WSC217::SWANK | David | Wed May 28 1997 11:02 | 53 |
Paul,
After my last reply (.2) I went back to the error log that the customer sent, and I
may not have sent the error log entry that corresponds to the shelf command that
fails. Does the following entry provide any additional insight as to why the
SHELF command would fail with a shelf-w-cancel?
** 1455 ** REQUEST ERROR REPORT
Error detected on request number 1455 on node BLUE
Entry logged at 22-MAY-1997 05:25:16.25
Identifier: 20316808
Process: 21A00136
Username: HSM$SERVER
Timestamp: 22-MAY-1997 05:25:11.65
Client Node: BLUE
Source: Application
Type: Shelve file
Flags: FileID Makespace
State: Canceled Original Validated
Status: Error
File: $2$DKF104:[AURORA]BDAURORA_70325_011259_011591.9703706_R;1
Volume: _$2$DKF104:
FileID: (4690,1,0,0)
%HSM-E-FILERROR, file $2$DKF104:[AURORA]BDAURORA_70325_011259_011591.9703706_
%SYSTEM-W-ACCONFLICT, file access conflict
%HSM-I-RECOVERPRESHLV, inconsistent state found, file preshelved
Non-fatal shelf handler error
Fatal request error
Operation was rolled back
Exception Module Line
(SHP_ONLINE_READ_ERROR) SHP_FILE 4239
Platform Status Message Text
00000800 %SYSTEM-W-ACCONFLICT, file access conflict
Exception Module Line
SHP_ONLINE_ERROR SHP_REQUEST 7672
Exception Module Line
SHP_ONLINE_WRITE_ERROR SHP_ONLINE 5567
It's a %SYSTEM-W-ACCONFLICT error that I'm currently investigating from the SLS
backup side of the house. Have not yet received the SLS log to know exactly
what's happening there.
David
| |||||
| 341.4 | correction to .1 | COOKIE::HOLSINGER | HSM Engineering, DTN 522-2843 | Thu May 29 1997 17:05 | 32 |
Hello David.
I believe I was mistaken in my .1 analysis of the backup error. Since it is
%BACKUP-E-OPENOUT, the error occurred when backup tried to open the temporary
restore file during the unshelve. The error did not occur because backup could
not open the file on the archive tape. The error may indicate a problem with
the HSM$MANAGER device or directory.
Please find and post the corresponding HSM$LOG:HSM$SHELF_HANDLER.LOG file.
Note, there is a new file created each time HSM is started. This file should
contain additional error info from backup.
WRT the %SYSTEM-W-ACCONFLICT error, this may be the result of a race condition
within HSM. The error you posted shows a shelve command failing on a preshelved
file, probably during file truncation. Almost all conflicting requests are caught
by HSM, with the exception of conflicts due to cache flushing. This was done for
performance reasons. If the scenario is as I suspect, a cache flush was in
progress during a makespace shelve. In any case, the error is not serious, as
no data is affected, and the makespace operation will simply continue with the
next candidate file to shelve.
Please verify the situation by posting the following:
1. SMU LOCATE/FULL the file in question
2. DIR/FULL the file in question
3. locate a cache flush entry in HSM$LOG:HSM$SHP_AUDIT.LOG whose
timestamp matches the 22-MAY-1997 05:25:16.25 error timestamp
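Item 3 can be scripted once the audit log has been copied somewhere Python is available. The sketch below is only an illustration: it assumes each relevant log line contains a VMS absolute timestamp (DD-MMM-YYYY HH:MM:SS.CC) somewhere in its text and that English month abbreviations are in effect. The actual layout of HSM$SHP_AUDIT.LOG is not shown in this topic, so the sample lines are placeholders, and `parse_vms_time` and `lines_near` are hypothetical helper names.

```python
import re
from datetime import datetime, timedelta

# Matches a VMS absolute time such as "22-MAY-1997 05:25:16.25".
VMS_TIME = re.compile(r"\d{1,2}-[A-Z]{3}-\d{4} \d{2}:\d{2}:\d{2}\.\d{2}")


def parse_vms_time(text):
    """Convert a VMS absolute time string to a datetime (English months)."""
    return datetime.strptime(text, "%d-%b-%Y %H:%M:%S.%f")


def lines_near(lines, error_time, window_seconds=60):
    """Return the lines whose first timestamp is within the window of error_time."""
    target = parse_vms_time(error_time)
    window = timedelta(seconds=window_seconds)
    hits = []
    for line in lines:
        match = VMS_TIME.search(line)
        if match and abs(parse_vms_time(match.group()) - target) <= window:
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    # Placeholder lines only; these are NOT real HSM audit log entries.
    sample = [
        "22-MAY-1997 05:25:14.10 <placeholder entry A>",
        "22-MAY-1997 06:41:53.28 <placeholder entry B>",
        "<placeholder line without a timestamp>",
    ]
    for hit in lines_near(sample, "22-MAY-1997 05:25:16.25"):
        print(hit)
```

Passing an open file object in place of `sample` scans a copied log the same way, and widening `window_seconds` helps if the flush started well before the error was logged.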
Also, what version of HSM is the customer running?
Regards,
/Paul
| |||||
| 341.5 | | CX3PST::WSC217::SWANK | David | Fri May 30 1997 08:52 | 11 |
Paul,
The customer has deleted old HSM$LOG:*.LOG files and the shelf command is now
working. The customer is going to disable the flush on the RW500 devices that had
a flush interval. If the problem recurs they will send the log files
HSM$LOG:HSM$SHP_AUDIT.LOG & HSM$LOG:HSM$SHP_ERROR.LOG that correspond to the
time frame of the incident, as well as the output of SMU LOCATE/FULL and DIR/FULL
for the file in question.
Thanks for your help so far,
David
| |||||