Title: MAGNETIC TAPEDRIVES
Moderator: STKHLM::GJOHNSSON
Created: Mon Sep 21 1987
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 3775
Total number of notes: 13147
3649.0. "I:TLZ Troubleshooting Guide" by UTRTSC::VISSER () Fri Feb 07 1997 13:38
After being beaten by too many TLZ-problems, I documented my
experiences in the attached document.
Jan.
**** TLZ04/TLZ06/TLZ07/TLZ09 TROUBLESHOOTING ****
Rev. 07-FEB-1997 Jan Visser
This document contains information to ensure RELIABLE TLZ-backup operation
and methods to recover from problem-situations.
CONTENTS
========
Preventing Problems
Correcting Problems
Collect Error Information
Tape stuck in Drive
Drive Head Cleaning
Condition of the Drive
Failing Data Cartridges
Software/Driver versions
Appendixes
Increasing Reliability
Regular Head Cleaning
Corrective Head Cleaning
Use of Head Cleaning Cartridge
Errorlog Decoding (OpenVMS, Digital UNIX, Windows NT)
Firmware Upgrade Matrix
Drive Reported Errors (Media Related)
PREVENTING PROBLEMS
===================
You and your customer must be aware that not following the guidelines
for TLZ-treatment can result in unreliable backup-operation for an
extended period. Replacing hardware MAY NOT stop this degradation.
To prevent problems, use the information in the sections "INCREASING
RELIABILITY" and "USE OF HEAD CLEANING CARTRIDGE".
To increase drive-reliability, it is recommended to upgrade the
drive firmware to the latest version.
CORRECTING PROBLEMS
===================
This section describes what information needs to be collected,
how to interpret it and take the proper actions.
o COLLECT ERROR INFORMATION
Use the following sources to collect error information:
o Observations of System Manager/Operator.
o Error messages from Backup process or application.
o System errorlogs (see "ERRORLOG DECODING")
o Backup-logs. Especially the starting times of the backup and
(when available) the starting times of the verify can be
important.
POSSIBLE ERROR TYPES:
Using the error-/event-logs,
o determine if the primary errors are related to the
SCSI-port/-bus.
o Collect SENSE-KEY, ASC and ASCQ codes for errors reported by
the tape-device or the related error-description.
SOLUTION STEPS:
o SCSI-port errors can be caused by:
o Improperly configured SCSI-bus:
o SCSI-bus too long. If the SCSI-port is FAST SCSI
and any fast devices (e.g. disks) are connected to
the bus, the MAXIMUM length is 3 meters.
NOTE: The TLZ09/TLZ9L are FAST SCSI devices, while
the other TLZ-variants have SLOW (5MB/s) SCSI.
o Incorrect Termination. There must ONLY be TWO
terminators on the bus, ONE at each end. Devices,
which are NOT at the end, MUST have their termination
removed/disabled.
o Bad or broken SCSI cabling.
o Power Supply or power-connection problems. This applies
to Table-top drives and embedded drives (poor connection
on 5/12V power plug).
o Some error-recovery sequences in the TLZ-drive may take a
long time (up to several minutes). This typically occurs when
the drive has problems READING or WRITING. Part of the recovery
can be a tape-RETENSION (the drive spools the tape at high speed
to End of Tape and back).
This can result in a TIMEOUT, which is reported on the
SCSI-port (with the SCSI-ID of the tape-drive).
Check the errorlog for any MEDIA-related errors!
o Drive reported errors:
These are reported as SENSE_KEY/ASC/ASCQ codes in the
errorlog. Refer to the section "DRIVE REPORTED ERRORS" for
a problem/solution list.
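The two SCSI-bus configuration rules above (bus length and termination) can be sketched as a small check in Python. This is only an illustration: the function name and the way the bus is described (one boolean per device, in physical order, host adapter included) are assumptions made for this example, and only the limits stated in this guide are checked.

```python
FAST_BUS_MAX_LENGTH_M = 3.0  # limit quoted in this guide for FAST SCSI


def check_scsi_bus(length_m, port_is_fast, has_fast_device, terminated):
    """Return a list of configuration problems (empty list = none found).

    terminated: one boolean per device in physical bus order, host adapter
    included; True means termination is installed/enabled on that device.
    (This representation is an assumption made for the example.)
    """
    if len(terminated) < 2:
        raise ValueError("a bus needs at least a host adapter and one device")

    problems = []
    # Length rule: only applies when a FAST port has fast devices attached.
    if port_is_fast and has_fast_device and length_m > FAST_BUS_MAX_LENGTH_M:
        problems.append("bus longer than 3 m with FAST SCSI devices")
    # Termination rule: exactly two terminators, one at each end.
    if not (terminated[0] and terminated[-1]):
        problems.append("terminator missing at one or both bus ends")
    if any(terminated[1:-1]):
        problems.append("termination enabled on a device that is not at a bus end")
    return problems


# Hypothetical bus: host adapter, fast disk, TLZ09 at the far end.
print(check_scsi_bus(2.5, True, True, [True, False, True]))
# Same devices, but too long and with the terminator on the middle device.
print(check_scsi_bus(4.0, True, True, [True, True, False]))
```

An empty list means neither rule is violated; it says nothing about cabling quality or power problems, which still need to be inspected by hand.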
o TAPE STUCK IN DRIVE
When the EJECT-button is pressed, the cartridge does not come
out.
CAUSES:
o Mechanical problem in the drive. The tape may be wrapped
around the drum or caught by a tape-guide or -roller.
o The Backup-process is hung or locked-up, preventing the
drive from ejecting the tape (the driver has issued the
SCSI-command 'PREVENT MEDIA REMOVAL' to the drive).
o The drive electronics or firmware detected a severe or
unrecoverable problem.
o When a drive contains older firmware, the drive may lock up
when the EJECT-button is pressed repeatedly.
SOLUTION-STEPS:
o Power down the drive. While pressing the EJECT-button, power
on the drive. (Shutdown of Operating System may be required).
WARNING: Do NOT power-cycle a SCSI-device that is
connected to an active SCSI-bus.
o If the cartridge ejects now:
Examine the cartridge and the media for any visible damage.
Inspect the inside of the drive through the front-entrance
and look for any abnormalities.
If none is obvious, verify loading and ejecting of a
cartridge works properly. Repeat this several times.
If the cartridge or media is damaged, try to determine
if this is caused by the drive.
o If the cartridge does NOT eject:
Inspect the drive. The tape may be wrapped around posts
or the drum.
o When a cartridge is ejected from a drive, the tape in the
cartridge must be straight between the two guides in the
cartridge. If the tape shows a loop, loading-problems will
result.
To inspect this, slide the bottom-shutter of the cartridge
towards the front/label-side, while pressing the TWO small
notches.
o If you can correct the tape-stuck situation and the drive
contains an old firmware version, upgrade the firmware.
o DRIVE HEAD CLEANING
Regular Head Cleaning is critically important to maintain the
TLZ-drive in good shape.
Head Cleaning can also be used to recover from MEDIA/HEAD-related
problems. Refer to the sections "REGULAR HEAD CLEANING" and
"CORRECTIVE HEAD CLEANING".
o CONDITION OF THE DRIVE
Prolonged heavy use of the drive can result in debris buildup
on the capstan and tape guide-posts.
The Head Cleaning Cartridge is not effective for the capstan.
Using a bright flashlight, look into the drive from the
front (keep the lid open with one finger).
On the right-hand side, approx. 2" (5 cm) behind the lid, is
the capstan (silver-colored metal shaft), embedded at the
top and bottom in a frame-structure.
If two dark-colored debris-rings can be observed on the
capstan, the capstan should be cleaned. It's recommended to
have this done at the repair-center.
Also observe the amount of dust in the drive. If the interior
contains a lot of dust, clean it or replace the drive so that
it can be cleaned at the repair-center.
Observe the environment around the system for a high amount
of dust. Cleaning the environment or removal of dust-source
must be considered.
o FAILING DATA CARTRIDGES
Data cartridges may fail in similar ways due to:
o Media is mechanically damaged (scrambled tape, scratches,..)
o Control and Data information is badly written, due to a
problem during writes.
MOST of the MEDIUM related problems occur near BOT.
Whenever a drive with a clogged head fails during writing to
tape (typically: 03/03/02 [Sense_Key/ASC/ASCQ] Excessive
Write Errors), the SAME cartridge MAY fail again, even after
the head-clogging is fixed.
The REASON for this is twofold:
o Some tracks on the tape-header (used for drive internal
control) are poorly written.
o Tape Label area on the tape is poorly written.
MOST backup-procedures/utilities want to do a LABEL-check
of the tape, which implies reading, before they start writing.
The read will fail and, as such, the backup may NOT run.
NOTE: Before doing this step, be sure the drive is inspected
and has been cleaned with the Head Cleaning Cartridge.
Verify drive-operation by testing with a "known good
Cartridge" before checking suspected cartridges.
ANY suspected tapes MUST first be manually INITIALIZED to
ensure any poorly written tracks are over-written.
It is recommended to do this outside production-hours, since
some failures during initialize may affect the system or may
require a reboot.
If this risk is unacceptable for the customer, put these tapes
aside with a label BAD or SUSPECT.
WARNING: Failure to initialize a suspected cartridge is NOT a
drive-problem, but a cartridge-problem.
Refer to the section "DRIVE REPORTED ERRORS".
Tapes that can be read successfully may still FAIL when they are
being written. Typically the drive reports 04/44/xx (Internal
Target Failure; xx=any value).
If this is observed, label the tape as BAD and stop using it.
o SOFTWARE VERSION
Some problems MAY be caused by a software-problem. Trying to
relate actual problems with release notes and fixes for
patch-kits or Service Packs is sometimes very difficult or
impossible.
In case a software-problem is suspected, ask the customer if
any patches have been applied to the system.
Check with your software people if applying a patch is
appropriate.
============================ APPENDIXES ====================================
INCREASING RELIABILITY
======================
To ensure reliable and error-free backups on the 4mm DAT-drive, the following
conditions must be met:
o Regular Head Cleaning (procedure attached)
o Use of good media. Cartridges must be DDS or DDS-2 types. (printed on
cartridge)
o Using the appropriate cartridge media-length for the drive:
DRIVE         ALLOWED MEDIA LENGTH
-----------   -------------------------------
TLZ04         60m (DDS-1)
TLZ06/TLZ6L   60m, 90m (DDS-1)
TLZ07/TLZ7L   60m, 90m (DDS-1); 120m (DDS-2)
TLZ09/TLZ9L   60m, 90m (DDS-1); 120m (DDS-2)
Note: DDS-1 cartridges are identified as "DDS" on the cartridge.
o Good media care.
Always store cartridges in their protective boxes in a dust-free
environment.
Allow media to acclimate before inserting into a drive (when the
storage-environment has a different temperature and humidity).
Tape labeling must ONLY be done in two areas (top and 'front-side'), using
the supplied, pre-cut labels.
o It is STRONGLY recommended to do regular checks of the backups by:
o Performing a VERIFY-operation with the backup.
o Regular full or partial restores of files.
o Whenever possible, keep the time a cartridge is in a drive to a minimum.
A good practice would be (when backups run at night) to remove the cartridge
from the drive at the beginning of the working-day and insert the next
at the end of the working day.
This minimizes the possibility of dust on the exposed part of the media in
the drive.
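The media-length table in this section can be turned into a simple lookup, for example to check a cartridge before it goes into a tape-set. The Python sketch below only restates the table; the dictionary and function names are invented for this example.

```python
# Allowed cartridge lengths in meters, copied from the media-length table
# in this section (60m/90m are DDS-1, 120m is DDS-2).
ALLOWED_MEDIA_M = {
    "TLZ04": (60,),
    "TLZ06": (60, 90),
    "TLZ6L": (60, 90),
    "TLZ07": (60, 90, 120),
    "TLZ7L": (60, 90, 120),
    "TLZ09": (60, 90, 120),
    "TLZ9L": (60, 90, 120),
}


def media_allowed(drive, length_m):
    """Return True if the table lists this cartridge length for the drive.

    Unknown drive types return False rather than raising, so the caller
    can treat "not listed" and "not allowed" the same way.
    """
    return length_m in ALLOWED_MEDIA_M.get(drive.upper(), ())


print(media_allowed("TLZ06", 120))  # 120m (DDS-2) is not listed for the TLZ06
print(media_allowed("TLZ07", 120))
```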
REGULAR HEAD CLEANING
=====================
Regular Head-cleaning MUST be performed to maintain optimal reliability of
the TLZ-drive.
o Head cleaning must be performed every 25 hours of tape-operation or every
two weeks, whichever comes first.
o If new media is being used, cleaning must be performed every 8 hours of
tape-operation for the first 5 times the media is in use.
Example: If the backup-scheme has a two-week cycle (tapes are used every
two weeks), perform cleaning every 8 operation-hours for the first
2.5 months. Then revert back to the 25-hour cleaning cycle.
If a few new tapes are phased into an existing tape-set, it is recommended
to use the same 8 hour cleaning interval initially, until the NEW tapes have
been used 2 to 3 times.
Example: With the above backup-scheme, after a few tapes are replaced by
or supplemented with new ones, use the 8 hour cleaning interval
for 1 to 1.5 months.
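The cleaning rules and the arithmetic behind the examples above can be sketched in Python as follows. The function and constant names are made up for this illustration. Two points are assumptions rather than statements from this guide: "times_media_used" counts completed uses of the least-used tape in the set, and the two-week limit is applied during the 8-hour period as well.

```python
REGULAR_INTERVAL_HOURS = 25     # normal cleaning interval (tape-operation hours)
NEW_MEDIA_INTERVAL_HOURS = 8    # interval while new media is being broken in
MAX_DAYS_BETWEEN_CLEANINGS = 14 # "every two weeks, whichever comes first"
NEW_MEDIA_USES = 5              # first 5 uses of new media


def cleaning_interval_hours(times_media_used):
    """Pick the 8-hour or 25-hour interval from the number of completed uses."""
    if times_media_used < NEW_MEDIA_USES:
        return NEW_MEDIA_INTERVAL_HOURS
    return REGULAR_INTERVAL_HOURS


def cleaning_due(tape_hours_since_cleaning, days_since_cleaning, times_media_used):
    """True when either the hour limit or the two-week limit is reached."""
    interval = cleaning_interval_hours(times_media_used)
    return (tape_hours_since_cleaning >= interval
            or days_since_cleaning >= MAX_DAYS_BETWEEN_CLEANINGS)


def weeks_on_short_interval(cycle_weeks, uses=NEW_MEDIA_USES):
    """Weeks the 8-hour interval lasts when each tape is used once per cycle."""
    return cycle_weeks * uses


# Two-week cycle, all-new tape-set: 5 uses x 2 weeks = 10 weeks (about 2.5 months).
print(weeks_on_short_interval(2))
# A few new tapes phased in: 2 to 3 uses x 2 weeks = 4 to 6 weeks (1 to 1.5 months).
print(weeks_on_short_interval(2, uses=2), weeks_on_short_interval(2, uses=3))
```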
CORRECTIVE HEAD CLEANING
========================
If errors are reported during tape operation, which are suspected to be
write or read related, perform the following steps:
o Perform a head cleaning operation FOUR[4] times.
o Use another data cartridge to determine if it is related to the cartridge or
the drive.
Document when a cartridge has given errors. If errors repeat on a cartridge,
the cartridge is suspected and should be replaced.
See the section "DRIVE REPORTED ERRORS" for diagnosing suspected cartridges.
o Revert back to an 8-hour cleaning interval for one or two weeks.
If these steps do not result in error-free backups again, the drive is
suspected.
USE OF HEAD CLEANING CARTRIDGE
==============================
1. Have the drive powered on.
2. Insert the Head Cleaning Cartridge (Digital part number TLZ04-HA) into
the drive.
3. With the Head Cleaning Cartridge inserted, the drive automatically
executes head cleaning. The drive ejects the Head Cleaning Cartridge
after approximately 30 seconds.
4. Locate the CARD enclosed with the Head Cleaning Cartridge.
It is STRONGLY recommended to enter the DATE of cleaning on the card
every time you use the cartridge.
Use one cleaning cartridge for each drive to simplify cleaning
tracking.
Under normal conditions, the Head Cleaning Cartridge is used for
about 25 Cleanings.
If the Head Cleaning Cartridge is OVERUSED, both the Cartridge and
Write-Protect LEDs will flash. Press the EJECT button to remove
the Cleaning Cartridge. No Cleaning action will have occurred.
Discard it and use a new TLZ04-HA.
ERRORLOG DECODING
=================
Use the system's errorlog to find error details.
This section briefly covers how to retrieve this information for OpenVMS,
Digital UNIX and Windows NT.
OpenVMS
=======
Whenever possible, use DECevent to analyse the errorlog ("diagnose"-
command). DECevent is the preferred method.
Whenever DECevent is not available, use "ANALYSE/ERROR". Be aware
that OpenVMS ALPHA may not always display Extended Sense Data Bytes
in the errorlog-output.
Errors reported on the MK-device (or MU-device when behind a MSCP-
type of controller) typically contain SCSI Extended Sense Bytes,
such as SENSE_KEY, ASC and ASCQ.
Errors reported against the PK-device are typically SCSI-BUS related.
Whenever there are multiple errors in a short time-frame, the FIRST
TWO[2] entries are most important for diagnosing.
Digital UNIX
============
Whenever possible, use DECevent to analyse the errorlog ("diagnose"-
command). DECevent is the preferred method.
To ensure you retrieve all error-details (like SENSE_KEY, ASC, etc),
the command must be: "dia -o full".
Whenever DECevent is not available, use: "uerf -o full".
These errors are reported as SCSI-CAM errors (event 199).
Device reported errors have SENSE_KEY, ASC and ASCQ in the errorlog-
entry.
SCSI-port errors do not contain device-specific error-information.
Whenever there are multiple errors in a short time-frame, the FIRST
TWO[2] entries are most important for diagnosing.
Using a "-R" (reverse order) qualifier with 'dia' or 'uerf' makes
it difficult to find the first entries of a burst. Once you know
WHEN a burst of errors has occurred, use the following qualifiers
with the 'dia' or 'uerf' command:
"-o full -t s:dd-mmm-yyyy,hh:mm:ss"
Windows NT
==========
Log in as Administrator and select "Event Viewer" under Administrative
Tools. From the three event logs (System, Security and Application),
select "System".
Each event is shown as one line.
TLZ-Device events are typically reported as '4mmdat', while SCSI-port
errors are identified with the SCSI-chip identification (e.g. ncrc810).
With the mouse-pointer, double-click an error-line to get more details.
The '4mmdat' entries appear as shown below. Click the "WORDS"-button
to format the output:
Source 4mmdat , Event ID: 7, Descr.:
"The device, \Device\Tape0, has a bad block"
0000: 00180003 006a0001 00000000 c004000b  (WORD-format)
0010: 00000101 c0000185 00000000 00000000
0020: 00000000 00000000 00000000 00000002
0030: 00000000 00000008 0000c402 00033100
                                   ||||||
                    Sense-key------++||||
                    ASC--------------++||
                    ASCQ---------------++
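The decoding marked in the dump above can be done mechanically. The Python sketch below assumes, following the markers in the example, that the low-order six hex digits of the last word (fourth column at offset 0030) hold Sense-key, ASC and ASCQ, two digits each. The function names are invented for this illustration, and other drivers or event types may lay out their data differently.

```python
def decode_sense_word(word_hex):
    """Split the last dump word into (sense_key, asc, ascq).

    Assumption: the low-order six hex digits hold Sense-key, ASC and ASCQ,
    two digits each, as the markers under the example dump suggest.
    """
    value = int(word_hex, 16)
    sense_key = (value >> 16) & 0xFF
    asc = (value >> 8) & 0xFF
    ascq = value & 0xFF
    return sense_key, asc, ascq


def format_sense(sense_key, asc, ascq):
    """Render the codes in the KEY/ASC/ASCQ notation used in this document."""
    return "%02X/%02X/%02X" % (sense_key, asc, ascq)


# Last word of the example dump (offset 0030, fourth column).
print(format_sense(*decode_sense_word("00033100")))
```

For the example dump this gives 03/31/00, which the section "DRIVE REPORTED ERRORS" lists as "Tape Format corrupted".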
FIRMWARE UPGRADE MATRIX
=======================
This matrix is the quick guide to determine if a firmware-upgrade must
be done and WHICH UPGRADE-TAPE to use.
+------------+--------------------+--------------------+------------------+
| DRIVE-TYPE | CURRENT FIRMWARE * | UPGRADE TO | USE UPGRADE-TAPE |
+------------+--------------------+--------------------+------------------+
| TLZ06 | 0374, 0389, 0435, | 4BQE | AO-Q9XM0-0B.C01 |
| | 0491, 491A, 4BH0 | | |
| | | | Release 4BQE-19 |
+------------+--------------------+--------------------+------------------+
| TLZ07 old | 4BE0 | 4BQE | AO-Q9XM0-0B.C01 |
| | | | |
| | | | Release 4BQE-19 |
+------------+--------------------+--------------------+------------------+
| TLZ07 new | 5330, 553A | 553B | AO-QSUS0-0B.B01 |
| | | | |
| | | | Release 553B-19 |
+------------+--------------------+--------------------+------------------+
| TLZ09 | 0162, 0165 | 0167 | AO-R0H90-0B.A01 |
| | | | |
| | | | Release 0167 |
+------------+--------------------+--------------------+------------------+
+------------+--------------------+--------------------+------------------+
| DRIVE-TYPE | CURRENT FIRMWARE * | UPGRADE TO | USE UPGRADE-TAPE |
+------------+--------------------+--------------------+------------------+
| TLZ6L | 0491, 4BH0 | 4BQE ** | AO-Q9XN0-0B.C01 |
| | | | |
| | | | Release 4BQE-419 |
+------------+--------------------+--------------------+------------------+
| TLZ7L | 04??, 4??? | 4BQE ** | AO-Q9XN0-0B.C01 |
| | | | |
| | | | Release 4BQE-419 |
+------------+--------------------+--------------------+------------------+
*: Possible current/old firmware variants.
**: TLZ6L/TLZ7L, used with Windows NT and Backup Exec V6.0, need firmware
"4BQH" [Upgrade-tape AO-Q9XN0-0B.D01] and appropriate drivers.
DRIVE REPORTED ERRORS
=====================
Check the Event-/Errorlog and collect the drive reported information.
Most common Media/Cartridge related errors are shown below.
Used format: SENSE_KEY/ASC/ASCQ
KEY/ASC/ASCQ
------------
03/03/02 Excessive Write Errors
This error causes the data-write to abort and may leave
the tape un-readable. When this occurs at BOT, problems may
occur later, when the tape is used again.
Possible causes:
o Head clog. See the section "CORRECTIVE HEAD CLEANING".
o Damaged tape. Scrambled tape or scratches. Inspect Cartridge.
o It has been observed, that a drive with head clog may write
such a track-pattern on the tape, that a GOOD drive needs
a few writes, before all OLD FLUXES have been removed. This
typically occurs at BOT.
You may retry the write/initialize. Be sure no Head Clog
exists.
01/5B/xx Log Exception/ Log counter at maximum/ Recovered with retries.
This type of error indicates degrading write or read
performance.
Possible causes:
o Head clog. See the section "CORRECTIVE HEAD CLEANING".
o Degraded cartridge.
03/30/xx Cannot read tape (unknown or incompatible format)
Possible causes:
o Tape has been written on an incompatible drive.
o Tape has been written on a dirty/degraded drive
03/31/xx Tape Format corrupted
Possible causes:
o Tape has been written on a degraded drive (head clog).
03/3B/xx Sequential Positioning Error
Possible causes:
o Tape written on a dirty drive (capstan, tape-posts, heads)
o Damaged Cartridge
04/44/80 Compression Hardware Fault
Possible causes:
o Problem has occurred during Write of the media.
o If it's a TLZ07 with firmware 5xxx, call your CSC.
04/44/xx Internal Target Failure
Many different causes may exist for this error-group.
Possible causes:
o Data Cartridge related. This can both be poorly written
control-information on the media or a mechanical reel
problem. If a cartridge repeatedly gives this error, reject
the cartridge (assuming other cartridges work OK).
o Drive electronic problem.
o (Loader Only) Cartridge/magazine/transport problem.
Check for incorrectly applied labels on the cartridge or
labels coming loose.
06/29/00 Power on or Reset occurred.
Possible causes:
o When a SCSI-bus reset was invoked by the Host SCSI-port,
this is just informational. This is the MOST COMMON cause
for this event.
o A Power problem may exist or power has been cycled.
06/5A/01 Media removal requested by operator.
Possible causes:
o Operator pressed Eject during tape operation.
07/27/00 Write protected.
Possible causes:
o Cartridge was set Write Protected (with write protect tab)
(Operator issue)
o If the SAME cartridge keeps reporting this, this cartridge
has a mechanical tolerance problem.
o If multiple cartridges report this problem, while they are
NOT set write protected, it's a drive-problem.
08/xx/xx Blank Check.
End of data or unrecorded tape encountered.
Possible causes:
o A new, un-initialized tape is being read. Initialize tape.
o While the tape was written, the write-operation aborted.
This must have been caused by some error (drive/SCSI-bus).
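For scanning many errorlog entries, the list above can be kept as a small lookup table. The Python sketch below copies only the codes and titles from this section; the table and function names are made up for the example, and "xx" is treated as "any value", as in the list. More specific entries are placed before the wildcard entries so that 04/44/80 is not reported as the general 04/44/xx case.

```python
# (KEY, ASC, ASCQ, description) copied from the list in this section.
# "xx" matches any value; specific entries come before wildcard entries.
SENSE_TABLE = [
    ("03", "03", "02", "Excessive Write Errors"),
    ("01", "5B", "xx", "Log Exception / Log counter at maximum / Recovered with retries"),
    ("03", "30", "xx", "Cannot read tape (unknown or incompatible format)"),
    ("03", "31", "xx", "Tape Format corrupted"),
    ("03", "3B", "xx", "Sequential Positioning Error"),
    ("04", "44", "80", "Compression Hardware Fault"),
    ("04", "44", "xx", "Internal Target Failure"),
    ("06", "29", "00", "Power on or Reset occurred"),
    ("06", "5A", "01", "Media removal requested by operator"),
    ("07", "27", "00", "Write protected"),
    ("08", "xx", "xx", "Blank Check"),
]


def describe(code):
    """Look up a 'KEY/ASC/ASCQ' string; return None if it is not in the list."""
    fields = tuple(code.upper().split("/"))
    if len(fields) != 3:
        raise ValueError("expected KEY/ASC/ASCQ, got %r" % code)
    for key, asc, ascq, text in SENSE_TABLE:
        pattern = (key, asc, ascq)
        if all(p == "xx" or p == f for p, f in zip(pattern, fields)):
            return text
    return None


print(describe("04/44/80"))
print(describe("04/44/12"))
print(describe("02/04/00"))  # not a media-related code from this list
```

A result of None only means the code is not among the common media/cartridge errors listed here; it still needs to be looked up in the drive documentation.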
- end -
T.R | Title | User | Personal Name | Date | Lines |
---|
3649.1 | This is what we need | JGODCL::KRAAN | | Fri Feb 07 1997 15:43 | 12 |
| Jan,
It took me some time to read this, but it is very worthwhile. Here
in the Nijmegen repair center, we have done a lot of investigations on
the TLZ products. Your memo/instruction describes very well the actions
that can be taken at the customer site to increase the reliability of the
TLZs. I fully agree with its contents and I'm considering making
an insert from it, so that these instructions are shipped with every
repaired drive.
Peter van der Kraan
Mass-Storage Engineering, Nijmegen
|
3649.2 | See 3623.3 for F'ware bug-fix history and more | KERNEL::LOANE | Comfortably numb!! | Fri Feb 07 1997 21:08 | 1 |
|
|
3649.3 | Good TLZ Info Located Here | KYOSS1::LUIZZA | | Thu Feb 13 1997 21:58 | 13 |
|
Jan,
Good guide, I picked up lots of good pointers. Could I request that a
listing of the error codes from the displays, as well as of the indicator
lights (when these blink, it means this), also be posted here? Does the
cassette and write-protect light being on mean the drive is dirty and will
not work until cleaned?
Could you edit your header so that a search from tima will gather this
note from a TLZ search (I:TLZ)? That will help more people find this great
information.
Thanks for the good stuff, keep it coming.
/Irv Luizza
|