
Conference stkhlm::magtape

Title:MAGNETIC TAPEDRIVES
Moderator:STKHLM::GJOHNSSON
Created:Mon Sep 21 1987
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:3775
Total number of notes:13147

3649.0. "I:TLZ Troubleshooting Guide" by UTRTSC::VISSER () Fri Feb 07 1997 13:38

    After being beaten by too many TLZ problems, I documented my
    experiences in the attached document.
    						Jan.
    
              **** TLZ04/TLZ06/TLZ07/TLZ09 TROUBLESHOOTING ****

Rev. 07-FEB-1997  Jan Visser

This document contains information to ensure RELIABLE TLZ-backup operation
and methods to recover from problem situations.

CONTENTS
========

	Preventing Problems

	Correcting Problems
		Collect Error Information
		Tape stuck in Drive
		Drive Head Cleaning
		Condition of the Drive
		Failing Data Cartridges
		Software/Driver versions

	Appendixes
		Increasing Reliability
		Regular Head Cleaning
		Corrective Head Cleaning
		Use of Head Cleaning Cartridge
		Errorlog Decoding (OpenVMS, Digital UNIX, Windows NT)
		Firmware Upgrade Matrix
		Drive Reported Errors (Media Related)



PREVENTING PROBLEMS
===================
	You and your customer must be aware that when the guidelines for
	TLZ treatment are not followed, the result can be unreliable backup
	operation for an extended period. Replacing hardware MAY NOT stop
	this degradation.
	To prevent problems, use the information in the sections "INCREASING
	RELIABILITY" and "USE OF HEAD CLEANING CARTRIDGE".

	To increase drive-reliability, it is recommended to upgrade the
	drive firmware to the latest version.

CORRECTING PROBLEMS
===================
	This section describes what information needs to be collected,
	how to interpret it and take the proper actions.

	o COLLECT ERROR INFORMATION
		Use the following sources to collect error information:
		  o Observations of System Manager/Operator.
		  o Error messages from Backup process or application.
		  o System errorlogs (see "ERRORLOG DECODING")
		  o Backup-logs. Starting times of the backup and (when
		    available) starting times of the verify can be
		    especially important.

		POSSIBLE ERROR TYPES:
		Using the error-/event-logs,
		o determine if the primary errors are related to the 
		  SCSI-port/-bus.
		o Collect SENSE-KEY, ASC and ASCQ codes for errors reported by
		  the tape-device or the related error-description.

		SOLUTION STEPS:
		o SCSI-port errors can be caused by:
		  o Improperly configured SCSI-bus:
			o SCSI-bus too long. If the SCSI-port is FAST SCSI
			  and any fast devices (e.g. disks) are connected to 
			  the bus, the MAXIMUM length is 3 meters.
			  NOTE: The TLZ09/TLZ9L are FAST SCSI devices, while
				the other TLZ-variants have SLOW (5MB/s) SCSI.
			o Incorrect Termination. There must ONLY be TWO 
			  terminators on the bus, ONE at each end. Devices,
			  which are NOT at the end, MUST have their termination
			  removed/disabled.
		  o Bad or broken SCSI cabling.
		  o Power Supply or power-connection problems. This applies
		    to Table-top drives and embedded drives (poor connection
		    on 5/12V power plug).
                  o Some error-recovery sequences in the TLZ-drive may take a
                    long time (up to minutes). This typically occurs when the
                    drive has problems READING or WRITING. Part of the recovery
                    can be a tape-RETENSION (drive spools tape (high-speed) to
                    End of Tape and back).
                    This can result in a TIMEOUT, which is reported on the
                    SCSI-port (with the SCSI-ID of the tape-drive).
                    Check the errorlog for any MEDIA-related errors!

                o Drive reported errors:
                  These are reported as SENSE_KEY/ASC/ASCQ codes in the
                  errorlog. Refer to the section "DRIVE REPORTED ERRORS" for
                  a problem/solution list.
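
The two SCSI-bus configuration rules above (bus length and termination) can
be written down as a simple pre-check. The sketch below is illustrative only:
the function name and its parameters are assumptions made for this example and
do not belong to any Digital tool. The values are entered by hand after
inspecting the bus. It encodes only the 3 meter FAST SCSI limit and the
two-terminator rule stated in this guide.

```python
FAST_SCSI_MAX_LENGTH_M = 3.0  # limit quoted in this guide for FAST SCSI buses


def check_scsi_bus(length_m, terminators, fast_bus):
    """Return a list of rule violations; an empty list means none were found.

    length_m    -- total bus length in meters (measured by hand)
    terminators -- number of enabled terminators on the bus
    fast_bus    -- True if the port is FAST SCSI with fast devices attached
    """
    problems = []
    if fast_bus and length_m > FAST_SCSI_MAX_LENGTH_M:
        problems.append("bus too long for FAST SCSI")
    if terminators != 2:
        problems.append("expected exactly 2 terminators, found %d" % terminators)
    return problems
```

An empty result does not prove the bus is good; bad cabling and power problems
still have to be checked separately.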

	o TAPE STUCK IN DRIVE
		When the EJECT-button is pressed, the cartridge does not come
		out.
		CAUSES:
		o Mechanical problem in the drive. The tape may be wrapped
		  around drum or being caught by a tape-guide or -roller.
		o The Backup-process is hung or locked-up, preventing the
		  drive from ejecting the tape (the SCSI-command 'PREVENT
		  MEDIA REMOVAL' has been issued to the drive by the driver).
		o The drive electronics or firmware detected a severe or
		  unrecoverable problem.
		o When a drive contains older firmware, the drive may lock up
		  when the EJECT-button is pressed repeatedly.

		SOLUTION-STEPS:
		o Power down the drive. While pressing the EJECT-button, power
		  on the drive. (Shutdown of Operating System may be required).
		  WARNING: Do NOT power-cycle a SCSI-device that is
			   connected to an active SCSI-bus.
		  o Cartridge ejects now.
		    Examine the cartridge and the media for any visual damage.
		    Inspect the inside of the drive through the front-entrance
		    and look for any abnormalities.
		    If none is obvious, verify loading and ejecting of a
		    cartridge works properly. Repeat this several times.
		    If the cartridge or media is damaged, try to determine
		    if this is caused by the drive.
		  o Cartridge does NOT eject.
		    Inspect the drive. The tape may be wrapped around posts
		    or the drum.
		o When a cartridge is ejected from a drive, the tape in the
		  cartridge must be straight between the two guides in the
		  cartridge. If the tape shows a loop, loading-problems will
		  result. 
		  To inspect this, slide the bottom shutter of the cartridge
		  towards the front/label-side, while pressing the TWO small
		  notches.
		o If you can correct the tape-stuck situation and the firmware
                  version of the drive is old, upgrade the firmware.

        o DRIVE HEAD CLEANING
                Regular Head Cleaning is critically important to maintain the
                TLZ-drive in good shape.
                Head Cleaning can also be used to recover from MEDIA/HEAD-related
                problems. Refer to the sections "REGULAR HEAD CLEANING" and
                "CORRECTIVE HEAD CLEANING".

        o CONDITION OF THE DRIVE
                Prolonged, intensive use of the drive can result in accumulated
                debris buildup on the capstan and tape guide-posts.
                The Head Cleaning Cartridge is not effective on the capstan.
                Using a bright flashlight, look into the drive from the
                front (keep the lid open with one finger).
                On the righthand side, approx. 2" (5 cm) behind the lid, is
                the capstan (silver-colored metal shaft), embedded at the
                top and bottom in a frame structure.
                If two dark-colored debris-rings can be observed on the
                capstan, the capstan should be cleaned. It's recommended to
                have this done at the repair-center.

                Also observe the amount of dust in the drive. If the interior
                contains a lot of dust, clean it or replace the drive and have
		it cleaned at the repair-center.
		Observe the environment around the system for a high amount
		of dust. Cleaning the environment or removing the dust source
		must be considered.

        o FAILING DATA CARTRIDGES
                Data cartridges may fail in similar ways due to:
                o Media is mechanically damaged (scrambled tape, scratches,..)
                o Control and Data information is badly written, due to a
                  problem during writes.

                MOST of the MEDIUM related problems occur near BOT.
                Whenever a drive with a clogged head fails during writing to
                tape (typically: 03/03/02 [Sense_Key/ASC/ASCQ] Excessive
                Write Errors), and this head-clogging is fixed, the SAME
                cartridge MAY fail again.
                The REASON for this is twofold:
                o Some tracks on the tape-header (used for drive internal
                  control) are poorly written.
                o Tape Label area on the tape is poorly written.

                MOST backup-procedures/utilities want to do a LABEL-check
                of the tape, which implies reading, before they start writing.
                The read will fail and, as such, the backup may NOT run.

                NOTE: Before doing this step, be sure the drive is inspected
                      and has been cleaned with the Head Cleaning Cartridge.
                      Verify drive-operation by testing with a "known good
                      Cartridge" before checking suspected cartridges.

                ANY suspected tapes MUST first be manually INITIALIZED to
                ensure any poorly written tracks are over-written.
                It is recommended to do this outside production-hours, since
                some failures during initialization may affect the system or
                may require a reboot.
                If this risk is unacceptable for the customer, put these tapes
                aside with a label BAD or SUSPECT.

                WARNING: Failing to Initialize some cartridges, which are
                         suspected, is NOT a drive-problem, but a Cartridge-
                         problem. Refer to the section "DRIVE REPORTED ERRORS".

                Tapes, which can be read successfully, may FAIL when they are
                being written. Typically the drive reports 04/44/xx (Internal
                Target Failure; xx=any value).
                If this is observed, label the tape as BAD and stop using it.
                
	o SOFTWARE VERSION
		Some problems MAY be caused by a software-problem. Trying to
		relate actual problems with release notes and fixes for 
		patch-kits or Service Packs is sometimes very difficult or
		impossible. 
		In case a software-problem is suspected, ask the customer if
		any patches have been applied to the system.
		Check with your software people if applying a patch is
		appropriate.


============================  APPENDIXES  ====================================


INCREASING RELIABILITY
======================

To ensure reliable and error-free backups on the 4mm DAT-drive, the following
conditions must be met:

o Regular Head Cleaning (procedure attached)

o Use of good media. Cartridges must be DDS or DDS-2 types. (printed on
  cartridge)

o Using the appropriate cartridge media-length for the drive:

	DRIVE		ALLOWED MEDIA LENGTH
	-----------	-------------------------------
	TLZ04		60m (DDS-1)
	TLZ06/TLZ6L	60m, 90m (DDS-1)
	TLZ07/TLZ7L	60m, 90m (DDS-1); 120m (DDS-2)
	TLZ09/TLZ9L	60m, 90m (DDS-1); 120m (DDS-2)

  Note: DDS-1 cartridges are identified as "DDS" on the cartridge.

o Good media care.
  Always store cartridges in their protective boxes in a dust-free
  environment.
  Allow media to acclimate before inserting into a drive (when the storage
  environment has a different temperature and humidity).
  Tape labeling must ONLY be done at two areas (top and 'front-side'), using
  the supplied, pre-cut labels.

o It is STRONGLY recommended to do regular checks of the backups by:
  o Performing a VERIFY-operation with the backup. 
  o Regular full or partial restores of files.

o Whenever possible, keep the time a cartridge is in a drive to a minimum.
  A good practice would be (when backups run at night) to remove the cartridge
  from the drive at the beginning of the working-day and insert the next
  at the end of the working day.
  This minimizes the possibility of dust on the exposed part of the media in
  the drive.
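
The media-length table in this section can be kept as a small lookup, for
example in a site checklist script. This is a minimal sketch: the names
ALLOWED_MEDIA_M and media_allowed are made up for this example, and the data
is copied from the table above without any additions.

```python
# Allowed cartridge lengths (meters) per drive, copied from the table above.
ALLOWED_MEDIA_M = {
    "TLZ04": {60},
    "TLZ06": {60, 90},
    "TLZ6L": {60, 90},
    "TLZ07": {60, 90, 120},
    "TLZ7L": {60, 90, 120},
    "TLZ09": {60, 90, 120},
    "TLZ9L": {60, 90, 120},
}


def media_allowed(drive, length_m):
    """Return True if a cartridge of length_m meters is listed for the drive."""
    try:
        return length_m in ALLOWED_MEDIA_M[drive.upper()]
    except KeyError:
        raise ValueError("unknown drive type: %r" % drive)
```

For example, media_allowed("TLZ04", 90) is False, because the table lists only
60m cartridges for the TLZ04.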


REGULAR HEAD CLEANING
=====================

Regular Head-cleaning MUST be performed to maintain optimal reliability of
the TLZ-drive. 

o Head cleaning must be performed every 25 hours of tape-operation or every
  two weeks, whichever comes first.

o If new media is being used, cleaning must be performed every 8 hours of 
  tape-operation for the first 5 times the media is in use.
  Example: If the backup-scheme has a two-week cycle (tapes are used every
           two weeks), perform cleaning every 8 operation-hours for the first
	   2.5 months. Then revert back to the 25-hour cleaning cycle.
  If a few new tapes are phased into an existing tape-set, it is recommended
  to use the same 8 hour cleaning interval initially, until the NEW tapes have
  been used 2 to 3 times.
  Example: With the above backup-scheme, after a few tapes are replaced by
           new ones or new ones are added, use the 8 hour cleaning interval
           for 1 to 1.5 months.
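
The cleaning intervals above can be expressed as a small check. This is a
sketch under stated assumptions: the function name and arguments are made up
for this example, the operating hours must be tracked by the operator (the
guide does not say the drive reports them), and applying the two-week limit
during the 8-hour new-media period as well is an assumption.

```python
from datetime import date, timedelta

REGULAR_INTERVAL_H = 25    # normal interval from this guide
NEW_MEDIA_INTERVAL_H = 8   # interval while new media is being run in
MAX_DAYS = 14              # "every two weeks, whichever comes first"


def cleaning_due(hours_since_cleaning, last_cleaned, today, new_media=False):
    """Return True when the head should be cleaned according to the rules above."""
    limit = NEW_MEDIA_INTERVAL_H if new_media else REGULAR_INTERVAL_H
    if hours_since_cleaning >= limit:
        return True
    # Assumption: the two-week limit also applies during the new-media period.
    return today - last_cleaned >= timedelta(days=MAX_DAYS)
```

With 10 operating hours since the last cleaning, the check is False for an
established tape-set but True while new tapes are being phased in.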

CORRECTIVE HEAD CLEANING
========================

If errors are reported during tape operation, which are suspected to be
write or read related, perform the following steps:

o Perform a head cleaning operation FOUR[4] times.

o Use another data cartridge to determine if it is related to the cartridge or
  the drive.
  Document when a cartridge has given errors. If errors repeat on a cartridge,
  the cartridge is suspected and should be replaced.
  See the section "DRIVE REPORTED ERRORS" for diagnosing suspected cartridges.

o Revert back to an 8-hour cleaning interval for one or two weeks.

If these steps do not result in error-free backups again, the drive is
suspected.


USE OF HEAD CLEANING CARTRIDGE
==============================

    1.	Have the drive powered on.

    2.  Insert the Head Cleaning Cartridge (Digital partnumber TLZ04-HA) into 
        the drive.

    3.  With the Head Cleaning Cartridge inserted, the drive automatically
	executes head cleaning. The drive ejects the Head Cleaning Cartridge
	after approximately 30 seconds.

    4.  Locate the CARD enclosed with the Head Cleaning Cartridge.
	It is STRONGLY recommended to enter the DATE of cleaning on the card 
        every time you use the cartridge.
	Use one cleaning cartridge for each drive to simplify cleaning
	tracking.

	Under normal conditions, the Head Cleaning Cartridge is used for
	about 25 Cleanings.
	If the Head Cleaning Cartridge is OVERUSED, both the Cartridge and
	Write-Protect LEDs will flash. Press the EJECT button to remove
	the Cleaning Cartridge. No Cleaning action will have occurred.
        Discard it and use a new TLZ04-HA.


ERRORLOG DECODING
=================

Use the system's errorlog to find error details.
This section briefly covers how to retrieve this information for OpenVMS,
Digital UNIX and Windows NT.

OpenVMS
=======
        Whenever possible, use DECevent to analyse the errorlog ("diagnose"-
        command). DECevent is the preferred method.
        Whenever DECevent is not available, use "ANALYSE/ERROR". Be aware
        that OpenVMS ALPHA may not always display Extended Sense
        Data Bytes in the errorlog-output.

        Errors reported on the MK-device (or MU-device when behind an
        MSCP-type controller) typically contain SCSI Extended Sense Bytes,
        such as SENSE_KEY, ASC and ASCQ.

        Errors reported against the PK-device are typically SCSI-BUS related.        

        Whenever there are multiple errors in a short time-frame, the FIRST
        TWO[2] entries are most important for diagnosing.

Digital UNIX
============
        Whenever possible, use DECevent to analyse the errorlog ("diagnose"-
        command). DECevent is the preferred method.
        To ensure you retrieve all error-details (like SENSE_KEY, ASC, etc),
        the command must be:  "dia -o full".
        Whenever DECevent is not available, use: "uerf -o full".
        These errors are reported as SCSI-CAM errors (event 199).

        Device reported errors have SENSE_KEY, ASC and ASCQ in the errorlog-
        entry.
        SCSI-port errors do not contain device-specific error-information.

        Whenever there are multiple errors in a short time-frame, the FIRST
        TWO[2] entries are most important for diagnosing.
        Using a "-R" (reverse order) qualifier with 'dia' or 'uerf' makes
        it difficult to find the first entries of a burst. Once you know
        WHEN a burst of errors has occurred, use the following qualifiers
        with the 'dia' or 'uerf' command:
                "-o full -t s:dd-mmm-yyyy,hh:mm:ss"

Windows NT
==========
        Log in as Administrator and select "Event Viewer" under Administrative
        Tools. From the three event logs (System, Security and Application),
        select "System".
        Each event is shown as one line. 
        TLZ-Device events are typically reported as '4mmdat', while SCSI-port
        errors are identified with the SCSI-chip identification (e.g. ncrc810).

        With the mouse-pointer, double-click an error-line to get more details.

        The '4mmdat' entries appear as shown below. Click the "WORDS"-button
        to format the output:

                Source 4mmdat , Event ID: 7, Descr.:
                "The device, \Device\Tape0, has a bad block"

                0000: 00180003 006a0001 00000000 c004000b     (WORD-format)
		0010: 00000101 c0000185 00000000 00000000
		0020: 00000000 00000000 00000000 00000002
                0030: 00000000 00000008 0000c402 00033100
                                                   ||||||
                                         Sense-key-++||||
                                         ASC---------++||
                                         ASCQ----------++
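
The position of the sense bytes in the WORD-format dump above can be decoded
mechanically. The sketch below is only an illustration: the helper names are
made up, and it assumes the sense data always sits in the low three bytes of
the word at offset 0x3C, as in the '4mmdat' example shown. Other drivers or
event types may use a different layout.

```python
def parse_event_words(dump_text):
    """Parse a WORDS-format dump into {byte_offset: 32-bit value}."""
    words = {}
    for line in dump_text.splitlines():
        head, sep, rest = line.partition(":")
        if not sep:
            continue
        try:
            base = int(head.strip(), 16)
        except ValueError:
            continue  # not a dump line (e.g. the event description)
        for i, token in enumerate(rest.split()):
            try:
                words[base + 4 * i] = int(token, 16)
            except ValueError:
                break  # stop at trailing annotations such as "(WORD-format)"
    return words


def sense_from_event(dump_text, offset=0x3C):
    """Return (sense_key, asc, ascq) from the low three bytes of one word."""
    word = parse_event_words(dump_text)[offset]
    return (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF


EXAMPLE = """\
0000: 00180003 006a0001 00000000 c004000b
0010: 00000101 c0000185 00000000 00000000
0020: 00000000 00000000 00000000 00000002
0030: 00000000 00000008 0000c402 00033100
"""

key, asc, ascq = sense_from_event(EXAMPLE)
print("%02X/%02X/%02X" % (key, asc, ascq))  # prints 03/31/00
```

The result 03/31/xx corresponds to "Tape Format corrupted" in the section
"DRIVE REPORTED ERRORS".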
					 

FIRMWARE UPGRADE MATRIX
=======================

   This matrix is the quick guide to determine if a firmware-upgrade must
   be done and WHICH UPGRADE-TAPE to use.

  +------------+--------------------+--------------------+------------------+
  | DRIVE-TYPE | CURRENT FIRMWARE * | UPGRADE TO         | USE UPGRADE-TAPE |
  +------------+--------------------+--------------------+------------------+
  | TLZ06      | 0374, 0389, 0435,  |      4BQE          | AO-Q9XM0-0B.C01  |
  |            | 0491, 491A, 4BH0   |                    |                  |
  |            |                    |                    | Release 4BQE-19  |
  +------------+--------------------+--------------------+------------------+
  | TLZ07 old  | 4BE0               |      4BQE          | AO-Q9XM0-0B.C01  |
  |            |                    |                    |                  |
  |            |                    |                    | Release 4BQE-19  |
  +------------+--------------------+--------------------+------------------+
  | TLZ07 new  | 5330, 553A         |      553B          | AO-QSUS0-0B.B01  |
  |            |                    |                    |                  |
  |            |                    |                    | Release 553B-19  |
  +------------+--------------------+--------------------+------------------+
  | TLZ09      | 0162, 0165         |      0167          | AO-R0H90-0B.A01  |
  |            |                    |                    |                  |
  |            |                    |                    | Release 0167     |
  +------------+--------------------+--------------------+------------------+

  +------------+--------------------+--------------------+------------------+
  | DRIVE-TYPE | CURRENT FIRMWARE * | UPGRADE TO         | USE UPGRADE-TAPE |
  +------------+--------------------+--------------------+------------------+
  | TLZ6L      | 0491, 4BH0         |      4BQE  **      | AO-Q9XN0-0B.C01  |
  |            |                    |                    |                  |
  |            |                    |                    | Release 4BQE-419 |
  +------------+--------------------+--------------------+------------------+
  | TLZ7L      | 04??, 4???         |      4BQE  **      | AO-Q9XN0-0B.C01  |
  |            |                    |                    |                  |
  |            |                    |                    | Release 4BQE-419 |
  +------------+--------------------+--------------------+------------------+
  *:  Possible current/old firmware variants.
  **: TLZ6L/TLZ7L, used with Windows NT and Backup Exec V6.0, need firmware
      "4BQH" [Upgrade-tape AO-Q9XN0-0B.D01] and appropriate drivers.

DRIVE REPORTED ERRORS
=====================
Check the Event-/Errorlog and collect the drive reported information.
Most common Media/Cartridge related errors are shown below.

Used format: SENSE_KEY/ASC/ASCQ

 KEY/ASC/ASCQ
 ------------
  03/03/02  Excessive Write Errors
                This error causes the data-write to abort and may leave
                the tape un-readable. When this occurs at BOT, problems may
                occur later, when the tape is used again.
                Possible causes:
                o Head clog. See the section "CORRECTIVE HEAD CLEANING".
                o Damaged tape. Scrambled tape or scratches. Inspect Cartridge.
                o It has been observed, that a drive with head clog may write
                  such a track-pattern on the tape, that a GOOD drive needs
                  a few writes, before all OLD FLUXES have been removed. This
                  typically occurs at BOT.
                  You may retry the write/initialize. Be sure no Head Clog
                  exists.

  01/5B/xx  Log Exception/ Log counter at maximum/ Recovered with retries.
                This type of error indicates degrading write or read
                performance.
                Possible causes:
                o Head clog. See the section "CORRECTIVE HEAD CLEANING".
                o Degraded cartridge.

  03/30/xx  Cannot read tape (unknown or incompatible format)
                Possible causes:
                o Tape has been written on an incompatible drive.
                o Tape has been written on a dirty/degraded drive

  03/31/xx  Tape Format corrupted
                Possible causes:
                o Tape has been written on a degraded drive (head clog).

  03/3B/xx  Sequential Positioning Error
                Possible causes:
                o Tape written on a dirty drive (capstan, tape-posts, heads)
                o Damaged Cartridge

  04/44/80  Compression Hardware Fault
                Possible causes:
                o Problem has occurred during Write of the media.
                o If it's a TLZ07 with firmware 5xxx, call your CSC.

  04/44/xx  Internal Target Failure
                Many different causes may exist for this error-group.
                Possible causes:
                o Data Cartridge related. This can both be poorly written
                  control-information on the media or a mechanical reel
                  problem. If a cartridge repeatedly gives this error, reject
                  the cartridge (assuming other cartridges work OK).
                o Drive electronic problem.
                o (Loader Only) Cartridge/magazine/transport problem.
                  Check for incorrectly applied labels on the cartridge or
                  labels coming loose.

  06/29/00  Power on or Reset occurred.
                Possible causes:
                o When a SCSI-bus reset was invoked by the Host SCSI-port,
                  this is just informational. This is the MOST COMMON cause
		  for this event.
                o A Power problem may exist or power has been cycled.

  06/5A/01  Media removal requested by operator.
                Possible causes:
                o Operator pressed Eject during tape operation.

  07/27/00  Write protected.
                Possible causes:
                o Cartridge was set Write Protected (with write protect tab)
                  (Operator issue)
                o If the SAME cartridge keeps reporting this, this cartridge
                  has a mechanical tolerance problem.
                o If multiple cartridges report this problem, while they are
                  NOT set write protected, it's a drive-problem.

  08/xx/xx  Blank Check.
                End of data or unrecorded tape encountered.
                Possible causes:
                o A new, un-initialized tape is being read. Initialize tape.
                o While the tape was written, the write-operation aborted.
                  This must have been caused by some error (drive/SCSI-bus).
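
The error list above can be turned into a lookup for use while reading
errorlogs. This is a minimal sketch: the table holds only the codes and
descriptions listed in this section, and the names SENSE_TABLE and
describe_sense are made up for this example. "xx" is represented by None, and
the specific 04/44/80 entry is placed before the generic 04/44/xx entry so
that it is found first.

```python
# (sense_key, asc, ascq) patterns from the list above; None stands for "xx".
# Order matters: the specific 04/44/80 entry must precede 04/44/xx.
SENSE_TABLE = [
    ((0x03, 0x03, 0x02), "Excessive Write Errors"),
    ((0x01, 0x5B, None), "Log Exception / Log counter at maximum / "
                         "Recovered with retries"),
    ((0x03, 0x30, None), "Cannot read tape (unknown or incompatible format)"),
    ((0x03, 0x31, None), "Tape Format corrupted"),
    ((0x03, 0x3B, None), "Sequential Positioning Error"),
    ((0x04, 0x44, 0x80), "Compression Hardware Fault"),
    ((0x04, 0x44, None), "Internal Target Failure"),
    ((0x06, 0x29, 0x00), "Power on or Reset occurred"),
    ((0x06, 0x5A, 0x01), "Media removal requested by operator"),
    ((0x07, 0x27, 0x00), "Write protected"),
    ((0x08, None, None), "Blank Check"),
]


def describe_sense(key, asc, ascq):
    """Return the description of the first matching entry, or None."""
    for pattern, text in SENSE_TABLE:
        if all(p is None or p == v for p, v in zip(pattern, (key, asc, ascq))):
            return text
    return None
```

A code that is not in this section, for example 02/04/00, returns None and has
to be looked up elsewhere.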


- end -

3649.1. "This is what we need" by JGODCL::KRAAN () Fri Feb 07 1997 15:43 (12 lines)

    Jan,
    
    It took me some time to read this, but it is well worth the while. Here
    in the Nijmegen repair center, we have done a lot of investigations on
    the TLZ products. Your memo/instruction describes very well the actions
    that can be taken at the customer site to increase the reliability of
    the TLZs. I fully agree with its contents and I'm considering making
    an insert from it, so that these instructions are shipped with every
    repaired drive.
    
    Peter van der Kraan
    Mass-Storage Engineering, Nijmegen

3649.2. "See 3623.3 for F'ware bug-fix history and more" by KERNEL::LOANE (Comfortably numb!!) Fri Feb 07 1997 21:08 (1 line)

3649.3. "Good TLZ Info Located Here" by KYOSS1::LUIZZA () Thu Feb 13 1997 21:58 (13 lines)
    
    Jan, 
    Good guide; I picked up lots of good pointers. Could I request that a
    listing of the error codes from the displays, as well as a list of the
    indicator lights ("when these blink, it means this"), also be posted
    here? Do the cassette and write-protect lights being on mean the drive
    is dirty and will not work until it is cleaned?
    Could you also edit your header so that a search from TIMA will pick up
    this note in a TLZ search (I:TLZ)? That will help more people find this
    great information.
    
    Thanks for the good stuff, keep it coming.
    
    /Irv Luizza