T.R | Title | User | Personal Name | Date | Lines |
---|
3922.1 | | SSAG::LARY | Laughter & hope & a sock in the eye | Wed Feb 22 1995 18:10 | 12 |
3922.2 | ef51 longwords | NETRIX::"[email protected]" | DAVE CLARK | Tue Mar 11 1997 07:13 | 79 |
| I have the following longword information from an EF51 drive detected error
(MSLG$W_EVENT 00EB)
.
. CONTROLLER DEPENDENT INFORMATION
LONGWORD 1. 00000000
/..../
LONGWORD 2. 00000000
/..../
LONGWORD 3. 00003700
/.7../
ANAL/ERR DAVE.BIN/OUT=DAVE.DAT
I have not been able to find a decoder for the EF51 error information in this
format.
HISTRY or PARAMS>stat log do not provide any clues.
The drive seems to log errors at the rate of two per day, and usually
fairly close together.
DATE/TIME 1-MAR-1997 13:14:37.01 SYS_TYPE
01370501
DATE/TIME 1-MAR-1997 13:14:48.16 SYS_TYPE
01370501
DATE/TIME 2-MAR-1997 13:14:40.34 SYS_TYPE
01370501
DATE/TIME 2-MAR-1997 13:14:52.90 SYS_TYPE
01370501
DATE/TIME 3-MAR-1997 14:04:12.57 SYS_TYPE
01370501
DATE/TIME 3-MAR-1997 14:04:12.61 SYS_TYPE
01370501
DATE/TIME 4-MAR-1997 12:46:15.72 SYS_TYPE
01370501
DATE/TIME 4-MAR-1997 12:46:17.09 SYS_TYPE
01370501
DATE/TIME 5-MAR-1997 12:46:19.05 SYS_TYPE
01370501
DATE/TIME 5-MAR-1997 12:46:21.82 SYS_TYPE
01370501
DATE/TIME 6-MAR-1997 12:46:22.37 SYS_TYPE
01370501
DATE/TIME 6-MAR-1997 12:46:26.55 SYS_TYPE
01370501
DATE/TIME 7-MAR-1997 12:46:25.70 SYS_TYPE
01370501
DATE/TIME 7-MAR-1997 12:46:31.28 SYS_TYPE
01370501
DATE/TIME 8-MAR-1997 12:46:29.03 SYS_TYPE
01370501
DATE/TIME 8-MAR-1997 12:46:36.01 SYS_TYPE
01370501
DATE/TIME 9-MAR-1997 12:46:32.36 SYS_TYPE
01370501
DATE/TIME 9-MAR-1997 12:46:40.74 SYS_TYPE
01370501
DATE/TIME 10-MAR-1997 12:46:35.68 SYS_TYPE
01370501
DATE/TIME 10-MAR-1997 12:46:45.47 SYS_TYPE
01370501
DATE/TIME 11-MAR-1997 12:46:39.02 SYS_TYPE
01370501
DATE/TIME 11-MAR-1997 12:46:50.21 SYS_TYPE
01370501
I have checked the ESE50 service guide for longword decoding; however,
it only covers longwords #1 and #2 (both '00000000' in this case) and does
not deal with longword #3, which is the only one I have with any bits set.
Any help with either decoding the information, or a pointer to a
resource would be welcome.
Regards...
Dave Clark
[Posted by WWW Notes gateway]
|
3922.3 | you did not list the entire error log but,,, | SUBSYS::VIDIOT::PATENAUDE | Ask your boss for ARRAY's... | Tue Mar 11 1997 07:53 | 173 |
|
If the 37 in those longwords is the DER code then 37 = Replace Old Battery.
You may be running slow because ALL I/O is going to the retention device and NOT
to RAM.
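As a rough illustration of that reading, the sketch below pulls the second-lowest byte out of longword 3. The byte position is an assumption inferred from the value 00003700 quoted in .2, not from a documented EF51 packet layout, and the helper name `der_code_from_longword` is hypothetical.

```python
# Hypothetical helper: assumes the DER code sits in bits 8-15 of longword 3.
# That position is inferred from the dump in .2, not from a documented layout.
KNOWN_DER_CODES = {0x37: "Replace Old Battery"}  # only the code named in this thread


def der_code_from_longword(longword_hex):
    """Return the byte held in bits 8-15 of a longword given as a hex string."""
    value = int(longword_hex, 16)
    return (value >> 8) & 0xFF


code = der_code_from_longword("00003700")
print(f"DER code {code:02X}: {KNOWN_DER_CODES.get(code, 'unknown')}")
```

Any other DER code would print as "unknown" here, since 37 is the only code this thread gives a meaning for.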
I've attached a BLITZ I sent out about a year ago and resent recently to the
Field.
Roger.
From: SUBSYS::BABAGI::TOCSIN::TIMA_MGR "06-Feb-1997 1653" 6-FEB-1997
16:53:49.27
To: BABAGI::PATENAUDE
CC:
Subj: GRAM: [TD 2021-A] Test/Replace Batteries - EF51R, EF52R, EF54R - BLITZ
Author : LINDA WARREN
User type : DBA
Location : USTIMA
Vaxmail address : BSS::LWARREN
Copyright (c) Digital Equipment Corporation 1996, 1997. All rights reserved.
NOTE: This BLITZ supersedes TD 2021.
+---------------------------+TM
| | | | | | | |
| d | i | g | i | t | a | l | TIME DEPENDENT BLITZ
| | | | | | | |
+---------------------------+
BLITZ TITLE: Testing and replacement of batteries in the EF51R, EF52R,
and EF54R.
PRIORITY LEVEL: 2
DATE: February 6, 1997
TD #: 2021-A
AUTHOR: Roger Patenaude
DTN: 237-3705
EMAIL: SUBSYS::PATENAUDE or [email protected]
DEPARTMENT: Storage External Products, Continuation Engineering
=================================================================
PRODUCT NAME(S): EF51R, EF52R, and EF54R.
PRODUCT FAMILY(IES): {Check all that apply}
Storage _X_
Systems/OS ___
Networks ___
PC/Peripherals ___ {includes printers, monitors, etc.}
Software Apps. ___
BLITZ TYPE: {Check all that apply}
Maintenance Tip _X_ {Info. will assist servicing the product}
Service Action Requested ___ {MCS is requested to perform an activity}
IF SERVICE ACTION IS REQUESTED: (Check all that apply.)
Labor Support Required ___ {Requires MCS to provide service labor}
Material Support Required ___ {Requires MCS to provide material}
Estimated time to complete activity (in hours):
Will this require a change in the field's inventory: Yes ___ No ___
Will an FCO be associated with this advisory? Yes ___ No ___
DESCRIPTION OF SERVICE ACTIVITY REQUESTED (if applicable):
**********************************************************************
SYMPTOM:
Customer can lose all data contained on Solid State Disks during
power failure if the retention battery has failed.
PROBLEM STATEMENT:
NiCad batteries (such as those used in the EF5xR products) most
commonly fail in a way where they still test as having voltage and
current but in fact hold next to no reserve. If left undetected in an
EF5xR implementation, this can leave the drive in a state that can be
recovered only by reformatting the unit.
It was for this reason that the EF5xR family of devices has on-board
battery test diagnostics, and that the batteries must be tested yearly
and replaced every three years, or sooner if the diagnostics fail
during a yearly test, as part of normal service procedure.
SOLUTION:
Replace any batteries that are 3 years old or fail annual battery
tests.
Inspection of the battery manufacture date, located on the battery
label, is the only true way of finding out the age of the battery.
Refer to section 4 of EK-EF5XX-UG for proper procedures to access
and run BATTST utility.
NOTE: As per section 4 of EK-EF5XX-UG, you may view how many days the
current battery has left before the 3 year replacement by looking at
the PARAMS values of BSS_MAXR and BSS_REPL. BSS_MAXR is the total
number of days a battery can live before proactive replacement and
BSS_REPL is the number of days left on the current battery. Once BSS_REPL
reaches "0" the unit will issue an errorlog datagram with a DER code
of 37(x) (Replace Old Battery) once per week.
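The arithmetic in this note can be sketched as follows. This is only an illustration: the function name `battery_status` is hypothetical, the values would have to be read by hand from PARAMS, and BSS_MAXR is assumed to be the 1095-day (3 year) factory figure quoted in this BLITZ.

```python
from datetime import date, timedelta

# Assumption: BSS_MAXR holds the 1095-day (3 year) factory figure.
BSS_MAXR_DEFAULT = 1095


def battery_status(bss_repl, today, bss_maxr=BSS_MAXR_DEFAULT):
    """Return (days_in_service, replace_by_date) from hand-read PARAMS values."""
    if not 0 <= bss_repl <= bss_maxr:
        raise ValueError("BSS_REPL should lie between 0 and BSS_MAXR")
    days_in_service = bss_maxr - bss_repl
    replace_by = today + timedelta(days=bss_repl)
    return days_in_service, replace_by


# Example with made-up readings: 95 days left as of 11 March 1997.
used, due = battery_status(95, date(1997, 3, 11))
print(f"{used} days in service, replace by {due.isoformat()}")
```

A BSS_REPL of 0 returns today's date, which corresponds to the point at which the weekly DER 37 datagrams begin.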
Refer to section 7 of the EK-EZ5XX-UG for proper procedures to
replace a battery pack (The EF5XX User Guide omits the actual
replacement procedure).
Note: Both of the manuals are available online at:
SUBSYS::LCA:[SPECS.SOLID_STATE.EZXX] or,
SUBSYS::LCA:[SPECS.SOLID_STATE.EFXX] or,
TIMA TOOLS in .PDF and .PS format
Note: The replacement battery part number has recently been CHANGED!
OLD battery pack PN# 12-37620-01
NEW battery pack PN# 29-33445-01
These batteries may discharge during storage and at times when the
unit's power is removed for more than a month. Upon initial receipt
of a new EF5xR or after replacement of the batteries in an existing
EF5xR, you may find the device is write protected due to insufficient
battery charge-level for data retention. It is recommended that EF5xR
devices be powered on for a minimum of four hours before operating.
EF5xR batteries must be replaced every 3 years as part of normal
service procedure. This item is considered a "wearable" item and NOT
covered under warranty or field contract. Any replacement is the
responsibility of the user and should be charged per call/time and
material.
Once the battery pack has been replaced on an EF5XX, you must reset
the internal battery 3 year counter saved in the parameters of the
device. BSS_REPL (as mentioned above) is a read-only word, and to
reset it to the factory default of 1095 (3 years in days) you must write
a "1" into parameter BSS_REST (this may take a minute to be detected
by the firmware). Then, either power cycle the unit or set the bit
BSS_UPNV (Update Non-volatile) to force it to take immediate effect.
VERIFICATION:
After battery has had sufficient charge time, a successful pass of
BATTST indicates a good battery.
It is also a good idea to inspect the EF5X battery pack for visible
signs of "leakage" whenever service is performed on the unit.
LARS INFORMATION:
*** DIGITAL INTERNAL USE ONLY ***
\\ GRP=TIME_DEPENDENT CAT=HARDWARE DB=CSSE_TIME_CRITICAL
\\ TYPE=KNOWN_PROBLEM TYPE=BLITZ STATUS=CURRENT
|
3922.4 | decevent's view of things... | NETRIX::"[email protected]" | dave clark | Tue Mar 11 1997 07:53 | 94 |
| I tried running the binary errorlog information past DECEVENT, but I'm not too
convinced by its interpretation of the third longword:
******************************** ENTRY 6 ********************************
Logging OS 1. OpenVMS
System Architecture 1. VAX
OS version V5.5-2
Event sequence number 18435.
Timestamp of occurrence 03-MAR-1997 14:04:12
Time since reboot 0 Day(s) 1:07:59
Host name MARS01
SID register x14000006
System type register x01370501 Unrecognized System Type
Unique CPU ID x00000000
System Model VAX type not decoded yet
Entry type 100. Logged Message
---- Device Profile ----
Unit DISK2$DIA3
Product Name EF51 DSSI Solid State Disk
---- MSCP Logged Msg ----
Logged Message Type Code 1. Disk Message
Command Reference number x00000000
Unit Number 3.
MSCP Sequence number 0.
Logged Message Format 4. Small Disk Error
MSCP Flags x00 No MSCP Flags indicated
MSCP Unique Controller-ID x0000408332101779
MSCP Controller Model 105. EF5X
MSCP Controller Class 1. Mass Storage Controller class
Controller SW version x3A
Controller HW version x01
Unit SW version x3A
Unit HW version x01
MSCP SDE Event code x00EB Drive detected error.
Multiunit code x0000
Cylinder 0.
Volume Serial Number 0.
RF Disk DER Code x00 Undefined DER Code
Servo Event Code x00 No Servo Error.
Physical Sector 0.
Head 0.
Logical Block Number 0.
Bad Block Space left 0.
DDASP Write Fault Reg x34
Cancel
MSCP Unique Unit-ID x0000408332101779
MSCP Unit Model 51. EF5X
MSCP Unit Class 2. Disk class - DEC Std 166 disk
Unit SW version x3A
Unit HW version x01
MSCP SDE Event code x00EB Drive detected error.
Multiunit code x0000 }
Cylinder 0. }
Volume Serial Number 0. } longword#1 ??
RF Disk DER Code x00 Undefined DER Code }
Servo Event Code x00 No Servo Error.
Physical Sector 0.
Head 0.
Logical Block Number 0.
Bad Block Space left 0.
DDASP Write Fault Reg x37 Disable Write Gate Bit Set. } from lw#3
??
Wrt Lock Fault. Not properly Locked to
Internal Ref Clock.
Write Enabled Fault. Disable Write Gate
Set During Write.
Write Unsafe. Often Result Of Another
Write Fault Condition.
Sector Write Overrun. Attempt to Write
Over Servo Burst.
Servo Status Reg x0000
Phoenix Data Status Reg x0000 Cmd Response: State Machine Idle.
MSCP Unique Unit-ID x0000408332101779
MSCP Unit Model 51. EF5X
MSCP Unit Class 2. Disk class - DEC Std 166 disk
[Posted by WWW Notes gateway]
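One possible reading of the output above, not confirmed anywhere in this thread, is that DECevent is treating the byte 37 from longword 3 as a bit-mapped write fault register rather than as a single code. The sketch below only lists which bits are set in x37; the mapping from bit positions to fault names is not given in the thread and is not assumed here.

```python
def set_bits(value):
    """Return the positions of the bits set in value, lowest bit first."""
    return [bit for bit in range(value.bit_length()) if value & (1 << bit)]


bits = set_bits(0x37)
print(f"x37 = {0x37:08b} binary, bits set: {bits}")
```

Five bits are set, which matches the number of fault descriptions DECevent printed next to "DDASP Write Fault Reg x37".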
|
3922.5 | Pre-emptive strike! | KERNEL::CLARK | STRUGGLING AGAINST GRAVITY... | Tue Mar 11 1997 07:57 | 5 |
| Roger...
Many thanks...you pre-empted my reply '.4' by seconds!
Regards...
Dave Clark
|
3922.6 | hmmm... | SUBSYS::VIDIOT::PATENAUDE | Ask your boss for ARRAY's... | Tue Mar 11 1997 08:20 | 5 |
|
You're welcome. That DECevent log looks strange. Can you copy a binary of that
error to BOT000::FIREWALL: so I can bit bust it manually?
roger.
|
3922.7 | File copied as requested | KERNEL::CLARK | STRUGGLING AGAINST GRAVITY... | Thu Mar 13 1997 03:18 | 8 |
| Roger...
The file EF51_ERRORS.BIN is now copied as requested. This is the
cluster-merged binary errorlog for the device since 1st March this
year.
Sorry about the delay...I had a day off yesterday (12th)
Dave
|
3922.8 | yup battery. | SUBSYS::VIDIOT::PATENAUDE | Ask your boss for ARRAY's... | Thu Mar 13 1997 07:53 | 11 |
|
I did not have to break them down. Why? If you notice, the errors happen every
24 hours.
The drive tests the battery status every 24 hours after being powered up; if the
test fails, you get the error message every 24 hours.
Every 7 days after power-on, the drive also tests the internal value of
BSS_REPL and, if it is 0, issues the same error packet, except 7 days apart.
roger.
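The 24-hour spacing can be checked directly from the timestamps listed in .2. The sketch below is only an illustration: it takes the first error of each day from 4-MAR to 8-MAR and prints how far each gap is from exactly 24 hours. The helper name `parse_vms_time` is hypothetical.

```python
from datetime import datetime

MONTHS = {"JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
          "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12}


def parse_vms_time(stamp):
    """Parse a stamp such as '4-MAR-1997 12:46:15.72' into a datetime."""
    day_part, time_part = stamp.split()
    day, mon, year = day_part.split("-")
    hh, mm, ss = time_part.split(":")
    whole, _, frac = ss.partition(".")
    micro = int(frac.ljust(6, "0")) if frac else 0
    return datetime(int(year), MONTHS[mon.upper()], int(day),
                    int(hh), int(mm), int(whole), micro)


# First error of each day, copied from the listing in .2
stamps = ["4-MAR-1997 12:46:15.72", "5-MAR-1997 12:46:19.05",
          "6-MAR-1997 12:46:22.37", "7-MAR-1997 12:46:25.70",
          "8-MAR-1997 12:46:29.03"]
times = [parse_vms_time(s) for s in stamps]

# Seconds by which each day-to-day gap differs from exactly 24 hours
drift = [(b - a).total_seconds() - 86400 for a, b in zip(times, times[1:])]
for stamp, d in zip(stamps[1:], drift):
    print(f"{stamp}: 24 h {d:+.2f} s after the previous day's first error")
```

Each gap comes out a few seconds over 24 hours, which fits a daily battery test timed from power-up rather than the weekly BSS_REPL check.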
|
3922.9 | Action in hand. | KERNEL::CLARK | STRUGGLING AGAINST GRAVITY... | Tue Mar 18 1997 10:30 | 4 |
| Roger...
Thanks for the feedback...an action plan has been implemented...
Dave
|