Title: HSJ30/40 Product Conference
Moderator: SSDEVO::EDMONDS
Created: Mon Jul 12 1993
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 1264
Total number of notes: 4958
1247.0. "An answer this time please...Lastfail parameter decode" by KERNEL::CLARK (STRUGGLING AGAINST GRAVITY...) Thu May 08 1997 07:03
This question has been asked many times before and so far as I can
ascertain, has never been answered satisfactorily:-
HSxxx controllers generate lastfail codes, which in some cases are
accompanied by lastfail parameters.
There does not appear to be any information published which enables the
average FE to understand these parameters.
OK, I can understand where the parameter is a PC in a code stream,
or an address of a faulting instruction, but where a FAULT TYPE/SUBTYPE
number is provided, surely it must help the FE to understand what's
really gone wrong if he can decode this information.
Not knowing this information is costing DIGITAL lots of '$' in
unnecessary swaps of the wrong parts.
In the days of HSCs it was sometimes possible to identify a
requestor and port from crash information, to isolate a faulty drive.
Is this information available in HSOF to enable FEs to pin down a
port/drive which might be causing an HSJ crash?
Why am I raising this question again?....Because our old friend
lastfail 01050104 for an HSJ40 running HSOF2.5j has re-occurred.
Previous notes (152,226,239) suggest a selection of causes, which
favoured a drive with bad metadata, and solutions which included testing
every drive with DILX and then re-initialising suspect drives. These
notes, incidentally, were raised in 1993, for HSOF 1.1j!!!
In a fully populated SW500 array in a customer production
environment, this approach is just not feasible. There has to be a more
efficient way of dealing with this scenario.
Therefore, is there a number which can be provided, or which can be
extracted, which when decoded, will identify a port/device/channel active at
the time of the lastfail event? Maybe a parameter (n+1)?
Dave Clark
UK-CSC
T.R | Title | User | Personal Name | Date | Lines |
---|
1247.1 | Run FMU and see what it says | SSDEVO::RMCLEAN | | Thu May 08 1997 10:56 | 3 |
| Actually there is a LOT of built-in documentation on these. What you do
is run FMU on the controller and then do a sho last all full. This will tell
you what the error code is and give a lot of other info.
|
1247.2 | | GIDDAY::HOBBS | Andy Hobbs. Sydney CSC. -730 5964 | Sat May 17 1997 08:13 | 9 |
|
I think a 'describe' of the lastfail code from within FMU also
gives a useful listing of the lastfail parameters which might
help. I'm miles away from my nearest manual though and the modem
connection I've got doesn't favour PDF-based alternatives.
Check it out.
Andy/.
|
1247.3 | Once more with feeling!!! | KERNEL::CLARK | STRUGGLING AGAINST GRAVITY... | Tue May 27 1997 10:08 | 23 |
| Re: .1
Yes!
It gives you the decode of the last fail code, but all it tells you for
parameter (2) is that it's the "Fault type and subtype values".
My question was :-
What do these values indicate?
Supplementary questions are now....
Are they significant?
Would they help an FE to make more sense of the problem?
Would they save DIGITAL money?
What does... "parameter '2' 00020001" ...mean in English?
Where is the decode for this number?
Can we all have a copy of the list of values please?
Dave Clark
|
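For anyone following along, here is a minimal sketch of the only mechanical step possible without the published list: splitting the 32-bit parameter into two halves. It ASSUMES the "fault type" sits in the upper 16 bits and the "subtype" in the lower 16 bits; nothing in this thread confirms that layout, and the function name `split_word` is purely illustrative. What each value means still needs the decode table being asked for.

```python
from typing import Tuple


def split_word(value_hex: str) -> Tuple[int, int]:
    """Split a 32-bit hex string into (upper 16 bits, lower 16 bits).

    ASSUMPTION: a 16/16 split. The real field layout of the lastfail
    parameter is not documented anywhere in this thread.
    """
    word = int(value_hex, 16)
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit value")
    return (word >> 16) & 0xFFFF, word & 0xFFFF


# Parameter '2' as quoted in note .3
upper, lower = split_word("00020001")
print(f"upper half: {upper:#06x}, lower half: {lower:#06x}")
```

Under that assumed split, parameter '2' 00020001 would read as type 2, subtype 1, which is exactly where the missing list of values is needed.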
1247.4 | | KERNEL::CLARK | STRUGGLING AGAINST GRAVITY... | Mon Jun 02 1997 04:35 | 3 |
| Did I say something wrong?
Dave
|