T.R | Title | User | Personal Name | Date | Lines |
---|
1242.1 | Does 3.1 really help? | PFSVAX::WUENSCHELL | | Wed Apr 30 1997 09:15 | 7 |
| Keith;
We continue to fight battery failures in the field even with
batteries dated 97. Are you continuing to have good results with HSOF
3.1? If so, this may be a solution for some of our problem customers.
By the way, are there any patches to 3.1 that you know of? A note in
this conference says there aren't, but then I saw a reference to 3.1-6.
|
1242.2 | | SSDEVO::THOMPSON | Paul Thompson, Colorado Springs | Wed Apr 30 1997 14:35 | 5 |
| Do the 1997 batteries with which you are having problems have white labels?
If so, the manufacturing date of those batteries pre-dates 1997. The date
on the white label shows the date that the battery was most recently
re-charged.
|
1242.3 | White label has MAR 1997 | PFSVAX::WUENSCHELL | | Thu May 01 1997 08:57 | 6 |
| Yes, the batteries have a white label with Mar 1997 on it. They also
have MAR 97 stamped in black on the edge.
Are you saying that these batteries may have problems? How do we
determine which 1997 batteries are good or bad?
|
1242.4 | Batteries WITHOUT white labels | SSDEVO::THOMPSON | Paul Thompson, Colorado Springs | Thu May 01 1997 15:32 | 4 |
| Batteries dated 1997 that do not have a white label on the face of the battery
with this date are good. Batteries with the date on a white label on the face
of the battery were originally manufactured in 1996 and are subject to the
problem from the vendor's manufacturing defect.
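For anyone triaging batteries in the field, the rule above can be written down as a small check. This is only a sketch of the rule as stated in this note; the function name, its parameters, and the "unknown" result for cases the note does not cover are my own assumptions, not anything from engineering.

```python
def classify_battery(label_year: int, date_on_white_face_label: bool) -> str:
    """Apply the labelling rule from this note (hypothetical helper).

    label_year: the year shown on the battery.
    date_on_white_face_label: True if that date is printed on a white
        label on the face of the battery.
    """
    if label_year != 1997:
        # The note only covers batteries dated 1997.
        return "unknown"
    if date_on_white_face_label:
        # White label: the date is the last re-charge date, and the
        # battery was originally manufactured in 1996.
        return "suspect"
    # Dated 1997 with no white label on the face.
    return "good"


print(classify_battery(1997, True))   # white label, MAR 1997
print(classify_battery(1997, False))  # no white label
```

A "suspect" result means only that the battery is subject to the vendor defect, not that it has failed.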
|
1242.5 | | GEM::SHERGOLD | We are 100% sure; well almost!! | Tue May 06 1997 07:28 | 3 |
| OK Guys what about an answer to .0??? Any takers?
Keith
|
1242.6 | Answers... | SSDEVO::FAVA | 4 Yrs of Eng Sch & Never Saw a Train | Tue May 06 1997 13:07 | 59 |
| RE: .5
OK, I accept your challenge.
I presume the questions in .0 that you would like answered are the
following:
>>
>> The strange thing was
>> that the next morning the HSJ reported the battery as good with the
>> added message of "Cache battery is now sufficiently charged". Can
>> someone explain this?
>>
Yes, I can. This entire battery problem has been extremely painful
for everyone. It has been caused by several problems, both hardware
and software.
Major changes were made to the battery diagnostic in V3.1 which
specifically corrected many of the software issues. We know now that
the diagnostic in V2.7 and V3.0 was declaring many batteries "failed"
when there was no problem with them at all. One of our tests here
in the past few days was with a set of batteries which failed
consistently on V3.0 and passed consistently on V3.1. These
batteries have a date code of 10/94 and our testing shows that they
would still hold up a cache for 70 - 80 hours!!
Keep in mind, however, some of the failures detected by the software
were true battery failures. The big problem was to eliminate the
"false" failures while still detecting true failures.
>>
>> Have we been replacing batteries like mad because
>> the old battery testing routine was bad (despite the patch)?
>>
As I mentioned above, some, but not all, of the problems were due
to the software falsely declaring some batteries bad.
>>
>> As there
>> is always some delay in getting all our customers up to V3.1 is there a
>> better patch we can apply to V2.7 to facilitate the same response.
>>
NO.
>>
>> OR
>> (heaven forbid :-) ) has the battery test been fudged to cope with all
>> these failing batteries??
>>
I hope this suggestion was entirely facetious. But if there is any
doubt, NO!!!, the test was NOT fudged simply to pass all batteries,
bad ones included. Many MONTHS of effort by both hardware and
software people have been spent trying to resolve this serious
customer satisfaction problem. No one here has treated it lightly.
The changes in V3.1 were a big step. However, more work is going
on now. This issue is still not closed to our satisfaction.
Hope this helps.
Tom Fava
Colorado Springs
|
1242.7 | Fair's fair! | GEM::SHERGOLD | We are 100% sure; well almost!! | Fri May 09 1997 10:10 | 11 |
| Tom,
Thanks for the reply. Not the one I wanted but at least it is an honest
one and we know where we are.
Oh and by the way the last part was facetious but I didn't know the
symbol for "tongue in cheek". [ :-Q maybe??]
Regards
Keith
|
1242.8 | Cache Battery Low Messages | BSS::BERGLING | | Thu May 22 1997 10:15 | 11 |
| I have a new twist on this.
We have installed 3.1 on about 26 HSJ40's. We are getting cache battery
low notices. When we check the J about 8 hours later, it says:
"Cache battery is now sufficiently charged"
Any explanation for this?
Thanks,
Vern Bergling
|
1242.9 | Normal operation from what you describe | SSDEVO::RMCLEAN | | Thu May 22 1997 10:49 | 4 |
| Yup... New batteries, or batteries that have been supporting the cache, may
well be partially discharged.
Batteries sitting on the shelf or in an unpowered module discharge naturally.
A battery that starts low and later becomes charged is perfectly normal.
|
1242.10 | Installed Batteries | BSS::BERGLING | | Thu May 22 1997 17:53 | 5 |
| These J's have not reported the batteries being low before. Each one seems
to be running fine and then gets a low indication. After 8-12 hours this
changes back to normal. Is the HSJ recharging the batteries or what?
Thanks,
|
1242.11 | 3.1 Crashes on low Batteries???? | BSS::BERGLING | | Fri May 23 1997 09:42 | 198 |
| The following is a console output from one of these "J"s. It seems that
when we get the first DRAB interrupt the J crashes. It then logs a
number of failure codes all pointing to the cache batteries. 4 hours
later the batteries are again sufficiently charged.
Is this crash a new feature for 3.1?
The batteries will be replaced today.
Vern
22:01:30 HJ2202> SHOW THIS
00:01:29 Controller:
00:01:29 HSJ40 (C) DEC ZG61013832 Firmware V31J-0, Hardware H09
00:01:29 Configured for dual-redundancy with ZG61013838
00:01:29 In dual-redundant configuration
00:01:29 SCSI address 7
00:01:29 Time: 31-MAR-1997 14:56:04
00:01:29 Host port:
00:01:29 Node name: HJ2202, valid CI node 15, 16 max nodes
00:01:29 System ID 4200100FD4C0
00:01:29 Path A is ON
00:01:29 Path B is ON
00:01:29 MSCP allocation class 30
00:01:29 TMSCP allocation class 30
00:01:29 CI_ARBITRATION = ASYNCHRONOUS
00:01:29 MAXIMUM_HOSTS = 15
00:01:29 Cache:
00:01:29 32 megabyte write cache, version 2
00:01:29 Cache is GOOD
00:01:29 Battery is GOOD
00:01:29 No unflushed data in cache
00:01:29 CACHE_FLUSH_TIMER = DEFAULT (10 seconds)
00:01:29 CACHE_POLICY = A
00:01:29 NOCACHE_UPS
00:01:29 HJ2202> SHOW FAIL
00:01:34 Name Storageset Uses Used by
00:01:34 ----------------------------------------------------------------------
00:01:34
00:01:34 FAILEDSET failedset
00:01:34 Switches:
00:01:34 NOAUTOSPARE
00:01:34 HJ2202>
01:22:45
01:22:45 %LFL--HJ2202> --31-MAR-1997 16:17:20-- Last Failure Code: 010B2380
01:22:55 Occurred on 31-MAR-1997 at 16:17:20
01:22:55 Power On Time: 0. Years, 302. Days, 0. Hours, 58. Minutes, 17. Second
01:22:55 Controller Model: HSJ40
01:22:55 Serial Number: ZG61013832 Hardware Version: H09(4F)
01:22:55 Controller Identifier:
01:22:55 Unique Device Number: 000961013832 Model: 40.(28) Class: 1.(01)
01:22:55 Firmware Version: V31J(31)
01:22:55 Node Name: "HJ2202" CI Node Number: 15.(0F)
01:22:55 Instance Code: 01010302
01:22:55 Last Failure Code: 010B2380 (No Last Failure Parameters)
01:22:55
01:22:55 Additional information is available in Last Failure Entry: 4.
01:23:44
01:23:44 Copyright Digital Equipment Corporation 1993, 1997. All rights reserve
01:23:44 HSJ40 Firmware version V31J-0, Hardware version H09
01:23:44
01:23:44 Last fail code: 010B2380
01:23:44
01:23:44 Press " ?" at any time for help.
01:23:44
01:23:44
01:23:44 Cache battery charge is low
01:23:44 Write-back caching is disabled
01:23:44 HJ2202>
01:23:44
01:23:44 %EVL--HJ2202> --31-MAR-1997 12:46:09-- Instance Code: 01010302
01:23:44 Template: 1.(01)
01:23:44 Occurred on 01-MAR-1997 at 18:13:07
01:23:44 Power On Time: 0. Years, 302. Days, 0. Hours, 58. Minutes, 18. Second
01:23:44 Controller Model: HSJ40
01:23:44 Serial Number: ZG61013832 Hardware Version: H09(4F)
01:23:44 Controller Identifier:
01:23:44 Unique Device Number: 000961013832 Model: 40.(28) Class: 1.(01)
01:23:44 Firmware Version: V31J(31)
01:23:44 Node Name: "HJ2202" CI Node Number: 15.(0F)
01:23:44 Command Reference Number: 00000000 Sequence Number: 0001
01:23:44 Instance Code: 01010302
01:23:44 Last Failure Code: 010B2380 (No Last Failure Parameters)
01:23:44
01:23:44 %EVL--HJ2202> --31-MAR-1997 12:46:09-- Instance Code: 02052301
01:23:44 Template: 18.(12)
01:23:44 Power On Time: 0. Years, 302. Days, 0. Hours, 58. Minutes, 20. Second
01:23:44 Controller Model: HSJ40
01:23:44 Serial Number: ZG61013832 Hardware Version: H09(4F)
01:23:44 Controller Identifier:
01:23:44 Unique Device Number: 000961013832 Model: 40.(28) Class: 1.(01)
01:23:44 Firmware Version: V31J(31)
01:23:44 Node Name: "HJ2202" CI Node Number: 15.(0F)
01:23:44 Command Reference Number: 00000000 Sequence Number: 0002
01:23:44 Memory Address: 00000000
01:23:44 Instance Code: 02052301
01:23:44 HJ2202
01:23:44
01:23:44 %EVL--HJ2202> --31-MAR-1997 12:46:10-- Instance Code: 024B2401
01:23:44 Template: 20.(14)
01:23:44 Power On Time: 0. Years, 302. Days, 0. Hours, 58. Minutes, 20. Second
01:23:44 Controller Model: HSJ40
01:23:44 Serial Number: ZG61013832 Hardware Version: H09(4F)
01:23:44 Controller Identifier:
01:23:44 Unique Device Number: 000961013832 Model: 40.(28) Class: 1.(01)
01:23:44 Firmware Version: V31J(31)
01:23:44 Node Name: "HJ2202" CI Node Number: 15.(0F)
01:23:44 Command Reference Number: 00000000 Sequence Number: 0003
01:23:44 Reported via low level DRAB interrupt
01:23:44 Memory Address: 40000000
01:23:44 Byte Count: 0.(00000000)
01:23:44 DRAB Registers:
01:23:55 DSR: 00000000 CSR: 00000000 DCSR: 00000000 DER: 00000000 EAR:
01:23:55 EDR: 00000000 ERR: 00000000 RSR: 00000000 CHC: 00000000 CMC:
01:23:55 Diagnostic Registers:
01:23:55 RDR0: 00000000 RDR1: 00000000 WDR0: 00000000 WDR1: 00000000
01:23:55 Instance Code: 024B2401
01:23:55 HJ2202> SHOW THIS
04:01:28 Controller:
04:01:28 HSJ40 (C) DEC ZG61013832 Firmware V31J-0, Hardware H09
04:01:28 Configured for dual-redundancy with ZG61013838
04:01:28 In dual-redundant configuration
04:01:28 SCSI address 7
04:01:28 Time: 31-MAR-1997 15:23:54
04:01:28 Host port:
04:01:28 Node name: HJ2202, valid CI node 15, 16 max nodes
04:01:29 System ID 4200100FD4C0
04:01:29 Path A is ON
04:01:29 Path B is ON
04:01:29 MSCP allocation class 30
04:01:29 TMSCP allocation class 30
04:01:29 CI_ARBITRATION = ASYNCHRONOUS
04:01:29 MAXIMUM_HOSTS = 15
04:01:29 Cache:
04:01:29 32 megabyte write cache, version 2
04:01:29 Cache is GOOD
04:01:29 Battery is LOW
04:01:29 No unflushed data in cache
04:01:29 CACHE_FLUSH_TIMER = DEFAULT (10 seconds)
04:01:29 CACHE_POLICY = A
04:01:29 NOCACHE_UPS
04:01:29 Cache battery charge is low
04:01:29 Write-back caching is disabled
04:01:29 HJ2202> SHOW FAIL
04:01:33 Name Storageset Uses Used by
04:01:33 ----------------------------------------------------------------------
04:01:34
04:01:34 FAILEDSET failedset
04:01:34 Switches:
04:01:34 NOAUTOSPARE
04:01:34 Cache battery charge is low
04:01:34 Write-back caching is disabled
04:01:34 HJ2202> SHOW THIS
08:01:28 Controller:
08:01:28 HSJ40 (C) DEC ZG61013832 Firmware V31J-0, Hardware H09
08:01:28 Configured for dual-redundancy with ZG61013838
08:01:28 In dual-redundant configuration
08:01:28 SCSI address 7
08:01:28 Time: 31-MAR-1997 19:23:54
08:01:29 Host port:
08:01:29 Node name: HJ2202, valid CI node 15, 16 max nodes
08:01:29 System ID 4200100FD4C0
08:01:29 Path A is ON
08:01:29 Path B is ON
08:01:29 MSCP allocation class 30
08:01:29 TMSCP allocation class 30
08:01:29 CI_ARBITRATION = ASYNCHRONOUS
08:01:29 MAXIMUM_HOSTS = 15
08:01:29 Cache:
08:01:29 32 megabyte write cache, version 2
08:01:29 Cache is GOOD
08:01:29 Battery is GOOD
08:01:29 No unflushed data in cache
08:01:29 CACHE_FLUSH_TIMER = DEFAULT (10 seconds)
08:01:29 CACHE_POLICY = A
08:01:29 NOCACHE_UPS
08:01:29 Cache battery is now sufficiently charged
08:01:29 HJ2202>
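If you are collecting console captures like the one above from many controllers, a short script can pull out the battery state changes and show how long the battery stayed LOW. This is a rough sketch, not a supported tool. It assumes each line starts with an HH:MM:SS stamp as in this capture, that the stamps all fall within one day, and that the state appears as "Battery is GOOD" or "Battery is LOW"; the function names are made up for illustration.

```python
import re
from datetime import datetime

# Matches lines such as "04:01:29 Battery is LOW" from SHOW THIS output.
BATTERY_LINE = re.compile(
    r"^(\d{2}:\d{2}:\d{2})\s+Battery is (\w+)\s*$", re.MULTILINE
)


def battery_states(console_text):
    """Return a list of (timestamp, state) pairs in capture order."""
    return BATTERY_LINE.findall(console_text)


def low_to_good_seconds(states):
    """Seconds from the first LOW report to the next GOOD report.

    Returns None if the capture never shows LOW followed by GOOD.
    Assumes the timestamps do not wrap past midnight.
    """
    low_at = None
    for stamp, state in states:
        when = datetime.strptime(stamp, "%H:%M:%S")
        if state == "LOW" and low_at is None:
            low_at = when
        elif state == "GOOD" and low_at is not None:
            return int((when - low_at).total_seconds())
    return None


sample = (
    "00:01:29 Battery is GOOD\n"
    "04:01:29 Battery is LOW\n"
    "08:01:29 Battery is GOOD\n"
)
states = battery_states(sample)
print(states)
print(low_to_good_seconds(states))
```

The sample lines are the three battery reports from the capture above. The result only bounds the recovery time by when SHOW THIS happened to be run, so it is an upper estimate at best.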
|
1242.12 | what is the date? | SSDEVO::RMCLEAN | | Fri May 23 1997 10:49 | 3 |
| The important question here is: what is the date on the batteries? They may well
be very near failure but they should still be able to hold up the cache for
100 hours.
|