Title: | ase |
|
Moderator: | SMURF::GROSSO |
|
Created: | Thu Jul 29 1993 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 2114 |
Total number of notes: | 7347 |
I have been struggling for weeks with the same member of a rack-mounted
Alpha 2100A cluster pair crashing every other day (or more often). It would
crash any time access through the MC occurred. The cluster is serving out three
DRD services, each with 80 LSM volumes/disks, for use by the OPS. Even when all
three services were on the system that was not crashing, the other system would
crash if 'cmon' was run.
Every time it was the same panic:
>> rmerror_int: failed to call rmerror_get_errcnt_1k <<<
This was preceded, by 2 seconds, by an adapter error from the MC:
>> MC Receive Data Parity Error
MC Receive Header or C/A parity Error <<<
I replaced the MC card and tried different Hub cards, always with the same
results. So the final test was to replace the cable; it was so nicely installed
and tucked out of the way, strapped right next to that POWER cable. As you can
guess, the problem went away when I switched cables. It seems the very neat
installation in the rack had placed the cable in a position to pick up EMI from
the power cable..... BUT....
Today the same system went down after having run for 2 weeks and having
hosted each DRD service individually and in combinations for varying lengths
of time. The panic was very similar.
>> rmerror_int: fatal error and no alternate mc to failover <<
The MC adapter error event was also present. However, no error was listed.
The closest thing to an error was in the "PCI Quadword Data" section of the
DIA output, which lists PCI Quadword INVALID for both the Device Identifier and
Config Space Address, but the MC error register had no error bits set.
Does anyone have an idea about this?
[Posted by WWW Notes gateway]
T.R | Title | User | Personal Name | Date | Lines |
---|
2077.1 | No Idea | TUXEDO::SWEENEY | Tom Sweeney in LKG | Thu May 22 1997 15:47 | 5 |
| But it parallels the problem I noted and am still
experiencing in note 1795.*. No solution yet,
so I'm off to QAR....
t
|