
Conference smurf::ase

Title:ase
Moderator:SMURF::GROSSO
Created:Thu Jul 29 1993
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2114
Total number of notes:7347

2077.0. "rmerror_int Panic crashes cluster member" by NNTPD::"[email protected]" (Mark Sowards) Tue May 20 1997 17:48

I have been struggling for weeks with one member of a rack-mounted
Alpha 2100A cluster pair crashing every other day (or more often).  It would
crash any time access through the MC occurred.  The cluster is serving out
three DRD services, each with 80 LSM volumes/disks, for use by the OPS.  Even
when all three services were on the system that was not crashing, the other
system would crash if 'cmon' was run.

Every time it was the same panic:
  >> rmerror_int: failed to call rmerror_get_errcnt_1k <<
It was preceded, 2 seconds earlier, by an adapter error from the MC:
  >> MC Receive Data Parity Error
     MC Receive Header or C/A parity Error <<
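
For anyone wanting to check the same correlation on their own system, here is
a minimal sketch that pairs each rmerror_int panic with any MC receive errors
logged shortly before it.  The log line layout ("YYYY-MM-DD HH:MM:SS message"),
the sample timestamps, and the function names are all assumptions made for
illustration; this is not the actual DIA or error log format.

```python
from datetime import datetime, timedelta

# Substrings taken from the messages quoted in this note.
PANIC_MARKER = "rmerror_int:"
MC_ERROR_MARKER = "MC Receive"


def parse_line(line):
    """Return (timestamp, message), or None if the line lacks a timestamp.

    Assumes a hypothetical layout: a 19-character timestamp, then the message.
    """
    try:
        stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None
    return stamp, line[19:].strip()


def panics_with_preceding_mc_errors(lines, window_seconds=5):
    """Pair each panic with the MC errors seen within the window before it."""
    window = timedelta(seconds=window_seconds)
    mc_errors = []  # (timestamp, message) for every MC receive error so far
    results = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines that do not match the assumed layout
        stamp, message = parsed
        if MC_ERROR_MARKER in message:
            mc_errors.append((stamp, message))
        elif PANIC_MARKER in message:
            preceding = [
                msg for when, msg in mc_errors
                if timedelta(0) <= stamp - when <= window
            ]
            results.append((message, preceding))
    return results


# Made-up timestamps; only the message text comes from this note.
sample = [
    "1997-05-20 10:00:00 MC Receive Data Parity Error",
    "1997-05-20 10:00:00 MC Receive Header or C/A parity Error",
    "1997-05-20 10:00:02 rmerror_int: failed to call rmerror_get_errcnt_1k",
    "1997-05-20 12:00:00 rmerror_int: fatal error and no alternate mc to failover",
]

for panic, errors in panics_with_preceding_mc_errors(sample):
    print(panic, "<-", errors if errors else "no MC error in window")
```

A panic that comes back with an empty list is one where no MC receive error
was logged inside the window, so it probably deserves a separate look.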

I replaced the MC card and tried different hub cards, always with the same
results.  So the final test was to replace the cable; it was so nicely
installed and tucked out of the way, strapped right next to that POWER cable.
As you can guess, the problem went away when I switched cables.  It seems the
very neat installation in the rack had placed the cable in a position to pick
up EMI from the power cable..... BUT....

   Today the same system went down after having run for 2 weeks and having
hosted each DRD service individually and in combinations for varying lengths
of time.  The panic was very similar:
 >> rmerror_int: fatal error and no alternate mc to failover <<

The MC adapter error event was also present; however, no error was listed.
The closest thing to an error was in the "PCI Quadword Data" section of the
DIA output, which lists PCI Quadword INVALID for both the Device Identifier
and Config Space Address, but the MC error register had no error bits set.

  Does anyone have an idea about this?
[Posted by WWW Notes gateway]
T.R      Title    User             Personal Name       Date                   Lines
2077.1   No Idea  TUXEDO::SWEENEY  Tom Sweeney in LKG  Thu May 22 1997 15:47  5
But it parallels the problem I noted and am still
experiencing in note 1795.*.  No solution yet,
so I'm off to QAR....

t