T.R | Title | User | Personal Name | Date | Lines |
---|
4188.1 | | NETCAD::GALLAGHER | | Tue Jan 28 1997 10:56 | 192 |
| Steven,
First, some background. Repeaters support the RMON (rfc1757) Alarm and
Event groups. We ship the repeaters with a few (7?) default alarms set
from the factory. Some of the defaults are:
- when a port is auto-partitioned,
- when a dual-port redundancy failover occurs,
- when a security violation occurs,
- etc.
The one you're seeing is for erptrTotalRptrErrors. This object counts
"bad things" (more on this later) on the repeater ports. The "bad things"
can be coming from any station attached to the repeater.
From your capture:
>#4566
>Description: RMON Rising Alarm: erptrRptrInfo.erptrTotalRptrErrors.0exceeded
> threshold 1 value = 4165484 ( Sample type = 2 alarm in
This tells you that object erptrTotalRptrErrors.0 exceeded a threshold of
1. erptrTotalRptrErrors is defined as follows:
erptrTotalRptrErrors OBJECT-TYPE
SYNTAX Counter
ACCESS read-only
STATUS mandatory
DESCRIPTION
"The total number of errors which have occurred on all
the groups in a repeater. This object is a summation
of the values of the rptMonitorGroupTotalErrors as
defined in RFC 1516 for all of the groups in a repeater."
REFERENCE
"Reference RFC 1516 repeater MIB"
::= { erptrRptrInfo 6 }
And rptrMonitorGroupTotalErrors is defined in the repeater MIB as follows:
rptrMonitorGroupTotalErrors OBJECT-TYPE
SYNTAX Counter
ACCESS read-only
STATUS mandatory
DESCRIPTION
"The total number of errors which have occurred on
all of the ports in this group. This counter is
the summation of the values of the
rptrMonitorPortTotalErrors counters for all of the
ports in the group."
::= { rptrMonitorGroupEntry 4 }
And (finally!) rptrMonitorPortTotalErrors is defined as follows:
rptrMonitorPortTotalErrors OBJECT-TYPE
SYNTAX Counter
ACCESS read-only
STATUS mandatory
DESCRIPTION
"The total number of errors which have occurred on
this port. This counter is the summation of the
values of other error counters (for the same
port), namely:
rptrMonitorPortFCSErrors,
rptrMonitorPortAlignmentErrors,
rptrMonitorPortFrameTooLongs,
rptrMonitorPortShortEvents,
rptrMonitorPortLateEvents,
rptrMonitorPortVeryLongEvents, and
rptrMonitorPortDataRateMismatches.
This counter is redundant in the sense that it is
the summation of information already available
through other objects. However, it is included
specifically because the regular retrieval of this
object as a means of tracking the health of a port
provides a considerable optimization of network
management traffic over the otherwise necessary
retrieval of the summed counters."
::= { rptrMonitorPortEntry 15 }
So, you should look at all the error counters on all of the repeater
ports to find out which port(s), and which error(s) are causing the
traps. I noticed that the value of the error counter is huge (4165484)!
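As a rough illustration of that check, here is a minimal Python sketch that sums the seven component counters named in the rptrMonitorPortTotalErrors description and ranks the ports by total. The helper names (port_total_errors, worst_ports) and the sample values are made up for illustration; how you fetch the counters from the agent is left out.

```python
# Component counters that the MIB text above lists for rptrMonitorPortTotalErrors.
TOTAL_ERROR_PARTS = (
    "rptrMonitorPortFCSErrors",
    "rptrMonitorPortAlignmentErrors",
    "rptrMonitorPortFrameTooLongs",
    "rptrMonitorPortShortEvents",
    "rptrMonitorPortLateEvents",
    "rptrMonitorPortVeryLongEvents",
    "rptrMonitorPortDataRateMismatches",
)


def port_total_errors(counters):
    """Sum the seven component counters for one port (missing ones count as 0)."""
    return sum(counters.get(name, 0) for name in TOTAL_ERROR_PARTS)


def worst_ports(ports, limit=3):
    """Return (port, total, biggest_component) rows, largest total first."""
    ranked = []
    for port, counters in ports.items():
        total = port_total_errors(counters)
        if total == 0:
            continue  # a clean port is not interesting here
        biggest = max(TOTAL_ERROR_PARTS, key=lambda name: counters.get(name, 0))
        ranked.append((port, total, biggest))
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked[:limit]


# Hypothetical sample values, not taken from any real repeater.
sample = {
    1: {"rptrMonitorPortFCSErrors": 3},
    2: {},
    7: {"rptrMonitorPortAlignmentErrors": 40, "rptrMonitorPortShortEvents": 2},
}
for port, total, biggest in worst_ports(sample):
    print(f"port {port}: {total} errors, mostly {biggest}")
```

Note that counters outside that list (rptrMonitorPortRunts, for example) do not contribute to the total, so they have to be looked at separately.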
>
> Information:
> Node: offsb2_tp5.domainname
> Enterprise: 1.3.6.1.2.1.16 (rmon) <-- This only tells me it is an RMON
> trap. Not the specific object.
> Trap: RMON_ALARM - #1 <-- Is this alarm 1?
I think it tells you that it's an RMON risingAlarm trap. (risingAlarm = 1,
fallingAlarm = 2.)
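For what it's worth, "sample type = 2" in the trap text is deltaValue in RFC 1757 terms, so the agent compares the change in the counter over each sampling interval against the thresholds. Below is a simplified Python sketch of that rising/falling logic with its hysteresis. The function name alarm_events, the falling threshold of 0, and the readings are assumptions for illustration; counter wrap and the alarmStartupAlarm setting are ignored.

```python
RISING_ALARM = 1   # specific-trap number for risingAlarm
FALLING_ALARM = 2  # specific-trap number for fallingAlarm


def alarm_events(counter_readings, rising, falling):
    """Yield (reading_index, trap_number, delta) for a delta-sampled alarm.

    After a rising alarm fires, another one is held back until the delta
    has dropped to the falling threshold, and vice versa.
    """
    armed_rising = armed_falling = True
    for i in range(1, len(counter_readings)):
        delta = counter_readings[i] - counter_readings[i - 1]
        if delta >= rising and armed_rising:
            yield (i, RISING_ALARM, delta)
            armed_rising, armed_falling = False, True
        elif delta <= falling and armed_falling:
            yield (i, FALLING_ALARM, delta)
            armed_rising, armed_falling = True, False


# Hypothetical readings of an error counter, one per sampling interval.
readings = [0, 0, 5, 9, 9, 12]
for index, trap, delta in alarm_events(readings, rising=1, falling=0):
    kind = "risingAlarm" if trap == RISING_ALARM else "fallingAlarm"
    print(f"interval {index}: {kind} (delta {delta})")
```

With a rising threshold of 1, any interval in which the counter moves at all can produce a rising alarm once the alarm has re-armed, which is why a steadily erroring port generates so many of them.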
> Logged Time: Mon Jan 27 06:36:01 1997
> Severity: Critical
> Category: Threshold Events
> Source: Agent
I've deleted all of the erptrRptrInfo.erptrTotalRptrErrors.0 entries from
your log.
>Description: RMON Rising Alarm: erptrMauRptrInfo.erptrHealthTextChanges.
> 0exceedd threshold 1 value 395 ( sample type = 2 alarm inde
>
> Information:
> Node: bmwvax_tp8.domainname
> Enterprise: 1.3.6.1.2.1.16 (rmon)
> Trap: RMON_RALRM - #1
> Logged Time: Mon Jan 27 06:46:36 1997
> Severity: Critical
> Category: Threshold Events
> Source: Agent
erptrMauRptrInfo.erptrHealthTextChanges is defined in the extended repeater
MIB as:
erptrHealthTextChanges OBJECT-TYPE
SYNTAX Counter
ACCESS read-only
STATUS mandatory
DESCRIPTION
"This counter increments each time the rptrHealthText object
defined in RFC 1516 is modified."
REFERENCE
"Reference RFC 1516 repeater MIB"
::= { erptrRptrInfo 4 }
rptrHealthText OBJECT-TYPE
SYNTAX DisplayString (SIZE (0..255))
ACCESS read-only
STATUS mandatory
DESCRIPTION
"The health text object is a text string that
provides information relevant to the operational
state of the repeater. Agents may use this string
to provide detailed information on current
failures, including how they were detected, and/or
instructions for problem resolution. The contents
are agent-specific."
REFERENCE
"Reference IEEE 802.3 Rptr Mgt, 19.2.3.2,
aRepeaterHealthText."
::= { rptrRptrInfo 3 }
This may be telling you of auto-partitioned ports. The value is fairly small
(395). You might want to take a couple of looks at "Health Text" in the upper
right hand corner of the ClearVISN MCM Repeater Summary view.
Also, if your auto-partition reconnect algorithm (Repeater Summary View)
is "standard", and this is coming from the same repeater that's experiencing
errors, then one theory might be that:
- a station on the port is generating errors (excessive length
collisions or excessive number of consecutive collisions, etc.),
- the errors generate the erptrTotalRptrErrors trap,
- the repeater auto-partitions the port, generating the
erptrHealthTextChanges trap,
- the station transmits a few good packets,
- the repeater reconnects the auto-partitioned port, generating
another erptrHealthTextChanges trap,
- repeat forever.
It's just a theory.
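To make the loop concrete, here is a toy Python replay of the theory. It is only a sketch: the event names "error_burst" and "good_packet", the function name replay, and the one-event-per-step simplification are all invented for illustration and say nothing about what the repeater firmware actually does.

```python
def replay(station_events):
    """Return the traps the theory above would predict, in order."""
    traps = []
    partitioned = False
    for event in station_events:
        if event == "error_burst":
            # The errors trip the erptrTotalRptrErrors alarm.
            traps.append("erptrTotalRptrErrors")
            if not partitioned:
                # The port is auto-partitioned, so the health text changes.
                partitioned = True
                traps.append("erptrHealthTextChanges")
        elif event == "good_packet" and partitioned:
            # The port is reconnected, so the health text changes again.
            partitioned = False
            traps.append("erptrHealthTextChanges")
    return traps


# Two trips around the loop.
for trap in replay(["error_burst", "good_packet"] * 2):
    print(trap)
```

Each trip around the loop yields one error trap and two health-text traps in this model, so a flapping station would show up as a steady mix of both.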
>#4598
>Description: RMON Falling Alarm: erptrMauRptrInfo.erptrMauTotalMediaUnavailable
> .0fell below threshold 1 value 8 ( sample type = 2
>
> Information:
> Node: bmwvax_tp8.domainname
> Enterprise: 1.3.6.1.2.1.16 (rmon)
> Trap: RMON_RALRM - #2
> Logged Time: Mon Jan 27 06:46:37 1997
> Severity: Critical
> Category: Threshold Events
> Source: Agent
This is telling you that someone connected a MAU. (erptrMauTotalMedia-
Unavailable decreased by one, generating an RMON fallingAlarm trap.)
-Shawn
|
4188.2 | | NETCAD::GALLAGHER | | Tue Jan 28 1997 11:10 | 17 |
|
p.s.
Hopefully you can see that RMON traps are useful in debugging your network.
An obvious problem is that if someone as experienced as Steven can't figure
out what's going on, then how can Joe or Jill Customer be expected to figure
it out?
Generic NMS like Netview can't do a very good job translating the traps
into something readable.
ClearVISN Multi Chassis Manager *can* translate the traps into meaningful
error messages because it knows the specifics about our products. It
could be that work's already going on in this area. Comments from the
MCM folks?
-Shawn
|
4188.3 | Thanks | KERNEL::FREKES | Like a thief in the night | Wed Jan 29 1997 04:31 | 5 |
| Shawn
Thanks for the tip.
I am passing this onto the customer.
Steven
|
4188.4 | FCS errors on BP port causing traps | WOTVAX::SMITHD | | Mon Feb 03 1997 11:22 | 98 |
| Shawn,
I understand what the trap means; my problem is tying the total errors back
to something in the portswitch that makes sense!
This problem occurs on 2 of my busiest hubs. Below are the counters that
trigger the 'total' RMON Rising Alarm. Steven didn't mention that the errors
that trigger the alarms are FCS errors coming in on port 33! Port 33 is the BP port,
right? Loop detection is enabled and both hubs have been checked for proper
configuration. About once a month I lose the connections on the hub. The
trap scenario detailing the outage is in the original note.
Hub version 4.1, portswitch 900tp version is 2.1.0
These two sets of counters below from offsb2.domainname seem to imply that the
Portswitch is sending runt packets that may be causing the FCS errors in the
first place. Why would the BP port be generating significant traffic to cause
this huge number of runt packets?
What do you think? Is this a known problem?
Thanks
Doug Smith
>#4566
>Description: RMON Rising Alarm: erptrRptrInfo.erptrTotalRptrErrors.0exceeded
> threshold 1 value = 4165484 ( Sample type = 2 alarm in
Counters taken a few days later...
mib2...rptrMonitorPortFCSErrors.1.1 : 3
mib2...rptrMonitorPortFCSErrors.2.1 : 0
mib2...rptrMonitorPortFCSErrors.3.1 : 0
mib2...rptrMonitorPortFCSErrors.4.1 : 0
mib2...rptrMonitorPortFCSErrors.5.1 : 0
mib2...rptrMonitorPortFCSErrors.6.1 : 0
mib2...rptrMonitorPortFCSErrors.7.1 : 0
mib2...rptrMonitorPortFCSErrors.8.1 : 0
mib2...rptrMonitorPortFCSErrors.9.1 : 0
mib2...rptrMonitorPortFCSErrors.10.1 : 0
mib2...rptrMonitorPortFCSErrors.11.1 : 0
mib2...rptrMonitorPortFCSErrors.12.1 : 0
mib2...rptrMonitorPortFCSErrors.13.1 : 0
mib2...rptrMonitorPortFCSErrors.14.1 : 0
mib2...rptrMonitorPortFCSErrors.15.1 : 0
mib2...rptrMonitorPortFCSErrors.16.1 : 0
mib2...rptrMonitorPortFCSErrors.17.1 : 0
mib2...rptrMonitorPortFCSErrors.18.1 : 0
mib2...rptrMonitorPortFCSErrors.19.1 : 0
mib2...rptrMonitorPortFCSErrors.20.1 : 0
mib2...rptrMonitorPortFCSErrors.21.1 : 0
mib2...rptrMonitorPortFCSErrors.22.1 : 0
mib2...rptrMonitorPortFCSErrors.23.1 : 0
mib2...rptrMonitorPortFCSErrors.24.1 : 0
mib2...rptrMonitorPortFCSErrors.25.1 : 0
mib2...rptrMonitorPortFCSErrors.26.1 : 0
mib2...rptrMonitorPortFCSErrors.27.1 : 0
mib2...rptrMonitorPortFCSErrors.28.1 : 0
mib2...rptrMonitorPortFCSErrors.29.1 : 0
mib2...rptrMonitorPortFCSErrors.30.1 : 0
mib2...rptrMonitorPortFCSErrors.31.1 : 0
mib2...rptrMonitorPortFCSErrors.32.1 : 0
mib2...rptrMonitorPortFCSErrors.33.1 : 6695441
mib2...rptrMonitorPortRunts.1.1 : 12
mib2...rptrMonitorPortRunts.2.1 : 208
mib2...rptrMonitorPortRunts.3.1 : 0
mib2...rptrMonitorPortRunts.4.1 : 0
mib2...rptrMonitorPortRunts.5.1 : 357
mib2...rptrMonitorPortRunts.6.1 : 0
mib2...rptrMonitorPortRunts.7.1 : 224
mib2...rptrMonitorPortRunts.8.1 : 96
mib2...rptrMonitorPortRunts.9.1 : 18
mib2...rptrMonitorPortRunts.10.1 : 10
mib2...rptrMonitorPortRunts.11.1 : 525
mib2...rptrMonitorPortRunts.12.1 : 147
mib2...rptrMonitorPortRunts.13.1 : 75
mib2...rptrMonitorPortRunts.14.1 : 14
mib2...rptrMonitorPortRunts.15.1 : 14
mib2...rptrMonitorPortRunts.16.1 : 14
mib2...rptrMonitorPortRunts.17.1 : 65
mib2...rptrMonitorPortRunts.18.1 : 0
mib2...rptrMonitorPortRunts.19.1 : 0
mib2...rptrMonitorPortRunts.20.1 : 371
mib2...rptrMonitorPortRunts.21.1 : 0
mib2...rptrMonitorPortRunts.22.1 : 277
mib2...rptrMonitorPortRunts.23.1 : 654
mib2...rptrMonitorPortRunts.24.1 : 194
mib2...rptrMonitorPortRunts.25.1 : 2897
mib2...rptrMonitorPortRunts.26.1 : 211
mib2...rptrMonitorPortRunts.27.1 : 389
mib2...rptrMonitorPortRunts.28.1 : 102
mib2...rptrMonitorPortRunts.29.1 : 30
mib2...rptrMonitorPortRunts.30.1 : 160
mib2...rptrMonitorPortRunts.31.1 : 0
mib2...rptrMonitorPortRunts.32.1 : 0
mib2...rptrMonitorPortRunts.33.1 : 6858079
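A listing like this is easier to read once it is reduced to "which entry dominates each counter". Here is a small Python sketch that does so, using a few lines copied from the listing above. The function names (parse_counters, dominant_port) are made up, and treating the first index component as the port number follows this note's usage rather than anything the MIB guarantees.

```python
import re

# Matches e.g. "mib2...rptrMonitorPortFCSErrors.33.1 : 6695441"
LINE = re.compile(r"(rptrMonitorPort\w+)\.(\d+)\.(\d+)\s*:\s*(\d+)")


def parse_counters(text):
    """Map counter name -> {first index component: value}."""
    table = {}
    for match in LINE.finditer(text):
        name, first, _second, value = match.groups()
        table.setdefault(name, {})[int(first)] = int(value)
    return table


def dominant_port(per_port):
    """Return (port, value, share of the listed total) for the largest entry."""
    total = sum(per_port.values())
    port = max(per_port, key=per_port.get)
    share = per_port[port] / total if total else 0.0
    return port, per_port[port], share


dump = """
mib2...rptrMonitorPortFCSErrors.1.1 : 3
mib2...rptrMonitorPortFCSErrors.2.1 : 0
mib2...rptrMonitorPortFCSErrors.33.1 : 6695441
mib2...rptrMonitorPortRunts.25.1 : 2897
mib2...rptrMonitorPortRunts.33.1 : 6858079
"""
for name, per_port in parse_counters(dump).items():
    port, value, share = dominant_port(per_port)
    print(f"{name}: port {port} holds {value} ({share:.1%} of the listed total)")
```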
|
4188.5 | 900 tp management questions - FCS/Runt errs on port 33 | WOTVAX::SMITHD | | Mon Feb 03 1997 14:59 | 14 |
| Shawn,
Forgot to mention that it doesn't matter what virtual lan or thinwire segment
the BP is associated with. I continue getting the trap even when the BP is
on a TOTALLY ISOLATED segment bridged to the backbone via the EF switch.
Needless to say when I disconnect the BP port by disassociating the thinwire
from the other segments (NO CONNECTION) I stop getting the traps.
Could you say a few words around the purpose of the BP port and its association
with the MAC (or port 33) on a portswitch 900 tp?
Thanks
Doug
|
4188.6 | | NETCAD::GALLAGHER | | Mon Feb 03 1997 18:31 | 25 |
| Whew. I'm falling behind in this thread.
Port 33 is your backplane thinwire port. It doesn't have anything to do
with your MAC. (I assume when you say "BP port" you mean "backplane thinwire
port", and when you say "virtual LAN" you mean "backplane LAN".)
>These two sets of counters below from offsb2.domainname seem to imply that the
>Portswitch is sending runt packets that may be causing the FCS errors in the
>first place. Why would the BP port be generating significant traffic to cause
>this huge number of runt packets?
I'm confused by "...Portswitch is sending runt packets...". The PORTswitch
would only be repeating FCS and runt packets received on the port, not
"generating" them right?
Your counters seem to indicate that a large number of FCS and runts are
happening on the Hub's backplane thinwire. What else in the hub is attached
to the thinwire?
Since I'm not getting it, how 'bout painting me the big picture? Can
you tell me more about the hub's configuration? Back up a bit and maybe
I'll be able to catch up.
-Shawn
|
4188.7 | configuration info - 900TP alarms | WOTVAX::SMITHD | | Tue Feb 04 1997 06:19 | 98 |
| Thanks again Shawn. Okay, hopefully this will help. I'll try to answer your
questions then give you the configuration for one of the hubs. Any help you
can lend or things I can try would be greatly appreciated!
|Port 33 is your backplane thinwire port. It doesn't have anything to do
|with your MAC. (I assume when you say "BP port" you mean "backplane thinwire
|port", and when you say "virtual LAN" you mean "backplane LAN".)
Glad to hear it. I just wanted to make sure my understanding of the BP port and
MAC in the portswitch display were indeed different. When I disconnect the BP,
by setting 'no connect' in the portswitch display, will I continue to get traps
if they occur? I do stop getting alarms from the 900tp when I disconnect it
from the portswitch display. But I don't understand why.
>These two sets of counters below from offsb2.domainname seem to imply that the
>Portswitch is sending runt packets that may be causing the FCS errors in the
>first place. Why would the BP port be generating significant traffic to cause
>this huge number of runt packets?
|I'm confused by "...Portswitch is sending runt packets...". The PORTswitch
|would only be repeating FCS and runt packets received on the port, not
|"generating" them right?
Apologies for not explaining this in the previous note! My 'assumption' was that if the
runt packet counters were incrementing, runt packets were either being sent
(repeated onto) or received from the THINWIRE. Since there are NO ports on the
thinwire and the thinwire is not bridged to any other segments, my assumption
was that the portswitch itself is either sending packets (bleeding from another
net or group?) or incorrectly incrementing the runt/FCS counters. The runt and
FCS counters increment together, but there are always more runts than FCS errors. I
would have ignored the counters and traps if I weren't seeing intermittent network
problems from this hub.
|Your counters seem to indicate that a large number of FCS and runts are
|happening on the Hub's backplane thinwire. What else in the hub is attached
|to the thinwire?
|Since I'm not getting it, how 'bout painting me the big picture? Can
|you tell me more about the hub's configuration? Back up a bit and maybe
|I'll be able to catch up.
Configuration from one of the hubs generating alarms:
900 mx 900ef 900tp 900tp 900tp 900tm ds900 900tp
x x
THINWIRE ------------------*----------------------------------------*
- see notes below -
FDDI backbone ===+======+================================================
OFFLIN ----------+-------+-------+-------+---------------xx-------x
OFFLIN SEG2 ----------+-------+-------+-------+-------------------------
PRINT ----------+-------+---------------+------------------------x
PCRnet ------------------+---------------+-------x-----------------
MCOFFL ----------+-------+--------+------+------------------------X
notes:
1. The THINWIRE and PCRNET are not being bridged via the EF switch and are
internal to the hub.
2. All ports on the repeater and 900tps are in use, except A1 in slot 3 and
A1 in slot 4. I use these for management (e.g. IRIS, SNIFFER, laptop, etc.)
3. There are no external network connections EXCEPT 1 external connection
on port 5 of the EF switch and the connections to the FDDI backbone
(A into the MX/B into the EF). The FDDI connection (B on MX/A on EF) is
pushed into the backplane.
4. Portswitch 900TP information
slot 3 - thinwire tied to an unused group (1), no ports on group 1 or
thinwire, MAC is on OFFLIN
slot 4 - thinwire - no connection, MAC is on OFFLIN, no ports on
thinwire
slot 5 - thinwire - no connection, MAC is on OFFLIN, no ports on
thinwire
slot 8 - thinwire tied to an unused group (1), no ports on group 1 or
thinwire MAC is on OFFLIN
5. Repeater 900tm: group 2 is on OFFLIN; group 1 is not connected to a
network.
6. The terminal server has both groups 1 and 2 on the OFFLIN net.
7. The MX is connected to the backbone as described in note 3. It also has
separate SAS connections to 2 8400s and an alphastation 600 workstation.
8. In the current configuration I am only getting alarms from the 900tp in
slot 8. When I associate the BP to a group in slots 4 and 5 I get alarms
from all 4, 5 and 8. I have not been receiving alarms from slot 3, but
this may be because I have not set up this 900tp's trap address. I'll
double check this after I send this note.
9. Firmware on the hub is 4.1.1 and the portswitches are 2.1.0.
|
4188.8 | bleeding traffic onto the Backplane Thinwire? | WOTVAX::SMITHD | | Tue Feb 04 1997 14:12 | 18 |
| Per my note 8 in .-1 I put my LAN Analyzer on the tp's in slot 3 and discovered
my PCRnet traffic is bleeding through into group 1 which is tied to the
Backplane Thinwire in my portswitches in slots 3 and 8. The PCRnet connected
ports are in separate groups to provide isolation from all other nets and
their groups are not tied to the Backplane Thinwire. Group 1 in slots 3 and
8 had no ports assigned except my analyzer. The TPs in slots 4 and 5 are
'not connected' to the backplane thinwire.
The PCRnet traffic was NOT on any of the other virtual LANs, just the Backplane
Thinwire. My IRIS analyzer did not see any CRC or framing errors on this net,
yet the counters were still rising and alarms were still being generated.
When examining some of the ports (8 total connections) I found 3 ports
with 200-300 runts and about 20-50 FCS errors per port. Have you ever heard
of a Portswitch bleeding traffic across groups?
Thanks again, Doug
|
4188.9 | error continues | WOTVAX::SMITHD | | Tue Mar 25 1997 11:01 | 5 |
| The hub was brought up to the latest and greatest firmware revision. The
problem did not change.
All counters and traces were then re-collected and have now been sent to
engineering.
|