
Conference netcad::hub_mgnt

Title:	DEChub/HUBwatch/PROBEwatch CONFERENCE
Notice:	Firmware -2, Doc -3, Power -4, HW kits -5, firm load -6&7
Moderator:	NETCAD::COLELLADT

Created:	Wed Nov 13 1991
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	4455
Total number of notes:	16761

4188.0. "Trap definition for 1.3.6.1.2.1.16" by KERNEL::FREKES (Like a thief in the night) Tue Jan 28 1997 06:46

Folks

I have a customer running NetView who is seeing the following reported.
I have looked through the repeater MIB (DEC-HUB900-ERPTR-MIB-V1-1), and I am
still unable to work out what is going on.

What I would ideally like is an explanation of what this is telling us, and
whether it is anything to worry about. 

All I have worked out is that it may be reporting some kind of physical problem
on the LAN. I am stuck beyond that. 

Am I using the correct MIB?

Thanks for any help.

Regards
	Steven F

#4566
Description: RMON Rising Alarm: erptrRptrInfo.erptrTotalRptrErrors.0exceeded 
                         threshold 1 value = 4165484 ( Sample type = 2 alarm in

 Information:
    Node: offsb2_tp5.domainname
    Enterprise: 1.3.6.1.2.1.16 (rmon)     <-- This only tells me it is an RMON
					      trap. Not the specific object.
    Trap: RMON_ALARM - #1                 <-- Is this alarm 1?
    Logged Time: Mon Jan 27 06:36:01 1997
    Severity: Critical
    Category: Threshold Events
    Source: Agent

Note:


#4567
 Description: RMON Rising Alarm: erptrRptrInfo.erptrTotalRptrErrors.0exceeded 
                           threshold 1 value = 10729 ( Sample type = 2 alarm in

 Information:
    Node: offsb2_tp8.domainname
    Enterprise: 1.3.6.1.2.1.16 (rmon)
    Trap: RMON_ALARM - #1
    Logged Time: Mon Jan 27 06:36:50 1997
    Severity: Critical
    Category: Threshold Events
    Source: Agent

Note:


#4568
 Description: RMON Rising Alarm: erptrRptrInfo.erptrTotalRptrErrors.0exceeded 
                         threshold 1 value = 4165485 ( Sample type = 2 alarm in

 Information:
    Node: offths_tp5.domainname
    Enterprise: 1.3.6.1.2.1.16 (rmon)
    Trap: RMON_ALARM - #1
    Logged Time: Mon Jan 27 06:37:00 1997
    Severity: Critical
    Category: Threshold Events
    Source: Agent

Note:


#4569
 Description: Interface 133.8.4.1 down

 Information:
    Node: convex_a_o.domainname
    Enterprise: 1.3.6.1.4.1.2.6.3.1 (NetView)
    Trap: IntfDown - #58916867
    Logged Time: Mon Jan 27 06:37:23 1997
    Severity: Critical
    Category: Status Events
    Source: Netmon

Note:


#4570
 Description: Node 

 Information:
    Node: convex_a_o.domainname
    Enterprise: 1.3.6.1.4.1.2.6.3.1 (NetView)
    Trap: NodeDown - #58916865
    Logged Time: Mon Jan 27 06:37:25 1997
    Severity: Critical
    Category: Status Events
    Source: Netmon

Note:


#4571
 Description: RMON Rising Alarm: erptrRptrInfo.erptrTotalRptrErrors.0exceeded 
                         threshold 1 value = 4165486 ( Sample type = 2 alarm in

 Information:
    Node: offths_tp5.domainname
    Enterprise: 1.3.6.1.2.1.16 (rmon)
    Trap: RMON_ALARM - #1
    Logged Time: Mon Jan 27 06:37:29 1997
    Severity: Critical
    Category: Threshold Events
    Source: Agent

Note:


#4572
 Description: Interface FDDI down

 Information:
    Node: off36M_mx1.domainname
    Enterprise: 1.3.6.1.4.1.2.6.3.1 (NetView)
    Trap: NodeDown - #58916867
    Logged Time: Mon Jan 27 06:39:28 1997
    Severity: Critical
    Category: Status Events
    Source: Netmon

Note:



....

more down nodes and repeater errors

....

#4597
Description: RMON Rising Alarm: erptrMauRptrInfo.erptrHealthTextChanges.
                    0exceedd threshold 1 value 395 ( sample type = 2 alarm inde

 Information:
    Node: bmwvax_tp8.domainname
    Enterprise: 1.3.6.1.2.1.16 (rmon)
    Trap: RMON_RALRM  - #1
    Logged Time: Mon Jan 27 06:46:36 1997
    Severity: Critical
    Category: Threshold Events
    Source: Agent

Note:

#4598
Description: RMON Falling Alarm: erptrMauRptrInfo.erptrMauTotalMediaUnavailable
                             .0fell below threshold 1 value 8 ( sample type = 2

 Information:
    Node: bmwvax_tp8.domainname
    Enterprise: 1.3.6.1.2.1.16 (rmon)
    Trap: RMON_RALRM  - #2
    Logged Time: Mon Jan 27 06:46:37 1997
    Severity: Critical
    Category: Threshold Events
    Source: Agent

Note:

T.R	Title	User	Personal Name	Date	Lines
4188.1		NETCAD::GALLAGHER		`Tue Jan 28 1997 10:56`	192
	Steven,

First, some background. Repeaters support the RMON (RFC 1757) Alarm and Event groups. We ship the repeaters with a few (7?) default alarms set at the factory. Some of the defaults are:

 - when a port is auto-partitioned,
 - when a dual-port redundancy failover occurs,
 - when a security violation occurs,
 - etc.

The one you're seeing is for erptrTotalRptrErrors. This object counts "bad things" (more on this later) on the repeater ports. The "bad things" can be coming from any station attached to the repeater.

From your capture:

>#4566
>Description: RMON Rising Alarm: erptrRptrInfo.erptrTotalRptrErrors.0exceeded
>                         threshold 1 value = 4165484 ( Sample type = 2 alarm in

This tells you that object erptrTotalRptrErrors.0 exceeded a threshold of 1. erptrTotalRptrErrors is defined as follows:

    erptrTotalRptrErrors OBJECT-TYPE
        SYNTAX   Counter
        ACCESS   read-only
        STATUS   mandatory
        DESCRIPTION
                "The total number of errors which have occurred on all
                the groups in a repeater. This object is a summation of
                the values of the rptMonitorGroupTotalErrors as defined
                in RFC 1516 for all of the groups in a repeater."
        REFERENCE
                "Reference RFC 1516 repeater MIB"
        ::= { erptrRptrInfo 6 }

And rptrMonitorGroupTotalErrors is defined in the repeater MIB as follows:

    rptrMonitorGroupTotalErrors OBJECT-TYPE
        SYNTAX   Counter
        ACCESS   read-only
        STATUS   mandatory
        DESCRIPTION
                "The total number of errors which have occurred on all
                of the ports in this group. This counter is the
                summation of the values of the rptrMonitorPortTotalErrors
                counters for all of the ports in the group."
        ::= { rptrMonitorGroupEntry 4 }

And (finally!) rptrMonitorPortTotalErrors is defined as follows:

    rptrMonitorPortTotalErrors OBJECT-TYPE
        SYNTAX   Counter
        ACCESS   read-only
        STATUS   mandatory
        DESCRIPTION
                "The total number of errors which have occurred on this
                port. This counter is the summation of the values of
                other error counters (for the same port), namely:
                rptrMonitorPortFCSErrors, rptrMonitorPortAlignmentErrors,
                rptrMonitorPortFrameTooLongs, rptrMonitorPortShortEvents,
                rptrMonitorPortLateEvents, rptrMonitorPortVeryLongEvents,
                and rptrMonitorPortDataRateMismatches.

                This counter is redundant in the sense that it is the
                summation of information already available through other
                objects. However, it is included specifically because
                the regular retrieval of this object as a means of
                tracking the health of a port provides a considerable
                optimization of network management traffic over the
                otherwise necessary retrieval of the summed counters."
        ::= { rptrMonitorPortEntry 15 }

So, you should look at all the error counters on all of the repeater ports to find out which port(s) and which error(s) are causing the traps. I noticed that the value of the error counter is huge (4165484)!

>
> Information:
>    Node: offsb2_tp5.domainname
>    Enterprise: 1.3.6.1.2.1.16 (rmon)     <-- This only tells me it is an RMON
>                                              trap. Not the specific object.
>    Trap: RMON_ALARM - #1                 <-- Is this alarm 1?

I think it tells you that it's an RMON risingAlarm trap. (risingAlarm = 1, fallingAlarm = 2.)

>    Logged Time: Mon Jan 27 06:36:01 1997
>    Severity: Critical
>    Category: Threshold Events
>    Source: Agent

I've deleted all of the erptrRptrInfo.erptrTotalRptrErrors.0 entries from your log.

>Description: RMON Rising Alarm: erptrMauRptrInfo.erptrHealthTextChanges.
>                    0exceedd threshold 1 value 395 ( sample type = 2 alarm inde
>
> Information:
>    Node: bmwvax_tp8.domainname
>    Enterprise: 1.3.6.1.2.1.16 (rmon)
>    Trap: RMON_RALRM  - #1
>    Logged Time: Mon Jan 27 06:46:36 1997
>    Severity: Critical
>    Category: Threshold Events
>    Source: Agent

erptrMauRptrInfo.erptrHealthTextChanges is defined in the extended repeater MIB as:

    erptrHealthTextChanges OBJECT-TYPE
        SYNTAX   Counter
        ACCESS   read-only
        STATUS   mandatory
        DESCRIPTION
                "This counter increments each time the rptrHealthText
                object defined in RFC 1516 is modified."
        REFERENCE
                "Reference RFC 1516 repeater MIB"
        ::= { erptrRptrInfo 4 }

    rptrHealthText OBJECT-TYPE
        SYNTAX   DisplayString (SIZE (0..255))
        ACCESS   read-only
        STATUS   mandatory
        DESCRIPTION
                "The health text object is a text string that provides
                information relevant to the operational state of the
                repeater. Agents may use this string to provide detailed
                information on current failures, including how they were
                detected, and/or instructions for problem resolution.
                The contents are agent-specific."
        REFERENCE
                "Reference IEEE 802.3 Rptr Mgt, 19.2.3.2,
                aRepeaterHealthText."
        ::= { rptrRptrInfo 3 }

This may be telling you of auto-partitioned ports. The value is fairly small (395). You might want to take a couple of looks at "Health Text" in the upper right-hand corner of the ClearVISN MCM Repeater Summary view.

Also, if your auto-partition reconnect algorithm (Repeater Summary view) is "standard", and this is coming from the same repeater that's experiencing errors, then one theory might be that:

 - a station on the port is generating errors (excessive length collisions, excessive number of consecutive collisions, etc.),
 - the errors generate the erptrTotalRptrErrors trap,
 - the repeater auto-partitions the port, generating the erptrHealthTextChanges trap,
 - the station transmits a few good packets,
 - the repeater reconnects the auto-partitioned port, generating another erptrHealthTextChanges trap,
 - repeat forever.

It's just a theory.
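The MIB definitions quoted in this reply describe a three-level roll-up: each port's rptrMonitorPortTotalErrors is the sum of seven per-port error counters (rptrMonitorPortRunts is not among them), each group's rptrMonitorGroupTotalErrors sums its ports, and erptrTotalRptrErrors sums the groups. The sketch below recomputes that arithmetic and ranks ports by contribution, which is the "look at all the error counters on all of the ports" step. It assumes the counters have already been polled into plain Python dicts, and that the RFC 1516 instance suffix is group index followed by port index; the function names and data layout are hypothetical, not part of any DEChub tool. The two sample values come from Doug's counter dump in reply .4, with every other counter assumed to be zero.

```python
# Per-port counters that RFC 1516 sums into rptrMonitorPortTotalErrors,
# per the DESCRIPTION quoted above. rptrMonitorPortRunts is NOT one of them.
PORT_ERROR_COUNTERS = (
    "rptrMonitorPortFCSErrors",
    "rptrMonitorPortAlignmentErrors",
    "rptrMonitorPortFrameTooLongs",
    "rptrMonitorPortShortEvents",
    "rptrMonitorPortLateEvents",
    "rptrMonitorPortVeryLongEvents",
    "rptrMonitorPortDataRateMismatches",
)


def port_total_errors(port_counters):
    """Recompute rptrMonitorPortTotalErrors for one port (missing counters = 0)."""
    return sum(port_counters.get(name, 0) for name in PORT_ERROR_COUNTERS)


def group_total_errors(ports):
    """Recompute rptrMonitorGroupTotalErrors: sum over the group's ports."""
    return sum(port_total_errors(c) for c in ports.values())


def repeater_total_errors(groups):
    """Recompute erptrTotalRptrErrors: sum over all groups in the repeater."""
    return sum(group_total_errors(ports) for ports in groups.values())


def worst_ports(groups, top=3):
    """Return the (group, port) pairs contributing most to the repeater total."""
    totals = [
        ((g, p), port_total_errors(c))
        for g, ports in groups.items()
        for p, c in ports.items()
    ]
    totals.sort(key=lambda item: item[1], reverse=True)
    return totals[:top]


# Hypothetical polled data: groups[group_index][port_index] -> {counter: value}.
groups = {
    1: {1: {"rptrMonitorPortFCSErrors": 3, "rptrMonitorPortRunts": 12}},
    33: {1: {"rptrMonitorPortFCSErrors": 6695441,
             "rptrMonitorPortRunts": 6858079}},
}

print(repeater_total_errors(groups))  # 6695444 (runts are not included)
print(worst_ports(groups, top=1))     # [((33, 1), 6695441)]
```

Because runts are excluded from the roll-up, a port can show large runt counts that never contribute to erptrTotalRptrErrors on their own.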
>#4598
>Description: RMON Falling Alarm: erptrMauRptrInfo.erptrMauTotalMediaUnavailable
>                             .0fell below threshold 1 value 8 ( sample type = 2
>
> Information:
>    Node: bmwvax_tp8.domainname
>    Enterprise: 1.3.6.1.2.1.16 (rmon)
>    Trap: RMON_RALRM  - #2
>    Logged Time: Mon Jan 27 06:46:37 1997
>    Severity: Critical
>    Category: Threshold Events
>    Source: Agent

This is telling you that someone connected a MAU. (erptrMauTotalMediaUnavailable decreased by one, generating an RMON fallingAlarm trap.)

-Shawn
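To see why a trap says "Rising" or "Falling" and why it does not repeat on every poll, here is a minimal sketch of one RFC 1757 alarm entry using alarmSampleType deltaValue(2), which is what "Sample type = 2" in the NetView text refers to. With deltaValue, each sample is the change in the counter since the previous poll; RFC 1757 also defines the alarmValue carried in the trap as that per-interval difference. A rising event fires when a sample reaches the rising threshold after being below it, and cannot fire again until a sample drops to the falling threshold (hysteresis); falling events mirror this. The returned codes 1 and 2 are the rmon specific-trap numbers for risingAlarm and fallingAlarm, consistent with Shawn's reading of "#1" and "#2". The class name, the falling threshold of 0, and the sample readings are hypothetical; the thread only shows "threshold 1", and the real factory defaults are not given.

```python
RISING_ALARM = 1   # rmon specific-trap number for risingAlarm (RFC 1757)
FALLING_ALARM = 2  # rmon specific-trap number for fallingAlarm (RFC 1757)

# alarmStartupAlarm values (RFC 1757)
STARTUP_RISING, STARTUP_FALLING, STARTUP_BOTH = 1, 2, 3

COUNTER32_MODULUS = 2 ** 32


class DeltaAlarm:
    """Sketch of one RMON alarmEntry with alarmSampleType = deltaValue(2)."""

    def __init__(self, rising, falling, startup=STARTUP_BOTH):
        if falling >= rising:
            raise ValueError("falling threshold must be below rising threshold")
        self.rising, self.falling, self.startup = rising, falling, startup
        self.prev_raw = None      # previous raw counter reading
        self.prev_delta = None    # previous sampled (delta) value
        self.rising_armed = True
        self.falling_armed = True

    def poll(self, raw):
        """Feed one raw counter reading; return a trap number or None."""
        if self.prev_raw is None:  # first reading only establishes a baseline
            self.prev_raw = raw
            return None
        delta = (raw - self.prev_raw) % COUNTER32_MODULUS  # tolerate counter wrap
        self.prev_raw = raw
        first = self.prev_delta is None
        event = None
        if delta >= self.rising and self.rising_armed and (
            (first and self.startup in (STARTUP_RISING, STARTUP_BOTH))
            or (not first and self.prev_delta < self.rising)
        ):
            event = RISING_ALARM
            self.rising_armed = False
        elif delta <= self.falling and self.falling_armed and (
            (first and self.startup in (STARTUP_FALLING, STARTUP_BOTH))
            or (not first and self.prev_delta > self.falling)
        ):
            event = FALLING_ALARM
            self.falling_armed = False
        # Hysteresis: each direction re-arms only once the opposite threshold is reached.
        if delta <= self.falling:
            self.rising_armed = True
        if delta >= self.rising:
            self.falling_armed = True
        self.prev_delta = delta
        return event


alarm = DeltaAlarm(rising=1, falling=0, startup=STARTUP_RISING)
readings = [100, 100, 105, 110, 110, 112]
print([alarm.poll(r) for r in readings])  # [None, None, 1, None, 2, 1]
```

With a rising threshold of 1 on an error counter, any interval that sees at least one error can raise a rising alarm, and a single error-free interval re-arms it, which would account for the steady stream of traps in the capture.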
4188.2		NETCAD::GALLAGHER		`Tue Jan 28 1997 11:10`	17
	p.s. Hopefully you can see that RMON traps are useful in debugging your network. An obvious problem is that if someone as experienced as Steven can't figure out what's going on, then how can Joe or Jill Customer be expected to figure it out? A generic NMS like NetView can't do a very good job of translating the traps into something readable. ClearVISN Multi Chassis Manager can translate the traps into meaningful error messages because it knows the specifics about our products.

It could be that work's already going on in this area. Comments from the MCM folks?

-Shawn
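As a rough illustration of the translation step a product-aware manager adds on top of a generic NMS, the sketch below pulls the alarm direction and object name out of a NetView description string and attaches an explanation. The regular expression, the explanation table (paraphrased from reply .1 above), and the function name are all hypothetical; this is not how ClearVISN MCM actually does it.

```python
import re

# Matches NetView descriptions such as
# "RMON Rising Alarm: erptrRptrInfo.erptrTotalRptrErrors.0exceeded ..."
# and tolerates the line wraps seen in the captured log.
ALARM_RE = re.compile(r"RMON (Rising|Falling) Alarm:\s*(?:\w+\.)?(\w+)\s*\.\s*0")

# Hypothetical, product-aware explanations paraphrased from this thread.
EXPLANATIONS = {
    "erptrTotalRptrErrors": (
        "repeater port errors (FCS, alignment, too-long, short, late, "
        "very-long, data-rate mismatch); check the per-port "
        "rptrMonitorPort* counters to find the port"),
    "erptrHealthTextChanges": (
        "the repeater health text changed; possibly a port being "
        "auto-partitioned or reconnected"),
    "erptrMauTotalMediaUnavailable": (
        "the media-unavailable MAU count changed; a decrease may mean "
        "someone connected a MAU"),
}


def explain(description):
    """Turn a NetView RMON alarm description into a readable message, or None."""
    match = ALARM_RE.search(description)
    if match is None:
        return None  # not an RMON threshold alarm
    direction, obj = match.groups()
    text = EXPLANATIONS.get(obj, "no product-specific explanation available")
    return f"{direction} alarm on {obj}: {text}"


print(explain("RMON Rising Alarm: erptrRptrInfo.erptrTotalRptrErrors.0exceeded "
              "threshold 1 value = 4165484"))
```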
4188.3	Thanks	KERNEL::FREKES	Like a thief in the night	`Wed Jan 29 1997 04:31`	5
	Shawn,

Thanks for the tip. I am passing this on to the customer.

Steven
4188.4	FCS errors con BP port causing traps	WOTVAX::SMITHD		`Mon Feb 03 1997 11:22`	98
	Shawn,

I understand what the trap means; my problem is tying the total errors back to something in the portswitch that makes sense! This problem occurs on 2 of my busiest hubs. Below are the counters that trigger the 'total' RMON Rising Alarm. Steven didn't mention that the errors that trigger the alarms are FCS errors coming in on port 33! Port 33 is the BP port, right? Loop detection is enabled and both hubs have been checked for proper configuration.

About once a month I lose the connections on the hub. The trap scenario detailing the outage is in the original note.

Hub version is 4.1; portswitch 900tp version is 2.1.0.

These two sets of counters below from offsb2.domainname seem to imply that the Portswitch is sending runt packets that may be causing the FCS errors in the first place. Why would the BP port be generating significant traffic to cause this huge number of runt packets? What do you think? Is this a known problem?

Thanks
Doug Smith

>#4566
>Description: RMON Rising Alarm: erptrRptrInfo.erptrTotalRptrErrors.0exceeded
>                         threshold 1 value = 4165484 ( Sample type = 2 alarm in

Counters taken a few days later...
Values of mib2...rptrMonitorPortFCSErrors.<instance> and mib2...rptrMonitorPortRunts.<instance>:

    Instance  FCSErrors  Runts
    1.1       3          12
    2.1       0          208
    3.1       0          0
    4.1       0          0
    5.1       0          357
    6.1       0          0
    7.1       0          224
    8.1       0          96
    9.1       0          18
    10.1      0          10
    11.1      0          525
    12.1      0          147
    13.1      0          75
    14.1      0          14
    15.1      0          14
    16.1      0          14
    17.1      0          65
    18.1      0          0
    19.1      0          0
    20.1      0          371
    21.1      0          0
    22.1      0          277
    23.1      0          654
    24.1      0          194
    25.1      0          2897
    26.1      0          211
    27.1      0          389
    28.1      0          102
    29.1      0          30
    30.1      0          160
    31.1      0          0
    32.1      0          0
    33.1      6695441    6858079
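Spotting the single hot instance in a dump like this is easy to automate. The sketch below parses lines in the `mib2...counter.instance : value` form used in the raw dump and flags instances whose value is more than ten times the median for that counter. The line format is taken from the dump; the ten-times-median rule and the function names are arbitrary assumptions for illustration, and the sample input is a short excerpt of the values above.

```python
import re
import statistics
from collections import defaultdict

# Matches entries such as "mib2...rptrMonitorPortRunts.33.1 : 6858079".
LINE_RE = re.compile(r"(rptrMonitorPort\w+)\.(\d+(?:\.\d+)*)\s*:\s*(\d+)")


def parse_dump(text):
    """Return {counter_name: {instance: value}} from a counter dump."""
    counters = defaultdict(dict)
    for name, instance, value in LINE_RE.findall(text):
        counters[name][instance] = int(value)
    return counters


def outliers(values, factor=10):
    """Instances whose value exceeds `factor` times the median (arbitrary rule)."""
    baseline = statistics.median(values.values())
    # Use at least 1 so a median of 0 (most ports clean) doesn't flag everything.
    limit = max(baseline, 1) * factor
    return sorted((i for i, v in values.items() if v > limit),
                  key=lambda i: values[i], reverse=True)


dump = """
mib2...rptrMonitorPortFCSErrors.1.1 : 3
mib2...rptrMonitorPortFCSErrors.2.1 : 0
mib2...rptrMonitorPortFCSErrors.25.1 : 0
mib2...rptrMonitorPortFCSErrors.33.1 : 6695441
mib2...rptrMonitorPortRunts.1.1 : 12
mib2...rptrMonitorPortRunts.2.1 : 208
mib2...rptrMonitorPortRunts.25.1 : 2897
mib2...rptrMonitorPortRunts.33.1 : 6858079
"""

counters = parse_dump(dump)
for name, values in sorted(counters.items()):
    print(name, outliers(values))
# rptrMonitorPortFCSErrors ['33.1']
# rptrMonitorPortRunts ['33.1']
```

Because the regular expression uses `findall`, it also works on a dump that has been run together onto one line.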
4188.5	900 tp management questions - FCS/Runt errs on port 33	WOTVAX::SMITHD		`Mon Feb 03 1997 14:59`	14
	Shawn,

Forgot to mention that it doesn't matter what virtual LAN or thinwire segment the BP is associated with. I continue getting the trap even when the BP is on a TOTALLY ISOLATED segment bridged to the backbone via the EF switch. Needless to say, when I disconnect the BP port by disassociating the thinwire from the other segments (NO CONNECTION), I stop getting the traps.

Could you say a few words about the purpose of the BP port and its association with the MAC (or port 33) on a portswitch 900 tp?

Thanks
Doug
4188.6		NETCAD::GALLAGHER		`Mon Feb 03 1997 18:31`	25
	Whew. I'm falling behind in this thread.

Port 33 is your backplane thinwire port. It doesn't have anything to do with your MAC. (I assume when you say "BP port" you mean "backplane thinwire port", and when you say "virtual LAN" you mean "backplane LAN".)

>These two sets of counters below from offsb2.domainname seem to imply that the
>Portswitch is sending runt packets that may be causing the FCS errors in the
>first place. Why would the BP port be generating significant traffic to cause
>this huge number of runt packets?

I'm confused by "...Portswitch is sending runt packets...". The PORTswitch would only be repeating FCS errors and runt packets received on the port, not "generating" them, right? Your counters seem to indicate that a large number of FCS errors and runts are happening on the hub's backplane thinwire. What else in the hub is attached to the thinwire?

Since I'm not getting it, how 'bout painting me the big picture? Can you tell me more about the hub's configuration? Back up a bit and maybe I'll be able to catch up.

-Shawn
4188.7	configuration info - 900TP alarms	WOTVAX::SMITHD		`Tue Feb 04 1997 06:19`	98
	Thanks again, Shawn. Okay, hopefully this will help. I'll try to answer your questions, then give you the configuration for one of the hubs. Any help you can lend or things I can try would be greatly appreciated!

|Port 33 is your backplane thinwire port. It doesn't have anything to do
|with your MAC. (I assume when you say "BP port" you mean "backplane thinwire
|port", and when you say "virtual LAN" you mean "backplane LAN".)

Glad to hear it. I just wanted to make sure that the BP port and the MAC in the portswitch display were indeed different. When I disconnect the BP by setting 'no connect' in the portswitch display, will I continue to get traps if they occur? I do stop getting alarms from the 900tp when I disconnect it from the portswitch display, but I don't understand why.

>These two sets of counters below from offsb2.domainname seem to imply that the
>Portswitch is sending runt packets that may be causing the FCS errors in the
>first place. Why would the BP port be generating significant traffic to cause
>this huge number of runt packets?

|I'm confused by "...Portswitch is sending runt packets...". The PORTswitch
|would only be repeating FCS and runt packets received on the port, not
|"generating" them right?

Apologies for not explaining this in the previous note! My 'assumption' was that if the runt packet counters were incrementing, runt packets were either being sent (repeated onto) or received from the THINWIRE. Since there are NO ports on the thinwire and the thinwire is not bridged to any other segments, my assumption was that the portswitch itself is sending packets (bleeding from another net or group?) or incorrectly incrementing the runt/FCS counters. The runt and FCS counters increment together, but there are always more runts than FCS errors. I would have ignored the counters and traps if I wasn't seeing intermittent network problems from this hub.

|Your counters seem to indicate that a large number of FCS and runts are
|happening on the Hub's backplane thinwire. What else in the hub is attached
|to the thinwire?
|Since I'm not getting it, how 'bout painting me the big picture? Can
|you tell me more about the hub's configuration? Back up a bit and maybe
|I'll be able to catch up.

Configuration from one of the hubs generating alarms:

                  900 mx  900ef  900tp  900tp  900tp  900tm  ds900  900tp  x  x
    THINWIRE      ----------------------------------------------------------  - see notes below -
    FDDI backbone ===+======+================================================
    OFFLIN        ----------+-------+-------+-------+---------------xx-------x
    OFFLIN SEG2   ----------+-------+-------+-------+-------------------------
    PRINT         ----------+-------+---------------+------------------------x
    PCRnet        ------------------+---------------+-------x-----------------
    MCOFFL        ----------+-------+--------+------+------------------------X

Notes:

1. The THINWIRE and PCRnet are not being bridged via the EF switch and are internal to the hub.
2. All ports on the repeater and 900tps are in use, except A1 in slot 3 and A1 in slot 4. I use these for management (e.g. IRIS, SNIFFER, laptop, etc.).
3. There are no external network connections EXCEPT 1 external connection on port 5 of the EF switch and the connections to the FDDI backbone (A into the MX / B into the EF). The FDDI connection (B on MX / A on EF) is pushed into the backplane.
4. Portswitch 900TP information:
   slot 3 - thinwire tied to an unused group (1), no ports on group 1 or thinwire, MAC is on OFFLIN
   slot 4 - thinwire - no connection, MAC is on OFFLIN, no ports on thinwire
   slot 5 - thinwire - no connection, MAC is on OFFLIN, no ports on thinwire
   slot 8 - thinwire tied to an unused group (1), no ports on group 1 or thinwire, MAC is on OFFLIN
5. Repeater 900tm: group 2 is on the OFFLIN; group 1 is not connected to a network.
6. The terminal server has both groups 1 and 2 on the OFFLIN net.
7. The MX is connected to the backbone as described in note 3. It also has separate SAS connections to 2 8400s and an AlphaStation 600 workstation.
8. In the current configuration I am only getting alarms from the 900tp in slot 8. When I associate the BP to a group in slots 4 and 5, I get alarms from slots 4, 5 and 8. I have not been receiving alarms from slot 3, but this may be because I have not set up this 900tp's trap address. I'll double-check this after I send this note.
9. Firmware on the hub is 4.1.1 and the portswitches are 2.1.0.
4188.8	bleeding traffic onto the Backplane Thinwire?	WOTVAX::SMITHD		`Tue Feb 04 1997 14:12`	18
	Per my note 8 in .-1, I put my LAN analyzer on the TPs in slot 3 and discovered that my PCRnet traffic is bleeding through into group 1, which is tied to the backplane thinwire in my portswitches in slots 3 and 8. The PCRnet-connected ports are in separate groups to provide isolation from all other nets, and their groups are not tied to the backplane thinwire. Group 1 in slots 3 and 8 had no ports assigned except my analyzer. The TPs in slots 4 and 5 are 'not connected' to the backplane thinwire.

The PCRnet traffic was NOT on any of the other virtual LANs, just the backplane thinwire. My IRIS analyzer did not see any CRC or framing errors on this net, yet the counters were still rising and alarms were still being generated. When examining some of the ports (8 total connections), I found 3 ports with 200-300 runts and about 20-50 FCS errors per port.

Have you ever heard of a portswitch bleeding traffic across groups?

Thanks again,
Doug
4188.9	error continues	WOTVAX::SMITHD		`Tue Mar 25 1997 11:01`	5
	The hub was brought up to the latest and greatest firmware revision. The problem did not change. All counters and traces were then re-collected and have now been sent to engineering.