T.R | Title | User | Personal Name | Date | Lines |
---|
928.1 | Bummer! | NPSS::SOLOWAY | Stu Soloway 226-7651 | Thu Feb 13 1997 11:42 | 9 |
| Are you sure the GIGAswitch was the root? If someone decided that they
had a very low root priority and started sending out funny-looking
hellos with that priority, everyone else, including the GIGAswitch,
would propagate the bad hellos. When you removed the GIGAswitch from
the network, did it continue to send out bogus hellos? If not, I'd
suspect the problem was elsewhere.
What firmware release are you running on the GIGAswitch?
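
A minimal sketch of why one station advertising a very low root priority ends up as everybody's root: bridges compare the root identifier as priority first, then address, and the lowest value wins. The bridge IDs below are made up for illustration and are not taken from the network being discussed.

```python
from typing import NamedTuple


class BridgeId(NamedTuple):
    priority: int  # 16-bit bridge priority; lower wins
    mac: bytes     # 6-byte bridge address; lower wins on a priority tie


def elect_root(bridges):
    """Return the bridge ID that every bridge would converge on as root."""
    # Tuples compare field by field, which matches comparing the 8-byte
    # bridge identifier numerically (priority first, then address).
    return min(bridges)


# Hypothetical bridge IDs, for illustration only.
gigaswitch = BridgeId(1, bytes.fromhex("08002b000001"))
decnis = BridgeId(32768, bytes.fromhex("08002b000002"))
bogus = BridgeId(0, bytes.fromhex("ffffffffffff"))

root = elect_root([gigaswitch, decnis, bogus])
print(root.priority, root.mac.hex())
```

With the bogus entry present it wins on priority alone, so every well-behaved bridge would relay its parameters.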
|
928.2 | | NPSS::MDLYONS | Michael D. Lyons DTN 226-6943 | Thu Feb 13 1997 12:54 | 15 |
| I suppose some code could be added to check that someone else isn't
trying to claim the broadcast address is the root. 802.1d doesn't
specify much about range-checking any of the parameters in received
hellos.
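
A rough sketch of what such a check might look like, assuming the 8-byte root identifier has already been pulled out of the received hello. This is a hypothetical helper, not something 802.1d requires and not GIGAswitch firmware code. It only flags the condition; what a bridge should then do with the hello is a separate question.

```python
BROADCAST = b"\xff" * 6


def root_id_is_suspect(root_id: bytes) -> bool:
    """Return True if an 8-byte root identifier names the broadcast address."""
    if len(root_id) != 8:
        raise ValueError("root identifier must be 8 bytes")
    # Bytes 0-1 are the priority, bytes 2-7 are the bridge address.
    return root_id[2:] == BROADCAST


print(root_id_is_suspect(bytes(2) + BROADCAST))
```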
Unfortunately, most of the evidence has probably disappeared.
If the GIGAswitch/FDDI system hasn't been rebooted, but is still
sending out BPDUs with all FFs, you should look to see from where the
GIGAswitch/FDDI system believes it's receiving those BPDUs (the root),
or if it believes that it is the root itself.
Are you running multiple spanning trees?
MDL
|
928.3 | power browned while GIGAswitch was being removed | MARVIN::RIGBY | No such thing as an alpha beta | Thu Feb 13 1997 12:56 | 42 |
| Thanks for your reply.
> Are you sure the GIGAswitch was the root?
No, I'm not. I was just dragged in to try and work out what the problem with the
network was. I'm a user rather than the network controller, but I'll pass on the
possibilities. From memory the root priority of FF-FF-FF-FF-FF-FF was reported
as 0.
>If someone decided that they had a very low root priority and started sending
>out funny-looking hellos with that priority, everyone else, including the
>GIGAswitch, would propagate the bad hellos.
I'd forgotten that some other root could have caused this. We'd have to try and
find the inlink and look on there to see if these hellos were still arriving.
Interestingly the DECNIS I was tracing was sending and receiving
TopologyChangeNotifications every second, the GIGAswitch was sending hellos
every 256 seconds with TopologyChangeAck - I'll need to check if the DECNIS TCN
hellos are sent too often when the hello timer has been 'changed'. So few people
ever touch the hello time it's possible that the DECNIS has a day one bug here.
>When you removed the GIGAswitch from the network, did it continue to send out
>bogus hellos? If not, I'd suspect the problem was elsewhere.
Well, just to make network analysis completely impossible the power to DECpark
browned and EVERYTHING went bad. A significant proportion of the office area
didn't recover from the brown-out (probably tripped a circuit breaker) so now
I'm writing this from home and I don't know what happened for the rest of the
afternoon. Certainly the network is usable again but then a restart of
everything in the building might well have cleared it up anyway, even without
the removal of the GIGAswitch.
When I get back in (which will be Monday - I'm on leave tomorrow) I'll try to
find out where we are and try to restore the GIGAswitch's reputation because, as
you point out, it could have been an innocent bystander. Of course, if the
GIGAswitch is still out of the network everything might be behaving better -
even in the face of some bad root - because parallel paths might no longer be
present and you don't really need spanning tree if there are no loops.
I'll update when I have any more information.
John
|
928.4 | Checking more than the standard is always dodgy - been there!-( | MARVIN::RIGBY | No such thing as an alpha beta | Thu Feb 13 1997 13:03 | 29 |
| >I suppose some code could be added to check that someone else isn't trying to
>claim the broadcast address is the root. 802.1d doesn't specify much about
>range-checking any of the parameters in received hellos.
Interestingly, what would you do if you did check for this? As 99% of other
bridges wouldn't be doing the same sanity checks you'd get a very unpleasant
mess.
>If the GIGAswitch/FDDI system hasn't been rebooted, but is still sending out
>BPDUs with all FFs, you should look to see from where the GIGAswitch/FDDI system
believes it's receiving those BPDUs (the root), or if it believes that it is the
>root itself.
See my crossing reply - we had a site-wide power brown out while all this was
going on. As you say, all the evidence will have gone.
>Are you running multiple spanning trees?
No
In case this ever happens again, what's the best way to find out root
information from the console (you can't use any network tools because the LAN
isn't working very well)?
I presume you can't think of any way that the GIGAswitch could have been the
real source of the bad hello - even if there was some sort of memory problem in
the switch?
John
|
928.5 | You can look at MIB objects through OBM. | NPSS::SOLOWAY | Stu Soloway 226-7651 | Thu Feb 13 1997 13:25 | 14 |
| If there were bad memory in the GIGAswitch SCP, there's no way I could
rule out anything. As it turns out, the address FFFFFFFFFFFF
is a default address the GIGAswitch uses for bridge IDs in various
places until it knows better. But that code has been copied from
older DEC products, so I'm sure that's true of all our bridges.
If you are running SCP firmware rev 3.10, you can look at all MIB
objects from OBM, so even if your switch is inaccessible you can still
get spanning tree parameters. Use main menu option 11, and remember
that everything needs an instance number. (For example, you would say
"dot1dStpPriority.0", not "dot1dStpPriority".)
It wouldn't hurt to check your SCP error log to see if there was
anything unusual happening at the time of this incident.
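
As an illustration of the instance-number rule for OBM above, here is a small sketch that turns a few Bridge MIB spanning tree scalars into the ".0" form. The sub-identifiers are the ones from the standard Bridge MIB (RFC 1493); whether OBM accepts numeric OIDs as well as names is not something this sketch assumes, and the helper names are made up.

```python
DOT1D_STP = "1.3.6.1.2.1.17.2"  # dot1dStp subtree of the Bridge MIB

# Scalar objects useful when hunting for the root (name -> sub-identifier).
STP_SCALARS = {
    "dot1dStpPriority": 2,
    "dot1dStpDesignatedRoot": 5,
    "dot1dStpRootCost": 6,
    "dot1dStpRootPort": 7,
    "dot1dStpMaxAge": 8,
    "dot1dStpHelloTime": 9,
}


def instance_name(obj: str) -> str:
    """Return the name with the scalar instance suffix, e.g. 'dot1dStpPriority.0'."""
    if obj not in STP_SCALARS:
        raise KeyError(obj)
    return f"{obj}.0"


def instance_oid(obj: str) -> str:
    """Return the numeric OID of the scalar instance."""
    return f"{DOT1D_STP}.{STP_SCALARS[obj]}.0"


for name in STP_SCALARS:
    print(instance_name(name), instance_oid(name))
```

dot1dStpDesignatedRoot.0 and dot1dStpRootPort.0 are probably the first two to look at: the bridge ID currently believed to be root, and the port it is being heard on.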
|
928.6 | GIGAswitch exonerated... | MARVIN::RIGBY | No such thing as an alpha beta | Mon Feb 17 1997 10:37 | 32 |
| .3>When I get back in (which will be Monday - I'm on leave tomorrow) I'll try...
Well, it's Monday. We've been experimenting with the GIGAswitch - which, by the
way, survived Thursday's brown-out and was running on generator power for as long
as it needed. It had successfully recovered from the bad hellos and was now
transmitting hellos on its ports with itself as root at priority 1 and hello
times of 1 second.
It was no longer connected to anything, however. So to try out whether it needed
the datalinks to bounce to recover we built a bad hello with an FDDI traffic
generator and sent
00000000000000FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
as a bridge packet. We saw 255 TopologyChangeNotifications sent back out on that
port at 1 second intervals and then the GIGAswitch reverted back to being the
root. We tried sending a TopologyChangeAcknowledge which stopped the TCNs as
expected and the GIGAswitch reverted back to being root 256 seconds later, as
expected.
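
For anyone wanting to read that test packet field by field, here is a sketch of a decoder. It assumes the string above is a standard 35-byte 802.1d configuration BPDU (7 bytes of zero followed by 28 bytes of FF) and rebuilds it on that assumption rather than copying the hex. The function and field names are made up for illustration.

```python
import struct

# 802.1d configuration BPDU: protocol id, version, type, flags, root id,
# root path cost, bridge id, port id, message age, max age, hello time,
# forward delay. Big-endian, 35 bytes in total.
CONFIG_BPDU = struct.Struct(">HBBB8sI8sHHHHH")


def decode_config_bpdu(data: bytes) -> dict:
    """Decode a configuration BPDU; timer fields are in units of 1/256 second."""
    (proto, version, bpdu_type, flags, root_id, cost,
     bridge_id, port_id, msg_age, max_age, hello, fwd_delay) = CONFIG_BPDU.unpack(data)
    return {
        "flags": flags,
        "root_priority": int.from_bytes(root_id[:2], "big"),
        "root_mac": root_id[2:].hex(),
        "root_path_cost": cost,
        "bridge_id": bridge_id.hex(),
        "port_id": port_id,
        "message_age_s": msg_age / 256,
        "max_age_s": max_age / 256,
        "hello_time_s": hello / 256,
        "forward_delay_s": fwd_delay / 256,
    }


# Assumed reconstruction of the test packet: 7 zero bytes, then 28 bytes of FF.
bad_hello = bytes(7) + b"\xff" * 28
fields = decode_config_bpdu(bad_hello)
print(fields["root_priority"], fields["root_mac"], fields["hello_time_s"])
```

Under that assumption the root priority decodes as 0, the root address as all Fs, and the hello time as just under 256 seconds, which lines up with the 256 second interval seen from the GIGAswitch.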
It seems that there is something on our LAN here (which is used for all sorts of
development work) which causes these bad hellos on a regular basis - it's been
happening on and off since November. Unfortunately it appears we do have some
bugs in DECNIS that I'll have to go off and track down; we don't behave at all
well when these hellos arrive.
Subsequent to my original post I was reminded that the MaxAge field of the BPDU
was not all ones but a few seconds less - confirming the theory that the
GIGAswitch was merely using the bad hellos as required by the bridge spec.
Can the GIGAswitch be set to use priority 0? At least that would prevent this
other box ever taking over as root.
|
928.7 | | NPSS::MDLYONS | Michael D. Lyons DTN 226-6943 | Mon Feb 17 1997 18:32 | 3 |
| ...yes, the GIGAswitch/FDDI system here in LKG is set to 0....
MDL
|
928.8 | Now Root is 0 | RDGENG::GREID | I'm a firestarter, twisted firestarter | Tue Feb 18 1997 04:49 | 8 |
|
I have set our GIGAswitch to Root Priority 0 now, and all the DECNIS
Routers running Bridging have updated themselves now that I have connected
it back up to the LAN. This should guard against this bad Hello we saw
with a priority of 0 but an address of all F's. Unless of course
we get another with a lower address than the GIGAswitch. ;-)
Giles.
|