T.R | Title | User | Personal Name | Date | Lines |
---|
8678.1 | | SMURF::MENNER | it's just a box of Pax.. | Mon Feb 03 1997 08:48 | 1 |
| Does netstat show any dropped packets?
|
8678.2 | | IOSG::MARSHALL | | Mon Feb 03 1997 09:39 | 11 |
| Running netstat in a (local) window while the problem is occurring in a remote
window shows no dropped packets, and no increase in the number of error packets.
While the system is quiescent (ie no explicit network activity), there are about
ten input packets and two output packets per second, presumably related to
keeping links 'alive'.
The total number of error packets is 555 (input), 1 (output); is that reasonable
for a machine that's been up a week or so?
Scott
|
8678.3 | too high | SMURF::DUSTIN | | Mon Feb 03 1997 16:55 | 7 |
| No, input errors shouldn't be that high. I have 6 input errors
over the last 55 days.
Get us a "netstat -is" so we can see what the input errors are.
John
|
8678.4 | Output from netstat -is | IOSG::MARSHALL | | Wed Feb 05 1997 12:47 | 31 |
| Well, input errors are up to 710 now, although these errors don't seem to
coincide with the network delays.
Here's the output from netstat -is. I guess the "Block check error" and
"Framing Error" lines are the significant ones. What do they mean?
tu0 Ethernet counters at Wed Feb 5 17:21:44 1997
65535 seconds since last zeroed
966491707 bytes received
27988323 bytes sent
10333496 data blocks received
393414 data blocks sent
936096342 multicast bytes received
10057314 multicast blocks received
1287953 multicast bytes sent
9451 multicast blocks sent
0 blocks sent, initially deferred
0 blocks sent, single collision
0 blocks sent, multiple collisions
1 send failures, reasons include:
0 collision detect check failure
710 receive failures, reasons include:
Block check error
Framing Error
0 unrecognized frame destination
0 data overruns
0 system buffer unavailable
0 user buffer unavailable
Scott
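
For readers who want to put the counters above in proportion, here is a minimal Python sketch that computes the receive-failure rate and the multicast share from a few of the lines quoted in this note. The sample text is copied from the output above; the function name `parse_counters` is hypothetical and the parsing assumes the "count, then label" layout shown here.

```python
import re

# Counter lines copied from the "netstat -is" output in this note.
SAMPLE = """\
10333496 data blocks received
10057314 multicast blocks received
393414 data blocks sent
710 receive failures, reasons include:
1 send failures, reasons include:
"""

def parse_counters(text):
    """Map each counter label to its integer value."""
    counters = {}
    for line in text.splitlines():
        # A leading integer, then the label; drop the ", reasons include:" tail.
        match = re.match(r"\s*(\d+)\s+(.+?)(?:, reasons include:)?\s*$", line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters

counters = parse_counters(SAMPLE)
received = counters["data blocks received"]
failure_rate = counters["receive failures"] / received
multicast_share = counters["multicast blocks received"] / received

print(f"receive failures per million blocks: {failure_rate * 1e6:.1f}")
print(f"multicast share of received blocks: {multicast_share:.1%}")
```

Note that "65535 seconds since last zeroed" equals the largest 16-bit value, so it may be a saturated timer rather than the true elapsed time; that is an assumption, and it is why the sketch computes per-block ratios rather than per-second rates.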
|
8678.5 | related question | LEXSS1::GINGER | Ron Ginger | Wed Feb 05 1997 15:58 | 3 |
| In the previous note the 'blocks received' and the 'multicast blocks'
are nearly the same value. Is it normal to see such a high multicast?
What causes these?
|
8678.6 | | IOSG::MARSHALL | | Thu Feb 06 1997 05:16 | 28 |
| re .5
Don't know if this answers the question, but: our local network topology is UTP,
with all workstation nodes (inc. my AS255) on individual UTP wires from a bank
of repeaters that feed into the 'main' network in the machine room.
Most workstation nodes have a very small /etc/hosts (or PC equiv; most of the
nodes are W95 or NT PCs) file, containing just the names/addresses of the two
name servers in our area.
Would the lack of 'complete' address databases cause an increase in multicasts?
Do PCs use multicasts significantly more than Unix (I am led to believe that
NetBEUI uses broadcasts a lot), or specifically more than Unix is tuned to
handle?
I did a test: from my AS255, I pinged another Unix node in the machine room,
while simultaneously pinging my machine from that node, letting it run for about
a minute.
Other node pinging my node gives: round-trip (ms) min/avg/max = 0/0/0 ms
My node pinging other node gives: round-trip (ms) min/avg/max = 0/1/7 ms
Not particularly conclusive, but repeating the test always gives a longer
round-trip time when starting at my node.
Anything else I can try?
Scott
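
To repeat the ping comparison above without reading the numbers by eye, a small Python sketch along these lines could parse the two summary lines and report the difference. The regular expression assumes the exact summary format quoted in this note; `parse_round_trip` is a hypothetical helper name.

```python
import re

SUMMARY = re.compile(r"min/avg/max = (\d+)/(\d+)/(\d+) ms")

def parse_round_trip(line):
    """Return (min, avg, max) in milliseconds from a ping summary line."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError(f"no round-trip summary in: {line!r}")
    return tuple(int(value) for value in match.groups())

# Summary lines as quoted in this note.
inbound = parse_round_trip("round-trip (ms) min/avg/max = 0/0/0 ms")
outbound = parse_round_trip("round-trip (ms) min/avg/max = 0/1/7 ms")

# Compare the worst case in each direction.
asymmetry_ms = outbound[2] - inbound[2]
print(f"max round trip is {asymmetry_ms} ms longer when pinging from the AS255")
```

With millisecond resolution and values this small, a single run says little; the sketch is only useful for comparing many repeated runs.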
|
8678.7 | | SMURF::MENNER | it's just a box of Pax.. | Thu Feb 06 1997 08:10 | 5 |
| As far as I know UNIX has no problem dealing with multicasts. The
710 receive errors (block check errors; Framing errors) point to a
hardware problem with either your network adapter or with a network
adapter somewhere else on the net. You really need a sniffer to
find out more.
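
As background on why such errors point at hardware or cabling: a "block check error" is presumably a frame whose checksum did not match on arrival (that reading of the counter name is an assumption). Ethernet's frame check sequence uses the same CRC-32 polynomial as Python's `zlib.crc32`, so the idea can be sketched as follows. The payload is made-up data, not a real Ethernet frame, and `frame_check_ok` is a hypothetical name.

```python
import zlib

def frame_check_ok(payload: bytes, received_crc: int) -> bool:
    """True when the CRC-32 computed over the payload matches the one received."""
    return zlib.crc32(payload) == received_crc

payload = b"example frame contents"   # made-up data, not a real Ethernet frame
sent_crc = zlib.crc32(payload)        # checksum the sender would append

corrupted = bytearray(payload)
corrupted[3] ^= 0x01                  # flip one bit, as line noise might

intact_ok = frame_check_ok(payload, sent_crc)
damaged_ok = frame_check_ok(bytes(corrupted), sent_crc)
print(intact_ok)    # the untouched payload passes the check
print(damaged_ok)   # the payload with one flipped bit fails it
```

The receiver cannot tell where the damage happened, only that it did, which is why a protocol analyser is suggested for locating the source.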
|
8678.8 | How can I force FULL-DUPLEX mode? | IOSG::MARSHALL | | Thu Mar 27 1997 11:55 | 30 |
| This problem hasn't gone away, but I've been trying to get hold of a 'sniffer'
as per .7 to analyse this some more. But unfortunately it seems we have only
one network protocol analyser, and I'm a very low priority, so no joy yet.
My latest thought concerns full-duplex vs simplex. I have a UTP port, and at
the console I see:
ewa0_mode = Full Duplex, Twisted Pair
As the system boots, I see the message:
tu0: console mode: selecting 10BasetT (UTP) port: full duplex
But after booting, ifconfig -a gives:
tu0: flags=c63<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST,SIMPLEX>
The wiring and routers I'm connected to support full duplex, and I've been
recommended to try that as a remedy for this problem. But given the console and
boot-time settings, why does ifconfig claim SIMPLEX, or is it lying? If it
really is simplex, how can I change it to full duplex?
>> a hardware problem with either your network adapter or with a network
>> adapter somewhere else on the net
The network adapter on my machine is on the motherboard (AS255), which has been
swapped since this problem began (for another reason), without making any
difference, so I think my end of the hardware is 'clean'.
The other end of the wire has been moved from port to port on one router, and to
different routers, without making any difference. No-one else on the same
routers has the problem, so I don't think it lies there either.
|
8678.9 | | netrix.lkg.dec.com::thomas | The Code Warrior | Thu Mar 27 1997 12:28 | 3 |
| SIMPLEX means the device does not listen to its own transmissions and has
absolutely nothing to do with full-duplex.
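
To illustrate, the ifconfig value flags=c63 quoted in .8 is a bit mask, and SIMPLEX is just one bit in it. The Python sketch below decodes that value. The values for UP, BROADCAST, NOTRAILERS and RUNNING are the traditional BSD ones; the values for MULTICAST and SIMPLEX are inferred from the quoted line and are an assumption that may not hold on other systems.

```python
# Bit values for the flag names in the ifconfig line quoted in .8.
# The first four are the traditional BSD values; MULTICAST and SIMPLEX are
# inferred from flags=c63 and may differ on other systems.
IFF_FLAGS = {
    0x001: "UP",
    0x002: "BROADCAST",
    0x020: "NOTRAILERS",
    0x040: "RUNNING",
    0x400: "MULTICAST",
    0x800: "SIMPLEX",
}

def decode_flags(value):
    """Return (names of known bits that are set, leftover unknown bits)."""
    names = [name for bit, name in sorted(IFF_FLAGS.items()) if value & bit]
    unknown = value & ~sum(IFF_FLAGS)   # summing a dict sums its keys
    return names, unknown

names, unknown = decode_flags(0xC63)
print(",".join(names), hex(unknown))
```

None of the names in the quoted flag list describes duplex, which fits the point that the duplex setting has to be read from the console variable and the boot-time message instead.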
|
8678.10 | | IOSG::MARSHALL | | Tue Apr 01 1997 11:43 | 5 |
| Ahhh... a confusing re-use of terminology. I take it this means the duplex
thing is a red-herring and I should continue looking elsewhere.
Ta,
Scott
|
8678.11 | It's something in the AS255, either h/w or UNIX... | IOSG::MARSHALL | | Wed Apr 16 1997 12:17 | 40 |
| The problem persists, and a protocol analyser on the net hasn't uncovered
anything much; here is the current situation:
There is nothing wrong with the rest of the network.
It isn't a complete network "hang"; I can have two windows with remote sessions
(to the same remote machine) and one can hang while the other is fine; then a
few seconds later the situation could reverse. It isn't local loading either;
all local apps work fine, both during and in between the network glitches, and
there's very little happening on the system.
The problem is worse when network traffic increases, but the network is nowhere
near saturated, and the problem persists even when the network is otherwise
quiescent.
Also, I've installed DECnet, and the same symptoms occur over 'dlogin' sessions
as well as 'rlogin' ones.
We notice similar symptoms on all AS255s, but not on other machines (this isn't
conclusive yet, but is a definite trend). All the affected AS255s run UNIX
V4.0x, whereas most other machines are on UNIX V3.x or non-UNIX operating
systems.
So, assuming it's not hardware-related, could there be something in UNIX V4.0x
causing this problem (I've upgraded from 4.0A to 4.0B with no change)? Maybe
something in the device driver that doesn't interact properly with the version
of the network chip in these machines?
I'm thinking along the lines of the sound problem in AS255s where the audio
codec manufacturer changed their spec such that the UNIX device driver no longer
did the right thing with the chip. Could something similar be true for the
network stuff?
Any suggestions on how we can track this down?
Oh, and as an aside, why would netstat start saying "no namelist" and not give
any output?
Thanks,
Scott
|
8678.12 | Just tests | KEIKI::WHITE | MIN(2¢,FWIW) | Thu Apr 17 1997 03:56 | 11 |
|
Simplex is the red herring, not Full Duplex.
Try setting ewa0_mode to twisted pair. At the console,
>>>set ewa0_mode <Return> will list the syntax for the different settings.
Also, how long a run of twisted pair are you using? If it is over 60
meters, try a test with a short cable run to the repeater.
Bill
|
8678.13 | Another possibility... | ADISSW::TENHAVE | | Thu Apr 17 1997 09:02 | 16 |
|
After you work out any console settings....
Do you have a PCI card in the bottom-most slot (closest to the motherboard)?
This PCI slot shares an interrupt with the EISA/ISA bus. This shared
interrupt has an effect on your embedded network adapter. If you do
have a PCI card and an available slot above this lower slot, move this
card out of the lower slot. This will most likely require a kernel
rebuild depending on your kernel config file (to pick up your new
hardware configuration - moved PCI boards). If you don't have an
available slot, shuffle your PCI cards around. Try the graphics card
in the lower slot...
It is worth a try, Tim
|
8678.14 | Progress? | IOSG::MARSHALL | | Fri Apr 18 1997 13:56 | 18 |
| re .13: No, there's nothing in the bottom slot; the only card is the graphics
one. But I'll bear this in mind if I get any more cards.
re .12: At the console, I changed ewa0_mode from "Full Duplex, Twisted Pair", to
just "Twisted Pair". When UNIX boots, it now claims that tu0 is half-duplex
instead of full-duplex.
The effect of this seems to be that the several-second "hangs" no longer occur.
The response is still lumpy, but doesn't seem quite so bad. This isn't
conclusive yet, but the tests so far suggest an improvement.
To verify whether this is the case, can someone please explain what difference
the half/full duplex setting actually makes? Yes, I know what the words mean in
terms of comms technology, but what is the practical upshot in this case, and
why would it make a difference?
Many thanks for the suggestions so far, I'm glad I'm finally getting somewhere!
Scott
|
8678.15 | Start | KEIKI::WHITE | MIN(2¢,FWIW) | Fri Apr 18 1997 22:28 | 17 |
|
If you are attached to a repeater then full duplex should never be
used. Only Switches and bridges and other computers are capable of
full duplex and not all of those either.
Question: why are we shipping these workstations configured for
Full Duplex?
In very rough terms, when the card is configured for full duplex
we assume we can transmit whenever we want and do not monitor for
collisions. Our transmit packets can easily cause late collisions and/or
CRC errors, and the retransmissions would take place very slowly since
the upper layers of each protocol would have to time out first.
Bill
PS - Are you using over 30 meters of 10BaseT cable?
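
The rule of thumb in this note can be written down as a small Python check. This is only a sketch of the rule as stated here: the device-type strings and the function name `full_duplex_ok` are hypothetical, and whether a particular switch, bridge or computer really supports full duplex has to be supplied by the caller.

```python
# Device types this note lists as possibly capable of full duplex.
MAY_SUPPORT_FULL_DUPLEX = {"switch", "bridge", "computer"}

def full_duplex_ok(partner, partner_supports_full_duplex=False):
    """Apply the rule above: never full duplex on a repeater; otherwise
    only when the device at the other end is known to support it."""
    if partner == "repeater":
        return False   # shared segment: the adapter must watch for collisions
    return partner in MAY_SUPPORT_FULL_DUPLEX and partner_supports_full_duplex

print(full_duplex_ok("repeater", True))   # a repeater rules it out regardless
print(full_duplex_ok("switch", True))     # a capable switch allows it
print(full_duplex_ok("switch"))           # unknown capability: stay half duplex
```

Defaulting to half duplex when the capability is unknown matches the caution that not all switches and bridges can do full duplex.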
|
8678.16 | Makes sense... | IOSG::MARSHALL | | Mon Apr 21 1997 05:55 | 12 |
| Bill,
Thanks for the very informative explanation. I think I understand what's going
on now, and your description would account for the 'hangs', and also for why the
hangs are more frequent when the network is busier.
As for why my machine was configured for full-duplex: I don't think it was
shipped that way, I think it was one of the things I changed at some point while
investigating this problem (having been told by our networks guy that that's how
it should be set!).
Scott
|
8678.17 | | IOSG::MARSHALL | | Mon Apr 21 1997 06:06 | 10 |
| Oh, just seen your PS: I don't know the exact length, as the cable goes through
ducts and the ceiling space to get to the repeater, but assuming it follows a
"sensible" path, there would be about 90 to 100 feet, so yes, it is knocking
30m. Would this length cause significant degradation of the signal?
Perhaps more significant than the length, the cable is in conduits with
electricity supply leads, etc; I don't know how resilient they are to
interference.
Scott
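
As a quick check of the arithmetic, the estimate of 90 to 100 feet can be converted and compared with the lengths asked about in .12 (60 m) and .15 (30 m). A minimal Python sketch; the variable and function names are illustrative only.

```python
METERS_PER_FOOT = 0.3048   # exact by definition

def feet_to_meters(feet):
    return feet * METERS_PER_FOOT

low, high = feet_to_meters(90), feet_to_meters(100)
print(f"estimated run: {low:.1f} m to {high:.1f} m")

# Lengths asked about earlier in this thread: 60 m in .12, 30 m in .15.
for threshold in (30, 60):
    print(f"could exceed {threshold} m: {high > threshold}")
```

So the upper end of the estimate only just passes 30 m and is well short of 60 m.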
|
8678.18 | | KEIKI::WHITE | MIN(2¢,FWIW) | Mon Apr 21 1997 21:25 | 9 |
|
Well 10BaseT should never be run in close proximity to and parallel
with anything inherently noisy. However 30 Meters should be short
enough that most signal degrading problems should be eliminated as
a cause for your problems.
Have the errors you saw earlier gone away?
Bill
|
8678.19 | 10BaseT specs | QUARRY::reeves | Jon Reeves, UNIX compiler group | Tue Apr 22 1997 11:41 | 5 |
| My notes on the 10BaseT specs say 90m is the limit of the run, so that's
probably not your problem, but they also specify a one foot separation from
parallel power conduits, so that may well be your problem. I'd beat up your
wiring contractor, who is obviously incompetent if they ran 10BaseT wiring
in the same conduit as power.
|
8678.20 | | IOSG::MARSHALL | | Thu Apr 24 1997 07:17 | 15 |
| re .18
I am happy to report (via netstat) no send or receive failures, and no error
packets, compared with the rather large number I used to get.
Things are still lumpy, but it's not as bad as it was.
re .19
Unfortunately the wiring guys are at the mercy of the conduit system available
in our (DEC standard issue) partition walling. Also, I'd rather not beat them
up as they're also our system managers, and as we all know, it pays to be nice
to your system manager :-)
Scott
|
8678.21 | Everyone feels any lumps on a shared segment | KEIKI::WHITE | MIN(2¢,FWIW) | Thu Apr 24 1997 20:33 | 7 |
|
You might be at the mercy of someone else's lumps.
Check the other systems on your network for incorrect duplex
settings.
Bill
|
8678.22 | Unfortunately I have to deal with inferior operating systems ;-) | IOSG::MARSHALL | | Fri Apr 25 1997 06:52 | 7 |
| Bill,
Yes, that's what we're doing. Trouble is, a lot of the systems are running
Windows 95/NT, and on such systems it seems hard to find out what mode they're
running in, let alone how to change it if it's wrong!
Scott
|
8678.23 | | KITCHE::schott | Eric R. Schott USG Product Management | Fri Apr 25 1997 09:50 | 5 |
| Have you run tcpdump to see what is going on?
Have you run sys_check?
|