| Here is the nfsstat and netstat output:
************************************************************************
# nfsstat
Server rpc:
calls badcalls nullrecv badlen xdrcall
0 0 0 0 0
Server nfs:
calls badcalls
0 0
Server nfs V2:
null getattr setattr root lookup readlink read
0 0% 0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
wrcache write create remove rename link symlink
0 0% 0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
mkdir rmdir readdir statfs
0 0% 0 0% 0 0% 0 0%
Server nfs V3:
null getattr setattr lookup access readlink read
0 0% 0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
write create mkdir symlink mknod remove rmdir
0 0% 0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
rename link readdir readdir+ fsstat fsinfo pathconf
0 0% 0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
commit
0 0%
Client rpc:
calls badcalls retrans badxid timeout wait newcred badverfs timers
2880159 7 6636 8 6643 0 0 0 19980
Client nfs:
calls badcalls nclget nclsleep
2880117 2 2880151 0
Client nfs V2:
null getattr setattr root lookup readlink read
0 0% 2235 0% 0 0% 0 0% 400 0% 134 0% 0 0%
wrcache write create remove rename link symlink
0 0% 0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
mkdir rmdir readdir statfs
0 0% 0 0% 17 0% 67 0%
Client nfs V3:
null getattr setattr lookup access readlink read
0 0% 3128 0% 22 0% 3613 0% 5022 0% 96 0% 2328091 80%
write create mkdir symlink mknod remove rmdir
525130 18% 46 0% 0 0% 0 0% 0 0% 32 0% 2 0%
rename link readdir readdir+ fsstat fsinfo pathconf
15 0% 0 0% 182 0% 3749 0% 142 0% 65 0% 0 0%
commit
7929 0%
************************************************************************
# netstat -p udp
udp:
2908416 packets sent
2953433 packets received
0 incomplete headers
0 bad data length fields
692 bad checksums
0 full sockets
41565 for no port (41496 broadcasts, 0 multicasts)
# netstat -p ip
ip:
14484817 total packets received
0 bad header checksums
0 with size smaller than minimum
0 with data size < data length
0 with header length < data size
0 with data length < header length
13817112 fragments received
0 fragments dropped (dup or out of space)
205 fragments dropped after timeout
0 packets forwarded
87 packets not forwardable
0 redirects sent
# netstat -I tu0 -s
tu0 Ethernet counters at Thu Mar 6 15:59:58 1997
65535 seconds since last zeroed
2486930378 bytes received
4294967257 bytes sent
15227760 data blocks received
5538214 data blocks sent
102512090 multicast bytes received
807449 multicast blocks received
0 multicast bytes sent
0 multicast blocks sent
77530 blocks sent, initially deferred
1305417 blocks sent, single collision
2203918 blocks sent, multiple collisions
5637 send failures, reasons include:
Excessive collisions
0 collision detect check failure
18 receive failures, reasons include:
Block check error
Framing Error
0 unrecognized frame destination
0 data overruns
0 system buffer unavailable
0 user buffer unavailable
#
|
| Thanks, Bob, for your answer.
I understand that for official escalation, IPMT is the way to
proceed. However, I strongly suspect an application problem.
Actually, I only expect unofficial answers and experience sharing,
which can help me narrow down where the problem is.
I do not want to overload engineering with application problems,
but I want to make sure there isn't something wrong with NFS3!
However, I can say that the original statements reported in .0 and .1
have already evolved.
The question:
>> Why does a read which specifies a particular offset and a count of
>> zero keep getting different chunks (8K?) of data?
>> I had understood that a read with a count of zero should simply get
>> an OK response, like a keep-alive.
was due to the network analyser, which wasn't able to properly
decode NFS3 messages!
After a software upgrade of the network analyser, it decodes those
"count 0, offset n" messages differently: they now appear as
>> NFS C READ FH=21CC at 22798336 for 8192
and the writes turn into
>> NFS C WRITE FH=21CC at 60997632 for 8192 (UNSTABLE)
so it does seem as if the application is reading and writing the same
file, but in different places (treating it as a random access file!)
all the time. (Presumably the old analyser code was parsing the V3
arguments with V2-style layouts - NFS V3 uses variable-length file
handles and 64-bit offsets - so the fields no longer lined up.)
However, I don't know why it says "UNSTABLE"! Any explanation?
Thanks for your co-operation,
Thanks for your co-operation,
============================
Alain MARTIN/SSG Switzerland
|
| PANTER::MARTIN (Be vigilant...) writes in .0 or whatever:
A customer complains that one of their Alphas running OSF 3.2C takes
2/3 of the network bandwidth (which in their case should correspond to
some 400 KBytes/sec) when creating a file on an NFS mounted file system.
What sort of network has 600-700 KB/sec capacity? (If 400 KB/sec is 2/3
of the bandwidth, the total is about 600 KB/sec.) Before analyzing any
performance problem, it is vital to know what the underlying fabric is.
Both the NFS server and client are running Digital Unix v3.2C, and
all the NFS patches have already been applied on both systems.
Can somebody explain this NFS cache traffic to me, including why my
Sniffer refers to it as obsolete?
Your traces, as you have figured out, are from obsolete Sniffer code.
You have not posted new traces. Sniffers are useful in chasing
low-level performance problems, especially for looking at CRC errors
and the like. If the network is running well (tested via spray, ttcp,
etc.), then dealing with Sniffer traces is a pain in the neck because
they aren't as malleable as tcpdump traces.
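A quick ttcp check looks something like this (receiver-host is a
placeholder for the receiving machine's name; both ends print the
throughput when the run finishes):

On the receiver:
# ttcp -r -s
On the transmitter:
# ttcp -t -s receiver-host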
See note 2328.0. (But note that tcpdump in V3.x doesn't decode NFS V3
any better than your old Sniffer code.)
I don't understand what problem you are addressing. In .0 you talk
about data caching and updates, but you may have been misled by the
Sniffer "write cache" messages. In .1 you're confused by old Sniffer
code and read messages. .2 shows a very happy nfsstat (but server
nfsstat data discloses very little about NFS performance problems - I
run nfsstat on the client and look at the number of timeouts; your
client shows 6643 timeouts against only 8 badxid, which points at
requests lost on the wire rather than a slow server). netstat
suggests that you have an Ethernet, which has considerably higher
throughput than 600 KB/sec. The collision rate suggests that it's
very busy. The excessive-collision failures alone run about 0.1% of
blocks sent, a loss rate that devastates NFS performance.
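The arithmetic, using the numbers posted in .2 (bc here is just to
show the computation):

About 0.1% of blocks sent fail from excessive collisions:
# echo "scale=4; 5637 / 5538214" | bc
.0010
and about 0.2% of client RPC calls time out and retransmit:
# echo "scale=4; 6643 / 2880159" | bc
.0023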
.5 asks about unstable writes. Those are a key addition to NFS V3 and
are what allows us to get 6 MB/sec NFS write rates to disk on
Sandpipers. I wish I had time to measure performance on current
hardware. With the sort of loss rate you post, NFS over TCP may help,
but that requires Digital UNIX V4.0.
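Briefly, on UNSTABLE: NFS V2 required every write to be on stable
storage before the server replied. NFS V3 (RFC 1813, see below) adds
a stability argument to WRITE - UNSTABLE, DATA_SYNC, or FILE_SYNC. An
UNSTABLE write lets the server reply before the data reaches disk;
the client later issues a COMMIT to flush it, and a write verifier in
the replies lets the client detect a server reboot and resend
anything that might have been lost. Your client nfsstat in .2 shows
the effect - roughly 66 writes per commit:

# echo "scale=1; 525130 / 7929" | bc
66.2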
NFS V3 is well described in a USENIX paper. I guess we don't have a reference
on our internal Web, but it's on the zk3.dec.com production machines at
/usr/specs/filesystem/nfs/nfsv3_usenix.ps.
The USENIX reference is:

Brian Pawlowski (Network Appliance), Chet Juszczak (Digital Equipment
Corp.), Peter Staubach (SunSoft Inc.), Carl Smith (SunSoft Inc.),
Diane Lebel (Digital Equipment Corp.), and Dave Hitz (Network
Appliance), "NFS Version 3: Design and Implementation," Proceedings
of the Summer 1994 USENIX Conference, Boston, MA, pp. 137-151.
Odd - USENIX doesn't appear to make that public, but I think I saw it at
netapp. Look for it with AltaVista.
The protocol itself is RFC 1813, but a PostScript version is available at
http://www.sun.com/sunsoft/connectathon/nfsv3.ps. I've been putting together
a page of information about analyzing NFS issues at
http://www.zk3.dec.com/~werme/support.html. It is only the barest
beginnings, as training support is not in my job description. To the
best of my knowledge, the customer support organization does nothing
like it. (Indeed, is training customer support part of customer
support's charter?) If anyone in this silly company is trying to
build a WWW support hints page, I'd love to hear about it.
For that matter, if anyone knows how to fix the NOTES-USENET gateway for
this notes file, I wish he would - this Web interface is only good for
archival access and no one seems to have ported notes to Dunix. Mutter,
mutter.
Back to the issue at hand. What is the customer complaining about?
Nothing in your postings states a problem. Even the phrase "A
customer complains that one of their Alphas running OSF 3.2C takes
2/3 of the network bandwidth..." is not a statement of a problem, as
I think it should take 100% of an idle Ethernet. Unless, of course,
CERN is writing the results of a quantum mechanical simulation of
uranium fission, in which case I'd expect 1% network bandwidth to be
about right. What does the customer want to see? How big a file is
he writing, and how many NFS writes does it take? Do tcpdump (or
Sniffer) traces show gaps? (Really easy to look for with tcpdump and
an xterm configured to buffer several hundred lines; see the example
below.) What else is on his network? Are there routers between
client and server? I haven't met a Cisco router that didn't discard
more messages than I think it should, and dropped messages do
dreadful things to NFS read and write performance.
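To look for gaps, something like this works (tu0 taken from your
netstat output; 2049 is the standard NFS port; tcpdump timestamps
every line, so a stall shows up as a jump in the timestamps):

# tcpdump -i tu0 -s 192 udp port 2049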
I'm still recovering from Connectathon, so I won't have time for further
help until I file expense reports, rebuild my workstation (I had to take it
to the show since I wanted something faster than a Sandpiper), catch up on
mail, review several specs, and dodge the next snowstorm.
[Posted by WWW Notes gateway]
|