| Title: | DIGITAL UNIX (FORMERLY KNOWN AS DEC OSF/1) |
| Notice: | Welcome to the Digital UNIX Conference |
| Moderator: | SMURF::DENHAM |
| Created: | Thu Mar 16 1995 |
| Last Modified: | Fri Jun 06 1997 |
| Last Successful Update: | Fri Jun 06 1997 |
| Number of topics: | 10068 |
| Total number of notes: | 35879 |
A customer complains that one of their Alphas running OSF 3.2C takes
2/3 of the network bandwidth (which in their case should correspond to
some 400 KBytes/sec) when creating a file on an NFS mounted file system.
Both the NFS server and client are running Digital Unix v3.2C, and
all the NFS patches have already been applied on both systems.
Can somebody explain this NFS cache traffic to me, including why my
Sniffer flags it as obsolete?
It looks as if the two end systems have decided to do some caching
of a very large file, generating a LOT of update traffic even when
the user concerned is making no explicit reference to the file.
But what does the "Write to cache" procedure mean:
NFS: Proc = 7 (Write to cache)
NFS: (This procedure is obsolete)
Does it have anything to do with the Unix buffer cache?
And why is it referred to as obsolete?
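Comparing the procedure tables in RFC 1094 (NFS v2) and RFC 1813 (NFS v3)
suggests one plausible answer: procedure 7 is WRITECACHE in v2 (reserved
for a future protocol revision and never implemented, hence "obsolete"),
but plain WRITE in v3, so an analyzer that decodes every NFS packet with
the v2 table would mislabel ordinary v3 writes. A quick cross-check
(procedure tables transcribed from the RFCs; the script itself is only
illustrative):

  #!/usr/bin/env python
  # Cross-check an NFS procedure number against the v2 and v3 tables.
  # Tables transcribed from RFC 1094 (NFS v2) and RFC 1813 (NFS v3).

  V2_PROCS = {
      0: "NULL", 1: "GETATTR", 2: "SETATTR", 3: "ROOT (obsolete)",
      4: "LOOKUP", 5: "READLINK", 6: "READ",
      7: "WRITECACHE (reserved, never implemented)",
      8: "WRITE", 9: "CREATE", 10: "REMOVE", 11: "RENAME", 12: "LINK",
      13: "SYMLINK", 14: "MKDIR", 15: "RMDIR", 16: "READDIR", 17: "STATFS",
  }
  V3_PROCS = {
      0: "NULL", 1: "GETATTR", 2: "SETATTR", 3: "LOOKUP", 4: "ACCESS",
      5: "READLINK", 6: "READ", 7: "WRITE", 8: "CREATE", 9: "MKDIR",
      10: "SYMLINK", 11: "MKNOD", 12: "REMOVE", 13: "RMDIR", 14: "RENAME",
      15: "LINK", 16: "READDIR", 17: "READDIRPLUS", 18: "FSSTAT",
      19: "FSINFO", 20: "PATHCONF", 21: "COMMIT",
  }

  for proc in (6, 7):
      print("proc %2d: v2 says %s, v3 says %s"
            % (proc, V2_PROCS[proc], V3_PROCS[proc]))
  # proc  6: v2 says READ, v3 says READ
  # proc  7: v2 says WRITECACHE (reserved, never implemented), v3 says WRITE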
Here is the customer's complaint:
>> We have unacceptable problems at the moment due to AXCA47 reading
>> via NFS from ASCA02. The reads are consuming 60-80% of the network
>> bandwidth in Building 892 AND they seem to be "wrong"!
>>
>> I am seeing repeated requests from AXCA47 as follows
>>
>> 12:07:00.3736 [137.138.28.137] .axca47.cern.ch NFS C F=896C Read 0 at 25165824
>> 12:07:00.3738 [137.138.28.137] .axca47.cern.ch NFS C F=896C Read 0 at 25165824
>> 12:07:00.3923 [137.138.28.137] .axca47.cern.ch NFS C F=896C Read 0 at 25165824
>> 12:07:00.3930 [137.138.28.137] .axca47.cern.ch NFS C F=896C Read 0 at 25165824
>>
>> to which ASCA02 is each time replying with 8k bytes!
>> ASCA02 has now sent 522068000 bytes !!!!!
>>
>> I do not know what file corresponds to F=896C: the details of the read
>> request are given by the Sniffer as
>>
>>> NFS: Proc = 6 (Read from file)
>>> NFS: File handle = 000000202C3179FE0A0000000C000000
>>> NFS: D0030000048000000C00000002000000
>>> NFS: Offset = 25165824
>>> NFS: Count = 0
>>>
>>
>> (and the offset never changes!).
>>
>> Now, I am not expert enough in NFS to know what is meant by a
>> zero value for the count, or by an unchanging value for the offset.
>> Maybe it is some special sort of file.
>>
>> I can see that the data being returned is slightly different in each
>> 8k block! I can also see that my Sniffer analysis thinks that there
>> are still over 500000000 bytes to come.
>>
>> If it really is 500000000 bytes, then that can occupy an Ethernet at 75% for
>> over 10 minutes. However, what I saw earlier was SEVERAL simultaneous
>> read requests on different files!
======================================================================
Trace below.
First packet in detail
DLC: Destination = Station 0A003080D329
DLC: Source = Station 0000F820574A
DLC: Ethertype = 0800 (IP)
IP: Version = 4, header length = 20 bytes
IP: Type of service = 00
IP: 000. .... = routine
IP: ...0 .... = normal delay
IP: .... 0... = normal throughput
IP: .... .0.. = normal reliability
IP: Total length = 1500 bytes
IP: Identification = 62737
IP: Flags = 2X
IP: .0.. .... = may fragment
IP: ..1. .... = more fragments
IP: Fragment offset = 0 bytes
IP: Time to live = 30 seconds/hops
IP: Protocol = 17 (UDP)
IP: Header checksum = 9BEC (correct)
IP: Source address = [128.141.191.114], axca92.cern.ch
IP: Destination address = [137.138.28.137]
IP: No options
UDP: Source port = 1017 (Sun RPC)
UDP: Destination port = 2049
UDP: Length = 8344 (not all data contained in this fragment)
UDP: Checksum = 4EFD (Can't verify, packet incomplete)
RPC: Transaction id = 2270736662
RPC: Type = 0 (Call)
RPC: RPC version = 2
RPC: Program = 100003 (NFS), version = 3
RPC: Procedure = 7 (Write to cache)
RPC: Credentials: authorization flavor = 1 (Unix)
RPC: len = 48, stamp = 857039708
RPC: machine = axca92.cern.ch
RPC: uid = 659, gid = 1227
RPC: 3 other group id(s):
RPC: gid 33536
RPC: gid 32533
RPC: gid 1227
RPC: Verifier: authorization flavor = 0 (Null)
RPC: [Verifier: 0 byte(s) of authorization data]
NFS: Proc = 7 (Write to cache)
NFS: (This procedure is obsolete)
NFS: *** 1384 byte(s) of additional data present ***
1574 0.8286 [137.138.28.137] axca92.cern.ch DLC
Ethertype=0800, size=1514 bytes
IP D=[137.138.28.137] S=[128.141.191.114] LEN=1480 ID=62737
UDP D=2049 S=1017 LEN=8344
RPC C XID=2270736662 PROG=NFS VERS=3 PROC=7
NFS C Write cache (obsolete)
1575 0.0124 [137.138.28.137] axca92.cern.ch DLC
Ethertype=0800, size=1514 bytes
IP D=[137.138.28.137] S=[128.141.191.114] LEN=1480 ID=62737
UDP continuation ID=62737
1576 0.0026 [137.138.28.137] axca92.cern.ch DLC
Ethertype=0800, size=1514 bytes
IP D=[137.138.28.137] S=[128.141.191.114] LEN=1480 ID=62737
UDP continuation ID=62737
1577 0.2047 [137.138.28.137] axca92.cern.ch DLC
Ethertype=0800, size=1514 bytes
IP D=[137.138.28.137] S=[128.141.191.114] LEN=1480 ID=62737
UDP continuation ID=62737
1578 0.0010 [137.138.28.137] axca92.cern.ch DLC
Ethertype=0800, size=978 bytes
IP D=[137.138.28.137] S=[128.141.191.114] LEN=944 ID=62737
UDP continuation ID=62737
1579 2.3894 [137.138.28.137] axca92.cern.ch NFS
retransmission
DLC Ethertype=0800, size=1514 bytes
IP D=[137.138.28.137] S=[128.141.191.114] LEN=1480 ID=62738
UDP D=2049 S=1017 LEN=8344
RPC C XID=2270736662 PROG=NFS VERS=3 PROC=7
NFS C Write cache (obsolete)
1580 0.0017 [137.138.28.137] axca92.cern.ch DLC
Ethertype=0800, size=1514 bytes
.
.
.
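One more observation on the trace: an 8344-byte UDP datagram needs six IP
fragments of at most 1480 bytes (five full ones plus a 944-byte tail), yet
only five frames (1574-1578) appear before the 2.4-second pause, after
which frame 1579 retransmits the whole call. Unless the Sniffer simply
missed a frame, one fragment was lost, the datagram could never be
reassembled, and the entire 8 KB write had to be resent. The arithmetic:

  #!/usr/bin/env python
  # Fragment arithmetic for the 8344-byte UDP datagram in the trace above.
  udp_len, mtu_payload = 8344, 1480   # bytes; 1500 MTU minus 20-byte IP header

  full, last = divmod(udp_len, mtu_payload)
  frags = full + (1 if last else 0)
  print("%d fragments: %d x %d bytes + 1 x %d bytes"
        % (frags, full, mtu_payload, last))
  # -> 6 fragments: 5 x 1480 bytes + 1 x 944 bytes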
======================================================================
| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 9041.1 | NFS read with zero count ? | PANTER::MARTIN | Be vigilant... | Thu Mar 06 1997 05:49 | 16 |
Another question arising from this problem:
Why does a read which specifies a particular offset and a count of zero
keep getting different chunks (8K?) of data?
I had understood that a read with a count of zero should simply get
an OK response, as if it were a keep-alive.
Thanks in advance for your explanations...
Cheers,
============================
Alain MARTIN/SSG Switzerland
| |||||
| 9041.2 | nfsstat and netstat statistics... | LEMAN::MARTIN_A | Be vigilant... | Thu Mar 06 1997 10:02 | 118 |
Here is the nfsstat output :
************************************************************************
# nfsstat
Server rpc:
calls badcalls nullrecv badlen xdrcall
0 0 0 0 0
Server nfs:
calls badcalls
0 0
Server nfs V2:
null getattr setattr root lookup readlink read
0 0% 0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
wrcache write create remove rename link symlink
0 0% 0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
mkdir rmdir readdir statfs
0 0% 0 0% 0 0% 0 0%
Server nfs V3:
null getattr setattr lookup access readlink read
0 0% 0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
write create mkdir symlink mknod remove rmdir
0 0% 0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
rename link readdir readdir+ fsstat fsinfo pathconf
0 0% 0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
commit
0 0%
Client rpc:
calls badcalls retrans badxid timeout wait newcred badverfs timers
2880159 7 6636 8 6643 0 0 0 19980
Client nfs:
calls badcalls nclget nclsleep
2880117 2 2880151 0
Client nfs V2:
null getattr setattr root lookup readlink read
0 0% 2235 0% 0 0% 0 0% 400 0% 134 0% 0 0%
wrcache write create remove rename link symlink
0 0% 0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
mkdir rmdir readdir statfs
0 0% 0 0% 17 0% 67 0%
Client nfs V3:
null getattr setattr lookup access readlink read
0 0% 3128 0% 22 0% 3613 0% 5022 0% 96 0% 2328091 80%
write create mkdir symlink mknod remove rmdir
525130 18% 46 0% 0 0% 0 0% 0 0% 32 0% 2 0%
rename link readdir readdir+ fsstat fsinfo pathconf
15 0% 0 0% 182 0% 3749 0% 142 0% 65 0% 0 0%
commit
7929 0%
************************************************************************
# netstat -p udp
udp:
2908416 packets sent
2953433 packets received
0 incomplete headers
0 bad data length fields
692 bad checksums
0 full sockets
41565 for no port (41496 broadcasts, 0 multicasts)
# netstat -p ip
ip:
14484817 total packets received
0 bad header checksums
0 with size smaller than minimum
0 with data size < data length
0 with header length < data size
0 with data length < header length
13817112 fragments received
0 fragments dropped (dup or out of space)
205 fragments dropped after timeout
0 packets forwarded
87 packets not forwardable
0 redirects sent
# netstat -I tu0 -s
tu0 Ethernet counters at Thu Mar 6 15:59:58 1997
65535 seconds since last zeroed
2486930378 bytes received
4294967257 bytes sent
15227760 data blocks received
5538214 data blocks sent
102512090 multicast bytes received
807449 multicast blocks received
0 multicast bytes sent
0 multicast blocks sent
77530 blocks sent, initially deferred
1305417 blocks sent, single collision
2203918 blocks sent, multiple collisions
5637 send failures, reasons include:
Excessive collisions
0 collision detect check failure
18 receive failures, reasons include:
Block check error
Framing Error
0 unrecognized frame destination
0 data overruns
0 system buffer unavailable
0 user buffer unavailable
#
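As a quick sanity check, the interesting ratios in the output above work
out as follows (a throwaway script, with the numbers transcribed from the
listings):

  #!/usr/bin/env python
  # Ratios computed from the nfsstat/netstat output above.
  rpc_calls     = 2880159   # Client rpc: calls
  rpc_retrans   = 6636      # Client rpc: retrans
  rpc_timeouts  = 6643      # Client rpc: timeout
  udp_received  = 2953433   # udp: packets received
  udp_badsums   = 692       # udp: bad checksums
  enet_sent     = 5538214   # tu0: data blocks sent
  enet_failures = 5637      # tu0: send failures (excessive collisions)

  for label, num, den in (
      ("RPC retransmissions",    rpc_retrans,   rpc_calls),
      ("RPC timeouts",           rpc_timeouts,  rpc_calls),
      ("UDP bad checksums",      udp_badsums,   udp_received),
      ("Ethernet send failures", enet_failures, enet_sent),
  ):
      print("%-24s %7d / %8d = %.3f%%" % (label, num, den, 100.0 * num / den))
  # RPC retransmissions         6636 /  2880159 = 0.230%
  # RPC timeouts                6643 /  2880159 = 0.231%
  # UDP bad checksums            692 /  2953433 = 0.023%
  # Ethernet send failures      5637 /  5538214 = 0.102%

So roughly one client RPC in 400 is retransmitted, and about one block in
a thousand dies of excessive collisions on tu0.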
| |||||
| 9041.3 | Anybody online ??? | PANTER::MARTIN | Be vigilant... | Fri Mar 07 1997 09:18 | 3 |
Anybody online? Should I IPMT this to get an answer?
Alain
| |||||
| 9041.4 | An IPMT would ensure you get some kind of attention | UNIFIX::HARRIS | Juggling has its ups and downs | Sun Mar 09 1997 22:34 | 11 |
If you have a customer problem, the only reliable way to get official
support is via an official support channel. An IPMT is one such
support channel.
As a side note, the most experienced NFS engineer around here happens
to be away at a two-week-long Connectathon and as such has been unable
to read and answer any notes. Whether he has any answers or any time
to catch up on notes after he gets back is an entirely different
question.
Bob Harris
| |||||
| 9041.5 | yes, but could be applic. problem... | LEMAN::MARTIN_A | Be vigilant... | Mon Mar 10 1997 04:16 | 45 |
Thanks Bob for your answer.
I understand that for official escalation, IPMT is the way to
proceed. However I strongly suspect an application problem.
Actually I only expect unofficial answers and experience sharing...
which can help me narrow down where the problem is.
I do not want to overload engineering with application problems
but want to make sure there isn't something wrong with NFS3!
However I can say that the original statements reported in .0 and .1
have already evolved.
The question:
>> Why does a read which specifies a particular offset and a count of zero
>> keep getting different chunks (8K?) of data?
>> I had understood that a read with a count of zero should simply get
>> an OK response, as if it were a keep-alive.
was due to the network analyser, which wasn't able to decode NFSv3
messages properly!!!
After a software upgrade of the analyser, it decodes the "count 0,
offset n" requests differently; they become something like
>> NFS C READ FH=21CC at 22798336 for 8192
and the writes turn into
>> NFS C WRITE FH=21CC at 60997632 for 8192 (UNSTABLE)
so it does seem as if it is reading and writing the same file, but
in different places (treating it as a random access file!) all the
time.
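That behaviour would be consistent with the old analyser applying the
NFSv2 READ argument layout (fixed 32-byte file handle, 32-bit offset,
32-bit count) to NFSv3 requests, where the handle is a length-prefixed
variable-size opaque and the offset is 64 bits. A v2-style decoder
swallows the 4-byte handle length plus the first 28 handle bytes as "the
handle", then shows the last 4 handle bytes as a never-changing "offset",
and the high half of the real 64-bit offset (zero for any file under
4 GB) as "count = 0". A small sketch of that misparse (handle and offset
values invented for illustration):

  #!/usr/bin/env python
  # Decode one set of NFSv3 READ arguments with a v2-style and a
  # v3-style parser, to show where "count 0, fixed offset" comes from.
  # The handle and offset values are invented for illustration.
  import struct

  fh = bytes(range(0x10, 0x30))           # a 32-byte NFSv3 file handle
  offset, count = 25165824, 8192          # real 64-bit offset, 32-bit count

  # On the wire (XDR, big-endian): length-prefixed opaque handle,
  # 64-bit offset, 32-bit count.
  wire = struct.pack(">I", len(fh)) + fh + struct.pack(">QI", offset, count)

  # NFSv2-style parse: fixed 32-byte handle, then 32-bit offset and count.
  v2_offset, v2_count = struct.unpack_from(">II", wire, 32)
  print("v2-style decode: offset=%d count=%d" % (v2_offset, v2_count))
  # "offset" is really the last 4 bytes of the handle (constant per file);
  # "count" is the high half of the 64-bit offset: 0 for files under 4 GB.

  # NFSv3-style parse: skip the length-prefixed handle, then u64 + u32.
  (fhlen,) = struct.unpack_from(">I", wire, 0)
  v3_offset, v3_count = struct.unpack_from(">QI", wire, 4 + fhlen)
  print("v3-style decode: offset=%d count=%d" % (v3_offset, v3_count))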
However I don't know why it says "UNSTABLE"! Any explanation ???
Thanks for your co-operation,
============================
Alain MARTIN/SSG Switzerland
| |||||
| 9041.6 | See note 2328.0, see http://www.zk3.dec.com/~werme/support.html | NETRIX::"[email protected]" | Ric Werme | Wed Mar 12 1997 09:51 | 103 |
PANTER::MARTIN (Be vigilant...) writes in .0 or whatever:
A customer complains that one of their Alphas running OSF 3.2C takes
2/3 of the network bandwidth (which in their case should correspond to
some 400 KBytes/sec) when creating a file on an NFS mounted file system.
What sort of network has 600-700 KB/sec capacity? Before analyzing any
performance problem, it is vital to know what the underlying fabric is.
Both the NFS server and client are running Digital Unix v3.2C, and
all the NFS patches have already been applied on both systems.
Can somebody explain this NFS cache traffic to me, including why my
Sniffer flags it as obsolete?
Your traces, as you have figured out, are from obsolete Sniffer code.
You have not posted new traces. Sniffers are useful in chasing
low-level performance problems, especially for looking at CRC errors
and the like. If the network is running well (tested via spray, ttcp,
etc.), then dealing with Sniffer traces is a pain in the neck because
they aren't as malleable as tcpdump traces.
See note 2328.0. (But note that tcpdump in V3.x doesn't decode NFS V3
any better than your old Sniffer code.)
I don't understand what problem you are addressing. In .0 you talk about
data caching and updates, but you may have been misled by the Sniffer
"write cache" messages. In .1 you're confused by old Sniffer code and
read messages.
.2 shows a very happy nfsstat (but server nfsstat data discloses very little
about NFS performance problems - I run nfsstat on the client and look at the
number of timeouts). netstat suggests that you have an Ethernet,
which has considerably higher throughput than 600 KB/sec. The collision
rate suggests that it's very busy. The excessive-collision failure rate
works out to about 0.1% of blocks sent, and since each 8 KB NFS/UDP
transfer spans six Ethernet frames, that loss rate devastates NFS
performance. .5 asks about unstable writes.
Those are a key addition to NFS V3 and are what allows us to get 6 MB/sec NFS
write rates to disk on Sandpipers. I wish I had time to measure performance
on current hardware. With the sort of loss rate you post, NFS over TCP
may help, but that requires Digital UNIX V4.0.
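For anyone who hasn't read the spec: an NFS V3 client may issue WRITEs
with stable=UNSTABLE, letting the server reply before the data reaches
disk. The client keeps its copy of the data until a later COMMIT
succeeds, and a write verifier returned with every reply tells it whether
the server rebooted (and so lost its cached data) in between, in which
case the client rewrites everything it still holds. A minimal sketch of
that client-side logic, with invented names and a toy in-memory server
standing in for the real protocol:

  #!/usr/bin/env python
  # Sketch of the NFS V3 unstable-WRITE/COMMIT idea (per RFC 1813);
  # the "server" is a toy in-memory stand-in, all names are invented.
  import random

  class ToyServer:
      def __init__(self):
          self.verifier = random.getrandbits(64)  # changes on every "reboot"
          self.cached, self.disk = {}, {}
      def write_unstable(self, offset, data):
          self.cached[offset] = data              # may vanish in a crash
          return self.verifier
      def commit(self):
          self.disk.update(self.cached)           # flush cache to stable storage
          self.cached.clear()
          return self.verifier
      def reboot(self):
          self.cached.clear()                     # unstable data is gone
          self.verifier = random.getrandbits(64)

  class ToyClient:
      def __init__(self, server):
          self.server, self.pending, self.verifier = server, {}, None
      def write(self, offset, data):
          self.pending[offset] = data             # keep a copy until COMMIT
          v = self.server.write_unstable(offset, data)
          if self.verifier is not None and v != self.verifier:
              self.replay()                       # server rebooted mid-stream
          self.verifier = v
      def commit(self):
          if self.server.commit() != self.verifier:
              self.replay()                       # reboot between WRITE and COMMIT
              self.server.commit()
          self.pending.clear()                    # now safe to drop the copies
      def replay(self):
          self.verifier = self.server.verifier
          for off, data in sorted(self.pending.items()):
              self.server.write_unstable(off, data)

  srv = ToyServer()
  cli = ToyClient(srv)
  cli.write(0, b"aaaa"); cli.write(4, b"bbbb")
  srv.reboot()                                    # unstable data lost here
  cli.write(8, b"cccc")                           # verifier mismatch -> replay
  cli.commit()
  assert srv.disk == {0: b"aaaa", 4: b"bbbb", 8: b"cccc"}
  print("all data committed despite the simulated reboot")

The "(UNSTABLE)" your upgraded analyser prints on each write is simply the
client asking for this deferred-commit behavior.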
NFS V3 is well described in a USENIX paper. I guess we don't have a reference
on our internal Web, but it's on the zk3.dec.com production machines at
/usr/specs/filesystem/nfs/nfsv3_usenix.ps.
The USENIX reference is:
Brian Pawlowski, Chet Juszczak, Peter Staubach, Carl Smith, Diane Lebel,
and Dave Hitz (Network Appliance, Digital Equipment Corp., and SunSoft
Inc.), "NFS Version 3: Design and Implementation", Proceedings of the
Summer 1994 USENIX Conference, Boston, MA, pp. 137-151.
Odd - USENIX doesn't appear to make that public, but I think I saw it at
netapp. Look for it with AltaVista.
The protocol itself is RFC 1813, but a PostScript version is available at
http://www.sun.com/sunsoft/connectathon/nfsv3.ps. I've been putting together
a page of information about analyzing NFS issues at
http://www.zk3.dec.com/~werme/support.html. It is only the barest beginnings,
as training support is not in my job description. To the best of my
knowledge, the customer support organization does nothing like it.
(Indeed, is training customer support part of customer support's
charter?) If anyone in this silly company is trying to build a WWW
support hints page, I'd love to hear
about it.
For that matter, if anyone knows how to fix the NOTES-USENET gateway for
this notes file, I wish he would - this Web interface is only good for
archival access and no one seems to have ported notes to Dunix. Mutter,
mutter.
Back to the issue at hand. What is the customer complaining about? Nothing
in your postings states a problem. Even the phrase "A customer complains that
one of their Alpha's running OSF 3.2C takes 2/3 of the network bandwidth..."
is not a statement of a problem, as I think it should take 100% of an idle
Ethernet. Unless, of course, CERN is writing the results of a quantum
mechanical simulation of Uranium fission, in which case I'd expect 1% network
bandwidth to be about right. What does the customer want to see? How big
of a file is he writing, how many NFS writes does it take? Do tcpdump
(or Sniffer) traces show gaps? (Really easy to look for with tcpdump and
xterm configured to buffer several hundred lines.) What else is on his
network? Are there routers between client and server? I haven't met a
Cisco router that didn't discard more messages than I think it should, and
dropped messages do dreadful things to NFS read and write performance.
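Gaps are also easy to find mechanically. Here is a throwaway sketch that
scans a saved tcpdump text capture for silences; it assumes the first
field of each line is the timestamp, as in standard tcpdump output, and
the file name and threshold are made up:

  #!/usr/bin/env python
  # Flag suspicious silences in a saved tcpdump text capture.
  # Assumes the first field of each line is a timestamp, either plain
  # seconds or hh:mm:ss.ffffff; names and threshold are invented.
  import sys

  THRESHOLD = 0.5          # seconds of silence worth flagging

  def to_seconds(stamp):
      if ":" in stamp:
          h, m, s = stamp.split(":")
          return int(h) * 3600 + int(m) * 60 + float(s)
      return float(stamp)

  prev = None
  for line in open(sys.argv[1] if len(sys.argv) > 1 else "nfs.tcpdump"):
      fields = line.split()
      if not fields:
          continue
      try:
          now = to_seconds(fields[0])
      except ValueError:
          continue             # not a packet line, skip it
      if prev is not None and now - prev > THRESHOLD:
          print("gap of %.3f s before: %s" % (now - prev, line.rstrip()))
      prev = now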
I'm still recovering from Connectathon, so I won't have time for further
help until I file expense reports, rebuild my workstation (I had to take it
to the show since I wanted something faster than a Sandpiper), catch up on
mail, review several specs, and dodge the next snowstorm.
[Posted by WWW Notes gateway]
| |||||