
Conference turris::digital_unix

Title:DIGITAL UNIX(FORMERLY KNOWN AS DEC OSF/1)
Notice:Welcome to the Digital UNIX Conference
Moderator:SMURF::DENHAM
Created:Thu Mar 16 1995
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:10068
Total number of notes:35879

9041.0. "NFS3 performance problem...caching ???" by PANTER::MARTIN (Be vigilant...) Wed Mar 05 1997 11:08

    A customer complains that one of their Alphas running OSF 3.2C takes
    2/3 of the network bandwidth (which in their case should correspond to
    some 400 KBytes/sec) when creating a file on an NFS-mounted file system.
    
    Both the NFS server and client are running Digital Unix v3.2C, and
    all the NFS patches have already been applied on both systems.
    
    Can somebody explain this NFS cache traffic to me, including why my
    Sniffer refers to it as obsolete?
    
    It seems to me that the two end systems have decided to do some caching
    of a very large file, thus generating a LOT of update traffic even when
    the user concerned is making no specific reference to the file.

    But what does the "Write to cache" procedure mean:
    
    	 NFS:  Proc = 7 (Write to cache)
    	 NFS:  (This procedure is obsolete)
    
    Does it have to do with Unix's buffer cache?
    And why is it referred to as obsolete?
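    For what it's worth, the "obsolete" label is consistent with the
    analyser applying the NFSv2 procedure table to NFSv3 traffic: in
    NFSv2 (RFC 1094) procedure 7 is WRITECACHE, which was never used and
    is marked obsolete, while in NFSv3 (RFC 1813) procedure 7 is plain
    WRITE.  A minimal sketch of the two tables (names from the RFCs):

```python
# NFS procedure-number tables from RFC 1094 (v2) and RFC 1813 (v3).
NFSV2_PROCS = {0: "NULL", 1: "GETATTR", 2: "SETATTR", 3: "ROOT",
               4: "LOOKUP", 5: "READLINK", 6: "READ",
               7: "WRITECACHE (obsolete)", 8: "WRITE", 9: "CREATE",
               10: "REMOVE", 11: "RENAME", 12: "LINK", 13: "SYMLINK",
               14: "MKDIR", 15: "RMDIR", 16: "READDIR", 17: "STATFS"}

NFSV3_PROCS = {0: "NULL", 1: "GETATTR", 2: "SETATTR", 3: "LOOKUP",
               4: "ACCESS", 5: "READLINK", 6: "READ", 7: "WRITE",
               8: "CREATE", 9: "MKDIR", 10: "SYMLINK", 11: "MKNOD",
               12: "REMOVE", 13: "RMDIR", 14: "RENAME", 15: "LINK",
               16: "READDIR", 17: "READDIRPLUS", 18: "FSSTAT",
               19: "FSINFO", 20: "PATHCONF", 21: "COMMIT"}

def decode_proc(vers, proc):
    """Return the procedure name for an NFS call, honoring the version."""
    table = NFSV3_PROCS if vers == 3 else NFSV2_PROCS
    return table.get(proc, "unknown")
```

    So a trace line saying "VERS=3 PROC=7" is an ordinary WRITE; a decoder
    that only knows the v2 table reports it as "Write cache (obsolete)".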
    
    Here is the customer's complaint:
    
>> We have unacceptable problems at the moment due to AXCA47 reading
>> via NFS from ASCA02. The reads are consuming 60-80% of the network
>> bandwidth in Building 892 AND they seem to be "wrong"!
>>
>> I am seeing repeated requests from AXCA47 as follows
>>
>> 12:07:00.3736  [137.138.28.137]  .axca47.cern.ch NFS C F=896C Read 0 at 25165824
>> 12:07:00.3738  [137.138.28.137]  .axca47.cern.ch NFS C F=896C Read 0 at 25165824
>> 12:07:00.3923  [137.138.28.137]  .axca47.cern.ch NFS C F=896C Read 0 at 25165824
>> 12:07:00.3930  [137.138.28.137]  .axca47.cern.ch NFS C F=896C Read 0 at 25165824
>> 
>> to which ASCA02 is each time replying with 8k bytes!
>> ASCA02 has now sent 522068000 bytes !!!!!
>> 
>> I do not know what file corresponds to F=896C : the details of the read
>> request are given by the Sniffer as
>>  
>>> NFS:  Proc = 6 (Read from file)
>>> NFS:  File handle = 000000202C3179FE0A0000000C000000
>>> NFS:                D0030000048000000C00000002000000
>>> NFS:  Offset = 25165824
>>> NFS:  Count = 0
>>> 
>>
>> (and the offset never changes!).
>> 
>> Now, I am not expert enough in NFS to know what is meant when there is
>> a zero value for the count, nor an unchanging value for the offset.
>> Maybe it is some special sort of file.
>> 
>> I can see that the data being returned is slightly different in each
>> 8k block! I can also see that my Sniffer analysis thinks that there
>> are still over 500000000 bytes to come.
>>  
>> If it really is 500000000 bytes, then that can occupy an Ethernet at 75%
>> for over 10 minutes. However, what I saw earlier was SEVERAL simultaneous
>> read requests on different files!
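    A plausible reading of the odd "Read 0" decodes: the READ argument
    layouts differ between versions.  NFSv2 expects a fixed 32-byte file
    handle followed by 32-bit offset, count, and totalcount words; NFSv3
    sends a variable-length handle (a length word plus opaque data), a
    64-bit offset, and a 32-bit count.  A decoder assuming the v2 layout
    pulls its words from the wrong positions, which readily yields a zero
    count (the high half of a 64-bit offset) and a constant "offset".
    Sketch below; the handle contents are hypothetical and real packets
    also carry the RPC header, so this only illustrates the class of
    error, not the exact byte positions in the trace:

```python
import struct

def pack_v3_read(fh: bytes, offset: int, count: int) -> bytes:
    """XDR-encode NFSv3 READ arguments (RFC 1813): variable-length
    opaque file handle, 64-bit offset, 32-bit count."""
    pad = (-len(fh)) % 4                 # XDR pads opaque data to 4 bytes
    return (struct.pack(">I", len(fh)) + fh + b"\0" * pad
            + struct.pack(">QI", offset, count))

def misdecode_as_v2(payload: bytes):
    """Decode the same bytes the NFSv2 way (RFC 1094): skip a fixed
    32-byte handle, then read 32-bit offset and count words.  Both words
    land in the middle of the v3 handle and offset fields."""
    offset, count = struct.unpack_from(">II", payload, 32)
    return offset, count

# A 32-byte handle (contents hypothetical), reading 8 KB at offset 24 MB:
wire = pack_v3_read(bytes(32), 25165824, 8192)
print(misdecode_as_v2(wire))   # -> (0, 0)
```

    With a non-zero tail in the handle, the misdecoded "offset" becomes a
    constant non-zero value, matching the unchanging offset in the trace.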
 

    
    ======================================================================
    Trace below.
    First packet in detail
    
    DLC:  Destination = Station 0A003080D329
    DLC:  Source      = Station 0000F820574A
    DLC:  Ethertype  = 0800 (IP)
    
    IP:   Version = 4, header length = 20 bytes
    IP:   Type of service = 00
    IP:         000. .... = routine
    IP:         ...0 .... = normal delay
    IP:         .... 0... = normal throughput
    IP:         .... .0.. = normal reliability
    IP:   Total length = 1500 bytes
    IP:   Identification = 62737
    IP:   Flags = 2X
    IP:   .0.. .... = may fragment
    IP:   ..1. .... = more fragments
    IP:   Fragment offset = 0 bytes
    IP:   Time to live = 30 seconds/hops
    IP:   Protocol = 17 (UDP)
    IP:   Header checksum = 9BEC (correct)
    IP:   Source address = [128.141.191.114], axca92.cern.ch
    IP:   Destination address = [137.138.28.137]
    IP:   No options
    
    UDP:  Source port = 1017 (Sun RPC)
    UDP:  Destination port = 2049
    UDP:  Length = 8344 (not all data contained in this fragment)
    UDP:  Checksum = 4EFD (Can't verify, packet incomplete)
    
    RPC:  Transaction id = 2270736662
    RPC:  Type = 0 (Call)
    RPC:  RPC version = 2
    RPC:  Program = 100003 (NFS), version = 3
    RPC:  Procedure = 7 (Write to cache)
    RPC:  Credentials: authorization flavor = 1 (Unix)
    RPC:   len = 48, stamp = 857039708
    RPC:   machine = axca92.cern.ch
    RPC:   uid = 659, gid = 1227
    RPC:   3 other group id(s):
    RPC:    gid 33536
    RPC:    gid 32533
    RPC:    gid 1227
    RPC:  Verifier: authorization flavor = 0 (Null)
    RPC:  [Verifier: 0 byte(s) of authorization data]
    
    NFS:  Proc = 7 (Write to cache)
    NFS:  (This procedure is obsolete)
    NFS:  *** 1384 byte(s) of additional data present ***
    
    
    
      1574    0.8286  [137.138.28.137]   axca92.cern.ch      DLC
    Ethertype=0800, size=1514 bytes
            IP  D=[137.138.28.137] S=[128.141.191.114] LEN=1480 ID=62737
            UDP D=2049 S=1017 LEN=8344
            RPC C XID=2270736662 PROG=NFS VERS=3 PROC=7
            NFS C Write cache (obsolete)
      1575    0.0124  [137.138.28.137]   axca92.cern.ch      DLC
    Ethertype=0800, size=1514 bytes
            IP  D=[137.138.28.137] S=[128.141.191.114] LEN=1480 ID=62737
            UDP continuation ID=62737
      1576    0.0026  [137.138.28.137]   axca92.cern.ch      DLC
    Ethertype=0800, size=1514 bytes
            IP  D=[137.138.28.137] S=[128.141.191.114] LEN=1480 ID=62737
            UDP continuation ID=62737
      1577    0.2047  [137.138.28.137]   axca92.cern.ch      DLC
    Ethertype=0800, size=1514 bytes
            IP  D=[137.138.28.137] S=[128.141.191.114] LEN=1480 ID=62737
            UDP continuation ID=62737
      1578    0.0010  [137.138.28.137]   axca92.cern.ch      DLC
    Ethertype=0800, size=978 bytes
            IP  D=[137.138.28.137] S=[128.141.191.114] LEN=944 ID=62737
            UDP continuation ID=62737
      1579    2.3894  [137.138.28.137]   axca92.cern.ch      NFS
    retransmission
            DLC Ethertype=0800, size=1514 bytes
            IP  D=[137.138.28.137] S=[128.141.191.114] LEN=1480 ID=62738
            UDP D=2049 S=1017 LEN=8344
            RPC C XID=2270736662 PROG=NFS VERS=3 PROC=7
            NFS C Write cache (obsolete)
      1580    0.0017  [137.138.28.137]   axca92.cern.ch      DLC
    Ethertype=0800, size=1514 bytes
    .
    .
    .
    ======================================================================
9041.1NFS read with zero count ?PANTER::MARTINBe vigilant...Thu Mar 06 1997 05:4916
    Other question arising from this problem :
    
    Why does a read which specifies a particular offset and a count of zero
    keep getting different chunks (8K?) of data?
    I had understood that a read with a count of zero should simply get
    an OK response, as if it were a keep-alive.
    
    Thanks in advance for your explanations...
    
    
    Cheers,
    				============================
    				Alain MARTIN/SSG Switzerland
    
    
    
9041.2nfsstat and netstat statistics...LEMAN::MARTIN_ABe vigilant...Thu Mar 06 1997 10:02118
    Here is the nfsstat output :
    
    ************************************************************************
    # nfsstat
    
    Server rpc:
    calls      badcalls   nullrecv   badlen     xdrcall
    0          0          0          0          0
    
    Server nfs:
    calls      badcalls
    0          0
    
    Server nfs V2:
    null       getattr    setattr    root       lookup     readlink   read
    0  0%      0  0%      0  0%      0  0%      0  0%      0  0%      0  0%
    wrcache    write      create     remove     rename     link       symlink
    0  0%      0  0%      0  0%      0  0%      0  0%      0  0%      0  0%
    mkdir      rmdir      readdir    statfs
    0  0%      0  0%      0  0%      0  0%
    
    Server nfs V3:
    null       getattr    setattr    lookup     access     readlink   read
    0  0%      0  0%      0  0%      0  0%      0  0%      0  0%      0  0%
    write      create     mkdir      symlink    mknod      remove     rmdir
    0  0%      0  0%      0  0%      0  0%      0  0%      0  0%      0  0%
    rename     link       readdir    readdir+   fsstat     fsinfo     pathconf
    0  0%      0  0%      0  0%      0  0%      0  0%      0  0%      0  0%
    commit
    0  0%
    
    Client rpc:
    calls      badcalls   retrans    badxid     timeout    wait       newcred    badverfs   timers
    2880159    7          6636       8          6643       0          0          0          19980
    
    Client nfs:
    calls      badcalls   nclget     nclsleep
    2880117    2          2880151    0
    
    Client nfs V2:
    null       getattr    setattr    root       lookup     readlink   read
    0  0%      2235  0%   0  0%      0  0%      400  0%    134  0%    0  0%
    wrcache    write      create     remove     rename     link       symlink
    0  0%      0  0%      0  0%      0  0%      0  0%      0  0%      0  0%
    mkdir      rmdir      readdir    statfs
    0  0%      0  0%      17  0%     67  0%
    
    Client nfs V3:
    null       getattr    setattr    lookup     access     readlink   read
    0  0%      3128  0%   22  0%     3613  0%   5022  0%   96  0%     2328091 80%
    write      create     mkdir      symlink    mknod      remove     rmdir
    525130 18% 46  0%     0  0%      0  0%      0  0%      32  0%     2  0%
    rename     link       readdir    readdir+   fsstat     fsinfo     pathconf
    15  0%     0  0%      182  0%    3749  0%   142  0%    65  0%     0  0%
    commit
    7929  0%
    ************************************************************************
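    From the client RPC counters above, the retransmission and timeout
    rates can be computed directly.  A timeout count far above badxid
    usually points at packets being lost in the network rather than a
    slow server (badxid counts replies that arrive after the client has
    already retransmitted):

```python
# Client RPC counters from the nfsstat output above.
calls, badcalls, retrans, badxid, timeouts = 2880159, 7, 6636, 8, 6643

retrans_rate = 100.0 * retrans / calls
timeout_rate = 100.0 * timeouts / calls
print(f"retrans: {retrans_rate:.2f}%  timeouts: {timeout_rate:.2f}%")
# prints: retrans: 0.23%  timeouts: 0.23%
# timeouts (6643) vastly exceed badxid (8): replies are not arriving
# late, the requests or replies are simply being dropped.
```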
    
    # netstat -p udp
    udp:
            2908416 packets sent
            2953433 packets received
            0 incomplete headers
            0 bad data length fields
            692 bad checksums
            0 full sockets
            41565 for no port (41496 broadcasts, 0 multicasts)
    # netstat -p ip
    ip:
            14484817 total packets received
            0 bad header checksums
            0 with size smaller than minimum
            0 with data size < data length
            0 with header length < data size
            0 with data length < header length
            13817112 fragments received
            0 fragments dropped (dup or out of space)
            205 fragments dropped after timeout
            0 packets forwarded
            87 packets not forwardable
            0 redirects sent
    
    # netstat -I tu0 -s
    
    tu0 Ethernet counters at Thu Mar  6 15:59:58 1997
    
           65535 seconds since last zeroed
      2486930378 bytes received
      4294967257 bytes sent
        15227760 data blocks received
         5538214 data blocks sent
       102512090 multicast bytes received
          807449 multicast blocks received
               0 multicast bytes sent
               0 multicast blocks sent
           77530 blocks sent, initially deferred
         1305417 blocks sent, single collision
         2203918 blocks sent, multiple collisions
            5637 send failures, reasons include:
                    Excessive collisions
               0 collision detect check failure
              18 receive failures, reasons include:
                    Block check error
                    Framing Error
               0 unrecognized frame destination
               0 data overruns
               0 system buffer unavailable
               0 user buffer unavailable
    #
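    A quick rate calculation on the tu0 counters above.  (Note that
    "65535 seconds since last zeroed" and ~4294967257 bytes sent look
    like saturated or wrapped counters, so the absolute totals are
    suspect.)  Since an 8 KB NFS-over-UDP datagram spans about six
    Ethernet-sized IP fragments and losing any one fragment loses the
    whole datagram, even a small per-frame drop rate multiplies:

```python
# tu0 counters from the netstat -I output above.
blocks_sent = 5538214
single, multiple = 1305417, 2203918
excessive = 5637                  # send failures: excessive collisions

collided = 100.0 * (single + multiple) / blocks_sent
dropped = 100.0 * excessive / blocks_sent
frags_per_8k = 6                  # ~8 KB datagram / ~1480-byte fragments
datagram_loss = 100.0 * (1.0 - (1.0 - dropped / 100.0) ** frags_per_8k)
print(f"collided sends: {collided:.0f}%")
print(f"frames dropped: {dropped:.2f}%")
print(f"approx 8 KB datagram loss: {datagram_loss:.1f}%")
```

    So a ~0.1% per-frame drop rate becomes roughly 0.6% loss per 8 KB NFS
    transfer, which is the kind of figure that makes UDP-based NFS crawl.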
9041.3Anybody online ???PANTER::MARTINBe vigilant...Fri Mar 07 1997 09:183
    Anybody online?  Should I IPMT this to get an answer?
    
    Alain
9041.4An IPMT would ensure you get some kind of attentionUNIFIX::HARRISJuggling has its ups and downsSun Mar 09 1997 22:3411
    If you have a customer problem, the only reliable way to get official
    support is via an official support channel.  An IPMT is one such
    support channel.
    
    As a side note, the most experienced NFS engineer around here happens
    to be away at a 2 week long connect-a-thon and as such has been unable
    to read and answer any notes.  Whether he has any answers or any time
    to catch up on notes after he gets back is an entirely different
    question.
    
    					Bob Harris
9041.5yes, but could be applic. problem...LEMAN::MARTIN_ABe vigilant...Mon Mar 10 1997 04:1645
    Thanks Bob for your answer.
    
    I understand that for official escalation, IPMT is the way to
    proceed. However I strongly suspect an application problem.
    
    Actually I only expect unofficial answers and experience sharing...
    which can help me narrow down where the problem is.
    I do not want to overload engineering with application problems
    but want to make sure there isn't something wrong with NFS3!
    
    However I can say that the original statements reported in .0 and .1 
    already evolved.
    
    The question :
    
    >> Why a read which specifies a particular offset and a count of zero
    >> keeps getting different chunks (8K?) of data.
    >> I had understood that a read with a count of zero should simply get
    >> an OK response, like it was a keep-alive.
    
    was due to the network analyser, which wasn't able to properly
    decode NFS3 messages!
    
    After a SW upgrade of the net analyser, it decodes the "count 0,
    offset n" reads differently: they become something like
    
    >> NFS C READ FH=21CC at 22798336 for 8192
    
    and the writes turn into
    
    >> NFS C WRITE FH=21CC at 60997632 for 8192 (UNSTABLE)
    
    so it does seem as if it is reading and writing the same file, but
    in different places (treating it as a random access file!) all the
    time.
    
    However I don't know why it says "UNSTABLE"!  Any explanation ???
    
    Thanks for your co-operation,
    
    					============================
    					Alain MARTIN/SSG Switzerland
    
    
    
9041.6See note 2328.0, see http://www.zk3.dec.com/~werme/support.htmlNETRIX::"[email protected]"Ric WermeWed Mar 12 1997 09:51103
PANTER::MARTIN (Be vigilant...) writes in .0 or whatever:
    A customer complains that one of their Alphas running OSF 3.2C takes
    2/3 of the network bandwidth (which in their case should correspond to
    some 400 KBytes/sec) when creating a file on an NFS mounted file system.

What sort of network has 600-700 KB/sec capacity?  Before analyzing any
performance problem, it is vital to know what the underlying fabric is.

    Both the NFS server and client are running Digital Unix v3.2C, and
    all the NFS patches have already been applied on both systems.
    
  Can somebody explain this NFS cache traffic to me, including why my
  Sniffer refers to it as obsolete?

Your traces, as you have figured out, are from obsolete Sniffer code.  You
have not posted new traces.  Sniffers are useful in chasing low-level
performance problems, especially for looking at CRC errors and the like.  If
the network is running well (tested via spray, ttcp, etc.), then dealing with
Sniffer traces is a pain in the neck because they aren't as malleable as
tcpdump traces.  See note 2328.0.  (But note that tcpdump in V3.x doesn't
decode NFS V3 any better than your old Sniffer code.)

I don't understand what problem you are addressing.  In .0 you talk about
data caching and updates, but that may be misled by the Sniffer "write cache"
messages.  In .1 you're confused by old Sniffer code and read messages.  .2
shows a very happy nfsstat (but server nfsstat data discloses very little
about NFS performance problems - I run nfsstat on the client and look at the
number of timeouts).  netstat suggests that you have an Ethernet, which has
considerably higher throughput than 600 KB/sec.  The collision rate suggests
that it's very busy.  The excessive collision rate is pushing 1%, a loss rate
that devastates NFS performance.  .5 asks about unstable writes.  Those are a
key addition to NFS V3 and are what allows us to get 6 MB/sec NFS write rates
to disk on Sandpipers.  I wish I had time to measure performance on current
hardware.  With the sort of loss rate you post, NFS over TCP may help, but
that requires Digital UNIX V4.0.
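To expand on unstable writes: an NFSv3 WRITE carries a stability flag
(UNSTABLE, DATA_SYNC, or FILE_SYNC).  With UNSTABLE the server may reply
before the data reaches disk; the client later sends COMMIT and compares
the write verifier returned by both calls.  If the server rebooted in
between, the verifier differs and the client must resend the uncommitted
writes.  A toy model of that protocol (class names made up, not any real
API):

```python
import os

class Nfs3Server:
    """Toy model of NFSv3 unstable writes and COMMIT (RFC 1813)."""
    def __init__(self):
        self.verifier = os.urandom(8)   # changes on every server restart
        self.unstable = {}              # offset -> data, not yet on "disk"
        self.disk = {}

    def write(self, offset, data, stable=False):
        if stable:
            self.disk[offset] = data      # FILE_SYNC: on disk before reply
        else:
            self.unstable[offset] = data  # UNSTABLE: lost if server crashes
        return self.verifier

    def commit(self):
        self.disk.update(self.unstable)
        self.unstable.clear()
        return self.verifier

class Nfs3Client:
    def __init__(self, server):
        self.server, self.pending = server, {}

    def write(self, offset, data):
        self.pending[offset] = data
        self.verf = self.server.write(offset, data, stable=False)

    def commit(self):
        # A changed verifier means the server restarted and may have lost
        # the uncommitted data: resend every pending write synchronously.
        if self.server.commit() != self.verf:
            for off, data in self.pending.items():
                self.server.write(off, data, stable=True)
        self.pending.clear()
```

This is why UNSTABLE is fast: the server can reply from memory and batch
its disk I/O at COMMIT time, at the cost of the client keeping the data
around until the commit succeeds.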

NFS V3 is well described in a USENIX paper.  I guess we don't have a reference
on our internal Web, but it's on the zk3.dec.com production machines at
/usr/specs/filesystem/nfs/nfsv3_usenix.ps.

The USENIX reference is:
      Author: Brian Pawlowski
      Author: Chet Juszczak
      Author: Peter Staubach
      Author: Carl Smith
      Author: Diane Lebel 
      Author: Dave Hitz
       Title: NFS Version 3: Design and Implementation 
       Pages: 137-151
   Publisher: USENIX
 Proceedings: USENIX Conference Proceedings
        Date: Summer, 1994
    Location: Boston, MA
 Institution: Network Appliance
 Institution: Digital Equipment Corp
 Institution: SunSoft Inc.
 Institution: SunSoft Inc.
 Institution: Digital Equipment Corp.
 Institution: Network Appliance

Odd - USENIX doesn't appear to make that public, but I think I saw it at
netapp.  Look for it with AltaVista.

The protocol itself is RFC 1813, but a PostScript version is available at
http://www.sun.com/sunsoft/connectathon/nfsv3.ps.  I've been putting together
a page of information about analyzing NFS issues at
http://www.zk3.dec.com/~werme/support.html.  It is only the barest beginnings,
as training support is not in my job description.  To the best of my
knowledge, the customer support organization does nothing like it.  (Indeed,
is training customer support part of customer support's charter?)  If anyone
in this silly company is trying to build a WWW support hints page, I'd love
to hear about it.

For that matter, if anyone knows how to fix the NOTES-USENET gateway for
this notes file, I wish he would - this Web interface is only good for
archival access and no one seems to have ported notes to Dunix.  Mutter,
mutter.


Back to the issue at hand.  What is the customer complaining about?  Nothing
in your postings states a problem.  Even the phrase "A customer complains that
one of their Alphas running OSF 3.2C takes 2/3 of the network bandwidth..."
is not a statement of a problem, as I think it should take 100% of an idle
Ethernet.  Unless, of course, CERN is writing the results of a quantum
mechanical simulation of Uranium fission, in which case I'd expect 1% network
bandwidth to be about right.  What does the customer want to see?  How big
a file is he writing, and how many NFS writes does it take?  Do tcpdump
(or Sniffer) traces show gaps?  (Really easy to look for with tcpdump and an
xterm configured to buffer several hundred lines.)  What else is on his
network?  Are there routers between client and server?  I haven't met a
Cisco router that didn't discard more messages than I think it should, and
dropped messages do dreadful things to NFS read and write performance.

I'm still recovering from Connectathon, so I won't have time for further
help until I file expense reports, rebuild my workstation (I had to take it
to the show since I wanted something faster than a Sandpiper), catch up on
mail, review several specs, and dodge the next snowstorm.

[Posted by WWW Notes gateway]