T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
5362.1 | second try (device was full) | UTRTSC::KNOPPERS | Oswald Knoppers | Thu Mar 20 1997 09:42 | 34 |
| Hi,
One of our customers is experiencing performance problems with NFS. I can
reproduce (some) of this.
My setup is simple, I have two alphas connected through FDDI. One runs
Digital UNIX and the other UCX V4.1 ECO 4.
On the VMS side I export a vms directory.
Copying a text file of 85MB from UNIX to VMS takes 12:51 minutes. This is
about 881 kbits/second.
Copying the same file from VMS to UNIX takes 40 seconds (about
17 Mbits/second). A difference of a factor of about 20!
Where could this big difference come from?
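The arithmetic behind the figures above can be sketched quickly; a minimal check, assuming decimal megabytes (85,000,000 bytes):

```python
# Sanity-check the throughput figures quoted above (decimal MB assumed).
FILE_BYTES = 85_000_000              # the 85 MB test file

def throughput_bits_per_sec(seconds):
    """Effective transfer rate of the test file over the given wall time."""
    return FILE_BYTES * 8 / seconds

slow = throughput_bits_per_sec(12 * 60 + 51)   # UNIX -> VMS: 12:51
fast = throughput_bits_per_sec(40)             # VMS -> UNIX: 0:40

print(f"write to VMS:  {slow / 1000:.0f} kbit/s")   # ~882 kbit/s
print(f"read from VMS: {fast / 1e6:.1f} Mbit/s")    # 17.0 Mbit/s
print(f"ratio: {fast / slow:.1f}x")
```

The ratio works out to a little over 19x, consistent with the "factor 20" quoted above.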
In the 'fast' case, I see the NFS server process take about 70-80% of the
CPU time. In the slow case the server process only consumes 5-15% of the
(otherwise idle) CPU.
We have not yet had the opportunity to do an FDDI trace; however, we did make
one at the customer's site. One thing we noticed was that the VMS system
'waited' between 20 and 100 milliseconds (average about 30) between
receiving a packet from the UNIX host (in the customer's case an HP system)
and sending the next packet back. What could be the cause of this delay?
Any thoughts?
Regards,
Oswald
|
5362.2 | Now you know the rest of the story | LASSIE::GEMIGNANI | | Fri Mar 21 1997 15:09 | 49 |
| When copying from the VMS system, the file is accessed and the data is
returned as it is requested. There is a readahead algorithm in there
which also speeds up the reads.
When writing to the VMS system, the following things occur:
1) A file is created
1A) If ACP_DATACHECK is on, then all operations are verified
by re-reading the data and comparing the two copies
2) the file is entered into a directory
2A) If ACP_DATACHECK is on, then all operations are verified
by re-reading the data and comparing the two copies
A write loop performs the following operations for each write:
3) The file is extended and the attributes are written with the
latest (current) time
3A) If ACP_DATACHECK is on, then all operations are verified
by re-reading the data and comparing the two copies
3B) If HIGHWATER marking is on, then the XQP will write the newly
allocated area full of NULLs
4) The data is written to disk and an NFS reply is NOT generated
until the QIO completes
So, think of it this way:
1) Allocate a file header out of the bitmap
2) DATACHECK the modifications
3) Write the file header
4) DATACHECK the write
5) Modify the directory (could be extend and split operations here)
6) DATACHECK the write(s)
Loop:
7) Modify the file header, possibly extending
8) DATACHECK the modifications
9) Clear the newly allocated area
10) Write the data
Doing a calculation reveals that:
The 85MB read probably took about 10,900 QIOs.
The 85MB write probably took in excess of 43,500 QIOs.
If you turn off HIGHWATER marking on the volume, and turn off
DATACHECK, then you are probably reducing the write QIOs to
around 11,700. Your write performance could increase greatly.
NOTE: the HIGHWATER and DATACHECK operations are under the control
of the F11BXQP, and we (NFS) can't circumvent them.
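The QIO arithmetic above can be sketched numerically. The 8 KB transfer size, binary megabytes, and one-QIO-per-step costing are assumptions for illustration only (they land near the quoted ~10,900 and 43,500 figures, and near the 21,700 corrected in .3), not a measurement of real F11BXQP behaviour:

```python
# Rough QIO estimate behind the figures above.
FILE_BYTES = 85 * 1024 * 1024        # 85 MB, binary
NFS_XFER = 8192                      # assumed NFS read/write size

ops = FILE_BYTES // NFS_XFER         # ~10,880 transfers for the whole file

read_qios = ops                      # read: one data QIO per transfer
# write loop: header update + DATACHECK re-read + HIGHWATER clear + data write
write_qios_full = ops * 4            # DATACHECK and HIGHWATER both on
write_qios_min = ops * 2             # both off: header update + data write

print(read_qios, write_qios_full, write_qios_min)
```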
|
5362.3 | | LASSIE::GEMIGNANI | | Fri Mar 21 1997 15:10 | 1 |
| Make that 21,700, not 11,700. It's still a big difference.
|
5362.4 | not very impressive | UTRTSC::KNOPPERS | Oswald Knoppers | Mon Mar 24 1997 06:32 | 12 |
| It does make a difference, but not much. With DATACHECK and HIGHWATER
enabled the NFS$SERVER process uses 61675 I/Os, and without them this goes
down to 51618.
The elapsed time for the copy goes down from 12:54 to 9:12, about 25%.
Still much slower than copying in the other direction.
Also, according to MONITOR, the system is idle 90% of the time...
All suggestions welcome...
Oswald
|
5362.5 | | UCXAXP::GEMIGNANI | | Fri Mar 28 1997 18:25 | 7 |
| Take a look at the number of retries in the NFS display.
Be sure that the timeout on your client is sufficiently high to
keep the number of retries down. A retried write operation is VERY
expensive (as are ALL writes).
-John
|
5362.6 | | UTRTSC::KNOPPERS | Oswald Knoppers | Wed Apr 09 1997 08:34 | 44 |
| How do you check for retries? I don't see it in the nfs counters. Anyway,
after one of these copies I get the following:
Server: NFS$SERVER Loaded: 9-APR-1997 14:14:30.44
Status: ACTIVE Running: 0 00:10:46.21
Memory allocated 441212 RPC errors
Message processing: Authentication 0
Threads busy 0 Others 0
Threads free 39 Mount data base:
Max. threads busy 8 Mounted File Systems 0
Duplicate cache xid 0 Current users 0
Duplicate active xid 1 Maximum mounted 0
Dropped 0 Maximum users 0
Data exchange: NFS operations:
Bytes sent 996544 null 0 getattr 3
Bytes rcvd 86345740 setattr 0 lookup 2
Messages sent 10375 readlink 0 rename 0
Messages rcvd 10376 read 0 write 10368
Max. message sent 744 statfs 0 create 1
Max. message rcvd 8328 remove 0 link 0
Open files: symlink 0 mkdir 0
Maximum opened 1 rmdir 0 readdir 1
Closed per interval 0 Total NFS operations 10375
Currently opened 1 Error messages 1
The command on the unix system was:
% time cp mario /mnt
0.08u 5.99s 10:24 0% 0+1k 10382+10369io 0pf+0w
%
In this case the file 'mario' is 84934656 bytes in size, so on this try I got
a throughput of 1088905.8 bits/second. I find this very poor, especially for
FDDI.
Again, according to MONITOR, the VMS system is doing nothing... So I have no
idea why we get this poor performance.
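The quoted figure follows directly from the `time cp` output above (10:24 of wall-clock time for the 84934656-byte file):

```python
# Reproduce the throughput figure quoted above from the `time cp` output.
file_bytes = 84934656                # size of 'mario'
elapsed = 10 * 60 + 24               # 10:24 wall-clock time, in seconds

bps = file_bytes * 8 / elapsed
print(f"{bps:.1f} bits/second")      # ~1.09 Mbit/s on a 100 Mbit/s FDDI ring
```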
Any more suggestions?
Thanks,
Oswald
|
5362.7 | There are two to look for ... | LASSIE::GEMIGNANI | | Thu Apr 10 1997 19:35 | 11 |
|
"Duplicate cache XID" means that we got a duplicate after sending a
reply back. The cache contains only those operations which cannot be
executed twice and return the same result:
create, delete, rename, link, symlink, etc.
"Duplicate active XID" means that we got a duplicate while we are
working on the original packet. Unlike the previous case, this applies
to all NFS requests.
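The two counters described above can be modelled with a small xid-keyed cache. This is a toy sketch of the idea only; the names and structure are illustrative, not the real NFS server code:

```python
# Toy model of the two duplicate-XID cases described above.
IN_PROGRESS = object()               # sentinel: original request still running

cache = {}                           # xid -> IN_PROGRESS or a saved reply
counters = {"dup_active": 0, "dup_cache": 0}

def handle(xid, non_idempotent, execute):
    state = cache.get(xid)
    if state is IN_PROGRESS:
        counters["dup_active"] += 1  # duplicate while original is in progress
        return None                  # drop it: the original will reply
    if state is not None:
        counters["dup_cache"] += 1   # duplicate after the reply was sent
        return state                 # resend the saved reply, don't re-execute
    cache[xid] = IN_PROGRESS
    reply = execute()
    if non_idempotent:
        cache[xid] = reply           # only create/remove/rename/link etc.
    else:
        del cache[xid]               # idempotent ops are not cached
    return reply

# A retried CREATE arriving after the reply was sent hits the duplicate cache:
handle(7, True, lambda: "created")
assert handle(7, True, lambda: "boom") == "created"
```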
|
5362.8 | | UTRTSC::KNOPPERS | Oswald Knoppers | Fri Apr 11 1997 03:23 | 9 |
| Re .-1
Ok, thanks. So we have only one "Duplicate active XID" and no "Duplicate
cache XID", so what else could be the cause of the poor performance? Does
UDP have something similar to TCP_DELAY?
Regards,
Oswald
|
5362.9 | ?updating file header? | BACHUS::VANHAVERE | Willy Van Havere 8786 | Fri Apr 11 1997 09:34 | 12 |
| From my own experience (watch the XQP activity caused by the NFS$SERVER
with SET WATCH FILE/CLASS=ALL) and from reading this notesfile
(2048.*, 4535.*), it seems that a lot of time is spent updating the
file header, either after each extend or, even worse, as 4535.1 suggests,
after each I/O.
The question then of course becomes: can one do something about this?
Increase the extend size?
Don't update the header each time?
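How much a larger extend size would help can be sketched with a little arithmetic. The quantities below are illustrative values, not VMS defaults:

```python
import math

# File-extend operations an 85 MB write needs at a given extend quantity
# (extend quantity expressed in 512-byte blocks; values are illustrative).
FILE_BYTES = 85 * 1024 * 1024
BLOCK = 512

def extend_ops(extend_blocks):
    return math.ceil(FILE_BYTES / (extend_blocks * BLOCK))

print(extend_ops(16))     # small quantity: one extend per 8 KB of data
print(extend_ops(2048))   # 1 MB extents: two orders of magnitude fewer
```

If each extend also forces a header update (plus a DATACHECK re-read when enabled), a larger extend quantity removes that cost from almost every write in the loop.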
|
5362.10 | | LASSIE::GEMIGNANI | | Mon Apr 14 1997 18:45 | 4 |
| Some work on the extend size is being looked at. As for writing
the file header every time -- that's an NFS requirement: we must
ensure that all structures are stable (on disk) before we return a
successful response to the NFS client.
|