Title: | dec_mls_plus |
Moderator: | SMURF::BAT |
Created: | Mon Nov 29 1993 |
Last Modified: | Thu Jun 05 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 534 |
Total number of notes: | 2544 |
On an otherwise idle system (root was the only user logged on to any node), the following command was issued on host sebastian:

    ls -sR | sort

This resulted in the message:

    NFS Slookup failed for server bashful: RPC: Timed out
    ./ace1/accplib/NOS/PF/CYBER/IRWD2 not found

I then logged on to the host bashful and entered the following:

    cd /ace1/accplib/NOS/PF/CYBER
    lsacl IRWD2

which resulted in this output:

    # file:IRWD2
    # owner:accplib
    # group:users
    user::rwx
    mask::rwx
    user:rtdr:r--
    user:tlcsrll:r--
    user:tlrcwah:r--
    user:tlrcfwp:r--
    user:tlrcelh:r--
    user:tlruwah:r--
    user:tlruelh:r--
    user:tlrcfts:r--
    user:tlruhab:r--
    group::rwx
    group:rtdr:r--
    group:tlrcfts:r--
    other::r-x

I subsequently entered this on sebastian:

    ls -sR ./ace1/accplib/NOS/PF/CYBER/IRWD2

with this result:

    7 ./ace1/accplib/NOS/PF/CYBER/IRWD2

If this were an isolated incident, we would not be concerned, but this happens frequently, sometimes on systems that are busy, sometimes on systems that don't appear to have much of a load. We would appreciate any suggestions and/or fixes to prevent this problem.

Sam
T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
505.1 | patch level | RHETT::AMAN | Thu May 15 1997 16:08 | 9 | |
One note on this customer site. He currently has Level 9 patches installed. He plans to pull and install the Level 10 patches soon. This also applies to note #506.

Thanks,
janet
csc/cs 770-514-1050
505.2 | OK | SMURF::BAT | Segui la tua beatitudine | Thu May 15 1997 19:18 | 4 |
In note 466.4, Martin says that the customer installed PK#10. If they are saying they only have PK#9 installed, I presume they de-installed PK#10 because they were having performance problems. I never heard whether they rebuilt their TNETDB, or whether the problems went away.
505.3 | current level | RHETT::AMAN | Wed May 21 1997 17:39 | 30 | |
Here's what the customer says about the patch levels -

Janet,

Funny you should mention patch levels. We just recently installed patch level 9 (about a month ago). The NFS timeouts seemed to have gotten much worse. About a week after putting this patch level in, Martin Moore sent me an e-mail indicating we should have the TNETDB file rebuilt as part of this. This was done. There were fewer nfs timeouts, but our activity level had subsided due to other reasons.

I put patch level 10 in Sunday, May 18. We installed a new emulator Tuesday, May 20, early in the A.M. In the past two days there has been a lot more activity, a lot more people logging in to test the emulator, and we have had NO nfs timeouts. We'll keep an eye out for these, but so far so good. I will try a mltape backup tonight that has consistently displayed the acl / memory allocation error.

If your question was to imply that these problems began at any given patch level, the answer is we have had the mltape problem ever since we have had the large number of user files on the system (several months). We have had nfs timeout problems off and on ever since the system was installed (May/June 1995). The i/o error from nfs served disk due to unknown user indexes was only noticed in the relatively recent past.

----------

Thanks,
janet
770-514-1050
505.4 | good | SMURF::BAT | Segui la tua beatitudine | Wed May 21 1997 21:15 | 15 |
Ah, now it is getting clearer. PK#10 is the one where, after you install it, you must remove your TNETDB before rebooting. That did not apply to PK#9.

I can look back at PK#9 and see if there was something NFS related that might have explained the timeout problem, but if they aren't complaining any more then why bother?

PK#10 contained a major change to the way security attributes get hashed into the structure of the TNETDB, i.e., how the attributes are stored into buckets. It was expected to improve performance... but not if you didn't start with a fresh TNETDB.

mltape is a different problem altogether.
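On the PK#10 hashing change: the toy sketch below illustrates, in general terms, why a better bucket function might not help until the database is rebuilt. It is not the TNETDB format or the PK#10 algorithm. Both bucket functions, the table layout, and the use of account names from the ACL listing as stand-in keys are assumptions made purely for illustration.

```python
# Toy illustration only: NOT the real TNETDB layout or the PK#10 hash.

def old_bucket(key, n):
    # Hypothetical "old" scheme: only the first character picks the bucket.
    return ord(key[0]) % n

def new_bucket(key, n):
    # Hypothetical "new" scheme: every character contributes to the bucket.
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) % n
    return h

def build(keys, bucket_fn, n):
    # Place each key into the bucket chosen by bucket_fn.
    table = [[] for _ in range(n)]
    for k in keys:
        table[bucket_fn(k, n)].append(k)
    return table

def longest_chain(table):
    # The longest bucket bounds the worst-case lookup cost.
    return max(len(bucket) for bucket in table)

# Stand-in keys (account names taken from the ACL output earlier in the thread).
keys = ["tlcsrll", "tlrcwah", "tlrcfwp", "tlrcelh",
        "tlruwah", "tlruelh", "tlrcfts", "tlruhab"]

stale = build(keys, old_bucket, 8)    # data laid out by the old scheme
rebuilt = build(keys, new_bucket, 8)  # same data, rebuilt under the new scheme
print(longest_chain(stale), longest_chain(rebuilt))
```

In this toy, data written under the old function keeps its long chains no matter which function the new code ships with; only the rebuilt table benefits. Whether TNETDB behaves the same way is not established by this thread.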
505.5 | they're back... | RHETT::AMAN | Tue May 27 1997 11:09 | 18 | |
Hi,

This update from the customer came in Sat, 24 May 1997 12:59:44.

-------------

Janet,

In my last message I said we hadn't had an nfs timeout in 2 days. Well, they're back. We have been having them sporadically since Tuesday.

(And then he goes on to describe a new and different issue with a particular user account. I'll work on this one separately, and may enter another note...)

------------

Thanks,
janet
505.6 | more more more info, please... | SMURF::SCHOFIELD | Rick Schofield, DTN 381-0116 | Mon Jun 02 1997 10:42 | 18 |
I spoke with Janet this morning and asked her to collect some more details on the systems in use at PAFB. Specifically, I'm looking for details on which machines are NFS clients/servers and which are NIS master/slaves/clients. I also asked if we can find out, when these timeouts occur, which NIS server the NFS server is bound to. I'm trying to eliminate NIS from consideration as part of this problem.

I also asked if Janet could determine if the folks at PAFB would be willing to run with modified versions of code (NFS server/client, etc.) if we felt it would expedite the debugging process.

This just occurred to me too: do we know if there are any entries in the /var/adm/syslog.dated/.../*.log files when these timeouts occur? We usually look at these first thing, but I don't see anything in the history saying whether this was or wasn't done.

Rick
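For the log check, a minimal sketch of how the NFS failure messages could be tallied from log lines. The pattern is based only on the message text quoted in the base note ("NFS Slookup failed for server bashful: RPC: Timed out"); the function name and the assumption that all relevant messages follow that shape are mine.

```python
import re
from collections import Counter

# Matches messages shaped like the one quoted in the base note:
#   NFS Slookup failed for server bashful: RPC: Timed out
NFS_FAIL = re.compile(r"NFS (\S+) failed for server ([^:\s]+): RPC: (.+)$")

def summarize_nfs_failures(lines):
    """Count NFS failure messages by (server, operation, error text)."""
    counts = Counter()
    for line in lines:
        m = NFS_FAIL.search(line.rstrip())
        if m:
            op, server, error = m.groups()
            counts[(server, op, error)] += 1
    return counts
```

Feed it the lines of whichever log files are of interest, for example `summarize_nfs_failures(open(path))`, with `path` supplied by the caller. Since syslog collapses repeats into "last message repeated N times" lines, the counts should be read as lower bounds.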
505.7 | information from the customer | RHETT::AMAN | Tue Jun 03 1997 22:27 | 187 | |
From the customer -

Now, we'd like to get on with our nfs timeout problems. Here is an outline of our configuration.

Our MLS+ configuration:

dumbo (16.20.40.109)
    alpha 3000-900
    nis master server (there is no slave server.)
    fddi interface to gigaswitch via DECconcentrator
    nfs client only

bashful (16.20.40.111)
    alpha 3000-900
    fddi interface to gigaswitch via DECconcentrator
    nfs server serving the following disks:
        /dev/rz50c  /ace1
        /dev/rz52c  /ace4
        /dev/rz29c  /ace5
        /dev/rz61c  /ace6
        /dev/rz45c  /audit1
    bashful also exports /usr/local

kumba (16.20.40.107)
    alpha 3000-900
    fddi interface to gigaswitch via DECconcentrator
    nfs server serving the following disks:
        /dev/rz25c  /ace2
        /dev/rz29c  /ace3
        /dev/rz57c  /ace7
        /dev/rz52c  /ace8
        /dev/rz42c  /sey
        /dev/rz61c  /audit

simba (16.20.40.105)
    alpha 3000-900
    fddi interface to gigaswitch via DECconcentrator
    nfs client only

goofy, thumper (16.20.40.104), (16.20.40.101)
    alpha 3000-700
    fddi interface to gigaswitch via DECconcentrator
    nfs client only

flower, pocohantas, flounder, pinnochio, sebastian
    alpha 3000-300
    thin wire interface to gigaswitch via DECrepeater
    nfs client only

We also have 23 lat ports serving rs232 type lines. Only about 6 of these have dumb terminals currently attached.

----------------

We have two nfs servers (bashful and kumba). Below is a copy of their fstab and exports files.

---------------
bashful fstab

/dev/rz16a      /        ufs  rw  1 1
/dev/rz18a      /usr     ufs  rw  1 2
/dev/rz16b      swap1    ufs  sw  0 0
/dev/rz32b      swap2    ufs  sw  0 0
/dev/rz34b      swap3    ufs  sw  0 0
/dev/rz18b      /var     ufs  rw  1 2
/dev/rz50c      /ace1    ufs  rw  0 2
/dev/rz52c      /ace4    ufs  rw  0 2
/dev/rz29c      /ace5    ufs  rw  0 2
/dev/rz61c      /ace6    ufs  rw  0 2
/dev/rz45c      /audit1  ufs  rw  0 2
/ace2@kumba     /ace2    nfs  rw,bg,hard  0 0
/ace3@kumba     /ace3    nfs  rw,bg,hard  0 0
/ace7@kumba     /ace7    nfs  rw,bg,hard  0 0
/ace8@kumba     /ace8    nfs  rw,bg,hard  0 0
/home@kumba     /home    nfs  rw,bg,hard  0 0
/audit@kumba    /audit   nfs  rw,bg,hard  0 0
/sey@kumba      /sey     nfs  rw,bg,hard  0 0

--------------
kumba fstab

/dev/rz16a          /           ufs  rw  1 1
/dev/rz18c          /usr        ufs  rw  1 2
/dev/rz16b          swap1       ufs  sw  0 0
/dev/rz37b          swap2       ufs  sw  0 0
/dev/rz40b          swap3       ufs  sw  0 0
/dev/rz20c          /home       ufs  rw  1 2
/dev/rz25c          /ace2       ufs  rw  1 2
/dev/rz33c          /ris1       ufs  rw  1 2
/dev/rz61c          /audit      ufs  rw  1 2
/dev/rz57c          /ace7       ufs  rw  1 2
/dev/rz52c          /ace8       ufs  rw  1 2
/dev/rz42c          /sey        ufs  rw  1 2
/dev/rz29c          /ace3       ufs  rw  1 2
/ace1@bashful       /ace1       nfs  rw,bg,hard  0 0
/ace4@bashful       /ace4       nfs  rw,bg,hard  0 0
/ace5@bashful       /ace5       nfs  rw,bg,hard  0 0
/ace6@bashful       /ace6       nfs  rw,bg,hard  0 0
/usr/local@bashful  /usr/local  nfs  rw,bg,hard  0 0
/audit1@bashful     /audit1     nfs  rw,bg,hard  0 0

-------------
bashful exports

/ace1       -root=0
/ace3       -root=0
/ace4       -root=0
/ace5       -root=0
/ace6       -root=0
/usr/local  -root=0
/audit1     -root=0
/ace9       -root=0
/mnt1       -root=0

-------------
kumba exports

/home   -root=0
/ace2   -root=0
/ace3   -root=0
/audit  -root=0
/ris1   -root=0
/ace7   -root=0
/ace8   -root=0
/sey    -root=0

--------

The rest of the hosts have fstab and exports files similar to the following:

----------------
typical fstab

/dev/rz16a          /           ufs  rw  1 1
/dev/rz18g          /usr        ufs  rw  1 2
/dev/rz16b          swap1       ufs  sw  0 2
/dev/rz18b          swap2       ufs  sw  0 2
/dev/rz18a          /var        ufs  rw  1 2
/ace1@bashful       /ace1       nfs  rw,bg,hard  0 0
/ace4@bashful       /ace4       nfs  rw,bg,hard  0 0
/ace5@bashful       /ace5       nfs  rw,bg,hard  0 0
/ace6@bashful       /ace6       nfs  rw,bg,hard  0 0
/usr/local@bashful  /usr/local  nfs  rw,bg,hard  0 0
/audit1@bashful     /audit1     nfs  rw,bg,hard  0 0
/ace2@kumba         /ace2       nfs  rw,bg,hard  0 0
/ace3@kumba         /ace3       nfs  rw,bg,hard  0 0
/ace7@kumba         /ace7       nfs  rw,bg,hard  0 0
/ace8@kumba         /ace8       nfs  rw,bg,hard  0 0
/home@kumba         /home       nfs  rw,bg,hard  0 0
/audit@kumba        /audit      nfs  rw,bg,hard  0 0
/sey@kumba          /sey        nfs  rw,bg,hard  0 0

--------------
typical exports is empty

These are the yp passwd and group files from dumbo (nis server):

(I have the real copies of these files if you need them. The passwd file has 178 accounts. The group file has 75 groups. janet)

and finally a copy of typical local passwd and group files:

(I have these files as well. The local passwd file has 24 entries. He may have mistakenly sent the same group file twice. The one he sent as the yp is identical to the local one. janet)

You asked about syslog.dated files. Here's the only one I found from yesterday with any pertinent information:

---------------
Jun 2 13:12:20 bashful vmunix: NFS Slookup failed for server kumba: RPC: Timed out
Jun 2 13:12:32 bashful vmunix: NFS Slookup failed for server kumba: RPC: Timed out
Jun 2 13:13:41 bashful vmunix: NFS Sgetattr failed for server kumba: RPC: Remote system error
Jun 2 13:13:42 bashful last message repeated 7 times
Jun 2 13:13:42 bashful vmunix: rfs_dispatch: dispatch error, no reply
Jun 2 13:13:42 bashful vmunix: NFS Sgetattr failed for server kumba: RPC: Remote system error
Jun 2 13:13:42 bashful last message repeated 3 times
Jun 2 13:13:42 bashful vmunix: rfs_dispatch: dispatch error, no reply
Jun 2 13:13:44 bashful vmunix: rfs_dispatch: dispatch error, no reply
Jun 2 13:15:06 bashful vmunix: NFS Sgetattr failed for server kumba: RPC: Timed out
------------------

Janet, I realize the difficulty in determining these types of problems remotely. Fortunately we are not in a production environment, and any trap and/or debug code you need we can and will put in. So let us know what we can do.

sam

----------------

Please let me know if you need additional information.

Thanks!
janet
770-514-1050
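
Since Rick asked which machines are NFS clients of which servers, here is a minimal sketch that pulls that out of an fstab. It assumes the `path@server` device form and whitespace-separated fields seen in the customer's listings; the function name is hypothetical and nothing here is specific to MLS+.

```python
from collections import defaultdict

def nfs_mounts_by_server(fstab_text):
    """Map each NFS server name to the remote paths mounted from it."""
    by_server = defaultdict(list)
    for line in fstab_text.splitlines():
        fields = line.split()
        # Skip blank lines, comments, and anything too short to be an entry.
        if len(fields) < 3 or fields[0].startswith("#"):
            continue
        device, _mount_point, fs_type = fields[:3]
        # Only NFS entries written as path@server are of interest here.
        if fs_type != "nfs" or "@" not in device:
            continue
        path, server = device.rsplit("@", 1)
        by_server[server].append(path)
    return dict(by_server)
```

Running this over each host's fstab would show at a glance that bashful and kumba mount from each other, while the remaining hosts mount from both.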