
Conference smurf::dec_mls_plus

Title:dec_mls_plus
Moderator:SMURF::BAT
Created:Mon Nov 29 1993
Last Modified:Thu Jun 05 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:534
Total number of notes:2544

505.0. "PAFB: NFS Slookup failed for server RPC: Timed out" by SMURF::BAT (Segui la tua beatitudine) Thu May 15 1997 12:26

 On an otherwise idle system (root was the only user logged on to any
 node), the following command was issued on host sebastian:
 
  ls -sR | sort
 
 This resulted in the message:
     
 NFS Slookup failed for server bashful: RPC: Timed out
 ./ace1/accplib/NOS/PF/CYBER/IRWD2 not found
 
 I then logged on to the host bashful and entered the following:
 
  cd /ace1/accplib/NOS/PF/CYBER
 lsacl IRWD2
 
 which resulted in this output:
 # file:IRWD2
 # owner:accplib
 # group:users
 user::rwx
 mask::rwx
 user:rtdr:r--
 user:tlcsrll:r--
 user:tlrcwah:r--
 user:tlrcfwp:r--
 user:tlrcelh:r--
 user:tlruwah:r--
 user:tlruelh:r--
 user:tlrcfts:r--
 user:tlruhab:r--
 
 group::rwx
 group:rtdr:r--
 group:tlrcfts:r--
 
 other::r-x
 
 I subsequently entered this on sebastian:
 
 ls -sR ./ace1/accplib/NOS/PF/CYBER/IRWD2
 
 with this result:
 
 7 ./ace1/accplib/NOS/PF/CYBER/IRWD2
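 The second ls found the file that the first run reported as "not found",
 which points to a transient lookup failure rather than a missing file.
 A minimal Python sketch for separating the two cases by re-checking such
 paths a few times. The function name recheck_missing and its attempts and
 delay parameters are hypothetical, not part of any DEC tool. On a hard
 mount a stat may block until the server answers instead of failing, so
 treat this only as a rough check.

```python
import os
import time


def recheck_missing(paths, attempts=3, delay=2.0):
    """Re-stat paths that a recursive listing reported as not found.

    Hypothetical helper. A path that stats successfully on any attempt was
    most likely hit by a transient NFS lookup failure, not genuinely absent.
    Returns a dict mapping each path to "present" or "missing".
    """
    results = {}
    for path in paths:
        status = "missing"
        for attempt in range(attempts):
            try:
                os.stat(path)
                status = "present"
                break
            except OSError:
                # ENOENT, ETIMEDOUT, EIO, ... are all retried the same way.
                pass
            if attempt < attempts - 1:
                time.sleep(delay)
        results[path] = status
    return results
```

 For example, recheck_missing(["./ace1/accplib/NOS/PF/CYBER/IRWD2"]) run
 from the same directory as the original ls would report the file as
 "present" once the server answers.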
 
    If this were an isolated incident, we would not be concerned,
 but this happens frequently, sometimes on busy systems and sometimes
 on systems that don't appear to have much load.
 
    We would appreciate any suggestions and/or fixes to prevent this
 problem.
 
 
 					Sam
505.1. "patch level" by RHETT::AMAN Thu May 15 1997 16:08
    One note on this customer site.  He currently has Level 9 patches
    installed.  He plans to pull and install the Level 10 patches soon. 
    This also applies to note #506.
    
    Thanks,
    janet
    csc/cs
    770-514-1050
    
505.2. "OK" by SMURF::BAT (Segui la tua beatitudine) Thu May 15 1997 19:18
    In note 466.4, Martin says that the customer installed PK#10.  If they
    are saying they only have PK#9 installed, I presume they de-installed
    PK#10 because they were having performance problems.  I never heard
    whether they rebuilt their TNETDB, or whether the problems went away.
505.3. "current level" by RHETT::AMAN Wed May 21 1997 17:39
    Here's what the customer says about the patch levels -
    
    Janet,
           Funny you should mention patch levels. We recently installed
    patch level 9 (about a month ago). The NFS timeouts seemed to get
    much worse. About a week after we put this patch level in, Martin
    Moore sent me an e-mail indicating we should have the TNETDB file
    rebuilt as part of this.  This was done. There were fewer NFS
    timeouts, but our activity level had also dropped for other reasons.
    
           I put patch level 10 in on Sunday, May 18. We installed a new
    emulator early on the morning of Tuesday, May 20. In the past two days
    there has been a lot more activity, with many more people logging in to
    test the emulator, and we have had NO NFS timeouts. We'll keep an eye
    out for these, but so far so good. Tonight I will try an mltape backup
    that has consistently displayed the ACL / memory allocation error.
    
           If your question was meant to imply that these problems began at
    a particular patch level, the answer is that we have had the mltape
    problem ever since we have had the large number of user files on the
    system (several months). We have had NFS timeout problems off and on
    ever since the system was installed (May/June 1995). The I/O error from
    NFS-served disks due to unknown user indexes was only noticed recently.
    
    ----------
    
    Thanks,
    janet
    770-514-1050
    
505.4. "good" by SMURF::BAT (Segui la tua beatitudine) Wed May 21 1997 21:15
    Ah, now it is getting clearer.
    
    PK#10 is the one: after you install it, you must remove your TNETDB
    before rebooting.  That did not apply to PK#9.  I can look back at PK#9
    and see if there was something NFS-related that might have explained
    the timeout problem, but if they aren't complaining any more, why
    bother?
    
    PK#10 contained a major change to the way security attributes get
    hashed into the structure of the TNETDB, i.e., how the attributes are
    stored in buckets.  It was expected to improve performance... but not
    if you didn't start with a fresh TNETDB.
    
    
    mltape is a different problem altogether.
505.5. "they're back..." by RHETT::AMAN Tue May 27 1997 11:09
    Hi,
    
    This update from the customer came in on Sat, 24 May 1997 12:59:44.
    
    -------------
    Janet, 
    
    In my last message I said we hadn't had an NFS timeout in 2 days.
    Well, they're back. We have been having them sporadically since Tuesday.
    
    (He then goes on to describe a new and different issue with a
    particular user account.  I'll work on this one separately, and may
    enter another note...)
    ------------
    
    Thanks,
    janet
    
505.6. "more more more info, please..." by SMURF::SCHOFIELD (Rick Schofield, DTN 381-0116) Mon Jun 02 1997 10:42
    I spoke with Janet this morning and asked her to collect some more
    details on the systems in use at PAFB.  Specifically, I'm looking for
    details on which machines are NFS clients/servers and which are NIS
    master/slaves/clients.  I also asked if we can find out, when these
    timeouts occur, which NIS server the NFS server is bound to.  I'm
    trying to eliminate NIS from consideration as part of this problem.
    
    I also asked if Janet could determine if the folks at PAFB would be
    willing to run with modified versions of code (NFS server/client, etc)
    if we felt it would expedite the debugging process.
    
    This just occurred to me too:  Do we know if there are any entries in
    the /var/adm/syslog.dated/.../*.log files when these timeouts occur?  We
    usually look at these first thing, but I don't see anything in the
    history saying whether or not this was done.
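    A minimal Python sketch of that first-thing check, assuming the dated
    subdirectories under /var/adm/syslog.dated hold the *.log files
    mentioned above. The function name find_nfs_failures and the regular
    expression (built only from the two message forms quoted in this
    topic) are assumptions, not an existing tool.

```python
import os
import re

# Hypothetical pattern covering the failure forms quoted in this topic, e.g.
#   "NFS Slookup failed for server kumba: RPC: Timed out"
#   "rfs_dispatch: dispatch error, no reply"
NFS_FAILURE = re.compile(r"NFS \w+ failed for server |rfs_dispatch: dispatch error")


def find_nfs_failures(root="/var/adm/syslog.dated"):
    """Yield (path, line) for each NFS failure line in *.log files under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit dated subdirectories in a stable order
        for name in sorted(filenames):
            if not name.endswith(".log"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, errors="replace") as fh:
                for line in fh:
                    if NFS_FAILURE.search(line):
                        yield path, line.rstrip("\n")
```

    Looping over find_nfs_failures() on each node and printing the results
    would show whether the timeouts cluster on particular hosts or times.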
    
    	Rick
    
505.7. "information from the customer" by RHETT::AMAN Tue Jun 03 1997 22:27
    From the customer -
    
    Now, we'd like to get on with our NFS timeout problems. Here is an
    outline of our configuration.
    
    Our MLS+ configuration:
    dumbo (16.20.40.109)
            alpha 3000-900
            nis master server (there is no slave server.)
            fddi interface to GIGAswitch via DECconcentrator
            nfs client only
    bashful (16.20.40.111)
            alpha 3000-900
            fddi interface to GIGAswitch via DECconcentrator
            nfs server
            serving the following disks:
                    /dev/rz50c      /ace1   
                    /dev/rz52c      /ace4   
                    /dev/rz29c      /ace5   
                    /dev/rz61c      /ace6   
                    /dev/rz45c      /audit1
            bashful also exports 
    		   /usr/local
    kumba (16.20.40.107)
            alpha 3000-900
            fddi interface to GIGAswitch via DECconcentrator
            nfs server
            serving the following disks:
                    /dev/rz25c      /ace2  
                    /dev/rz29c      /ace3 
                    /dev/rz57c      /ace7 
                    /dev/rz52c      /ace8
                    /dev/rz42c      /sey  
                    /dev/rz61c      /audit
    simba (16.20.40.105)
            alpha 3000-900
            fddi interface to GIGAswitch via DECconcentrator
            nfs client only
    goofy, thumper (16.20.40.104), (16.20.40.101)
            alpha 3000-700
            fddi interface to GIGAswitch via DECconcentrator
            nfs client only
    flower, pocohantas, flounder, pinnochio, sebastian
            alpha 3000-300
            ThinWire interface to GIGAswitch via DECrepeater
            nfs client only
    
    We also have 23 LAT ports serving RS-232 type lines. Only about
    6 of these currently have dumb terminals attached.
    
    ----------------
    We have two nfs servers (bashful and kumba). the below is a copy of
    their fstab and export files.
    
    ---------------  bashful fstab
    /dev/rz16a      /       ufs rw 1 1
    /dev/rz18a      /usr    ufs rw 1 2
    /dev/rz16b      swap1   ufs sw 0 0
    /dev/rz32b      swap2   ufs sw 0 0
    /dev/rz34b      swap3   ufs sw 0 0
    /dev/rz18b      /var    ufs rw 1 2
    /dev/rz50c      /ace1   ufs rw 0 2
    /dev/rz52c      /ace4   ufs rw 0 2
    /dev/rz29c      /ace5   ufs rw 0 2
    /dev/rz61c      /ace6   ufs rw 0 2
    /dev/rz45c      /audit1 ufs rw 0 2
    /ace2@kumba     /ace2 nfs rw,bg,hard 0 0
    /ace3@kumba     /ace3 nfs rw,bg,hard 0 0
    /ace7@kumba     /ace7 nfs rw,bg,hard 0 0
    /ace8@kumba     /ace8 nfs rw,bg,hard 0 0
    /home@kumba     /home nfs rw,bg,hard 0 0
    /audit@kumba    /audit nfs rw,bg,hard 0 0
    /sey@kumba      /sey nfs rw,bg,hard 0 0
    
    --------------  kumba fstab
    /dev/rz16a      /       ufs rw 1 1
    /dev/rz18c      /usr    ufs rw 1 2
    /dev/rz16b      swap1   ufs sw 0 0
    /dev/rz37b      swap2   ufs sw 0 0
    /dev/rz40b      swap3   ufs sw 0 0
    /dev/rz20c      /home   ufs rw 1 2
    /dev/rz25c      /ace2   ufs rw 1 2
    /dev/rz33c      /ris1   ufs rw 1 2
    /dev/rz61c      /audit  ufs rw 1 2
    /dev/rz57c      /ace7   ufs rw 1 2
    /dev/rz52c      /ace8   ufs rw 1 2
    /dev/rz42c      /sey    ufs rw 1 2
    /dev/rz29c      /ace3   ufs rw 1 2
    /ace1@bashful   /ace1 nfs rw,bg,hard 0 0
    /ace4@bashful   /ace4 nfs rw,bg,hard 0 0
    /ace5@bashful   /ace5 nfs rw,bg,hard 0 0
    /ace6@bashful   /ace6 nfs rw,bg,hard 0 0
    /usr/local@bashful      /usr/local nfs rw,bg,hard 0 0
    /audit1@bashful /audit1 nfs rw,bg,hard 0 0
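    Read side by side, the two fstabs show bashful and kumba hard-mounting
    (rw,bg,hard) filesystems from each other, so each NFS server is also an
    NFS client of the other. A minimal Python sketch that finds such mutual
    mounts from fstab text in the path@server layout used here; the helper
    names nfs_sources and mutual_mounts are hypothetical.

```python
from collections import defaultdict


def nfs_sources(fstab_text):
    """Return {server: [mount points]} for NFS entries in an fstab.

    Assumes the "path@server mountpoint nfs options ..." layout shown above.
    """
    sources = defaultdict(list)
    for line in fstab_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "nfs" and "@" in fields[0]:
            _path, server = fields[0].rsplit("@", 1)
            sources[server].append(fields[1])
    return dict(sources)


def mutual_mounts(fstabs):
    """Given {host: fstab_text}, return sorted host pairs that NFS-mount from each other."""
    depends = {host: set(nfs_sources(text)) for host, text in fstabs.items()}
    pairs = set()
    for host, servers in depends.items():
        for server in servers:
            if host in depends.get(server, set()):
                pairs.add(tuple(sorted((host, server))))
    return sorted(pairs)
```

    Fed the bashful, kumba and typical client fstabs, this reports only the
    pair ("bashful", "kumba"). Whether the cross-mount contributes to the
    timeouts is only a hypothesis, but with hard mounts a stall on either
    server can leave processes on the other one waiting for it.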
    
    ------------- bashful exports
    /ace1   -root=0  
    /ace3            -root=0
    /ace4   -root=0
    /ace5           -root=0
    /ace6 -root=0
    /usr/local      -root=0
    /audit1 -root=0
    /ace9 -root=0
    /mnt1 -root=0
    
    ------------- kumba exports
    /home -root=0
    /ace2 -root=0
    /ace3 -root=0
    /audit   -root=0
    /ris1  -root=0
    /ace7 -root=0
    /ace8 -root=0
    /sey    -root=0
    
    --------
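    Comparing the exports with the fstabs: bashful exports /ace3, which its
    own fstab mounts from kumba over NFS, and /ace9 and /mnt1, which have no
    fstab entry at all. A minimal Python sketch that maps each exported
    directory to the fstab filesystem backing it (longest mount-point
    prefix), which makes such entries stand out. The helper names are
    hypothetical, and the sketch sees only fstab, so anything mounted by
    hand would be misreported.

```python
def parse_fstab(fstab_text):
    """Return {mount point: fs type} for non-swap fstab entries."""
    mounts = {}
    for line in fstab_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and not fields[0].startswith("#"):
            if fields[1].startswith("/"):  # skips swap1, swap2, ...
                mounts[fields[1]] = fields[2]
    return mounts


def backing_filesystem(path, mounts):
    """Return (mount point, fs type) of the longest mount point containing path."""
    best = None
    for mp in mounts:
        prefix = mp.rstrip("/") + "/"
        if path == mp or path.startswith(prefix):
            if best is None or len(mp) > len(best):
                best = mp
    return (best, mounts[best]) if best else (None, None)


def classify_exports(exports_text, fstab_text):
    """Map each exported directory to the filesystem that fstab says backs it."""
    mounts = parse_fstab(fstab_text)
    result = {}
    for line in exports_text.splitlines():
        fields = line.split()
        if fields and not fields[0].startswith("#"):
            result[fields[0]] = backing_filesystem(fields[0], mounts)
    return result
```

    With bashful's files, /ace3 maps to an nfs entry and /ace9 and /mnt1 fall
    back to the root filesystem. Many NFS server implementations do not
    support re-exporting an NFS-mounted filesystem, so the /ace3 line on
    bashful may deserve a closer look.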
        
    The rest of the hosts have fstab and export files similar to the
    following:
    
    ---------------- typical fstab
    /dev/rz16a      /       ufs rw 1 1
    /dev/rz18g      /usr    ufs rw 1 2
    /dev/rz16b      swap1   ufs sw 0 2
    /dev/rz18b      swap2   ufs sw 0 2
    /dev/rz18a      /var    ufs rw 1 2
    /ace1@bashful   /ace1 nfs rw,bg,hard 0 0
    /ace4@bashful   /ace4 nfs rw,bg,hard 0 0
    /ace5@bashful   /ace5 nfs rw,bg,hard 0 0
    /ace6@bashful   /ace6 nfs rw,bg,hard 0 0
    /usr/local@bashful      /usr/local nfs rw,bg,hard 0 0
    /audit1@bashful /audit1 nfs rw,bg,hard 0 0
    /ace2@kumba     /ace2 nfs rw,bg,hard 0 0
    /ace3@kumba     /ace3 nfs rw,bg,hard 0 0
    /ace7@kumba     /ace7 nfs rw,bg,hard 0 0
    /ace8@kumba     /ace8 nfs rw,bg,hard 0 0
    /home@kumba     /home nfs rw,bg,hard 0 0
    /audit@kumba    /audit nfs rw,bg,hard 0 0
    /sey@kumba      /sey nfs rw,bg,hard 0 0
    
    -------------- typical exports file is empty
    
    These are the yp passwd and group files from dumbo (nis server):
    (I have the real copies of these files if you need them.  The passwd
    file has 178 accounts. The group file has 75 groups.  janet)
    
    And finally, a copy of typical local passwd and group files:
    (I have these files as well.  The local passwd file has 24 entries.  He
    may have mistakenly sent the same group file twice: the one he sent as
    the yp group file is identical to the local one.  janet)
    
    You asked about syslog.dated files. Here's the only one I found from
    yesterday with any pertinent information:
    ---------------
    Jun  2 13:12:20 bashful vmunix: NFS Slookup failed for server kumba: RPC: Timed out
    Jun  2 13:12:32 bashful vmunix: NFS Slookup failed for server kumba: RPC: Timed out
    Jun  2 13:13:41 bashful vmunix: NFS Sgetattr failed for server kumba: RPC: Remote system error
    Jun  2 13:13:42 bashful last message repeated 7 times
    Jun  2 13:13:42 bashful vmunix: rfs_dispatch: dispatch error, no reply
    Jun  2 13:13:42 bashful vmunix: NFS Sgetattr failed for server kumba: RPC: Remote system error
    Jun  2 13:13:42 bashful last message repeated 3 times
    Jun  2 13:13:42 bashful vmunix: rfs_dispatch: dispatch error, no reply
    Jun  2 13:13:44 bashful vmunix: rfs_dispatch: dispatch error, no reply
    Jun  2 13:15:06 bashful vmunix: NFS Sgetattr failed for server kumba: RPC: Timed out
    ------------------
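    Even this short excerpt is easier to read as counts. A minimal Python
    sketch that tallies the kernel messages by host, operation, server and
    error, folding each "last message repeated N times" into the message it
    repeats. It assumes one syslog entry per line, as in the raw log file;
    the regular expressions cover only the message forms quoted here, and
    labelling rfs_dispatch messages as the server side (rfs_dispatch being
    the NFS server dispatch routine in Sun-derived kernels) is an assumption.

```python
import re
from collections import Counter

# Hypothetical patterns for the message forms quoted above.
PREFIX = r"^\w{3}\s+\d+ [\d:]{8} (?P<host>\S+) "
CLIENT_FAIL = re.compile(
    PREFIX + r"vmunix: NFS (?P<op>\w+) failed for server (?P<server>[^:]+): RPC: (?P<err>.+)$"
)
DISPATCH = re.compile(PREFIX + r"vmunix: rfs_dispatch: (?P<err>.+)$")
REPEAT = re.compile(PREFIX + r"last message repeated (?P<n>\d+) times")


def summarize(lines):
    """Count kernel NFS messages by (role, host, operation, server, error)."""
    counts = Counter()
    last = None  # key of the previous counted message, for "repeated" lines
    for raw in lines:
        line = raw.strip()
        client = CLIENT_FAIL.match(line)
        server = DISPATCH.match(line)
        repeat = REPEAT.match(line)
        if client:
            last = ("client", client["host"], client["op"], client["server"], client["err"].strip())
            counts[last] += 1
        elif server:
            last = ("server", server["host"], "rfs_dispatch", None, server["err"].strip())
            counts[last] += 1
        elif repeat and last is not None:
            # "repeated N times" means N further copies of the previous message.
            counts[last] += int(repeat["n"])
        else:
            last = None  # unrelated message: a later "repeated" line is not ours
    return counts
```

    Run over the June 2 excerpt with one entry per line, it gives 2 Slookup
    timeouts and 1 Sgetattr timeout from bashful against kumba, 12 Sgetattr
    "Remote system error" failures, and 3 rfs_dispatch errors on bashful.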
    
    Janet, I realize how difficult it is to diagnose these types of problems
    remotely. Fortunately we are not in a production environment, so we can
    and will put in any trap and/or debug code you need. So let us know what
    we can do.
    
                                                    sam
    ----------------
    
    Please let me know if you need additional information.
    
    Thanks!
    janet
    770-514-1050