
Conference iosg::all-in-1_v30

Title:*OLD* ALL-IN-1 (tm) Support Conference
Notice:Closed - See Note 4331.l to move to IOSG::ALL-IN-1
Moderator:IOSG::PYE
Created:Thu Jan 30 1992
Last Modified:Tue Jan 23 1996
Last Successful Update:Fri Jun 06 1997
Number of topics:4343
Total number of notes:18308

2731.0. "RMS error - Invalid FAB or FAB not accessible" by BIS1::DESTRIJCKER (Back again to the home town) Wed May 19 1993 15:56

    
    Hi everybody,
    
    One of my users is trying to do an RFD of several READ mail messages
    into a shared folder to which she has got all the access possible.
    
    The error is:
    RMS error has occurred. Refer to extended status for RMS error code.
    
    Gold W reveals:
    Invalid FAB or FAB not accessible
    
    What FAB in what file is ALL-IN-1 talking about? Funny thing is that
    all these messages are informational only yet the message is not
    refiled. The folder in the shared drawer does not exist yet, i.e. is
    being created while refiling.
    
    So far I have checked the following:
    - user's whole directory structure has S:RWED set
    - the user owns this shared drawer
    - the message was created on another system and sent across the
      network 
    - all user's *.dat files have S:RWED set
    - the message file in SHARB has S:RWED protection
    
    I even tried to RFD this message myself by using the ALLIN1 account and
    NEWDIR in her account, and I get the same error.
    
    The user has plenty of disk quota and document quota left.
    
    Have I missed something important that I should check?
    Where do I go from here?
    
    Thanks in advance,
    
    Wivine.
T.R  Title  User  Personal Name  Date  Lines
2731.1. "FCS problem?" by IOSG::MAURICE (Night rolls in, my dark companion) Wed May 19 1993 16:26 (10 lines)
    The messages originate from the File Cabinet server, so I recommend
    you check the log files to see if there is further information in
    there.
    
    The messages are not really informational. They get reported by the FCS
    as errors, but get downgraded by the ALL-IN-1 client before display.
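
    As a rough illustration of how such a downgrade could work: an OpenVMS
    condition value carries its severity in the low three bits (0 warning,
    1 success, 2 error, 3 informational, 4 severe), so a client can
    re-label an error as informational by rewriting only those bits. The
    Python sketch below assumes that layout. The sample status value and
    the function names are made up for illustration and are not taken from
    the FCS or ALL-IN-1 sources.

```python
# Severity lives in bits 0-2 of an OpenVMS condition value.
SEVERITY_MASK = 0x7
SEVERITY_NAMES = {0: "warning", 1: "success", 2: "error",
                  3: "informational", 4: "severe"}

def severity(status: int) -> str:
    """Return the severity name encoded in the low three bits."""
    return SEVERITY_NAMES.get(status & SEVERITY_MASK, "reserved")

def downgrade_to_informational(status: int) -> int:
    """Keep facility and message number, force severity to informational (3)."""
    return (status & ~SEVERITY_MASK) | 3

# Hypothetical status: facility 0x123, message number 5, severity 2 (error).
sample = (0x123 << 16) | (5 << 3) | 2
shown = downgrade_to_informational(sample)
print(severity(sample), "->", severity(shown))
```

    Under this assumption, the message text and facility stay the same and
    only the displayed severity changes, which would explain an
    "informational" message that still stops the refile.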
    
    HTH
    
    Stuart
2731.2. "GONE ! DISAPPEARED !" by BIS1::DESTRIJCKER (Back again to the home town) Mon May 24 1993 17:14 (9 lines)
    Well, this was an easy one. The error corrected itself. Maybe File
    cabinet reorganize had something to do with it. This housekeeping
    procedure was run over the weekend. 
    
    I cannot reproduce it anymore. All that remains is the trace.
    
    Oh well, thanks for the help anyway.
    
    Wivine.
2731.3. "" by BUSHIE::SETHI (Ahhhh (-: an upside down smile from OZ) Tue May 25 1993 01:57 (12 lines)
    Hi Wivine,
    
    My guess is that the DOCDB.DAT or DAF.DAT had a problem of some kind
    (corruption) with the File Access Block.  The File Cabinet
    reorganisation did a convert/fdl=oa$data:docdb.fdl (or pdaf.fdl) and
    cleared the problem.  It would be interesting to know whether an
    analyze/rms/check on the previous versions of the above mentioned
    .dat files reports any errors.
    
    Regards,
    
    Sunil
2731.4. "Nothing to go by." by BIS1::DESTRIJCKER (Back again to the home town) Thu Jun 03 1993 15:12 (17 lines)
    Yes, that could well be the case for this occurrence. Reorganise filing
    cabinets runs every weekend. And it was working OK again on the
    following Monday. I'm happy with this one.
    
    But! It doesn't explain why on Tuesday last week the error popped up
    again with somebody else who was trying to forward a message from a
    shared drawer (not hers). The next day it worked 8-). Only CDQ runs
    daily, and EW every other day, but it doesn't reorganise files. 
    
    Since then I haven't had any complaints anymore. This FAB error seems
    to be very temperamental. I shall start monitoring the cluster. Perhaps
    it has to do with system resources! I've been told that there are
    periods when it occurs daily and periods when it doesn't happen at all.
    
    I'll keep you all posted if I do find something new.
    
    Wivine.
2731.5. "Check the disks first don't repair them before checking" by TINNIE::SETHI (Ahhhh (-: an upside down smile from OZ) Fri Jun 04 1993 01:28 (24 lines)
    Hi Wivine,
    
    >Since then I haven't had any complaints anymore. This FAB error seems
    >to be very temperamental. I shall start monitoring the cluster.
    
    Since this is happening to others, what I would suggest is that you do
    an $analyze/disk/read_check/norepair on your disks.  Ask the customer
    whether they have done an $analyze/disk/read_check/REPAIR (note that I
    put REPAIR in uppercase).  OpenVMS version 5.5-1 and below had a slight
    misfeature in that they actually corrupted disks if the repair was
    used.  This ONLY happened under certain circumstances, so check the
    error with the OpenVMS support group (CSC) before you repair the disk.
    By the way, if the customer did repair the disk and it's corrupted, there
    is no way of repairing the damage.  Please don't say anything to the
    customers; let your manager deal with it.
    
    I have mentioned the above because I have dealt with a number of
    calls that had these types of problems.  Also, the above problem has
    been fixed in 5.5-2, but again, doing a repair will not fix the problem.
    If you need more help, let me know.
    
    Regards,
    
    Sunil
2731.6. "It's worth a try" by BIS1::DESTRIJCKER (Back again to the home town) Fri Jun 04 1993 10:52 (12 lines)
    
    Thanks for the advice, I'll schedule an analyse disk, maybe this
    weekend. Sounds definitely worth trying.
    
    BTW, the customer happens to be Digital itself, the IS department in
    Brussels. I also support the Luxembourg ALL-IN-1 machine and another
    ALL-IN-1 cluster here in Brussels. The FAB error only occurs on the
    biggest ALL-IN-1 cluster.
    
    I'll keep you posted.
    
    Wivine.
2731.7. "Please do not discuss this with a customer *WARNING*" by TINNIE::SETHI (Ahhhh (-: an upside down smile from OZ) Mon Jun 07 1993 02:57 (37 lines)
    Hi Wivine,
    
    The type of error that would indicate that the data on your disk may be
    corrupted is:
    
    The following error messages MAY be returned on systems                   
    experiencing this problem:                                                
                                                                                
      VERIFY-I-MULTALLOC, file ('file-id') 'filename' multiply                  
                          allocated blocks VBN 'n' to 'n' LBN 'n'               
                          to 'n', RVN 'n'                                       
                                                                                
      VERIFY-I-LOSTEXTHDR, file ('file-id') 'filename' lost                     
                           extension file header                                
                                                                                
      VERIFY-I-MAPAREA, file ('file-id') 'filename' invalid map area
                                                                               
    NOTE:  This problem ONLY occurs when repairing a volume                    
           with lost extension file headers.  It does not occur                
           every time you repair a disk volume using the VERIFY                 
           Utility.                                              
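
    A small Python sketch for picking these messages out of a captured
    ANALYZE/DISK listing. It assumes the output has been saved to a text
    file and copied somewhere Python is available. The sample lines are
    invented in the format quoted above, and only the three message
    identifiers come from the article.

```python
from collections import Counter

# Message identifiers quoted from the article above.
SUSPECT_IDS = ("VERIFY-I-MULTALLOC", "VERIFY-I-LOSTEXTHDR", "VERIFY-I-MAPAREA")

def count_suspect_messages(lines):
    """Count how often each suspect identifier appears in the listing."""
    counts = Counter()
    for line in lines:
        for ident in SUSPECT_IDS:
            if ident in line:
                counts[ident] += 1
    return counts

# Invented sample output, for illustration only.
sample_listing = [
    "%VERIFY-I-LOSTEXTHDR, file (123,4,1) A.DAT;1 lost extension file header",
    "%VERIFY-I-MULTALLOC, file (123,4,1) A.DAT;1 multiply allocated blocks",
    "%VERIFY-I-MULTALLOC, file (130,2,1) B.DAT;3 multiply allocated blocks",
]
for ident, n in count_suspect_messages(sample_listing).items():
    print(ident, n)
```

    A non-zero count only says the listing is worth a closer look; it does
    not by itself prove that a repair caused the damage.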
    
    A STARS article called "[OpenVMS] ANALYZE/DISK/REPAIR Causes Mult
    Allocated Blocks/Corruption" has all the details.
    
    Again, I must emphasise: please don't discuss this with your customers;
    let your manager deal with this.  It's a very sensitive issue, as I have
    found out at some sites, and you don't want to get involved in the
    politics of this.  Also, the article warns you not to discuss this with
    the customer; that does not mean that we forget about the problem.
    
    I think I may have a site with this problem; I am crossing my fingers
    that it isn't.
    
    Regards,
    
    Sunil      
2731.8. "Probably a VM corruption in FCS." by IOSG::CHINNICK (gone walkabout) Tue Jun 08 1993 13:56 (39 lines)
    
    Hi Wivine...
    
    Personally, I doubt that this error results from a disk corruption or
    even an RMS file corruption.
    
    The RMS$_FAB and RMS$_RAB statuses reflect that the File Access Block
    or Record Access Block is not at a valid address or has been
    corrupted in some way.
    
    These blocks are used for access through the RMS services and in no way
    relate to disk structures. They will be allocated in memory by the FCS
    or the IOS kernel (depending on what you are using and accessing) and
    the addresses of these blocks are passed to services such as $OPEN, $GET,
    $PUT etc.
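
    To make the point concrete, here is a toy Python model of the kind of
    sanity check that produces this status. It assumes that a control block
    starts with an identifier byte and a length byte which the service
    verifies before touching anything on disk. The constants 3 and 80 are
    believed to match FAB$C_BID and FAB$C_BLN but should be treated as
    illustrative, and the function names are hypothetical.

```python
FAB_BID = 3    # assumed block identifier for a FAB
FAB_BLN = 80   # assumed block length in bytes

def new_fab() -> bytearray:
    """Build a zeroed in-memory block with a valid identifier and length."""
    block = bytearray(FAB_BLN)
    block[0] = FAB_BID
    block[1] = FAB_BLN
    return block

def check_fab(block: bytearray) -> str:
    """Mimic the header check made before any file is opened."""
    if len(block) < 2 or block[0] != FAB_BID or block[1] != FAB_BLN:
        return "invalid FAB or FAB not accessible"
    return "ok"

fab = new_fab()
print(check_fab(fab))   # header intact
fab[0] = 0xFF           # a stray write from elsewhere in the process
print(check_fab(fab))   # header no longer valid, no disk involved
```

    In this model the file on disk is never consulted, which is why a
    single overwritten byte in process memory is enough to trigger the
    error.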
    
    Most likely, the problem you have results from some form of memory
    corruption taking place inside the FCS. This conference is littered
    with similar problems where files are being left open or other errors
    are occurring.
    
    The problem with FCS is that it is an extremely complex piece of
    software which performs the file cabinet manipulation as does ALL-IN-1
    but also has to worry about authentication and communication with
    clients AND running multiple threads. You might get this error because
    of something else which someone completely different has asked the FCS
    to perform.
    
    The most probable cause of this error can be corruption of the DAF
    records in the SDAF, PDAFs in shared drawers or PENDING. I'd suggest
    that you try to get these files checked out or at the very least see if
    TRU/TRM is getting run on the site and if any problems are being
    reported. [CSC's have some tools which can help here.]
    
    You might well check your FCS logs and see if you've been getting
    things like thread ACCVIOs or other conditions - these would be
    confirmation that the FCS is getting this type of error.
    
    Paul.
2731.10. "Probably not the FCS..." by CHRLIE::HUSTON Tue Jun 08 1993 15:20 (14 lines)
    
    re .8 and .9
    
    I don't think it is the FCS, simply because the FCS would return a 
    status of OafcRmsError, not the actual RMS error. You would simply
    get an error saying there was an RMS error; if you hit GOLD-W you would
    then possibly see the actual RMS error.  In the FCS, if ANY RMS
    operation returns an error, it is masked to OafcRmsError (same for
    DASL errors, they go to OafcDaslError). Why? Simple: the person
    to whom the error is returned may be non-VMS, in which case giving 
    them an RMS error would be meaningless.
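
    The masking described here can be sketched in Python as follows. This
    is not the FCS code: the class and function names, the success status
    name and the sample error text are hypothetical, and only the status
    names OafcRmsError and OafcDaslError come from the discussion. The idea
    is that the server returns one platform-neutral status and keeps the
    underlying error as optional extended detail.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ServerStatus:
    code: str                       # platform-neutral status seen by every client
    extended: Optional[str] = None  # underlying error, shown only on request

class RmsFailure(Exception):
    """Stand-in for any failed RMS operation (hypothetical)."""

class DaslFailure(Exception):
    """Stand-in for any failed DASL operation (hypothetical)."""

def call_masked(operation) -> ServerStatus:
    """Run an operation and mask low-level failures to generic statuses."""
    try:
        operation()
    except RmsFailure as exc:
        return ServerStatus("OafcRmsError", extended=str(exc))
    except DaslFailure as exc:
        return ServerStatus("OafcDaslError", extended=str(exc))
    return ServerStatus("OafcSuccess")  # hypothetical success name

def refile():
    # Simulated failure inside the server.
    raise RmsFailure("invalid FAB or FAB not accessible")

status = call_masked(refile)
print(status.code)      # what the client reports first
print(status.extended)  # what a request for more detail would show
```

    A non-VMS client can act on the generic code alone, while a VMS client
    can still surface the extended text when the user asks for it.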
    
    --Bob
    
2731.11. "Sure looks like FCS" by IOSG::CHINNICK (gone walkabout) Tue Jun 08 1993 15:40 (15 lines)
    
    Not the FCS?    I might just beg to differ on that count.
    
    Well, the text quoted is for OafcRmsError status from OAFC$MESSAGES.MSG.
    
    There are no instances of this status or message in the IOS code.
    
    And the FCS does return the 'extended' RMS error status does it not?
    (Or so the sources would seem to indicate.)
    
    ALL-IN-1 reports the extended status as well as the Oafc status.
    
    
    
    Paul.
2731.12. "Ok, so I was wrong..." by CHRLIE::HUSTON Tue Jun 08 1993 17:17 (28 lines)
    
    Ok, I was under the impression that OafcRmsError was not being 
    returned, just the actual RMS error. 
    
    If OafcRmsError is being returned then the error is definitely coming
    from the FCS. Sorry for the misunderstanding.
    
    Therefore, there are probably 2 ways this can occur:
    
    1) Internal FCS corruption as you pointed out, the FCS built the
       FAB and when it later used it, something had stepped on some
       portion of it.
    
    2) The information that the FCS is reading to build the FAB is bad.
    
    I would lean towards 2 for the simple reason that if something is
    corrupting memory, it would have a tendency to show itself in a lot
    of ways (depending on the size of the corruption of course), and would
    tend to go away when the memory that is corrupted is freed.
    
    The FCS gets the info from a variety of places, mostly from either
    RMS itself, FC files (DOCDB, DAF etc), or from previous functions.
    I will go back and re-read this string and see if anything jumps out
    at me. I have not had time to keep up with all the possible FCS 
    problems and this is one that I haven't been reading.
    
    --Bob
    
2731.13. "It's back again !" by BIS1::DESTRIJCKER (Back again to the home town) Thu Jun 10 1993 13:44 (33 lines)
    Hi again,

    Yes, it's happened again. A user is using RFD to refile from one
    personal drawer to another personal drawer and gets the "Invalid FAB
    or FAB not accessible" error. He encountered the same problem last week
    and, as usual, it went away, but it came back.

    I looked at the system, which isn't very busy at all. The disk has got
    over 300000 blocks free and no errors. I can't analyse his .DAT files
    since he's got them open. The user has enough disk quota left.

    Paul,
    on the subject of file cabinet server logs, the oafc$server.log does
    have the same error in it. Unfortunately I've only got today's logfile
    left. oafc$server_error.log is empty. The startup file claims the
    server was started successfully. Would it be worth running the server
    as a foreground job?

    TRM is scheduled for this weekend. There seems to be something wrong
    here. I've got 2 sm_fcvr_mail_area log files 5 minutes apart. Both have
    the SMJACKET error: Internal error in housekeeping procedure,
    performing %SMJACKET exit and cleanup processing. It then starts the
    servers (3 of them).

    Would it be a good idea to schedule TRU as well the day after, since
    it happens to be a Sunday?

    Any further suggestions, ideas are more than welcome.

    Regards,

    Wivine.
    
2731.14. "Couple things to do..." by CHRLIE::HUSTON Thu Jun 10 1993 15:01 (36 lines)
    
    
    re .13
    
    >on the subject of file cabinet server logs, the oafc$server.log does
    >have the same error in it. Unfortunately I've only got today's logfile
    >left. oafc$server_error.log is empty. The startup file claims the
    >server was started successfully. Would it be worth running the server
    >as a foreground job?
    
    All this would do for you is that, instead of writing the invalid FAB
    error to the log file, you would see it on the screen. Without the
    source code, running in the foreground is not very useful.
    
    If you are sure that this is being done by the RFD, then contact
    me off line. I have one thing you can try that will give me more
    information (like what FCS routines are being called). I would rather
    not put it in here since I don't want everyone doing it, and I am 
    not positive it will work.
    
    Another thing to try is: Get the system to a state where this is 
    easily reproducible for a user, say user X. Enable server tracing. Have
    the user do whatever he needs to do to get the error. Filter out any
    session information not related to the user, then post the formatted
    log file here. It should not be too big after you filter out all the
    other sessions. Leave EVERYTHING that has to do with this user's 
    session.
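
    The filtering step could be done with a few lines of Python once the
    formatted trace has been copied off the system. The trace format is not
    shown anywhere in this topic, so the sketch assumes, purely for
    illustration, that every line carries a "session=<id>" tag. The field
    name and the sample lines are invented and would need adapting to the
    real log.

```python
import re

# Assumed tag format; the real trace layout may differ.
SESSION_TAG = re.compile(r"\bsession=(\w+)")

def keep_session(lines, session_id):
    """Return only the lines tagged with the given session id."""
    kept = []
    for line in lines:
        match = SESSION_TAG.search(line)
        if match and match.group(1) == session_id:
            kept.append(line)
    return kept

# Invented sample trace.
trace = [
    "13:44:01 session=A1 open drawer",
    "13:44:02 session=B7 read document",
    "13:44:03 session=A1 refile failed: RMS error",
]
for line in keep_session(trace, "A1"):
    print(line)
```

    Lines without a session tag are dropped in this sketch; if the real
    trace has untagged continuation lines, they would need to be kept with
    the preceding tagged line instead.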
    
    Can you also do an $analyze/image sys$system:oafc$server.exe and
    sys$share:oafc$client_shr.exe and tell me what the image IDs of them
    are?
    
    Thanks
    
    --Bob
    
2731.15. "Not helpful... but..." by IOSG::CHINNICK (gone walkabout) Thu Jun 10 1993 15:07 (24 lines)
    Hi Wivine...
    
    Well, it's kind of difficult to say what you should do here.
    
    My money would be on there being a problem with one or more DAF
    records.
    
    We're investigating the FCS at the moment because I think it doesn't
    cope with corrupt DAF records. Unfortunately, certain forms of corrupt
    DAF record are not corrected by FCVR either!
    
    Then, it may not be either of the users/drawers involved which is
    responsible but another completely separate thread. Fun - huh?!
    
    CSC could probably find any DAF corruption and cure it, but it is beyond
    the allowable scope here. The tool they can use is 'restricted use'.
    
    Of course - it might be something completely different, but I think I'd
    offer long odds.
    
    I'll have to give this some thought as to how to proceed. In the
    meantime, I fully expect it to recur regularly.  
    
    Paul.
2731.16. "So long and thanks for the fish." by BIS1::DESTRIJCKER (Back again to the home town) Wed Jun 16 1993 16:32 (12 lines)
    
    I haven't forgotten you all, honest.
    
    I have organised myself so that the users who have encountered this nice
    Invalid FAB error will contact me and I can without delay switch FCS
    tracing on, let them reproduce the error and hopefully I will be able
    to pass to you (I'll have a look at it too, so you don't feel lonely)
    some valuable information.
    
    Talk to you in the near future, I hope.
    
    Wivine.
2731.17. "Stumble .. stumble ... ouch!" by BIS1::DESTRIJCKER (Back again to the home town) Fri Jun 18 1993 13:18 (24 lines)
    
    Hi,
    
    This may or may not be related but last night I stumbled on 733 new
    mail messages in the postmaster's account. The first one, also the
    oldest, dated 17-Nov-1987, had a NOTED status, NO header and NO text.
    There was a 0 block .TXT file in one of the shared areas though. 
    
    The second message in the list was a bit younger, i.e. 29-Dec-1992, had
    a READ status and looked OK for the rest. Both messages were in the
    INBOX folder!
    
    I cleared out all these messages, noticed that the mail count was out
    by 40 but I could not delete these 2 messages. I removed them manually. 
    
    Verifying the DOCDB, it complained that the MAIL_ORIG field contained
    invalid characters. I tried to read this field to no avail. Funny thing
    is that all subsequent messages received get the same complaint about
    these invalid characters in the DOCDB field MAIL_ORIG. Does this mean
    the postmaster's DOCDB isn't healthy? Should I give it a new one?
    
    In the mean time FCV is still OK.
    
    Wivine. 
2731.18. "Don't worry about that..." by IOSG::CHINNICK (gone walkabout) Fri Jun 18 1993 13:45 (23 lines)
    
    Hi Wivine...
    
    Don't worry about the MAIL_ORIG field - it's because DOCDB has changed
    layout in V3.0 that you'll get that. In fact - don't worry about DOCDB
    at all - it won't cause any problems normally.
    
    Much more to worry about is the DAF files... SDAF and PDAF because
    these have a much more complex structure and if they go wrong, nasty
    things start happening. If you have problems on your DAF files that is
    the most likely thing to cause errors such as those observed.
    
    POSTMASTER is important in the context of MAIL delivery - it's used by
    Sender/Fetcher - but it probably isn't too relevant to FCS. I'd
    concentrate on the SDAF files and the DAF.DAT files in drawer
    directories.
    
    Cleaning out POSTMASTER regularly is a good idea for the health and
    performance of your MAIL system however.
    
    Regards,
    
    Paul.
2731.19. "Keep digging . . . . ." by BIS1::DESTRIJCKER (Back again to the home town) Fri Jun 18 1993 15:02 (6 lines)
    
    Hi Paul,
    
    DAFs it'll be then, in whatever format.
    
    Wivine.
2731.20. "Shouldn't POSTMASTER be set to NOMAIL anyway?" by IOSG::PYE (Graham - ALL-IN-1 Sorcerer's Apprentice) Fri Jun 18 1993 15:50 (1 line)
    
2731.21. "But of course." by BIS1::DESTRIJCKER (Back again to the home town) Mon Jun 21 1993 10:05 (8 lines)
    Graham,
    
    Yes, indeed. The SENDER and FETCHER accounts used by the sender and
    fetcher are set to NO MAIL. The POSTMASTER account isn't; maybe it
    should be. The messages it receives are mainly delivery failures from
    messages sent through X400 by people who are not authorized to do so.
    
    Wivine.
2731.22. "Working..." by IOSG::CHINNICK (gone walkabout) Mon Jun 21 1993 11:34 (23 lines)
    
    Wivine,
    
    OK... you are running Concurrent Sender/Fetcher so my comments about
    "keeping clean" apply to SENDER and FETCHER accounts. Even so, these
    accounts are still not related to your FCS problems I'd expect.
    
    As for POSTMASTER being set to NOMAIL... I'm not too sure about what
    effect this would have. GAP might have a better idea? In any event -
    this is a side issue.
    
    I should also mention that we are still looking at the FCS code. Looks
    a bit dodgy in a few places! I always seem to arrive 12 months too
    late to circumvent these problems!
    
    With luck, any PFR might benefit from this detailed probing of FCS
    entrails. Not sure about producing any patches at this stage.
    
    Will keep you posted,
    
    Paul.