| T.R | Title | User | Personal Name
 | Date | Lines | 
|---|
| 2731.1 | FCS problem? | IOSG::MAURICE | Night rolls in, my dark companion | Wed May 19 1993 15:26 | 10 | 
|  |     The messages originate from the File Cabinet server, so I recommend
    you check the log files to see if there is further information in
    there.
    
    The messages are not really informational. They get reported by the FCS
    as errors, but get downgraded by the ALL-IN-1 client before display.
    
    HTH
    
    Stuart
 | 
| 2731.2 | GONE ! DISAPPEARED ! | BIS1::DESTRIJCKER | Back again to the home town | Mon May 24 1993 16:14 | 9 | 
|  |     Well, this was an easy one. The error corrected itself. Maybe File
    cabinet reorganize had something to do with it. This housekeeping
    procedure was run over the weekend. 
    
    I can not reproduce it anymore. All that remains is the trace.
    
    Oh well, thanks for the help anyway.
    
    Wivine.
 | 
| 2731.3 |  | BUSHIE::SETHI | Ahhhh (-: an upside down smile from OZ | Tue May 25 1993 00:57 | 12 | 
|  |     Hi Wivine,
    
    My guess is that the DOCDB.DAT or DAF.DAT had a problem with the File
    Access Block of some kind (corruption).  The File cabinet
    reorganisation did a convert/fdl=oa$data:docdb.fdl (or pdaf.fdl) and
    cleared the problem.  It would be interesting to know whether any
    errors are reported when you run analyze/rms/check on the previous
    versions of the above-mentioned .dat files.
    
    Regards,
    
    Sunil
 | 
| 2731.4 | Nothing to go by. | BIS1::DESTRIJCKER | Back again to the home town | Thu Jun 03 1993 14:12 | 17 | 
|  |     Yes, that could well be the case for this occurrence. Reorganise filing
    cabinets runs every weekend. And it was working OK again on the
    following Monday. I'm happy with this one.
    
    But! It doesn't explain why on Tuesday of last week the error popped up
    again with somebody else who was trying to forward a message from a
    shared drawer (not hers). The next day it worked 8-). Only CDQ runs daily,
    and EW every other day but it doesn't reorganise files. 
    
    Since then I haven't had any complaints anymore. This FAB error seems
    to be very temperamental. I shall start monitoring the cluster. Perhaps
    it has to do with system resources! I've been told that there are
    periods when it occurs daily and periods when it doesn't happen at all.
    
    I'll keep you all posted if I do find something new.
    
    Wivine.
 | 
| 2731.5 | Check the disks first, don't repair them before checking | TINNIE::SETHI | Ahhhh (-: an upside down smile from OZ | Fri Jun 04 1993 00:28 | 24 | 
|  |     Hi Wivine,
    
    >Since then I haven't had any complaints anymore. This FAB error seems
    >to be very temperamental. I shall start monitoring the cluster.
    
    Since this is happening to others, what I would suggest is that you do
    an $analyze/disk/read_check/norepair on your disks.  Ask the customer
    if they have done an $analyze/disk/read_check/REPAIR (note that I put
    the repair in uppercase).  OpenVMS version 5.5-1 and below had a slight
    misfeature in that they actually corrupted disks if the repair was
    used.  This ONLY happened under certain circumstances, so check the
    error with the OpenVMS support group (CSC) before you repair the disk.
    By the way, if the customer did repair the disk and it's corrupted, there
    is no way of repairing the damage.  Please don't say anything to the
    customers; let your manager deal with it.
    
    The reason I mention the above is that I have dealt with a number of
    calls that had these types of problems.  Also, the above problem has
    been fixed in 5.5-2, but again, doing a repair will not fix existing
    corruption.  If you need more help let me know.
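
    A minimal sketch of the version rule described above, assuming version
    strings of the simple form "V5.5-1" (variants with a trailing letter
    suffix are not handled). The names parse_version, repair_is_risky and
    FIXED_IN are hypothetical, and the check only encodes "5.5-1 and below
    affected, fixed in 5.5-2"; it is no substitute for checking with the CSC.

```python
import re

# First release in which the repair problem is fixed, per the note above.
FIXED_IN = (5, 5, 2)


def parse_version(text):
    """Turn a string such as 'V5.5-1' into a comparable tuple (5, 5, 1)."""
    match = re.fullmatch(r"V?(\d+)\.(\d+)(?:-(\d+))?", text.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    major, minor, update = match.groups()
    # A missing update level (e.g. 'V5.4') is treated as update 0.
    return (int(major), int(minor), int(update or 0))


def repair_is_risky(version):
    """True when the version is older than the release carrying the fix."""
    return parse_version(version) < FIXED_IN


for version in ("V5.4", "V5.5-1", "V5.5-2"):
    print(version, "risky" if repair_is_risky(version) else "fixed")
```

    Comparing tuples keeps the ordering numeric, so "5.10" would sort after
    "5.5" rather than before it as it would in a plain string comparison.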
    
    Regards,
    
    Sunil
 | 
| 2731.6 | It's worth a try | BIS1::DESTRIJCKER | Back again to the home town | Fri Jun 04 1993 09:52 | 12 | 
|  |     
    Thanks for the advice, I'll schedule an analyse disk, maybe this
    weekend. Sounds definitely worth trying.
    
    BTW, the customer happens to be Digital itself, the IS department in
    Brussels. I also support the Luxembourg ALL-IN-1 machine and another
    ALL-IN-1 cluster here in Brussels. The FAB error only occurs on the
    biggest ALL-IN-1 cluster.
    
    I'll keep you posted.
    
    Wivine.
 | 
| 2731.7 | Please do not discuss this with a customer *WARNING* | TINNIE::SETHI | Ahhhh (-: an upside down smile from OZ | Mon Jun 07 1993 01:57 | 37 | 
|  |     Hi Wivine,
    
    The type of error that would indicate that the data on your disk may be
    corrupted is:
    
    The following error messages MAY be returned on systems
    experiencing this problem:

      VERIFY-I-MULTALLOC, file ('file-id') 'filename' multiply
                          allocated blocks VBN 'n' to 'n' LBN 'n'
                          to 'n', RVN 'n'

      VERIFY-I-LOSTEXTHDR, file ('file-id') 'filename' lost
                           extension file header

      VERIFY-I-MAPAREA, file ('file-id') 'filename' invalid map area

    NOTE:  This problem ONLY occurs when repairing a volume
           with lost extension file headers.  It does not occur
           every time you repair a disk volume using the VERIFY
           Utility.
    
    A STARS article called "[OpenVMS] ANALYZE/DISK/REPAIR Causes Mult
    Allocated Blocks/Corruption" has all the details.
    
    Again, I must emphasise: please don't discuss this with your customers;
    let your manager deal with it.  It's a very sensitive issue, as I have
    found out at some sites, and you don't want to get involved in the
    politics of this.  Also, the article warns you not to discuss this with
    the customer.  That does not mean that we forget about the problem.
    
    I think I may have a site with this problem; I am crossing my fingers
    that it isn't.
    
    Regards,
    
    Sunil      
 | 
| 2731.8 | Probably a VM corruption in FCS. | IOSG::CHINNICK | gone walkabout | Tue Jun 08 1993 12:56 | 39 | 
|  |     
    Hi Wivine...
    
    Personally, I doubt that this error results from a disk corruption or
    even an RMS file corruption.
    
    The RMS$_FAB and RMS$_RAB statuses reflect that the File Access Block
    or Record Access Block is not at a valid address or has been
    corrupted in some way.
    
    These blocks are used for access through the RMS services and in no way
    relate to disk structures. They will be allocated in memory by the FCS
    or the IOS kernel (depending on what you are using and accessing) and
    the addresses of these blocks are passed to services such as $OPEN, $GET,
    $PUT etc.
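
    To illustrate why an in-memory corruption can surface as an "invalid
    FAB" style error without any disk involvement, here is a toy model in
    Python. It is only a sketch: the block layout, the BLOCK_ID and
    BLOCK_LEN values and the open_file routine are hypothetical stand-ins,
    not the real RMS structures or services.

```python
# Hypothetical values for illustration only; not taken from the RMS headers.
BLOCK_ID = 0x03
BLOCK_LEN = 80


class InvalidControlBlock(Exception):
    """Stands in for an 'invalid FAB' style status."""


def make_block():
    """Allocate a control block in memory, as the caller of a service would."""
    block = bytearray(BLOCK_LEN)
    block[0] = BLOCK_ID   # byte 0: block identifier
    block[1] = BLOCK_LEN  # byte 1: block length
    return block


def open_file(block):
    """Validate the caller's control block before doing any file work."""
    if len(block) < 2 or block[0] != BLOCK_ID or block[1] != BLOCK_LEN:
        raise InvalidControlBlock("invalid control block")
    return "opened"


block = make_block()
print(open_file(block))   # the block is intact, so the call succeeds

block[0] = 0xFF           # a stray write from elsewhere in the same process
try:
    open_file(block)
except InvalidControlBlock as exc:
    print("error:", exc)  # the file on disk was never touched
```

    The point of the model is that the failure depends only on the state of
    the caller's memory at the moment of the call, which would fit an error
    that comes and goes.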
    
    Most likely, the problem you have results from some form of memory
    corruption taking place inside the FCS. This conference is littered
    with similar problems where files are being left open or other errors
    are occurring.
    
    The problem with FCS is that it is an extremely complex piece of
    software which performs the file cabinet manipulation, as does ALL-IN-1,
    but also has to worry about authentication and communication with
    clients AND running multiple threads. You might get this error because
    of something else which someone completely different has asked the FCS
    to perform.
    
    The most probable cause of this error is corruption of the DAF
    records in the SDAF, PDAFs in shared drawers or PENDING. I'd suggest
    that you try to get these files checked out or at the very least see if
    TRU/TRM is getting run on the site and if any problems are being
    reported. [CSCs have some tools which can help here.]
    
    You might well check your FCS logs and see if you've been getting
    things like thread ACCVIOs or other conditions - these would be
    confirmation that the FCS is getting this type of error.
    
    Paul.
 | 
| 2731.10 | Probably not the FCS... | CHRLIE::HUSTON |  | Tue Jun 08 1993 14:20 | 14 | 
|  |     
    re .8 and .9
    
    I don't think it is the FCS, simply because the FCS would return a
    status of OafcRmsError, not the actual RMS error. You would simply
    get an error saying there was an RMS error; if you hit GOLD-W you would
    then possibly see the actual RMS error.  In the FCS, if ANY RMS
    operation returns an error, it is masked to OafcRmsError (same for
    DASL errors, they go to OafcDaslError). Why? Simple: the person
    to whom the error is returned may be non-VMS, in which case giving
    them an RMS error would be meaningless.
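
    The masking described above can be sketched as follows. This is an
    illustration in Python, not the FCS source: ServerStatus and mask_error
    are hypothetical names, and only the status names OafcRmsError,
    OafcDaslError and RMS$_FAB come from this discussion. The underlying
    status is kept alongside the masked one so that a detail view can still
    show it.

```python
from dataclasses import dataclass
from typing import Optional

# Platform-neutral status returned for each subsystem that can fail.
MASKS = {"RMS": "OafcRmsError", "DASL": "OafcDaslError"}


@dataclass
class ServerStatus:
    primary: str                    # what every client sees, VMS or not
    extended: Optional[str] = None  # underlying status, for a detail view


def mask_error(subsystem, native_status):
    """Hide a subsystem-specific error behind a generic server status."""
    return ServerStatus(primary=MASKS[subsystem], extended=native_status)


status = mask_error("RMS", "RMS$_FAB")
print(status.primary)   # generic status for any client
print(status.extended)  # detail that only means something to a VMS client
```

    Keeping the two fields separate lets a non-VMS client ignore the
    extended status entirely while a VMS client can still display it.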
    
    --Bob
    
 | 
| 2731.11 | Sure looks like FCS | IOSG::CHINNICK | gone walkabout | Tue Jun 08 1993 14:40 | 15 | 
|  |     
    Not the FCS?    I might just beg to differ on that count.
    
    Well, the text quoted is for OafcRmsError status from OAFC$MESSAGES.MSG.
    
    There are no instances of this status or message in the IOS code.
    
    And the FCS does return the 'extended' RMS error status does it not?
    (Or so the sources would seem to indicate.)
    
    ALL-IN-1 reports the extended status as well as the Oafc status.
    
    
    
    Paul.
 | 
| 2731.12 | Ok, so I was wrong... | CHRLIE::HUSTON |  | Tue Jun 08 1993 16:17 | 28 | 
|  |     
    Ok, I was under the impression that OafcRmsError was not being 
    returned, just the actual RMS error. 
    
    If OafcRmsError is being returned then the error is definitely coming
    from the FCS. Sorry for the misunderstanding.
    
    Therefore, there are probably 2 ways this can occur:
    
    1) Internal FCS corruption as you pointed out, the FCS built the
       FAB and when it later used it, something had stepped on some
       portion of it.
    
    2) The information that the FCS is reading to build the FAB is bad.
    
    I would lean towards 2 for the simple fact that if something is
    corrupting memory, it would have a tendency to show itself in a lot
    of ways (depending on the size of the corruption of course), and would
    tend to go away when the memory that is corrupted is freed.
    
    The FCS gets the info from a variety of places, mostly from either
    RMS itself, FC files (DOCDB, DAF etc), or from previous functions.
    I will go back and re-read this string and see if anything jumps out
    at me. I have not had time to keep up with all the possible FCS
    problems and this is one that I haven't been reading.
    
    --Bob
    
 | 
| 2731.13 | It's back again ! | BIS1::DESTRIJCKER | Back again to the home town | Thu Jun 10 1993 12:44 | 33 | 
|  |     Hi again,
    
    Yes, it's happened again. A user is using RFD to refile from one
    personal drawer to another personal drawer and gets the Invalid FAB -
    or FAB not accessible. He encountered the same problem last week and as
    usual it went away, but it came back.
    
    I looked at the system, which isn't very busy at all. The disk has got
    over 300000 blocks free and no errors. I can't analyse his .DAT files
    since he's got them open. The user has got enough disk quota left.
    
    Paul,
    
    on the subject of file cabinet server logs, the oafc$server.log does
    have the same error in it. Unfortunately I only got today's logfile
    left. oafc$server_error.log is empty. The startup file claims the
    server was started successfully. Would it be worth running the server
    as a foreground job?
    
    TRM is scheduled for this weekend. There seems to be something wrong
    here. I've got 2 sm_fcvr_mail_area log files 5 minutes apart. Both have
    the SMJACKET error: Internal error in housekeeping procedure,
    performing %SMJACKET exit and cleanup processing. It then starts the
    servers (3 of them).
    
    Would it be a good idea to schedule TRU as well the day after, since
    it happens to be a Sunday?
    
    Any further suggestions or ideas are more than welcome.
    
    Regards,
    
    Wivine.
    
 | 
| 2731.14 | Couple things to do... | CHRLIE::HUSTON |  | Thu Jun 10 1993 14:01 | 36 | 
|  |     
    
    re .13
    
    >on the subject of file cabinet server logs, the oafc$server.log does
    >have the same error in it. Unfortunately I only got today's logfile
    >left. oafc$server_error.log is empty. The startup file claims the
    >server was started successfully. Would it be worth running the server
    >as a foreground job?
    
    All this would do for you is that, instead of writing the invalid FAB
    error to the log file, you would see it on the screen. Without the
    source code, running in the foreground is not very useful.
    
    If you are sure that this is being done by the RFD, then contact
    me off line. I have one thing you can try that will give me more
    information (like what FCS routines are being called). I would rather
    not put it in here since I don't want everyone doing it, and I am 
    not positive it will work.
    
    Another thing to try is: Get the system to a state where this is
    easily reproducible for a user, say user X. Enable server tracing. Have
    the user do whatever he needs to do to get the error. Filter out any
    session information not related to the user, then post the formatted
    log file here. It should not be too big after you filter out all the
    other sessions. Leave EVERYTHING that has to do with this user's
    session.
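
    The filtering step could be done with a few lines of script. The sketch
    below assumes, purely for illustration, that every line of the
    formatted trace starts with "session=<id> "; the real trace layout is
    not described here, so the tag, the sample lines and the filter_session
    helper are all hypothetical and would need adapting.

```python
def filter_session(lines, session_id):
    """Keep every line belonging to one session and drop all the others."""
    # The trailing space stops session 7 from also matching session 70.
    tag = f"session={session_id} "
    return [line for line in lines if line.startswith(tag)]


# Made-up sample lines, only to show the shape of the input.
trace = [
    "session=12 open drawer",
    "session=7 refile document",
    "session=70 open drawer",
    "session=7 status OafcRmsError",
    "session=12 close drawer",
]

for line in filter_session(trace, "7"):
    print(line)
```

    Matching on the whole tag rather than on a substring keeps everything
    that belongs to the user's session, including lines that carry no
    error, which is what is asked for above.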
    
    Can you also do an $analyze/image sys$system:oafc$server.exe and
    sys$share:oafc$client_shr.exe and tell me what the image IDs of them
    are?
    
    Thanks
    
    --Bob
    
 | 
| 2731.15 | Not helpful... but... | IOSG::CHINNICK | gone walkabout | Thu Jun 10 1993 14:07 | 24 | 
|  |     Hi Wivine...
    
    Well, it's kind of difficult to say what you should do here.
    
    My money would be on there being a problem with one or more DAF
    records.
    
    We're investigating the FCS at the moment because I think it doesn't
    cope with corrupt DAF records. Unfortunately, certain forms of corrupt
    DAF record are not corrected by FCVR either!
    
    Then, it may not be either of the users/drawers involved which is
    responsible but another completely separate thread. Fun - huh?!
    
    CSC could probably find any DAF corruption and cure it, but it is beyond
    the allowable scope here. The tool they can use is 'restricted use'.
    
    Of course - it might be something completely different, but I think I'd
    offer long odds.
    
    I'll have to give this some thought as to how to proceed. In the
    meantime, I fully expect it to recur regularly.  
    
    Paul.
 | 
| 2731.16 | So long and thanks for the fish. | BIS1::DESTRIJCKER | Back again to the home town | Wed Jun 16 1993 15:32 | 12 | 
|  |     
    I haven't forgotten you all, honest.
    
    I have organised myself so that the users who have encountered this nice
    Invalid FAB error will contact me and I can without delay switch FCS
    tracing on, let them reproduce the error and hopefully I will be able
    to pass to you (I'll have a look at it too, so you don't feel lonely)
    some valuable information.
    
    Talk to you in the near future, I hope.
    
    Wivine.
 | 
| 2731.17 | Stumble .. stumble ... ouch! | BIS1::DESTRIJCKER | Back again to the home town | Fri Jun 18 1993 12:18 | 24 | 
|  |     
    Hi,
    
    This may or may not be related, but last night I stumbled on 733 new
    mail messages in the postmaster's account. The first one, also the
    oldest, dated 17-Nov-1987, had a NOTED status, NO header and NO text.
    There was a 0 block .TXT file in one of the shared areas though.
    
    The second message in the list was a bit younger, i.e. 29-Dec-1992, had
    a READ status and looked OK for the rest. Both messages were in the
    INBOX folder!
    
    I cleared out all these messages, noticed that the mail count was out
    by 40 but I could not delete these 2 messages. I removed them manually. 
    
    Verifying the DOCDB, it complained that the MAIL_ORIG field contained
    invalid characters. I tried to read this field to no avail. Funny thing
    is that all subsequent messages received get the same complaint about
    these invalid characters in the DOCDB field MAIL_ORIG. Does this mean
    the postmaster's DOCDB isn't healthy? Should I give it a new one?
    
    In the meantime FCV is still OK.
    
    Wivine. 
 | 
| 2731.18 | Don't worry about that... | IOSG::CHINNICK | gone walkabout | Fri Jun 18 1993 12:45 | 23 | 
|  |     
    Hi Wivine...
    
    Don't worry about the MAIL_ORIG field - it's because DOCDB has changed
    layout in V3.0 that you'll get that. In fact - don't worry about DOCDB
    at all - it won't cause any problems normally.
    
    Much more to worry about is the DAF files... SDAF and PDAF because
    these have a much more complex structure and if they go wrong, nasty
    things start happening. If you have problems on your DAF files that is
    the most likely thing to cause errors such as those observed.
    
    POSTMASTER is important in the context of MAIL delivery - it's used by
    Sender/Fetcher - but it probably isn't too relevant to FCS. I'd
    concentrate on the SDAF files and the DAF.DAT files in drawer
    directories.
    
    Cleaning out POSTMASTER regularly is a good idea for the health and
    performance of your MAIL system, however.
    
    Regards,
    
    Paul.
 | 
| 2731.19 | Keep digging . . . . . | BIS1::DESTRIJCKER | Back again to the home town | Fri Jun 18 1993 14:02 | 6 | 
|  |     
    Hi Paul,
    
    DAFs it'll be then, in whatever format.
    
    Wivine.
 | 
| 2731.20 | Shouldn't POSTMASTER be set to NOMAIL anyway? | IOSG::PYE | Graham - ALL-IN-1 Sorcerer's Apprentice | Fri Jun 18 1993 14:50 | 1 | 
|  |     
 | 
| 2731.21 | But of course. | BIS1::DESTRIJCKER | Back again to the home town | Mon Jun 21 1993 09:05 | 8 | 
|  |     Graham,
    
    Yes, indeed. The SENDER and FETCHER accounts used by the sender and
    fetcher are set to NO MAIL. The POSTMASTER account isn't; maybe it
    should be. The messages it receives are mainly delivery failures from
    messages sent through X400 by people who are not authorized to do so.
    
    Wivine.
 | 
| 2731.22 | Working... | IOSG::CHINNICK | gone walkabout | Mon Jun 21 1993 10:34 | 23 | 
|  |     
    Wivine,
    
    OK... you are running Concurrent Sender/Fetcher so my comments about
    "keeping clean" apply to SENDER and FETCHER accounts. Even so, these
    accounts are still not related to your FCS problems I'd expect.
    
    As for POSTMASTER being set to NOMAIL... I'm not too sure about what
    effect this would have. GAP might have a better idea? In any event -
    this is a side issue.
    
    I should also mention that we are still looking at the FCS code. Looks
    a bit dodgy in a few places! I always seem to arrive 12 months too
    late to circumvent these problems!
    
    With luck, any PFR might benefit from this detailed probing of FCS
    entrails. Not sure about producing any patches at this stage.
    
    Will keep you posted,
    
    Paul.
    
    
 |