Conference iosg::all-in-1_v30

Title:	OLD ALL-IN-1 (tm) Support Conference
Notice:	Closed - See Note 4331.l to move to IOSG::ALL-IN-1
Moderator:	IOSG::PYE

Created:	Thu Jan 30 1992
Last Modified:	Tue Jan 23 1996
Last Successful Update:	Fri Jun 06 1997
Number of topics:	4343
Total number of notes:	18308

2731.0. "RMS error - Invalid FAB or FAB not accessible" by BIS1::DESTRIJCKER (Back again to the home town) Wed May 19 1993 14:56

    
    Hi everybody,
    
    One of my users is trying to do an RFD of several READ mail messages
    into a shared folder to which she has got all the access possible.
    
    The error is:
    RMS error has occurred. Refer to extended status for RMS error code.
    
    Gold W reveals:
    Invalid FAB or FAB not accessible
    
    What FAB in what file is ALL-IN-1 talking about? Funny thing is that
    all these messages are informational only yet the message is not
    refiled. The folder in the shared drawer does not exist yet, i.e. is
    being created while refiling.
    
    So far I have checked the following:
    - user's whole directory structure has S:RWED set
    - the user owns this shared drawer
    - the message was created on another system and sent across the
      network 
    - all user's *.dat files have S:RWED set
    - the message file in SHARB has S:RWED protection
    
    I even tried to RFD this message myself by using the ALLIN1 account and
    NEWDIR in her account, and I get the same error.
    
    User has plenty of disk quota and document quota left.
    
    Did I forget to check something important? 
    Where do I go from here?
    
    Thanks in advance,
    
    Wivine.

T.R	Title	User	Personal Name	Date	Lines
2731.1	FCS problem?	IOSG::MAURICE	Night rolls in, my dark companion	`Wed May 19 1993 15:26`	10
	The messages originate from the File Cabinet Server (FCS), so I recommend you check its log files to see if there is further information in there.
	The messages are not really informational. They get reported by the FCS as errors, but get downgraded by the ALL-IN-1 client before display.
	HTH
	Stuart
2731.2	GONE ! DISAPPEARED !	BIS1::DESTRIJCKER	Back again to the home town	`Mon May 24 1993 16:14`	9
	Well, this was an easy one. The error corrected itself. Maybe File cabinet reorganize had something to do with it. This housekeeping procedure was run over the weekend. I can not reproduce it anymore. All that remains is the trace. Oh well, thanks for the help anyway. Wivine.
2731.3		BUSHIE::SETHI	Ahhhh (-: an upside down smile from OZ	`Tue May 25 1993 00:57`	12
	Hi Wivine,
	My guess is that DOCDB.DAT or DAF.DAT had a problem of some kind (corruption) with the File Access Block. The File Cabinet reorganisation did a convert/fdl=oa$data:docdb.fdl (or pdaf.fdl) and cleared the problem.
	It would be interesting to know whether an analyze/rms/check on the previous versions of the above-mentioned .dat files reports any errors.
	Regards,
	Sunil
2731.4	Nothing to go by.	BIS1::DESTRIJCKER	Back again to the home town	`Thu Jun 03 1993 14:12`	17
	Yes, that could well be the case for this occurrence. Reorganise filing cabinets runs every weekend, and it was working OK again on the following Monday. I'm happy with this one.
	But! It doesn't explain why on Tuesday of last week the error popped up again with somebody else, who was trying to forward a message from a shared drawer (not hers). The next day it worked 8-). Only CDQ runs daily, and EW every other day, but it doesn't reorganise files.
	Since then I haven't had any complaints anymore. This FAB error seems to be very temperamental. I shall start monitoring the cluster. Perhaps it has to do with system resources! I've been told that there are periods when it occurs daily and periods when it doesn't happen at all.
	I'll keep you all posted if I do find something new.
	Wivine.
2731.5	Check the disks first, don't repair them before checking	TINNIE::SETHI	Ahhhh (-: an upside down smile from OZ	`Fri Jun 04 1993 00:28`	24
	Hi Wivine,
	>Since then I haven't had any complaints anymore. This FAB error seems
	>to be very temperamental. I shall start monitoring the cluster.
	Since this is happening to others, I would suggest that you do an $analyze/disk/read_check/norepair on your disks. Ask the customer if they have done an $analyze/disk/read_check/REPAIR (note that I put the repair in uppercase).
	OpenVMS version 5.5-1 and below had a slight misfeature in that they actually corrupted disks if the repair was used. This ONLY happened under certain circumstances, so check the error with the OpenVMS support group (CSC) before you repair the disk. By the way, if the customer did repair the disk and it's corrupted, there is no way of repairing the damage. Please don't say anything to the customers; let your manager deal with it.
	I mention the above because I have dealt with a number of calls that had these types of problems. Also, the above problem has been fixed in 5.5-2; again, doing a repair will not fix the problem.
	If you need more help let me know.
	Regards,
	Sunil
2731.6	It's worth a try	BIS1::DESTRIJCKER	Back again to the home town	`Fri Jun 04 1993 09:52`	12
	Thanks for the advice, I'll schedule an analyse disk, maybe this weekend. It definitely sounds worth trying.
	BTW, the customer happens to be Digital itself, the IS department in Brussels. I also support the Luxembourg ALL-IN-1 machine and another ALL-IN-1 cluster here in Brussels. The FAB error only occurs on the biggest ALL-IN-1 cluster.
	I'll keep you posted.
	Wivine.
2731.7	WARNING: Please do not discuss this with a customer	TINNIE::SETHI	Ahhhh (-: an upside down smile from OZ	`Mon Jun 07 1993 01:57`	37
	Hi Wivine,
	The type of error that would indicate that the data on your disk may be corrupted is:
	The following error messages MAY be returned on systems experiencing this problem:
	    VERIFY-I-MULTALLOC, file ('file-id') 'filename' multiply allocated blocks
	        VBN 'n' to 'n' LBN 'n' to 'n', RVN 'n'
	    VERIFY-I-LOSTEXTHDR, file ('file-id') 'filename' lost extension file header
	    VERIFY-I-MAPAREA, file ('file-id') 'filename' invalid map area
	NOTE: This problem ONLY occurs when repairing a volume with lost extension file headers. It does not occur every time you repair a disk volume using the VERIFY Utility.
	A STARS article called "[OpenVMS] ANALYZE/DISK/REPAIR Causes Mult Allocated Blocks/Corruption" has all the details.
	Again, I must emphasise: please don't discuss this with your customers; let your manager deal with this. It's a very sensitive issue, as I have found out at some sites, and you don't want to get involved in the politics of this. Also, the article warns you not to discuss this with the customer; that does not mean that we forget about the problem.
	I think I may have a site with this problem; I am crossing my fingers that it isn't.
	Regards,
	Sunil
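For anyone who wants to sift a saved ANALYZE/DISK listing for the three messages quoted in .7, here is a minimal Python sketch. It assumes the command output has already been captured to a text file and read in as lines; the function name is made up for illustration, and only the three message identifiers come from the note itself.

```python
import re

# Message identifiers quoted in note .7 (facility VERIFY, severity I).
SUSPECT_IDS = ("MULTALLOC", "LOSTEXTHDR", "MAPAREA")
PATTERN = re.compile(r"VERIFY-I-(%s)\b" % "|".join(SUSPECT_IDS))


def count_suspect_messages(lines):
    """Count how often each of the three VERIFY messages appears.

    `lines` is any iterable of text lines, for example an open file
    holding captured ANALYZE/DISK output (hypothetical capture file).
    """
    counts = {ident: 0 for ident in SUSPECT_IDS}
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

A non-zero count is only a prompt to check with the CSC, as .5 and .7 advise; it says nothing about whether a repair is safe.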
2731.8	Probably a VM corruption in FCS.	IOSG::CHINNICK	gone walkabout	`Tue Jun 08 1993 12:56`	39
	Hi Wivine...
	Personally, I doubt that this error results from a disk corruption or even an RMS file corruption.
	The RMS$_FAB and RMS$_RAB statuses reflect that the File Access Block or Record Access Block is not at a valid address or has been corrupted in some way. These blocks are used for access through the RMS services and in no way relate to disk structures. They will be allocated in memory by the FCS or the IOS kernel (depending on what you are using and accessing) and the addresses of these blocks are passed to services such as $OPEN, $GET, $PUT etc.
	Most likely, the problem you have results from some form of memory corruption taking place inside the FCS. This conference is littered with similar problems where files are being left open or other errors are occurring.
	The problem with FCS is that it is an extremely complex piece of software which performs the file cabinet manipulation as does ALL-IN-1, but also has to worry about authentication and communication with clients AND running multiple threads. You might get this error because of something else which someone completely different has asked the FCS to perform.
	The most probable cause of this error is corruption of the DAF records in the SDAF, PDAFs in shared drawers or PENDING. I'd suggest that you try to get these files checked out, or at the very least see if TRU/TRM is getting run on the site and if any problems are being reported. [CSCs have some tools which can help here.]
	You might well check your FCS logs and see if you've been getting things like thread ACCVIOs or other conditions - these would be confirmation that the FCS is getting this type of error.
	Paul.
2731.10	Probably not the FCS...	CHRLIE::HUSTON		`Tue Jun 08 1993 14:20`	14
	re .8 and .9
	I don't think it is the FCS, simply because the FCS would return a status of OafcRmsError, not the actual RMS error. You would simply get an error saying there was an RMS error; if you hit GOLD-W you would then possibly see the actual RMS error.
	In the FCS, if ANY RMS operation returns an error, it is masked to OafcRmsError (same for DASL errors, they go to OafcDaslError). Why? Simple: the person to whom the error is returned may be non-VMS, in which case giving them an RMS error would be meaningless.
	--Bob
2731.11	Sure looks like FCS	IOSG::CHINNICK	gone walkabout	`Tue Jun 08 1993 14:40`	15
	Not the FCS? I might just beg to differ on that count.
	Well, the text quoted is for OafcRmsError status from OAFC$MESSAGES.MSG. There are no instances of this status or message in the IOS code.
	And the FCS does return the 'extended' RMS error status, does it not? (Or so the sources would seem to indicate.) ALL-IN-1 reports the extended status as well as the Oafc status.
	Paul.
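To make the masking discussed in .10 and .11 concrete, here is a small Python sketch of the pattern: the server maps any native error onto a generic, platform-neutral status and carries the native text along as an extended status, which the client shows only on request. The class and function names are hypothetical and are not taken from the FCS sources; only the two status names and the two message texts come from this topic.

```python
from dataclasses import dataclass
from typing import Optional

# Generic statuses named in .10; the lookup table itself is an illustration.
GENERIC_STATUS = {"RMS": "OafcRmsError", "DASL": "OafcDaslError"}

# Text the user saw for the generic status (quoted in .0).
GENERIC_TEXT = {
    "OafcRmsError": "RMS error has occurred. "
                    "Refer to extended status for RMS error code.",
}


@dataclass(frozen=True)
class FcsStatus:
    generic: str                      # what every client understands
    extended: Optional[str] = None    # native detail, e.g. the RMS text


def mask_error(subsystem: str, native_text: str) -> FcsStatus:
    """Server side: hide the native error behind a generic status."""
    return FcsStatus(generic=GENERIC_STATUS[subsystem], extended=native_text)


def primary_message(status: FcsStatus) -> str:
    """Client side: what is displayed first."""
    return GENERIC_TEXT.get(status.generic, status.generic)


def extended_message(status: FcsStatus) -> str:
    """Client side: what a Gold W style 'show more' would reveal."""
    return status.extended or "(no extended status)"
```

With this shape, `mask_error("RMS", "Invalid FAB or FAB not accessible")` gives the two-level report described in .0: a generic message first, and the RMS text only on request.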
2731.12	Ok, so I was wrong...	CHRLIE::HUSTON		`Tue Jun 08 1993 16:17`	28
	Ok, I was under the impression that OafcRmsError was not being returned, just the actual RMS error. If OafcRmsError is being returned then the error is definitely coming from the FCS. Sorry for the misunderstanding.
	Therefore, there are probably 2 ways this can occur:
	1) Internal FCS corruption, as you pointed out: the FCS built the FAB and when it later used it, something had stepped on some portion of it.
	2) The information that the FCS is reading to build the FAB is bad.
	I would lean towards 2 for the simple fact that if something is corrupting memory, it would have a tendency to show itself in a lot of ways (depending on the size of the corruption of course), and would tend to go away when the memory that is corrupted is freed.
	The FCS gets the info from a variety of places, mostly from either RMS itself, FC files (DOCDB, DAF etc), or from previous functions.
	I will go back and re-read this string and see if anything jumps out at me. I have not had time to keep up with all the possible FCS problems and this is one that I haven't been reading.
	--Bob
2731.13	It's back again !	BIS1::DESTRIJCKER	Back again to the home town	`Thu Jun 10 1993 12:44`	33
	Hi again,
	Yes, it's happened again. A user is using RFD to refile from one personal drawer to another personal drawer and gets the Invalid FAB - or FAB not accessible. He encountered the same problem last week and as usual it went away, but it came back.
	I looked at the system, which isn't very busy at all. The disk has got over 300000 blocks free and no errors. I can't analyse his .DAT files since he's got them open. The user has got enough disk quota left.
	Paul, on the subject of file cabinet server logs, the oafc$server.log does have the same error in it. Unfortunately I only got today's log file left. oafc$server_error.log is empty. The startup file claims the server was started successfully. Would it be worth running the server as a foreground job?
	TRM is scheduled for this weekend. There seems to be something wrong here. I've got 2 sm_fcvr_mail_area log files 5 minutes apart. Both have the SMJACKET error: Internal error in housekeeping procedure, performing %SMJACKET exit and cleanup processing. It then starts the servers (3 of them).
	Would it be a good idea to schedule TRU as well the day after, since it happens to be a Sunday?
	Any further suggestions or ideas are more than welcome.
	Regards,
	Wivine.
2731.14	Couple things to do...	CHRLIE::HUSTON		`Thu Jun 10 1993 14:01`	36
	re .13
	>on the subject of file cabinet server logs, the oafc$server.log does
	>have the same error in it. Unfortunately I only got today's log file
	>left. oafc$server_error.log is empty. The startup file claims the
	>server was started successfully. Would it be worth running the server
	>as a foreground job?
	All this would do for you is that, instead of writing the invalid FAB error to the log file, you would see it on the screen. Without the source code, running in the foreground is not very useful.
	If you are sure that this is being done by the RFD, then contact me off line. I have one thing you can try that will give me more information (like what FCS routines are being called). I would rather not put it in here since I don't want everyone doing it, and I am not positive it will work.
	Another thing to try is:
	1) Get the system to a state where this is easily reproducible for a user, say user X.
	2) Enable server tracing.
	3) Have the user do whatever he needs to do to get the error.
	4) Filter out any session information not related to the user, then post the formatted log file here. It should not be too big after you filter out all the other sessions. Leave EVERYTHING that has to do with this user's session.
	Can you also do an $analyze/image on sys$system:oafc$server.exe and sys$share:oafc$client_shr.exe and tell me what the image IDs of them are?
	Thanks
	--Bob
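The filtering step suggested in .14 could be sketched in Python as below. The layout of a formatted FCS trace is not shown anywhere in this topic, so the `session=<id>` token used here is purely a HYPOTHETICAL stand-in; swap in your own `get_session` function once the real layout is known. Lines without a session token are treated as continuations of the line before them, which is also an assumption.

```python
import re
from typing import Callable, Iterable, Iterator, Optional

# HYPOTHETICAL line format: the real trace layout is not given in this topic.
SESSION_RE = re.compile(r"\bsession=(\w+)")


def session_of(line: str) -> Optional[str]:
    """Return the session id found on a line, or None if there is none."""
    match = SESSION_RE.search(line)
    return match.group(1) if match else None


def keep_session(
    lines: Iterable[str],
    wanted: str,
    get_session: Callable[[str], Optional[str]] = session_of,
) -> Iterator[str]:
    """Yield every line belonging to the wanted session.

    A line with no session id inherits the session of the most recent
    line that had one, so multi-line entries stay together.
    """
    current = None
    for line in lines:
        sid = get_session(line)
        if sid is not None:
            current = sid
        if current == wanted:
            yield line
```

Keeping continuation lines errs on the side of posting too much rather than too little, in line with the request to leave everything that belongs to the user's session.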
2731.15	Not helpful... but...	IOSG::CHINNICK	gone walkabout	`Thu Jun 10 1993 14:07`	24
	Hi Wivine...
	Well, it's kind of difficult to say what you should do here. My money would be on there being a problem with one or more DAF records.
	We're investigating the FCS at the moment because I think it doesn't cope with corrupt DAF records. Unfortunately, certain forms of corrupt DAF record are not corrected by FCVR either!
	Then, it may not be either of the users/drawers involved which is responsible but another completely separate thread. Fun - huh?!
	CSC could probably find any DAF corruption and cure it, but it is beyond the allowable scope here. The tool they can use is 'restricted use'.
	Of course - it might be something completely different, but I think I'd offer long odds.
	I'll have to give this some thought as to how to proceed. In the meantime, I fully expect it to recur regularly.
	Paul.
2731.16	So long and thanks for the fish.	BIS1::DESTRIJCKER	Back again to the home town	`Wed Jun 16 1993 15:32`	12
	I haven't forgotten you all, honest.
	I have organised myself so that the users who have encountered this nice Invalid FAB error will contact me, and I can switch FCS tracing on without delay, let them reproduce the error, and hopefully pass some valuable information on to you (I'll have a look at it too, so you don't feel lonely).
	Talk to you in the near future, I hope.
	Wivine.
2731.17	Stumble .. stumble ... ouch!	BIS1::DESTRIJCKER	Back again to the home town	`Fri Jun 18 1993 12:18`	24
	Hi,
	This may or may not be related, but last night I stumbled on 733 new mail messages in the postmaster's account. The first one, and also the oldest, dated 17-Nov-1987, had a NOTED status, NO header and NO text. There was a 0 block .TXT file in one of the shared areas though. The second message in the list was a bit younger, i.e. 29-Dec-1992, had a READ status and looked OK for the rest. Both messages were in the INBOX folder!
	I cleared out all these messages and noticed that the mail count was out by 40, but I could not delete these 2 messages. I removed them manually.
	Verifying the DOCDB, it complained that the MAIL_ORIG field contained invalid characters. I tried to read this field to no avail. Funny thing is that all subsequent messages received get the same complaint about these invalid characters in the DOCDB field MAIL_ORIG.
	Does this mean the postmaster's DOCDB isn't healthy? Should I give it a new one?
	In the meantime FCV is still OK.
	Wivine.
2731.18	Don't worry about that...	IOSG::CHINNICK	gone walkabout	`Fri Jun 18 1993 12:45`	23
	Hi Wivine...
	Don't worry about the MAIL_ORIG field - it's because DOCDB has changed layout in V3.0 that you'll get that. In fact - don't worry about DOCDB at all - it won't cause any problems normally.
	Much more to worry about are the DAF files... SDAF and PDAF, because these have a much more complex structure and if they go wrong, nasty things start happening. If you have problems on your DAF files, that is the most likely thing to cause errors such as those observed.
	POSTMASTER is important in the context of MAIL delivery - it's used by Sender/Fetcher - but it probably isn't too relevant to FCS. I'd concentrate on the SDAF files and the DAF.DAT files in drawer directories.
	Cleaning out POSTMASTER regularly is a good idea for the health and performance of your MAIL system, however.
	Regards,
	Paul.
2731.19	Keep digging . . . . .	BIS1::DESTRIJCKER	Back again to the home town	`Fri Jun 18 1993 14:02`	6
	Hi Paul,
	DAFs it'll be then, in whatever format.
	Wivine.
2731.20	Shouldn't POSTMASTER be set to NOMAIL anyway?	IOSG::PYE	Graham - ALL-IN-1 Sorcerer's Apprentice	`Fri Jun 18 1993 14:50`	1

2731.21	But of course.	BIS1::DESTRIJCKER	Back again to the home town	`Mon Jun 21 1993 09:05`	8
	Graham,
	Yes, indeed. The SENDER and FETCHER accounts used by the sender and fetcher are set to NO MAIL. The POSTMASTER account isn't; maybe it should be. The messages it receives are mainly delivery failures from messages sent through X400 by people who are not authorized to do so.
	Wivine.
2731.22	Working...	IOSG::CHINNICK	gone walkabout	`Mon Jun 21 1993 10:34`	23
	Wivine,
	OK... you are running Concurrent Sender/Fetcher, so my comments about "keeping clean" apply to the SENDER and FETCHER accounts. Even so, these accounts are still not related to your FCS problems, I'd expect.
	As for POSTMASTER being set to NOMAIL... I'm not too sure about what effect this would have. GAP might have a better idea? In any event - this is a side issue.
	I should also mention that we are still looking at the FCS code. Looks a bit dodgy in a few places! I always seem to arrive 12 months too late to circumvent these problems! With luck, any PFR might benefit from this detailed probing of FCS entrails. Not sure about producing any patches at this stage.
	Will keep you posted,
	Paul.