
Conference iosg::all-in-1_v30

Title:	OLD ALL-IN-1 (tm) Support Conference
Notice:	Closed - See Note 4331.l to move to IOSG::ALL-IN-1
Moderator:	IOSG::PYE

Created:	Thu Jan 30 1992
Last Modified:	Tue Jan 23 1996
Last Successful Update:	Fri Jun 06 1997
Number of topics:	4343
Total number of notes:	18308

2597.0. "FCS - INSCHANNELCNT, Access violation, Internal error" by TINNIE::SETHI (Ah (-: an upside down smile from Oz) Wed Apr 21 1993 06:06

    Hi All,

    A customer has ALL-IN-1 IOS 3.0-1 installed and is experiencing
    problems with the FCS, they are also using TeamLinks 1.1.  The logfile
    contains several error messages and in particular: 

     1-APR-1993 15:48:30.83  Server: CNB06V::"73="
    Error: %OAFC-E-INTERR, Internal error in File Cabinet Server
    Message: FCS has access violated, please submit an SPR.

     2-APR-1993 08:45:12.25  Server: CNB06V::"73="
    Error: %RMS-E-RNF, record not found  Message: CsiOpenRmsFileCab;
    error getting record: LANIGAN MICHELE

     2-APR-1993 08:45:30.30  Server: CNB06V::"73="  Error:
    %RMS-E-RNF, record not found  Message: CsiOpenRmsFileCab; error getting
    record: LANIGAN MICHELE

     2-APR-1993 10:45:29.14  Server: CNB06V::"73="  Error:
    %MCC-E-IN_USE_ERROR, in use error  Message: CsiCacheFlushDrawerAccess;
    Error from mcc_mutex_try_lock

    .....

    19-APR-1993 08:27:47.26  Server: CNB08V::"73="
    Error: %OAFC-E-INTERR, Internal error in File Cabinet Server  Message:
    CsiOpenDrawer; DOCDB file/dev id does not match IUID in partition,
    continuing

    19-APR-1993 09:49:21.33  Server: CNB08V::"73="  Error:
    %MCC-E-REQARG, required argument is missing

    19-APR-1993 09:49:41.87  Server: CNB08V::"73="  Error:
    %MCC-E-EXISTENCE_ERROR, object does not exist

    19-APR-1993 09:50:03.28  Server: CNB08V::"73="  Error:
    %SYSTEM-E-ACCVIO, access violation, reason mask=!XB, virtualaddress=!XL,
    PC=!XL, PSL=!XL

    19-APR-1993 09:50:34.33  Server: CNB08V::"73="  Error:%MCC-E-ALERT_TERMREQ,
    thread termination requested  Message: CsiOpenRmsFC; error opening
    filecab: DISK$USER21:[PDD1.DOHERM.A1]

    19-APR-1993 09:50:42.06  Server: CNB08V::"73="  Error:
    %MCC-E-ALERT_TERMREQ, thread termination requested  Message: CsiOpenRmsFC;
    error opening filecab: DISK$DCSH_USR11:[AGHSACT.BAILYB.A1]

    19-APR-1993 09:50:43.65  Server: CNB08V::"73="  Error:
    %MCC-E-ALERT_TERMREQ,thread termination requested  Message: CsiOpenRmsFC; 
    error opening filecab:DISK$USER21:[PDD1.THOMALE.A1]
                          
    ......

    19-APR-1993 13:37:56.49  Server: CNB06V::"73="
    Message: CHANNELCNT low - flushing drawer from cache

    19-APR-1993 13:38:00.16  Server: CNB06V::"73="
    Message: Drawer flushed from cache for IOCHANNELCNT

    I have asked the customer to look for an error message of the kind
    "%DSL-E-INVSPEC", turn on tracing and, when the problem occurs, get
    the various values from SM MFC MS SAI (Server Audit Information), in
    particular the CHANNELCNT and num.  To me it looks like a process is
    chewing up the I/O CHANNELS or not releasing them for some reason.

    I saw the Stars article "Error: %OAFC-E-INSCHANNELCNT, Insufficient
    CHANNELCNT", which suggested that if a user does a control-y while
    editing a remote document and then re-enters ALL-IN-1 to edit the same
    document, you could get the above problem.  Is this still the case in
    3.0-1 ?  I don't think it's the same problem but something quite
    similar.

    Logfiles (customer edited the 3000 block file) can be found on
    RIPPER::USER$TSC:[SETHI]DHHCS.LOG1 and DHHCS.LOG2.
    
    Thanks for your help in advance,

    Sunil

T.R	Title	User	Personal Name	Date	Lines
2597.1	More of the same ??	TENTO1::MARSHALLJ	Glad that the devil is red ......	`Wed Apr 21 1993 08:22`	8
	* see also note 2473 *

	Sunil,

	Hi.  The above note may describe the self same problem.

	John
2597.2	Similar but not identical.	IOSG::STANDAGE		`Wed Apr 21 1993 09:30`	33
	Sunil & John,

	Your problems are slightly different in that Sunil's server is
	producing the message:

	19-APR-1993 13:38:00.16  Server: CNB06V::"73="
	Message: Drawer flushed from cache for IOCHANNELCNT

	This means that the server has hit 90% of used channels and so it's
	attempted to close down some drawers and files to free up some
	channels.  Whenever this happens a message will be logged as above.
	In John's case the CHANNELCNT being low was logged, but no freeing up
	messages appeared.

	Although you say the customer has problems with the FCS, you haven't
	mentioned exactly what the users are experiencing.  Is the server
	behaving normally but the customer wishes to understand more fully
	what all the errors in the logfile mean ?

	V3.0-1 does not fix any problems around CHANNELCNT usage etc, but I
	would be interested to know if these types of errors have only
	started appearing since V3.0-1 was installed.

	I'd also be interested in knowing what proportion of users are
	running ALL-IN-1 compared to TeamLinks, to perhaps narrow down the
	situations under which some of your errors might be occurring.

	Cheers,

	Kevin.
2597.3	More information	TINNIE::SETHI	Ah (-: an upside down smile from Oz	`Thu Apr 22 1993 05:50`	74
	Hi Kevin,

	>This means that the server has hit 90% of used channels and so it's
	>attempted to close down some drawers and files to free up some
	>channels. Whenever this happens a message will be logged as above. In
	>John's case the CHANNELCNT being low was logged, but no freeing up
	>messages appeared.

	I understood this to be the case, and the customer has confirmed that
	the users on the node could not access their shared file cabinets.
	He had to stop and restart the server to solve the problem.  The
	server seemed to have gone "crazy", for want of a better word,
	closing down the drawers.

	>Although you say the customer has problems with the FCS, you haven't
	>mentioned exactly what the users are experiencing. Is the server
	>behaving normally but the customer wishes to understand more fully what
	>all the errors in the logfile mean ?

	What the customer wants to know is what is causing this problem.  The
	server is not behaving "normally" because of the above.

	>V3.0-1 does not fix any problems around CHANNELCNT usage etc, but I
	>would be interested to know if these types of errors have only started
	>appearing since V3.0-1 was installed.

	They went to 3.0-1 and never had 3.0 installed.

	>I'd also be interested in knowing what proportion of users are running
	>ALL-IN-1 compared to TeamLinks, to perhaps narrow down the situations
	>under which some of your errors might be occurring.

	Well, there are three nodes in this cluster and the following are the
	details you requested.  I must draw your attention to the fact that
	they had this problem prior to the installation of TeamLinks and had
	to increase the CHANNELCNT to 1220:

	Problem node CNB06V:

	    They typically have between 200-350 users on this node.
	    They have more TeamLinks users on this node.  They aren't using
	    cluster aliasing, because they only have TeamLinks on two nodes
	    in the cluster.
	    They have noticed that the number of page faults for the FCS is
	    228,000 today.
	    They have 29 users accessing the MUAS$SERVER process.
	    They have 31 accessing the FCS.

	On node CNB08V:

	    They typically have between 200-300 users on this node.
	    They have fewer TeamLinks users on this node.
	    They have noticed that the number of page faults for the FCS is
	    89,000 today.
	    They have 19 users accessing the MUAS$SERVER process.
	    They have 11 accessing the FCS.

	The above is a typical load on the systems.

	The customer has been auditing the FCS via the SAI option and finds
	that the CHANNELCNT is 1220 (as expected) and the Channelnum is
	around 705, for the above load on the problem node.

	What it all boils down to is: why does the server get into such a
	state where the customer is forced to shut down the server and
	restart it ?  What should the CHANNELCNT be set to ?

	Final reminder that this problem occurred before TeamLinks was
	installed and that they went straight to 3.0-1.  One more thing: they
	have DEC MAILworks version 1.2 of the server installed (field test)
	to solve a serious problem; again, the problem occurred before this.

	Regards,

	Sunil
2597.4		IOSG::STANDAGE		`Thu Apr 22 1993 10:31`	42
	Sunil,

	>>The customer has been auditing the FCS via the SAI option and finds
	>>that the CHANNELCNT is 1220 (as expected) and the Channelnum is
	>>around 705, for the above load on the problem node.

	Are these the current values, or the values during the times when the
	customer gets the FCS problems ?

	>>What it all boils down to is, why does the server get into such a
	>>state where the customer is forced to shutdown the server and restart it.

	Firstly, the customer should never be forced to shutdown and restart
	the server to resolve such problems.  What I think needs to be done
	is for the server to be 'fine tuned' somewhat - to ensure that the
	server's resources are set correctly for the environment it is
	operating in.

	>>What should the CHANNELCNT be set to ?

	Every client connection uses two I/O channels, and every open drawer
	uses three I/O channels.  If a user runs TeamLinks AND ALL-IN-1, then
	this should be regarded as two clients as obviously they may well run
	both concurrently.  Also, remember that CHANNELCNT is used for a
	variety of other products, and so this value should not be set for
	the FCS alone.

	I suggest your customer refers to section 15.2.5 "Tuning the File
	Cabinet Server" in the ALL-IN-1 Management Guide for more details.

	If a problem does exist where CHANNELCNT is slowly being used up and
	not released by the FCS, then this is a new problem not reported by
	anyone.

	Kevin.
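As a rough illustration of the rule of thumb in .4 (two I/O channels per client connection, three per open drawer), the sketch below estimates how many channels the FCS alone might hold. The function name and the sample figures are hypothetical, and the estimate deliberately leaves out channels used by other products, which .4 says must also be allowed for.

```python
# Rough estimate of FCS I/O channel demand, based on the rule of thumb in .4.
# The function name and the sample numbers are illustrative assumptions,
# not values taken from the ALL-IN-1 documentation.

CHANNELS_PER_CLIENT = 2   # per client connection (from .4)
CHANNELS_PER_DRAWER = 3   # per open drawer (from .4)


def estimate_fcs_channels(client_connections: int, open_drawers: int) -> int:
    """Return the number of I/O channels the FCS would hold for this load."""
    if client_connections < 0 or open_drawers < 0:
        raise ValueError("counts must be non-negative")
    return (client_connections * CHANNELS_PER_CLIENT
            + open_drawers * CHANNELS_PER_DRAWER)


# A user running TeamLinks and ALL-IN-1 at the same time counts as two clients.
demand = estimate_fcs_channels(client_connections=31, open_drawers=140)
print(demand)  # 31*2 + 140*3 = 482
```

Whatever figure comes out is only a lower bound on the CHANNELCNT needed, since it covers the FCS and nothing else.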
2597.5	Possible answer	CHRLIE::HUSTON		`Thu Apr 22 1993 15:08`	34
	re .4

	>Firstly, the customer should never be forced to shutdown and restart
	>the server to resolve such problems. What I think needs to be done is
	>for the server to be 'fine tuned' somewhat - to ensure that the server's
	>resources are set correctly for the environment it is operating in.

	Yup, the server is designed to be up all the time (7*24 service);
	anything that requires a shutdown to fix is technically a bug.

	>>>What should the CHANNELCNT be set to ?
	>
	>Every client connection uses two I/O channels, and every open drawer
	>uses three I/O channels. If a user runs TeamLinks AND ALL-IN-1, then
	>this should be regarded as two clients as obviously they may well run
	>both concurrently. Also, remember that CHANNELCNT is used for a variety
	>of other products, and so this value should not be set for the FCS
	>alone.

	True about the channel count being shared by others on the system; in
	fact this may be the root of the problem.  The way the FCS works is:

	    During startup, read the SYSGEN parameter to get the number of
	    channels available on the system.

	    Any time a channel is needed, do the 90% check against the system
	    value.

	In other words, the FCS ignores other processes' requests for
	channels.  I don't know a lot about this area of VMS, but could this
	be causing problems?

	--Bob
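The check described in .5 might look roughly like the sketch below. It assumes the FCS compares only its own channel count against 90% of the CHANNELCNT value read at startup; the function name, the use of `>=`, and the integer arithmetic are assumptions for illustration, not the server's actual code.

```python
# Sketch of the 90% check described in .5.  Names and the exact comparison
# are assumptions; the real FCS logic is not shown in this thread.

def should_flush_drawers(fcs_channels_in_use: int, sysgen_channelcnt: int) -> bool:
    """True when the FCS's own usage reaches 90% of the SYSGEN CHANNELCNT."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return fcs_channels_in_use * 10 >= sysgen_channelcnt * 9


CHANNELCNT = 1220  # value the customer reports in .3

print(should_flush_drawers(705, CHANNELCNT))   # False: 705 is below 1098
print(should_flush_drawers(1098, CHANNELCNT))  # True: exactly 90% of 1220
```

In this sketch only the FCS's own count enters the comparison, so channels consumed elsewhere are not taken into account, which is the gap .5 asks about.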
2597.6		SIOG::T_REDMOND	Thoughts of an Idle Mind	`Thu Apr 22 1993 18:32`	7
	If the FCS is paging so heavily then maybe the drawer cache needs to
	be increased significantly.  If it's left at anything near the default
	values (way too low for any reasonable sized system) then the garbage
	collector thread is going to be very busy just continually attempting
	to manage the drawer cache...

	T
2597.7	Cust. has already fine tuned the FCS	TINNIE::SETHI	Ah (-: an upside down smile from Oz	`Fri Apr 23 1993 05:12`	39
	Hi All,

	Thanks for all your suggestions.  What worried me was not knowing if
	the server could indeed allocate channels and then not deallocate
	them.  Hence I asked the customer not to proceed further until I had
	checked.

	Basically the customer has been tuning the server as recommended in
	the documentation (Management Guide page 15-17 onwards).  The customer
	has set the following:

	    Values calculated based upon 400 users

	    Drawer Cache = 50, Max drawers = 140, Drawer timeout 500.

	I must add that the Drawer Cache value was set to 50 BUT somehow got
	adjusted to 30, the customer assures me.  How could this have
	happened ?

	They typically find that memory usage is between 90 to 98% on a 512
	megabyte system; they cannot install any more as they have reached
	the maximum for the system.

	The customer also has DPS running (Digital Systems Performance
	analyzer); this has shown consistently that OA$FCV needs to have its
	Working Set extent increased.  Can someone tell me what the function
	of OA$FCV is ?  Is it a locking mechanism of some kind ?

	The customer will carry out another audit and fine tune the server as
	per the Management Guide.  I will keep you posted of the
	developments.

	By the way, the values I had given in .3 were during the normal
	functioning of the server.  I just wanted to give you a feel for the
	systems involved; when the problem occurs I will give the same
	parameters for comparison.  The figures may be useful to someone as a
	random sample of 1 :-).

	Thanks for all your help so far.

	Sunil
2597.8		IOSG::STANDAGE		`Fri Apr 23 1993 09:42`	64
	Sunil,

	>>Thanks for all your suggestions, what worried me was not knowing if
	>>the server could indeed allocate channels and then not deallocate them.
	>>Hence I asked the customer not to proceed further until I had checked.

	So far there have been no reported problems with the FCS and
	CHANNELCNT in the way you describe.  A few people have had to fine
	tune the server to suit their environment, but I haven't heard
	specifically of channels not being deallocated.

	>>Basically the customer has been tuning the server as recommended in
	>>the documentation (Management Guide page 15-17 onwards). The customer
	>>has set the following:
	>> Values calculated based upon 400 users
	>> Drawer Cache = 50, Max drawers = 140, Drawer timeout 500.

	These values look good to me for a 400 user environment; certainly
	they are as documented in the guide!

	>>I must add that the Drawer Cache value was set to 50 BUT somehow
	>>got adjusted to 30 the customer assures me. How could this have
	>>happened ?

	The default value for the Drawer Cache when the server is created is
	10.  The only way this can really be modified is by editing the
	server attributes and changing the value.  Remember that if any of
	the server attributes are modified then the server has to be stopped
	and restarted to pick up the new values.

	>>The customer also has DPS running (Digital Systems Performance
	>>analyzer), this has shown consistently that OA$FCV needs to have
	>>its Working Set extent increased. Can someone tell me what is the
	>>function of OA$FCV, is it some kind of locking mechanism of some kind ?

	The OA$FCV is started as a detached process when you run
	A1V30START.COM.  It's the mechanism by which unique filenames are
	generated for filecabinet entries upon a user's request.

	>>The customer will carry out another audit and fine tune the server
	>>as per the Management Guide. I will keep you posted of the
	>>developments.

	Well, let us know how things go.  In some environments the tuning of
	the servers might take a while to get right, as it's dependent on so
	many other variables which themselves fluctuate.

	Good luck,

	Kevin.
2597.9	Channels almost have to be deallocated	CHRLIE::HUSTON		`Fri Apr 23 1993 13:39`	24
	re .7

	>Thanks for all your suggestions, what worried me was not knowing if the
	>server could indeed allocate channels and then not deallocate them.
	>Hence I asked the customer not to proceed further until I had checked.

	I have been thinking about the channel problems.  I honestly cannot
	figure out how it would not release them.  For the most part channels
	are used for drawer opens.  Even if an acc vio wipes out the thread
	that opened the drawer, it has no effect on the drawer itself.  Sooner
	or later the drawer closing thread will run and shut down unused
	drawers, thus freeing up channels.

	The only exception to this, and this would be an unmissable bug, is
	if the drawer closing thread, for some reason, was not there.  An easy
	way to check this is to shut the server down via the SM interface.
	This will request all background threads, including the drawer
	closer, to nicely commit suicide.  A side effect of this is that each
	thread will write a "Someone just requested my death" message to the
	log file (sorry, I forget the exact wording of the message).  There
	should be several of these for each server shutdown.

	--Bob
2597.10	An improvement	TINNIE::SETHI	Ah (-: an upside down smile from Oz	`Tue Apr 27 1993 01:06`	16
	Hi Kevin and Bob,

	The latest news is that the customer increased the Drawer Cache to 50
	(it was set at 30) and has not had any problems since.  I will be
	monitoring the system for the rest of the week, and I have reassured
	the customer that there is no known problem with the allocation of
	channels.

	One more question: the DPA report mentions the image OAFC$SERVER as
	showing excessive page faulting.  The report says that WSMAX should be
	increased as more memory is required; however, they cannot increase
	memory.  Is there anything else they can do ?  Load balancing etc.
	has been done.

	Regards,

	Sunil
2597.11	Caching drawers = more memory for the server	SIOG::T_REDMOND	Thoughts of an Idle Mind	`Tue Apr 27 1993 12:04`	7
	Increasing the size of the drawer cache should reduce paging because
	the background threads won't have so much work to do to manage the
	cache (flush unused drawers and the like).  Increasing the drawer
	cache should also be matched by increasing the memory allocated to
	the detached process when it is invoked.  Has that been done?

	Tony
2597.12		TINNIE::SETHI	Ah (-: an upside down smile from Oz	`Thu Apr 29 1993 01:21`	12
	Hi Tony,

	Thanks for your suggestion above.  The problem we have is that they
	are running at between 80-98% memory usage on their 7000 machines.
	Therefore they cannot allocate more memory to the server process
	without impacting performance elsewhere.

	What has also happened is that we quoted that 500 ALL-IN-1 users per
	processor could be supported, and they can only support 380.  Another
	one of those hot potatoes for the accounts team to handle.

	Sunil
2597.13	Black art time again	SIOG::T_REDMOND	Thoughts of an Idle Mind	`Fri Apr 30 1993 15:41`	16
	Well, calculating the supported user population for an ALL-IN-1
	system is a bit of a black art.  The basic figures achieved in a
	RTE/SUT environment (and published afterwards) need to be adjusted to
	take account of all the things the test environment omits, like
	network activity, programmers compiling bits and pieces, third party
	software running in the subprocess, and so on.  In my experience, the
	adjustment (down) runs from 20% upwards.  So moving from 500
	(estimated) to 380 (actual) isn't too surprising.

	If you don't allocate additional memory to the FCS it will take it
	anyway, but extra pain will be caused to VMS as the FCS pages
	unhappily.  You can do it either way, but setting the cache sizes
	correctly will probably ease the system demands because the
	background threads won't have so much work to do.

	As you like it,

	Tony
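The arithmetic behind the 500-versus-380 comparison can be checked in a few lines. This is only a sketch: the function name is hypothetical, and the figures are the ones quoted in .12 and .13.

```python
def downward_adjustment(estimated_users: int, actual_users: int) -> float:
    """Percentage by which the actual user count falls short of the estimate."""
    if estimated_users <= 0:
        raise ValueError("estimated_users must be positive")
    return (estimated_users - actual_users) * 100 / estimated_users


shortfall = downward_adjustment(500, 380)
print(shortfall)  # 24.0, within the "20% upwards" range mentioned in .13
```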