Conference iosg::all-in-1_v30

Title: *OLD* ALL-IN-1 (tm) Support Conference
Notice: Closed - See Note 4331.l to move to IOSG::ALL-IN-1
Moderator: IOSG::PYE
Created: Thu Jan 30 1992
Last Modified: Tue Jan 23 1996
Last Successful Update: Fri Jun 06 1997
Number of topics: 4343
Total number of notes: 18308

2597.0. "FCS - INSCHANNELCNT, Access violation, Internal error" by TINNIE::SETHI (Ah (-: an upside down smile from Oz) Wed Apr 21 1993 07:06

    Hi All,

    A customer has ALL-IN-1 IOS 3.0-1 installed and is experiencing
    problems with the FCS; they are also using TeamLinks 1.1.  The logfile
    contains several error messages, in particular: 

     1-APR-1993 15:48:30.83  Server: CNB06V::"73="
    Error: %OAFC-E-INTERR, Internal error in File Cabinet Server
    Message: FCS has access violated, please submit an SPR.

     2-APR-1993 08:45:12.25  Server: CNB06V::"73="
    Error: %RMS-E-RNF, record not found
    Message: CsiOpenRmsFileCab; error getting record: LANIGAN MICHELE

     2-APR-1993 08:45:30.30  Server: CNB06V::"73="
    Error: %RMS-E-RNF, record not found
    Message: CsiOpenRmsFileCab; error getting record: LANIGAN MICHELE

     2-APR-1993 10:45:29.14  Server: CNB06V::"73="
    Error: %MCC-E-IN_USE_ERROR, in use error
    Message: CsiCacheFlushDrawerAccess; Error from mcc_mutex_try_lock

    .....

    19-APR-1993 08:27:47.26  Server: CNB08V::"73="
    Error: %OAFC-E-INTERR, Internal error in File Cabinet Server
    Message: CsiOpenDrawer; DOCDB file/dev id does not match IUID in
    partition, continuing

    19-APR-1993 09:49:21.33  Server: CNB08V::"73="
    Error: %MCC-E-REQARG, required argument is missing

    19-APR-1993 09:49:41.87  Server: CNB08V::"73="
    Error: %MCC-E-EXISTENCE_ERROR, object does not exist

    19-APR-1993 09:50:03.28  Server: CNB08V::"73="
    Error: %SYSTEM-E-ACCVIO, access violation, reason mask=!XB,
    virtualaddress=!XL, PC=!XL, PSL=!XL

    19-APR-1993 09:50:34.33  Server: CNB08V::"73="
    Error: %MCC-E-ALERT_TERMREQ, thread termination requested
    Message: CsiOpenRmsFC; error opening filecab:
    DISK$USER21:[PDD1.DOHERM.A1]

    19-APR-1993 09:50:42.06  Server: CNB08V::"73="
    Error: %MCC-E-ALERT_TERMREQ, thread termination requested
    Message: CsiOpenRmsFC; error opening filecab:
    DISK$DCSH_USR11:[AGHSACT.BAILYB.A1]

    19-APR-1993 09:50:43.65  Server: CNB08V::"73="
    Error: %MCC-E-ALERT_TERMREQ, thread termination requested
    Message: CsiOpenRmsFC; error opening filecab:
    DISK$USER21:[PDD1.THOMALE.A1]
                          
    ......

    19-APR-1993 13:37:56.49  Server: CNB06V::"73="
    Message: CHANNELCNT low - flushing drawer from cache

    19-APR-1993 13:38:00.16  Server: CNB06V::"73="
    Message: Drawer flushed from cache for IOCHANNELCNT

    I have asked the customer to look for an error message of the kind
    "%DSL-E-INVSPEC", to turn on tracing, and, when the problem occurs, to
    get the various values from SM MFC MS SAI (Server Audit Information), in
    particular the CHANNELCNT and num.  To me it looks like a process is
    chewing up the I/O channels or not releasing them for some reason.
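
    As a rough aid for scanning logfiles like the excerpt above, here is a
    minimal Python sketch (not part of ALL-IN-1; the function name and the
    sample text are illustrative assumptions). It tallies status codes of
    the form %FACILITY-SEVERITY-IDENT so that a code such as
    %DSL-E-INVSPEC would stand out if it were present:

```python
import re
from collections import Counter

# Status codes in the log have the shape %FACILITY-SEVERITY-IDENT,
# for example %MCC-E-REQARG. Severity is one of S, I, W, E, F.
STATUS_RE = re.compile(r"%[A-Z0-9$_]+-[SIWEF]-[A-Z0-9$_]+")


def tally_status_codes(log_text):
    """Count how often each status code appears in the log text."""
    return Counter(STATUS_RE.findall(log_text))


# Sample lines copied from the log excerpt in this note.
sample = '''
19-APR-1993 09:49:21.33  Server: CNB08V::"73="  Error:
%MCC-E-REQARG, required argument is missing
19-APR-1993 09:50:34.33  Server: CNB08V::"73="  Error:%MCC-E-ALERT_TERMREQ,
thread termination requested
19-APR-1993 09:50:42.06  Server: CNB08V::"73="  Error:
%MCC-E-ALERT_TERMREQ, thread termination requested
'''

counts = tally_status_codes(sample)
for code, n in counts.most_common():
    print(code, n)
```

    The pattern only looks at the status code itself, so it does not care
    whether "Error:" sits on the timestamp line or on the next one.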

    I saw the Stars article "Error: %OAFC-E-INSCHANNELCNT, Insufficient
    CHANNELCNT", which suggested that if a user does a control-y while
    editing a remote document, then re-enters ALL-IN-1 to edit the same
    document, you could get the above problem.  Is this still the case in
    3.0-1?  I don't think it's the same problem but something quite similar.

    Logfiles (customer edited the 3000 block file) can be found on
    RIPPER::USER$TSC:[SETHI]DHHCS.LOG1 and DHHCS.LOG2.
    
    Thanks for your help in advance,

    Sunil  
    
T.R  Title  User  Personal Name  Date  Lines
2597.1  More of the same ??  TENTO1::MARSHALLJ  Glad that the devil is red ......  Wed Apr 21 1993 09:22  8
    
    		***  see also note 2473 ***
    
    Sunil,
    
    Hi.  The above note may describe the selfsame problem.
    
    John
2597.2  Similar but not identical.  IOSG::STANDAGE  Wed Apr 21 1993 10:30  33
    
    Sunil & John,
    
    Your problems are slightly different in that Sunil's server is
    producing the message:
    
        19-APR-1993 13:38:00.16  Server: CNB06V::"73="
        Message: Drawer flushed from cache for IOCHANNELCNT
    
    This means that the server has hit 90% of used channels and so it's
    attempted to close down some drawers and files to free up some
    channels. Whenever this happens a message will be logged as above. In
    John's case the CHANNELCNT being low was logged, but no freeing up
    messages appeared. 
    
    Although you say the customer has problems with the FCS, you haven't
    mentioned exactly what the users are experiencing. Is the server
    behaving normally but the customer wishes to understand more fully what
    all the errors in the logfile mean ?
    
    V3.0-1 does not fix any problems around CHANNELCNT usage etc, but I
    would be interested to know if these types of errors have only started
    appearing since V3.0-1 was installed.
    
    I'd also be interested in knowing what proportion of users are running
    ALL-IN-1 compared to TeamLinks, to perhaps narrow down the situations
    under which some of your errors might be occurring.
    
    
    Cheers,
    Kevin.
    
    
2597.3  More information  TINNIE::SETHI  Ah (-: an upside down smile from Oz  Thu Apr 22 1993 06:50  74
    Hi Kevin,
    
    >This means that the server has hit 90% of used channels and so it's
    >attempted to close down some drawers and files to free up some
    >channels. Whenever this happens a message will be logged as above. In
    >John's case the CHANNELCNT being low was logged, but no freeing up
    >messages appeared. 
    
    I understood this to be the case, and the customer has confirmed that
    the users on the node could not access their shared file cabinets.  He
    had to stop and restart the server to solve the problem.  The server
    seemed to have gone "crazy", for want of a better word, trying to
    close down the drawers.
     
    >Although you say the customer has problems with the FCS, you haven't
    >mentioned exactly what the users are experiencing. Is the server
    >behaving normally but the customer wishes to understand more fully what
    >all the errors in the logfile mean ?
    
    What the customer wants to know is what is causing this problem.  The
    server is not behaving "normally", as described above.
    
    >V3.0-1 does not fix any problems around CHANNELCNT usage etc, but I
    >would be interested to know if these types of errors have only started
    >appearing since V3.0-1 was installed.
    
    They went to 3.0-1 and never had 3.0 installed.
    
    >I'd also be interested in knowing what proportion of users are running
    >ALL-IN-1 compared to TeamLinks, to perhaps narrow down the situations
    >under which some of your errors might be occurring.
    
    Well, there are three nodes in this cluster and the following are the
    details you requested.  I must draw your attention to the fact
    that they had this problem prior to the installation of TeamLinks and
    had to increase the CHANNELCNT to 1220:
    
    Problem node CNB06V
    
    *They typically have between 200-350 users on this node.
    *More TeamLinks users are on this node; they aren't using cluster
     aliasing, because they only have TeamLinks on two nodes in the
     cluster.
    *They have noticed that the number of page faults for the FCS is
     228,000 today
    *They have 29 users accessing the MUAS$SERVER process
    *They have 31 accessing the FCS
    
    On node CNB08V:
    
    *They typically have between 200-300 users on this node.
    *They have fewer TeamLinks users on this node.
    *They have noticed that the number of page faults for the FCS is
     89,000 today
    *They have 19 users accessing the MUAS$SERVER process
    *They have 11 accessing the FCS
    
    The above is a typical load on the systems.
    
    The customer has been auditing the FCS via the SAI option and finds
    that the CHANNELCNT is 1220 (as expected) and the Channelnum is around
    705, for the above load on the problem node.
    
    What it all boils down to is, why does the server get into such a state
    where the customer is forced to shut down the server and restart it. 
    What should the CHANNELCNT be set to ?  A final reminder that this
    problem occurred before TeamLinks was installed and that they went
    straight to 3.0-1.  One more thing: they have DEC MAILworks version 1.2
    of the server installed (field test) to solve a serious problem; again,
    the problem occurred before this.
    
    Regards,
    
    Sunil
2597.4  IOSG::STANDAGE  Thu Apr 22 1993 11:31  42
    
    
    Sunil,
    
    >>The customer has been auditing the FCS via the SAI option and finds
    >>that the CHANNELCNT is 1220 (as expected) and the Channelnum is
    >>around 705, for the above load on the problem node.
    
    Are these the values during normal running, or during the times when
    the customer gets the FCS problems ?
    
    
    >>What it all boils down to is, why does the server get into such a
    >>state where the customer is forced to shut down the server and restart it.
    
    Firstly, the customer should never be forced to shut down and restart
    the server to resolve such problems. What I think needs to be done is
    for the server to be 'fine tuned' somewhat - to ensure that the server's
    resources are set correctly for the environment it is operating in.
    
    >>What should the CHANNELCNT be set to ?
    
    Every client connection uses two I/O channels, and every open drawer
    uses three I/O channels. If a user runs TeamLinks AND ALL-IN-1, then
    this should be regarded as two clients as obviously they may well run
    both concurrently. Also, remember that CHANNELCNT is used for a variety
    of other products, and so this value should not be set for the FCS
    alone. 
    
    I suggest your customer refers to section 15.2.5 "Tuning the File
    Cabinet Server" in the ALL-IN-1 Management Guide for more details.
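
    The channel arithmetic above can be sketched in a few lines of Python.
    This is only an illustration of the two-per-client, three-per-drawer
    rule quoted in this reply; the function names and the user and drawer
    counts are hypothetical, and the Management Guide remains the reference
    for real tuning:

```python
CHANNELS_PER_CLIENT = 2  # every client connection uses two I/O channels
CHANNELS_PER_DRAWER = 3  # every open drawer uses three I/O channels


def count_clients(allin1_only, teamlinks_only, both):
    """A user running TeamLinks AND ALL-IN-1 counts as two clients."""
    return allin1_only + teamlinks_only + 2 * both


def fcs_channel_estimate(clients, open_drawers):
    """Rough number of channels the FCS needs for this load."""
    return CHANNELS_PER_CLIENT * clients + CHANNELS_PER_DRAWER * open_drawers


# Hypothetical load: 20 ALL-IN-1 users, 5 TeamLinks users, 3 running both.
clients = count_clients(allin1_only=20, teamlinks_only=5, both=3)
estimate = fcs_channel_estimate(clients, open_drawers=100)
print(clients, "clients need about", estimate, "channels")
```

    With these made-up figures the estimate is 2*31 + 3*100 = 362 channels
    for the FCS alone, before anything else that shares CHANNELCNT is
    counted.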
    
    
    If a problem does exist where CHANNELCNT is slowly being used up and
    not released by the FCS, then this is a new problem not reported by
    anyone.
    
    
    Kevin.
    
    
    
2597.5  Possible answer  CHRLIE::HUSTON  Thu Apr 22 1993 16:08  34
    
    re .4
    
    >Firstly, the customer should never be forced to shut down and restart
    >the server to resolve such problems. What I think needs to be done is
    >for the server to be 'fine tuned' somewhat - to ensure that the server's
    >resources are set correctly for the environment it is operating in.
    
    Yup, the server is designed to be up all the time (7*24 service);
    anything that requires a shutdown to fix is technically a bug.
    
    >>>What should the CHANNELCNT be set to ?
    >
    >Every client connection uses two I/O channels, and every open drawer
    >uses three I/O channels. If a user runs TeamLinks AND ALL-IN-1, then
    >this should be regarded as two clients as obviously they may well run
    >both concurrently. Also, remember that CHANNELCNT is used for a variety
    >of other products, and so this value should not be set for the FCS
    >alone. 
    
    True about the channel count being shared by others on the system, 
    in fact this may be the root of the problem. The way the FCS works is:
    
    During startup, read the SYSGEN parameter to get the number of
    	channels available on the system
    Any time a channel is needed, do the 90% check against the system
    	value.
    
    In other words, the FCS ignores other processes' requests for channels.
    I don't know a lot about this area of VMS, but could this be causing 
    problems?
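
    The two steps above can be sketched roughly as follows. This is a
    hypothetical Python model, not the FCS source: the class and method
    names are invented, and it assumes the count being compared is the
    number of channels the server itself holds, checked against the limit
    read once at startup:

```python
class ChannelGuard:
    """Toy model of the 90% channel check described above."""

    def __init__(self, channelcnt, threshold_pct=90):
        self.limit = channelcnt          # value read once during startup
        self.threshold_pct = threshold_pct
        self.in_use = 0                  # channels this server holds

    def needs_flush(self):
        # Integer arithmetic avoids floating point rounding at the boundary.
        return self.in_use * 100 >= self.limit * self.threshold_pct

    def acquire(self, n=1):
        """Take n channels; return True if drawers should be flushed."""
        self.in_use += n
        return self.needs_flush()

    def release(self, n=1):
        """Give back n channels, for example after closing a drawer."""
        self.in_use = max(0, self.in_use - n)


guard = ChannelGuard(channelcnt=1220)
print(guard.acquire(705))  # False: 705 is well under 90% of 1220
print(guard.acquire(393))  # True: 1098 channels is exactly 90% of 1220
```

    Note that the model only sees its own counter, which mirrors the point
    that the check takes no account of anything else using channels.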
    
    --Bob
    
2597.6  SIOG::T_REDMOND  Thoughts of an Idle Mind  Thu Apr 22 1993 19:32  7
    If the FCS is paging so heavily then maybe the drawer cache needs to be
    increased significantly.  If it's left at anything near the default
    values (way too low for any reasonable sized system) then the garbage
    collector thread is going to be very busy just continually attempting
    to manage the drawer cache...
    
    T
2597.7  Cust. has already fine tuned the FCS  TINNIE::SETHI  Ah (-: an upside down smile from Oz  Fri Apr 23 1993 06:12  39
    Hi All,                
    
    Thanks for all your suggestions, what worried me was not knowing if the
    server could indeed allocate channels and then not deallocate them. 
    Hence I asked the customer not to proceed further until I had checked.
    
    Basically the customer has been tuning the server as recommended in the
    documentation (Management Guide page 15-17 onwards).  The customer has
    set the following:
    
    Values calculated based upon 400 users
    
    Drawer Cache = 50, Max drawers = 140, Drawer timeout = 500.
    
    I must add that the Drawer Cache value was set to 50 BUT somehow got
    adjusted to 30, the customer assures me.  How could this have happened ?
    
    They typically find that memory usage is between 90 and 98% on a 512
    megabyte system; they cannot install any more as they have reached the
    maximum for the system.
    
    The customer also has DPS running (Digital Systems Performance
    analyzer); this has shown consistently that OA$FCV needs to have its
    Working Set extent increased.  Can someone tell me what the function
    of OA$FCV is?  Is it some kind of locking mechanism?
    
    The customer will carry out another audit and fine tune the server as
    per the Management Guide.  I will keep you posted of the developments.
    
    By the way, the values I gave in .3 were taken during the normal
    functioning of the server.  I just wanted to give you a feel for the
    systems involved; when the problem occurs I will give the same
    parameters for comparison.  The figures may be useful to someone as a
    random sample of 1 :-).
    
    Thanks for all your help so far.
    
    Sunil
    
2597.8  IOSG::STANDAGE  Fri Apr 23 1993 10:42  64
    
    
    Sunil,
    
    >>Thanks for all your suggestions, what worried me was not knowing if
    >>the server could indeed allocate channels and then not deallocate them.
    >>Hence I asked the customer not to proceed further until I had checked.
    
    So far there have been no reported problems with the FCS and CHANNELCNT
    in the way you describe. A few people have had to fine tune the server
    to suit their environment, but I haven't heard specifically of channels
    not being deallocated.
    
    
    >>Basically the customer has been tuning the server as recommended in
    >>the documentation (Management Guide page 15-17 onwards).  The customer
    >>has set the following:
    
    >>    Values calculated based upon 400 users
    
    >>    Drawer Cache = 50, Max drawers = 140, Drawer timeout = 500.
    
    
    These values look good to me for a 400 user environment, certainly they
    are as documented in the guide!
    
    
    >>I must add that the Drawer Cache value was set to 50 BUT somehow
    >>got adjusted to 30 the customer assures me.  How could this have
    >>happened ?
      
    The default value for the Drawer Cache when the server is created  
    is 10. The only way this can really be modified is by editing the
    server attributes and changing the value. Remember that if any of the
    server attributes are modified then the server has to be stopped and
    restarted to pick up the new values. 
    
    
    >>The customer also has DPS running (Digital Systems Performance
    >>analyzer), this has shown consistently that OA$FCV needs to have
    >>its Working Set extent increased.  Can someone tell me what is the
    >>function of OA$FCV, is it some kind of locking mechanism ?
     
    The OA$FCV is started as a detached process when you run
    A1V30START.COM. It's the mechanism by which unique filenames are
    generated for filecabinet entries upon a user's request. 
    
    
    >>The customer will carry out another audit and fine tune the server
    >>as per the Management Guide.  I will keep you posted of the
    >>developments.
     
    Well, let us know how things go. In some environments the tuning of the
    servers might take a while to get right, as it's dependent on so many
    other variables which themselves fluctuate.
    
    
    Good luck,
    
    Kevin.
    
    
    
    
2597.9  Channels almost have to be deallocated  CHRLIE::HUSTON  Fri Apr 23 1993 14:39  24
    
    re .7
    
    >Thanks for all your suggestions, what worried me was not knowing if the
    >server could indeed allocate channels and then not deallocate them. 
    >Hence I asked the customer not to proceed further until I had checked.
    
    I have been thinking about the channel problems. I honestly cannot
    figure out how it would not release them. For the most part channels
    are used for drawer opens. Even if an acc vio wipes out the thread
    that opened the drawer, it has no effect on the drawer itself. Sooner
    or later the drawer closing thread will run and shut down unused
    drawers, thus freeing up channels.  The only exception to this, and
    this would be an unmissable bug, is if the drawer closing thread, for
    some reason, was not there. An easy way to check this is to shut
    the server down via the SM interface. This will request all background
    threads, including the drawer closer, to nicely commit suicide. A side
    effect of this is that each thread will write a "Someone just
    requested my death" message to the log file (sorry, I forget the
    exact wording of the message). There should be several of these for
    each server shutdown.
    
    --Bob
    
2597.10  An improvement  TINNIE::SETHI  Ah (-: an upside down smile from Oz  Tue Apr 27 1993 02:06  16
    Hi Kevin and Bob,
    
    The latest news is that the customer increased the Drawer Cache to 50
    (was set at 30) and has not had any problems.  I will be monitoring the
    system for the rest of the week and I have reassured the customer that
    there is no known problem with the allocation of channels.
         
    One more question: the DPA report mentions that the image OAFC$SERVER
    is page faulting excessively.  The report says that WSMAX should be
    increased as more memory is required; however, they cannot increase
    memory.  Is there anything else they can do ?  Load balancing etc. has
    been done.
    
    Regards,
    
    Sunil
2597.11  Caching drawers = more memory for the server  SIOG::T_REDMOND  Thoughts of an Idle Mind  Tue Apr 27 1993 13:04  7
    Increasing the size of the drawer cache should reduce paging because
    the background threads won't have so much work to do to manage the
    cache (flush unused drawers and the like).  Increasing the drawer cache
    should also be matched by increasing the memory allocated to the
    detached process when it is invoked.  Has that been done?  
    
    Tony
2597.12  TINNIE::SETHI  Ah (-: an upside down smile from Oz  Thu Apr 29 1993 02:21  12
    Hi Tony,
    
    Thanks for your suggestion above.  The problem we have is that they are
    running at between 80-98% memory usage on their 7000 machines. 
    Therefore they cannot allocate more memory to the server process
    without impacting performance elsewhere.
    
    What has also happened is that we quoted that 500 ALL-IN-1 users per 
    processor could be supported, they can only support 380.  Another one
    of those hot potatoes for the accounts team to handle.
    
    Sunil
2597.13  Black art time again  SIOG::T_REDMOND  Thoughts of an Idle Mind  Fri Apr 30 1993 16:41  16
    Well, calculating the supported user population for an ALL-IN-1 system
    is a bit of a black art. The basic figures achieved in a RTE/SUT
    environment (and published afterwards) need to be adjusted to take
    account of all the things the test environment omits, like network
    activity, programmers compiling bits and pieces, third party software
    running in the subprocess, and so on.  In my experience, the adjustment
    (down) runs from 20% upwards.  So moving from 500 (estimated) to 380
    (actual) isn't too surprising.
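
    As a worked example of that adjustment, here is a small Python sketch.
    The function names are made up for illustration, and the 20% figure is
    only the lower bound mentioned above, not a sizing rule:

```python
def adjusted_users(published, reduction_pct):
    """Apply a percentage reduction to a published user count."""
    return published * (100 - reduction_pct) // 100


def observed_reduction_pct(published, actual):
    """Percentage drop from the published figure to what was achieved."""
    return 100 * (published - actual) / published


print(adjusted_users(500, 20))           # 400 users after a 20% reduction
print(observed_reduction_pct(500, 380))  # 24.0 percent for this customer
```

    So the drop from 500 to 380 users is a 24% reduction, a little above
    the 20% starting point for the adjustment.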
    
    If you don't allocate additional memory to the FCS it will take it
    anyway, but extra pain will be caused to VMS as the FCS pages
    unhappily.  You can do it either way, but setting the cache sizes
    correctly will probably ease the system demands because the background
    threads won't have so much work to do.
    
    As you like it, Tony