T.R | Title | User | Personal Name | Date | Lines |
---|
2597.1 | More of the same ?? | TENTO1::MARSHALLJ | Glad that the devil is red ...... | Wed Apr 21 1993 09:22 | 8 |
|
*** see also note 2473 ***
Sunil,
Hi. The above note may describe the selfsame problem.
John
|
2597.2 | Similar but not identical. | IOSG::STANDAGE | | Wed Apr 21 1993 10:30 | 33 |
|
Sunil & John,
Your problems are slightly different in that Sunil's server is
producing the message:
19-APR-1993 13:38:00.16 Server: CNB06V::"73="
Message: Drawer flushed from cache for IOCHANNELCNT
This means that the server has hit 90% of used channels and so it's
attempted to close down some drawers and files to free up some
channels. Whenever this happens a message will be logged as above. In
John's case the CHANNELCNT being low was logged, but no freeing up
messages appeared.
Although you say the customer has problems with the FCS, you haven't
mentioned exactly what the users are experiencing. Is the server
behaving normally but the customer wishes to understand more fully what
all the errors in the logfile mean ?
V3.0-1 does not fix any problems around CHANNELCNT usage etc, but I
would be interested to know if these types of errors have only started
appearing since V3.0-1 was installed.
I'd also be interested in knowing what proportion of users are running
ALL-IN-1 compared to TeamLinks, to perhaps narrow down the situations
under which some of your errors might be occurring.
Cheers,
Kevin.
|
2597.3 | More information | TINNIE::SETHI | Ah (-: an upside down smile from Oz | Thu Apr 22 1993 06:50 | 74 |
| Hi Kevin,
>This means that the server has hit 90% of used channels and so it's
>attempted to close down some drawers and files to free up some
>channels. Whenever this happens a message will be logged as above. In
>John's case the CHANNELCNT being low was logged, but no freeing up
>messages appeared.
I understood this to be the case and the customer has confirmed that
the users on the node could not access their shared file cabinets. He
had to stop and restart the server to solve the problem. The server
seemed to have gone "crazy", for want of a better word, trying to close
down the drawers.
>Although you say the customer has problems with the FCS, you haven't
>mentioned exactly what the users are experiencing. Is the server
>behaving normally but the customer wishes to understand more fully what
>all the errors in the logfile mean ?
What the customer wants to know is what is causing this problem ? The
server is not behaving "normally", as described above.
>V3.0-1 does not fix any problems around CHANNELCNT usage etc, but I
>would be interested to know if these types of errors have only started
>appearing since V3.0-1 was installed.
They went to 3.0-1 and never had 3.0 installed.
>I'd also be interested in knowing what proportion of users are running
>ALL-IN-1 compared to TeamLinks, to perhaps narrow down the situations
>under which some of your errors might be occurring.
Well, there are three nodes in this cluster and the following are the
details you requested. I must draw your attention to the fact
that they had this problem prior to the installation of TeamLinks and
had to increase the CHANNELCNT to 1220:
Problem node CNB06V
*They typically have between 200-350 users on this node.
*There are more TeamLinks users on this node. They aren't using cluster
aliasing, because they only have TeamLinks on two nodes in the
cluster.
*They have noticed that the number of page faults for the FCS is
228,000 today
*They have 29 users accessing the MUAS$SERVER process
*They have 31 accessing the FCS
On node CNB08V:
*They typically have between 200-300 users on this node.
*They have fewer TeamLinks users on this node.
*They have noticed that the number of page faults for the FCS is
89,000 today
*They have 19 users accessing the MUAS$SERVER process
*They have 11 accessing the FCS
The above is a typical load on the systems.
The customer has been auditing the FCS via the SAI option and finds
that the CHANNELCNT is 1220 (as expected) and the Channelnum is around
705, for the above load on the problem node.
What it all boils down to is, why does the server get into such a state
where the customer is forced to shutdown the server and restart it ?
What should the CHANNELCNT be set to ? Final reminder that this
problem occurred before TeamLinks was installed and they went straight to
3.0-1. One more thing: they have DEC MAILworks version 1.2 of the
server installed (field test) to solve a serious problem; again, the
problem occurred before this.
Regards,
Sunil
|
2597.4 | | IOSG::STANDAGE | | Thu Apr 22 1993 11:31 | 42 |
|
Sunil,
>>The customer has been auditing the FCS via the SAI option and finds
>>that the CHANNELCNT is 1220 (as expected) and the Channelnum is
>>around 705, for the above load on the problem node.
Are these the current values, or the values seen during the times when
the customer gets the FCS problems ?
>>What it all boils down to is, why does the server get into such a
>>state where the customer is forced to shutdown the server and restart it.
Firstly, the customer should never be forced to shutdown and restart
the server to resolve such problems. What I think needs to be done is
for the server to be 'fine tuned' somewhat - to ensure that the server's
resources are set correctly for the environment it is operating in.
>>What should the CHANNELCNT be set to ?
Every client connection uses two I/O channels, and every open drawer
uses three I/O channels. If a user runs TeamLinks AND ALL-IN-1, then
this should be regarded as two clients as obviously they may well run
both concurrently. Also, remember that CHANNELCNT is used for a variety
of other products, and so this value should not be set for the FCS
alone.
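
As a rough illustration of the arithmetic above, here is a minimal Python sketch. Only the two per-item costs (two channels per client connection, three per open drawer) come from this reply; the function name and the example counts are hypothetical, and the result covers the FCS alone, not the other products that share CHANNELCNT.

```python
# Hypothetical estimate of the I/O channels the FCS might need.
# Only the per-client (2) and per-open-drawer (3) costs come from the
# reply above; the function name and example counts are assumptions.
CHANNELS_PER_CLIENT = 2
CHANNELS_PER_OPEN_DRAWER = 3


def estimate_fcs_channels(client_connections, open_drawers):
    """Return the estimated number of channels used by the FCS alone."""
    return (client_connections * CHANNELS_PER_CLIENT
            + open_drawers * CHANNELS_PER_OPEN_DRAWER)


# A user running both TeamLinks and ALL-IN-1 counts as two clients.
users_with_both = 40   # assumed
users_with_one = 20    # assumed
clients = users_with_both * 2 + users_with_one

print(estimate_fcs_channels(clients, open_drawers=150))  # prints 650
```

Any figure produced this way is only a starting point; it still has to be checked against what the rest of the system needs.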
I suggest your customer refers to section 15.2.5 "Tuning the File
Cabinet Server" in the ALL-IN-1 Management Guide for more details.
If a problem does exist where CHANNELCNT is slowly being used up and
not released by the FCS, then this is a new problem not reported by
anyone.
Kevin.
|
2597.5 | Possible answer | CHRLIE::HUSTON | | Thu Apr 22 1993 16:08 | 34 |
|
re .4
>Firstly, the customer should never be forced to shutdown and restart
>the server to resolve such problems. What I think needs to be done is
>for the server to be 'fine tuned' somewhat - to ensure that the server's
>resources are set correctly for the environment it is operating in.
Yup, the server is designed to be up all the time (7*24 service), so
anything that requires a shutdown to fix is technically a bug.
>>>What should the CHANNELCNT be set to ?
>
>Every client connection uses two I/O channels, and every open drawer
>uses three I/O channels. If a user runs TeamLinks AND ALL-IN-1, then
>this should be regarded as two clients as obviously they may well run
>both concurrently. Also, remember that CHANNELCNT is used for a variety
>of other products, and so this value should not be set for the FCS
>alone.
True about the channel count being shared by others on the system,
in fact this may be the root of the problem. The way the FCS works is:
During startup, read the SYSGEN parameter to get the number of
channels available on the system
Any time a channel is needed, do the 90% check against the system
value.
In other words, the FCS ignores other processes' requests for channels.
I don't know a lot about this area of VMS, but could this be causing
problems?
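
To make the 90% check concrete, here is a minimal Python sketch of the logic as described above. It is not the FCS code: the function name is hypothetical, and it assumes the server compares its own count of channels in use against the value read at startup. The 1220 and 705 figures are the ones quoted in .3.

```python
# Sketch of the 90% check described above (hypothetical names, not FCS code).
def should_flush_drawers(channels_in_use, channelcnt, percent=90):
    """True once usage reaches `percent` of the CHANNELCNT read at startup."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return channels_in_use * 100 >= channelcnt * percent


channelcnt = 1220  # value the customer set, from .3

print(should_flush_drawers(705, channelcnt))   # False: normal load from .3
print(should_flush_drawers(1098, channelcnt))  # True: 90% of 1220 is 1098
```

With these figures the flush would only start at 1098 channels, well above the 705 seen under typical load.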
--Bob
|
2597.6 | | SIOG::T_REDMOND | Thoughts of an Idle Mind | Thu Apr 22 1993 19:32 | 7 |
| If the FCS is paging so heavily then maybe the drawer cache needs to be
increased significantly. If it's left at anything near the default
values (way too low for any reasonable sized system) then the garbage
collector thread is going to be very busy just continually attempting
to manage the drawer cache...
T
|
2597.7 | Cust. has already fine tuned the FCS | TINNIE::SETHI | Ah (-: an upside down smile from Oz | Fri Apr 23 1993 06:12 | 39 |
| Hi All,
Thanks for all your suggestions, what worried me was not knowing if the
server could indeed allocate channels and then not deallocate them.
Hence I asked the customer not to proceed further until I had checked.
Basically the customer has been tuning the server as recommended in the
documentation (Management Guide page 15-17 onwards). The customer has
set the following:
Values calculated based upon 400 users:
Drawer Cache = 50, Max drawers = 140, Drawer timeout = 500.
I must add that the Drawer Cache value was set to 50 BUT somehow got
adjusted to 30 the customer assures me. How could this have happened ?
They typically find that memory usage is between 90 and 98% on a 512
megabyte system; they cannot install any more as they have reached the
maximum for the system.
The customer also has DPS running (Digital Systems Performance
analyzer), this has shown consistently that OA$FCV needs to have its
Working Set extent increased. Can someone tell me what the function
of OA$FCV is ? Is it some kind of locking mechanism ?
The customer will carry out another audit and fine tune the server as
per the Management Guide. I will keep you posted of the developments.
By the way, the values I had given in .3 were during the normal
functioning of the server. I just wanted to give you a feel for the
systems involved; when the problem occurs I will give the same
parameters for comparison. The figures may be useful to someone as a
random sample of 1 :-).
Thanks for all your help so far.
Sunil
|
2597.8 | | IOSG::STANDAGE | | Fri Apr 23 1993 10:42 | 64 |
|
Sunil,
>>Thanks for all your suggestions, what worried me was not knowing if
>>the server could indeed allocate channels and then not deallocate them.
>>Hence I asked the customer not to proceed further until I had checked.
So far there have been no reported problems with the FCS and CHANNELCNT
in the way you describe. A few people have had to fine tune the server
to suit their environment, but I haven't heard specifically of channels
not being deallocated.
>>Basically the customer has been tuning the server as recommended in
>>the documentation (Management Guide page 15-17 onwards). The customer
>>has set the following:
>> Values calculated based upon 400 users
>> Drawer Cache = 50, Max drawers = 140, Drawer timeout = 500.
These values look good to me for a 400 user environment, certainly they
are as documented in the guide!
>>I must add that the Drawer Cache value was set to 50 BUT somehow
>>got adjusted to 30 the customer assures me. How could this have
>>happened ?
The default value for the Drawer Cache when the server is created
is 10. The only way this can really be modified is by editing the
server attributes and changing the value. Remember that if any of the
server attributes are modified then the server has to be stopped and
restarted to pick up the new values.
>>The customer also has DPS running (Digital Systems Performance
>>analyzer), this has shown consistently that OA$FCV needs to have
>>its Working Set extent increased. Can someone tell me what is the
>>function of OA$FCV, is it some kind of locking mechanism of some kind ?
The OA$FCV is started as a detached process when you run
A1V30START.COM. It's the mechanism by which unique filenames are
generated for file cabinet entries upon a user's request.
>>The customer will carry out another audit and fine tune the server
>>as per the Management Guide. I will keep you posted of the
>>developments.
Well let us know how things go. In some environments the tuning of the
servers might take a while to get right as it's dependent on so many
other variables which themselves fluctuate.
Good luck,
Kevin.
|
2597.9 | Channels almost have to be deallocated | CHRLIE::HUSTON | | Fri Apr 23 1993 14:39 | 24 |
|
re .7
>Thanks for all your suggestions, what worried me was not knowing if the
>server could indeed allocate channels and then not deallocate them.
>Hence I asked the customer not to proceed further until I had checked.
I have been thinking about the channel problems. I honestly cannot
figure out how it would not release them. For the most part channels
are used for drawer opens. Even if an acc vio wipes out the thread
that opened the drawer, it has no effect on the drawer itself. Sooner
or later the drawer closing thread will run and shut down unused
drawers, thus freeing up channels. The only exception to this, and
this would be an unmissable bug, is if the drawer closing thread, for
some reason, was not there. An easy way to check this is to shut
the server down via the SM interface. This will request all background
threads, including the drawer closer, to nicely commit suicide. A side
effect of this is that each thread will write a "Someone just
requested my death" message to the log file (sorry, I forget the
exact wording of the message). There should be several of these for
each server shutdown.
--Bob
|
2597.10 | An improvement | TINNIE::SETHI | Ah (-: an upside down smile from Oz | Tue Apr 27 1993 02:06 | 16 |
| Hi Kevin and Bob,
The latest news is that the customer increased the Drawer Cache to 50
(was set at 30) and has not had any problems. I will be monitoring the
system for the rest of the week and I have reassured the customer that
there is no known problem with the allocation of channels.
One more question: the DPA report mentions the image OAFC$SERVER
because of excessive page faulting. The report
says that WSMAX should be increased as more memory is required;
however, they cannot increase memory. Is there anything else they can
do ? Load balancing etc. has been done.
Regards,
Sunil
|
2597.11 | Caching drawers = more memory for the server | SIOG::T_REDMOND | Thoughts of an Idle Mind | Tue Apr 27 1993 13:04 | 7 |
| Increasing the size of the drawer cache should reduce paging because
the background threads won't have so much work to do to manage the
cache (flush unused drawers and the like). Increasing the drawer cache
should also be matched by increasing the memory allocated to the
detached process when it is invoked. Has that been done?
Tony
|
2597.12 | | TINNIE::SETHI | Ah (-: an upside down smile from Oz | Thu Apr 29 1993 02:21 | 12 |
| Hi Tony,
Thanks for your suggestion above. The problem we have is that they are
running at between 80-98% memory usage on their 7000 machines.
Therefore they cannot allocate more memory to the server process
without impacting performance elsewhere.
What has also happened is that we quoted that 500 ALL-IN-1 users per
processor could be supported, but they can only support 380. Another one
of those hot potatoes for the accounts team to handle.
Sunil
|
2597.13 | Black art time again | SIOG::T_REDMOND | Thoughts of an Idle Mind | Fri Apr 30 1993 16:41 | 16 |
| Well, calculating the supported user population for an ALL-IN-1 system
is a bit of a black art. The basic figures achieved in a RTE/SUT
environment (and published afterwards) need to be adjusted to take
account of all the things the test environment omits, like network
activity, programmers compiling bits and pieces, third party software
running in the subprocess, and so on. In my experience, the adjustment
(down) runs from 20% upwards. So moving from 500 (estimated) to 380
(actual) isn't too surprising.
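
For what it's worth, the arithmetic behind that remark can be sketched in a few lines of Python. The helper names are hypothetical; the 500, 380 and 20% figures are the ones quoted in this thread.

```python
# Sketch of the adjustment arithmetic (hypothetical helper names).
def reduction_percent(estimated, actual):
    """Percentage by which the actual figure falls short of the estimate."""
    return (estimated - actual) * 100 / estimated


def adjusted_users(estimated, percent_down):
    """Estimate after applying a downward adjustment of `percent_down` percent."""
    return estimated * (100 - percent_down) // 100


print(reduction_percent(500, 380))  # 24.0
print(adjusted_users(500, 20))      # 400
```

So the customer's 380 users is a 24% drop from the quoted 500, a little beyond the 20% point where the adjustment typically starts.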
If you don't allocate additional memory to the FCS it will take it
anyway, but extra pain will be caused to VMS as the FCS pages
unhappily. You can do it either way, but setting the cache sizes
correctly will probably ease the system demands because the background
threads won't have so much work to do.
As you like it, Tony
|