T.R | Title | User | Personal Name | Date | Lines |
---|
2473.1 | What version are you running? | CHRLIE::HUSTON | | Thu Mar 25 1993 14:17 | 30 |
|
I just did it and it worked fine.
>1. A user performs an SMU and works merrily away. However when they have
> finished and try to SMU back to their own account, they get a message that
> "Drawer is already in use by another User". Investigation shows that the
> FCS process still has open the users own DOCDB, DAF and RESERVATIONS.DAT
> file of their MAIN drawer.
>
>2. The solution to this problem would seem to be to do a SM MFC MS MSC and from
> the Index select the users that are affected and disconnect them. This is
> where Catch 22 comes in. When this operation is attempted, an error message
> "Client Buffer not big enough for Requested Operation" and no Index is
> displayed. Consequently, the remaining alternative is to stop the FCS in its
> entirety which then affects everyone.
>
>Are these known problems ? Any workarounds ? Any fixes now or in a PFP/PFR ?
Killing the client connections will not close down the drawer files.
The FCS keeps drawers open for performance reasons. Are you by chance
running V2.4 of ALL-IN-1?
There is no workaround for the "client buffer not big enough..."
problem; it has to be fixed in the UI.
What happens if, while you are SMU'd to another user, you try to
go into ALL-IN-1 under your own account from another terminal?
--Bob
|
2473.2 | | FROIS1::HOFMANN | Stefan Hofmann, LC Frankfurt, ISE | Thu Mar 25 1993 14:32 | 4 |
| Bob,
John must be using V3, since V2.4 didn't provide a SMU option.
Stefan
|
2473.3 | | IOSG::MAURICE | Because of the architect the building fell down | Thu Mar 25 1993 18:07 | 27 |
| Hi,
Here's how I think the scenario is:
1. User does an SMU and so the current drawer is the Manager's drawer.
2. A cross-drawer operation is done which involves the user's MAIN
drawer - perhaps a message is refiled to it for example. The FCS
now has to access the user's MAIN drawer, and as a performance
optimisation attempts first to get an exclusive lock on the
drawer. As only the FCS is accessing the drawer this is successful.
3. The user now wishes to SMU back to the MAIN drawer. The ALL-IN-1
File Cabinet code attempts to get a lock on the drawer. In normal
working the FCS is triggered to release the exclusive lock and
downgrade to a read lock. Your symptom suggests that the FCS is
not reacting to the downgrade request. Note that no client/server
dialogue is required - it is the VMS lock manager which should
trigger the FCS into performing the downgrade.
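The three steps above can be sketched as a toy model. This is only an illustration of the expected behaviour, not FCS code and not the VMS lock manager interface: the class, the mode names and the callbacks are all hypothetical. The holder of an exclusive lock is notified when somebody else wants the drawer; if it downgrades, the newcomer gets in, and if it ignores the notification, the newcomer is refused.

```python
from dataclasses import dataclass
from typing import Callable, List

EXCLUSIVE = "EXCLUSIVE"  # hypothetical mode names for this sketch
READ = "READ"


@dataclass
class Holder:
    name: str
    mode: str
    on_blocking: Callable[["Holder"], None]


class DrawerLock:
    """Toy stand-in for a lock on one drawer resource."""

    def __init__(self, resource: str) -> None:
        self.resource = resource
        self.holders: List[Holder] = []

    def request(self, name: str, mode: str,
                on_blocking: Callable[[Holder], None]) -> bool:
        # Tell every incompatible holder that it is blocking someone.
        for holder in self.holders:
            if EXCLUSIVE in (holder.mode, mode):
                holder.on_blocking(holder)
        # Refuse the request if any holder is still incompatible.
        if any(EXCLUSIVE in (holder.mode, mode) for holder in self.holders):
            return False
        self.holders.append(Holder(name, mode, on_blocking))
        return True


def downgrade_on_request(holder: Holder) -> None:
    """Normal working: give up exclusivity, keep a read lock."""
    holder.mode = READ


def ignore_request(holder: Holder) -> None:
    """The reported symptom: the holder does not react."""


# Normal case: the FCS downgrades, so the user gets the MAIN drawer back.
healthy = DrawerLock("MAIN")
healthy.request("FCS", EXCLUSIVE, downgrade_on_request)
print(healthy.request("IOS", READ, ignore_request))

# Faulty case: the FCS never downgrades, so the drawer stays "in use".
stuck = DrawerLock("MAIN")
stuck.request("FCS", EXCLUSIVE, ignore_request)
print(stuck.request("IOS", READ, ignore_request))
```

The second case is the one that would surface to the user as "Drawer is already in use by another User".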
Since this is an abnormal situation I recommend you look in the FCS log
files to see if any errors have been recorded there.
Cheers
Stuart
|
2473.4 | Intermittent problem - will post logs soon | JOCKEY::MARSHALLJ | Glad that the devil is red ...... | Tue Mar 30 1993 11:21 | 11 |
|
**** awaiting further info ****
Re .1,.3
Thanks for the ideas so far. The problem isn't reproducible at will so
I have asked the customer to copy the log files and also turn on FCS
tracing as soon as the next occurrence is reported. I will post them
here.
John
|
2473.5 | More FCS Problems (moved from 2585.0) | TENTO1::MARSHALLJ | Glad that the devil is red ...... | Sat Apr 17 1993 16:49 | 205 |
| Hi,
Unfortunately these haven't gone away and below I include more detailed
problem statements plus the associated FCS logs containing the
relevant error messages etc.
Any help would be gratefully appreciated.
Is there anything else we can set to receive more debug/error type
information ?
Just out of curiosity, some of the errors listed are MCC-E-*******
Does MCC mean that hooks are in the FCS so that it can be
managed/monitored by DECmcc (POLYCENTER Framework) ? If so, any details
on what I need to do to enable this ?
Thanks in advance,
John
______________________________________________________________
We have again experienced problems with the A1 file Cab servers this week.
These problems have not all been the same but generally require the filecab
server in question being shutdown and restarted. Details are as follows:-
PROBLEM 1:-
User did a reserve on a document then unreserved it. At this point the user got
a DOCUMENT IN USE. We were able to use the MSC option to show the users on the
file cab server but this user did not show as a client. Looking at the files
held open on the user's disk the file cab server had the user's DOCDB,
RESERVATIONS etc held open as well as the .WPL file of the document the user was
trying to access. Shutting down and restarting cleared the problem.
PROBLEM 2:-
Over the past couple of days we have had a few users reporting problems with
SMU. They have SMU'd successfully to another user and attempted to create a new
email. At this point they enter the EMHEAD information and attempt to enter WPS.
It is then that they are taken back to the EMAIL menu with a message UNABLE TO
CREATE DOCUMENT. Investigating the file cab servers we found one that was
rejecting requests. Its channel count was up to 356 out of a max of 400 with
about 35 attached clients and approx 30 more threads allocated than deleted.
There should be ample channel count to accommodate the number of users on this
server. What appears to be happening and this is also reflected in PROBLEM 3
below is that the file cab server is holding open channels and not releasing
them.
PROBLEM 3:-
This morning a user logged into ALLIN1 and attempted to access his main drawer
for WP and got DRAWER CURRENTLY BEING USED BY ANOTHER USER. None of this user's
drawers are shared and he does not have access to any other drawer.
Investigation of the files open for him showed him logged on to GRFH9 node of
the cluster whilst the file cab server on GRFH12 node in the cluster was holding
open his DOCDB.DAT, RESERVATIONS.DAT and DAF.DAT. Looking at the SAI option on
Manage servers screen for the GRFH12 server we could see that the channel count
was up to 356 out of 400 and it was rejecting requests to it. Again it appears
that channels are being held open. A bit of a guess would say that the user in
question was probably logged on to GRFH12 node yesterday and the server has held
onto him.
Below are the server log files from each node in our cluster since we last
rebooted on the 11th April. They show various internal errors and problems as
well as the shutdown/restarts.
11-APR-1993 15:42:38.52 Server: GRFH8::"73=" Message: Startup for File
Cabinet Server V1.0-2 complete
13-APR-1993 13:40:50.44 Server: GRFH8::"73=" Error: %OAFC-E-INTERR, Internal
error in File Cabinet Server Message: FCS has access violated, please submit
an SPR.
13-APR-1993 22:54:56.38 Server: GRFH8::"73=" Error: %MCC-E-ALERT_TERMREQ,
thread termination requested Message: CsiCacheBlockAstService; Error from
mcc_astevent_receive
13-APR-1993 22:54:57.33 Server: GRFH8::"73=" Error: %MCC-E-ALERT_TERMREQ,
thread termination requested Message: SrvTimeoutSysMan; receive alert to
terminate thread
13-APR-1993 22:55:54.36 Server: GRFH8::"73=" Message: Startup for File
Cabinet Server V1.0-2 complete
14-APR-1993 18:47:02.91 Server: GRFH8::"73=" Error: %MCC-E-IN_USE_ERROR, in
use error Message: CsiCacheFlushDrawerAccess; Error from mcc_mutex_try_lock
11-APR-1993 15:37:08.05 Server: GRFH9::"73=" Message: Startup for File
Cabinet Server V1.0-2 complete
11-APR-1993 15:52:50.90 Server: GRFH10::"73=" Message: Startup for File
Cabinet Server V1.0-2 complete
13-APR-1993 17:22:52.86 Server: GRFH10::"73=" Error: %MCC-E-IN_USE_ERROR, in
use error Message: CsiCacheFlushDrawerAccess; Error from mcc_mutex_try_lock
11-APR-1993 15:54:35.70 Server: GRFH11::"73=" Message: Startup for File
Cabinet Server V1.0-2 complete
11-APR-1993 15:39:16.06 Server: GRFH12::"73=" Message: Startup for File
Cabinet Server V1.0-2 complete
14-APR-1993 09:11:05.46 Server: GRFH12::"73=" Error: %OAFC-E-INTERR,
Internal error in File Cabinet Server Message: FCS has access violated,
please submit an SPR.
15-APR-1993 10:36:35.55 Server: GRFH12::"73=" Error: %MCC-E-EXISTENCE_ERROR,
object does not exist
15-APR-1993 10:51:26.76 Server: GRFH12::"73=" Message: Startup for File
Cabinet Server V1.0-2 complete
11-APR-1993 15:37:25.91 Server: GRFH13::"73=" Message: Startup for File
Cabinet Server V1.0-2 complete
13-APR-1993 15:49:12.96 Server: GRFH13::"73=" Error: %OAFC-E-INTERR,
Internal error in File Cabinet Server Message: FCS has access violated,
please submit an SPR.
13-APR-1993 16:10:15.94 Server: GRFH13::"73=" Error: %MCC-E-EXISTENCE_ERROR,
object does not exist
14-APR-1993 14:56:34.56 Server: GRFH13::"73=" Error: %MCC-E-EXISTENCE_ERROR,
object does not exist
14-APR-1993 14:57:02.50 Server: GRFH13::"73=" Error: %MCC-E-EXISTENCE_ERROR,
object does not exist
14-APR-1993 15:00:35.38 Server: GRFH13::"73=" Message: Startup for File
Cabinet Server V1.0-2 complete
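When comparing six server logs like these, a small script that tallies error codes per node can make the pattern easier to see. The following is a minimal sketch only: the entry layout (timestamp, "Server:", node name, optional "Error: %FACILITY-S-IDENT") is assumed from the excerpts above, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Assumed layout: "DD-MMM-YYYY HH:MM:SS.cc Server: NODE::... [Error: %CODE, ...]"
ENTRY = re.compile(
    r"(?P<stamp>\d{1,2}-[A-Z]{3}-\d{4} \d{2}:\d{2}:\d{2}\.\d{2})\s+"
    r"Server:\s+(?P<node>\w+)::\S+\s+"
    r"(?:Error:\s+%(?P<code>[A-Z]+-[A-Z]-\w+))?"
)


def tally_errors(log_text: str) -> Counter:
    """Count (node, error code) pairs; plain messages are skipped."""
    flattened = " ".join(log_text.split())  # undo the 80-column wrapping
    counts: Counter = Counter()
    for match in ENTRY.finditer(flattened):
        if match.group("code"):
            counts[(match.group("node"), match.group("code"))] += 1
    return counts


SAMPLE = """
11-APR-1993 15:42:38.52 Server: GRFH8::"73=" Message: Startup for File
Cabinet Server V1.0-2 complete
13-APR-1993 13:40:50.44 Server: GRFH8::"73=" Error: %OAFC-E-INTERR, Internal
error in File Cabinet Server Message: FCS has access violated, please submit
an SPR.
14-APR-1993 18:47:02.91 Server: GRFH8::"73=" Error: %MCC-E-IN_USE_ERROR, in
use error Message: CsiCacheFlushDrawerAccess; Error from mcc_mutex_try_lock
13-APR-1993 17:22:52.86 Server: GRFH10::"73=" Error: %MCC-E-IN_USE_ERROR, in
use error Message: CsiCacheFlushDrawerAccess; Error from mcc_mutex_try_lock
"""

for (node, code), count in sorted(tally_errors(SAMPLE).items()):
    print(node, code, count)
```

Startup messages carry no error code, so they are ignored by the tally.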
Below is an extract from one of the file cab servers error logs
(OAFC$SERVER_ERROR.LOG). The information in this log is typical of what is in
all six of our file cab server logs on our cluster. The manual says that errors
should be reported to Digital if they occur in this log.
Can you throw any light on them?
Is it also possible to move the location of this log file from SYS$MANAGER to
our own location and perform some form of new version processing? At present the
file cab servers have been appending to the same file since we brought up version
3 of ALL-IN-1 last October.
The lock on the following drawer has become invalidated by another
process. Note that the lock has been granted and OafcNormal will be
returned to the client, however, all other processes wishing to share
this lock will also be granted invalid locks until all processes
sharing this lock are terminated.
Drawer directory: DIR$BROKACCT:[DIRECTUW.ALLIN1.CREDIT_CONTROL]�S
Drawer owner: DIRECTUW
The lock on the following drawer has become invalidated by another
process. Note that the lock has been granted and OafcNormal will be
returned to the client, however, all other processes wishing to share
this lock will also be granted invalid locks until all processes
sharing this lock are terminated.
Drawer directory: DIR$BROKACCT:[DIRECTUW.ALLIN1.CREDIT_CONTROL]��
Drawer owner: DIRECTUW
The lock on the following drawer has become invalidated by another
process. Note that the lock has been granted and OafcNormal will be
returned to the client, however, all other processes wishing to share
this lock will also be granted invalid locks until all processes
sharing this lock are terminated.
Drawer directory: DIR$OANDG:[OANDGSD.ALLIN1.OGIPOL]
Drawer owner: OANDGSD
ALL-IN-1 Index Server Internal Error:
Error locking DAB during cache garbage collection:
The lock on the following drawer has become invalidated by another
process. Note that the lock has been granted and OafcNormal will be
returned to the client, however, all other processes wishing to share
this lock will also be granted invalid locks until all processes
sharing this lock are terminated.
Drawer directory: DIR$OANDG:[OANDGSD.ALLIN1.OGIPOL])�
Drawer owner: OANDGSD
The lock on the following drawer has become invalidated by another
process. Note that the lock has been granted and OafcNormal will be
returned to the client, however, all other processes wishing to share
this lock will also be granted invalid locks until all processes
sharing this lock are terminated.
Drawer directory: DIR$ITNLUSER:[ALEXANDERMM.ALLIN1]ab.dat
Drawer owner: ALEXANDERMM
The lock on the following drawer has become invalidated by another
process. Note that the lock has been granted and OafcNormal will be
returned to the client, however, all other processes wishing to share
this lock will also be granted invalid locks until all processes
sharing this lock are terminated.
Drawer directory: DIR$DIV36:[RIUKIPS.ALLIN1.SAH_SECTION_INFO]EMO!000874
Drawer owner: RIUKIPS
|
2473.6 | A few comments | CHRLIE::HUSTON | | Mon Apr 19 1993 16:06 | 77 |
| re .5
>Is there anything else we can set to receive more debug/error type
>information ?
The only other thing you can do is turn on FCS tracing for the
users that are having problems. I'm not sure it will show anything,
and it will get large quickly, but it's worth a shot.
>Just out of curiosity, some of the errors listed are MCC-E-*******
>
>Does MCC mean that hooks are in the FCS so that it can be
>managed/monitored by DECmcc (POLYCENTER Framework) ? If so, any details
>on what I need to do to enable this ?
MCC is the threads package used by the FCS. There is nothing you can
do to get more information from it.
>User did a reserve on a document then unreserved it. At this point the user got
>a DOCUMENT IN USE. We were able to use the MSC option to show the users on the
>file cab server but this user did not show as a client. Looking at the files
>held open on the user's disk the file cab server had the user's DOCDB,
>RESERVATIONS etc held open as well as the .WPL file of the document the user was
>trying to access. Shutting down and restarting cleared the problem.
Having an FCS trace of this would be helpful to see what FCS calls are
being made and what status is being returned. It sounds like there is
a bit of non-cooperation between the FCS and IOS with respect to
locking.
>Is it also possible to move the location of this log file from SYS$MANAGER to
>our own location and perform some form of new version processing? At present the
>file cab servers have been appending to the same file since we brought up version
>3 of ALL-IN-1 last October.
You can move the log simply by renaming it, the FCS opens the file, if
not there it creates a new one. Sorry but the location of
oafc$server_error.log is hard coded in the FCS.
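As a rough illustration of "new version processing" by renaming, here is a minimal sketch. It assumes nothing about the FCS beyond what is said above (the server recreates the file if it is missing); the function name, the archive directory and the dated naming scheme are hypothetical choices, and whether the rename is safe while the server is running is not something this sketch establishes.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def archive_log(log_path: Path, archive_dir: Path, today: date) -> Optional[Path]:
    """Move the live log aside under a dated name.

    Returns the archived path, or None if there was no log to move.
    """
    if not log_path.exists():
        return None  # nothing to archive yet
    archive_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: NAME_YYYYMMDD.EXT
    target = archive_dir / f"{log_path.stem}_{today:%Y%m%d}{log_path.suffix}"
    shutil.move(str(log_path), str(target))
    return target
```

Run periodically, this would keep the log in the hard-coded location small while the older entries accumulate elsewhere.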
> The lock on the following drawer has become invalidated by another
> process. Note that the lock has been granted and OafcNormal will be
> returned to the client, however, all other processes wishing to share
> this lock will also be granted invalid locks until all processes
> sharing this lock are terminated.
> Drawer directory: DIR$BROKACCT:[DIRECTUW.ALLIN1.CREDIT_CONTROL]�S
> Drawer owner: DIRECTUW
The only time I have seen this is when IOS has a MAIN drawer open (not
by using the FCS) and then the FCS tries to access it. There is code
in to allow the locks to be managed properly. What happened is that
the FCS had an exclusive lock on the drawer and IOS (or someone else)
also requested access. Background ASTs and the VMS lock manager work
together to tell the guy with the exclusive lock to loosen up its hold
on the resource (drawer name). This sounds like something went corrupt
in the lock resource. The drawer directory looks like garbage.
In fact all the drawer directory fields in that log appear to have
a couple of bytes of garbage on the end.
As for the channels, the only thing I can think of is that when the FCS
access violates it is not letting go of the channels that that thread
had. Probably due to channels being process allocated and there is no
map of what thread has how many channels. The condition handler will
attempt to close down files/drawers, not sure if it is smart enough to
let go of the channels as well.
Also you seem to have a lot of users for only 400 channels. Each drawer
takes 4 channels, and I seem to recall you having a lot of users (could be
confusing you with someone else though). If so, bump up the channel
count and see if that problem goes away. I also don't see any messages
in the log file about the FCS thinking it is low on channels and trying
to release some. Whenever the FCS hits 90% used channels, it tries to
close some drawers/files down to free up channels, when it does this
it writes a message to the server log file
(sys$manager:oafc$server.log).
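To put numbers on this, here is a small worked calculation using only the figures quoted in this topic: 4 channels per drawer, a 400-channel limit, 356 channels in use, and a release attempt at 90% usage. Whether the FCS compares with "at or above" 90% is an assumption of the sketch, and the function name is made up.

```python
CHANNELS_PER_DRAWER = 4          # figure quoted in this note
RELEASE_THRESHOLD_PERCENT = 90   # figure quoted in this note


def channel_report(used: int, limit: int) -> dict:
    """Summarise channel usage against the release threshold."""
    # Integer arithmetic avoids floating-point edge cases at the boundary.
    shortfall = limit * RELEASE_THRESHOLD_PERCENT - used * 100
    return {
        "percent_used": 100 * used / limit,
        # Assumption: release is attempted at or above the threshold.
        "release_attempted": shortfall <= 0,
        "channels_until_release": max(0, -(-shortfall // 100)),
        "max_open_drawers": limit // CHANNELS_PER_DRAWER,
    }


print(channel_report(used=356, limit=400))
```

With the reported figures the server sits at 89%, four channels short of the 90% mark, and 400 channels cover at most 100 open drawers. If the threshold works as assumed here, that might be one reason no release message appears in the log, but that is a guess rather than a diagnosis.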
--Bob
|
2473.7 | Any news here ??? | VNABRW::EHRLICH_K | Ronnie James DIO, vocals! | Wed Jun 30 1993 12:01 | 64 |
| Hi Bob, Kevin,
I've been at a customer (ABB Vienna) this morning because
they've had some troubles with SMU and back again. (DWRLOCKED!)
(The same as John in Re.1 mentioned!)
Also some users had problems with Creating a Mail. They filled in
TO's , CC's and a subject. And after the subject they hung.
Having a look in the trace I've found the following:
![SCRIPT] WP_SYS_EDIT Line 7: GET #DOC_FULLPATH = #DRAWER_FULLPATH "."
'"' OA$CU
! RDOC_FOLDER '".' OA$CURDOC_DOCNUM
![FUNC] Function = GET, Cmd line = #DOC_FULLPATH = #DRAWER_FULLPATH
"." '"' OA
! $CURDOC_FOLDER '".' OA$CURDOC_DOCNUM
![A1LOG] Entry = %OA-I-LOGFUN, Funktion: GET #DOC_FULLPATH
= #DRAWE
! R_FULLPATH "." '"' OA$CURDOC_FOLDER '".'
OA$CURDOC_DOCNUM
![SYMBOL] Symbol = #DOC_FULLPATH = #DRAWER_FULLPATH "." '"'
OA$CURDOC_FOLDER '".
! ' OA$CURDOC_DOCNUM, Value = OFFICE::."[PINCZOLITS
JOSEF]STANDARD
! "."AUSGANG".000437
![SCRIPT] WP_SYS_EDIT Line 8: FILECAB GET_ATTRIBUTES (DOCUMENT =
#DOC_FULLPATH,
! #MS = MAIL_STATUS, #MF = MODIFY)
![FUNC] Function = FILECAB, Cmd line = GET_ATTRIBUTES (DOCUMENT =
#DOC_FULLPAT
! H, #MS = MAIL_STATUS, #MF = MODIFY)
![A1LOG] Entry = %OA-I-LOGFUN, Funktion: FILECAB
GET_ATTRIBUTES (DOCUME
! NT = #DOC_FULLPATH, #MS = MAIL_STATUS, #MF = MODIFY)
![SYMBOL] Symbol = #DOC_FULLPATH, Value = OFFICE::."[PINCZOLITS
JOSEF]STANDARD".
! "AUSGANG".000437
![IO] FILECAB Server Request = LIST
![IO] Getting field CODE from OA$FOLDERS, Value = DEDE
![A1LOG] Entry = %OA-I-LOGERROR, %OA-W-SUBTERM, Fehler beim Ablauf des
Subproze
! sses "20801C18".
![A1LOG] Entry = %OA-I-LOGERROR, -NONAME-W-NOMSG, Message number
>>>>>> A6E83240 <<<<<<
Here I had to STOP/ID the process! A SHOW DEVICE /FILES afterwards
showed that the files were locked by the FCS.
If you're interested in the whole Tracefile you'll find it on VNOTSC::
(49790::)ABB_TRACE.LOG
Now my question is: have you both found anything? Is there any news
about the FCS? I've told ABB to install ICF #10 which solves some
problems with SMU.
ABB will tune the FCS as described in the ManagementGuide, maybe this
will help ??? But it can not be a solution to stop and restart the FCS.
No fun, I know.
Best regards and greetings from Vienna
Charly
|
2473.8 | some SMU problems have been fixed | CHRLIE::HUSTON | | Wed Jun 30 1993 13:39 | 13 |
|
There were problems in the FCS that would restrict SMU, they have been
fixed and put into some sort of patch (MUP or ICF not sure which, I
just build 'em, don't ship 'em :-) ).
ICF 10 does sound to be about the right timeframe though.
Also, the trace you showed is very hard to use for diagnosing FCS
problems. If you could show that and the FCS trace on the user in
question, things may make more sense.
--Bob
|
2473.9 | Yes, but it's difficult to trace ... | VNABRW::EHRLICH_K | Ronnie James DIO, vocals! | Wed Jun 30 1993 14:05 | 20 |
| Bob,
first, it's great to get such a fast response. - Thank you very
much!
It's difficult to trace things that happened in the past. And enabling
tracing after an FCS restart for 500 ALL-IN-1 users will also be a
challenge, but I will tell ABB this. But mostly, the problems occur
when no one is reachable.
It looks as though there is sometimes some 'unserious' behaviour
between the FCS and the VMS lock manager. Who knows?
ABB has restarted FCS, all problems have gone (at the moment, hopefully
they never come back!).
Best regards
Charly_from_CSC_Vienna
|
2473.10 | Doing. . . | IOSG::STANDAGE | | Wed Jun 30 1993 14:44 | 14 |
|
Charly,
Some filelocking problems similar to what you are experiencing have
been investigated to some degree here in IOSG. The good news is that
progress is being made, but the extent of the changes means that you
won't see a fixed version of the FCS for a while yet.
Thanks for your feedback,
Kevin.
|
2473.11 | Yes, I know (2934.0) | VNABRW::EHRLICH_K | Ronnie James DIO, vocals! | Wed Jun 30 1993 15:18 | 15 |
| Kevin,
yes, I understand what you mean by
>The good news is that
>progress is being made, but the extent of the changes means that you
>won't see a fixed version of the FCS for a while yet.
after announcing note 2934 by GAP.
Is there really no way to get an 'ICF' for this? If there's a need, I'll
come over to you and help you!
Good luck for you (as we say in Austria toi, toi, toi!)
Charly_who's_happy_and_a_little_bit_sad_now.
|
2473.12 | Clarifying... | IOSG::PYE | Graham - ALL-IN-1 Sorcerer's Apprentice | Thu Jul 01 1993 16:26 | 6 |
| Well actually, (putting words in Kevin's mouth!) I think he meant
that the fix is sufficiently complicated that we might not be doing it
straight away. Besides the FCS team (which was unaffected by the 2934
announcement) is flat out on our commitments for TeamLinks connection.
Graham
|
2473.13 | We'll get there eventually! | IOSG::STANDAGE | | Fri Jul 02 1993 10:37 | 11 |
|
Yes. As usual, Graham is very accurate !
The changes are rather extensive to the server, so we want to take our
time and get it right, plus the fact that there are other commitments
which are taking priority.
Thanks,
Kevin.
|