T.R | Title | User | Personal Name | Date | Lines |
---|
13.1 | FILLM or symptom of other problem? | STKOFF::SPERSSON | Pas de Probleme | Mon Oct 02 1989 10:55 | 26 |
|
Hung,
I can think of two possible reasons for this:
1) We have noticed that the default FILLM quota (40) for the MRMEMO
account may actually be too low. Our tests have shown that if you
have several manager sessions (i.e. MRMMAN) running the SHOW/CONTINUOUS
command simultaneously, the server will run out of this
particular quota. I don't have a specific maximum number of allowed
manager sessions, but we have certainly managed to crash the server
while running only three of them. So I guess a FILLM of 60 would
be more appropriate.
2) (This more probably caused your colleague's particular problem.)
A look through the code shows that if the server LIB$SIGNALs an
unexpected error code like MRMEMO$_MESSED (internal error), then the
clean-up procedures might be less than watertight.
So, does the log show an error before the "exceeded file limit"?
That would, I think, indicate the real problem. Also, are we talking
V1 or V2?
cheers,
Stefan
|
13.2 | more problem description | HAN01::FISCHER_HE | | Mon Oct 02 1989 12:25 | 99 |
|
Hello,
We are using version 2.0 (SDC).
We have no problems with parallel MRMMAN sessions and
MRMEMO account quotas.
The problems are caused by the server quotas, which are
set up by SYSGEN PQL_D... (FILLM = 16, PGFLQUOTA = 4096).
At the time the server crashed I observed that
FILLM had counted down to 0 and PGFLQUOTA was below 1000.
Running SHO DEV/FILES repeatedly showed that MRMEMO2_CONTENTS
was one reason for FILLM counting down. There may be others;
in particular, the crash seems to come much faster
when sending to MEMO runs in parallel with
fetching.
The LOG file (see below) shows only the symptom, not the
cause.
We will increase our PQL_ quotas, but that is not
the final solution. Is the failure to release page file usage
a general problem of the MRIF routines?
---
Regards,
Heinz Fischer
OAVAX::ALLIN1 > typ mrmemo$dir:mrmemo2.log
$!..
$!.= Central user login procedure !17
$!.< ---------------------------- !5
Login of user MRMEMO on OAVAX at 15-SEP-1989 10:19:06.97 Mode=OTHER
$
$ IF "MRMEMOLOGIN.COM".EQS."NL:" .OR. "0".NES."1" -
THEN SET CONTROL=(T,Y)
$
$ EXIT ! Central user login procedure !18
$! M r m e m o . C o m - comments and copyright notice at the end
$!
$ set noon
$ set verify
$ define tt nl:
$ run/nodebug sys$system:mrmsrv
Time: 1989-09-15 10:19:13.38; message from server MRMEMO2:
%MRMEMO-I-NEWACCOUNT, opening new accounting file: FEBK_DATA:[MRMEMO]MRMEMOACC2.DAT;40
Time: 1989-09-15 10:20:33.98; message from server MRMEMO2:
%MRMEMO-I-DISREJ, distribution rejected by MEMO system:
-MRMEMO-S-RCERRTXT_04, Fehler in der Empfängerliste (während der Verteilung).
Time: 1989-09-15 10:59:24.48; message from server MRMEMO2:
%DCF-E-CONV_FAIL, Conversion failed - no further information
%MRMEMO-W-TRACE, traceback forced from the Server Handler
-MRMEMO-I-FSM, idle, connected to MR and MEMO after message indicated in MR mailbox
-MRMEMO-I-RING, ring: '1A 19 18 27 1B 13 36 1A 1A 19.', substates: 00000470
%TRACE-W-NOMSG, Message number 00098198
module name routine name line rel PC abs PC
SRVLOG SRV$LOG_HANDLER 3597 000001BB 000BE2AF
----- above condition handler called with exception 01A8809A:
%DCF-E-CONV_FAIL, Conversion failed - no further information
----- end of exception message
SRVMRD SRV$MRD_DISASS_TEXT 5367 000002C5 000C6B28
SRVMRD SRV$MRD_DISASS_BODY 8919 00000028 000C9EDF
SRVMRD SRV$MRD_DISASS 9893 00000D2E 000CAC7F
SRVACT SRV$ACT_M 4284 0000003D 000C56A1
SRVDSP SRV$DSP_FSM 2894 00000115 000BDDB9
SRVMMO SRV$MMO_ONE_LIFE 3955 000001EF 000BBD5F
SRVMMO SRV$MMO_MAIN 3787 00000017 000BBB63
001112BE 001112BE
KOTERM KOTERM 804 00000039 0010ECB2
00111299 00111299
KODOC KODOC 1768 00000097 0010BEF4
00111299 00111299
001C19AA 001C19AA
ADA$ELAB_DDS ADA$ELAB_DDS 0000000E 0009FC0E
00111299 00111299
Time: 1989-09-15 10:59:30.79; message from server MRMEMO2:
%RMS-E-CRE, ACP file create failed
%MRMEMO-W-TRACE, traceback forced from the Server Handler
-MRMEMO-I-FSM, idle, connected to MR and MEMO after message indicated in MR mailbox
-MRMEMO-I-RING, ring: '36 1A 1A 19 ][ 4A 5A 59 52 19.', substates: 00000430
%TRACE-W-NOMSG, Message number 00098198
module name routine name line rel PC abs PC
SRVLOG SRV$LOG_HANDLER 3597 000001BB 000BE2AF
----- above condition handler called with exception 0001C00A:
%RMS-E-CRE, ACP file create failed
----- end of exception message
SRVMRC SRV$MRC_FETCH 4282 000000B7 000C147D
SRVACT SRV$ACT_M 4273 00000022 000C5686
SRVDSP SRV$DSP_FSM 2894 00000115 000BDDB9
SRVMMO SRV$MMO_ONE_LIFE 3955 000001EF 000BBD5F
SRVMMO SRV$MMO_MAIN 3787 00000017 000BBB63
001112BE 001112BE
KOTERM KOTERM 804 00000039 0010ECB2
00111299 00111299
KODOC KODOC 1768 00000097 0010BEF4
00111299 00111299
001C19AA 001C19AA
ADA$ELAB_DDS ADA$ELAB_DDS 0000000E 0009FC0E
00111299 00111299
|
13.3 | DCF fails, that's the problem | STKOFF::SPERSSON | Pas de Probleme | Wed Oct 04 1989 14:19 | 37 |
|
Heinz,
I think that we have to look at what causes the initial problem
here:
>>> Time: 1989-09-15 10:59:24.48; message from server MRMEMO2:
>>> %DCF-E-CONV_FAIL, Conversion failed - no further information
This indicates that a conversion of a WPSPLUS or DX format message
has failed for some reason. DCF is an unsupported, undocumented
un-whatever tool that we use for document conversion. It is used by
most corporate gateways. We use it, not because we want to or like
it, but because there is no good alternative. Anyway, this error
message clearly should not occur and this is the problem. My theory
is that the document contains some really nasty WPSPLUS or DX message
that DCF simply can't handle. We would be *very* interested if you
could provide the file MRMEMO$DIR:MRMEMO2-CONTENTS.NBS so we could
have a look at it. The file itself rather than a dump of it is what
we would appreciate the most.
The quota problems, as stated in .1, probably occur because we fail
to do an MRIF$END_DISASSEMBLE (which would close the file) if an error
occurs in the middle of the module SRVMRD. This means that as the
server restarts, we try to do another MRIF$START_DISASSEMBLE (which
will open the file) and inevitably we will run out of resources.
This is clearly also a bug, but the main problem is the crashing
Document Conversion.
Anyway, we'll try and get nicer clean-up routines into future versions.
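The leak described above can be sketched as follows. This is a minimal Python model, not the MRMEMO source: `FileQuota`, `start_disassemble`, `end_disassemble` and `convert` are hypothetical stand-ins for the FILLM quota, MRIF$START_DISASSEMBLE, MRIF$END_DISASSEMBLE and the DCF conversion step. The limit of 16 is the FILLM value reported in .2.

```python
class QuotaExceeded(Exception):
    """Raised when the simulated open-file quota is used up."""


class ConversionFailed(Exception):
    """Stands in for %DCF-E-CONV_FAIL."""


class FileQuota:
    """Hypothetical model of a per-process open-file limit (FILLM)."""

    def __init__(self, limit):
        self.remaining = limit

    def start_disassemble(self):
        # Stand-in for MRIF$START_DISASSEMBLE: opens the contents file.
        if self.remaining == 0:
            raise QuotaExceeded("exceeded file limit")
        self.remaining -= 1

    def end_disassemble(self):
        # Stand-in for MRIF$END_DISASSEMBLE: closes the contents file.
        self.remaining += 1


def convert(message_ok):
    # Stand-in for the DCF conversion step.
    if not message_ok:
        raise ConversionFailed("conversion failed")


def disassemble_leaky(quota, message_ok):
    quota.start_disassemble()
    convert(message_ok)          # an error here skips the close below
    quota.end_disassemble()


def disassemble_safe(quota, message_ok):
    quota.start_disassemble()
    try:
        convert(message_ok)
    finally:
        quota.end_disassemble()  # runs on success and on error


def failures_survived(disassemble, limit=16, attempts=100):
    """Count how many bad messages are handled before the quota runs out."""
    quota = FileQuota(limit)
    for count in range(attempts):
        try:
            disassemble(quota, message_ok=False)
        except ConversionFailed:
            continue             # the server restarts and tries the next message
        except QuotaExceeded:
            return count
    return attempts
```

Under these assumptions, `failures_survived(disassemble_leaky)` stops after 16 bad messages, while the `try`/`finally` variant gets through all 100 attempts without exhausting the quota.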
By the way, I don't think that PQL has anything to do with things,
because the server is started via DECnet, and the quotas used should
be those assigned to the MRMEMO account.
Stefan
|
13.4 | What WPS-PLUS version/contents ? | 49167::VANHOOSTE | Guide to Shadowland | Mon Oct 09 1989 13:50 | 11 |
|
OAVAX?
Perhaps using ALL-IN-1 V2.3 and WPS-PLUS?
There are new features in WPS-PLUS that have no match in DX.
Perhaps they have no match in the conversion software either.
I am thinking of widow/orphan control, footnote processing etc.
Should this be the reason, I guess we are in deep **** for the official
other gateways too.
Marc VH.
|
13.5 | Who knows? | STKOFF::SPERSSON | Pas de Probleme | Tue Oct 10 1989 11:47 | 15 |
|
> Should this be the reason, I guess we are in deep **** for the official
> other gateways too.
Could be, could be...
This is why we're so interested in obtaining the offending message,
to see for ourselves exactly what it looks like. I guess we have
to look at a way to use official DDIF-based routines to perform
our conversions pretty soon ("no commitment is hereby implied" etc
etc)
regards,
Stefan
|
13.6 | Marc's question was good! | HAN01::FISCHER_HE | | Fri Oct 13 1989 16:44 | 15 |
|
Stefan,
we are running ALL-IN-1 2.3 (incl. WPSPLUS), which we
installed 2 weeks before we took the new SDC version of
MRMEMO.
But as far as I remember, the conversion error is not
the only one that causes problems.
I have to look at all the log files and try to
catch a CONTENTS file during a crash (but only after
a few days of holiday for me, OK?)
regards
Heinz
|
13.7 | Also look at the doc itself | KETJE::VANHOOSTE | Guide to Shadowland | Mon Oct 16 1989 18:56 | 10 |
|
If you can find out the date and time, you could use it to scan the
user's file cabinet for the MR ID (which is maintained on sent mails).
Or just search for it using the time as the "Creation Time".
Then you could have a look at the contents of the DOC in WPS-PLUS (i.e.
if the user has not deleted the mail AND cleaned up his WASTEBASKET).
This will tell us whether the doc is indeed one that uses the new
features.
Marc VH.
|
13.8 | Service message leaves open file | STKHLM::OLSSON | Anders Olsson | Tue Oct 17 1989 10:12 | 31 |
|
I have found a code path in MRMEMO (V2.0) where a contents file might not
get closed. It only happens when a service message is disassembled,
so the problem is more likely to occur if /REQUEST_NOTIFICATIONS is used.
When /REQUEST_NOTIFICATIONS is not used, far fewer service messages are
disassembled, and those are mainly non-delivery notifications.
There is no simple workaround since a random factor is involved (an
uninitialized variable). Not using /REQUEST_NOTIFICATIONS and not
sending MEMOs to wrong addresses (which causes non-deliveries) is the
best way to avoid the code path that doesn't close the contents file.
This might, however, not be the only cause of the exceeded quota problem.
It is still possible that some strange WPS-PLUS format causes DCF to fail,
so keep looking for any WPS-PLUS documents that cause conversion errors.
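To illustrate why reducing the number of service messages is the only practical workaround for the unclosed contents file, here is a rough Python model. It is not the MRMEMO code: `close_needed_flag` and `leaked_files` are hypothetical names, and the uninitialized variable is modelled as a random bit, which is an assumption about how the stray value behaves.

```python
import random


def close_needed_flag(is_service_message, rng):
    """Return the flag that decides whether the contents file is closed.

    Hypothetical model: on the ordinary path the flag is set explicitly;
    on the service-message path it is never assigned, so its value is
    whatever happened to be in memory (modelled here by a random bit).
    """
    if not is_service_message:
        return True
    return rng.random() < 0.5    # stands in for uninitialized storage


def leaked_files(messages, rng):
    """Count contents files left open after processing the messages.

    `messages` is a sequence of booleans: True for a service message,
    False for an ordinary message.
    """
    leaked = 0
    for is_service_message in messages:
        if not close_needed_flag(is_service_message, rng):
            leaked += 1
    return leaked
```

In this model ordinary messages never leak a file, while each service message may or may not, so the number of open files lost depends only on how many service messages the server has to disassemble.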
Regarding the server process quotas and PQL: The MRMEMO server runs as
a detached process and hence uses the PQL quotas. Stefan's comment:
.3> By the way, I don't think that PQL has anything to do with things,
.3> because the server is started via DECnet, and the quotas used should
.3> be those assigned to the MRMEMO account.
was not correct but it came from me so don't blame him.
The server is started in MRMEMOLOGIN.COM with a RUN/DETACH command.
Several of the important quotas are specified in the RUN/DETACH
command (e.g. /FILE_LIMIT) so the PQL quotas are not very involved. If
you want to increase server quotas, do it in MRMEMOLOGIN.COM.
Anders
|