| Adam,
I have seen the ALL-IN-1 Fetcher job loop when the first message in the
ALL-IN-1 (A1) mailbox in Message Router is corrupt.
I'm not sure if this would produce the error you have in the log
file, but it might be a place to start.
If it looks like that is the case, I think there is information in the
FORTY2::MAILBUS notes file on checking for corrupt messages.
Good luck,
Terri K.
|
| Adam,
Perhaps this Stars article might help...
Regards,
Jan
Troubleshooting ALL-IN-1 Fetcher Process Resulting In CPU Loop
COPYRIGHT (c) 1988, 1989, 1990 by Digital Equipment Corporation.
ALL RIGHTS RESERVED. No distribution except as provided under contract.
PRODUCT: ALL-IN-1 V2.2, V2.3
SOURCE: Customer Support Center/Atlanta USA
BACKGROUND:
The ALL-IN-1 Fetcher batch job OA$LIB:OAMTIMAIL.COM is submitted to the
queue by selecting SF (Start Fetcher) from the Mail Management Menu. The
A1 Fetcher process goes into a CPU-bound loop (it uses CPU time, but no
mail is fetched from the Message Router A1 mailbox). Attempting to run
the Fetcher interactively using the SM MM RF (Run Fetcher) option also
results in a CPU loop.
INFORMATION:
The Sender and Fetcher jobs are running from the same account, and the
Sender is running okay. This means that the file cabinet for the
ALL-IN-1 account that is running these procedures is not corrupt.
To check for file cabinet corruption, identify the ALL-IN-1 user that
the Sender and/or Fetcher procedures are running from (usually
POSTMASTER). From DCL, in this user's ALL-IN-1 directory, analyze
the DOCDB.DAT and DAF.DAT files.
Example:
$SET DEF dev:[ALLIN1.POSTMASTE]
$ANAL/RMS DOCDB.DAT
$ANAL/RMS DAF.DAT
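The check above could also be wrapped in a small command procedure so
that both reports are kept on disk. This is only a sketch: the .ANL file
names are hypothetical, dev: is the same placeholder used above, and the
search string assumes that the ANALYZE/RMS_FILE report ends with a
summary line containing the words "analysis uncovered".

```dcl
$! Sketch only: analyze both file cabinet files and show the summary lines.
$! Replace dev: with the real device before running.
$ SET DEFAULT dev:[ALLIN1.POSTMASTE]
$ SET NOON                      ! keep going if one command reports an error
$! Write each report to a file (hypothetical names) for later inspection
$ ANALYZE/RMS_FILE/CHECK/OUTPUT=DOCDB.ANL DOCDB.DAT
$ ANALYZE/RMS_FILE/CHECK/OUTPUT=DAF.ANL DAF.DAT
$! Assumption: the report summary line contains "analysis uncovered"
$ SEARCH DOCDB.ANL,DAF.ANL "analysis uncovered"
$ SET ON
$ EXIT
```

If the summary lines do not appear, type the .ANL files and read the
full reports.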
If either of these shows errors, you may want to create a new file cabinet
for this account, as follows:
$SET DEF dev:[ALLIN1.POSTMASTE]
$CREATE/FDL=OA$LIB:DOCDB DOCDB.DAT
$CREATE/FDL=OA$LIB:PDAF DAF.DAT
Use the DCL command $RUN OA$LIB:FETCHCHECK to check the Fetcher queue
record in the Pending file. In this case it returned the message
"Contains 2 messages to be fetched".
Put the Fetcher on hold with the SM MM HF (Hold Fetcher) option from
the Message Management Menu. Then, from an ALL-IN-1 privileged account,
make sure the Fetcher is not running by using the DCL command
$SHOW QUEUE/ALL/BATCH and looking for the OA$LIB:OAMTIMAIL.COM job.
We attempted to invoke the Fetcher interactively using the ALL-IN-1
command:
<MAIL MTI_FETCH_REMOTE.
The Fetcher still went into a CPU loop.
Check the Fetcher Queue Pending file record using the DCL command:
$SEARCH OA$DATA:PENDING.DAT/OUT=FETCHER.LOG "FETCHER"
The output from this command will then be contained in the file FETCHER.LOG
in the default directory. Edit or type this file to view the contents.
In this case we saw 4 Z********.NBS files. The .NBS files are stored in
pairs: one for the body and another for the envelope of each message.
This corresponds to the two messages that FETCHCHECK reported.
$SHOW LOGICAL OA$MTI_MAILBX revealed that the mailbox the
fetcher was using was A1.
Check the A1 Message Router mailbox:
$MC MRMAN
MRMAN> DUMP A1
This displayed several screens of messages. This system is not using
WordPerfect.
We ran the Fetcher interactively 11 times because the customer is
running ALL-IN-1 V2.2. In ALL-IN-1 V2.2, the Sender and the Fetcher
reject a message from their queue after 10 attempts to process it.
In V2.3 there is a patch (K536) that allows the retry count on a system
to be set between 1 and 10.
After running the Fetcher 11 times, he did another search on the pending
file for the Fetcher record. Now there were 2 .NBS files, and they had
different names from the previous .NBS files. Then we went into $MC MRMAN
again and did a DUMP A1. The previous message was gone from the mailbox.
We noticed that each time the Fetcher ran, only 1 message was
fetched from the mailbox. The Fetcher normally should fetch from
the mailbox until there are no other messages to fetch. We then had
him copy the 2 .NBS files stored in the OA$MTI_DATA directory to
SYS$LOGIN. He then dumped the .NBS files using the DCL command:
$RUN MB$TOOLS:MRNBSDMP
and entered the filenames of the .NBS files in SYS$LOGIN.
He was able to determine who the message was from and the
subject of the message.
We then created and ran the command file below to
delete the Fetcher queue record from the pending file.
$OPEN/APPEND/SHARE DUMMY OA$DATA:PENDING.DAT
$READ/DELETE/KEY="FETCHER QUEUE" DUMMY TEST
$CLOSE DUMMY
We then did a $SEARCH OA$DATA:PENDING.DAT "FETCHER" to verify that
no records were found and that the record had been deleted successfully.
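The delete and the follow-up search could be combined into one command
procedure with some error handling. This is only a sketch, not the
procedure used on the call: the label names are hypothetical, and the
status comparison assumes that SEARCH returns %X08D78053
(SEARCH$_NOMATCHES) when no strings match.

```dcl
$! Sketch only: delete the Fetcher queue record, then confirm it is gone.
$! Same OPEN and READ qualifiers as the command file above, plus /ERROR.
$ OPEN/APPEND/SHARE/ERROR=OPEN_FAILED DUMMY OA$DATA:PENDING.DAT
$ READ/DELETE/KEY="FETCHER QUEUE"/ERROR=NO_RECORD DUMMY TEST
$ WRITE SYS$OUTPUT "Fetcher queue record deleted"
$ GOTO CLOSE_FILE
$NO_RECORD:
$ WRITE SYS$OUTPUT "No FETCHER QUEUE record was read; nothing deleted"
$CLOSE_FILE:
$ CLOSE DUMMY
$! Verify that no Fetcher record remains in the pending file
$ SEARCH OA$DATA:PENDING.DAT "FETCHER"
$! Assumption: %X08D78053 is the "no strings matched" status
$ IF $STATUS .EQS. "%X08D78053" THEN WRITE SYS$OUTPUT "Verified: no FETCHER records remain"
$ EXIT
$OPEN_FAILED:
$ WRITE SYS$OUTPUT "Could not open OA$DATA:PENDING.DAT"
$ EXIT
```

Put the Fetcher on hold before running a procedure like this, as
described earlier.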
The Fetcher was then restarted with SM MM SF (Start Fetcher). Next,
to verify that the messages in the A1 mailbox of Message Router are
now being fetched, go into MRMAN and do a DUMP A1. The number of
messages in this mailbox should continue to decrease. In this case the
messages were now leaving the mailbox. The Fetcher should run
successfully until the mailbox is empty.
It is possible that the "FETCHER QUEUE" record became corrupt when
the disk ran out of space.
If deleting the FETCHER record from the PENDING file does not solve
the problem, it may be caused by a corrupt system DAF file. Identify
which Mail area is being used when the Fetcher is attempting to post the
message (for example, OA$SHARE:OA$DAF_E.DAT), and analyze this file.
Example:
$SET DEF OA$SHARE
$ANAL/RMS OA$DAF_E.DAT
If errors are found, these should be corrected. Depending on the type
and extent of the corruption, this may be accomplished by reorganizing
the system files or using the DCL CONVERT/FDL utility.
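As a rough illustration of the CONVERT/FDL approach, the sequence below
rebuilds the file into a new copy and then checks the copy. It is only a
sketch: the .FDL and _NEW file names are hypothetical, it assumes that
ALL-IN-1 is not using the file at the time, and it assumes the file is
still readable enough for ANALYZE and CONVERT to process it.

```dcl
$! Sketch only: rebuild OA$DAF_E.DAT into a new file with CONVERT/FDL.
$ SET DEFAULT OA$SHARE
$! Generate an FDL description of the existing file (hypothetical name)
$ ANALYZE/RMS_FILE/FDL/OUTPUT=OA$DAF_E.FDL OA$DAF_E.DAT
$! Convert into a new file; the original is left untouched
$ CONVERT/FDL=OA$DAF_E.FDL/STATISTICS OA$DAF_E.DAT OA$DAF_E_NEW.DAT
$! Check the rebuilt copy before putting it into service
$ ANALYZE/RMS_FILE/CHECK OA$DAF_E_NEW.DAT
$ EXIT
```

The new file is written under a separate name so that the original can
be kept until the rebuilt copy has been checked.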
|