
Conference iosg::all-in-1

Title: ALL-IN-1 (tm) Support Conference
Notice: Please spell ALL-IN-1 correctly - all CAPITALS!
Moderator: IOSG::PYECE
Created: Fri Jul 01 1994
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 2716
Total number of notes: 12169

2688.0. "Slow mails from ALL-IN-1" by KERNEL::BURDENI () Wed May 21 1997 13:43

    Can anyone offer any ideas as to why it would take an hour to send a
    mail which goes via Message Router, even though EXPRESS only takes 2
    minutes?  This makes me think that Message Router is running OK, but
    the Fetcher is waiting a long time to run.  The customer says that this
    never used to be the case (it used to take only 5 minutes), but cannot
    identify anything that has changed on the system.
    
    He is running ALL-IN-1 3.2 on VMS 6.2 with Message Router 3.3-313.
    
    Can anyone offer any assistance?
    
    Ivan.
T.R  Title  User  Personal Name  Date  Lines
2688.1. "First Class remote mail goes via Sender" by IOSG::MARSHALL () Thu May 22 1997 11:52 (8 lines)
Sounds like there's something wrong with the Sender process on the originating
system.  Are there a lot of messages on the Sender queue for some reason?

I doubt your suggestion that the Fetcher on the receiving system is the culprit
here, if as you observe Express mail (which is sent directly by the originating
process and doesn't go near the Sender process) is fine.

Scott
2688.2. "EMD-E-OPENERR, Error opening" by KERNEL::BURDENI () Thu May 22 1997 14:07 (15 lines)
    Apologies, I meant the Sender; there seems to be a queue of up to 200
    messages regularly waiting to be sent. They have reported an error in
    the Sender log as follows.
    
    MTI$ERROR log has the following error every 10 minutes.
    
    'default sender'
    %EMD-E-OPENERR, Error opening link to message router.
    
    Mails do go, and there do not seem to be any failures to send.
    
    (This is information I should have posted in .0)
    
    Any ideas why this is happening?
    Ivan.
2688.3. "" by IOSG::PYE (Graham - ALL-IN-1 Sorcerer's Apprentice) Thu May 22 1997 15:12 (3 lines)
    Is the Message Router on only one (or some) nodes of a cluster, and
    hence the sender is taking a lot of retries to find the node that
    works?
2688.4. "Both machines" by KERNEL::BURDENI () Thu May 22 1997 16:16 (4 lines)
    The cluster has two nodes; each has a logical MR$NODE set to the
    node name, and each has OA$PRIMARY_NODE set to the cluster alias.
    ALL-IN-1 (spelt correctly :-) is on both machines.
    	
2688.5. "" by IOSG::MARSHALL () Fri May 23 1997 16:11 (14 lines)
Hmmm, .4 doesn't answer the question: on which node(s) is MR running?

Are the nodes VAXes or Alphas?

What are the values of the "Remote Mail", "Remote MR" and "MR node" fields in
A1CONFIG?

What is the value, if any, of the OA$MTI_MR_NODE logical?  Note that ALL-IN-1
doesn't use the MR$NODE logical.

What does SHOW A1 (or whatever your A1 mailbox is called) yield from MRMAN? 
Does this match what the documentation says it should be?

Scott
2688.6. "More info" by KERNEL::BURDENI () Wed May 28 1997 10:24 (22 lines)
    The customer has a VAX cluster, with the following setup from ALL-IN-1:
    
               ALL-IN-1 SYSTEM CONFIGURATION INFORMATION - continued (1)
    
     Remote Mail: 1   Direct Type: 0   Direct Level: 0   ASCII Translate: 0
    
     Remote MR: 0   MR Node:          MR Mailbox: A1
    
    >>>>This is the same on both nodes.
    
    The logical OA$MTI_MR_NODE is not set on either machine, and the A1
    mailbox is set up as follows.
    
            This is MRMAN V3.3-313
    MRM> sho a1
    A1,                       Owner=ALLIN1 Notify=OA$NOTIFY_MBX 
    Suppress_Delivery_Reports Complete_Messages Ignore_Sender Service_Messages
    MRM>
    
    I hope this helps
    Ivan.
    
2688.7. "" by IOSG::MARSHALL () Fri May 30 1997 18:28 (25 lines)
All the information in .6 seems in order.

Again: please tell us on which nodes of the cluster Message Router is running. 
In your environment, MR should be running on all nodes (and that is the only
supported configuration), but if for some reason it's only running on one, that
could explain this problem.

On which nodes of the cluster do you have ALL-IN-1 Senders running?

The reason for the backlog of messages on the Sender queue is that every time
    %EMD-E-OPENERR, Error opening link to message router

occurs, the Sender waits ten minutes before trying again.  So if this error
happens a lot, you end up with a lot of dead time when nothing is happening.
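
As a rough illustration of how that dead time adds up, the sketch below counts OPENERR entries in a piece of log text and multiplies by the ten-minute wait.  The sample text is hypothetical (built from the excerpt quoted in .2, not from the customer's actual log), and the function name and constants are made up for this example.

```python
# Sketch only: assumes every OPENERR entry costs one fixed ten-minute wait,
# as described in the paragraph above. Names here are hypothetical.
RETRY_WAIT_MINUTES = 10
ERROR_TAG = "%EMD-E-OPENERR"


def estimate_dead_time(log_text: str, wait_minutes: int = RETRY_WAIT_MINUTES) -> int:
    """Return the idle minutes implied by the OPENERR entries in log_text."""
    failures = sum(1 for line in log_text.splitlines() if ERROR_TAG in line)
    return failures * wait_minutes


# Hypothetical sample modelled on the MTI$ERROR excerpt in .2
sample_log = """'default sender'
%EMD-E-OPENERR, Error opening link to message router.
'default sender'
%EMD-E-OPENERR, Error opening link to message router.
"""

print(estimate_dead_time(sample_log))  # two errors -> 20 minutes idle
```

Running something like this over a day's worth of the Sender log would give a feel for how much of the delay is simply the Sender sitting idle.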

You should probably check the Message Router log files to see if there is any
information there which would help explain why ALL-IN-1 can't connect to MR.  If
everything's running on the same node, it's not likely to be a network problem,
so there are no external factors which could be at fault here.

It might also be worth re-setting the A1 mailbox password, just in case some
discrepancy there is causing problems.  Then there's always the option of
shutting everything down and restarting it, in case that clears the problem.

Scott
2688.8. "More info" by KERNEL::BURDENI () Tue Jun 03 1997 13:35 (23 lines)
    Thanks, I have asked the customer a few more questions and the results
    are as follows.  He has Message Router started on both nodes, though
    only the MRLOGGER process is running on node B.  The Sender and
    Fetcher processes are running only on node A, along with the Transfer
    Service.
    
    In the MRERR_ALL.INF log there is an error which matches the Sender
    error for timing (every 10 minutes) as follows :
    
    %MROUTER-I-FAILOG_LSTN_S ' date/time ', The application ALL-IN-1 on
    nodes CHECC1 identified to Mailbox A1, is sending a message.
    %EXPO-E-TEXT,!AS
    
    CHECC1 is the cluster alias.
    
    I really should have picked this up sooner, apologies, but the customer
    reported no errors in the MR logs.  Is this any help?
    
    The systems have been rebooted a number of times since this problem
    began.
    
    Cheers
    Ivan
2688.9. "Maybe this will fix it" by IOSG::MARSHALL () Wed Jun 04 1997 12:35 (38 lines)
Hmm, the two error messages you're seeing don't make sense:

>> 'default sender'
>> %EMD-E-OPENERR, Error opening link to message router.

>> %MROUTER-I-FAILOG_LSTN_S ' date/time ', The application ALL-IN-1 on
>> nodes CHECC1 identified to Mailbox A1, is sending a message.

So ALL-IN-1 thinks it can't connect to Message Router, but according to the
Message Router log, ALL-IN-1 has connected and is sending a message.  Plus, as
you confirm, your messages are being sent, albeit with some delays.

My guess is that the Sender process is using the cluster alias to connect to
Message Router, so there's a 50% chance DECnet will try to connect it to node B,
where there is no Transfer Service, and that will cause the ALL-IN-1 error and a
ten minute wait.  Only when DECnet gives ALL-IN-1 a connection to node A will
the messages get through.

But it's curious the two messages occur at the same time; how many Senders are
they trying to run on node A?
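
To put rough numbers on the cluster alias guess above, here is a small sketch of the expected idle time before a connection succeeds.  It assumes each connection attempt independently lands on node B with a fixed probability and that each failure costs a fixed ten-minute wait; neither assumption is confirmed for this system, and the function name is made up for illustration.

```python
# Sketch only: models connection attempts as independent trials.
# p_fail = chance an attempt lands on the node without a Transfer Service.
def expected_dead_minutes(p_fail: float, wait_minutes: float = 10.0) -> float:
    """Expected idle time before the first successful connection.

    The number of failures before the first success is geometric,
    with mean p_fail / (1 - p_fail); each failure costs wait_minutes.
    """
    if not 0.0 <= p_fail < 1.0:
        raise ValueError("p_fail must be in the range [0, 1)")
    return wait_minutes * p_fail / (1.0 - p_fail)


print(expected_dead_minutes(0.5))  # 10.0 minutes on average per connection
```

With a 50% failure chance this works out to an average of one ten-minute wait per successful connection, so how much total delay a message sees depends on how often the Sender has to reconnect, which isn't known here.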

You have several options to fix this:

1) Start the Transfer Service on node B as well.  This is in fact the only
supported configuration of the components, so it is the one the customer should
use.

2) Persuade ALL-IN-1 not to connect to node B.  One way you may be able to do
this is to set 'Remote MR' to 1 in A1CONFIG, and define the remote MR node to be
node A.  Note this isn't an officially supported way of doing things, and I
don't know whether there would be any unwanted side effects - I'm not suggesting
there will be, and I don't think there will be, but I'm not entirely certain
there won't be.

If you do (2), don't forget to shut down and restart ALL-IN-1.  Also, do they use
NETWORK.DAT for mail addressing?  There are some subtle implications around that
if you change your MR node name; see topic 438 in this conference for info.

Scott