
Conference abbott::mailworks-unix

Title:Mailworks-unix
Notice:V2.0.4 now available -- see Note 4.375
Moderator:TAMARA::NEUMAN::Neumann
Created:Wed Jun 02 1993
Last Modified:Tue Jun 03 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:1384
Total number of notes:5851

1365.0. "mss and mcs errors-what is causing this?" by VMSNET::J_COLBURN () Thu Apr 03 1997 14:19

A Customer (Mitre) says that Teamlinks users were having problems accessing 
Mailworks/Unix.  Versions:

Mailworks V2.0 eco3
Unix V3.2-C
Teamlinks V2.5

The MCS and MSS processes were both running.

There are errors in the logs for mcs and mss...something like:

socket to host mail05a could not receive data.  The rcv system call 
returned an error 54, connection reset by peer.

No users could connect and users were getting errors about profile 
corruption.  mschk shows no corruption.

They stopped Mailworks and re-started it ... now everything is working just 
fine.  He wants to know what was going on. 

I asked him to send me the relevant sections of the two logs so that I could 
post them in the Engineering notes file to get feedback.  I searched stars 
and didn't find anything like this.  

This is what he sent me:
    
    From:   US3RMC::"[email protected]" "Ke-Chieh Chu"  3-APR-1997
    13:04:35.54
    To:     vmsnet::j_colburn
    CC:     [email protected]
    Subj:   MAILworks problems on mail05 this morning (4/3). Track
    #C970403-1029
    
    Hi Jean,
    
    We have problem on one of the TeamLinks server (mail05) this morning. 
    Many Tea*
    
    - -----------------------------------------------------
    - -----
    Thu Apr  3 08:10:46 1997, Program: /usr/opt/DMW/bin/mss, Pid:11697,
    User: root,
    Type: Err, Sev: Error
            Version: 2.0-3, Version Date: Tue Feb 25 00:42:31 EST 1997
            Module: 8, Error: 5
    A network connection closed or a UNIX error occurred
    (Error NIOSendErr).
    
    If you lost your connection to a message store, try connecting
    again later.  Report this error to your system administrator.
    
    The socket to host mail05 could not send data.  The send system
    call returned an error:
       (32) Broken pipe
    - -----------------------------------------------------------------
    
    The mcslog had the following error in the log;
    - ----------------------------------------------------------------
    - -----
    Thu Apr  3 07:06:58 1997, Program: /usr/opt/DMW/bin/mcs, Pid:11977,
    User: root,
    Type: Err, Sev: Error
            Version: 2.0-3, Version Date: Tue Feb 25 01:30:03 EST 1997
            Module: 8, Error: 2
    A network connection closed or a UNIX error occurred
    (Error NIORcvErr).
    
    If you lost your connection to a message store, try connecting
    again later.  Report this error to your system administrator.
    
    The socket to host mail05.mitre.org could not receive data.  The rcv
    system call returned an error:
       (54) Connection reset by peer
    
jan

1365.1. "Hard to say" by TAMARA::lamac.zko.dec.com::Neumann (Stan Neumann) Mon Apr 07 1997 16:53 (42 lines)

Jean (when did you change your name from Jan? :-)

The simple answer is that this represents a problem communicating
between the mcs process and the mss process.  Of course, that
doesn't answer the real question, which is "why?"
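
For reference, the two errno values in the logs (32, Broken pipe, and 54,
Connection reset by peer) are what a process sees when the other end of a
socket has gone away. The Python sketch below is not MailWorks code; it only
reproduces both conditions on a local connection, assuming a POSIX system with
a working loopback interface. The function name is made up for illustration.
Numeric errno values differ between platforms, so the sketch reports symbolic
names rather than numbers.

```python
import errno
import socket
import struct


def provoke_socket_errors():
    """Return the errno names seen when a peer disappears (illustration only)."""
    results = {}

    # Case 1: send() after the peer has closed its end -> EPIPE ("Broken pipe").
    a, b = socket.socketpair()
    b.close()
    try:
        a.send(b"x")
    except OSError as exc:
        results["send"] = errno.errorcode.get(exc.errno, repr(exc))
    finally:
        a.close()

    # Case 2: recv() after the peer aborted the connection -> ECONNRESET.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    server.settimeout(5.0)  # do not hang forever if no reset arrives
    # SO_LINGER with a zero timeout makes close() send a reset (RST)
    # instead of performing an orderly shutdown.
    client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    client.close()
    try:
        server.recv(1)
    except OSError as exc:
        results["recv"] = errno.errorcode.get(exc.errno, repr(exc))
    finally:
        server.close()
        listener.close()

    return results


if __name__ == "__main__":
    print(provoke_socket_errors())
```

Either error only says that the peer went away or dropped the connection; it
does not say why, which is the hard part here.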

Unfortunately, we'd need to do a fairly thorough examination
of the log files, and even then, we might not figure out
what went wrong.

Some possibilities:

* They might have been having network problems that were affecting
  this node - in general that seems unlikely, since restarting
  MailWorks shouldn't affect the network, but suppose some
  MailWorks process were having trouble connecting to the DSA,
  for instance, and were constantly retrying - that in turn could
  increase the traffic through the network adaptor, which in turn
  could cause problems if the network were very busy in other ways.

* It's possible that the mss process had too many open files
  (a side effect of another problem that has been fixed in ECO4).
  Since network connections use file handles, that could prevent
  the mss from accepting new connections.
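
To illustrate the second possibility: every socket occupies a file descriptor,
so a process that has run out of descriptors cannot create or accept any more
connections. The Python sketch below is a generic demonstration, not anything
taken from the mss itself. It assumes a POSIX system; the function name and the
limit of 64 are arbitrary choices for the example. It temporarily lowers the
per-process descriptor limit, opens sockets until the operating system refuses,
and then restores the original limit.

```python
import errno
import resource
import socket


def sockets_until_exhausted(limit=64):
    """Open sockets under a lowered fd limit; return (count_opened, errno)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Never ask for more than the hard limit allows.
    if hard == resource.RLIM_INFINITY:
        new_soft = limit
    else:
        new_soft = min(limit, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))

    held = []
    try:
        while True:
            # Each socket consumes one file descriptor.
            held.append(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    except OSError as exc:
        return len(held), exc.errno
    finally:
        # Release the descriptors and restore the original soft limit.
        for s in held:
            s.close()
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


if __name__ == "__main__":
    count, err = sockets_until_exhausted()
    print(count, errno.errorcode.get(err))
```

The failure is reported as EMFILE ("Too many open files"). A server in that
state can keep serving existing connections while refusing every new one.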

Did they try connecting to the mss server with a motif client
or the command line?  (That is one of the diagnostic steps we've
suggested they take to help us narrow down which component is
having problems.)
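
A real client is the better test, but as a rough first check one can at least
see whether the server host accepts a TCP connection at all. The Python sketch
below assumes only a host name and a port number; the helper name is made up,
and the port in the usage example is a placeholder, not the actual mss port.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Try a plain TCP connection; return (ok, detail)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        # Covers refused connections, timeouts, and name lookup failures.
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    # "mail05" is the host from the logs; 12345 is a placeholder port.
    print(can_connect("mail05", 12345))
```

A successful connection only shows that something is listening on that port.
It does not exercise the mail protocol, so it cannot replace the client test.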

I'm inclined to just wait to see if it happens again.
If/when it happens again, ask them to run the suppscr.sh
and send you or us the output.  

(That's a general observation, by the way - starting with V2.0,
in the unsupported directory is a script suppscr.sh that takes
a snapshot of important parameters of the system.  If a customer
runs that while a problem is happening, it may collect data
that will help us diagnose the problem.)

-Stan

1365.2. "Thanks Stan" by VMSNET::J_COLBURN () Tue Apr 08 1997 14:06 (8 lines)

    Stan,
    
    Thanks so much for this response.  I'm going to send it to him along
    with a recommendation that he run the script if the problem
    recurs.
    
    Jan (not Jean...no name change here..)