| Title: | Mailworks-unix |
| Notice: | V2.0.4 now available -- see Note 4.37 5 |
| Moderator: | TAMARA::NEUMAN::Neumann |
| Created: | Wed Jun 02 1993 |
| Last Modified: | Tue Jun 03 1997 |
| Last Successful Update: | Fri Jun 06 1997 |
| Number of topics: | 1384 |
| Total number of notes: | 5851 |
A customer (Mitre) reports that TeamLinks users were having problems accessing
MailWorks/UNIX. Versions:
Mailworks V2.0 eco3
Unix V3.2-C
Teamlinks V2.5
The mcs process was still running, and so was the mss process.
There are errors in the logs for mcs and mss...something like:
socket to host mail05a could not receive data. The rcv system call
returned an error 54, connection reset by peer.
No users could connect and users were getting errors about profile
corruption. mschk shows no corruption.
They stopped Mailworks and re-started it ... now everything is working just
fine. He wants to know what was going on.
I asked him to send me the relevant sections of the two logs so that I could
post them in the Engineering notes file to get feedback. I searched STARS and
didn't find anything like this.
This is what he sent me:
From: US3RMC::"[email protected]" "Ke-Chieh Chu" 3-APR-1997 13:04:35.54
To: vmsnet::j_colburn
CC: [email protected]
Subj: MAILworks problems on mail05 this morning (4/3). Track #C970403-1029
Hi Jean,
We have problem on one of the TeamLinks server (mail05) this morning.
Many Tea*
- -----------------------------------------------------
- -----
Thu Apr 3 08:10:46 1997, Program: /usr/opt/DMW/bin/mss, Pid:11697,
User: root,
Type: Err, Sev: Error
Version: 2.0-3, Version Date: Tue Feb 25 00:42:31 EST 1997
Module: 8, Error: 5
A network connection closed or a UNIX error occurred
(Error NIOSendErr).
If you lost your connection to a message store, try connecting
again later. Report this error to your system administrator.
The socket to host mail05 could not send data. The send system
call returned an error:
(32) Broken pipe
- -----------------------------------------------------------------
The mcslog had the following error in the log;
- ----------------------------------------------------------------
- -----
Thu Apr 3 07:06:58 1997, Program: /usr/opt/DMW/bin/mcs, Pid:11977,
User: root,
Type: Err, Sev: Error
Version: 2.0-3, Version Date: Tue Feb 25 01:30:03 EST 1997
Module: 8, Error: 2
A network connection closed or a UNIX error occurred
(Error NIORcvErr).
If you lost your connection to a message store, try connecting
again later. Report this error to your system administrator.
The socket to host mail05.mitre.org could not receive data. The rcv
system call returned an error:
(54) Connection reset by peer
jan
| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 1365.1 | Hard to say | TAMARA::lamac.zko.dec.com::Neumann | Stan Neumann | Mon Apr 07 1997 15:53 | 42 |
Jean (when did you change your name from Jan? :-)

The simple answer is that this represents a problem communicating between the mcs process and the mss process. Of course, that doesn't answer the real question, which is "why?" Unfortunately, we'd need to do a fairly thorough examination of the log files, and even then, we might not figure out what went wrong.

Some possibilities:

* They might have been having network problems that were affecting this node. In general that seems unlikely, since restarting MailWorks shouldn't affect the network. But suppose some MailWorks process were having trouble connecting to the DSA, for instance, and were constantly retrying. That in turn could increase the traffic through the network adaptor, which in turn could cause problems if the network were very busy in other ways.
* It's possible that the mss process had too many open files (a side effect of another problem that has been fixed in ECO4). Since network connections use file handles, that could prevent the mss from accepting new connections.

Did they try connecting to the mss server with a motif client or the command line? (That is one of the diagnostic steps we've suggested they take to help us narrow down which component is having problems.)

I'm inclined to just wait to see if it happens again. If/when it happens again, ask them to run the suppscr.sh and send you or us the output. (That's a general observation, by the way: starting with V2.0, in the unsupported directory is a script suppscr.sh that takes a snapshot of important parameters of the system. If a customer runs that while a problem is happening, it may collect data that will help us diagnose the problem.)

-Stan
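
To illustrate the "too many open files" possibility in general terms, here is a minimal Python sketch (not part of MailWorks or of suppscr.sh) showing that each socket consumes file descriptors that count against the per-process limit. The function names `count_open_fds` and `fd_headroom` and the probe bound of 4096 are assumptions made for this example. It inspects only the process that runs it, not the mss process itself.

```python
import os
import resource
import socket


def count_open_fds(max_fd=4096):
    """Count descriptors open in this process by probing 0..max_fd-1.

    max_fd is an arbitrary bound chosen for the sketch.
    """
    open_count = 0
    for fd in range(max_fd):
        try:
            os.fstat(fd)
        except OSError:
            continue  # descriptor number is not in use
        open_count += 1
    return open_count


def fd_headroom(max_fd=4096):
    """Return (open descriptors, soft RLIMIT_NOFILE) for this process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return count_open_fds(max_fd), soft


if __name__ == "__main__":
    before, soft = fd_headroom()
    # A connected socket pair stands in for one client/server connection.
    a, b = socket.socketpair()
    after, _ = fd_headroom()
    print(f"open before: {before}, after: {after}, soft limit: {soft}")
    a.close()
    b.close()
```

A server that leaks descriptors would see the open count creep toward the soft limit, at which point further accept or open calls fail until descriptors are released, which a restart does as a side effect.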
| 1365.2 | Thanks Stan | VMSNET::J_COLBURN | Tue Apr 08 1997 13:06 | 8 | |
Stan,
Thanks so much for this response. I'm going to send it to him along
with recommendations that should the problem re-occur, he should run
the script.
Jan (not Jean...no name change here..)