Title: | Mailworks-unix |
Notice: | V2.0.4 now available -- see Note 4.37 5 |
Moderator: | TAMARA::NEUMAN::Neumann |
Created: | Wed Jun 02 1993 |
Last Modified: | Tue Jun 03 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 1384 |
Total number of notes: | 5851 |
A customer (Mitre) reports that TeamLinks users were having problems accessing MailWorks/Unix.

Versions: MailWorks V2.0 ECO3, UNIX V3.2-C, TeamLinks V2.5.

The MCS process was running, and so was the MSS process. There are errors in the logs for mcs and mss, something like: "socket to host mail05a could not receive data. The rcv system call returned an error 54, connection reset by peer." No users could connect, and users were getting errors about profile corruption, yet mschk shows no corruption. They stopped MailWorks and restarted it, and now everything is working fine. He wants to know what was going on.

I asked him to send me the relevant sections of the two logs so I could post them in the Engineering notes file for feedback. I searched STARS and didn't find anything like this. This is what he sent me:

From: US3RMC::"[email protected]" "Ke-Chieh Chu" 3-APR-1997 13:04:35.54
To: vmsnet::j_colburn
CC: [email protected]
Subj: MAILworks problems on mail05 this morning (4/3). Track #C970403-1029

Hi Jean,

We have problem on one of the TeamLinks server (mail05) this morning. Many Tea*

- -----------------------------------------------------
- ----- Thu Apr 3 08:10:46 1997, Program: /usr/opt/DMW/bin/mss, Pid:11697, User: root, Type: Err, Sev: Error
Version: 2.0-3, Version Date: Tue Feb 25 00:42:31 EST 1997
Module: 8, Error: 5
A network connection closed or a UNIX error occurred (Error NIOSendErr). If you lost your connection to a message store, try connecting again later. Report this error to your system administrator.
The socket to host mail05 could not send data.
The send system call returned an error: (32) Broken pipe
- -----------------------------------------------------------------

The mcslog had the following error in the log:

- ----------------------------------------------------------------
- ----- Thu Apr 3 07:06:58 1997, Program: /usr/opt/DMW/bin/mcs, Pid:11977, User: root, Type: Err, Sev: Error
Version: 2.0-3, Version Date: Tue Feb 25 01:30:03 EST 1997
Module: 8, Error: 2
A network connection closed or a UNIX error occurred (Error NIORcvErr). If you lost your connection to a message store, try connecting again later. Report this error to your system administrator.
The socket to host mail05.mitre.org could not receive data.
The rcv system call returned an error: (54) Connection reset by peer

- Jan
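For readers less familiar with the two errno values in these logs: (32) Broken pipe is EPIPE, returned when a process writes to a connection whose other end has already gone away, and (54) Connection reset by peer is ECONNRESET, returned when the other end aborted the TCP connection with a reset. Both say only that the far side of the socket disappeared, not why. The numbers are platform-specific (32 and 54 are the BSD-style values this UNIX V3.2-C system reports; Linux, for instance, uses 104 for ECONNRESET), so the minimal sketch below, written in Python for a modern POSIX system and not taken from MailWorks, matches on the symbolic names. The helper names `provoke_broken_pipe` and `provoke_connection_reset` are hypothetical.

```python
import errno
import socket
import struct


def provoke_broken_pipe() -> int:
    """Write to a stream socket whose peer has already closed; return the errno."""
    # A UNIX-domain stream pair reports EPIPE on the first write after the peer
    # closes; with TCP the first write after a close can appear to succeed.
    a, b = socket.socketpair()
    b.close()  # the peer goes away
    try:
        a.send(b"hello")  # CPython ignores SIGPIPE, so this raises instead of killing us
    except OSError as exc:
        return exc.errno  # expected: errno.EPIPE ("Broken pipe")
    finally:
        a.close()
    return 0


def provoke_connection_reset() -> int:
    """Have the peer abort a TCP connection with RST; return the reader's errno."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # any free loopback port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname(), timeout=5)
    server, _ = listener.accept()
    # SO_LINGER with l_onoff=1, l_linger=0 makes close() send RST instead of FIN.
    server.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    server.close()
    listener.close()
    try:
        client.recv(1024)  # blocks (up to the timeout) until the RST arrives
    except OSError as exc:
        return exc.errno  # expected: errno.ECONNRESET ("Connection reset by peer")
    finally:
        client.close()
    return 0


if __name__ == "__main__":
    print(errno.errorcode[provoke_broken_pipe()])
    print(errno.errorcode[provoke_connection_reset()])
```

The zero-timeout SO_LINGER is the usual way to force an abortive close; with an ordinary close() the peer sends FIN, and the reader would see end-of-file (recv returning empty bytes) rather than an error.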
T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
1365.1 | Hard to say | TAMARA::lamac.zko.dec.com::Neumann | Stan Neumann | Mon Apr 07 1997 16:53 | 42 |
Jean (when did you change your name from Jan? :-) The simple answer is that this represents a problem communicating between the mcs process and the mss process. Of course, that doesn't answer the real question, which is "why?" Unfortunately, we'd need to do a fairly thorough examination of the log files, and even then, we might not figure out what went wrong. Some possibilities: * They might have been having network problems that were affecting this node. In general that seems unlikely, since restarting MailWorks shouldn't affect the network; but suppose some MailWorks process were having trouble connecting to the DSA, for instance, and were constantly retrying. That in turn could increase the traffic through the network adaptor, which could cause problems if the network were very busy in other ways. * It's possible that the mss process had too many open files (a side effect of another problem that has been fixed in ECO4). Since network connections use file handles, that could prevent the mss from accepting new connections. Did they try connecting to the mss server with a Motif client or the command line? (That is one of the diagnostic steps we've suggested they take to help us narrow down which component is having problems.) I'm inclined to just wait and see if it happens again. If/when it does, ask them to run suppscr.sh and send you or us the output. (That's a general observation, by the way: starting with V2.0, the unsupported directory contains a script, suppscr.sh, that takes a snapshot of important parameters of the system. If a customer runs it while a problem is happening, it may collect data that will help us diagnose the problem.) -Stan | |||||
1365.2 | Thanks Stan | VMSNET::J_COLBURN | | Tue Apr 08 1997 14:06 | 8 |
Stan, thanks so much for this response. I'm going to send it to him along with a recommendation that, should the problem recur, he run the script. Jan (not Jean... no name change here.) |
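Stan's second possibility, that the mss ran out of file handles, is easy to reproduce in miniature, because every socket is a descriptor counted against the per-process open-file limit. The sketch below is a hedged illustration in Python on a modern POSIX system, not anything from MailWorks: it temporarily lowers its own RLIMIT_NOFILE and creates sockets until socket() fails with EMFILE ("Too many open files"). A listening server in that state gets the same EMFILE from accept(), so new clients cannot connect even though the process is still running; a restart closes every descriptor, which would be consistent with the restart clearing the problem. The function name `sockets_before_emfile` is hypothetical, and because it changes the limit for the whole process while it runs, it belongs in a throwaway process.

```python
import errno
import resource
import socket


def sockets_before_emfile(soft_limit: int) -> int:
    """Temporarily lower this process's open-file soft limit to `soft_limit`,
    create sockets until the kernel refuses with EMFILE, then clean up.
    Returns how many sockets were created before the refusal."""
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    opened = []
    try:
        while True:
            try:
                opened.append(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
            except OSError as exc:
                if exc.errno == errno.EMFILE:  # "Too many open files"
                    return len(opened)
                raise  # any other failure is unexpected here
    finally:
        for s in opened:
            s.close()
        # Restore the original soft limit (allowed, since it is <= the hard limit).
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))


if __name__ == "__main__":
    probe = socket.socket()
    first_free = probe.fileno()  # lowest descriptor number currently free
    probe.close()
    print(sockets_before_emfile(first_free + 8), "sockets before EMFILE")
```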