
Conference virke::mrmemo

Title: VAX MAILGATE for MEMO
Moderator: STKHLM::OLSSON
Created: Sat Feb 25 1989
Last Modified: Tue May 14 1996
Last Successful Update: Fri Jun 06 1997
Number of topics: 216
Total number of notes: 933

163.0. "MRMEMO consuming whole CPU power" by HAN01::FISCHER_HE () Wed Feb 24 1993 14:12

    
    Hi,
    
    we had some trouble with the MRMEMO connection in the case
    that the IBM side had changed the MEMO/GWY environment and
    caused UNBINDS with Sense Codes "8000"/"0857"....
    It seems that there was no clean shutdown on their side
    and so their message queue fetched corrupted mails. After purging
    the actual outstanding message on their side the MRMEMO
    server was running as usual.
    That would not be a big problem were it not for the fact that
    the MRMEMO server in that situation runs in a CPU loop,
    consuming almost all the CPU of our 6610. A1 housekeeping
    couldn't run in the usual way and A1 was kept shut down until the
    next morning when users wanted to log in.
    There is no way of reproducing what was in the message queue
    on the IBM (no MEMO test system) and what was sent to us.
    I told the VW people that there is little chance of finding a solution.
    But they want an official statement from engineering.
    
    I think it's a similar case to the one mentioned in note 151
    with binary messages, but in our case it is much less well defined,
    as we don't know how corrupt the message was and what
    protocol convention was broken. What is your advice?
    
    Best regards
    Heinz Fischer 

163.1. "Lower priority avoids hogging" by STKHLM::OLSSON (Anders Olsson, SIP Sweden) Wed Feb 24 1993 17:53 (59 lines)

    Hello Heinz,

.0> we had some trouble with the MRMEMO connection in the case
.0> that the IBM side had changed the MEMO/GWY environment and
.0> caused UNBINDS with Sense Codes "8000"/"0857"....

    "8000" - sounds strange

.0> It seems that there was no clean shutdown on their side
.0> and so their message queue fetched corrupted mails. After purging
.0> the actual outstanding message on their side the MRMEMO
.0> server was running as usual.

    That's a relief to hear.

.0> That would not be a big problem were it not for the fact that
.0> the MRMEMO server in that situation runs in a CPU loop,
.0> consuming almost all the CPU of our 6610. A1 housekeeping
.0> couldn't run in the usual way and A1 was kept shut down until the
.0> next morning when users wanted to log in.

    Was anything written in the MRMEMO log file in this loop (should have
    produced a huge log file)?

    Did anyone check what the server was doing (e.g. with SHOW PROC/CONT)?

.0> There is no way of reproducing what was in the message queue
.0> on the IBM (no MEMO test system) and what was sent to us.
.0> I told the VW people that there is little chance of finding a solution.

    You are right that the chance of finding a solution is small. If you
    have the MRMEMO log file from this occasion, could you make it available
    for me to look at? 

.0> But they want an official statement from engineering.

    There is no official engineering group. But if there were one, I
    would probably be in it. :-)
    
.0> I think it's a similar case to the one mentioned in note 151
.0> with binary messages, but in our case it is much less well defined,
.0> as we don't know how corrupt the message was and what
.0> protocol convention was broken. What is your advice?

    The MRMEMO server usually runs at base priority 5. This isn't really
    necessary, so if there is concern about MRMEMO eating all the CPU, the
    priority can be lowered (/prio= in MRMEMOLOGIN.COM).

    As noted above, the chance of finding a solution is small. Lowering
    the priority does, however, solve, or at least reduce, the problem of
    MRMEMO taking all the CPU.

    If the CPU loop situation, however unlikely, *does* appear again, use
    SHOW PROC/CONT to check what is going on (any IOs or pagefaults?). Also
    from SHOW PROC/CONT - write down as many "Current PC" values as possible
    to enable the loop path to be found. Use RUN MRMSRV/DEBUG and the debug
    command SHOW IMAGE to get a map of the address space. 
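
    Once a set of "Current PC" samples and an image map have been written
    down, attributing each sample to an image is simple bookkeeping. The
    Python sketch below is only an illustration of that step: the image
    names other than MRMSRV and all addresses are made-up placeholders,
    and real values would have to be copied by hand from SHOW PROC/CONT
    and from the debugger's SHOW IMAGE output.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical image map: (name, base address, end address), bounds inclusive.
# Every address here is invented for illustration only.
IMAGE_MAP = [
    ("MRMSRV", 0x00000200, 0x0004FFFF),
    ("SHAREABLE_A", 0x00050000, 0x0006FFFF),
    ("SHAREABLE_B", 0x00070000, 0x0009FFFF),
]


def locate(pc, image_map):
    """Return (image name, offset from image base) for pc, or None."""
    ordered = sorted(image_map, key=lambda entry: entry[1])
    bases = [entry[1] for entry in ordered]
    # Index of the last image whose base address is <= pc.
    i = bisect_right(bases, pc) - 1
    if i < 0:
        return None
    name, base, end = ordered[i]
    if pc > end:
        return None
    return name, pc - base


def summarize(pcs, image_map):
    """Count how many sampled PC values fall inside each image."""
    counts = Counter()
    for pc in pcs:
        hit = locate(pc, image_map)
        counts[hit[0] if hit else "<unmapped>"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical samples, as they might be noted down during a loop.
    samples = [0x1234, 0x1240, 0x50010, 0xFFFFFF]
    for image, count in summarize(samples, IMAGE_MAP).most_common():
        print(f"{image}: {count}")
```

    If most samples land in one image, the offsets returned by locate()
    give a starting point for narrowing down the loop path within it.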

    Anders Olsson, [unofficial] MRMEMO developer

163.2. "log files" by HAN01::FISCHER_HE () Tue Mar 02 1993 10:28 (102 lines)
    
    Hi Anders,
    
    as nobody knew to collect 'debugging' info
    in that situation, there is no information other than
    the log files (see below). I hope we find more time
    to investigate the problem next time. So, just for a look
    at the strange sense code (after STOP/ID on the process,
    the second log file shows no connection at all!?):
    

$!..
$!.=	Central user login procedure					     !17
$!.<	----------------------------					     !5 
Login of user MRMEMO on OAVAX3 at 11-FEB-1993 20:52:03.08 Mode=OTHER
$	
$	IF "".NES."" THEN LOGOUTNOW
$	IF "".EQS."NL:" .OR. "".NES."1" -
	   THEN SET CONTROL=(T,Y)
$	
$	IF Remote_or_Dialup THEN TMP=F$SETPRV("NONETMBX,NETMBX")
$!
$	IF Prot_User-Username_.NES.Prot_User THEN say f$cvtime()," EXIT"
$	EXIT ! Central user login procedure				     !18
$!M r m e m o . C o m  - comments and copyright notice at the end
$!
$ set noon
$ set verify
$!
$! eco18 start of fix     
$!
$ prcnam = f$getjpi (0,"PRCNAM")
$ srvnum = f$extract (14,1,prcnam)
$ if f$search("SRV2.dir") .eqs. "" then goto nosrvdir
$ defdir = "[.SRV2]"
$ set default [.SRV2]
$ nosrvdir:
$!
$! eco18 end of fix     
$!
$ define tt nl:
$ run/nodebug sys$system:mrmsrv
Time: 1993-02-11 20:52:06.67; message from server MRMEMO2:
%MRMEMO-I-NEWACCOUNT, opening new accounting file: FEBK_COMMON:[MRMEMO]MRMEMOACC2.DAT;1273
%SNA-I-UNBINDREC, UNBIND received from IBM application
Time: 1993-02-11 22:11:58.12; message from server MRMEMO2:
%MRMEMO-F-UNBIND, UNBIND request received from MEMO Gateway
%SNA-E-CONREQREJ, connect request rejected by IBM host, sense code %X'0857'
%SNA-E-CONREQREJ, connect request rejected by IBM host, sense code %X'0857'
%SNA-E-CONREQREJ, connect request rejected by IBM host, sense code %X'0857'
%SNA-E-CONREQREJ, connect request rejected by IBM host, sense code %X'0857'
%SNA-E-CONREQREJ, connect request rejected by IBM host, sense code %X'0857'
%SNA-E-CONREQREJ, connect request rejected by IBM host, sense code %X'0857'
%SNA-I-UNBINDREC, UNBIND received from IBM application
Time: 1993-02-11 22:16:00.51; message from server MRMEMO2:
%MRMEMO-F-UNBIND, UNBIND request received from MEMO Gateway
%SNA-I-UNBINDREC, UNBIND received from IBM application
Time: 1993-02-12 00:56:30.24; message from server MRMEMO2:
%MRMEMO-F-UNBIND, UNBIND request received from MEMO Gateway
%SNA-E-CONREQREJ, connect request rejected by IBM host, sense code %X'8000'
%SNA-E-CONREQREJ, connect request rejected by IBM host, sense code %X'0857'
%SNA-E-CONREQREJ, connect request rejected by IBM host, sense code %X'087D'
%SNA-E-CONREQREJ, connect request rejected by IBM host, sense code %X'0857'
%SNA-I-UNBINDREC, UNBIND received from IBM application
Time: 1993-02-12 01:01:15.20; message from server MRMEMO2:
%MRMEMO-F-UNBIND, UNBIND request received from MEMO Gateway
%SNA-E-CONREQREJ, connect request rejected by IBM host, sense code %X'0857'
    
    
    
################################  second log ######################
    

$!..
$!.=	Central user login procedure					     !17
$!.<	----------------------------					     !5 
Login of user MRMEMO on OAVAX3 at 12-FEB-1993 07:03:07.86 Mode=OTHER
$	
$	IF "".NES."" THEN LOGOUTNOW
$	IF "".EQS."NL:" .OR. "".NES."1" -
	   THEN SET CONTROL=(T,Y)
$	
$	IF Remote_or_Dialup THEN TMP=F$SETPRV("NONETMBX,NETMBX")
$!
$	IF Prot_User-Username_.NES.Prot_User THEN say f$cvtime()," EXIT"
$	EXIT ! Central user login procedure				     !18
$!M r m e m o . C o m  - comments and copyright notice at the end
$!
$ set noon
$ set verify
$!
$! eco18 start of fix     
$!
$ prcnam = f$getjpi (0,"PRCNAM")
$ srvnum = f$extract (14,1,prcnam)
$ if f$search("SRV2.dir") .eqs. "" then goto nosrvdir
$ defdir = "[.SRV2]"
$ set default [.SRV2]
$ nosrvdir:
$!
$! eco18 end of fix     
$!
$ define tt nl:
$ run/nodebug sys$system:mrmsrv
Time: 1993-02-12 07:03:13.15; message from server MRMEMO2:
%MRMEMO-I-NEWACCOUNT, opening new accounting file: FEBK_COMMON:[MRMEMO]MRMEMOACC2.DAT;1274
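
    The first log shows "0857" repeatedly, with single "8000" and "087D"
    rejections in between. For a longer log, the sense codes could be
    tallied with a short script instead of by eye. The Python sketch below
    is one possible way to do that. It assumes the %SNA-E-CONREQREJ line
    format seen in the log above, and the function name is made up for
    illustration.

```python
import re
from collections import Counter

# Matches the CONREQREJ lines as they appear in the MRMEMO log above.
SENSE_RE = re.compile(r"%SNA-E-CONREQREJ,.*sense code %X'([0-9A-Fa-f]{4})'")


def tally_sense_codes(log_text):
    """Count each sense code reported in %SNA-E-CONREQREJ lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = SENSE_RE.search(line)
        if match:
            counts[match.group(1).upper()] += 1
    return counts


# Four consecutive lines copied from the first log (after the 00:56:30 UNBIND).
SAMPLE = """\
%SNA-E-CONREQREJ, connect request rejected by IBM host, sense code %X'8000'
%SNA-E-CONREQREJ, connect request rejected by IBM host, sense code %X'0857'
%SNA-E-CONREQREJ, connect request rejected by IBM host, sense code %X'087D'
%SNA-E-CONREQREJ, connect request rejected by IBM host, sense code %X'0857'
"""

if __name__ == "__main__":
    for code, count in tally_sense_codes(SAMPLE).most_common():
        print(code, count)
```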