
Conference azur::mcc

Title: DECmcc user notes file. Does not replace IPMT.
Notice: Use IPMT for problems. Newsletter location in note 6187
Moderator: TAEC::BEROUD
Created: Mon Aug 21 1989
Last Modified: Wed Jun 04 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 6497
Total number of notes: 27359

3544.0. "Intermittent MCC errors - Explanation needed." by MICROW::SEVIGNY (Poultry-flavored toothpaste?) Wed Aug 12 1992 11:04

    
    We've written a set of tests using MCC and we have some intermittent
    (inconsistent) errors, and I was wondering if anyone has a hint as to
    what might be the cause of these errors.
    
    (We are using MCC T1.2.7 on Ultrix, with DECDTM submitting the tests
    to an Ultrix system on a RISCserver.)
    
    Often (90%), the first command issued after an enroll produces the
    following error:
    
    
    %MCC-E-RECEIVEERROR, error trying to receive a packet
    
    
    All subsequent directives operate without a problem.  My first thought
    was that after the enroll, it may take some time for the AM to
    initialize.  I added a delay of 15 seconds before issuing the first
    directive to the AM and it still (mostly) fails.
    
    Any idea what causes this?  I cannot reproduce it when I run the tests
    outside of DTM.
    
    ------------------------------------------------------------------------
    
    Secondly, in the middle of a 500-directive test, I received an unusual
    error that I had not seen before:
    
    DECmcc (T1.2.7)
    
    Using default ALL IDENTIFIERS
    %MCC-F-FATAL, fatal DECmcc error
    %MCC-F-TRM_FAILURE, PM unable to continue
    %MCC-F-FATAL, fatal DECmcc error
    
    (The command which caused this was a "manage show tpcontroller '*'",
    but I don't think that is significant, since it is the first time I've
    seen it, even though I've issued this command many times.)
    
    Any clues, ideas, suggestions of things to investigate, etc. would be
    greatly appreciated.
    
    Marc
    
3544.1. by TOOK::SWIST (Jim Swist LKG2-2/T2 DTN 226-7102) Wed Aug 12 1992 13:13
    setenv MCC_LOG 0x10000 and rerun the first test....
    
3544.2. by MICROW::LIM Thu Aug 13 1992 10:22
    I'm having the same problem in my regression test collection:
    %MCC-E-RECEIVEERROR, error trying to receive a packet.
    
    I enroll the mcc process in the prologue, then sleep for 30 seconds.
    The first call to mcc in the first test fails with the error.  It does
    not always happen, but it happens about 80% of the time.
    
    When I turned on logging, the following appears:
    
    %MCC-I-LOG, MCC_LOG = 10000
    RPC_LOG: REG CONN-OK: frm id=1, to id=16
    RPC_LOG: SEND: frm id=1, to id=16
    RPC_LOG: SEND: frm id=1, to id=16
    RPC_LOG: DISCONN-OK, id=1
    RPC_LOG: REG DISCONN: frm id=1, to id=16
    %MCC-E-RECEIVEERROR, error trying to receive a packet
    
    The next call, which succeeded, has the following log:
    DECmcc (T1.2.7)
    
    %MCC-I-LOG, MCC_LOG = 10000
    
    RPC_LOG: CONN-FAIL: frm id=1, to id=16
    RPC_LOG: DISCONN-OK, id=1
    RPC_LOG: REG DISCONN: frm id=1, to id=16
    Starting MM mcc_tps_am (enroll ID 16) from MM enroll id 1
    RPC_LOG: REG CONN-OK: frm id=1, to id=16
    RPC_LOG: SEND: frm id=1, to id=16
    RPC_LOG: SEND: frm id=1, to id=16
    RPC_LOG: RECV: frm id=16, to id=1
    
    TPCONTROLLER LOCAL_NS:.servershow
    AT YYYY-MM-DD-HH:MM:SS
    
    It appears that the first call never started mcc_tps_am, while the
    second call did.  Why?
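
The difference between the two logs above can be checked mechanically. The following is a minimal sketch, assuming the log text is captured from the command's output; it is written in POSIX sh rather than the csh used elsewhere in this thread, and the function name summarize_rpc_log is hypothetical. It only looks for three strings that appear in the logs quoted above.

```shell
# Summarise one FCL invocation's MCC_LOG output, read from stdin.
# Reports whether the MM was (re)started, whether a reply (RECV) came
# back, and whether RECEIVEERROR was reported.
# summarize_rpc_log is a hypothetical helper name, not part of DECmcc.
summarize_rpc_log() {
    awk '
        BEGIN { started = "no"; recv = "no"; err = "no" }
        /Starting MM /        { started = "yes" }
        /RPC_LOG: RECV:/      { recv = "yes" }
        /%MCC-E-RECEIVEERROR/ { err = "yes" }
        END {
            printf "started=%s recv=%s receiveerror=%s\n", started, recv, err
        }'
}
```

A possible use, with MCC_LOG already set, is `manage show tpcontroller '*' 2>&1 | summarize_rpc_log`. Whether the log lines go to stdout or stderr is not stated in this thread, which is why the sketch merges both.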
                         
3544.3. "Please try the V1.2 SSB kit" by TOOK::GUERTIN (It fall down, go boom) Thu Aug 13 1992 12:46
    I believe there are a couple of bug fixes in V1.2.0 that could solve
    your receive errors.
    
    -Matt.

3544.4. by MICROW::LIM Thu Aug 13 1992 14:11
I'm just trying to understand... why does this problem happen?

- Kyungae

3544.5. "Bug somewhere" by TOOK::MINTZ (Erik Mintz, dtn 226-5033) Thu Aug 13 1992 15:07
This can happen if a management module crashes.
However, there were some problems in the T1.2.7 RPC mechanism;
that is why Matt suggests that you upgrade.

-- Erik

3544.6. "will MCC set $status?" by MACROW::LIM Fri Aug 14 1992 11:26
    If %MCC-E-RECEIVEERROR is returned, will $status be set to a certain
    value?
    
    Kyungae
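
Since the thread does not say whether the exit status reflects %MCC-E-RECEIVEERROR, a test script can check both the status and the output text. The following is a minimal sketch, written in POSIX sh (where the status is `$?`, the counterpart of csh's `$status`); the helper name run_checked is hypothetical, and treating any %MCC-E- or %MCC-F- line as a failure is an assumption based on the messages quoted in this topic.

```shell
# Run a command, echo its combined output, and return non-zero if
# either the command's own exit status is non-zero or its output
# contains an %MCC-E- (error) or %MCC-F- (fatal) message.
# run_checked is a hypothetical helper name, not part of DECmcc.
run_checked() {
    out=$("$@" 2>&1)
    status=$?
    printf '%s\n' "$out"
    if [ "$status" -ne 0 ]; then
        return "$status"
    fi
    case "$out" in
        *%MCC-E-*|*%MCC-F-*) return 1 ;;
    esac
    return 0
}
```

For example, `run_checked manage show entity1 name1 attr1 || echo "directive failed"` would flag the failure even if the exit status turns out to be zero.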

3544.7. "even worse with new kit...." by MICROW::SEVIGNY Wed Aug 26 1992 15:56
    
    We took the advice and upgraded to 1.2.
    
    The "Error receiving packet" occurs more frequently now.  What should
    we do now?
    
    Marc
    
3544.8. "Something weird must be happening at enrollment" by TOOK::GUERTIN (It fall down, go boom) Wed Aug 26 1992 16:33
    Why not set the log bit for the background process?  Perhaps the MM is
    self-destructing.  The foreground process (FCL) seems to be behaving
    itself.  I'm assuming that the FCL process is starting the background
    MM during enrollment.  The MM must be dying because later when you try
    to access it again, a message is displayed saying that it is starting
    it up (again).
    
    Also you could try running the MM in the foreground:
    
    % /<your-path>/<your-MM-name> 16 Y
                                  ^  ^
                                  |  |
         your MM's enrollment id -+  |
                                     +-- Enroll the MM [Y/N]?
    
    See if any error messages get displayed.
    
    -Matt.

3544.9. by MICROW::SEVIGNY Thu Aug 27 1992 14:47
    
    Well, I took your advice, set the MCC_LOG env var, and reran the
    tests while running the AM in the foreground.  Almost every request to
    the AM resulted in a segmentation fault.
    
    When I used the debugger to determine where the AM dies, it seems to
    often die in MCC's RPC. 
    
    I hope that it is safe to assume that there are no incompatibilities
    between MCC's RPC and DCE's RPC, right?  I ask because the AM uses DCE
    RPC to communicate with the agent.  Sound suspicious?
    
3544.10. "Dispatch table in synch?" by TOOK::MINTZ (Erik Mintz, dtn 226-5033) Thu Aug 27 1992 14:52
Are you sure that the version of the AM that you are running is
EXACTLY the same as the one last enrolled?  And are you sure that
nobody else is writing to your dispatch table?

This kind of symptom often happens when the dispatch table gets out
of synch with the module (e.g. when the module is re-linked).

-- Erik

3544.11. by MICROW::SEVIGNY Thu Aug 27 1992 17:00
    
    
     These are the steps that I took.
    
    1. Log onto a node with NO other users.
    2. manage enroll mcc_tps_am (MCC_MMEXE_LOCATION points to a known
    executable).
    3. manually kill the AM. (kill <pid>)
    4. make sure there are no other AMs running. (there were none)
    5. setenv MCC_LOG 0x10000
    6. mcc_tps_am 16 Y &  
    7. sleep 10  (give it some time to initialize)
    8. manage create ......
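
Step 4 above (checking for leftover AM processes) can be scripted so that it is not skipped between runs. The following is a minimal sketch in POSIX sh rather than csh; the helper name count_named is hypothetical, and it assumes a ps that accepts the POSIX form `ps -e -o comm=`, which the Ultrix system in this thread may not.

```shell
# Count processes whose command name is exactly $1.
# Reads one command name (optionally with a leading path) per line
# on stdin, for example from `ps -e -o comm=`.
# count_named is a hypothetical helper name.
count_named() {
    awk -v name="$1" '
        {
            n = split($1, parts, "/")      # keep only the basename
            if (parts[n] == name) c++
        }
        END { print c + 0 }'
}
```

A possible use before starting the AM by hand is `ps -e -o comm= | count_named mcc_tps_am`, which should print 0 when no stale copy is running.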
    
    When I look at /usr/mcc/mcc_system, I notice that
    mcc_dispatch_table.dat has been updated.  So I assume that the enroll
    did what it was supposed to do.
    
    This is the result of the most recent fault:  (doesn't look very
    meaningful to me)
    
    dbx /pdir/ptpmresults/cd5a_debug/mcc_tps_am
    core
    dbx version 2.10.1
    Type 'help' for help.
    Corefile produced from file "mcc_fcl_pm"
    Child died at pc 0x48050c of signal : Segmentation fault
    reading symbolic information ...
    warning: volatile variable in symbol table -- $datacache zeroed
    
    [using memory image in core]
    (dbx) t
    >  0 validate_time_now(0x0, 0x0, 0x0, 0x0, 0x0)
    ["mcc_desframe_internal.c":1895]
    (dbx)
    
    
3544.12. by MICROW::SEVIGNY Thu Aug 27 1992 17:31
    
    Again, I don't have to worry about CMA compatibility, do I?  The AM
    links in DECthreads version V1.10-030.
    
    I seem to remember hearing that DCE and MCC need to be in synch.
    
3544.13. "In case this matters" by MICROW::SEVIGNY (Unity without Uniformity) Fri Aug 28 1992 11:38
    
    As an addendum, I might warn you that our test scripts are written in
    this manner:
    
    #!/bin/csh              
    source $TSRC/env_vars_setup.csh
    manage enroll mcc_tps_am
    
    manage create entity1 name1 attr1=foo attr2=bar
    
    manage show entity1 name1 attr1
    
    manage set entity1 name1 attr2=junk
    
    .
    .
    .
    
3544.14. by TOOK::SWIST (Jim Swist LKG2-2/T2 DTN 226-7102) Fri Aug 28 1992 12:43
    I'm not sure it matters but why write your scripts so that you bring
    FCL up and down completely for each command?
    
    manage <<%
    enroll....
    set...
    show...
    %
    
    would be a lot more efficient.
    
    
3544.15. by MICROW::SEVIGNY (Unity without Uniformity) Fri Aug 28 1992 14:43
    
    We intersperse comment-like "echos" between the commands.
    
    
3544.16. by MICROW::SEVIGNY Mon Aug 31 1992 10:34
    
    
    Is there any further information that I can provide that would help to
    diagnose this problem?  We are really desperate...  Our deadlines are
    being affected by not being able to close this issue.
    
    Thanks,
    
    Marc
    
3544.17. "QAR 3395" by TOOK::MINTZ (Erik Mintz, dtn 226-5033) Mon Aug 31 1992 10:43
Entered as QAR 3395 at high priority.