Conference azur::mcc

Title:	DECmcc user notes file. Does not replace IPMT.
Notice:	Use IPMT for problems. Newsletter location in note 6187
Moderator:	TAEC::BEROUD

Created:	Mon Aug 21 1989
Last Modified:	Wed Jun 04 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	6497
Total number of notes:	27359

3544.0. "Intermittent MCC errors - Explanation needed." by MICROW::SEVIGNY (Poultry-flavored toothpaste?) Wed Aug 12 1992 11:04

    
    We've written a set of tests using MCC and we have some intermittent
    (inconsistent) errors, and I was wondering if anyone has a hint as to
    what might be the cause of these errors.
    
    (We are using MCC T1.2.7 on Ultrix, and using DECDTM submitting the
    tests to an Ultrix system using a RISCserver).
    
    Often (90%), the first command issued after an enroll produces the
    following error:
    
    
    %MCC-E-RECEIVEERROR, error trying to receive a packet
    
    
    All subsequent directives operate without a problem.  My first thought
    was that after the enroll, it may take some time for the AM to
    initialize.  I added a delay of 15 seconds before issuing the first
    directive to the AM and it still (mostly) fails.
    
    Any idea what causes this?  I cannot reproduce it when I run the tests
    outside of DTM.
    
    ------------------------------------------------------------------------
    
    Secondly, in the middle of a 500-directive test, I received an unusual
    error that I had not seen before:
    
    DECmcc (T1.2.7)
    
    Using default ALL IDENTIFIERS
    %MCC-F-FATAL, fatal DECmcc error
    %MCC-F-TRM_FAILURE, PM unable to continue
    %MCC-F-FATAL, fatal DECmcc error
    
    (The command which caused this was a "manage show tpcontroller '*'",
    but I don't think that is significant, since it is the first time I've
    seen it, even though I've issued this command many times.)
    
    Any clues, ideas, suggestions of things to investigate, etc. would be
    greatly appreciated.
    
    Marc

T.R	Title	User	Personal Name	Date	Lines
3544.1		TOOK::SWIST	Jim Swist LKG2-2/T2 DTN 226-7102	`Wed Aug 12 1992 13:13`	2
	setenv MCC_LOG 0x10000 and rerun the first test....
3544.2		MICROW::LIM		`Thu Aug 13 1992 10:22`	37
	I'm having the same problem in my regression test collection:

	%MCC-E-RECEIVEERROR, error trying to receive a packet

	I enroll the mcc process in the prologue, then sleep for 30 seconds.
	The first call to mcc in the first test fails with the error. It does
	not always happen, but it happens about 80% of the time.

	When I turned on logging, the following appears:

	%MCC-I-LOG, MCC_LOG = 10000
	RPC_LOG: REG CONN-OK: frm id=1, to id=16
	RPC_LOG: SEND: frm id=1, to id=16
	RPC_LOG: SEND: frm id=1, to id=16
	RPC_LOG: DISCONN-OK, id=1
	RPC_LOG: REG DISCONN: frm id=1, to id=16
	%MCC-E-RECEIVEERROR, error trying to receive a packet

	The next call, which succeeded, has the following log:

	DECmcc (T1.2.7)
	%MCC-I-LOG, MCC_LOG = 10000
	RPC_LOG: CONN-FAIL: frm id=1, to id=16
	RPC_LOG: DISCONN-OK, id=1
	RPC_LOG: REG DISCONN: frm id=1, to id=16
	Starting MM mcc_tps_am (enroll ID 16) from MM enroll id 1
	RPC_LOG: REG CONN-OK: frm id=1, to id=16
	RPC_LOG: SEND: frm id=1, to id=16
	RPC_LOG: SEND: frm id=1, to id=16
	RPC_LOG: RECV: frm id=16, to id=1
	TPCONTROLLER LOCAL_NS:.servershow AT YYYY-MM-DD-HH:MM:SS

	It appears the first call never started mcc_tps_am, but the second
	call did. Why?
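	For anyone collecting many of these transcripts, here is a minimal sketch (plain POSIX sh, not part of MCC) that sorts a captured MCC_LOG transcript into the two cases shown above. The function name `classify_mcc_log` and the category words it prints are made up for illustration; the strings it searches for are taken only from the log lines quoted in this note.

```shell
#!/bin/sh
# classify_mcc_log FILE
# Hypothetical helper: prints one word describing what a captured
# MCC_LOG transcript shows, based only on strings seen in this thread.
classify_mcc_log() {
    log=$1
    if grep -q 'MCC-E-RECEIVEERROR' "$log"; then
        # The call ended with the receive error.
        echo receive-error
    elif grep -q 'RPC_LOG: RECV:' "$log"; then
        # A reply came back; note whether the MM had to be (re)started first.
        if grep -q 'Starting MM ' "$log"; then
            echo ok-after-mm-start
        else
            echo ok
        fi
    else
        echo unknown
    fi
}
```

	Called as `classify_mcc_log first_call.log` (the file name is an assumption), it makes it easy to count how often a successful call was preceded by a "Starting MM" line.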
3544.3	Please try the V1.2 SSB kit	TOOK::GUERTIN	It fall down, go boom	`Thu Aug 13 1992 12:46`	4
	I believe there are a couple of bug fixes in V1.2.0 that could solve your receive errors. -Matt.
3544.4		MICROW::LIM		`Thu Aug 13 1992 14:11`	3
	I'm just trying to understand... why does this problem happen? - Kyungae
3544.5	Bug somewhere	TOOK::MINTZ	Erik Mintz, dtn 226-5033	`Thu Aug 13 1992 15:07`	6
	This can happen if a management module crashes. However, there were some problems in the T1.2.7 RPC mechanism; that is why Matt suggests that you upgrade. -- Erik
3544.6	will MCC set $status?	MACROW::LIM		`Fri Aug 14 1992 11:26`	4
	If %MCC-E-RECEIVEERROR is returned, will $status be set to a certain value? Kyungae
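	Nobody in this thread confirms whether $status is set, so a test harness may be safer looking at the output text instead. The following is a minimal sketch in POSIX sh, under that assumption; `run_with_retry` is a hypothetical wrapper name, and the only MCC-specific detail it relies on is the error string quoted above.

```shell
#!/bin/sh
# run_with_retry MAX_TRIES COMMAND [ARGS...]
# Hypothetical wrapper: runs COMMAND, and reruns it while its output
# contains the receive error. It deliberately ignores the exit status,
# because the thread does not say whether MCC sets it.
run_with_retry() {
    max=$1
    shift
    try=1
    while [ "$try" -le "$max" ]; do
        out=$("$@" 2>&1)
        case $out in
            *MCC-E-RECEIVEERROR*)
                try=$((try + 1)) ;;
            *)
                printf '%s\n' "$out"
                return 0 ;;
        esac
    done
    # Every attempt hit the error: show the last output and fail.
    printf '%s\n' "$out"
    return 1
}
```

	For example, `run_with_retry 2 manage show tpcontroller '*'` would reissue the directive once. Note that a retry only hides the symptom in the test results; it does not explain why the first call fails.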
3544.7	even worse with new kit....	MICROW::SEVIGNY		`Wed Aug 26 1992 15:56`	8
	We took the advice and upgraded to 1.2. The "Error receiving packet" occurs more frequently now. What should we do now? Marc
3544.8	Something weird must be happening at enrollment	TOOK::GUERTIN	It fall down, go boom	`Wed Aug 26 1992 16:33`	18
	Why not set the log bit for the background process? Perhaps the MM is
	self-destructing. The foreground process (FCL) seems to be behaving
	itself. I'm assuming that the FCL process is starting the background MM
	during enrollment. The MM must be dying because later when you try to
	access it again, a message is displayed saying that it is starting it
	up (again).

	Also you could try running the MM in the foreground:

	% /<your-path>/<your-MM-name> 16 Y
	                              ^  ^
	                              |  |
	     your MM's enrollment id -+  |
	                                 +-- Enroll the MM [Y/N]?

	See if any error messages get displayed.

	-Matt.
3544.9		MICROW::SEVIGNY		`Thu Aug 27 1992 14:47`	12
	Well, I took your advice, set the MCC_LOG env var, and reran the tests while running the AM in the foreground. Almost every request to the AM resulted in a segmentation fault. When I used the debugger to determine where the AM dies, it often seems to die in MCC's RPC. I hope that it is safe to assume that there are no incompatibilities between MCC's RPC and DCE's RPC, right? I ask because the AM uses DCE RPC to communicate with the agent. Sound suspicious?
3544.10	Dispatch table in synch?	TOOK::MINTZ	Erik Mintz, dtn 226-5033	`Thu Aug 27 1992 14:52`	9
	Are you sure that the version of the AM that you are running is EXACTLY the same as the one last enrolled? And are you sure that nobody else is writing to your dispatch table? This kind of symptom often happens when the dispatch table gets out of synch with the module (e.g., when the module is re-linked). -- Erik
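	One way to rule out the "re-linked since enrollment" case is to record a checksum of the executable right after enrolling and compare it before each test run. This is only a sketch in POSIX sh using the standard `cksum` utility; the function names and the stamp file are hypothetical, and it says nothing about other processes writing to the dispatch table.

```shell
#!/bin/sh
# record_build EXE STAMP
# Hypothetical helper: remember the checksum of the executable that
# was just enrolled by writing it to the file STAMP.
record_build() {
    cksum < "$1" > "$2"
}

# same_build EXE STAMP
# Succeeds only if EXE still has the checksum recorded in STAMP,
# i.e. it has not been rebuilt or replaced since record_build ran.
same_build() {
    [ "$(cksum < "$1")" = "$(cat "$2")" ]
}
```

	A test prologue could call `record_build` immediately after `manage enroll mcc_tps_am`, and each test could refuse to start if `same_build` fails.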
3544.11		MICROW::SEVIGNY		`Thu Aug 27 1992 17:00`	37
	These are the steps that I took.

	1. Log onto a node with NO other users.
	2. manage enroll mcc_tps_am
	   (MCC_MMEXE_LOCATION points to a known executable).
	3. manually kill the AM. (kill <pid>)
	4. make sure there are no other AM running. (there were none)
	5. setenv MCC_LOG 0x10000
	6. mcc_tps_am 16 Y &
	7. sleep 10 (give it some time to initialize)
	8. manage create ......

	When I look at /usr/mcc/mcc_system, I notice that
	mcc_dispatch_table.dat has been updated. So I assume that the enroll
	did what it was supposed to do.

	This is the result of the most recent fault:
	(doesn't look very meaningful to me)

	dbx /pdir/ptpmresults/cd5a_debug/mcc_tps_am core
	dbx version 2.10.1
	Type 'help' for help.
	Corefile produced from file "mcc_fcl_pm"
	Child died at pc 0x48050c of signal : Segmentation fault
	reading symbolic information ...
	warning: volatile variable in symbol table -- $datacache zeroed
	[using memory image in core]
	(dbx) t
	>  0 validate_time_now(0x0, 0x0, 0x0, 0x0, 0x0) ["mcc_desframe_internal.c":1895]
	(dbx)
3544.12		MICROW::SEVIGNY		`Thu Aug 27 1992 17:31`	6
	Again, I don't have to worry about CMA compatibility, do I? The AM links in DECthreads version V1.10-030. I seem to remember hearing that DCE and MCC need to be in synch.
3544.13	In case this matters	MICROW::SEVIGNY	Unity without Uniformity	`Fri Aug 28 1992 11:38`	18
	As an addendum, I might warn you that our test scripts are written in
	this manner:

	#!/bin/csh
	source $TSRC/env_vars_setup.csh
	manage enroll mcc_tps_am
	manage create entity1 name1 attr1=foo attr2=bar
	manage show entity1 name1 attr1
	manage set entity1 name1 attr2=junk
	.
	.
	.
3544.14		TOOK::SWIST	Jim Swist LKG2-2/T2 DTN 226-7102	`Fri Aug 28 1992 12:43`	12
	I'm not sure it matters but why write your scripts so that you bring
	FCL up and down completely for each command?

	manage <<%
	enroll....
	set...
	show...
	%

	would be a lot more efficient.
3544.15		MICROW::SEVIGNY	Unity without Uniformity	`Fri Aug 28 1992 14:43`	4
	We intersperse comment-like "echos" between the commands.
3544.16		MICROW::SEVIGNY		`Mon Aug 31 1992 10:34`	10
	Is there any further information that I can provide that would help to diagnose this problem? We are really desperate... Our deadlines are being affected by not being able to close this issue. Thanks, Marc
3544.17	QAR 3395	TOOK::MINTZ	Erik Mintz, dtn 226-5033	`Mon Aug 31 1992 10:43`	2
	Entered as QAR 3395 at high priority