T.R | Title | User | Personal Name | Date | Lines |
---|
3544.1 | | TOOK::SWIST | Jim Swist LKG2-2/T2 DTN 226-7102 | Wed Aug 12 1992 13:13 | 2 |
| setenv MCC_LOG 0x10000 and rerun the first test....
|
3544.2 | | MICROW::LIM | | Thu Aug 13 1992 10:22 | 37 |
| I'm having the same problem in my regression test collection:
%MCC-E-RECEIVEERROR, error trying to receive a packet.
I enroll the MM in the prologue and then sleep for 30 seconds.
The first call to mcc in the first test fails with this error. It does
not happen every time, but it happens about 80% of the time.
When I turned on logging, the following appeared:
%MCC-I-LOG, MCC_LOG = 10000
RPC_LOG: REG CONN-OK: frm id=1, to id=16
RPC_LOG: SEND: frm id=1, to id=16
RPC_LOG: SEND: frm id=1, to id=16
RPC_LOG: DISCONN-OK, id=1
RPC_LOG: REG DISCONN: frm id=1, to id=16
%MCC-E-RECEIVEERROR, error trying to receive a packet
The next call, which succeeded, has the following log:
DECmcc (T1.2.7)
%MCC-I-LOG, MCC_LOG = 10000
RPC_LOG: CONN-FAIL: frm id=1, to id=16
RPC_LOG: DISCONN-OK, id=1
RPC_LOG: REG DISCONN: frm id=1, to id=16
Starting MM mcc_tps_am (enroll ID 16) from MM enroll id 1
RPC_LOG: REG CONN-OK: frm id=1, to id=16
RPC_LOG: SEND: frm id=1, to id=16
RPC_LOG: SEND: frm id=1, to id=16
RPC_LOG: RECV: frm id=16, to id=1
TPCONTROLLER LOCAL_NS:.servershow
AT YYYY-MM-DD-HH:MM:SS
It appears the first call never started mcc_tps_am, but the second call
did. Why?
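A minimal sketch for telling the two traces above apart, assuming each call's log has been captured to a file. It is written for POSIX sh plus awk rather than the csh used elsewhere in this thread. It only counts the RPC_LOG SEND and RECV lines and looks for the "Starting MM" and RECEIVEERROR lines quoted above. The function name classify_rpc_trace and its verdict labels are made up for illustration, and the script says nothing about why the MM never replied.

```sh
# Summarize an MCC_LOG=0x10000 trace read from stdin (hypothetical helper).
# NO-REPLY: at least one RPC SEND but no RECV (the first trace above)
# REPLY:    at least one RECV came back (the second trace above)
# NO-SEND:  no RPC traffic was logged at all
classify_rpc_trace() {
    awk '
        /RPC_LOG: SEND:/ { sends++ }
        /RPC_LOG: RECV:/ { recvs++ }
        /Starting MM /   { started = 1 }
        /RECEIVEERROR/   { rcverr = 1 }
        END {
            if (sends > 0 && recvs == 0)
                verdict = "NO-REPLY"
            else if (recvs > 0)
                verdict = "REPLY"
            else
                verdict = "NO-SEND"
            printf "%s sends=%d recvs=%d mm_started=%s receiveerror=%s\n",
                verdict, sends, recvs,
                (started ? "yes" : "no"), (rcverr ? "yes" : "no")
        }'
}
```

Usage: classify_rpc_trace < first_call.log (the file name is just an example).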
|
3544.3 | Please try the V1.2 SSB kit | TOOK::GUERTIN | It fall down, go boom | Thu Aug 13 1992 12:46 | 4 |
| I believe there are a couple of bug fixes in V1.2.0 that could solve
your receive errors.
-Matt.
|
3544.4 | | MICROW::LIM | | Thu Aug 13 1992 14:11 | 3 |
| I'm just trying to understand... why does this problem happen?
- Kyungae
|
3544.5 | Bug somewhere | TOOK::MINTZ | Erik Mintz, dtn 226-5033 | Thu Aug 13 1992 15:07 | 6 |
| This can happen if a management module crashes.
However, there were some problems in the T1.2.7 RPC mechanism;
that is why Matt suggests that you upgrade.
-- Erik
|
3544.6 | will MCC set $status? | MACROW::LIM | | Fri Aug 14 1992 11:26 | 4 |
| If %MCC-E-RECEIVEERROR is returned, will $status be set to a certain
value?
Kyungae
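Since the thread never settles whether manage sets a nonzero exit status on %MCC-E-RECEIVEERROR, a test script can check both the status and the message text. The following is a minimal sh sketch. run_manage is a hypothetical wrapper name, and treating any %MCC-E- or %MCC-F- line as a failure is an assumption based on the VMS-style severity codes seen in these messages.

```sh
# Hypothetical wrapper: run one FCL command, echo its output, and report
# failure if either the exit status is nonzero or the output contains an
# error (-E-) or fatal (-F-) MCC message. Does not assume manage sets $?.
run_manage() {
    out=$(manage "$@" 2>&1)
    rc=$?
    printf '%s\n' "$out"
    case $out in
        *%MCC-[EF]-*)
            if [ "$rc" -eq 0 ]; then rc=1; fi ;;
    esac
    return "$rc"
}
```

Usage: run_manage show <entity> <name> <attr> || echo "show failed"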
|
3544.7 | even worse with new kit.... | MICROW::SEVIGNY | | Wed Aug 26 1992 15:56 | 8 |
|
We took the advice and upgraded to V1.2.
The RECEIVEERROR ("error trying to receive a packet") now occurs more
frequently. What should we do now?
Marc
|
3544.8 | Something weird must be happening at enrollment | TOOK::GUERTIN | It fall down, go boom | Wed Aug 26 1992 16:33 | 18 |
| Why not set the log bit for the background process? Perhaps the MM is
self-destructing. The foreground process (FCL) seems to be behaving
itself. I'm assuming that the FCL process is starting the background
MM during enrollment. The MM must be dying because later when you try
to access it again, a message is displayed saying that it is starting
it up (again).
Also you could try running the MM in the foreground:
% /<your-path>/<your-MM-name> 16 Y
where 16 is your MM's enrollment id and the last argument (Y or N)
says whether to enroll the MM.
See if any error messages get displayed.
-Matt.
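A minimal sh sketch of the foreground run described above, with the MCC_LOG bit set only for the MM's environment. It relies on the usual Bourne-shell convention that a process killed by signal N yields exit status 128+N, so a segmentation fault (SIGSEGV, usually signal 11) shows up as 139. The function name run_mm_foreground is hypothetical.

```sh
# Run an MM in the foreground with RPC logging on, e.g.:
#   run_mm_foreground /<your-path>/<your-MM-name> 16
# Passes "Y" as the enroll argument. Returns the MM's exit status; a status
# above 128 usually means the MM was killed by a signal, so report which one.
run_mm_foreground() {
    mm_path=$1
    enroll_id=$2
    MCC_LOG=0x10000 "$mm_path" "$enroll_id" Y
    mm_status=$?
    if [ "$mm_status" -gt 128 ]; then
        echo "MM terminated by signal $((mm_status - 128))" >&2
    fi
    return "$mm_status"
}
```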
|
3544.9 | | MICROW::SEVIGNY | | Thu Aug 27 1992 14:47 | 12 |
|
Well, I took your advice, set the MCC_LOG env var, and reran the
tests while running the AM in the foreground. Almost every request to
the AM resulted in a segmentation fault.
When I used the debugger to find where the AM dies, it often seems to
die in MCC's RPC.
I hope that it is safe to assume that there are no incompatibilities
between MCC's RPC and DCE's RPC, right? Because the AM uses DCE RPC to
communicate with the agent. Sound suspicious?
|
3544.10 | Dispatch table in synch? | TOOK::MINTZ | Erik Mintz, dtn 226-5033 | Thu Aug 27 1992 14:52 | 9 |
| Are you sure that the version of the AM that you are running is
EXACTLY the same as the one last enrolled? And are you sure that
nobody else is writing to your dispatch table?
This kind of symptom often happens when the dispatch table gets out
of synch with the module (e.g., when the module is re-linked).
-- Erik
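One quick check along these lines: if the MM executable is newer than the dispatch table, it has been re-linked since the table was last written. This is only a heuristic sh sketch. The default table path is the one mentioned in .11, and because the table is shared by every enrolled module, an executable that is older than the table does not prove its entry is in synch. The function name is hypothetical, and the -nt test operator is supported by bash, ksh and dash but was not part of older POSIX test.

```sh
# Hypothetical check: warn when the MM executable is newer than the dispatch
# table (or the table is missing), i.e. it was probably re-linked after the
# last enroll. Returns 0 if a re-enroll looks necessary, 1 otherwise.
mm_newer_than_dispatch_table() {
    mm_exe=$1
    table=${2:-/usr/mcc/mcc_system/mcc_dispatch_table.dat}
    if [ "$mm_exe" -nt "$table" ]; then
        echo "$mm_exe is newer than $table: re-enroll before testing" >&2
        return 0
    fi
    return 1
}
```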
|
3544.11 | | MICROW::SEVIGNY | | Thu Aug 27 1992 17:00 | 37 |
|
These are the steps that I took.
1. Log onto a node with NO other users.
2. manage enroll mcc_tps_am (MCC_MMEXE_LOCATION points to a known
executable).
3. manually kill the AM. (kill <pid>)
4. make sure no other AMs are running (there were none).
5. setenv MCC_LOG 0x10000
6. mcc_tps_am 16 Y &
7. sleep 10 (give it some time to initialize)
8. manage create ......
When I look at /usr/mcc/mcc_system, I notice that
mcc_dispatch_table.dat has been updated. So I assume that the enroll
did what it was supposed to do.
This is the result of the most recent fault: (doesn't look very
meaningful to me)
dbx /pdir/ptpmresults/cd5a_debug/mcc_tps_am
core
dbx version 2.10.1
Type 'help' for help.
Corefile produced from file "mcc_fcl_pm"
Child died at pc 0x48050c of signal : Segmentation fault
reading symbolic information ...
warning: volatile variable in symbol table -- $datacache zeroed
[using memory image in core]
(dbx) t
> 0 validate_time_now(0x0, 0x0, 0x0, 0x0, 0x0)
["mcc_desframe_internal.c":1895]
(dbx)
|
3544.12 | | MICROW::SEVIGNY | | Thu Aug 27 1992 17:31 | 6 |
|
Again, I don't have to worry about CMA compatibility, do I? The AM
links in DECthreads version V1.10-030.
I seem to remember hearing that DCE and MCC need to be in synch.
|
3544.13 | In case this matters | MICROW::SEVIGNY | Unity without Uniformity | Fri Aug 28 1992 11:38 | 18 |
|
As an addendum, I might warn you that our test scripts are written in
this manner:
#!/bin/csh
source $TSRC/env_vars_setup.csh
manage enroll mcc_tps_am
manage create entity1 name1 attr1=foo attr2=bar
manage show entity1 name1 attr1
manage set entity1 name1 attr2=junk
.
.
.
|
3544.14 | | TOOK::SWIST | Jim Swist LKG2-2/T2 DTN 226-7102 | Fri Aug 28 1992 12:43 | 12 |
| I'm not sure it matters, but why write your scripts so that you bring
FCL up and down completely for each command?
manage <<%
enroll....
set...
show...
%
would be a lot more efficient.
|
3544.15 | | MICROW::SEVIGNY | Unity without Uniformity | Fri Aug 28 1992 14:43 | 4 |
|
We intersperse comment-like "echos" between the commands.
|
3544.16 | | MICROW::SEVIGNY | | Mon Aug 31 1992 10:34 | 10 |
|
Is there any further information that I can provide that would help to
diagnose this problem? We are really desperate... Our deadlines are
being affected by not being able to close this issue.
Thanks,
Marc
|
3544.17 | QAR 3395 | TOOK::MINTZ | Erik Mintz, dtn 226-5033 | Mon Aug 31 1992 10:43 | 2 |
| Entered as QAR 3395 at high priority
|