T.R | Title | User | Personal Name | Date | Lines |
---|
101.1 | | HERR::crosbie | Graham Crosbie @PCS DTN 873-4193 | Mon Sep 06 1993 09:53 | 38 |
| Hi Alan,
>Xerox is running into some sporadic problems when writing out to a FIFO
>shared by two KAV30's.
>He has been getting a KAV30$_BUS_WRT_ERROR on a KAV$BUS_WRITE sporadically.
>So he put the bus_write into a loop until it succeeded. When he did this,
>he would eventually run into the following:
>%SYSTEM-F-BREAK, breakpoint fault at PC=7FFFFD84, PSL=00000000
>Module name = KAV30JACKET
>Routine of Psect Name = $AAAA_VECTORS
>Line =
>Rel PC = 107
>Abs PC = D707
>7FFFFD84: BPT
>In looking at the KAV30JACKET.LIS source file, I can't find any reference to
>a bpt instruction.
Is the customer running this program under the debugger? If so, has he set
a breakpoint, or is he single-stepping into the KAV$BUS_WRITE routine?
Is this a user-mode or kernel-mode application? Also, what version of the KAV30
software and VAXELN is the customer using? (KAV30JACKET.MAR does not exist in
VAXELN V4.4; its name changed to 3KVJACKET.MAR.)
>I have asked him to not put this in a tight loop in case he is inadvertently
>corrupting something.
>However, is there any cause for this?
I cannot think of anything that could be causing this off the top of my head.
Graham
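The advice quoted above (do not retry the write in a tight loop) can be sketched as a bounded retry with a pause between attempts. This is only an illustration in Python, not KAV30 or VAXELN code: `BusWriteError`, `write_with_retry` and `make_flaky_writer` are hypothetical names, and the flaky writer merely simulates a sporadic failure such as KAV30$_BUS_WRT_ERROR.

```python
import time


class BusWriteError(Exception):
    """Hypothetical stand-in for a failed write status."""


def write_with_retry(bus_write, value, max_attempts=5, delay_s=0.01,
                     sleep=time.sleep):
    """Call bus_write(value) at most max_attempts times, pausing between tries.

    Returns the number of attempts used. If every attempt fails, the last
    error is re-raised so a persistent fault is reported instead of spinning.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            bus_write(value)
            return attempt
        except BusWriteError:
            if attempt == max_attempts:
                raise
            sleep(delay_s)  # back off instead of hammering the bus


def make_flaky_writer(failures):
    """Hypothetical bus_write that fails `failures` times, then succeeds."""
    state = {"left": failures, "written": []}

    def bus_write(value):
        if state["left"] > 0:
            state["left"] -= 1
            raise BusWriteError("simulated sporadic write error")
        state["written"].append(value)

    return bus_write, state
```

Capping the attempts means a write that never succeeds surfaces as an error with a known attempt count, which is more useful for diagnosis than an endless loop.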
|
101.2 | Using both v4.3 or v4.4 for different targets | ZYDECO::BODA | Realtime Products Support | Wed Sep 08 1993 21:18 | 34 |
| Hi Graham,
Thanks for the reply.
The customer has not set any breakpoints nor is he single stepping. He does
have it under warm debug.
He is running in kernel mode (PSL was 0).
Just found out that they *are* using VAXELN V4.3 with the 1.1 KAV kit. Up to
this point they kept saying they were using V4.4 with their KAV targets. Turns
out that this is only with their VAX 4000 model 60 targets.
Ok given that revelation, how could one hit a BPT in module KAV30JACKET - short
of corrupting the PC?
Going back to a discussion we had last year regarding ASTs, you mentioned
that an AST should not signal an object that the process who "owns" the
AST is waiting upon, as this could result in an internal race condition
(where the owner process is defined as the one who called KAV$IN_MAP with
location monitoring or called KAV$SET_AST). If one has a dummy process
to do the setup of an AST, can one suspend that dummy process, or have it
wait forever on some unsignaled object, and still have ASTs set up by the
dummy process delivered?
The customer has now asked us to demonstrate to them how the internal race
condition can occur if they have an AST (that was set up by process A)
signal an event that process A is waiting upon. We will
attempt to do so. But do you have a more detailed explanation of why this
internal race condition can occur?
Thanks again Graham for your help!
Alan
|
101.3 | | HERR::crosbie | Graham Crosbie @PCS DTN 873-4193 | Thu Sep 09 1993 09:57 | 51 |
|
Hi Alan,
>The customer has not set any breakpoints nor is he single stepping. He does
>have it under warm debug.
>He is running in kernel mode (PSL was 0).
>Just found out that they *are* using VAXELN V4.3 with the 1.1 KAV kit. Up to
>this point they kept saying they were using V4.4 with their KAV targets. Turns
>out that this is only with their VAX 4000 model 60 targets.
Is there any reason why they aren't using 4.4 for the KAV30s? After all, it
is the supported version. I totally reworked the system service dispatch
mechanism for KAV30 system services in 4.4.
>Ok given that revelation, how could one hit a BPT in module KAV30JACKET -
> short of corrupting the PC?
There are no breakpoints in KAV30JACKET. All of the XDELTA breakpoints in
the KAV30 kernel should have been commented out by the build procedure (from
the 1.1 source pool, it looks like the build procedure did this correctly). I
would suspect that the PC may have been corrupted. Can the customer reproduce
this behaviour with a small application?
>Going back to a discussion we had last year regarding ASTs, you mentioned
>that an AST should not signal an object that the process who "owns" the
>AST is waiting upon, as this could result in an internal race condition
>(where the owner process is defined as the one who called KAV$IN_MAP with
>location monitoring or called KAV$SET_AST). If one has a dummy process
>to do the setup of an AST, can one suspend that dummy process, or have it
>wait forever on some unsignaled object, and still have ASTs set up by the
>dummy process delivered?
Yes, one can use the technique you describe to work around this restriction.
>The customer has now asked us to demonstrate to them how the internal race
>condition can occur if they have an AST (that was set up by process A)
>signal an event that process A is waiting upon. We will
>attempt to do so. But do you have a more detailed explanation of why this
>internal race condition can occur?
I'll mail you what I have at the moment and what I can remember.
>Thanks again Graham for your help!
You're welcome,
Graham
--------
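The dummy-process workaround confirmed above can be pictured with a loose analogy in Python threads. This is not VAXELN or KAV30 code and says nothing about how the kernel delivers ASTs: the threads stand in for processes, a plain callback stands in for the AST, and every name below (`run_demo`, `dummy_owner`, `process_a`, `ast_handler`) is hypothetical. The only point shown is the structure: the owner that registers the handler parks forever on an object that is never signaled, while a different thread waits on the event the handler signals.

```python
import threading


def run_demo(timeout_s=2.0):
    data_ready = threading.Event()      # the object "process A" waits on
    never_signaled = threading.Event()  # the object the dummy owner parks on
    registered = threading.Event()      # tells the demo that setup is done
    handlers = []                       # stands in for the registered AST

    def ast_handler():
        # Signals A's event. A did not register this handler, so the
        # waiter and the owner are different threads.
        data_ready.set()

    def dummy_owner():
        handlers.append(ast_handler)    # the "setup" step
        registered.set()
        never_signaled.wait()           # park for good

    def process_a(result):
        result["woken"] = data_ready.wait(timeout_s)

    result = {}
    owner = threading.Thread(target=dummy_owner, daemon=True)
    worker = threading.Thread(target=process_a, args=(result,))
    owner.start()
    worker.start()

    registered.wait(timeout_s)
    for handler in handlers:            # stands in for AST delivery
        handler()

    worker.join()                       # returns once A's wait completes
    return result["woken"], owner.is_alive()
```

In this sketch `run_demo()` reports that the waiter was woken while the owner is still parked, which mirrors the arrangement described in the question.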
|
101.4 | Analyzing the error log info. | ZYDECO::REDDY | | Fri Sep 10 1993 18:57 | 27 |
|
The customer enabled error logging. The error logger has the following for
the master:
5C4C Status Code (VME/VSB write error occurred)
39C0000B VME/VSB Master AM & Error Code and Retry Count
00924000 VME/VSB address accessed
8000B590 PC (KAV$WRITE_LONGWORD)
00080001 PSL
The error code
39 AM modifier
C0 bit 6 set - KAV30 had control of the bus
bit 7 set - KAV30 was performing a read operation when the error occurred.
Am I reading this right? I would have expected bit 7 to be clear.
On the slave side, the error logger gives 00001038, which, if I am reading it
correctly, tells me (since bit 5 is 0) that KAV30 was performing a read when the
error occurred.
Sumithra
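The reading of the master longword above can be checked mechanically. The sketch below is in Python and assumes the field layout implied by the post (top byte AM, next byte error code, low word retry count); that layout and the helper names `decode_master_error` and `set_bits` are assumptions, not taken from KAV30 documentation. Bit positions count the least significant bit as 0.

```python
def decode_master_error(longword):
    """Split the master error longword the way the post reads it.

    Assumed layout (not confirmed by KAV30 documentation):
      bits 31-24  address modifier (AM)
      bits 23-16  error code
      bits 15-0   retry count
    """
    error_code = (longword >> 16) & 0xFF
    return {
        "am": (longword >> 24) & 0xFF,
        "error_code": error_code,
        "retry_count": longword & 0xFFFF,
        "had_bus_control": bool(error_code & 0x40),  # bit 6 of the error code
        "was_read": bool(error_code & 0x80),         # bit 7 of the error code
    }


def set_bits(value):
    """Return the positions of the set bits, least significant bit = 0."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]
```

For 0x39C0000B this gives AM 0x39 and error code 0xC0 with bits 6 and 7 both set, matching the interpretation in the post. For the slave value, `set_bits(0x00001038)` returns `[3, 4, 5, 12]` under this numbering, so which bit the slave-side reading refers to depends on the bit-numbering convention used in the KAV30 documentation.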
|
101.5 | | HERR::crosbie | Graham Crosbie @PCS DTN 873-4193 | Wed Sep 15 1993 10:16 | 33 |
| Hi Sumithra,
>The customer enabled error logging. The error logger has the following for
>the master:
>5C4C Status Code (VME/VSB write error occurred)
>39C0000B VME/VSB Master AM & Error Code and Retry Count
>00924000 VME/VSB address accessed
>8000B590 PC (KAV$WRITE_LONGWORD)
>00080001 PSL
>The error code
>39 AM modifier
>C0 bit 6 set - KAV30 had control of the bus
> bit 7 set - KAV30 was performing a read operation when the error occurred.
>Am I reading this right? I would have expected the bit 7 to be clear.
It looks like you're interpreting the information correctly, and your
expectation is also correct: bit 7 should be clear for a write operation.
I'm not familiar enough with the error-logging code in the KAV30 kernel to
provide an explanation for the behaviour that you have reported. If you
consider this behaviour to be a real problem, please escalate it
through the official channels for further investigation.
Graham
|
101.6 | bus writes and name service? | ZYDECO::REDDY | | Mon Oct 18 1993 22:59 | 13 |
|
XEROX ran into the problem mentioned in the note .0 regarding the
jacket routine. The bus_write fails if they include name service. If
the name service is removed, then the bus write completes with success.
They tell me that they cannot move to 4.4 for some time. Can anyone
tell me if there is any connection between enabling name service and
KAV bus writes?
Thanks,
Sumithra
|
101.7 | It is all in the documentation.... | BAYERN::WOLFF | Conformism is for little minds. | Tue Oct 19 1993 12:22 | 14 |
| Sumithra,
In my opinion there are some incorrect settings in the Ebuild data files:
either you have too little system I/O region (which under V4.3 has to be totally
different than under V4.4; see the documentation), or you are running out of
P0/P1 space, or you are just plain running out of memory. Are all the settings
in EBUILD correct? Did you ever check the Ebuild .DAT file against the
documentation (I think it is chapter 5 of the Prog/Ref manual)? Does the
customer have a clue how much memory he is using?
One thing is very clear to me: the KAV bus write routines have nothing to
do with the name service. So there is another reason, and I suspect either
incorrect or too-low settings in Ebuild or too little physical memory.
Julian.
|
101.8 | Thanks | ZYDECO::REDDY | | Tue Oct 19 1993 13:30 | 5 |
|
Thanks, Julian. I don't see how there could be any connection between
bus write and name service. I will check out their other parameters.
Sumithra
|