
Conference 49.910::kav30

Title:VAX on VMEbus: KAV30
Notice:Could have been as fast as 68K but it's a VAX!
Moderator:CSSVMS::KAV30_SUPP
Created:Thu Apr 18 1991
Last Modified:Fri Aug 02 1996
Last Successful Update:Fri Jun 06 1997
Number of topics:159
Total number of notes:645

109.0. "KAV$BUS_WRITE problems" by ZYDECO::REDDY () Mon Dec 13 1993 21:24

I would like to know under what conditions calling KAV$BUS_WRITE could result
in the system rebooting itself.

I have a customer who is accessing a VME D/A card.  When he does a bus write
and the card is there, there are no problems.  When he calls KAV$BUS_WRITE and
the card is not there, the call sometimes results in a system reboot, with no
exceptions or machine checks.  If the code is compiled with /NOOPT, a bus
write to a card that is not there works.  If the code is compiled with /OPT,
it works sometimes and causes a reboot at other times.

Thanks,

Sumithra

    
109.1. by ZYDECO::REDDY () Tue Dec 14 1993 02:45 (19 lines)
KAVSYS_VME.MAR in the ELN directory contains the following.  What CVAX chip
problem is it talking about, and how does it affect KAV$BUS_WRITE?

;++
; This code is a workaround for a CVAX chip problem. Can otherwise cause
; double error halt!
	BBC	#KAV$V_BLOCK_IRQ, -
		DATA_TYPE(AP), 12$	; Br if not raise IPL
	SETIPL	#IPL$K_POWER		; Block all device interrupts
	BRB	15$
.
.
.

Thanks,

Sumithra

    
109.2. "never write to non-existent VMEbus addresses repeatedly..." by GOBANG::LEMMER () Tue Dec 14 1993 11:54 (63 lines)
.re 0

	Under NO circumstances should a KAV30$BUS_WRITE cause the system to
	reboot. Nevertheless, if you are accessing 'non-existent' VMEbus
	addresses, there may be some problems:

	Because of the 'disconnected writes' of the CVAX, an error on the
	VMEbus (which could be a timeout) is detected a long time after the
	write has been posted. This error is therefore handled in the machine
	check handler if you use the KAV30$BUS_WRITE routine; if you write
	directly, nothing handles it. This is not a problem as long as you do
	not write to the non-existent address many times.

	However, if you do this writing in a tight loop about 1M times at
	maximum speed, the rtVAX300 may crash with:

	02 DBL ERR halt

	This is caused by a problem in the rtVAX microcode.

	For 'real' applications this should not be a problem: either your
	VME board is there, in which case everything is OK, or it is not, in
	which case 'probing' once should be enough to find out that the board
	is not there, and you should never write to this location again.

	The difference in behaviour with and without the /OPT switch may be
	caused by the compiler. In particular, check your return arguments and
	declare them 'volatile' or similar (we have discovered a lot of
	'weird' optimisations with the C compiler...).

.re 1

	The CVAX bug is the one described above, but we should not discuss
	the details here in public. The bug is present in all CVAX
	implementations, but it only causes a problem on the KAV30, because
	only on the VMEbus can a write stay 'outstanding' for a long time
	(caused by retries on the VMEbus).

	The code path you quoted shows the use of the undocumented
	qualifier 'BLOCK_IRQ', which can be used with the KAV30$BUS_WRITE
	service to block all other interrupts while a write to the VMEbus is
	pending. This qualifier was implemented for debugging purposes only.

	WE STRONGLY DISCOURAGE YOU FROM USING THIS QUALIFIER, SINCE IT MAY
	CAUSE UNPREDICTABLE BEHAVIOUR OF YOUR APPLICATION. IT RAISES IPL TO
	POWER AND STAYS THERE FOR A 'LONG' TIME (SEVERAL TENS OF MICROSECONDS).

	Using this qualifier lowers the probability of getting the
	'double error halt', but it does not cure the problem. If you write
	to a non-existent VMEbus address long enough and fast enough, you
	may see it again.

	The only possible workaround is:

	Once you have detected a non-existent address on the VMEbus by getting
	back the 'VMEbus write error' status (I don't remember its exact name)
	from the KAV30$BUS_WRITE service, never write to this address again.


Best regards,

	Thomas
109.3. by ZYDECO::REDDY () Tue Dec 14 1993 18:06 (36 lines)
Thomas,

Thanks for responding.  The file KAVSYS_VME.MAR is included in the
ELN kit that is sent to customers.  My customer looked at the comments and
wanted to know what this was all about.

The following is listed in several places in that file:

; This code is a workaround for a CVAX chip problem. Can otherwise cause
; double error halt!

My customer (Corning) was able to get their application to work consistently
by using KAV$M_BLOCK_IRQ in their KAV$BUS_WRITE.  It is possible that while
his write to the non-existent board was pending, another interrupt came in
and caused the system halt.  They have two VME D/A boards; one will always
be present and the second may or may not be there.  Sometimes they have
gotten the right status back in their status argument.

They will call KAV$BUS_WRITE once to check whether the board is
there or not.  (He tells me that they cannot use reads to determine this.)
Do you still think it is dangerous to use KAV$M_BLOCK_IRQ?  He said that
he will use the "volatile" attribute and see what happens.

If the only workaround for them is to use KAV$M_BLOCK_IRQ with their
KAV$BUS_WRITE, and they are only going to call this routine once, what kind
of problems do you think they would run into?

If we do not want the customers to know about the CVAX problem or
KAV$M_BLOCK_IRQ, why was the KAVSYS_VME.MAR program included in the kit?


Thanks for all the help,

Sumithra
    
109.4. by BAYERN::WOLFF ("Conformism is for little minds.") Wed Dec 15 1993 09:09 (28 lines)
>If we do not want the customers to know about the CVAX problem or 
>KAV$M_BLOCK_IRQ, why was the KAVSYS_VME.MAR program included in the kit?

This is common: you usually document code as you go along writing it.
The customer can also buy a VMS listing disk and read the comments there,
which contain other 'clues', if you want to call them that. What we do not
want is to tell the customer exactly why it happens; it is sufficient for him
to know that there is a problem. There is nothing wrong with the comments,
since they only hint at the fact that there is a problem.

If your customer has to probe the second D/A card and does this only once,
there is no problem, whichever way he does it. As Thomas mentioned, this
specific problem only occurs in tight loops writing to NXM addresses on the
VMEbus.

As with all undocumented features, you should work with the customer to see
whether you can solve the problem without using this undocumented feature.
However, I can assure you that on the KAV30 this won't go away, so if there is
an absolute need, use BLOCK_IRQ (but don't call us when something else, like
Ethernet, breaks).

The customer's program should have an INIT routine in which the two boards
are probed to determine what is there BEFORE any traffic (VME interrupts) can
be posted. The code then sets some flags recording the configuration, and only
after that do you start the full application. If you do it that way, you do not
need any BLOCK_IRQ modifiers in your code, and you won't see the DBLERR problem
either. It's really more a design question than anything else.

	Julian.
109.5. by ZYDECO::REDDY () Wed Dec 15 1993 15:13 (4 lines)
    Thanks,  Julian.
    
    sr