Title: | VAX and Alpha VMS |
Notice: | This is a new VMSnotes, please read note 2.1 |
Moderator: | VAXAXP::BERNARDO |
|
Created: | Wed Jan 22 1997 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 703 |
Total number of notes: | 3722 |
625.0. "Alpha V7.1, SYS$WFLOR loops." by PRSSOS::MAILLARD (Denis MAILLARD) Wed May 21 1997 10:18
The following problem has been entered as QAR #1134 in database
EVMS-RAVEN (because TRIFID does not currently have a QAR database
for V7.1). I'd appreciate it if anyone could enter comments
about what in SYS$WFLOR code may cause it. This problem is not
currently urgent, but I suppose that anything that may cause an event
flag system service to fail should be considered serious enough for
prompt action.
Denis.
P.S.: Moderator, if you think that the way to reproduce this problem
should not be published, please hide this note and let me know.
The following short (4 instructions) PASCAL program makes SYS$WFLOR
loop and take 100% of available CPU on OpenVMS Alpha V7.1, while it works
correctly on V6.2 or on VAX V7.1. The loop does not run at elevated IPL,
so other users can still work provided that the process does not have a high
priority, but it can severely slow a production system.
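[INHERIT ('sys$library:starlet','sys$library:pascal$lib_routines')]
PROGRAM test (INPUT,OUTPUT);
VAR
    status, D1, pos_D1 : unsigned;
begin
    status := lib$get_ef (D1);       { allocate a local event flag (63 here) }
    status := $clref (D1);           { clear it so that $wflor has to wait }
    pos_D1 := 2 ** (D1-32);          { mask bit within the cluster: 2**31 }
    status := $wflor (D1,pos_D1);    { wait for the flag; loops on Alpha V7.1 }
end.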
Note: lib$get_ef assigns local event flags starting with number 63 if
available and then in decreasing order, so, in this case, we know that event
flag number 63 will be assigned.
When examining the generated machine code and using STEP/INSTRUCTION
under the debugger, one sees that the cause of the problem lies in the PASCAL
code setting the value of pos_D1 (stored in R17) to 0000000080000000, as
returned in R0 by the MATH$POW_QQ routine which calculates the value of 2**31.
If one replaces "pos_D1 := 2 ** (D1-32);" with "pos_D1 := 16#FFFFFFFF80000000;",
or deposits FFFFFFFF80000000 in R17 just before the call to $WFLOR, the problem
does not occur.
A workaround is to specify the [VOLATILE] attribute for the pos_D1
variable. In that case, the value returned in R0 by MATH$POW_QQ
(0000000080000000) is stored in memory before being loaded from memory into R17
with a LDL instruction that effectively sign-extends it to FFFFFFFF80000000.
This workaround is the one that was suggested to the customer who first saw the
problem. I should know whether it works for his (rather complex) application as
soon as he has tested it (hopefully before the end of the week).
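The difference between the two register images can be modelled in a few lines. This Python sketch is an illustration only: sign_extend_32 and is_sign_extended are hypothetical names, and a 64-bit register is represented as a non-negative Python integer. It mimics what a sign-extending longword load does to the value 80000000.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def sign_extend_32(value):
    """Return the 64-bit register image of the low longword, sign-extended."""
    low = value & MASK32
    if low & 0x80000000:               # bit 31 set: propagate it into bits 32..63
        return low | 0xFFFFFFFF00000000
    return low


def is_sign_extended(reg):
    """True if the 64-bit image equals the sign extension of its low longword."""
    return (reg & MASK64) == sign_extend_32(reg)


print(hex(sign_extend_32(0x80000000)))        # prints: 0xffffffff80000000
print(is_sign_extended(0x0000000080000000))   # prints: False
print(is_sign_extended(0xFFFFFFFF80000000))   # prints: True
```

Values with bit 31 clear are unchanged by the extension, so only masks that include flag 31 or flag 63 can be affected.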
Similarly, if the program is compiled with the /NOOPTIMIZE qualifier,
the value of pos_D1 that is stored in R17 before the call to $wflor is
FFFFFFFF80000000 instead of 0000000080000000 and the problem does not occur.
There is possibly here (unless the PASCAL developers can establish that it does
not violate the language specifications) a minor bug in the Alpha VMS PASCAL
compiler code optimiser, which should also be addressed. It seems to me that the
register should contain the same value whether the code is optimised or not,
even if the part that is different is beyond the limit of the actual longword
PASCAL variable.
When sampling PC values for the looping process with the SDA
command SHOW PROCESS/REGISTER, one sees that all sampled values are within the
PROCESS_MANAGEMENT and the EXCEPTION executive modules.
If one replaces the line "status := $clref (D1);" with the line
"status := $setef (D1);" to have the flag set when the call to $wflor is made,
the program returns immediately, as expected, which seems to point to a problem
in the wait part of the system service code, as this part is only called if the
flag is not found set when entering the system routine EXE$WFLOR.
The fact that the SAME image (compiled under V6.2) runs OK in V6.2 but
shows the problem under V7.1 seems to point to modifications introduced in V7.0
or V7.1, i.e. probably code to handle the EFN$C_ENF flag (flag 128). I
unfortunately don't have a V7.0 Alpha system available to test the case with it.
I suppose that either EXE$WFLOR should check the parameters it receives,
or that the instructions that manipulate the mask variable in the wait code
should be examined to see where the content of the higher part of the register
that contains the longword mask variable influences their results, maybe by
generating a (recursive?) exception, as the PC sampling seems to indicate.
T.R | Title | User | Personal Name | Date | Lines |
---|
625.1 | | TLE::REAGAN | All of this chaos makes perfect sense | Wed May 21 1997 11:06 | 17 |
| I won't speak to the WFLOR issue, but I can speak to the Pascal
issues/questions.
1) Yes, I think the generated code is suspicious. The immediate
unsigned parameters to $WFLOR should be sign-extended, but as you
noticed, that didn't happen.
2) Doing "pos_D1 := 16#FFFFFFFF80000000;" didn't do what you think
it did. Pos_D1 is a 32-bit unsigned number, you can't stick 64-bits
into it. 32-bit unsigned assignments do an implicit (MOD 2**32) before
assigning. All you have done is put 16#80000000 into pos_D1. However,
it seems that the code generator then did the correct sign-extending of
the LU value into the argument.
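The implicit (MOD 2**32) in point 2 can be checked with plain integer arithmetic. This Python sketch models only the arithmetic of a 32-bit unsigned assignment, not the compiler; assign_unsigned32 is a hypothetical name.

```python
def assign_unsigned32(value):
    """Model a 32-bit unsigned assignment as an implicit MOD 2**32."""
    return value % 2**32


stored = assign_unsigned32(0xFFFFFFFF80000000)
print(hex(stored))     # prints: 0x80000000
```

The upper longword is discarded at assignment time, so any FFFFFFFF seen later in the argument register has to come from the code generator's own sign extension.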
I will post a note in CLT::DEC_PASCAL_BUGS for the generated code.
-John
|
625.2 | | PRSSOS::MAILLARD | Denis MAILLARD | Wed May 21 1997 11:36 | 15 |
| Re .1: John,
> 2) Doing "pos_D1 := 16#FFFFFFFF80000000;" didn't do what you think
> it did. Pos_D1 is a 32-bit unsigned number, you can't stick 64-bits
> into it. 32-bit unsigned assignments do an implicit (MOD 2**32) before
> assigning. All you have done is put 16#80000000 into pos_D1. However,
> it seems that the code generator then did the correct sign-extending of
> the LU value into the argument.
Oooooops! My mistake (red face...), sorry about that. Thanks for
pointing it out to me. However, as I said, a deposit of FFFFFFFF80000000 in
R17 after setting language to macro under the debugger also did it, so
there was no doubt about the issue as far as the $WFLOR argument is
concerned.
Denis.
|
625.3 | QAR Is The Right Approach | XDELTA::HOFFMAN | Steve, OpenVMS Engineering | Wed May 21 1997 11:51 | 12 |
|
EVMS-RAVEN is the right spot.
There are any number of ways to cause a system service to fail, and
presenting one with garbage input is certainly one common way to
engender a system service failure.
That sys$wflor loops (in a user-interruptible fashion) does look like
a small problem (around looking at more of the input than the longword
that it should), but you've already done what you should have here --
you've already logged the QAR.
|
625.4 | | ZIMBRA::BERNARDO | Dave Bernardo, VMS Engineering | Wed May 21 1997 15:01 | 23 |
| I just took a quick look at this... one of the things added in
V7.0 (Alpha) was a check for correct 32-bit sign extension on
parameters passed to system services. This was done because most/all
services now take true 64-bit values.
The $WFLOR system service is actually a composite service: the first part
runs in the mode of the caller (user in this case). If this routine
finds the process/thread must wait, it calls an internal system service
to wait the thread. This service does the sign extension checks. For
this failing example, the service finds that the second parameter has
the value x0000000080000000, which is not sign extended. The internal
service returns SYSTEM-F-ARG_GTR_32_BITS, which the caller proceeds to
ignore and just rechecks the wait condition. It finds the thread must
still wait and calls the internal service again, which checks the sign
extension and returns... and so on, and so... and...
I could argue it either way as to whether the service should be
checking for sign extension on the event flag mask... But, if it's
going to, the error should be filtered back, that's for sure.
In any case, making the code pass a sign extended value is the workaround.
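The retry behaviour described above can be sketched as a toy model. This Python code is not the OpenVMS implementation: the function names, the status strings, the state dictionary and the max_spins cut-off are all assumptions made for illustration. The inner routine rejects a mask that is not sign-extended, and the outer routine either ignores that status (the looping case) or filters it back to the caller.

```python
def is_sign_extended(reg):
    """True if the 64-bit image equals the sign extension of its low longword."""
    low = reg & 0xFFFFFFFF
    if low & 0x80000000:
        expected = low | 0xFFFFFFFF00000000
    else:
        expected = low
    return reg == expected


def internal_wait(mask_reg, state):
    """Hypothetical inner service: check the argument, then 'wait'."""
    if not is_sign_extended(mask_reg):
        return "ARG_GTR_32_BITS"           # rejected before any wait happens
    # A real wait would block; the model pretends the flag was set meanwhile.
    state["flags"] |= mask_reg & 0xFFFFFFFF
    return "NORMAL"


def wflor_model(mask_reg, state, check_status, max_spins=1000):
    """Hypothetical caller-mode part. Returns (status, number of inner calls)."""
    spins = 0
    while not (state["flags"] & mask_reg & 0xFFFFFFFF):
        if spins >= max_spins:             # cut-off so that the model terminates
            return "STILL_LOOPING", spins
        status = internal_wait(mask_reg, state)
        spins += 1
        if check_status and status != "NORMAL":
            return status, spins           # error filtered back to the caller
    return "NORMAL", spins


print(wflor_model(0x0000000080000000, {"flags": 0}, check_status=False))
print(wflor_model(0xFFFFFFFF80000000, {"flags": 0}, check_status=False))
print(wflor_model(0x0000000080000000, {"flags": 0}, check_status=True))
```

With the status ignored, the first call spins until the cut-off, the sign-extended mask completes after one inner call, and with the status checked the error is returned on the first pass. A flag that is already set never reaches the inner routine, which matches the $setef observation in .0.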
d.
|
625.5 | | TLE::REAGAN | All of this chaos makes perfect sense | Thu May 22 1997 10:29 | 6 |
| The missing sign-extension in the Pascal compiler has been fixed.
It only occurs when the result of an unsigned exponentiation is
passed to an immediate-mode parameter. The fix will appear in the
next release of the compiler.
-John
|