T.R | Title | User | Personal Name | Date | Lines |
---|
747.1 | | TLE::EKLUND | Always smiling on the inside! | Wed Apr 24 1996 18:32 | 19 |
747.2 | How about an environment variable? | HDLITE::NEWMAN | Chuck Newman, 508/467-5499 (DTN 297), MRO1-3/F26 | Wed Apr 24 1996 23:34 | 19 |
747.3 | Allow user control, if implemented | SUBPAC::FARICELLI | | Thu Apr 25 1996 07:54 | 9 |
747.4 | How about passing the info to the rtl | FORMAT::KIMBALL | Keith Kimball 381-0573 | Thu Apr 25 1996 10:13 | 6 |
747.5 | Why penalize all IO? | SUBPAC::FARICELLI | | Thu Apr 25 1996 11:11 | 11 |
747.6 | EXP does it right, why can't READ | STEVEN::hobbs | Steve Hobbs | Thu Apr 25 1996 15:50 | 32 |
747.7 | Use the RTL flag parameter | ELBANE::GROVE | | Thu Apr 25 1996 19:00 | 13 |
747.8 | working | TLE::WHITLOCK | Stan Whitlock | Fri Apr 26 1996 10:26 | 16 |
747.9 | What's the current status with this? | HDLITE::NEWMAN | Chuck Newman, 508/467-5499 (DTN 297), MRO1-3/F26 | Tue Aug 13 1996 14:49 | 9 |
747.10 | when minutes seem like hours | TLE::WHITLOCK | Stan Whitlock | Wed Aug 21 1996 09:05 | 10 |
747.11 | Half year tickler | HYDRA::NEWMAN | Chuck Newman, 508/467-5499 (DTN 297), MRO1-3/F26 | Tue Feb 04 1997 11:59 | 62 |
| The vendor is going to have to build their product with
-fpe1 so that it doesn't die when reading data which was
written by other systems (e.g., 1.0e-45 on SGI).
Any ideas as to whether it would be reasonable to write a
"fixup" routine? I'm not sure what it should look like for
optimal performance. Here's the routine I was thinking of.
I compiled it with Fortran V4.1 as follows (I'm really only
concerned with denorms):
f77 -fpe1 -tune ev5 -unroll 1
subroutine fixup(array, length)
integer length
real array(length)
real ZERO/0.0/
integer ndx
integer UNROLL
parameter(UNROLL=16)
do ndx = 1, length-(UNROLL-1), UNROLL
array(ndx+ 0) = array(ndx+ 0) + ZERO
array(ndx+ 1) = array(ndx+ 1) + ZERO
array(ndx+ 2) = array(ndx+ 2) + ZERO
array(ndx+ 3) = array(ndx+ 3) + ZERO
array(ndx+ 4) = array(ndx+ 4) + ZERO
array(ndx+ 5) = array(ndx+ 5) + ZERO
array(ndx+ 6) = array(ndx+ 6) + ZERO
array(ndx+ 7) = array(ndx+ 7) + ZERO
array(ndx+ 8) = array(ndx+ 8) + ZERO
array(ndx+ 9) = array(ndx+ 9) + ZERO
array(ndx+ 10) = array(ndx+ 10) + ZERO
array(ndx+ 11) = array(ndx+ 11) + ZERO
array(ndx+ 12) = array(ndx+ 12) + ZERO
array(ndx+ 13) = array(ndx+ 13) + ZERO
array(ndx+ 14) = array(ndx+ 14) + ZERO
array(ndx+ 15) = array(ndx+ 15) + ZERO
array(ndx+ 1) = array(ndx+ 1) + ZERO
end do
do ndx = ndx, length
array(ndx) = array(ndx) + ZERO
end do
return
end
Looking at the machine instructions, there are 4 trapb instructions,
which I wouldn't think would be optimal (i.e., it will have to wait
for a trap that will likely never happen.
Compiling -fpe0 (which of course has no trapbs and has adds instead
of addssu) schedules the instructions VERY differently.
The only other thing I can think of that might hurt performance
with the code I have above would be all the stores.
Optimally, the stores would only happen if there was a trap, but
I would think the overhead of doing the check would be greater
than the overhead of doing the store.
Would it make sense to do the check? If so, how would it be done?
Would it be a function of how frequently there are denorms?
-- Chuck Newman
|
747.12 | Treat reals5 as integers in your filter ... | HPCGRP::MANLEY | | Tue Feb 04 1997 14:36 | 6 |
|
Since "fixup" is a subroutine, "array" may just as well be declared as an array
of integers. Then, for each array element, you may extract the exponent, test
it for zero, and clear the element if need be. You will store elements only
when they're cleared and you'll never trap.
|
747.13 | FP_CLASS makes it easier | TLE::WHITLOCK | Stan Whitlock | Tue Feb 04 1997 16:42 | 42 |
| So what you have is IEEE binary data read into an array from some unclean
system and you want to stomp the unclean data {denorms, infinities, NaNs}
to 0 {because you want to run with -fpe0 and you don't wanna die}.
There is the intrinsic FP_CLASS
- FP_CLASS(x) returns an integer that indicates the nature of its
REAL or DOUBLE PRECISION argument. /usr/include/fordef.f contains
the definitions of the return value:
PARAMETER FOR_K_FP_SNAN = '00000000'X
PARAMETER FOR_K_FP_QNAN = '00000001'X
PARAMETER FOR_K_FP_POS_INF = '00000002'X
PARAMETER FOR_K_FP_NEG_INF = '00000003'X
PARAMETER FOR_K_FP_POS_NORM = '00000004'X
PARAMETER FOR_K_FP_NEG_NORM = '00000005'X
PARAMETER FOR_K_FP_POS_DENORM = '00000006'X
PARAMETER FOR_K_FP_NEG_DENORM = '00000007'X
PARAMETER FOR_K_FP_POS_ZERO = '00000008'X
PARAMETER FOR_K_FP_NEG_ZERO = '00000009'X
You could code
subroutine fixup(array, length)
integer length
real array(length)
integer i
do i = 1, length
which: select case (fp_class (array (i)))
case (4,5,8)
exit which
case (0,1,2,3,6,7,9)
array (i) = 0.0
end select which
end do
return
end
FP_CLASS can touch a floating point number {it's generic so either REAL*4 or *8}
and not get an exception.
/Stan
|
747.14 | Using float seems faster, esp when few denorms | HYDRA::NEWMAN | Chuck Newman, 508/467-5499 (DTN 297), MRO1-3/F26 | Tue Feb 04 1997 18:17 | 26 |
| Well, my code in .11 (without the extra "array(ndx+1)" statement)
seems to be faster by a factor of ~2 - 5, depending on the number of
denorms, than .12 (with statements like one of the following):
if (((array(ndx) .and. '7f800000'x) .eq. 0)
1 .and. (array(ndx) .ne. 0) ) array(ndx) = 0
if (((array(ndx) .and. '7f800000'x) .eq. 0) ) array(ndx) = 0
This is from stuffing the "bad" value into an integer
equivalenced to a real.
If the denorm comes as the result of a calculation
(e.g., if I add 0.0 to the denorm in the main program
and compile it -fpe3), my routine takes much longer�, and
has the bonus feature that the denorm doesn't get fixed up!
I link with -fpe0, but I suspect there is one handler for
the whole image, and it's the one associated with the
"highest" -fpe level specified.
� Takes much longer when there are lots of denorms. For few
denorms .11 is still faster
|
747.15 | 2nd try at FP_CLASS | TLE::WHITLOCK | Stan Whitlock | Wed Feb 05 1997 08:39 | 48 |
| My code in .13 is wrong... SELECT CASE is a Fortran 90 construct that is
supported by f77 but I didn't spell it right:
o "WHICH:" is a construct name that has to start in the statement
field {> column 6}
o EXIT exits DO loops - it's an extension I never got implemented that
EXIT should take you out of named constructs like CASE.
So the correct code is
subroutine fixup(array, length)
integer length
real array(length)
integer i
do i = 1, length
select case (fp_class (array (i)))
! case (4,5,8) ! +/- normal numbers and +0
! nothing to do
case (0,1,2,3,6,7,9) ! NaNs, +/- infinities, +/- denorms, -0
array (i) = 0.0
end select
end do
return
end
RE: .14
-fpe does 2 things:
o compiles code that does or does not allow trap shadows {which allow
exceptions to be fixed up}
o when used on the main program, makes a call to the RTL handler to tell
it what fpe semantics are requested
So if you compile the main program -fpe3 and a subroutine -fpe0 and get an
exception inside the subroutine, bad things may happen since the subroutine code
is not structured to allow the handler to fix up the exception.
In your case, you want the main program compiled -fpe0 and the subroutine at
-fpe1. Then inside your subroutine, you have to call for_set_fpe to set the
handler to fpe 1 {and then reset before you exit}. See for_set_fpe(3f) for
details.
But FP_CLASS is a much easier way to do it and everybody can be compiled -fpe0.
/Stan
|
747.16 | Use 5IAND not .AND. | HPCGRP::MANLEY | | Wed Feb 05 1997 09:52 | 21 |
|
> Well, my code in .11 (without the extra "array(ndx+1)" statement)
> seems to be faster by a factor of ~2 - 5, depending on the number of
> denorms, than .12 (with statements like one of the following):
>
> if (((array(ndx) .and. '7f800000'x) .eq. 0)
> 1 .and. (array(ndx) .ne. 0) ) array(ndx) = 0
>
>
> if (((array(ndx) .and. '7f800000'x) .eq. 0) ) array(ndx) = 0
I had something like
if( iand( array(i),'7f800000'x ).eq.0 ) array(i)=0
in mind.
Frankly, I can't predict what will happen when logical "and"s are used where
boolean "and" are needed. Also, if you want to minimize stores, don't forget
that IEEE -0.0 does not equal integer 0.
|
747.17 | Floating ADD fastest when no denorms, otherwise integer exponent check | HYDRA::NEWMAN | Chuck Newman, 508/467-5499 (DTN 297), MRO1-3/F26 | Wed Feb 05 1997 09:59 | 39 |
| re: .15:
�In your case, you want the main program compiled -fpe0 and the subroutine at
�-fpe1. Then inside your subroutine, you have to call for_set_fpe to set the
�handler to fpe 1 {and then reset before you exit}. See for_set_fpe(3f) for
�details.
I compiled with those switches, but I didn't do any calls to for_set_fpe
in my subroutine and it works handsomely. When I add the calls to for_set_fpe
the routine stops cleaning up the denorms. I've tried passing it FPE_M_TRAP_UND
and FPE_M_TRAP_INV.The documentation here is kinda sketchy.
It looks like the fastest thing would be to use the Floating case until I get
a denorm, then use the integer case. How can I count the traps? Is there a
data structure I can hook up to?
Here's the timings for now:
Floating Integer Case
Case A1 8.148438 .0078125 .0054688
Case B1 8.941406 2.476563 4.789063
Case C1 1.535156 3.054688 4.929688
Case A2 .0078125 .0117188 .0585938
Case B2 .3906250 2.539063 4.769531
Case B2 .6250000 3.046875 4.976563
Case A1: Clean 16 reals; the first 8 are denorms. Do this many times
Case B1: Clean 1600 reals; the first 8 are denorms. Do this many times
Case C1: Clean 16000 reals; the first 8 are denorms. Do this many times
Case A2: Clean 16 reals; no denorms. Do this many times
Case B2: Clean 1600 reals; no denorms. Do this many times
Case C2: Clean 16000 reals; no denorms. Do this many times
Floating: Add 0.0 to the number and store it back. Compile -fpe1
Integer: Check the exponent. If zero, store a zero.
Case: Check fp_class. If Nan, infinity, denorm, or -0 store a zero.
|
747.18 | who said speed is beautiful? | TLE::WHITLOCK | Stan Whitlock | Wed Feb 05 1997 16:35 | 45 |
| >>I compiled with those switches, but I didn't do any calls to for_set_fpe
>>in my subroutine and it works handsomely. When I add the calls to for_set_fpe
>>the routine stops cleaning up the denorms. I've tried passing it
>>FPE_M_TRAP_UND and FPE_M_TRAP_INV.The documentation here is kinda sketchy.
You want to set FPE_M_ABRUPT_UND - that's what says "stomp underflow to 0.0".
But on Alpha, at -fpe0, it's not the handler that does the stomping, it's the
hardware; that's why it's fast. At -fpe1, there are trap shadows, the handler
is called, and the ABRUPT_UND is done.
>>It looks like the fastest thing would be to use the Floating case until I get
>>a denorm, then use the integer case. How can I count the traps? Is there a
>>data structure I can hook up to?
The RTL counts exceptions but those counts aren't available to you.
Something doesn't make sense here... when you load a denorm, it's not an
Underflow, it's an Invalid. That's why you need -fpe1 in order to touch the
data. -fpe1 has to have trap shadows and that slows stuff down. When the
"denorm + 0.0" is done, you get an underflow and the handler stomps it to 0.0.
So what we want is -fpe0 and a way to touch the data so it won't get an Invalid
if the data is a denorm. And we want speed...
do i = 1,n
if ((a(i).and. exponent-mask) .eq. 0) a(i) = 0
end do
won't unroll the loop. Plus the fastest thing to do is get the IF turned into
a CMOVE instruction. So the speed wizards suggest
do i=1,n,16 ! hand unrolled by 16, for example
! use DFOR$PREFETCH to prefetch elements I+17 thru I+32
! like CALL DFOR$PREFETCH (A(I+17)) etc
K0 = A(I ) .AND. '7f800000'X
K1 = A(I+ 1) .AND. '7f800000'X
...
K15 = A(I+15) .AND. '7f800000'X
IF (K0 .EQ. 0) A(I ) = 0 ! these should become CMOVEs
IF (K1 .EQ. 0) A(I+ 1) = 0
...
IF (K15 .EQ. 0) A(I+15) = 0
end do
I hope something here has helped. /Stan
|
747.19 | Sure would be nice if I could get to the RTL's exception counts... | HYDRA::NEWMAN | Chuck Newman, 508/467-5499 (DTN 297), MRO1-3/F26 | Thu Feb 06 1997 09:46 | 87 |
| �>>I compiled with those switches, but I didn't do any calls to for_set_fpe
�>>in my subroutine and it works handsomely. When I add the calls to for_set_fpe
�>>the routine stops cleaning up the denorms. I've tried passing it
�>>FPE_M_TRAP_UND and FPE_M_TRAP_INV.The documentation here is kinda sketchy.
�
�You want to set FPE_M_ABRUPT_UND - that's what says "stomp underflow to 0.0".
�But on Alpha, at -fpe0, it's not the handler that does the stomping, it's the
�hardware; that's why it's fast. At -fpe1, there are trap shadows, the handler
�is called, and the ABRUPT_UND is done.
Thanks, using FPE_M_ABRUPT_UND does it. Is there an explaination of what these
mean anywhere?
I understand about who squashes the denorms. The hardware will only do it if
the RESULT of an operation is a denorm. If the INPUT to an operation is a
denorm then a handler is needed. My reasoning for doing what I did was as
follows:
1) If there aren't any invalid data, this should scream right along at one
datum per cycle -- 2 memory accesses and one floating operation.
2) If I unroll by 16, there should only need to be 1 trapb at the end of each
iteration of the loop since there are enough unique registers for the trap
handler to figure out what's going on. I should even be able to unroll quite a
bit more. Seems, however, that the compiler doesn't do scheduling to minimize
the number of trapb instructions, so I can only unroll by 6 if I only want one
trapb (Well, I can understand 2 trapb's -- 1 before the first store (leaving a
few more add's to do, and one before the stores for those last few adds -- but I
don't think the compiler should generate any more than that).
�>>It looks like the fastest thing would be to use the Floating case until I get
�>>a denorm, then use the integer case. How can I count the traps? Is there a
�>>data structure I can hook up to?
�
�The RTL counts exceptions but those counts aren't available to you.
Rats!!! That may be my downfall. I could do something with RPCCs at
the head and tail of the loop and compare that against some number, but that's
getting a little to ugly even for me. Any besides, Digital UNIX occasionally
schedules "stuff" that runs in the process's context which will inflate this
number from time to time. And there's cache misses to think about...
The penalty for those cases where there *are* denorms is too high to ignore.
�Something doesn't make sense here... when you load a denorm, it's not an
�Underflow, it's an Invalid. That's why you need -fpe1 in order to touch the
�data. -fpe1 has to have trap shadows and that slows stuff down. When the
�"denorm + 0.0" is done, you get an underflow and the handler stomps it to 0.0.
�
�So what we want is -fpe0 and a way to touch the data so it won't get an Invalid
�if the data is a denorm. And we want speed...
�
� do i = 1,n
� if ((a(i).and. exponent-mask) .eq. 0) a(i) = 0
� end do
�
�won't unroll the loop.
I would think that you'll pay for the trap shadow a little from the trapb as
long as you don't have any denorms (but no more than 4 cycles -- the latency of
an adds). Looking at the exponent requires a couple of integer operations and
some combination of (conditional move/store or test/conditional branch/ store).
If I remember correctly, you can only have two of (load/store/integer operation)
per cycle, so these won't multi-issue as well as using a floating add).
�Plus the fastest thing to do is get the IF turned into
�a CMOVE instruction. So the speed wizards suggest
�
� do i=1,n,16 ! hand unrolled by 16, for example
� ! use DFOR$PREFETCH to prefetch elements I+17 thru I+32
� ! like CALL DFOR$PREFETCH (A(I+17)) etc
� K0 = A(I ) .AND. '7f800000'X
� K1 = A(I+ 1) .AND. '7f800000'X
� ...
� K15 = A(I+15) .AND. '7f800000'X
� IF (K0 .EQ. 0) A(I ) = 0 ! these should become CMOVEs
� IF (K1 .EQ. 0) A(I+ 1) = 0
� ...
� IF (K15 .EQ. 0) A(I+15) = 0
� end do
�
�I hope something here has helped. /Stan
I'll have to try this and see what the compiler generates, but it looks like
you're trying to generate a conditional STORE, not a conditional MOVE. If this
doesn't do it, however, I'm sure I can write it to induce a CMOV -- I've done it
before. I'll let you know what that does to the performance.
|
747.20 | Latest integer code is fastest | HYDRA::NEWMAN | Chuck Newman, 508/467-5499 (DTN 297), MRO1-3/F26 | Thu Feb 06 1997 10:42 | 29 |
| As I suspected, Stan's suggestion didn't induce CMOV instructions, but it did
run pretty quick, and only did stores if it needed to.
I was able to induce CMOVs as follows:
f0 = array(ndx)
if (iand(f0 , '7f800000'x) .eq. 0) f0 = 0
array(ndx) = f0
... which always did the store (as does the version with the floating add)
Both of these integer versions are about the same speed.
The two calls to for_set_fpe() has slowed down the floating code when there are
no exceptions, but sped it up when there are.
Looks like I'll use the integer code.
One (hopefully) final question, which will determine which variant of the
integer code to use:
Since the Alpha chips are read-allocate, is there any penalty for doing the
stores even when they aren't needed? Doesn't seem to affect my toy program,
but what will the effect be (if any) on a real application? If no impact on
ev4 and ev5, how about ev6? Learning all the time...
-- Chuck Newman
|
747.21 | Doing stores doubles your mem bandwidth requirements | WIDTH::MDAVIS | Mark Davis - compiler maniac | Thu Feb 06 1997 10:57 | 5 |
| and since you're doing almost nothing with each element, for large
arrays you will be running at mem bandwidth speed - doubling the
requirement will double your time. Try running your toy program
on an 8meg array.
|
747.22 | A better version when there are lots of zeros | HYDRA::NEWMAN | Chuck Newman, 508/467-5499 (DTN 297), MRO1-3/F26 | Wed Apr 02 1997 16:43 | 87 |
| My vendor is expecting mostly zeros, so Stan's routine would still run
at memory speed since it does the store for denorms and zeros.
The following is better when there are mostly zeros for two reasons --
it avoids the store and uses fewer instructions. It also has a
single precision version and a double precision version.
F.W.I.W., The "mask" variable gave me troubles (I think I used V4.1)
1) If I didn't use it but had the constant in-line, the compiler generated
it via an LDAH for each datum.
2) If I used the mask variable and initialized it with an implicit data
statement, the compiler loaded it from memory for each datum.
3) Only if I used the mask variable and set it with an assignment statement
could I get the compiler to keep it in a register.
options /extend_source
subroutine fixup_4(array, length)
implicit none
integer length
integer array(length)
integer UNROLL
parameter (UNROLL=16)
integer ndx
integer f0, f1, f2, f3, f4, f5, f6, f7, f8
integer f9, f10, f11, f12, f13, f14, f15, f16
integer mask_4
mask_4 = '7f800000'x
do ndx = 1, length-(UNROLL-1), UNROLL
if ((iand(array(ndx + 0), mask_4) .eq. 0) .and. (array(ndx + 0) .ne. 0)) array(ndx + 0) = 0
if ((iand(array(ndx + 1), mask_4) .eq. 0) .and. (array(ndx + 1) .ne. 0)) array(ndx + 1) = 0
if ((iand(array(ndx + 2), mask_4) .eq. 0) .and. (array(ndx + 2) .ne. 0)) array(ndx + 2) = 0
if ((iand(array(ndx + 3), mask_4) .eq. 0) .and. (array(ndx + 3) .ne. 0)) array(ndx + 3) = 0
if ((iand(array(ndx + 4), mask_4) .eq. 0) .and. (array(ndx + 4) .ne. 0)) array(ndx + 4) = 0
if ((iand(array(ndx + 5), mask_4) .eq. 0) .and. (array(ndx + 5) .ne. 0)) array(ndx + 5) = 0
if ((iand(array(ndx + 6), mask_4) .eq. 0) .and. (array(ndx + 6) .ne. 0)) array(ndx + 6) = 0
if ((iand(array(ndx + 7), mask_4) .eq. 0) .and. (array(ndx + 7) .ne. 0)) array(ndx + 7) = 0
if ((iand(array(ndx + 8), mask_4) .eq. 0) .and. (array(ndx + 8) .ne. 0)) array(ndx + 8) = 0
if ((iand(array(ndx + 9), mask_4) .eq. 0) .and. (array(ndx + 9) .ne. 0)) array(ndx + 9) = 0
if ((iand(array(ndx + 10), mask_4) .eq. 0) .and. (array(ndx + 10) .ne. 0)) array(ndx + 10) = 0
if ((iand(array(ndx + 11), mask_4) .eq. 0) .and. (array(ndx + 11) .ne. 0)) array(ndx + 11) = 0
if ((iand(array(ndx + 12), mask_4) .eq. 0) .and. (array(ndx + 12) .ne. 0)) array(ndx + 12) = 0
if ((iand(array(ndx + 13), mask_4) .eq. 0) .and. (array(ndx + 13) .ne. 0)) array(ndx + 13) = 0
if ((iand(array(ndx + 14), mask_4) .eq. 0) .and. (array(ndx + 14) .ne. 0)) array(ndx + 14) = 0
if ((iand(array(ndx + 15), mask_4) .eq. 0) .and. (array(ndx + 15) .ne. 0)) array(ndx + 15) = 0
end do
do ndx = ndx, length
if ((iand(array(ndx), mask_4) .eq. 0) .and. (array(ndx) .ne. 0)) array(ndx) = 0
end do
return
end
c
options /extend_source
subroutine fixup_8(array, length)
implicit none
integer length
integer*8 array(length)
integer UNROLL
parameter (UNROLL=16)
integer ndx
integer*8 f0, f1, f2, f3, f4, f5, f6, f7, f8
integer*8 f9, f10, f11, f12, f13, f14, f15, f16
integer*8 mask_8
mask_8 = '7ff0000000000000'x
do ndx = 1, length-(UNROLL-1), UNROLL
if ((iand(array(ndx + 0), mask_8) .eq. 0) .and. (array(ndx + 0) .ne. 0)) array(ndx + 0) = 0
if ((iand(array(ndx + 1), mask_8) .eq. 0) .and. (array(ndx + 1) .ne. 0)) array(ndx + 1) = 0
if ((iand(array(ndx + 2), mask_8) .eq. 0) .and. (array(ndx + 2) .ne. 0)) array(ndx + 2) = 0
if ((iand(array(ndx + 3), mask_8) .eq. 0) .and. (array(ndx + 3) .ne. 0)) array(ndx + 3) = 0
if ((iand(array(ndx + 4), mask_8) .eq. 0) .and. (array(ndx + 4) .ne. 0)) array(ndx + 4) = 0
if ((iand(array(ndx + 5), mask_8) .eq. 0) .and. (array(ndx + 5) .ne. 0)) array(ndx + 5) = 0
if ((iand(array(ndx + 6), mask_8) .eq. 0) .and. (array(ndx + 6) .ne. 0)) array(ndx + 6) = 0
if ((iand(array(ndx + 7), mask_8) .eq. 0) .and. (array(ndx + 7) .ne. 0)) array(ndx + 7) = 0
if ((iand(array(ndx + 8), mask_8) .eq. 0) .and. (array(ndx + 8) .ne. 0)) array(ndx + 8) = 0
if ((iand(array(ndx + 9), mask_8) .eq. 0) .and. (array(ndx + 9) .ne. 0)) array(ndx + 9) = 0
if ((iand(array(ndx + 10), mask_8) .eq. 0) .and. (array(ndx + 10) .ne. 0)) array(ndx + 10) = 0
if ((iand(array(ndx + 11), mask_8) .eq. 0) .and. (array(ndx + 11) .ne. 0)) array(ndx + 11) = 0
if ((iand(array(ndx + 12), mask_8) .eq. 0) .and. (array(ndx + 12) .ne. 0)) array(ndx + 12) = 0
if ((iand(array(ndx + 13), mask_8) .eq. 0) .and. (array(ndx + 13) .ne. 0)) array(ndx + 13) = 0
if ((iand(array(ndx + 14), mask_8) .eq. 0) .and. (array(ndx + 14) .ne. 0)) array(ndx + 14) = 0
if ((iand(array(ndx + 15), mask_8) .eq. 0) .and. (array(ndx + 15) .ne. 0)) array(ndx + 15) = 0
end do
do ndx = ndx, length
if ((iand(array(ndx), mask_8) .eq. 0) .and. (array(ndx) .ne. 0)) array(ndx) = 0
end do
return
end
|