[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference turris::fortran

Title:	Digital Fortran
Notice:	Read notes 1.* for important information
Moderator:	QUARK::LIONEL

Created:	Thu Jun 01 1995
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	1333
Total number of notes:	6734

747.0. "Validating REAL data on input (e.g., checking for denormalized numbers)" by HDLITE::NEWMAN (Chuck Newman, 508/467-5499 (DTN 297), MRO1-3/F28) Wed Apr 24 1996 18:01

T.R	Title	User	Personal Name	Date	Lines
747.1		TLE::EKLUND	Always smiling on the inside!	`Wed Apr 24 1996 18:32`	19
747.2	How about an environment variable?	HDLITE::NEWMAN	Chuck Newman, 508/467-5499 (DTN 297), MRO1-3/F26	`Wed Apr 24 1996 23:34`	19
747.3	Allow user control, if implemented	SUBPAC::FARICELLI		`Thu Apr 25 1996 07:54`	9
747.4	How about passing the info to the rtl	FORMAT::KIMBALL	Keith Kimball 381-0573	`Thu Apr 25 1996 10:13`	6
747.5	Why penalize all IO?	SUBPAC::FARICELLI		`Thu Apr 25 1996 11:11`	11
747.6	EXP does it right, why can't READ	STEVEN::hobbs	Steve Hobbs	`Thu Apr 25 1996 15:50`	32
747.7	Use the RTL flag parameter	ELBANE::GROVE		`Thu Apr 25 1996 19:00`	13
747.8	working	TLE::WHITLOCK	Stan Whitlock	`Fri Apr 26 1996 10:26`	16
747.9	What's the current status with this?	HDLITE::NEWMAN	Chuck Newman, 508/467-5499 (DTN 297), MRO1-3/F26	`Tue Aug 13 1996 14:49`	9
747.10	when minutes seem like hours	TLE::WHITLOCK	Stan Whitlock	`Wed Aug 21 1996 09:05`	10
747.11	Half year tickler	HYDRA::NEWMAN	Chuck Newman, 508/467-5499 (DTN 297), MRO1-3/F26	`Tue Feb 04 1997 11:59`	62
	The vendor is going to have to build their product with -fpe1 so that it doesn't die when reading data which was written by other systems (e.g., 1.0e-45 on SGI). Any ideas as to whether it would be reasonable to write a "fixup" routine? I'm not sure what it should look like for optimal performance. Here's the routine I was thinking of. I compiled it with Fortran V4.1 as follows (I'm really only concerned with denorms): f77 -fpe1 -tune ev5 -unroll 1 subroutine fixup(array, length) integer length real array(length) real ZERO/0.0/ integer ndx integer UNROLL parameter(UNROLL=16) do ndx = 1, length-(UNROLL-1), UNROLL array(ndx+ 0) = array(ndx+ 0) + ZERO array(ndx+ 1) = array(ndx+ 1) + ZERO array(ndx+ 2) = array(ndx+ 2) + ZERO array(ndx+ 3) = array(ndx+ 3) + ZERO array(ndx+ 4) = array(ndx+ 4) + ZERO array(ndx+ 5) = array(ndx+ 5) + ZERO array(ndx+ 6) = array(ndx+ 6) + ZERO array(ndx+ 7) = array(ndx+ 7) + ZERO array(ndx+ 8) = array(ndx+ 8) + ZERO array(ndx+ 9) = array(ndx+ 9) + ZERO array(ndx+ 10) = array(ndx+ 10) + ZERO array(ndx+ 11) = array(ndx+ 11) + ZERO array(ndx+ 12) = array(ndx+ 12) + ZERO array(ndx+ 13) = array(ndx+ 13) + ZERO array(ndx+ 14) = array(ndx+ 14) + ZERO array(ndx+ 15) = array(ndx+ 15) + ZERO array(ndx+ 1) = array(ndx+ 1) + ZERO end do do ndx = ndx, length array(ndx) = array(ndx) + ZERO end do return end Looking at the machine instructions, there are 4 trapb instructions, which I wouldn't think would be optimal (i.e., it will have to wait for a trap that will likely never happen. Compiling -fpe0 (which of course has no trapbs and has adds instead of addssu) schedules the instructions VERY differently. The only other thing I can think of that might hurt performance with the code I have above would be all the stores. Optimally, the stores would only happen if there was a trap, but I would think the overhead of doing the check would be greater than the overhead of doing the store. Would it make sense to do the check? If so, how would it be done? Would it be a function of how frequently there are denorms? -- Chuck Newman
747.12	Treat reals5 as integers in your filter ...	HPCGRP::MANLEY		`Tue Feb 04 1997 14:36`	6
	Since "fixup" is a subroutine, "array" may just as well be declared as an array of integers. Then, for each array element, you may extract the exponent, test it for zero, and clear the element if need be. You will store elements only when they're cleared and you'll never trap.
747.13	FP_CLASS makes it easier	TLE::WHITLOCK	Stan Whitlock	`Tue Feb 04 1997 16:42`	42
	So what you have is IEEE binary data read into an array from some unclean system and you want to stomp the unclean data {denorms, infinities, NaNs} to 0 {because you want to run with -fpe0 and you don't wanna die}. There is the intrinsic FP_CLASS - FP_CLASS(x) returns an integer that indicates the nature of its REAL or DOUBLE PRECISION argument. /usr/include/fordef.f contains the definitions of the return value: PARAMETER FOR_K_FP_SNAN = '00000000'X PARAMETER FOR_K_FP_QNAN = '00000001'X PARAMETER FOR_K_FP_POS_INF = '00000002'X PARAMETER FOR_K_FP_NEG_INF = '00000003'X PARAMETER FOR_K_FP_POS_NORM = '00000004'X PARAMETER FOR_K_FP_NEG_NORM = '00000005'X PARAMETER FOR_K_FP_POS_DENORM = '00000006'X PARAMETER FOR_K_FP_NEG_DENORM = '00000007'X PARAMETER FOR_K_FP_POS_ZERO = '00000008'X PARAMETER FOR_K_FP_NEG_ZERO = '00000009'X You could code subroutine fixup(array, length) integer length real array(length) integer i do i = 1, length which: select case (fp_class (array (i))) case (4,5,8) exit which case (0,1,2,3,6,7,9) array (i) = 0.0 end select which end do return end FP_CLASS can touch a floating point number {it's generic so either REAL4 or 8} and not get an exception. /Stan
747.14	Using float seems faster, esp when few denorms	HYDRA::NEWMAN	Chuck Newman, 508/467-5499 (DTN 297), MRO1-3/F26	`Tue Feb 04 1997 18:17`	26
	Well, my code in .11 (without the extra "array(ndx+1)" statement) seems to be faster by a factor of ~2 - 5, depending on the number of denorms, than .12 (with statements like one of the following): if (((array(ndx) .and. '7f800000'x) .eq. 0) 1 .and. (array(ndx) .ne. 0) ) array(ndx) = 0 if (((array(ndx) .and. '7f800000'x) .eq. 0) ) array(ndx) = 0 This is from stuffing the "bad" value into an integer equivalenced to a real. If the denorm comes as the result of a calculation (e.g., if I add 0.0 to the denorm in the main program and compile it -fpe3), my routine takes much longer�, and has the bonus feature that the denorm doesn't get fixed up! I link with -fpe0, but I suspect there is one handler for the whole image, and it's the one associated with the "highest" -fpe level specified. � Takes much longer when there are lots of denorms. For few denorms .11 is still faster
747.15	2nd try at FP_CLASS	TLE::WHITLOCK	Stan Whitlock	`Wed Feb 05 1997 08:39`	48
	My code in .13 is wrong... SELECT CASE is a Fortran 90 construct that is supported by f77 but I didn't spell it right: o "WHICH:" is a construct name that has to start in the statement field {> column 6} o EXIT exits DO loops - it's an extension I never got implemented that EXIT should take you out of named constructs like CASE. So the correct code is subroutine fixup(array, length) integer length real array(length) integer i do i = 1, length select case (fp_class (array (i))) ! case (4,5,8) ! +/- normal numbers and +0 ! nothing to do case (0,1,2,3,6,7,9) ! NaNs, +/- infinities, +/- denorms, -0 array (i) = 0.0 end select end do return end RE: .14 -fpe does 2 things: o compiles code that does or does not allow trap shadows {which allow exceptions to be fixed up} o when used on the main program, makes a call to the RTL handler to tell it what fpe semantics are requested So if you compile the main program -fpe3 and a subroutine -fpe0 and get an exception inside the subroutine, bad things may happen since the subroutine code is not structured to allow the handler to fix up the exception. In your case, you want the main program compiled -fpe0 and the subroutine at -fpe1. Then inside your subroutine, you have to call for_set_fpe to set the handler to fpe 1 {and then reset before you exit}. See for_set_fpe(3f) for details. But FP_CLASS is a much easier way to do it and everybody can be compiled -fpe0. /Stan
747.16	Use 5IAND not .AND.	HPCGRP::MANLEY		`Wed Feb 05 1997 09:52`	21
	> Well, my code in .11 (without the extra "array(ndx+1)" statement) > seems to be faster by a factor of ~2 - 5, depending on the number of > denorms, than .12 (with statements like one of the following): > > if (((array(ndx) .and. '7f800000'x) .eq. 0) > 1 .and. (array(ndx) .ne. 0) ) array(ndx) = 0 > > > if (((array(ndx) .and. '7f800000'x) .eq. 0) ) array(ndx) = 0 I had something like if( iand( array(i),'7f800000'x ).eq.0 ) array(i)=0 in mind. Frankly, I can't predict what will happen when logical "and"s are used where boolean "and" are needed. Also, if you want to minimize stores, don't forget that IEEE -0.0 does not equal integer 0.
747.17	Floating ADD fastest when no denorms, otherwise integer exponent check	HYDRA::NEWMAN	Chuck Newman, 508/467-5499 (DTN 297), MRO1-3/F26	`Wed Feb 05 1997 09:59`	39
	re: .15: �In your case, you want the main program compiled -fpe0 and the subroutine at �-fpe1. Then inside your subroutine, you have to call for_set_fpe to set the �handler to fpe 1 {and then reset before you exit}. See for_set_fpe(3f) for �details. I compiled with those switches, but I didn't do any calls to for_set_fpe in my subroutine and it works handsomely. When I add the calls to for_set_fpe the routine stops cleaning up the denorms. I've tried passing it FPE_M_TRAP_UND and FPE_M_TRAP_INV.The documentation here is kinda sketchy. It looks like the fastest thing would be to use the Floating case until I get a denorm, then use the integer case. How can I count the traps? Is there a data structure I can hook up to? Here's the timings for now: Floating Integer Case Case A1 8.148438 .0078125 .0054688 Case B1 8.941406 2.476563 4.789063 Case C1 1.535156 3.054688 4.929688 Case A2 .0078125 .0117188 .0585938 Case B2 .3906250 2.539063 4.769531 Case B2 .6250000 3.046875 4.976563 Case A1: Clean 16 reals; the first 8 are denorms. Do this many times Case B1: Clean 1600 reals; the first 8 are denorms. Do this many times Case C1: Clean 16000 reals; the first 8 are denorms. Do this many times Case A2: Clean 16 reals; no denorms. Do this many times Case B2: Clean 1600 reals; no denorms. Do this many times Case C2: Clean 16000 reals; no denorms. Do this many times Floating: Add 0.0 to the number and store it back. Compile -fpe1 Integer: Check the exponent. If zero, store a zero. Case: Check fp_class. If Nan, infinity, denorm, or -0 store a zero.
747.18	who said speed is beautiful?	TLE::WHITLOCK	Stan Whitlock	`Wed Feb 05 1997 16:35`	45
	>>I compiled with those switches, but I didn't do any calls to for_set_fpe >>in my subroutine and it works handsomely. When I add the calls to for_set_fpe >>the routine stops cleaning up the denorms. I've tried passing it >>FPE_M_TRAP_UND and FPE_M_TRAP_INV.The documentation here is kinda sketchy. You want to set FPE_M_ABRUPT_UND - that's what says "stomp underflow to 0.0". But on Alpha, at -fpe0, it's not the handler that does the stomping, it's the hardware; that's why it's fast. At -fpe1, there are trap shadows, the handler is called, and the ABRUPT_UND is done. >>It looks like the fastest thing would be to use the Floating case until I get >>a denorm, then use the integer case. How can I count the traps? Is there a >>data structure I can hook up to? The RTL counts exceptions but those counts aren't available to you. Something doesn't make sense here... when you load a denorm, it's not an Underflow, it's an Invalid. That's why you need -fpe1 in order to touch the data. -fpe1 has to have trap shadows and that slows stuff down. When the "denorm + 0.0" is done, you get an underflow and the handler stomps it to 0.0. So what we want is -fpe0 and a way to touch the data so it won't get an Invalid if the data is a denorm. And we want speed... do i = 1,n if ((a(i).and. exponent-mask) .eq. 0) a(i) = 0 end do won't unroll the loop. Plus the fastest thing to do is get the IF turned into a CMOVE instruction. So the speed wizards suggest do i=1,n,16 ! hand unrolled by 16, for example ! use DFOR$PREFETCH to prefetch elements I+17 thru I+32 ! like CALL DFOR$PREFETCH (A(I+17)) etc K0 = A(I ) .AND. '7f800000'X K1 = A(I+ 1) .AND. '7f800000'X ... K15 = A(I+15) .AND. '7f800000'X IF (K0 .EQ. 0) A(I ) = 0 ! these should become CMOVEs IF (K1 .EQ. 0) A(I+ 1) = 0 ... IF (K15 .EQ. 0) A(I+15) = 0 end do I hope something here has helped. /Stan
747.19	Sure would be nice if I could get to the RTL's exception counts...	HYDRA::NEWMAN	Chuck Newman, 508/467-5499 (DTN 297), MRO1-3/F26	`Thu Feb 06 1997 09:46`	87
	�>>I compiled with those switches, but I didn't do any calls to for_set_fpe �>>in my subroutine and it works handsomely. When I add the calls to for_set_fpe �>>the routine stops cleaning up the denorms. I've tried passing it �>>FPE_M_TRAP_UND and FPE_M_TRAP_INV.The documentation here is kinda sketchy. � �You want to set FPE_M_ABRUPT_UND - that's what says "stomp underflow to 0.0". �But on Alpha, at -fpe0, it's not the handler that does the stomping, it's the �hardware; that's why it's fast. At -fpe1, there are trap shadows, the handler �is called, and the ABRUPT_UND is done. Thanks, using FPE_M_ABRUPT_UND does it. Is there an explaination of what these mean anywhere? I understand about who squashes the denorms. The hardware will only do it if the RESULT of an operation is a denorm. If the INPUT to an operation is a denorm then a handler is needed. My reasoning for doing what I did was as follows: 1) If there aren't any invalid data, this should scream right along at one datum per cycle -- 2 memory accesses and one floating operation. 2) If I unroll by 16, there should only need to be 1 trapb at the end of each iteration of the loop since there are enough unique registers for the trap handler to figure out what's going on. I should even be able to unroll quite a bit more. Seems, however, that the compiler doesn't do scheduling to minimize the number of trapb instructions, so I can only unroll by 6 if I only want one trapb (Well, I can understand 2 trapb's -- 1 before the first store (leaving a few more add's to do, and one before the stores for those last few adds -- but I don't think the compiler should generate any more than that). �>>It looks like the fastest thing would be to use the Floating case until I get �>>a denorm, then use the integer case. How can I count the traps? Is there a �>>data structure I can hook up to? � �The RTL counts exceptions but those counts aren't available to you. Rats!!! That may be my downfall. I could do something with RPCCs at the head and tail of the loop and compare that against some number, but that's getting a little to ugly even for me. Any besides, Digital UNIX occasionally schedules "stuff" that runs in the process's context which will inflate this number from time to time. And there's cache misses to think about... The penalty for those cases where there are denorms is too high to ignore. �Something doesn't make sense here... when you load a denorm, it's not an �Underflow, it's an Invalid. That's why you need -fpe1 in order to touch the �data. -fpe1 has to have trap shadows and that slows stuff down. When the �"denorm + 0.0" is done, you get an underflow and the handler stomps it to 0.0. � �So what we want is -fpe0 and a way to touch the data so it won't get an Invalid �if the data is a denorm. And we want speed... � � do i = 1,n � if ((a(i).and. exponent-mask) .eq. 0) a(i) = 0 � end do � �won't unroll the loop. I would think that you'll pay for the trap shadow a little from the trapb as long as you don't have any denorms (but no more than 4 cycles -- the latency of an adds). Looking at the exponent requires a couple of integer operations and some combination of (conditional move/store or test/conditional branch/ store). If I remember correctly, you can only have two of (load/store/integer operation) per cycle, so these won't multi-issue as well as using a floating add). �Plus the fastest thing to do is get the IF turned into �a CMOVE instruction. So the speed wizards suggest � � do i=1,n,16 ! hand unrolled by 16, for example � ! use DFOR$PREFETCH to prefetch elements I+17 thru I+32 � ! like CALL DFOR$PREFETCH (A(I+17)) etc � K0 = A(I ) .AND. '7f800000'X � K1 = A(I+ 1) .AND. '7f800000'X � ... � K15 = A(I+15) .AND. '7f800000'X � IF (K0 .EQ. 0) A(I ) = 0 ! these should become CMOVEs � IF (K1 .EQ. 0) A(I+ 1) = 0 � ... � IF (K15 .EQ. 0) A(I+15) = 0 � end do � �I hope something here has helped. /Stan I'll have to try this and see what the compiler generates, but it looks like you're trying to generate a conditional STORE, not a conditional MOVE. If this doesn't do it, however, I'm sure I can write it to induce a CMOV -- I've done it before. I'll let you know what that does to the performance.
747.20	Latest integer code is fastest	HYDRA::NEWMAN	Chuck Newman, 508/467-5499 (DTN 297), MRO1-3/F26	`Thu Feb 06 1997 10:42`	29
	As I suspected, Stan's suggestion didn't induce CMOV instructions, but it did run pretty quick, and only did stores if it needed to. I was able to induce CMOVs as follows: f0 = array(ndx) if (iand(f0 , '7f800000'x) .eq. 0) f0 = 0 array(ndx) = f0 ... which always did the store (as does the version with the floating add) Both of these integer versions are about the same speed. The two calls to for_set_fpe() has slowed down the floating code when there are no exceptions, but sped it up when there are. Looks like I'll use the integer code. One (hopefully) final question, which will determine which variant of the integer code to use: Since the Alpha chips are read-allocate, is there any penalty for doing the stores even when they aren't needed? Doesn't seem to affect my toy program, but what will the effect be (if any) on a real application? If no impact on ev4 and ev5, how about ev6? Learning all the time... -- Chuck Newman
747.21	Doing stores doubles your mem bandwidth requirements	WIDTH::MDAVIS	Mark Davis - compiler maniac	`Thu Feb 06 1997 10:57`	5
	and since you're doing almost nothing with each element, for large arrays you will be running at mem bandwidth speed - doubling the requirement will double your time. Try running your toy program on an 8meg array.
747.22	A better version when there are lots of zeros	HYDRA::NEWMAN	Chuck Newman, 508/467-5499 (DTN 297), MRO1-3/F26	`Wed Apr 02 1997 16:43`	87
	My vendor is expecting mostly zeros, so Stan's routine would still run at memory speed since it does the store for denorms and zeros. The following is better when there are mostly zeros for two reasons -- it avoids the store and uses fewer instructions. It also has a single precision version and a double precision version. F.W.I.W., The "mask" variable gave me troubles (I think I used V4.1) 1) If I didn't use it but had the constant in-line, the compiler generated it via an LDAH for each datum. 2) If I used the mask variable and initialized it with an implicit data statement, the compiler loaded it from memory for each datum. 3) Only if I used the mask variable and set it with an assignment statement could I get the compiler to keep it in a register. options /extend_source subroutine fixup_4(array, length) implicit none integer length integer array(length) integer UNROLL parameter (UNROLL=16) integer ndx integer f0, f1, f2, f3, f4, f5, f6, f7, f8 integer f9, f10, f11, f12, f13, f14, f15, f16 integer mask_4 mask_4 = '7f800000'x do ndx = 1, length-(UNROLL-1), UNROLL if ((iand(array(ndx + 0), mask_4) .eq. 0) .and. (array(ndx + 0) .ne. 0)) array(ndx + 0) = 0 if ((iand(array(ndx + 1), mask_4) .eq. 0) .and. (array(ndx + 1) .ne. 0)) array(ndx + 1) = 0 if ((iand(array(ndx + 2), mask_4) .eq. 0) .and. (array(ndx + 2) .ne. 0)) array(ndx + 2) = 0 if ((iand(array(ndx + 3), mask_4) .eq. 0) .and. (array(ndx + 3) .ne. 0)) array(ndx + 3) = 0 if ((iand(array(ndx + 4), mask_4) .eq. 0) .and. (array(ndx + 4) .ne. 0)) array(ndx + 4) = 0 if ((iand(array(ndx + 5), mask_4) .eq. 0) .and. (array(ndx + 5) .ne. 0)) array(ndx + 5) = 0 if ((iand(array(ndx + 6), mask_4) .eq. 0) .and. (array(ndx + 6) .ne. 0)) array(ndx + 6) = 0 if ((iand(array(ndx + 7), mask_4) .eq. 0) .and. (array(ndx + 7) .ne. 0)) array(ndx + 7) = 0 if ((iand(array(ndx + 8), mask_4) .eq. 0) .and. (array(ndx + 8) .ne. 0)) array(ndx + 8) = 0 if ((iand(array(ndx + 9), mask_4) .eq. 0) .and. (array(ndx + 9) .ne. 0)) array(ndx + 9) = 0 if ((iand(array(ndx + 10), mask_4) .eq. 0) .and. (array(ndx + 10) .ne. 0)) array(ndx + 10) = 0 if ((iand(array(ndx + 11), mask_4) .eq. 0) .and. (array(ndx + 11) .ne. 0)) array(ndx + 11) = 0 if ((iand(array(ndx + 12), mask_4) .eq. 0) .and. (array(ndx + 12) .ne. 0)) array(ndx + 12) = 0 if ((iand(array(ndx + 13), mask_4) .eq. 0) .and. (array(ndx + 13) .ne. 0)) array(ndx + 13) = 0 if ((iand(array(ndx + 14), mask_4) .eq. 0) .and. (array(ndx + 14) .ne. 0)) array(ndx + 14) = 0 if ((iand(array(ndx + 15), mask_4) .eq. 0) .and. (array(ndx + 15) .ne. 0)) array(ndx + 15) = 0 end do do ndx = ndx, length if ((iand(array(ndx), mask_4) .eq. 0) .and. (array(ndx) .ne. 0)) array(ndx) = 0 end do return end c options /extend_source subroutine fixup_8(array, length) implicit none integer length integer8 array(length) integer UNROLL parameter (UNROLL=16) integer ndx integer8 f0, f1, f2, f3, f4, f5, f6, f7, f8 integer8 f9, f10, f11, f12, f13, f14, f15, f16 integer8 mask_8 mask_8 = '7ff0000000000000'x do ndx = 1, length-(UNROLL-1), UNROLL if ((iand(array(ndx + 0), mask_8) .eq. 0) .and. (array(ndx + 0) .ne. 0)) array(ndx + 0) = 0 if ((iand(array(ndx + 1), mask_8) .eq. 0) .and. (array(ndx + 1) .ne. 0)) array(ndx + 1) = 0 if ((iand(array(ndx + 2), mask_8) .eq. 0) .and. (array(ndx + 2) .ne. 0)) array(ndx + 2) = 0 if ((iand(array(ndx + 3), mask_8) .eq. 0) .and. (array(ndx + 3) .ne. 0)) array(ndx + 3) = 0 if ((iand(array(ndx + 4), mask_8) .eq. 0) .and. (array(ndx + 4) .ne. 0)) array(ndx + 4) = 0 if ((iand(array(ndx + 5), mask_8) .eq. 0) .and. (array(ndx + 5) .ne. 0)) array(ndx + 5) = 0 if ((iand(array(ndx + 6), mask_8) .eq. 0) .and. (array(ndx + 6) .ne. 0)) array(ndx + 6) = 0 if ((iand(array(ndx + 7), mask_8) .eq. 0) .and. (array(ndx + 7) .ne. 0)) array(ndx + 7) = 0 if ((iand(array(ndx + 8), mask_8) .eq. 0) .and. (array(ndx + 8) .ne. 0)) array(ndx + 8) = 0 if ((iand(array(ndx + 9), mask_8) .eq. 0) .and. (array(ndx + 9) .ne. 0)) array(ndx + 9) = 0 if ((iand(array(ndx + 10), mask_8) .eq. 0) .and. (array(ndx + 10) .ne. 0)) array(ndx + 10) = 0 if ((iand(array(ndx + 11), mask_8) .eq. 0) .and. (array(ndx + 11) .ne. 0)) array(ndx + 11) = 0 if ((iand(array(ndx + 12), mask_8) .eq. 0) .and. (array(ndx + 12) .ne. 0)) array(ndx + 12) = 0 if ((iand(array(ndx + 13), mask_8) .eq. 0) .and. (array(ndx + 13) .ne. 0)) array(ndx + 13) = 0 if ((iand(array(ndx + 14), mask_8) .eq. 0) .and. (array(ndx + 14) .ne. 0)) array(ndx + 14) = 0 if ((iand(array(ndx + 15), mask_8) .eq. 0) .and. (array(ndx + 15) .ne. 0)) array(ndx + 15) = 0 end do do ndx = ndx, length if ((iand(array(ndx), mask_8) .eq. 0) .and. (array(ndx) .ne. 0)) array(ndx) = 0 end do return end