
Conference turris::fortran

Title:	Digital Fortran
Notice:	Read notes 1.* for important information
Moderator:	QUARK::LIONEL

Created:	Thu Jun 01 1995
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	1333
Total number of notes:	6734

1270.0. "-fpe1 and trapb -- what's required?" by HYDRA::NEWMAN (Chuck Newman, 508/467-5499 (DTN 297), MRO1-3/F26) Fri Apr 25 1997 16:08

Summary:  Code compiled -fpe1 fails with floating underflow failure.

Details:
Digital UNIX V?.? and DEC Fortran V3.8-711 (but the machine instructions look
the same with Digital Fortran 77 V4.1-92)

I'm working with a software vendor who compiles C/C++ code with -ieee and
FORTRAN code with -fpe1.  The main routine is C or C++, and they bounce around
between the three languages quite a bit.

In the main routine they have the following calls:

  for_rtl_init_ ( &argc, argv );
  io_status = for_set_fpe_(&0x00010001);

I got 0x00010001 as the output of for_get_fpe from a fortran main program built
-fpe1.

This has cleared up most of their errors, but they still have one routine in one
executable that is giving them problems (floating underflow failure).

I've included the FORTRAN lines and the machine instructions.

The failure points to the subs/su.  f26 is a legit s-float, f25 is 0.0, and f27
is a different legit s-float.

I asked them to call for_get_fpe just before this, and it still returns
0x00010001 (i.e., nothing has changed it).

I notice that the first muls/su does *NOT* have a trapb after it, but all 7
subsequent floating operations do (it was unrolled 4 times).  Looking at re-use
of registers (or lack thereof), I would think that either all the floating
operations should have a subsequent trapb or only the  of them should (or am I
missing something w.r.t. the trap shadows?).


Here's what FORTRAN generates: 

            100    39     DO 142 J=K,N
            101   142     A(I,J)=A(I,J)-AMULT*A(K,J)

             04F0    .142:
5AF8B059     04F4    muls/su AMULT, f24, f25    ; f23, f24, f25
8B550000     04F8    lds        f26, (r21)      ; f26, (r21)
5B59B03B     04FC    subs/su f26, f25, f27      ; f26, f25, f27
63FF0000     0500    trapb                      ;
40890404     0504    addq       r4, r9, r4      ; r4, r9, r4        ; 000100
9B750000     0508    sts        f27, (r21)      ; f27, (r21)        ; 000101
8B840000     050C    lds        f28, (r4)       ; f28, (r4)
5AFCB05D     0510    muls/su AMULT, f28, f29    ; f23, f28, f29
63FF0000     0514    trapb                      ;
42BA0415     0518    addq       r21, r26, r21   ; r21, r26, r21     ; 000100
8BD50000     051C    lds        f30, (r21)      ; f30, (r21)        ; 000101
5BDDB021     0520    subs/su f30, f29, f1       ; f30, f29, f1
63FF0000     0524    trapb                      ;
40890404     0528    addq       r4, r9, r4      ; r4, r9, r4        ; 000100
98350000     052C    sts        f1, (r21)       ; f1, (r21)         ; 000101
89840000     0530    lds        f12, (r4)       ; f12, (r4)
5AECB04E     0534    muls/su AMULT, f12, f14    ; f23, f12, f14
63FF0000     0538    trapb                      ;
42BA0415     053C    addq       r21, r26, r21   ; r21, r26, r21     ; 000100
89F50000     0540    lds        f15, (r21)      ; f15, (r21)        ; 000101
59EEB02D     0544    subs/su f15, f14, f13      ; f15, f14, f13
63FF0000     0548    trapb                      ;
40890404     054C    addq       r4, r9, r4      ; r4, r9, r4        ; 000100
99B50000     0550    sts        f13, (r21)      ; f13, (r21)        ; 000101
8A040000     0554    lds        f16, (r4)       ; f16, (r4)
5AF0B052     0558    muls/su AMULT, f16, f18    ; f23, f16, f18
63FF0000     055C    trapb                      ;
42BA0415     0560    addq       r21, r26, r21   ; r21, r26, r21     ; 000100
8A750000     0564    lds        f19, (r21)      ; f19, (r21)        ; 000101
5A72B034     0568    subs/su f19, f18, f20      ; f19, f18, f20
63FF0000     056C    trapb                      ;
40E09007     0570    addl       J, 4, J         ; r7, 4, r7         ; 000100
40E60DA3     0574    cmple      J, r6, r3       ; r7, r6, r3
9A950000     0578    sts        f20, (r21)      ; f20, (r21)        ; 000101
40890404     057C    addq       r4, r9, r4      ; r4, r9, r4        ; 000100
42BA0415     0580    addq       r21, r26, r21   ; r21, r26, r21
F47FFFDA     0584    bne        r3, .142        ; r3, .142


Also, for -fpe1, why are the operations /SU?  Since -fpe1 specifies that
underflows should be zero and the hardware does that anyway, I don't understand
the need for the trap on them (I'm obviously missing something).

T.R	Title	User	Personal Name	Date	Lines
1270.1	To avoid /S and trapb, you must use -fpe0	WIBBIN::NOYCE	Pulling weeds, pickin' stones	Fri Apr 25 1997 16:32	16
	/su is needed so that overflows can produce infinities, and other bad operations (such as division by zero) can produce NaN's.

	No trapb is needed after the first muls/su because (roughly) none of its inputs are overwritten by later instructions. The first trapb is inserted right before an addq r4,r9,r4 instruction that overwrites its own input -- it would be incorrect to reexecute this instruction when resuming after a trap. All the other trapb's in this example are for a similar case. See the Alpha Architecture Handbook or the Alpha Architecture Reference Manual, section 4.7.6.1 for the rules the compiler follows when deciding where a trapb is needed.

	How common is it for performance-sensitive Alpha code to be compiled with a -fpe mode other than zero? The example posted has many more trapb's than are strictly needed -- is it important for the compiler to generate better code for these cases?
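	The rule of thumb described in this reply can be sketched in C. This is only a rough model of the two conditions mentioned above (a later instruction overwrites its own input, or overwrites an input of an earlier trapping instruction), not the full rule set of section 4.7.6.1. The `Insn` struct, the `R()`/`F()` register numbering, and `first_trapb_needed()` are hypothetical names made up for this illustration; the instruction table is copied from the first four instructions of the listing in .0.

```c
#include <stdio.h>

#define NO_REG (-1)

/* Register numbering for this sketch only: rN -> N, fN -> 32 + N. */
#define R(n) (n)
#define F(n) (32 + (n))

typedef struct {
    const char *text;  /* mnemonic, for display only */
    int dest;          /* register written, or NO_REG */
    int src1, src2;    /* registers read, or NO_REG */
    int can_trap;      /* 1 for /su floating operations */
} Insn;

/* First four instructions of the loop body from the listing in .0 */
static const Insn loop_head[] = {
    { "muls/su f23, f24, f25", F(25), F(23), F(24),  1 },
    { "lds     f26, (r21)",    F(26), R(21), NO_REG, 0 },
    { "subs/su f26, f25, f27", F(27), F(26), F(25),  1 },
    { "addq    r4, r9, r4",    R(4),  R(4),  R(9),   0 },
};

/*
 * Simplified model: once a trapping instruction has been issued, return the
 * index of the first later instruction that either overwrites its own input
 * or overwrites an input of a trapping instruction seen so far. A trapb
 * would have to be placed in front of that instruction.
 * Returns -1 if no such instruction is found.
 */
static int first_trapb_needed(const Insn *code, int n)
{
    int shadow_start = -1;  /* index of first pending trapping instruction */

    for (int i = 0; i < n; i++) {
        if (shadow_start >= 0 && code[i].dest != NO_REG) {
            int d = code[i].dest;

            /* Overwrites its own input: cannot be re-executed after a trap. */
            if (d == code[i].src1 || d == code[i].src2)
                return i;

            /* Overwrites an input of an earlier trapping instruction. */
            for (int j = shadow_start; j < i; j++) {
                if (code[j].can_trap &&
                    (d == code[j].src1 || d == code[j].src2))
                    return i;
            }
        }
        if (code[i].can_trap && shadow_start < 0)
            shadow_start = i;
    }
    return -1;
}
```

	Calling `first_trapb_needed(loop_head, 4)` points at the addq, which is where the compiler placed the first trapb in the listing; with only the first three instructions the model asks for no trapb at all.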
1270.2	More info, please	TLE::EKLUND	Always smiling on the inside!	Fri Apr 25 1997 17:29	17
	It would be useful to see exactly the error message that got generated. If you are trying to use the debugger, please do an "ignore fpe" so that any exception will get handled properly by the RTL.

	Generally speaking, normal floating point code which is compiled -fpe1 may encounter underflows, but they should not cause any "failures" - might slow things down a bit, but that's about all. If there really is a "failure", we need to have a reproducible case (for example, that zero you saw in the register may be something else, like a denormalized number).

	There is nothing that I noticed that was remotely suspicious about the generated code. We are going to need more information in order to help you - best is an executable example, using more current software.

	Cheers!
	Dave Eklund
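	One way to tell a true zero from a denormalized number is to look at the bit pattern rather than the printed value. The following is a minimal sketch, assuming the value is available as an IEEE single (S-floating in its memory format); `classify_sfloat()`, `sfloat_from_bits()`, and the enum are hypothetical helpers written for this illustration and are not part of the Fortran RTL.

```c
#include <stdint.h>
#include <string.h>

typedef enum {
    SF_ZERO,
    SF_DENORMAL,
    SF_NORMAL,
    SF_INF_OR_NAN
} SFloatClass;

/* Build a float from a raw 32-bit pattern without doing any arithmetic. */
static float sfloat_from_bits(uint32_t bits)
{
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Classify an IEEE single by its exponent and fraction fields. */
static SFloatClass classify_sfloat(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);

    uint32_t exponent = (bits >> 23) & 0xFFu;  /* 8-bit biased exponent */
    uint32_t fraction = bits & 0x7FFFFFu;      /* 23-bit fraction */

    if (exponent == 0)
        return fraction == 0 ? SF_ZERO : SF_DENORMAL;
    if (exponent == 0xFFu)
        return SF_INF_OR_NAN;
    return SF_NORMAL;
}
```

	A value that a debugger prints as 0.0 but whose fraction bits are nonzero would come back as SF_DENORMAL here, which is the case worth ruling out before concluding that f25 really held zero.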
1270.3	What does the FORTRAN RTL do with this call?	WIBBIN::NOYCE	Pulling weeds, pickin' stones	Mon Apr 28 1997 08:16	10
	>   io_status = for_set_fpe_(&0x00010001);

	This asks the FORTRAN RTL to trap underflows and replace the result with zero. In order to do this, the FORTRAN RTL's signal handler needs to be established. Does the for_set_fpe() call do that? Or does it depend on the main program being a FORTRAN program compiled with -fpe1?

	Or does this routine "pattern match" for this set of requests, and translate it into a call to ieee_set_fp_control() to set IEEE_MAP_UMZ, so that the operating system maps underflows to zero without signaling an exception?
1270.4	rtl action	RTL::HILLIARD		Tue Apr 29 1997 16:35	18
	>   io_status = for_set_fpe_(&0x00010001);

	This asks the FORTRAN RTL to trap underflows and replace the result with zero. In order to do this, the FORTRAN RTL's signal handler needs to be established. Does the for_set_fpe() call do that? Or does it depend on the main program being a FORTRAN program compiled with -fpe1?

	>> The forrtl signal handler gets established by for_rtl_init_(). Doesn't depend on the -fpe option.

	Or does this routine "pattern match" for this set of requests, and translate it into a call to ieee_set_fp_control() to set IEEE_MAP_UMZ, so that the operating system maps underflows to zero without signaling an exception?

	>> for_set_fpe() maps the bits of the input arg to a similar mask to ieee_set_fp_control(). In this case, the rtl will set both IEEE_MAP_UMZ and IEEE_TRAP_ENABLE_UNF in the call to ieee_set_fp_control().
1270.5		WIBBIN::NOYCE	Pulling weeds, pickin' stones	Tue Apr 29 1997 16:52	5
	>> The forrtl signal handler gets established by for_rtl_init_(). Doesn't depend on the -fpe option.

	Who is responsible for calling for_rtl_init when the main program isn't Fortran, as in .0's case?
1270.6	The user calls for_rtl_init_ as follows	TLE::EKLUND	Always smiling on the inside!	Tue Apr 29 1997 17:26	36
	This is from the release notes (note 22.4):

	- The documentation for the for_rtl_init_ routine does not describe the arguments to for_rtl_init_.

	  When the main program is a C program that calls Fortran subprograms, the main C program should call the for_rtl_init_ routine, which allows Fortran subprograms to properly use the DEC Fortran RTL.

	  The following should be added to the above:

	  The C function prototype for for_rtl_init_ is as follows:

	    void for_rtl_init_ (
	        int   *argc,    /* Access: Read.  # of args to the main program */
	        char  *argv[]   /* Access: Read.  Pointer to the list of args */
	    );

	  Note: Unlike a main C function, the first argument, argc, is passed by address.

	  The following is an example of how to use the for_rtl_init_ routine from a main C function.

	    int main ( int argc, char *argv[] )
	    {
	        extern void for_rtl_init_( int *rtl_argc, char *rtl_argv[] );
	        extern void my_fortran_subroutine( void );

	        for_rtl_init_( &argc, argv );
	        my_fortran_subroutine();
	        return( 1 );
	    }
1270.7	Seems OK (so far)	TLE::EKLUND	Always smiling on the inside!	Tue Apr 29 1997 17:28	6
	But the claim is that the user is already doing this (calling for_rtl_init_) correctly.

	Cheers!
	Dave Eklund