Conference turris::fortran

Title:Digital Fortran
Notice:Read notes 1.* for important information
Moderator:QUARK::LIONEL
Created:Thu Jun 01 1995
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:1333
Total number of notes:6734

1270.0. "-fpe1 and trapb -- what's required?" by HYDRA::NEWMAN (Chuck Newman, 508/467-5499 (DTN 297), MRO1-3/F26) Fri Apr 25 1997 17:08

Summary:  Code compiled -fpe1 fails with floating underflow failure.

Details:
Digital UNIX V?.? and DEC Fortran V3.8-711 (but the machine instructions look
the same with Digital Fortran 77 V4.1-92)

I'm working with a software vendor who compiles C/C++ code with -ieee and
FORTRAN code with -fpe1.  The main routine is C or C++, and they bounce around
between the three languages quite a bit.

In the main routine they have the following calls:

  for_rtl_init_ ( &argc, argv );
  io_status = for_set_fpe_(&0x00010001);

I got 0x00010001 as the output of for_get_fpe from a fortran main program built
-fpe1.

This has cleared up most of their errors, but they still have one routine in one
executable that is giving them problems (floating underflow failure).

I've included the FORTRAN lines and the machine instructions.

The failure points to the subs/su.  f26 is a legit s-float, f25 is 0.0, and f27
is a different legit s-float.

I asked them to call for_get_fpe just before this, and it still returns
0x00010001 (i.e., nothing has changed it).

I notice that the first muls/su does *NOT* have a trapb after it, but all 7
subsequent floating operations do (it was unrolled 4 times).  Looking at re-use
of registers (or lack thereof), I would think that either all the floating
operations should have a subsequent trapb or only the  of them should (or am I
missing something w.r.t. the trap shadows?).


Here's what FORTRAN generates: 

            100    39     DO 142 J=K,N
            101   142     A(I,J)=A(I,J)-AMULT*A(K,J)

             04F0    .142:
5AF8B059     04F4    muls/su AMULT, f24, f25    ; f23, f24, f25
8B550000     04F8    lds        f26, (r21)      ; f26, (r21)
5B59B03B     04FC    subs/su f26, f25, f27      ; f26, f25, f27
63FF0000     0500    trapb                      ;
40890404     0504    addq       r4, r9, r4      ; r4, r9, r4        ; 000100
9B750000     0508    sts        f27, (r21)      ; f27, (r21)        ; 000101
8B840000     050C    lds        f28, (r4)       ; f28, (r4)
5AFCB05D     0510    muls/su AMULT, f28, f29    ; f23, f28, f29
63FF0000     0514    trapb                      ;
42BA0415     0518    addq       r21, r26, r21   ; r21, r26, r21     ; 000100
8BD50000     051C    lds        f30, (r21)      ; f30, (r21)        ; 000101
5BDDB021     0520    subs/su f30, f29, f1       ; f30, f29, f1
63FF0000     0524    trapb                      ;
40890404     0528    addq       r4, r9, r4      ; r4, r9, r4        ; 000100
98350000     052C    sts        f1, (r21)       ; f1, (r21)         ; 000101
89840000     0530    lds        f12, (r4)       ; f12, (r4)
5AECB04E     0534    muls/su AMULT, f12, f14    ; f23, f12, f14
63FF0000     0538    trapb                      ;
42BA0415     053C    addq       r21, r26, r21   ; r21, r26, r21     ; 000100
89F50000     0540    lds        f15, (r21)      ; f15, (r21)        ; 000101
59EEB02D     0544    subs/su f15, f14, f13      ; f15, f14, f13
63FF0000     0548    trapb                      ;
40890404     054C    addq       r4, r9, r4      ; r4, r9, r4        ; 000100
99B50000     0550    sts        f13, (r21)      ; f13, (r21)        ; 000101
8A040000     0554    lds        f16, (r4)       ; f16, (r4)
5AF0B052     0558    muls/su AMULT, f16, f18    ; f23, f16, f18
63FF0000     055C    trapb                      ;
42BA0415     0560    addq       r21, r26, r21   ; r21, r26, r21     ; 000100
8A750000     0564    lds        f19, (r21)      ; f19, (r21)        ; 000101
5A72B034     0568    subs/su f19, f18, f20      ; f19, f18, f20
63FF0000     056C    trapb                      ;
40E09007     0570    addl       J, 4, J         ; r7, 4, r7         ; 000100
40E60DA3     0574    cmple      J, r6, r3       ; r7, r6, r3
9A950000     0578    sts        f20, (r21)      ; f20, (r21)        ; 000101
40890404     057C    addq       r4, r9, r4      ; r4, r9, r4        ; 000100
42BA0415     0580    addq       r21, r26, r21   ; r21, r26, r21
F47FFFDA     0584    bne        r3, .142        ; r3, .142


Also, for -fpe1, why are the operations /SU?  Since -fpe1 specifies that
underflows should be zero and the hardware does that anyway, I don't understand
the need for the trap on them (I'm obviously missing something).

T.R  Title  User  Personal Name  Date  Lines
1270.1. "To avoid /S and trapb, you must use -fpe0" by WIBBIN::NOYCE (Pulling weeds, pickin' stones) Fri Apr 25 1997 17:32 (16 lines)
/su is needed so that overflows can produce infinities, and other bad
operations (such as division by zero) can produce NaN's.

No trapb is needed after the first muls/su because (roughly) none of its
inputs are overwritten by later instructions.  The first trapb is inserted
right before an addq r4,r9,r4 instruction that overwrites its own input
-- it would be incorrect to reexecute this instruction when resuming after
a trap.  All the other trapb's in this example are for a similar case.
See the Alpha Architecture Handbook or the Alpha Architecture Reference Manual,
section 4.7.6.1 for the rules the compiler follows when deciding where a
trapb is needed.

How common is it for performance-sensitive Alpha code to be compiled with
a -fpe mode other than zero?  The example posted has many more trapb's than
are strictly needed -- is it important for the compiler to generate better
code for these cases?

1270.2. "More info, please" by TLE::EKLUND (Always smiling on the inside!) Fri Apr 25 1997 18:29 (17 lines)
    	It would be useful to see exactly the error message that got
    generated.  If you are trying to use the debugger, please do a
    "ignore fpe" so that any exception will get handled properly by
    the RTL.  Generally speaking, normal floating point code which
    is compiled -fpe1 may encounter underflows, but they should not
    cause any "failures" - might slow things down a bit, but that's
    about all.  If there really is a "failure", we need to have a
    reproducible case (for example, that zero you saw in the register
    may be something else, like a denormalized number).  There is
    nothing that I noticed that was remotely suspicious about the
    generated code.  We are going to need more information in order
    to help you - best is an executable example, using more current
    software.  
    
    Cheers!
    Dave Eklund
    
1270.3. "What does the FORTRAN RTL do with this call?" by WIBBIN::NOYCE (Pulling weeds, pickin' stones) Mon Apr 28 1997 09:16 (10 lines)
>  io_status = for_set_fpe_(&0x00010001);

This asks the FORTRAN RTL to trap underflows and replace the result with zero.
In order to do this, the FORTRAN RTL's signal handler needs to be established.
Does the for_set_fpe() call do that?  Or does it depend on the main program
being a FORTRAN program compiled with -fpe1?

Or does this routine "pattern match" for this set of requests, and translate
it into a call to ieee_set_fp_control() to set IEEE_MAP_UMZ, so that the
operating system maps underflows to zero without signaling an exception?

1270.4. "rtl action" by RTL::HILLIARD Tue Apr 29 1997 17:35 (18 lines)
>  io_status = for_set_fpe_(&0x00010001);

> This asks the FORTRAN RTL to trap underflows and replace the result with zero.
> In order to do this, the FORTRAN RTL's signal handler needs to be established.
> Does the for_set_fpe() call do that?  Or does it depend on the main program
> being a FORTRAN program compiled with -fpe1?

>> The forrtl signal handler gets established by for_rtl_init_().
   Doesn't depend on the -fpe option.

> Or does this routine "pattern match" for this set of requests, and translate
> it into a call to ieee_set_fp_control() to set IEEE_MAP_UMZ, so that the
> operating system maps underflows to zero without signaling an exception?

>> for_set_fpe() maps the bits of the input arg to a similar mask to
   ieee_set_fp_control().  In this case, the rtl will set both IEEE_MAP_UMZ
   and IEEE_TRAP_ENABLE_UNF in the call to ieee_set_fp_control().

1270.5. by WIBBIN::NOYCE (Pulling weeds, pickin' stones) Tue Apr 29 1997 17:52 (5 lines)
>> The forrtl signal handler gets established by for_rtl_init_().
   Doesn't depend on the -fpe option.

Who is responsible for calling for_rtl_init when the main program isn't
Fortran, as in .0's case?

1270.6. "The user calls for_rtl_init_ as follows" by TLE::EKLUND (Always smiling on the inside!) Tue Apr 29 1997 18:26 (36 lines)
	This is from the release notes (note 22.4):

	-   The documentation for the for_rtl_init_ routine does not describe
	    the arguments to for_rtl_init_.

	      When the main program is a C program that calls Fortran
	      subprograms, the main C program should call the for_rtl_init_
	      routine, which allows Fortran subprograms to properly use the DEC
	      Fortran RTL.

	    The following should be added to the above:

	      The C function prototype for for_rtl_init_ is as follows:

	      void for_rtl_init_ (
	         int *argc,   /* Access: Read.  # of args to the main program */
	         char *argv[] /* Access: Read.  Pointer to the list of args */
	         );

	      Note:  Unlike a main C function, the first argument, argc, is
	             passed by address.

	      The following is an example of how to use the for_rtl_init_
	      routine from a main C function.

	      int main ( int argc, char *argv[] )
	        {
	        extern void for_rtl_init_( int *rtl_argc, char *rtl_argv[] );
	        extern void my_fortran_subroutine( void );

	        for_rtl_init_( &argc, argv );
	        my_fortran_subroutine();
	        return( 1 );
	        }

    
1270.7. "Seems OK (so far)" by TLE::EKLUND (Always smiling on the inside!) Tue Apr 29 1997 18:28 (6 lines)
    	But the claim is that the user is already doing this (calling
    for_rtl_init_) correctly.
    
    Cheers!
    Dave Eklund