
Conference turris::fortran

Title: Digital Fortran
Notice: Read notes 1.* for important information
Moderator: QUARK::LIONEL
Created: Thu Jun 01 1995
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 1333
Total number of notes: 6734

1301.0. "Performance question with assumed shape arrays" by PEACHS::DALEY ('Course it don't come with @!# wafers!) Fri May 16 1997 17:32

Hello All.
 
Got this from a customer:
-------------------------

  In the DEC F90 compiler, there appears to be a performance problem
connected with the use of assumed-shape rather than explicit-shape arrays as
argument declarations. The performance deterioration becomes significantly
worse at higher levels of optimization. We have not seen a similar
problem in our other F90 compilers.

   Below are the timings for our code run using explicit- or assumed-shape
argument arrays with the compiler options -g and -O[0-5]. The timings are labeled
with explicit or assumed shape and the level of optimization. The asterisks
mark the timing for the section of the computation in which the assumed-shape
argument arrays are used, while the plus symbols mark the total CPU time for
the run. The only difference between the explicit_shape and assumed_shape codes
is the following lines, which appear in one of the subroutines timed under the
Divergence & mass matrices heading in the table below:

<    subroutine div_matrix (mijk, x, y, z, cx, cy, cz, v, ml)
<       real (kind=rkind), intent(in),  dimension(numnp)    :: x, y, z
<       real (kind=rkind), intent(out), dimension(numel)    :: v
<       real (kind=rkind), intent(out), dimension(numnp)    :: ml
---
>    subroutine div_matrix (mijk, x, y, z, cx, cy, cz, v, ml)
>       real (kind=rkind), intent(in),  dimension(:)        :: x, y, z
>       real (kind=rkind), intent(out), dimension(:)        :: v
>       real (kind=rkind), intent(out), dimension(:)        :: ml

Note that the arrays involved are rather large (in this particular case
they were on the order of 75000 points).
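
For readers who want to reproduce the distinction without the full 3000-line
code, here is a minimal sketch of the two declaration styles. Everything in it
is hypothetical: the module and routine names, the loop body, and the
definition of rkind (the customer's excerpt never shows how rkind, numnp, or
numel are defined). An assumed-shape dummy needs an explicit interface (here
supplied by the module) and may be associated with a non-contiguous section,
so the compiler generally has to carry extent and stride information for it.
Whether that accounts for the customer's timings is not established by
anything in this note.

```fortran
module shape_demo
   implicit none
   ! Assumption: the customer's rkind is not shown; double precision is a guess.
   integer, parameter :: rkind = kind(1.0d0)
contains

   ! Explicit shape: the extent n is known on entry and the dummy is a
   ! contiguous block of n elements.
   subroutine scale_explicit (n, x, v)
      integer, intent(in) :: n
      real (kind=rkind), intent(in),  dimension(n) :: x
      real (kind=rkind), intent(out), dimension(n) :: v
      integer :: i
      do i = 1, n
         v(i) = 2.0_rkind * x(i)
      end do
   end subroutine scale_explicit

   ! Assumed shape: the extent comes from the actual argument, which may be
   ! a strided section, so unit stride cannot be taken for granted.
   subroutine scale_assumed (x, v)
      real (kind=rkind), intent(in),  dimension(:) :: x
      real (kind=rkind), intent(out), dimension(:) :: v
      integer :: i
      do i = 1, size(x)
         v(i) = 2.0_rkind * x(i)
      end do
   end subroutine scale_assumed

end module shape_demo

program demo
   use shape_demo
   implicit none
   integer, parameter :: n = 8
   real (kind=rkind) :: x(n), v1(n), v2(n), w(n/2)
   integer :: i

   x = (/ (real(i, rkind), i = 1, n) /)

   call scale_explicit (n, x, v1)
   call scale_assumed (x, v2)
   if (any(v1 /= v2)) stop 1          ! both styles must give the same result

   ! Only the assumed-shape routine sees the stride of this section directly.
   call scale_assumed (x(1:n:2), w)
   print *, w
end program demo
```

Timing the two routines on arrays of the size mentioned above (about 75000
points), at each optimization level, would be one way to check whether a small
case shows the same behavior as the full code.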

   At optimization levels -O1/-O2 and above, the use of assumed-shape rather
than explicit-shape argument array declarations results in an order-of-magnitude
slowdown in the Divergence & mass matrix computation. Also, as a matter of
general interest, note that performance is not improved for our code by
any level above -O2, i.e. by additional global optimization (-O3), automatic
inlining (-O4), or software pipelining (-O5). These timings are not
significantly improved by inlining all subroutines within the same module.

mc_wind.log.g.explicit_shape
        Mesh, initial wind field    [sec.] ........... 1.3762E-01
*       Divergence & mass matrices  [sec.] ........... 2.4746E+01
        Normals, New Ct, BCs & RHS  [sec.] ........... 3.5478E+00
        Form & decompose PEM matrix [sec.] ........... 2.0784E+01
        Solution of lamda           [sec.] ........... 8.4338E+01
        Wind field update & output  [sec.] ........... 4.3188E+00
        Total system-call time      [sec.] ........... 8.0422E-01
+       Total CPU time              [sec.] ........... 1.3904E+02
        Element cycle time  [sec./element] ........... 1.8646E-03
mc_wind.log.g.assumed_shape
        Mesh, initial wind field    [sec.] ........... 1.4738E-01
*       Divergence & mass matrices  [sec.] ........... 5.5468E+01
        Normals, New Ct, BCs & RHS  [sec.] ........... 3.5907E+00
        Form & decompose PEM matrix [sec.] ........... 2.0891E+01
        Solution of lamda           [sec.] ........... 8.4594E+01
        Wind field update & output  [sec.] ........... 4.3325E+00
        Total system-call time      [sec.] ........... 1.1624E+00
+       Total CPU time              [sec.] ........... 1.7019E+02
        Element cycle time  [sec./element] ........... 2.2847E-03

mc_wind.log.O0.explicit_shape
        Mesh, initial wind field    [sec.] ........... 1.4445E-01
*       Divergence & mass matrices  [sec.] ........... 2.4832E+01
        Normals, New Ct, BCs & RHS  [sec.] ........... 3.5487E+00
        Form & decompose PEM matrix [sec.] ........... 2.0754E+01
        Solution of lamda           [sec.] ........... 8.4251E+01
        Wind field update & output  [sec.] ........... 4.3188E+00
        Total system-call time      [sec.] ........... 1.0336E+00
+       Total CPU time              [sec.] ........... 1.3892E+02
        Element cycle time  [sec./element] ........... 1.8660E-03
mc_wind.log.O0.assumed_shape
        Mesh, initial wind field    [sec.] ........... 1.4542E-01
*       Divergence & mass matrices  [sec.] ........... 5.4376E+01
        Normals, New Ct, BCs & RHS  [sec.] ........... 3.6580E+00
        Form & decompose PEM matrix [sec.] ........... 2.0821E+01
        Solution of lamda           [sec.] ........... 8.4388E+01
        Wind field update & output  [sec.] ........... 4.3315E+00
        Total system-call time      [sec.] ........... 9.6234E-01
+       Total CPU time              [sec.] ........... 1.6875E+02
        Element cycle time  [sec./element] ........... 2.2629E-03

mc_wind.log.O1.explicit_shape
        Mesh, initial wind field    [sec.] ........... 8.8816E-02
*       Divergence & mass matrices  [sec.] ........... 3.9840E+00
        Normals, New Ct, BCs & RHS  [sec.] ........... 9.3403E-01
        Form & decompose PEM matrix [sec.] ........... 7.7494E+00
        Solution of lamda           [sec.] ........... 2.7913E+01
        Wind field update & output  [sec.] ........... 6.9589E-01
        Total system-call time      [sec.] ........... 8.9597E-01
+       Total CPU time              [sec.] ........... 4.2281E+01
        Element cycle time  [sec./element] ........... 5.7570E-04
mc_wind.log.O1.assumed_shape
        Mesh, initial wind field    [sec.] ........... 8.4912E-02
*       Divergence & mass matrices  [sec.] ........... 2.7152E+01
        Normals, New Ct, BCs & RHS  [sec.] ........... 8.9597E-01
        Form & decompose PEM matrix [sec.] ........... 7.5913E+00
        Solution of lamda           [sec.] ........... 2.8070E+01
        Wind field update & output  [sec.] ........... 7.0272E-01
        Total system-call time      [sec.] ........... 8.5010E-01
+       Total CPU time              [sec.] ........... 6.5243E+01
        Element cycle time  [sec./element] ........... 8.8124E-04

mc_wind.log.O2.explicit_shape
        Mesh, initial wind field    [sec.] ........... 9.6624E-02
*       Divergence & mass matrices  [sec.] ........... 1.8856E+00
        Normals, New Ct, BCs & RHS  [sec.] ........... 5.6803E-01
        Form & decompose PEM matrix [sec.] ........... 5.3329E+00
        Solution of lamda           [sec.] ........... 1.9365E+01
        Wind field update & output  [sec.] ........... 4.7824E-01
        Total system-call time      [sec.] ........... 8.3350E-01
+       Total CPU time              [sec.] ........... 2.8391E+01
        Element cycle time  [sec./element] ........... 3.8966E-04
mc_wind.log.O2.assumed_shape
        Mesh, initial wind field    [sec.] ........... 9.4672E-02
*       Divergence & mass matrices  [sec.] ........... 2.0513E+01
        Normals, New Ct, BCs & RHS  [sec.] ........... 5.5144E-01
        Form & decompose PEM matrix [sec.] ........... 5.4607E+00
        Solution of lamda           [sec.] ........... 1.9851E+01
        Wind field update & output  [sec.] ........... 5.0850E-01
        Total system-call time      [sec.] ........... 8.3253E-01
+       Total CPU time              [sec.] ........... 4.7662E+01
        Element cycle time  [sec./element] ........... 6.4659E-04

mc_wind.log.O3.explicit_shape
        Mesh, initial wind field    [sec.] ........... 8.5888E-02
*       Divergence & mass matrices  [sec.] ........... 1.8564E+00
        Normals, New Ct, BCs & RHS  [sec.] ........... 4.8605E-01
        Form & decompose PEM matrix [sec.] ........... 5.0449E+00
        Solution of lamda           [sec.] ........... 1.8840E+01
        Wind field update & output  [sec.] ........... 4.7726E-01
        Total system-call time      [sec.] ........... 8.0227E-01
+       Total CPU time              [sec.] ........... 2.7523E+01
        Element cycle time  [sec./element] ........... 3.7767E-04
mc_wind.log.O3.assumed_shape
        Mesh, initial wind field    [sec.] ........... 8.7840E-02
*       Divergence & mass matrices  [sec.] ........... 1.7146E+01
        Normals, New Ct, BCs & RHS  [sec.] ........... 4.7531E-01
        Form & decompose PEM matrix [sec.] ........... 4.9552E+00
        Solution of lamda           [sec.] ........... 1.9459E+01
        Wind field update & output  [sec.] ........... 4.7726E-01
        Total system-call time      [sec.] ........... 8.6474E-01
+       Total CPU time              [sec.] ........... 4.3289E+01
        Element cycle time  [sec./element] ........... 5.8871E-04

mc_wind.log.O4.explicit_shape
        Mesh, initial wind field    [sec.] ........... 8.9792E-02
*       Divergence & mass matrices  [sec.] ........... 1.7968E+00
        Normals, New Ct, BCs & RHS  [sec.] ........... 4.8214E-01
        Form & decompose PEM matrix [sec.] ........... 4.8995E+00
        Solution of lamda           [sec.] ........... 1.9074E+01
        Wind field update & output  [sec.] ........... 4.4994E-01
        Total system-call time      [sec.] ........... 8.3448E-01
+       Total CPU time              [sec.] ........... 2.7501E+01
        Element cycle time  [sec./element] ........... 3.7780E-04
mc_wind.log.O4.assumed_shape
        Mesh, initial wind field    [sec.] ........... 8.8816E-02
*       Divergence & mass matrices  [sec.] ........... 1.7061E+01
        Normals, New Ct, BCs & RHS  [sec.] ........... 4.5482E-01
        Form & decompose PEM matrix [sec.] ........... 4.8595E+00
        Solution of lamda           [sec.] ........... 1.8911E+01
        Wind field update & output  [sec.] ........... 4.9385E-01
        Total system-call time      [sec.] ........... 8.4619E-01
+       Total CPU time              [sec.] ........... 4.2620E+01
        Element cycle time  [sec./element] ........... 5.7955E-04

mc_wind.log.O5.explicit_shape
        Mesh, initial wind field    [sec.] ........... 8.9792E-02
*       Divergence & mass matrices  [sec.] ........... 1.8173E+00
        Normals, New Ct, BCs & RHS  [sec.] ........... 4.5677E-01
        Form & decompose PEM matrix [sec.] ........... 5.3055E+00
        Solution of lamda           [sec.] ........... 1.8415E+01
        Wind field update & output  [sec.] ........... 4.5970E-01
        Total system-call time      [sec.] ........... 8.5205E-01
+       Total CPU time              [sec.] ........... 2.7219E+01
        Element cycle time  [sec./element] ........... 3.7428E-04
mc_wind.log.O5.assumed_shape
        Mesh, initial wind field    [sec.] ........... 8.9792E-02
*       Divergence & mass matrices  [sec.] ........... 1.7240E+01
        Normals, New Ct, BCs & RHS  [sec.] ........... 4.4115E-01
        Form & decompose PEM matrix [sec.] ........... 5.1992E+00
        Solution of lamda           [sec.] ........... 1.8244E+01
        Wind field update & output  [sec.] ........... 4.5774E-01
        Total system-call time      [sec.] ........... 8.7352E-01
+       Total CPU time              [sec.] ........... 4.2274E+01
        Element cycle time  [sec./element] ........... 5.7531E-04
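
To make the starred lines easier to compare, the short program below divides
the assumed-shape "Divergence & mass matrices" time by the explicit-shape time
at each level. The numbers are copied from the logs above; the program name
and variable names are just illustrative.

```fortran
program slowdown
   implicit none
   integer, parameter :: nlev = 7
   ! Option labels, padded to equal length as Fortran 90 array constructors require.
   character (len=3), parameter :: level(nlev) = &
      (/ '-g ', '-O0', '-O1', '-O2', '-O3', '-O4', '-O5' /)
   ! Starred "Divergence & mass matrices" timings [sec.] from the logs above.
   real, parameter :: t_explicit(nlev) = &
      (/ 2.4746E+01, 2.4832E+01, 3.9840E+00, 1.8856E+00, &
         1.8564E+00, 1.7968E+00, 1.8173E+00 /)
   real, parameter :: t_assumed(nlev) = &
      (/ 5.5468E+01, 5.4376E+01, 2.7152E+01, 2.0513E+01, &
         1.7146E+01, 1.7061E+01, 1.7240E+01 /)
   integer :: i

   ! Print the ratio assumed/explicit for each optimization level.
   do i = 1, nlev
      write (*, '(a3, 2x, f6.2)') level(i), t_assumed(i) / t_explicit(i)
   end do
end program slowdown
```

By this measure the assumed-shape version is roughly 2.2 times slower at -g
and -O0, about 6.8 times slower at -O1, and between about 9.2 and 10.9 times
slower at -O2 through -O5. The assumed-shape time does drop in absolute terms
as the level rises, but far less than the explicit-shape time does, which is
why the ratio grows.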


Any comments?  Is this expected?  They've provided me with code, but
it's 3000 lines or so.

They have not provided me with the version of Fortran...yet.

Any help/information is appreciated.

John
