
Conference turris::fortran

Title: Digital Fortran
Notice: Read notes 1.* for important information
Moderator: QUARK::LIONEL
Created: Thu Jun 01 1995
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 1333
Total number of notes: 6734

1178.0. "indirections" by RTOMS::PARETIJ () Mon Feb 17 1997 09:39

f77 on Unix 4.0b
-----------------

I'd be interested to know the best way to handle frequent indirect
memory accesses in time-critical routines, as in the following
fragment.

Thanks,
Joseph
 
          DO 80 ZCELL=MINXCELL(3),MAXXCELL(3)
            DO 90 YCELL=MINXCELL(2),MAXXCELL(2)
               DO 100 XCELL=MINXCELL(1),MAXXCELL(1)

                  J = GRID(XCELL,YCELL,ZCELL)
 1                IF (J.GT.I) THEN
C--- Coordinate distances of particle pair:
                     DX(1) = X(1,I) - X(1,J)
                     DR    = DX(1)*DX(1)
                     DO 110 D=2,DIM
                        DX(D) = X(D,I) - X(D,J)
                        DR    = DR + DX(D)*DX(D)
 110                 CONTINUE
 ...
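For anyone who wants to time the fragment on its own, here is a minimal self-contained sketch in free-form Fortran 90. It rests on assumptions that are not in the post: GRID holds a single particle index per cell (0 for an empty cell), X is dimensioned X(DIM,N), and the elided body is replaced by a hypothetical test that counts the partners J whose squared distance DR is below a cutoff RC2. The names PAIR_SCAN, COUNT_CLOSE_PAIRS and RC2 are illustrative only.

```fortran
! Hypothetical harness around the question's fragment (free-form Fortran 90).
module pair_scan
  implicit none
contains
  ! Counts particles J > I reached through the cell grid whose squared
  ! distance DR to particle I is below the (hypothetical) cutoff RC2.
  integer function count_close_pairs(i, x, grid, minxcell, maxxcell, rc2) result(nclose)
    integer, intent(in) :: i
    real,    intent(in) :: x(:,:)          ! x(dim, npart)
    integer, intent(in) :: grid(:,:,:)     ! particle index per cell, 0 = empty
    integer, intent(in) :: minxcell(3), maxxcell(3)
    real,    intent(in) :: rc2
    integer :: xcell, ycell, zcell, j, d
    real    :: dx(size(x,1)), dr

    nclose = 0
    do zcell = minxcell(3), maxxcell(3)
      do ycell = minxcell(2), maxxcell(2)
        do xcell = minxcell(1), maxxcell(1)
          j = grid(xcell, ycell, zcell)     ! indirection: cell -> particle
          if (j > i) then                   ! each pair once; empty cells (0) skipped
            ! Coordinate distances of the particle pair (the DO 110 loop)
            dx(1) = x(1,i) - x(1,j)
            dr    = dx(1)*dx(1)
            do d = 2, size(x,1)
              dx(d) = x(d,i) - x(d,j)
              dr    = dr + dx(d)*dx(d)
            end do
            if (dr < rc2) nclose = nclose + 1
          end if
        end do
      end do
    end do
  end function count_close_pairs
end module pair_scan
```

Because the loop reaches X(*,J) only through GRID, consecutive iterations may touch widely separated columns of X, which is where the indirection costs time.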
1178.1. by WIDTH::MDAVIS (Mark Davis - compiler maniac) Wed Feb 19 1997 17:00, 40 lines
A lot depends on what's in the "..."

1. Usually there's a divide by the square root of DR. You want to
compile with -fast to permit this to become a reciprocal square root
(reverse_sqrt).

2. Usually X(*,I) and X(*,J) get modified to reflect the force each
exerts on the other, depending on the distance between them. These
updates look messy to the compiler because it doesn't know that
I .ne. J, so after every store to X(*,J) it must assume X(*,I) may
have changed and reload it. You might copy X(*,I) into a temp array
XI(*) before the loop, reference it instead of X(*,I) inside the
loop, and copy it back into X(*,I) after the loop.


3. I would expect DIM to be only 3, so you could unroll the DO 110
loop by hand. If you don't, the compiler will guess that it can
unroll it at least 4 times and maybe software-pipeline it, which is
wasted effort if DIM is only 3.

4. How much memory gets swept over by X(*,J) in these 3 nested loops?
If it is under 90 KB, everything will probably hit in at least the
secondary cache (scache). If the range is larger, several megabytes,
then each load of X(1,J) will stall for a long time while the value
is fetched from memory. You could try prefetching X(1,next_j).
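Suggestions 1 to 3 can be combined in one routine. This is a minimal sketch in free-form Fortran 90, assuming DIM = 3, that GRID holds one particle index per cell (0 for an empty cell), and a hypothetical update rule in which each pair pushes both particles apart by a distance STEP; the real force law is in the elided part of the fragment. PAIR_UPDATE, PUSH_APART and STEP are illustrative names.

```fortran
! Suggestions 1-3 applied to a hypothetical pair update (DIM = 3).
module pair_update
  implicit none
contains
  ! Hypothetical rule: every pair (i, j > i) found through the grid is
  ! pushed apart by STEP along the line joining the two particles.
  subroutine push_apart(i, x, grid, minxcell, maxxcell, step)
    integer, intent(in)    :: i
    real,    intent(inout) :: x(:,:)          ! x(3, npart); DIM assumed to be 3
    integer, intent(in)    :: grid(:,:,:)     ! particle index per cell, 0 = empty
    integer, intent(in)    :: minxcell(3), maxxcell(3)
    real,    intent(in)    :: step
    integer :: xcell, ycell, zcell, j
    real    :: xi1, xi2, xi3, dx1, dx2, dx3, dr, s

    ! Suggestion 2: copy X(*,I) into locals before the loop, so stores
    ! to x(:,j) can no longer force reloads of particle I.
    xi1 = x(1,i); xi2 = x(2,i); xi3 = x(3,i)

    do zcell = minxcell(3), maxxcell(3)
      do ycell = minxcell(2), maxxcell(2)
        do xcell = minxcell(1), maxxcell(1)
          j = grid(xcell, ycell, zcell)
          if (j > i) then
            ! Suggestion 3: DIM is 3, so the DO 110 loop is unrolled by hand.
            dx1 = xi1 - x(1,j)
            dx2 = xi2 - x(2,j)
            dx3 = xi3 - x(3,j)
            dr  = dx1*dx1 + dx2*dx2 + dx3*dx3
            if (dr > 0.0) then                ! skip coincident particles
              ! Suggestion 1: the divide by sqrt(dr) is the candidate for
              ! a reciprocal square root, which the post says -fast permits.
              s = step / sqrt(dr)
              x(1,j) = x(1,j) - s*dx1
              x(2,j) = x(2,j) - s*dx2
              x(3,j) = x(3,j) - s*dx3
              xi1 = xi1 + s*dx1
              xi2 = xi2 + s*dx2
              xi3 = xi3 + s*dx3
            end if
          end if
        end do
      end do
    end do

    ! Write particle I back once, after the loop.
    x(1,i) = xi1; x(2,i) = xi2; x(3,i) = xi3
  end subroutine push_apart
end module pair_update
```

Keeping particle I in three scalars rather than a temp array XI(*) is a variant of suggestion 2 that fits naturally once the loop is unrolled for DIM = 3.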

I'll assume that in the inner loop, once you get a J .gt. I,
you'll probably get some more (but I don't rely on it).

                  J = GRID(XCELL,YCELL,ZCELL)
 1                IF (J.GT.I) THEN
c Safe value to use for prefetch: I (a valid column of X, already in use)
                     next_j = i
                     if (xcell .lt. MAXXCELL(1)) then
                        next_j = GRID(XCELL+1,YCELL,ZCELL)
                     endif
                     if (next_j .gt. i) then
c PREFETCH is schematic: standard Fortran has no prefetch statement
                        prefetch (x(1,next_j))
                     endif
                     ...
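The look-ahead part of this fragment can be isolated as a small function. This is a sketch in free-form Fortran 90, assuming GRID holds one particle index per cell (0 for an empty cell): it returns the particle in the next cell along X if that particle will actually be processed (index greater than I), and otherwise returns I, whose column of X is a valid address already in use. The prefetch itself is left out because standard Fortran has no prefetch statement; NEXT_PREFETCH is a hypothetical name.

```fortran
! Hypothetical helper isolating the look-ahead (free-form Fortran 90).
module prefetch_target
  implicit none
contains
  ! Returns the particle in the next cell along x when it will actually be
  ! processed (index > i); otherwise returns i, which is always a valid
  ! column of x, so x(1, next_j) is safe to touch.
  integer function next_prefetch(i, xcell, ycell, zcell, maxx, grid) result(next_j)
    integer, intent(in) :: i, xcell, ycell, zcell
    integer, intent(in) :: maxx              ! MAXXCELL(1) in the post
    integer, intent(in) :: grid(:,:,:)       ! particle index per cell, 0 = empty
    integer :: cand

    next_j = i
    if (xcell < maxx) then                   ! stay inside the grid
      cand = grid(xcell+1, ycell, zcell)
      if (cand > i) next_j = cand
    end if
  end function next_prefetch
end module prefetch_target
```

The caller can then skip the prefetch whenever the result equals I.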

It would be nicer not to have to check "if (xcell .lt. MAXXCELL(1))"