
Conference turris::fortran

Title:	Digital Fortran
Notice:	Read notes 1.* for important information
Moderator:	QUARK::LIONEL

Created:	Thu Jun 01 1995
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	1333
Total number of notes:	6734

1178.0. "indirections" by RTOMS::PARETIJ () Mon Feb 17 1997 09:39

f77 on Unix 4.0b
-----------------

I'd be interested to know the best way to handle
frequent indirect memory access in time-critical routines,
as in the following fragment.

Thanks,
Joseph
 
          DO 80 ZCELL=MINXCELL(3),MAXXCELL(3)
            DO 90 YCELL=MINXCELL(2),MAXXCELL(2)
               DO 100 XCELL=MINXCELL(1),MAXXCELL(1)

                  J = GRID(XCELL,YCELL,ZCELL)
 1                IF (J.GT.I) THEN
C--- Coordinate distances of particle pair:
                     DX(1) = X(1,I) - X(1,J)
                     DR    = DX(1)*DX(1)
                     DO 110 D=2,DIM
                        DX(D) = X(D,I) - X(D,J)
                        DR    = DR + DX(D)*DX(D)
 110                 CONTINUE
 ...

T.R	Title	User	Personal Name	Date	Lines
1178.1		WIDTH::MDAVIS	Mark Davis - compiler maniac	Wed Feb 19 1997 17:00	40

A lot depends on what's in the "..."

1. Usually there's a divide by the sqrt of DR.  You want to use
-fast to permit this to become a reverse_sqrt (a reciprocal square root).

2. Usually X(*,I) and X(*,J) get modified to reflect the force
each exerts on the other, depending on the distance between them.
These look messy to the compiler because it doesn't know that
I .ne. J.  You might copy X(*,I) into a temp array XI(*) before
the loop, reference it instead of X(*,I) inside the loop, and copy
it back into X(*,I) after the loop.


3. I would expect DIM to be only 3: you could unroll the DO 110 loop.
If you don't unroll it, the compiler will guess that it can unroll it
 at least 4 times, and maybe software pipeline it - but this will be
a waste if DIM is only 3...

4. How much memory gets swept over by X(*,J) for these 3 nested loops?
If it is < 90k, then everything will probably hit in at least
the scache.  If the range is larger, say several megabytes, then each
X(1,J) reference will be a long pause while the value is fetched from
memory.  You could try doing a prefetch of X(1,next_j).

	I'll assume that in the inner loop, once you get a J .gt. I,
you'll probably get some more (but I don't rely on it).

                  J = GRID(XCELL,YCELL,ZCELL)
 1                IF (J.GT.I) THEN
c Safe value to use for prefetch: I
                     next_j = i
                     if (xcell .lt. MAXXCELL(1)) then
                        next_j = GRID(XCELL+1,YCELL,ZCELL)
                     endif
                     if (next_j .gt. i) then
                        prefetch (x(1,next_j))
                     endif
                     ...

It would be nicer not to have to check "if (xcell .lt. MAXXCELL(1))"
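
For what it's worth, here is a minimal sketch of points 2 and 3 together,
assuming DIM really is fixed at 3 and that the candidate J values have
already been gathered into a list.  The routine name PAIRS, the list JLIST
and the -1.0 marker are made up for illustration; none of them come from
the original code, and the force update hidden in the "..." is left out.

```fortran
      subroutine pairs(x, n, i, jlist, nj, dr)
! Squared distance from particle I to every J in JLIST with J > I.
! Entries with J <= I are set to -1.0 (a marker for this sketch only).
      integer n, i, nj, k, j
      integer jlist(nj)
      real x(3,n), dr(nj)
      real xi(3), dx1, dx2, dx3
! Point 2: take a local copy of X(*,I) once, before the loop.
      xi(1) = x(1,i)
      xi(2) = x(2,i)
      xi(3) = x(3,i)
      do k = 1, nj
         j = jlist(k)
         dr(k) = -1.0
         if (j .gt. i) then
! Point 3: the DO 110 loop unrolled by hand, assuming DIM = 3.
            dx1 = xi(1) - x(1,j)
            dx2 = xi(2) - x(2,j)
            dx3 = xi(3) - x(3,j)
            dr(k) = dx1*dx1 + dx2*dx2 + dx3*dx3
         end if
      end do
! If the elided force update changed the local copy, this is
! where it goes back into X(*,I).
      x(1,i) = xi(1)
      x(2,i) = xi(2)
      x(3,i) = xi(3)
      end
```

Inside the loop the only references to X are through index J, so the
compiler no longer has to worry about whether a store to X(*,I) could
change a value it has just loaded from X(*,J).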