Conference nicctr::kap-users

Title: Kuck Associates Preprocessor Users
Notice: KAP V2.1 (f90,f77,C) SSB-kits - see note 2
Moderator: HPCGRP::DEGREGORY
Created: Fri Nov 22 1991
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 390
Total number of notes: 1440

366.0. "Replacement for C$DIR?" by 48641::BOIRIN () Mon Feb 10 1997 08:15

    Hello, happy HPC people!
    
    I'm not a Fortran guru and I have a benchmark to run. I am having
    trouble with the following code. Which KAP directives should be used?
    
    	     .
    	     .		
             kavz(i44)=1
    C$DIR BEGIN_TASKS
            call loop2000(eo1,vcoul1,str1,i1,i2)
    C$DIR NEXT_TASK
            call loop2000(eo2,vcoul2,str2,i22,i3)
    C$DIR NEXT_TASK
            call loop2000(eo3,vcoul3,str3,i33,i4)
    C$DIR NEXT_TASK
            call loop2000(eo4,vcoul4,str4,i44,i5)
    C$DIR END_TASKS
              
    Thank you for your help. Jean-Pierre.
    
366.1. "a thought" by HPCGRP::DEGREGORY (Karen 223-5801) Mon Feb 10 1997 13:56

Jean-Pierre -

I believe what those Cray directives are telling you is that you
should run those 4 calls in parallel.

The KAP products support parallelism at the loop level.  So my guess
is that you would end up parallelizing loop2000 itself: the first call
would run on all the processors, then the second call would run on all
the processors, and so on.  In other words, KAP would do the parallelism
one level down.

If you have the module that contains loop2000, run the KAP automatic
parallelizer over it and see whether it parallelizes the loop automatically.
If it does, you will see a call to mppfrk, and the entire loop becomes a
subroutine named PKprogramname_loopnumber (programname is the name of this
module, and loopnumber is incremented for every loop KAP parallelizes).

Karen

366.2. by 48641::BOIRIN () Thu Feb 13 1997 10:40

    Hi Karen
    
    Thank you for your help. I have removed all the C$DIR directives,
    linked the four calls together and used KAP to automatically
    parallelize the subroutine's loops.
    
    The main issue is that this code was written with tasking in mind, and
    now I have a lot of data dependencies within the loops. So there is no
    speed-up at all!
    
    Do you think there is an easy way to simulate tasking with KAP?
    
    Thanks again. Best regards. Jean-Pierre.

366.3. "Try doing what the source code said." by GEMGRP::PIEPER () Thu Feb 13 1997 11:28

Jean-Pierre,

	You need to change the code like this:

	.
	.
	kavz(i44)=1

	new_eo(1) = eo1
	new_eo(2) = eo2
	new_eo(3) = eo3
	new_eo(4) = eo4

	(similar assignments for vcoul[1-4] and str[1-4])

	new_arg4(1) = i1
	new_arg4(2) = i22
	(etc., and similar for the 5th argument)

c$dir parallel do
      do new_i = 1, 4
         call loop2000(new_eo(new_i), new_vcoul(new_i),
     &                 new_str(new_i), new_arg4(new_i),
     &                 new_arg5(new_i))
      end do

This lets KAP run the four calls in parallel, just as the original
directives did. You may have to spell "c$dir parallel do" some other way for
KAP; that is the PCF spelling. Karen can advise better about the details.
(Maybe you need a concurrent_call directive too?)

You are limited to 4-way parallelism, but the code as written had that
restriction, too.

366.4. "Is this what you want?" by HPCGRP::MANLEY () Thu Feb 13 1997 13:41

Re: .0, .2

Change this:

             kavz(i44)=1
    C$DIR BEGIN_TASKS
            call loop2000(eo1,vcoul1,str1,i1,i2)
    C$DIR NEXT_TASK
            call loop2000(eo2,vcoul2,str2,i22,i3)
    C$DIR NEXT_TASK
            call loop2000(eo3,vcoul3,str3,i33,i4)
    C$DIR NEXT_TASK
            call loop2000(eo4,vcoul4,str4,i44,i5)
    C$DIR END_TASKS

to something like this:

    C*$*    ASSERT CONCURRENT CALL
    C*$*    ASSERT DO( CONCURRENT )
            DO I=1,4
                IF(     I.EQ.1 )THEN
                    call loop2000(eo1,vcoul1,str1,i1,i2)
                ELSEIF( I.EQ.2 )THEN
                    call loop2000(eo2,vcoul2,str2,i22,i3)
                ELSEIF( I.EQ.3 )THEN
                    call loop2000(eo3,vcoul3,str3,i33,i4)
                ELSE
                    call loop2000(eo4,vcoul4,str4,i44,i5)
                ENDIF
            ENDDO
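
For anyone who wants to try the pattern outside the benchmark, here is a
minimal, self-contained sketch of the same DO/IF dispatch. It is only an
illustration: WORK, A1 through A4, and N are hypothetical stand-ins for
loop2000 and its arguments, which are not shown in this thread. The C*$*
lines are ordinary comments to a compiler that does not recognize them, so
the program also runs serially.

```fortran
C     Sketch of the DO/IF dispatch pattern with stand-in names.
C     WORK, A1..A4 and N are hypothetical; they replace loop2000
C     and its real arguments.
      PROGRAM TASKS
      INTEGER N
      PARAMETER (N = 1000)
      REAL A1(N), A2(N), A3(N), A4(N)
      INTEGER I
C     The two assertions below are copied from the reply above.
C     A compiler that does not know them treats them as comments.
C*$*    ASSERT CONCURRENT CALL
C*$*    ASSERT DO( CONCURRENT )
      DO I = 1, 4
         IF (I .EQ. 1) THEN
            CALL WORK(A1, N, 1.0)
         ELSEIF (I .EQ. 2) THEN
            CALL WORK(A2, N, 2.0)
         ELSEIF (I .EQ. 3) THEN
            CALL WORK(A3, N, 3.0)
         ELSE
            CALL WORK(A4, N, 4.0)
         ENDIF
      ENDDO
C     Each array was filled by exactly one iteration of the loop.
      PRINT *, A1(1), A2(1), A3(1), A4(1)
      END

C     Stand-in for one task: fills its own array and nothing else.
      SUBROUTINE WORK(A, N, V)
      INTEGER N, J
      REAL A(N), V
      DO J = 1, N
         A(J) = V
      ENDDO
      RETURN
      END
```

The assertions are safe in this sketch only because each call writes to its
own array and shares no other state. The same condition would have to hold
for the four loop2000 calls, for example no COMMON data written by more than
one of them.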
   
366.5. "That's it" by GEMGRP::PIEPER () Thu Feb 13 1997 18:54

That is very much in the spirit of the original directives.
And a lot less typing. Kudos, Mr. Manley!

366.6. by 48641::BOIRIN () Fri Feb 14 1997 10:30

    Thank you, everybody. I ran my tests this afternoon and achieved
    a speedup of 3.5 on 4 processors.
    
    The customer is very impressed by our numbers.
     
    We are competing for a 32-processor cluster against SGI, Convex and Sun.
    
    Thank you again. I have learned a few things (but not enough) about HPC
    with this bid. We'll win!
    
    JP.