
Conference nicctr::dxml

Title:Digital Extended Math Library
Notice:Kit locations: 9.last (UNIX), 10.last (VMS)
Moderator:RTL::CHAOFGREN
Created:Mon Apr 30 1990
Last Modified:Tue Jun 03 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:324
Total number of notes:1402

320.0. "FFT DXML Benchmark" by TAV02::KATZAV () Tue Apr 08 1997 05:44

     
    A customer here in Israel is running a DXML FFT benchmark on a 2100
    with 2 CPUs. The result on 1 CPU is 1.9 sec., while on two it is 3.1 !!
    
    Could someone please look at the short piece of code below and figure
    out why we get these results ??
    
    Many Thanks,
    Shimon.
    
    
    
    
    
    ==============================================================
    			Library : DXML
    
    SIZE: 8K array * 1000
    ==============================================================

    Code:   try.c
    
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <sys/stat.h>
    #include <time.h>
    #include <dxmldef.h>
    #define SIZE1 8192
    
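    int main (void)
    {
      struct dxml_s_fft_structue ST ;
      float in[SIZE1], out[SIZE1];
      int  i,a,status=1,  stride=1,  sz=SIZE1;
      clock_t t_start, t_dxml ;
    
      /* Fill the input with a simple ramp. */
      for (a=0;a<SIZE1;a++)
        in[a]=a;
    
      sfft_init_(&sz,&ST,&stride);
    
      /* clock() measures processor time used by the process, not
         wall-clock time. Take an explicit start value so that the
         result does not depend on how clock() picks its origin. */
      t_start=clock();
      for (i=0;i<1000;i++)
        sfft_apply ("r", "c", in, out, &ST, &stride);
      sfft_exit_(&ST);
      t_dxml=clock()-t_start ;
    
      printf("Time DXML : %f sec\n", (double)t_dxml/CLOCKS_PER_SEC) ;
      return 0;
    }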



    
    Compiling :
    cc -migrate try.c -o try -ldxmlp
    
    Results:
    cpu 1 : 1.9 sec
    cpu 2 : 3.1 sec
    KMP_STACKSIZE 262144
    
    Hardware:
    AS2100 DUNIX V4.0 (464)

320.1. "" by RTL::HANEK () Thu Apr 17 1997 17:20

The problem is that an 8K FFT is not big enough to make parallel processing
profitable.

The sample code performs 1000 separate 8K FFTs.  To make parallel processing
attractive for this application, you should consider doing all 1000 FFTs at
once, i.e. use some form of the grp_fft routine.
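
To put the figures from .0 in perspective, here is a rough sketch of the arithmetic in plain C. It does not call DXML at all, and the helper names (per_fft_ms, slowdown, report) are made up for illustration. The only inputs are the loop count and the two timings quoted in .0. Each pass through the loop comes out at roughly 2 ms on one CPU, which is presumably too little work per call to repay the cost of splitting it across processors.

```c
#include <assert.h>
#include <stdio.h>

/* Average time per transform, in milliseconds. */
double per_fft_ms(double total_sec, int count)
{
    assert(count > 0);
    return total_sec * 1000.0 / count;
}

/* Ratio of the two-CPU time to the one-CPU time; above 1.0 means slower. */
double slowdown(double one_cpu_sec, double two_cpu_sec)
{
    assert(one_cpu_sec > 0.0);
    return two_cpu_sec / one_cpu_sec;
}

/* Print the per-transform figures for the run reported in 320.0. */
void report(void)
{
    const int    num_ffts = 1000;  /* loop count in try.c  */
    const double t1 = 1.9;         /* seconds, 1 cpu run   */
    const double t2 = 3.1;         /* seconds, 2 cpu run   */

    printf("1 cpu : %.1f ms per FFT\n", per_fft_ms(t1, num_ffts));
    printf("2 cpus: %.1f ms per FFT\n", per_fft_ms(t2, num_ffts));
    printf("ratio : %.2f\n", slowdown(t1, t2));
}
```

Calling report() from a main() prints the per-FFT time for each run and the ratio between them. Issuing all 1000 transforms as one grouped request, as suggested above, would give the library one large unit of work to divide instead of 1000 small ones.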