
Conference nicctr::kap-users

Title: Kuck Associates Preprocessor Users
Notice: KAP V2.1 (f90,f77,C) SSB-kits - see note 2
Moderator: HPCGRP::DEGREGORY
Created: Fri Nov 22 1991
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 390
Total number of notes: 1440

361.0. "Help, C-program tuning/parallelize" by TKOVOA::ISHIBASHI_T () Mon Jan 27 1997 10:31

Hi, 
    I'm supporting a benchmark for the AlphaServer 8000 business.  We need to 
improve the performance of a C program by tuning or parallel processing.
Is there any idea, or any modification of the source file, that would improve 
the performance and parallelize the outer loop?

I attach part of the source at the bottom. The source (one function) has 
a four-level loop nest, which is the hottest spot in the program. 

KAP-C can parallelize the inner loop (the loop over index i) with the "-conc" option,

		for ( i = starty; i < endy; i++, cy+=2 ) {
		  cx = rad_x + ((tmpdx + startx) * 2 );
		  for ( j = startx; j < endx; j++, cx+=2 ) {
		    bdiff += m_abs( n_frm[ i ][ j ] - p_frm[ cy ][ cx ] );
		  }
		}

but parallel processing does not improve the performance of this program 
at all. The i-loop and j-loop each have only 8 iterations, so I think they 
are too short to benefit from parallel execution.

The outer loops are longer. The l-loop and k-loop have 31 and 8*l iterations,
respectively.
  
	for ( l = 1; l <= s_range; ++l ) {
	  tmpdy = pdy - l;
	  tmpdx = pdx - l;
	  for ( k = 0; k < 8 * l; ++k ) {
	    dy = 2 * tmpdy;
			:
			:
I expect the performance to improve if the outer loops can be parallelized.
So I put "pragma _KAP concurrent" before the l-loop and k-loop, but KAP cannot
parallelize them.
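
For a sense of how much work the outer loops cover, the sketch below counts the candidate positions visited by the spiral. It assumes every candidate passes the range test; the function name is illustrative and does not come from the program.

```c
/* Hypothetical helper: number of (l, k) iterations of the spiral search.
 * The k-loop runs 8*l times for each l in 1..s_range. */
long spiral_candidates(int s_range)
{
    long total = 0;
    int l;

    for (l = 1; l <= s_range; ++l)
        total += 8L * l;
    return total;   /* same value as 4 * s_range * (s_range + 1) */
}
```

With s_range = 31 this gives 3968 candidates. If `type` selects an 8x8 pattern (64 pixels), that is 3968 * 64 = 253952 absolute differences per call, compared with 64 in one pass of the i/j loops.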

Thank you,
toshihiro ishibashi
  Digital Japan




/********************************************************/
/*	Functions for coding of inter frame prediction	*/
/*	 for video coding simulation program.		*/
/********************************************************/

#include	"v_codec.h"
#include	"st_ext.h"
#include 	"huff_ext.h"
#undef		__MAIN_FILE__
#include	"common.h"

/*#define	DEBUG				/* enable to display message for debug			*/


/********* Compile options **********/
/*** Extended prediction mode ***/
#define	ENABLE_BILINEAR			/* enable bilinear prediction				*/
#define	ENABLE_INTER_4V			/* enable inter 4v mode like as VM			*/
#define	ENABLE_AVERAGE			/* enable average 16x16 MB and 8x8 MB			*/

/*** Motion vector search ***/
#define	SET_ZERO_OFFSET			/* enable to set auto SADzero value			*/
#define	ZERO_OFFSET	257		/* SADzero - ZERO_OFFSET is initial SAD for MV search	*/


/*** Motion vector refinement ***/
#define	ENABLE_REFINE			/* enable to refine for motion vector			*/
/*#define	REFINE_BREAK			/* enable to break, when SAD of a MB is larger 		*/

/********* global variables **********/
extern M_vect	MV_buf[NV_GOB * MBV_GOB][NH_GOB * MBH_GOB][5];	/* current MV buffer 			*/
extern M_vect	MV_hbuf[NV_GOB * MBV_GOB][NH_GOB * MBH_GOB][5];	/* for interpolation 			*/
extern int	MC_diff[NV_GOB * MBV_GOB][NH_GOB * MBH_GOB][5];	/* block difference by half pel MV 	*/
extern int	MB_mode[NV_GOB * MBV_GOB][NH_GOB * MBH_GOB];	/* MB coding mode 			*/
extern int	MC16_diff[NV_GOB * MBV_GOB][NH_GOB * MBH_GOB];	/* block difference by 16x16 int MV 	*/
extern int	MC8_diff[NV_GOB * MBV_GOB][NH_GOB * MBH_GOB];	/* block difference by 8x8 int MV 	*/
extern	int	MB_cod[NV_GOB * MBV_GOB][NH_GOB * MBH_GOB];	/* COD of MB */
extern	VOP_conf	vop_format;	/* picture format, size of VOP */
extern	VOL_conf	vol_config;	/* options, parameters of VOL */
extern int	tri_reg[7][16][16];				/* region number of splitted triangle 	*/

static	int	mv_blk_pattn[ 37 ][ 45 ];

/********* global functions **********/
extern	int	MB_coder();		/* coding function of a Macro Block 		*/
extern	void	MB_fix();		/* save Macro Block when forced to be fixed 	*/
extern	void	MB_get();		/* get predicted Macro Block with MC 		*/
extern	void	MB_diff();		/* get Macro Block difference 			*/
extern	void	MB_rest();		/* restore difference to Macro Block 		*/
extern	void	MB_same();		/* copy Macro Block buffer 			*/
extern	int	MB_bctrl();		/* virtual buffer control for MBs 		*/
extern	void	reset_dc_pred();	/* reset predictor for DC 			*/
extern	int	get_qstep();		/* get quantizer step size 			*/
extern	int	buf_cont;		/* buffer content saving area 			*/

/*	search order of motion vector	*/
static	M_vect	v_odr[ 8 ] = { {-1,-1}, { 0,-1}, { 1,-1}, {-1, 0},
			       { 1, 0}, {-1, 1}, { 0, 1}, { 1, 1} };
/*	impulse response of Loop Filter	*/
static	int	lf_coef[ 3 ][ 3 ] = { { 1, 2, 1}, { 2, 4, 2}, { 1, 2, 1} };

/********* functions **********/
/************************************************/
/*	function to search motion vector	*/
/*	 which produces minimum error		*/
/*	 for adaptive field/frame ME		*/
/************************************************/

int	srch_vect_adpt_affine( p_frm, n_frm, rad_x, rad_y, mbdiff, mv, mmv, type )

Ref_mem		p_frm;		/* previous reference frame buffer */
int		n_frm[16][16];		/* current Macro Block buffer */
int		rad_x, rad_y;		/* physical address of block incl. offset */
int		mbdiff;		/* block difference (initial value) */
M_vect		*mv;		/* found motion vector of minimum error */
M_vect		*mmv;	/* prediction of MV as a median of predictors */
int		type;
{
	int	i, j, ihf, jhf;	/* address of pixels in macro block */
	int	k, l;		/* parameter for spiral search */
	int	bdiff;		/* block difference */
	int	pdx, pdy;	/* displacement from prediction of MV */
	int	tmpdx, tmpdy;	/* temporary displacement */
	int	dx, dy;		/* temporary displacement, half precision */
	M_vect	mvwrk;	/* motion vector work area */
	int	s_range;	/* maximum Mv search range */
	int	srange_mx, srange_my, srange_px, srange_py;	/* MV ranges */
	Array	diff;
	int	cx, cy;
	int	startx, starty, endx, endy;

/* MCV search area */
#ifndef	MC_AREX
#define	MC_AREX	15
#endif
#ifndef	MC_AREY
#define	MC_AREY	15
#endif
#define	MC_SCALE	16		/* scaling factor by f_code */
#define	MC_AREXE	31
#define	MC_AREYE	31
#define	MV_OFS	32
	s_range = MC_AREX;
	if ( vol_config.fcode_fwd > 1 )
	  s_range = (MC_SCALE << (vol_config.fcode_fwd - 1)) - 1;
	srange_mx = 0 - s_range;
	srange_my = 0 - s_range;
	srange_px = s_range;
	srange_py = s_range;
	pdx = pdy = 0;

	mvwrk.dx = mvwrk.dy = 0;	/* reset work area of motion vector */
	/* Now, search algorithm is modified to spiral search */
	for ( l = 1; l <= s_range; ++l ) {
	  tmpdy = pdy - l;
	  tmpdx = pdx - l;
	  for ( k = 0; k < 8 * l; ++k ) {
	    dy = 2 * tmpdy;
	    dx = 2 * tmpdx;
	    if ( !( (tmpdy == 0) && (tmpdx == 0) )
		&& (tmpdx >= srange_mx) && (tmpdy >= srange_my)
		&& (tmpdx <= srange_px) && (tmpdy <= srange_py) ) {
	      bdiff = 0;	/* reset block difference */
		startx = SumBlkPtn[type].sx;
		starty = SumBlkPtn[type].sy;
		endx = SumBlkPtn[type].ex;
		endy = SumBlkPtn[type].ey;
		
		cy = rad_y + ((tmpdy + starty) * 2 );
		for ( i = starty; i < endy; i++, cy+=2 ) {
		  cx = rad_x + ((tmpdx + startx) * 2 );
		  for ( j = startx; j < endx; j++, cx+=2 ) {
		    bdiff += m_abs( n_frm[ i ][ j ] - p_frm[ cy ][ cx ] );
		  }
		}
		if ( bdiff < mbdiff ) {	/* found minimum error vector */
		  mbdiff = bdiff;
		  mvwrk.dx = tmpdx;
		  mvwrk.dy = tmpdy;
		}
	    }
	    if ( k < 2 * l )
	      ++tmpdx;
	    else if ( k < 4 * l )
	      ++tmpdy;
	    else if ( k < 6 * l )
	      --tmpdx;
	    else	/* (k < 8 * l) */
	      --tmpdy;
	  }
	}
	mv -> dx = mvwrk.dx;
	mv -> dy = mvwrk.dy;
	return( mbdiff );	/* return minimum error */
}
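
One thing in the listing above that stands in the way of running the l- and k-loops concurrently is that `tmpdx` and `tmpdy` are carried from one k iteration to the next. The following is a minimal sketch of computing the offset directly from (l, k), so that each iteration no longer depends on the previous one. The name `spiral_offset` is hypothetical, and `pdx = pdy = 0` is assumed as in the listing. Whether KAP accepts the loop after such a change is not something this sketch can confirm, and the running minimum in `mbdiff` and `mvwrk` would still have to be handled separately.

```c
/* Hypothetical helper: offset of the k-th candidate on ring l of the
 * spiral, relative to (pdx, pdy). Reproduces the incremental walk in the
 * listing: start at (-l, -l), then +x, +y, -x, -y for 2*l steps each. */
void spiral_offset(int l, int k, int *dx, int *dy)
{
    if (k <= 2 * l) {            /* dy fixed at -l, dx increasing */
        *dx = -l + k;
        *dy = -l;
    } else if (k <= 4 * l) {     /* dx fixed at +l, dy increasing */
        *dx = l;
        *dy = -l + (k - 2 * l);
    } else if (k <= 6 * l) {     /* dy fixed at +l, dx decreasing */
        *dx = l - (k - 4 * l);
        *dy = l;
    } else {                     /* dx fixed at -l, dy decreasing */
        *dx = -l;
        *dy = l - (k - 6 * l);
    }
}
```

Inside the k-loop, `spiral_offset( l, k, &tmpdx, &tmpdy )` would replace the if/else chain that increments or decrements `tmpdx` and `tmpdy`.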
    
361.1. "Each block is independent, isn't it?" by WIBBIN::NOYCE (Pulling weeds, pickin' stones) Mon Jan 27 1997 10:52 (18 lines)

Don't you want to try to apply parallelism at a higher level?

Each frame to be encoded has lots of independent blocks of pixels
-- let each processor work on a different group of blocks.  Somewhere
you have a loop (or nest of loops) around calls to this routine.
Make that loop run in parallel, after declaring this call to be
safe for concurrent execution.  You'll need private variables to
pass in for n_frm, mv, etc...
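
As a rough sketch of giving each processor a different group of blocks, the helper below splits `nblocks` macroblocks into contiguous ranges, one per worker. The names are hypothetical and thread creation is left out; each worker would still need its own storage for n_frm, mv, and so on.

```c
/* Hypothetical helper: half-open range [*begin, *end) of block indices
 * for worker w (0-based) out of nworkers. The first (nblocks % nworkers)
 * workers receive one extra block so the load stays balanced. */
void block_range(int nblocks, int nworkers, int w, int *begin, int *end)
{
    int base  = nblocks / nworkers;
    int extra = nblocks % nworkers;

    *begin = w * base + (w < extra ? w : extra);
    *end   = *begin + base + (w < extra ? 1 : 0);
}
```

Contiguous ranges are only one possible choice; an interleaved assignment would work as well, provided no two workers write to the same block.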

Your customer will probably want to consider how to modify this
code to take advantage of the new PERR (pixel error) instruction
that is being added to new Alpha processors (PCA56 and EV6).  Given
8 bytes in register A and 8 bytes in register B, this instruction
sums the absolute values of the differences of corresponding bytes
-- just as your inner loop is doing with ints.
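
The operation described here can be modelled in portable C as follows. This is only a model of the byte-wise sum of absolute differences, written as an ordinary function with a made-up name; it is not the instruction itself and says nothing about how a compiler would expose it.

```c
#include <stdint.h>

/* Portable model: treat a and b as 8 unsigned bytes each and return the
 * sum of the absolute differences of corresponding bytes. */
uint64_t perr_model(uint64_t a, uint64_t b)
{
    uint64_t sum = 0;
    int i;

    for (i = 0; i < 8; ++i) {
        unsigned x = (unsigned)((a >> (8 * i)) & 0xFF);
        unsigned y = (unsigned)((b >> (8 * i)) & 0xFF);
        sum += (x > y) ? (x - y) : (y - x);
    }
    return sum;
}
```

The program in 361.0 stores pixels as int, so the data would have to be packed into bytes before an operation of this shape could be applied; that step is not shown.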

You probably want to contact Digital's Multimedia group, who have
surely implemented code like this already.
361.2. "can you mail me the header files?" by HPCGRP::DEGREGORY (Karen 223-5801) Mon Jan 27 1997 13:32 (4 lines)

If you could mail me the headers I might be able to find out why
KAP won't parallelize the outer loop.

Karen
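
The reason KAP rejects the outer loops is not stated in this thread. One thing visible in the source in 361.0 is that `mbdiff` and `mvwrk` hold a running minimum across all (l, k) iterations, which is a reduction. The sketch below shows how partial results from separate workers might be merged, under the assumption that each worker scans a contiguous slice of the spiral order. The `Best` type and `merge_best` are hypothetical names.

```c
/* Hypothetical per-worker result: smallest block difference seen so far
 * and the vector that produced it. */
typedef struct {
    int sad;     /* plays the role of mbdiff */
    int dx, dy;  /* plays the role of mvwrk  */
} Best;

/* Merge results from two workers that scanned consecutive slices of the
 * spiral order; 'earlier' covers the candidates that are visited first.
 * The strict '<' keeps the earlier candidate on a tie, matching the
 * test "bdiff < mbdiff" in the original loop. */
Best merge_best(Best earlier, Best later)
{
    return (later.sad < earlier.sad) ? later : earlier;
}
```

Each worker would start from the same initial `mbdiff` and a zero vector, and the merged result would then be written to `mv` once at the end.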

361.3. "Thank you for your reply" by TKOVOA::ISHIBASHI_T () Tue Jan 28 1997 08:19 (36 lines)

    Thank you for your reply.
    
Re:1
	>Each frame to be encoded has lots of independent blocks of pixels
	>-- let each processor work on a different group of blocks.  Somewhere
	>you have a loop (or nest of loops) around calls to this routine.
	>Make that loop run in parallel, after declaring this call to be
	>safe for concurrent execution.  You'll need private variables to
	>pass in for n_frm, mv, etc...
	>

		I think so, but the source is too complex to modify.
		This function is called in a loop in the parent routine,
		but that loop has only 1 iteration, so it is not a good
		candidate for parallelization.

	>Your customer will probably want to consider how to modify this
	>code to take advantage of the new PERR (pixel error) instruction
	>that is being added to new Alpha processors (PCA56 and EV6).  Given
	>8 bytes in register A and 8 bytes in register B, this instruction
	>sums the absolute values of the differences of corresponding bytes
	>-- just as your inner loop is doing with ints.
	>

		I'll explain the new implementation on EV6.
		I hope it becomes available through just a compiler option
		like "-tune ev6".


Re:2
	I have mailed you the header files, and I have also put the 5 header
	files in the next reply.

Thank you,
toshihiro
    
361.4. "Header files" by TKOVOA::ISHIBASHI_T () Tue Jan 28 1997 08:21 (864 lines)

    I attached the 5 header files, which are used in the C source files.

		common.h
		config.h
		v_codec.h
		huff_ext.h
		st_ext.h

    Thank you,
    toshihiro
    
    
>>>>>>>>>>>>>>>>>>>>>>>>>	common.h
/************************************************/
/*     affine.h                                 */
/*		struct, variables definition	*/
/************************************************/

/*********************************************/
/*  define constant number                   */
/*********************************************/

/********* compile options for prediction **********/
#define	MV_PRED					/* enable to predict MV for affine transform			*/

#define	AFFINE_SPLIT	4		/* split pattern number for affine prediction	*/

#define	MAX_PM		4

#define	SPLIT_2		2
#define	SPLIT_8		4
#define SPLIT_4		6


/*********************************************/
/*  define structures as typedef	     */
/*********************************************/
typedef struct {
  double a;       			/* Matrix of A  		*/
  double b;
  double c;
  double d;
  double e;
  double f;
} affine_para;


typedef struct {
  double u;      			/* previous pixel location x	*/
  double v;				/*                         y	*/
  double x;      			/* current pixel location  x	*/
  double y;				/*                         y	*/
} pixel_loc;


typedef struct {
  int	x1;				/* point of triangle		*/
  int 	y1;
  int	x2;
  int	y2;
  int	x3;
  int	y3;
  int   mv_num[ 3 ];			/* motion vector location	*/
} region_tbl;


typedef struct {
  int	     	reg_num;		/* triangle region number 	*/
  region_tbl 	AllRegionTable[ 8 ];	/* triangle region definition 	*/
} def_region;


typedef struct {
  int           min_reg;		/* minimum region number	*/
  affine_para 	AffineParam[8];		/* affine parameter of region	*/
} my_reg_def;


typedef struct {			/* half motion vector structure	*/
  int  mx;
  int  my;
} M_vect_half;


typedef struct {			/* define struct for Extended macroblock pattern	*/
  int		MB_dir;			/* splitting direction for MB	*/
  int		MB_level;		/* splitting level for MB	*/
  int		MB_pred_mode[2];	/* prediction mode for MB	*/
  M_vect_half	MB_mvs[9];		/* MVs for MB			*/
} EXTBP;


typedef struct {			/* VLC for prediction mode	*/
  int		length;
  unsigned long		vlc;
} Pred_Mode_VLC;


typedef struct {
  int		sx;			/* x location for start loop	*/
  int		sy;			/* y location for start loop 	*/
  int		ex;			/* x value for end of counter	*/
  int		ey;			/* y value for end of counter	*/
  int		pixel_num;		/* effective number of pixel 	*/
} Sum_Block;

typedef		int Array[16][16];	/* 2D array definition		*/

/****************************************/
/*	define macro                    */
/****************************************/
#define	m_abs( x )		( ( (x) < 0 ) ? (-(x)) : (x) )
#define m_int( x )      	( ( (x) < 0 ) ? ((int)((x)-0.5)) : ((int)((x)+0.5)) )
#define	m_round( x, y )		( ( (x) < 0 ) ? ( (x) - (y) ) : ( (x) + (y) ) )


/****************************************/
/*	define global or extern         */
/****************************************/

/*#ifndef __MAIN_FILE__*/
#define __DEF_SCOPE__ extern

__DEF_SCOPE__ F_mem	back_Y;		/* background original frame Y */
__DEF_SCOPE__ Fc_mem	back_U;		/* background original frame U */
__DEF_SCOPE__ Fc_mem	back_V;		/* background original frame V */
__DEF_SCOPE__ int 	affine_MB;
__DEF_SCOPE__ int 	BMC_MB;

__DEF_SCOPE__ M_vect_half 	mv_map[ 37 ][ 45 ];		/* MV map on all control grid points		*/
__DEF_SCOPE__ int	  	srch_mv_map[ 37 ][ 45 ];	/* searched MV map, 0:Not searched, 1:Searched 	*/
__DEF_SCOPE__ int	  	used_mv_map[ 37 ][ 45 ];	/* used MV map, 0:Not used, 1:Used		*/
__DEF_SCOPE__ int	  	send_mv_map[ 37 ][ 45 ];	/* send MV map, 0:Not send, 1:Send		*/
__DEF_SCOPE__ int	  	zero_mv_map[ 37 ][ 45 ];	/* zero vector, 0:zero,     1:Not zero		*/
__DEF_SCOPE__ int	  	refined_mv_map[ 37 ][ 45 ];	/* refined MV map, 0:Not refined, 1:Refined	*/
__DEF_SCOPE__ int  		weight[4][16][16];    		/* weight for pixel matching 			*/
__DEF_SCOPE__ def_region   	myRegionTbl[ 7 ];		/* region splitting pattern for affine		*/
__DEF_SCOPE__ int		pl_offset[ 9 ][ 2 ];		/* pixel location offset for affine transform	*/
__DEF_SCOPE__ EXTBP 		ExtBP[ 37 ][ 45 ];		/* buffer of extended block type 		*/
__DEF_SCOPE__ Array		Def_Region[8];			/* region definition for each split pattern	*/
__DEF_SCOPE__ Pred_Mode_VLC	PMode[ 3 ][2][ MAX_PM ];	/* VLC for each prediction mode			*/
__DEF_SCOPE__ int		MB_PMODE[ 18 ][ 22 ];		/* Prediction information for MB		*/
__DEF_SCOPE__ Sum_Block		SumBlkPtn[ 16 ];		/* summation block pattern			*/
__DEF_SCOPE__ int		MB_WMODE[ 18 ][ 22 ];		/* WMODE information for MB			*/


/*#endif*/


#ifdef __MAIN_FILE__
#undef __DEF_SCOPE__
#define __DEF_SCOPE__

/**********************************************/
/*  define global variables                   */
/**********************************************/
__DEF_SCOPE__ F_mem	back_Y;		/* background original frame Y */
__DEF_SCOPE__ Fc_mem	back_U;		/* background original frame U */
__DEF_SCOPE__ Fc_mem	back_V;		/* background original frame V */
__DEF_SCOPE__ int 	affine_MB;
__DEF_SCOPE__ int 	BMC_MB;

__DEF_SCOPE__ M_vect_half 	mv_map[ 37 ][ 45 ];		/* MV map on all control grid points		*/
__DEF_SCOPE__ int	  	srch_mv_map[ 37 ][ 45 ];	/* searched MV map, 0:Not searched, 1:Searched 	*/
__DEF_SCOPE__ int	  	used_mv_map[ 37 ][ 45 ];	/* used MV map, 0:Not used, 1:Used		*/
__DEF_SCOPE__ int	  	send_mv_map[ 37 ][ 45 ];	/* send MV map, 0:Not send, 1:Send		*/
__DEF_SCOPE__ int	  	zero_mv_map[ 37 ][ 45 ];	/* zero vector, 0:zero,     1:Not zero		*/
__DEF_SCOPE__ int	  	refined_mv_map[ 37 ][ 45 ];	/* refined MV map, 0:Not refined, 1:Refined	*/
__DEF_SCOPE__ EXTBP 		ExtBP[ 37 ][ 45 ];		/* buffer of extended block type 		*/
__DEF_SCOPE__ int		MB_PMODE[ 18 ][ 22 ];		/* Prediction information for MB		*/
__DEF_SCOPE__ int		MB_WMODE[ 18 ][ 22 ];		/* WMODE information for MB			*/



/****************************************/
/*  define summation block  pattern 	*/
/****************************************/
__DEF_SCOPE__ Sum_Block		SumBlkPtn[ 16 ] = {		/* summation block pattern			*/
  { 0, 0,  0,  0,   0 }, { 0, 0,  8,  8,  64 }, 
  { 8, 0, 16,  8,  64 }, { 0, 0, 16,  8, 128 },
  { 0, 8,  8, 16,  64 }, { 0, 0,  8, 16, 128 }, 
  { 0, 0,  0,  0,   0 }, { 0, 0,  0,  0,   0 },
  { 8, 8, 16, 16,  64 }, { 0, 0,  0,  0,   0 }, 
  { 8, 0, 16, 16, 128 }, { 0, 0,  0,  0,   0 },
  { 0, 8, 16, 16, 128 }, { 0, 0,  0,  0,   0 }, 
  { 0, 0,  0,  0,   0 }, { 0, 0, 16, 16, 256 }
};

/****************************************/
/*  define weighting pattern for MB	*/
/****************************************/
__DEF_SCOPE__ int weight[ 4 ][ 16 ][ 16 ] = {  			/* weight for pixel matching 			*/
  {
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,          
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,          
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,          
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,          
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,          
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,          
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,          
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,          
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,          
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,          
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,          
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,          
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,          
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,          
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,          
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },

/*  {
    1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,      /*    (subset)               */
/*    0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,
    1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,0,
    0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,
    1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,0,
    0,1,1,1,1,2,2,2,2,2,1,1,1,1,0,1,
    1,0,1,1,1,2,3,3,3,2,1,1,1,0,1,0,
    0,1,1,1,1,2,3,4,3,2,1,1,1,1,0,1,
    1,0,1,1,1,2,3,3,3,2,1,1,1,0,1,0,
    0,1,1,1,1,2,2,2,2,2,1,1,1,1,0,1,
    1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,0,
    0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,
    1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,0,
    0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,
    1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,
    0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1 },
*/

  {
    1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,
    0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,
    1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,
    0,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,
    1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,
    0,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,
    1,0,1,1,1,1,2,2,2,2,2,1,1,1,1,0,
    0,1,0,1,1,1,2,3,3,3,2,1,1,1,0,1,
    1,0,1,1,1,1,2,3,4,3,2,1,1,1,1,0,
    0,1,0,1,1,1,2,3,3,3,2,1,1,1,0,1,
    1,0,1,1,1,1,2,2,2,2,2,1,1,1,1,0,
    0,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,
    1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,
    0,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,
    1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,
    0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1
 },
  
/*  {
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,          
    1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,1,          
    1,2,3,3,3,3,3,3,3,3,3,3,3,3,2,1,          
    1,2,3,4,4,4,4,4,4,4,4,4,4,3,2,1,          
    1,2,3,4,5,5,5,5,5,5,5,5,4,3,2,1,          
    1,2,3,4,5,6,6,6,6,6,6,5,4,3,2,1,          
    1,2,3,4,5,6,7,7,7,7,6,5,4,3,2,1,          
    1,2,3,4,5,6,7,8,8,7,6,5,4,3,2,1,          
    1,2,3,4,5,6,7,8,8,7,6,5,4,3,2,1,          
    1,2,3,4,5,6,7,7,7,7,6,5,4,3,2,1,          
    1,2,3,4,5,6,6,6,6,6,6,5,4,3,2,1,          
    1,2,3,4,5,5,5,5,5,5,5,5,4,3,2,1,          
    1,2,3,4,4,4,4,4,4,4,4,4,4,3,2,1,          
    1,2,3,3,3,3,3,3,3,3,3,3,3,3,2,1,          
    1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,1,          
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 }*/
/*  {
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,          
    1,2,2,2,2,2,2,2,2,2,2,2,2,2,1,1,          
    1,2,3,3,3,3,3,3,3,3,3,3,3,2,1,1,          
    1,2,3,4,4,4,4,4,4,4,4,4,3,2,1,1,          
    1,2,3,4,5,5,5,5,5,5,5,4,3,2,1,1,          
    1,2,3,4,5,6,6,6,6,6,5,4,3,2,1,1,          
    1,2,3,4,5,6,7,7,7,6,5,4,3,2,1,1,          
    1,2,3,4,5,6,7,8,7,6,5,4,3,2,1,1,          
    1,2,3,4,5,6,7,7,7,6,5,4,3,2,1,1,          
    1,2,3,4,5,6,6,6,6,6,5,4,3,2,1,1,          
    1,2,3,4,5,5,5,5,5,5,5,4,3,2,1,1,          
    1,2,3,4,4,4,4,4,4,4,4,4,3,2,1,1,          
    1,2,3,3,3,3,3,3,3,3,3,3,3,2,1,1,          
    1,2,2,2,2,2,2,2,2,2,2,2,2,2,1,1,          
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,          
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 }*/
  {
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0,          
    2, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 2, 0,          
    2, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 4, 2, 0,          
    2, 4, 6, 8, 8, 8, 8, 8, 8, 8, 8, 8, 6, 4, 2, 0,          
    2, 4, 6, 8,10,10,10,10,10,10,10,10, 6, 4, 2, 0,          
    2, 4, 6, 8,10,12,12,12,12,12,10, 8, 6, 4, 2, 0,          
    2, 4, 6, 8,10,12,14,14,14,12,10, 8, 6, 4, 2, 0,          
    2, 4, 6, 8,10,12,14,16,14,12,10, 8, 6, 4, 2, 0,          
    2, 4, 6, 8,10,12,14,14,14,12,10, 8, 6, 4, 2, 0,          
    2, 4, 6, 8,10,12,12,12,12,12,10, 8, 6, 4, 2, 0,          
    2, 4, 6, 8,10,10,10,10,10,10,10,10, 6, 4, 2, 0,          
    2, 4, 6, 8, 8, 8, 8, 8, 8, 8, 8, 8, 6, 4, 2, 0,          
    2, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 4, 2, 0,          
    2, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 2, 0,          
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0,          
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
  {
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,          
    1, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 1, 0,          
    1, 4, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 4, 1, 0,          
    1, 4, 9,16,16,16,16,16,16,16,16,16, 9, 4, 1, 0,          
    1, 4, 9,16,25,25,25,25,25,25,25,16, 9, 4, 1, 0,          
    1, 4, 9,16,25,36,36,36,36,36,25,16, 9, 4, 1, 0,          
    1, 4, 9,16,25,36,49,49,49,36,25,16, 9, 4, 1, 0,          
    1, 4, 9,16,25,36,49,64,49,36,25,16, 9, 4, 1, 0,          
    1, 4, 9,16,25,36,49,49,49,36,25,16, 9, 4, 1, 0,          
    1, 4, 9,16,25,36,36,36,36,36,25,16, 9, 4, 1, 0,          
    1, 4, 9,16,25,25,25,25,25,25,25,16, 9, 4, 1, 0,          
    1, 4, 9,16,16,16,16,16,16,16,16,16, 9, 4, 1, 0,          
    1, 4, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 4, 1, 0,          
    1, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 1, 0,          
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,          
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }

};



/****************************************/
/*  region description table		*/
/****************************************/
__DEF_SCOPE__ int		pl_offset[ 9 ][ 2 ] = {		/* pixel location offset for affine transform	*/
  {  8,  8 }, {  0,  0 }, { 16,  0 }, {  0, 16 }, { 16, 16 },
  {  8,  0 }, {  0,  8 }, { 16,  8 }, {  8, 16 }
};


/****************************************/
/*  region description table		*/
/****************************************/
__DEF_SCOPE__ def_region myRegionTbl[ 7 ] = {
  {  1,{ {  0,  0,  0,  0,  0,  0, 0, 0, 0 },		/* region #0 BMA		*/
	 {  0,  0,  0,  0,  0,  0, 0, 0, 0 },
	 {  0,  0,  0,  0,  0,  0, 0, 0, 0 },
	 {  0,  0,  0,  0,  0,  0, 0, 0, 0 },
	 {  0,  0,  0,  0,  0,  0, 0, 0, 0 },
	 {  0,  0,  0,  0,  0,  0, 0, 0, 0 },
	 {  0,  0,  0,  0,  0,  0, 0, 0, 0 },
	 {  0,  0,  0,  0,  0,  0, 0, 0, 0 } } },

  {  1,{ {  0,  0,  0,  0,  0,  0, 0, 0, 0 },		/* region #1 INTRA		*/
         {  0,  0,  0,  0,  0,  0, 0, 0, 0 },
	 {  0,  0,  0,  0,  0,  0, 0, 0, 0 },
	 {  0,  0,  0,  0,  0,  0, 0, 0, 0 },
	 {  0,  0,  0,  0,  0,  0, 0, 0, 0 },
	 {  0,  0,  0,  0,  0,  0, 0, 0, 0 },
         {  0,  0,  0,  0,  0,  0, 0, 0, 0 },
         {  0,  0,  0,  0,  0,  0, 0, 0, 0 } } },

  {  2,{ {  0,  0, 15,  0,  0, 15, 1, 2, 3 },		/* region #2 affine #1		*/
	 { 16,  0 , 0, 16, 16, 16, 2, 3, 4 },
         {  0,  0,  0,  0,  0,  0, 0, 0, 0 },
         {  0,  0,  0,  0,  0,  0, 0, 0, 0 },
         {  0,  0,  0,  0,  0,  0, 0, 0, 0 },
         {  0,  0,  0,  0,  0,  0, 0, 0, 0 },
         {  0,  0,  0,  0,  0,  0, 0, 0, 0 },
         {  0,  0,  0,  0,  0,  0, 0, 0, 0 } } },

  {  2,{ {  0,  0,  0, 16, 16, 16, 1, 3, 4 },		/* region #3 affine #2		*/
         {  0,  0, 16,  0, 16, 16, 1, 2, 4 },
         {  0,  0,  0,  0,  0,  0, 0, 0, 0 },
         {  0,  0,  0,  0,  0,  0, 0, 0, 0 },
         {  0,  0,  0,  0,  0,  0, 0, 0, 0 },
         {  0,  0,  0,  0,  0,  0, 0, 0, 0 },
         {  0,  0,  0,  0,  0,  0, 0, 0, 0 },
         {  0,  0,  0,  0,  0,  0, 0, 0, 0 } } },

  { 8, { {  0,  0,  7,  0,  0,  7, 1, 5, 6 },		/* region #4 affine #3 		*/
         {  7,  1,  7,  7,  1,  7, 5, 6, 0 },
         {  8,  0, 15,  0,  8,  7, 5, 2, 0 },
         { 15,  1,  9,  7, 15,  7, 2, 0, 7 },
         {  0,  8,  7,  8,  0, 15, 6, 0, 3 },
         {  7,  9,  1, 15,  7, 15, 0, 3, 8 },
         {  8,  8,  8, 15, 15,  8, 0, 7, 8 },
         {  9, 15, 15,  9, 15, 15, 7, 8, 4 } } },

  { 8, { {  0,  0,  0,  7,  7,  7, 1, 6, 0 },		/* region #5 affine #4 		*/
         {  1,  0,  7,  0,  7,  6, 1, 5, 0 },
         {  8,  0,  8,  7, 15,  7, 5, 0, 7 },
         {  9,  0, 15,  0, 15,  6, 5, 2, 7 },
         {  0,  8,  0, 15,  7, 15, 6, 3, 8 },
         {  1,  8,  7,  8,  7, 14, 6, 0, 8 },
         {  8,  8,  8, 15, 15, 15, 0, 8, 4 },
         {  9,  8, 15,  8, 15, 14, 0, 7, 4 } } },

  {  4,{ {  0,  0,  7,  7,  0, 16, 1, 0, 3 },		/* region #6 affine #5 		*/
         {  1,  0, 16,  0,  8,  7, 1, 0, 2 },
         {  7,  8,  0, 15, 16, 16, 0, 3, 4 },
         {  8,  8, 16,  0, 16, 16, 2, 0, 4 },
         {  0,  0,  0,  0,  0,  0, 0, 0, 0 },
         {  0,  0,  0,  0,  0,  0, 0, 0, 0 },
         {  0,  0,  0,  0,  0,  0, 0, 0, 0 },
         {  0,  0,  0,  0,  0,  0, 0, 0, 0 } } }
};


/****************************************/
/*  define region splitting pattern	*/
/****************************************/
__DEF_SCOPE__ Array		Def_Region[8] = {	/* region definition for each split pattern	*/
  {	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },		/*** reserved region definition	***/
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 }
      }, {
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },		/*** no split region definition ***/
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 }
      }, {
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },		/*** split from right upper to left lower ***/
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2 },
	{ 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2 },
	{ 1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2 },
	{ 1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2 },
	{ 1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2 },
	{ 1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2 },
	{ 1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2 },
	{ 1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2 },
	{ 1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2 },
	{ 1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2 },
	{ 1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2 },
	{ 1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2 }
      }, {
	{ 1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2 },		/*** split from left upper to right lower ***/
	{ 1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2 },
	{ 1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2 },
	{ 1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2 },
	{ 1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2 },
	{ 1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2 },
	{ 1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2 },
	{ 1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2 },
	{ 1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2 },
	{ 1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2 },
	{ 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 }
      }, {
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },		/*** split up and down ***/
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 },
	{ 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2 },
	{ 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2 },
	{ 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2 },
	{ 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2 },
	{ 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2 },
	{ 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2 },
	{ 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2 },
	{ 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2 }
      }, {
	{ 1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2 },		/*** split left and right ***/
	{ 1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2 },
	{ 1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2 },
	{ 1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2 },
	{ 1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2 },
	{ 1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2 },
	{ 1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2 },
	{ 1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2 },
	{ 1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2 },
	{ 1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2 },
	{ 1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2 },
	{ 1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2 },
	{ 1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2 },
	{ 1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2 },
	{ 1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2 },
	{ 1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2 }
      }, {
	{ 1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2 },		/*** split as x ***/
	{ 1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,4 },
	{ 1,1,1,2,2,2,2,2,2,2,2,2,2,2,4,4 },
	{ 1,1,1,1,2,2,2,2,2,2,2,2,2,4,4,4 },
	{ 1,1,1,1,1,2,2,2,2,2,2,2,4,4,4,4 },
	{ 1,1,1,1,1,1,2,2,2,2,2,4,4,4,4,4 },
	{ 1,1,1,1,1,1,1,2,2,2,4,4,4,4,4,4 },
	{ 1,1,1,1,1,1,1,1,2,4,4,4,4,4,4,4 },
	{ 1,1,1,1,1,1,1,3,4,4,4,4,4,4,4,4 },
	{ 1,1,1,1,1,1,3,3,3,4,4,4,4,4,4,4 },
	{ 1,1,1,1,1,3,3,3,3,3,4,4,4,4,4,4 },
	{ 1,1,1,1,3,3,3,3,3,3,3,4,4,4,4,4 },
	{ 1,1,1,3,3,3,3,3,3,3,3,3,4,4,4,4 },
	{ 1,1,3,3,3,3,3,3,3,3,3,3,3,4,4,4 },
	{ 1,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4 },
	{ 3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4 }
      }, {
	{ 1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2 },		/*** split as + ***/
	{ 1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2 },
	{ 1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2 },
	{ 1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2 },
	{ 1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2 },
	{ 1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2 },
	{ 1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2 },
	{ 1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2 },
	{ 3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4 },
	{ 3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4 },
	{ 3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4 },
	{ 3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4 },
	{ 3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4 },
	{ 3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4 },
	{ 3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4 },
	{ 3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4 }
      }
};


/****************************************/
/*  define VLC for prediction mode	*/
/****************************************/
__DEF_SCOPE__ Pred_Mode_VLC	PMode[ 3 ][2][ MAX_PM ] = {		/* VLC for each prediction mode			*/
  {
    {					/*** P6 combination 1 ***/
      /*** coded ***/
      {	2, 0x80000000 },		/* PM_BMC	*/
      {	1, 0x00000000 },		/* PM_AMC1	*/
      {	0, 0x00000000 },		/* PM_BGMC	*/
      {	0, 0x00000000 }			/* none		*/
    }, {
      /*** not coded ***/
      {	2, 0xc0000000 },		/* PM_BMC	*/
      {	0, 0x00000000 },		/* PM_AMC1	*/
      {	0, 0x00000000 },		/* PM_BGMC	*/
      {	0, 0x00000000 }			/* none		*/
    }

  }, {
    {					/*** P6 combination 2 ***/
      /*** coded ***/
      { 2, 0xc0000000 },		/* PM_BMC	*/
      {	2, 0x80000000 },		/* PM_BILINEAR	*/
      {	2, 0x40000000 },		/* PM_8x8	*/
      {	2, 0x00000000 }			/* PM_AVG	*/
    }, {
      /*** not coded ***/
      { 2, 0xc0000000 },		/* PM_BMC	*/
      {	2, 0x80000000 },		/* PM_BILINEAR	*/
      {	2, 0x40000000 },		/* PM_8x8	*/
      {	2, 0x00000000 }			/* PM_AVG	*/
    }

  }, {
    {					/*** P6 combination 3 ***/
      /*** coded ***/
      {	2, 0x40000000 },		/* PM_BMC	*/
      {	2, 0x00000000 },		/* PM_AMC1	*/
      {	0, 0x00000000 },		/* PM_BGMC	*/
      {	2, 0x80000000 }			/* PM_AMC2	*/
    }, {
      /*** not coded ***/
      {	3, 0xc0000000 },		/* PM_BMC	*/
      {	0, 0x00000000 },		/* PM_AMC1	*/
      {	0, 0x00000000 },		/* PM_BGMC	*/
      {	3, 0xe0000000 }			/* PM_AMC2	*/
    }
  }
};

	
#endif


>>>>>>>>>>>>>>>>>>>>>>>>>	config.h
/********************************************************/
/*	Header file for configuring video coding	*/
/*	 simulation program, to perform experiments.	*/
/********************************************************/

/*#define	CIF*/
		/* for coding CIF pictures */
/*#define	CIF_360*/
		/* input sequence has 360(CIF) 180(QCIF) horizontal pels */
/*#define	MT_SEPA */
		/* Motion Texture separation mode coding */
#define	VM_ORIG
		/* same method as original VM description */
/*#define	REST_MV*/
		/* restricted MV mode */
/*#define	EXTND_MV*/
		/* MV extension is used in unrestricted MV mode */
#define	USE_OVMC
		/* overlapped MC is used (advanced prediction) */
#define	USE_88MC
		/* 8x8 MC is incorporated (advanced prediction) */
#define	MC8_RANGE	2
		/* MV search range for 8x8 blocks (around 16x16 MV) */
#define	PREF_16_VEC	129
		/* preference to choose 16X16 MV to 8x8 MV */
#define	MV_SRCH_DEC
		/* MV search is performed between input and decoded picture */
/*#define	HALF_ORG*/
		/* half pel search is also done between original */
/*#define	USE_DC_PRED*/
		/* Enable DC prediction coding for Intra Pictures/MBs */
/*#define	NON_FLATQ*/
		/* DCT coefs. are MPEG-type quantized with weighting matrix */
#define	ENBL_QCHNG
		/* enable quantizer change MB by MB basis (DQ) */
/*#define	NO_INTRA*/
		/* Intra mode in Inter picture will never be used */

/*#define	ORIG_TM*/
/*#define	DEC_IP*/
/*#define	IP_ONLY*/

>>>>>>>>>>>>>>>>>>>>>>>>>	v_codec.h
/********************************************************/
/*	Header file of video coding simulation program,	*/
/*		according to CCITT reference model.	*/
/********************************************************/

#include	<stdio.h>
#include	<sys/types.h>
#include	<sys/socket.h>
#include	<strings.h>
#include	<fcntl.h>
#include	<math.h>
#include	"config.h"

#define	SIZE_FH	352	/* number of pixels per line (Luminance) CIF */
#define	SIZE_FH2	176	/* number of pixels per line (Chr.) CIF */
#define	SIZE_FV	288	/* number of active lines (Luminance) CIF */
#define	SIZE_FV2	144	/* number of active lines (Chr.) CIF */
#define	SIZE_QH	176	/* number of pixels per line (Luminance) QCIF */
#define	SIZE_QH2	88	/* number of pixels per line (Chr.) QCIF */
#define	SIZE_QV	144	/* number of active lines (Luminance) QCIF */
#define	SIZE_QV2	72	/* number of active lines (Chr.) QCIF */
#define	FTOTALPEL	101376	/* number of total pixels in a frame (FCIF) */
#define	FTOTALPEL2	25344	/* number of total pixels in a frame (U,V) */
#define	OUT_PELS	144	/* pels outside the actual frame */

/*	define static values for Full C.I.F.image as maximum assignment	*/
#define	SIZE_H	352	/* number of pixels per line (Luminance) */
#define	SIZE_H2	176	/* number of pixels per line (Chrominance) */
#define	SIZE_V	288	/* number of active lines (Luminance) */
#define	SIZE_V2	144	/* number of active lines (Chrominance) */
#define	NUM_GOB	18	/* number of GOB (Group of Block) in a frame */
#define	NH_GOB	1	/* number of GOB in a frame for horizontal direction */
#define	NV_GOB	18	/* number of GOB in a frame for vertical direction */
#define	MBinGOB	22	/* number of MBs (Macro Blocks) in a GOB */
#define	MBH_GOB	22	/* number of MBs in a GOB for horizontal direction */
#define	MBV_GOB	1	/* number of MBs in a GOB for vertical direction */
#define	NUM_MB	396	/* total number of MB in a frame */
#define	TOTALPEL	101376	/* number of total pixels in a frame */
#define	TOTALPEL2	25344	/* number of total pixels in a frame (U,V) */


typedef	unsigned char	Byte;
/*typedef	enum { FALSE = 0, TRUE }	Boolean; */
#ifdef TRUE
#undef TRUE
#endif
#ifdef FALSE
#undef FALSE
#endif
typedef	enum { FALSE, TRUE }	Boolean;
typedef	struct m_vect { int dx, dy; }	M_vect;		/* motion vector */
typedef	struct mv_dbl { double dx, dy; }	MV_dbl;	/* MVs (double) */
typedef	struct	m_blk { 
  int	Y_sblk[16][16];		/* luminance sub block */
  int	U_sblk[8][8];		/* chrominance sub block (U) */
  int	V_sblk[8][8];		/* chrominance sub block (V) */
}	M_blk;		/* Macro Block definition */
typedef	struct	m_dblk { 
  double	Y0_dblk[8][8];		/* luminance sub block 0 (Y00) */
  double	Y1_dblk[8][8];		/* luminance sub block 1 (Y01) */
  double	Y2_dblk[8][8];		/* luminance sub block 2 (Y10) */
  double	Y3_dblk[8][8];		/* luminance sub block 3 (Y11) */
  double	U_dblk[8][8];		/* chrominance sub block (U) */
  double	V_dblk[8][8];		/* chrominance sub block (V) */
}	M_dblk;		/* Macro Block for DCT calculations */
/*	memory for a Group of Blocks	*/
typedef	M_blk	GOB_mem[MBinGOB];
/*	frame memory as Macro Block unit	*/
typedef	M_blk	Frm_mem[NUM_GOB][MBinGOB];
/*	frame memory for 2 dimensional area	*/
typedef	int	F_mem[SIZE_V][SIZE_H];
typedef	int	Fc_mem[SIZE_V2][SIZE_H2];
/*	frame memory for 2 dimensional area (unrestricted MV mode)	*/
typedef	int	Ref_mem[2 * (SIZE_V + OUT_PELS)][2 * (SIZE_H + OUT_PELS)];
typedef	int	Refc_mem[2 * (SIZE_V2 + OUT_PELS)][2 * (SIZE_H2 + OUT_PELS)];
typedef	struct	vop_conf {
  int	h_pels;		/* number of pixels per line (Luminance) */
  int	hc_pels;	/* number of pixels per line (Chrominance) */
  int	v_pels;		/* number of active lines (Luminance) */
  int	vc_pels;	/* number of active lines (Chrominance) */
  int	num_GOBs;	/* number of GOB (Group of Block) in a frame */
  int	num_h_GOB;	/* number of GOB in a frame for horizontal direction */
  int	num_v_GOB;	/* number of GOB in a frame for vertical direction */
  int	num_MB_GOB;	/* number of MBs (Macro Blocks) in a GOB */
  int	num_h_MB;	/* number of MBs in a GOB for horizontal direction */
  int	num_v_MB;	/* number of MBs in a GOB for vertical direction */
  int	cols_MB;	/* horizontal number of MBs in a frame */
  int	rows_MB;	/* vertical number of MBs in a frame */
  int	num_MBs;	/* total number of MBs in a frame */
  int	num_MB_len;	/* code length necessary for numbering of MBs */
  int	all_pels;	/* number of total pixels in a frame */
  int	allc_pels;	/* number of total pixels in a frame (U,V) */
}	VOP_conf;		/* VOP configuration (size, format) */

typedef	struct	vol_conf {
  int	VOL_id;		/* Video Object Layer ID */
  int	VOL_shape;	/* shape of Video Object Layer (rect., binary, gray) */
  int	quant_type;	/* type of Quantizer (H.263, MPEG1/2) */
  int	error_res;	/* error resilient disable */
  int	acdc_pred;	/* Intra AC/DC prediction disable */
  int	deblock_filt;	/* deblocking filter disable */
  int	multi_pred;	/* multi mode warping prediction disable (P6 only) */
  int	fcode_fwd;	/* VOL f_code forward */
  int	fcode_bwd;	/* VOL f_code backward */
  int	sepa_mst;	/* separate motion_shape_texture */
  int	scalable;	/* scalability */
}	VOL_conf;		/* VOL configuration (flags, options) */

#define	LEN_EOB	2	/* End of Block code length */

/* define start code identifier values */
#define	VS_START	0xB0
#define	VS_END	0xB1
#define	VOP_START	0xC0
/* define VOP (picture) prediction type identifier values */
#define	I_VOP_ID	0
#define	P_VOP_ID	1
#define	B_VOP_ID	2
/* define resynchronization marker for error resilience */
#ifndef	ER_RESYNC
#define	ER_RESYNC	0x00008000L
#endif
#ifndef	RESYNC_LEN
#define	RESYNC_LEN	17
#endif

/* define coding modes of Macroblocks */
#define	INTER_MB	0		/* Inter 	*/
#define	INTERQ_MB	1		/* Inter+Q	*/
#define	INTER_4V	2		/* Inter 4V 	*/
#define	INTRA_MB	3		/* Intra	*/
#define	INTRAQ_MB	4		/* Intra+Q	*/
#define	INTER_MV0_MB	5		/* None		*/

#define	BILINEAR_MB	6		/* bilinear	*/
#define	BGMC_MB		7		/* BGMC		*/
#define	AVERAGE_MB	8		/* Averaged MB	*/


/*	define macro to round floating point number to integer	*/
#define	round( x )	( ( (x) < 0 ) ? (x) - 0.5 : (x) + 0.5 )	/* argument parenthesized so expressions expand safely */

>>>>>>>>>>>>>>>>>>>>>>>>>	huff_ext.h
#include	"huff_code.h"

extern	B_ptr	*init_bstr();
extern	Boolean	open_bfile( char *bf_name );
extern	void	close_bfile();
extern	Boolean	write_bfile( int len );
extern	Boolean	read_bfile();
extern	int	tell_str_len();
extern	int	tell_str_len_bit();
extern	void	VLC_copy( B_ptr *c_src, B_ptr *c_dst );
extern	int	put_zero( B_ptr *c_buf );
extern	int	put_phead( int t_ref, int p_type, int p_quant, B_ptr *c_buf );
extern	int	put_ghead( int gnum, int g_quant, B_ptr *c_buf );
extern	int	put_cod_bit( int cod_ind, B_ptr *c_buf );
extern	int	put_mcbpc( int intra_flag, int m_type, int cbpc,
			  int m_quant, B_ptr *c_buf );
extern	int	put_cbpy( int intra_flag, int cbpy, B_ptr *c_buf );
extern	int	put_ptrn( int b_ptrn, B_ptr *c_buf );
extern	int	put_mvd( int mv_dif_x, int mv_dif_y, int f_code,
			B_ptr *c_buf );
extern	int	put_dc_intra( int c_kind, int dc_val, int acdc_flag, B_ptr *c_buf );
extern	int	put_dct_coef( int intra_flag, int order,
			     int last, int run, int level, B_ptr *c_buf );
extern	int	search_start( B_ptr *c_buf, int hunt_flag );
extern	int	get_phead( int *t_ref, int *p_type, int *p_quant,
			  B_ptr *c_buf, int head_flag );
extern	int	get_ghead( int	*g_quant, B_ptr *c_buf );
extern	int	get_cod_bit( B_ptr *c_buf );
extern	int	get_mcbpc( int intra_flag, int *m_type, int *cbpc,
			  int *m_quant, B_ptr *c_buf );
extern	int	get_cbpy( int intra_flag, int *cbpy, B_ptr *c_buf );
extern	int	get_mvd( int *mv_dif_x, int *mv_dif_y, B_ptr *c_buf );
extern	int	get_ptrn( int *b_ptrn, B_ptr *c_buf );
extern	int	get_dc_intra( B_ptr *c_buf );
extern	int	get_dct_coef( int intra_flag, int order, int *last, int *run,
			     int *level, B_ptr *c_buf );

extern	B_ptr	*bit_str_ptr;
extern	B_ptr	*wrk_str_ptr;
extern	int	Error_Status;			/* indicate Error Status */


>>>>>>>>>>>>>>>>>>>>>>>>>	st_ext.h
/************************************************/
/*	Variables to calculate statistics	*/
/*	 as result, (external declaration)	*/
/*	 for video coding simulation program,	*/
/*	 according to CCITT reference model.	*/
/************************************************/

extern	double	MV_step;	/* Mean value of step size */
extern	double	MV_NZC;	/* Mean value of number of non-zero coefficients */
extern	double	MV_ZC;	/* Mean value of number of zeroes before the last NZ */
extern	int	BM_Fix;		/* Block type of MACRO (Fixed) */
extern	int	BM_Intra;	/* Block type of MACRO (Intra) */
extern	int	BM_IntraQ;	/* Block type of MACRO (Intra + Q) */
extern	int	BM_Inter;	/* Block type of MACRO (Inter) */
extern	int	BM_InterQ;	/* Block type of MACRO (Inter + Q) */
extern	int	BM_MC_4V;	/* Block type of MACRO (Inter 4V) */
/***** added by N.Ema *****/
extern	int	BM_Affine;	/* Block type of MACRO (Inter) */
extern	int	BM_Affine4;	/* Block type of MACRO (Inter + Q) */
extern	int	BM_BGMC;	/* Block type of MACRO (Inter 4V) */
/**************************/
extern	int	BY_Fix;		/* Block type of Y (Fixed) */
extern	int	BY_Intra;	/* Block type of Y (Intra) */
extern	int	BY_FMC;		/* Block type of Y (Fixed MC) */
extern	int	BY_Coded;	/* Block type of Y (Coded) */
extern	int	BY_CMC;		/* Block type of Y (Coded MC) */
extern	int	BC_Fix;		/* Block type of C (Fixed) */
extern	int	BC_Intra;	/* Block type of C (Intra) */
extern	int	BC_Coded;	/* Block type of C (Coded) */
extern	int	NB_Mat;		/* Number of bits (Macro attr.) */
extern	int	NB_EOB;		/* Number of bits (End of Block) */
extern	int	NB_MV;		/* Number of bits (Motion vector) */
extern	int	NB_coY;		/* Number of bits (Coeff. Y) */
extern	int	NB_coU;		/* Number of bits (Coeff. U) */
extern	int	NB_coV;		/* Number of bits (Coeff. V) */

    
361.5. "You really should modify the source" by KAMPUS::NEIDECKER (EUROMEDIA: Distributed Multimedia Archives) Wed Jan 29 1997 10:14

    This looks like an overly complicated function for doing the normal
    sum_of_absolute differences that is used for motion estimation in
    about all of the ISO algorithms (MPEG-1,MPEG-2,H.261,H.263). My guess
    would be that this is for H.263+.
    
    A few unusual things:
    
    1) The macroblock buffers here contain "int", all optimized versions
       of this I know of use "unsigned char", processing them using byte
       vectors.
    
    2) Even if you fix the inner sum-of-absolutes loop you will lose
       a lot of performance in the overly complex address calculations for
       the searches.
    
    3) All optimized byte-vector versions of this thing use assembly
       language (and will continue to do so with PERR instructions).
       The instruction you need to make this fly right now is CMPBGE,
       PERR obviously will make it even faster.
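
    As a rough illustration of point 2, the 8x8 inner loops from the
    base note can walk row pointers instead of recomputing both array
    indices for every pixel. This is only a sketch: the name
    sad8x8_ptr, the stride parameters and the flat-array layout are
    assumptions made for the example, not the customer's code. The
    reference frame is stepped by 2 in both directions, as the
    original loop does with cx+=2 and cy+=2.

```c
#include <stdlib.h>

/* Hypothetical helper: sum of absolute differences over one 8x8 block.
 * cur points at the top-left pixel of the current block and has
 * cur_stride ints per row; ref points at the matching position in the
 * half-pel reference frame, which has ref_stride ints per row and is
 * sampled at every second pixel and every second row. */
static int sad8x8_ptr(const int *cur, int cur_stride,
                      const int *ref, int ref_stride)
{
    int sum = 0;
    int i, j;

    for (i = 0; i < 8; i++) {
        const int *c = cur;           /* row pointers are set up once per row */
        const int *r = ref;
        for (j = 0; j < 8; j++) {
            sum += abs(*c - *r);
            c += 1;                   /* next pixel in the current block */
            r += 2;                   /* every second pixel in the reference */
        }
        cur += cur_stride;            /* next row of the current block */
        ref += 2 * ref_stride;        /* every second row of the reference */
    }
    return sum;
}
```

    The search loops would then only compute one starting address per
    candidate position and hand it to the helper.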
    
    If the customer truly needs the value range of ints for the
    calculations (which I doubt, as all other ISO algorithms work
    in 8 bit), there isn't much beyond fixing the whole addressing
    stuff and doing "early outs": your code keeps summing "bdiff"
    and compares later whether it is a better match, instead of
    checking occasionally in-between whether this is ever going to
    be less than the existing optimum "mbdiff".
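
    A minimal sketch of such an early out, assuming for simplicity
    that both blocks are stored with unit pixel step (the function
    name sad8x8_early and its parameters are hypothetical; "mbdiff"
    is the best sum found so far, as in the posted code). The check
    is made once per row rather than once per pixel to keep the
    test cheap.

```c
#include <stdlib.h>

/* Hypothetical early-out variant of the 8x8 sum of absolute differences.
 * Returns the full sum if it stays below mbdiff; otherwise returns the
 * partial sum at the point where it reached or exceeded mbdiff, which
 * is enough for the caller to reject this candidate. */
static int sad8x8_early(const int *cur, int cur_stride,
                        const int *ref, int ref_stride, int mbdiff)
{
    int bdiff = 0;
    int i, j;

    for (i = 0; i < 8; i++) {
        for (j = 0; j < 8; j++)
            bdiff += abs(cur[j] - ref[j]);
        if (bdiff >= mbdiff)
            return bdiff;             /* can no longer beat the optimum */
        cur += cur_stride;
        ref += ref_stride;
    }
    return bdiff;
}
```

    The caller keeps the candidate only when the returned value is
    less than "mbdiff", exactly as it does with the full sum today.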
    
    We went through this exact exercise with another (?) Japanese
    company a few months ago for MPEG-2 motion estimation and sped
    up their code on a normal EV5 by a factor of 4 by rewriting this
    one routine.