Title: DECmcc user notes file. Does not replace IPMT.
Notice: Use IPMT for problems. Newsletter location in note 6187
Moderator: TAEC::BEROUD
Created: Mon Aug 21 1989
Last Modified: Wed Jun 04 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 6497
Total number of notes: 27359
6220.0. "mccdfw_alloc() very bad performance" by TAEC::WEBER () Wed Jan 25 1995 14:03
Can anyone help answer this topic?
This is on MCC development toolkit, V1.3
Thanks for any inputs,
Florence
From: VBORMC::"[email protected]" "Stephen Baker" 24-JAN-1995 21:38:42.65
To: taec::guivier, taec::weber
CC: [email protected], [email protected], [email protected]
Subj: Memory allocation in TeMIP (Ultrix)
Pascale/Florence,
I've changed Operating Systems and am now running TeMIP (v1.1) / MCC
(v1.3) on Ultrix. We have noticed some performance problems with our
AM.
Our code sets all the attributes of about 130 entities to "Attribute Not
Available". This is taking between 30 and 60 minutes!!
We ran our AM through the profiler and the most expensive routine is
mccdfw_alloc(). We ran some specific tests and found that the
performance of this routine is extremely poor.
Here is an example code fragment:
/* start */
MCC_T_CVR status = MCC_S_NORMAL;
struct MCCDFW_R_MEMORY_LIST *p_alloc_mem_list;
MCC_T_Unsigned32 new_buffer_size = sizeof(MCC_T_Descriptor);
void *pointer;
int i;

status = mccdfw_init_alloc_list(&p_alloc_mem_list);
for (i = 0; i < MAX; i++)
{
    status = mccdfw_alloc(&p_alloc_mem_list,
                          &pointer,
                          &new_buffer_size);
}
status = mccdfw_free_alloc_list(&p_alloc_mem_list);
/* end */
This code initialises an alloc mem list, allocates space for an
MCC_T_Descriptor MAX times, and then frees the alloc mem list.
As MAX increases, the time taken to execute increases significantly.
Here are some example runs.
   MAX       time
  1000      0.089
  2000      1.167
  4000      8.374
  8000     37.192
 16000    153.904
As you can see, execution time is not growing linearly. In fact it
looks as if it grows quadratically (each time MAX doubles, the time
roughly quadruples)!
I traced the calls to mccdfw_alloc and it appears that this routine
maintains a list of allocated memory blocks. Each time a new block of
memory is requested it appears to have to search through the list of
already allocated blocks! Time is proportional to the number of
already allocated blocks! This would explain the quadratic time
behaviour.
The result is that our AM has performance that is unacceptable to the
customer. We can't abandon alloc mem lists as they are part of the
required interface to the MCC framework.
We have re-implemented alloc mem lists locally and they run about 100
times faster. Unfortunately I cannot make our implementation 100%
compatible with the existing mccdfw_alloc code because I don't know
how they work internally.
Could you
a) confirm that my results are valid
b) inform us if there is a patch that corrects this problem
(Note that the versions of mccdfw_alloc work properly on OSF/1,
so I presume the problem was detected during the "port")
c) tell us how we could work around this
(If we had the source code for the mccdfw_alloc routines we could
re-write them)
Thanks for your help.
Regards,
Steve Baker
p.s.
The performance is currently so bad that we cannot ship our AM to the
customer (no sense in making the customer aware of MCC problems if we
can avoid it). As such we would appreciate a quick response to avoid
any delays that would affect the customer.
T.R | Title | User | Personal Name | Date | Lines
----|-------|------|---------------|------|------
6220.1 | Le operating system do what they have to do | SEISME::ANTEUNIS | Knowledge is a deadly tool, in the hands of fools (King Crimson) | Mon Feb 06 1995 12:33 | 40

Florence,
we kicked out the memory lists by setting some MAX_KEEP_thing to 0.
This boils down to letting the malloc() and free() functions do their job.
Of course there is the multi-thread issue, but CMA is perfectly capable of bringing
that under control as well.
The recommended way (regardless of what you find in DECmcc):
    when using VAX C: surround malloc() and free() with the same mutex
    so that only one thread at a time can use them.
    when using DEC C: make sure you compile with the multithread option,
    or (OpenVMS only) use the DECC$SET_REENTRANCY (MULTITHREAD)
    built-in (and not portable) function. From then on all your worries
    about malloc() and free() are gone, and the thing runs at a decent
    speed as well.
    when using yet another C compiler: if you can't verify it has built-in
    thread-safe behaviour, refer to the VAX C
    paragraph.
The main idea here is: "let compilers and operating systems do malloc/free. The engineers
that build compilers and operating systems are paid to do just that, with respectable
performance. If you think that in a particular situation you can do better, PROVE it."
I know that the DECmcc code does something different. But the people who coded it
considered themselves much more competent than the compiler and operating system people,
with the provable consequence that you mentioned in 6220.0.
Dirk
P.S. I am a fanatic about LIB$GET_VM with a ZONE (i.e. not using the default zone of 0).
In the specific case where you know in advance the size of the allocated stuff, it is far
superior in performance and fragmentation behaviour to any malloc I have ever seen. In case of
threads one needs to protect it with mutexes, because OpenVMS does not know about them.