Title: DECmcc user notes file. Does not replace IPMT.
Notice: Use IPMT for problems. Newsletter location in note 6187
Moderator: TAEC::BEROUD
Created: Mon Aug 21 1989
Last Modified: Wed Jun 04 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 6497
Total number of notes: 27359
6220.0. "mccdfw_alloc() very bad performance" by TAEC::WEBER () Wed Jan 25 1995 14:03
Can anyone help answer this topic?
This is on MCC development toolkit, V1.3
Thanks for any inputs,
Florence
From: VBORMC::"[email protected]" "Stephen Baker" 24-JAN-1995 21:38:42.65
To: taec::guivier, taec::weber
CC: [email protected], [email protected], [email protected]
Subj: Memory allocation in TeMIP (Ultrix)
Pascale/Florence,
I've changed Operating Systems and am now running TeMIP (v1.1) / MCC
(v1.3) on Ultrix. We have noticed some performance problems with our
AM.
Our code sets all the attributes of about 130 entities to "Attribute Not
Available". This is taking between 30 and 60 minutes!!
We ran our AM through the profiler and the most expensive routine is
mccdfw_alloc(). We ran some specific tests and found that the
performance of this routine is extremely poor.
Here is an example code fragment:
/* start */
MCC_T_CVR status = MCC_S_NORMAL;
struct MCCDFW_R_MEMORY_LIST *p_alloc_mem_list;
MCC_T_Unsigned32 new_buffer_size = sizeof(MCC_T_Descriptor);
void *pointer;
int i;

status = mccdfw_init_alloc_list(&p_alloc_mem_list);
for (i = 0; i < MAX; i++)
{
    status = mccdfw_alloc(&p_alloc_mem_list,
                          &pointer,
                          &new_buffer_size);
}
status = mccdfw_free_alloc_list(&p_alloc_mem_list);
/* end */
This code initialises an alloc mem list, allocates space for an
MCC_T_Descriptor MAX times, and then frees the alloc mem list.
As MAX increases, the time taken to execute increases significantly.
Here are some example runs.
   MAX       time
  1000      0.089
  2000      1.167
  4000      8.374
  8000     37.192
 16000    153.904
As you can see, execution time is not growing linearly. In fact it
looks as if it grows quadratically (each time MAX doubles, the time
roughly quadruples)!
I traced the calls to mccdfw_alloc and it appears that this routine
maintains a list of allocated memory blocks. Each time a new block of
memory is requested it appears to have to search through the list of
already allocated blocks! Time is proportional to the number of
already allocated blocks! This would explain the quadratic time
behaviour.
The result is that our AM has performance that is unacceptable to the
customer. We can't abandon alloc mem lists as they are part of the
required interface to the MCC framework.
We have re-implemented alloc mem lists locally and they run about 100
times faster. Unfortunately I cannot make our implementation 100%
compatible with the existing mccdfw_alloc code because I don't know
how they work internally.
Could you
a) confirm that my results are valid
b) inform us if there is a patch that corrects this problem
(Note that the versions of mccdfw_alloc work properly on OSF/1,
so I presume the problem was detected during the "port")
c) tell us how we could work around this
(If we had the source code for the mccdfw_alloc routines we could
re-write them)
Thanks for your help.
Regards,
Steve Baker
p.s.
The performance is currently so bad that we cannot ship our AM to the
customer (no sense in making the customer aware of MCC problems if we
can avoid it). As such we would appreciate a quick response to avoid
any delays that would affect the customer.
T.R | Title | User | Personal Name | Date | Lines
----|-------|------|---------------|------|------
6220.1 | Le operating system do what they have to do | SEISME::ANTEUNIS | Knowledge is a deadly tool, in the hands of fools (King Crimson) | Mon Feb 06 1995 12:33 | 40

Florence,
we kicked out the memory lists by setting some MAX_KEEP_thing to 0.
This boils down to letting the malloc() and free() functions do their job.
Of course there is the multi-thread issue, but CMA is perfectly capable of bringing
that under control as well.
The recommended way (regardless of what you find in DECmcc):
    when using VAX C: surround malloc() and free() with the same mutex
    so that only one thread at a time can use them.
    when using DEC C: make sure you compile with the multithread option,
    or (OpenVMS only) use the DECC$SET_REENTRANCY (MULTITHREAD)
    built-in (and not portable) function. From then on all your worries
    about malloc() and free() are gone, and the thing runs at a decent
    speed as well.
    when using yet another C compiler: if you can't verify it has built-in
    thread-safe behaviour, refer to the VAX C
    paragraph.
The main idea here is: "let compilers and operating systems do malloc/free. The engineers
that build compilers and operating systems are paid to do just that, with respectable
performance. If you think that in a particular situation you can do better, PROVE it."
I know that the DECmcc code does something different. But the people who coded it
considered themselves much more competent than the compiler and operating system people,
with the provable consequence that you mentioned in 6220.0.
Dirk
P.S. I am a fanatic about LIB$GET_VM with a ZONE (i.e. not using the default zone of 0).
In the specific case where you know in advance the size of the allocated stuff, it is far
superior in performance and fragmentation behaviour to any malloc I have ever seen. In case of
threads one needs to protect it with mutexes, because OpenVMS does not know about them.