| From: DEC:.REO.REOVTX::HUDSON "[email protected] - UK Software
Partner Engineering 830-4121" 25-FEB-1997 11:13:19.04
To: nm%vbormc::"[email protected]"
CC: HUDSON
Subj: re:ASAP Question:segv inside malloc
Hello Erwin Glassee
Thank you for your ASAP mail question on SEGV errors from malloc() calls.
I have seen many cases of programs that fail in calls to malloc/free/etc., and
they are almost always due to programming errors (I personally have never seen
a case where the problem was caused by a bug in the operating system libraries;
the last time I can find such a thing reported for OSF was fixed in version
2.1).
> Is this a bug
>in the standard libraries (after all, calling malloc should never
>cause a SEGV, should it ?) ? On all other platforms this code
>runs fine, and I have no idea what else I could try.
The routines that manipulate the heap typically use some kind of linked list
data structure to keep track of how much memory you have requested and
free'd, in an attempt to allow for efficient memory use. Different versions of
Unix may implement these routines in different ways, and so if your program
mistakenly overwrites data in the heap that it doesn't own, it doesn't
necessarily mean that subsequent malloc()'s will fail. But it might do.
For example, if you malloc() a 20 element array, and write into element 25,
then it could be that
- you won't notice anything wrong (no-one else is using that memory)
- some data structure of yours will be corrupted
- some data structure used by the heap routines will be corrupted
It is not possible to predict which of these will happen, as it depends on the
state of the heap, what other routines have been using it recently, etc.
Given that malloc() will be traversing a linked list to find suitable bits of
memory to give you, it is quite easy to see that a SEGV could occur if one of
the pointers it is following has been corrupted.
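As a rough illustration of the point above, here is a toy model in C. It is
only a sketch: the layout (a two-byte header in front of each block, free
blocks chained by index inside one small array) and all the toy_* names are
invented for this example, and real heap routines are laid out differently.
It shows how writing two bytes past the end of one block can destroy the link
that the allocator will follow later.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: an invented layout, not that of any real malloc(). */
#define ARENA_SIZE 64
#define NO_BLOCK   0xFFu        /* marks the end of the free list */

enum { HDR = 2 };               /* header: [0] = data size, [1] = next free */

static unsigned char arena[ARENA_SIZE];

/* Block A (in use, 8 data bytes) sits directly in front of free block B,
 * and B links to free block C. */
static void toy_heap_init(void)
{
    memset(arena, 0, sizeof arena);
    arena[0]  = 8; arena[1]  = NO_BLOCK;  /* A at 0, data in arena[2..9] */
    arena[10] = 8; arena[11] = 20;        /* B at 10, next free block is C */
    arena[20] = 8; arena[21] = NO_BLOCK;  /* C at 20, end of the list */
}

/* Follow the free list from `head`. Returns the number of blocks visited,
 * or -1 if a link leads outside the arena or the list never ends. In a real
 * allocator that is roughly where a wild pointer would be dereferenced. */
static int toy_walk_free_list(unsigned head)
{
    int count = 0;
    while (head != NO_BLOCK) {
        if (head + HDR > ARENA_SIZE || count > ARENA_SIZE)
            return -1;
        count++;
        head = arena[head + 1];
    }
    return count;
}

/* The application bug: fill block A with `n` bytes without checking that
 * A only has room for 8. */
static void toy_fill_block_a(size_t n, unsigned char value)
{
    assert(HDR + n <= ARENA_SIZE);  /* keep the toy inside its own array */
    memset(arena + HDR, value, n);
}
```

Filling block A with 8 bytes leaves the free list intact. Filling it with 10
bytes overwrites the header of block B, and the next walk of the free list
follows a garbage link. Nothing goes wrong at the moment of the bad write; the
failure only shows up later, inside the allocator.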
>At first I thought this could be caused by calling a mixture of
>C malloc and C++ new, but I introduced a global new operator that
>calls malloc, and the behaviour hasn't altered.
The C++ global "new" operator typically uses the standard malloc() underneath.
There is no problem mixing "new" and "malloc()" in the same program; the only
restriction is that you can't mix them for the same piece of memory.
E.g. if you "new" something, you must use "delete" when you've finished with
it, and if you "malloc" something, you must use "free". You can't do:
    int *a = (int *)malloc(sizeof(int));
    delete a; // !! wrong, a came from malloc() so it needs free() !!
>Could you suggest something else that can be wrong ?
The most common things that could be happening are:
1) using more memory than you've allocated
e.g.
    int *a = new int[10];
    a[10] = 1; // !! wrong, max subscript is 9
2) using a pointer to memory which you've released
e.g.
    int *a = new int[10];
    myfunc(a);
    delete[] a;
    a[1] = 1; // !! wrong, a now points to released memory
3) free'ing memory more than once
e.g.
    int *a;
    a = (int *)malloc(sizeof(int));
    free(a);
    free(a); // !! wrong, a already free'd
All of the above will compile with no errors. At runtime, you may get a
problem or you may not; it is impossible to predict.
Unfortunately these errors are not always so easy to spot in a real
application. So usually you will need to do some extra work.
There are a couple of things I would suggest. One thing that some people do is
to incorporate extra error checking. For example, in the case of C, you could
define your own versions of malloc() and free(), which keep a log of which
memory your application has been using, so that when the program fails you
could look in the log for any obvious discrepancies.
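A minimal sketch of such logging wrappers in C might look like the following.
The names (log_malloc, log_free, MALLOC, FREE, MAX_LIVE) are made up for this
example, the log goes to stderr, and the table of live blocks has a fixed
size, so treat it as a starting point rather than a finished tool.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_LIVE 1024           /* assumed upper bound on live blocks */

struct live_block { void *ptr; size_t size; };
static struct live_block live[MAX_LIVE];
static int log_error_count;

/* Allocate, remember the block, and write a log line with the call site. */
void *log_malloc(size_t size, const char *file, int line)
{
    assert(file != NULL);
    void *p = malloc(size);
    if (p == NULL)
        return NULL;
    for (int i = 0; i < MAX_LIVE; i++) {
        if (live[i].ptr == NULL) {
            live[i].ptr = p;
            live[i].size = size;
            fprintf(stderr, "%s:%d: malloc(%lu) -> %p\n",
                    file, line, (unsigned long)size, p);
            return p;
        }
    }
    /* Table full: report it and behave like a failed allocation. */
    fprintf(stderr, "%s:%d: allocation log is full\n", file, line);
    free(p);
    return NULL;
}

/* Free only blocks that the log knows are live; report anything else. */
void log_free(void *p, const char *file, int line)
{
    if (p == NULL)
        return;
    for (int i = 0; i < MAX_LIVE; i++) {
        if (live[i].ptr == p) {
            live[i].ptr = NULL;
            fprintf(stderr, "%s:%d: free(%p)\n", file, line, p);
            free(p);
            return;
        }
    }
    log_error_count++;
    fprintf(stderr, "%s:%d: free(%p) of a block that is not live\n",
            file, line, p);
}

/* Number of blocks allocated but not yet freed (possible leaks). */
int log_live_count(void)
{
    int n = 0;
    for (int i = 0; i < MAX_LIVE; i++)
        if (live[i].ptr != NULL)
            n++;
    return n;
}

/* Number of suspicious free() calls seen so far. */
int log_errors(void) { return log_error_count; }

/* Call these instead of malloc()/free() to record the call site. */
#define MALLOC(n) log_malloc((n), __FILE__, __LINE__)
#define FREE(p)   log_free((p), __FILE__, __LINE__)
```

With this in place, a second FREE() of the same pointer, or a FREE() of a
pointer that never came from MALLOC(), shows up in the log with its file and
line instead of silently damaging the heap. One limitation: if the allocator
hands the same address out again before the second free, the log cannot tell
the difference.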
Another approach that some people use is to try and make the program fail very
soon after the illegal operation (in your case, it may be that the SEGV is
happening many millions of instructions after the error occurred, which
complicates the investigation). What you could do is ensure that whenever you
malloc() memory, you fill it with a known pattern, say "0xff". When you free
memory, you fill it with another known pattern, say "0xee" just before you free
it. In a correct program this makes no difference, but since many heap
problems involve pointers stored in the heap, a stale or uninitialised pointer
that now holds "0xeeee..." or "0xffff..." is much more likely to cause a fault
quickly.
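One way to sketch the fill-pattern idea in C is shown below. The names
(fill_malloc, fill_free, fill_header) are invented for this example. Because
free() is not told the size of a block, the sketch stores the size in a small
header in front of the memory it hands out; that header is an assumption of
this sketch, not something the standard library provides.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define FILL_ALLOC 0xff         /* pattern for freshly allocated memory */
#define FILL_FREE  0xee         /* pattern written just before free() */

/* Header placed in front of each block. The extra union members are there
 * only to keep the caller's data suitably aligned. */
typedef union {
    size_t      size;
    long double align_ld;
    long long   align_ll;
    void       *align_p;
} fill_header;

/* Byte-by-byte fill through a volatile pointer, so that the compiler does
 * not drop the stores to memory that is about to be freed. */
static void fill_bytes(void *p, unsigned char value, size_t n)
{
    assert(p != NULL || n == 0);
    volatile unsigned char *bytes = (volatile unsigned char *)p;
    for (size_t i = 0; i < n; i++)
        bytes[i] = value;
}

/* Like malloc(), but the returned memory is filled with FILL_ALLOC. */
void *fill_malloc(size_t size)
{
    if (size > (size_t)-1 - sizeof(fill_header))
        return NULL;            /* size + header would overflow */
    fill_header *h = (fill_header *)malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    fill_bytes(h + 1, FILL_ALLOC, size);
    return h + 1;
}

/* Like free(), but the block is overwritten with FILL_FREE first.
 * Only pass pointers that came from fill_malloc(). */
void fill_free(void *p)
{
    if (p == NULL)
        return;
    fill_header *h = (fill_header *)p - 1;
    fill_bytes(p, FILL_FREE, h->size);
    free(h);
}
```

A pointer read from memory that was never initialised then has every byte set
to 0xff, and one read from a released block has every byte set to 0xee. Either
value is very unlikely to be a valid address, so the program tends to fail
close to the faulty line rather than much later.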
The other thing to look at is a tool called "Third Degree", which is one of the
Atom tools. Third Degree will keep track of when you are misusing the heap,
and will give you a list of all the things your application is doing wrong
(e.g. it will catch any of the 3 scenarios I mentioned above). This tool is
documented in the "Programmer's Guide", chapter 7 of the Unix documentation
set. It is a very powerful tool, but you may find it picks up a lot more
errors than you were expecting (including some memory leaks in the standard
libraries), so you might spend a long time fixing everything it reports!
I hope this information is helpful to you
Regards
Nick Hudson
Digital Software Partner Engineering.
|
| From: VBORMC::"[email protected]" "Erwin Glassee" 26-FEB-1997 08:46:57.84
To: "[email protected] - UK Software Partner Engineering 830-4121
25-Feb-1997 1113 +0000" <[email protected]>
CC:
Subj: Re: ASAP Question:segv inside malloc
[email protected] - UK Software Partner Engineering 830-4121
25-Feb-1997 1113 +0000 wrote:
>
> Hello Erwin Glassee
>
> Thank-you for your ASAP mail question on SEGV errors from malloc() calls.
>
> The routines that manipulate the heap typically use some kind of linked list
> data structure to keep track of how much memory you have requested and
> free'd, in an attempt to allow for efficient memory use. Different versions of
> Unix may implement these routines in different ways, and so if your program
> mistakenly overwrites data in the heap that it doesn't own, it doesn't
> necessarily mean that subsequent malloc()'s will fail. But it might do.
> I hope this information is helpful to you
>
> Regards
>
> Nick Hudson
> Digital Software Partner Engineering.
I was able to track the problem down, and it was indeed due to
allocating insufficient memory for an array. On DEC Alpha, the end
of the array must have been located in the free block list of the
malloc routines, while this was not the case on the other platforms,
which would explain the symptoms of the problem.
Thank you very much for your help and your suggestions.
Regards
Erwin.
--
Erwin Glassee [email protected]
tel. +32/16/384527 (direct) or +32/16/384500 fax +32/16/384550
LMS Numerical Technologies, Interleuvenlaan 70,
Researchpark Haasrode Z1, B-3001 Leuven, Belgium
|