How NOT to write a shared library

**CptPicard** · November 16th, 2007

Got to admit I feel his argument that "software should always save data with worst case scenario in mind in case of crash" is weak

I'd take checking for malloc == NULL any day over being totally paranoid about saving my user's data all the time.

The argument for not terminating from library is valid... maybe not in this case but it is...

**public_void** · November 16th, 2007

That was very informative by the Pulse Audio developer. So there are reasons to call exit in a library. I'd like to thank him for his good reply.

**j_g** · November 16th, 2007

Originally Posted by public_void

So there are reasons to call exit in a library.

I disagree completely, for the detailed reasons given in my previous posts. I feel that any serious application programmer, as well as enduser, also would agree that a situation where apps are abruptly terminated without their knowledge, permission, and recourse to save work, is "unreasonable" at best.

I am relieved that there are other developers here who have agreed.

**CptPicard** · November 16th, 2007

Originally Posted by j_g

situation where apps are abruptly terminated without their knowledge, permission, and recourse to save work, is "unreasonable" at best.

How do you plan on actually managing that in practice in a hard-OOM situation? On modern systems, you're going to need to malloc just to get out of the mess.

**Wybiral** · November 17th, 2007

Originally Posted by CptPicard

How do you plan on actually managing that in practice in a hard-OOM situation? On modern systems, you're going to need to malloc just to get out of the mess.

That's a common argument I hear on this situation. People often say "What does it matter, you're out of memory anyway... How can you recover?"

But there's a huge difference from being able to load a 1gb audio track to being able to allocate the small amount of memory required to say "Unable to load file (needs more memory)".

I don't know anything about pulse or how they're using it, but in general... If you have the possibility of reporting the error (instead of exiting the program) then I would suggest you do that.

The same goes with harddrive space... If I'm trying to save a file and my program bails because I ran out of space, I would be pretty upset. But if it says "Not enough disk space, please free some and try again." then I'm happy.

**j_g** · November 17th, 2007

Originally Posted by CptPicard

How do you plan on actually managing that in practice in a hard-OOM situation?

First of all, a decent operating system will not typically cause some sort of "kernel panic" as a result of a memory failure, and start terminating apps. (I realize the Pulse Audio author has implied otherwise, but that's because he doesn't want people to reject an excuse such as "Why should I bother doing proper error checking in my code? The Linux kernel does worse anyway.". That's a straw man argument). Such a situation is only a last resort, in extreme situations. If a memory allocation fails, a decent OS will first and foremost try to return an error code to the app. Let me repeat that: a decent OS will first and foremost try to return an error code to the app. In the case of malloc, it returns 0.

I'm going to stress this next point as well, because I don't want people to be misled by a self-serving, and misleading, implication.

An app should ALWAYS assume that malloc may return a 0, and in such a case, properly deal with that. If someone ever tells you that it's "pointless" to try to gracefully handle a malloc failure, do not believe it.

An app should never assume that a failed call to malloc will result in the app being terminated. (Well, unless you're calling malloc via Pulse Audio's pa_xalloc. Then, assume crash position). Properly dealing with the situation involves exhausting every possibility to preserve the user's work, and keeping your app running. Improperly dealing with the situation would be doing something like calling exit() right then and there, especially doing it in a shared lib that is to be used as a major component of a distro, such as the audio core, and therefore used by many other apps.

Think about it. How many apps have you written that have a direct, or indirect, call to malloc somewhere? Pretty much every app, right? If every failed call to malloc typically resulted in the caller being abruptly terminated, that would be a really, really, really crummy, useless OS. Would anyone use such an OS to run their servers? Would anyone use it for any serious work? I very much doubt it.

So why do people use Linux to run their servers? Because Linux typically does not terminate an app when a call to malloc fails. It's not a totally crummy, useless OS (despite what certain programmers would have you believe, in an effort to excuse their own bad coding).

So, given that you should expect a malloc failure to return 0, in answer to the question "how does an app deal with an OOM condition (with malloc)?", I say:

Check for a zero return, and try to recover from it in some way. See my post in the thread "Why proper error handling should ALWAYS be done" for an anecdotal example, and a much more detailed discussion of what I mean by "recover".

P.S. Read Wy's post as well for a sober, and more reasonable response than "Why should I do proper error checking in my code? The Linux kernel does worse anyway".

**Wybiral** · November 17th, 2007

I decided to test this out on my own computer using this code:

Code:

#include <stdio.h>
#include <stdlib.h>

#define MB (1024 * 1024)

int main(int argc, char *argv[])
{
    char *mem = NULL;
    long long unsigned int size = MB * 512;
    for(;;)
    {
        if((mem = malloc(size)) != NULL)
        {
            free(mem);
            printf("Successfully allocated and freed %llu bytes!\n", size);
        }
        else
        {
            printf("Unable to allocate %llu bytes. Resize and try again!\n", size);
            break;
        }
        size += MB * 512;
    }
    return 0;
}

Here's the results:

Code:

Successfully allocated and freed 536870912 bytes!
Successfully allocated and freed 1073741824 bytes!
Successfully allocated and freed 1610612736 bytes!
Successfully allocated and freed 2147483648 bytes!
Unable to allocate 2684354560 bytes. Resize and try again!

It worked! It properly failed and reported it...

BTW, I only have 512mb ram. So this issue would actually come up if I were having to use too intensive of an application. I would much rather it give me a message like this than bail without saving my work.

But like I said, I haven't looked in the pulse code so I have no idea what context it's being used. But the generic argument that it's pointless not to bail from a failed malloc is wrong. Please don't write code that is going to destroy peoples work without warning.

**j_g** · November 17th, 2007

Originally Posted by Wybiral

It worked!

Oh, I knew that a failed call to malloc typically does not cause the kernel to terminate an app, and I've been telling folks this in previous posts. I've already had a situation where I received back a 0 from malloc on Linux. (Made a mistake in the number of bytes I told it to allocate). I knew that when the Pulse Audio author said spending much time thinking about handling malloc() returning NULL is generally a waste of it, he was wrong, wrong, wrong. And then he was wrong too. Several other statements meant to suggest that a failed malloc call may terminate an app anyway, so why worry about it, are a straw man argument.

But some people who didn't know any better believed it. Why? Because he had some software bundled with Fedora? Good for him. Congratulations. But his statements are still wrong.

And he still shouldn't be putting an exit() in a shared lib, especially one that is being used as Fedora's "audio core". He needs to stop excusing that, saying things that make other programmers misunderstand what really happens with a malloc failure, and redesign his code. He should also be very careful what he says. I've read some messages on lists where he talks about latency, and other features, of Pulse Audio being superior to other APIs like ALSA. Don't anyone kid themselves. This guy wants Pulse Audio to be taken seriously for important audio use, and be chosen over other APIs for all audio use. But then he says things like audio software should be the first one to be killed (terminated), audio *never* is a crucial component of the system, and If audio crashes due to what reason whatsoever all you lose is that your music stops to play -- but no data is lost. Now imagine you're the developer of RoseGarden, or Audacity, or Ubuntu Studio, or any other software used by people to do serious audio work. What are you going to think about statements like that? Mark my words. Those will come back to haunt him. Although lots of people have now seen that page, I wouldn't be surprised if those statements mysteriously "disappear" at some point.

Look, I don't have anything against the guy. I have read some things about Pulse Audio, and there are actually some pretty good ideas in it. (But, I haven't looked over all the particular code, so I'm definitely not yet prepared to say that I think his particular implementation is good). And there may be various code in Pulse Audio that I'd say "that's done well". But not in this particular instance. Not in pa_xmalloc. Not with that call to exit(). Not with other Pulse Audio functions directly, and indirectly, winding their way through that code. And his closing of a "bug report" about it as "invalid" only hours after the report, coupled with various statements he made in his explanation, means that I personally would never use his code (nor any update to the code) without a complete security audit to see if there was any unacceptable design in the code.

I haven't looked in the pulse code so I have no idea what context it's being used.

Here it is in a nutshell. He has a function called pa_xmalloc. It's a wrapper around malloc. When malloc returns a 0 (um... why is even checking for 0 after he said that handling malloc() returning NULL is generally a waste of it... like I said, his whole argument just seemed illogical to me), he calls exit(). (Ok, he pedantically insisted that he doesn't call exit. He uses a macro. But it defines to a call to abort() and exit(). Don't believe it? Take your example and modify it to call pa_malloc as it exists now. You will NOT see that last message. Hell, the guy even documents pa_xmalloc as "on OOM, terminate". Since when is terminate not terminate??? I'd say that his explanation is not as forthright as it could be).

And then other Pulse Audio functions also call pa_xmalloc. Plus, he documents it as part of the API, so presumably, he expects apps to also call it, although I would definitely not encourage that.

**j_g** · November 17th, 2007

Originally Posted by Wybiral

I decided to test this out on my own computer

Wy, do me a favor. (I'm asking you to do it because there are certain people who apparently need to hear it from you too).

On that malloc failure, go ahead and ignore the zero return value. After all, we all know that spending much time thinking about handling malloc() returning NULL is generally a waste of it. So just ignore the fact that malloc returns 0 and go ahead and use that return, such as maybe store a value there:

Code:

*mem = 'A';

Tell us (read: anyone who hasn't used C in over 15 years) what happens when you don't worry about properly handling a zero return value from malloc.

And just for the sake of laughs, then try putting the following code in your app:

Code:

if (mem == 0) exit();

Tell us what difference that makes (ie, how much "better" the latter error handling is than the previous code), so we can compare how well all the above is versus the (proper) error handling of your original code.

I'm in the mood for a good laugh.

**aks44** · November 17th, 2007

It frightens me that there is the need to even explain this simple thing. In fact, what *really* scares me most is that renowned developers (RedHat, libxine authors, who else?...) knowingly decided to use a flawed library without regard to their end-user's stability.

So much for the "peer-review ensures quality code" adage...

I guess I'd better stop my rant here before it becomes unpleasant...

Thread: How NOT to write a shared library

Thread Tools

Display

Re: How NOT to write a shared library

Re: How NOT to write a shared library

Re: How NOT to write a shared library

Re: How NOT to write a shared library

Re: How NOT to write a shared library

Re: How NOT to write a shared library

Re: How NOT to write a shared library

Re: How NOT to write a shared library

Re: How NOT to write a shared library

Re: How NOT to write a shared library

Bookmarks

Bookmarks

Posting Permissions