Page 1 of 5 123 ... LastLast
Results 1 to 10 of 45

Thread: Why proper error handling should ALWAYS be done

  1. #1
    Join Date
    Feb 2007
    Beans
    236

    Why proper error handling should ALWAYS be done

    Let's start off with an anecdote. I once used some shared lib that employed a call to longjmp() to handle error conditions. As usual with a technique like that, I found that the lib was leaking memory and resources. So I had a conversation with the original author that went like this:

    ==========================

    ME: Why did you use longjmp() to handle errors?

    HIM: I considered it to be the best design decision under the circumstances.

    ME: I'm not sure what circumstances you mean, but why not try to add some resource tracking so you can recover better when that longjmp happens?

    HIM: It would have been pointless.

    ME: Why??

    HIM: Because my code uses some other code that also doesn't do proper error handling. It leaks memory, and also doesn't even check whether malloc returns a nul pointer. So I figured that, as long as I'm inheriting all that behavior, I may as well not worry about whether my code does the same things.

    ==========================

    I got so mad about it that I totally rewrote his code, jettisoning the other poorly designed code he was using, ripping out longjmp altogether, and doing proper error handling. The net result is that it stopped leaking resources for me, and all the other people who used my rewrite.

    Now back to the present, we have the following reply (posted off-topic in another thread):

    Quote Originally Posted by pmasiar
    "note that due to overcommiting on Linux OOM errors are generally not handled by malloc() returning NULL, but by killing processes via the OOM killer. So spending much time thinking about handling malloc() returning NULL is generally a waste of it."

    ...they had a good laugh out of it
    Yes, it's true that Linux runs the risk of abruptly terminating apps, and potentially losing very important data, under low memory conditions. (As an aside, I have personally experienced running Win32 software under low memory conditions. The OS properly allowed the app to become aware of the condition, and to continue running. I consider Windows to be more robust in this scenario).

    But just because the OS may abruptly kill an app (and potentially any work the user has done with that app) is no more an acceptable explanation for not doing proper error handling, than it was in the case of that anecdote above.

    Now, when an app abruptly terminates and I lose some work, I get mad. I have no doubt nearly every other enduser has the same reaction. If it's due to a bug (ie, unintentional programming error) in the software, I try to be understanding. I'm a programmer too, and we all make mistakes. (On the other hand, most endusers don't make that distinction. If a newcomer to Linux finds the software he's using to be unstable, regardless of why, he's going to say "Linux is a piece of ____, and I'm going back to Windows and spend the next couple months telling everyone I know that Linux is a piece of ____". That's how it is. If you don't know this to be true, you've not dealt with endusers).

    But when I lose my data, tell the author of the software about it, and he "laughs" about it, telling me he designed it that way and he doesn't think it's a problem.... well, I'm going to say it, and I don't care what some overly repressive moderator thinks... I'd like to shoot the guy. I suspect that the vast majority of endusers would harbor the same inclination.

    Now, the Pulse Audio authors have already dismissed (with frankly little regard) this qualm about their design. (And that's a very good reason for someone to want to dismiss Pulse Audio when it's time to consider adding a flawed design as a major component to a distro. I can say, after doing my own audit of ALSA, that ALSA does NOT have an intentional design flaw that terminates a calling app. It is a vastly more robust audio system for that reason alone). But this is more than just about Pulse Audio.

    Programmers, do endusers a big favor, and at all costs, try to avoid using software that has such design flaws. And resist the urge to blame your own design flaws upon someone else. Take responsibity for your own coding. Endusers don't want to hear someone pass the buck, nor care about the reasons why their app just went up in smoke, along with their work. They're going to get mad. If you just "laugh" about it, that makes matters even worse.

    P.S. I'll point out that, Linux does not always handle OOM conditions by terminating an app. Well, it does under Pulse Audio. Why? Simply because Pulse Audio calls exit() regardless to handle OOM. I assume we're supposed to laugh about that. Ha. Ha. Joke's on us!
    Last edited by j_g; November 16th, 2007 at 09:24 PM.

  2. #2
    Join Date
    Aug 2006
    Location
    60°27'48"N 24°48'18"E
    Beans
    3,458

    Re: Why proper error handling should ALWAYS be done

    As far as I understood it, on Linux, malloc itself tends to kill the process and does not return NULL, right?
    LambdaGrok. | #ubuntu-programming on FreeNode

  3. #3
    Join Date
    Jun 2006
    Location
    CT, USA
    Beans
    5,267
    Distro
    Ubuntu 6.10 Edgy

    Re: Why proper error handling should ALWAYS be done

    I just copy-pasted comments by pulseaudio developers on http://pulseaudio.org/ticket/158 so please don't pretend I said it and quote properly, if you are so touchy about what you said. See here
    And please don't shoot developers - they are people, too
    Last edited by pmasiar; November 16th, 2007 at 09:25 PM.

  4. #4
    Join Date
    Feb 2007
    Beans
    236

    Re: Why proper error handling should ALWAYS be done

    Quote Originally Posted by CptPicard View Post
    As far as I understood it, on Linux, malloc itself tends to kill the process and does not return NULL, right?
    Unfortunately, yes. This is considered to be a very, very bad thing. So as much as possible, it is to be avoided. A robust operating system seeks to avoid this in every possible way.

    NOTE: Linux does NOT always fail to inform the app if it can't fullfill the allocation request. In fact, it tries hard NOT to do this. (Whether it does this better or worse than other operating systems can only be discovered through rigorous testing). So do not assume that you should NOT check for a 0 return, and that you CANNOT recover from a failure to allocate memory. You should always assume that you'll get a 0 back from malloc (if it can't allocate your memory), and that your app will NOT be terminated by the operating system, and you can go on to deal with the situation... Well, unless you use Pulse Audio. Then your app terminates because that's how Pulse Audio was designed, and the author made a concious decision to do it this way, and believes it's the right decision. Why? He frankly admits it. He thinks "audio isn't important". Other things are more important, so your audio using app should just vanish in a puff of smoke to free up memory for other apps. (Of course, the enduser doesn't want to be informed that he's low on memory, so that he can manually close down other apps himself, saving his work. No, let Pulse Audio make that decision for you. You don't need THAT app, and THAT data, right?)

    So any app that uses audio, whether it be an app where audio is even incidental (ie, maybe it's a word processor that decides to make a beep noise when you enter an invalid value -- out of memory? -- oops. You're history, buddy). I'd advise anyone using Pulse Audio to turn off any audio features of all software that may be important to you.

    Let's all have a good laugh about it.
    Last edited by j_g; November 19th, 2007 at 02:41 PM.

  5. #5
    Join Date
    Jan 2006
    Beans
    Hidden!
    Distro
    Ubuntu 10.10 Maverick Meerkat

    Re: Why proper error handling should ALWAYS be done

    After reading the developer reply (thank you pmasiar for forming this bridge between us and the developers of PulseAudio) I understand the reason for a wrapper around malloc that kills the program.

    Since I tend to be pragmatic (and I like to gloat about my new system) ... I have 8GB of RAM and if malloc returns NULL, I have a bigger problem on my hands (before upgrade, I had 1GB of RAM and 3GB of swap, now it's just 8GB of RAM).

    If a program MUST have some memory to run but can't get it, why have the program commit sepuku? Have someone else do it painlessly to the program.

    Quote Originally Posted by j_g
    Unfortunately, yes. This is considered to be a very, very bad thing. So as much as possible, it is to be avoided. A robust operating system seeks to avoid this in every possible way.
    My question becomes, should malloc be a blocking system call?

    If I need 100MB of memory and can't do my function if it isn't there, what should I/OS do if there isn't 100MB free that I can use?
    I am infallible, you should know that by now.
    "My favorite language is call STAR. It's extremely concise. It has exactly one verb '*', which does exactly what I want at the moment." --Larry Wall
    (02:15:31 PM) ***TimToady and snake oil go way back...
    42 lines of Perl - SHI - Home Site

  6. #6
    Join Date
    Aug 2006
    Location
    60°27'48"N 24°48'18"E
    Beans
    3,458

    Re: Why proper error handling should ALWAYS be done

    Seconded... failing memory allocations usually happen, say, somewhere deep in data structures. If you fail, those structures are usually left in an unusable state, and pushing up something like an OOM exception for explicit upper-level handling would be ugly.

    From the OS perspective, running out of resources neccessitates killing off something so it might just as well kill the process that failed the allocation.
    LambdaGrok. | #ubuntu-programming on FreeNode

  7. #7
    Join Date
    Jun 2006
    Location
    CT, USA
    Beans
    5,267
    Distro
    Ubuntu 6.10 Edgy

    Re: Why proper error handling should ALWAYS be done

    Quote Originally Posted by j_g View Post
    Unfortunately, yes. This is considered to be a very, very bad thing. So as much as possible, it is to be avoided. A robust operating system seeks to avoid this in every possible way.
    In wikipedia, they would add [citation requested] tag Here, we can ask: Considered by whom? In what circumstances?

    Personally I don't care about audio at all, so if audio went dead to make sure something more important could run, I am fine with that and don't even want to be bothered. I assume I can restart it later, no?

  8. #8
    Join Date
    Aug 2006
    Location
    60°27'48"N 24°48'18"E
    Beans
    3,458

    Re: Why proper error handling should ALWAYS be done

    Your audio wouldn't go dead. Your entire app would

    Damn, it's tough being able to appreciate both sides of the argument...
    LambdaGrok. | #ubuntu-programming on FreeNode

  9. #9
    Join Date
    Feb 2007
    Beans
    236

    Re: Why proper error handling should ALWAYS be done

    Quote Originally Posted by slavik View Post
    should malloc be a blocking system call?
    Absolutely not (well, not unless there's a way to disable blocking. And even then, it should be off by default). Being able to allocate memory is probably the most basic function that an app needs to do. And this may indeed be done in a situation where some important process needs to complete. So, malloc should not block an app. If malloc can't fullfill the request, it should "immediately" return 0, and not terminate the app, and let the app decide how it wants to proceed.

    Note I put "immediately" in parentheses. This is because, typically malloc is a very time-consuming call. It takes awhile for it to complete. And note that it's almost a guarantee that, on a multi-tasking OS, malloc will have some sort of "memory resource arbitration". After all, malloc somehow has to manage a list of available free memory, and you can't have two threads simultaneously monkeying with that list. That's like having two cooks making one cake, blindfolding them, and giving them both the same set of ingredients. They're probably going to mess up each other's actions. So malloc itself may indeed "serialize" access to the list. In other words, it may indeed block your app. But it should ideally do so for the shortest possible amount of time, and absolutely not indefinitely. (I'm trying to explain this in a really simple, clear way. It's very complicated).

    If I need 100MB of memory and can't do my function if it isn't there, what should I/OS do if there isn't 100MB free that I can use?
    The OS (essentially, malloc) should do exactly what I describe above -- not terminate you, and instead return 0 immediately.

    You should do whatever you think is the best way to recover. For example, maybe plan ahead. I'll give you an anecdote to illustrate.

    I once saw this error handling function in a shared lib. The function took an error number, and it returned a nul-terminated error message (for that error number). For example, if you passed an error number of 5, it would return the nul-terminated string "Out of memory". This is so that the app could display an error message to the enduser.

    Now, the guy decided that he wanted to have error messages in various languages (ie, english, spanish, republican, democrat, etc). So he put all of the english messages in one text file. Then, he put all of the corresponding spanish messages in another text file. He used the computer's locale to decide which text file to reference.

    This isn't the actual code, but let's say it was something like this. (NOTE: I'm showing half-finished untested code without error-checking, just to move things along):

    Code:
    char * get_error_message(int error)
    {
       FILE *fh;
       char *str;
       char buf[256];
    
       // Open the text file
       fh = fopen(MyTextFilename, "r");
    
       str = 0;
    
       // Go through the lines, to find the line corresponding
       // to the error number
       do
       {
          str = fgets(buf, sizeof(buf), fh);
       }  while (error-- && str);
    
       // Make a copy of the string
       if (str)
       {
           char *temp;
    
           temp = malloc(strlen(str) + 1);
    
          // NOTE: Here the Pulse Audio author would call
          // exit()... oh excuse me, _exit()... if temp == 0.
    
          strcpy(temp, str);
          str = temp;
       }
    
       fclose(fh);
    
       // Return error msg to app. NOTE: App must free()
       // it when done
       return(str);
    }
    Now there's a potential problem with regard to malloc. What if the call fails? The function returns a 0 which means the app has no error message to display. Not too useful. (In fact, it's downright funny. If the app passes a 5, and malloc fails, that means the function failed to return the message "Out of memory" because... um... it ran out of memory. I laughed. I thought it was a whole lot funnier than pmasiar thought it was when the Pulse Audio author said that it doesn't matter whether you handle malloc failing).

    I thought the best solution was to allocate memory, right at the start of the program, to load the text file into an array of nul-terminated strings (with an extra nul at the end). Then I rewrote his function (again not actual code):

    Code:
    char * get_error_message(int error)
    {
       char *str;
    
       // Get the array already in memory
       str = ArrayOfNulTermStrs;
    
       // Go through the lines, to find the line corresponding
       // to the error number
       while (error-- && *str)
          str += strlen(str) + 1;
    
       // Return error msg to app. NOTE: App must NOT free()
       // it when done, nor alter it at all. Just display it as is
       return(str);
    }
    So I planned ahead by eliminating the malloc where it was very inconvenient to have it. If I couldn't alloc the mem when the app started up, well at least the user was informed of that before he started working. He didn't get to a situation where something failed, and yet he couldn't be informed what happened.

    Note that an easy way of fixing it could have been to alter the malloc'ing:

    Code:
    temp = malloc(strlen(str) + 1);
    if (temp)
    {
       strcpy(temp, str);
       str = temp;
    }
    else
       str = "Please close other running programs and retry the operation";
    Ok, maybe he doesn't get the error message he was supposed to get. And it's in english which is maybe not the right language. But it's better than nothing at all, right? If the enduser follows the advice, and it works, he should be happy. And if he's happy, then you accomplished your job. Sometimes, posting an error message like the above is perfectly fine (assuming your app isn't like some heart monitor software. You have to design your program with the usage in mind).

    But please, please, please don't stick a call to exit() (or _exit()) in there when malloc fails, even if the Pulse Audio author tells you (and pmasiar agrees) it's ok because your app "isn't important", and maybe the OS will terminate you anyway, and the enduser didn't need to save his work anyway. Please, please, please, folks. Just don't. Your endusers will thank you.
    Last edited by j_g; November 17th, 2007 at 08:15 AM.

  10. #10
    Join Date
    Jun 2006
    Location
    CT, USA
    Beans
    5,267
    Distro
    Ubuntu 6.10 Edgy

    Re: Why proper error handling should ALWAYS be done

    Quote Originally Posted by j_g View Post
    But please, please, please don't stick a call to exit() (or _exit()) in there when malloc fails, even if the Pulse Audio author tells you (and pmasiar agrees) it's ok because your app "isn't important",
    I really don't have opinion on audio - I mostly don't use it. So don't look at my opinion - I don't have any.

    I got involved in this discussion for sole reason that you claimed there is error on unhandled exception or something, and you refused to report it to developers. I did, now we know their opinion and reasons, from both sides, and people can make their own informed opinions whom to trust. Your opinion alone looked for me too allarmistic, and my gut feeling was kinda right: there **are** valid reasons to do it the way audio developers did - and one does not have to agree with them.

    Personally I consider CptPicard's comment as closest to my feeling about the issue

Page 1 of 5 123 ... LastLast

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •