Why proper error handling should ALWAYS be done
Let's start off with an anecdote. I once used some shared lib that employed a call to longjmp() to handle error conditions. As usual with a technique like that, I found that the lib was leaking memory and resources. So I had a conversation with the original author that went like this:
ME: Why did you use longjmp() to handle errors?
HIM: I considered it to be the best design decision under the circumstances.
ME: I'm not sure what circumstances you mean, but why not try to add some resource tracking so you can recover better when that longjmp happens?
HIM: It would have been pointless, because my code uses some other code that also doesn't do proper error handling. It leaks memory, and doesn't even check whether malloc returns a null pointer. So I figured that, as long as I'm inheriting all that behavior, I may as well not worry about whether my code does the same things.
I got so mad about it that I totally rewrote his code, jettisoning the other poorly designed code he was using, ripping out longjmp altogether, and doing proper error handling. The net result is that it stopped leaking resources for me, and all the other people who used my rewrite.
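For anyone wondering what "proper error handling" looks like without longjmp, here is a minimal sketch. It is not the code from that lib; load_file and the LOAD_* codes are names I made up for illustration. The idea is simply that every failure path goes through one cleanup label, so nothing acquired earlier gets leaked, and malloc's return value is actually checked.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical status codes for this sketch. */
enum { LOAD_OK = 0, LOAD_ERR_NOMEM = -1, LOAD_ERR_IO = -2 };

/* Reads up to `max` bytes of `path` into a freshly allocated,
 * nul-terminated buffer. On success the caller owns *out and must
 * free() it. On failure *out is NULL and nothing is leaked. */
int load_file(const char *path, size_t max, char **out, size_t *out_len)
{
    int status = LOAD_OK;
    FILE *fp = NULL;
    char *buf = NULL;
    size_t n = 0;

    *out = NULL;
    *out_len = 0;

    /* max + 1 below must not wrap around to zero. */
    if (max == (size_t)-1) { status = LOAD_ERR_NOMEM; goto cleanup; }

    fp = fopen(path, "rb");
    if (fp == NULL) { status = LOAD_ERR_IO; goto cleanup; }

    buf = malloc(max + 1);
    if (buf == NULL) { status = LOAD_ERR_NOMEM; goto cleanup; }

    n = fread(buf, 1, max, fp);
    if (ferror(fp)) { status = LOAD_ERR_IO; goto cleanup; }
    buf[n] = '\0';

    /* Success: hand the buffer to the caller so cleanup won't free it. */
    *out = buf;
    *out_len = n;
    buf = NULL;

cleanup:
    free(buf);              /* free(NULL) is a harmless no-op */
    if (fp != NULL) fclose(fp);
    return status;
}
```

The caller gets a status code back and decides what to do (retry, warn the user, save what it can), instead of being yanked out of the call by a jump that skips all the cleanup.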
Now back to the present, we have the following reply (posted off-topic in another thread):
Yes, it's true that Linux runs the risk of abruptly terminating apps, and potentially losing very important data, under low memory conditions. (As an aside, I have personally experienced running Win32 software under low memory conditions. The OS properly allowed the app to become aware of the condition, and to continue running. I consider Windows to be more robust in this scenario).
Originally Posted by pmasiar
But the fact that the OS may abruptly kill an app (and potentially any work the user has done with that app) is no more an acceptable excuse for not doing proper error handling than it was in the anecdote above.
Now, when an app abruptly terminates and I lose some work, I get mad. I have no doubt nearly every other enduser has the same reaction. If it's due to a bug (i.e., an unintentional programming error) in the software, I try to be understanding. I'm a programmer too, and we all make mistakes. (On the other hand, most endusers don't make that distinction. If a newcomer to Linux finds the software he's using to be unstable, regardless of why, he's going to say "Linux is a piece of ____, and I'm going back to Windows and spend the next couple months telling everyone I know that Linux is a piece of ____". That's how it is. If you don't know this to be true, you've not dealt with endusers).
But when I lose my data, tell the author of the software about it, and he "laughs" about it, telling me he designed it that way and he doesn't think it's a problem.... well, I'm going to say it, and I don't care what some overly repressive moderator thinks... I'd like to shoot the guy. I suspect that the vast majority of endusers would harbor the same inclination.
Now, the Pulse Audio authors have already dismissed (with frankly little regard) this qualm about their design. (And that's a very good reason for someone to want to dismiss Pulse Audio when it's time to consider adding a flawed design as a major component to a distro. I can say, after doing my own audit of ALSA, that ALSA does NOT have an intentional design flaw that terminates a calling app. It is a vastly more robust audio system for that reason alone). But this is more than just about Pulse Audio.
Programmers, do endusers a big favor, and at all costs, try to avoid using software that has such design flaws. And resist the urge to blame your own design flaws upon someone else. Take responsibility for your own coding. Endusers don't want to hear someone pass the buck, nor do they care about the reasons why their app just went up in smoke, along with their work. They're going to get mad. If you just "laugh" about it, that makes matters even worse.
P.S. I'll point out that Linux does not always handle OOM conditions by terminating an app. Well, it does under Pulse Audio. Why? Simply because Pulse Audio handles OOM by calling exit() regardless. I assume we're supposed to laugh about that. Ha. Ha. Joke's on us!
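To show the alternative to calling exit() from inside a lib, here is a rough sketch of a growable buffer that reports an allocation failure to its caller instead. This is not Pulse Audio's API; strbuf, strbuf_append and strbuf_free are hypothetical names for illustration. It also can't protect an app from the kernel killing it outright. It only shows a lib declining to make that decision on the app's behalf.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string buffer. Zero-initialize before use. */
typedef struct {
    char  *data;
    size_t len;   /* bytes in use, not counting the trailing nul */
    size_t cap;   /* bytes allocated */
} strbuf;

/* Appends n bytes of src. Returns 0 on success, -1 if memory could
 * not be obtained. On failure the buffer is unchanged and still
 * valid, so the app can still save the user's work. */
int strbuf_append(strbuf *sb, const char *src, size_t n)
{
    size_t need, new_cap;
    char *p;

    /* Refuse requests whose total size would overflow size_t. */
    if (n > SIZE_MAX - sb->len - 1)
        return -1;
    need = sb->len + n + 1;

    if (need > sb->cap) {
        new_cap = sb->cap ? sb->cap : 16;
        while (new_cap < need) {
            if (new_cap > SIZE_MAX / 2) { new_cap = need; break; }
            new_cap *= 2;
        }
        /* Keep the old pointer until realloc succeeds. Assigning the
         * result straight to sb->data would leak the old block. */
        p = realloc(sb->data, new_cap);
        if (p == NULL)
            return -1;      /* tell the caller, don't exit() */
        sb->data = p;
        sb->cap = new_cap;
    }

    memcpy(sb->data + sb->len, src, n);
    sb->len += n;
    sb->data[sb->len] = '\0';
    return 0;
}

void strbuf_free(strbuf *sb)
{
    free(sb->data);
    sb->data = NULL;
    sb->len = sb->cap = 0;
}
```

An app that gets -1 back can drop a cache, warn the user, or write the document to disk before quitting. A lib that calls exit() takes all of those choices away.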
Last edited by j_g; November 16th, 2007 at 09:24 PM.