Having bugs in your code may be unavoidable, but crashing *is* avoidable. Barring cosmic rays playing Yahtzee with your memory, there is no reason why your program should ever crash.
What do I mean by crashing? A program has crashed when the operating system has to close your application or one of the threads of your application for you. Usually this is accompanied by a user-unfriendly dialog box popping up with messages like, "An access violation has occurred in your application. Press OK to debug or Cancel to close." Sometimes it makes a nice blue screen, and sometimes you manage to confuse the OS so much that it ceases to function altogether. Crashes in your code should always fall into the "nasty dialog box" category unless you are writing device drivers or other kernel level stuff, or operating in less safe OSes such as Windows 95, Windows 3.1, and the like.
When a problem arises, you want the worst-case scenario to be a managed shutdown of your application. You want to control the closing of your threads and applications yourself; you do not want the OS to have to kill them. If you have control you can close handles, let remote services know you are going down, free up resources, etc. The biggest advantage of managing the shutdown yourself is that you can tell the user what happened in the correct context and in a user-friendly way. You can put up a dialog box saying something like, "A fatal error was encountered while trying to resolve the server xxx.yyy.com. Please run the error reporting utility to forward your error logs to technical support." A user or a programmer is much more likely to figure out what went wrong from an informative error message like that than from the nebulous "access violation" we saw in the crash scenario.
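As a rough illustration of a managed shutdown, the sketch below lets a destructor do the cleanup while a handler near the top builds the user-facing message. Every name in it (`ScopedConnection`, `DoWork`, `RunApplication`, `g_shutdownLog`) is hypothetical and exists only for this example; a real application would close real handles and show a real dialog box.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical log of shutdown steps, used only to make the cleanup observable.
static std::vector<std::string> g_shutdownLog;

// Hypothetical RAII wrapper: the destructor stands in for closing a handle
// or telling a remote service that we are going down.
class ScopedConnection
{
public:
    explicit ScopedConnection( const std::string& server ) : m_server( server ) {}
    ~ScopedConnection() { g_shutdownLog.push_back( "closed connection to " + m_server ); }
    ScopedConnection( const ScopedConnection& ) = delete;
    ScopedConnection& operator=( const ScopedConnection& ) = delete;

private:
    std::string m_server;
};

// Hypothetical unit of work that runs into a fatal error.
void DoWork()
{
    ScopedConnection connection( "xxx.yyy.com" );
    throw std::runtime_error( "could not resolve the server xxx.yyy.com" );
}

// Runs the application body and returns a process exit code.
// On failure, userMessage receives text suitable for showing to the user.
int RunApplication( std::string& userMessage )
{
    try
    {
        DoWork();
        return 0;
    }
    catch ( const std::exception& e )
    {
        // The stack has already unwound here, so ~ScopedConnection has run.
        userMessage = std::string( "A fatal error was encountered: " ) + e.what();
        return 1;
    }
}
```

The point of the design is that the program, not the OS, decides what gets released and what the user is told before the process exits.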
In code you write, there are only two things that will crash your program: accessing or deleting memory you do not own, and failing to catch an exception at the top of a thread. Let's break these down.
Accessing or deleting memory you do not own

// Unbounded recursion: every call adds a stack frame until the stack overflows.
void InfiniteRecurse( int x )
{
    if ( false )
    {
        // terminating condition which is never met
        return;
    }
    else
    {
        // recurse condition which is always met
        InfiniteRecurse(x+1);
    }
}

void BigRecurse( unsigned int x )
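{
    // Each frame carries its own 1000-int array, so 1000 nested calls can
    // need more stack than the thread has, even though the recursion is bounded.
    int aBigArray[1000];
    if( x >= 1000 )
    {
        return;
    }
    else
    {
        aBigArray[x] = x;
        BigRecurse(x+1);
    }
}

If we prevent both of the causes named above (bad memory access and unhandled exceptions), the code you write will not crash. Third-party code that you call can still crash, but we will get to how to minimize that shortly.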
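One common way to sidestep the stack growth seen in BigRecurse is to replace the recursion with a loop over a single heap-allocated buffer. This is only a sketch of that general technique, not necessarily the fix the author has in mind, and `BigIterate` is a made-up name.

```cpp
#include <cstddef>
#include <vector>

// Iterative counterpart to BigRecurse (hypothetical name). One buffer on the
// heap replaces the chain of stack frames that each held their own array.
std::vector<unsigned int> BigIterate( std::size_t count )
{
    std::vector<unsigned int> values( count );
    for ( std::size_t i = 0; i < count; ++i )
    {
        values[i] = static_cast<unsigned int>( i );
    }
    return values;
}
```

Stack use stays constant no matter how large `count` is, and if the allocation fails, `std::vector` throws `std::bad_alloc`, which can be caught and reported instead of overflowing the stack.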
First, we want to prevent access to memory we do not own. Let me lay out some rules to follow:
Look at these rules and apply them to the seven causes of memory violations.
That takes care of your own code causing memory violations, but third-party code that you call might still blow up. The vast majority of such blowups are caused by your code passing in a NULL when the third-party code did not expect it. If that code followed rule 3 there would not be a problem, but since it doesn't, you will have to do the NULL check yourself. You must not pass NULL to any function that does not specifically allow for it in its documentation. Other violations that can result in exceptions being thrown are covered in the next section.
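A minimal sketch of doing the NULL check on the caller's side follows. Here `std::strlen` stands in for any third-party function that does not accept NULL (passing it a null pointer is undefined behavior), and `SafeLength` is a hypothetical wrapper name.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical wrapper around a function that must never receive NULL.
// Returns false, and leaves lengthOut untouched, when text is NULL.
bool SafeLength( const char* text, std::size_t& lengthOut )
{
    if ( text == NULL )
    {
        return false;  // the caller is expected to handle and report this
    }
    lengthOut = std::strlen( text );
    return true;
}
```

Returning a status rather than silently substituting a default keeps the failure visible to the caller, which matters once errors have to be reported instead of ignored.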
How to prevent unhandled exception violations

We also have to stop exceptions from forcing the OS to kill our threads or our application. The final line of defense here is to put an all-encompassing try/catch block in each thread start function and in main. A thread start function is the first function called when a new thread starts; when it returns, the thread terminates. A thread start function is often referred to as a ThreadProc. This catch-all will stop all exceptions from killing your threads or application, but it is not the preferred place to catch any exception. You cannot tell anything about the exception from a "catch(...)". All you can say is, "some unknown error occurred!" This is not acceptable in a professional application. Instead, you should catch each exception as close as possible to where it happens; this gives you the most context, so you can report exactly what caused it.
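The layering described above might look roughly like this in a thread start function. The names `DoThreadWork` and `ThreadProc`, the integer `mode` parameter, and the result string are assumptions made for the sketch; a real ThreadProc would use whatever signature the threading API requires.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical worker body. mode 0 succeeds, mode 1 throws a standard
// exception, and mode 2 throws something with no common base class.
void DoThreadWork( int mode )
{
    if ( mode == 1 )
    {
        throw std::runtime_error( "worker failed" );
    }
    if ( mode == 2 )
    {
        throw 42;
    }
}

// Thread start function. Nothing may escape it: for a std::thread, an
// exception leaving the start function calls std::terminate.
void ThreadProc( int mode, std::string* result )
{
    try
    {
        DoThreadWork( mode );
        *result = "ok";
    }
    catch ( const std::exception& e )
    {
        // Known exception type: there is still a message to report.
        *result = std::string( "error: " ) + e.what();
    }
    catch ( ... )
    {
        // Last line of defense: no context is available at this point.
        *result = "some unknown error occurred";
    }
}
```

With `std::thread`, a function like this could be started as `std::thread worker( ThreadProc, 1, &result );` and joined before `result` is read. The final handler keeps the thread alive but can say nothing useful, which is why the more specific handlers belong closer to the failing call.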
Third party code libraries you use *should* only throw documented exceptions. If you catch all the exceptions that they document you would think you would be safe, but of course you know that things do not always work as advertised. To catch these unexpected problems as soon as possible, follow these exception rules:
Suppose you follow all the rules above and put catch-all statements at the top of your threads. Now your code only crashes in the places where you forgot to follow the rules. Not crashing is nice, but your code still does not do what it is supposed to do all the time. Before, you at least saw the explosion when something went wrong, and from that you could sort of tell what might have happened. Now it just silently does not work. That is because you have to actually handle and report all the errors! Ignoring errors will not make them go away! You will often see (wrong) code like this:
try
{
    DoSomeFunction();
    // ignoring return code
}
catch(...)
{
    // ignoring exception
}

When these errors are ignored then you of course get silent failures. Here are some basic error handling rules: