More 4 Sure: How to be Crash Free

Friday, March 17, 2006

Having bugs in your code may be unavoidable, but crashing *is* avoidable. Barring cosmic rays playing Yahtzee with your memory, there is no reason why your program should ever crash. Crashing is totally avoidable!

What do I mean by crashing? A program has crashed when the operating system has to close your application or one of the threads of your application for you. Usually this is accompanied by a user-unfriendly dialog box popping up with messages like, "An access violation has occurred in your application. Press OK to debug or Cancel to close." Sometimes it makes a nice blue screen, and sometimes you manage to confuse the OS so much that it ceases to function altogether. Crashes in your code should always fall into the "nasty dialog box" category unless you are writing device drivers or other kernel level stuff, or operating in less safe OSes such as Windows 95, Windows 3.1, and the like.

When a problem arises you want the worst case scenario to be a managed shutdown of your application. You want to be able to control the closing of your threads and applications; you do not want the OS to have to kill them. If you have control you can close handles, let remote services know you are going down, free up resources, etc. The biggest advantage of managing the shutdown yourself is that you can tell the user what happened in the correct context and in a user-friendly way. You can put up a dialog box saying something like, "A fatal error was encountered while trying to resolve the server xxx.yyy.com. Please run the error reporting utility to forward your error logs to technical support." A user or a programmer is much more likely to be able to figure out what went wrong with a nice informative error message like that than with the nebulous "access violation" we saw in the crash scenario.

In code you write, there are only two things that will crash your program: accessing or deleting memory you do not own, and failing to catch an exception at the top of a thread. Let's break these down.

Accessing or deleting memory you do not own

Dereferencing a NULL pointer

*(NULL)
NULL->member
NULL[1]
NULL->function()
strcpy( NULL, "hello" )
*(NULL)(params);
this == NULL during an implicit (*this).

Dereferencing an uninitialized pointer

blah* pPointer; *pPointer
All of the same cases as for a NULL pointer above

Dereferencing a deleted pointer

delete pPointer; *pPointer
All of the same cases as for a NULL pointer above

Deleting an uninitialized pointer

blah* pPointer; delete pPointer;

Deleting a pointer twice

delete pPointer; delete pPointer;

Deleting non-dynamic memory

int x; int* p = &x; delete p;

Writing beyond the bounds of an array

int x[10]; x[-1] = 1;
int x[10]; x[10] = 1;
Either of the two cases above, but hidden in a loop

Uncaught exceptions

Divide by zero

int x = 0; 2/x
double x = 0.0; 2.0/x (this only traps when floating-point exceptions are unmasked; by default the result is infinity)
int x = 0; 2%x
You will also see overflow and underflow occasionally
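
The integer cases above can be avoided by testing the divisor before you divide. Here is a minimal sketch, assuming a hypothetical helper named SafeDivide (it is not part of any library) that reports failure through its return value instead of letting the CPU trap:

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helper: divides only when the operation is known to be safe.
// Returns false and leaves *pResult untouched when it is not.
bool SafeDivide( int numerator, int denominator, int* pResult )
{
    if ( pResult == NULL )
    {
        return false;   // nowhere to store the answer
    }
    if ( denominator == 0 )
    {
        return false;   // would be a divide by zero
    }
    if ( numerator == INT_MIN && denominator == -1 )
    {
        return false;   // the overflow case: the result does not fit in an int
    }
    *pResult = numerator / denominator;
    return true;
}
```

The same two checks apply to the % operator. The caller still has to decide what a false return means in its own context and report it.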

Stack overflow

Infinitely recursive function

void InfiniteRecurse( int x )
{
 if ( false )
 {
     // terminating condition which is never met
     return;
 }
 else
 {
     // recurse condition which is always met
     InfiniteRecurse(x+1);
 }
}

Infinitely recursive set of functions
Same as the single-function case above, but a set of functions is mutually recursive, so the call stack looks like a -> b -> c -> a -> b -> c -> a -> b -> c -> a -> b -> c -> ...

Valid recursive function but each call using too much stack space

void BigRecurse( unsigned int x )
{
 int aBigArray[1000];

 if( x >= 1000 )
 {
     return;
 }
 else
 {
     aBigArray[x] = x;
     BigRecurse(x+1);
 }
}
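
One way to defuse this kind of function is to keep the big buffer out of the stack frame. The sketch below is only an illustration: BoundedRecurse and kMaxDepth are made-up names, and the buffer is a std::vector supplied by the caller, so each call costs a few bytes of stack instead of roughly 4000.

```cpp
#include <cstddef>
#include <vector>

// Assumed depth limit for this sketch; pick one that suits your stack size.
const unsigned int kMaxDepth = 1000;

// Same shape as BigRecurse, but every frame holds only a reference.
// The storage lives on the heap and is allocated once by the caller.
void BoundedRecurse( unsigned int x, std::vector<int>& buffer )
{
    if ( x >= kMaxDepth || x >= buffer.size() )
    {
        return;     // terminating condition, also guards the index
    }
    buffer[x] = static_cast<int>( x );
    BoundedRecurse( x + 1, buffer );
}
```

Rewriting the recursion as a plain loop removes the stack concern altogether and is usually the better fix when the algorithm allows it.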

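Out of memory. This may show up as an exception on some systems; others will just return NULL from new or malloc. Standard C++ new throws std::bad_alloc, older compilers such as Visual C++ 6.0 return NULL from new instead of throwing, and malloc always reports failure by returning NULL.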

int* p = new int;
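
Since you cannot always be sure which behavior you will get, it helps to funnel allocations that may fail through one place that turns either outcome into the same result. A minimal sketch, assuming two hypothetical helpers (AllocateInts and AllocateIntsNoThrow are illustrative names only):

```cpp
#include <cstddef>
#include <new>

// Plain new: a conforming compiler throws std::bad_alloc on failure,
// so catch it here and report the failure as NULL.
int* AllocateInts( std::size_t count )
{
    try
    {
        return new int[count];
    }
    catch ( const std::bad_alloc& )
    {
        return NULL;
    }
}

// The nothrow form of new returns NULL on failure instead of throwing.
int* AllocateIntsNoThrow( std::size_t count )
{
    return new (std::nothrow) int[count];
}
```

Either way the caller gets a pointer it must check for NULL before use, and must release with delete[].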

User or library code generated exceptions that failed to get wrapped in a try/catch. Third party code may throw exceptions under some circumstances. Your code might intentionally throw exceptions. If these miss getting caught then the exceptions will make it all the way to the top of the thread.

ret = ThisFunctionThrowsAnException();

How to prevent memory violations

If we prevent the two cases above from occurring then code you write will not crash. Third party code that you call can still crash, but we will get to how to minimize that shortly.

First, we want to prevent access to memory we do not own. Let me lay out some rules to follow:

1. Pointers must be initialized when they are created, either to NULL or to valid memory.
2. Deleted pointers must always be set to NULL or to valid memory on the very next line after the delete.
3. Before dereferencing a pointer, you must check that it is not NULL. You can only skip this check if you checked the pointer before in the same function, and you did not call ANY function or execute any code that could access that pointer between then and now.
4. Only one pointer can own a given block of memory. This means that for any block of memory there can be only one definitive pointer and all other pointers to that block of memory must be temporary and be set back to NULL as soon as possible. You cannot trust any temporary pointer to be valid between function calls or be valid once you called code in another object.
5. Bounds must be checked before using an index to dereference an array pointer.

Look at these rules and apply them to the seven causes of memory violations.

Dereferencing a NULL pointer is prevented by rule 3.
Dereferencing an uninitialized pointer is prevented by rule 1.
Dereferencing a deleted pointer is prevented by the combination of rules 2 and 3 in most cases and by rule 4 in rare cases.
Deleting an uninitialized pointer is prevented by rule 1.
Deleting a pointer twice is prevented by rule 4 and rule 2 in different cases.
Deleting non-dynamic memory is prevented by rule 4 (and common sense).
Writing beyond the bounds of an array is prevented by rule 5.
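
Rules 1, 2, 3 and 5 are mechanical enough to put into small helpers. The following is a sketch only: Blah, SafeDelete, ReadValue and WriteAt are hypothetical names invented for this example.

```cpp
#include <cstddef>

// Hypothetical type standing in for "blah" in the lists above.
struct Blah
{
    int value;
};

// Rule 2: delete and reset in one step so the pointer can never dangle.
// Deleting NULL is a no-op, so a second call on the same pointer is harmless.
template <typename T>
void SafeDelete( T*& p )
{
    delete p;
    p = NULL;
}

// Rule 3: check before dereferencing and report failure rather than crash.
bool ReadValue( const Blah* pBlah, int* pOut )
{
    if ( pBlah == NULL || pOut == NULL )
    {
        return false;
    }
    *pOut = pBlah->value;
    return true;
}

// Rule 5: check bounds before indexing. Because the index is unsigned,
// a negative value such as -1 arrives as a huge number and is rejected too.
bool WriteAt( int* pArray, std::size_t length, std::size_t index, int value )
{
    if ( pArray == NULL || index >= length )
    {
        return false;
    }
    pArray[index] = value;
    return true;
}
```

Rule 1 is just a habit at the declaration, for example Blah* pBlah = NULL; followed later by SafeDelete( pBlah ). Rule 4 is a design decision about ownership and cannot be enforced by a helper this small.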

That takes care of your code causing memory violations, but third party code that your code calls might still blow up. The vast majority of such blowups are caused by your code passing in a NULL when this third party code did not expect it. If this code followed rule 3 there would not be a problem, but since it doesn't you will have to do the NULL check yourself. You must not pass NULL to any function that does not specifically allow for it in its documentation. Other violations that can result in exceptions being thrown are covered in the next section.
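
The NULL check in front of third party code can live in a thin wrapper, so you only have to get it right once. As an illustration, here is a sketch around strcpy; CopyString is a hypothetical name, and the size check is an extra guard in the spirit of rule 5:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical wrapper: strcpy does not tolerate NULL and does not know how
// big the destination is, so check both before handing control to it.
bool CopyString( char* pDest, std::size_t destSize, const char* pSrc )
{
    if ( pDest == NULL || pSrc == NULL || destSize == 0 )
    {
        return false;   // strcpy would dereference a NULL
    }
    if ( std::strlen( pSrc ) >= destSize )
    {
        return false;   // the copy plus its terminator would not fit
    }
    std::strcpy( pDest, pSrc );
    return true;
}
```

A call such as CopyString( NULL, 8, "hello" ) now returns false instead of producing an access violation.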

How to prevent unhandled exception violations

We also have to stop exceptions from forcing the OS to kill our threads or our application. The final line of defense here is to put an all-encompassing try/catch block in each thread start function and in main. A thread start function is the first function called when a new thread starts; when it exits, the thread terminates. It is often referred to as a ThreadProc. This catch-all will stop all exceptions from killing your threads or application, but it is not the preferred place to catch any exception. You cannot tell anything about an exception from a "catch(...)". All you can say is, "some unknown error occurred!" This is not acceptable in a professional application. Instead, you should catch each exception as close as possible to where it is thrown; this gives you the most context, so you can report exactly what caused it.
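
A last line of defense along those lines might look like the sketch below. The names are hypothetical: WorkerArgs and DoThreadWork stand in for your own thread data and thread body, and the void* signature merely mimics the usual shape of a thread start function rather than any particular threading API.

```cpp
#include <cstddef>
#include <cstdio>
#include <exception>
#include <stdexcept>

// Hypothetical arguments handed to the thread. failureMode exists only so
// that the sketch can simulate different kinds of failure.
struct WorkerArgs
{
    int failureMode;    // 0 = succeed, 1 = std::exception, 2 = anything else
};

// Hypothetical body of the thread's real work.
void DoThreadWork( const WorkerArgs& args )
{
    if ( args.failureMode == 1 )
    {
        throw std::runtime_error( "simulated failure" );
    }
    if ( args.failureMode == 2 )
    {
        throw 42;   // not derived from std::exception
    }
}

// Thread start function. Nothing thrown below it can escape to the OS.
// Returns 0 on success and a distinct non-zero code for each failure.
int ThreadProc( void* pParam )
{
    if ( pParam == NULL )
    {
        return 1;
    }
    try
    {
        DoThreadWork( *static_cast<const WorkerArgs*>( pParam ) );
        return 0;
    }
    catch ( const std::exception& e )
    {
        std::fprintf( stderr, "ThreadProc: %s\n", e.what() );
        return 2;
    }
    catch ( ... )
    {
        // All we can say here is that something unknown went wrong.
        std::fprintf( stderr, "ThreadProc: unknown error\n" );
        return 3;
    }
}
```

The handlers only write to stderr and return a code, so they cannot throw themselves. Note how little the catch(...) branch can report, which is exactly why it should be the backstop and not the plan.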

Third party code libraries you use *should* only throw documented exceptions. If you catch all the exceptions that they document you would think you would be safe, but of course you know that things do not always work as advertised. To catch these unexpected problems as soon as possible, follow these exception rules:

Code that accesses hardware fairly directly is always highly suspect. Due to this, any function which accesses an external subsystem like network, hard drive, etc. must be wrapped in try/catch blocks to catch any and all potential unexpected exceptions.
Any function of a complexity for which you cannot test every case must be wrapped with try/catch blocks.
Any function you suspect has the potential to change outside of your control must be wrapped with try/catch blocks.
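
Wrapping a suspect call where it is made lets you say what you were doing when it failed. In this sketch ThirdPartyResolve is a made-up stand-in for a library call that touches the network, and ResolveServer is the hypothetical wrapper you would write around it:

```cpp
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>

// Made-up stand-in for third party code. It throws one documented exception
// and, for one input, something nobody documented.
int ThirdPartyResolve( const std::string& host )
{
    if ( host.empty() )
    {
        throw std::invalid_argument( "empty host name" );
    }
    if ( host == "bad.example" )
    {
        throw 42;
    }
    return 0x7F000001;
}

// Wrapper: catch right at the call so the error can still name the server.
bool ResolveServer( const std::string& host, int* pAddress, std::string* pError )
{
    if ( pAddress == NULL || pError == NULL )
    {
        return false;
    }
    try
    {
        *pAddress = ThirdPartyResolve( host );
        return true;
    }
    catch ( const std::exception& e )
    {
        *pError = "Could not resolve server '" + host + "': " + e.what();
    }
    catch ( ... )
    {
        *pError = "Could not resolve server '" + host + "': unexpected exception";
    }
    return false;
}
```

Even the catch(...) branch now knows which server was involved, which a catch-all at the top of the thread never could.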

Preventing silent problems

Say you follow all the rules above and put catch-all statements at the top of your threads. Now your code only crashes in the places where you forgot to follow the rules. Not crashing is nice, but your code still does not actually do what it is supposed to do all the time. Before, you at least saw the explosion when something went wrong, and from that you could sort of tell what might have happened. Now it just silently does not work. Well, that is because you have to actually handle and report all the errors! Ignoring errors will not make them go away! You will often see (wrong) code like this:

try
{
    DoSomeFunction();
    // ignoring return code
}
catch(...)
{
    // ignoring exception
}

When these errors are ignored then you of course get silent failures. Here are some basic error handling rules:

You must handle every error condition; ignoring a problem will not make it go away.
Every non-trivial function must return an error object.
The error object shall be filled in with detailed error information when an error occurs. Suggested information:

Error level
Error code
User displayed or internal flag
Error description string or string ID for user-displayed errors
Call stack if possible or whatever information you know about line/module/class/function instead
Thread ID
Timestamp

The error object should normally be handled before it gets back to the event loop.
If an error makes it up to the event loop it must be logged and action appropriate to the error level must be taken.
Errors shall have a level above which they are always logged; if they are not logged before their destructor is called, they must log themselves.
Non-logged errors should only be used for expected failures.
Exceptions should always create a logged error; if an exception is expected, it is not an exception (some of the external libraries you use may violate this beyond your control).
Do not pass on an error that you can handle appropriately.
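
To make the rules concrete, here is one possible shape for such an error object. Everything in it is an assumption made for illustration: the names Error, ErrorLevel, kAlwaysLogLevel and OpenConfig, the stderr log format, and the g_loggedCount counter, which exists only so the behavior can be observed. The thread ID and call stack are left out to keep the sketch short.

```cpp
#include <cstdio>
#include <ctime>
#include <string>
#include <utility>

enum ErrorLevel { kNone = 0, kInfo, kWarning, kError, kFatal };

// Assumed threshold: errors at or above this level are always logged.
const ErrorLevel kAlwaysLogLevel = kError;

// Counts log lines written; for illustration only.
int g_loggedCount = 0;

class Error
{
public:
    Error( ErrorLevel level, int code, const std::string& description )
        : m_level( level ), m_code( code ), m_description( description ),
          m_timestamp( std::time( NULL ) ), m_logged( false )
    {
    }

    // Moving hands the logging duty to the new object.
    Error( Error&& other )
        : m_level( other.m_level ), m_code( other.m_code ),
          m_description( std::move( other.m_description ) ),
          m_timestamp( other.m_timestamp ), m_logged( other.m_logged )
    {
        other.m_logged = true;
    }

    // Copies would each log on destruction, so they are not allowed.
    Error( const Error& ) = delete;
    Error& operator=( const Error& ) = delete;

    // A serious error that nobody logged logs itself on the way out.
    ~Error()
    {
        if ( !m_logged && m_level >= kAlwaysLogLevel )
        {
            Log();
        }
    }

    bool Failed() const { return m_level != kNone; }

    void Log()
    {
        std::fprintf( stderr, "[%ld] level=%d code=%d %s\n",
                      static_cast<long>( m_timestamp ),
                      static_cast<int>( m_level ), m_code,
                      m_description.c_str() );
        m_logged = true;
        ++g_loggedCount;
    }

private:
    ErrorLevel  m_level;
    int         m_code;
    std::string m_description;
    std::time_t m_timestamp;
    bool        m_logged;
};

// Hypothetical non-trivial function that returns an error object.
Error OpenConfig( const char* pPath )
{
    if ( pPath == NULL )
    {
        return Error( kError, 2, "OpenConfig: no path was given" );
    }
    return Error( kNone, 0, "" );
}
```

A caller that writes Error e = OpenConfig( NULL ); and then forgets about e still gets a log line when e goes out of scope, so a serious failure cannot vanish silently. Warnings and expected failures stay quiet unless someone calls Log() on them.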

by Christopher McGee
Courtesy : wwwdevcentral.com

