The Complete Guide to Error Handling in Python

I often meet developers who know the mechanics of error handling in Python well, yet when I review their code, it turns out to be far from perfect. Exceptions are one of those areas of Python where most people know the surface, but many developers don't even realize there is a deeper, almost arcane level beneath it. If you want to test yourself on this topic, see if you can answer the following questions:

  • When should you catch exceptions thrown by functions you call, and when should you not?
  • How do you know which exception classes to catch?
  • What should you do when catching exceptions to “handle” them?
  • Why is catching all exceptions considered bad practice, and when is it acceptable to do so?

Are you ready to learn the secrets of error handling in Python? Then let's go!

Basics: Two Ways to Handle Errors in Python

I will start with what I think many of my readers already know and what has been covered in many other places. There are two main styles of writing error handling code in Python, often referred to by the unpronounceable acronyms LBYL and EAFP. Are they familiar to you? If not, below is a short introduction to each.

Look Before You Leap (LBYL)

The “look before you leap” pattern says that before performing an action, you should first check for any conditions that could make it fail.

if can_i_do_x():
    do_x()
else:
    handle_error()

As an example, consider the task of deleting a file from disk. When using LBYL the code will look like this:

import os

if os.path.exists(file_path):
    os.remove(file_path)
else:
    print(f"Error: file {file_path} does not exist!")

Although at first glance this code looks quite reliable, in practice it is not.

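The main problem here is that we need to know everything that could theoretically go wrong when deleting a file, so that we can check for all of it before calling remove(). Obviously, the file must exist, but a missing file is not the only reason why deleting a file can fail:

  • The path may point to a directory rather than a file.
  • The process may lack permission to delete the file (on POSIX systems this depends on write access to the containing directory).
  • The file may be on a file system that is mounted read-only.
  • On Windows, the file may be locked because another process has it open.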

What would the file deletion example shown above look like if we had to add all these checks?
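Here is one possible answer, as a minimal sketch. The function name delete_file_lbyl() is hypothetical, it returns an error message instead of printing one so that its behavior is easy to check, and the permission test assumes POSIX semantics (deleting a file requires write and search permission on its parent directory). Even with all of this, it still does not cover read-only mounts or files locked by other processes.

```python
import os


def delete_file_lbyl(file_path):
    """Delete a file LBYL-style; return an error message, or None on success."""
    # Check 1: the path must exist
    if not os.path.exists(file_path):
        return f"Error: file {file_path} does not exist!"
    # Check 2: it must be a regular file, not a directory
    if not os.path.isfile(file_path):
        return f"Error: {file_path} is not a file!"
    # Check 3 (POSIX): we need write and search permission on the parent directory
    parent = os.path.dirname(os.path.abspath(file_path))
    if not os.access(parent, os.W_OK | os.X_OK):
        return f"Error: no permission to delete files in {parent}!"
    # Still unchecked: read-only file systems, locked files, ...
    os.remove(file_path)
    return None
```

Every new failure mode you discover means another if branch, and the list is never really complete.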

As you can see, using the LBYL pattern, it is quite difficult to write reliable code, because you need to know all the possible reasons for the failure of the called function, and sometimes there are simply too many of them.

Another problem with the LBYL pattern is the existence of race conditions. If you test for failure conditions and then take actions, there is always a chance that the conditions will change in the short period of time between the tests and the action.
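The race is easy to reproduce deterministically. In this sketch, the hypothetical between callback stands in for another process that deletes the file after the os.path.exists() check has passed but before os.remove() runs. The check succeeds, yet the removal still fails with FileNotFoundError.

```python
import os


def delete_if_exists_lbyl(file_path, between=lambda: None):
    """LBYL deletion; `between` simulates activity by another process."""
    if os.path.exists(file_path):
        between()  # the file system can change right here, after the check
        os.remove(file_path)  # ...so this can still fail despite the check
```

For example, calling delete_if_exists_lbyl(path, between=lambda: os.remove(path)) on an existing file raises FileNotFoundError.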

Easier to Ask Forgiveness than Permission (EAFP)

You've probably already realized that I don't have a very high opinion of the LBYL pattern (although, as we'll see below, it is useful in some situations). The EAFP pattern states that it is “easier to ask forgiveness than permission.” What does that mean? It means that you perform the action first, and only then deal with any errors that arise.

In Python, the EAFP pattern is best implemented using exceptions:

try:
    do_x()
except SomeError:
    handle_error()

Deleting a file using EAFP is done like this:

import os

try:
    os.remove(file_path)
except OSError as error:
    print(f"Error deleting file: {error}")

I hope you agree that EAFP is preferable to LBYL in most cases.

The significant improvement is that with this pattern, the target function is tasked with checking for errors and generating error messages, so we as the caller can make the call and trust that the function will tell us if the action fails.

On the other hand, we need to know which exceptions to list in the except clause, because any exception classes we miss will bubble up and may crash the Python application. For file deletion, it is safe to assume that any errors raised will be OSError or one of its subclasses, but in other cases you need to consult the documentation or the source code to find out which exceptions a function can raise.
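You can confirm the file deletion case from the interpreter: the specific errors that file operations raise, such as FileNotFoundError, PermissionError and IsADirectoryError, are all subclasses of OSError, so a single except OSError clause covers them. The delete_file() helper below is an illustrative sketch that returns the message instead of printing it.

```python
import os

# The specific file errors are all subclasses of OSError
for cls in (FileNotFoundError, PermissionError, IsADirectoryError):
    print(cls.__name__, issubclass(cls, OSError))  # prints "<name> True" for each


def delete_file(file_path):
    """EAFP deletion: try it, and report any OSError subclass that comes back."""
    try:
        os.remove(file_path)
    except OSError as error:
        # type(error) is the specific subclass, e.g. FileNotFoundError
        return f"Error deleting file ({type(error).__name__}): {error}"
    return None
```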

You may be wondering: why not catch all possible exceptions, to make sure none are missed? This is a bad pattern that creates more problems than it solves, so I don't recommend it except in a few niche cases, which I'll discuss below. The problem is that bugs in our own code usually show up as unexpected exceptions. If you catch and suppress all exceptions every time you call a function, you are likely to hide exceptions that should never have been raised in the first place, because they are caused by bugs that need to be fixed.
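Here is a small, hypothetical illustration of the problem. Both functions are meant to turn a year string into an int, and both contain the same typo. The version with except Exception silently turns the resulting NameError into None, while the version that catches only ValueError lets the bug surface immediately.

```python
def parse_year_catch_all(text):
    try:
        return int(txt)  # BUG: typo, `txt` is undefined, so this raises NameError
    except Exception:
        return None  # intended for bad input, but it also hides the NameError


def parse_year_narrow(text):
    try:
        return int(txt)  # same typo
    except ValueError:
        return None  # only bad input is handled; the NameError bubbles up
```

parse_year_catch_all("1999") quietly returns None, and the typo may go unnoticed for a long time; parse_year_narrow("1999") fails loudly on the first call.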

To avoid the risk of hiding application bugs that surface as unexpected exceptions, you should always catch the smallest possible set of exception classes and, when you can, not catch exceptions at all. Not catching exceptions is itself a valid error handling strategy, and one you should always consider. This may sound like a contradiction, but it really isn't. We will return to this later.

Handling Python Errors in the Real World

Unfortunately, conventional wisdom about error handling does not go very deep. You can fully understand LBYL and EAFP and know exactly how try and except work, and still not know how best to write error handling code.

Therefore, now we will look at errors from a completely different angle, with the main focus on the errors themselves, and not on how they are handled. I hope this makes it much easier for you to understand what needs to be done.

New errors and bubbled-up errors

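First, we need to classify errors based on their source. There are two types:

  • New errors: errors that your own code detects and needs to report.
  • Bubbled-up errors: errors raised by a function that your code calls.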

If you think about it, errors can only arise in these two situations. Either you introduce a new error yourself and put it into the system so that some other part of the application can handle it, or you receive an error from somewhere else and have to decide what to do with it.

The term “bubbled-up” refers to a property of exceptions. When a block of code raises an exception, the caller of the failing function gets a chance to catch it in a try/except block. If the caller does not catch it, the exception is passed on to the next caller up the call stack, and so on, until some code decides to catch and handle it. This movement of an exception toward the top of the call stack is called “bubbling up”. If the exception is never caught and bubbles all the way to the top, Python aborts the application and prints a stack trace listing every level the error bubbled through, which is very useful when debugging.
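The following sketch, with hypothetical functions level1(), level2() and level3(), shows an exception raised two levels down bubbling through level2(), which does not catch it, until level1() does. The standard traceback module lets us list the frames it passed through.

```python
import traceback


def level3():
    raise ValueError("something went wrong deep down")


def level2():
    level3()  # no try/except here: the exception keeps bubbling up


def level1():
    try:
        level2()
    except ValueError as error:
        # The traceback records every frame the exception passed through
        frames = [frame.name for frame in traceback.extract_tb(error.__traceback__)]
        return f"caught: {error}", frames
```

Calling level1() returns the message together with the frame names ["level1", "level2", "level3"], the same path a stack trace would show.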

Errors from which you can and cannot recover

Besides determining whether an error is new or bubbled up, you need to decide whether it is possible to recover from it. A recoverable error is one that the code dealing with it can resolve before continuing execution. For example, if a block of code tries to delete a file and finds that the file doesn't exist, this is not a big problem: it can simply ignore the error and carry on.

A non-recoverable error is one that the code cannot recover from, or in other words, an error that prevents the code at that level from continuing. Consider, for example, a function that needs to read some data from the database, modify it and save it back. If the read fails, the function has to exit early, because it cannot do the rest of its work.

We now have a simple way to categorize errors by their source and by whether they are recoverable, which gives only four possible combinations to handle. In the following sections, I will explain exactly what to do with each of the four types.

Type 1: Handling new recoverable errors

This is a simple case: we have a block of code in our own application that encountered an error condition. Fortunately, this code can recover from this error on its own and continue executing.

What do you think is the best way to handle this case? We simply recover from the error and continue, without bothering anyone else!

Let's look at an example:

def add_song_to_database(song):
    # ...
    if song.year is None:
        song.year = "Unknown"
    # ...

Here we have a function that writes the song to the database. Let's say that in the database schema, the release year of a song cannot be null.

Borrowing from the LBYL pattern, we check whether the song's year attribute is set, so that the database write does not fail. How do we recover from this error? In this case, we set the year to "Unknown" and carry on, knowing that the database write will not fail (at least not for this reason).

Of course, how you recover from an error depends greatly on the application and the error. In the example above, I assumed that the song's release year is stored as a string in the database. If it's stored as a number, an acceptable way to handle songs with an unknown year might be to assign them the value 0. In another application the year may be mandatory, in which case this error would not be recoverable.

Makes sense? If you find an error or inconsistency in the current state of the application and you can correct that state without raising an error, then there is no need to raise one: just correct the state and continue.
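A runnable version of the Type 1 example, as a sketch: the Song dataclass and the songs_table list are hypothetical stand-ins for the application's real model and database, and the year is stored as a string, as assumed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Song:
    name: Optional[str]
    artist: str
    year: Optional[str] = None


songs_table = []  # hypothetical stand-in for a real database table


def add_song_to_database(song):
    # Type 1: a new, recoverable error; fix the state locally and keep going
    if song.year is None:
        song.year = "Unknown"
    songs_table.append((song.name, song.artist, song.year))
```

Adding Song("Imagine", "John Lennon") stores the row ("Imagine", "John Lennon", "Unknown"); no caller ever learns that the year was missing.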

Type 2: Handling bubbled-up recoverable errors

The second case is a variation of the first. Here the error is not new; it has bubbled up from a function that was called. As in the previous case, the nature of the error is such that the code receiving it knows how to recover and continue.

How to handle such a case? We use EAFP to catch the error, then do whatever is necessary to recover from it and continue.

Here is another part of the add_song_to_database() function that demonstrates this case:

def add_song_to_database(song):
    # ...
    try:
        artist = get_artist_from_database(song.artist)
    except NotFound:
        artist = add_artist_to_database(song.artist)
    # ...

The function needs to retrieve the song's artist from the database, but this may occasionally fail, for example when adding the first song by that artist. The function uses EAFP to catch the NotFound error from the database, then fixes the problem by adding the missing artist to the database, and continues.

As in the first case, the code that needs to handle the error knows how to adjust the state of the application so it can keep going, so it can absorb the error and continue. None of the layers above this code in the call stack need to know that an error occurred, so the error stops bubbling up at this point.
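The following self-contained sketch plays out the Type 2 case. NotFound, the artists dictionary and the get_or_add_artist() wrapper are all hypothetical; the dictionary stands in for the artist table. The NotFound raised by get_artist_from_database() is caught one level up and recovered from, so no caller above get_or_add_artist() ever sees it.

```python
class NotFound(Exception):
    """Hypothetical 'row not found' error raised by the data layer."""


artists = {}  # hypothetical in-memory artist table: name -> id


def get_artist_from_database(name):
    try:
        return artists[name]
    except KeyError:
        raise NotFound(name) from None  # translate to the data layer's error


def add_artist_to_database(name):
    artists[name] = len(artists) + 1
    return artists[name]


def get_or_add_artist(name):
    try:
        return get_artist_from_database(name)
    except NotFound:
        # Type 2: the bubbled-up error is recoverable here, so it stops here
        return add_artist_to_database(name)
```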

Type 3: Handling new unrecoverable errors

With the third case the situation is more interesting. Now we have a new error of such severity that the code does not know what to do and cannot continue executing. The only reasonable action here would be to terminate the current function and send an error notification one level up the call stack, hoping that the caller knows what to do. As discussed above, in Python the preferred way to notify the caller of an error is to throw an exception, which is what we'll do.

This strategy works well thanks to an interesting property of unrecoverable errors: in most cases, an unrecoverable error eventually becomes recoverable once it reaches a high enough level in the call stack. So if an error is allowed to bubble up the call stack until it becomes recoverable, it turns into a Type 2 error, which we already know how to handle.

Let's go back to the add_song_to_database() function. We decided that if a song's year was not specified, we could recover from this and prevent a database error by setting the year to 'Unknown'. However, if the song doesn't have a title, it's much harder to know what the right thing to do is at this level, so we can say that, for this function, a missing title is a non-recoverable error. Here's how we handle it:

def add_song_to_database(song):
    # ...
    if song.name is None:
        raise ValueError('The song must have a name')
    # ...

The choice of exception class depends on the application and on personal preference. For many errors you can use Python's built-in exceptions, but if none of them are suitable, you can always create your own exception subclasses. Here's the same example implemented with a custom exception:

class ValidationError(Exception):
    pass

# ...

def add_song_to_database(song):
    # ...
    if song.name is None:
        raise ValidationError('The song must have a name')
    # ...

It is important to note that the raise statement interrupts the function. This is necessary because, as we said, there is no way to recover from this error, so the rest of the function would not be able to do its job and must not run. Raising an exception exits the current function and starts the error bubbling, beginning with the nearest caller and moving up the call stack until some code decides to catch the exception.
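To see that raise really ends the function, here is a simplified, hypothetical variant that takes the name and year directly and appends to an in-memory list instead of a database. When the name is missing, nothing is saved.

```python
class ValidationError(Exception):
    pass


saved_songs = []  # hypothetical stand-in for the database


def add_song_to_database(name, year=None):
    if name is None:
        # Type 3: new and unrecoverable here, so raise and leave the function
        raise ValidationError("The song must have a name")
    # Only reached when the song is valid
    saved_songs.append((name, year if year is not None else "Unknown"))
```

Calling add_song_to_database(None, "1975") raises ValidationError and leaves saved_songs empty; it is now up to some caller further up the stack to decide what to do.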

Type 4: Handling bubbled-up unrecoverable errors

We are left with the last type of error, the most interesting and my favorite.

Let's say we have a block of code that called a function, that function raised an error, and our code has no idea how to fix the situation so that execution can continue, so we consider the error unrecoverable. What should we do now?

The answer will surprise you. In this case, we do absolutely nothing!

As mentioned above, not handling errors can be an excellent error handling strategy, and this is exactly the situation I had in mind. Let's look at an example of how you can handle an error by doing nothing:

def new_song():
    song = get_song_from_user()
    add_song_to_database(song)

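Let's say both functions called in new_song() can fail and raise exceptions. Here are a couple of examples of what can go wrong:

  • The user may cancel out of the input while get_song_from_user() is waiting for it.
  • add_song_to_database() may raise the ValidationError shown earlier for a song without a name, or fail because the database is unreachable.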

If we cannot recover from such errors in any way, then there is no point in intercepting them. Doing nothing is actually the most useful thing we can do because it allows exceptions to bubble up. Sooner or later, exceptions will reach a level where the code knows how to recover, at which point they will be considered Type 2 errors, which are easily caught and handled.
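Here is a self-contained sketch of the whole flow, with hypothetical stand-ins: get_song_from_user() simulates a user who left the name empty, and main() plays the role of a higher level that has enough context to tell the user what went wrong. new_song() itself contains no error handling at all.

```python
class ValidationError(Exception):
    pass


def get_song_from_user():
    return {"name": None}  # hypothetical: simulates a user who left the name blank


def add_song_to_database(song):
    if song["name"] is None:
        raise ValidationError("The song must have a name")


def new_song():
    # Type 4: no try/except here; any error bubbles up to the caller
    song = get_song_from_user()
    add_song_to_database(song)


def main():
    try:
        new_song()
    except ValidationError as error:
        # A higher level with more context: here the error becomes Type 2
        return f"Please fix the song: {error}"
    return "Song added"
```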

You might think this is an extremely rare situation, but I believe it is not. In fact, you should design your applications so that as much code as possible is in functions that don't need to deal with error handling. Moving error handling code into higher level functions is a very good strategy to help you write clean, maintainable code.

Some of you will probably disagree with this. You may think that new_song() should at least display an error message to let the user know about the failure. I won't argue, but let's think about it for a minute. Can we be sure that we have a console to print to? What if it's a GUI application? GUI applications don't have stdout; they show errors to users visually, through some kind of notification or message box. Or maybe it's a web application? In web applications, we report errors by returning an HTTP error response to the user. Should this function know what type of application it is running in and how errors are displayed to the user? The separation of concerns principle says it shouldn't.

Again, just because we don't do anything in this function doesn't mean that the error is ignored, it means that we let the error bubble up to some other part of the application with more context that can handle it correctly.

Catching all exceptions

One reason you may doubt that the fourth type of error should be the most common in your application is that, if we let exceptions bubble up freely, they could reach the very top without being caught at any level and crash the application. This is a completely reasonable concern, but it is very easy to solve.

You need to design your application so that an exception can never reach Python itself. You can do this by adding a try/except block at the highest level that catches all exceptions that escape from below.

If we were writing a command line application, we would do it like this:

import sys

def my_cli():
    ...  # the application logic goes here

if __name__ == '__main__':
    try:
        my_cli()
    except Exception as error:
        print(f"Unexpected error: {error}")
        sys.exit(1)

Here the top level of the application is the if __name__ == '__main__' conditional, and it treats every error that reaches this level as recoverable. The recovery mechanism is to show the error to the user and exit the application with exit code 1, which tells the shell or the parent process that the application failed. With this logic in place, the application knows how to fail, so there is no need to implement failure handling anywhere else. The rest of the application can simply let errors bubble up; eventually they are caught here, an error message is displayed, and the application exits with an error code.

You may remember that I said earlier that catching all exceptions is bad practice. Yet that is exactly what I am doing here! The reason is that at this level we never want the program to crash, so we cannot let any exception reach Python. This is the one situation in which catching all exceptions makes sense. It is the exception (pun intended) that proves the rule.

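Having a top-level block that catches all exceptions is a common pattern, and most application frameworks implement it. Here are two examples:

  • The Flask web framework catches any unhandled exception raised while processing a request and turns it into a 500 error response, so a failing request does not bring down the server.
  • The Tkinter GUI toolkit from the Python standard library catches exceptions raised in event callbacks and prints their stack traces to stderr, so the application keeps running.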

Example

I want to show you an example of how you can improve your code using a well-thought-out error handling architecture. I'll use Flask for this, but the principle applies to most other frameworks and application types.

Let's say this is a database application using the Flask-SQLAlchemy extension. In my consulting and code review experience, many developers write database operations in Flask endpoints like this:

# WARNING: this is an example of how NOT to handle exceptions!
@app.route('/songs/<id>', methods=['PUT'])
def update_song(id):
    # ...
    try:
        db.session.add(song)
        db.session.commit()
    except SQLAlchemyError as e:
        current_app.logger.error('failed to update song %s, %s', song.name, e)
        try:
            db.session.rollback()
        except SQLAlchemyError as e:
            current_app.logger.error('error rolling back failed update song, %s', e)
        return 'Internal Service Error', 500
    return '', 204

Here the route tries to save the song to the database and catches database errors, which are all subclasses of the SQLAlchemyError exception class. If an error occurs, it logs an explanatory message and then rolls back the database session. But of course, the rollback can also fail, so there is a second exception handler that catches rollback errors and logs them too. After all this, a 500 error is returned so that the user knows a server error occurred. This pattern is repeated in every endpoint that writes to the database.

This is a very poor solution. First, this function can do nothing to recover from a rollback error. If a rollback fails, the database is in serious trouble, so you will most likely keep seeing errors, and logging the rollback failure won't help you at all. Logging an error message when a database write fails may seem useful at first, but this particular log entry is missing important information, most notably the stack trace of the error, which is the most important debugging tool you'll have later when trying to figure out what happened. At the very least, this code should use logger.exception() instead of logger.error(), since the former writes both the error message and the stack trace to the log. But we can do better.
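The difference between the two logging calls is easy to check with the standard logging module. This sketch sends a logger's output to an in-memory stream (the logger name "demo" is arbitrary): only the logger.exception() call, made inside the except block, appends the traceback.

```python
import io
import logging

stream = io.StringIO()
logger = logging.getLogger("demo")
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False  # don't also send these records to the root logger

try:
    1 / 0
except ZeroDivisionError as error:
    logger.error("logger.error: %s", error)  # message only
    logger.exception("logger.exception")  # message plus the current traceback

output = stream.getvalue()
print(output)
```

The printed output contains a single "Traceback (most recent call last)" block, and it follows the logger.exception line.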

This endpoint deals with a Type 4 error, so it can be written in the “do nothing” style, which gives a much better implementation:

@app.route('/songs/<id>', methods=['PUT'])
def update_song(id):
    # ...
    db.session.add(song)
    db.session.commit()
    return '', 204

Why does this work? As we saw above, Flask catches all errors, so your application will never crash because an error went uncaught. As part of its error handling, Flask logs the error message and the stack trace, which is exactly what we need, so we don't have to do it ourselves. Flask also returns a 500 error to the client to signal an unexpected server error. On top of that, the Flask-SQLAlchemy extension hooks into Flask's request lifecycle and discards the session at the end of the request, which rolls back the failed transaction, another thing we need. So in this route we don't have to do anything ourselves!
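To make the division of labor concrete, here is a toy dispatcher, not Flask's actual code, that does what the paragraph describes: it calls the view, and if anything escapes, it logs the error with its stack trace, rolls back a hypothetical FakeSession and returns a 500 response. The view itself contains no error handling.

```python
import logging

logger = logging.getLogger("toy_framework")


class FakeSession:
    """Hypothetical stand-in for a database session."""

    def __init__(self):
        self.rolled_back = False

    def rollback(self):
        self.rolled_back = True


def dispatch(view, session):
    """Call a view the way a framework might, catching everything at the top."""
    try:
        return view()
    except Exception:
        logger.exception("Unhandled error in %s", view.__name__)  # message + traceback
        session.rollback()  # cleanup that an extension could hook in here
        return "Internal Server Error", 500


def update_song():
    # "Do nothing" view: any database error simply bubbles up to dispatch()
    raise RuntimeError("database connection lost")
```

Because the cleanup lives in one place, every view that writes to the database gets it for free.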

The process of recovering from database errors is the same for most applications, so you can leave all the dirty work to the framework, thus greatly simplifying your application code.

Errors in production and errors in development

I said that one of the benefits of moving as much error-handling logic as possible high up in the application's call stack is that it improves maintainability and readability of the code.

Another benefit of moving most of the error handling code into one part of the application is that, with it all in one place, it becomes easier to control how the application responds to errors. The best example is how easily you can make the application behave differently in production and development configurations.

During development, it is perfectly fine for the application to crash and display a stack trace. In fact, this is a good thing, because we want errors and bugs to be visible so they can be fixed. But of course, the same application must be completely reliable in production: errors should be logged and, if possible, reported to developers, without leaking internal or confidential error details to users.

This becomes much easier to implement when all error handling is in one place and separated from the application logic. Let's go back to the example command line application and add development and production modes to it:

import os
import sys

mode = os.environ.get("APP_MODE", "production")

def my_cli():
    ...  # the application logic goes here

if __name__ == '__main__':
    try:
        my_cli()
    except Exception as error:
        if mode == "development":
            raise  # in development mode, we let the application crash!
        else:
            print(f"Unexpected error: {error}")
            sys.exit(1)

Isn't that great? When we run the program in development mode, the exception is re-raised and the application crashes, so we can see errors and stack traces while we work. But we do this without compromising the reliability of the production version, which still catches all errors and prevents crashes. More importantly, the application logic doesn't need to know anything about these configuration differences.
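One way to take this further, sketched under the assumption that you prefer a testable function over inline code in the __main__ block: the hypothetical run() helper below holds the only copy of the top-level policy and returns an exit code, so the entry point becomes sys.exit(run(my_cli, mode)). Unlike the version above, it prints the production message to stderr.

```python
import sys


def run(main, mode="production"):
    """Run `main`, applying the top-level error policy; return an exit code."""
    try:
        main()
    except Exception as error:
        if mode == "development":
            raise  # development: crash and show the full stack trace
        print(f"Unexpected error: {error}", file=sys.stderr)
        return 1  # production: report briefly and signal failure
    return 0
```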

Does this remind you of what Flask, Django and other web frameworks do? Many web frameworks have a development or debug mode that shows crashes in the console and sometimes even in the web browser. It is the same solution I showed you in the fictitious CLI application, just implemented in a web application!
