Using Underscores in Python Code

The underscore (_) is a character used in names in Python code. It affects both how the code works and how it reads. Knowing where to place underscores helps you write better code.

Specifications and Conventions

Let's talk about specifications and conventions. Both are rules that are followed when writing code. The difference is that specifications are like the laws of physics and cannot be broken, while conventions can. An example of a convention is a code style guide such as PEP 8.

Generally accepted conventions

Conventions matter in team projects, where people work with each other's code. Uncle Bob puts the ratio of time spent reading code to time spent writing it at 10:1. Conventions help you extract more information from code as you read it, and they make reading faster because the code matches your expectations.

In this article I will talk about conventions supported by popular tools such as linters, formatters, and IDEs. Tool support saves time on following conventions. In some cases you don't even need to discuss these conventions: the tools themselves will tell you when you violate them.

For example, here is a convention I learned from a linter message: don't use f-strings in an exception constructor, because the traceback prints the source line next to the message itself, which duplicates the text. This can make the logs harder to read (https://docs.astral.sh/ruff/rules/f-string-in-exception/). The code becomes slightly better and it costs you nothing. The entry threshold is zero: you do not need to know the rule, the tools will tell you about it. In a review I would never ask anyone to fix this, because it is too insignificant. Some conventions only pay off when they are fully automated.
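
Here is a minimal sketch of what that linter rule asks for. The function names and the message text are made up for illustration; only the pattern (build the message in a variable, then raise) comes from the rule's documentation.

```python
def check_age_flagged(age):
    """The form the linter complains about: f-string inside the constructor."""
    if age < 0:
        raise ValueError(f"age must be non-negative, got {age}")
    return age


def check_age_preferred(age):
    """The suggested form: assign the message first, then raise."""
    if age < 0:
        msg = f"age must be non-negative, got {age}"
        raise ValueError(msg)
    return age
```

Both functions raise the same exception. The difference is only in the traceback, where the second form shows the line raise ValueError(msg) instead of repeating the message text.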

I had a positive experience with automation when I was teaching people Python from scratch. When students take their first steps in programming, they are bombarded with information. To avoid overloading them with formatting rules, I created a project template on GitHub with linters and formatters already connected and configured. Homework was submitted as a pull request, on which the checks ran automatically. As a result, I never saw poorly formatted code or common beginner mistakes from my students. When reviewing, I spent my time on Python concepts, algorithms, and the program logic. I used a simplified version of my personal template for new projects, project_template.

Names in Python

Names identify variables, functions, classes, class attributes, modules, and arguments. In the examples I will mostly use functions, but the same applies to other names.

Examples of names:

module_scope_variable_name = "Hello"


def function_name(argument_name):
    local_variable_name = argument_name

    
class ClassName(ParentName):
    def method_name(self, argument_name):
        self.attribute_name = argument_name

Underscores in names

Underscore in the middle of the name

def underscore_helps_you_to_read_names_composed_from_many_words():
    this_is_a_good_practice = True
    it_is_called_snake_case = True
    comparetothisname = False
    camelCaseIsAnAlternative = True

    
IS_SCREAMING_SNAKE_CASE = True

Specification

Underscore is one of the characters that can be used in names.

Convention

The underscore makes phrases in names easier to read by visually separating the words, for example: now_here. If you remove the underscore, nowhere can be read in more than one way. This style is called snake_case because it looks like a crawling snake. PEP 8, the style guide for Python's standard library, recommends snake_case for variable, function, and method names. snake_case is not the only way to improve the readability of names. There is also CamelCase, which Python uses for class names.

Underscore at the beginning of names (attributes)

Attributes are elements accessible from the outside. Functions, classes, and variables are module attributes. Methods and class variables are class attributes. Local variables are not attributes because they are not visible from the outside. A simple rule: if you can access it via a dot, it is an attribute. For example, in math.log the function log is an attribute of the module math.

def _function():
    ...

    
class Foo:
    def _foo(self):
        ...

    def foo(self):
        self._foo()

Specification

When importing via *, names that start with an underscore are not imported (unless the module lists them in __all__). I should say right away that imports with an asterisk are not recommended, because they clutter the namespace.

Convention

Names starting with an underscore are for internal use. The name _function() tells us that the function should not be used outside its module. In the documentation you can also find the synonym “non-public”.

Hiding parts of the implementation is called encapsulation. It is an important concept, but in Python it is supported only by conventions. As a beginner developer, I didn't understand why anything needed to be hidden, but as projects grew and I gained experience supporting products, I came to understand why encapsulation is needed.

Encapsulation

Public functions are difficult to change

As the Fox said: “People have forgotten this truth, but don't forget: you are forever responsible for the code you publish.” Whatever is available to others will end up being used in ways you did not intend.

To change something public, you have to take all of its uses into account. This costs both developers and users time. Let's look at an example from the standard library: the public module asynchat was removed in three steps:

  • Python 3.6, released on 23 Dec 2016: The asynchat module has been deprecated in favor of asyncio.

  • Python 3.10, released on 4 Oct 2021: asynchat, asyncore, smtpd: These modules have been marked as deprecated in their module documentation since Python 3.6. An import-time DeprecationWarning has now been added to all three of these modules.

  • Python 3.12, released on 2 Oct 2023: The asynchat, asyncore, and imp modules have been removed.

The process took more than six years and involved a dozen developers.

Public functions are harder to write

A function, whether public or internal, should work and solve the problem. A public function has additional requirements:

  • Choosing a function name and arguments: the name should match the context of use. For internal functions, this is the module where they are defined. For public ones, the context may be much broader.

  • Protection against misuse: public functions will not be used as expected, so you need to be stricter about checking input data. With internal functions you control the usage, so checking input parameters is not necessary. A good pattern is to check the arguments inside a public function and then call an internal one that does the work. The logic of the code is consistent: first checks, then actions. You can test both parts separately.

def do_something(**kwargs):
    valid_data = _validate_data(**kwargs)
    _do_something_internal(valid_data)

  • Signature: public functions need to be comfortable to use, for example by providing default arguments and accepting abstract types. Internal functions just need to work. Abstractions are not needed: you know and control all the code that calls your function.

  • Documentation: needed for public functions. If there are strict documentation requirements, make everything you can internal and leave it undocumented. Tools that check for documentation ignore internal functions without documentation.
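
The "check in the public function, work in the internal one" pattern from the list above could look roughly like this. It is only a sketch: format_address and both helpers are hypothetical names chosen for the illustration.

```python
def _validate_port(port):
    """Internal: turn raw input into a valid port number or raise."""
    try:
        number = int(port)
    except (TypeError, ValueError):
        msg = f"port must be an integer, got {port!r}"
        raise ValueError(msg) from None
    if not 0 < number < 65536:
        msg = f"port must be in 1..65535, got {number}"
        raise ValueError(msg)
    return number


def _format_address_internal(host, port):
    """Internal: trusts its arguments, so it performs no checks."""
    return f"{host}:{port}"


def format_address(host="localhost", port=8080):
    """Public: convenient defaults, validation first, then the real work."""
    valid_port = _validate_port(port)
    return _format_address_internal(host, valid_port)
```

The public function stays short and reads as "check, then act", while each internal helper can be tested on its own.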

Review

Internal functions speed up code writing and simplify reviews because they have fewer requirements. For a full review, you need to understand the context of use. Internal functions have minimal context, and imperfections are forgivable. For example, in public code def get_satus(nAmber) would be a blocker: a typo in an argument name looks unprofessional, and a typo in a function name will make you suffer. In internal code, _get_satus(nAmber) is not so critical and can be fixed later. When the build and testing processes are long, this lets the code move to the next stage faster.

Using Encapsulation

It is worth starting to use encapsulation as soon as you know that you will not be working on the code alone. The cost of writing a public function is higher, since you need to take more context into account. In terms of time spent, writing a public function right away costs the same as writing an internal one and converting it to public later. But in the second case, the transition to public can happen long after the function was written, and by then you will know much more about the context. It may also never happen at all. The strategy of starting internal and going public only when necessary is a win-win. When your modules begin to be used outside the team, the gain can be significant. If you are not 146% sure that something should be public, make it internal and change it later.

When else do you need internal functions?

Besides the ability to defer decisions until they are really needed, there are two practical scenarios for internal functions: code reuse and managing complexity. Code reuse is intuitive, so we will not dwell on it. We will focus on complexity and on the fact that it is perfectly normal to create a function that is used only once.

import json

import requests


# Bad
def print_stats(base_url):
    response = requests.get(base_url + "/accounts/")
    response.raise_for_status()
    data = json.loads(response.text[5:])
    for a in data:
        print(f"{a['username']}: {a['email']}")


# Good
def print_stats(base_url):
    accounts = _get_accounts(base_url)
    _print_account_stats(accounts)

From a business perspective, the code in both examples does two things: it reads data and prints the result. In the bad example, business logic and low-level operations sit in one place. You start by reading about downloading and parsing data, and then you move on to iterating over the rows. This code does not speak the language of the business. Somewhere in the middle you will most likely get distracted wondering why we don't get JSON directly from the response. And only after reading all of the code can you work out what exactly it does.

In the good example, you can focus separately on the business logic and on how it is implemented. The outer function lets you understand the overall structure very quickly. The review can be divided into three stages: logic, receiving data, and outputting data. It is convenient to test each function separately. This principle is difficult to demonstrate with small examples, because you can simply load the whole code into your head and process it. When you have 1000 lines, it can take days to understand the logic. Here is an example of complex code that mixes business logic (the behavior of a computer player) and technical code (receiving data, sending commands): https://github.com/freeorion/freeorion/blob/3255213f8025002be445b5e29e2572b90353d4e5/default/python/AI/ProductionAI.py#L158C1-L158C48

Support by tools

The tools know and respect internal attributes and will warn you about mistakes.

  • Using internal attributes outside their context generates a warning.

  • Missing documentation for internal methods does not generate a warning.

Underscore at the beginning of names (variables)

class TestConnector:
    def connect(self, _url, _username, _password):
        return

Specification

For Python, this is a normal name, nothing special.

Convention

A leading underscore indicates that an argument is not used. You typically need this when your code must match a signature defined by other code.

Class from a third-party library:

class Connector:
    def connect(self, url, username, password):
        ...

Stub for tests: the signature is the same, but we don't need the arguments.

class TestConnector:
    def connect(self, _url, _username, _password):
        return

Tool support

Unused arguments that begin with an underscore will not generate a warning.
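
A runnable sketch of the stub in use. The Connector class here only stands in for a third-party class, and open_session is a hypothetical caller added for the illustration.

```python
class Connector:
    """Stands in for a class from a third-party library (hypothetical)."""

    def connect(self, url, username, password):
        raise ConnectionError(f"cannot reach {url} as {username}")


class StubConnector(Connector):
    """Test stub: same positional signature, arguments deliberately unused."""

    def connect(self, _url, _username, _password):
        return "connected"


def open_session(connector):
    # The arguments are passed positionally, so the renamed parameters
    # in the stub make no difference to this caller.
    return connector.connect("https://example.org", "user", "secret")
```

One caveat: renaming a parameter also changes its keyword name, so a stub like this only fits callers that pass the arguments positionally.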

The underscore as the whole name

_, second = function_that_return_pair()

Specification

For Python, this is a normal name, nothing special.

Convention

If a variable is not used, it can be named _. This often happens when you unpack a sequence.

pair = 1, 2
_, second = pair

In the Python console (REPL), this variable has a special meaning: it holds the result of the last evaluated expression. If you forgot to assign the result to a variable, you can do it afterwards.

>>> "com" + "putation"
'computation' 
>>> a = _
>>> print(a)
computation

Underscore at the end

list_ = [1, 2, 3]

Specification

For Python, this is a normal name, nothing special.

Convention

A trailing underscore lets you use nice names that are already taken by Python.

def print_(list_):
    print(", ".join(list_))

    
def distance(from_, to):
    ...

Python already uses many simple names, such as print, list, file, and json. Some names are reserved by the language, for example from. By adding a trailing underscore, you get names that are both readable and unique.

The standard library also uses this approach: operator.and_.
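
A quick way to check which names are taken, using only the standard library. The variable names are chosen just for this sketch.

```python
import builtins
import keyword
import operator

# "from" is reserved by the language, so it can never be an identifier;
# "from_" is an ordinary name.
from_is_reserved = keyword.iskeyword("from")         # True
from_underscore_ok = not keyword.iskeyword("from_")  # True

# "list" is not reserved, but it is a built-in; list_ avoids shadowing it.
list_is_builtin = hasattr(builtins, "list")          # True
list_underscore_free = not hasattr(builtins, "list_")  # True

# The standard library does the same: "and" is a keyword, so the
# function form of the & operator is named and_.
bits = operator.and_(0b110, 0b011)
print(bits)  # 2
```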

Two underscores at the beginning in methods

Specification

The two underscores at the beginning of class attributes are part of the language specification. Let's first look at the problem they solve.

from datetime import datetime


class Foo:
    def _get(self):
        return self.__class__.__name__
    def get_name(self):
        return self._get()

      
class Boo(Foo):
    def _get(self):
        return datetime.now()
    def get_time(self):
        return self._get()


assert Boo().get_name() == "Boo"  # AssertionError

The code in this example is broken. When Boo().get_name() is called, we expect it to call Foo._get. But that method is overridden in the class Boo, so Boo._get is called instead.

Every time you change Foo or Boo, you may find yourself in this situation. To be safe, at least one of the classes needs some way to avoid name conflicts. For example, add the class name to internal methods: _get -> _Foo__get.

class Foo:
    def _Foo__get(self):
        return self.__class__.__name__
    def get_name(self):
        return self._Foo__get()

      
class Boo(Foo):
    def _get(self):
        return datetime.now()
    def get_time(self):
        return self._get()

      
assert Boo().get_name() == "Boo"  # works

Now Foo is safe: the chances that someone will decide to name their method _Foo__get are close to zero. However, such a solution would be unpleasant to maintain. Therefore, Python provides a simple and convenient alternative using two leading underscores: _get -> __get.

class Foo:
    def __get(self):
        return self.__class__.__name__
    def get_name(self):
        return self.__get()


class Boo(Foo):
    def __get(self):
        return datetime.now()
    def get_time(self):
        return self.__get()


assert Boo().get_name() == "Boo"

Now Boo.__get does not override Foo.__get; these are two different methods. To understand how they coexist, let's look at the attributes of an instance.

>>> boo = Boo()
>>> print(dir(boo))
['_Boo__get', '_Foo__get', ...]

Names with two leading underscores exist only in the source text. When Python compiles a class body, every such name is rewritten to include the class name. A name like __get used outside a class definition is left unchanged.

Such attributes are called private, but that won't stop you from using them. You can still call them by name.

>>> boo._Boo__get()
datetime.datetime(2024, 3, 31, 9, 42, 47, 141187)
>>> boo._Foo__get()
'Boo'

This approach is called name mangling and is used to avoid name conflicts.
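
The whole mechanism fits in one short runnable sketch. It reuses the class names from the examples above, but returns a fixed string instead of the current time so that the result is predictable.

```python
class Foo:
    def __get(self):  # stored on the class as _Foo__get
        return type(self).__name__

    def get_name(self):
        return self.__get()  # compiled as self._Foo__get()


class Boo(Foo):
    def __get(self):  # stored on the class as _Boo__get
        return "time"

    def get_time(self):
        return self.__get()  # compiled as self._Boo__get()


boo = Boo()
print(boo.get_name())  # Boo
print(boo.get_time())  # time

# Both mangled names exist side by side, each on its own class.
print("_Foo__get" in vars(Foo), "_Boo__get" in vars(Boo))  # True True

# Outside a class body nothing is mangled, so the plain name is not found.
print(hasattr(boo, "__get"))  # False
```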

Convention

The convention is the same as for internal attributes with a single leading underscore. Code analysis tools will help you find incorrect usages.

For classes that do not participate in inheritance, a single leading underscore in attribute names is enough.

Two underscores at the beginning, two at the end

Specification

The two underscores at both ends are part of the specification and denote methods and attributes with a special meaning in Python. They are described in the Python documentation (Data model).

Let's look at two large groups.

The first group is variables that are set by the interpreter:

"""Sample module."""

if __name__ == "__main__":
    print(f"Code from {__file__}")
    print(f"Docstring {__doc__}")

The second group is magic methods, also called dunder methods (Double UNDERscore). With their help you create custom classes that integrate into the Python ecosystem.

class Foo:
    def __init__(self, number: int):
        self._number = number

    def __add__(self, other):
        if isinstance(other, Foo):
            return Foo(self._number + other._number)
        return NotImplemented

    def __hash__(self):
        return hash(self._number)

    def __repr__(self):
        return f"Foo({self._number})"

Let's run some code in the terminal.

>>> foo = Foo(1)

__init__ is called when we create an object.

>>> one = Foo(1)
>>> two = one + Foo(1)
>>> print(one, two)
Foo(1) Foo(2)

__add__ is called when you use the + operator.

>>> hash(one)
1

__hash__ describes how your class behaves when the hash function is called on it.

__repr__ is used by repr, and also by str when the class does not define __str__.

Built-in functions (len, bool, iter), mathematical operators (+, -), and context managers work with objects that implement the corresponding dunder methods. Be careful: these methods are not simply called directly, there is additional logic. For example, bool, if it does not find an implementation of __bool__, falls back to __len__, and if that method does not exist either, it returns True.
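
The fallback chain for bool can be checked with three tiny classes. The class names are invented for this sketch.

```python
class WithLen:
    """Defines only __len__; bool() falls back to it."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)


class WithBool(WithLen):
    """Defines __bool__ as well; it takes priority over __len__."""

    def __bool__(self):
        return True


class Plain:
    """Defines neither method; instances are always truthy."""


print(bool(WithLen([])))      # False: __len__ returned 0
print(bool(WithLen([1, 2])))  # True: __len__ returned 2
print(bool(WithBool([])))     # True: __bool__ wins over __len__
print(bool(Plain()))          # True: there is nothing to consult
```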

There is one exception, when you can and should call such a method directly in your code: when you implement a magic method and call the same method on the parent.

def __init__(self, my_arg, parent_args):
    super().__init__(parent_args)
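
The same fragment as a complete, runnable sketch. Base and Child are hypothetical classes used only to show the call.

```python
class Base:
    def __init__(self, name):
        self.name = name


class Child(Base):
    def __init__(self, name, age):
        # A direct call of a dunder method: the parent sets up its part.
        super().__init__(name)
        self.age = age


child = Child("Ann", 30)
print(child.name, child.age)  # Ann 30
```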

You can always find a description of how a particular method works in the Data model documentation.

Convention

Treat these methods as internal. Nothing prevents you from creating your own names of this form, but I recommend avoiding it: Python has so many such methods that it will be hard for people to tell a built-in one from yours.

Conclusion

Let's sum it up.

Knowing how to use underscores in code will help you write better code and communicate your ideas better to others.

Specifications and conventions are well supported by code quality tools such as flake8, pylint, and ruff, so it won't be difficult to set up stricter adherence to these rules on a project.

Before adopting conventions, be sure to communicate them to all team members.

Specifications

Double underscore at both ends is the Python data model:

def __init__(self): ..., __file__

Double underscore at the beginning of methods avoids conflicts during inheritance:

class Parent:
    def __get(self): ...

Conventions

An underscore in the middle is snake_case, a popular naming style:

i_use_underscore = True

An underscore at the end lets you use good names that are already taken:

list_ = ...

A name that is only an underscore marks a deliberately unused variable:

_, last = get_first_and_last_names()

A leading underscore on functions, methods, and classes means encapsulation:

def _foo(): ...

A leading underscore on argument names marks a deliberately unused argument:

def mock_connection(_url, _username, _password): ...
