Understanding Attribute Access in Python

This translation of the article was published specifically for future students of the “Python Developer. Professional” course.


I wonder how many people realize how much syntactic sugar Python has. I am not saying it is comparable to Lisp-like languages, where the syntax is as bare as possible (although the comparison with Lisp is not entirely justified), but most of Python's syntax is technically unnecessary, since it is mostly function calls under the hood.

Well, so what? Why think about how Python turns less syntax into more function calls? There are two reasons. First, it is helpful to know how Python actually works so that you can better understand and debug your code when things go wrong. Second, this way you can identify the minimum required to implement the language.

That is why, both to educate myself and to think about what might be needed to implement Python for WebAssembly or a bare-bones C API, I decided to write this article about what attribute access looks like and what is hidden behind the syntax.

We could try to piece together everything related to attribute access by reading the Python reference. That would lead you to the section on attribute references in expressions and to the data model section on customizing attribute access; however, it is still important to tie everything together to understand how access works. So I prefer to start from the CPython source and figure out what is going on in the interpreter (I am specifically using the CPython 3.8.3 repository tag, because it gives me stable links and is the latest version at the time of writing).

At the beginning of this article you will see some C code, but I do not expect you to understand it thoroughly. I will point out what you need to take away from it, so if you have no knowledge of C at all, that is fine: you will still be able to follow everything I am talking about.

Looking at the bytecode

So let’s deal with the following expression:

obj.attr

Probably the easiest place to start is the bytecode. Let’s look at this line and see what the compiler does with it:

>>> def example(): 
...     obj.attr
... 
>>> import dis
>>> dis.dis(example)
  2           0 LOAD_GLOBAL              0 (obj)
              2 LOAD_ATTR                1 (attr)
              4 POP_TOP
              6 LOAD_CONST               0 (None)
              8 RETURN_VALUE

The most important opcode here is LOAD_ATTR. If you are interested, it replaces the object at the top of the stack with the result of accessing the named attribute, as specified in co_names[i].
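To see that name table in practice, here is a small sketch that inspects the compiled function directly. The raw numeric operand of LOAD_ATTR is encoded differently in newer CPython versions than in 3.8, so the example relies on the decoded argval reported by dis rather than on the raw index.

```python
import dis


def example():
    obj.attr  # never executed; we only inspect the compiled code


# co_names holds the global and attribute names the bytecode refers to.
names = example.__code__.co_names

# Collect the attribute names that LOAD_ATTR instructions resolve to.
load_attr_targets = [
    ins.argval
    for ins in dis.get_instructions(example)
    if ins.opname == "LOAD_ATTR"
]

print(names)
print(load_attr_targets)
```

Both obj and attr should appear in the name table, and the only LOAD_ATTR in the function should resolve to attr.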

The CPython interpreter loop lives in Python/ceval.c. It is built around a massive switch statement that branches on the opcode being executed. If you dig in, you will find the following lines of C for LOAD_ATTR:

case TARGET(LOAD_ATTR): {
            PyObject *name = GETITEM(names, oparg);
            PyObject *owner = TOP();
            PyObject *res = PyObject_GetAttr(owner, name);
            Py_DECREF(owner);
            SET_TOP(res);
            if (res == NULL)
                goto error;
            DISPATCH();
        }

Source: Python/ceval.c (CPython 3.8.3)

Most of this code just manipulates the stack, so we can ignore it. The key bit is the call to PyObject_GetAttr(), which is what provides access to the attributes.

The name of this function looks familiar

Now, this name looks just like getattr(), only in the C function naming convention used by CPython. Digging into Python/bltinmodule.c, where Python's built-ins are defined, we can check whether our guess is correct. Searching the file for “getattr”, you will find the line that binds the name getattr to the function builtin_getattr():

static PyObject *
builtin_getattr(PyObject *self, PyObject *const *args, Py_ssize_t nargs)
{
    PyObject *v, *name, *result;


    if (!_PyArg_CheckPositional("getattr", nargs, 2, 3))
        return NULL;


    v = args[0];
    name = args[1];
    if (!PyUnicode_Check(name)) {
        PyErr_SetString(PyExc_TypeError,
                        "getattr(): attribute name must be string");
        return NULL;
    }
    if (nargs > 2) {
        if (_PyObject_LookupAttr(v, name, &result) == 0) {
            PyObject *dflt = args[2];
            Py_INCREF(dflt);
            return dflt;
        }
    }
    else {
        result = PyObject_GetAttr(v, name);
    }
    return result;
}

Source: Python/bltinmodule.c (CPython 3.8.3)

There is a fair amount of argument handling here that we are not interested in, but you will notice that when you pass two arguments to getattr(), it calls PyObject_GetAttr().

What does this mean? It means you can “unpack” obj.attr as getattr(obj, "attr"). It also means that if we understand how PyObject_GetAttr() works, we will understand how getattr() works and, therefore, how attribute access in Python is organized.
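The equivalence is easy to check from Python itself. The following is a minimal sketch; the Point class is a made-up example, not something from CPython.

```python
class Point:
    """Hypothetical class used only for this illustration."""

    def __init__(self, x):
        self.x = x


p = Point(3)

# Dotted access and the two-argument getattr() give the same result.
same = p.x == getattr(p, "x")

# The optional third argument is returned instead of raising AttributeError.
fallback = getattr(p, "y", 0)

# A non-string attribute name is rejected with TypeError.
try:
    getattr(p, 42)
except TypeError:
    rejected = True
else:
    rejected = False
```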

Understanding getattr()

At this point I will stop pasting in C code, since it only grows in complexity and no longer demonstrates as clearly that obj.attr is just another spelling of getattr(obj, "attr"). However, I will keep referring to it in the comments of the pseudocode for those who want to dive deep into CPython. Also note that the Python code should be thought of as pseudocode, because the code that implements attribute access itself performs attribute access, whereas at the C level it does not go through the normal attribute access mechanism. So although you will see “.” used as syntax in the pseudocode, be aware that at the C level the attribute access is not recursive and actually works the way you would naively assume.

What we already know

At the moment we know two things about getattr(). First, the function requires at least two arguments. Second, the second argument must be an instance of str (or of a subclass of str); otherwise TypeError is raised with a static string argument (probably static for performance reasons).

from typing import Any


def getattr(obj: Any, attr: str, default: Any) -> Any:
    if not isinstance(attr, str):
        raise TypeError("getattr(): attribute name must be string")

    ...  # Fill in with PyObject_GetAttr().

Writing a function for getattr()

Finding Attributes Using Special Methods

Attribute access is implemented through two special methods. The first is __getattribute__(), which is called for every attribute access. The second is __getattr__(), which is called when __getattribute__() raises AttributeError. The first method (as of today) must always be defined, while the second is optional.

Python looks up special methods on the type of the object, not on the object itself. To be clear, I am using the word “type” very specifically here: the type of an instance is its class, and the type of a class is its metaclass (type by default). Fortunately, it is very easy to get the type of something thanks to the type constructor, which returns an object's type: type(obj).
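One way to observe this rule is to store a special method name on an instance and see that the built-in ignores it. This is only a sketch; the Box class is a made-up example.

```python
class Box:
    """Hypothetical class used only for this illustration."""

    def __len__(self):
        return 3


box = Box()
box.__len__ = lambda: 99  # stored in the instance __dict__

via_builtin = len(box)         # looks up __len__ on type(box)
via_attribute = box.__len__()  # ordinary attribute access finds the instance value

instance_type = type(box)  # the class
class_type = type(Box)     # the metaclass
```

Here len() still reports 3, because it consults the class, while the explicit box.__len__() call returns 99.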

We also need to know the method resolution order (MRO). It defines the order in which the type hierarchy of an object is searched. The algorithm Python uses comes from the Dylan language and is called C3. From Python code, the MRO is exposed via type(obj).mro().
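As an illustration, here is what the MRO looks like for a small diamond-shaped hierarchy. The classes A, B, C and D are hypothetical and exist only for this sketch.

```python
class A:
    pass


class B(A):
    pass


class C(A):
    pass


class D(B, C):
    pass


# mro() returns the classes in the order they are searched.
order = [cls.__name__ for cls in type(D()).mro()]
print(order)
```

The linearization visits D, then B, then C, then A, and finally object, so the shared base A is searched only after both of its subclasses.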

Looking things up on the type of the object is done on purpose, because it speeds up lookup and access. It eliminates an extra lookup by skipping the instance every time we search for something. At the CPython level, it also allows special methods to live in a field of a struct, which makes looking them up very fast. So while it seems a bit odd to ignore the object and use its type instead, it makes some sense.

Now, for the sake of simplicity, I will cheat a little and have getattr() handle the __getattribute__() and __getattr__() methods explicitly, whereas CPython does some manipulation under the hood to make the object handle both methods on its own. For our purposes, the semantics are the same.

# Based on https://github.com/python/cpython/tree/v3.8.3.
from __future__ import annotations
import builtins
from typing import Any

NOTHING = builtins.object()  # C: NULL


def getattr(obj: Any, attr: str, default: Any = NOTHING) -> Any:
    """Implement attribute access via __getattribute__ and __getattr__."""
    # Python/bltinmodule.c:builtin_getattr
    if not isinstance(attr, str):
        raise TypeError("getattr(): attribute name must be string")

    obj_type_mro = type(obj).mro()
    attr_exc = NOTHING
    for base in obj_type_mro:
        if "__getattribute__" in base.__dict__:
            try:
                return base.__dict__["__getattribute__"](obj, attr)
            except AttributeError as exc:
                attr_exc = exc
                break
    # Objects/typeobject.c:slot_tp_getattr_hook
    # It is cheating to do this here as CPython actually rebinds the tp_getattro
    # slot with a wrapper that handles __getattr__() when present.
    for base in obj_type_mro:
        if "__getattr__" in base.__dict__:
            return base.__dict__["__getattr__"](obj, attr)

    if default is not NOTHING:
        return default
    elif attr_exc is not NOTHING:
        raise attr_exc
    else:
        raise AttributeError(f"{type(obj).__name__!r} object has no attribute {attr!r}")

Pseudocode that implements getattr()
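The ordering in the pseudocode can be cross-checked against the real built-in getattr(). The sketch below uses three hypothetical classes: one with a __getattr__() fallback, one without, and one whose property raises something other than AttributeError.

```python
class WithFallback:
    def __getattr__(self, name):
        return f"computed {name}"


class Plain:
    pass


class Broken:
    @property
    def value(self):
        raise ValueError("not an AttributeError")


# __getattr__ is consulted before the default is considered.
from_hook = getattr(WithFallback(), "missing", "default")

# Without __getattr__, the default is returned.
from_default = getattr(Plain(), "missing", "default")

# Only AttributeError is swallowed; other exceptions propagate.
try:
    getattr(Broken(), "value", "default")
except ValueError:
    propagated = True
else:
    propagated = False
```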

Understanding object.__getattribute__()

Even though we now have an implementation of getattr(), it unfortunately does not tell us much about how Python actually looks up attributes, since a very large part of the work is done in the object's __getattribute__() method. Therefore, I will walk through how object.__getattribute__() works.

Finding a data descriptor

The first important thing we do in object.__getattribute__() is look for a data descriptor on the type. If you have never heard of descriptors, they are a way to programmatically control how a particular attribute behaves. You may never have heard of them, but if you have been using Python for a while, I suspect you have already used descriptors: property, classmethod and staticmethod are all descriptors.

There are two kinds of descriptors: data descriptors and non-data descriptors. Both kinds define a __get__() method that returns what the attribute should be. Data descriptors also define __set__() or __delete__() (or both), while non-data descriptors do not. A property is a data descriptor; classmethod and staticmethod are non-data descriptors.
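The distinction can be checked mechanically by looking for the relevant special methods on an object's type. The helpers defines() and classify() below are hypothetical names written for this sketch, not part of the standard library.

```python
def defines(obj, name):
    """Return True if the type of obj (or one of its bases) defines name."""
    return any(name in base.__dict__ for base in type(obj).__mro__)


def classify(obj):
    """Label obj as a data descriptor, a non-data descriptor, or neither."""
    if not defines(obj, "__get__"):
        return "not a descriptor"
    if defines(obj, "__set__") or defines(obj, "__delete__"):
        return "data descriptor"
    return "non-data descriptor"


def plain_function():
    pass


kinds = {
    "property": classify(property()),
    "classmethod": classify(classmethod(plain_function)),
    "staticmethod": classify(staticmethod(plain_function)),
    "function": classify(plain_function),
    "int": classify(42),
}
```

Note that plain functions are non-data descriptors as well, which is how methods get bound to instances.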

If we cannot find a data descriptor for the attribute on the type, the next place we look is the object itself. This is simple because objects have a __dict__ attribute, which stores the attributes of the object itself in a dictionary.
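A quick sketch shows where each kind of attribute is stored. The Point class here is a made-up example, and it assumes an ordinary class without __slots__.

```python
class Point:
    dims = 2  # class attribute, stored on the type

    def __init__(self, x):
        self.x = x  # instance attribute, stored on the object


p = Point(1)

instance_attrs = dict(p.__dict__)
dims_on_instance = "dims" in p.__dict__
dims_on_type = "dims" in type(p).__dict__
```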

If the object itself does not have the attribute, we then check whether there is a non-data descriptor on the type. Since we already looked up the type attribute earlier, we can assume that if a descriptor was found but not used when we were looking for a data descriptor, then it is a non-data descriptor.

Finally, if we found an attribute on the type and it is not a descriptor, we return it. As a result, the attribute lookup order looks like this:

  • A data descriptor on the type;

  • An attribute on the object itself;

  • A non-data descriptor on the type;

  • Anything else on the type.

You will notice that we first look for a kind of descriptor and then, if that fails, for a regular object that matches the kind of descriptor we were looking for. In other words, we first look for data, then for everything else. This makes sense if you think about how a method on the type does self.attr = val in __init__() to store data on the object: if you did that, you probably want it to be found before a method or something similar. And data descriptors come first because if you went to the trouble of defining an attribute programmatically, you probably want it to always be used.
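The lookup order can be observed by giving the same names to class-level attributes and to entries in the instance dictionary. The classes DataDesc, NonDataDesc and Example are hypothetical and exist only for this sketch.

```python
class DataDesc:
    def __get__(self, instance, owner=None):
        return "data descriptor"

    def __set__(self, instance, value):
        raise AttributeError("read-only")


class NonDataDesc:
    def __get__(self, instance, owner=None):
        return "non-data descriptor"


class Example:
    data = DataDesc()
    non_data = NonDataDesc()
    plain = "class attribute"


shadowed = Example()
# Write straight into the instance dictionary to bypass __set__.
shadowed.__dict__["data"] = "instance value"
shadowed.__dict__["non_data"] = "instance value"
shadowed.__dict__["plain"] = "instance value"

fresh = Example()

results = {
    "data": shadowed.data,              # data descriptor wins over the instance
    "non_data": shadowed.non_data,      # instance wins over non-data descriptor
    "plain": shadowed.plain,            # instance wins over class attribute
    "fresh_non_data": fresh.non_data,   # falls back to the non-data descriptor
    "fresh_plain": fresh.plain,         # falls back to the class attribute
}
```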

from typing import Any, Type


def _mro_getattr(type_: Type, attr: str) -> Any:
    """Get an attribute from a type based on its MRO."""
    for base in type_.mro():
        if attr in base.__dict__:
            return base.__dict__[attr]
    else:
        raise AttributeError(f"{type_.__name__!r} object has no attribute {attr!r}")


class object:
    def __getattribute__(self, attr: str, /) -> Any:
        """Attribute access."""
        # Objects/object.c:PyObject_GenericGetAttr
        self_type = type(self)
        if not isinstance(attr, str):
            raise TypeError(
                f"attribute name must be string, not {type(attr).__name__!r}"
            )

        type_attr = descriptor_type_get = NOTHING
        try:
            type_attr = _mro_getattr(self_type, attr)
        except AttributeError:
            pass  # Hopefully an instance attribute.
        else:
            type_attr_type = type(type_attr)
            try:
                descriptor_type_get = _mro_getattr(type_attr_type, "__get__")
            except AttributeError:
                pass  # At least a class attribute.
            else:
                # At least a non-data descriptor.
                for base in type_attr_type.mro():
                    if "__set__" in base.__dict__ or "__delete__" in base.__dict__:
                        # Data descriptor.
                        return descriptor_type_get(type_attr, self, self_type)

        if attr in self.__dict__:
            # Instance attribute.
            return self.__dict__[attr]
        elif descriptor_type_get is not NOTHING:
            # Non-data descriptor.
            return descriptor_type_get(type_attr, self, self_type)
        elif type_attr is not NOTHING:
            # Class attribute.
            return type_attr
        else:
            raise AttributeError(f"{self_type.__name__!r} object has no attribute {attr!r}")

Implementation of object.__getattribute__()
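The non-data descriptor branch is what turns a function stored on a class into a bound method. As a sketch, the same steps can be performed by hand on a made-up Greeter class and compared with ordinary dotted access.

```python
class Greeter:
    """Hypothetical class used only for this illustration."""

    def greet(self):
        return "hello"


g = Greeter()

# The function lives in the class __dict__, not in the instance __dict__.
func = type(g).__dict__["greet"]
on_instance = "greet" in g.__dict__

# Calling __get__ via the function's type mirrors
# descriptor_type_get(type_attr, self, self_type) in the pseudocode.
bound = type(func).__get__(func, g, type(g))

manual = bound()
dotted = g.greet()
```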

Conclusion

As you can see, a lot happens when you look up an attribute in Python. Although I would say none of the parts is conceptually complex, together they add up to a lot of operations. This is why some programmers try to minimize attribute access in Python to avoid this whole mechanism when performance matters.
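One common form of that minimization is to look an attribute up once and keep the result in a local variable. The two functions below are hypothetical examples that compute the same result; whether the second is measurably faster depends on the interpreter version and the workload, so it is worth timing (for example with the timeit module) before relying on it.

```python
def squares_with_lookup(values):
    result = []
    for v in values:
        result.append(v * v)  # looks up "append" on every iteration
    return result


def squares_with_local(values):
    result = []
    append = result.append  # look the bound method up once
    for v in values:
        append(v * v)
    return result


data = list(range(5))
same_output = squares_with_lookup(data) == squares_with_local(data)
```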

Historically, almost all of these semantics came to Python as part of new-style classes rather than “classic” ones. This distinction disappeared in Python 3, when classic classes became a thing of the past, so if you have never heard of them, that is a good thing, I guess.

Other articles in this series can be found under the “syntactic sugar” tag on this blog. You will find the code from this article here.

