Fast ENUM

tl;dr

github.com/QratorLabs/fastenum

pip install fast-enum

Why enumerations are needed

(If you already know all of this, skip down to the section "Enumerations in the Standard Library".)

Imagine that you need to describe a set of all possible states of entities in your own database model. Most likely, you will take a bunch of constants defined directly in the module namespace:

# /path/to/package/static.py:
INITIAL = 0
PROCESSING = 1
PROCESSED = 2
DECLINED = 3
RETURNED = 4
...

… or as static class attributes:

class MyModelStates:
  INITIAL = 0
  PROCESSING = 1
  PROCESSED = 2
  DECLINED = 3
  RETURNED = 4

This approach lets you refer to these states by mnemonic names, while in your storage they remain ordinary integers. You get rid of magic numbers scattered across the code and make it more readable and informative at the same time.

However, both module constants and classes with static attributes suffer from the intrinsic nature of Python objects: they are all mutable. You can accidentally assign a new value to your constant at run time, and debugging and rolling back broken objects is an adventure of its own. So you may want to make the bundle of constants immutable, in the sense that the number of declared constants and the values they are mapped to will not change during program execution.
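A minimal sketch of the problem, reusing a shortened version of the class from above. The value 42 is purely illustrative:

```python
class MyModelStates:
    INITIAL = 0
    PROCESSING = 1

# Nothing stops any code from rebinding a "constant" at run time.
MyModelStates.INITIAL = 42

# Every later comparison against INITIAL now silently uses the new value.
print(MyModelStates.INITIAL)
```

No error or warning is raised, which is exactly what makes such a mistake hard to trace.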

To make the constants immutable, you can try organizing them into named tuples with namedtuple(), as in this example:

from collections import namedtuple

MyModelStates = namedtuple('MyModelStates', ('INITIAL', 'PROCESSING', 'PROCESSED', 'DECLINED', 'RETURNED'))
EntityStates = MyModelStates(0, 1, 2, 3, 4)
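As a quick check of what the named tuple buys you, the sketch below tries to overwrite one of its fields. The `error` variable is introduced here only for illustration:

```python
from collections import namedtuple

MyModelStates = namedtuple(
    'MyModelStates',
    ('INITIAL', 'PROCESSING', 'PROCESSED', 'DECLINED', 'RETURNED'),
)
EntityStates = MyModelStates(0, 1, 2, 3, 4)

try:
    EntityStates.INITIAL = 42  # fields of a named tuple are read-only
except AttributeError as exc:
    error = exc
else:
    error = None

print(type(error).__name__, EntityStates.INITIAL)
```

Note that this protects only the fields: the module-level name EntityStates itself can still be rebound to something else.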

But this does not look very neat or readable, and namedtuple objects, in turn, are not very extensible. Suppose you have a UI that displays all of these states. You can use your module constants, a class with attributes, or named tuples to render them (the latter two are easier to render, while we are at it). But such code gives you no way to provide the user with an adequate description of each state you define. In addition, if you plan to implement multilingual and i18n support in your UI, you will realize how quickly filling in all the translations for these descriptions becomes an incredibly tedious task. Matching state names do not necessarily mean matching descriptions, which means you cannot simply map every INITIAL state to the same description in gettext. Instead, your constant takes the following form:

INITIAL = (0, 'MY_MODEL_INITIAL_STATE')

Or your class becomes like this:

class MyModelStates:
  INITIAL = (0, 'MY_MODEL_INITIAL_STATE')

Finally, the named tuple turns into:

EntityStates = MyModelStates((0, 'MY_MODEL_INITIAL_STATE'), ...)

Not bad already: now both the state value and the translation stub are mapped to the languages supported by your UI. But you may notice that the code using these mappings has become a mess. Each time you assign a value to an entity, you have to extract the element at index 0 from the mapping you are using:

my_entity.state = INITIAL[0]

or

my_entity.state = MyModelStates.INITIAL[0]

or

my_entity.state = EntityStates.INITIAL[0]

And so on. Remember that the first two approaches that use constants and class attributes, respectively, suffer from mutability.

And here enumerations come to our aid:

from enum import Enum

class MyEntityStates(Enum):
  def __init__(self, val, description):
    self.val = val
    self.description = description

  INITIAL = (0, 'MY_MODEL_INITIAL_STATE')
  PROCESSING = (1, 'MY_MODEL_BEING_PROCESSED_STATE')
  PROCESSED = (2, 'MY_MODEL_PROCESSED_STATE')
  DECLINED = (3, 'MY_MODEL_DECLINED_STATE')
  RETURNED = (4, 'MY_MODEL_RETURNED_STATE')

That's all. Now you can easily iterate over the enumeration in your template (Jinja2 syntax):

{% for state in MyEntityStates %}
  
{% endfor %}

An enumeration is immutable both as a set of members (you cannot define a new member at run time, nor delete an already defined one) and in the values its members store (you cannot reassign or delete any of their attributes).
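The sketch below checks three of these guarantees against the standard library Enum, using a simplified two-member enumeration. The helper functions and the `results` dictionary are illustrative names, and the sketch makes no claim about custom attributes assigned in __init__:

```python
from enum import Enum

class MyEntityStates(Enum):
    INITIAL = 0
    PROCESSING = 1

def reassign_member(enum_cls):
    enum_cls.INITIAL = 42

def delete_member(enum_cls):
    del enum_cls.INITIAL

def reassign_value(enum_cls):
    enum_cls.INITIAL.value = 42

# Try each mutation and record whether the enumeration rejected it.
results = {}
for action in (reassign_member, delete_member, reassign_value):
    try:
        action(MyEntityStates)
    except AttributeError:
        results[action.__name__] = 'rejected'
    else:
        results[action.__name__] = 'allowed'

print(results)
```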

In your code, you simply assign values to your entities, like this:

my_entity.state = MyEntityStates.INITIAL.val

Everything is quite clear, informative and extensible. This is what we use enumerations for.

How could we make it faster?

The enumeration from the standard library is rather slow, so we asked ourselves: can we speed it up? As it turned out, we can. Our implementation is:

  • 3 times faster on access to an enumeration member;
  • about 8.5 times faster on access to a member attribute (name, value);
  • 3 times faster on access to a member by value (calling the enumeration constructor, MyEnum(value));
  • 1.5 times faster on access to a member by name (as in a dictionary, MyEnum[name]).
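The four kinds of access listed above could be timed with a sketch like the following. It measures only the standard library Enum; the iteration count and the labels are arbitrary choices made here, not the benchmark the authors ran. To compare, an equivalent FastEnum class would be measured the same way.

```python
import timeit
from enum import Enum

class StdEnum(Enum):
    ONE = 1
    TWO = 2

NUMBER = 100_000  # iterations per measurement (arbitrary)
NAMESPACE = {'StdEnum': StdEnum}

# One statement per access pattern from the list above.
statements = {
    'member access': 'StdEnum.ONE',
    'attribute access': 'StdEnum.ONE.value',
    'lookup by value': 'StdEnum(1)',
    'lookup by name': "StdEnum['ONE']",
}

measurements = {
    label: timeit.timeit(stmt, globals=NAMESPACE, number=NUMBER)
    for label, stmt in statements.items()
}

for label, seconds in measurements.items():
    print(f'{label}: {seconds:.4f} s')
```

The absolute numbers depend on the machine and the Python version, so only ratios between implementations are meaningful.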

Types and objects in Python are dynamic. But there are tools to limit such a dynamic nature of objects. You can get significant performance improvements with __slots__. There is also potential for speed gains if you avoid using data descriptors where possible – but you must consider the possibility of a significant increase in application complexity.

Slots

For example, you can declare a class with __slots__. In this case, instances of the class will have only the limited set of attributes declared in its own __slots__ and in the __slots__ of all its parent classes.
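A small sketch of this behaviour; the class and attribute names are made up for the example:

```python
class Base:
    __slots__ = ('name',)

class Member(Base):
    # Instances get only the slots declared here and in Base.
    __slots__ = ('value',)

    def __init__(self, name, value):
        self.name = name
        self.value = value

member = Member('ONE', 1)

try:
    member.description = 'not declared in any __slots__'
except AttributeError:
    rejected = True
else:
    rejected = False

print(member.name, member.value, rejected)
```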

Descriptors

By default, the Python interpreter returns the value of an object's attribute directly (note that the value is itself a Python object and not, for example, an unsigned long long in C terms):

value = my_obj.attribute  # direct access to the attribute value through the pointer the object stores for this attribute

According to the Python data model, if the attribute found on the object's class is an object that implements the descriptor protocol, then on attribute access the interpreter first finds that descriptor object and then calls its special method __get__, passing our source object as an argument:

obj_attribute = type(my_obj).__dict__['attribute']  # the descriptor object itself (simplified: base classes are not searched)
obj_attribute_value = obj_attribute.__get__(my_obj, type(my_obj))
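To make the extra call visible, here is a sketch with a hand-written descriptor that counts how often it is read. LoggedAttribute and MyObj are illustrative names, not part of any library:

```python
class LoggedAttribute:
    """A descriptor that records every read."""

    def __init__(self, value):
        self.value = value
        self.reads = 0

    def __get__(self, instance, owner=None):
        self.reads += 1
        return self.value

class MyObj:
    attribute = LoggedAttribute(10)

my_obj = MyObj()

# Reading from the class __dict__ bypasses the descriptor protocol.
descriptor = MyObj.__dict__['attribute']

first = my_obj.attribute                    # implicit __get__ call
second = descriptor.__get__(my_obj, MyObj)  # the same call, spelled out

print(first, second, descriptor.reads)
```

Both reads return the same value, and each one costs a Python-level function call on top of the attribute lookup.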

Enumerations in the Standard Library

At the very least, the name and value properties of members in the standard enumeration implementation are declared as types.DynamicClassAttribute. This means that when you access name or value, the following happens:

one_value = StdEnum.ONE.value  # this is what you write in code

# and this is, briefly, what actually happens
one_value_attribute = Enum.__dict__['value']  # the descriptor declared on the Enum base class
one_value = one_value_attribute.__get__(StdEnum.ONE, StdEnum)

# and this is what __get__ actually does (in the Python 3.7 implementation):
    def __get__(self, instance, ownerclass=None):
        if instance is None:
            if self.__isabstractmethod__:
                return self
            raise AttributeError()
        elif self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(instance)

# since DynamicClassAttribute is a decorator over the `name` and `value` methods, the __get__() call stack ends with:

    @DynamicClassAttribute
    def name(self):
        """The name of the Enum member."""
        return self._name_

    @DynamicClassAttribute
    def value(self):
        """The value of the Enum member."""
        return self._value_
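That name and value are descriptors of this kind can be checked directly. In this sketch the concrete descriptor class is deliberately not named, since it has changed between Python versions; the isinstance check against types.DynamicClassAttribute is assumed to hold on the versions in common use:

```python
import types
from enum import Enum

class StdEnum(Enum):
    ONE = 1

# `name` and `value` are defined on the Enum base class, so look there.
name_descriptor = Enum.__dict__['name']
value_descriptor = Enum.__dict__['value']

is_dynamic = isinstance(name_descriptor, types.DynamicClassAttribute)

# Calling __get__ by hand gives the same result as StdEnum.ONE.name / .value.
explicit_name = name_descriptor.__get__(StdEnum.ONE, StdEnum)
explicit_value = value_descriptor.__get__(StdEnum.ONE, StdEnum)

print(is_dynamic, explicit_name, explicit_value)
```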

Thus, the entire sequence of calls can be represented by the following pseudo-code:

def get_descriptor(enum_member, attrname):
    # helper not defined in the original listing; a simplified stand-in for
    # the interpreter's search for the descriptor on the class and its bases
    for klass in type(enum_member).__mro__:
        if attrname in klass.__dict__:
            return klass.__dict__[attrname]
    raise AttributeError(attrname)

def get_func(enum_member, attrname):
    # this is in fact a __dict__ lookup, so a hash calculation and a hash table search take place here as well
    return getattr(enum_member, f'_{attrname}_')

def get_name_value(enum_member):
    name_descriptor = get_descriptor(enum_member, 'name')
    if enum_member is None:
        if name_descriptor.__isabstractmethod__:
            return name_descriptor
        raise AttributeError()
    elif name_descriptor.fget is None:
        raise AttributeError("unreadable attribute")

    return get_func(enum_member, 'name')

We wrote a simple script demonstrating the call sequence described above:

from enum import Enum

class StdEnum(Enum):
   def __init__(self, value, description):
       self.v = value
       self.description = description

   A = 1, 'One'
   B = 2, 'Two'

def get_name():
   return StdEnum.A.name

from pycallgraph import PyCallGraph
from pycallgraph.output import GraphvizOutput

graphviz = GraphvizOutput(output_file='stdenum.png')

with PyCallGraph(output=graphviz):
   v = get_name()

And after execution, the script gave us the following picture:

This shows that every time you access the name and value attributes of standard library enumeration members, a descriptor is called. That descriptor, in turn, ends up calling the def name(self) method of the standard library Enum class, which is decorated with the descriptor.

Compare with our FastEnum:

from fast_enum import FastEnum

class MyNewEnum(metaclass=FastEnum):
   A = 1
   B = 2

def get_name():
   return MyNewEnum.A.name

from pycallgraph import PyCallGraph
from pycallgraph.output import GraphvizOutput

graphviz = GraphvizOutput(output_file='fastenum.png')

with PyCallGraph(output=graphviz):
   v = get_name()

The result can be seen in the following image:

All of this really happens inside the standard enumeration implementation every time you access the name and value properties of its members. This is also the reason why our implementation is faster.

The enumeration implementation in the Python standard library makes many calls to objects that implement the data descriptor protocol. When we tried to use the standard enumerations in our projects, we immediately noticed how many data descriptor calls were made for name and value. And since enumerations were used quite extensively throughout the code, the resulting performance was low.

In addition, the standard Enum class contains several auxiliary “protected” attributes:

  • _member_names_ – a list containing the names of all enumeration members;
  • _member_map_ – an OrderedDict that maps the name of an enumeration member to the member itself;
  • _value2member_map_ – a dictionary containing the reverse mapping: from the values of enumeration members to the corresponding members.

A dictionary lookup is slow, since each call involves computing a hash function (unless, of course, the result is cached separately, which is not always possible for unmanaged code) and searching the hash table, which makes these dictionaries a suboptimal basis for enumerations. Even retrieving an enumeration member (as in StdEnum.MEMBER) is a dictionary lookup.
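The three auxiliary attributes can be inspected on any standard enumeration. The sketch below uses a made-up two-member class; since the attributes are internal, their exact types may differ between Python versions:

```python
from enum import Enum

class StdEnum(Enum):
    ONE = 1
    TWO = 2

# Leading-underscore attributes are internal to the enum module.
names = StdEnum._member_names_
by_name = StdEnum._member_map_['ONE']
by_value = StdEnum._value2member_map_[2]

print(names, by_name, by_value)
```

These are the dictionaries behind the public StdEnum['ONE'] and StdEnum(2) lookups.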

Our approach

We created our implementation of enumerations with an eye on the elegant enumerations in C and the beautiful, extensible enumerations in Java. The main features we wanted to implement were as follows:

  • an enumeration should be as static as possible; "static" here means that if something can be calculated only once, at declaration time, then it should be calculated at that (and only that) moment;
  • an enumeration cannot be inherited from (it must be a "final" class) if the inheriting class defines new enumeration members; the same holds for the standard library implementation, except that there inheritance is prohibited even if the inheriting class defines no new members;
  • an enumeration should leave ample room for extension (additional attributes, methods, and so on).
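The standard library side of the inheritance rule can be demonstrated with a short sketch. Shared, Color and ColorAlias are names invented for the example:

```python
from enum import Enum

class Shared(Enum):
    """No members here, so the standard library allows subclassing."""

    def describe(self):
        return f'{self.name} = {self.value}'

class Color(Shared):
    RED = 1

try:
    # Color already has members, so even an empty subclass is rejected.
    class ColorAlias(Color):
        pass
except TypeError:
    subclassing_allowed = False
else:
    subclassing_allowed = True

print(Color.RED.describe(), subclassing_allowed)
```

It is this last restriction that FastEnum relaxes, according to the list above: a subclass that adds no members is allowed.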

We use a dictionary lookup in only one case: the reverse mapping from a value to an enumeration member. All other calculations are performed only once, during class declaration (where metaclasses are used to hook into type creation).
Unlike the standard library, we treat only the first value after the = sign in the class declaration as the member value:
A = 1, 'One' – in the standard library, the whole tuple 1, 'One' is regarded as the value;
A: 'MyEnum' = 1, 'One' – in our implementation, only 1 is regarded as the value.
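The standard library half of this comparison is easy to verify. The sketch below covers only that half (the FastEnum behaviour is as stated in the text and is not executed here), and the attribute names number and description are illustrative:

```python
from enum import Enum

class StdEnum(Enum):
    def __init__(self, number, description):
        self.number = number
        self.description = description

    A = 1, 'One'

# The member value is the whole tuple, not just its first element.
print(StdEnum.A.value)
print(StdEnum.A.number, StdEnum.A.description)
```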

A further speedup is achieved by using __slots__ wherever possible. Instances of Python classes declared with __slots__ do not get a __dict__ attribute, which would otherwise hold the mapping of attribute names to their values (therefore, you cannot set any instance attribute that is not listed in __slots__). In addition, attribute values defined in __slots__ are accessed at a constant offset from the pointer to the object instance. This is high-speed attribute access because it avoids hash calculations and hash table scans.
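The missing __dict__ is easy to observe. A minimal sketch with two otherwise identical classes, both named here only for illustration:

```python
class WithDict:
    def __init__(self):
        self.value = 1

class WithSlots:
    __slots__ = ('value',)

    def __init__(self):
        self.value = 1

# Regular instances carry a per-instance dictionary; slotted ones do not.
has_dict = hasattr(WithDict(), '__dict__')
slots_has_dict = hasattr(WithSlots(), '__dict__')

print(has_dict, slots_has_dict)
```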

What are the extra features?

FastEnum is not compatible with any version of Python prior to 3.6, because it uses the type annotations introduced in Python 3.6 throughout. One might assume that installing the typing module from PyPI would help. The short answer is no. The implementation uses PEP 484 annotations for the arguments and return types of some functions and methods, so any version prior to Python 3.5 is not supported because of syntax incompatibility. On top of that, the very first line of code in the metaclass's __new__ uses PEP 526 syntax to annotate a variable, so Python 3.5 won't work either. You could port the implementation to older versions, although we at Qrator Labs tend to use type annotations whenever possible, as this helps a lot in developing complex projects. And after all, you don't want to be stuck on a Python older than 3.6: newer versions introduce no backward incompatibilities with your existing code (provided you are not using Python 2), while a lot of work has been done on the asyncio implementation compared to 3.5, which, in our opinion, is worth an immediate upgrade.

This, in turn, makes the special import of auto unnecessary, unlike in the standard library. You simply annotate the enumeration member as an instance of this enumeration without providing a value at all, and the value will be generated for you automatically. Although Python 3.6 is sufficient for working with FastEnum, keep in mind that preservation of key order in dictionaries was guaranteed only in Python 3.7 (and we did not use OrderedDict separately for the 3.6 case). We do not know of any examples where the order of automatically generated values matters, since we assume that a developer who leaves it to the runtime to generate and assign a value to an enumeration member does not care much about the value itself. However, if you still have not switched to Python 3.7, you have been warned.
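For comparison, this is what the standard library requires for automatically generated values. The sketch shows only the standard library form, with a made-up class:

```python
from enum import Enum, auto

class StdEnum(Enum):
    ONE = auto()
    TWO = auto()

# By default auto() numbers the members starting from 1.
print([member.value for member in StdEnum])
```

With FastEnum, the annotation-only form ONE: 'MyEnum' used later in this article plays the same role.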

Those who need their enumerations to start from 0 (zero) instead of the default value (1) can do so by declaring a special attribute, _ZERO_VALUED, in the enumeration; it will not be saved in the resulting class.

However, there are some restrictions: all the names of the members of the enumeration must be written in CAPITAL letters, otherwise they will not be processed by the metaclass.

Finally, you can declare a base class for your enumerations (keep in mind that the base class can itself use the metaclass, so you do not need to specify the metaclass for every subclass): just define the common logic (attributes and methods) in this class and do not define any enumeration members (so the class will not be "finalized"). Afterwards you can declare as many classes inheriting from it as you want, and the heirs will all share the same logic.

Aliases and how they can help

Suppose you have code using:

package_a.some_lib_enum.MyEnum

And that the MyEnum class is declared as follows:

class MyEnum(metaclass=FastEnum):
  ONE: 'MyEnum'
  TWO: 'MyEnum'

Now you have decided to do some refactoring and move the enumeration to another package. You create something like this:

package_b.some_lib_enum.MyMovedEnum

Where MyMovedEnum is declared like this:

class MyMovedEnum(MyEnum):
  pass

Now you are ready for the stage at which the enumeration at the old location is considered obsolete. You rewrite the imports and usages of this enumeration so that its new name (its alias) is used everywhere; you can be sure that all members of this alias enumeration are actually declared in the class with the old name. In your project documentation, you announce that MyEnum is deprecated and will be removed from the code in the future, for example in the next release. Suppose your code stores objects whose attributes contain enumeration members using pickle. At this point you use MyMovedEnum in your code, but internally all enumeration members are still instances of MyEnum. Your next step is to swap the declarations of MyEnum and MyMovedEnum, so that MyMovedEnum is no longer a subclass of MyEnum and declares all of its members itself; MyEnum, on the other hand, no longer declares any members and becomes just an alias (subclass) of MyMovedEnum.

That's all. When you restart your applications, at the unpickle stage all enumeration members will be re-created as instances of MyMovedEnum and become associated with this new class. Once you are sure that all your objects stored, for example, in a database have been deserialized again (and possibly serialized again and saved back to storage), you can ship a new release in which the MyEnum class, previously marked as deprecated, can be declared no longer needed and removed from the code base.

Try it for yourself: github.com/QratorLabs/fastenum, pypi.org/project/fast-enum.
Karma upvotes go to the author of FastEnum, santjagocorkez.
