12 examples of how to improve code with @dataclass

This translation of useful material was prepared for you as part of the course “Python Developer. Basic”.



We are adding clustering algorithms to Photonai using scikit-learn, Keras, and other packages. Using 12 examples, we will show how @dataclass improves Python code. To do this, we use code from the Photonai machine learning package.

Upgrade to Python 3.7 or later

The @dataclass decorator was added in Python 3.7. You can use Python 3.7 from a Docker image by adding the following commands to the file ~/.bashrc_profile or ~/bashrc.txt:

devdir="<path-to-projects>/photon/photonai/dockerSeasons/dev/"
testdir="<path-to-projects>/photon/photonai/dockerSeasons/test/"
echo $devdir
echo $testdir
export testdir
export devdir
#
alias updev="(cd $devdir; docker-compose up) &"
alias downdev="(cd $devdir; docker-compose down) &"
alias builddev="(cd $devdir; docker-compose build) &"
#
alias uptest="(cd $testdir; docker-compose up) & "
alias downtest="(cd $testdir; docker-compose down) &"
alias buildtest="(cd $testdir; docker-compose build) &"

If you cannot find the file ~/bashrc.txt, create it yourself with touch ~/bashrc.txt (on macOS or any flavor of Linux or Unix).

Note: Do not forget to source ~/.bashrc_profile or ~/bashrc.txt when you are done editing them.

Here you will find more details on the Docker implementation I am using.

Note: you can add Docker code to your project from a cloned repository on GitHub

Add type hints

Python is a dynamically typed language. Since Python 3.5, it has supported type hints (PEP 484). I emphasize that hints are exactly that, hints: they do not affect how the Python interpreter runs your code. As far as I know, the interpreter stores them but never enforces them.

Type hints (note: hints, not strict type checking) help you find bugs and security holes, and they let you check types statically after the first run and during unit testing.
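To make the "hints, not checks" point concrete, here is a minimal sketch. The function scale is a hypothetical example and is not part of Photonai. It shows that the interpreter accepts an argument that contradicts the hint, while the hints remain available to tools.

```python
from typing import get_type_hints


def scale(value: float, factor: float = 2.0) -> float:
    """Multiply value by factor; the hints only document intent."""
    return value * factor


# The interpreter does not reject a str here, although the hint says float.
# str * int is valid Python, so the call succeeds with a surprising result.
result = scale("ab", 2)
print(result)

# The hints are stored on the function, where tools can read them.
hints = get_type_hints(scale)
print(hints)
```

A static type checker such as mypy would flag the call scale("ab", 2); the interpreter itself does not.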

In Python 3.7, type hints are required for the fields in a class definition when you use the @dataclass decorator.

I add type hints to all the @dataclass examples given here. If you want to know more about them, I recommend reading:

  1. https://medium.com/swlh/future-proof-your-python-code-20ef2b75e9f5

  2. https://realpython.com/python-type-checking/

  3. https://docs.python.org/3/library/typing.html

The @dataclass decorator reduces boilerplate

@dataclass was added in Python 3.7. The main driving force was the desire to get rid of the boilerplate associated with declaring state in a class definition.

Classes can exist stateless with only methods, but what’s the point? Classes are needed to encapsulate state (data fields) and methods that work with data fields. If there is no state to be encapsulated, you can convert the methods to functions.

Note: If you don’t use pandas, you can speed up these functions by adding the @jit decorator from the numba package.

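@dataclass decorates the class definition and automatically generates the methods __init__(), __repr__(), and __eq__(). It does not write a separate __str__(): str() and print() fall back to the generated __repr__(). A __hash__() method is generated only under certain settings, such as unsafe_hash=True.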

Note: it generates other methods as well, but more on that later.

Note that all of these methods work directly with the encapsulated state of the class. @dataclass practically eliminates the repetitive boilerplate code required to define a basic class.
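One way to check which methods the decorator actually writes into a class is to look at the class namespace. This is a small sketch using a hypothetical class Point, not taken from Photonai, and the default decorator arguments.

```python
from dataclasses import dataclass


@dataclass
class Point:  # hypothetical class, for illustration only
    x: int = 0
    y: int = 0


# Which of these names were written into the class body by the decorator?
names = ("__init__", "__repr__", "__eq__", "__str__", "__hash__")
generated = {name: name in vars(Point) for name in names}
print(generated)

# __str__ is inherited from object, which delegates to the generated __repr__.
print(str(Point(1, 2)))  # Point(x=1, y=2)

# With the default arguments, __hash__ is present but set to None,
# which makes instances unhashable.
print(Point.__hash__)
```

With the defaults, __init__, __repr__, and __eq__ are generated, __str__ is not, and __hash__ is set to None.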

Here is an example of a short class in photon/photonai/base/hyperpipe.py, first as originally written and then decorated with @dataclass.

### Example #1

class Data:
    def __init__(self, X=None, y=None, kwargs=None):
        self.X = X
        self.y = y
        self.kwargs = kwargs

Example 1, after decoration =>

from dataclasses import dataclass
from typing import Dict
import numpy as np
@dataclass
class Data:
    X: np.ndarray = None  # The field declaration: X
    y: np.array = None    # The field declaration: y
    kwargs: Dict = None   # The field declaration: kwargs

Note: If the type is not part of the declaration, the attribute is not treated as a field. Use typing.Any as the type if it changes or is unknown at runtime.
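The effect of a missing annotation can be sketched as follows. Sample and its attribute label are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class Sample:  # hypothetical class, for illustration only
    X: Any = None        # annotated: becomes a field
    y: Any = None        # annotated: becomes a field
    label = "unnamed"    # no annotation: a plain class attribute, not a field


# Only the annotated names are registered as dataclass fields.
field_names = [f.name for f in fields(Sample)]
print(field_names)  # ['X', 'y']

s = Sample(1, 2)
print(s)        # Sample(X=1, y=2)
print(s.label)  # unnamed
```

The un-annotated attribute is still reachable on the instance, but it does not appear in the generated __init__(), __repr__(), or __eq__().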

Was the __eq__() method generated?

### Example #2

data1 = Data()
data2 = Data()
data1 == data2

Example 2, output =>

True

Yes! What about the methods __repr__() and __str__()?

### Example #3

print(data1)
data1

Example 3, output =>

Data(X=None, y=None, kwargs=None)
Data(X=None, y=None, kwargs=None)

Yes! And the methods __hash__() and __init__()?

### Example #4

@dataclass(unsafe_hash=True)
class Data:
    X: np.ndarray = None
    y: np.array = None
    kwargs: Dict = None
        
data3 = Data(1,2,3)
{data3:1}

Example 4, output =>

{Data(X=1, y=2, kwargs=3): 1}

Yes!

Note: The generated __init__() method still has the signature (X, y, kwargs). Also note that the Python 3.7 interpreter did not enforce the type hints: Data(1, 2, 3) was accepted without complaint.

Note: The decorator parameters init, repr, and eq default to True, while unsafe_hash defaults to False. There is no separate str parameter, because __str__() is not generated.
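The effect of these decorator parameters can be sketched with three hypothetical classes (Plain, NoEq, and Frozen are illustrative names, not Photonai code). The frozen parameter is used here only to show one setting under which a __hash__() is generated without unsafe_hash.

```python
from dataclasses import dataclass


@dataclass
class Plain:   # defaults: init=True, repr=True, eq=True, unsafe_hash=False
    value: int = 0


@dataclass(eq=False)
class NoEq:    # no generated __eq__: comparison falls back to identity
    value: int = 0


@dataclass(frozen=True)
class Frozen:  # eq=True together with frozen=True yields a generated __hash__
    value: int = 0


print(Plain(1) == Plain(1))  # True, fields are compared
print(NoEq(1) == NoEq(1))    # False, two distinct objects

# With the defaults, instances are unhashable.
try:
    hash(Plain(1))
    plain_hashable = True
except TypeError:
    plain_hashable = False
print(plain_hashable)  # False

# Frozen instances with equal fields hash equally.
print(hash(Frozen(1)) == hash(Frozen(1)))  # True
```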

You can use inspect on a dataclass instance just as on any other instance.

### Example #5

from inspect import signature
print(signature(data3.__init__))

Example 5, output =>

(X: numpy.ndarray = None, y: <built-in function array> = None, 
kwargs: Dict = None) -> None

Cool!

A longer example from photon/photonai/base/hyperpipe.py:

### Example #6

class CrossValidation:

    def __init__(self, inner_cv, outer_cv,
                 eval_final_performance, test_size,
                 calculate_metrics_per_fold,
                 calculate_metrics_across_folds):
        self.inner_cv = inner_cv
        self.outer_cv = outer_cv
        self.eval_final_performance = eval_final_performance
        self.test_size = test_size
        self.calculate_metrics_per_fold = calculate_metrics_per_fold
        self.calculate_metrics_across_folds = calculate_metrics_across_folds

        self.outer_folds = None
        self.inner_folds = dict()

Example 6, after decoration =>

from dataclasses import dataclass
@dataclass
class CrossValidation:
    inner_cv: int
    outer_cv: int
    eval_final_performance: bool = True
    test_size: float = 0.2
    calculate_metrics_per_fold: bool = True
    calculate_metrics_across_folds: bool = False

Note (Example #6): As in any signature, fields with default values (keyword arguments) must be declared last.

Note (Example #6): The readability of class CrossValidation has increased substantially through @dataclass and type hinting.

### Example #7
cv1 = CrossValidation()

Example 7, output =>

TypeError: __init__() missing 2 required positional arguments: 'inner_cv' and 'outer_cv'

Note (Example #7): inner_cv and outer_cv are positional arguments. As with any signature, you cannot declare a non-default field after a default one. (Hint: if this were allowed, inheritance from a parent class would break.) (Why? Google interview question #666.)
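The ordering rule can be checked directly. In this sketch, BadOrder is a hypothetical class that deliberately declares a field without a default after one with a default. The error is raised when the decorator runs, not when an instance is created.

```python
from dataclasses import dataclass

error_message = None
try:
    @dataclass
    class BadOrder:  # hypothetical class, deliberately wrong
        test_size: float = 0.2
        inner_cv: int  # non-default field after a default one
except TypeError as exc:
    # The decorator refuses to build an invalid __init__ signature.
    error_message = str(exc)

print(error_message)
```

The exact wording of the message may differ between Python versions, but it names the offending field.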
### Example #8
cv1 = CrossValidation(1,2)
cv2 = CrossValidation(1,2)
cv3 = CrossValidation(3,2,test_size=0.5)
print(cv1)
cv3

Example 8, output =>

CrossValidation(inner_cv=1, outer_cv=2, eval_final_performance=True, test_size=0.2, calculate_metrics_per_fold=True, calculate_metrics_across_folds=False)
CrossValidation(inner_cv=3, outer_cv=2, eval_final_performance=True, test_size=0.5, calculate_metrics_per_fold=True, calculate_metrics_across_folds=False)

### Example #9
cv1 == cv2

Example 9, output =>

True

### Example #10

cv1 == cv3

Example 10, output =>

False

### Example #11
from inspect import signature
print(signature(cv3.__init__))
cv3

Example 11, output =>

(inner_cv: int, outer_cv: int, eval_final_performance: bool = True, test_size: float = 0.2, calculate_metrics_per_fold: bool = True, calculate_metrics_across_folds: bool = False) -> None
CrossValidation(inner_cv=3, outer_cv=2, eval_final_performance=True, test_size=0.5, calculate_metrics_per_fold=True, calculate_metrics_across_folds=False)

Note (Example #11): inspect.signature shows the signature of the generated __init__() method, while the default __str__ (which falls back to __repr__) shows the instance state variables and their values.

Very cool!

Oops, what about:

self.outer_folds = None
self.inner_folds = dict()

These are state variables, but they are not part of the __init__() signature, so the generated __init__() does not create them. Do not worry, @dataclass can cope with this. I will show how in the next section.

Post-initialization processing

@dataclass supports a method called __post_init__(). If the class defines it, the __init__() generated by @dataclass calls it as its last step. It enables further processing of state after the fields from the signature have been set.

We complete the transformation by setting the remaining state of CrossValidation:

### Example #12
from dataclasses import dataclass
@dataclass
class CrossValidation:
    inner_cv: int
    outer_cv: int
    eval_final_performance: bool = True
    test_size: float = 0.2
    calculate_metrics_per_fold: bool = True
    calculate_metrics_across_folds: bool = False
    def __post_init__(self):
        self.outer_folds = None
        self.inner_folds = dict()
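Attributes assigned in __post_init__() without a declaration are not dataclass fields, so they do not appear in the generated __repr__() or __eq__(). If that matters, one alternative is to declare them with field(init=False). The following is a sketch under that assumption; CrossValidationSketch is a hypothetical, shortened variant and not the Photonai class.

```python
from dataclasses import dataclass, field
from inspect import signature
from typing import Any, Dict, Optional


@dataclass
class CrossValidationSketch:  # hypothetical, shortened variant
    inner_cv: int
    outer_cv: int
    test_size: float = 0.2
    # init=False keeps these out of the __init__ signature,
    # yet they remain real fields for __repr__ and __eq__.
    outer_folds: Optional[Any] = field(default=None, init=False)
    # default_factory gives every instance its own dict.
    inner_folds: Dict[str, Any] = field(default_factory=dict, init=False)


params = list(signature(CrossValidationSketch).parameters)
print(params)  # ['inner_cv', 'outer_cv', 'test_size']

cv_a = CrossValidationSketch(1, 2)
cv_b = CrossValidationSketch(1, 2)
cv_a.inner_folds["fold_0"] = [0, 1, 2]

print(cv_a)
print(cv_b.inner_folds)  # {} because the dict is not shared
print(cv_a == cv_b)      # False, inner_folds takes part in the comparison
```

Whether the fold attributes should take part in equality is a design choice. The __post_init__() version keeps them out of comparisons, while this variant includes them.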

Sources

You will find great use cases for the @dataclass decorator here:

  1. https://realpython.com/python-data-classes/

  2. https://blog.usejournal.com/new-buzzword-in-python-is-here-dataclasses-843dd1d372a5

Conclusion

Using 12 before-and-after examples, I showed how @dataclass transforms classes in the Photonai machine learning package. We saw how @dataclass reduced boilerplate and improved code readability.

Improving readability makes it easier for everyone to understand the code in production. You understand the results better, you test better, make fewer mistakes and spend less on maintenance.

Adding @dataclass and type hints demonstrates that Python continues to grow and evolve.

Note: You can add the updated Photonai code to your project from the cloned repository on GitHub.

I have not shown all the capabilities of @dataclass. Since we are only adding clustering, I will continue to document the changes in Photonai.

