12 examples of how to improve code with @dataclass
As part of the course “Python Developer. Basic”, we have prepared a translation of useful material for you.
We are adding clustering algorithms to Photonai using scikit-learn, Keras and other packages. Using 12 examples, we will show how @dataclass improves Python code. To do this, we use code from the Photonai package for Machine Learning.
Upgrade to Python 3.7 or later
The @dataclass decorator was added in Python 3.7. You can use Python 3.7 from a Docker image by adding the following commands to the file ~/.bashrc_profile or ~/bashrc.txt:
devdir="<path-to-projects>/photon/photonai/dockerSeasons/dev/"
testdir="<path-to-projects>/photon/photonai/dockerSeasons/test/"
echo $devdir
echo $testdir
export testdir
export devdir
#
alias updev="(cd $devdir; docker-compose up) &"
alias downdev="(cd $devdir; docker-compose down) &"
alias builddev="(cd $devdir; docker-compose build) &"
#
alias uptest="(cd $testdir; docker-compose up) & "
alias downtest="(cd $testdir; docker-compose down) &"
alias buildtest="(cd $testdir; docker-compose build) &"
If you cannot find the file ~/bashrc.txt, create it yourself with touch ~/bashrc.txt (in the case of macOS or one of the flavors of Linux or Unix operating systems).
Note: Do not forget to source ~/.bashrc_profile or ~/bashrc.txt when you are done editing them.
Here you will find more details on the Docker implementation I am using.
Note: You can add the Docker code to your project from a cloned repository on GitHub.
Add type hints
Python is a dynamically typed language. Since version 3.5, Python has supported type hints (PEP 484). I emphasize that hints are exactly that, hints: they do not affect how the Python interpreter runs the code. The interpreter stores them but never enforces them.
Type hints (note: not strict type checking) allow IDEs and other tools to find bugs and security holes, and allow static type checkers to check types before the code runs and during unit testing.
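To make the point about hints versus checking concrete, here is a minimal sketch. The function mean is a hypothetical helper invented for illustration, not part of Photonai. It shows that the interpreter accepts an argument that contradicts the hint and that the hints remain available for tools to inspect.

```python
from typing import List


def mean(values: List[float]) -> float:
    """Return the arithmetic mean; the hints only document intent."""
    return sum(values) / len(values)


# The interpreter does not enforce the hints: a tuple of ints is
# accepted even though the signature asks for List[float].
result = mean((1, 2, 3))
print(result)

# The hints are stored on the function, where IDEs and static
# checkers can read them.
print(mean.__annotations__)
```

A static checker run over this file would be expected to flag the call, while the interpreter itself runs it without complaint.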
In Python 3.7, type hints are required for the fields in a class definition when using the @dataclass decorator.
I add type hints to all of the @dataclass examples given here. If you want to know more about them, I recommend reading:
The @dataclass decorator reduces boilerplate
@dataclass was added in Python 3.7. The main driving force was the desire to get rid of the boilerplate associated with the state in a class definition.
Classes can exist without state, with only methods, but what is the point? Classes are needed to encapsulate state (data fields) and the methods that work with those data fields. If there is no state to encapsulate, you can convert the methods to functions.
Note: If you don’t use pandas, you can speed up these functions by quickly adding the @jit decorator from the numba package.
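@dataclass decorates a class definition and automatically provides five methods: __init__(), __repr__(), __str__(), __eq__(), and __hash__(). Strictly speaking, __str__() is not generated separately but falls back to the generated __repr__(), and __hash__() is generated only on request, for example with unsafe_hash=True.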
Note: it generates other methods as well, but more on that later.
Note that all five of these methods work directly with the encapsulated state of the class. @dataclass almost completely removes the repetitive boilerplate code required to define a basic class.
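As one illustration of the other methods mentioned in the note above, the sketch below uses the order=True argument of @dataclass, which adds the comparison methods. The class Version is hypothetical and is not taken from Photonai.

```python
from dataclasses import dataclass


@dataclass(order=True)
class Version:  # hypothetical class, for illustration only
    major: int
    minor: int = 0


# order=True adds __lt__, __le__, __gt__ and __ge__, which compare
# instances field by field, as if they were tuples.
older = Version(1, 2)
newer = Version(1, 10)
print(older < newer)
print(sorted([newer, older]))
```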
An example of a short class in photon/photonai/base/hyperpipe.py, to be decorated with @dataclass:
### Example #1
class Data:
    def __init__(self, X=None, y=None, kwargs=None):
        self.X = X
        self.y = y
        self.kwargs = kwargs
Example 1, after decoration =>
from dataclasses import dataclass
from typing import Dict
import numpy as np

@dataclass
class Data:
    X: np.ndarray = None  # The field declaration: X
    y: np.array = None    # The field declaration: y (np.array is a function; np.ndarray is the actual type)
    kwargs: Dict = None   # The field declaration: kwargs
Note: If the type is not part of the declaration, the field is ignored. Use the type Any (from the typing module) as a substitute if the type varies or is unknown.
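The note above can be checked with a small sketch. The class Sample and its attribute names are hypothetical. The fields() helper from the dataclasses module lists what @dataclass actually treats as a field.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class Sample:  # hypothetical class, for illustration only
    payload: Any = None  # annotated: becomes a dataclass field
    counter = 0          # not annotated: a plain class attribute, not a field


field_names = [f.name for f in fields(Sample)]
print(field_names)

# counter is not a parameter of the generated __init__(), so passing
# it as a keyword argument is rejected.
try:
    Sample(counter=1)
    counter_accepted = True
except TypeError:
    counter_accepted = False
print(counter_accepted)
```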
Was the code for __eq__() generated?
### Example #2
data1 = Data()
data2 = Data()
data1 == data2
Example 2, output =>
True
Yes! What about the methods __repr__() and __str__()?
### Example #3
print(data1)
data1
Example 3, output =>
Data(X=None, y=None, kwargs=None)
Data(X=None, y=None, kwargs=None)
Yes! And the methods __hash__() and __init__()?
### Example #4
@dataclass(unsafe_hash=True)
class Data:
    X: np.ndarray = None
    y: np.array = None
    kwargs: Dict = None

data3 = Data(1,2,3)
{data3:1}
Example 4, output =>
{Data(X=1, y=2, kwargs=3): 1}
Yes!
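Note: The generated __init__() method still has the signature (X, y, kwargs). Also note that the Python 3.7 interpreter ignored the type hints: Data(1, 2, 3) was accepted even though none of the values matches its declared type.
Note: The init, repr, and eq keyword arguments of @dataclass default to True, while unsafe_hash defaults to False. There is no separate keyword argument for __str__().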
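The effect of the unsafe_hash default can be seen in a short sketch. The classes Plain and Hashable are hypothetical. With the defaults (eq=True, unsafe_hash=False), instances cannot be hashed, which is why Example 4 needed unsafe_hash=True before an instance could serve as a dictionary key.

```python
from dataclasses import dataclass


@dataclass
class Plain:  # hypothetical: all @dataclass defaults
    x: int = 0


@dataclass(unsafe_hash=True)
class Hashable:  # hypothetical: __hash__() requested explicitly
    x: int = 0


# With eq=True and unsafe_hash=False, __hash__ is set to None,
# so hashing an instance raises TypeError.
try:
    hash(Plain())
    plain_is_hashable = True
except TypeError:
    plain_is_hashable = False
print(plain_is_hashable)

# With unsafe_hash=True the hash is derived from the field values,
# so equal instances hash equally.
print(hash(Hashable(1)) == hash(Hashable(1)))
```

The name unsafe_hash is a warning: if a field changes after the instance has been used as a key, the dictionary lookup can silently fail.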
You can use inspect on a dataclass instance the same as on any other instance.
### Example #5
from inspect import signature
print(signature(data3.__init__))
Example 5, output =>
(X: numpy.ndarray = None, y: <built-in function array> = None,
kwargs: Dict = None) -> None
Cool!
Longer example from photon/photonai/base/hyperpipe.py
### Example #6
class CrossValidation:
    def __init__(self, inner_cv, outer_cv,
                 eval_final_performance, test_size,
                 calculate_metrics_per_fold,
                 calculate_metrics_across_folds):
        self.inner_cv = inner_cv
        self.outer_cv = outer_cv
        self.eval_final_performance = eval_final_performance
        self.test_size = test_size
        self.calculate_metrics_per_fold = calculate_metrics_per_fold
        self.calculate_metrics_across_folds = calculate_metrics_across_folds
        self.outer_folds = None
        self.inner_folds = dict()
Example 6, after decoration =>
from dataclasses import dataclass

@dataclass
class CrossValidation:
    inner_cv: int
    outer_cv: int
    eval_final_performance: bool = True
    test_size: float = 0.2
    calculate_metrics_per_fold: bool = True
    calculate_metrics_across_folds: bool = False
Note (Example #6): As in any signature, fields with default values (keyword arguments) must be declared last.
Note (Example #6): The readability of class CrossValidation has increased substantially through the use of @dataclass and type hints.
### Example #7
cv1 = CrossValidation()
Example 7, output =>
TypeError: __init__() missing 2 required positional arguments: 'inner_cv' and 'outer_cv'
Note (Example #7): inner_cv and outer_cv are positional arguments. As with any signature, you cannot declare a non-default field after a default one. (Hint: If this were allowed, inheritance from a parent class would break.) (Why? Google interview question #666.)
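The ordering rule in the note above is enforced when the class is defined, not when it is instantiated. The sketch below, with a hypothetical class BadOrder, wraps a wrongly ordered definition in try/except to capture the error raised by @dataclass.

```python
from dataclasses import dataclass

# The decorator raises TypeError while the class is being created,
# because a field without a default follows one with a default.
try:
    @dataclass
    class BadOrder:  # hypothetical class, for illustration only
        test_size: float = 0.2
        inner_cv: int  # no default after a field with a default
except TypeError as err:
    error_message = str(err)
else:
    error_message = None

print(error_message)
```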
### Example #8
cv1 = CrossValidation(1,2)
cv2 = CrossValidation(1,2)
cv3 = CrossValidation(3,2,test_size=0.5)
print(cv1)
cv3
Example 8, output =>
CrossValidation(inner_cv=1, outer_cv=2, eval_final_performance=True, test_size=0.2, calculate_metrics_per_fold=True, calculate_metrics_across_folds=False)
CrossValidation(inner_cv=3, outer_cv=2, eval_final_performance=True, test_size=0.5, calculate_metrics_per_fold=True, calculate_metrics_across_folds=False)
### Example #9
cv1 == cv2
Example 9, output =>
True
### Example #10
cv1 == cv3
Example 10, output =>
False
### Example #11
from inspect import signature
print(signature(cv3.__init__))
cv3
Example 11, output =>
(inner_cv: int, outer_cv: int, eval_final_performance: bool = True, test_size: float = 0.2, calculate_metrics_per_fold: bool = True, calculate_metrics_across_folds: bool = False) -> None
CrossValidation(inner_cv=3, outer_cv=2, eval_final_performance=True, test_size=0.5, calculate_metrics_per_fold=True, calculate_metrics_across_folds=False)
Note (Example #11): inspect.signature shows the signature of the generated __init__() method, while the default __str__() shows the instance state variables and their values.
Very cool!
Oops, what about:
self.outer_folds = None
self.inner_folds = dict()
These are state variables too, but they are not part of the signature, so the generated __init__() does not create them. Do not worry, @dataclass can cope with this. I will show how in the next section.
Post-initialization processing
There is a special method, __post_init__(), which is part of the @dataclass definition. The __post_init__() method runs after the __init__() generated by @dataclass. It enables further processing once the state from the signature has been set.
We complete the transformation of CrossValidation by setting the remaining state:
### Example #12
from dataclasses import dataclass

@dataclass
class CrossValidation:
    inner_cv: int
    outer_cv: int
    eval_final_performance: bool = True
    test_size: float = 0.2
    calculate_metrics_per_fold: bool = True
    calculate_metrics_across_folds: bool = False

    def __post_init__(self):
        self.outer_folds = None
        self.inner_folds = dict()
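Attributes assigned in __post_init__() are ordinary instance attributes rather than dataclass fields, so they do not appear in the generated __repr__() or take part in __eq__(). If that matters, one possible alternative (not the approach used in Photonai, as far as this article shows) is to declare them with field(init=False). The sketch below uses a shortened, hypothetical class named CrossValidationSketch.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict


@dataclass
class CrossValidationSketch:  # hypothetical, shortened for illustration
    inner_cv: int
    outer_cv: int
    test_size: float = 0.2
    # init=False keeps these two out of the __init__() signature,
    # but they are still fields, so __repr__() and __eq__() see them.
    outer_folds: Any = field(init=False, default=None)
    # default_factory gives every instance its own fresh dict.
    inner_folds: Dict = field(init=False, default_factory=dict)


cv = CrossValidationSketch(1, 2)
print(cv)
print([f.name for f in fields(cv)])
```

Which variant fits better depends on whether the derived state should count toward equality and show up when an instance is printed.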
Sources
You will find great use cases for the @dataclass decorator here:
Conclusion
Using 12 before-and-after examples, I showed how @dataclass transforms classes in the Photonai Machine Learning package. We saw how @dataclass reduced boilerplate and improved code readability.
Improving readability makes it easier for everyone to understand the code in production. You understand the results better, you test better, make fewer mistakes and spend less on maintenance.
The addition of @dataclass and type hints demonstrates that Python continues to grow and evolve.
Note: You can add the updated Photonai code to your project from a cloned repository on GitHub.
I have not shown all the capabilities of @dataclass. As we continue adding clustering, I will keep documenting the changes in Photonai.