Building Scalable and Clean Applications in Python (Tutorial)
When it comes to building scalable and maintainable applications, understanding important concepts such as clean code principles, architectural patterns, and SOLID design practices is critical. By learning these principles, beginners will gain an understanding of how to build robust, flexible, and easily testable applications, allowing them to keep their code base clear and maintainable as their projects grow.
A bit of clean code theory
Before diving into the architecture, I'd like to answer a few frequently asked questions:
What are the benefits of specifying types in Python?
What are the reasons for dividing an application into layers?
What are the benefits of using OOP?
What are the disadvantages of using global variables or singletons?
Feel free to skip the theory sections if you already know the answers and go directly to the “Creating a Program” section.
Always specify types
Type annotation significantly improves the code, increasing its clarity, reliability, and maintainability:
Type safety: Type annotations help identify type mismatches early on, which reduces errors and ensures that your code behaves as expected.
Self-documenting code: Type hints improve code readability and act as inline documentation by clarifying the expected types of function inputs and outputs.
Improving code quality: The use of type hints improves design and architecture by promoting thoughtful planning and implementation of data structures and interfaces.
Improved tool support: Tools such as mypy use type annotations for static type checking, catching potential errors before the code runs, thereby simplifying development and testing.
Support for modern libraries: FastAPI, Pydantic and other libraries use type annotations to automate data validation, generate documentation, and reduce code duplication.
Advantages of typed data classes over simple data structures: Typed data classes improve readability, structured data, and type safety over arrays and tuples. They use attributes instead of string keys, which minimizes errors due to typos and improves code completion. Dataclasses also provide clear definition of data structures, support default values, and simplify code maintenance and debugging.
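To make the dataclass point concrete, here is a small sketch (the User fields are hypothetical) comparing untyped structures with a typed dataclass:

```python
from dataclasses import dataclass

# Untyped: position and string keys are easy to get wrong,
# and a typo in a key fails only at runtime
user_tuple = (1, "John", "john@example.com")
user_dict = {"id": 1, "name": "John", "email": "john@example.com"}

# Typed dataclass: named, typed attributes with a default value
@dataclass
class User:
    id: int
    name: str
    email: str
    is_active: bool = True

user = User(id=1, name="John", email="john@example.com")
print(user.name)       # attribute access instead of string keys
print(user.is_active)  # default value filled in automatically
```

Editors and type checkers can now autocomplete attribute names and flag `user.nmae` as an error, which is impossible with the dict version.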
Why do we need to divide the application into layers
Separating an application into layers improves maintainability, scalability, and flexibility. Key reasons for this strategy include:
Separation of concerns
Each layer focuses on a specific aspect, making it easier to develop, debug, and maintain.
Reusability
A layer written against clear interfaces (for example, a data-access layer) can be reused by several services or applications.
Scalability
Individual layers can be scaled or optimized independently as load grows.
Ease of maintenance
Changes stay localized: a fix in one layer rarely ripples through the others.
Improved collaboration
Teams can work on different layers in parallel, agreeing only on the interfaces between them.
Flexibility and adaptability
Changes in technology or design can be implemented in specific layers. Only the affected layers need adaptation; the rest remain untouched.
Testability
Each layer can be tested in isolation by substituting its neighbors with mocks or stubs.
Using a layered architecture provides significant benefits in development speed, operational management, and long-term maintenance, making systems more reliable, manageable, and adaptable to change.
Global constants vs. injected parameters
When developing software, the choice between using global constants and using dependency injection (DI) can have a significant impact on the flexibility, maintainability, and scalability of applications. This analysis examines the disadvantages of global constants and contrasts them with the advantages provided by dependency injection.
Global Constants
Fixed configuration: Global constants are static and cannot dynamically adapt to different environments or requirements without changing the code base. This rigidity limits their use in various operating scenarios.
Limited scope of testing: Testing becomes difficult when using global constants because they are not easily overridden. Developers may need to change global state or use complex workarounds to accommodate different test scenarios, increasing the risk of bugs.
Reducing modularity: Relying on global constants reduces modularity because components become dependent on specific values set globally. This dependency reduces the ability to reuse components across projects or contexts.
Tight coupling: Global constants bake specific behavior and configuration directly into the codebase, making it difficult to adapt or evolve the application without significant changes.
Hidden dependencies: Like global variables, global constants hide dependencies within an application. It becomes unclear which parts of the system depend on these constants, making the code difficult to understand and maintain.
Difficulties of maintenance and refactoring: Over time, using global constants can lead to maintenance problems. Refactoring such a codebase is risky because changes to constants may accidentally affect different parts of the application.
Duplicated state at the module level: In Python, module-level code can be executed more than once if the same module is imported via different paths (for example, via absolute and relative imports). This can produce duplicate "global" instances and hard-to-track bugs.
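This pitfall is easy to reproduce. The sketch below (all file and module names are hypothetical) imports the same module file under two different names and shows that its module-level state gets duplicated:

```python
import os
import sys
import tempfile

# Create a tiny package with module-level state in a temp directory
tmp = tempfile.mkdtemp()
pkg_dir = os.path.join(tmp, "pkg")
os.makedirs(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "state.py"), "w") as f:
    f.write("instances = []\n")

sys.path.insert(0, tmp)      # allows: import pkg.state
sys.path.insert(0, pkg_dir)  # allows: import state  (same file, different name)

import pkg.state
import state

# The same file on disk was executed twice, producing two distinct
# module objects with independent copies of the "global" list
print(pkg.state is state)      # False
pkg.state.instances.append("a")
print(state.instances)         # [] – the supposed singleton was duplicated
```

A "singleton" kept at module level silently stops being a singleton the moment the module is reachable under two import paths.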
Injected parameters
Dynamic flexibility and customizability: Dependency injection allows you to dynamically configure components, making applications adaptable to changing conditions without the need to change code.
Improved testability: DI improves testability by allowing mocks or alternative configurations to be introduced during testing, effectively isolating components from external dependencies and providing more reliable test results.
Increased modularity and reusability: Components become more modular and reusable since they are designed to work with any injected parameters corresponding to the expected interfaces. This separation of concerns increases the portability of components across different parts of the application or even across different projects.
Loose coupling: Injected parameters promote loose coupling by decoupling the system's logic from its configuration. This approach makes the application easier to update and change.
Explicitly declaring dependencies: In DI, components explicitly declare their dependencies, usually through constructor parameters or setters. This clarity makes the system easier to understand, maintain, and expand.
Scalability and complexity management: As applications grow, DI helps manage complexity by isolating problems and decoupling configuration from usage, enabling efficient scaling and maintenance of large systems.
Procedural programming vs OOP
Using object-oriented programming (OOP) and dependency injection (DI) can significantly improve the quality and maintainability of code compared to a procedural approach with global variables and functions. Here's a simple comparison demonstrating these benefits:
Procedural Approach: Global Variables and Functions
# Global configuration
database_config = {
    'host': 'localhost',
    'port': 3306,
    'user': 'user',
    'password': 'pass'
}

def connect_to_database():
    print(f"Connecting to database on {database_config['host']}...")
    # Assume connection is made
    return "database_connection"

def fetch_user(database_connection, user_id):
    print(f"Fetching user {user_id} using {database_connection}")
    # Fetch user logic
    return {'id': user_id, 'name': 'John Doe'}

# Usage
db_connection = connect_to_database()
user = fetch_user(db_connection, 1)
Code duplication: database_config must be passed around or accessed globally across multiple functions.
Testing difficulties: Simulating a database connection or configuration involves manipulating global state, which is error-prone.
Tight coupling: Functions directly depend on global state and specific implementations.
OOP + DI approach
from typing import Dict
from abc import ABC, abstractmethod

class DatabaseConnection(ABC):
    @abstractmethod
    def connect(self):
        pass

    @abstractmethod
    def fetch_user(self, user_id: int) -> Dict:
        pass

class MySQLConnection(DatabaseConnection):
    def __init__(self, config: Dict[str, str]):
        self.config = config

    def connect(self):
        print(f"Connecting to MySQL database on {self.config['host']}...")
        # Assume connection is made

    def fetch_user(self, user_id: int) -> Dict:
        print(f"Fetching user {user_id} from MySQL")
        return {'id': user_id, 'name': 'John Doe'}

class UserService:
    def __init__(self, db_connection: DatabaseConnection):
        self.db_connection = db_connection

    def get_user(self, user_id: int) -> Dict:
        return self.db_connection.fetch_user(user_id)

# Configuration and DI
config = {
    'host': 'localhost',
    'port': 3306,
    'user': 'user',
    'password': 'pass'
}
db = MySQLConnection(config)
db.connect()
user_service = UserService(db)
user = user_service.get_user(1)
Reduced code duplication: The database configuration is encapsulated in a connection object.
DI capabilities: MySQLConnection is easy to replace with another database connection class, for example PostgresConnection, without changing the UserService code.
Encapsulation and abstraction: The details of how users are retrieved or how the database is connected are hidden from view.
Convenient mocking and testing: UserService can easily be tested by injecting a stub DatabaseConnection.
Object lifetime management: The lifecycle of database connections can be managed more precisely (for example, using context managers).
Use of OOP principles: Demonstrates inheritance (abstract base class), polymorphism (implementation of abstract methods), and protocols (the interface defined by DatabaseConnection).
By structuring an application using OOP and DI, the code becomes more modular, easier to test, and flexible to changes such as replacing dependencies or changing configuration.
Creating a program
All examples and more detailed information with comments can be found in the repository.
Start of a new project
A small checklist:
1. Manage Projects and Dependencies with Poetry
poetry new python-app-architecture-demo
This command creates a minimal directory structure: separate folders for the application and tests, the project metadata file pyproject.toml, local dependency files, and git configuration.
2. Version Control with Git
Initialize git:
git init
Add a .gitignore file to exclude unnecessary files from your repository. Use the standard Python .gitignore provided by GitHub and add the remaining exclusions, such as .DS_Store for macOS and editor folders (.idea, .vscode, .zed, etc.):
wget -O .gitignore https://raw.githubusercontent.com/github/gitignore/main/Python.gitignore
echo .DS_Store >> .gitignore
3. Dependency management
Install your project's dependencies using poetry:
poetry add fastapi pytest aiogram
You can install all dependencies later using:
poetry install
Consult each library's official documentation if you need more specific instructions.
4. Configuration files
Creating a config.py file to centralize application settings is a common and effective approach.
Set environment variables for secrets and settings:
touch .env example.env
.env contains sensitive data and should be git-ignored, while example.env contains placeholder or default values and is committed to the repository.
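A minimal sketch of what config.py might look like, assuming hypothetical variable names and that the .env file is loaded into the environment by your process manager or a tool such as python-dotenv:

```python
# config.py — centralizes application settings read from the environment
import os

# Secrets come from .env (git-ignored); example.env documents the
# expected keys with placeholder values.
telegram_token: str = os.getenv("TELEGRAM_TOKEN", "")
mail_token: str = os.getenv("MAIL_TOKEN", "")
database_url: str = os.getenv("DATABASE_URL", "sqlite:///app.db")
```

Other modules then simply `import config` and read `config.mail_token`, as the later examples in this article do.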
5. Application entry point
Define your application's entry point in main.py:
python_app_architecture/main.py:
def run():
print('Hello, World!')
if __name__ == '__main__': # avoid run on import
run()
Make your project usable as a library and allow programmatic access by importing the run function in __init__.py:
python_app_architecture/__init__.py:
from .main import run
Enable direct project execution with Poetry by adding a shortcut in __main__.py. This lets you use the command poetry run python python_app_architecture instead of the longer poetry run python python_app_architecture/main.py.
python_app_architecture/__main__.py:
from .main import run
run()
Defining Directories and Layers
Disclaimer:
Of course, every application is different, and its architecture will vary depending on its goals and constraints. I'm not saying this is the only correct option, but it seems to me a fairly typical one, suitable for a large share of projects. Try to focus on the basic approaches and ideas rather than the specific examples.
Now let's set up directories for the different layers of the application.
It generally makes sense to version the API (for example, by creating subdirectories like api/v1
), but we will keep things simple for now and skip this step.
.
├── python_app_architecture_demo
│ ├── coordinator.py
│ ├── entities
│ ├── general
│ ├── mappers
│ ├── providers
│ ├── repository
│ │ └── models
│ └── services
│ ├── api_service
│ │ └── api
│ │ ├── dependencies
│ │ ├── endpoints
│ │ └── schemas
│ └── telegram_service
└── tests
app
entities – data structures of the entire application. Purely data carriers without logic.
general – the toolbox. A folder for common utilities, helpers, and library wrappers.
mappers – specialists in transforming data, such as database models into entities, or between different data formats. It is good practice to encapsulate mappers within their scope, rather than keeping them global. For example, the models-entities mapper can be part of a repository module. Another example: the schemas-entities mapper should remain inside the API service and be its private tool.
providers – the basis of business logic. Providers implement core application logic but remain independent of interface details, keeping their operations abstract and isolated.
repositories – librarians. Guardians of data access, abstracting the complexities of interaction with the database.
services – each service acts as an (almost) autonomous sub-application, organizing its own specific area of business logic and delegating the core tasks to providers. This setup keeps the logic centralized and consistent throughout the application.
api_service – manages external communications via http/s, structured around the FastAPI framework.
dependencies – the core tools and helpers needed by the various parts of your API, integrated via FastAPI's DI system
endpoints – HTTP interface endpoints
schemas – definitions of the data structures for API requests and responses
telegram_service – works similarly to the API service, providing the same functionality through a different interface, but without duplicating business logic: it calls the same providers that the API service uses.
tests – the directory is intended exclusively for testing and contains all test code, maintaining a clear separation from the application logic.
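To make the mappers idea concrete, here is a small self-contained sketch (UserModel, User, and their fields are hypothetical) of a repository-private mapper that converts a database model into an entity:

```python
from dataclasses import dataclass

# repository/models/user.py — persistence-layer model
@dataclass
class UserModel:
    id: int
    name: str
    email: str
    password_hash: str  # persistence detail that must not leak upward

# entities/user.py — plain data carrier shared across layers
@dataclass
class User:
    id: int
    name: str
    email: str

# mapper — kept as a private tool of the repository module
def user_model_to_entity(model: UserModel) -> User:
    return User(id=model.id, name=model.name, email=model.email)

model = UserModel(id=1, name="John", email="john@example.com", password_hash="x")
entity = user_model_to_entity(model)
print(entity)  # User(id=1, name='John', email='john@example.com')
```

Note how the mapper deliberately drops password_hash: upper layers receive only the fields they are supposed to see.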
The connection between layers will look something like this:
note that entities are not active components, but only data structures that are passed between layers:
Remember that the layers are not related directly; they depend only on abstractions, and implementations are passed in using dependency injection:
Such a flexible structure makes it easy to add functionality (for example, change the database, create a new service, or connect a new interface) without unnecessary changes or code duplication, since the logic of each module lives in its own layer:
At the same time, all the logic of a separate service is encapsulated inside it:
Writing the code
Endpoint
Let's start with the endpoint:
# api_service/api/endpoints/user.py
from typing import Annotated
from fastapi import APIRouter, Depends, HTTPException, status
from entities.user import UserCreate
from ..dependencies.providers import (
    user_provider, # 1
    UserProvider # 2
)

router = APIRouter()

@router.post("/register")
async def register(
    user: UserCreate, # 3
    provider: Annotated[UserProvider, Depends(user_provider)] # 4
):
    provider.create_user(user) # 5
    return {"message": "User created!"}
1. Import the dependency injection helper function (we'll look at it in a minute).
2. Import the UserProvider protocol for the type annotation.
3. The endpoint requires the request body to contain a UserCreate schema in JSON format.
4. The provider parameter of the register function receives a UserProvider implementation instance injected by FastAPI via the Depends mechanism.
5. The parsed user data is passed to UserProvider's create_user method. This demonstrates a clear separation of concerns: the API layer delegates business logic to the provider layer, adhering to the principle that front-end layers should not contain business logic.
UserProvider
Now let's look at the business logic:
# providers/user_provider.py
from typing import Protocol, runtime_checkable, Callable

from repository import UserRepository
from providers.mail_provider import MailProvider
from entities.user import UserCreate

@runtime_checkable
class UserProvider(Protocol): # 1
    def create_user(self, user: UserCreate): ...

@runtime_checkable
class UserProviderOutput(Protocol): # 2
    def user_provider_created_user(self, provider: UserProvider, user: UserCreate): ...

class UserProviderImpl: # 3
    def __init__(self,
        repository: UserRepository, # 4
        mail_provider: MailProvider, # 4
        output: UserProviderOutput | None, # 5
        on_user_created: Callable[[UserCreate], None] | None # 6
    ):
        self.repository = repository
        self.mail_provider = mail_provider
        self.output = output
        self.on_user_created = on_user_created

    # Implementation
    def create_user(self, user: UserCreate): # 7
        self.repository.add_user(user) # 8
        self.mail_provider.send_mail(user.email, f"Welcome, {user.name}!") # 9

        if output := self.output: # unwrapping the optional
            output.user_provider_created_user(self, user) # 10

        # 11
        if on_user_created := self.on_user_created:
            on_user_created(user)
1. Interface definition: UserProvider is a protocol that defines the create_user method, which any class adhering to the protocol must implement. It serves as the formal contract for the user-creation functionality.
2. Observer protocol: UserProviderOutput serves as an observer (or delegate) that is notified when a user is created. This protocol keeps the components loosely coupled and supports the event-driven architecture of the application.
3. Protocol implementation: UserProviderImpl implements the user-creation logic but does not need to explicitly declare that it conforms to UserProvider, thanks to Python's dynamic nature and duck typing.
4. Core dependencies: The constructor accepts UserRepository and MailProvider (both defined as protocols) as parameters. By relying solely on these protocols, UserProviderImpl stays decoupled from concrete implementations, illustrating dependency injection: the provider is independent of the underlying details and interacts only through explicit contracts.
5. Optional output delegate: The constructor accepts an optional UserProviderOutput instance which, if provided, is notified when user creation completes.
6. Callback function: As an alternative to the output delegate, you can pass a callable on_user_created to handle additional actions after user creation, providing flexibility in responding to events.
7. Central business logic: The create_user method encapsulates the core business logic for adding a user, demonstrating separation from API processing.
8. Interacting with the repository: Uses UserRepository to abstract database operations (such as adding a user), ensuring the provider does not manipulate the database directly.
9. Extended business logic: Sending email via MailProvider illustrates that a provider's responsibilities may extend beyond simple CRUD operations.
10. Event notification: If an output delegate was provided, it is notified of the user-creation event, using the observer pattern for a modular response to events.
11. Executing a callback: Optionally invokes the callback function, a simple way to extend functionality without complex class hierarchies or dependencies.
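The testability claim is easy to demonstrate. The following self-contained sketch uses simplified stand-ins for the article's classes: because the provider depends only on protocol-shaped objects, plain hand-written stubs are enough to test it without a database or mail server:

```python
from dataclasses import dataclass

@dataclass
class UserCreate:
    name: str
    email: str

# Stubs that satisfy the repository / mail protocols by duck typing
class StubRepository:
    def __init__(self):
        self.added: list[UserCreate] = []
    def add_user(self, user: UserCreate):
        self.added.append(user)

class StubMail:
    def __init__(self):
        self.sent: list[tuple[str, str]] = []
    def send_mail(self, to: str, text: str):
        self.sent.append((to, text))

class UserProviderImpl:  # simplified copy of the article's provider
    def __init__(self, repository, mail_provider, output=None, on_user_created=None):
        self.repository = repository
        self.mail_provider = mail_provider
        self.output = output
        self.on_user_created = on_user_created

    def create_user(self, user: UserCreate):
        self.repository.add_user(user)
        self.mail_provider.send_mail(user.email, f"Welcome, {user.name}!")
        if self.output:
            self.output.user_provider_created_user(self, user)
        if self.on_user_created:
            self.on_user_created(user)

# "Test": inject stubs and a callback, then verify the side effects
events = []
provider = UserProviderImpl(StubRepository(), StubMail(),
                            on_user_created=lambda u: events.append(u))
provider.create_user(UserCreate(name="John", email="john@example.com"))

assert provider.repository.added[0].name == "John"
assert provider.mail_provider.sent[0] == ("john@example.com", "Welcome, John!")
assert len(events) == 1
print("all stub assertions passed")
```

No mocking framework is needed: the protocols define the shape, and any object with the right methods slots in.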
FastAPI Dependencies
Okay, but how do we instantiate the provider and inject it? Let's look at the injection code implemented with FastAPI's DI engine:
# services/api_service/api/dependencies/providers.py
from typing import Annotated
from fastapi import Request, Depends
from repository import UserRepository
from providers.user_provider import UserProvider, UserProviderImpl
from providers.mail_provider import MailProvider
from coordinator import Coordinator
from .database import get_session, Session
import config

def _get_coordinator(request: Request) -> Coordinator:
    # private helper function
    # NOTE: You can pass the DIContainer in the same way
    return request.app.state.coordinator

def user_provider(
    session: Annotated[Session, Depends(get_session)], # 1
    coordinator: Annotated[Coordinator, Depends(_get_coordinator)] # 2
) -> UserProvider: # 3
    # UserProvider's lifecycle is bound to the short endpoint's lifecycle,
    # so it's safe to use strong references here
    return UserProviderImpl( # 4
        repository=UserRepository(session), # 5
        mail_provider=MailProvider(config.mail_token), # 6
        output=coordinator, # 7
        on_user_created=coordinator.on_user_created # 8
        # on_user_created=lambda u: coordinator.on_user_created(u) # use a lambda if the method's signature is not compatible
    )
1. Obtains a database session through FastAPI's dependency injection system, ensuring every request gets a fresh session.
2. Retrieves from the application state the Coordinator instance, which is responsible for broader application-level tasks and acts as an event manager.
3. Note: the function returns the protocol, not the concrete implementation.
4. Constructs a UserProviderImpl instance by injecting all the necessary dependencies, demonstrating the practical application of dependency injection for assembling complex objects.
5. Initializes UserRepository with the session received from the FastAPI DI system. This repository handles all data persistence operations, abstracting database interaction away from the provider.
6. Sets up MailProvider using a configuration token.
7. Injects the Coordinator as the output protocol. It is assumed that Coordinator implements the UserProviderOutput protocol, allowing it to receive notifications when a user is created.
8. Assigns a Coordinator method as the callback executed when a user is created. This allows additional operations or notifications to be triggered as a side effect of the user-creation process.
This structured approach ensures that UserProvider is equipped with all the tools it needs to perform its tasks effectively, while adhering to the principles of loose coupling and high cohesion.
Coordinator
The Coordinator class acts as the main orchestrator of the application, managing services, interactions, and events, establishing the initial state, and injecting dependencies. Here is a detailed description of its roles and functionality based on the code below:
# coordinator.py
from threading import Thread
import weakref

import uvicorn

import config
from services.api_service import get_app as get_fastapi_app
from entities.user import UserCreate
from repository.user_repository import UserRepository
from providers.mail_provider import MailProvider
from providers.user_provider import UserProvider, UserProviderImpl
from services.report_service import ReportService
from services.telegram_service import TelegramService

class Coordinator:
    def __init__(self):
        self.users_count = 0 # 1

        self.telegram_service = TelegramService( # 2
            token=config.telegram_token,
            get_user_provider=lambda session: UserProviderImpl(
                repository=UserRepository(session),
                mail_provider=MailProvider(config.mail_token),
                output=self,
                on_user_created=self.on_user_created
            )
        )
        self.report_service = ReportService(
            get_users_count=lambda: self.users_count # 3
        )

    # Coordinator's Interface
    def setup_initial_state(self):
        fastapi_app = get_fastapi_app()
        fastapi_app.state.coordinator = self # 4

        # 5
        fastapi_thread = Thread(target=lambda: uvicorn.run(fastapi_app))
        fastapi_thread.start()

        # 6
        self.report_service.start()
        self.telegram_service.start()

    # UserProviderOutput Protocol Implementation
    def user_provider_created_user(self, provider: UserProvider, user: UserCreate):
        self.on_user_created(user)

    # Event handlers
    def on_user_created(self, user):
        print("User created: ", user)
        self.users_count += 1

        # 7 (check the larger threshold first, otherwise it is unreachable)
        if self.users_count >= 10_000_000:
            self.report_service.stop() # 8
        elif self.users_count >= 10_000:
            self.report_service.interval_seconds *= 10
1. Some state may be shared across providers, services, layers, or the entire application.
2. Building implementations and injecting dependencies.
3. Watch out for circular references, deadlocks, and memory leaks here; see the full code for details.
4. Pass the coordinator instance to the FastAPI application state so that endpoints can access it through the FastAPI DI system.
5. Run the services in separate threads.
6. These services already run in their own threads internally.
7. Some cross-service logic, just as an example.
8. An example of managing services from the coordinator.
This orchestrator centralizes control and communication between various components, increasing application manageability and scalability. It effectively coordinates actions between services, ensuring that the application responds appropriately to state changes and user interactions. This design pattern is very important to maintain a clean separation of concerns and provide more robust and flexible application behavior.
Container DI
However, in large-scale applications, wiring DI by hand can produce a significant amount of boilerplate code. That's where a DI container comes to the rescue. DI containers (Dependency Injection Containers) are powerful tools for managing dependencies in an application. They serve as a central place where objects and their dependencies are registered and managed. When an object requires a dependency, the DI container automatically handles the instantiation and provisioning of those dependencies, ensuring that objects receive all the components they need to function. This approach promotes loose coupling and improves the testability and overall maintainability of the codebase by separating complex dependency-management logic from the business logic of the application. DI containers simplify development by automating and centralizing the configuration of component dependencies.
There are many Python libraries providing different implementations of a DI container. I have looked through almost all of them and noted the best ones, in my opinion:
python-dependency-injector – automated, class-based, with different lifecycle options such as Singleton or Factory
lagom – dictionary interface with automatic resolution
dishka – good scope control through context managers
that-depends – support for context managers (objects must be closed at the end), built-in FastAPI integration
punq – a more classic approach with register and resolve methods
rodi – classic, simple, automatic
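To make the register/resolve idea concrete without committing to any of the libraries above, here is a minimal hand-rolled container sketch (the Container API and the Config/Database classes are illustrative, not taken from any of those libraries):

```python
from typing import Callable

class Container:
    """A tiny DI container: register a factory per type, resolve on demand."""
    def __init__(self):
        self._factories: dict[type, Callable[[], object]] = {}
        self._singletons: dict[type, object] = {}

    def register(self, interface: type, factory: Callable[[], object],
                 singleton: bool = False):
        self._factories[interface] = factory
        if singleton:
            self._singletons[interface] = None  # created lazily on first resolve

    def resolve(self, interface: type):
        if interface in self._singletons:
            if self._singletons[interface] is None:
                self._singletons[interface] = self._factories[interface]()
            return self._singletons[interface]
        return self._factories[interface]()  # Factory lifecycle: new each time

# Usage with hypothetical classes
class Config: ...
class Database:
    def __init__(self, config: Config):
        self.config = config

container = Container()
container.register(Config, Config, singleton=True)
container.register(Database, lambda: Database(container.resolve(Config)))

db1, db2 = container.resolve(Database), container.resolve(Database)
print(db1 is db2)                # False – Factory lifecycle
print(db1.config is db2.config)  # True – Singleton lifecycle
```

Real libraries add scopes, automatic constructor inspection, and integration hooks on top of exactly this core idea.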
main.py
Finally, update the file main.py:
# main.py
from coordinator import Coordinator

def run(): # entry point, no logic here, only run the coordinator
    coordinator = Coordinator()
    coordinator.setup_initial_state()

if __name__ == '__main__':
    run()
Conclusion
To get a complete picture of the architectural and implementation strategies discussed, it is useful to review all the files in the repository. Despite the limited amount of code, each file contains meaningful comments and additional details that deepen your understanding of the application's structure and functionality. Studying these aspects will improve your familiarity with the system and prepare you to adapt or extend the application effectively.
This approach is universal for various Python applications. It is effective for stateless backend servers, such as those built with FastAPI, but its benefits are especially pronounced in frameworkless and stateful applications. This includes desktop applications (both GUI and command line) as well as systems that control physical devices such as IoT devices, robotics, drones, and other hardware-centric technologies.
Also, I recommend reading the book Clean Code by Robert Martin. You can find a summary and the main takeaways here.
The approach shown has been tested in practice and is used for the main programs of the hub and speaker of the MajorDom smart home system, about which I periodically write in Telegram.