Isolation and silos for data stores in multitenant solutions
In a previous article, we examined several key points of setting up a multitenant Amazon EKS cluster. Security is a very broad topic. It is important to understand that security applies not only to the application cluster but also to the data store.
AWS as a platform for SaaS solutions offers a wide variety of data stores. But, as elsewhere, configuring security properly, designing a multitenant architecture around it, and setting up different isolation levels require specific knowledge and an understanding of how these services work.
Multitenant data stores
Multitenant data is conveniently managed using silos. Their main feature is the separation of tenant data in multitenant SaaS solutions. But before we get to specific cases, let's touch on a bit of general theory.
Only the tenant should have access
Data security is a priority for SaaS solutions. Data must be protected not only from external intrusion but also from access by other tenants. This holds even when two tenants cooperate with each other: access to shared data is still controlled and configured according to business logic.
Industry Standards for Encryption and Security
Standards for tenants may vary by industry. Some require data encryption with a clearly defined key rotation frequency, while others require tenant-specific keys instead of shared ones. By associating datasets with specific tenants, different encryption standards and security settings can be applied to individual tenants as an exception.
Performance tuning based on tenant subscription
SaaS providers usually recommend a common workflow for all tenants. In practice, this is not always convenient for a specific business logic, so it can be done differently: each tenant is assigned its own set of properties and performance limits depending on its tier. To make sure customers get the performance stated in the agreement, SaaS providers have to track the usage of individual tenants. This gives all customers fair access to resources.
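Per-tenant usage tracking against tier limits can be sketched roughly as follows. This is a minimal illustration in Python, not a production rate limiter: the tier names, the limits, and the TenantUsageTracker class are all hypothetical, and a real service would take the limits from the service agreement and keep counters in shared storage.

```python
from dataclasses import dataclass, field

# Hypothetical tiers and limits (requests per accounting window).
TIER_LIMITS = {"basic": 100, "premium": 1000}


@dataclass
class TenantUsageTracker:
    """Counts requests per tenant and enforces the limit of the tenant's tier."""

    tiers: dict  # tenant_id -> tier name
    used: dict = field(default_factory=dict)  # tenant_id -> requests in this window

    def allow(self, tenant_id: str) -> bool:
        """Record one request; return False once the tier limit is exhausted."""
        limit = TIER_LIMITS[self.tiers[tenant_id]]
        count = self.used.get(tenant_id, 0)
        if count >= limit:
            return False
        self.used[tenant_id] = count + 1
        return True

    def reset_window(self) -> None:
        """Start a new accounting window for every tenant."""
        self.used.clear()
```

Because counters are kept per tenant, one tenant exhausting its limit does not affect the requests of another.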
Data management
As a SaaS service grows, so does the number of tenants. If a customer switches providers, they usually want all their data transferred to another resource and the old copies deleted. While the first request can be contested, the second is guaranteed by the EU General Data Protection Regulation (GDPR). To comply, the SaaS provider must identify each tenant's datasets from the start.
How to turn a regular data store into a multitenant one
It is worth noting right away that there is no magic code. You can't simply switch on tenant data silos in a data store. The following aspects should be considered:
- Service level agreement;
- Access patterns for reading and writing;
- Compliance with regulations;
- Cost.
But there are a number of generally accepted practices for partitioning and isolating data. Let's consider these cases using the Amazon Aurora relational database as an example.
Partitioning tenant data in shared storage and instances
A table is shared by all tenants. Each tenant's data is partitioned and identified by the tenant_id key. Authorization in the relational database is implemented at the row level (row-level security). Application access is based on an access policy and takes the specific tenant into account.
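The idea of a shared table keyed by tenant_id can be illustrated with a small sketch. Note the assumptions: it uses SQLite from the Python standard library, which has no row-level security, so the tenant filter is emulated in a hypothetical application-side wrapper (TenantScopedOrders). In a database with real row-level security, the equivalent filter would live in a database policy instead.

```python
import sqlite3


def make_shared_table() -> sqlite3.Connection:
    """Create an in-memory table shared by all tenants (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, item TEXT NOT NULL)"
    )
    return conn


class TenantScopedOrders:
    """Binds every statement to one tenant_id, emulating a row-level policy."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str):
        self._conn = conn
        self._tenant_id = tenant_id

    def add(self, item: str) -> None:
        self._conn.execute(
            "INSERT INTO orders (tenant_id, item) VALUES (?, ?)",
            (self._tenant_id, item),
        )

    def list_items(self) -> list:
        rows = self._conn.execute(
            "SELECT item FROM orders WHERE tenant_id = ? ORDER BY id",
            (self._tenant_id,),
        )
        return [row[0] for row in rows]
```

The sketch also shows the cost of this model: tenant identification lives in application code, so a single missing filter exposes another tenant's rows.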
Pros:
- It is inexpensive.
Cons:
- Authorization happens at the database level. This implies several authorization mechanisms within the solution: AWS IAM and database policies;
- Application logic has to be developed to identify tenants;
- Without complete isolation, the tier-based service agreement cannot be enforced;
- Database-level authorization limits access tracking with AWS CloudTrail. This can only be compensated for by adding information from outside, which complicates tracing and troubleshooting.
Isolating data on shared instances
Tenancy is still shared at the instance level, but data is siloed at the database level. This enables AWS IAM authentication and authorization.
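As a rough sketch of what IAM-based database access could look like in this model, the Python function below builds a policy that allows one database user to connect through IAM database authentication. The assumptions are significant: it presumes each tenant has a dedicated database user mapped to its own database, the function name is hypothetical, and the region, account ID, and cluster resource ID passed in are placeholders.

```python
import json


def db_connect_policy(region: str, account_id: str,
                      cluster_resource_id: str, db_user: str) -> dict:
    """Build an IAM policy allowing rds-db:connect for a single database user."""
    resource = (
        f"arn:aws:rds-db:{region}:{account_id}:"
        f"dbuser:{cluster_resource_id}/{db_user}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "rds-db:connect",
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    policy = db_connect_policy(
        "us-east-1", "111122223333", "cluster-EXAMPLE", "tenant1_user"
    )
    print(json.dumps(policy, indent=2))
```

Since the connection is authorized by IAM rather than by a database password, the access decision is made in the same place as for other AWS resources.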
Pros:
- It is inexpensive;
- AWS IAM is fully responsible for authentication and authorization;
- AWS IAM lets you keep an audit trail in AWS CloudTrail without workarounds such as separate applications.
Cons:
- The underlying DB instances are shared between tenants, so resource contention is possible, which prevents full enforcement of the tier-based service agreement.
Database instance isolation per tenant
The diagram shows tenant databases implemented with instance-level isolation. Today this is probably the best solution in terms of combining security and reliability. It offers AWS IAM, auditing with AWS CloudTrail, and complete tenant isolation.
Pros:
- AWS IAM provides both authentication and authorization;
- There is a full audit;
- Clear allocation of resources between tenants.
Cons:
- Isolating databases and instances per tenant is expensive.
How application access to multitenant data is implemented
Ensuring that applications have the correct access to data is no less important than storing data in a tenant model that meets business requirements. This is not difficult if you use AWS IAM for access control (see the examples above). Applications that provide tenants with data access can also use AWS IAM. Let's look at this using Amazon EKS as an example.
To provide IAM access at the pod level, EKS uses OpenID Connect (OIDC) together with Kubernetes service account annotations. As a result, a JWT is exchanged with STS, which creates temporary access for the application to the required cloud resources. With this approach, there is no need to grant extended permissions to the underlying Amazon EKS worker nodes. Instead, you can configure IAM permissions only for the service account associated with the pod. This is done based on the actual permissions needed by the application running in the pod. As a result, we get full control over application and pod permissions.
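The link between a Kubernetes service account and an IAM role is expressed in the role's trust policy. The sketch below builds such a document in Python. The function name and all argument values are hypothetical placeholders; the OIDC provider string stands for the cluster's issuer URL without the https:// prefix.

```python
import json


def irsa_trust_policy(account_id: str, oidc_provider: str,
                      namespace: str, service_account: str) -> dict:
    """Build a trust policy letting one service account assume the role via OIDC."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": (
                        f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                    )
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Only tokens issued for this exact service account match.
                        f"{oidc_provider}:sub": (
                            f"system:serviceaccount:{namespace}:{service_account}"
                        ),
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    doc = irsa_trust_policy(
        "111122223333",
        "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",
        "tenant1",
        "tenant1-service-account",
    )
    print(json.dumps(doc, indent=2))
```

The condition on the sub claim is what narrows the role to a single service account, so worker nodes and other pods cannot assume it.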
IAM integration supports a comprehensive authorization system for tenant access to data stores. Without such integration, access to the database is controlled only through authentication, which means another layer of security has to be introduced.
Amazon EKS accessing a multitenant Amazon DynamoDB database
Let's take a closer look at multitenant access: how an application running on Amazon EKS gets access to a multitenant Amazon DynamoDB database. In many cases, multitenancy in Amazon DynamoDB is implemented at the table level (a 1:1 ratio of tables to tenants). As an example, consider the AWS IAM policy (aws-dynamodb-tenant1-policy), which illustrates the access pattern where all the data is associated with Tenant1.
{
  ...
  "Statement": [
    {
      "Sid": "Tenant1",
      "Effect": "Allow",
      "Action": "dynamodb:*",
      "Resource": "arn:aws:dynamodb:${region}:${account_id}:table/Tenant1"
    }
  ]
}
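With one table per tenant, such policies differ only in the tenant name, so they can be generated. The following Python sketch shows one possible way. The function name is hypothetical, the "Version" field is added as the standard IAM policy version (the elided part of the policy above may contain other fields), and the ARN uses the standard region:account_id form.

```python
import json


def tenant_table_policy(tenant: str, region: str, account_id: str) -> dict:
    """Build an IAM policy granting access to the table of a single tenant."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": tenant,
                "Effect": "Allow",
                "Action": "dynamodb:*",
                "Resource": (
                    f"arn:aws:dynamodb:{region}:{account_id}:table/{tenant}"
                ),
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder region and account ID for illustration only.
    print(json.dumps(
        tenant_table_policy("Tenant1", "us-east-1", "111122223333"), indent=2
    ))
```

Each generated policy names exactly one table, so a pod running with the Tenant1 policy has no permissions on the Tenant2 table.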
The next step is to associate this policy with a service account in the EKS cluster, which uses OpenID Connect.
eksctl utils associate-iam-oidc-provider \
  --cluster my-cluster \
  --approve \
  --region ${region}

eksctl create iamserviceaccount \
  --name tenant1-service-account \
  --cluster my-cluster \
  --attach-policy-arn arn:aws:iam::xxxx:policy/aws-dynamodb-tenant1-policy \
  --approve \
  --region ${region}
A pod definition that specifies serviceAccountName will use the new tenant1-service-account service account.
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  serviceAccountName: tenant1-service-account
  containers:
  - name: tenant1
    …
Although the IAM service account and policy are tenant-specific, static, and managed with tools such as Terraform and Ansible, the pod specification can be configured dynamically. If you use a templating tool such as Helm, serviceAccountName can be set as a variable pointing to the corresponding tenant service account. As a result, each tenant gets its own dedicated deployment of the same application. In fact, each tenant should have a dedicated namespace in which its applications run.
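The templating idea can be shown without Helm. The Python sketch below renders a pod manifest per tenant, with the namespace and serviceAccountName derived from the tenant name. It is only an illustration of parameterizing the specification: the function name, the naming convention for namespaces, and the container image are hypothetical.

```python
import json


def tenant_pod_manifest(tenant: str, image: str) -> dict:
    """Render a pod manifest bound to the tenant's namespace and service account."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "my-pod", "namespace": tenant},
        "spec": {
            "serviceAccountName": f"{tenant}-service-account",
            "containers": [{"name": tenant, "image": image}],
        },
    }


if __name__ == "__main__":
    # The image name is a placeholder; every tenant runs the same application.
    for tenant in ("tenant1", "tenant2"):
        print(json.dumps(tenant_pod_manifest(tenant, "example/app:1.0"), indent=2))
```

Only the tenant name varies between the rendered manifests, which matches the model of one dedicated deployment of the same application per tenant.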
Conclusion
For SaaS services, it is important to think through how data will be accessed. Consider the storage, encryption, performance, and management requirements of tenants. In multitenancy, there is no single preferred way of partitioning data. An advantage of running multitenant workloads on AWS is AWS IAM, which can be used to simplify access control for tenant data. In addition, AWS IAM helps you configure application access to data dynamically.
The features and techniques described here cover only a bit of theory that may come in handy. In each particular case, you always need to analyze the initial information yourself and design a tailored solution.