Isolation and silos for data warehouses in multitenant solutions



In one of the previous articles, we examined several key points of setting up a multi-tenant (hereinafter multitenant) Amazon EKS cluster. Security is a very broad topic. It is important to understand that security applies not only to the application cluster but also to the data warehouse.

As a platform for SaaS solutions, AWS offers a wide variety of data warehouses. But, as elsewhere, configuring security properly, designing a multitenant architecture around it, and setting up the various isolation levels require specific knowledge and an understanding of how these services work.

Multitenant Data Warehouse

Multitenant data is convenient to manage using silos. Their main feature is the separation of data by tenant in multitenant SaaS solutions. But before we talk about specific cases, we'll touch on a bit of general theory.

The term "silo" has not yet fully taken root in the slang of Russian IT specialists, but we use it here by analogy with "data lake".

Only the tenant should have access

Data security is a priority for SaaS solutions. Data must be protected not only from external intrusions but also from access by other tenants. This holds even when two tenants cooperate with each other: access to shared data is controlled and configured according to business logic.

Industry Standards for Encryption and Security

Tenant standards may vary by industry. Some require data encryption with a clearly defined key rotation frequency, while others require shared keys instead of tenant-oriented ones. By associating data sets with specific tenants, different encryption standards and security settings can be applied to individual tenants as an exception.

Performance tuning based on tenant subscription

SaaS providers usually recommend a shared workflow for all tenants. In practice, this is not always convenient for a specific business logic, so it can be done differently: assign each tenant its own set of properties and performance limits depending on its service tier. So that customers get the performance stated in the agreement, SaaS providers have to track the usage of each individual tenant. Thanks to this, all customers receive equal access to resources.

Naturally, this affects customers' bills: whoever uses more resources pays more.
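The idea of tier-based limits with per-tenant usage tracking can be sketched roughly as follows. This is only an illustration: the tier names, the limits, and the UsageTracker class are hypothetical and do not correspond to any AWS API.

```python
from dataclasses import dataclass, field

# Hypothetical tiers and limits (requests per billing window), for illustration only.
TIER_LIMITS = {"basic": 100, "premium": 1000}


@dataclass
class UsageTracker:
    tiers: dict                               # tenant_id -> tier name
    used: dict = field(default_factory=dict)  # tenant_id -> requests recorded so far

    def record(self, tenant_id: str, requests: int = 1) -> bool:
        """Record usage; return False if the tenant would exceed its tier limit."""
        limit = TIER_LIMITS[self.tiers[tenant_id]]
        current = self.used.get(tenant_id, 0)
        if current + requests > limit:
            return False  # over the limit: nothing is recorded
        self.used[tenant_id] = current + requests
        return True


tracker = UsageTracker(tiers={"tenant1": "basic", "tenant2": "premium"})
print(tracker.record("tenant1", 100))  # True: exactly at the basic limit
print(tracker.record("tenant1"))       # False: one more request would exceed it
```

The same per-tenant counters could also feed billing, since usage is already attributed to a tenant_id.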

Data management

As a SaaS service grows, so does the number of tenants. If a client changes providers, they usually want all their data moved to another resource and the old copies deleted. While the first wish can be disputed, fulfillment of the second is guaranteed by the EU General Data Protection Regulation (GDPR). To comply with the regulation, the SaaS provider must be able to identify each individual tenant's data sets from the outset.

Why mention EU regulations, and do they apply to companies from Russia? Yes, if residents of the European Union use the services of companies that collect personal data. That covers about a third of large Russian companies. But this topic probably deserves a separate, more detailed article.

How to turn a regular data warehouse into a multitenant one

We want to note right away that there is no magic code. You can't simply take a data warehouse and set it up as a tenant silo. The following aspects should be considered:

  • Service agreement;
  • Access patterns for reading and writing;
  • Compliance with regulations;
  • Costs.

But there are a number of generally accepted data partitioning and isolation practices. Let us consider them using the relational database Amazon Aurora as an example.

Partitioning tenant data in shared repositories and instances

A table is shared by all tenants. Each tenant's data is partitioned and identified by the tenant_id key. Authorization in the relational database is implemented at the row level (row-level security). Application access is based on an access policy and takes the specific tenant into account.

Pros:

  • It’s not expensive.

Cons:

  • Authorization is handled by the database. This implies several authorization mechanisms within the solution: AWS IAM and database policies;
  • Application logic has to be developed to identify the tenant;
  • Without complete isolation it is impossible to enforce the tier-based service agreement;
  • Database-level authorization limits access tracking with AWS CloudTrail. This can only be compensated for by adding information from outside, although tracking and troubleshooting would be easier with CloudTrail itself.
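The application logic for identifying the tenant in a shared table can be sketched as follows. Note the assumptions: this shows application-level filtering by tenant_id, not the database's own row-level security; SQLite is used only to keep the sketch self-contained; the orders table and the helper names are hypothetical.

```python
import sqlite3


def make_shared_db() -> sqlite3.Connection:
    """Create an in-memory table shared by all tenants (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (tenant_id TEXT NOT NULL, item TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO orders (tenant_id, item) VALUES (?, ?)",
        [("tenant1", "laptop"), ("tenant2", "phone"), ("tenant1", "mouse")],
    )
    return conn


def orders_for_tenant(conn: sqlite3.Connection, tenant_id: str) -> list:
    """Return only the rows that belong to the given tenant."""
    rows = conn.execute(
        "SELECT item FROM orders WHERE tenant_id = ? ORDER BY item",
        (tenant_id,),  # parameterized, so the tenant_id is never spliced into SQL
    )
    return [item for (item,) in rows]


conn = make_shared_db()
print(orders_for_tenant(conn, "tenant1"))  # ['laptop', 'mouse']
```

Every query path has to go through such a tenant-scoped helper, which is exactly the extra application logic listed among the cons.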

Isolation of data on a shared instance

Tenancy is still shared at the instance level, but data is siloed at the database level. This enables AWS IAM authentication and authorization.

Pros:

  • It’s not expensive;
  • AWS IAM is fully responsible for authentication and authorization;
  • AWS IAM allows you to keep an audit trail in AWS CloudTrail without workarounds such as separate applications.

Cons:

  • The underlying DB instances are shared between tenants, so resource contention is possible, which prevents the tier-based service agreement from being fully enforced.

Database instance isolation per tenant

The diagram shows tenant databases implemented with instance isolation. Today this is probably the best solution, combining security and reliability. It offers AWS IAM, auditing with AWS CloudTrail, and complete tenant isolation.

Pros:

  • AWS IAM provides both authentication and authorization;
  • There is a full audit;
  • Clear distribution of resources between tenants.

Cons:

  • Isolating databases and instances per tenant is expensive.

How application access to multitenant data is implemented

Ensuring that applications have the correct access to data is more important than storing data in a tenant model that meets business requirements. It is not difficult if you use AWS IAM for access control (see examples above). Applications that provide tenants with data access can also use AWS IAM. Amazon EKS serves as an example.

To provide IAM access at the pod level in EKS, OpenID Connect (OIDC) is a perfect fit, together with Kubernetes service account annotations. The result is an exchange of a JWT with STS, which grants the application temporary access to the required cloud resources. With this approach, there is no need to grant extended permissions to the underlying Amazon EKS worker nodes. Instead, you configure IAM permissions only for the service account associated with the pod, based on the actual permissions needed by the application running in that pod. As a result, we get full control over application and pod permissions.

And because AWS CloudTrail logs every API call made by an EKS pod, you can keep a detailed event log.

IAM integration supports a comprehensive authorization system for tenant access to data warehouses. Where access to a database is controlled only through authentication, you need to introduce an additional level of security.

Amazon EKS access to a multitenant Amazon DynamoDB database

Let's take a closer look at multitenant access: how an application running on Amazon EKS accesses a multitenant Amazon DynamoDB database. In many cases, multitenancy in Amazon DynamoDB is implemented at the table level (with a 1:1 ratio of tables to tenants). As an example, consider the AWS IAM policy (aws-dynamodb-tenant1-policy), which illustrates an access pattern in which all the data is associated with Tenant1.

{
   ...
   "Statement": [
       {
           "Sid": "Tenant1",
           "Effect": "Allow",
           "Action": "dynamodb:*",
           "Resource": "arn:aws:dynamodb:${region}:${account_id}:table/Tenant1"
       }
   ]
}
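With one table per tenant, such policies differ only in the tenant name, so they can be generated rather than written by hand. Below is a minimal sketch under stated assumptions: the function tenant_table_policy is hypothetical, the region and account ID are placeholders, and the "Version" value is the standard IAM policy language version rather than something taken from the policy above. A DynamoDB table ARN has the form arn:aws:dynamodb:<region>:<account_id>:table/<table_name>.

```python
import json


def tenant_table_policy(tenant: str, region: str, account_id: str) -> dict:
    """Build an IAM policy document granting access to one tenant's table."""
    table_arn = f"arn:aws:dynamodb:{region}:{account_id}:table/{tenant}"
    return {
        "Version": "2012-10-17",  # assumption: standard policy language version
        "Statement": [
            {
                "Sid": tenant,
                "Effect": "Allow",
                "Action": "dynamodb:*",
                "Resource": table_arn,
            }
        ],
    }


# Placeholder region and account ID, for illustration only.
policy = tenant_table_policy("Tenant1", "us-east-1", "123456789012")
print(json.dumps(policy, indent=4))
```

In a real setup the wildcard action would likely be narrowed to the operations the application actually needs.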

The next step is to associate this policy with a service account in the EKS cluster, which uses OpenID Connect.

# Enable the IAM OIDC provider for the cluster
eksctl utils associate-iam-oidc-provider \
      --cluster my-cluster \
      --approve \
      --region ${region}


# Create a service account with the tenant policy attached
eksctl create iamserviceaccount \
      --name tenant1-service-account \
      --cluster my-cluster \
      --attach-policy-arn arn:aws:iam::xxxx:policy/aws-dynamodb-tenant1-policy \
      --approve \
      --region ${region}

A pod definition that contains the required serviceAccountName field will use the new service account tenant1-service-account.

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  serviceAccountName: tenant1-service-account
  containers:
  - name: tenant1
…

Although the IAM account and policy are tenant-oriented, static, and managed with tools such as Terraform and Ansible, the pod specification can be configured dynamically. If you use a template generator such as Helm, serviceAccountName can be set as a variable pointing to the corresponding tenant service account. As a result, each tenant gets its own dedicated deployment of the same application. In fact, each tenant should have a dedicated namespace in which its applications run.
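The templating idea can be illustrated without Helm by generating the manifest per tenant. This is only a sketch: pod_manifest is a hypothetical helper, the image name is a placeholder, and naming the namespace after the tenant is an assumption made for the example. Kubernetes accepts manifests in JSON as well as YAML.

```python
import json


def pod_manifest(tenant: str) -> dict:
    """Render a pod manifest whose service account and namespace follow the tenant."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "my-pod", "namespace": tenant},  # assumed naming
        "spec": {
            "serviceAccountName": f"{tenant}-service-account",
            "containers": [
                {"name": tenant, "image": "example/app:latest"}  # placeholder image
            ],
        },
    }


for tenant in ("tenant1", "tenant2"):
    print(json.dumps(pod_manifest(tenant), indent=2))
```

Each rendered manifest differs only in the tenant-specific fields, which mirrors setting serviceAccountName as a template variable.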

The same methods can be implemented using Amazon Aurora Serverless, Amazon Neptune, and Amazon S3 buckets.

Conclusion

For SaaS services it is important to think through how data access will be carried out. Consider the tenants' storage, encryption, performance, and management requirements. In multitenancy there is no single preferred way of partitioning data. An advantage of running multitenant workloads on AWS is AWS IAM, which can be used to simplify access control for tenant data. In addition, AWS IAM helps you configure application access to data dynamically.

The features and techniques described here cover only a bit of theory that may come in handy. In each specific case, you always need to analyze the source information yourself and design a tailored solution.

