HashiCorp Vault and the Dream of One Secure Button

Hi! My name is Lev, and I head integration development at the financial marketplace Banki.ru. More than a year ago we began our transition to a microservice architecture. The secrets kept multiplying: passwords, tokens, certificates, keys, and managing them became harder and harder. The number of teams passed several dozen long ago, and the number of integrations with partners, each also protected by secrets, is over a hundred.

We were faced with the task of building a convenient and secure structure for storing and managing secrets. In this article, I will share our experience and tell you:

  • What was the problem with secrets management, how did we solve it, and how could we improve the process?

  • What methods of storing secrets exist and how can they be integrated?

  • How can storage be implemented through Vault?

  • What are access policies and how do they apply to secret structures?

  • Which implementation options best suit our specific situation?

What the process of managing secrets looked like

Until recently, we stored secrets encrypted in Git. The process for changing a secret looked like this:

  1. To add new secrets, the manager (PM) or support passed the information to the team leader, who then passed the data to DevOps.

  2. DevOps made changes to the encrypted secrets file in Git.

  3. After this, the process of assembling a new build or image was launched, which in some cases might require testing.

  4. After testing, DevOps deployed the changes to production.

  5. The production support team included the service in monitoring and monitored its operation, ready to roll back changes at any time.

Previous process of changing secrets/certs

I'm sure many of you know this process from personal experience. The key problem is that you have to run through the entire development pipeline, which takes a lot of time and requires coordination between different teams.

Targeted process of changing secrets/certs

At some point (after yet another production incident caused by a secret that had not been updated in time), it finally dawned on us that we could not go on living like this! We wanted something simpler and more reliable: for secrets to be changed at the press of a single button.
We came up with a concept for a future solution:

  • Processing of secrets should take place through a user-friendly interface.

  • When a new secret arrives, the support folks make the change and save it themselves, with that very button.

  • The application updates automatically, without requiring manual deployment or restart.

Looks like a great goal to strive for.

Where to keep shared secrets

Let's consider the options for storing and managing secrets that are most often used in practice.

  1. The easiest way is to use shared storage, such as a wiki page or an Excel spreadsheet. However, there is no proper access control or auditing of user actions, which makes the approach unsuitable for serious tasks with a large number of teams.

  2. A more advanced option is a password manager such as KeePass. These protect passwords with encryption and allow you to restrict access, but they are still poorly suited for teamwork, since they lack the appropriate tooling.

  3. The advanced level is to keep secrets encrypted in Git. This approach provides security, but the secrets management process is quite slow: it requires building, testing, and restarting applications. That was exactly our case.

  4. The most correct and mature approach is to use specialized secrets management systems, such as Vault. These systems integrate with Kubernetes and can be built into your deployment pipeline, allowing you to centrally, securely, and efficiently manage access to secrets across teams running full CI/CD.

What Vault provides and what is valuable to us

First, a little more detail about Vault itself.

Vault is a system for the secure storage of secrets. All passwords in it are encrypted, so even if an attacker gets physical access to the disk with the storage, they will not be able to do anything with it.

  • The structure of secrets is presented as key-value and supports JSON format.

  • The Vault API provides access to secrets, and access rights are checked through external authentication services, one of which can be Kubernetes. The ServiceAccount mechanism in Kubernetes issues short-lived tokens for secure access to secrets. This solves the "password for all passwords" problem for technical accounts.

  • Vault has an audit mechanism which records all operations for subsequent analysis by the security team.

  • Integrations with Kubernetes make it easy to implement Vault into your existing infrastructure.

  • Vault supports the industry-standard OAuth 2.0 and OpenID Connect protocols for accessing secrets.

  • Vault has a built-in mechanism for sharing secrets with third parties using one-time tokens. This mechanism provides access control and helps detect compromise.

  • An important advantage is secret versioning: in case of errors, changes can be rolled back.
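For a feel of how this looks in practice, here is a minimal sketch of working with the KV v2 engine from the CLI; the mount point secret/ and the path myapp/db are placeholders:

# Write a secret (KV v2 engine mounted at secret/; the path is illustrative)
vault kv put secret/myapp/db username=app password=s3cr3t

# Read the current version
vault kv get secret/myapp/db

# Roll back to a previous version if something went wrong
vault kv rollback -version=1 secret/myapp/db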

Unsealing the Vault

Let me remind you that all secrets in Vault are encrypted – in order to work with them, they need to be decrypted. This happens when Vault starts and is called unsealing the vault.

Unsealing the Vault is not a trivial task. By default, five key shares are generated and distributed among engineers. To unseal, several keys must be entered, for example three of the five. If unsealing fails, Vault will not come up. It is very reminiscent of clichéd action films with nuclear missile launches and Armageddon :-).
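For illustration, a sketch of the initialization and unsealing commands with the default five shares and a threshold of three (the key values are placeholders):

# Initialization generates the key shares and the unseal threshold
vault operator init -key-shares=5 -key-threshold=3

# After every (re)start, any three key holders enter their shares one by one
vault operator unseal <key-share-1>
vault operator unseal <key-share-2>
vault operator unseal <key-share-3>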

This algorithm ensures the security of the vault but creates difficulties for administrators, especially when Vault is restarted and the vault has to be unsealed again. Keep this in mind when building a highly available system.

Vault can be configured to use external storage backends such as Postgres or Consul; more recently, the built-in integrated storage has become the preferred option.

The Vault cluster runs on a distributed consensus protocol (Raft), which ensures reliability and resilience. It is recommended to use an odd number of Vault instances in the cluster, at least three. Ideally, they should be distributed across different data centers to increase fault tolerance.
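For reference, a rough sketch of a server configuration with the integrated Raft storage; node names and addresses are assumptions:

storage "raft" {
  path    = "/vault/data"
  node_id = "vault-1"

  retry_join {
    leader_api_addr = "https://vault-2:8200"
  }
  retry_join {
    leader_api_addr = "https://vault-3:8200"
  }
}

api_addr     = "https://vault-1:8200"
cluster_addr = "https://vault-1:8201"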

Current Vault implementation via Ansible

Deploying Vault to dozens of teams requires significant administrative effort to synchronize the transition and significant changes to the underlying pipelines.
That's why we went for an intermediate solution. We have a Vault cluster, a Kubernetes cluster, Ansible for creating manifests, and pods with applications. There is also a Git repository where templates, variables for Ansible, and source code are stored. AD is deployed nearby to manage corporate accounts of Vault users.

We have defined three basic roles:

  • administrators with full access to secrets;

  • information security (IS) specialists who supervise the audit;

  • users divided into groups or products.

The simple solution is that Ansible, when generating manifests to deploy our applications to Kubernetes, accesses Vault to get secrets and insert them into the manifests. This is not the most secure approach, as there is a master password for Ansible access to Vault.
Additionally, secrets are stored in manifests and, as a consequence, in the Kubernetes configuration in etcd. In the target solution, we will remove this drawback.
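For illustration, a rough sketch of what such an Ansible step can look like, using the community.hashi_vault lookup plugin; the secret path, field name, and template names are assumptions, and VAULT_ADDR/VAULT_TOKEN are expected in the environment:

# Pull a secret from Vault and render it into a Kubernetes manifest
- name: Render deployment manifest with a secret from Vault
  vars:
    db_password: "{{ lookup('community.hashi_vault.hashi_vault', 'secret/data/mpk/production/main/akbarsbank-adapter:db_password') }}"
  template:
    src: deployment.yaml.j2
    dest: build/deployment.yaml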

With Vault, we have secret management tools that allow groups of users to change secrets and notify DevOps to deploy applications. This allows the maintenance team to change and manage passwords, and gives security more control over the process.

Targeted Vault implementation via vault-k8s injections

Our target architecture looks more complex and includes Vault integration with Kubernetes and automatic secrets updates without DevOps involvement.

In this diagram, we have a cluster including Vault, Git, Ansible, and Kubernetes (k8s), shown on the left, as well as our application running on Kubernetes.
When a pod starts, an Init container is started. This container accesses Vault to get secrets, which are then passed to the container with our application. Vault interacts with Kubernetes as an authentication service.

The init container is launched on behalf of an account inside the Kubernetes cluster via the ServiceAccount mechanism. Vault trusts Kubernetes to generate and verify tokens for the init container. As a result, Kubernetes does not store a "password for all passwords", only short-lived service tokens.
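A sketch of how this trust is configured on the Vault side; the role, namespace, and policy names are assumptions:

# Enable Kubernetes authentication and point Vault at the cluster API
vault auth enable kubernetes
vault write auth/kubernetes/config \
    kubernetes_host="https://kubernetes.default.svc:443"

# Bind a ServiceAccount to a policy: pods using this account get short-lived tokens
vault write auth/kubernetes/role/akbarsbank-adapter \
    bound_service_account_names=akbarsbank-adapter \
    bound_service_account_namespaces=mpk-production \
    policies=mpk-production-read \
    ttl=20m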

Next, the init container fetches the secrets and passes them to the application containers on the fly at startup. As a result, there are no application secrets in the manifests, and none in the etcd configuration of Kubernetes itself.

There is a complication in this scheme: if we change secrets in Vault and want them to appear in the application, it needs to be restarted. Only then will the init container process run again. To solve this problem, we use a sidecar: an additional container that runs in the pod of our application and watches for secret changes. When a secret changes, the sidecar updates it in the application or restarts the application.
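With the vault-k8s injector, both the init container and the sidecar are switched on through pod annotations; a minimal sketch of a Deployment fragment, where the role name and secret path are assumptions:

spec:
  template:
    metadata:
      annotations:
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/role: "akbarsbank-adapter"
        # the init container renders this secret to a file before the app starts
        vault.hashicorp.com/agent-inject-secret-db.properties: "secret/data/mpk/production/main/akbarsbank-adapter"
        # keep the agent running as a sidecar so the file is refreshed when the secret changes
        vault.hashicorp.com/agent-pre-populate-only: "false"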

To ensure that the application can read the secrets passed to it by the init container or sidecar, the following scheme is used by default:

  • The containers in the pod share a disk in RAM (an in-memory volume); the file with secrets is written there and mounted into the application container.

  • The application reads these secrets as if they were from a regular configuration file.

This approach provides a high level of security because secrets are not stored on disk, but only in RAM.
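If you wire this up manually rather than through the injector, the shared in-memory disk is an emptyDir volume backed by memory; a minimal sketch with assumed names:

volumes:
  - name: vault-secrets
    emptyDir:
      medium: Memory        # tmpfs: lives only in RAM, never touches disk
containers:
  - name: app
    volumeMounts:
      - name: vault-secrets
        mountPath: /vault/secrets
        readOnly: true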

If the application cannot dynamically read secrets from files, then each time the secrets are changed, the application must be restarted. To avoid this, you need to make changes to the application itself. This is a separate topic and a target task for development.

The main drawback of a centralized secrets store is that, if you do not pay enough attention to it, Vault becomes a single point of failure. You need to monitor the Vault logs carefully and rotate keys as needed. If the audit log fills up, Vault stops serving requests, and all applications that depend on secrets stop working.

Methods for restarting pods

It is not always possible to afford the resources to run an additional sidecar, especially if you have several hundred services. In that case, you can let the maintenance team restart pods without changing the Kubernetes manifests. The simplest method is the kubectl rollout restart command. The figure below shows other ways to restart applications in Kubernetes, including automatic ones. It is important that the maintenance team is aware of this process and is prepared to respond to possible problems.
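As a quick reference, the simplest variant looks like this (the deployment and namespace names are illustrative):

# Rolling restart without touching the manifests
kubectl rollout restart deployment/akbarsbank-adapter -n mpk-production

# Watch the rollout and roll back if something goes wrong
kubectl rollout status deployment/akbarsbank-adapter -n mpk-production
kubectl rollout undo deployment/akbarsbank-adapter -n mpk-production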

Vault Features

Vault's features carry risks that may render the tool inoperable.

  1. Unsealing the vault, decrypting, and reading at startup are critical operations and must be monitored to ensure proper operation of the Vault.

  2. It is important to monitor the audit log and not allow it to become full. Setting up policies for audit usage and cleanup is also important to ensure stable operation.

How accesses are linked to secrets

Secrets in the Vault are organized into a hierarchical structure resembling a tree. This means that they are divided into categories and subcategories for easier management.

For every role (user, administrator, IS), the relevant policies are defined. A policy in Vault is like an access control list that determines what actions can be performed on particular secrets.

For example, a policy might allow reading, writing, or deleting secrets in a particular path based on the role of a user or group.

path "secret/*" {
  capabilities = ["create", "read", "update", "patch", "delete", "list"]
}

Wildcards can be used in Vault policy paths (a trailing * or a + for a single path segment). In fact, you can even substitute the identity attributes of the user accessing Vault directly into these policies via templated policies.
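A sketch of such a templated policy; the "product" entity metadata key is an assumption and has to be populated on the identity side:

# KV v2 API paths include the data/ segment
path "secret/data/{{identity.entity.metadata.product}}/production/*" {
  capabilities = ["read", "list"]
}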

You can also walk the key hierarchy and apply access rights based on the mask you set. Detailed information on what rights can be configured for access to secrets is available in the official Vault policy documentation.

In policies, we specify the path to secrets, a set of specific secrets, and define what access rights we want to grant to this set. This policy is then linked to users, AD groups, and other identities.

Structure of secrets

One of the key points in using Vault is to correctly define the structure of secrets. If you make a mistake at this stage, the process of secrets management will turn into a nightmare, and changing the structure will be akin to moving the entire infrastructure.

The simplest, but also the worst, option is a flat list of microservices. The simplicity hides the cost of maintenance: the administrator has to write a policy for each service, and there end up being several times more policies than services once user roles, development environments, and so on are taken into account.

The next option that comes to mind is grouping services by product. This is much better, because we can set up a couple of policies using wildcards and thus isolate teams from each other. The solution is really good, but only if you have a single deployment environment :-). That does not happen in real life. Another anti-pattern is to add environments one level below services. If access to different environments is separated, you will again have to create a pile of policies for different user roles. After all, maintenance usually only works with production and does not need the test environments, while developers need a development environment and production is usually closed to them.

We studied these points and developed an optimal structure:

  • The hierarchy starts with the product, followed by the environment in which the microservice runs: production, test, etc.

  • We also use a further division into sub-environments, for example integration, functional, or per-team ones. Each of them can have its own access policies and its own threat model.

  • On top of the sub-environments, we separate shared secrets into a “commons” folder. This avoids duplicating secrets across services.

  • At the very end of the hierarchy comes a folder named after the source code repository of the specific microservice.

Here is an example of the structure of secrets for clarity:

  • Product: MPK

    • Environment: Production

      • Sub-environment: Main

        • Commons – Common secrets used by all MPK microservices (RabbitMQ, Kafka, etc.)

        • akbarsbank-adapter (specific MPK microservice from Git repository)
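In terms of Vault paths (assuming the KV engine is mounted at secret/ and lower-case names), this hierarchy looks roughly like this:

secret/mpk/production/main/commons/rabbitmq
secret/mpk/production/main/commons/kafka
secret/mpk/production/main/akbarsbank-adapter/<secrets of this service>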

This structure allows us to efficiently organize and manage access to secrets based on the environment and user role.

Roles

We have identified three main types of roles for ourselves. The figure shows their description.

Thanks to the roles, the secret hierarchy, and Vault's rich policy functionality, we can take into account the needs of all teams and their specifics. We do not need to force everyone into one process. For example, in one place QA needs additional rights to the integration environment, and in another a systems analyst needs rights to a specific service in production. All of this is flexibly configured by adding new custom roles.

Conclusions

We are now at an intermediate point.

  • Implemented Vault and the process of changing secrets.

  • Secrets are inserted into manifests at build time via Ansible scripts.

  • We removed team leads from the process and are not wasting their valuable time.

  • When changing secrets, only support and DevOps are involved.

  • The time to change secrets has been reduced from one or two days to one hour.

  • Incidents on production caused by secrets have practically disappeared; we now update all secrets on time.

Of course, we want more: to finally build that one super-secure button. But that will be in the next episode.
