harmony between Terraform and Ansible

In a DevOps world where automation plays a key role, managing resources and infrastructure upgrade processes in the cloud is a mission-critical task. Many modern projects, especially those deployed in the AWS cloud environment, use Auto Scaling Groups (ASG) to achieve three main goals: load balancing, improving service reliability, and optimizing operating costs.

Imagine this: you work for a company that deploys its applications on AWS. These applications matter because they serve thousands of users every day, so they run in Auto Scaling Groups for exactly the reasons listed above.

To speed up deployment and simplify configuration management, you use pre-built AMIs. These images are created with tools such as HashiCorp Packer and contain everything you need to get your application up and running quickly and smoothly. To deploy the infrastructure itself, you use Terraform, which has become a de facto standard in many large companies that manage cloud resources with the IaC (Infrastructure as Code) approach.

But sometimes you need to upgrade your instances to a new AMI version, whether to install the latest security updates or to add new functionality. And this is where the difficulties begin. How do you update an already running ASG without downtime? How can you ensure that the new AMI will perform as well as the old one?

Fortunately, ASG provides a solution in the form of instance refresh, a powerful tool that lets you upgrade instances within a fleet while minimizing downtime and maintaining high availability. That covers the rollout itself, but how can you make sure that the update was successful, especially in large and complex systems?

Unfortunately, Terraform resources (for example, aws_autoscaling_group itself) cannot track the progress and success of an ASG update performed through instance refresh; they can only start it. If other parts of the infrastructure (for example, certificate updates or DNS records) depend on the state and version of the running instances, it is advisable to wait for the update to complete so that the infrastructure is in the correct state when Terraform finishes.

To solve this problem, we bring Ansible into play. This tool has a long, proven track record in configuration management and automation, and it can help here too: Ansible lets us watch the update process and confirm that it completed successfully. By combining Terraform and Ansible, you get a powerful and flexible solution for managing and updating ASGs on AWS.

1. Prepare Terraform:

The first step is to create a Terraform configuration that provides the necessary structure and process for updating the ASG.

resource "aws_autoscaling_group" "example" {
  desired_capacity     = 3
  max_size             = 5
  min_size             = 2
  vpc_zone_identifier  = ["subnet-0bb1c79de3EXAMPLE"]

  launch_template {
    id      = aws_launch_template.example.id
    version = aws_launch_template.example.latest_version
  }
  
  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 100
      instance_warmup        = 120
    }
    triggers = ["tag"]
  }

  health_check_type          = "EC2"
  force_delete               = true
  wait_for_capacity_timeout  = "0"
}

Detailed analysis of the instance_refresh block:

This block is important because it sets the parameters for updating instances in ASG:

  • strategy = "Rolling": This strategy ensures that the update is performed incrementally, minimizing potential service availability issues.

  • preferences: This block contains two key settings:

    • min_healthy_percentage = 100: Specifies that 100% of the group's capacity must remain healthy during the upgrade, which is critical to maintaining service reliability.

    • instance_warmup = 120: This is the time in seconds that a new instance is given to warm up before it is considered ready and the refresh moves on.

  • triggers = ["tag"]: These are additional ASG attributes whose change starts an instance refresh. A change to the launch template always starts a refresh; listing "tag" here also starts one when the resource tags change.

Pay special attention to the launch_template block as well. People often simply write `version = "$Latest"`. Avoid this. With $Latest, the autoscaling group always uses the latest version of the launch template when it creates new EC2 instances, but instances that are already running are not updated when the template changes.

To trigger an instance refresh when the template changes, use the latest_version attribute of the aws_launch_template resource as the template version. Then, whenever the template changes and Terraform is applied, Terraform sees the new version number in the ASG configuration and starts an instance refresh.

Now let’s add an Ansible call to Terraform, which we need to control the update process. To do this, we use a special Terraform resource known as null_resource:

resource "null_resource" "ansible_run" {
  # Re-run the provisioner whenever the ASG's launch template version changes
  triggers = {
    template_version = aws_autoscaling_group.example.launch_template[0].version
  }

  provisioner "local-exec" {
    command = join(" ",
      [
        "ansible-playbook ${path.module}/asg_refresh_waiter.yml -i 'localhost,'",
        "-e asg_name=${aws_autoscaling_group.example.name}"
      ]
    )
  }
}

In Terraform, null_resource is a way of performing actions that are not associated with any actual cloud provider resource. This resource is ideal for integrating external tools such as Ansible.

  1. Triggers: triggers is a construct in Terraform that specifies under what conditions a resource should be recreated. In our case, every time the launch_template version in aws_autoscaling_group.example changes, Terraform will run the Ansible playbook. This ensures that after each ASG update Ansible is called to monitor the status of the instance refresh.

  2. Provisioner “local-exec”: This provisioner tells Terraform to execute the command on the local machine. In this case, we are running the Ansible playbook.

    • ansible-playbook ${path.module}/asg_refresh_waiter.yml

      indicates the path to our playbook.

    • -i 'localhost,' tells Ansible to run on the local machine.

    • -e asg_name=${aws_autoscaling_group.example.name} passes Ansible the name of the autoscaling group to work with.

This way, every time Terraform updates the ASG due to changes in launch_template, it automatically calls Ansible to monitor the instance refresh process.
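To make the assembled command concrete, here is a small Python sketch of roughly what the local-exec provisioner ends up running. It is an illustration only: the helper name build_command, the module path ./modules/asg and the ASG name example-asg are hypothetical, and Terraform hands the string to a shell rather than to Python.

```python
import shlex


def build_command(module_path: str, asg_name: str) -> str:
    """Join the two fragments the same way join(" ", [...]) does in the Terraform code."""
    parts = [
        f"ansible-playbook {module_path}/asg_refresh_waiter.yml -i 'localhost,'",
        f"-e asg_name={asg_name}",
    ]
    return " ".join(parts)


# Hypothetical values standing in for path.module and the ASG name.
command = build_command("./modules/asg", "example-asg")

# Split the string the way a POSIX shell would, to show the final argument list.
print(shlex.split(command))
```

The trailing comma in 'localhost,' is what makes Ansible treat the value as an inline host list rather than as a path to an inventory file. If the playbook exits with a non-zero status, the provisioner fails and so does terraform apply, which is what lets the playbook gate the rest of the run.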

2. Compose Ansible Playbook:

Let’s now move on to developing an Ansible playbook that will track the update process based on the data received from AWS. As we can see from the code above, we need a file called asg_refresh_waiter.yml, which we will place in the same directory as the code of our Terraform module.

---
- name: ASG Refresh Handler
  hosts: localhost
  gather_facts: false
  connection: local
  tasks:

    - name: Obtain ASG Information
      amazon.aws.ec2_asg_info:
        name: '{{ asg_name }}'
      register: asg_status

    - name: Display ASG Instances
      debug:
        msg: '{{ asg_status.results[0].instances }}'

    - name: Display ASG Launch Template Info
      debug:
        msg: '{{ asg_status.results[0].launch_template }}'

    - name: Await Instance Refresh Completion
      amazon.aws.ec2_asg_info:
        name: '{{ asg_name }}'
      register: updated_asg_status
      retries: 300
      until:
        - >-
          updated_asg_status.results[0].instances
            | map(attribute="launch_template.version")
            | union([updated_asg_status.results[0].launch_template.version])
            | length == 1
        - >-
          updated_asg_status.results[0].instances
            | map(attribute="launch_template.version")
            | unique
            | length == 1
      when: asg_status.results[0].launch_template.version is defined

    - name: Display Updated Instances
      debug:
        msg: '{{ updated_asg_status.results[0].instances }}'

Let’s look at the details:

  • Obtain ASG Information: This task retrieves current information about the ASG, allowing you to assess whether an update is required and possible.

  • Display ASG Instances and Display ASG Launch Template Info: These tasks help with debugging by displaying current information about the state of the instances and the launch template.

  • Await Instance Refresh Completion: This is the heart of our playbook. Here we use the retries/until mechanism, which allows us to track the update process until it is completed:

    • retries: 300 means that the task is repeated up to 300 times, until the until condition is satisfied.

    • The until key lists two conditions, and both must hold for the update to be considered complete.
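The retries/until mechanism is essentially a polling loop. The Python sketch below models it in simplified form to show how the waiting budget adds up. The names wait_until, fetch and is_done are hypothetical, and the 5-second pause mirrors Ansible's default delay between retries, which the playbook does not override.

```python
import time
from typing import Any, Callable


def wait_until(
    fetch: Callable[[], Any],
    is_done: Callable[[Any], bool],
    attempts: int = 300,
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll fetch() until is_done(result) holds or the attempts are used up."""
    for attempt in range(1, attempts + 1):
        result = fetch()
        if is_done(result):
            return result
        if attempt < attempts:
            sleep(delay)  # pause between polls, like Ansible's delay
    raise TimeoutError(f"condition not met after {attempts} attempts")


# Simulated ASG polls: the refresh finishes on the third poll.
polls = iter(["in progress", "in progress", "done"])
waited = []  # records the pauses instead of really sleeping
state = wait_until(lambda: next(polls), lambda s: s == "done", sleep=waited.append)
print(state, sum(waited))  # done 10.0
```

With 300 retries and a 5-second delay, the playbook waits for roughly 25 minutes before giving up, not counting the time each AWS API call takes. For fleets that take longer to roll, raise retries or set delay explicitly on the task.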

Parsing the conditions in the until block:

In the Await Instance Refresh Completion task, the until block contains two checks. They are needed to make sure that all instances have been updated to the latest version of the launch template.

  1. First check:

updated_asg_status.results[0].instances
  | map(attribute="launch_template.version")
  | union([updated_asg_status.results[0].launch_template.version])
  | length == 1

This check does the following:

  • Retrieves the launch template versions of all instances in the ASG.

  • Merges the resulting list of versions with the version of the ASG launch template (a set union, so duplicates collapse).

  • Checks that all versions are the same, that is, the list contains only one unique version.

  2. Second check:

updated_asg_status.results[0].instances
  | map(attribute="launch_template.version")
  | unique
  | length == 1

The second check ensures that there are no differences between versions of Launch Templates among instances, ensuring that all instances are updated to the latest version.
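To see how the two checks behave on concrete data, the logic can be modelled in plain Python. This is a sketch and not part of the playbook: refresh_complete and make_asg are hypothetical helpers, and the sample dictionaries only imitate the shape of results[0] that the playbook reads.

```python
def refresh_complete(asg: dict) -> bool:
    """Mirror the two until checks for one entry of results (hypothetical helper)."""
    instance_versions = [
        inst["launch_template"]["version"] for inst in asg["instances"]
    ]
    asg_version = asg["launch_template"]["version"]

    # First check: instance versions united with the ASG version form one value.
    first = len(set(instance_versions) | {asg_version}) == 1
    # Second check: the instances themselves report exactly one version.
    second = len(set(instance_versions)) == 1
    return first and second


def make_asg(asg_version: str, instance_versions: list) -> dict:
    """Build sample data shaped like results[0] in the playbook."""
    return {
        "launch_template": {"version": asg_version},
        "instances": [{"launch_template": {"version": v}} for v in instance_versions],
    }


print(refresh_complete(make_asg("7", ["6", "7", "7"])))  # False: one old instance left
print(refresh_complete(make_asg("7", ["7", "7", "7"])))  # True: refresh finished
print(refresh_complete(make_asg("7", [])))               # False: no instances yet
```

The last case hints at what the second check adds in this model: with an empty instance list the first check alone would pass, because the union then contains only the ASG's own version.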

Conclusion

If everything is done correctly, then when a new version of the AMI appears and the Terraform code runs, the launch_template version for the autoscaling group is updated and the instance refresh process starts automatically. Terraform then launches the Ansible playbook with the parameters we specified, passing it the name of the autoscaling group.

The playbook polls the ASG state and the launch template version of the running instances for a limited time, waiting until every running instance uses the same version as the updated launch_template version of the ASG.

The example playbook is quite universal and depends on a single input parameter: the name of the autoscaling group. It can therefore be used in almost any environment and with any Terraform code without modification.

I hope that this example of a combination of Terraform and Ansible will help someone build a more efficient and reliable service update system.
