Trimming Lines: Migrating from Puppet Enterprise to Ansible Tower. Part 2

The National Environmental Satellite, Data, and Information Service (NESDIS) reduced its Red Hat Enterprise Linux (RHEL) configuration management costs by 35% by moving from Puppet Enterprise to Ansible Tower. In this “how we did it” talk, systems engineer Michael Rau explains the reasoning behind the migration and shares useful tips and lessons learned from moving from one SCM to another.

From this video you will learn:

  • how to justify to management the feasibility of moving from Puppet Enterprise to Ansible Tower;
  • what strategies to use for the smoothest possible transition;
  • tips for converting PE manifests into Ansible playbooks;
  • best practices for installing Ansible Tower.

Trimming Lines: Migrating from Puppet Enterprise to Ansible Tower. Part 1

I have tags for the initial deployment that run everything, and tags that check the 20% of the infrastructure where 80% of the changes happen during working hours. The situation with roles is similar: there is one piece of code for maintenance, a second that organizes deployment, and a third that runs against the hardware, say, once a day. A role checks all the equipment at deployment time, while equipment that is subject to frequent change is checked every two hours. This applies to the firewall, some administrative things, and the like.
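Here is a minimal sketch of how such a split might look in a playbook; the role names and tag scheme are hypothetical, purely to illustrate the idea:

```yaml
---
# site.yml - a full run executes everything; scheduled jobs
# narrow the scope with --tags
- hosts: all
  roles:
    - { role: baseline, tags: ["deploy"] }    # initial deployment only
    - { role: firewall, tags: ["frequent"] }  # the 20% that changes often
    - { role: hardware, tags: ["daily"] }     # checked once a day
```

The full deployment runs the playbook with no tag filter, the two-hour job runs `ansible-playbook site.yml --tags frequent`, and the daily job uses `--tags daily`.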

Use the good habits you acquired when writing code for Puppet, because they will come in handy with Ansible. Take idempotency: an operation is idempotent if applying it a second time gives the same result as the first. In our case, this means that when you run the same playbook again, nothing changes, and the system reports that nothing has changed. If changes are reported on a repeat run, something is wrong with your code. In other words, idempotency helps you detect when something goes wrong during routine, repeated operations.
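As a sketch of the difference, compare a raw shell command with a purpose-built module (the directory path is just an example):

```yaml
---
# Not idempotent: shell reports "changed" on every run,
# even when nothing actually needed to be done.
- name: create work dir (bad)
  shell: mkdir -p /opt/app && chmod 0755 /opt/app

# Idempotent: file only reports "changed" when it had to fix something.
- name: create work dir (good)
  file:
    path: /opt/app
    state: directory
    mode: '0755'
```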
Use facts and templates, and avoid hard-coded data. Ansible lets you do this with inventory scripts, flexibly manipulating data sets and catching changes on the fly. Puppet has nothing like this: you have to put all the important data inside the source code. So use roles; they will make your task much easier.
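As a small illustration, here is a fact-driven task with nothing hard-coded (the file and its contents are just an example):

```yaml
---
# Every value comes from gathered facts, not from literals in the code.
- name: write MOTD from gathered facts
  copy:
    dest: /etc/motd
    content: |
      Host: {{ ansible_hostname }}
      OS:   {{ ansible_distribution }} {{ ansible_distribution_version }}
      RAM:  {{ ansible_memtotal_mb }} MB
```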

Use handlers, which are triggered by a configuration change. The good thing is that handlers can work with protected sequences. Document everything you do. I hate coming across code I wrote six months ago without documenting anything at the time, unable to remember why I wrote it at all. So every task should have a description. Every role and playbook you write should have its own README recording how to use it and what it does. Believe me, this will be very useful to you later.
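A minimal sketch of a handler triggered by a configuration change (the service and template names are illustrative):

```yaml
---
- hosts: all
  tasks:
    - name: deploy sshd configuration
      template:
        src: sshd_config.j2
        dest: /etc/ssh/sshd_config
      notify: restart sshd      # fires only if the file actually changed

  handlers:
    - name: restart sshd
      service:
        name: sshd
        state: restarted
```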

Take advantage of the new freedoms that Ansible provides. You can touch the same file several times if necessary. For example, if you need to change several parameters of sshd_config from one playbook, while that file also carries parameters important to other playbooks, you can do it. Puppet does not allow this: declaring the same resource twice there is a duplicate-resource error.
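A sketch of what that looks like in practice: two separate tasks, possibly living in different roles, each managing its own line of the same file:

```yaml
---
# Two tasks may manage the same file; in Puppet the second
# declaration of the same resource would be an error.
- name: disable password authentication
  lineinfile:
    path: /etc/ssh/sshd_config
    regexp: '^PasswordAuthentication'
    line: 'PasswordAuthentication no'

- name: limit authentication attempts
  lineinfile:
    path: /etc/ssh/sshd_config
    regexp: '^MaxAuthTries'
    line: 'MaxAuthTries 3'
```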
With Ansible, you get predictable execution: playbooks work exactly as you expect, and you do not need to chase code-execution errors. If you work with exec, be aware that Ansible's handlers are smarter than Puppet's. Use Ansible tricks like my favorite, delegation. Say you run a playbook against server A, but some of its steps need to be performed on server B. With the delegate_to keyword you can run a task on a host other than the one the play targets. The task still executes once for each machine in the play, but instead of running on the target machine it runs on the delegated host, while all the gathered facts still apply to the current host. I used delegation during the migration from Puppet to Ansible, and it significantly reduced the amount of manual configuration.
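A minimal delegation sketch: the play targets the web servers, but one step runs on a different machine (the admin host and the update-dns script are hypothetical):

```yaml
---
- hosts: webservers
  tasks:
    - name: register this host in DNS
      command: /usr/local/bin/update-dns {{ inventory_hostname }} {{ ansible_default_ipv4.address }}
      delegate_to: admin.example.com   # runs on the admin host, once per web server
```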

Your Puppet Hiera data can be reused as Ansible facts, the information about connected nodes. Facts are what the Gather Facts step collects during execution: disk space, operating system type and version, hostname, available memory, processor architecture, IP addresses, network interfaces and their status. I do not mean service information hidden deep inside the data; just use your hardware inventory scripts to form groups and node variables. My hardware script tells the system which physical sites it contains. Then I use roles; for example, one role contains infrastructure facts for each physical site, such as NTP sources, DNS servers, IP addresses of modular equipment, Nessus scanners, and the like. Collect common variables into one role if you use them in more than one place, and put them on the hosts as files under /etc/ansible/facts.d/.
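Local facts live in files with a .fact extension under /etc/ansible/facts.d/ (INI or JSON format) and appear under ansible_local after the next fact gathering. A minimal sketch for a hypothetical site:

```yaml
---
# Publish static site facts on the host...
- name: publish site facts
  copy:
    dest: /etc/ansible/facts.d/site.fact
    content: |
      [general]
      ntp_server=ntp1.example.com
      dns_server=10.0.0.53

# ...then any later play can read them back:
- name: show the site's NTP server
  debug:
    msg: "NTP for this site is {{ ansible_local.site.general.ntp_server }}"
```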

Now let's look at the actual migration process. I repeat: it took me a long time, but you can do it faster. First of all, you need to buy and deploy Tower. This is a very straightforward process; just follow the installation documentation.

After installing Tower, you immediately get access to the web interface. Next, you need to set up an inventory script and build a list of your equipment. You can copy and paste existing inventory scripts into Tower.

Next, set up Git credentials and permissions, giving Tower direct access to the Git repository where you store your playbooks. That way Tower instantly picks up changes from the repository and applies them. You do not need to tell Tower anything; it simply checks the state of the system and runs the latest version of the configuration.

Set up a standard Tower account for SSH access to your hosts. I use remote access, so I use a standard account plus sudo for privilege escalation. Of course, Tower stores the credentials itself, so using a sudo password does not put security at risk.
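A sketch of what those connection settings might look like in group variables (the account name is hypothetical; the actual SSH and sudo passwords belong in a Tower machine credential, not in the repository):

```yaml
---
# group_vars/all.yml - connect as the standard service account
ansible_user: tower-svc          # hypothetical service account
ansible_become: true             # escalate privileges for admin tasks
ansible_become_method: sudo
```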

Configure Tower authentication according to your organization's access structure: decide on access for departments and teams, map access to roles, and handle special permissions. This is a sizable task, depending on the size of your organization, but remember that from an administration point of view, Tower's flexible access configuration can make your life much easier.

Now that the Git repository is connected, set up and configure job templates for your playbooks. Test everything you do with Tower. Once you have verified that your playbooks are in perfect order, you can proceed with host migration. Use Ansible to remove the Puppet agent, and "clean up" each node on the Puppet server with a special playbook. It is very simple.

I created a group called Tower, added it to the Ansible inventory, and ran the playbook against all its hosts. That playbook stopped the Puppet services, uninstalled all Puppet Enterprise packages, and cleaned out the directories. It also deleted the Puppet users: since we are removing PE, we remove the PE users as well.
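A minimal sketch of such a cleanup playbook; the package, service, and path names follow the usual PE agent layout, but check them against your installation:

```yaml
---
- hosts: tower            # the migration group
  become: true
  tasks:
    - name: stop and disable the Puppet agent
      service:
        name: puppet
        state: stopped
        enabled: false

    - name: remove Puppet Enterprise packages
      package:
        name: puppet-agent
        state: absent

    - name: clean out Puppet directories
      file:
        path: "{{ item }}"
        state: absent
      loop:
        - /etc/puppetlabs
        - /opt/puppetlabs

    - name: remove the puppet user
      user:
        name: puppet
        state: absent
        remove: true
```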

Here we see delegation in action. Now I can go to the Puppet master, run two or three commands to clear the node from its registry, and take a screenshot showing that the SCM record has been deleted. It serves as documentary evidence that we no longer use PE.
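Those master-side commands can themselves be driven from the playbook via delegation; a sketch, assuming PE's `puppet node purge` command (check the equivalent for your version) and a hypothetical master hostname:

```yaml
---
# Runs once per migrated host, but executes on the Puppet master:
# deactivate the node and clean up its certificate and data.
- name: purge node from the Puppet master
  command: /opt/puppetlabs/bin/puppet node purge {{ inventory_hostname }}
  delegate_to: puppetmaster.example.com   # hypothetical master hostname
```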

Now let's look at what deserves the most attention: my mistakes, which you should avoid. This primarily concerns playbooks depending on each other: the variables of one playbook may depend on the variables of another. Remember that you can add requirements to each playbook that pull in another playbook or role. I created roles for the variables of all our sites, and each of these roles contained many variables, so I used a requirements.yml file to centralize the common ones. It lets you install multiple roles (or collections) with a single command.
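A minimal requirements.yml sketch (the repository URLs and role names are hypothetical); running `ansible-galaxy install -r requirements.yml` pulls everything in at once:

```yaml
---
# requirements.yml - shared roles pulled from internal Git repos
- src: https://git.example.com/ansible/roles/site-vars.git
  scm: git
  name: site_vars
  version: master

- src: https://git.example.com/ansible/roles/common-infra.git
  scm: git
  name: common_infra
  version: master
```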

If we change the default gateway or NTP server, these changes will immediately be reflected in all elements of the infrastructure.

Avoid massive, bulky roles and playbooks. Short, task-specific playbooks and roles are more efficient and reliable, easier to manage, and easier to track.

Pay attention to one more thing: when you launch Tower and open its page, you will see many parameters displayed in red. Red is an alarm color, but here is the thing. You start a job on hundreds of hosts, and if 99 of them succeed and one does not, Tower reports the job as failed and puts a bright red marker on the screen. Do not panic; just find out why that single node failed. On the Puppet master screen, 99 green lights and one red light make you think everything is basically in order. Tower is stricter, but less informative in its error messages. Perhaps future versions of Ansible will fix this shortcoming, but for now just track down the cause of the alarm, remembering that it may not be anything critical: the system is simply reporting one failure on one host.

Be careful if you have transient hosts that are not always online. For example, my system includes several laptops. We administer them with Ansible Tower in the same way as permanent hosts, but since they are mobile devices, they are not always on the network. If a scheduled job includes such hosts and Tower cannot reach them when the job starts, an alarm and a job failure message immediately appear on the screen. Tower has no idea that those laptops are simply turned off. There is no real problem here; just keep in mind that this situation can trigger an alarm.

There is a good way to handle this using the Tower API. When a laptop boots, as part of its standard startup procedure it calls the API to tell Tower: "hey, I'm here, is there any work for me?" Tower then checks whether there are jobs for that particular machine, because it now knows the machine is online.
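A sketch of such a boot-time call using Tower's provisioning-callback API; the Tower URL, job template ID, and host config key are all hypothetical placeholders:

```yaml
---
# Runs on the laptop at boot: ask Tower to launch its own
# configuration job via the job template's provisioning callback.
- hosts: localhost
  connection: local
  tasks:
    - name: phone home to Tower
      uri:
        url: https://tower.example.com/api/v2/job_templates/42/callback/
        method: POST
        body_format: form-urlencoded
        body:
          host_config_key: "{{ tower_host_config_key }}"
        status_code: 201
```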

Another thing we initially had problems with was parallel execution. By default, Ansible runs a job on 5 hosts at a time. So launching the same scheduled job for 100 machines, with a main-configuration check that takes up to 20 minutes, dragged on: first 5 hosts are configured, then the next 5, and so on. At first this made us seriously nervous, because deploying the system to 50 hosts took about 2 hours. The solution is simple: raise the number of parallel hosts on the Ansible Tower server above the default of 5. Since I have 150 hosts running at once, I set this value to 25; after that, 6 patches, for example, were installed quite quickly. If you wish, you can set it to 50; it all depends on how much computing power and RAM you have. In this way, Tower lets you tune parallel execution to suit your needs.
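Outside of Tower this is the forks setting in ansible.cfg; inside Tower, the same knob is the Forks field on a job template, capped by the system-wide limit. A minimal sketch:

```ini
# ansible.cfg - raise parallelism from the default of 5
[defaults]
forks = 25
```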

If you have any questions about the topic of this talk, feel free to use the contacts shown. You can see the email address where you can send a description of any problem you ran into when switching from Puppet to Ansible, and I will try to answer as soon as possible. Thank you for your participation, and I wish you a successful migration!
