Implementing CI/CD & DevOps in the Enterprise (Rostelecom) – Part 2
Rostelecom's Digital Products Platform: how it works
The Digital Products Platform (DPP) dates back to the summer of 2017. Its old name was Digital Sandbox, or simply Sandbox.
The infrastructure rests on two tenants built on OpenStack-KVM virtualization, located in independent data centers of the National Cloud Platform (NCP): the production environment at the M9 data center and the development environment at M10.
Because the company's divisions are isolated from one another, development teams needed a tool with ready-made integrations with Rostelecom's main services and products and the ability to set up a workplace instantly. The DPP became that tool.
Creating the platform saved significant time on developing digital services and products and simplified their development and operation, since there is no longer any need to prepare infrastructure from scratch. Teams can now quickly and efficiently create websites, web applications, chat bots and various integration services, and a development team inside the DPP can prototype, deploy and administer applications without configuring any infrastructure or underlying technologies. The time needed to provide a ready-made environment has been reduced to a few hours. All of this made the DPP the center of Rostelecom's IT core.
The DMZ-KSPD-NOP network segment allowed us to integrate with Rostelecom's internal systems, which later helped us launch services that process personal data in compliance with Federal Law FZ-152.
Thanks to all this, we were able to make the DPP an optimal solution for the Russian market: the platform core is built on open-source technologies – OpenShift, Kubernetes, Docker – that are free software and require no additional licenses, and it runs on National Cloud Platform resources with a set of services ready for use.
For clarity, here is an example:
The DPP toolkit allows you to manage cloud-native applications within a single DevOps cycle, fully implementing CI/CD practices (discussed in more detail below).
Each product (project) has its own set of permissions and resource limits in the DPP, while architectural requirements, resource-allocation rules, the role-based access model, and the approaches to publishing and replicating services are standardized.
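In OpenShift terms, such per-project resource limits are usually expressed as a ResourceQuota on the project's namespace. A minimal sketch, where the namespace name and the numbers are illustrative assumptions, not the platform's actual values:

```yaml
# Hypothetical per-project quota; the namespace and limits are illustrative only.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: project-quota
  namespace: sales-portal        # one namespace (OpenShift project) per product
spec:
  hard:
    requests.cpu: "8"            # total CPU the project's pods may request
    requests.memory: 16Gi
    limits.cpu: "16"             # hard ceiling across all pods in the project
    limits.memory: 32Gi
    pods: "40"
```

Standardizing such quotas per product keeps one noisy project from starving its neighbors on the shared tenants.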
Shared subsystems for collecting metrics and monitoring serve both the platform itself and provide uniform monitoring standards for every instance, service and application. This saves operating resources: there is no need to deploy a separate metrics-and-monitoring stack for each product or project, and development teams do not have to build their own tooling to export metrics and logs in specific formats.
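As a sketch of how one shared monitoring subsystem can serve every project, here is an illustrative Prometheus scrape configuration that discovers any pod opting in via a standard annotation; the job name is an assumption, not our actual configuration:

```yaml
# Illustrative shared Prometheus configuration: a single job discovers every
# pod that opts in via annotations, so projects deploy no monitoring stack of their own.
scrape_configs:
  - job_name: "platform-pods"
    kubernetes_sd_configs:
      - role: pod                # discover pods across the cluster
    relabel_configs:
      # keep only pods annotated prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
```

A project then only needs to expose a metrics endpoint and add the annotation to be picked up by the common monitoring.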
Finally, projects use various types of DBMS as data stores, depending on the architecture of a particular service: RDBMS (PostgreSQL, Oracle), NoSQL (Redis, MongoDB, Elasticsearch, OrientDB, Reindexer, ClickHouse) and time-series databases (Prometheus). For production environments, these stores usually run on separate virtual machines. Everything here depends on the business requirements for the availability and fault tolerance of a given project's services, as well as on compliance with the information-security regulations adopted by the company. It may sound boring, but this part of our bloody enterprise life is something we minimize with good practices, DPP tools and a sense of humor.
How we work at the level of CI/CD processes and tools
At different stages of the CI/CD cycle we use specialized solutions integrated into the DPP that automate the process:
● Development, testing, debugging
○ Jira + Confluence – requirements and task management, plus the knowledge base of each DPP project.
○ GitLab – code and configuration storage, integrated with Jira.
○ Nexus – artifact repository.
○ Rundeck – management of the build-test-release stages.
○ GitLab CI/CD, GitLab Runner – automation of CI/CD processes.
● Publishing applications to environments
○ Docker – a technology for packaging applications into containers that run in containerization environments.
○ OpenShift – an environment for publishing and managing containerized applications (pods based on Docker images), services and clusters.
○ Graylog, Sentry – collection and analysis of logs and errors, event auditing.
○ Prometheus, Zabbix, Grafana – monitoring.
A wide range of automated-testing solutions is used at all levels (for example, unit tests in the code, Selenium-based scripts for UI test cases, JMeter or Gatling for load tests, and so on).
The general CI/CD process, using the DPP as an example, looks like this:
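The build-test-release flow described above can be sketched as a simplified .gitlab-ci.yml; the job names, scripts and the NEXUS_REGISTRY variable are illustrative assumptions, not our actual configuration:

```yaml
# Simplified sketch of a build-test-release pipeline.
# $CI_REGISTRY_IMAGE and $CI_COMMIT_SHORT_SHA are standard GitLab CI variables;
# $NEXUS_REGISTRY and the scripts are hypothetical.
stages: [build, test, release]

build:
  stage: build
  script:
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA

test:
  stage: test
  script:
    - ./run_unit_tests.sh            # hypothetical test entry point

release:
  stage: release
  only: [master]
  script:
    # tag the already-tested image and push it to Nexus for later deployment
    - docker tag $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA $NEXUS_REGISTRY/app:latest
    - docker push $NEXUS_REGISTRY/app:latest
```

The key property is that only the image that passed the test stage ever reaches the artifact repository.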
Risk management in the course of work: CI/CD
Continuous Integration (CI)
● Updates to the code are made regularly, up to several times a day.
● CI tools (GitLab Runner, Rundeck, etc.) run scripts that automatically build and test the code for every change pushed to the repository.
● A new change is first tested in isolation (feature branches, unit tests, etc.); after its merge request is approved, it is merged into the main (release) branch, where the code is tested as a whole (integration tests, regression tests, etc.).
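The branch-dependent testing described above can be expressed with GitLab CI rules. A hedged sketch with hypothetical script names, not our actual pipeline:

```yaml
# Illustrative rules: unit tests on every merge request, heavier
# integration tests only after merge to the main (release) branch.
unit-tests:
  stage: test
  script:
    - ./run_unit_tests.sh            # hypothetical entry point
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"

integration-tests:
  stage: test
  script:
    - ./run_integration_tests.sh     # hypothetical entry point
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
```

This keeps fast feedback on feature branches while reserving the expensive test suites for the integrated code.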
Continuous Delivery + Continuous Deployment (CD)
● The tested software is deployed to a test, preproduction or production environment.
● CD implies automatic deployment of the code (with optional manual confirmation of the release installation) immediately after each new integration.
● Ideally, CD is a fully automatic process that runs without team intervention: if an update of a service instance fails, it is rolled back to its previous state, and cluster configurations are updated in a rolling manner, without long service outages. In most cases, pipelines are triggered automatically by a change to the code in the corresponding branch; some specific pipelines on test environments can also be started manually by a developer or tester from the Git interface. For the production environment, deployment is initiated from Rundeck by the project manager or someone who has been granted permission. Rundeck then takes the artifact that was previously verified on the test and preproduction environments from the Nexus repository and deploys it directly. Important: no code is built at this stage.
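As an illustration of a deployment step with manual confirmation, here is what such a gate looks like expressed in GitLab CI terms; in our setup the production deployment is actually triggered from Rundeck, and the script and variables below are hypothetical:

```yaml
# Sketch of a gated production deployment: the job exists in the pipeline
# but runs only after an authorized person confirms it.
deploy-prod:
  stage: deploy
  when: manual                  # a release manager must trigger the job
  environment: production
  script:
    # deploy the artifact already tested on lower environments;
    # nothing is rebuilt at this stage
    - ./deploy.sh "$NEXUS_URL/app-$VERSION.tar.gz"   # hypothetical script and variables
```

The manual gate preserves the "build once, deploy the tested artifact" principle while keeping a human decision point before production.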
Server Virtualization, Application Containerization and Cloud Native Architecture
● The solution architecture and CI/CD rely on Infrastructure as Code, Cloud Native, continuous configuration automation and similar approaches.
● Server landscape virtualization is used as IaaS in a private cloud, including automated VM deployment (Ansible, Terraform) and centralized monitoring of the VM infrastructure.
● Applications (services) are packaged into Docker containers with all their local dependencies; services are implemented according to Cloud Native principles.
● Containers are deployed in the OpenShift infrastructure or, in simple cases, via docker-compose, Swarm, etc.
● Automation tools (Liquibase, Flyway, etc.) can be used to manage changes to database schemas.
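For instance, a schema change managed by Liquibase might look like the following minimal changelog sketch; the table, column and author names are made up for illustration:

```yaml
# Minimal Liquibase changelog sketch (YAML format); names are illustrative.
databaseChangeLog:
  - changeSet:
      id: 1
      author: dev-team            # hypothetical author tag
      changes:
        - createTable:
            tableName: orders     # hypothetical table
            columns:
              - column:
                  name: id
                  type: bigint
                  autoIncrement: true
                  constraints:
                    primaryKey: true
              - column:
                  name: created_at
                  type: timestamp
```

Versioning schema changes alongside the code lets the same pipeline roll the database forward in step with each release.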
Cloud Native Application Architecture Principles
Let's highlight the ones we consider most significant, based on our point of view and experience:
● We separate stateful applications from stateless ones. Each request is atomic and can be served by any instance of a stateless application.
● We work in a cluster configuration with the ability to quickly replicate service instances, request routing and auto-discovery of instances.
● We configure services through environment variables, the service's own API, or config maps in OpenShift.
● We implement a health-check endpoint in each service or application so that containerization and orchestration systems can check the application's state and respond accordingly (written health checks).
● Logs and telemetry are published continuously and are stored and analyzed by systems such as Graylog, the Elastic Stack (Elasticsearch + FluentBit) and Prometheus.
● We integrate with Sentry (both the backend of a service and its frontend, if it has one). Importantly, this tool gets the development team more closely involved in operating their services in production environments.
● We handle errors correctly so that the orchestration system can recover by restarting the application or replacing it with a new copy.
● Services start and begin working without human intervention.
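Several of these principles (health checks, self-healing restarts, start-up without human intervention) come together in the pod specification. A minimal illustrative fragment, where the port, paths and image name are assumptions rather than a real DPP service:

```yaml
# Illustrative liveness/readiness probes for a containerized service.
containers:
  - name: app
    image: registry.example.com/app:1.0   # hypothetical image
    livenessProbe:
      httpGet:
        path: /healthz                    # restart the container if this fails
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 15
    readinessProbe:
      httpGet:
        path: /ready                      # remove from routing until this passes
        port: 8080
      periodSeconds: 5
```

With such probes in place, the orchestrator can restart a hung instance or take a starting one out of traffic with no operator involvement.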
GitLab CI and OpenShift Integration – Process Diagram
And, as examples, a couple of pipelines:
Pipeline of the project “Knowledge Management System” (KMS)
This example shows how the process is implemented on a project whose preproduction environment is located on the outer perimeter. The KMS project also illustrates our experience of introducing CI/CD into a mature legacy system, with refactoring and splitting the core into components.
Pipeline of the project “Motivational and Training Portal of Rostelecom Sales Managers”
This example shows how the process is implemented on a project whose preproduction and production environments lie entirely within the DPP perimeter and are managed by the development team.
CI/CD processes are implemented in stages: Rostelecom has developed a universal methodology and regulations for introducing them into projects and systems at any level of maturity. We will cover this in the next and final part of our series on promoting CI/CD & DevOps in the enterprise.
Follow Rostelecom IT news!