Setting up CI

CI/CD is usually considered a DevOps responsibility, and by and large that is true, especially for the initial setup. But developers often run into the need to tweak particular stages of the process. Being able to fix something minor yourself saves you from going to colleagues (and waiting for them to respond); overall it makes the work more comfortable and helps you understand why everything happens the way it does.

There are a lot of settings for the GitLab pipeline. In this article, without going deep into fine-tuning, we'll look at what a pipeline script looks like, what blocks it consists of, and what it can contain.

(picture from Gitlab CI/CD documentation)


The article was written based on materials from an internal introductory meeting for developers.

Hi all! My name is Denis, I am a DevOps engineer on an internal project of our company – Mondiad. In this article we will talk about how to write pipelines for GitLab CI/CD, which is what we mainly use on the project. We'll go over the contents of this script – how to read, understand and debug it.

The CI script is written in YAML. The single .gitlab-ci.yml file must sit directly in the project root: GitLab simply will not read it from any other folder, and the pipeline will not run.

Diagnostic Tools

To debug and understand the pipeline, you can use the tool built into the GitLab web interface itself: Build -> Pipeline Editor.

The tool interface has four tabs:

  • The Edit tab allows you to edit the pipeline. In principle, you don't even have to commit it – you can just work right there.

  • The Visualize tab shows all the stages and jobs that make up the pipeline.

  • Validate performs validation – it checks the syntax (otherwise you can write a pipeline that GitLab will not understand and will throw an error on).

  • The last tab, Full configuration, displays the full text of the pipeline script. When we use various tricks to shorten the code, Full configuration shows the result with all of them expanded. It's convenient to write a script in the editor and then check Full configuration to see how GitLab has assembled it.

Minimal script

A minimal GitLab CI script looks like this:

stages:
  - build

TASK_NAME:
  stage: build
  script:
    - ./build_script.sh

The pipeline stage and task name are indicated here.

Features of the .gitlab-ci.yml file

Stage

https://docs.gitlab.com/ee/ci/yaml/#stages

In addition to the stages that we create ourselves, there are two hidden stages:

  • .pre – always executed first;

  • .post – always executed last. As a rule, it is used to clean up data (for example, if infrastructure was deployed for test jobs). .post runs even if errors occurred in individual stages.

These stages do not need to be declared. However, a pipeline cannot consist of only these two stages.

stages:
   - build

job1:
  stage: build
  script:
    - echo "This job runs in the build stage."

first-job:
  stage: .pre
  script:
    - echo "This job runs in the .pre stage, before all other stages."

last-job:
  stage: .post
  script:
    - echo "This job runs in the .post stage, after all other stages."

Default section

https://docs.gitlab.com/ee/ci/yaml/#default

This section contains a global set of job options that can be overridden later.

The most common:

  • before_script

  • after_script

  • cache

  • image

  • services

  • tags

  • retry

default:
  image: ruby:3.0
  retry: 2

This section is mainly used to shorten the body of the script itself.
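For instance, options set in default apply to every job unless a job overrides them. A sketch of how that plays out (the job names and the Ruby commands here are illustrative, not from the original article):

```yaml
default:
  image: ruby:3.0
  retry: 2
  before_script:
    - bundle install    # runs before every job that does not override before_script

rspec-job:
  script:
    - bundle exec rspec   # executes in ruby:3.0 with the default before_script

lint-job:
  image: ruby:3.2         # overrides the default image for this job only
  script:
    - bundle exec rubocop
```

In Full configuration you would see the default options copied into each job, which is exactly why this section shortens the script.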

Variables section

https://docs.gitlab.com/ee/ci/yaml/#variables

To control the build process, you can use variables from GitLab itself in the script. Most used:

  • $CI_PROJECT_DIR – full path to the project inside the container.

  • $CI_COMMIT_REF_NAME – the branch or tag name for which the pipeline is running.

  • $CI_COMMIT_TAG – the tag name. This variable is only set if the pipeline was triggered by a tag.

  • $CI_PROJECT_NAMESPACE – namespace of the project.

  • $CI_PROJECT_NAME – project name.

  • $CI_REGISTRY – path for GitLab registry.

  • $CI_REGISTRY_USER – user for GitLab registry.

  • $CI_REGISTRY_PASSWORD – password for GitLab registry.

A complete list of predefined variables is in the documentation: https://docs.gitlab.com/ee/ci/variables/predefined_variables.html

If these variables are not enough, you can set your own in the variables section and use them in tasks.

variables:
  DEPLOY_SITE: "https://example.com/"
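Custom and predefined variables can be mixed freely inside job scripts. A sketch (the deploy-job name and deploy.sh script are illustrative):

```yaml
variables:
  DEPLOY_SITE: "https://example.com/"

deploy-job:
  stage: deploy
  script:
    # Predefined variables ($CI_COMMIT_REF_NAME, $CI_PROJECT_NAME)
    # and the custom $DEPLOY_SITE are expanded the same way
    - echo "Deploying branch $CI_COMMIT_REF_NAME of $CI_PROJECT_NAME to $DEPLOY_SITE"
    - ./deploy.sh "$DEPLOY_SITE"
```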

Workflow section

https://docs.gitlab.com/ee/ci/yaml/#workflow

The next section allows you to define launch rules for the pipeline as a whole. In effect, this is also a way to shorten the script – default rules for all jobs in the pipeline.

workflow:
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
      variables:
        PROJECT1_PIPELINE_NAME: 'MR pipeline: $CI_MERGE_REQUEST_SOURCE_BRANCH_NAME'
    - if: '$CI_MERGE_REQUEST_LABELS =~ /pipeline:run-in-ruby3/'
      variables:
        PROJECT1_PIPELINE_NAME: 'Ruby 3 pipeline'
    - when: always  # Other pipelines can run, but use the default name

image option

https://docs.gitlab.com/ee/ci/yaml/#image

Our project mainly uses runners with Docker executors. image specifies the name of the image in which the job will be executed. For example, if the project uses Ruby, then we need an image with Ruby installed inside.

rspec:
  image: registry.example.com/my-group/my-project/ruby:2.7
  script: bundle exec rspec

Job Options

Option Tags

https://docs.gitlab.com/ee/ci/yaml/#tags

job:
  tags:
    - ruby
    - postgres

tags determines which runner the job will be executed on. For example, we may need a runner in a test environment that has access to a certain database. In that case, you must specify the tag of the appropriate runner.

before_script / script / after_script options

https://docs.gitlab.com/ee/ci/yaml/#before_script

https://docs.gitlab.com/ee/ci/yaml/#script

https://docs.gitlab.com/ee/ci/yaml/#after_script

The main body of our task can be divided into three parts:

  • before_script is a set of commands executed right before the job body, but after all artifacts and the cache have been fetched. Typically, commands that prepare the environment for the job go there.

  • script is the body of the job. Essentially it is the set of commands that we would, roughly speaking, type into the shell to do the work – a build, a playbook run.

  • after_script – a set of commands executed after the job body. Importantly, this set always runs, even if the job fails. The only case in which after_script will not run is when before_script fails, since then the job body does not even start.
    Another important point: after_script applies only to its specific job. If you need something to run after a whole stage, there is .post.
    Typically, after_script holds commands that clean up sensitive data after the job – in the example below, the key used for access. It can also hold scripts that prepare data for tests or do other work that must happen regardless of the outcome.

You can use long commands in scripts…

Scripts can contain multi-line commands that begin with | or >. The runner interprets these symbols differently:

  • | (literal) – each new line is a separate command, just as in a shell script;

  • > (folded) – line breaks are folded into spaces, so the following lines continue one long command.

In older versions of YAML, only one of the symbols was available, but now you can use both.
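A minimal sketch of the difference between the two styles (the job name and echo commands are illustrative):

```yaml
literal-vs-folded:
  script:
    # | keeps the line breaks: the shell receives two separate commands
    - |
      echo "first command"
      echo "second command"
    # > folds the line breaks into spaces: the shell receives one long command
    - >
      echo "one long command"
      "continued on this line"
```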

test-stage:
  stage: deploy
  image: ansible:2.9.18
  before_script:
    - echo "$DEPLOY_KEY" > ~/.ssh/id_rsa
    - chmod 600 ~/.ssh/id_rsa
  script:
    - |
      echo "run"
      ansible-playbook -i inventory.ini playbook.yaml
  after_script:
    - rm ~/.ssh/id_rsa

Artifacts option

https://docs.gitlab.com/ee/ci/yaml/#artifacts

Next come the artifacts. Typically, they are used to store data produced while a job runs: compiled binaries, logs for further processing, and anything else (otherwise all of it is simply lost when the job finishes).

In this section we specify the paths (paths) to the files we want to collect. A path can use * as a wildcard; to include a folder and all its subfolders, use the **/* construction (note the two asterisks). exclude allows you to leave out certain files.

job:
  artifacts:
    paths:
      - binaries/
      - .config
    exclude:
      - binaries/**/*.o
    expire_in: 1 week

Another useful thing is expire_in. It keeps the artifact for a set amount of time, after which it is automatically deleted from GitLab so as not to take up space.

cache option

https://docs.gitlab.com/ee/ci/yaml/#cache

The next useful option is cache. It allows you to cache data that is created while a job runs and needs to be reused later, to speed up the pipeline (for example, so that the same Gradle does not re-download its dependencies every time). Caching is very often used for npm packages. paths specifies the path to the cached folder.

By default, the cache is used everywhere. This is where Key can be useful, with which you can assign a unique key to the cache. This can be useful when you need to have a different cache for each branch, for example, if different versions of packages are used.

job:
  script:
    - echo "This job uses a cache."
  cache:
    key: binaries-cache-$CI_COMMIT_REF_SLUG
    paths:
      - .gradle/caches

Option needs

https://docs.gitlab.com/ee/ci/yaml/#needs

The needs option allows you to set a dependency on an artifact or another job. For example, a certain artifact appears at a certain stage, and the job becomes dependent on it.

The option also allows you to speed up the transition to the next task (since with it you do not have to wait for the entire stage to be completed).

test-job1:
  stage: test
  needs:
    - job: build_job1
      artifacts: true

test-job2:
  stage: test
  needs:
    - job: build_job2
      artifacts: false

test-job3:
  needs:
    - job: build_job1
      artifacts: true
    - job: build_job2
    - build_job3

Services option

https://docs.gitlab.com/ee/ci/yaml/#services

The option allows you to launch an additional Docker container for the job – for example, one with a Postgres database, or docker:dind (a container that must be present when building Docker images). Since the job itself runs inside Docker, building images directly in it is not possible, so we delegate the build to this service.

test-job1:
  stage: test
  services:
    - name: docker:dind
    - name: postgres:9.6
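A typical sketch of building an image through the docker:dind service (the job name, image versions and tag scheme are illustrative; the registry variables come from GitLab):

```yaml
build-image:
  stage: build
  image: docker:24
  services:
    - docker:24-dind    # provides the Docker daemon the job itself lacks
  variables:
    DOCKER_TLS_CERTDIR: "/certs"
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker build -t "$CI_REGISTRY/$CI_PROJECT_NAMESPACE/$CI_PROJECT_NAME:$CI_COMMIT_SHORT_SHA" .
    - docker push "$CI_REGISTRY/$CI_PROJECT_NAMESPACE/$CI_PROJECT_NAME:$CI_COMMIT_SHORT_SHA"
```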

When option

https://docs.gitlab.com/ee/ci/yaml/#when

The next option – when – allows you to set the simplest conditions for starting a task. For example, if we want to run it only manually.

deploy_job:
  stage: deploy
  script:
    - make deploy
  when: manual

cleanup_build_job:
  stage: cleanup_build
  script:
    - cleanup build when failed
  when: on_failure

Rules

https://docs.gitlab.com/ee/ci/yaml/#rules

More complex launch rules can be built using rules. Here you can use a simple if condition (the task will run if, say, it was launched from a specific branch) or track changes in files using changes (the task will run if a specific file has changed).

job:
  script: echo "Hello, Rules!"
  rules:
    - if: $CI_MERGE_REQUEST_SOURCE_BRANCH_NAME =~ /^feature/
      when: never
      allow_failure: true
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
      changes:
        - Dockerfile
      when: manual
      allow_failure: true

How to shorten the script

Let's move on to the final stage – how to shorten the script without losing much of its readability. There are two approaches:

  • Use GitLab's built-in extends option https://docs.gitlab.com/ee/ci/yaml/#extends. It allows us to create something like a template where we place frequently repeated options. This is convenient if the script contains many jobs of the same type. Options from the extends target are merged with what is already in the job.

.my_extend:
  stage: build
  variables:
    USERNAME: my_user
  script:
    - extend script

TASK_NAME:
  extends: .my_extend
  variables:
    VERSION: 123
    PASSWORD: my_pwd
  script:
    - task script

What GitLab actually assembles (as seen in Full configuration) – note that the variables are merged, so USERNAME is kept:

TASK_NAME:
  stage: build
  variables:
    VERSION: 123
    PASSWORD: my_pwd
    USERNAME: my_user
  script:
    - task script

  • Use YAML anchors (reference links). It's almost the same: we also insert a reference, but it overwrites options rather than merging them. In some cases – for example, when you need to completely wipe the whole default set of variables – this approach is convenient. Anchors can be chained – each next one overwrites the previous one.

.my_extend: &my_extend
  stage: build
  variables:
    USERNAME: my_user
  script:
    - extend script

TASK_NAME:
  <<: *my_extend
  variables:
    VERSION: 123
    PASSWORD: my_pwd
  script:
    - task script

The resulting configuration – note that USERNAME is gone, because the variables map was overwritten rather than merged:

TASK_NAME:
  stage: build
  variables:
    VERSION: 123
    PASSWORD: my_pwd
  script:
    - task script

Author: Denis Palaguta, Maxilect.

Thanks to the Maxilect DevOps team for their help in preparing and commenting on the article.

PS We publish our articles on several Runet sites. Subscribe to our page on VK or to our Telegram channel to learn about all our publications and other Maxilect news.
