Analysis of Docker images for compliance requirements

Introduction

Hello everyone! My name is Maxim Chinenov, I work at Swordfish Security, where I implement, develop and research tools and processes related to Cloud & Container Security practices.

Today we will walk through OCI-image-compliance-scanner, a tool developed at our company to cover the task of auditing images against best layout practices.

Composition/Compliance Scan

This tool belongs to the Composition/Compliance Scanning class of solutions. As the name suggests, it can include any set of necessary checks: both of the startup configuration set by the developer when building the image, and of the contents of its file system.

The most common use case for such a tool is when you need to check images from public sources for compliance with your internal information security standards and best practices. This class of solutions is also used when receiving images from an external developer as part of a hardware and software complex.

In the above cases, there is a need to analyze images to identify defects at the acceptance stage in order to reduce potential information security risks during their further use. A problem arises: it is impossible to obtain reliable information about the assembly process itself and the base images used by the developer. This means that we will have to check the finished artifacts for the presence of certain components in the file system and the launch parameters set by the developer.

Typically, in organizations, such a process is formalized to some extent by internal documents. Often, the requirements for images are described in the “Test Program and Methodology” or “TPM” – this is a technical document that formalizes the stage of product testing. The TPM is designed to identify parameters that ensure compliance with various requirements, which may include a set of best industry practices and any other internal company requirements.

Similar requirements for image verification can often be found in banking organizations and government agencies, where information security is given a lot of attention.

What OpenSource solutions are there?

Having studied the available projects on GitHub, we did not find a solution that would fully cover the set of requirements of our customers, specifically when analyzing the layout of ready-made artifacts.

The following partially fit our parameters:

Dockle

A well-known Open Source tool that includes some image layout checks.

This solution has broader functionality: analysis for secrets, CIS checks, and also a Dockerfile linter. But we need checks specifically for images, and Dockle does not have enough of them.

Chef Inspec

Chef InSpec is a compliance-analysis framework that can also be used for testing Docker images.

In theory, this solution can cover all the tasks we need, but it is important to understand that you will also need to write your own checks for it, because the available library is not very wide. The tool may also not be well suited for use within CI/CD pipelines.

LinPEAS

Linux Privilege Escalation Awesome Script is a set of scripts for auditing the host operating system for the possibility of privilege escalation.

This is a powerful Linux system auditing tool that includes many checks. Unfortunately, most of them are not applicable to container environments and will be redundant for us.

Trivy

A comprehensive tool for vulnerability scanning, secret detection, IaC manifest analysis, and more.

It can be extended with functionality, including plugins, which are essentially separate executables invoked via the Trivy CLI, for example with the command trivy plugin_name --args.
Here, too, we would need to develop the checks we need ourselves.

Why do we need our own tool?

The first goal is to cover the requirements of our customers for some images taken from OpenSource. Since our company is also a supplier and developer of information security solutions, we need to audit the images, including for composition/compliance, before transferring them to the customer.

The second goal is academic. In this article, we will demonstrate the approaches used in such tools, and highlight the risks and some ways to mitigate them.

Since we needed a tool for analyzing images specifically, we decided not to modify the available tools, but to write our own necessary set of checks and package it in the form we needed for use in CI/CD.

Requirements for the scanner under development

Checks are checks, but first we need to determine what the scanner itself will look like and what functions it should perform. For this, we have identified the following functional requirements:

  1. Fully automated scanning
    The tool should accept the path to the image as input and perform all necessary checks.

  2. Integration into CI/CD pipeline
    To do this, all scripts are packaged into an image for launching it in the pipeline, and the ability to configure launch parameters through environment variables is also added.

  3. Working with remote registry
    Use the Podman, Docker, or Skopeo utilities to authenticate against a private registry and pull the image into our container for scanning. Since running a container from the scanned image is not required to perform the checks, we do not need solutions such as DinD or its analogues with container runtime socket forwarding. We settled on Podman for pulling images as the simplest option. You could use Skopeo, but then you would have to write additional conditions depending on the registry type. With Podman, we just need to run the command podman pull

  4. Possibility of setting a Security Gate depending on the result of the check
    If there are any defects in the image or any errors occur during the scanning stage, a non-zero exit code must be returned.

  5. Possibility of aggregation of received reports
    The JSON format must be supported for reports. For example, for subsequent processing and sending to the defect and risk management system.

  6. Outputting results in human-readable form
    The output must be in a readable format for its initial analysis.

What are we going to check?

Before implementing the checks themselves, let's figure out what a container image is and what it consists of:

Manifest

In addition to the set of files represented in the image, each image has a manifest and a description of its configuration. These entities are governed by the OCI Image Manifest Specification, which aims to standardize the container image format so that the same image can be run in different container environments that support the OCI standard.

Let's see what metadata the alpine:latest image contains:

$ podman image inspect alpine:latest
[
    {
        "Id": "05455a08881ea9cf0e752bc48e61bbd71a34c029bb13df01e40e3e70e0d007bd",
        "Digest": "sha256:c5b1261d6d3e43071626931fc004f70149baeba2c8ec672bd4f27761f8e1ad6b",
        "RepoTags": [
            "docker.io/library/alpine:latest"
        ],
        "RepoDigests": [
            "docker.io/library/alpine@sha256:6457d53fb065d6f250e1504b9bc42d5b6c65941d57532c072d929dd0628977d0",
            "docker.io/library/alpine@sha256:c5b1261d6d3e43071626931fc004f70149baeba2c8ec672bd4f27761f8e1ad6b"
        ],
        "Parent": "",
        "Comment": "",
        "Created": "2024-01-27T00:30:48.743965523Z",
        "Config": {
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
            ],
            "Cmd": [
                "/bin/sh"
            ]
        },
        "Version": "20.10.23",
        "Author": "",
        "Architecture": "amd64",
        "Os": "linux",
        "Size": 7671366,
        "VirtualSize": 7671366,
        "GraphDriver": {
            "Name": "overlay",
            "Data": {
                "UpperDir": "/home/chin/.local/share/containers/storage/overlay/d4fc045c9e3a848011de66f34b81f052d4f2c15a17bb196d637e526349601820/diff",
                "WorkDir": "/home/chin/.local/share/containers/storage/overlay/d4fc045c9e3a848011de66f34b81f052d4f2c15a17bb196d637e526349601820/work"
            }
        },
        "RootFS": {
            "Type": "layers",
            "Layers": [
                "sha256:d4fc045c9e3a848011de66f34b81f052d4f2c15a17bb196d637e526349601820"
            ]
        },
        "Labels": null,
        "Annotations": {},
        "ManifestType": "application/vnd.docker.distribution.manifest.v2+json",
        "User": "",
        "History": [
            {
                "created": "2024-01-27T00:30:48.624602109Z",
                "created_by": "/bin/sh -c #(nop) ADD file:37a76ec18f9887751cd8473744917d08b7431fc4085097bb6a09d81b41775473 in / "
            },
            {
                "created": "2024-01-27T00:30:48.743965523Z",
                "created_by": "/bin/sh -c #(nop)  CMD [\"/bin/sh\"]",
                "empty_layer": true
            }
        ],
        "NamesHistory": [
            "docker.io/library/alpine:latest"
        ]
    }
]

We are interested in the following fields:

  • Config: describes the parameters that will be applied to a container launched from this image. Contains fields such as [User, ExposedPorts, Env, Cmd, Volumes, Workdir, Entrypoint, etc.];

  • History: contains the build history of each layer;

  • GraphDriver: contains metadata about layers and the storage driver used.

Layers

Any image consists of one or more layers.
When a container is launched, a storage driver (OverlayFS by default, though not necessarily) is used to work with its file system; it combines all the layers and presents them to the container as a single structure.

Working with an image when using the “OverlayFS” storage driver using Docker as an example

Image layers are available only in Read-only mode. All write operations to the container's file system during its operation are performed in a separate layer, which is tied to each container and is unique.

Let's take a look at how layers are stored in Podman:

$ podman image inspect c8d36223ec8c | jq .[].GraphDriver
{
  "Name": "overlay",
  "Data": {
    "LowerDir": "/home/chin/.local/share/containers/storage/overlay/112a4d2b01d132d767dd308bf84cb1056ac9e9c69e20363e6f4e0408f6a9093c/diff:/home/chin/.local/share/containers/storage/overlay/8db3188b70bb36654ff2bd898e80b10755c11edd07594656eaf282006ff77d7a/diff:/home/chin/.local/share/containers/storage/overlay/3d3aed3edc819c2f725ac40628fe1a2be94f857eb4761d12493d0161a18dc6da/diff:/home/chin/.local/share/containers/storage/overlay/31f3b94693eedf746f21c88fded37fb678da561c6a0a504cda548ebb7faf649b/diff:/home/chin/.local/share/containers/storage/overlay/58f1e7fffb3a66240446f0a4902ae32f157782c01a0c3a5e81323e70b9ed4d17/diff:/home/chin/.local/share/containers/storage/overlay/fdf3ed5c79ccdf9fda551031d00e9068a46a586410f21f97401c77c49d642f8d/diff",
    "UpperDir": "/home/chin/.local/share/containers/storage/overlay/5922742b05d547c802ab6eb1a4129282baf33c0e73c4ee6d59c218e0f0e91ba3/diff",
    "WorkDir": "/home/chin/.local/share/containers/storage/overlay/5922742b05d547c802ab6eb1a4129282baf33c0e73c4ee6d59c218e0f0e91ba3/work"
  }
}

We see that the OverlayFS driver is used, and also:

  • LowerDir: contains a list of directories separated by “:”, they correspond to the same RO layers that the image consists of;

  • UpperDir: writable layer used by the container for write operations during its operation;

  • MergedDir: a service directory for internal use; the result of overlaying LowerDir and UpperDir. During mounting, directories with the same name in LowerDir and UpperDir are merged, and the merged view is exposed through MergedDir;

  • WorkDir: mandatory service directory. Used for OverlayFS operation.

We will be checking the file system contents in LowerDir, which contains the components present in the image itself.
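
As a quick sketch of this step (the helper name lower_dirs is ours, not from the tool), the list of read-only layer directories can be extracted from the inspect output like this. Note that a single-layer image, like the alpine example earlier, has no LowerDir at all; its only layer is then the one referenced by UpperDir:

```python
def lower_dirs(graph_driver):
    """Split GraphDriver.Data.LowerDir (a ':'-separated string) into a list.

    Single-layer images have no LowerDir key at all; in that case the image's
    only layer is the one referenced by UpperDir.
    """
    data = graph_driver.get("Data", {})
    lower = data.get("LowerDir", "")
    return lower.split(":") if lower else []
```

The input here is the GraphDriver object from podman image inspect, e.g. json.loads(...)[0]["GraphDriver"].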

Attack vectors under consideration

The checks we implement are primarily intended to mitigate the attack and make it more difficult for the attacker to further develop it if he has somehow already entered the container.
Let's look at what options there are in this case:

  1. Privilege Escalation
    Typically, if we consider security standards tailored for Linux environments, the development of an attack by escalating privileges will be described as implementations through service misconfigurations, manipulations with system services, running processes, etc. This is not particularly applicable to containers.
    To apply this specificity to container environments, we will need to recall the fundamental mechanisms of privilege escalation in Linux.

    Each process in Linux has its own attributes which include UID, GID, RUID, RGID, EUID, EGID.

    Let's take a brief look at each of them:

    UID and GID

In other words, the user and group. These attributes are recorded in the system file “/etc/passwd” for each user in the system.

    user:x:1000:1000:user,,,:/home/user:/bin/bash

    In this case, the UID and GID when initializing the “bash” shell on behalf of the user “user” will have the values 1000:1000

    RUID and RGID

    Each child process that starts in the current shell will inherit the attributes of the user account, i.e. will have the same RUID and RGID values.

    Let's run a simple command to demonstrate this:

    $ sleep 3600 & ps aux | grep 'sleep'

    And we will check the EUID, RUID, SUID, EGID, RGID, SGID of this process:

    $ ps -p $PID -o pid,euid,ruid,suid,egid,rgid,sgid,cmd
        PID  EUID  RUID  SUID  EGID  RGID  SGID CMD
     229878  1000  1000  1000  1000  1000  1000 sleep 3600

    As we can see, all values in this case are identical.

    EUID and EGID

    To understand how EUID and EGID differ from RUID and RGID, let's look at the well-known passwd utility as an example.

    $ ls -l $(which passwd)
    -rwsr-xr-x 1 root root 59976 Feb  6 15:54 /usr/bin/passwd

    The owner and group for the executable file are root:root. This is because the passwd utility needs to modify the /etc/shadow file, which requires root privileges to write to.

    The question arises: “How then can we use passwd from any user to change the password if we do not have root privileges?”
    Pay attention to the letter “s” instead of “x” in the owner portion of the file's permissions. This is a special permission bit known as setuid.

    This is where the purpose of the EUID and EGID attributes comes into play. When executing passwd, the process will change its EUID from the default to the owner of the executable file. In this case, root. The operating system kernel then decides whether to allow writing to the /etc/shadow file by looking at the EUID of the process. Since the EUID now points to root, the write operation will be allowed.

    We looked at the privilege escalation mechanism in Linux associated with the SUID bit.
    Since all privileges in the container are inherited from the process with PID 1, which is not necessarily launched as root, the possibilities for escalating privileges are mainly related to the use of SUID, SGID bits on executable files.

    It turns out that it is necessary to scan the image file system for the presence of SUID, SGID bits on files inside our images.

  2. Remote access
    Various components allow you to implement scenarios for remote access in a container. These can be both client and server applications. If an attacker gets into a container that also has an ssh client, for example, he gets another opportunity to expand the attack.

  3. Compilers
    Depending on the configuration of the runtime environment, kernel, and other conditions, you can come up with various development options: from compiling utilities for pentesting directly inside the container to assembling a kernel module or eBPF program and loading it into the host OS kernel.

  4. Image misconfigurations
    Here we highlight the default container launch from root and the absence of an unprivileged user.

  5. Misconfigurations of the application itself and its specific information security risks
    This may include vulnerabilities in the code and libraries used, unsafe handling of confidential data, lack of image integrity, and much more. To cover all possible risks, other tools and approaches are used at different stages of SSDLC: SAST, DAST, IAC, SecretScan, CodeReview, SupplyChain, RuntimeSecurity, Observability, Benchmarking, etc.

Examples and analysis of the resulting checks

Let's see what we ended up with, both in terms of the checks themselves and their implementation. I'll make it clear right away that I'm not a developer and I don't claim to have an optimal implementation in terms of code.

1. The image must not have the :latest tag

Description:
The image must be pinned to a specific version to keep our IaC manifests idempotent.

Check:
Image Tag Analysis

2. The image should not be run with superuser rights

Description:
The image must have both a non-root user and group to be able to run processes in the container without superuser privileges.

Check:
In the image manifest, the launch parameters must explicitly specify a user other than the superuser, and similarly for the group. Here we check for the values “root” or “0”, since the user and group can be specified either by name or by UID/GID. We also check the case when the user is not defined at all. Analyzing the manifest for all the described occurrences, we get the following code block:

           if 'User' in config:
               user_group = config['User']
               if not user_group:
                   result["Severity"] = "Critical"
                   result["Pass"] = False
                   result["Description"] = "Check failed. User not defined."
                   return result
               user = user_group.split(":")[0]
               try:
                   group = user_group.split(":")[1]
               except IndexError:
                   group = None
               if user == "root" or user == "0":
                   result["Severity"] = "Critical"
                   result["Pass"] = False
                   result["Description"] = f"Check failed. Default user is {user}."
               elif not group:
                   result["Severity"] = "Critical"
                   result["Pass"] = False
                   result["Description"] = "Check failed. Default group not defined."
               elif group == "root" or group == "0":
                   result["Severity"] = "Critical"
                   result["Pass"] = False
                   result["Description"] = f"Check failed. Default group is {group}."
               else:
                   result["Severity"] = "Informational"
                   result["Pass"] = True
                   result["Description"] = f"Check passed. Default user {user}, default group {group}."
           else:
               result["Severity"] = "Critical"
               result["Pass"] = False
               result["Description"] = "Check failed. User not defined."
           return result

In the image itself, the file “/etc/passwd” must actually contain the non-root user specified in the USER instruction. If it is missing, the container obviously will not start:

   $ podman inspect ubuntu:test | jq .[].Config.User
   "user"
   $ podman run -ti ubuntu:test bash
   Error: unable to find user user: no matching entries in passwd file

It is important to note that almost all Dockerfile linters do not take the group check into account. Meanwhile, if the group is not explicitly specified in the launch parameters and “/etc/passwd” assigns the root group to this user, the container will be launched with user:root rights, which in many ways is almost equivalent to the superuser.

For example:

   $ podman inspect ubuntu-rootless:latest | jq .[].Config.User
   "user"
   $ podman run -ti ubuntu-rootless:latest bash
   user@814ec3598430:/$ id
   uid=1001(user) gid=0(root) groups=0(root)
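
This gid 0 situation can also be caught at scan time by parsing the image's “/etc/passwd”. A minimal sketch, assuming the file's contents have already been read from the image layers (the helper name default_gid is ours):

```python
def default_gid(passwd_text, username):
    """Return the primary GID (as a string) for `username` from /etc/passwd content.

    passwd line format: name:password:UID:GID:GECOS:home:shell
    """
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) >= 4 and fields[0] == username:
            return fields[3]
    return None  # user missing entirely: the container would fail to start
```

A return value of "0" for the user taken from Config.User signals exactly the user:root case shown above.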

3. No files with the ability to elevate privileges

Description:
Reducing privilege escalation opportunities for a potential attacker. Some of you may note that a container's ability to escalate/change privileges can be limited by properly configuring the “securityContext”. However, the operations team may fail to set the necessary restrictions in the runtime itself, so an additional check at the image analysis stage is not superfluous.

Check:
Search the image file system for the su and sudo binaries and for executable files with SUID/SGID bits.

4. No files that provide remote access

Description:
By analogy, we check well-known utilities that can provide remote access.

Check:
Search the image file system for the executable files ssh, sshd, nc, netcat, socat and others…

5. No compilers in the image

Description:
The presence of certain compilers in the container allows an attacker to significantly expand the attack vector. Keep in mind that this check is rather informational in nature, since a compiler may well be needed for the operation of the application itself, i.e. it is part of the logic intended by the developer. Accordingly, this check should be treated as informational, with subsequent manual analysis of the image.

Check:
Search for executable files of known compilers in the image. Here we have collected data on the most common compilers and the names of their executable files. We have a list of ~250 compiler files that we will search for in the image.
I cannot vouch for the completeness of the list, nor for 100% certainty that all the compilers listed will definitely help an attacker develop an attack.
Below you can see part of the resulting list:

       "gccgo":	"Go compiler, based on the GCC backend",
       "gccgo-9":	"GNU Go compiler",
       "gccgo-10":	"GNU Go compiler",
       "gccgo-11":	"GNU Go compiler",
       "gccgo-12":	"GNU Go compiler",
       "gcl":	"GNU Common Lisp compiler",
       "gdc":	"D compiler (language version 2), based on the GCC backend",
       "gdc-9":	"GNU D compiler (version 2)",
       "gdc-10":	"GNU D compiler (version 2)",
       "gdc-11":	"GNU D compiler (version 2)",
       "gdc-12":	"GNU D compiler (version 2)",
       "gfortran":	"GNU Fortran 95 compiler",
       "gfortran-9":	"GNU Fortran compiler",
       "gfortran-10":	"GNU Fortran compiler",
       "gfortran-11":	"GNU Fortran compiler",
       "gfortran-12":	"GNU Fortran compiler",
        "go":	"Go programming language compiler, metapackage",
       "gobjc":	"GNU Objective-C compiler",
       "gobjc++":	"GNU Objective-C++ compiler",
       "gobjc++-9":	"GNU Objective-C++ compiler",
       "gobjc++-10":	"GNU Objective-C++ compiler",
       "gobjc++-11":	"GNU Objective-C++ compiler",
       "gobjc++-12":	"GNU Objective-C++ compiler",
       "gobjc-9":	"GNU Objective-C compiler",
       "gobjc-10":	"GNU Objective-C compiler",
       "gobjc-11":	"GNU Objective-C compiler",
       "gobjc-12":	"GNU Objective-C compiler",
        "golang":	"Go programming language compiler, metapackage",
        "golang-1.13":	"Go programming language compiler, metapackage",
        "golang-1.13-go":	"Go programming language compiler, linker, compiled stdlib",
        "golang-1.17":	"Go programming language compiler, metapackage",
        "golang-1.17-go":	"Go programming language compiler, linker, compiled stdlib",
        "golang-github-googleapis-gnostic-dev":	"compiler for OpenAPI specification, library",
       "golang-github-gopherjs-gopherjs-dev":	"Go to Javascript compiler",
       "golang-github-wellington-go-libsass-dev":	"Go wrapper for libsass, the only Sass 3.5 compiler for Go",
       "golang-github-yuin-gopher-lua-dev":	"virtual machine and compiler for Lua in Go",

6. Obtaining information about the manufacturer

Description:
It is necessary to display useful information from the developer to the user. It is a good practice to indicate information about the manufacturer, version, contacts and any other important information.

Check:
Manifest analysis and output of LABEL, MAINTAINER information to obtain data embedded by the developer/manufacturer.

7. Getting information about the executable file in the image

Description:
To display information about the executable file and its arguments to the user. Sometimes it can help to understand what software is used in the image.

Check:
Manifest analysis and output of information from CMD, ENTRYPOINT

8. The image should have the minimum possible number of layers

Description:

It is a good practice to compress images into one resulting layer. Sometimes there are images consisting of 50 or more layers, which raises some questions about their composition and further use in the infrastructure.
For example, files in these layers can overwrite each other due to the specifics of how overlay works, resulting in an image that contains more files than will actually be available in the container. At a minimum, this bloats the image size.

Check:
Analyze the manifest and report the number of layers in the image.
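
A sketch of this check against podman image inspect output; the threshold of 10 layers is an arbitrary illustration, not a value prescribed by the tool:

```python
def layers_check(inspect_entry, threshold=10):
    """Count RootFS.Layers in an inspect entry and flag overly layered images."""
    layers = inspect_entry.get("RootFS", {}).get("Layers", [])
    return {
        "Pass": len(layers) <= threshold,
        "Description": f"Number of layers in the image {len(layers)}",
    }
```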

9. The image should consist of a minimum number of components

Description:

The simpler our images are, the fewer different binary files and packages they contain, the lower the risks associated with a possible attack and its development by an intruder.

Our image should only have those files that are actually used in the process of our application. Almost always, this means the absence of package managers, system utilities, often shell, etc. Such images will be assembled, as they say, “From scratch”. However, when analyzing, you should remember that there is a certain minimum set of components that you will almost always encounter:

  • A set of certificates for TLS operation;

  • Core libraries such as glibc or musl for languages where static compilation is not possible;

  • Interpreters for some languages, such as Python, Node;

  • System files and directories required for the libraries to work correctly, such as “/etc/passwd” and “/tmp”.

A good example to get acquainted with the concept of distroless would be its implementation in images from Chainguard or Google.

Google and Chainguard Distroless

Below is an example of the “/usr/bin” directory structure for one of the images positioned as an image for “python”. We took it from the well-known Docker Hub.

├── usr
│   ├── bin
│   │   ├── [
│   │   ├── addpart
│   │   ├── apt
│   │   ├── apt-cache
│   │   ├── apt-cdrom
│   │   ├── apt-config
│   │   ├── apt-get
│   │   ├── apt-key
│   │   ├── apt-mark
│   │   ├── arch
│   │   ├── awk -> /etc/alternatives/awk
│   │   ├── b2sum
│   │   ├── base32
│   │   ├── base64
│   │   ├── basename
│   │   ├── basenc
│   │   ├── bash
│   │   ├── bashbug
│   │   ├── captoinfo -> tic
│   │   ├── cat
│   │   ├── chage
│   │   ├── chattr
│   │   ├── chcon
│   │   ├── chfn
│   │   ├── chgrp
│   │   ├── chmod
│   │   ├── choom
│   │   ├── chown
│   │   ├── chrt
│   │   ├── chsh
│   │   ├── cksum
│   │   ├── clear
│   │   ├── clear_console
│   │   ├── cmp
│   │   ├── comm
│   │   ├── cp
...

Here we see a classic set of system utilities, most likely from the Ubuntu operating system. It is safe to say that these components are hardly needed for Python-based applications to run. Most likely, the utilities ended up in the image because the developer chose the wrong base image or built it incorrectly.

Check:
Output information about the image's base OS and print the image file system tree for initial visual analysis.

Implementing the checks

All checks are written in Python and boil down to the following: an Image class is defined, with a set of methods corresponding to each check.

It looks roughly like this:

class Image():

    def __init__(self, name):
        self.name = name

    def tagCheck(self):
        result = Output()
        result["Title"] = "Tag :latest"
        result["Mitigation"] = "The image must have a fixed tag to determine the version"
        n = self.name.split(':')
        tag = n[-1]
        if not tag:
            result["Severity"] = "Critical"
            result["Pass"] = False
            result["Description"] = f"Image {self.name} tag not defined: {tag}"
        elif tag != "latest":
            result["Severity"] = "Informational"
            result["Pass"] = True
            result["Description"] = f"Image {self.name} has tag: {tag}"
        else:
            result["Severity"] = "Critical"
            result["Pass"] = False
            result["Description"] = f"Image {self.name} has tag: {tag}"
        _returnStdout(result)
        return result

    def labelCheck(self, data):
        ...

And the scanning order itself looks like this:

def main(image): 
    # Auth in private repo
    auth_repo(auth_config)
  
    # Create/refresh report dir
    create_dir(report_dir)
  
    # Start scan
    # Add :latest if tag not specified
    if ':' not in image:
        image = f"{image}:latest"
    image_short = ('_'.join((image.split("/")[-1]).split(":")[-2::]))
    print(f"{colorCyan}Image scanning started {image}{colorDefault}")  
  
    # Loading image into podman   
    print("1. Pulling the image")  
    pull_image(image)
  
    # Getting the image manifest in JSON
    print("2. Getting the image manifest")
    data = get_manifest(image)
    image_name = image
    image_obj = Image(image_name)
  
    # Launch checks   
    print("3. Launch of compliance checks")  
    results = []
    results.append(image_obj.tagCheck())
    results.append(image_obj.exposeCheck(data))
    results.append(image_obj.defaultUserCheck(data))
    results.append(image_obj.labelCheck(data))
    results.append(image_obj.layersCheck(data))
    ...

We also set a set of exit codes and environment variables to override them:

# EXIT CODES
INFORMATIONAL = int(os.environ.get("INFORMATIONAL_EXIT_CODE", 0))
CRITICAL = int(os.environ.get("CRITICAL_EXIT_CODE", 12))
HIGH = int(os.environ.get("HIGH_EXIT_CODE", 13))
MEDIUM = int(os.environ.get("MEDIUM_EXIT_CODE", 14))
LOW = int(os.environ.get("LOW_EXIT_CODE", 15))
CANT_PULL_IMAGE = 20
CANT_GET_MANIFEST = 21
NOT_DEFINED_IMAGE = 22
CANT_CREATE_REPORT_DIR = 23
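
As a sketch of how these codes can implement the Security Gate from requirement 4, the most severe failed check wins (security_gate is our illustrative name, not necessarily the tool's; the constants are redefined here so the snippet is self-contained):

```python
import os

# Same defaults as above, overridable via environment variables
INFORMATIONAL = int(os.environ.get("INFORMATIONAL_EXIT_CODE", 0))
CRITICAL = int(os.environ.get("CRITICAL_EXIT_CODE", 12))
HIGH = int(os.environ.get("HIGH_EXIT_CODE", 13))
MEDIUM = int(os.environ.get("MEDIUM_EXIT_CODE", 14))
LOW = int(os.environ.get("LOW_EXIT_CODE", 15))

def security_gate(results):
    """Return the exit code for the most severe failed check, or 0 if all passed."""
    codes = {
        "Critical": CRITICAL,
        "High": HIGH,
        "Medium": MEDIUM,
        "Low": LOW,
        "Informational": INFORMATIONAL,
    }
    for severity in ("Critical", "High", "Medium", "Low", "Informational"):
        if any(r["Severity"] == severity and not r["Pass"] for r in results):
            return codes[severity]
    return 0
```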

Building an image and embedding it into the pipeline

Now let's pack everything into an image. We plan to use this tool within GitLab CI, so we are not aiming for a distroless image: GitLab requires some kind of shell inside the image as part of job initialization, in order to execute the script section of the .gitlab-ci.yaml file.

Dockerfile

FROM python:alpine as python

FROM docker.io/mgoltzsche/podman as podman
RUN printf '%s\n' \
    '[registries.search]' \
    "registries=['docker.io']" \
    > /etc/containers/registries.conf
COPY --from=python / /
RUN mkdir -p /podman/module
COPY scan.py /podman
COPY module/ /podman/module

FROM scratch
COPY --from=podman / /
LABEL org.opencontainers.image.authors="mchinenov@swordfishsecurity.ru"
WORKDIR /podman
ENTRYPOINT ["python3", "scan.py"]

Example of a job based on gitlab

Compliance Image Scan:
  tags:
    - appsec-team
  image:
    name: registry.swordfishsecurity.com/internal/image_compliance_scanner:v1
  variables:  
    # Git strategy
    GIT_STRATEGY: none

    # To scan images from private registries,
    # the DOCKER_AUTH_CONFIG variable must be defined

    # To start a scan, the full path to the scanned image
    # must be set in the COMPLIANCE_IMAGE_FULL_REF variable
  
  script: 
     - python3 /home/nonroot/scan.py

  artifacts:
    # Set the COMPLIANCE_REPORTS_DIR variable to the directory where the .json report will be saved; default is ./reports
    paths:
      - reports
    expire_in: 1 day

We launch it and check the results

To start scanning, specify the full path to the image being scanned in the COMPLIANCE_IMAGE_FULL_REF variable.
When using a private repository, pass authentication data in the DOCKER_AUTH_CONFIG variable.

Output of our scanning job:

The job also produces a report in JSON format as a job artifact:

{
    "gcc:latest": [
        {
            "Title": "Tag :latest",
            "Severity": "Critical",
            "Pass": false,
            "Description": "Image gcc:latest has tag: latest",
            "Mitigation": "The image must have a fixed tag to determine the version"
        },
        {
            "Title": "Critical ports in the instructions EXPOSE",
            "Severity": "Informational",
            "Pass": true,
            "Description": "The image does not have critical ports in the EXPOSE statement",
            "Mitigation": "Make sure your containers only use protocols for remote connectivity when necessary."
        },
        {
            "Title": "Default user and group",
            "Severity": "Critical",
            "Pass": false,
            "Description": "User not defined",
            "Mitigation": "The default USER and GROUP must be explicitly defined in the USER statement and do not contain root or 0. For example, USER app:app"
        },
        {
            "Title": "Image LABEL metadata",
            "Severity": "Low",
            "Pass": false,
            "Description": "LABEL is not defined",
            "Mitigation": "The image must contain a set of labels specified by the developer in the LABEL instruction"
        },
        {
            "Title": "The number of layers in the image",
            "Severity": "Low",
            "Pass": false,
            "Description": "Number of layers in the image 7. It is necessary to minimize the number of layers if possible",
            "Mitigation": "A good practice would be to squash all layers in the resulting image into a single layer. See docker squash, multistage build"
        },
        {
            "Title": "Parameters CMD, ENTRYPOINT",
            "Severity": "Informational",
            "Pass": true,
            "Description": "The image has startup parameters set in CMD ['bash']",
            "Mitigation": "It is good practice to specify default launch parameters by developer. You must define parameters in a CMD or ENTRYPOINT statement"
        },
        {
            "Title": "Checking for a suid bit file",
            "Severity": "Critical",
            "Pass": false,
            "Description": "Found file",
            "Mitigation": "The image must not contain file(s) suid bit in the image",
            "Files": [
                {
                    "5da10afe97eba389e1dc9867f240e6e61672153700132447b4bbbc97b9b0d6ee": [
                        "/usr/lib/openssh/ssh-keysign"
                    ]
                },
                {
                    "072686bcd3db19834cd1e0b1e18acf50b7876043f9c38d5308e5e579cbefa6be": [
                        "/usr/bin/newgrp",
                        "/usr/bin/chfn",
                        "/usr/bin/passwd",
                        "/usr/bin/umount",
                        "/usr/bin/chsh",
                        "/usr/bin/gpasswd",
                        "/usr/bin/su",
                        "/usr/bin/mount"
                    ]
                }
            ]
        },
        {
            "Title": "Checking for a sgid bit file",
            "Severity": "Critical",
            "Pass": false,
            "Description": "Found file",
            "Mitigation": "The image must not contain file(s) sgid bit in the image",
            "Files": [
                {
                    "e8ef21fa16f7d8b718e156431313a530cd4aee22a23f2340a2030cd6cb5843ca": [
                        "/usr/local/share/fonts"
                    ]
                },
                {
                    "5da10afe97eba389e1dc9867f240e6e61672153700132447b4bbbc97b9b0d6ee": [
                        "/usr/bin/ssh-agent"
                    ]
                },
                {
                    "072686bcd3db19834cd1e0b1e18acf50b7876043f9c38d5308e5e579cbefa6be": [
                        "/var/mail",
                        "/var/local",
                        "/usr/sbin/unix_chkpwd",
                        "/usr/bin/chage",
                        "/usr/bin/expiry"
                    ]
                }
            ]
        },
        {
            "Title": "Checking for a sudo file",
            "Severity": "Informational",
            "Pass": true,
            "Description": "File(s) sudo not found",
            "Mitigation": "The image must not contain file(s) sudo in the image"
        },
        {
            "Title": "Checking for a su file",
            "Severity": "Critical",
            "Pass": false,
            "Description": "Found file",
            "Mitigation": "The image must not contain file(s) su in the image",
            "Files": [
                {
                    "072686bcd3db19834cd1e0b1e18acf50b7876043f9c38d5308e5e579cbefa6be": [
                        "/usr/bin/su"
                    ]
                }
            ]
        },
        {
            "Title": "Checking for a sshd file",
            "Severity": "Informational",
            "Pass": true,
            "Description": "File(s) sshd not found",
            "Mitigation": "The image must not contain file(s) sshd in the image"
        },
        {
            "Title": "Checking for a ssh client file",
            "Severity": "Critical",
            "Pass": false,
            "Description": "Found file",
            "Mitigation": "The image must not contain file(s) ssh client in the image",
            "Files": [
                {
                    "5da10afe97eba389e1dc9867f240e6e61672153700132447b4bbbc97b9b0d6ee": [
                        "/usr/bin/ssh"
                    ]
                }
            ]
        },
        {
            "Title": "Checking for a nc file",
            "Severity": "Informational",
            "Pass": true,
            "Description": "File(s) nc not found",
            "Mitigation": "The image must not contain file(s) nc in the image"
        },
        {
            "Title": "Checking for a netcat file",
            "Severity": "Informational",
            "Pass": true,
            "Description": "File(s) netcat not found",
            "Mitigation": "The image must not contain file(s) netcat in the image"
        },
        {
            "Title": "Checking for a socat file",
            "Severity": "Informational",
            "Pass": true,
            "Description": "File(s) socat not found",
            "Mitigation": "The image must not contain file(s) socat in the image"
        },
        {
            "Title": "Compilers in the image",
            "Severity": "Medium",
            "Pass": false,
            "Description": "Found file",
            "Mitigation": "The image should not contain compilers in the file system, except when they are necessary for the operation of the application. Make sure that compilers are actually needed during execution",
            "Files": [
                {
                    "baec4f93827486ff5ae81e5748f5c6d1523975ca0f0526e4a7b8f990ce70a4d4": [
                        "/usr/local/bin/g++",
                        "/usr/local/bin/gcc",
                        "/usr/local/bin/gccgo",
                        "/usr/local/bin/gfortran",
                        "/usr/local/bin/go"
                    ]
                }
            ]
        },
        {
            "Title": "Checking for OS type",
            "Severity": "Informational",
            "Pass": true,
            "Description": "The image gcc:latest is based on the OS ['Debian GNU/Linux', '12']",
            "Mitigation": "A good practice would be to use Distroless images to minimize the components in the image"
        }
    ]
}
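
A report in this format is easy to post-process. Below is a minimal sketch, assuming the layout shown above (one top-level key per image), that groups every failed check by severity; the function name is illustrative, not part of the tool:

```python
import json

def failed_checks(report: dict) -> dict:
    """Group failed checks from a scan report by severity.

    Expects the structure shown above: {image_ref: [check, ...]},
    where each check has "Title", "Severity" and "Pass" fields.
    """
    by_severity = {}
    for image, checks in report.items():
        for check in checks:
            if not check["Pass"]:
                by_severity.setdefault(check["Severity"], []).append(
                    (image, check["Title"])
                )
    return by_severity

# Usage: load the artifact saved by the job (path is the default from the job above)
# with open("reports/report.json") as f:
#     report = json.load(f)
# for severity, items in failed_checks(report).items():
#     print(severity, items)
```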

As the example of analyzing the gcc:latest image shows, our scanner worked correctly.

The scan revealed:

  • the image has the tag “latest”;

  • the developer did not define a default user;

  • there are files with the SUID and SGID bits set, as well as the executable files “su” and an SSH client;

  • a set of compilers was detected.

It should be noted that the detected files in the report are divided into layers:

                {
                    "e8ef21fa16f7d8b718e156431313a530cd4aee22a23f2340a2030cd6cb5843ca": [
                        "/usr/local/share/fonts"
                    ]
                },
                {
                    "5da10afe97eba389e1dc9867f240e6e61672153700132447b4bbbc97b9b0d6ee": [
                        "/usr/bin/ssh-agent"
                    ]
                },
                {
                    "072686bcd3db19834cd1e0b1e18acf50b7876043f9c38d5308e5e579cbefa6be": [
                        "/var/mail",
                        "/var/local",
                        "/usr/sbin/unix_chkpwd",
                        "/usr/bin/chage",
                        "/usr/bin/expiry"
                    ]
                }

This value is nothing more than the hash of the layer in which the corresponding files were found.

e8ef21fa16f7d8b718e156431313a530cd4aee22a23f2340a2030cd6cb5843ca

What to do next?

Probably many of you will have the following question: “We scanned the image and see that it does not pass the checks. Is there anything we can do about it?”

Indeed, let's consider what we can do if the image is vital to us and we simply cannot refuse it. Remember also that we do not have the source code, including the original Dockerfile, which rules out rebuilding. In this case, you can use docker-slim, now known as SlimToolkit.

SlimToolkit

According to the developers, this tool can shrink a container image severalfold.
At launch, it traces the container's execution context and then builds a new image containing only the files that were actually used during execution. To see what this looks like in practice, let's use an nginx image as a demonstration.

It is important to pass the parameters needed for proper profiling when launching the container. For our test image, the developer has already set them in “Entrypoint” and “Cmd”:

# podman inspect nginx:latest | jq '.[].Config.Entrypoint, .[].Config.Cmd'
[
  "/docker-entrypoint.sh"
]
[
  "nginx",
  "-g",
  "daemon off;"
]
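
The same values can be extracted programmatically, without jq. A sketch that parses the JSON produced by podman inspect (the function names are illustrative; running it, of course, assumes a local podman binary and the image being present):

```python
import json
import subprocess

def parse_start_params(inspect_output: str):
    """Extract (Entrypoint, Cmd) from `podman inspect <image>` JSON output."""
    config = json.loads(inspect_output)[0]["Config"]
    return config.get("Entrypoint"), config.get("Cmd")

def image_start_params(image: str):
    """Run `podman inspect` and return the image's startup parameters."""
    out = subprocess.run(
        ["podman", "inspect", image],
        capture_output=True, check=True, text=True,
    ).stdout
    return parse_start_params(out)

# Usage: entrypoint, cmd = image_start_params("nginx:latest")
```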

We launch the build based on our nginx using SlimToolkit. The tool launches a container from the specified image, tracks the execution context and creates a new “slim” image based on it:

# slim build --target nginx:latest
cmd=build info=param.http.probe message="using default probe" 
cmd=build state=started
cmd=build info=params image-build-engine="internal" target.type="image" target.image="nginx:latest" continue.mode="probe" rt.as.user="true" keep.perms="true" tags="" 
cmd=build state=image.inspection.start
cmd=build info=image id='sha256:e784f4560448b14a66f55c26e1b4dad2c2877cc73d001b7cd0b18e24a700a070' size.bytes="187659947" size.human='188 MB' 
cmd=build info=image.stack index='0' name="nginx:latest" id='sha256:e784f4560448b14a66f55c26e1b4dad2c2877cc73d001b7cd0b18e24a700a070' 
cmd=build info=image.exposed_ports list="80/tcp"
cmd=build state=image.inspection.done
cmd=build state=container.inspection.start
cmd=build info=container status="created" name="slimk_394176_20240527152435" id='96a28be6951f926eb9465149d4184a11ab427b5bb5f7edf8dd5294a670384fd9' 
cmd=build info=container status="running" name="slimk_394176_20240527152435" id='96a28be6951f926eb9465149d4184a11ab427b5bb5f7edf8dd5294a670384fd9' 
cmd=build info=container message="obtained IP address" ip='172.17.0.2'
cmd=build info=cmd.startmonitor status="sent"
cmd=build info=event.startmonitor.done status="received" 
cmd=build info=container name="slimk_394176_20240527152435" id='96a28be6951f926eb9465149d4184a11ab427b5bb5f7edf8dd5294a670384fd9' target.port.list="32783" target.port.info='80/tcp => 0.0.0.0:32783' message="YOU CAN USE THESE PORTS TO INTERACT WITH THE CONTAINER"
cmd=build state=http.probe.starting message="WAIT FOR HTTP PROBE TO FINISH" 
cmd=build info=continue.after mode="probe" message="no input required, execution will resume when HTTP probing is completed"
cmd=build prompt="waiting for the HTTP probe to finish"
cmd=build state=http.probe.running
cmd=build info=http.probe.ports count="1" targets="32783"
cmd=build info=http.probe.commands count="1" commands="GET /"
cmd=build info=http.probe.call target="http://127.0.0.1:32783/" attempt="1" error="none" time="2024-05-27T15:24:45Z" status="200" method='GET'
cmd=build info=http.probe.summary total="1" failures="0" successful="1"
cmd=build state=http.probe.done
cmd=build info=http.probe.crawler page="0" url="http://127.0.0.1:32783/"
cmd=build info=probe.crawler.done addr="http://127.0.0.1:32783/"
cmd=build info=event message="HTTP probe is done"
cmd=build state=container.inspection.finishing
cmd=build state=container.inspection.artifact.processing
cmd=build state=container.inspection.done
cmd=build state=building message="building optimized image" engine=internal 
cmd=build state=completed
cmd=build info=results status="MINIFIED" by='14.12X' size.original="188 MB" size.optimized='13 MB'
cmd=build info=results image-build-engine="internal" image.name="nginx.slim" image.size="13 MB" image.id='sha256:7e525a9263886539a2b6c8bf883b8663885cc314c6360fae3192dbd312bfbd50' image.digest="sha256:f2dcc16aca76580c1d56c3cd4dd5328ff857a7c58bce2e5d25610823e9ffc15d" has.data="true"
cmd=build info=results artifacts.location='/tmp/slim-state/.slim-state/images/e784f4560448b14a66f55c26e1b4dad2c2877cc73d001b7cd0b18e24a700a070/artifacts'
cmd=build info=results artifacts.report="creport.json"
cmd=build info=results artifacts.dockerfile.reversed='Dockerfile.reversed'
cmd=build info=results artifacts.seccomp='nginx-seccomp.json'
cmd=build info=results artifacts.apparmor="nginx-apparmor-profile"
cmd=build state=done
cmd=build info=commands message="use the xray command to learn more about the optimize image"
cmd=build info=report file="slim.report.json"

Let's see how the size of the resulting image has changed:

nginx.slim                     latest    7e525a926388   4 minutes ago    13.3MB
nginx                          latest    e784f4560448   3 weeks ago      188MB

I would like to point out that after such a “rebuild” you should always test the resulting application. Do not assume the tool has solved all the problems described earlier.

In practice, modern applications are quite complex and their profiling does not always proceed without problems, since their logic often contains many execution options, including those related to working with the file system and system calls.

Conclusions

Today we looked at one way to implement image checks when there is no control over the original build process. All the checks boil down to analyzing the values specified in the image's manifest and configuration, and to scanning it layer by layer for the presence of certain files.

The file system scan is driven by a set of rules (signatures), so keep in mind that this tool is better suited for an initial check, or as one stage of a more comprehensive audit.

If an attacker knows what and how you check, it will not be difficult for them to bypass the scans, for example by renaming executable files. In fact, this problem concerns not only this tool but many other protection mechanisms that rely on a signature-based approach.

In conclusion, I would like to note that securing your container environments is a complex and often non-trivial task. It should be treated first and foremost as a process, not as a set of information security tools. The process should be built to protect your application at every stage of the life cycle, from design and development to operation in production environments.

Additional materials

Usage instructions and the source code for building the tool can be found on our GitHub: SwordFish Security OCI-image-compliance-scanner.
