How (and why) we deployed ActiveMQ Artemis in the cloud

Platform V Synapse at SberTech.

Our team works on a product from the Platform V Synapse line – Platform V Synapse Messaging, a message broker based on Apache ActiveMQ Artemis. We make it a more secure and feature-rich solution by developing additional plugins, and we make sure it can be deployed quickly and easily with our automation scripts.

In recent years, cloud technologies, containerization and microservice architecture have been gaining momentum, and our team decided to expand the product's capabilities. While our environments were initially limited to virtual machines (VMs), we have recently started bringing Platform V Synapse Messaging into container orchestration environments – Kubernetes (K8s / cloud).

In this article we will walk you through our journey: why we chose particular solutions, what difficulties we ran into and where it led us. We believe our experience will be useful to engineers working on migrating applications to the cloud, deploying application data and automating the related processes.

Let's go!

Why ActiveMQ Artemis?

We chose Artemis as an open-source replacement for IBM MQ. Both solutions act as message brokers and support the point-to-point model of sending messages to and reading them from queues.

Artemis supports the Core (Artemis-native), OpenWire, AMQP, MQTT and STOMP protocols. It can run standalone or as a cluster. Other features can be found in the official documentation. On top of that, our team has extensive experience in Java development, which lets us extend the product with additional functionality:

  • generation of audit events – to simplify the analysis of incidents;

  • message tracing – to trace their entire path within the cluster;

  • collection of connection and session metrics – for monitoring and administering the cluster;

  • limiting connections and the speed of sending messages – to regulate the load on the broker;

  • working with backups – to restore the cluster in case of incidents;

  • checking the DN certificate of a connected client or server – to control client access to the cluster;

  • working with a secret storage (vault) – to use an external storage of secrets (passwords and certificates);

  • encryption of messages while they are in the broker – for secure data storage;

  • client interceptors – to monitor data integrity when writing and reading messages.

Preparing to Deploy Artemis on Kubernetes

While the development team improves Synapse Messaging by adding new functionality, we, the DevOps team, handle its installation, extend its capabilities and make deployment easier. To avoid limiting ourselves to deployment on VMs, where everything works reasonably stably and simply, we looked at packaging the application in a container and running it in K8s. This would let us explore rapid scaling, fault tolerance, alternative configuration approaches and other features, while keeping the likely performance penalty in mind.

There were no problems dockerizing the application, given that the developers provide several Dockerfiles with explanations in the Apache ActiveMQ Artemis repository itself. We essentially used the same approach: we copied the application files, declared environment variables, created the necessary directories and users, and set permissions. We also slightly changed the application launch script, adding a wait for the Istio and Vault sidecars – more on them later. The script polls the Istio sidecar endpoint and waits for the secret files that the Vault sidecar writes to the file system.

# Check Istio and Vault sidecars before launching Artemis

if [ "x$ISTIO_ENABLED" = "xtrue" ]; then
  echo "Checking for Istio Sidecar readiness..."
  until curl -fsI http://localhost:15020/healthz/ready; do
    echo "Waiting for Istio Sidecar, sleep for 3 seconds";
    sleep 3;
  done;
  echo "Istio Sidecar is ready."
fi

if [ "x$VAULT_ENABLED" = "xtrue" ]; then
  config_file="$APP_HOME"/etc/waitVault.txt
  if [ ! -f "$config_file" ]; then
    echo "Vault wait file $config_file not found, skipping Vault check."
  else
    echo "Checking for Vault Sidecar readiness..."
    checked_files=$(cat "$config_file")
    files_count=0
    for file in $checked_files; do
        files_count=$(( files_count + 1 ))
    done

    exists_files_count=0

    time_counter=0

    while [ $exists_files_count != $files_count ]; do
        exists_files_count=0
        for file in $checked_files; do
            if [ -f "$file" ]; then
                exists_files_count=$(( exists_files_count + 1 ))
            fi
        done

        sleep 1
        time_counter=$(( time_counter + 1 ))
        echo "Waiting Vault Sidecar $time_counter s."
    done
    echo "Vault Sidecar is ready."
  fi
fi

In the ConfigMap generation template in the Helm charts, which we will discuss below, we also added the creation of a waitVault.txt file with the list of expected secret files, which the script above uses:

waitVault.txt: |-
    {{- range $key, $value := .Values.annotations }}
    {{- if hasPrefix "vault.hashicorp.com/secret-volume-path" $key }}
    {{ $value }}/{{ $key | trimPrefix "vault.hashicorp.com/secret-volume-path-" }}
    {{- end }}
    {{- end }}
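
For illustration: with values like the following in values.yaml (a hypothetical fragment that mirrors the Vault annotations shown later in this article), the template renders one expected file path per injected secret:

# values.yaml (hypothetical fragment)
annotations:
  vault.hashicorp.com/agent-inject: 'true'
  vault.hashicorp.com/secret-volume-path-cluster.pass: /app/artemis/broker/vault
  vault.hashicorp.com/secret-volume-path-crt.pem: /app/artemis/broker/vault

# rendered waitVault.txt:
#   /app/artemis/broker/vault/cluster.pass
#   /app/artemis/broker/vault/crt.pem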

We test everything locally, make sure it starts and, most importantly, does not crash. Satisfied, we push to the repository and go get some coffee.

We didn't stop at a regular image – we follow the recommendations and best practices in our field, so the next iteration was a distroless image. Distroless images do not contain a distribution (Alpine, Debian…), only what is necessary to run the application, in our case Java. This makes them lighter and less vulnerable thanks to the reduced attack surface.

The approach here is also quite straightforward – the necessary utilities, locales and libraries are taken from a Debian builder image and copied into the distroless Java 11 image. The resulting image is then used as the base when building the application image itself.

# Start from a Debian-based image to install packages
FROM debian:bullseye-slim as builder

# Install the required packages

RUN apt-get update && apt-get install -y \
    bash \
    coreutils \
    curl \
    locales \
    locales-all

# Start from the distroless java 11 image

FROM gcr.io/distroless/java:11

# Copy the required libraries

COPY --from=builder /lib/x86_64-linux-gnu/libtinfo.so.6 \
                    /lib/x86_64-linux-gnu/libselinux.so.1 \
                    /lib/x86_64-linux-gnu/libpthread.so.0 \
                    /lib/x86_64-linux-gnu/libdl.so.2 \
                    /lib/x86_64-linux-gnu/libc.so.6 \
                    /lib/x86_64-linux-gnu/libaudit.so.1 \
                    /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 \
                    /lib/x86_64-linux-gnu/libcap-ng.so.0 \
                    /lib/x86_64-linux-gnu/libsepol.so.1 \
                    /lib/x86_64-linux-gnu/libbz2.so.1.0 \
                    /lib/x86_64-linux-gnu/

COPY --from=builder /usr/lib/x86_64-linux-gnu/libpcre2-8.so.0 \
                    /usr/lib/x86_64-linux-gnu/libacl.so.1 \
                    /usr/lib/x86_64-linux-gnu/libattr.so.1 \
                    /usr/lib/x86_64-linux-gnu/libsemanage.so.1 \
                    /usr/lib/x86_64-linux-gnu/

COPY --from=builder /usr/lib/locale/ /usr/lib/locale/

COPY --from=builder /usr/share/locale/ /usr/share/locale/

# Copy the shell and utilities

COPY --from=builder /bin/bash \
                    /bin/cat \
                    /bin/chown \
                    /bin/chmod \
                    /bin/mkdir \
                    /bin/sleep \
                    /bin/ln \
                    /bin/uname \
                    /bin/ls \
                    /bin/

COPY --from=builder /usr/bin/curl \
                    /usr/bin/env \
                    /usr/bin/basename \
                    /usr/bin/dirname \
                    /usr/bin/locale \
                    /usr/bin/

COPY --from=builder /usr/sbin/groupadd \
                    /usr/sbin/useradd \
                    /usr/sbin/

# Change shell to Bash
SHELL ["/bin/bash", "-c"]
# Create link sh -> bash
RUN ln -s /bin/bash /bin/sh

Let's move on to deployment. We didn't have to look far – ArtemisCloud.io provides a K8s operator for deploying the application in the cloud. The kit includes the operator itself, CRDs describing the broker entities, manifests with roles, auxiliary scripts and instructions (which, as is often the case, do not answer all the questions).

Before installing the operator, we need to register the CRDs and create a ServiceAccount, Role, RoleBinding, ElectionRole and ElectionRoleBinding. After that, the operator itself can be deployed. The set of custom resource definitions covers the main Artemis entities (a minimal example resource is sketched after the list):

  • Broker CRD – creating and configuring broker deployment;

  • Address CRD – creating addresses and queues;

  • Scaledown CRD – creating a message migration controller when reducing the cluster size;

  • Security CRD – setting up security and authentication methods for the broker.
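
For context, a minimal custom resource for this operator looks roughly like the sketch below (reproduced from memory of the ArtemisCloud examples, not a configuration we run; the name and sizes are placeholders):

apiVersion: broker.amq.io/v1beta1
kind: ActiveMQArtemis
metadata:
  name: artemis-broker            # placeholder name
spec:
  deploymentPlan:
    size: 2                       # number of broker pods
    persistenceEnabled: false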

Solid kit! But here questions begin to arise:

  • How do we manage our plugins and integrations?

  • How do we configure the cluster now? Not by changing XML files via Ansible, as we are used to? Rewrite everything in YAML for the CRDs?

  • How to separate access to cluster management and queue management?

  • How to add the necessary functionality without extensive experience in Go development?

  • And what will the security team say? The VM-based application is fully compliant with their requirements, but the operator is completely new to them.

  • and so on.

On the one hand, we have a ready-made operator that we would need to study in detail, understand how to adjust to our needs, and then use. On the other hand, we have our Ansible playbooks for working with VMs, which would not take that long to adapt for cloud deployment, plus the familiar XML configs.

Without thinking twice, we decided not to use the operator but to develop Helm manifests and extend our playbooks instead. And this is where the fun begins.

Preparing Helm charts

The architecture we aimed to achieve is as follows:

An application with multiple replicas is deployed in a Kubernetes namespace, with the cluster sitting behind a single service. In addition to the Artemis cluster, two gateways (Istio Envoy) are deployed in the namespace – ingress and egress – through which traffic is routed for logging purposes. The application and gateway pods are configured to run with Vault Agent and istio-proxy sidecars. Inside the namespace, traffic routing and mTLS are configured using the Istio DestinationRule (DR), VirtualService (VS), PeerAuthentication (PA) and ServiceEntry (SE) manifests. Let's start with the application itself and then move on to everything around it.

We use Helm charts to deploy and manage our applications on Kubernetes. A Helm chart consists of manifest templates and variable values (values.yaml) that are substituted into the templates. Unlike individual object manifests, which are applied one ready-made file at a time via kubectl, a chart is installed as a single release. A release can be upgraded or rolled back, and when it is deleted, all of its Kubernetes resources are removed at once.

We wrote a manifest for the application that deploys a StatefulSet. A StatefulSet works for us because its pods have predictable names, keep their identity across restarts, and are created, deleted or restarted one at a time when the cluster topology changes, which lets messages drain from broker to broker. Manifests for services are also needed: a regular Service for accessing the pods and a headless Service for discovering pods within the broker cluster.
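
A trimmed sketch of such a StatefulSet is shown below (a minimal illustration, not our production manifest; the image is a placeholder, while the names and labels match the ones that appear later in this article). The accompanying services follow right after it.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: artemis-statefulset
  namespace: my_namespace
spec:
  replicas: 2
  serviceName: artemis-hdls-svc        # headless service used for discovery
  podManagementPolicy: OrderedReady    # pods are started and stopped one by one
  selector:
    matchLabels:
      app: artemis-app
  template:
    metadata:
      labels:
        app: artemis-app
    spec:
      containers:
      - name: artemis
        image: my-registry/artemis:latest    # placeholder image
        ports:
        - containerPort: 8161     # console
        - containerPort: 61616    # data
        - containerPort: 7800     # jgroups
        - containerPort: 7900     # jgroups (FD_SOCK2)
        env:
        - name: POD_IP            # used in broker.xml and jgroups-ping.xml
          valueFrom:
            fieldRef:
              fieldPath: status.podIP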

apiVersion: v1
kind: Service
metadata:
  name: artemis-svc
  namespace: my_namespace
spec:
  ports:
  - name: console
    port: 8161
    protocol: TCP
    targetPort: 8161
  - name: data
    port: 61616
    protocol: TCP
    targetPort: 61616
  - name: jgroups-7800
    port: 7800
    protocol: TCP
    targetPort: 7800
  - name: jgroups-7900
    port: 7900
    protocol: TCP
    targetPort: 7900
  publishNotReadyAddresses: true
  selector:
    app: artemis-app
  type: ClusterIP

We declare ports in services:

  • console — to access the web UI;

  • data — for TCP connections to application acceptors;

  • prometheus — to collect metrics;

  • jgroups — for intercluster communication.

Since we already had Ansible roles and playbooks for deploying Artemis on a VM, most of the configuration files only had to be translated from Jinja2 templates into Helm templates, plus templates for the missing files had to be added. As a result, we ended up with the following set of configuration templates, which we mount via a ConfigMap into /app/broker/etc (a mounting sketch follows the list):

etc/
|-- _address-settings.tpl
|-- _addresses.tpl
|-- _artemis_profile.tpl
|-- _audit_metamodel.tpl
|-- _audit_properties.tpl
|-- _bootstrap.tpl
|-- _broker.tpl
|-- _cert_roles.tpl
|-- _cert_users.tpl
|-- _jgroups-ping.tpl
|-- _jolokia-access.tpl
|-- _keycloak.tpl
|-- _logback.tpl
|-- _login.tpl
|-- _management.tpl
|-- _plugins_configs.tpl
|-- _resource-limit-settings.tpl
|-- _security-settings.tpl
`-- _vault.tpl
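
The rendered templates end up in a ConfigMap that is mounted into the broker's etc directory. Roughly, the relevant fragment of the pod spec looks like this (the ConfigMap and volume names are placeholders):

      # inside spec.template.spec of the StatefulSet
      volumes:
      - name: artemis-etc
        configMap:
          name: artemis-configmap       # ConfigMap rendered from the templates above
      containers:
      - name: artemis
        volumeMounts:
        - name: artemis-etc
          mountPath: /app/broker/etc    # the broker's configuration directory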

Clustering via JGroups

An important part of the configuration is the clustering setup. On the VM, we combined the broker nodes into a cluster by declaring static connectors in the <cluster-connections> section of broker.xml:

<connectors>
      <!-- Connector used to be announced through cluster connections and notifications -->
      <connector name="artemis">tcp://10.20.30.40:61616?sslEnabled=true;enabledProtocols=TLSv1.2,TLSv1.3</connector>
      <connector name="node0">tcp://10.20.30.41:61616?sslEnabled=true;enabledProtocols=TLSv1.2,TLSv1.3</connector>
    </connectors>
<cluster-connections>
      <cluster-connection name="my-cluster">
        <reconnect-attempts>-1</reconnect-attempts>
        <connector-ref>artemis</connector-ref>
        <message-load-balancing>ON_DEMAND</message-load-balancing>
        <max-hops>1</max-hops>
        <static-connectors allow-direct-connections-only="false">
          <connector-ref>node0</connector-ref>
        </static-connectors>
      </cluster-connection>
    </cluster-connections>
    <ha-policy>
      <live-only>
        <scale-down>
          <connectors>
            <connector-ref>node0</connector-ref>
          </connectors>
        </scale-down>
      </live-only>
    </ha-policy>

In the cloud, declaring a cluster this way would be inconvenient. So we used the JGroups mechanism available in Artemis out of the box. JGroups is a protocol stack that implements clustering for Java applications. The broker settings now look like this:

<connectors>
      <!-- Connector used to be announced through cluster connections and notifications -->
      <connector name="cluster">tcp://${POD_IP}:61617?sslEnabled=false;enabledProtocols=TLSv1.2,TLSv1.3</connector>
      <connector name="artemis">tcp://${POD_IP}:61616?sslEnabled=true;enabledProtocols=TLSv1.2,TLSv1.3</connector>
    </connectors>
<acceptors>
      <acceptor name="cluster">tcp://0.0.0.0:61617?protocols=CORE,AMQP,MQTT,STOMP;amqpCredits=1000;amqpDuplicateDetection=true;amqpLowCredits=300;amqpMinLargeMessageSize=102400;supportAdvisory=false;suppressInternalManagementObjects=false;tcpReceiveBufferSize=1048576;tcpSendBufferSize=1048576;useEpoll=true;sslEnabled=false</acceptor>
      <acceptor name="artemis">tcp://0.0.0.0:61616?protocols=CORE,AMQP,MQTT,STOMP;amqpCredits=1000;amqpDuplicateDetection=true;amqpLowCredits=300;amqpMinLargeMessageSize=102400;supportAdvisory=false;suppressInternalManagementObjects=false;tcpReceiveBufferSize=1048576;tcpSendBufferSize=1048576;useEpoll=true;sslEnabled=true;enabledProtocols=TLSv1.2,TLSv1.3;keyStorePath=/app/artemis/broker/vault/crt.pem;keyStoreType=PEM;trustStorePath=/app/artemis/broker/vault/ca.pem;trustStoreType=PEM;verifyHost=false;needClientAuth=true</acceptor>
    </acceptors>
<broadcast-groups>
      <broadcast-group name="my-broadcast-group">
        <jgroups-file>jgroups-ping.xml</jgroups-file>
        <jgroups-channel>activemq_broadcast_channel</jgroups-channel>
        <connector-ref>cluster</connector-ref>
      </broadcast-group>
    </broadcast-groups>
    <discovery-groups>
      <discovery-group name="my-discovery-group">
        <jgroups-file>jgroups-ping.xml</jgroups-file>
        <jgroups-channel>activemq_broadcast_channel</jgroups-channel>
        <refresh-timeout>10000</refresh-timeout>
      </discovery-group>
    </discovery-groups>
    <cluster-connections>
      <cluster-connection name="my-cluster">
        <discovery-group-ref discovery-group-name="my-discovery-group"/>
        <connector-ref>cluster</connector-ref>
        <max-hops>1</max-hops>
        <message-load-balancing>ON_DEMAND</message-load-balancing>
        <reconnect-attempts>-1</reconnect-attempts>
      </cluster-connection>
    </cluster-connections>
    <ha-policy>
      <live-only>
        <scale-down>
          <discovery-group-ref discovery-group-name="my-discovery-group"/>
        </scale-down>
      </live-only>
    </ha-policy>

Each broker now has a separate acceptor and connector intended for communication between cluster nodes. We declared broadcast-groups and discovery-groups for JGroups, which are referenced in cluster-connections and ha-policy. The JGroups stack itself is described in the jgroups-ping.xml file:

<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="urn:org:jgroups"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd"
        >
    <TCP bind_addr="127.0.0.1"
         bind_port="7800"
         external_addr="${POD_IP}"
         external_port="7800"
         port_range="0"
         thread_pool.min_threads="0"
         thread_pool.max_threads="200"
         thread_pool.keep_alive_time="30000"/>
    <dns.DNS_PING dns_query="${DNS_QUERY}"
                  dns_record_type="${DNS_RECORD_TYPE:A}" />
    <MERGE3 min_interval="10000"
            max_interval="30000"/>
    <FD_SOCK2 port_range="0" />
    <FD_ALL3 timeout="40000" interval="5000" />
    <VERIFY_SUSPECT2 timeout="1500"  />
    <pbcast.NAKACK2 use_mcast_xmit="false" />
    <pbcast.STABLE desired_avg_gossip="50000"
                   max_bytes="4M"/>
    <pbcast.GMS print_local_addr="true" join_timeout="2000" max_join_attempts="2" print_physical_addrs="true" print_view_details="true"/>
    <UFC max_credits="2M"
         min_threshold="0.4"/>
    <MFC max_credits="2M"
         min_threshold="0.4"/>
    <FRAG2 frag_size="60K"  />
</config>

We will not describe each protocol and its features in detail – that can be found in the JGroups documentation. Let's look at the main points used in this project.

We are interested in the TCP block – here we declare the addresses and ports JGroups will use. Inside the pod it binds to 127.0.0.1 and the standard JGroups port 7800. We also need to specify the "external" address – the address of the pod our application runs in; the port stays the same.

Earlier, in the service manifest, we declared two ports for JGroups: 7800 and 7900, although the second one does not appear in this configuration. The reason is that port 7900 is used by the FD_SOCK2 protocol in the stack. Its port is calculated as bind_port + offset, which is usually 7800 + 100.

The second block of interest is dns.DNS_PING. It is responsible for discovering cluster nodes. Here we specify a dns_query that matches the headless service. Besides DNS_PING there are other discovery methods: for example, JDBC_PING and S3_PING, which use an external source of discovery information (a database or a bucket), or AWS_PING and AZURE_PING, which query the resources of the public cloud where the application runs.

You can choose the mechanism that fits your architecture; DNS_PING was enough for us. The remaining protocols of the stack do not need to be tuned for the application, so we leave them as they are, although their parameters can be adjusted to your needs.
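
For reference, the headless service used for discovery might look roughly like this (the name matches the artemis-hdls-svc that appears in the logs later; the DNS_QUERY value is our assumption based on standard Kubernetes service DNS naming):

apiVersion: v1
kind: Service
metadata:
  name: artemis-hdls-svc
  namespace: my_namespace
spec:
  clusterIP: None                   # headless: DNS returns the pod addresses directly
  publishNotReadyAddresses: true    # pods are discoverable before they become Ready
  selector:
    app: artemis-app
  ports:
  - name: jgroups-7800
    port: 7800
    targetPort: 7800
  - name: jgroups-7900
    port: 7900
    targetPort: 7900

# DNS_QUERY passed into jgroups-ping.xml, e.g.:
#   DNS_QUERY=artemis-hdls-svc.my_namespace.svc.cluster.local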

As a result, the discovery mechanism consists of the following steps:

  1. On startup, the broker node uses DNS_PING to query the DNS server, requesting the dns_query under which the StatefulSet's headless service is registered.

  2. The DNS server looks for pods that match dns_query.

  3. The DNS server returns a list of addresses to the broker node.

  4. The broker node sends invitations to join the cluster to the other nodes from the received list, and the cluster password is exchanged. Here the lower protocols of the stack come into play:

    1. MERGE3 is a protocol for detecting subgroups that arise during network partitioning and reconstruction.

    2. FD_SOCK2 and FD_ALL3 – used to detect failures. FD_SOCK2 monitors the health of cluster members via TCP connections, and FD_ALL3 uses heartbeat.

    3. VERIFY_SUSPECT2 – checks and confirms the inactivity of a cluster member.

    4. pbcast.NAKACK2 – provides reliable message delivery using a negative acknowledgment (NAK) mechanism. It handles retransmissions of missing messages to ensure that all participants receive messages.

    5. pbcast.STABLE – Calculates which broadcast messages have been delivered to all cluster members and pushes STABLE events onto the stack. This allows NAKACK2 to delete messages that have been seen by all members.

    6. The broker node receives the response, and the GMS protocol (Group Membership Service) processes it. A new topology is calculated between the cluster nodes and the nodes are merged.

The UFC and MFC protocols use a credit system to control message flow and prevent congestion.

The FRAG2 protocol fragments messages larger than a specified size and reassembles them on the receiving side.

Vault Agent and Istio sidecars

In our architecture, Artemis is configured to work with mTLS. The certificates are used not only to establish a secure connection but also to authenticate clients. The broker supports JKS keystores/truststores and, more recently, PEM ones.

The application needs passwords for the JKS stores and for the cluster connection. To avoid storing secrets in configuration files (even in encrypted form) and to avoid using K8s Secret objects for passwords and keystores/truststores, we use Vault Agent.

Annotations on the StatefulSet enable the sidecar and declare the secrets that need to be fetched from the store and written to the file system. Below are examples of a request to the PKI engine to issue a PEM certificate and a request to the KV store for the cluster_password secret (we also modified Artemis slightly so that it can read cluster_password from a file).

    vault.hashicorp.com/agent-init-first: 'true'
    vault.hashicorp.com/agent-set-security-context: 'true'
    vault.hashicorp.com/agent-pre-populate: 'false'
    vault.hashicorp.com/agent-inject-secret-cluster.pass: 'true'
    vault.hashicorp.com/secret-volume-path-cluster.pass: /app/artemis/broker/vault
    vault.hashicorp.com/namespace: MY_VAULT_NAMESPACE
    vault.hashicorp.com/role: MY_ROLE
    vault.hashicorp.com/agent-inject: 'true'
    vault.hashicorp.com/agent-limits-cpu: 100m
    vault.hashicorp.com/agent-requests-cpu: 100m
    vault.hashicorp.com/secret-volume-path-crt.pem: /app/artemis/broker/vault
    vault.hashicorp.com/agent-inject-secret-crt.pem: 'true'
    vault.hashicorp.com/agent-inject-template-crt.pem: |
      {%- raw %}
      {{- with secret "PKI/issue/MY_ROLE"
      "common_name=my_artemis_app.my_domain" "format=pem" "ttl=20h"
      "private_key_format=pkcs8" -}}
      {{ .Data.private_key }}
      {{ .Data.certificate }}
      {{- end }}
      {%- endraw %}
    vault.hashicorp.com/agent-inject-template-cluster.pass: |
      {%- raw %}
      {{- with secret "PATH/TO/MY/KV/cloud_artemis" -}}
      {{ index .Data "cluster_password" }}
      {{- end }}
      {%- endraw %}

When the pod starts, the application container waits until the Vault Agent sidecar writes the required secrets to the file system at the specified path. This is how we get the passwords the application needs, as well as the keystore and truststore for configuring TLS on the server side. Vault Agent is injected not only into the application pods but also into the edge gateways, which lets us use Vault to obtain certificates when integrating with external systems.

Routing traffic within the namespace and mTLS settings

Trying not to forget about network security, which our colleagues carefully remind us about with all kinds of standards and checks, we turn to our beloved Istio. We will not tell you how Istio works in this article; we will only go over the points relevant to our project.

Step zero is to enable PeerAuthentication with mtls mode STRICT, so that only mTLS traffic flows inside the namespace.
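
A minimal PeerAuthentication for this looks like the following (applied namespace-wide; the resource name is arbitrary):

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: my_namespace
spec:
  mtls:
    mode: STRICT    # only mTLS traffic is accepted inside the namespace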

Next, let's follow the path from the user/application side. An Ingress is deployed for the application: users connect to the UI through it via the console port, and applications send messages via the data port:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/ssl-passthrough: "true"
  name: artemis-istio-ingress
  namespace: my_namespace
spec:
  rules:
  - host: ui-artemis-istio-ingress.my_cluster
    http:
      paths:
      - backend:
          service:
            name: artemis-ingressgateway-svc
            port:
              number: 8161
        path: /
        pathType: Prefix
  - host: data-artemis-istio-ingress.my_cluster
    http:
      paths:
      - backend:
          service:
            name: artemis-ingressgateway-svc
            port:
              number: 61616
        path: /
        pathType: Prefix
  tls:
  - hosts:
    - ui-artemis-istio-ingress.my_cluster
    - data-artemis-istio-ingress.my_cluster

Once on the Ingress controller, traffic is forwarded to the backend, which is the service of the ingress gateway we deployed. And since user authentication is performed by the application itself, we pass the SSL traffic on without terminating it.

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: artemis-ingressgateway
  namespace: my_namespace
spec:
  selector:
    app: artemis-ingressgateway
    istio: artemis-ingressgateway
  servers:
  - hosts:
    - ui-artemis-istio-ingress.my_cluster
    port:
      name: tls-console
      number: 8161
      protocol: tls
    tls:
      mode: PASSTHROUGH
  - hosts:
    - data-artemis-istio-ingress.my_cluster
    port:
      name: tls-data
      number: 61616
      protocol: tls
    tls:
      mode: PASSTHROUGH

---
apiVersion: v1
kind: Service
metadata:
  name: artemis-ingressgateway-svc
  namespace: my_namespace
spec:
  ports:
  - name: tls-console
    port: 8161
    protocol: TCP
    targetPort: 8161
  - name: tls-data
    port: 61616
    protocol: TCP
    targetPort: 61616
  selector:
    app: artemis-ingressgateway
    istio: artemis-ingressgateway
  sessionAffinity: None
  type: ClusterIP

Traffic is then routed by a VirtualService, which redirects it from the gateway to the application service:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: artemis-ingress-vs
  namespace: my_namespace
spec:
  exportTo:
  - .  
  gateways:
  - artemis-ingressgateway
  hosts:
  - ui-artemis-istio-ingress.my_cluster
  - data-artemis-istio-ingress.my_cluster
  tls:
  - match:
    - gateways:
      - artemis-ingressgateway
      port: 8161
      sniHosts:
      - ui-artemis-istio-ingress.my_cluster
    route:
    - destination:
        host: artemis-svc
        port:
          number: 8161
  - match:
    - gateways:
      - artemis-ingressgateway
      port: 61616
      sniHosts:
      - data-artemis-istio-ingress.my_cluster
    route:
    - destination:
        host: artemis-svc
        port:
          number: 61616

On the path "towards the application", traffic is not restricted by a DestinationRule, since we specified ssl-passthrough earlier.

How does traffic flow out of the application? Let's take the example of a call to Vault, which lives outside our K8s cluster. For Istio to know where to send traffic that leaves the cluster, we need to define a ServiceEntry:

apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: vault-8443-service-entry
spec:
  exportTo:
  - .
  hosts:
  - my.vault.host
  location: MESH_EXTERNAL
  ports:
  - name: http-vault
    number: 8443
    protocol: https
  resolution: DNS

We declare the Egress gateway and gateway service:

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: scripts-egressgateway
spec:
  selector:
    app: artemis-egressgateway
    istio: artemis-egressgateway
  servers:
  - hosts:
    - my.vault.host
    port:
      name: tls-vault-9444
      number: 9444
      protocol: TLS
    tls:
      mode: PASSTHROUGH

---
apiVersion: v1
kind: Service
metadata:
  name: artemis-egressgateway-svc
spec:
  ports:
  - name: status-port
    port: 15021
    protocol: TCP
    targetPort: 15021
  - name: tls-vault-9444
    port: 9444
    protocol: TCP
    targetPort: 9444
  selector:
    app: artemis-egressgateway
    istio: artemis-egressgateway
  sessionAffinity: None
  type: ClusterIP

We direct traffic through VirtualService:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: artemis-vault-vs
spec:
  exportTo:
  - .
  gateways:
  - artemis-egressgateway
  - mesh
  hosts:
  - my.vault.host
  tcp:
  - match:
    - gateways:
      - mesh
      port: 8443
    route:
    - destination:
        host: artemis-egressgateway-svc
        port:
          number: 9444
  - match:
    - gateways:
      - artemis-egressgateway
      port: 9444
      sniHosts:
      - my.vault.host
    route:
    - destination:
        host: my.vault.host
        port:
          number: 8443

Since traffic from the application to Vault already uses TLS configured by Vault Agent, no additional DestinationRule is needed.

Now that we have sorted out the traffic going to and from the application, let's move on to the application cluster itself. If you thought that after clustering via JGroups the most unpleasant part was behind us, we hasten to disappoint you: traffic inside the cluster also needs to be wrapped in TLS.

We had two options: configure SSL directly in JGroups or run everything through Istio. Since everything else already goes through Istio, we decided not to overcomplicate things, turned on debug mode and went to figure it out.

The first problem we saw when we opened the logs was that all node discovery requests were going to BlackHoleCluster. We declared a ServiceEntry for our headless service, after which the traffic started reaching the DNS service and returning the list of cluster nodes.

apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: artemis-headless
spec:
  exportTo:
  - .
  hosts:
  - artemis-hdls-svc
  location: MESH_INTERNAL
  ports:
  - name: jgroups-7800
    number: 7800
    protocol: TCP
  - name: jgroups-7900
    number: 7900
    protocol: TCP
  resolution: NONE
  workloadSelector:
    labels:
      app: artemis-app

But a similar problem appeared when the nodes communicated with each other. When a broker, having received the list of hosts, started sending out invitations to join the cluster, we were sucked into black holes again. So we declare another ServiceEntry, this time for the cluster hosts. Since we don't know in advance which address a pod will get when it is deployed or scaled, we allow any address (0.0.0.0/0) in the manifest, but limit it to the JGroups ports and the data port used by the pods for clustering and for passing messages between cluster nodes.

apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: artemis-cluster
spec:
  addresses:
  - 0.0.0.0/0
  exportTo:
  - .
  hosts:
  - artemis.hosts
  location: MESH_INTERNAL
  ports:
  - name: cluster
    number: 61617
    protocol: TCP
  - name: jgroups-7800
    number: 7800
    protocol: TCP
  - name: jgroups-7900
    number: 7900
    protocol: TCP
  resolution: NONE
  workloadSelector:
    labels:
      app: artemis-app

The next error was NR filter_chain_not_found. It appeared because PeerAuthentication is set to mtls: STRICT, while the JGroups clustering traffic is not covered by TLS. We set up a DestinationRule for mTLS with Istio certificates on the cluster ports:

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: artemis-clustering-dr
spec:
  exportTo:
  - .
  host: artemis.hosts
  trafficPolicy:
    portLevelSettings:
    - port:
        number: 61617
      tls:
        mode: ISTIO_MUTUAL
    - port:
        number: 7800
      tls:
        mode: ISTIO_MUTUAL
    - port:
        number: 7900
      tls:
        mode: ISTIO_MUTUAL
  workloadSelector:
    matchLabels:
      app: artemis-app

We open the Istio logs and see that traffic has begun to flow through the necessary ports:

info    Envoy proxy is ready
"- - -" 0 - - - "-" 192 0 7747 - "-" "-" "-" "-" "172.21.10.42:7800" outbound|7800|| artemis-hdls-svc 172.21.1.146:59028 172.21.10.42:7800 172.21.1.146:35715 - -
"- - -" 0 - - - "-" 1854 1527 40980 - "-" "-" "-" "-" "127.0.0.1:7800" inbound|7800|| 127.0.0.1:42266 172.21.1.146:7800 172.21.10.179:46534 outbound_.7800_._. artemis-hdls-svc -

In the application logs there is an entry about the formation of a bridge:

artemis-app [Thread-2 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@6b69761b)] INFO  org.apache.activemq.artemis.core.server  - AMQ221027: Bridge ClusterConnectionBridge@6d8264f3 [name=$.artemis.internal.sf.my-cluster.b49f0fb1-53d4-11ef-8504-568fdad344ae, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.b49f0fb1-53d4-11ef-8504-568fdad344ae, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::name=artemis-statefulset-0], temp=false]@13e456bc targetConnector=ServerLocatorImpl (identity=(Cluster-connection-bridge::ClusterConnectionBridge@6d8264f3 [name=$.artemis.internal.sf.my-cluster.b49f0fb1-53d4-11ef-8504-568fdad344ae, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.b49f0fb1-53d4-11ef-8504-568fdad344ae, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::name=artemis-statefulset-0], temp=false]@13e456bc targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=cluster, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory)?enabledProtocols=TLSv1-2,TLSv1-3&port=61617&sslEnabled=false&host=172-21-10-42&verifyHost=false], discoveryGroupConfiguration=null]]::ClusterConnectionImpl@1887326180[nodeUUID=97d07d4c-53d4-11ef-8aab-5e95e30bd562, connector=TransportConfiguration(name=cluster, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory)?enabledProtocols=TLSv1-2,TLSv1-3&port=61617&sslEnabled=false&host=172-21-1-146&verifyHost=false, address=, server=ActiveMQServerImpl::name=artemis-statefulset-0])) [initialConnectors=[TransportConfiguration(name=cluster, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory)?enabledProtocols=TLSv1-2,TLSv1-3&port=61617&sslEnabled=false&host=172-21-10-42&verifyHost=false], discoveryGroupConfiguration=null]] is connected

In the Artemis UI, the cluster topology was updated and a connection was formed between the acceptors of the two nodes.

After restarting one of the cluster pods (.42), the logs again show traffic on the JGroups ports as the topology changes, and communication with the new pod (.179) on the data acceptor port.

"- - -" 0 - - - "-" 192 0 7747 - "-" "-" "-" "-" "172.21.10.42:7800" outbound|7800|| artemis-hdls-svc 172.21.1.146:59028 172.21.10.42:7800 172.21.1.146:35715 - -
"- - -" 0 - - - "-" 1854 1527 40980 - "-" "-" "-" "-" "127.0.0.1:7800" inbound|7800|| 127.0.0.1:42266 172.21.1.146:7800 172.21.10.179:46534 outbound_.7800_._. artemis-hdls-svc -
"- - -" 0 - - - "-" 1281 1292 3226 - "-" "-" "-" "-" "127.0.0.1:7900" inbound|7900|| 127.0.0.1:46662 172.21.1.146:7900 172.21.10.179:52084 outbound_.7900_._. artemis-hdls-svc -
"- - -" 0 - - - "-" 3347 4241 38730 - "-" "-" "-" "-" "127.0.0.1:7800" inbound|7800|| 127.0.0.1:46792 172.21.1.146:7800 172.21.10.179:48670 outbound_.7800_._. artemis-hdls-svc -
"- - -" 0 - - - "-" 1213 1211 1527 - "-" "-" "-" "-" "127.0.0.1:61617" inbound|61617|| 127.0.0.1:59074 172.21.1.146:61617 172.21.10.179:40322 outbound_.61617_._.artemis.hosts -
"- - -" 0 - - - "-" 1213 1211 787 - "-" "-" "-" "-" "127.0.0.1:61617" inbound|61617|| 127.0.0.1:59088 172.21.1.146:61617 172.21.10.179:40350 outbound_.61617_._.artemis.hosts -
"- - -" 0 - - - "-" 1213 1211 1527 - "-" "-" "-" "-" "127.0.0.1:61617" inbound|61617|| 127.0.0.1:59086 172.21.1.146:61617 172.21.10.179:40336 outbound_.61617_._.artemis.hosts -

To make sure the Artemis nodes have time to exchange the messages in their queues during a restart, we need to add an annotation for the Istio sidecar that tells it to wait for network connections to finish before shutting down.

proxy.istio.io/config: |
  proxyMetadata:
    EXIT_ON_ZERO_ACTIVE_CONNECTIONS: 'true'
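
We put this annotation next to the Vault annotations on the StatefulSet pod template, roughly like this (a trimmed sketch; the surrounding fields are abbreviated):

  template:
    metadata:
      annotations:
        proxy.istio.io/config: |
          proxyMetadata:
            EXIT_ON_ZERO_ACTIVE_CONNECTIONS: 'true'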

Results and vectors of development

We received:

  • a fully functional Artemis broker cluster on Kubernetes without using an operator;

  • invaluable experience in setting up and debugging Istio;

  • a little gray hair.

The current implementation of the cluster is limited to non-persistent mode – messages are stored only in the broker's memory and are not written to disk. Our next step is therefore to set up a persistent cluster with storage in a PersistentVolume on disk or in S3. This change will also let us carry over the "encryption at rest" model of message encryption, which is already implemented on the VM.
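
As a rough illustration of that next step (purely a sketch of the direction, not something we have implemented yet), persistent journals could be attached through volumeClaimTemplates in the StatefulSet, with the storage class as a placeholder:

  # inside the StatefulSet spec
  volumeClaimTemplates:
  - metadata:
      name: artemis-data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: my-storage-class   # placeholder
      resources:
        requests:
          storage: 10Gi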

Another area of research and improvement is performance. At the time of writing, we have not yet received final load-testing results that could be compared with the VM setup, but it is already clear that cluster performance in the cloud will be lower than on a VM because of the extra traffic processing and the overhead of running in containers.

To increase fault tolerance, we also plan to deploy a multi-cluster setup stretched across several data centers, and use the same Istio to handle node failures and switch traffic to healthy nodes.

We carry out all of this work within Platform V Synapse Messaging, which is part of Platform V Synapse, a set of cloud products for integrating and orchestrating microservices. It can serve as an import-substitution replacement for any enterprise service bus, provides real-time data processing for business decisions and ties the technologies together into a single production process.
