Sending a chat notification after every SSH login to your servers

Introduction

Have you ever thought about being aware of every login to your servers? I was gripped by exactly that paranoia: what if, while I’m sleeping, a brownie sneaks into my server and does terrible things there? Although password login to our servers is disabled and only I have the SSH keys, this still raised concerns.

In this article, we will deploy several servers in Yandex.Cloud via Terraform, and then use Ansible to configure the necessary software on each server. We will have a main server where Loki (a log aggregation system) and Grafana (a tool for data visualization) will be deployed, and Promtail (an agent for collecting and sending logs) will be installed on the servers that we want to monitor. We will figure out how to track logins to the server, and then send notifications about this to the chat in a convenient format using the above services.

In addition, Grafana can be used for more than just tracking connections to your servers: you can also deploy Node Exporter and Prometheus to keep track of server performance.

The article follows the “Tutorial” format, so every action is described here, from creating the servers to sending a notification. Feel free to skip any step you don’t need.


Building the infrastructure

A real infrastructure usually contains a whole herd of servers, so let’s create several. Terraform will help us with this.

First, let’s create a service account in the cloud with the role admin for the entire cloud:

Photo

Next, we will create an authorized key so that Terraform can manage the infrastructure:

Photo

Click create and download the file in JSON format.
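The downloaded JSON looks roughly like this (values shortened; this is the field set the cloud console produced for me, and it may vary slightly between versions):

{
  "id": "<key_id>",
  "service_account_id": "<service_account_id>",
  "created_at": "<timestamp>",
  "key_algorithm": "RSA_2048",
  "public_key": "-----BEGIN PUBLIC KEY-----...",
  "private_key": "-----BEGIN PRIVATE KEY-----..."
}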

Let’s go to the terminal and configure access to the cloud in yc. First, let’s create a profile. Profiles are used to store configurations for accessing different clouds if you have several of them:

$ yc config profiles create yc-compute-logs
Profile 'yc-compute-logs' created and activated

Let’s set service-account-key (the path to the authorized-key.json we downloaded earlier), cloud-id and folder-id:

$ yc config set cloud-id <your_cloud_id>
$ yc config set folder-id <your_folder_id>
$ yc config set service-account-key <your_path_to_authorized_key>

Let’s make sure that you have configured access to the cloud in yc correctly. Let’s try to get a list of all service accounts:

$ yc iam service-account list
+----------------------+-----------+
|          ID          |   NAME    |
+----------------------+-----------+
| ajevk0p06h138i651oje | terraform |
+----------------------+-----------+

Before creating the virtual machines, let’s generate several SSH keys to access them (ssh-keygen will not create the target directory itself, so we create it first):

mkdir -p ~/.ssh/habr-logs
ssh-keygen -t rsa -b 4096 -C "your_email@example.com" -f ~/.ssh/habr-logs/grafana
ssh-keygen -t rsa -b 4096 -C "your_email@example.com" -f ~/.ssh/habr-logs/node1
ssh-keygen -t rsa -b 4096 -C "your_email@example.com" -f ~/.ssh/habr-logs/node2
ssh-keygen -t rsa -b 4096 -C "your_email@example.com" -f ~/.ssh/habr-logs/node3

Now let’s clone the repository, go to the terraform directory, run the set_env.sh script and execute terraform plan:

$ git clone git@github.com:AzamatKomaev/habr-terraform-ansible-logs.git
$ cd ./habr-terraform-ansible-logs/terraform
$ . ./set_env.sh
You are using yc-compute-logs!
Profile 'yc-compute-logs' activated
$ terraform plan
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:
<output was hidden>
Plan: 12 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + grafana_ip_address = (known after apply)
  + node1_ip_address   = (known after apply)
  + node2_ip_address   = (known after apply)
  + node3_ip_address   = (known after apply)

Terraform has shown which resources the terraform apply command will create. After running it, the following resources will be created:

  • Default network, subnet in the ru-central1-a zone

  • 4 disks (10GB, SSD)

  • 4 virtual machines (5% core fraction, 2 vCPU cores, 2 GB RAM); one such VM resource is sketched below
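For reference, one such VM in Terraform might look roughly like this. This is a simplified sketch using the yandex provider; the resource name, image ID and key path are placeholders, and the real code lives in the repository:

resource "yandex_compute_instance" "node1" {
  name = "node1"
  zone = "ru-central1-a"

  resources {
    cores         = 2
    memory        = 2
    core_fraction = 5   # burstable instance: 5% guaranteed CPU share
  }

  boot_disk {
    initialize_params {
      image_id = "<os_image_id>"
      size     = 10
      type     = "network-ssd"
    }
  }

  network_interface {
    subnet_id = yandex_vpc_subnet.default.id
    nat       = true   # public IP so Ansible can reach the VM
  }

  metadata = {
    ssh-keys = "admin:${file("~/.ssh/habr-logs/node1.pub")}"
  }
}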

Let’s run terraform apply. The servers should appear in the cloud console:

Photo

The terminal should display the server IP addresses:

Apply complete! Resources: 12 added, 0 changed, 0 destroyed.

Outputs:

grafana_ip_address = "158.160.43.15"
node1_ip_address = "51.250.71.111"
node2_ip_address = "158.160.110.84"
node3_ip_address = "158.160.100.121"

We will need them in the future.

Deploy the necessary components via Ansible

Let’s pass the baton to Ansible. With Ansible we can deploy one or more services on several servers at once. Fewer words, more practice: let’s start deploying Grafana and Loki!

Grafana is a data visualization tool. Using Grafana, you can view metrics from various data sources, create dashboards with information, and flexibly configure notifications via chat or email. Loki is a log aggregation system that will serve as a data source for Grafana. The general scheme is this:
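node1 (Promtail) ─┐
node2 (Promtail) ─┼──logs──▶ Loki ◀──queries── Grafana ──alerts──▶ Telegram
node3 (Promtail) ─┘    (both on the grafana server)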

The role of the agent will be performed by Promtail, which we will install on the node1, node2 and node3 servers. Promtail on each virtual machine will watch the contents of the specified files and ship them to a single Loki instance. We can then view these logs using Grafana. In addition, Grafana supports alerting out of the box, so why not take advantage of such great functionality?

Ansible installation and initial configuration

Installing Ansible couldn’t be easier. It can be done in different ways; I chose pip:

$ python3 -m pip install --user ansible

Although Ansible is considered “agentless” (you don’t need to install an agent on each VM), we will need Python installed on each VM.

Next, we need the repository from GitHub. If you skipped the step of creating the VMs in Yandex.Cloud, clone the same repository and go to the ansible directory:

$ git clone git@github.com:AzamatKomaev/habr-terraform-ansible-logs.git
$ cd ./habr-terraform-ansible-logs/ansible

There are many files and directories here; let’s focus on inventory.ini. In this file you specify the IP addresses of the servers on which Ansible will execute the commands you give it. For flexibility, addresses can be grouped by name. In the file I listed the IP addresses of my servers; replace them with yours:

inventory.ini
[grafana]
158.160.43.15    ansible_user=admin
 
[nodes]
51.250.71.111    ansible_user=admin logging_label=node1
158.160.110.84   ansible_user=admin logging_label=node2
158.160.100.121  ansible_user=admin logging_label=node3

With ansible_user we specify which user Ansible should connect and run commands as.

logging_label is a custom variable; we will need it later.

Let’s try to make a test connection to the servers. Let’s ping:

$ ansible -m ping all -i ./inventory.ini
Output
158.160.43.15 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}
51.250.71.111 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}
158.160.100.121 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}
158.160.110.84 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}

Everything is fine! Moving on…

There are several YAML files in the current ansible directory. Each of them contains an Ansible Playbook – a list of tasks that Ansible must perform on specific servers.

One task is one action. Each task names a module: a small program that runs on the server. A list of all modules can be found in the Ansible documentation. We will use the ansible.builtin.* modules; they are enough for our tasks. Instead of the full name, you can use short aliases for this group of modules.

Deploy Loki

Let’s start by installing Loki. For this I created a separate file, loki-installation-playbook.yml, with an Ansible Playbook responsible for installing and running Loki. Loki and Grafana will be installed on the same server. The contents of all further configuration files, with explanations, are hidden in spoilers:

loki-installation-playbook.yml
- name: Install Loki
  hosts: grafana

  tasks:
  - name: Install unzip for unpacking archives 
    apt:
      name: unzip
    become: true

  - name: Install loki using wget and unzip it
    shell:
      chdir: /home/admin
      cmd: |
        wget https://github.com/grafana/loki/releases/download/v2.9.4/loki-linux-amd64.zip
        unzip "loki-linux-amd64.zip"  
        chmod a+x "loki-linux-amd64"
        sudo mv ./loki-linux-amd64 /usr/local/bin/loki

  - name: Install default loki configuration yaml
    shell:
      chdir: /home/admin
      cmd: wget https://raw.githubusercontent.com/grafana/loki/main/cmd/loki/loki-local-config.yaml

  - name: Copy loki.service (unit for systemd) to /etc/systemd/system folder
    copy:
      src: ./resources/loki.service
      dest: /etc/systemd/system
    become: true

  - name: Load loki unit and start it
    systemd:
      name: loki
      state: started
      daemon_reload: true
      enabled: true 
    become: true

At the very beginning we specify the name of the Playbook, as well as the group of hosts from inventory.ini on which the tasks will be executed. Then comes the list of tasks.

Since some tasks require privileged access, such tasks have the become field set to true.

Install unzip for unpacking archives: install unzip to unpack the archive with the loki binary, using the ansible.builtin.apt module.

Install loki using wget and unzip it: download the archive with the loki binary, unpack it and move the binary to one of the directories in $PATH. Here we use the ansible.builtin.shell module, which runs shell commands on the remote server.

Install default loki configuration yaml: download the configuration file needed to launch loki; the default values are enough for us.

Copy loki.service (unit for systemd) to /etc/systemd/system folder: systemd will run loki in the background, so we copy the local loki.service file into the directory with units, using the ansible.builtin.copy module.

Load loki unit and start it: after the systemd unit is copied, it needs to be started. Systemd services can be managed with the ansible.builtin.systemd_service module.
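The loki.service unit itself lives in the repository’s resources directory. It might look roughly like this (a sketch based on the binary and config paths used above, not necessarily the exact file from the repository):

[Unit]
Description=Loki log aggregation service
After=network.target

[Service]
User=admin
ExecStart=/usr/local/bin/loki -config.file=/home/admin/loki-local-config.yaml
Restart=on-failure

[Install]
WantedBy=multi-user.target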

Let’s launch the Playbook. To do this, use the ansible-playbook command, passing it the path to inventory.ini as well as to the playbook itself:

$ ansible-playbook -i ./inventory.ini ./loki-installation-playbook.yml
Output
PLAY [Install Loki] ***********************************************************************************************************************

TASK [Gathering Facts] ********************************************************************************************************************
ok: [158.160.43.15]

TASK [Install unzip for unpacking archives] ***********************************************************************************************
changed: [158.160.43.15]

TASK [Install loki using wget and unzip it] ***********************************************************************************************
changed: [158.160.43.15]

TASK [Install default loki configuration yaml] ********************************************************************************************
changed: [158.160.43.15]

TASK [Copy loki.service (unit for systemd) to /etc/systemd/system folder] *****************************************************************
changed: [158.160.43.15]

TASK [Load loki unit and start it] ********************************************************************************************************
changed: [158.160.43.15]

PLAY RECAP ********************************************************************************************************************************
158.160.43.15              : ok=6    changed=5    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

If the command finished without errors, open http://<grafana_ip_address>:3100/metrics. You should see a list of all metrics:

$ curl http://158.160.43.15:3100/metrics | head -n 10               
Output
# HELP cortex_consul_request_duration_seconds Time spent on consul requests.
# TYPE cortex_consul_request_duration_seconds histogram
cortex_consul_request_duration_seconds_bucket{kv_name="ingester-ring",operation="CAS loop",status_code="200",le="0.005"} 2436
cortex_consul_request_duration_seconds_bucket{kv_name="ingester-ring",operation="CAS loop",status_code="200",le="0.01"} 2436
cortex_consul_request_duration_seconds_bucket{kv_name="ingester-ring",operation="CAS loop",status_code="200",le="0.025"} 2436
cortex_consul_request_duration_seconds_bucket{kv_name="ingester-ring",operation="CAS loop",status_code="200",le="0.05"} 2436
cortex_consul_request_duration_seconds_bucket{kv_name="ingester-ring",operation="CAS loop",status_code="200",le="0.1"} 2436
cortex_consul_request_duration_seconds_bucket{kv_name="ingester-ring",operation="CAS loop",status_code="200",le="0.25"} 2436
cortex_consul_request_duration_seconds_bucket{kv_name="ingester-ring",operation="CAS loop",status_code="200",le="0.5"} 2436
cortex_consul_request_duration_seconds_bucket{kv_name="ingester-ring",operation="CAS loop",status_code="200",le="1"} 2436

Loki deployed successfully!

Deploying Grafana

It’s time for Grafana. Installing its deb package is a little trickier than Loki if you are in Russia… So I used <Comrade Major, don’t look> and placed the downloaded .deb file in the resources directory. GitHub doesn’t allow files larger than 100MB in a repository, so I added it to .gitignore; you will need to obtain the package yourself and put it there.

Let’s look at the contents of grafana-installation-playbook.yml:

grafana-installation-playbook.yml
- name: Install Grafana
  hosts: grafana

  tasks:
  - name: Install musl
    apt:
      name: musl 
    become: true

  - name: Copy grafana deb package from local
    copy:
      src: ./resources/grafana-enterprise_10.3.3_amd64.deb
      dest: /home/admin/

  - name: Install Grafana
    apt:
      deb: /home/admin/grafana-enterprise_10.3.3_amd64.deb
    become: true


  - name: Make sure a grafana-server is running
    systemd_service:
      state: started
      name: grafana-server
    become: true

Install musl: install the necessary package for Grafana.

Copy grafana deb package from local: copy the deb package to the /home/admin directory on the remote server.

Install Grafana: install the previously copied deb package.

Make sure a grafana-server is running: After installation, launch the systemd unit.

Let’s launch Playbook!

$ ansible-playbook -i ./inventory.ini ./grafana-installation-playbook.yml
Output
PLAY [Install Grafana] ********************************************************************************************************************

TASK [Gathering Facts] ********************************************************************************************************************
ok: [158.160.43.15]

TASK [Install musl] ***********************************************************************************************************************
changed: [158.160.43.15]

TASK [Copy grafana deb package from local] ************************************************************************************************
changed: [158.160.43.15]

TASK [Install Grafana] ********************************************************************************************************************
changed: [158.160.43.15]

TASK [Make sure a grafana-server is running] **********************************************************************************************
changed: [158.160.43.15]

PLAY RECAP ********************************************************************************************************************************
158.160.43.15              : ok=5    changed=4    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

After successful completion, open http://<grafana_ip_address>:3000. Log in with the default admin / admin credentials (Grafana will ask you to change the password). Then go to the Connections tab, select the Loki data source and connect it:

Adding a Loki data source to Grafana (lots of screenshots)
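If you prefer configuration as code over clicking through the UI, the same data source can be described with Grafana’s provisioning mechanism. A minimal sketch; the file path is Grafana’s standard provisioning location, and the URL assumes Loki runs on the same host:

# /etc/grafana/provisioning/datasources/loki.yml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://localhost:3100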

As you can see in the last screenshot, the list of labels is empty. This is because Loki is not receiving any logs yet. It’s time to deploy agents on the node1, node2 and node3 servers to collect and deliver logs. The role of the agent will be played by Promtail.

Setting up Firewall

Since logs from other servers will be delivered to Loki, exposing the service publicly is not the best idea. Let’s use ufw, a firewall for Linux. By default, all ports will be closed. We will open port 22 for SSH and port 3000 for access to Grafana, and allow access to port 3100 (Loki) only from the private network. I created a separate Playbook for all the ufw rules:

firewall-setup-playbook.yml
- name: Set up firewall for grafana server
  hosts: grafana

  tasks:
  - name: Set up firewall via ufw
    shell:
      cmd: |
        sudo ufw allow 22
        sudo ufw allow 3000
        sudo ufw allow from 10.0.0.0/8 to any port 3100
        sudo ufw --force enable

Launch the Playbook:

$ ansible-playbook -i ./inventory.ini ./firewall-setup-playbook.yml
Output
PLAY [Set up firewall for grafana server] *************************************************************************************************

TASK [Gathering Facts] ********************************************************************************************************************
ok: [158.160.43.15]

TASK [Set up firewall via ufw] ************************************************************************************************************
changed: [158.160.43.15]

PLAY RECAP ********************************************************************************************************************************
158.160.43.15              : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Let’s take a look at the open ports using nmap:

$ nmap -Pn 158.160.43.15
Starting Nmap 7.80 ( https://nmap.org ) at 2024-02-24 19:17 MSK
Nmap scan report for 158.160.43.15
Host is up (0.11s latency).
Not shown: 998 filtered ports
PORT     STATE SERVICE
22/tcp   open  ssh
3000/tcp open  ppp

As you can see in the terminal, two ports are open. What about Loki? Let’s go to the node1 server and try to make a curl request to Loki:

$ curl http://grafana:3100/metrics | head -n 3
# TYPE cortex_consul_request_duration_seconds histogram
cortex_consul_request_duration_seconds_bucket{kv_name="ingester-ring",operation="CAS loop",status_code="200",le="0.005"} 7796
cortex_consul_request_duration_seconds_bucket{kv_name="ingester-ring",operation="CAS loop",status_code="200",le="0.01"} 7796
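As a side note, the same ufw rules could be written with the dedicated community.general.ufw module instead of raw shell commands, which makes the tasks idempotent. A sketch, assuming the community.general collection is installed:

- name: Allow SSH and Grafana, restrict Loki to the private network
  community.general.ufw:
    rule: allow
    port: "{{ item.port }}"
    from_ip: "{{ item.from | default('any') }}"
  loop:
    - { port: "22" }
    - { port: "3000" }
    - { port: "3100", from: "10.0.0.0/8" }
  become: true

- name: Enable ufw
  community.general.ufw:
    state: enabled
  become: true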

Deploying Promtail

Now we need to configure an agent that will collect and deliver logs to Loki; Promtail will play this role. It needs to be deployed on every server whose logs we want to process. As with Loki, a systemd unit is used to manage Promtail. Note that the Playbook will run on all IP addresses listed in inventory.ini under the nodes group.

But first, let’s create a configuration file for Promtail. In it we specify where logs should be sent (clients[0].url), as well as a scrape_config that will pick up changes to the /var/log/auth.log file and send them to Loki. This file stores all user authentication events. In addition, we can add our own labels: let’s add node_name and node_ip for each server. Grafana will use these labels to determine where a log came from.

promtail-config.yml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://grafana:3100/loki/api/v1/push

scrape_configs:
- job_name: system
  static_configs:
  - targets:
      - localhost
    labels:
      __path__: /var/log/auth.log
      node_name: $NODE_NAME
      node_ip: $NODE_IP

Playbook Contents:

promtail-installation-playbook.yml
- name: Install Promtail
  hosts: nodes

  tasks:
  - name: Install unzip for unpacking archives 
    apt:
      name: unzip
    become: true

  - name: Install promtail using wget and unzip it
    shell:
      chdir: /home/admin
      cmd: |
        wget https://github.com/grafana/loki/releases/download/v2.8.8/promtail-linux-amd64.zip
        unzip "promtail-linux-amd64.zip"  
        chmod a+x "promtail-linux-amd64"
        sudo mv ./promtail-linux-amd64 /usr/local/bin/promtail

  - name: Copy promtail configuration
    copy:
      src: ./resources/promtail-config.yml
      dest: /home/admin/promtail-config.yml.tmp
  
  - name: Set $NODE_NAME in promtail-config.yml
    environment:
      NODE_NAME: "{{ logging_label }}"
      NODE_IP: "{{ ansible_host }}"
    shell:
      chdir: /home/admin
      cmd: envsubst < promtail-config.yml.tmp > promtail-config.yml
      
  - name: Copy promtail.service (unit for systemd) to /etc/systemd/system folder
    copy:
      src: ./resources/promtail.service
      dest: /etc/systemd/system
    become: true

  - name: Load promtail unit and start it
    systemd:
      name: promtail
      state: started
      daemon_reload: true
      enabled: true 
    become: true

Set $NODE_NAME in promtail-config.yml: envsubst (from the gettext-base package) substitutes the values of the variables specified in inventory.ini for $NODE_NAME and $NODE_IP.
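The promtail.service unit referenced by the Playbook is analogous to the Loki one. Roughly (again a sketch, not necessarily the exact file from the repository):

[Unit]
Description=Promtail log shipping agent
After=network.target

[Service]
User=admin
ExecStart=/usr/local/bin/promtail -config.file=/home/admin/promtail-config.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target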

$ ansible-playbook -i ./inventory.ini ./promtail-installation-playbook.yml
Output
PLAY [Install Promtail] *******************************************************************************************************************

TASK [Gathering Facts] ********************************************************************************************************************
ok: [51.250.71.111]
ok: [158.160.110.84]
ok: [158.160.100.121]

TASK [Install unzip for unpacking archives] ***********************************************************************************************
changed: [51.250.71.111]
changed: [158.160.100.121]
changed: [158.160.110.84]

TASK [Install promtail using wget and unzip it] *******************************************************************************************
changed: [158.160.110.84]
changed: [158.160.100.121]
changed: [51.250.71.111]

TASK [Copy promtail configuration] ********************************************************************************************************
changed: [51.250.71.111]
changed: [158.160.100.121]
changed: [158.160.110.84]

TASK [Set $NODE_NAME in promtail-config.yml] **********************************************************************************************
changed: [158.160.110.84]
changed: [51.250.71.111]
changed: [158.160.100.121]

TASK [Copy promtail.service (unit for systemd) to /etc/systemd/system folder] *************************************************************
changed: [158.160.110.84]
changed: [51.250.71.111]
changed: [158.160.100.121]

TASK [Load promtail unit and start it] ****************************************************************************************************
changed: [158.160.100.121]
changed: [158.160.110.84]
changed: [51.250.71.111]

PLAY RECAP ********************************************************************************************************************************
158.160.100.121            : ok=7    changed=6    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
158.160.110.84             : ok=7    changed=6    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
51.250.71.111              : ok=7    changed=6    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0 

Let’s go to the Explore tab in Grafana and try the following query:

{node_name="node1", filename="/var/log/auth.log"}

You should see the contents of the file in the logs. At the same time, you can specify which server you want to receive logs from and which file to look at:

Photo

At this point we say goodbye to Ansible, since all the necessary components have been successfully deployed. Let’s start setting up alerts!

Setting up notification sending

It is worth noting that password login is disabled by default on my servers. Therefore, throughout this article we will monitor SSH key logins, although no one is stopping you from setting up notifications for password logins as well.

/var/log/auth.log

Let’s take a closer look at the contents of the /var/log/auth.log file. There is a lot of information here, but we are interested in lines like this:

2024-02-25 10:16:38.858	Feb 25 07:03:47 node1 sshd[1139]: Accepted publickey for admin from <client_ip_addr> port 34638 ssh2: RSA SHA256:<hidden_content>

This line appears after a successful login. To see a failed attempt, let’s try to log into node1 using the wrong key, the grafana server’s private key:

$ ssh -i ~/.ssh/habr-logs/grafana admin@<node1_ip_address>
admin@51.250.71.111: Permission denied (publickey).

The following appears in the logs:

2024-02-25 10:44:32.126	Feb 25 07:44:31 node1 sshd[3054]: Connection closed by authenticating user admin <client_ip_add> port 48444 [preauth]

In both cases we see the client’s IP address and port, as well as the name of the user being logged in as. But the log format differs slightly, so we will have two alerts: one for successful connections, the second for unsuccessful ones.

Setting up Telegram

Let’s set up sending notifications in Telegram: in my opinion this is the easiest way. Let’s create a bot and add it to our group:

Lots of screenshots
Let’s save the token – it will be needed when adding the Contact Point.

Make sure the bot has access to messages.

Configuring Grafana Alerting entities

I will be configuring all components through the GUI. The repository has a provisioning directory; you can use it to avoid manual configuration. See the Grafana provisioning documentation for details.

Let’s start with Contact Points. A Contact Point defines where notifications are delivered; Grafana supports many Contact Point types for different messengers and services, and we will use Telegram.

For it you need to specify the Bot API Token and the Chat ID. Optionally, we can set other parameters.

Let’s get the Chat ID. After adding the bot to your group, send something to the chat and execute this request:

$ curl https://api.telegram.org/bot<bot_api_token>/getUpdates | jq '.result[-1].message.chat.id'
-1002141029691
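By the way, if you go the provisioning route mentioned above, the Telegram Contact Point can also be described as a file. A sketch based on Grafana’s alerting provisioning format; the name and uid here are arbitrary:

# /etc/grafana/provisioning/alerting/contact-points.yml
apiVersion: 1
contactPoints:
  - orgId: 1
    name: telegram-chat
    receivers:
      - uid: telegram-chat
        type: telegram
        settings:
          bottoken: <bot_api_token>
          chatid: "-1002141029691"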

Let’s create a Contact Point in the Alerting section. First, let’s write our own notification template:

{{ if gt (len .Alerts) 0 }}
{{ range .Alerts }}
{{ if .Labels.node_name }}
Alertname: {{ .Labels.alertname }}
Status: {{ .Labels.status }}
Node name: {{ .Labels.node_name }}
From: {{ .Labels.client_ip }}:{{ .Labels.client_port }}
To: {{ .Labels.user }}@{{ .Labels.node_ip }}
{{ else }}
{{ range .Labels.SortedPairs }}
The name of the label is {{ .Name }}, and the value is {{ .Value }}
{{ end }}
{{ end }}
{{ end }}
{{ end }}

Next, we will create a Loki query that produces the client_ip, client_port, status and user labels. The node_name and node_ip labels are set in the Promtail configuration and are available by default. If an alert contains the node_name label, we send a message with all the details in a convenient format. Otherwise, we simply send a list of all its labels.
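With that template and those labels, a successful-login message would look roughly like this (all values below are made up for illustration):

Alertname: SSHSuccessfulLogin
Status: Accepted
Node name: node1
From: 203.0.113.10:34638
To: admin@51.250.71.111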

Let’s create a Contact Point:

Lots of screenshots

Let’s send a test alert:

Photo

Next up is the Notification policy. It serves as a “bridge” between the alert and the Contact Point. The policy specifies the labels and the contact point:

Photo

Compiling a Loki query

Before setting up notifications, let’s go to Explore and experiment with queries. Let’s get a list of all logs that contain information about logging into all servers:

{filename="/var/log/auth.log"} |= `Accepted publickey`

But this way we will only see a list of logs, which gives us nothing to alert on… Let’s add the count_over_time function, which counts the number of logs over a period of time:

(count_over_time({filename="/var/log/auth.log"} |= `Accepted publickey` [1s]))

Now we have a graph displaying the number of logs for a given period. You can also see what labels the logs have:

Great, we can use these labels in the notification content. But something is missing… information about the connection! Let’s use the pattern operation:

(count_over_time({filename="/var/log/auth.log"} |= `Accepted publickey` | pattern `<_> sshd[<_>]: <status> publickey for <user> from <client_ip> port <client_port> <_>` [1s]))

Let’s look at the labels again:

Additional labels have appeared that provide more detailed information about a successful connection.
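For instance, applied to the successful-login line from /var/log/auth.log shown earlier, the pattern stage extracts roughly these labels:

status      = "Accepted"
user        = "admin"
client_ip   = "<client_ip_addr>"
client_port = "34638"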

Let’s also create a query for failed connections:

count_over_time({filename="/var/log/auth.log"} |= `Connection closed by authenticating` != `root` | pattern `<_> sshd[<_>]: Connection <status> by authenticating user <user> <client_ip> port <client_port> <_>` [1s])

Creating the alerts

Now let’s create two alerts with the queries above. Go to Alerting -> Alert Rules and click create alert. Let’s fill in the fields as follows:

Lots of screenshots

Let’s specify the name of the alert, then the previously composed Loki query. We make the alert trigger if more than zero such logs arrive. You can connect to one of the servers and click Run queries to make sure everything is OK:

Moving on, let’s create a folder, as well as an Evaluation group, and choose an interval of 10s. This means that Grafana will check the state of the alert every 10 seconds. If the alert condition is met, the alert goes into the Pending status. After the time specified in the “Pending period” it switches to the Firing status and is sent to Telegram/Email/another Contact Point.

Summary and Description are optional; we can omit them. In Labels and Notifications it is important to specify labels that match the previously created notification policy. Click Preview Routing to make sure notifications will be routed to the right Contact Point:

This is the alert for successful logins. Let’s make a duplicate for unsuccessful connections and replace the Loki query.

Receiving the alerts

Let’s test!

Let’s try to log in via SSH to one of the servers. After this, the alert went into the Pending state:

We wait 10 seconds and see that the alert has switched to Alerting status:

After a few seconds, you should receive a message in the Telegram chat:

You can connect to two or three servers at once. Then the following alert will come:

Let’s try to log into the server using an invalid SSH key:

Everything is working!

Huge takeaway

Why huge? Because the article turned out to be huge! In this tutorial, we used a lot of DevOps tools: we deployed infrastructure in the cloud via Terraform, configured Grafana components via Ansible, configured Loki, Promtail and Grafana, and ultimately were able to configure sending messages to Telegram chat after logging in via SSH!

Of course, there may be simpler ways to implement what the title promises. But my aim was to show off a number of tools that can be reused for many other purposes…
