Top 10 Linux Artifacts for Incident Investigation

Lada Antipova from the Angara SOC cyber forensics team has prepared new material on useful tools for investigating hacker attacks. Our colleagues from Positive Technologies have kindly published it on their resources, so we are making it available to our audience as well.

Although Windows remains the most common OS, and attackers are well aware of this, other systems cannot be ignored, especially Linux (okay, GNU/Linux). Today, Russian companies are increasingly using Linux for reasons of import substitution, though it is still most common as a server platform.

Now imagine the situation: you are working on a Linux machine, and suddenly something clearly goes wrong. CPU load has spiked, connections to unknown resources have started appearing, or the www-data user has suddenly turned up in the wheel group. What do you do?

Making a list of commands

You have two options: either work with the live system, or do what is called post-analysis. Let's start with the first one. I will give my list of favorite and most frequently used commands. There is no universal order here: you, as an information security specialist, set the priorities yourself.

So, the commands:

In addition to basic commands for viewing text files (and we all know that in Linux, essentially, everything is a file) like cat ~/.bash_history, you can use more advanced options:

tail -n 15 /var/log/<file> — by default tail outputs the last 10 lines, but the number can be changed with the -n parameter.

tail -f -s 5 /var/log/<file> — used to follow new lines as they appear, similar to running watch over the same logs. It can be useful for tracking certain events in real time.

grep – without this command, the previous two may be useless. I won’t tire of repeating that grep is our best friend, and not knowing its syntax at least at a basic level is simply a crime. In information security, this is as important as knowing how to Google.

For example, user login history can be viewed in different ways. If we are dealing with the SSH protocol (one of the most popular methods for remotely managing a host), we can use:

cat /var/log/auth.log | grep "sshd". Filters authentication logs exclusively by the sshd daemon.

cat /var/log/auth.log | grep -iE "session opened for|accepted password|new session|not in sudoers". In addition to basic information about the beginning and end of a session, authentication logs may contain sudo incidents—violations of the rules specified in the /etc/sudoers file. Attackers do not always use clever methods of local privilege escalation (for example, targeted exploitation of system vulnerabilities), so they can sometimes be tracked through sudo command errors.

Basic parameters: -v (invert the match, i.e. exclude lines), -i (case-insensitive), -E (extended regular expressions, as in egrep), -P (regular expressions too, but Perl-compatible), -a (treat a binary file as text). I would also add -l, -w, -r, -o, but I don't really want to rewrite the man page. Let's leave that as homework 🙂
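
For instance, a few combinations of these flags over the same auth.log (the file path and search strings here are only illustrative):

grep -i "failed password" /var/log/auth.log                  # case-insensitive match
grep -vE "CRON|systemd-logind" /var/log/auth.log             # drop noisy lines with an extended regex
grep -aoP 'user \K\S+' /var/log/auth.log | sort | uniq -c    # Perl regex: extract just the user names, treating the file as text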

find will help you locate the files you need not only by name or mask, extension, size and access rights, but also by timestamps. For example, find / -mtime -2 -ls lists files modified within the last two days.

In practice, there are often situations when everything starts from a single event. Suppose information security specialists recorded the execution of a suspicious file on host N on January 1 at 10 a.m. (the pivot point). In this case, we can define a window around that timestamp to find out what happened in the system before and after. To do this we use the command find -newerct "01 Jan 2023 09:00:00" ! -newerct "01 Jan 2023 11:00:00" -ls | sort.

Note that find can not only search for files but also perform actions on them. For example, you can hash the web scripts in a given directory and then check the hashes for recently planted web shells: find . -type f -exec sha256sum {} \; 2> /dev/null | grep -Ei '\.php|\.jsp' | sort.

ps auxwf — a process tree. Everyone is used to ps aux, but notice: here we get a tree.

netstat -plunto shows current network connections, the processes behind them and socket addresses. netstat is part of the net-tools package, so it may need to be installed, which is not always possible. The alternative is ss -tupln: the ss utility will also list the names of the processes holding current TCP/UDP connections.

last -Faiwx displays the most recent login sessions of system users (it reads /var/log/wtmp);

ls -lta – the directory listing familiar to everyone. An important nuance is the -t parameter, which sorts by modification time;

lsof -V is used to understand which files are open and by which processes. You can also narrow the search (and see the deleted-files trick after this list):

  • lsof -i TCP:22 — search for processes running on a specific port

  • lsof -i -u root — network activity plus the files opened by the root user (by default lsof ORs these conditions; add -a to require both at once)

  • lsof -p 1 – search by PID
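
One handy trick while we are on lsof (a sketch, assuming a stock lsof build): processes sometimes keep deleted files open, for example a dropped binary removed from disk right after launch, and lsof can surface them:

lsof +L1                      # files with a link count below 1, i.e. deleted but still open
lsof -nP | grep -i deleted    # alternative: look for the "(deleted)" marker in the output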

stat ― the last command on the list; it shows file timestamps (and also, for example, owner information). There are some quirks here: the ext2/ext3 file systems have no file creation timestamp, while ext4 does. For that case there is statx (extended stat): it relies on a different system call, which appeared only in kernel 4.11, and this has to be taken into account. Additionally, you can install istat from The Sleuth Kit, which works at the inode level.
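
As a quick illustration with GNU coreutils stat (the target file is arbitrary), a custom format string makes the individual timestamps easy to compare:

stat -c 'file=%n owner=%U atime=%x mtime=%y ctime=%z birth=%w' /etc/passwd
# %w prints "-" when the file system does not store a creation time (e.g. ext3)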

Looking for anomalies

So, we know the commands, but what exactly do we want to find? Remember: we are always looking for anomalies. There is no universal recipe here; each system and infrastructure has its own. For example, you know that your admins are happy people and sleep soundly at night, so if you notice a login at two in the morning, it’s worth double-checking.

Pay attention to each parameter: user data, whether it is a pseudo-terminal or not, the duration of the session, its start and end time, the source host.

Network connections (primarily established ones) are usually easier – especially if your system, in principle, should not interact with the outside world. But anomalous connections are not always about external addresses. Therefore, it is important to understand what is happening in our system, what application software is installed, how it works and what it interacts with. This will make it much easier to identify illegitimate activity.

We use artifacts

What if we know for sure that the host is compromised? What data do we need first? Here is my top 10 of such artifacts.

1. First of all, as input we have basic information (for example, hostname and IP address). Most likely we already know it ourselves or have received it from the administrators. Start by making sure this data is reliable, and also check all network interfaces (are they all up, are there public addresses?). Separately, look at the configured time zone: this matters when analyzing an incident, since system logs are usually written in the local time zone, while application logs are a more complicated story.

Additionally, though not in the very first pass, I would grab the routing table and firewall rules (iptables/netfilter) as well as the /proc/ directory, which contains the command lines of processes (/proc/[PID]/cmdline). This can be useful later: for example, to identify a file that has been deleted while its process is still running.
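
A rough sketch of that first pass (the commands assume a typical systemd-based distribution, and <PID> is a placeholder for the process you are interested in):

hostname && ip a && timedatectl                 # host name, interfaces, time zone
ip route && iptables -L -n -v                   # routing table and firewall rules
cat /proc/<PID>/cmdline | tr '\0' ' '; echo     # command line of a suspicious process (arguments are NUL-separated)
ls -l /proc/<PID>/exe                           # the symlink still points to the binary even if it was deleted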

2. Log files. There are many logs, but not all of them are equally useful. It is hard to limit yourself to a fixed list and forget about the rest, yet there are a number of main logs that deserve attention first. Below I list specific files and directories, but remember that there are differences depending on whether your system is RedHat-like or Debian-like. For example, for authentication logs it is /var/log/auth.log on Debian and /var/log/secure on RedHat.

  • /var/log/lastlog – records of the last login time for each user

  • /var/log/btmp – unsuccessful logins (this file makes it convenient to spot brute-force attempts; see the sketch after this list)

  • /var/log/wtmp – all user logins and logouts since this file was created

  • /var/log/cron – cron task scheduler events. What tasks were created, completed, etc. and when?

  • /var/log/apt/history.log (/var/log/dpkg.log) – package manipulations. We pay attention not only to currently installed applications, but also to those installed previously. This is useful when you need to check the data retrospectively. For a faster search, you can use the following commands:

· cat /var/log/dpkg.log | grep installed

· cat /var/lib/dpkg/status | grep -E "Package:|Status:"

  • /var/log/audit/audit.log – good when auditd is installed. And it’s even better if it’s configured and works with the right configuration file.
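
Since wtmp, btmp and lastlog are binary files, they are read with dedicated tools rather than cat; a minimal sketch over the paths listed above:

last -f /var/log/wtmp           # logins/logouts from wtmp
lastb -f /var/log/btmp          # failed logins (usually requires root)
lastlog                         # last login per user, from /var/log/lastlog
utmpdump /var/log/wtmp          # raw dump, useful if the file looks tampered with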

3. History of entered commands: ~/.bash_history. You need not only to be able to read logs, but also to understand commands, their consequences and potential capabilities (do you still remember that with less you can not only read files but also make changes and even get a full interactive shell? No? Then go study GTFOBins). By default, command history is stored without timestamps, which can cause inconveniences later: not everyone ships logs off the host in real time, let alone the commands entered by users. You can save yourself the trouble by adding timestamps to every recorded command:

echo 'export HISTTIMEFORMAT="%d.%m.%Y %H:%M:%S "' >> ~/.bash_profile

4. Working with the file system deserves special attention. Windows (or rather, the NTFS file system) has a master file table, the MFT; Linux has no such single file. However, you can generate an equivalent yourself with the command below, then build your own timeline from it or feed it into a super timeline:

find "${MOUNTPOINT}" -xdev -print0 | xargs -0  -c "%Y %X %Z %A %U %G %n" >> timestamps.dat

Pay attention to recently modified files, files changed within the time frame of the incident, and hidden files (beginning with a "."), which may be either legitimate or deliberately hidden by attackers. Check where symbolic links lead and how large the files are: a large volume may indicate that data has been staged for exfiltration or already uploaded. This way you can also discover attacker tools. An equally important point is permissions: first of all the setuid and setgid bits, plus execute and write flags on individual files. By the way, this is also how you can spot files with clearly excessive rights (a sketch follows below).
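
A few illustrative find one-liners for exactly these permission checks (the scope / is deliberately broad here; narrow it on real systems):

find / -xdev -type f -perm -4000 -ls 2>/dev/null                 # setuid binaries
find / -xdev -type f -perm -2000 -ls 2>/dev/null                 # setgid binaries
find / -xdev -type f -perm -0002 -ls 2>/dev/null                 # world-writable files
find /home -xdev -type f -name ".*" -newermt "2023-01-01" -ls    # recently modified hidden files (date is just an example)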

5. To understand the system more fully, you need to know who works with it – time for accounts. How many real users are there (keep in mind that installing application software usually creates an account with the same or a similar name)? What rights do they have, and are they granted with any granularity, or is every user in the sudo/wheel group? If a web service runs on this server, its account should not have a full-fledged command interpreter like /bin/bash or /bin/sh. Spotted something like that? Then figure out whether it is legacy or appeared quite recently. You can find such users and double-check them like this:

find . -name passwd -exec grep -P -H 'sh$' {} \; >> ../processed/shell-users.txt

Additionally, I check the cryptographic strength of passwords (stored in the /etc/shadow file) using John the Ripper or hashcat. What comes next depends on the result: at a minimum, you can inform the administrator or service owner that weak passwords are bad practice; at a maximum, you find another link in the chain of unacceptable events. And of course, don't forget to look into the users' home directories.
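
A minimal John the Ripper sketch, assuming you have pulled local copies of passwd and shadow and have some wordlist at hand (the file paths below are placeholders):

unshadow passwd.copy shadow.copy > hashes.txt                  # merge passwd and shadow into John's input format
john --wordlist=/usr/share/wordlists/rockyou.txt hashes.txt    # dictionary attack
john --show hashes.txt                                         # list the passwords cracked so far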

6. The most interesting part is publicly accessible applications. To begin with, it is worth clarifying for yourself, at least at a basic level, the architecture of the service under study: what security tools are used (and whether they exist at all), what the technology stack is (and whether there are vulnerable components), and also study the configuration files and logs (their location can be set separately in the configuration file). When viewing the logs, we again analyze every parameter. For example, the access.log of a web service: we build a picture over time (a significant spike in requests at a certain moment – was it just a crawler passing by, or not?). Alternatively, you can extract all unique IP addresses with a suitable regular expression (there is no need to rigorously validate every octet: anything fundamentally wrong is easy to discard manually) and collect statistics:

grep -E -o "(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)" file.txt | sort | uniq -c | sort

Pay special attention to the “leaders”, both at the beginning and at the end of the list. Additionally, you can enrich the data with information about the location of these addresses.
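
For a quick enrichment pass, something along these lines may be enough (a rough sketch; whois output fields differ between regional registries):

grep -E -o "([0-9]{1,3}\.){3}[0-9]{1,3}" access.log | sort -u > ips.txt
while read -r ip; do
    echo "== $ip"
    whois "$ip" | grep -iE 'country|netname|org-?name'   # field names vary across RIRs
done < ips.txt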

Similarly, we highlight unique URIs and the frequency of their requests, the size and time of server responses, as well as the User-Agent of the client application. Some tools explicitly put their name in this field, and few bother to change it. It is difficult to come up with a universal command here, but there are a couple of life hacks. For example, to print specific columns you can use awk '{print $1, $3}' file, which outputs columns 1 and 3 of the given file.

Note that by default awk uses space as a delimiter. To change the delimiter character, use the -F option. But be careful: if there is more than one character in the parameter, the entire expression will be treated as a regular expression.
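
For example, pulling users and shells out of /etc/passwd, or User-Agent strings out of an access log in the combined format (the field number is an assumption that depends on your exact log format):

awk -F: '{print $1, $7}' /etc/passwd | sort                              # user and login shell
awk -F'"' '{print $6}' access.log | sort | uniq -c | sort -rn | head     # User-Agent is the 6th "-separated field in the combined format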

Also, don't forget to look into the directories the application actually serves (if we are dealing with a web application, is there a web shell or two lying around?). Besides the access log access.log, do not neglect the error log error.log: application errors also show up when malicious actions are attempted.

7. Let's move on to a separate branch – persistence techniques. Whatever one may say, they are better checked separately. There really are a lot of places to look; just take a look at the map:

Figure 1. Linux Persistence Map by Pepe Berba

Most likely, you will have to work through the entire diagram, although it all depends on the incident: you should not rush thoughtlessly from one item to another. This task can be automated, but at a basic level we first check scheduled cron tasks and services – these are the most common techniques (a sketch of these checks follows below).
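
A basic-level sketch of exactly those two checks (a systemd-based distribution and root privileges are assumed):

ls -la /etc/cron* /var/spool/cron* 2>/dev/null                                 # system-wide and per-user cron jobs
for u in $(cut -d: -f1 /etc/passwd); do crontab -l -u "$u" 2>/dev/null; done   # crontab of every account
systemctl list-unit-files --type=service --state=enabled                       # services that start on boot
systemctl list-timers --all                                                    # systemd timers, the modern cron replacement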

You should always have an idea of which service is responsible for what. Again, knowing what is going on in your system is extremely important.

8. Next – configuration files stored in the /etc/ directory. Of course, you have already noticed changes while working with the file system (step 4), but nobody has cancelled timestomping, right? Make sure that all the application software you know about works as it should, especially home-grown applications. By the way, remember that log files can be rotated? It is useful to clarify right away which ones and on what schedule – in /etc/logrotate.conf and /etc/logrotate.d/*.

9. I cannot fail to mention application data ― separate files created while an application runs, most often per user. They are stored in home directories and hidden from casual view.

Usually these are the directories ~/.local/share/<app>/* and ~/.config/<app>/* – in accordance with the XDG Base Directory specification. Some applications ignore this standard and create their own hidden directories right in the root of the home directory (~/.<app>), while temporary and cache files end up in ~/.cache/<app>/*. Here you can find not only browser history and text editor settings, but also more interesting things: for example, the history of commands and visited directories in Midnight Commander, or the same command history executed through the previously mentioned less (GTFOBins!).
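
A quick way to enumerate such locations across home directories (a sketch; adjust the depth and paths to your layout):

find /home /root -maxdepth 2 -name ".*" -type d -ls 2>/dev/null     # hidden directories directly under user homes
find /home /root -xdev -name "*history*" -type f -ls 2>/dev/null    # shell, MySQL, Midnight Commander and other history files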

10. In a Windows environment, directories popular with attackers include %APPDATA%, %PROGRAMDATA% and %TEMP%. Linux has its favorites too:

  • /tmp and /var/tmp – directories of temporary files

  • /dev/shm – the so-called tmpfs (a virtual-memory file system used, among other things, for the shared-memory IPC mechanism)

  • /var/run – runtime data describing the system since it was last booted (/var/run is a symbolic link to /run; both exist for compatibility reasons)

  • /var/spool is a directory of temporary data awaiting post-processing, after which it will be deleted.

If your investigation reaches a dead end, I recommend making sure that there is nothing unnecessary in these directories.
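
A quick pass over these locations might look like this (purely illustrative):

ls -latR /tmp /var/tmp /dev/shm 2>/dev/null | head -100                                                   # recent contents of the classic temp locations
find /tmp /var/tmp /dev/shm /var/spool -xdev -type f \( -executable -o -name "*.sh" \) -ls 2>/dev/null    # executables and scripts hiding there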

Collecting triage

We have identified the most critical areas to check first, and it turned out to be quite a lot. But what if there is more than one machine? How do we collect it all, and how do we analyze it?

Anyone who has done incident response is familiar with the term "triage" – a collection of the most significant forensic artifacts. Even though we are part forensic examiners, collecting full disk images en masse during the active phase of a response is a disaster. Put simply, there is a list of places a forensic examiner will look at first even when a complete image of the system is available; that list is the triage. A full image is usually redundant for the primary analysis, especially when every minute counts.

There are plenty of ready-made tools for collecting triage, but you can also write a collection script yourself, weighing the properties that matter to you, such as cross-platform support or compatibility. Don't forget, though, that most of this has long since been written for us: the main thing is to search, verify before use and, if necessary, extend. GitHub is full of great tools.

I would like to draw your attention to two utilities – CatScale and uac. The first is convenient because it collects almost everything mentioned above, neatly archives the collected data into separate files and can then parse it with the bundled script. It ships with Logstash configs out of the box, so you don't have to do the parsing work twice. Kamil Kamaletdinov from our team has written more about triage utilities.

uac collects everything and then some: for example, not just the modified files in /etc/ but the whole directory, which can be extremely useful. The utility also works well with the already mentioned statx and generates an excellent bodyfile. Another important advantage is broad support for *nix-like systems (macOS, AIX, ESXi, etc.).

If you need to collect data en masse, orchestrators such as Ansible, HashiCorp Nomad, Chef, Puppet or SaltStack will help. It is good if some centralized management tool is already configured; otherwise you will have to improvise a few crutches of your own.

Analyzing the system

Let's move on to analysis. Given that analysis of Linux systems usually comes down to scrupulous study of logs, we start with basic commands like tail, more, less and grep. One level up, we add find to search and process multiple files (remember the -exec parameter?). We also use sort together with uniq and wc. They usually go hand in hand: before uniq you need to sort everything with sort, and wc is useful for counting statistics. I will also mention sed and awk: the first is needed for at least basic substitutions, the second for printing a specific column (at the very least!). We looked at examples earlier. And of course, grep with regular expressions! I urge you not to be afraid of them: I know they look scary at first, but they help out enormously, so I strongly recommend accepting and loving them. And don't forget about piping (the pipe is the "|" symbol) – passing the output stream of one command to another.
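
Putting several of these together, a typical one-liner over an authentication log might look like this (the path and pattern are illustrative):

# top 10 source addresses of failed SSH logins
grep -i "failed password" /var/log/auth.log | sed -E 's/.*from ([0-9.]+).*/\1/' | sort | uniq -c | sort -rn | head
wc -l /var/log/auth.log    # overall size of the log, for context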

If the command line in all its manifestations is not for you, a great alternative is advanced text editors such as Notepad++ (which everyone seems to know already) and EmEditor. The main advantages of the latter are the ability to open and quickly analyze very large files, flexible filtering and search, and convenient work with DSV (delimiter-separated values) files.

With a large amount of data we will sooner or later hit a ceiling where even minimal visualization becomes impossible. Here you cannot do without Elastic (the ELK stack), Splunk or Graylog – pick whichever is closer to you. The main thing to remember is that Elastic with Kibana can work perfectly well without cumbersome hand-written Logstash configs thanks to built-in grok patterns. Before writing something yourself, don't forget to Google: most of it was invented long before you, and for you.
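
To give a feel for grok, here is a rough sketch of a pattern for successful SSH logins in auth.log; the field names are my own choice, and the exact message format varies between distributions:

%{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME:host} sshd\[%{POSINT:pid}\]: Accepted %{WORD:auth_method} for %{USERNAME:user} from %{IP:src_ip} port %{POSINT:src_port}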

Another useful utility is ChopChopGo, a Linux log triage tool in the spirit of Chainsaw and Hayabusa, which have proven excellent for bulk analysis of Windows event logs (via Sigma rules). One nuance: at the time of writing, the utility cannot load data from third-party systems. You can try an alternative – Zircolite for auditd and Sysmon for Linux logs. And, if you like, you can always use the same Sigma rules with a converter.

Instead of a conclusion

Whatever one may say, sooner or later you will have to script. You realize you are facing the same task for the umpteenth time, and repeating the same actions gets boring, so we automate everything we can: in Bash, Python and other high-level languages. Some will even find it convenient to do all the work in PowerShell, despite dealing with data from another OS. It makes no difference; the main thing is to understand what needs to be done and to know your capabilities, your skills and your time frame. When the house behind you is on fire, it is not the moment to start learning Python just because everyone around is praising it.
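
As a tiny illustration of the idea, here is a Bash sketch that bundles a few of the live-response commands from this article into one timestamped archive (not a replacement for full-fledged triage tools):

#!/usr/bin/env bash
# quick-look collector: dumps a handful of volatile artifacts into one directory
set -u
out="triage_$(hostname)_$(date +%Y%m%d_%H%M%S)"
mkdir -p "$out"

ps auxwf     > "$out/processes.txt"
ss -tupln    > "$out/sockets.txt"    2>&1
last -Faiwx  > "$out/logins.txt"     2>&1
lsof -nP     > "$out/open_files.txt" 2>&1
find /tmp /var/tmp /dev/shm -xdev -type f -ls > "$out/tmp_files.txt" 2>/dev/null

tar czf "$out.tar.gz" "$out" && echo "collected into $out.tar.gz"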

How to investigate incidents in Linux?

  1. When responding "here and now," we determine the order of our actions depending on the incident and, using the commands described above or similar ones, look for anomalies in the system. These are unique to each infrastructure: we look for whatever is unusual for ours. Depending on the artifact under study, we pay attention to every parameter, be it user data, the duration of their session, or open connections.

  2. We use artifacts – data that will be needed first. Most important:
    basic information about the system
    log files
    history of commands entered
    file system
    user data
    applications published on the Internet
    persistence locations
    configuration files
    application data
    and the most popular directories among attackers.

  3. When conducting a remote response, or in the case of post-analysis, we collect triage (the key significant data) using existing tools or our own script.

  4. We analyze the system in a similar way, first of all examining the previously identified artifacts.
