PyTorch discloses a malicious dependency chain
PyTorch has identified a malicious dependency that shares its name with the framework's torchtriton library. The package led to a successful compromise through the dependency confusion attack vector.
The PyTorch maintainers warn users who installed PyTorch-nightly over the holidays to uninstall the framework and its counterfeit torchtriton dependency.
The open source PyTorch ML framework has been widely adopted in commercial and academic fields, from computer vision to natural language processing.
Malicious torchtriton targets PyTorch-nightly users
As the PyTorch team warns, users who installed PyTorch-nightly from December 25 to December 30, 2022 should ensure that their systems are not compromised.
The warning came after a torchtriton dependency appeared over the holidays on the Python Package Index (PyPI) registry, the official repository of third-party software for Python.
“Please uninstall [PyTorch] and torchtriton immediately, and use the latest nightly binaries (newer than December 30, 2022),” the PyTorch team advises.
The malicious torchtriton on PyPI has the same name as the official library published in the PyTorch-nightly repository. When dependencies are resolved in the Python ecosystem, PyPI normally takes precedence, so the malicious package is downloaded instead of PyTorch's legitimate one.
“Because the PyPI index takes precedence, this malicious package was installed instead of the version from our official repository. This design allows anyone to register a package with the same name as one that exists in a third-party index, and pip will install their version by default,” the PyTorch team wrote in a disclosure published yesterday.
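The mechanism the PyTorch team describes can be pictured with a toy model. The sketch below is not pip's actual resolver. It assumes only that candidates from every configured index are pooled and that the highest version wins, regardless of which index published it. The index names and version numbers are hypothetical and chosen purely for illustration.

```python
# Toy model of dependency confusion. This is NOT pip's real resolver:
# it assumes candidates from all configured indexes are pooled and the
# highest version wins, whichever index it came from.
from typing import Dict, List, Tuple

Version = Tuple[int, ...]
Index = Dict[str, List[Version]]  # package name -> published versions


def resolve(name: str, indexes: Dict[str, Index]) -> Tuple[str, Version]:
    """Return (index_name, version) of the candidate the toy resolver picks."""
    candidates = [
        (version, index_name)
        for index_name, packages in indexes.items()
        for version in packages.get(name, [])
    ]
    if not candidates:
        raise LookupError(f"{name} not found in any index")
    # Tuples compare by version first, so the highest version is selected.
    version, index_name = max(candidates)
    return index_name, version


# Hypothetical version numbers, for illustration only.
indexes = {
    "pytorch-nightly": {"torchtriton": [(2, 0, 0)]},
    "pypi": {"torchtriton": [(3, 0, 0)]},  # name registered by a third party
}

print(resolve("torchtriton", indexes))
```

In this model the PyPI candidate is selected simply because it advertises a higher version, even though the project only ever published the package in its own index.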
At the time of writing, BleepingComputer observed that over the past week [December 25 to 31] the malicious torchtriton dependency had been downloaded more than 2,300 times.
This type of supply chain attack is known as “dependency confusion”. BleepingComputer first reported on dependency confusion in 2021, when the attack vector was popularized by white-hat hacker Alex Birsan.
PyTorch states that users of stable PyTorch packages are not affected by the vulnerability.
Hacker steals sensitive data while claiming proper research ethics
The malicious torchtriton not only probes the system for basic identifying information (e.g., IP address, username, current working directory) but also steals sensitive data:
- Gets system information:
  - name servers from /etc/resolv.conf;
  - hostname from gethostname();
  - current username from getlogin();
  - the name of the current working directory from getcwd();
  - environment variables.
- Reads the following files:
  - /etc/hosts;
  - /etc/passwd;
  - the first 1000 files in $HOME/*;
  - $HOME/.gitconfig;
  - $HOME/.ssh/*.
It then uploads all of this data, including the file contents, to the h4ck.cfd domain.
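To get a sense of what that list covers on a given machine, a defender could take a purely local inventory. The following is a minimal sketch with a hypothetical helper, exposure_inventory. It gathers only names and counts, reads no file contents, and sends nothing over the network. It uses getpass.getuser() rather than getlogin() because the latter can fail without a controlling terminal, and it makes no claim to match the malware's exact file selection.

```python
import getpass
import os
import socket
from pathlib import Path
from typing import Dict, Optional


def current_user() -> str:
    """Best-effort username lookup that does not require a terminal."""
    try:
        return getpass.getuser()
    except (KeyError, OSError, ImportError):
        return "unknown"


def exposure_inventory(home: Optional[Path] = None) -> Dict[str, object]:
    """Summarize locally what the collection steps listed above could reach."""
    home = Path(home) if home is not None else Path.home()
    ssh_dir = home / ".ssh"
    ssh_files = sorted(p.name for p in ssh_dir.glob("*")) if ssh_dir.is_dir() else []
    home_files = [p for p in home.glob("*") if p.is_file()]
    return {
        "hostname": socket.gethostname(),
        "username": current_user(),
        "cwd": os.getcwd(),
        "env_var_names": sorted(os.environ),      # names only, no values
        "has_gitconfig": (home / ".gitconfig").is_file(),
        "ssh_file_names": ssh_files,              # names only, no key material
        "home_file_count": min(len(home_files), 1000),  # capped like the list above
    }


if __name__ == "__main__":
    for key, value in exposure_inventory().items():
        print(f"{key}: {value}")
```

Any SSH keys or tokens that show up in such an inventory on a machine that installed the affected nightly builds are reasonable candidates for rotation.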
The PyTorch team explains that the malicious triton binary in the fake torchtriton is executed only when the user imports the triton package. This requires explicit code and is not PyTorch's default behavior.
A notice on the h4ck.cfd domain implies that the entire operation is ethical research, but the analysis clearly indicates otherwise.
“Hello, if you stumbled upon this in your logs, it is probably because your Python was misconfigured and was vulnerable to a dependency confusion attack. To identify vulnerable companies, the script sends me metadata about the host (such as the hostname and current working directory). Once I have identified who is vulnerable and [reported] the finding, all metadata about your server will be deleted.”
Contrary to this statement, the binary collects not only “metadata” but also the sensitive information mentioned above, including SSH keys, the .gitconfig, hosts, and passwd files, and the contents of the first 1000 files in the HOME directory.
BleepingComputer obtained a copy of the malware, which, according to VirusTotal, had a clean reputation at the time of writing. But don't be fooled.
We noticed that, unlike many research packages and PoC exploits, which are open about their goals and behavior, torchtriton uses well-known anti-VM techniques to evade detection. More importantly, the malicious payload is obfuscated and contained entirely in binary form, that is, in Linux ELF files. This sets the library apart from earlier white-hat dependency confusion exploits, which were delivered in plain text.
We also noticed that the sample reads .bash_history (a list of the commands and input the user typed in the terminal), which is another hallmark of malware.
This isn't the first time a hacker has claimed to be doing white-hat research after being caught exfiltrating secrets.
In mid-2022, the highly popular Python and PHP libraries ctx and PHPass, respectively, were compromised and modified to steal AWS keys. The researcher behind the attack later claimed that it was ethical research.
For the avoidance of doubt, we have contacted the owner of h4ck.cfd for clarification. Public records show the domain was registered through Namecheap on December 21, just days before the incident.
Below is the full text of the statement. We received it from the owner of the domain, who, apparently, is also associated with the wheezy.io domain.
Please note that the mention of “Facebook” (this organization is recognized as extremist and banned in Russia) below is apt given PyTorch's origins at Meta AI (Meta is also recognized as extremist and banned in Russia).
“Hi, I am the person who claimed the torchtriton package on PyPI. Please note that it was not intended to be malicious! I understand that I could have done better and not sent all of the user data. The reason I sent more metadata is that in the past, when investigating dependency confusion issues, in many cases it was not possible to identify victims by hostname, username, and CWD. For this reason, this time I decided to send more data, but looking back, I see that this was the wrong decision and I should have been more careful.
I admit my fault and offer my apologies. At the same time, I want to assure you that it was not my intention to steal anyone's secrets. I already reported this vulnerability to Facebook on December 29 (almost three days before the announcement), after I was convinced that the vulnerability really existed. I have also made numerous reports to other companies affected by this vulnerability through their HackerOne programs. If I had had malicious intent, I would not have filed any bug bounty reports and would simply have sold the data to the highest bidder.
Once again, I apologize for any disruption and assure you that all data I received has been deleted.
By the way, in my bug report to Facebook, I already offered to transfer the PyPI package to them, but so far I have not received any response.”
Response
The PyTorch team has renamed the torchtriton dependency to pytorch-triton and reserved a stub package on PyPI to prevent similar attacks. The team is also seeking to claim ownership of the existing torchtriton on PyPI to defuse the current attack.
PyTorch renames dependency to avoid further attacks
To remove a chain of malicious dependencies, users should run the following commands:
$ pip3 uninstall -y torch torchvision torchaudio torchtriton
$ pip3 cache purge
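After running these commands, one way to confirm that no torchtriton distribution remains in the active environment is to query the installed-package metadata. This is a minimal sketch using the standard importlib.metadata module, and installed_version is a hypothetical helper name. Since the official nightly dependency carried the same name before the rename, finding torchtriton installed does not by itself prove a compromise.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Both the old and the renamed dependency are checked for comparison.
for name in ("torchtriton", "pytorch-triton"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

The check only covers the interpreter and environment it runs in, so it would need to be repeated for each virtual environment in use.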
Running the following command will search for the malicious binary and show if you are under attack:
python3 -c "import pathlib;import importlib.util;s=importlib.util.find_spec('triton');
affected=any(x.name == 'triton' for x in (pathlib.Path(s.submodule_search_locations[0]
if s is not None else '/' ) / 'runtime').glob('*'));
print('You are {}affected'.format('' if affected else 'not '))"
The SHA256 hash of the “triton” ELF binary is 2385b29489cd9e35f92c072780f903ae2e517ed422eae67246ae50a5cc738a0e.
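A suspicious file can be compared against this hash with a few lines of Python. The sketch below uses only the standard hashlib module. The function names are illustrative, and the path to check has to be supplied by the reader.

```python
import hashlib
from pathlib import Path

# SHA256 of the malicious "triton" ELF binary, as quoted above.
MALICIOUS_SHA256 = "2385b29489cd9e35f92c072780f903ae2e517ed422eae67246ae50a5cc738a0e"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_known_malicious(path: Path) -> bool:
    """True only if the file matches the published hash exactly."""
    return sha256_of(path) == MALICIOUS_SHA256
```

Going by the detection command above, the file to check would be the one named triton inside the runtime directory of the installed triton package. A mismatch only rules out this specific binary, not a modified variant.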