“Where is the worst place to leave your secrets?” – what happens to AWS credentials that are left unguarded?
I posted canary tokens on various public online services, logging all access attempts, and discovered intriguing patterns in cybercriminals' attack methods and how they comb the network for other people's credentials.
Canary tokens: a primer
A canary token is a digital trap that signals unauthorized access or activity in a system. Canary tokens work by planting seemingly valuable but fake information (account credentials, API keys, or other sensitive data) in various segments of a network, codebase, or application. When an attacker uses one of these tokens, an alert is triggered, notifying the token owner of a potential breach.
Research method
In this study, I used AWS API credentials as canary tokens, placing them in public places on the Internet. To generate the tokens I used the Canarytokens service from Thinkst. This is a good free service that provides different types of tokens: AWS credentials, DNS tokens, executable tokens and much more. When a token is triggered, details of the incident are sent to you by email.
I decided to place canary tokens in the following places:
Public code/image repositories: GitHub, GitLab, Bitbucket, DockerHub.
Public services under my control: FTP server, web server, blog.
SaaS services: Pastebin, JSFiddle.
Package managers: NPMJS, PyPI.
Cloud storage buckets: AWS S3, GCP Google Cloud Storage.
I generated one AWS token for each of these resources. Every time someone used the token, the service sent me an email with information about an attempt to log into my account.
I tracked everyone who used my canary tokens for unauthorized access to AWS, and collected valuable data: IP addresses, user agents, timestamps, and methods used by the attackers.
I chose AWS credentials as bait so that the data I collected would reflect activity that can definitely be classified as illegal. Other types of tokens, such as DNS tokens, executable tokens, or graphics tokens, can be triggered by someone who simply touches them out of sheer curiosity, which is not necessarily a deliberate violation of the law. Trying to gain access to someone else's AWS account using credentials found on the Internet, however, is clearly an illegal act.
Motivation for research
My motivation for conducting this research stems from a combination of personal curiosity and a desire to remind cybersecurity professionals about a rarely used tool: canary tokens. I find their concept intriguing and cool. The idea of using randomly “left behind” data as digital traps to detect unauthorized access interested me, and I wanted to find out how they would perform in real conditions. I'm also intrigued by how quickly and how often attackers scan public resources and compromise targets using unsecured credentials.
Moreover, I believe that canary tokens are an undervalued resource among information security professionals. Despite their simplicity and effectiveness, these tokens have not been as widely adopted as other tools. By showcasing the ideas and insights gained from my research, I hope to raise awareness of the usefulness of canary tokens and encourage security practitioners to use them in their security strategies.
Research results
Code/image repositories
Code repositories are the most common place where people leave their credentials, and GitHub is obviously the most popular service in this category. For the three code repositories (GitHub, GitLab and Bitbucket) I cloned the Prowler project (an open source cloud auditing tool), added a config file containing a canary token, and pushed the modified code to new repositories that I made public.
For DockerHub, I created a public Docker image with a NodeJS web application that had credentials hardcoded into the source code. Anyone pulling the image can easily see the tokens in the source code. I gave the image a spicy name to make it more attractive to attackers.
In the graph below you can see the number of access attempts per hour for the GitHub and DockerHub tokens from the moment they were published. Bitbucket and GitLab are not shown here: to my surprise, no one showed interest in the tokens published on these platforms.
With GitHub, the situation was exactly the opposite: the first access attempt using canary tokens was made within a few seconds after the project was published. On DockerHub, the first attempt was recorded after 170 hours (~7 days), after which attempts were repeated every few days.
The following chart shows the distribution of IP addresses that tried to use canary tokens on GitHub during the first 500 hours.
Public services under my control
For this category, I launched an EC2 instance on AWS, installed a few services, and exposed it to the internet:
Anonymous FTP Server – I installed an open source FTP server, configured it to allow anonymous access, and placed a canary token file in it.
Web Server – I configured the web server on port 80, added a robots.txt file and placed the token at the path /aws.config. The robots.txt file was supposed to direct scraper bots to /aws.config.
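How a robots.txt entry can act as a signpost is easy to sketch in Python. The snippet below assumes the bait was advertised with a Disallow rule (the post does not say which directive was actually used) and uses example.com as a placeholder host. It illustrates that a crawler reading the file learns the path even while being told not to fetch it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the Disallow rule is an assumption,
# not a copy of the file used in the experiment.
ROBOTS_TXT = """\
User-agent: *
Disallow: /aws.config
"""


def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every path named in a Disallow rule, i.e. what a curious bot might probe."""
    paths = []
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler is told to stay away from the file...
polite_ok = parser.can_fetch("*", "http://example.com/aws.config")
# ...while a scraper hunting for secrets can read the same rule as a hint.
hints = disallowed_paths(ROBOTS_TXT)

print("polite crawler may fetch:", polite_ok)
print("paths advertised to a secret hunter:", hints)
```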
Unfortunately, no one took the bait: I did not record a single access attempt via either the FTP server or the web server. I then decided to make the task easier for the attackers and moved the token to the root directory of the web application, and about a day later I started getting interesting results.
I also created a fake blog post on my website, posing as a guide to connecting to AWS using the CLI. The post gave examples of connecting to AWS, and the credentials in those examples were actually a canary token.
One way or another, the first successful results came only after the token in /aws.config had been moved to the web server's root folder. It took almost 50 hours for scrapers to visit my website and start using the token.
The chart below shows the number of access attempts in the first 600 hours after the token was issued. I compare the canary token in the root directory with the token on Pastebin (more on that in the next section), because they had a comparable number of attempts.
SaaS services
1) Pastebin is an online service that allows users to store and exchange text documents (code fragments, configuration files, logs). Users can create a “paste” by submitting text, which is then stored on the Pastebin server and given a unique URL so it can be shared with other users.
For Pastebin, I tested two tokens: one was in a password-protected paste with the easily guessed password 123456, and the other was in a paste with no password.
2) JSFiddle is an online tool and collaborative web development framework that allows users to write, test, and share code snippets in HTML, CSS, and JavaScript. I created a new piece of code with a hard-coded canary token that simulated a service listing S3 buckets.
The results show that Pastebin is really not a good place to put anything sensitive, at least not without password protection: the unprotected token was discovered and used immediately. Password-protected pastes, on the other hand, apparently are not being cracked, even with a weak password – I received 0 hits for that token.
JSFiddle doesn't seem to be too bad – I also got 0 hits on it. I assume that since it is used for client code, developers are unlikely to often leave anything secret and potentially valuable in the code, so hackers do not monitor it.
Then I wondered what would happen if I published my fiddle link on Pastebin without a password, but even then there were no takers.
Package Managers
Package Managers are tools that automate the installation, updating, configuration and management of software packages. These packages may contain libraries, frameworks, applications, and most importantly, many packages are publicly available.
Finding secrets in such packages is a very realistic scenario: sometimes developers accidentally publish packages with passwords or keys in them, or mistakenly publish a package as public instead of private. As you can see from the data below, if you make a mistake like this, chances are your secrets will fall into the wrong hands in a matter of seconds.
For this section of the study, I chose two popular package managers: PyPI and NPMJS. I created applications with hard-coded canary tokens and published them as packages in these repositories.
The chart below shows the number of access attempts for the NPMJS and PyPI tokens in the first hours after the packages were published.
I believe there are quite a few legitimate services on the Internet that regularly download and run any newly published package. Therefore, some of the results in this section may not actually indicate an illegal access attempt, but rather some kind of service that is automatically executing code.
I probably should have foreseen this scenario and accounted for the fact that canary tokens are triggered not only when an attacker runs the package, but also by such services. On the other hand, my data shows the same IP address making AWS API calls multiple times with tokens left in packages on both PyPI and NPMJS. This behavior is not typical of bots that automatically download public packages; it points to an attacker who was trying to compile a list of repositories and secrets.
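The cross-source check described above can be sketched as a simple grouping of alerts by IP address. The records here are made up for illustration (the addresses come from reserved documentation ranges, and the record layout is an assumption, not the format of the Canarytokens emails).

```python
from collections import defaultdict

# Hypothetical alert records: (source IP, service where the triggered token was planted).
events = [
    ("203.0.113.7", "PyPI"),
    ("203.0.113.7", "NPMJS"),
    ("198.51.100.4", "PyPI"),
    ("203.0.113.7", "PyPI"),
    ("192.0.2.9", "NPMJS"),
]


def ips_seen_on_multiple_sources(records):
    """Map each IP to the token sources it used, keeping only IPs with more than one source."""
    sources_by_ip = defaultdict(set)
    for ip, source in records:
        sources_by_ip[ip].add(source)
    return {
        ip: sorted(sources)
        for ip, sources in sources_by_ip.items()
        if len(sources) > 1
    }


repeat_visitors = ips_seen_on_multiple_sources(events)
print(repeat_visitors)
```

An address that shows up against tokens from several unrelated packages is harder to explain as a one-off sandbox run, which is the reasoning behind treating it as a deliberate collector.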
Buckets
Buckets in AWS (Amazon Web Services) and GCP (Google Cloud Platform) are containers used to store and manage data objects such as files, images, and backups. Leaky buckets sometimes expose credentials: Some people use buckets to store backups and configuration files without realizing that the bucket is configured as public.
In this part of the study, I placed one canary token in public buckets on AWS S3 and GCP Google Cloud Storage.
The reason for including buckets in the study was my somewhat conspiratorial theory that there may be villains in the world who have a method of identifying all public buckets. I hoped that I would be able to find these reptilians and expose their lizard-like habits!
Alas, none of the buckets generated any hits after I made them public. Only when I posted the bucket addresses on Pastebin, GitHub, and my website did I receive a few hits, which looked like someone in the United States trying out a bunch of AWS API functions (I'm guessing it was just a regular person, not a reptilian).
Access speed
As I mentioned, I was interested in learning how quickly hacker bots intercept and gain access to stolen credentials. It turned out that in the case of some services they do it damn fast.
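Measuring this is just a matter of subtracting the publication time from the earliest alert. A minimal sketch, with hypothetical timestamps chosen so that the result matches the roughly 170-hour delay reported above for DockerHub:

```python
from datetime import datetime, timezone


def hours_to_first_hit(published_at, hit_times):
    """Return the hours between publication and the earliest alert, or None if there were no hits."""
    if not hit_times:
        return None
    return (min(hit_times) - published_at).total_seconds() / 3600


# Hypothetical timestamps, not taken from the study's logs.
published = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
hits = [
    datetime(2024, 1, 11, 9, 30, tzinfo=timezone.utc),
    datetime(2024, 1, 8, 14, 0, tzinfo=timezone.utc),
]

delay = hours_to_first_hit(published, hits)
print(f"first hit after {delay:.0f} hours")
```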
Attack patterns
Once you receive a notification that a canary token has been triggered, you can see what action the attacker took that led to its activation. The tokens I left behind were AWS API credentials, so I could track which AWS API event the attackers were trying to trigger. Unfortunately, this is where the canarytokens.org service is a little disappointing – it does not store detailed information about past events (it is not sent by email). In total, I managed to catch about 70% of the events – the rest of the information was lost. Here's a graph broken down by event type:
The InvokeModel event in AWS refers to invoking a machine learning model hosted on AWS; it is the Amazon Bedrock API call for running inference on a foundation model. When this event is fired, the specified model processes the input data and returns its output. I have a few ideas why this has proven to be a popular choice among attackers, but I'll leave it to you to figure it out for yourself.
Distribution of access attempts across services
This chart shows the total number of access attempts across tokens hosted by different services (including services that have had no access).
IP address analysis
A total of 45 unique IP addresses were found in the canary token logs. Here is detailed information about them:
The breakdown of attacker IP addresses by country differs little from what we typically see in other similar studies of cyber activity. Most of the IP addresses are from the United States; several Asian countries also appear. The only surprise for me here is the lack of Chinese IP addresses. However, I would not attach much importance to the origin of the IP addresses, since many attackers probably route their traffic through an automated cloud service in another country as a precaution. AWS Internal and SNS are included in the chart because Canarytokens sometimes lists them as the source of IP addresses.
Analysis of malicious IP addresses
I was also wondering whether the IP addresses would be flagged as malicious by an IP reputation service. VirusTotal provides a free IP scan that checks an IP address's classification using 92 different engines. Each engine classifies the IP into one of four categories: “Clean”, “Unrated”, “Malicious”, “Suspicious”. I ran all 45 unique IP addresses against each of the 92 engines (4140 results in total) and got the following results:
Clean: 1283 (31.0%);
Unrated: 2848 (68.8%);
Suspicious: 2 (0.05%);
Malicious: 7 (0.17%).
The results of this analysis show that relying on IP reputation classification to detect this type of attack is essentially useless because of the high rate of false negatives.
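The arithmetic behind these figures can be rechecked in a few lines of Python. The counts are the ones reported above; the variable names are only illustrative.

```python
# Counts reported in the text: 45 IP addresses checked by 92 engines each.
ip_count = 45
engine_count = 92
counts = {"Clean": 1283, "Unrated": 2848, "Suspicious": 2, "Malicious": 7}

total = ip_count * engine_count
# The four categories should account for every single verdict.
assert sum(counts.values()) == total

# Share of each category as a percentage of all verdicts.
shares = {label: 100 * n / total for label, n in counts.items()}
for label, share in shares.items():
    print(f"{label}: {counts[label]} ({share:.2f}%)")

# Verdicts that would actually have raised a flag on an attacker's IP.
flagged = counts["Suspicious"] + counts["Malicious"]
flagged_share = 100 * flagged / total
print(f"flagged in any way: {flagged} of {total} ({flagged_share:.2f}%)")
```

Put differently, only 9 of the 4140 verdicts pointed in the right direction, even though every one of these addresses had used stolen credentials.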
User agent analysis
User agent data provides insight into how bots accessed AWS. This data is easy to spoof, but it can still help identify attackers by tracking the version number of the software used to access AWS.
The following diagram shows the number of access attempts for each user agent.
Most requests were made using some version of botocore (the core library underlying boto3, the official AWS SDK for Python). There are also a large number of well-known HTTP libraries such as python-requests, axios and aiohttp. This suggests that access attempts were performed automatically using purpose-built tools (rather than manually using the AWS CLI).
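Grouping raw user-agent strings into the client families named above can be sketched as follows. The sample strings are hypothetical and only loosely modelled on what such clients typically send; the family patterns are an assumption, not the classification used for the chart.

```python
import re
from collections import Counter

# Hypothetical user-agent strings, not taken from the study's logs.
sample_agents = [
    "Boto3/1.34.0 md/Botocore#1.34.0 ua/2.0 os/linux lang/python#3.11.4",
    "python-requests/2.31.0",
    "axios/1.6.2",
    "Python/3.11 aiohttp/3.9.1",
    "aws-cli/2.15.0 Python/3.11.6 Linux/5.15 exe/x86_64",
    "Botocore/1.29.0 Python/3.9.2",
]

# Ordered patterns: the first match wins. The CLI is checked before botocore
# because CLI user agents may mention botocore as well.
FAMILIES = [
    ("aws-cli", re.compile(r"aws-cli", re.IGNORECASE)),
    ("botocore", re.compile(r"boto", re.IGNORECASE)),
    ("python-requests", re.compile(r"python-requests", re.IGNORECASE)),
    ("axios", re.compile(r"axios", re.IGNORECASE)),
    ("aiohttp", re.compile(r"aiohttp", re.IGNORECASE)),
]


def classify(user_agent: str) -> str:
    """Return the first matching client family, or 'other' if none matches."""
    for family, pattern in FAMILIES:
        if pattern.search(user_agent):
            return family
    return "other"


family_counts = Counter(classify(agent) for agent in sample_agents)
for family, count in family_counts.most_common():
    print(f"{family}: {count}")
```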
Final Thoughts
Things that surprised me
Analysis of the collected data revealed two unexpected discoveries:
No attempts to use the tokens on Bitbucket and GitLab. Going into the experiment, I was confident that tokens on these services would be found quite quickly – perhaps not as quickly as on GitHub, but I still did not expect 0 hits. I'm still not sure how to explain this: perhaps these services are less popular, or they are harder for scraper bots to crawl.
Some tokens were snatched up and used almost immediately after they were posted. The NPMJS token was captured in less than a minute (including the additional time for logging into the account and detecting the access attempt). The situation was much the same with GitHub and PyPI – tokens on these platforms were activated within a couple of minutes of posting.
Opinion about canary tokens
As I mentioned at the beginning, canary tokens are an underrated tool that allows you to cheaply and quickly create an additional layer of security for an application or IT product. Although simple, they can play a critical role in detecting unauthorized access and potential threats.
By strategically placing canary tokens in your systems, you can proactively identify malicious activity, respond to it faster, and mitigate the consequences. Deploying canary tokens requires minimal effort and cost, but can significantly improve your security.
I explain how to get started with this tool yourself in another post.
Conclusions from the experiment
The main conclusion: there are groups of attackers who very effectively use web scrapers to “vacuum” the Internet for exposed secrets. Most likely, attackers will pick up your token or password within minutes or hours, depending on which service you left it on.
If you or your company have mistakenly left account keys on a public service, rotate them immediately and investigate whether the compromised credentials have been used maliciously.
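The same kind of pattern matching that scrapers presumably rely on can be turned around and run against your own files before publishing them. The sketch below is a heuristic, not a complete secret scanner: it assumes access key IDs of the usual shape (a four-letter prefix such as AKIA for long-term keys or ASIA for temporary ones, followed by 16 uppercase letters or digits) and uses the placeholder key from the AWS documentation as sample input.

```python
import re

# Heuristic pattern for AWS access key IDs; it will miss other kinds of secrets.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def find_access_key_ids(text: str) -> list[str]:
    """Return every substring that looks like an AWS access key ID."""
    return ACCESS_KEY_ID.findall(text)


# Sample config using the well-known example credentials from the AWS docs.
sample_config = """\
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
region = us-east-1
"""

found = find_access_key_ids(sample_config)
if found:
    print("possible AWS access key IDs:", found)
```

Running a check like this over a repository or package before publishing is cheap, and it catches the most recognizable part of a leaked credential pair.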