Deepvoice = Deep Trouble. A new attack scheme using generated voices of friends and colleagues

At the beginning of 2024, a new attack scheme targeting individuals appeared in Russia: attackers extort money using the voices of relatives and friends, and in corporate fraud, using the voices of executives.

Voice generation has already appeared in the Fake Boss scheme, in the scheme with a fake photo of a bank card, and in social media account hijacking. Most cases occur on Telegram: scammers hack an account, generate the account owner's voice, and send a short voice message to every chat asking to send money.

We have analyzed the new scheme step by step: we walk through examples of attacks on people and, at the end, give advice on how to protect your accounts and resist the scheme.

How scammers gain trust through the voices of friends, relatives and colleagues

Deep voice phishing is an attack in which money or confidential information is extorted from the victim using a fake voice.

Since the beginning of the year, deep voice phishing has been actively used in the Fake Boss scheme, which we have already covered on the blog. If at the end of 2023 victims received only regular text messages from "managers," now voice messages are added to them with a note like "the call isn't going through, I'll explain by voice." In February, identity and voice theft was reported by the mayors of Ryazan, Vladivostok and Krasnodar.

In the deep voice scheme, ordinary people end up as victims on both sides: the one who receives the voice messages and the one on whose behalf they are sent. Most often, such attacks are carried out on WhatsApp, VK and Telegram, with the latter being the most popular platform. Over the past month, the Solar external digital threat monitoring center discovered more than 500 new domain names imitating Telegram. Let's walk through the scheme step by step using this messenger as an example:

How people are attacked using voice messages on Telegram

1. Account hacking

Using social engineering, attackers obtain account credentials through fake voting pages, a wave of which was observed at the end of 2023, and fake Telegram Premium subscriptions.
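
Such pages usually live on domains that imitate the real service, like the 500+ Telegram lookalikes mentioned above. Below is a minimal sketch of how such domains can be flagged automatically, using only Python's standard library; the legitimate-domain list, the similarity threshold and the sample domains are illustrative assumptions, not data from the article or from Solar's monitoring.

```python
from difflib import SequenceMatcher

# Domains considered legitimate -- an illustrative list, not exhaustive.
LEGITIMATE = ("telegram.org", "web.telegram.org", "t.me")

def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two domain names."""
    return SequenceMatcher(None, a, b).ratio()

def looks_like_phishing(domain: str, threshold: float = 0.75) -> bool:
    """Flag a domain that closely resembles a legitimate one but is not it."""
    if domain in LEGITIMATE:
        return False
    return any(similarity(domain, legit) >= threshold for legit in LEGITIMATE)

# Sample candidates, chosen only to illustrate the check.
for candidate in ("telegrarn.org", "te1egram.org", "example.com"):
    verdict = "suspicious" if looks_like_phishing(candidate) else "ok"
    print(f"{candidate}: {verdict}")
```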

2. Collection of material

After logging into the account, attackers download voice messages. If there are not enough of them, they collect voice samples from social networks.

In January 2024, an employee in Hong Kong transferred $25 million to scammers after a video call with deepfakes; in total he made 15 transactions. First, the attackers sent him a phishing email asking to discuss confidential transfers and attached a link to a call.

On the call, digital copies of senior management used familiar voices to convince the victim to transfer money to several accounts in Hong Kong. The fraud came to light only a week later, when the company discovered the losses and reported them to the police. To recreate the meeting participants, the scammers used publicly available video footage.

3. Voice generation

Previously, voice messages were cut together from the limited set of recordings the scammers had. Now AI tools need only a few samples to generate believable speech on demand.

Fraudsters use the cloned voice in two ways:

  • They prepare recordings in advance to play during a call and hang up before the person starts asking questions. On Telegram, scammers draft text messages asking to borrow money and several voice messages to handle objections, such as "It's me, want me to answer a question?"

  • They create a model that processes voice in real time. Neural networks study speech patterns, intonation, speaking rate and other characteristics to build a convincing voice profile.

When the user speaks into the microphone, the algorithm analyzes the voice → converts it into text → turns the text into a spectrogram → generates a new spectrogram in the target voice → converts it back into sound. Depending on the scammers' access to other AI tools, video copies of people may be added to the voice fakes.
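
To make this pipeline more concrete, here is a toy sketch of the analysis → spectrogram → vocoder path in Python. It assumes numpy and librosa are installed, uses a synthetic tone instead of real speech, and replaces the actual voice-conversion model with an identity step, so it only illustrates the shape of the signal flow, not a working voice clone.

```python
import numpy as np
import librosa

SR = 16_000  # sample rate, Hz

# 1. "Microphone input": a one-second synthetic tone stands in for speech.
t = np.linspace(0, 1.0, SR, endpoint=False)
speech = (0.5 * np.sin(2 * np.pi * 220 * t)).astype(np.float32)

# 2. Analysis: convert the waveform into a mel spectrogram, the intermediate
#    representation that generation models typically operate on.
mel = librosa.feature.melspectrogram(
    y=speech, sr=SR, n_fft=1024, hop_length=256, n_mels=80
)

# 3. Generation: a real attack would pass the content through a model that
#    outputs a new spectrogram in the target voice. Here it is a no-op.
generated_mel = mel  # placeholder for the voice-conversion model

# 4. Vocoder: turn the spectrogram back into an audible waveform
#    (Griffin-Lim here; neural vocoders do this far more convincingly).
fake_audio = librosa.feature.inverse.mel_to_audio(
    generated_mel, sr=SR, n_fft=1024, hop_length=256
)

print(f"input: {speech.shape[0]} samples, output: {fake_audio.shape[0]} samples")
```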

4. Generating a photo of a bank card

Using OSINT (open source intelligence), fraudsters find the victim's passport details and generate a photo of a card from a well-known bank bearing the sender's name and surname. The photo is created by a special Telegram bot; the same scheme, but without voice messages, became popular in the summer of 2023. The number shown on the card is linked to the fraudster's account at another bank, usually a little-known one.

5. Attack

Fraudsters use the voice messages in different ways: some send one at the very beginning of the conversation to provoke an immediate reaction, others send it during the conversation, at the moment when the recipient most doubts that what is happening is genuine.

In the second case, scammers first build trust and start a dialogue with text messages: they may pick up a conversation already in progress or bring up an old topic they find in the correspondence. Once interest is piqued, a fake voice message is sent in the personal chat and in all group chats where the account owner is a member.

The victim becomes aware of the fraud only after transferring the money and contacting the real account owner

Here are some examples of successful attacks via voice messages since the beginning of the year:

| Victim | Amount given to scammers | Whose voice they reacted to | Form of the attack |
|---|---|---|---|
| 11-year-old girl | 275 thousand rubles | Grandmother | Phone call |
| Young woman | 100 thousand rubles | Sister's husband | Voice message on Telegram |
| Young woman | 40 thousand rubles | Girlfriend | Voice message on Telegram |
| Pensioner | 100 thousand rubles | Neighbor | Voice message (messenger not named) |

The scheme is dangerous because scammers combine several markers of the sender's identity: the account, the voice and the bank card. This makes the sender appear more trustworthy.

Attacks on employees combine several communication channels: email, a text message in a messenger, and an audio message. The scammer, posing as a top manager or partner, sends the victim an email about an important contract that must be paid by the end of the day. The attacker then generates a voice phrase like "We need the money now, don't delay the deal!", applying pressure through urgency and emphasizing authority.

In 2020, a group of 17 criminals in the UAE stole $35 million from a bank this way. A bank employee believed a director's fake voice on a call: supposedly the company was preparing a major deal and the money had to be sent quickly to its "new" accounts. To make it more convincing, the criminals sent emails on behalf of the company and even impersonated its "lawyer."

Being a successful scammer just got easier

Fraudsters are not becoming smarter or more skilled, but the technology is. Voice neural networks are still poor at distinguishing neutral requests from fraudulent ones, which makes them easy for attackers to abuse. The AI will not refuse a request to generate the phrase "Mom, put some money on this card," because the phrase itself is harmless and not subject to restrictions.

The situation with text neural networks is a little better. If you directly ask the AI to write a phishing email, it will refuse.

ChatGPT refuses to write a phishing email

But if you look for weaknesses in the restrictions and strip out the key phrases, then voilà: ChatGPT is ready to help scammers:

ChatGPT compromised its principles

In early 2024, OpenAI and Microsoft disabled accounts associated with the hacking groups Charcoal Typhoon, Salmon Typhoon, Crimson Sandstorm, Emerald Sleet and Forest Blizzard, according to reports from both companies.

The hackers used OpenAI language models to research cybersecurity tools, gather intelligence, and create content for phishing attacks. Both companies said they intend to improve their methods of countering hacking groups: they will invest in monitoring technologies, collaborate with other AI companies, and increase transparency about possible AI-related security issues.

How to protect yourself from fake voice attacks

Here is what people are usually advised to do to avoid becoming victims of scammers:

  • delete voice messages immediately after listening;

  • set automatic deletion of all correspondence;

  • do not post videos with voice online;

  • listen for noise and unnatural intonation in recordings;

  • agree on a secret passphrase with your loved ones for such cases.

The problem with such advice is that it is difficult to follow. People value their private messages and don't always want to delete conversations. Only those who post nothing at all can control their every step on social networks; everyone else is bound to miss something. And unnatural intonation and noise in a recording may go unnoticed if the listener is on the street or in another noisy place at that moment.

In the examples victims have shared with the media, the voice recordings are very short, up to 10 seconds. Short recordings contain far fewer errors in timbre, intonation and speech features, while long ones are more likely to drift away from the original voice. It would seem logical to ask for a longer voice message, but under pressure the victim will not think of this option, and will not want to seem tactless when a person is asking for help.

Here is what we suggest instead:

  • contact the sender of the message via an alternative communication channel. If a person cannot receive an audio/video call, write in another messenger, send an SMS or contact through mutual friends or colleagues.

  • don't be afraid to lose time. Whether the voice message asking for money came from work or from loved ones, an extra 10–20 minutes for verification will not lead to disaster. It's better to pause and play it safe.

  • make your social networks private. Yes, some of your data is already online, but you can limit access to your voice samples and your list of friends through your privacy settings on social networks.

In a corporate environment, you can train employees to identify deepfakes using interactive courses; we developed one such course in Start AWR. In the course "Deepfake – an ideal tool for scammers," employees will learn:

  • the types, purposes and threats posed by deepfakes;

  • deepfake phishing methods;

  • methods for detecting deepfakes in video and audio;

  • rules for protecting against this type of fraud.

The course collects audio and video of the most popular deepfakes and explains in simple language how they are created and where they are used

Start a free 30-day pilot of Start AWR: assign employees the deepfake course and evaluate the results.
