Geo-chat, harmful bots and steganography: enriching knowledge about Telegram

What do you know about Telegram chats? And can you spot steganography in a VideoNote (popularly known as a "round" video)? We analyze the very NeoQUEST-2020 task that drew the most questions and exclamations to our support! Spoiler: yes, and there will also be a bit of crypto here 🙂

In the NeoQUEST-2020 legend, we find a link to the Instagram profile of a traveling robot. Nothing out of the ordinary, right? We thought so too, but the task still has to be solved, so we carefully examine all the pictures in the profile and look for any clues. A little meditation on a beautiful picture of Lake Baikal, and we realize that the clue is in the last post:

Thanks to the picture, we understand that we need to somehow connect Lake Baikal (Shaman Rock) and Telegram (“U can join my …” – doesn’t it resemble anything?). At first, we decided not to give the participants a direct hint about the geo-chat (which is exactly what it is!), and many of them successfully coped with the task using an emulator or a mobile device that allows changing the geolocation. A little shamanism: we set the coordinates (53.20074, 107.349426) (you can do it by eye) near Shamanka Rock and prepare for the hardest part – the wait. Telegram handles geolocation oddly and can take up to an hour to pull up the relevant contacts and chats. Our diligence and patience are rewarded in full – the desired chat appears under Contacts -> Find people nearby -> Groups nearby.

Voila, we’re in business!

The bot greets us with a task: a file, some.bytes, of unknown format, in which we can make out the strings “Decrypt me” and “Apocalypse Spares Nobody”.

We understand the first string without any problems, but what does the second mean? Here the participants split into two camps: some emailed us because they had hit a dead end, while others looked closely at the phrase “Apocalypse Spares Nobody” and discerned… what? Right! The good old ASN.1 format (we have already written about how to parse it).

Let’s parse it. Inside are 2 structures. In one, we find a set of bytes marked “Decrypt me”, from which we assume that this is the ciphertext. In the second structure, we see two numbers. It is unlikely that this is a private key generously handed to the participant along with the ciphertext, which means we are most likely dealing with a public key. All the information gathered leads us to the obvious conclusion – why not try RSA?
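For readers who want to peek inside such a file without a dedicated tool, here is a minimal sketch of a DER walker in Python. It is not the parser we used, and it makes simplifying assumptions: single-byte tags and definite lengths only. The sample bytes below are made up for illustration (a SEQUENCE of two INTEGERs) and are not the contents of some.bytes.

```python
def read_tlv(data, offset=0):
    """Return (tag, value_bytes, next_offset) for the DER element at offset."""
    tag = data[offset]
    length = data[offset + 1]
    offset += 2
    if length & 0x80:
        # Long form: the low 7 bits give the number of length bytes that follow.
        n = length & 0x7F
        length = int.from_bytes(data[offset:offset + n], "big")
        offset += n
    return tag, data[offset:offset + length], offset + length


def walk(data, depth=0):
    """List (depth, tag, value) for every element, descending into constructed types."""
    items = []
    offset = 0
    while offset < len(data):
        tag, value, offset = read_tlv(data, offset)
        if tag & 0x20:                      # constructed: SEQUENCE, SET, ...
            items.append((depth, tag, None))
            items += walk(value, depth + 1)
        elif tag == 0x02:                   # INTEGER
            items.append((depth, tag, int.from_bytes(value, "big", signed=True)))
        else:                               # anything else is kept as raw bytes
            items.append((depth, tag, value))
    return items


# Made-up sample: SEQUENCE { INTEGER 5, INTEGER 65537 }
sample = bytes.fromhex("30080201050203010001")
parsed = walk(sample)
```

Two large INTEGERs sitting next to each other in one SEQUENCE are what suggest a modulus and an exponent.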

So, we have a modulus and a public exponent, which, by the way, is quite large. After some frantic reading up on RSA and some thought, we conclude that the private exponent is small, and what does that mean? Bingo! We can definitely play the bad guys and apply Wiener's attack.
We thought it through even for those who do not like cryptography – you could use a ready-made implementation of the attack.
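For the curious, here is a minimal sketch of Wiener's attack in Python. It is an illustration rather than the implementation we linked to: it walks the continued-fraction convergents k/d of e/n and accepts the first d for which the implied φ(n) makes n factor. The function names are our own, and the numbers in the example are a small textbook case (n = 379 · 239), not the values from the task.

```python
from math import isqrt


def convergents(num, den):
    """Yield the continued-fraction convergents (k, d) of num/den."""
    h0, h1 = 0, 1
    k0, k1 = 1, 0
    while den:
        a, r = divmod(num, den)
        h0, h1 = h1, a * h1 + h0
        k0, k1 = k1, a * k1 + k0
        yield h1, k1
        num, den = den, r


def wiener_attack(e, n):
    """Return the private exponent d if it is small enough for the attack, else None."""
    for k, d in convergents(e, n):
        if k == 0 or (e * d - 1) % k:
            continue
        phi = (e * d - 1) // k
        # p and q are the roots of x^2 - (n - phi + 1) x + n = 0
        s = n - phi + 1
        disc = s * s - 4 * n
        if disc < 0:
            continue
        root = isqrt(disc)
        if root * root == disc and (s + root) % 2 == 0:
            return d
    return None


# Toy example: n = 379 * 239, e = 17993
d = wiener_attack(17993, 90581)
```

The attack is only expected to succeed when d is roughly below n^(1/4) / 3, which is why an unusually large public exponent is a useful hint.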

Next, we get the value of the private exponent d = 40553818206320299896275948250950248823966726834704013657854904761789429403771 and decrypt the ciphertext. The plaintext holds two values: a key and a password.

We get the key “nq2020faAeFeGUCBjYf7UDrH9FapFCdFPa4u” for the first part of the task and the password “passCxws3jzYhp0HD5Fy84”, which has to be fed to the bot. The bot can be found among the chat participants under the name @neoquestbot.

Riding the wave of positivity from the first key, we do not immediately realize that the bot is picky about communication and keeps saying that it cannot see its interlocutor:

But the bot happily accepts round VideoNote messages and even answers them … in the same round form:

It seems that both the video and the sound are the same, but only at first glance. What if our bot is giving us some secret signs? To find out, we save the bot’s response and compare it with the original video. For this and the next steps, the FFmpeg package works great. So, let’s see what is there:

The format changed from aac to flac, and the sampling frequency from 44100 Hz to 98000 Hz. Now that we know this, we continue working with the audio.
With a deft movement of the hands we pull it out of the video:

The same can be done with our original message, so that we can compare them later. For clarity, open both tracks in Audacity.

The jump in amplitude in the bot’s audio response immediately catches the eye (it is especially strange if we were completely silent). On closer inspection, we note the clear boundaries between the alternating intervals of wave and silence:

We suggest putting everything else aside and counting a little. Analyze by fragments (times in seconds):
0 – 0.005 – silence
0.005 – 0.01 – wave
0.01 – 0.025 – silence
0.025 – 0.04 – wave
0.04 – 0.045 – silence
The smallest interval is 0.005, and all other intervals are multiples of 0.005.
We take a wave lasting 0.005 for 1, and a silence lasting 0.005 for 0. We get nothing less than binary code!
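The counting can be sketched in Python. This is only an illustration of the rule just described, using the five intervals listed above and taking the third one to end at 0.025 s so that it meets the next; the function name is our own.

```python
UNIT = 0.005  # seconds per bit, the smallest interval observed

# (start, end, kind) for the first intervals read off the waveform
intervals = [
    (0.0, 0.005, "silence"),
    (0.005, 0.01, "wave"),
    (0.01, 0.025, "silence"),
    (0.025, 0.04, "wave"),
    (0.04, 0.045, "silence"),
]


def intervals_to_bits(intervals, unit=UNIT):
    """Expand each interval into as many bits as it has 5 ms units."""
    bits = ""
    for start, end, kind in intervals:
        count = round((end - start) / unit)
        bits += ("1" if kind == "wave" else "0") * count
    return bits


bits = intervals_to_bits(intervals)
first_char = chr(int(bits[:8], 2))
```

The first eight bits already form a printable ASCII letter, which is a good sign that the reading is right.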
We recall that the sampling frequency has changed, and look at the spectrum plot (Analyze -> Plot Spectrum):

We see that the most powerful signal is at a frequency of ~44100 Hz, which is ultrasound.
So from here on we should work only with the high frequencies.
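If you prefer to check this without Audacity, the strength of a single frequency in a block of samples can be estimated by correlating the block with a sine and a cosine at that frequency. The sketch below is an illustration on synthetic data, not part of our solution; `tone_power` is a hypothetical helper name.

```python
import math

SAMPLE_RATE = 98000  # Hz, as reported for the bot's reply


def tone_power(samples, freq, sample_rate=SAMPLE_RATE):
    """Normalized power of `freq` in `samples` (0.25 for a unit-amplitude sine at freq)."""
    c = sum(x * math.cos(2 * math.pi * freq * n / sample_rate)
            for n, x in enumerate(samples))
    s = sum(x * math.sin(2 * math.pi * freq * n / sample_rate)
            for n, x in enumerate(samples))
    return (c * c + s * s) / len(samples) ** 2


# Synthetic 5 ms blocks (490 samples each) to try the helper on
ultrasound = [math.sin(2 * math.pi * 44100 * n / SAMPLE_RATE) for n in range(490)]
audible = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(490)]
silence = [0.0] * 490

p_ultra = tone_power(ultrasound, 44100)
p_audible = tone_power(audible, 44100)
p_silence = tone_power(silence, 44100)
```

A block carrying the 44100 Hz tone scores orders of magnitude higher than a block with ordinary audible sound, so this measure also ignores whatever was in the original recording.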

In fact, the bot superimposes its signal on top of the original audio, which sits in the audible spectrum. Those participants who had sound in the original video noticed this in Audacity.
We cut off the low frequencies with a high-pass filter, either in Audacity or in ffmpeg:
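As a rough illustration, the extraction and filtering can be driven from Python. The command below is a sketch and not necessarily the one we ran: the file names and the 20000 Hz cutoff are assumptions, while `-vn`, `-af highpass=f=...`, `-acodec pcm_s16le` and `-ac 1` are standard ffmpeg options.

```python
import subprocess


def build_extract_cmd(video_path, wav_path, cutoff_hz=20000):
    """Build an ffmpeg command that saves the audio as filtered 16-bit mono WAV."""
    return [
        "ffmpeg", "-y",
        "-i", video_path,
        "-vn",                              # ignore the video stream
        "-af", f"highpass=f={cutoff_hz}",   # keep only what lies above the cutoff
        "-acodec", "pcm_s16le",             # uncompressed 16-bit samples
        "-ac", "1",                         # mono
        wav_path,
    ]


def extract_audio(video_path, wav_path, cutoff_hz=20000):
    """Run the command; requires ffmpeg to be installed and on PATH."""
    subprocess.run(build_extract_cmd(video_path, wav_path, cutoff_hz), check=True)


cmd = build_extract_cmd("bot_answer.mp4", "bot_answer.wav")
```

Calling `extract_audio("bot_answer.mp4", "bot_answer.wav")` would then produce the file used in the following steps.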

So, we have a 16-bit mono wav file. It consists of a header, an uncompressed audio stream, and metadata. The audio stream itself is divided into frames (a frame can hold several samples, one per channel, but that is a different story), in our case 16 bits each (this is indicated by pcm_s16 in the screenshots). Frames are sequences of bits that describe the amplitude of the wave at a point in time for one or more channels (in our case, one). The sampling frequency of the audio stream is 98000 Hz (that is, 98000 frames per second), which gives 98000 × 0.005 = 490 frames per 0.005-second interval. So from here on we follow a simple algorithm: read 490 frames, determine whether they are a wave or silence, and set the bit to 1 or 0 accordingly.

We will use Python and the wave module to parse wav files.
If the error “wave.Error: unknown format: 65534” occurs when opening the file, replace “wFormatTag” in the header, changing ‘FE FF’ to ‘01 00’:

fh = open(input_file, "r+b")
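The line above only opens the file. A fuller sketch of the patch follows, under two assumptions that you should check against your own file: the header has the canonical layout (the “fmt ” chunk comes right after “WAVE”, which puts wFormatTag at byte offset 20), and the samples really are plain PCM. `patch_wformattag` is a hypothetical helper name.

```python
import struct


def patch_wformattag(path):
    """Rewrite wFormatTag 0xFFFE (extensible) to 0x0001 (PCM) in place.

    Returns True if the file was patched, False if the tag was something else.
    """
    with open(path, "r+b") as fh:
        header = fh.read(22)
        canonical = (
            len(header) == 22
            and header[0:4] == b"RIFF"
            and header[8:12] == b"WAVE"
            and header[12:16] == b"fmt "
        )
        if not canonical:
            raise ValueError("unexpected header layout, locate the fmt chunk manually")
        (tag,) = struct.unpack_from("<H", header, 20)
        if tag != 0xFFFE:
            return False
        fh.seek(20)
        fh.write(struct.pack("<H", 1))   # bytes 01 00
        return True
```

Recent Python versions may open such files without the patch; on older ones, run `patch_wformattag(input_file)` on a copy of the file before `wave.open`.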

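So, open the file, process it 490 frames at a time and calculate the average magnitude of each block:

import wave

bits = []
file = wave.open(input_file, "r")
for i in range(file.getnframes() // 490):
    frames = file.readframes(490)  # one 5 ms slot: 490 frames of 2 bytes each
    total = 0
    for k in range(490):
        frame_bytes = frames[k*2:k*2+2]
        # WAV PCM samples are signed little-endian; we only need the magnitude
        total += abs(int.from_bytes(frame_bytes, "little", signed=True))
    # 16000 suits a near full-scale tone; lower it if the signal is quieter
    bits.append(1 if total / 490 > 16000 else 0)
file.close()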

Noise may remain where there should be silence (compare with the picture in Audacity). Therefore, we set a threshold (let it be 16000) above which we consider the signal to be 1.
Then we group the bits into bytes:

decoded = bytearray()  # named "decoded" so that the built-in bytes is not shadowed
for i in range(0, len(bits) - 7, 8):
    byte = 0
    for b in bits[i:i+8]:  # most significant bit first
        byte = (byte << 1) | b
    decoded.append(byte)
print(decoded.decode("ascii", errors="replace"))

If everything is done correctly, the result is the string “Givemethepassword”. Since the bot communicates in round videos using steganography, it is logical to slip it the password (which we received together with the key as a result of decryption) in the same format.
To get started, we compose an audio track with the password. To do this, we use the data obtained while analyzing the bot’s message: the sampling frequency is 98000 Hz; the signal describing each bit lasts 5 ms; the signal frequency corresponding to the bit value “1” is, as we saw from the graphs, 44100 Hz.
Now we need to "generate" silence. We do this by writing zero samples:

sample_rate = 98000.0
def generate_silence(duration_milliseconds=5):
    fragment = []
    num_samples = duration_milliseconds * (sample_rate / 1000.0)
    for x in range(int(num_samples)):
        fragment.append(0.0)  # a zero sample is silence
    return fragment

We will use a sine wave to generate the sound:

import math

# The default values are assumptions taken from the analysis above:
# a 44100 Hz tone lasting 5 ms at full volume.
def generate_sinewave(freq=44100.0, duration_milliseconds=5, volume=1.0):
    fragment = []
    amplitude = volume * 32767.0
    num_samples = duration_milliseconds * (sample_rate / 1000.0)
    for x in range(int(num_samples)):
        fragment.append(amplitude * math.sin(2 * math.pi * freq * ( x / sample_rate )))
    return fragment

Now only a small matter remains: converting the password into bits, and then into sound.

Note: The bot overlays its message on the audio track of the original input video, as mentioned earlier. Therefore, you need to add a few zero bytes after the password in order to shake the whole key out of the bot, and not just its beginning (the key is 36 bytes long).

Sound generation

audio = []
# input_file holds the password followed by the padding zero bytes
with open(input_file, 'rb') as f:
    for character in f.read():
        for shift in range(7, -1, -1):  # most significant bit first
            if (character >> shift) & 1:
                audio += generate_sinewave()  # bit 1: 5 ms of tone
            else:
                audio += generate_silence()   # bit 0: 5 ms of silence

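Now we will create the finished wave file:

import struct
import wave

wav_file = wave.open(output_file, "w")  # output_file is our own name, e.g. "pass.wav"
nchannels = 1
sampwidth = 2
nframes = len(audio)
comptype = "NONE"
compname = "not compressed"
wav_file.setparams((nchannels, sampwidth, int(sample_rate), nframes, comptype, compname))
for sample in audio:
    wav_file.writeframes(struct.pack('<h', int(sample)))  # 16-bit signed, little-endian
wav_file.close()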

We save our track, for example, as pass.wav. Along the way, we check with our stego decoder that the password is recognized. If all is well, we build a new video from the original my_video.mp4 by replacing its audio track with pass.wav (for example, with ffmpeg).

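The self-check mentioned above, running our own track through the decoder before sending it, can be sketched entirely in memory. This is a condensed illustration of the scheme described in this write-up (5 ms slots, a 44100 Hz tone for 1, silence for 0, most significant bit first); the constant names, the full-scale amplitude and the threshold value are our assumptions.

```python
import math

SAMPLE_RATE = 98000   # Hz
SLOT = 490            # samples per 5 ms bit slot
FREQ = 44100.0        # Hz, tone that encodes a 1 bit
AMPLITUDE = 32767     # assumed full-scale 16-bit tone
THRESHOLD = 16000     # mean magnitude that separates tone from silence


def encode(payload):
    """Turn bytes into a list of 16-bit samples, one 5 ms slot per bit."""
    tone = [int(AMPLITUDE * math.sin(2 * math.pi * FREQ * n / SAMPLE_RATE))
            for n in range(SLOT)]
    samples = []
    for byte in payload:
        for shift in range(7, -1, -1):          # most significant bit first
            samples += tone if (byte >> shift) & 1 else [0] * SLOT
    return samples


def decode(samples):
    """Recover bytes by thresholding the mean magnitude of each slot."""
    out = bytearray()
    byte, nbits = 0, 0
    for start in range(0, len(samples) - SLOT + 1, SLOT):
        slot = samples[start:start + SLOT]
        bit = 1 if sum(abs(s) for s in slot) / SLOT > THRESHOLD else 0
        byte = (byte << 1) | bit
        nbits += 1
        if nbits == 8:
            out.append(byte)
            byte, nbits = 0, 0
    return bytes(out)


message = b"pass" + b"\x00" * 2   # stand-in payload with zero padding
recovered = decode(encode(message))
```

If `recovered` differs from `message`, the slot length, bit order or threshold is off, and it is better to find that out before uploading anything to the bot.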
Now we need to make a VideoNote out of this. You can try to find a working bot for that (some of the participants, for example, found @TelescopyBot), or you can write your own bot using the Telegram Bot API.

Either way, we forward the result to our bot:

We get a new round video and congratulations (we would have done such a job!), decode the audio following the scenario already worked out, and get the key: “nq2020SyOMK7SnnJP1sNlvbTs8zt35vUrrsD”

No wonder steganography is considered one of the most difficult areas of cybersecurity – try guessing all these nuances! But the NeoQUEST participants demonstrated excellent dexterity and a sense of humor during this task, so we send them our sincere admiration (along with the bot’s congratulations)!
