Audio-graphic encryption or how to hide the sound in a picture

To my great surprise, I found nothing of the kind. Either nobody has done it, or they did and never shared it with the world, or I searched badly (Friday evening, you know). In any case, the flywheel of my determination had already started spinning. So let's go in order. (The complete project code is here.) First, some examples to whet the appetite:

And here is a ten-second recording of my cold-ridden voice

Encryption method

First of all, I thought through the encryption scheme itself. How can sound be put into an image conveniently, economically and painlessly? I won't go over every idea that visited the palaces of my mind; I'll describe only the method I eventually chose. Here is a diagram for convenience, followed by block-by-block explanations.

The first step is to get the sample rate (srate) and the data (the amplitude value of each sample in the audio track) using scipy.io.wavfile. In Python it looks like this:

srate, data = wavfile.read(file)
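One caveat worth sketching: `wavfile.read` returns a one-dimensional array for mono files and a two-dimensional array of shape (samples, channels) for multi-channel ones, while the conversion loop in this article expects one number per element. The helper `to_mono` below is hypothetical and not part of the project; it assumes integer PCM data and simply averages the channels.

```python
import numpy as np


def to_mono(data: np.ndarray) -> np.ndarray:
    """Average the channels of a (samples, channels) array; pass 1-D input through."""
    if data.ndim == 1:
        return data
    # Average in a wider integer type, then cast back to the original dtype.
    return data.astype(np.int32).mean(axis=1).astype(data.dtype)


# Two stereo frames of 16-bit samples (made-up values for illustration).
stereo = np.array([[100, 200], [-100, -300]], dtype=np.int16)
print(to_mono(stereo).tolist())   # [150, -200]
```

Averaging is only one possible choice; keeping a single channel would work just as well for this experiment.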

Now the second step. Loop through the data list and build a hex code from each value. For example, from the value 544 we might get #544abd. The letters are salt: they are generated randomly from the symbols a to e and padded around the digits until the code is six characters long. Here, by the way, the first problem shows up. Amplitude values can be negative, and a hex code has no place for a minus sign. The solution is simple: we replace '-' with 'f' and watch for this flag when decrypting. The final implementation looks like this:

for elem in s_arr:
    # Randomly decide whether the salt goes after or before the digits
    gate = np.random.choice([False, True])
    salt_pos = "".join([np.random.choice(buffer_symbols) for _ in range(6 - len(str(elem)))])
    salt_neg = "".join([np.random.choice(buffer_symbols) for _ in range(6 - len(str(elem)))])

    if elem >= 0:
        if gate:
            app = f'#{elem}' + salt_pos
        else:
            app = f'#{salt_pos}{elem}'
    else:
        # 'f' stands in for the minus sign
        if gate:
            app = f'#f{elem*-1}' + salt_neg
        else:
            app = f'#f{salt_neg}{elem*-1}'
    new_arr.append(app)
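To make the salting and the sign flag concrete, here is a minimal single-sample sketch together with its inverse. The names `sample_to_hex` and `hex_to_sample` are hypothetical and not from the project. The sketch uses the standard `random` module in place of `np.random.choice` and assumes 16-bit sample values.

```python
import random
import re

SALT_SYMBOLS = "abcde"  # same letters as buffer_symbols; 'f' is reserved for the sign flag


def sample_to_hex(value: int, rng: random.Random) -> str:
    """Encode one amplitude value as a salted six-character hex colour."""
    digits = str(abs(value))
    flag = "f" if value < 0 else ""
    # Pad with random salt letters so that flag + digits + salt is exactly 6 characters.
    salt = "".join(rng.choice(SALT_SYMBOLS) for _ in range(6 - len(flag) - len(digits)))
    # The "gate" decides whether the salt goes after or before the digits.
    body = digits + salt if rng.random() < 0.5 else salt + digits
    return f"#{flag}{body}"


def hex_to_sample(code: str) -> int:
    """Invert sample_to_hex: the only decimal digits in the code are the amplitude."""
    value = int(re.findall(r"\d+", code)[0])
    return -value if code[1].lower() == "f" else value


rng = random.Random(0)
for v in (544, -544, 0, 32767, -32767):
    code = sample_to_hex(v, rng)
    assert len(code) == 7 and hex_to_sample(code) == v
```

Because the salt never contains a digit or the letter f, the first run of decimal digits is always the amplitude, wherever the gate placed the salt.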

In the same step, we convert the hex codes in new_arr to RGB. These are the pixel colors of the future image. Don't forget to store the sample rate in the last pixel. Next, we reshape the array into a square and create the image from it using Image.fromarray from the PIL library (Pillow):

p_arr = np.array([list(hex2rgb(x)) for x in new_arr] + [[0,0,0] for x in range(delta_res - 1)] +[[srate_rgb, srate_rgb, srate_rgb]])
p_arr = p_arr.reshape(resolution, resolution, 3)
p_arr = p_arr.astype(np.uint8)
    
img = Image.fromarray(p_arr)
img.save(f'{file[:-4]}_encoded.png')
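The snippets in this article call `hex2rgb` and `rgb2hex` without showing them. Below is a minimal pure-Python sketch of what such helpers could look like. This is an assumption about their behaviour, not the project's actual implementation.

```python
from typing import Tuple


def hex2rgb(code: str) -> Tuple[int, int, int]:
    """Convert '#rrggbb' to an (r, g, b) tuple of 0-255 integers."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def rgb2hex(r: int, g: int, b: int) -> str:
    """Convert 0-255 channel values back to a lowercase '#rrggbb' string."""
    # int() also accepts NumPy integer scalars such as np.uint8.
    return f"#{int(r):02x}{int(g):02x}{int(b):02x}"


pixel = hex2rgb("#544abc")
print(pixel)             # (84, 74, 188)
print(rgb2hex(*pixel))   # #544abc
```

The round trip has to be exact, since the decoder relies on getting back the same characters that the encoder wrote.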

Wrapping all of the above into a function, we get:

import math
import re

import numpy as np
from PIL import Image
from scipy.io import wavfile


def encode(file: str) -> None:
    
    """ Audio track encoding function.
        It takes a .wav file as input and encodes first in HEX and then in RGB.
        The output is an image with audio encoded in it
    """
    srate , s_arr = wavfile.read(file)
    # The resolution is the square root of the number of amplitude values (s_arr, data)
    # Don't ask why, it just seemed very convenient to me
    resolution = math.ceil(np.sqrt(len(s_arr)))
    delta_res = resolution**2 - len(s_arr)
    
    new_arr = []

    buffer_symbols = ['a', 'b', 'c', 'd', 'e']
    
    # The sample rate is a large number, so we take its cube root
    srate_rgb = int(srate ** (1/3))
    
    # And here is the loop that converts amplitude values to hex codes
    for elem in s_arr:   
        gate = np.random.choice([False, True])
        app = None
        
        # The salt values are random and differ for negative and positive values
        salt_pos="".join([np.random.choice(buffer_symbols) for _ in range(6-len(str(elem)))])
        salt_neg = ''.join([np.random.choice(buffer_symbols) for _ in range(6-len(str(elem)))])
        
        if elem >= 0:
            
            if gate:
                app = f'#{elem}' + salt_pos
                new_arr.append(app)
            else:
                app = f'#{salt_pos}{elem}'
                new_arr.append(app)  
        else:
            
            if gate:
            
                app = f'#f{elem*-1}' + salt_neg
                new_arr.append(app)
                
            else:
                app = f'#f{salt_neg}{elem*-1}'
                new_arr.append(app)
    # Embed the sample rate in the last pixel
    p_arr = np.array([list(hex2rgb(x)) for x in new_arr] + [[0,0,0] for x in range(delta_res - 1)] + [[srate_rgb, srate_rgb, srate_rgb]])
    # Change the shape
    p_arr = p_arr.reshape(resolution, resolution, 3)
    p_arr = p_arr.astype(np.uint8)
    
    # Create the image
    img = Image.fromarray(p_arr)
    img.save(f'{file[:-4]}_encoded.png')
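The relation between the number of samples and the image size can be worked through in isolation. The sketch below mirrors the `resolution` and `delta_res` calculation from encode(). The function name `image_layout` and the ten-second example are illustrative assumptions.

```python
import math


def image_layout(n_samples: int) -> tuple:
    """Return (side, padding) for a square image holding n_samples pixels."""
    side = math.ceil(math.sqrt(n_samples))
    # Pixels left over after the samples; the last of them carries the sample rate.
    padding = side ** 2 - n_samples
    return side, padding


# Ten seconds of mono audio at 44.1 kHz (illustrative numbers).
print(image_layout(10 * 44100))   # (665, 1225)
```

Note that if the sample count happens to be a perfect square, the padding is zero and there is no spare pixel for the sample rate, so the reshape in encode() would fail as written.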

Something like that. The output is an image that on the one hand is a jumble of pixels and on the other reminds me of something (a spectrogram, perhaps?). In any case, without the decryption function it is just a pretty but useless PNG. Speaking of decryption: the plan is the same, first the diagram, then the explanations with code.

Given an image with encrypted audio as input, the decode() function first reads the image into an np.array of RGB values and flattens it into a one-dimensional list of pixels. It then performs the exact reverse of the encode() function:

def decode(path: str):
    
    """Audio decoding function from image. Uses the inverse algorithm of the encode () function
    """
    
    # Load the image and flatten it into a one-dimensional list of pixels
    img = np.array(Image.open(path))
    img = img.reshape(img.shape[0]**2, 3)
    
    f_arr = []
    end_arr = []
    
    # The reverse loop (a.k.a. decryption)
    for elem in img:
        f_arr.append(rgb2hex(*elem))
        
    for h in f_arr:
        
        res = None
        
        # Here is our flag f: if it is present, we put a minus in front of the final value
        if h[1].lower() == 'f':

            # The only decimal digits in the resulting hex code are the amplitude values we need
            # Use the re module to pull out the digits
            res = re.findall(r'\d+', h)[0]
            res = -int(res)
            end_arr.append(res)
            
        else:
            
            res = re.findall(r'\d+', h)[0]
            end_arr.append(int(res))
            
    
    
    end_arr = np.array(end_arr).astype(np.int16)
    
    # Remember the last pixel with the sample rate embedded in it?
    # Reinstate madam in her rank (int() keeps the cube from overflowing uint8 on newer NumPy versions)
    samplerate = int(img[-1][-1]) ** 3
    samplerate -= samplerate % -100
    
    # Create the .wav file from the resulting np.array of amplitude values and the sample rate
    wavfile.write(f'{path[:-4]}_decoded.wav', samplerate, end_arr)
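The sample-rate round trip is worth tracing with numbers. The first two functions below repeat the arithmetic from encode() and decode() as plain Python. The last two are a hypothetical alternative, not the author's method, that spreads the rate across the three colour channels of the last pixel and assumes a rate below 2**24.

```python
def pack_srate_lossy(srate: int) -> int:
    """What encode() stores: the integer part of the cube root."""
    return int(srate ** (1 / 3))


def unpack_srate_lossy(value: int) -> int:
    """What decode() recovers: cube, then round up to a multiple of 100."""
    srate = int(value) ** 3
    return srate - (srate % -100)


def pack_srate_exact(srate: int) -> tuple:
    """Hypothetical alternative: split the rate into three bytes (r, g, b)."""
    return (srate >> 16) & 0xFF, (srate >> 8) & 0xFF, srate & 0xFF


def unpack_srate_exact(r: int, g: int, b: int) -> int:
    """Reassemble the rate from the three channel bytes."""
    return (int(r) << 16) | (int(g) << 8) | int(b)


print(unpack_srate_lossy(pack_srate_lossy(44100)))    # 42900
print(unpack_srate_exact(*pack_srate_exact(44100)))   # 44100
```

Truncating the cube root throws away the fractional part, which is why 44100 comes back as 42900 in the first scheme.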

Afterword, results, disclaimers and limitations

Well, it turned out great, of course, and the time was well spent. But let's talk about the downsides:

First. I have not yet solved the format problem, so the code only works with .wav files.

Second. The encryption/decryption algorithm is not the fastest: a three-minute track takes about 15 minutes to encode. Moreover, the resolution of the output image depends directly on the duration of the audio file. That same three-minute fragment, once encrypted, has a resolution of 3058×3058 pixels. Not small. But the losses are minimal! A comparison showed that the original and decrypted files are 99.3% identical! It would be a hundred, but madam sampling rate refuses to be decrypted without loss.