Libraries for implementing data compression algorithms in Rust

Lossless compression algorithms

These algorithms suit applications where the original data must be preserved exactly. They are used for:

  • Text data and code;

  • Archiving, when you need to compress many files without losing information;

  • Some image formats, such as PNG, where preserving detail and transparency is important.

Good examples of such algorithms are the Huffman algorithm and Lempel-Ziv-Welch (LZW).

Lossy compression algorithms

These algorithms are used where a small loss of quality matters little but a significant reduction in data size matters a lot:

  • Video and audio processing, where file reduction may involve simplifying details that are less noticeable to the human eye or ear;

  • Image compression for the web, where page loading speed is a priority over a superb picture.

One example of such an algorithm is the discrete cosine transform (DCT).

So much for the basics; now let's move on to the libraries, starting with zstd.

zstd

The zstd crate supports synchronous compression and decompression out of the box, with asynchronous use available through the async-compression crate, so you can choose the approach that fits your application.

Main functions:

Encoder and Decoder: types that compress and decompress data respectively. Encoder accepts data to be compressed and writes out the result, while Decoder reads compressed data and restores the original.

stream: a module for working with data streams; it includes the copy_encode and copy_decode functions for use with standard Rust I/O streams.

dict: a module for compression dictionaries, which improve ratios by pre-training on samples of the data you expect to compress.

encode_all and decode_all: functions that compress and decompress a complete buffer in memory.

compression_level_range: a function that returns the valid range of compression levels.

DEFAULT_COMPRESSION_LEVEL: a constant that specifies the default compression level.

The async-compression crate integrates zstd with Rust's asynchronous streams, letting you compress and decompress data in async applications.

Usage examples

Synchronous compression and decompression

use std::io;
use zstd::stream::{read::Decoder, write::Encoder};

// compress data
fn compress(data: &[u8], level: i32) -> io::Result<Vec<u8>> {
    let mut encoder = Encoder::new(Vec::new(), level)?;
    io::copy(&mut &data[..], &mut encoder)?;
    let compressed = encoder.finish()?;
    Ok(compressed)
}

// decompress data
fn decompress(data: &[u8]) -> io::Result<Vec<u8>> {
    let mut decoder = Decoder::new(&data[..])?;
    let mut decompressed = Vec::new();
    io::copy(&mut decoder, &mut decompressed)?;
    Ok(decompressed)
}

Asynchronous compression and decompression

Recent versions of async-compression expose codecs as wrappers over asynchronous buffered readers rather than a stream module; the sketch below assumes the crate's tokio feature is enabled:

use async_compression::tokio::bufread::{ZstdDecoder, ZstdEncoder};
use async_compression::Level;
use tokio::io::{self, AsyncReadExt};

async fn compress_async(input: &[u8], level: i32) -> io::Result<Vec<u8>> {
    // ZstdEncoder wraps any AsyncBufRead; a byte slice qualifies
    let mut encoder = ZstdEncoder::with_quality(input, Level::Precise(level));
    let mut compressed = Vec::new();
    encoder.read_to_end(&mut compressed).await?;
    Ok(compressed)
}

async fn decompress_async(input: &[u8]) -> io::Result<Vec<u8>> {
    let mut decoder = ZstdDecoder::new(input);
    let mut decompressed = Vec::new();
    decoder.read_to_end(&mut decompressed).await?;
    Ok(decompressed)
}

You can tune the compression level by passing it to the Encoder constructor. Levels range from 1 (fastest, weakest compression) to 22 (slowest, strongest compression).

flate2

flate2 can be configured to use several backends, including miniz_oxide and zlib-ng.

Main functions:

  1. Modules for working with data streams:

    • read: a module for working with readable data streams.

    • write: designed for writing compressed data; it includes types that wrap standard output types and compress data as it is written.

    • bufread: contains types for working with buffered readers.

  2. Types for compression and decompression:

    • GzEncoder and GzDecoder: types for the gzip format. GzEncoder compresses data, GzDecoder decompresses it.

    • DeflateEncoder and DeflateDecoder: designed for the raw DEFLATE format.

    • ZlibEncoder and ZlibDecoder: used for compression and decompression in the zlib format.

  3. Functions and utilities:

    • compress and decompress: functions for compressing and decompressing blocks of data in one call, without managing streams yourself.

    • Compression: a struct that represents the compression level, from fast and less thorough to slow but more thorough.

Usage examples

Compressing Data Using DEFLATE

use flate2::write::DeflateEncoder;
use flate2::Compression;
use std::io::prelude::*;

let data = b"Example data to compress";
let mut encoder = DeflateEncoder::new(Vec::new(), Compression::default());
encoder.write_all(data).unwrap();
let compressed_data = encoder.finish().unwrap();

DEFLATE data decompression

use flate2::read::DeflateDecoder;
use std::io::prelude::*;

let mut decoder = DeflateDecoder::new(&compressed_data[..]);
let mut decompressed_data = Vec::new();
decoder.read_to_end(&mut decompressed_data).unwrap();

Use gzip for compression

use flate2::write::GzEncoder;
use flate2::Compression;
use std::io::prelude::*;

let mut encoder = GzEncoder::new(Vec::new(), Compression::fast());
encoder.write_all(b"Example data compressed with gzip").unwrap();
let gzip_compressed_data = encoder.finish().unwrap();

gzip data decompression

use flate2::read::GzDecoder;
use std::io::{self, Read};

let mut decoder = GzDecoder::new(&gzip_compressed_data[..]);
let mut decompressed_data = String::new();
decoder.read_to_string(&mut decompressed_data).unwrap();

You can choose a compression backend depending on your needs:

  • miniz_oxide: written entirely in Rust; well suited to scenarios that require compatibility and memory safety.

  • zlib-ng: a high-performance library that can be enabled when maximum throughput is needed:

flate2 = { version = "1.0.17", features = ["zlib-ng"] }

brotli

The Brotli algorithm was created at Google; the Rust brotli crate is maintained by Dropbox. It supports predefined dictionaries that can significantly improve the compression ratio, especially for common text or data with repeating patterns.

Main functions of the library:

CompressorReader and CompressorWriter: wrappers for streaming compression. CompressorReader compresses data as it is read, while CompressorWriter compresses as it is written.

Decompressor: for decompression; supports both streaming reads and streaming writes.

encode and decode: functions for compressing and decompressing blocks of data in memory.

BrotliEncoderParams: a struct for setting compression parameters, including the compression level and window size.

There are also custom memory allocators for managing resources in embedded systems or resource-constrained applications.

BrotliCompress and BrotliDecompress: functions for compression and decompression over io::Read and io::Write streams.

Usage examples

Data compression

use brotli::CompressorWriter;
use std::io::Write;

let mut output = Vec::new();
{
    let mut writer = CompressorWriter::new(&mut output, 4096, 11, 22);
    writer.write_all(b"Example data to compress").unwrap();
}

Data Decompression

use brotli::Decompressor;
use std::io::Read;

let mut decompressor = Decompressor::new(&output[..]);
let mut result = String::new();
decompressor.read_to_string(&mut result).unwrap();

snap

snap does not aim for maximum compression ratios; it is focused primarily on speed.

Main functions:

FrameEncoder and FrameDecoder: streaming compression and decompression, respectively.

compress and decompress: one-shot data compression and decompression.

compress_into and decompress_into: compress and decompress directly into a caller-supplied buffer.

compress_raw and decompress_raw: work with raw Snappy blocks, without the framed container format.

compress_raw_into and decompress_raw_into: also work directly with buffers, minimizing copies and allocations.

Usage examples

Data compression

use snap::{write::FrameEncoder, read::FrameDecoder};
use std::io::{Cursor, Read, Write};

let data = b"Example data that needs to be compressed";
let mut encoder = FrameEncoder::new(Vec::new());
encoder.write_all(data).unwrap();
let compressed = encoder.into_inner().unwrap();

Data Decompression

let compressed_data = compressed;
let mut decoder = FrameDecoder::new(Cursor::new(compressed_data));
let mut decompressed = Vec::new();
decoder.read_to_end(&mut decompressed).unwrap();

snap demonstrates excellent performance, compressing data at 250 MB/s or more and decompressing at up to 500 MB/s on modern processors.

lz4

The lz4 crate (used in the examples below) implements the LZ4 algorithm. The algorithm stands out for very high speed: more than 500 MB/s compression on a single core, with decompression reaching several GB/s.

Usage examples

Data compression

use lz4::EncoderBuilder;
use std::io::Write;

let mut encoder = EncoderBuilder::new().level(4).build(Vec::new()).unwrap();
encoder.write_all(b"Example data to compress").unwrap();
let (compressed, result) = encoder.finish();
result.unwrap();

Data Decompression

use lz4::Decoder;
use std::io::Read;

let mut decoder = Decoder::new(&compressed[..]).unwrap();
let mut decompressed = Vec::new();
decoder.read_to_end(&mut decompressed).unwrap();

Streaming compression and decompression

use lz4::EncoderBuilder;
use lz4::Decoder;
use std::io::{self, Read, Write, BufReader, BufWriter};

// set up streaming compression
let input_stream = io::Cursor::new(b"Data stream that needs to be compressed in real-time");
let output_stream = Vec::new();
let encoder = EncoderBuilder::new().level(4).build(output_stream).unwrap();
let mut buffered_encoder = BufWriter::new(encoder);

// compress the data in the stream
io::copy(&mut BufReader::new(input_stream), &mut buffered_encoder).unwrap();
let (compressed_stream, result) = buffered_encoder.into_inner().unwrap().finish();
result.unwrap();

// set up streaming decompression
let input_compressed_stream = io::Cursor::new(compressed_stream);
let mut decoder = Decoder::new(input_compressed_stream).unwrap();
let mut decompressed_stream = Vec::new();

// decompress the data from the stream
decoder.read_to_end(&mut decompressed_stream).unwrap();

Different compression levels are available, from fast basic compression to high compression that costs more CPU time but yields better ratios. For example, the standard LZ4 level can reach compression speeds of about 780 MB/s and decompression speeds of about 4970 MB/s, while LZ4_HC at its maximum level improves the ratio but drops compression speed to about 41 MB/s, with decompression speed staying comparable.

Finally, let's try to implement compression in pure Rust

Let's implement a simple Run-Length Encoding compression algorithm.

RLE is a basic data compression algorithm that replaces runs of repeated characters with character-count pairs.

use std::fmt::Write;

fn rle_encode(input: &str) -> String {
    let mut encoded = String::new();
    let mut chars = input.chars().peekable();

    while let Some(c) = chars.next() {
        // count the run of identical characters
        let mut count = 1;
        while chars.peek() == Some(&c) {
            chars.next();
            count += 1;
        }

        // write the character and its count into the result
        write!(&mut encoded, "{}{}", c, count).unwrap();
    }

    encoded
}

// usage
fn main() {
    let input_string = "aaaabbbccddddde";
    let compressed = rle_encode(input_string);
    println!("Original: {}", input_string);
    println!("Compressed: {}", compressed);
}

rle_encode takes a string as input and returns its compressed form; a peekable iterator walks the characters and counts each run.

The write! macro appends each character and its count to the encoded string.

This implementation assumes every character can be encoded this way and does not handle cases where compression would actually grow the data (a run of length 1 becomes two characters).
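For completeness, the inverse can be sketched the same way. This rle_decode is a hypothetical companion to the listing above, not part of the original article; it assumes the original data contains no digits of its own:

```rust
fn rle_decode(input: &str) -> String {
    let mut decoded = String::new();
    let mut chars = input.chars().peekable();

    while let Some(c) = chars.next() {
        // gather the digits that follow the character
        let mut digits = String::new();
        while let Some(d) = chars.peek() {
            if d.is_ascii_digit() {
                digits.push(*d);
                chars.next();
            } else {
                break;
            }
        }
        // a missing count defaults to a single occurrence
        let count: usize = digits.parse().unwrap_or(1);
        decoded.extend(std::iter::repeat(c).take(count));
    }

    decoded
}

fn main() {
    let decoded = rle_decode("a4b3c2d5e1");
    assert_eq!(decoded, "aaaabbbccddddde");
    println!("Decoded: {}", decoded);
}
```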


As we have seen, even an algorithm as simple as RLE takes a fair number of lines, which is why libraries make the process so much easier.

What libraries for data compression in Rust do you know? Share in the comments!

Colleagues from OTUS cover working with data algorithms in more depth in practical online courses. You can sign up for a free lesson of the "Algorithms for developers" course.
