How hard links and XOR will save your gigabytes

Let's delete bigfile.txt. hardlink.txt still works (this would not be the case with shortcuts or symbolic links). This is possible because a hard link points directly to the data rather than to a specific file name. As long as at least one hard link to the data remains, the data continues to take up disk space. Once all hard links are removed, the space is freed.
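The behavior described above can be tried out with a minimal Python sketch. It assumes a file system that supports hard links and works in a temporary directory; the function name `hard_link_demo` and the file contents are made up for illustration, only the file names come from the text.

```python
import os
import tempfile

def hard_link_demo():
    """Create a file, hard-link it, delete the original name, read via the link."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "bigfile.txt")
        link = os.path.join(tmp, "hardlink.txt")

        with open(original, "w") as f:
            f.write("important data")

        os.link(original, link)                # a second name for the same data
        links_before = os.stat(link).st_nlink  # number of names pointing at the data

        os.remove(original)                    # removes one name, not the data
        links_after = os.stat(link).st_nlink

        with open(link) as f:
            content = f.read()                 # still readable through the link

    return links_before, links_after, content

print(hard_link_demo())
```

The link count drops from 2 to 1 after the deletion, and the data stays readable until the last name is removed.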

XOR and backup

XOR (exclusive or) is a logical operation that is useful for building redundant copies of data. It compares two inputs bit by bit: the result bit is 1 where the inputs differ and 0 where they match. For example, it lets you protect two disks against the loss of either one using only one additional disk. Let's imagine that we have data on two disks, say “0100” and “1111”. The XOR of this data is written to the third disk.
Here's an example of how it works:

Disk 1: 0100
Disk 2: 1111
XOR result: 1011

Now any one of the three disks can be restored from the other two, simply by XORing the two that remain. This makes for cost-effective and reliable backups. RAID 5 arrays work on this principle. If we used plain copying instead, as in RAID 1 (mirroring), protecting 2 disks would take 2 more disks, so XOR saves the cost of one disk.
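The recovery step can be checked with a few lines of Python. This is only a sketch on 4-bit integers using the values from the example; the helper name `xor_parity` is an assumption made for illustration.

```python
def xor_parity(a: int, b: int) -> int:
    """Return the bitwise XOR of two values."""
    return a ^ b

disk1 = 0b0100
disk2 = 0b1111
parity = xor_parity(disk1, disk2)            # what the third disk stores

# Simulate losing one disk and rebuilding it from the other two
recovered_disk1 = xor_parity(parity, disk2)
recovered_disk2 = xor_parity(parity, disk1)

print(format(parity, "04b"))           # 1011
print(format(recovered_disk1, "04b"))  # 0100
print(format(recovered_disk2, "04b"))  # 1111
```

Recovery works because XORing with the same value twice cancels out: (a XOR b) XOR b gives back a.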

Hard links and XOR for backup

That was all just the preamble. Here is the reason this article was written. Combining XOR with hard links gives a fault-tolerant mechanism for backups and snapshots. Imagine we have two files: “File A” (1 GB) and “File B” (2 GB). Let’s say that on November 1st only “File A” was changed, while “File B” stayed the same. A traditional backup on November 2nd would copy both files in full, taking up an additional 3 GB. Instead, by comparing file hashes, we determine which file has changed and store only that one, in our case “File A” (1 GB), while for “File B” we create a hard link to its copy in the previous backup. This preserves the complete state of the file system for each day while spending space only on the data that changed. This is similar to the shadow copy system in Windows.
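The snapshot scheme described above could be sketched roughly as follows. This is not the author's tool, only an illustration under stated assumptions: the names `file_hash`, `make_snapshot`, and `demo` are hypothetical, the source directory is flat (no subdirectories), SHA-256 is an arbitrary choice of hash, and all snapshots live on the same file system, since hard links cannot cross file systems.

```python
import hashlib
import os
import shutil
import tempfile

def file_hash(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def make_snapshot(source_dir, previous_snapshot, new_snapshot):
    """Copy changed files, hard-link unchanged ones.

    Returns two lists of file names: (copied, linked).
    """
    os.makedirs(new_snapshot)
    copied, linked = [], []
    for name in sorted(os.listdir(source_dir)):
        src = os.path.join(source_dir, name)
        if not os.path.isfile(src):
            continue  # this sketch handles a flat directory only
        old = os.path.join(previous_snapshot, name) if previous_snapshot else None
        dst = os.path.join(new_snapshot, name)
        if old and os.path.isfile(old) and file_hash(old) == file_hash(src):
            os.link(old, dst)        # unchanged: reuse the data already stored
            linked.append(name)
        else:
            shutil.copy2(src, dst)   # new or changed: store a fresh copy
            copied.append(name)
    return copied, linked

def demo():
    """Two snapshots of a two-file directory; only file_a.txt changes in between."""
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "source")
        os.makedirs(source)
        with open(os.path.join(source, "file_a.txt"), "w") as f:
            f.write("version 1")
        with open(os.path.join(source, "file_b.txt"), "w") as f:
            f.write("unchanged")

        day1 = os.path.join(tmp, "snapshot_day1")
        day2 = os.path.join(tmp, "snapshot_day2")
        first = make_snapshot(source, None, day1)

        with open(os.path.join(source, "file_a.txt"), "w") as f:
            f.write("version 2")
        second = make_snapshot(source, day1, day2)

        shared = os.path.samefile(os.path.join(day1, "file_b.txt"),
                                  os.path.join(day2, "file_b.txt"))
    return first, second, shared

print(demo())
```

In the demo, the second snapshot copies only file_a.txt and links file_b.txt, so both snapshots show file_b.txt while its data is stored once. A real tool would probably cache the hashes of the previous snapshot rather than recompute them on every run.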
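However, the data is still vulnerable to drive failure: hard links add no redundancy, because all of them point to the same single copy of the data. Therefore, we use software RAID, hardware RAID, or our own RAID-like scheme written in Python to create redundancy and improve reliability: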

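def xor_files(file_a, file_b, file_xor, chunk_size=64 * 1024):
    """XOR two files byte by byte and write the result to file_xor.

    Processing stops at the end of the shorter input file.
    """
    with open(file_a, 'rb') as infile, open(file_b, 'rb') as maskfile, open(file_xor, 'wb') as outfile:
        while True:
            # Read in chunks instead of one byte at a time to reduce I/O overhead
            chunk = infile.read(chunk_size)
            mask_chunk = maskfile.read(chunk_size)
            if not chunk or not mask_chunk:
                break
            # XOR each byte of the file with the matching byte of the mask;
            # zip() truncates to the shorter chunk
            outfile.write(bytes(a ^ b for a, b in zip(chunk, mask_chunk)))

# Example usage
input_file = "original_file.txt"    # Source file
mask_file = "mask_file.txt"         # Mask file
output_file = "xor_output.txt"      # XOR result file

xor_files(input_file, mask_file, output_file)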
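One caveat: the function above stops at the end of the shorter file, so for files of different sizes (such as the 1 GB and 2 GB files in the example) part of the longer file would not be covered by the XOR output. A possible workaround, sketched here on in-memory byte strings, is to pad the shorter input with zero bytes and remember the original lengths. The helper name `xor_bytes` and the sample contents are assumptions made for illustration.

```python
from itertools import zip_longest

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, treating the shorter one as padded with zero bytes."""
    return bytes(x ^ y for x, y in zip_longest(a, b, fillvalue=0))

file_a = b"File A contents"
file_b = b"File B is a bit longer"
parity = xor_bytes(file_a, file_b)   # as long as the longer input

# Suppose one file is lost: XOR the parity with the surviving file,
# then trim the result to the lost file's original length.
restored_a = xor_bytes(parity, file_b)[:len(file_a)]
restored_b = xor_bytes(parity, file_a)[:len(file_b)]

print(restored_a == file_a, restored_b == file_b)
```

The original lengths have to be stored alongside the parity data, otherwise a restored file may end with padding bytes.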

Thus, the combination of hard links and XOR makes it possible to create reliable backups while saving disk space. You keep access to the state of the system for every day, yet store only what has actually changed.
