Genome and fractal archiving

Genome – contains all the information necessary for the development, functioning, and inheritance of an organism. The genome stands at the center of all biological questions: all human properties and abilities, all human diversity. Chromosomes are the structures on which the genome is organized; they contain the DNA on which the genes are located.

As Kozma Prutkov said: “Many people are like sausages: what they are stuffed with is what they carry around with them.” So, we are “stuffed” with DNA, we carry it inside us, and it largely determines what we are.

Let us dwell on the issue of the volume of stored data. To estimate the amount of information in a person's chromosomes, one can take the approximate number of base pairs in the genome and multiply by the number of bits of information contained in each pair. For humans, this is approximately 3 billion base pairs, each of which can hold one of four bases (A, T, G, or C) and therefore carries two bits [1]. So, the total amount of information in human chromosomes can be estimated as:

3 billion pairs × 2 bits/pair = 6 billion bits.

This is a rough estimate; the actual amount of information may vary depending on the specific characteristics of the genome. Wikipedia gives a similar figure: “The nitrogenous bases in DNA (adenine, thymine, guanine, cytosine) correspond to 4 different logical states, which is equivalent to 2 bits of information. Thus, the human genome contains more than 6 gigabits of information in each strand, which is equivalent to 800 megabytes and comparable to the amount of information on a CD.”
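The arithmetic above can be checked in a few lines. This is only a back-of-the-envelope sketch using the approximate figures from the text (3 billion base pairs, 2 bits per pair):

```python
# Rough estimate of the information content of the human genome.
BASE_PAIRS = 3_000_000_000  # approximate length of the human genome
BITS_PER_PAIR = 2           # 4 bases (A, T, G, C) -> log2(4) = 2 bits

total_bits = BASE_PAIRS * BITS_PER_PAIR
total_bytes = total_bits // 8
total_megabytes = total_bytes / 1_000_000

print(f"{total_bits:,} bits")       # 6,000,000,000 bits
print(f"{total_megabytes:.0f} MB")  # 750 MB
```

With decimal megabytes this comes to about 750 MB; counting in binary megabytes, or including packaging overhead, gives figures near the quoted 800 MB, i.e. roughly one CD either way.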

Just gigabytes, while, for example, the approximate number of synapses in the human brain is about 10^14 (that is, about 100 trillion). One could argue that building such a complex network involves cyclic algorithms with short code that unfold and generate it. However, both the wiring of such a network and the algorithms for its functioning require far more than gigabytes. For comparison, GPT-3 (an artificial neural network) has about 175 billion parameters, and storing a GPT-3 model takes on the order of several hundred gigabytes, including the model weights and the other artifacts needed for its operation. And if for a neural network one can still appeal to explanations involving learning and modeling from primary primitives, then for the growth of all the cells of the body, with their locations and functions, and for the deployment of the immune and other systems, learning within the framework of an already living organism is impossible, since deviation from proper functioning and development leads to death. It seems clear that orders of magnitude more data are required. What to do?
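To make the scale gap concrete, here is a hedged comparison using the figures quoted above. The "one byte per synapse" assumption is purely illustrative, not a claim about how the brain stores anything:

```python
# Rough comparison of the storage scales mentioned in the text.
GENOME_BYTES = 6_000_000_000 // 8  # ~0.75 GB per DNA strand
SYNAPSES = 10**14                  # ~100 trillion synapses
GPT3_PARAMS = 175_000_000_000      # 175 billion parameters

# If each synapse needed even 1 byte to specify, the brain's wiring
# alone would take ~100 TB, dwarfing the genome by ~5 orders of magnitude.
synapse_bytes = SYNAPSES * 1
ratio = synapse_bytes / GENOME_BYTES

print(f"{ratio:.0f}x")  # 133333x
```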

We can imagine the genome (more precisely, the unfolded DNA) as the table of contents of a book describing the growth and functioning of a living organism, where each entry points to the page containing the algorithm for a particular plan of action.

Link to the desired “page”

An algorithm can contain a huge number of actions describing the required process (for example, building a specific cell in a specific place in the body) or a set of instructions a particular system needs in order to react to influences from the outside world.

So, DNA contains a table of contents, that is, links to the pages of a book. But where, and on what medium, is the content of the pages themselves, with its huge amount of information, stored? The answer is: in fractal sets.

Understanding a fractal set, and even reproducing it with mathematical operations, is not difficult at all; only a few formulas are involved. But the internal content of, and connections within, complex fractals are unimaginable and practically inexhaustible, especially when the fractal is not strictly self-similar. You can construct such a fractal yourself on a computer and admire its complexity, for example the Mandelbrot set or the fractal in the figure below [2][3].
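As an illustration, a test for membership in the Mandelbrot set really does take only a few lines: the iteration z → z² + c is the entire "formula" behind the set's endless detail. A minimal sketch:

```python
# Minimal Mandelbrot-set membership test: iterate z -> z^2 + c and
# see whether the orbit escapes the disk of radius 2.
def mandelbrot(c: complex, max_iter: int = 50) -> bool:
    """Return True if c appears to lie in the Mandelbrot set."""
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:  # escaped, so c is not in the set
            return False
    return True

# Render a small ASCII picture of the set.
for row_idx in range(21):
    row = ""
    for col_idx in range(60):
        c = complex(-2.0 + 3.0 * col_idx / 59, -1.2 + 2.4 * row_idx / 20)
        row += "#" if mandelbrot(c) else "."
    print(row)
```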

Before returning to the genome, let's consider the mechanism of fractal archiving. In data-compression theory this direction has been developing successfully for a long time; entire institutes study promising algorithms and mathematical methods for the fractal archiving of images and video sequences. In essence, these methods find fractals, sometimes expressible in a compact or even analytical formula, that compress and later restore a large volume of information. It does not matter that the data may not be images: we can always turn data into multidimensional pictures, apply archiving to them, and vice versa. Algorithms exist for archiving (compressing) large information arrays and data warehouses using fractals. They are based on Banach's contraction (fixed-point) theorem and the closely related Collage Theorem.
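The contraction principle these methods rest on can be shown in one screen: iterating any contractive map converges to a single fixed point regardless of the starting data, so it is enough to store the map in order to regenerate the result. A toy example on the real line (not an image codec):

```python
# Banach's contraction theorem in miniature: repeated application of a
# contractive map converges to its unique fixed point from any start.
def iterate(f, x0, n=100):
    x = x0
    for _ in range(n):
        x = f(x)
    return x

# A contraction on the real line: distances shrink by a factor of 0.5.
# Its fixed point is x = 2, since 0.5 * 2 + 1 = 2.
f = lambda x: 0.5 * x + 1.0

print(iterate(f, 100.0))  # 2.0
print(iterate(f, -7.0))   # 2.0 -- same answer from any starting point
```

Fractal compression stores contractive maps whose fixed point *is* the image; decompression just iterates them from any starting picture.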

“Potentially the most useful type of fractals are those based on the Iterated Function System (IFS). The IFS method as applied to constructing fractal images, invented by their great expert Michael Barnsley and his colleagues at the Georgia Institute of Technology, is based on the self-similarity of image elements and consists in modeling the picture with several smaller fragments of itself. Special equations allow you to shift, rotate, and rescale areas of the image; these areas thus serve as building blocks for the rest of the picture. One of the most striking (and famous) IFS images is the black fern, in which each leaf is in fact a miniature version of the fern itself (see figure). Although the picture was created by a computer using affine transformations, the fern looks exactly like a real one. It has been suggested that, when encoding the genetic structure of plants and trees, nature uses something close to the IFS fractal method.

IFS fractals have one very real and useful application: they can be used to compress large raster images to a fraction of their normal size. This statement follows from Banach's contraction theorem (also known as the Collage Theorem) and is the result of Michael Barnsley's work on IFS at the Georgia Institute of Technology. He told the world about his achievement in Byte magazine in January 1988. However, he gave no information about solving the inverse problem: how to find the affine transformations from a given image. At that time this problem did not have even a hint of a solution.

Ideally, one would like to be able to find, for any image, a system of affine transformations (an IFS) that reproduces the image with a given accuracy. For a while the solution remained out of reach. It was Barnsley's student Arnaud Jacquin who found it first. The proposed method is called the Partitioned Iterated Function System (PIFS): in this scheme, individual parts of the image are similar not to the entire image but only to parts of it.” [4]
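The fern described in the quote can indeed be regenerated from just four affine maps; the coefficients below are Barnsley's published values, and the sketch samples the attractor with the standard "chaos game":

```python
import random

# The Barnsley fern as an Iterated Function System: four affine maps
# with fixed probabilities are the entire "archive" from which an
# arbitrarily detailed image can be regenerated.
MAPS = [
    # (a, b, c, d, e, f, p) for (x, y) -> (a*x + b*y + e, c*x + d*y + f)
    (0.00,  0.00,  0.00, 0.16, 0.0, 0.00, 0.01),  # stem
    (0.85,  0.04, -0.04, 0.85, 0.0, 1.60, 0.85),  # successive fronds
    (0.20, -0.26,  0.23, 0.22, 0.0, 1.60, 0.07),  # left leaflet
    (-0.15, 0.28,  0.26, 0.24, 0.0, 0.44, 0.07),  # right leaflet
]

def fern_points(n=10_000, seed=0):
    """Sample n points of the fern attractor via the chaos game."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    pts = []
    for _ in range(n):
        r = rng.random()
        acc = 0.0
        for a, b, c, d, e, f, p in MAPS:
            acc += p
            if r <= acc:  # pick a map with its assigned probability
                x, y = a * x + b * y + e, c * x + d * y + f
                break
        pts.append((x, y))
    return pts

pts = fern_points()
# The whole picture lives inside a small bounding box, ~[-2.2, 2.7] x [0, 10].
print(min(y for _, y in pts), max(y for _, y in pts))
```

Plotting `pts` (e.g. with matplotlib) reproduces the familiar fern; the point is that the twenty-eight numbers in `MAPS` are the complete description.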

Researchers have discovered that inside cells there are special molecular automata capable of very complex actions: they can open a DNA molecule, read a fragment of it, edit it, remove something from it, and correct errors. In other words, cells have a system that processes information much as a computer does.

Now we can tentatively reconstruct how the genome developed in the course of evolution. Start from the moment when organic formations with copying properties appeared, that is, a “protocell” with algorithms for its own copying. These algorithms can be quite large, so it is natural to assume that they should be stored in compressed and even encoded form. Note that this echoes the properties of Information described in earlier posts [5][6].

The most effective way to implement these mechanisms is the fractal archiving described above, in which both algorithms and accompanying data are recorded in fractals. Large algorithm and data sizes cease to matter, since the mechanisms for reproducing any part of a fractal are very compact. DNA stores only a pointer to the fractal and to the region of the fractal where the necessary data is held in encoded form. Put differently, the entire body of data about the construction of a human being and the functioning of all of its systems is stored, in encrypted form, in various regions of fractals, which DNA and RNA can read and, guided by the decoded instructions, build living organisms step by step and cell by cell. Most likely, fractals were originally stored in specialized molecules, as in storage media [7], but cells later learned to use new mechanisms to build fractals and read data from them on the fly.

Note that “inaccuracies” arising during fractal decoding and genome copying can produce mutations during the reproduction of structures, and this drives the evolution of both the genome and living organisms. In the course of evolution, cells with successful mutations acquired new functionality for the evolutionary development of the cells themselves and of their ensembles as living organisms, while cells (organisms) with unsuccessful mutations simply died. The time required for such evolutionary random selection is also understandable – billions and billions of years.
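The central idea here, that DNA acts as a compact pointer into a generator which unfolds into large data, can be caricatured in a few lines. The logistic map below is only a stand-in for a real fractal, and all names are hypothetical:

```python
# Toy illustration: store a compact "pointer" (generator parameters plus
# an offset) instead of the data itself, and regenerate on demand.
def generate(r: float, x0: float, skip: int, count: int):
    """Regenerate `count` values of the logistic map after `skip` steps."""
    x = x0
    for _ in range(skip):
        x = r * x * (1 - x)
    out = []
    for _ in range(count):
        x = r * x * (1 - x)
        out.append(x)
    return out

# The whole "DNA record" is just four numbers...
pointer = (3.9, 0.5, 1000, 5)

# ...yet it deterministically expands to the same data every time.
assert generate(*pointer) == generate(*pointer)
print(generate(*pointer))
```

The point of the sketch is the asymmetry: the pointer is tiny and fixed-size, while the regenerated data can be made as long as needed by changing `count`.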

Literature:

1. E. McConkey, “The Human Genome,” Technosphere, 2020.

2. https://www.youtube.com/watch?v=7HSmu5l6vAg

3. https://waksoft.susu.ru/2019/04/27/mnozhestvo-mandelbrota-na-python

4. https://studfile.net/preview/5150865/page:24/

5. https://habr.com/ru/articles/776080/

6. https://habr.com/ru/articles/774930/

7. https://dzen.ru/a/Zh6fhjZT7FHAOebt
