Data Within Us: What Do Bioinformaticians Do?

Let’s talk about the people of the future who decipher biological big data. Over the past two decades, the amount of biological data available for analysis has grown many times over, largely because the human genome was decoded. Before that, we could not even imagine that information stored literally in our blood would make it possible to determine our ancestry, check how the body would react to certain drugs, and even change its biological heredity.

This and other articles appear first on the blog on our website. Enjoy reading.


The attributes of the average bioinformatician are the same as those of a programmer: red eyes, a stooped posture and coffee cup marks on the desk. However, the work at this desk is not on abstract algorithms and commands, but on the code of nature itself, which can tell us a lot about ourselves and the world around us.

Specialists in this field deal with huge amounts of data (for example, the results of sequencing one person’s genome take up about 100 gigabytes). Processing such an array of information therefore requires data science approaches and tools. It follows that a successful bioinformatician should understand not only biology and chemistry, but also data analysis methods, statistics and mathematics, which makes the profession quite rare and in demand. Such specialists are especially needed in innovative medicine and drug development. Tech giants like IBM and Intel are opening their own programs dedicated to the study of bioinformatics.

What does it take to become a bioinformatician?

  • Biology and Chemistry (university level);
  • Mathematical statistics, linear algebra, probability theory;
  • Programming languages (Python and R; C++ is also often used);
  • For structural bioinformatics: an understanding of calculus and the theory of differential equations.

It is possible to enter bioinformatics either from a biological background or with knowledge of programming and mathematics. The former are better suited to working with ready-made bioinformatics programs, the latter to a more algorithmic profile of the specialty.

What do bioinformaticians do?

Modern bioinformatics is divided into two main branches: structural bioinformatics and sequence bioinformatics. In the first, we see a person sitting at a computer and running programs that help to study biological objects (for example, DNA or proteins) in 3D visualizations. Such specialists build computer models that make it possible to predict how a drug molecule will interact with a protein, what the spatial structure of a protein looks like in a cell, which properties of a molecule explain its interactions with cellular structures, and so on.

Structural bioinformatics methods are actively used both in academic science and in industry: it is difficult to imagine a pharmaceutical company that can do without such specialists. In recent years, computational methods have greatly simplified the search for potential drugs, which has made pharmaceutical development much faster and cheaper.

image
The RNA-dependent RNA polymerase of SARS-CoV-2 (left) and its interaction with an RNA duplex.

What is a genome?

The genome is the complete set of an organism’s hereditary information. In almost all living beings the genome is carried by DNA, but some organisms (for example, certain viruses) pass on their hereditary information in the form of RNA. The genome is passed from parents to children, and errors can occur during this transmission. Such errors are called mutations.
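To make the idea of a mutation concrete, here is a minimal Python sketch that compares a "parent" and a "child" DNA fragment and lists the positions where a single letter has changed. The two sequences and the function name `point_mutations` are invented for illustration. The sketch only handles substitutions in fragments of equal length, whereas real variant calling also has to deal with insertions, deletions and sequencing errors.

```python
def point_mutations(parent: str, child: str) -> list[tuple[int, str, str]]:
    """Return (position, parent_base, child_base) for every substitution.

    Positions are 0-based. Both fragments must have the same length,
    because this simple comparison does not model insertions or deletions.
    """
    if len(parent) != len(child):
        raise ValueError("sequences must have equal length for this comparison")
    return [
        (i, p, c)
        for i, (p, c) in enumerate(zip(parent, child))
        if p != c
    ]


# Hypothetical toy fragments that differ by one letter.
parent = "ATGGCCATTGTA"
child = "ATGGCCGTTGTA"

for position, before, after in point_mutations(parent, child):
    print(f"position {position}: {before} -> {after}")
```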

image
Interaction of the drug remdesivir with the RNA-dependent RNA polymerase of the SARS-CoV-2 virus.

Sequence bioinformatics works at a higher level of organization of living matter: from individual nucleotides, DNA and genes to whole genomes and comparisons between them.

Imagine a person looking at a string of letters from an alphabet (not an ordinary one, but a genetic or amino acid alphabet) and searching for patterns in it, then explaining and confirming those patterns statistically with computational methods. Sequence bioinformatics explains which mutation a particular disease is associated with, or why harmful substances accumulate in a patient’s blood. Beyond medical data, sequence bioinformaticians study how organisms are distributed across the Earth, the population differences between groups of animals, and the roles and functions of specific genes. Thanks to this science, it is possible to test the effectiveness of drugs and study the biological mechanisms that explain their action.

For example, bioinformatics analysis made it possible to find and describe the mutations that lead to cystic fibrosis, a monogenic disease caused by a defect in the gene for one of the chloride channels. We also now know much better who the closest biological relative of humans is and how our ancestors settled across the planet. Moreover, anyone who has their genome read can find out where their family originates from and which ethnic group they belong to. Many foreign (23andMe, MyHeritage) and Russian (Genotek, Atlas) companies offer this service for a relatively low price (about 20 thousand rubles).

image
DNA test results for ancestry and population from MyHeritage.

image
DNA test results for population ancestry from 23andMe.

How is the genome read?

Today genome sequencing is a routine procedure that costs about 150 thousand rubles (including in Russia). To have your genome read, you just need to give a blood sample from a vein at a specialized laboratory: in two weeks you will receive the finished result with a detailed description of your genetic characteristics. In addition to your own genome, you can have the genomes of your gut microbiota analyzed: you will learn about the bacteria that inhabit your digestive system and also receive advice from a professional nutritionist.

The genome can be read by different methods. One of the main ones today is so-called next-generation sequencing. To carry out this procedure, biological samples must first be obtained. The genome is the same in every cell of the body, so blood is most often taken for reading it (this is the easiest way). The cells are then broken open and the DNA is separated from everything else. Next, the DNA is fragmented into many small pieces, and special adapters (artificially synthesized nucleotide sequences that are known in advance) are attached to each of them. The DNA strands are then separated, and the single strands are attached via the adapters to a special plate on which sequencing is performed. During sequencing, complementary fluorescently labeled nucleotides are added to the DNA sequence one by one. Each labeled nucleotide, when attached, emits light of a specific wavelength, which is recorded by a computer. In this way the computer reads short sequences of the original DNA, which are then assembled into the original genome using special algorithms.
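The last step, assembling short reads into a longer sequence, can be illustrated with a toy Python sketch. It uses a simple greedy strategy: repeatedly merge the two reads whose ends overlap the most. This is an assumption chosen for readability, not the algorithm used by production assemblers, which rely on more elaborate graph-based methods and must cope with sequencing errors, repeats and both DNA strands. The reads and the names `overlap` and `greedy_assemble` are invented for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads greedily by largest suffix-prefix overlap.

    Returns the remaining contigs; a single contig means every read
    was joined into one sequence.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining pair overlaps enough to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical short reads covering the toy "genome" ATGCGTACCTGA.
reads = ["CGTACC", "ACCTGA", "ATGCGT"]
print(greedy_assemble(reads))
```

The greedy choice is easy to follow but can go wrong when a genome contains repeated regions, which is one reason real assemblers use different data structures.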

image
An example of the data that sequence bioinformaticians work with: an amino acid sequence alignment.
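As a rough illustration of what an alignment is, the Python sketch below aligns two short amino acid sequences with the classic Needleman-Wunsch dynamic programming method for global alignment. The scoring values (+1 for a match, -1 for a mismatch or a gap) and the two sequences are assumptions picked for simplicity. Real protein alignments normally use substitution matrices and more refined gap penalties, and in practice an established library would be used rather than hand-written code.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Globally align `a` and `b`; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diag, up, left)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


# Hypothetical peptide fragments; the second lacks one residue.
aligned_a, aligned_b, total = needleman_wunsch("MKTAYIAK", "MKTYIAK")
print(aligned_a)
print(aligned_b)
print("score:", total)
```

The dash in the output marks a gap, a position where one sequence has a residue and the other does not.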

Where do bioinformaticians work and how much do they earn?

The bioinformatics career path is traditionally divided into two main areas: industry and science. The career of a bioinformatics scientist usually begins with a postgraduate position at one of the major institutions. Initially, bioinformaticians receive a base rate that depends on their institution, the number of grants they participate in, and their number of affiliations (the places where they are officially employed). Over time, the number of grants and affiliations grows, and after a couple of years in the academic environment a bioinformatician can expect an average salary (70-80 thousand rubles) without difficulty, although much depends on diligence and hard work. The most experienced bioinformaticians end up running their own laboratories in their areas of specialization.

Where can you study bioinformatics?

  • MSU – Faculty of Bioengineering and Bioinformatics
  • HSE – Data Analysis in Biology and Medicine (Master’s Program)
  • MIPT – Department of Bioinformatics
  • Institute of Bioinformatics (NPO)

Unlike in academia, no one in industry will spend time teaching an employee the necessary skills, so getting in is usually more difficult. Bioinformatics career paths in industry vary greatly depending on specialization and place of work. On average, salaries in this area range from 70 thousand to 150 thousand rubles, depending on experience and specialization.

Famous bioinformaticians

The history of bioinformatics can be traced back to Frederick Sanger, an English scientist who won the 1980 Nobel Prize in Chemistry for developing a way to read DNA sequences. Sequence-reading methods have improved every year since then, but Sanger sequencing served as the basis for all further research in this area.

By the way, many programs created by Russian scientists are widely used all over the world today. For example, the genome assembler SPAdes (St. Petersburg genome assembler), created at the St. Petersburg Institute, helps scientists from all over the world assemble short DNA sequences into longer ones in order to reconstruct the original genomes of organisms.

Bioinformatics discoveries and achievements

Bioinformaticians are making many useful discoveries these days. It would be impossible to imagine the development of drugs against the coronavirus without decoding its genome and carrying out complex bioinformatic analysis of the processes that occur during the course of the disease. An international group of scientists, using the methods of comparative genomics and machine learning, was able to work out what coronaviruses have in common with other pathogens.

It turned out that one of these shared features is the strengthening of nuclear localization signals (NLS) in pathogenic viruses over the course of evolution. This study could help in investigating virus strains that may become harmful to humans in the future, and possibly in starting preventive drug development.

In addition, bioinformaticians have played a key role in the development of new genome editing methods, in particular the CRISPR/Cas9 system (a technology based on the bacterial immune system). Thanks to bioinformatic analysis of the structure of these proteins and their evolutionary development, the accuracy and efficiency of this system have increased significantly in recent years, which has made it possible to edit the genomes of many organisms (including humans) in a targeted way.
