Freeing our data from corporate slavery. Personal vault concept

Mathematica author Stephen Wolfram has been digitally logging many aspects of his professional and personal life for about 40 years

By now almost everyone understands the business of certain Internet corporations: collect as much personal data from people as possible and make money on it. They offer free hosting, free messengers, and free mail, as long as people hand over their files, photos, letters, and private messages. Our data earns a lot of money, and people have become the product. That is how tech giants such as Google and Facebook became some of the largest corporations in history. This is not surprising, because they have billions of units of free “raw material” at their disposal, that is, “users” (incidentally, only two industries call people “users”: the drug trade and the software industry).

It is time to end this and get the data back under our control. That is the essence of the personal data store concept (also called personal data services or personal data storage, PDS).

We need convenient programs, services, databases, and secure storage for photos, personal finances, the social graph, personal productivity data, food consumption, the history of all online and offline chats, a personal diary, medical data (heart rate, blood pressure, mood, etc.), books and articles read, web pages visited, films and videos watched, music listened to, and so on.

Of course, this data should be kept for a person’s entire life in an absolutely reliable storage that corporations and unauthorized persons cannot access. We need convenient tools for analysis and statistics. We need personal neural networks to process the data and predict personal decisions (for example, to recommend bands, dishes, or people to talk to).

Unfortunately, there is still no single, generally accepted, convenient approach to building such solutions. But work is moving in the right direction.

Infrastructure for storing personal data

Some researchers are thinking about a conceptual solution to the problem, that is, about what the entire infrastructure for personal data should be.

For example, the developer @karlicoss has described a concept for such an infrastructure.

Basic principles:

  • Simplicity for people, to make the data easy to view and read.
  • Simplicity for machine analysis, for data manipulation and interaction.

If you think about it, the second principle is more important. Because if we create a machine-readable infrastructure, programmers can process data and design human-friendly interfaces.

What else should the PDS concept provide? There must be an API for retrieving any data from the personal archive.

Logically, data is easiest to work with when it resides directly in your file system. In reality, personal data is scattered across dozens of services and programs, which makes it very hard to work with. So the first step is to extract it and save it locally. In theory this is not strictly necessary, because an advanced PDS could work with different data sources in different formats: for example, data could stay in various cloud storages and be retrieved through third-party APIs from other services and programs. But keep in mind that these are unreliable storages.

For example, Twitter’s API returns only the 3,200 most recent tweets, Chrome keeps browsing history for 90 days, and Firefox deletes it according to a clever algorithm. Your account in a cloud service can be closed at any time, and all its data deleted. In other words, third-party services are not designed for long-term data storage.

Pay slip of a Babylonian worker, dated 3000 BC. An example of long-term storage of personal information

Exporting data to personal storage

As an intermediate solution, the concept of data mirrors is proposed.

A data mirror is a special application that runs continuously in the background on the client side and constantly synchronizes a local archive with all external services. The application, so to speak, “sucks” your data out of various programs and web services and saves it in an open machine-readable format such as JSON or SQLite. In effect, it builds on disk the very personal storage that should eventually contain all kinds of personal information.
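The core of such a mirror can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the design of any existing tool: the `items` table, the `open_mirror` and `mirror` functions, and the requirement that every record carries an `id` field are all hypothetical, and fetching the records from a real service is left to the caller.

```python
import json
import sqlite3
from typing import Iterable, Mapping


def open_mirror(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local mirror database (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS items (
               source  TEXT NOT NULL,   -- e.g. the name of the service
               item_id TEXT NOT NULL,   -- the service's own identifier
               payload TEXT NOT NULL,   -- the raw record, serialized as JSON
               PRIMARY KEY (source, item_id)
           )"""
    )
    return conn


def mirror(conn: sqlite3.Connection, source: str,
           records: Iterable[Mapping]) -> int:
    """Store fetched records locally and return how many were new.

    Records already in the archive are left untouched, and nothing is
    ever deleted, so items that later disappear from the remote service
    remain in the local copy.
    """
    before = conn.total_changes
    for record in records:
        conn.execute(
            "INSERT OR IGNORE INTO items (source, item_id, payload) "
            "VALUES (?, ?, ?)",
            (source, str(record["id"]), json.dumps(record, sort_keys=True)),
        )
    conn.commit()
    return conn.total_changes - before
```

Running `mirror` on a schedule with whatever the service currently returns is enough to accumulate a history longer than the service itself keeps, because the local archive only ever grows.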

In practice, no universal application yet exists that automatically pulls information of every format and type from the whole variety of third-party applications and services and saves it locally.

This work has to be done in a semi-manual mode.

That means exporting information from every service and program that allows it, in the most universal format available, and storing the exports in an archive. Later it will be possible to index this data and work with it conveniently, but for now the main task is to save it so that it does not disappear forever.

People understand how important it is to keep personal photos forever. Few realize that the same applies to chat history in all messengers, yet it is a truly priceless chronicle of a human life. Over the years, this information fades from memory.

For example, ICQ chats were stored in plain text, so saving them took little effort. If you read your chats from the 90s now, you will rediscover a whole layer of personal history that you have long forgotten. This may well be a very important part of the personal archive.

Equally important are medical data on health, heart rate, blood pressure, sleep time and other characteristics, which are now measured over the course of a lifetime by fitness trackers.

A visualization of over a million emails that Stephen Wolfram has sent since 1989 shows disrupted sleep during years of strenuous work

To make it easier to regularly export and scrape his personal data from different programs, @karlicoss wrote a number of scripts for Reddit, Messenger / Facebook, Spotify, Instapaper, Pinboard, GitHub, and other services he uses.

Ideally, such programs let you find any message or note, that is, almost any thought of yours from the past, wherever it was recorded: in a Telegram or VKontakte chat, in comments on Habr, in a book you read, or in code you wrote. All the information is stored in a single database with full-text search.
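A single searchable database of this kind can be approximated with SQLite's FTS5 extension. The sketch below is an assumption about how one might do it, not a description of @karlicoss's scripts: the `notes` table and the `build_index` and `search` helpers are hypothetical names, and it assumes that the local SQLite build has FTS5 enabled (most Python distributions do).

```python
import sqlite3
from typing import Iterable, List, Tuple


def build_index(messages: Iterable[Tuple[str, str]]) -> sqlite3.Connection:
    """Create an in-memory full-text index over (source, body) pairs."""
    conn = sqlite3.connect(":memory:")
    # FTS5 virtual table: every column is tokenized and searchable.
    conn.execute("CREATE VIRTUAL TABLE notes USING fts5(source, body)")
    conn.executemany(
        "INSERT INTO notes (source, body) VALUES (?, ?)", messages
    )
    conn.commit()
    return conn


def search(conn: sqlite3.Connection, query: str) -> List[Tuple[str, str]]:
    """Return matching rows, best match first (FTS5's built-in rank)."""
    return conn.execute(
        "SELECT source, body FROM notes WHERE notes MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()


if __name__ == "__main__":
    index = build_index([
        ("telegram", "Let's meet for beer on Friday"),
        ("code", "def parse_export(path): ..."),
        ("book", "Notes on Babylonian pay records"),
    ])
    print(search(index, "beer"))
```

The point of the design is that chats, code, and reading notes land in one table with a `source` column, so a single query covers everything regardless of where the text was originally written.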


Instead of cloud-based corporate services, we need to switch to local-first software, so called in contrast to cloud applications.

Local-first software works much faster and with lower latency than cloud applications, because a button press does not send packets around the globe: all data is stored locally.

It also provides synchronization of local data across all devices, a person’s full control over their data, offline-first operation (the Offline First movement), painless conflict resolution in collaborative work, maximum information security, and long-term preservation of data for our descendants, like the pay slip of the Babylonian worker above (incidentally, in 2016 the deciphering of the text revealed that the Babylonian worker’s labor was paid for in alcohol, specifically beer).

Thus, local-first software is meant to satisfy all seven of these principles. According to experts, data structures such as CRDTs (conflict-free replicated data types) are best suited for implementing it. These data structures can be replicated across many computers on a network, with the replicas updated independently and concurrently without coordination, while it is always mathematically possible to resolve any inconsistencies. This is the strong eventual consistency model.

This consistency model makes CRDTs similar to version control systems such as Git. For a good introduction to CRDTs, see the article by Alexey Babulevich.
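To make the idea concrete, here is a minimal sketch of one of the simplest CRDTs, a grow-only counter (G-Counter). It is an illustrative example chosen for brevity, not something taken from the article or from a particular library; the class name and the replica names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Grow-only counter: each replica increments only its own slot."""
    replica_id: str
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = (
            self.counts.get(self.replica_id, 0) + amount
        )

    def value(self) -> int:
        # The counter's value is the sum over all replicas' slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise maximum is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for replica, count in other.counts.items():
            self.counts[replica] = max(self.counts.get(replica, 0), count)


if __name__ == "__main__":
    laptop, phone = GCounter("laptop"), GCounter("phone")
    laptop.increment(2)   # edits made offline on the laptop
    phone.increment(3)    # concurrent edits made on the phone
    laptop.merge(phone)
    phone.merge(laptop)
    print(laptop.value(), phone.value())
```

Both replicas accept updates without talking to each other, and once they exchange state in any order they hold the same value, which is the behavior that strong eventual consistency promises.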

Git scraping

The idea of freeing personal data from “corporate slavery” through long-term local storage has recently become especially popular. Experience has shown that nothing good can be expected from commercial web services. So individual developers are trying to build examples of personal information stores.

For example, FOSS developer and consultant Simon Willison is working on two tools, Datasette and Dogsheep, which are quite useful for personal vaults.

Datasette is a web application for processing data and publishing it in a readable format as an interactive website (demo). It is just one large part of the Datasette ecosystem: open source tools for collecting, analyzing, and publishing interesting data. The ecosystem is divided into two parts: tools for building SQLite databases (for use with Datasette) and plugins that extend the functionality of Datasette.

Different plugins allow you to combine data with each other. For example, overlay the coordinates of objects from one database on a geographic map.

Willison is experimenting with regularly scraping different sites and publishing the data to a GitHub repository. The result is a record of how a given object changes over time. He calls this technique git scraping. Later, the collected data can be converted and loaded into Datasette.

See the examples of git scraping on GitHub. This is one of the key techniques for populating a personal data store in a standard open format for long-term storage.
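The mechanics of git scraping can be sketched roughly as follows. This is a simplified illustration, not Willison's actual setup: the function names are hypothetical, the content is assumed to have been downloaded already by the caller, and `commit_snapshot` assumes that `git` is installed and that `repo` is an existing repository with a configured author identity.

```python
import subprocess
from pathlib import Path


def write_if_changed(path: Path, content: str) -> bool:
    """Write content to path; return False if the file already matches."""
    if path.exists() and path.read_text(encoding="utf-8") == content:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    return True


def commit_snapshot(repo: Path, filename: str, content: str,
                    message: str = "Latest data") -> bool:
    """Commit a new snapshot only when the scraped content has changed.

    Assumes `repo` is an initialized git repository; the commit history
    of `filename` then becomes the timeline of the scraped object.
    """
    if not write_if_changed(repo / filename, content):
        return False
    subprocess.run(["git", "add", filename], cwd=repo, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=repo, check=True)
    return True
```

Run on a schedule, this produces one commit per actual change, so `git log` and `git diff` on the file show how the data evolved over time.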

There is still a long way to go to free our data and build an infrastructure that keeps personal information safe and secure. One can imagine that in the future this information will also include memories and emotions read from a brain-computer interface such as Neuralink, so that in aggregate the storage will almost completely reflect its owner’s personality, forming a kind of “digital life cast” or human avatar.

Individual examples of heroic efforts to digitize one’s life, like Stephen Wolfram’s, are very inspiring. The photo on the left shows his home RAID array, which stores 40 years of information.

Stephen Wolfram tries to log all events in his work. The main thing is to preserve them. And you can save them only under your control, on your own server. A person must have complete control over the hardware, the software, and the data that he owns.

