Historians have to contend not only with damaged texts but also with uncertainty about their origin: where and when they were written. The place of writing matters for any text, whether it is a lengthy document or something like an accounting report. But it is often impossible to establish, simply because such documents frequently travel hundreds or even thousands of kilometers from where they were created. The third important factor is the time when the text was written. Radiocarbon dating and other types of analysis can determine the age of a document quite accurately. The problem is that any such analysis requires a sample of the medium that carries the text, and with ancient materials even a small intervention can severely damage or completely destroy a priceless artifact.
The technology discussed in this article can solve almost all of these problems. The tool is not perfect, of course, but it can do many things that even a highly qualified historian cannot.
Damaged texts and Pythia
Documents that have survived to our time are often incomplete. Under normal circumstances, restoring the meaning of a lost passage is very difficult or impossible. In most cases the meaning of lost sections is reconstructed from the preserved text, along with clues from other sources, the historical context, and so on.
Several years ago, a group of scientists and developers created a system that can significantly speed up this process. Yannis Assael of DeepMind, Thea Sommerschield, and Jonathan Prag developed Pythia together with researchers from the University of Oxford. It is a technology for restoring ancient texts, named after the prophetic priestess at the temple of Apollo in ancient Greece.
The scientists started with the database of the Packard Humanities Institute (PHI), the largest digital collection of ancient Greek inscriptions. They converted it into a machine-readable text corpus called PHI-ML. The corpus contains about 35,000 inscriptions and about 3 million words, dating from the 7th century BC to the 5th century AD. Once all of this had been converted into text the AI system could process, Pythia was trained to predict the missing letters in words that had been intentionally left incomplete or damaged. At the heart of it all is a complex system of neural networks.
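The training setup described above, hiding letters on purpose and asking the model to recover them, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `MISSING` placeholder, the `corrupt` helper, and the choice to hide one contiguous run of characters are hypothetical and are not taken from the Pythia code.

```python
import random

MISSING = "-"  # hypothetical placeholder for an unreadable character


def corrupt(text, span_len, rng):
    """Hide one contiguous run of characters.

    Returns (damaged_text, hidden_target, start_index), so that a model
    can be trained to predict hidden_target from damaged_text.
    """
    if not 0 < span_len <= len(text):
        raise ValueError("span_len must be between 1 and len(text)")
    start = rng.randrange(len(text) - span_len + 1)
    target = text[start:start + span_len]
    damaged = text[:start] + MISSING * span_len + text[start + span_len:]
    return damaged, target, start


def restore(damaged, target, start):
    """Put a proposed restoration back into the damaged text."""
    return damaged[:start] + target + damaged[start + len(target):]


rng = random.Random(0)  # fixed seed so the example is repeatable
original = "ΑΘΗΝΑΙΩΝ"
damaged, target, start = corrupt(original, 3, rng)
print(damaged, "->", restore(damaged, target, start))
```

Because the hidden letters are known in advance, every corrupted inscription becomes a training example with a ready-made correct answer.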
When faced with a problematic word or sentence, Pythia suggests up to 20 different letter and word choices that might have been in the original text, and reports a confidence level for each proposed option. After a series of trial runs, the developers evaluated the system on real texts whose restoration was already known. Graduate students in epigraphy worked on the same texts for comparison. The team tested the system on 2,949 inscriptions: Pythia’s output had an error rate of 30.1%, compared with 57.3% for the graduate students. Pythia was also much faster: it needed only a few seconds to restore 50 inscriptions, compared with two hours for the novice scholars.
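Two ideas from this paragraph can be sketched in a few lines: ranking candidate restorations with a confidence value, and scoring restorations against a known answer. This is only an illustration under stated assumptions. The raw scores are invented, softmax is used here as one common way to turn scores into confidences, and the error figure is assumed to be a character error rate (edit distance divided by the length of the reference text). The function names are hypothetical.

```python
import math


def top_hypotheses(scores, k=20):
    """Convert raw scores to probabilities (softmax) and keep the k best."""
    if not scores:
        return []
    peak = max(scores.values())  # subtract the maximum for numerical stability
    weights = {h: math.exp(s - peak) for h, s in scores.items()}
    total = sum(weights.values())
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [(h, w / total) for h, w in ranked[:k]]


def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character edits."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(predictions, references):
    """Total edit distance divided by total reference length."""
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    edits = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    return edits / total_chars


# Invented scores for three candidate restorations of one gap.
for hypothesis, confidence in top_hypotheses({"ΝΑΙ": 2.0, "ΝΕΙ": 1.0, "ΝΟΙ": 0.0}):
    print(hypothesis, round(confidence, 3))
```

Under this reading, an error rate of 30.1% would mean that roughly three characters in ten had to be corrected to match the known restoration.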
A new stage of work – Ithaca enters the game
As mentioned at the beginning, the final system for deciphering the texts was named Ithaca. It not only restores damaged passages but also helps establish where and when the texts were created. The project’s authors published their results on their blog, where, among other things, an interactive map shows the possible places where ancient texts were created. The dating range runs from 800 BC to 800 AD.
As it turned out, the accuracy of the new algorithm is about 62%, whereas scholars working alone average only 25% when restoring texts and estimating their date and place of origin. There is an interesting nuance, though: when the system works together with an epigrapher, the accuracy is even higher, rising to 72%. The estimated date of origin falls within plus or minus 30 years of the true one. That is not much, given that the period being covered spans more than 1,500 years.
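To make the dating figures more concrete, here is a minimal sketch of how a single date could be derived from a model’s output and checked against a tolerance of 30 years. It assumes the model produces a probability for each ten-year bin; that assumption, the function names, and the sample probabilities are illustrative only. Years are written as negative numbers for BC, and the missing year zero is ignored for simplicity.

```python
def expected_year(bin_starts, probabilities, bin_width=10):
    """Probability-weighted mean of the bin midpoints (negative = BC)."""
    if len(bin_starts) != len(probabilities):
        raise ValueError("need exactly one probability per bin")
    total = sum(probabilities)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    weighted = sum((start + bin_width / 2) * p
                   for start, p in zip(bin_starts, probabilities))
    return weighted / total


def within_tolerance(predicted, actual, tolerance=30):
    """True if the predicted year is at most `tolerance` years from the actual one."""
    return abs(predicted - actual) <= tolerance


# Invented distribution over three decades: 440s, 430s and 420s BC.
estimate = expected_year([-440, -430, -420], [0.2, 0.5, 0.3])
print(round(estimate), within_tolerance(estimate, -421))
```

A full distribution is more informative than a single number, because it also shows how strongly the model prefers one decade over its neighbors.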
Once scholars had confirmed that the system works correctly, Ithaca was applied to a dating problem involving a group of Athenian texts. Experts on ancient Greece disagreed about their dating: some believed the texts were written no later than 446 BC, while others argued they were written later, around 420 BC. After analyzing the disputed texts, the algorithm gave a date of 421 BC.
The difference may not seem large, but for specialists in ancient Greece it is huge, because it affects how the political history of the ancient state is reconstructed.
As far as experts can judge, the algorithm works correctly and, as noted above, Ithaca’s accuracy is higher than that of scholars working alone. The system is now to be adapted to other texts in other languages, including Akkadian, Hebrew and Maya.
Several scientists who analyzed the results of the algorithm created by DeepMind said they look forward to applying the technology in other areas of history. Museums, for example, hold many texts whose origin is practically unknown simply because they passed through the hands of “antiquities hunters”, that is, people who bought and sold texts obtained through unknown channels for profit.
As a result, scholars know neither the exact dates of creation of such documents nor their place of origin. Needless to say, we too are looking forward to Ithaca’s results in the near future.