The problem of multilingual Question Answering over Knowledge Graphs, and the new QALD-9-Plus dataset that addresses it

TLDR

For heterogeneous groups of users (e.g., by language or age), the ability to interact with web applications equally effectively is a key aspect of accessibility. This also applies to automated Knowledge Graph Question Answering (KGQA) systems, which provide access to Semantic Web data through a natural language interface. While working on the multilingual accessibility of KGQA systems, my colleagues and I identified several pressing problems. One of them is the lack of multilingual benchmarks for KGQA.

In this article, we improve one of the most popular KGQA benchmarks, QALD-9, by creating reference translations of the questions from the original dataset into 8 languages (German, French, Russian, Ukrainian, Belarusian, Armenian, Bashkir, and Lithuanian). Importantly, all translations were provided and validated by native speakers of the respective languages. Five of these languages (Armenian, Ukrainian, Lithuanian, Bashkir, and Belarusian) have, to the best of our knowledge, never been considered in KGQA systems before, and two of them (Bashkir and Belarusian) are classified by UNESCO as endangered. We named the new extended dataset QALD-9-Plus. The dataset is available online: https://github.com/Perevalov/qald_9_plus

Question answering systems over knowledge graphs (KGQA)

KGQA systems convert a natural language question into a query over a knowledge graph, thereby allowing users to access "knowledge" without having to learn a query language (e.g., SPARQL). This is the main difference between KGQA systems and text-based QA systems (also referred to in the literature as MRC, ODQA, or IR-based QA), which operate on unstructured data.

Question answering over knowledge graphs: an example of a question and the corresponding query. On the right are the most well-known knowledge graphs
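To make the idea concrete, here is a purely illustrative sketch of what a KGQA system does: turning a question into a SPARQL query. The single hand-written template and the query pattern are our own toy example; real systems use entity linking and semantic parsing rather than regular expressions (only the Wikidata property P569, date of birth, is a real identifier here):

```python
import re

def question_to_sparql(question):
    """Map a natural-language question to a SPARQL query.

    A real KGQA system performs entity linking and semantic parsing;
    this toy version handles exactly one hand-written template.
    """
    match = re.match(r"How old is (.+)\?$", question)
    if match:
        entity = match.group(1)
        # Hypothetical query pattern: look up the date of birth
        # (Wikidata property P569) of the matched entity label.
        return (
            'SELECT ?dob WHERE { '
            f'?person rdfs:label "{entity}"@en ; '
            'wdt:P569 ?dob . }'
        )
    return None  # question not covered by any template

print(question_to_sparql("How old is Donald Trump?"))
```

A question outside the template simply yields no query, which mirrors how narrow early KGQA systems were compared to open-ended text QA.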

Knowledge graphs are often created based on the Resource Description Framework (RDF), which is a "data model for metadata". Data in RDF is represented as subject-predicate-object triples, for example, Ivan-Is_Friend_Of-Maria, which makes it convenient to visualize the data as a graph. The well-known schema.org vocabulary is built on top of RDF and is used by many websites to mark up their content (in practice, to improve their search results). This structuring of the World Wide Web is the foundation of the previously mentioned Semantic Web, where all resources are structured and linked to each other. Thus, KGQA systems are our guides to the world of structured information across the World Wide Web.

An example of a knowledge graph. Original: https://www.w3.org/TR/rdf11-primer/
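As a minimal illustration of the triple model (not a real RDF library; the names and predicates are invented for the example), triples can be represented as plain subject-predicate-object tuples:

```python
# A tiny in-memory "graph" of subject-predicate-object triples.
# This mimics the RDF data model; real applications would use an
# RDF library (e.g., rdflib) and proper IRIs instead of raw strings.
triples = {
    ("Ivan", "is_friend_of", "Maria"),
    ("Ivan", "lives_in", "Berlin"),
    ("Maria", "lives_in", "Paris"),
}

def objects(subject, predicate):
    """Return all objects linked to `subject` via `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}

print(objects("Ivan", "is_friend_of"))
```

Answering a question then amounts to matching a graph pattern against such triples, which is exactly what a SPARQL engine does at scale.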

The problem of multilingualism in question answering systems

The seemingly natural availability of information through Google is not at all the same for speakers of languages spoken not by hundreds of millions of people (e.g., Russian) but by a few million (e.g., Belarusian) or even fewer (e.g., Bashkir). Admittedly, most people who speak Belarusian or Bashkir also speak Russian, which opens access to the second largest language segment of the Web. But this does not hold for all languages, and everything is relative: Russian speakers can understand only 6.9% of the content of the World Wide Web, while English speakers have access to 63.6% of it (https://w3techs.com/technologies/overview/content_language). In this regard, the term "digital language divide" was introduced; it captures the fact that the languages a Web user speaks directly affect their user experience.

We conducted a small experiment on how Google handles "big" and "small" languages, using English, German, Belarusian, and Bashkir as examples. The same simple question, "How old is Donald Trump?", was asked in each language. The results were striking. The illustration below shows how Google successfully answered the question asked in English and German, and how it failed in Belarusian and Bashkir. Isn't this an indication of a problem? It is worth noting that when it answers successfully, Google presents the answer in a structured form; this is the Google Knowledge Graph at work, aided by schema.org markup.

Illustration of how Google handles English, German, Belarusian, and Bashkir

How do others deal with this problem?

There is a misconception that with the advent of unsupervised, weakly-supervised, and semi-supervised methods (e.g., word2vec or BERT), the problem of multilingualism has been solved, since large amounts of labeled data are no longer needed. However, this is not the case. While a language model itself can be evaluated without labeled data, more complex systems (e.g., KGQA) cannot. Therefore, the problem of having structured reference data (benchmarks) in several languages is still urgent.

Question Answering over Knowledge Graphs is still a rather specific area of applied science, so not many works have been published on this topic. At the time of writing, there are only three multilingual benchmarks for KGQA: QALD, RuBQ, and CWQ (see the illustration below).

Existing multilingual KGQA benchmarks

None of the above datasets is ideal. For example, QALD-9 covers 10 languages, but the quality of the translations, to put it mildly, leaves much to be desired. RuBQ 2.0 and CWQ used automatic (machine) translation to obtain their translations, which is a limitation.

What have we done? Dataset QALD-9-Plus

To improve the situation with multilingualism, we decided to completely rework the QALD-9 dataset, keeping only the English questions and involving crowdsourcing platforms (Amazon Mechanical Turk, Yandex Toloka) in this work. Volunteers from the Open Data Science community were also involved in the translation process.

The task consisted of two stages: (1) a native speaker translates a question from English into their native language, and (2) another native speaker checks the translation. Both stages were carried out independently of each other.
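The two-stage workflow can be sketched as follows. The data structures and the acceptance rule are simplified assumptions for illustration, not the actual crowdsourcing pipeline:

```python
def accept_translations(candidates, validations):
    """Keep only translations approved by an independent validator.

    `candidates`  -- stage-1 translations per language code.
    `validations` -- stage-2 verdicts from a *different* native speaker.
    The two stages run independently of each other.
    """
    return {
        lang: text
        for lang, text in candidates.items()
        if validations.get(lang) == "approved"
    }

candidates = {
    "de": "Wie alt ist Donald Trump?",
    "ru": "Сколько лет Дональду Трампу?",
}
validations = {"de": "approved", "ru": "rejected"}
print(accept_translations(candidates, validations))  # only "de" survives
```

A rejected translation would go back to stage 1 for a new attempt, which is why every question ends up translated at least twice.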

An example of the translation and validation process. Each question was translated at least twice.

As a result of this work, we obtained translations into 8 languages: Russian, Ukrainian, Lithuanian, Belarusian, Bashkir, Armenian, German, and French. Five of these languages (Ukrainian, Lithuanian, Belarusian, Bashkir, and Armenian) had never been represented in the KGQA field before, and two of them (Belarusian and Bashkir) are classified by UNESCO as endangered.

In addition to the translations, we also improved the applicability of our benchmark. The original QALD-9 only allowed evaluating systems that work over the DBpedia knowledge graph. In our work on QALD-9-Plus, we decided to port the benchmark to another knowledge graph, Wikidata. This turned out to be quite a difficult task: automatic converters of SPARQL queries between different knowledge graphs do not yet exist, so we had to do it manually. It is remarkable how different the queries for the same question can be on DBpedia and Wikidata due to their different data models (see the code below).

# Question: When did Finland join the EU?

# DBpedia
SELECT DISTINCT ?date 
WHERE { 
  dbr:Finland dbp:accessioneudate ?date .
}

# Wikidata
SELECT DISTINCT ?date 
WHERE { 
  wd:Q33 p:P463 ?membership . # member of 
  ?membership pq:P580 ?date . # start time
  ?membership ps:P463 wd:Q458 . # EU
}
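The Wikidata pattern above generalizes to any country and organization, since it only swaps the two entity identifiers in the statement/qualifier structure (p:P463 "member of" with pq:P580 "start time"). A small helper that builds such a query might look like this; the function and its parameters are our own illustration, not part of the dataset tooling:

```python
def accession_date_query(country_qid, org_qid):
    """Build a Wikidata SPARQL query for the date a country joined an
    organization, using the qualifier model shown above: a p:P463
    ("member of") statement qualified with pq:P580 ("start time").
    """
    return (
        "SELECT DISTINCT ?date WHERE { "
        f"wd:{country_qid} p:P463 ?membership . "
        "?membership pq:P580 ?date . "
        f"?membership ps:P463 wd:{org_qid} . }}"
    )

# Finland (Q33) joining the EU (Q458), as in the example above:
print(accession_date_query("Q33", "Q458"))
```

The extra `?membership` variable is exactly what makes the Wikidata query longer than its DBpedia counterpart: qualifiers such as "start time" live on the statement node, not on the entity itself.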

The final characteristics of the QALD-9-Plus benchmark, as well as an example of its structure, are presented in the table and code fragment below.

The QALD-9-Plus benchmark and its characteristics
{
  "id": "86",
  "question": [
    {
      "language": "en",
      "string": "What is the highest mountain in Germany?"
    },
    {
      "language": "de",
      "string": "Wie heißt der höchste Berg in Deutschland?"
    },
    {
      "language": "ru",
      "string": "Какая гора является самой высокой в Германии?"
    },
    {
      "language": "uk",
      "string": "Яка найвища гора у Німеччині?"
    },
    {
      "language": "be",
      "string": "Самая высокая гара ў Германіі?"
    },
    {
      "language": "lt",
      "string": "Pats Aukščiausias kalnas Vokietijoje?"
    },
    {
      "language": "ba",
      "string": "Германияла иң бейек тау?"
    }
  ],
  "query": {
    "sparql": "PREFIX wdt: <http://www.wikidata.org/prop/direct/> PREFIX wd: <http://www.wikidata.org/entity/> SELECT ?uri WHERE { ?uri wdt:P31 wd:Q8502 ; wdt:P2044 ?elevation ; wdt:P17 wd:Q183 . } ORDER BY DESC(?elevation) LIMIT 1"
  },
  "answers": [
    {
      "head": {
        "vars": [
          "uri"
        ]
      },
      "results": {
        "bindings": [
          {
            "uri": {
              "type": "uri",
              "value": "http://www.wikidata.org/entity/Q3375"
            }
          }
        ]
      }
    }
  ]
}
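Given this structure, extracting the question strings per language is straightforward. A minimal sketch (the truncated inline entry stands in for loading the real dataset file from the repository):

```python
import json

# A single QALD-9-Plus entry in the format shown above (truncated
# to two languages; the real file contains a list of such entries).
entry_json = """
{
  "id": "86",
  "question": [
    {"language": "en", "string": "What is the highest mountain in Germany?"},
    {"language": "de", "string": "Wie heißt der höchste Berg in Deutschland?"}
  ]
}
"""

def questions_by_language(entry):
    """Map language code -> question string for one dataset entry."""
    return {q["language"]: q["string"] for q in entry["question"]}

entry = json.loads(entry_json)
print(questions_by_language(entry)["de"])
```

Because every translation of a question shares one `id` and one SPARQL query, a KGQA system can be evaluated in any of the 9 languages against the same gold answers.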

Instead of a conclusion

I will be very happy if you have read this far! To close the article, I would like to share a useful link related to this work:

Bibtex citation

@misc{perevalov2022qald9plus,
      title={QALD-9-plus: A Multilingual Dataset for Question Answering over DBpedia and Wikidata Translated by Native Speakers}, 
      author={Aleksandr Perevalov and Dennis Diefenbach and Ricardo Usbeck and Andreas Both},
      year={2022},
      eprint={2202.00120},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
