Finding synonyms for terms using Wikidata (Python)

Many services suggest synonyms, but they rarely cope with terms made up of more than one word. For such multi-word expressions, Wikidata can help. Alongside Wikipedia, the Wikimedia Foundation maintains Wikidata, a much less widely known knowledge graph. It is now integrated into Wikipedia itself, so many articles have a "Wikidata item" link in the side menu.
Wikidata follows the RDF model: information is stored as triples that describe an entity. Each triple is a statement of the form subject – predicate – object. For example, one of the triples for the entity England is: England – has capital – London.
One of the predicates (link types) is altLabel, which holds the alternative names of an entity. This is the predicate we will use to find synonyms.
Keep in mind that Wikidata is a very extensive knowledge base, but it is not complete. No synonyms will be found for terms that have no entity there, or whose entity has no alternative names entered.
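To make the triple model concrete, here is a minimal sketch that stores statements as plain Python tuples. The names `Triple`, `triples` and `objects_of` are invented for this illustration and do not belong to any Wikidata library. Apart from the England example from the text, the data is a toy sample (real Wikidata uses identifiers rather than plain words).

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Toy graph: the first statement is the example from the text,
# the second is an illustrative extra statement.
triples = [
    Triple("England", "has capital", "London"),
    Triple("London", "instance of", "city"),
]


def objects_of(graph, subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return [t.obj for t in graph if t.subject == subject and t.predicate == predicate]


print(objects_of(triples, "England", "has capital"))
```

Looking up alternative names works the same way in principle: fix the subject (the entity) and the predicate (altLabel), then collect the objects.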
Finding an item in the knowledge base
The first step is to find the Wikidata entity that represents the given term, that is, its unique identifier (Q_id). This can be done by sending a search request to the Wikidata API with the wbsearchentities action and taking the identifier of the first result.
Full API documentation can be found at https://www.mediawiki.org/wiki/API:Main_page
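import requests

session = requests.Session()
URL = 'https://www.wikidata.org/w/api.php'

def wbgetentities(name):
    """Return the Q_id of the first search hit for `name`, or None if nothing is found."""
    res = session.post(URL, data={
        'action': 'wbsearchentities',
        'search': name,
        'language': 'ru',  # language of the search term
        'format': 'json',
    })
    try:
        # The first hit is taken as the best match
        res_json = res.json()['search'][0]['id']
    except (KeyError, IndexError, ValueError):
        # No hits, an error response, or a body that is not valid JSON
        res_json = None
    return res_json

term = 'искусственный интеллект'  # placeholder multi-word term; replace with your own
Q_id = wbgetentities(term)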
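Taking the first search hit is a simplification, because one term can match several entities. The sketch below assumes that each item in the `search` list of the response carries `id`, `label` and `description` fields, and lists the candidates so the right one can be picked by hand. The function name `list_candidates` and the sample payload are illustrative placeholders, not real API output.

```python
def list_candidates(payload, limit=5):
    """Return (id, label, description) tuples from a wbsearchentities-style response."""
    hits = payload.get("search", [])
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in hits[:limit]
        if "id" in hit
    ]


# Hand-written sample shaped like an API response (placeholder values)
sample = {
    "search": [
        {"id": "Q0000001", "label": "example term", "description": "first meaning"},
        {"id": "Q0000002", "label": "example term", "description": "second meaning"},
    ]
}

for qid, label, description in list_candidates(sample):
    print(qid, label, description)
```

Printing the descriptions next to the identifiers makes it easier to notice when the first hit is a homonym rather than the intended concept.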
Search for synonyms
Synonyms are retrieved with SPARQL, the query language for RDF data. A SPARQL query lets us ask for all alternative names attached to our entity through the altLabel predicate.
The sparql-client library is used to send the SPARQL queries.
import sparql

def create_query(first_id):
    """Build a SPARQL query that selects the Russian alternative names of an entity."""
    q = ('''
    PREFIX entity: <http://www.wikidata.org/entity/>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT DISTINCT ?syno
    WHERE {
      VALUES ?id {entity:''' + first_id + '''}
      ?id skos:altLabel ?syno .
      FILTER (lang(?syno) = 'ru')
    }''')
    return q

synonyms = []
if Q_id is not None:  # the search step may have found nothing
    query = create_query(Q_id)
    result = sparql.query('https://query.wikidata.org/sparql', query)
    for r in result:
        values = sparql.unpack_row(r)
        # Skip empty values and duplicates
        if values[0] is not None and values[0] not in synonyms:
            synonyms.append(values[0])
print(synonyms)
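If installing sparql-client is not an option, the same query can also be sent using only the standard library. The following is a minimal sketch, assuming the endpoint returns the standard SPARQL JSON results layout when `format=json` is passed. The function names `parse_alt_labels` and `fetch_alt_labels` and the User-Agent string are made up for this example, and the sample data is hand-written.

```python
import json
import urllib.parse
import urllib.request

SPARQL_URL = "https://query.wikidata.org/sparql"


def parse_alt_labels(result_json, var="syno"):
    """Collect unique values of one variable from a SPARQL JSON result, keeping order."""
    seen = []
    for binding in result_json.get("results", {}).get("bindings", []):
        cell = binding.get(var)
        if cell and cell["value"] not in seen:
            seen.append(cell["value"])
    return seen


def fetch_alt_labels(query):
    """Send `query` to the endpoint and return the list of alternative names."""
    url = SPARQL_URL + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # Illustrative User-Agent; replace it with one that identifies your own script
    request = urllib.request.Request(url, headers={"User-Agent": "synonym-example/0.1"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_alt_labels(json.load(response))


# Offline check with a hand-written result in the SPARQL JSON layout
sample = {"results": {"bindings": [
    {"syno": {"type": "literal", "xml:lang": "ru", "value": "first"}},
    {"syno": {"type": "literal", "xml:lang": "ru", "value": "second"}},
    {"syno": {"type": "literal", "xml:lang": "ru", "value": "first"}},
    {},
]}}
print(parse_alt_labels(sample))
```

Keeping the parsing separate from the network call means the deduplication logic can be checked without contacting the query service.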
Thus, the code returns a list of synonyms for the term, if Wikidata has any. You can also find synonyms in another language by replacing the code 'ru' in the search request and in the SPARQL query with the code of another language from the list at https://www.wikidata.org/wiki/Help:Wikimedia_language_codes/lists/all
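One way to make the language switch less error-prone is to pass the language code as a parameter instead of editing the query text. The sketch below is one possible variant, not the query used above: the function name `build_alt_label_query`, the validation rules and the example identifier are assumptions made for illustration.

```python
import re


def build_alt_label_query(qid, lang="ru"):
    """Return a SPARQL query for the alternative names of `qid` in language `lang`."""
    # Accept only a plain Q-id, so arbitrary text never ends up inside the query
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not a Wikidata Q-id: {qid!r}")
    # Assumed shape of a language code: lowercase parts joined by hyphens
    if not re.fullmatch(r"[a-z]+(-[a-z0-9]+)*", lang):
        raise ValueError(f"unexpected language code: {lang!r}")
    return (
        "PREFIX wd: <http://www.wikidata.org/entity/>\n"
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT DISTINCT ?syno WHERE {\n"
        f"  wd:{qid} skos:altLabel ?syno .\n"
        f"  FILTER (lang(?syno) = '{lang}')\n"
        "}"
    )


# Arbitrary identifier and language, used only to show the generated text
print(build_alt_label_query("Q42", "en"))
```

The same language code should also be passed as the `language` field of the search request, so that the entity lookup and the synonym query agree.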