
Do you speak data? Wikidata as the Open Internet’s universal language

Languages are powerful instruments for creating community, sharing knowledge and preserving heritage. In technology, too, language is essential for storing and sharing information, whether in education, on social platforms, or with voice assistants. Yet some smaller languages are increasingly at risk of disappearing. That’s why Wikidata strives to promote linguistic diversity and multilingualism in the Wikimedia projects and beyond.
Photo: https://pixabay.com/vectors/hello-languages-word-cloud-foreign-3791381/

WMDE general

20 February 2020

You can also read this blog post in German.

As of today, Wikipedia exists in 309 different languages. In addition, there are several applications and functionalities in Wikidata that aim to strengthen smaller languages, such as the Romance language Occitan and its relative Catalan. Wikidata is a freely editable knowledge base that supports Wikipedia and also provides data as a common source for myriad other platforms and tools. The mission of Wikidata is “Giving more people more access to more knowledge”. (On International Mother Language Day, we want to highlight Wikidata’s contributions to helping languages prosper inside Wikipedia and in the digital world.)

Wikidata’s developers and community therefore focus on two things regarding language: providing content in various languages, and allowing people to interact in those languages. Lydia Pintscher, Product Manager of Wikidata, explains why it is critical to support a diverse range of languages on Wikidata: “Language is an important part of creating an inclusive and diverse community and technology. This is especially important today, since more and more of our lives depend on technology and its interrelation with language. At Wikidata, we don’t want to leave anyone behind because they speak a different language.”

Data inside the Wikipedia language projects

One place where the data is used is Wikipedia itself. Wikidata’s data can be reused in the infoboxes of Wikipedia entries. This makes editing and updating pages easier, since editors don’t have to update each article manually. This functionality is helpful for all Wikipedias. In smaller communities like the Catalan or Basque Wikipedia, it means that more pages can be kept up to date, since the work can be done by fewer editors. The data can also be used directly in an article through templates, as in this Basque example, and in article placeholders, which are used, for example, in the Welsh Wikipedia.
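As a rough illustration of how a tool might reuse this data, the sketch below builds a request to Wikidata's `wbgetentities` API for an item's labels and then picks the first label available from a list of preferred languages. The helper names `build_label_query` and `pick_label`, the placeholder item ID `Q0`, and the sample response are assumptions made for this example; the response is hand-written in the shape the API returns and was not fetched from Wikidata. It does not show how Wikipedia infoboxes are actually implemented.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def build_label_query(item_id, languages):
    """Return a wbgetentities URL that asks only for labels in the given languages."""
    params = {
        "action": "wbgetentities",
        "ids": item_id,
        "props": "labels",
        "languages": "|".join(languages),
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def pick_label(response, item_id, preferred):
    """Return (language, label) for the first preferred language with a label, else None."""
    labels = response.get("entities", {}).get(item_id, {}).get("labels", {})
    for lang in preferred:
        if lang in labels:
            return lang, labels[lang]["value"]
    return None


# Hand-written sample (not real Wikidata content); "Q0" is a placeholder ID.
SAMPLE_RESPONSE = {
    "entities": {
        "Q0": {
            "labels": {
                "ca": {"language": "ca", "value": "etiqueta d'exemple"},
                "en": {"language": "en", "value": "example label"},
            }
        }
    }
}

if __name__ == "__main__":
    print(build_label_query("Q0", ["eu", "ca", "en"]))
    # Basque is preferred, but the sample has no Basque label, so Catalan is used.
    print(pick_label(SAMPLE_RESPONSE, "Q0", ["eu", "ca", "en"]))
```

The fallback order is a design choice for this sketch: a small-language tool would typically ask for its own language first and fall back to a related or widely understood one only when no label exists yet.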

Lexemes: a new form of data and language relation

Multilingualism is at the core of Wikidata. From the outset, any item describing an object of knowledge, and any property, can have a name in any of the supported languages. Since 2018, Wikidata also stores a new type of data: words, described in many languages. This information is lexicographical data. Lexemes are the concrete data points in this lexicographical data. With all the language combinations that exist in Wikimedia projects, completely new possibilities open up: translation from one language to another becomes possible even when no printed dictionary exists for that pair of languages, because such a dictionary can be generated from structured data about the languages. You can learn more about the data model on the documentation page and read more about lexicographical data in this blog post.
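To make the dictionary idea concrete, here is a minimal sketch that pairs words from two languages whenever they point to the same concept. The record layout, the `concept:` identifiers, and the function `build_dictionary` are simplifications invented for this example; real Wikidata lexemes carry more structure (forms, senses, grammatical features), and this is not the actual data model.

```python
from collections import defaultdict

# Simplified, hypothetical records: (language code, lemma, concept the word refers to).
LEXEMES = [
    ("oc", "aiga", "concept:water"),
    ("ca", "aigua", "concept:water"),
    ("eu", "ur", "concept:water"),
    ("oc", "pan", "concept:bread"),
    ("ca", "pa", "concept:bread"),
]


def build_dictionary(lexemes, source, target):
    """Map each source-language word to target-language words sharing its concept."""
    # Group lemmas by concept, then by language.
    by_concept = defaultdict(lambda: defaultdict(list))
    for lang, lemma, concept in lexemes:
        by_concept[concept][lang].append(lemma)

    dictionary = {}
    for langs in by_concept.values():
        for word in langs.get(source, []):
            dictionary.setdefault(word, []).extend(langs.get(target, []))
    # Drop source words that have no counterpart in the target language.
    return {word: found for word, found in dictionary.items() if found}


if __name__ == "__main__":
    print(build_dictionary(LEXEMES, "oc", "ca"))
    print(build_dictionary(LEXEMES, "oc", "eu"))
```

Because the pairing goes through shared concepts rather than through a pivot language, any two languages with data for the same concept can be combined, which is what makes dictionaries for rarely paired languages possible.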

Language applications that use Wikidata

Wikidata also helps underrepresented languages to produce the tools and technologies they need. Focusing on marginalized languages and groups is often not considered financially advantageous by larger companies. However, since Wikidata is open and entirely free, smaller communities can build their own tools and integrate this data into existing applications. 

Lingua Libre is a library of audio recordings that anyone can contribute to by recording words, proverbs, or sentences. The website uses Wikibase as well as data from Wikidata, and so far over 100,000 audio files have been recorded in 46 languages by 128 active speakers.

Egunean Behin (Once a Day) is a smartphone app in Basque, providing quizzes and trivia questions. The app reuses data from Wikidata, Wikipedia and Wikimedia Commons. The app is helping to highlight regional and language knowledge and is used by one out of ten Basque speakers. 

Access through data

Wikidata has the clear goal of giving more people more access to more knowledge. Language is a crucial factor in encouraging understanding and bringing everyone on board a truly Open Internet. Wikidata’s efforts to promote the diversity of languages serve to encourage linguistic diversity and multicultural exchange on the Wikimedia projects and beyond. Wikidata therefore invites you to use its data for your own purposes and to develop the tools you and your community need.

If you want to know more about Wikidata and languages and meet like-minded people who work on sharing information in minority languages, have a look at the page of the Celtic Knot Wikimedia Language Conference in Limerick, from 9 to 10 July 2020.


  1. Elisabeth Giesemann
    3. March 2020 at 10:06

    Hi Quentin,

    sure, that would be great! Please send us the link as well!

    Best regards,

  2. Quentin
    29. February 2020 at 20:01

    Well, I would like to ask if I can make a translation of this article for an Occitan online newspaper?
    With the credits and links of course.
    Best regards
