Wikidata exceeds 100 million entries

The free knowledge database Wikidata, which serves among other things as a source for the more than 300 language versions of Wikipedia, has reached the mark of 100 million items (read: data objects). In addition, the 10th birthday of the knowledge database is coming up this week. This prompted us to talk to Lydia Pintscher, Wikidata product manager at Wikimedia Deutschland.

Patrick Wildermann (freelance editor)

25 October 2022

Is the record number of 100 million uploaded items a reason to celebrate?

LYDIA PINTSCHER: Yes and no. Of course I’m pleased, but on the other hand I don’t want to attach too much importance to such “higher, faster, further” milestones, if only because the significance of this figure is limited. You can’t compare the data with Wikipedia articles, in which volunteers have really invested a lot of time, effort and research. Wikidata works differently: an item is generally relatively quick to create, and its creation is partly automated. Let’s take the Wikidata project “Sum of all paintings” as an example, which focuses on paintings and other art – for this, museums, among others, provide databases that are usually already well prepared. This also means work, but on a different level than in Wikipedia. The real work then lies in expanding the item. Overall, the growth of our community and the increasingly diverse uses of our data are more important to me.

Was there or is there an objective as to what scope Wikidata wants to achieve?

No, not quantified in figures. But we have given ourselves a strategic direction. My wish is for Wikidata to operate as a central node in an open data network. This node should provide a basic stock of data about everything imaginable – and anyone who wants to know more will get pointers to where the knowledge can be deepened. This is the same with Wikipedia articles, which usually do not tell everything about a topic, but offer links for further and closer study.

Lydia Pintscher has been working at Wikimedia Deutschland for more than ten years. She describes herself as an enthusiast for free software and free culture. In 2012, Lydia was part of the team that developed Wikidata and since then she has been working continuously in a dedicated team and together with the community to further develop the free knowledge database.

A concrete example?

In Wikidata, for example, there is a good deal of data on musicians and songs, but not necessarily the complete discography of an artist. We have an entry on Elvis, but not on every song he ever recorded. What we do offer is a link to the Elvis entry in MusicBrainz – an open database that provides a lot of information from the world of music, including every single Elvis song. In this way we enable the next step in this open data network. In general, we link in Wikidata to entries in more than ten thousand other databases, websites, catalogs or archives.
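
The link from a Wikidata item to an external database can be illustrated with a minimal Python sketch. It assumes the public Wikidata Query Service endpoint, the item ID Q303 for Elvis Presley and the property P434 for the MusicBrainz artist ID. The helper names are hypothetical, and the sketch only builds the query and the URLs without sending any request.

```python
from urllib.parse import urlencode

# Assumption: the public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumptions: Q303 is the item for Elvis Presley,
# P434 is the property "MusicBrainz artist ID".
ELVIS = "Q303"
MUSICBRAINZ_ARTIST_ID = "P434"


def external_id_query(item_id: str, property_id: str) -> str:
    """Build a SPARQL query asking for one external identifier of one item."""
    return f"SELECT ?id WHERE {{ wd:{item_id} wdt:{property_id} ?id . }}"


def query_url(sparql: str) -> str:
    """Build the URL that would return the query result as JSON when fetched."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


def musicbrainz_artist_url(mbid: str) -> str:
    """Turn a MusicBrainz artist ID into the address of the artist page."""
    return f"https://musicbrainz.org/artist/{mbid}"


sparql = external_id_query(ELVIS, MUSICBRAINZ_ARTIST_ID)
print(sparql)
print(query_url(sparql))
```

Fetching the printed URL would return the identifier stored in Wikidata, and passing that value to `musicbrainz_artist_url` would give the address of the MusicBrainz entry, which is the "next step" in the open data network.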

How big is the Wikidata community at the moment – and how could it grow even further?

It currently includes around 12,000 active editors, people who have made at least five edits in the past 30 days. Our goal is to make many more people aware of the benefits of contributing to Wikidata – for example, by raising awareness of the everyday technologies that use our data and how these technologies can be improved as Wikidata gets better. Our data is used in quite a few websites, apps, and services, but the people who connect with it and gain knowledge from it don’t usually notice. After all, they don’t go to Wikidata.org, but get the data delivered, for example, by the personal digital assistant on their smartphone when they ask a question.
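
The definition of an active editor given above (at least five edits in the past 30 days) can be written down as a small Python sketch. The function name and the input format, a plain list of edit timestamps, are assumptions for illustration and do not reflect how Wikidata actually computes this statistic.

```python
from datetime import datetime, timedelta


def is_active_editor(edit_times, now, min_edits=5, window_days=30):
    """Return True if at least `min_edits` edits fall within the last `window_days` days."""
    cutoff = now - timedelta(days=window_days)
    # Count only edits inside the window that ends at `now`.
    recent = [t for t in edit_times if cutoff <= t <= now]
    return len(recent) >= min_edits


# Hypothetical example data: five edits in the three weeks before the reference date.
now = datetime(2022, 10, 25)
edits = [now - timedelta(days=d) for d in (1, 3, 7, 12, 20)]
print(is_active_editor(edits, now))
```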

How exactly can more awareness of the value of Wikidata be created?

When a cool game comes out that uses our data, or a new useful app, we try to point it out with regular showcases. The understanding that Wikidata is not just a dry knowledge base where only some data is maintained is fundamental for the motivation to contribute and to enable or improve more such applications. We need to push that further. I’m also a big fan of Depth of Wikipedia – which is a collection of social media accounts by Annie Rauwerda, who regularly posts curiosities and notable things from Wikipedia. In this way, she conveys how much wit and heart and soul is also put into this project and that Wikipedia is not just a boring encyclopedia. I would like to see something like that for Wikidata, too.

In 2014, Wikidata received the Open Data Innovation Award. Alongside Lydia Pintscher in the picture (from left to right): Sir Nigel Shadbolt, Executive Chair of the Open Data Institute; Magnus Manske, who in 2002 wrote the first version of the MediaWiki software that powers Wikipedia; and Sir Tim Berners-Lee, the inventor of the World Wide Web.

What innovative projects have been created based on Wikidata?

In the past ten years, a lot! One example is the “Open Library” website, which allows users to borrow books. It uses data from Wikidata about authors to show users their reading behavior – for example, that a reader primarily reads books by male authors from northern Europe. Depending on this, she or he can then expand their own focus and make their reading behavior more diverse. Another example is a machine-learning program designed to recognize what is in historical theater photographs. The developer used Wikidata to improve the program. For example, if it thought it recognized a laptop in a 1905 picture, she could use information from Wikidata to teach the program that laptops didn’t even exist back then…

Another researcher has looked at how the political elites in different countries are networked and what interconnections result: the prime minister of country X appoints his sister as justice minister, the shah of country Y puts the largest state-owned company in the hands of his brother – such cases. With the data from Wikidata, these connections can be shown: the existing family ties, the companies these people own, the offices they hold. No other database provides such a combination of data from industry, law and government, and worldwide at that.

Do your goals include making the Wikidata community more diverse?

Absolutely. If we want to collect and maintain data and make it usable for technology that people come into contact with every day, we also need as diverse a group as possible to contribute to Wikidata. That’s what we’re working on. One plan is to not focus so much on having everyone go to Wikidata to edit. Instead, we are looking very closely at the decision-making processes on Wikidata. Which ones are particularly important because they decide what our data looks like and who can use it and how? From this, we want to learn where we need to start in order to make more voices heard that have not been heard so far.

What decision-making processes are involved?

One example is the modeling of names. At first glance, this seems to be a simple matter, but in fact names are one of the most complex fields of all – among other things, because the way they are dealt with differs greatly from one cultural group to the next. The question of transliteration, for example, is linked to this: is the name Muhammad written in Cyrillic the same as the name in Arabic script? Are Muhamed and Mohammed the same? In such decision-making processes we need to bring in as many different perspectives as possible, precisely because there are so many different ideas of what a name is, and because the modeling determines whether certain people find themselves reflected in our data at all.

Does Wikidata deliver on the promise that everyone can contribute – or do hurdles exist, for example of a technical nature?

Theoretically, anyone can contribute, just like on Wikipedia. In practice – and this is also comparable with Wikipedia – things are a bit more complicated. Building a knowledge graph, modeling and maintaining data is not everyone’s cup of tea, just as researching or editing Wikipedia articles is not everyone’s cup of tea. It takes a certain mindset, a certain affinity to want to describe the world. In any case, we have a very open-minded community, one that remains open to things it doesn’t yet know. That’s important if you want to capture the world in data. And we welcome every new contributor who helps make Wikidata better, improving and enriching Wikipedia and much more.

More things to know about Wikidata

On October 29, Wikidata celebrates its 10th birthday! To mark the occasion, we’ve published a series of blog articles with lots of interesting facts about the history of the world’s largest free knowledge database and its unique community.

10 years ago, the foundations for Wikidata were laid. Lydia Pintscher on the beginnings.

Part 1 about the people who made Wikidata the collaborative project it is today.

Part 2 about the people who made Wikidata the collaborative project it is today.

Part 1 on the impact of Wikidata in fostering the Wikimedia mission.

Part 2 on the impact of Wikidata in fostering the Wikimedia mission.
