Scaling Wikidata: success means making the pie bigger

  • Lydia Pintscher
  • 3. January 2015

Summary: Wikidata is growing and becoming more successful. Over the next year we need to develop strategies and tools to scale Wikidata. In this post I lay out my thoughts on this.


Wikidata is becoming more successful every single day. Every single day we cover more topics and have more data about them. Every single day new people join our community. Every single day we provide more people with more access to more knowledge. This is amazing. But with any growth comes growing pains. We need to start thinking about them and build strategies for dealing with them.

Wikidata needs to scale in two ways: socially and technically. I will not go into the details of technical scaling here but will instead focus on social scaling. By social scaling I mean enabling all of us to deal with more attention, more data and more people around Wikidata. There are several key things that need to be in place to make this happen:

We have all of these in place but all of them need more work from all of us to really prepare us for what is ahead over the next months and years.

One of the biggest pressures Wikidata is facing now is organisations wanting to push large amounts of data into Wikidata. This is great if it is done correctly and if it is data we truly care about. There are key criteria I think we should consider when accepting large data donations:

So once we have this data, how can we make sure it stays in good shape? This matters because the quality of, and trust in, the data on Wikidata is one of the crucial points for scaling Wikidata. How can we ensure high data quality in Wikidata even at a large scale? The key pieces necessary to achieve this:

Many eyes on the data. What does that mean? The idea is simple: the more people see and use the data, the more people will be able to find mistakes and correct them. The more data from Wikidata is used, the more people will come into contact with it and help keep it in good shape. More usage of Wikidata's data in the large Wikipedias is an obvious goal here. Over the next year, more and more infoboxes need to be migrated to make use of Wikidata. The development team will concentrate on making this possible by removing the big remaining blockers, such as support for quantities with units and access to data from arbitrary items, and by providing good examples and documentation. At the same time we need to improve the visibility of Wikidata changes in the Wikipedias' watchlists and recent changes. Third-party users outside Wikimedia are just as important for getting more eyes on our data. Wikidata's data is starting to be used all over the internet and is reaching people even in unexpected places. In both cases it is of utmost importance that people can easily make changes and feed them back to Wikidata. This will only work with well-functioning feedback loops. We need to encourage third-party users to be good players in our ecosystem and make this happen, also for their own benefit.

Tools that help with maintenance. As we scale Wikidata, we also need to provide more and better tools to find issues in the data and fix them. Making sure that the data is consistent with itself is the first step. A team of students is now working with the development team on improving the system for that. This will make it easy to spot people whose date of birth is after their date of death, and so on. The next step is checking against other databases and reporting mismatches; that is the other part of the student project. When you look at an item, you should immediately see the statements that are flagged as potentially problematic and be able to review them. In addition, more and more visualizations are being built that make it easy to spot outliers. One recent example is the Tree of Life.
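The two kinds of checks described above, self-consistency and comparison against another database, can be illustrated with a minimal Python sketch. It is not how Wikidata's constraint system actually works: it assumes a hypothetical, simplified `PersonItem` record holding plain dates for the "date of birth" (P569) and "date of death" (P570) statements, and a hypothetical external record given as a plain dictionary. Real Wikidata dates also carry precision and calendar information, which this sketch ignores.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class PersonItem:
    """Hypothetical, simplified stand-in for a Wikidata item about a person."""
    item_id: str
    date_of_birth: Optional[date] = None  # stands in for a "date of birth" (P569) value
    date_of_death: Optional[date] = None  # stands in for a "date of death" (P570) value


def birth_after_death(item: PersonItem) -> bool:
    """Self-consistency check: True if the birth date is later than the death date."""
    if item.date_of_birth is None or item.date_of_death is None:
        return False  # one of the values is missing, so there is nothing to compare
    return item.date_of_birth > item.date_of_death


def external_mismatches(item: PersonItem, external: Dict[str, date]) -> List[str]:
    """Cross-database check: names of fields where a (hypothetical) external record disagrees."""
    found = []
    for field in ("date_of_birth", "date_of_death"):
        ours = getattr(item, field)
        theirs = external.get(field)
        # Only report a mismatch when both sides actually have a value.
        if ours is not None and theirs is not None and ours != theirs:
            found.append(field)
    return found


# Usage with made-up example data (not real Wikidata items):
items = [
    PersonItem("example-1", date(1900, 5, 1), date(1850, 3, 2)),  # inconsistent
    PersonItem("example-2", date(1900, 5, 1), date(1970, 3, 2)),  # consistent
]
flagged = [item.item_id for item in items if birth_after_death(item)]
print(flagged)  # ['example-1']

external_record = {"date_of_birth": date(1900, 5, 1), "date_of_death": date(1971, 1, 1)}
print(external_mismatches(items[1], external_record))  # ['date_of_death']
```

Both checks only flag statements rather than changing them, which fits the workflow described above: a person still reviews each flagged statement before anything is corrected.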

An understanding that we don’t have to have it all. We should not aim to be the one and only place for structured open data on the web. Instead we should strive to be a hub that covers important ground but also helps users find other, more specialized sources. Our mission is to provide free access to knowledge for everyone, but we can do this just as well by pointing to other places where people can get this information. This is especially true for niche topics and highly detailed data. We should help make the pie bigger for everyone by being a hub that points to all kinds of specialized databases. Why is this so important? We are part of a larger ecosystem. Success means making the pie bigger, not getting the whole pie for ourselves. We can’t do it all on our own.

If we keep all this in mind and preserve our welcoming culture, we can continue to build something truly amazing and provide more people with more access to more knowledge every single day.

Improving the data quality and trust in the data we have will be a major development focus of the first months of 2015.

  1. I think “Many eyes on the data” misses the point. Wikidata needs to be *used*. Eyes will follow.

    Comment by Nemo on 3. January 2015 at 19:59

  2. Yes, I see this as more or less equivalent.

    Comment by Lydia Pintscher on 3. January 2015 at 21:17

  3. Honestly, if this is all the German text we are going to get, I just feel like I'm being made fun of. As long as this is done by WMDE in Germany, I expect a German text alongside the English one.

    Comment by Marcus Cyron on 4. January 2015 at 04:21

  4. I think we should not focus on finding new contributors before we have all the tools planned at the beginning of WD (mainly the quantity-with-units datatype and access to several items from one WP article). At the beginning of WD I saw too many contributors come and try to do something, but they disappeared because of too many limitations. People will come once we are able to build infoboxes in WP, because they will see the connection between WD and WP.

    Then I think we have to avoid getting tangled up in details: I see very specific properties which I doubt will be used to a large extent or outside of one or two Wikipedias. Here the only way to avoid importing large amounts of data that won’t be used is to propose specialized Wikidatas, which should be maintained by local teams or “fan teams”. To import data we should perhaps require that the data be used in at least two or three WPs.

    Then, to ensure the data is cared for, data imports should be supported by Wikidata projects. I think we have to put a few more constraints on bot activity concerning data imports.

    WD was like the Far West, a big empty space, but if we plan to integrate more people we will need more policies and rules, because people will have less and less space as new contributors arrive. The old times are over; a new one is coming.

    Comment by Snipre on 4. January 2015 at 18:43

  5. Snipre: The thing is, I am pretty sure we don’t have a choice but to concentrate on this now. It’s coming. We can ignore it and bear the consequences, or address it and deal with it properly :)

    Comment by Lydia Pintscher on 5. January 2015 at 09:40

  6. I don't think this is all that great.

    Comment by kuhlertüb on 13. January 2015 at 12:23

  7. […] however, the data incorporated into wikidata is open for anyone to use. In fact, wikidata is begging to be used and citizen scientists and citizen data scientists are welcome to use it. An international group of […]

    Pingback by Wikidata can change the way citizen scientists contribute | Gee-aI-eN-Gee on 15. January 2015 at 19:08
