Summary (translated from German): Wikidata is growing and becoming more successful. Over the next year we need to develop strategies and tools to scale Wikidata. In this post I lay out my thoughts on how to do that.


 

Wikidata is becoming more successful every single day. Every single day we cover more topics and have more data about them. Every single day new people join our community. Every single day we provide more people with more access to more knowledge. This is amazing. But with any growth comes growing pains. We need to start thinking about them and build strategies for dealing with them.

Wikidata needs to scale in two ways: socially and technically. I will not go into the details of technical scaling here but instead focus on social scaling. By social scaling I mean enabling all of us to deal with more attention, data and people around Wikidata. There are several key things that need to be in place to make this happen:

  • A welcome wagon and good documentation for newcomers to help them become part of the community and understand our shared norms, values, policies and traditions.
  • Good tools to help us maintain our data and find issues quickly and deal with them swiftly.
  • A shared understanding that providing high-quality data and knowledge is important.
  • Communication tools like the weekly summary and Project chat that help us keep everyone on the same page.
  • Structures that scale with enough people with advanced rights to not overwhelm and burn out any one of them.

We have all of these in place but all of them need more work from all of us to really prepare us for what is ahead over the next months and years.

One of the biggest pressures Wikidata is facing now is organisations wanting to push large amounts of data into Wikidata. This is great if it is done correctly and if it is data we truly care about. There are key criteria I think we should consider when accepting large data donations:

  • Is the data reliable, trustworthy, current and published somewhere referenceable? We are a secondary database, meaning we state what other sources say.
  • Is the data going to be used? Data that is not used is much harder to maintain because fewer people see it.
  • Is the organization providing the data going to help keep it in good shape? Or are other people willing to do it? Data donations need champions feeling responsible for making them a success in the long run.
  • Is it helping us fix an important gap or counter a bias we have in our knowledge base?
  • Is it improving existing topics more than adding new ones? We need to improve the depth of our data before we continue to expand its breadth.

So once we have this data, how can we make sure it stays in good shape? This matters because quality of and trust in the data are crucial for scaling Wikidata. How can we ensure high data quality even at a large scale? These are the key pieces necessary to achieve this:

  • A community that cares about making sure the data we provide is correct, complete and up-to-date
  • Many eyes on the data
  • Tools that help maintenance
  • An understanding that we don’t have to have it all

Many eyes on the data. What does it mean? The idea is simple. The more people see and use the data, the more people will be able to find mistakes and correct them. The more data from Wikidata is used, the more people will come into contact with it and help keep it in good shape. More usage of Wikidata data in large Wikipedias is an obvious goal there. More and more infoboxes need to be migrated over the next year to make use of Wikidata. The development team will concentrate on making this possible by removing the big remaining blockers: support for quantities with units, access to data from arbitrary items, and good examples and documentation. At the same time we need to work on improving the visibility of changes on Wikidata in Wikipedia's watchlists and recent changes. Just as important for getting more eyes on our data are 3rd-party users outside Wikimedia. Wikidata data is starting to be used all over the internet. It is being exposed to people even in unexpected places. What is of utmost importance in both cases is that it is easy for people to make changes and feed them back to Wikidata. This will only work with well-functioning feedback loops. We need to encourage 3rd-party users to be good players in our ecosystem and make this happen, also for their own benefit.

Tools that help maintenance. As we scale Wikidata we also need to provide more and better tools to find issues in the data and fix them. Making sure that the data is consistent with itself is the first step. A team of students is now working with the development team on improving the system for that. This will make it easy to spot, for example, people whose date of birth is after their date of death. The next step is checking against other databases and reporting mismatches. That is the other part of the student project. When you look at an item you should immediately see statements that are flagged as potentially problematic and be able to review them. In addition, more and more visualizations are being built that make it easy to spot outliers. One recent example is the Tree of Life.
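To make the two kinds of checks a bit more concrete, here is a minimal sketch in Python. It is only an illustration under simplifying assumptions, not how the student project or the Wikidata software implements them: the PersonRecord type, the function names and the item identifiers are all hypothetical, and real statements are far richer than two plain dates.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class PersonRecord:
    """Hypothetical, heavily simplified stand-in for an item about a person."""
    item_id: str
    date_of_birth: Optional[date] = None
    date_of_death: Optional[date] = None


def find_birth_after_death(records: List[PersonRecord]) -> List[str]:
    """Internal consistency check: flag items whose birth date is after the death date."""
    flagged = []
    for record in records:
        # Records with a missing date cannot be checked, so they are skipped.
        if record.date_of_birth is None or record.date_of_death is None:
            continue
        if record.date_of_birth > record.date_of_death:
            flagged.append(record.item_id)
    return flagged


def find_external_mismatches(
    records: List[PersonRecord],
    external_birth_dates: Dict[str, date],
) -> List[str]:
    """External check: flag items whose birth date differs from another database.

    external_birth_dates is an assumed mapping from item id to the birth date
    stated by some other source.
    """
    flagged = []
    for record in records:
        external = external_birth_dates.get(record.item_id)
        # Only compare when both sides actually state a date.
        if external is None or record.date_of_birth is None:
            continue
        if external != record.date_of_birth:
            flagged.append(record.item_id)
    return flagged
```

Both functions only flag items for review. A mismatch with another database does not tell us which side is wrong, so a person still has to look at the flagged statement and decide.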

An understanding that we don’t have to have it all. We should not aim to be the one and only place for structured open data on the web. We should strive to be a hub that covers important ground but also gives users the ability to find other more specialized sources. Our mission is to provide free access to knowledge for everyone. But we can do this just as well when we have pointers to other places where people can get this information. This is especially the case for niche topics and highly detailed data. Why is this so important? We are part of a larger ecosystem, and we should help expand the pie for everyone by pointing to all kinds of specialized databases. Success means making the pie bigger, not getting the whole pie for ourselves. We can’t do it all on our own.

If we keep all this in mind and preserve our welcoming culture we can continue to build something truly amazing and provide more people with more access to more knowledge every single day.

Improving the data quality and trust in the data we have will be a major development focus of the first months of 2015.