(The German version of this article is here.)

This is the first in a short series of blog entries in which I explain some of the design decisions for Wikidata. These are my personal opinions, but they strongly shape some features and non-features of Wikidata, so I want to explain them.

(Image: by Tomascastelazo, own work, via Wikimedia Commons)

One of the features – others call it a bug – of Wikidata is that you can choose any item as the value for a property. Many such choices do not make sense: on the item for Paris, saying that its country is goat cheese is plainly wrong. Wouldn’t it be great if Wikidata knew which values make sense for a country, and only allowed you to choose those, instead of allowing any possible value? Wouldn’t it be great if the community decided that a property like the widely used P107 could actually be restricted to the six possible values they decided on?

I strongly disagree.

Another feature – others call it a bug – of Wikidata is that you can use any property on any item. If you want to add the capital city of Julius Caesar, you’re welcome to do so. Wouldn’t it be great if Wikidata knew which properties make sense for a given item, and would not only restrict you to using those but even list the ones that still have missing values? Wouldn’t it be great if the community could create templates of properties that should all be filled out for a person, or for a city, or a country – and not allow anything else?

I strongly disagree.

I completely agree that smarter suggestions would be great. Some of these could be pretty trivial to implement: count the frequency for the values of a property and make a suggestion based on that. What about suggesting properties? There’s lots of research going on in that area, basically something like “items with these properties also have these properties” – you might have seen that on certain shopping sites.
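To make the two kinds of suggestion concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the `ITEMS` dictionary is a hypothetical toy dataset, and `suggest_values` and `suggest_properties` are names made up for this example, not part of Wikidata or any of its APIs. The first function counts how often each value is used for a property; the second is a naive take on “items with these properties also have these properties”.

```python
from collections import Counter

# Hypothetical toy data: item label -> {property: value}.
# This is NOT real Wikidata content or a real Wikidata data structure.
ITEMS = {
    "Paris": {"instance of": "city", "country": "France"},
    "Lyon": {"instance of": "city", "country": "France"},
    "Berlin": {"instance of": "city", "country": "Germany"},
    "Julius Caesar": {"instance of": "human", "occupation": "politician"},
}


def suggest_values(items, prop, limit=3):
    """Suggest values for `prop`, most frequently used first."""
    counts = Counter(
        statements[prop] for statements in items.values() if prop in statements
    )
    return [value for value, _ in counts.most_common(limit)]


def suggest_properties(items, present, limit=3):
    """Suggest properties that co-occur with the ones already present.

    Every item sharing at least one property with `present` votes for
    each of its other properties.
    """
    present = set(present)
    counts = Counter()
    for statements in items.values():
        props = set(statements)
        if props & present:
            counts.update(props - present)
    return [prop for prop, _ in counts.most_common(limit)]


print(suggest_values(ITEMS, "country"))
print(suggest_properties(ITEMS, {"country"}))
```

Note that both functions only rank candidates. Nothing stops an editor from entering a value or property that is not on the list, which is exactly the difference between a suggestion and a restriction.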

I am all for better suggestions. What I strongly disagree with are hard restrictions. They provide far too much room for drama and edit-warring. Does every country have a capital? What is a country anyway? What should the possible values for the property “gender” be? What are the right properties for presidents?

Anything that the system uses for building its user interface and core functionality – labels and descriptions, for example, or the links to Wikipedia pages – cannot have references. This is something the system simply “believes.” On the other hand, if you add a statement saying that Kosovo is a country, you can add a reference to it. Others might say that Kosovo is a part of Serbia. You can add a reference for that too. But if you want the user interface to use this kind of information – for example when a property is restricted to countries – the system needs to make a call on whether Kosovo is an independent country or not. That leaves no room for the kind of knowledge diversity that Wikidata is built for.

I perceive the danger that some parts of Wikidata might get stuck in an ontology engineering exercise. I think these exercises can be fundamentally unresolvable, and thus that Wikidata’s mandate should not be to solve them. Wikidata should, in my opinion, work on a less abstract level: Let us enter the authors of Aerosmith’s “I Don’t Want to Miss a Thing”, and not discuss whether authorship can apply to a song or not. Let us trace the genealogy of the British monarch, and not whether officials can only be persons. Are you sure that no donkey has ever become a Roman senator? Can you tell whether drinks should have inventors?

Wikidata allows for a unique collaborative space for humans and bots, much more so than Wikipedia, which already sports a pretty amazing example of such an environment. In Wikipedia, we have bots checking for outdated references to websites, for correct usage of punctuation, etc. In Wikidata we can create bots that check whether a teacher was indeed alive before the death of their student, whether all Roman senators lived before the 6th century, or whether the populations of the cities of a country add up to less than the population of the country as a whole. And the bots doing these checks will need to find a way to report their results to humans, who can then check whether the bots discovered genuine inconsistencies – either in the real world or in Wikidata – or not.
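As a rough illustration of what such a bot check could look like, here is a small Python sketch. Everything in it is an assumption made for the example: the data layout, the field names `born`, `died` and `students`, and the function names are hypothetical and do not correspond to any existing bot or Wikidata API. The important part is that the checks return reports for humans to review instead of rejecting the data.

```python
def check_teachers(people):
    """Report teachers who were born after one of their students died.

    `people` maps a name to a dict with optional integer years
    `born` and `died`, and an optional list `students` (hypothetical layout).
    """
    reports = []
    for name, person in people.items():
        born = person.get("born")
        for student_name in person.get("students", []):
            died = people.get(student_name, {}).get("died")
            # Missing data is not an inconsistency, so it is skipped.
            if born is not None and died is not None and born > died:
                reports.append(
                    f"{name} (born {born}) is listed as teacher of "
                    f"{student_name} (died {died})"
                )
    return reports


def check_city_populations(country, country_population, city_populations):
    """Report a country whose cities sum to more than the country itself."""
    total = sum(city_populations.values())
    if total > country_population:
        return [f"Cities of {country} sum to {total}, above {country_population}"]
    return []


# Hypothetical toy data with one deliberate inconsistency in each check.
PEOPLE = {
    "Teacher A": {"born": 1900, "students": ["Student B", "Student C"]},
    "Student B": {"died": 1850},
    "Student C": {"died": 1980},
}

for line in check_teachers(PEOPLE):
    print("Please review:", line)
for line in check_city_populations("Country X", 1000, {"City Y": 700, "City Z": 400}):
    print("Please review:", line)
```

A flagged entry is not necessarily wrong. It may point to an error in Wikidata, or to a real-world oddity that the check did not anticipate, and that judgement stays with a human.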

The world is complex. Wikidata aims to collect structured knowledge about this complex world. The roots of Wikidata, as the name hints, are wikis – and wikis mean freedom. Based on this legacy, Wikidata as a piece of software does not aim to implement restricted types for properties, nor restricted sets of properties for types of items, anytime soon.

(I skipped the boring technical details about why it would be hard to implement and what kind of problems could arise from implementations of the suggested features. There are some serious problems with that, but I wanted to stick with the conceptual reasons.)