(The German version of this post is available here.)
In the first session of the first Wikimania, I presented the idea of enriching Wikipedia with structured data. Asked how long it would take to implement this, I answered: “Two weeks, if you know the MediaWiki software well.”
That was in 2005. It turned out I would be slightly off.
Now, in 2013, we finally started using structured data from Wikidata in the Wikipedias. The project is still in its infancy, but I am already extremely proud of the Wikidata team and what they have achieved. I am very thankful to the many, many people who helped us get to where we are today (I started listing them explicitly, but this post became too long). There are still many things that need to be done, but the rough sketch of what Wikidata is and is not has been drawn, and I think we have created a very interesting new project. I am confident enough about Wikidata and its future, or else I would not be leaving.
Many people are projecting formidable expectations onto Wikidata. There will be plenty of disappointments there: Wikidata is not a panacea for all problems, and there is nothing magical about it. But I also notice more and more people understanding and appreciating the limits of the system, working within those limits, and achieving things that are close to magical. Wikidata is nowhere near natural language in expressivity, and won't be for a long time. But with support for more datatypes, for queries, for Wikimedia Commons, and for Wiktionary on the roadmap, Wikidata could possibly become the world's most important repository of free knowledge, and maybe even a foundation for an artificial intelligence in which everyone can share. We can leave the development of an artificial intelligence to companies, or we can try to have it happen in the open, with everyone able to join. And in my opinion, the only place I currently see that is possibly up to this task is the Wikimedia movement.
This week, Wikidata crossed the milestone of 20 million statements. We have more than 750,000 coordinates and 250,000 points in time. We have only started exploring the possibilities of visualizing this data or integrating it with external datasets for an even richer experience. The deeply ingrained support for many languages provides us with novel datasets that can be used in many different circumstances. I expect references for statements to become a major input for NLP training algorithms. Wikidata has been devised to become an enabler, to lead to novel algorithms and applications, and we have only scratched the surface here.
A visualization of all Wikidata data, with the present in the center and the location of an item defining its angle. The color is given by the type of the item. Time and location propagate to connected items that lack this information.
What is the biggest risk for Wikidata in my opinion?
Not to be used.
To be used means that the data becomes visible. Errors and omissions become glaring, and beg to be corrected. To be used ensures the quality of the data. If an app displays data to a thousand users, maybe fifty will notice an error, and maybe one of them will actually go ahead and fix it. And that is enough — this is the beauty of a central knowledge repository.
Being used will also reduce the risk of overengineering the ontology. A focus on getting the ontology “right” could use up a lot of precious contributor time and energy with unclear benefits. Usage of the data will drive completeness, schema creation, and discussions in a much clearer way. Requirements that come from Wikipedia’s usage of the data and, secondarily, from external applications will yield a much higher benefit. Another advantage that will play out increasingly is the fact that Wikidata is much more amenable to bot-driven “data refactoring” than Wikipedia ever could be.
Being used will also lead to visibility, and thus to more people using the data, and thus to more people wanting to join Wikidata. Be it by using Q-ids, the Wikidata identifiers, to link to a central, Web-based knowledge base; be it by using Wikidata to make a small application or website more intelligent; be it by using Wikidata to create completely novel applications: the wider Wikidata is used, the more eyeballs are on it, directly or indirectly, and the more energy will be spent on improving and maintaining Wikidata.
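To make the first of these concrete, here is a minimal sketch of what linking an application’s own records to Wikidata via Q-ids can look like. The URL pattern and the example Q-ids (Q42 for Douglas Adams, Q64 for Berlin) are real; the local record set and function names are hypothetical, invented for illustration.

```python
# Hypothetical sketch: annotating local records with Wikidata Q-ids
# so they link to a shared, central knowledge base.

WIKIDATA_ENTITY_URL = "https://www.wikidata.org/wiki/{}"

# Invented local records, each tagged with a real Wikidata Q-id.
local_records = {
    "Douglas Adams": "Q42",
    "Berlin": "Q64",
}

def wikidata_link(qid):
    """Build the canonical Wikidata page URL for a given Q-id."""
    return WIKIDATA_ENTITY_URL.format(qid)

for name, qid in local_records.items():
    print(f"{name}: {wikidata_link(qid)}")
```

Once records carry Q-ids, any two applications that use them agree on what they are talking about, without ever exchanging schemas.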
I am extremely happy with the growth of the Wikidata community. I am not yet seeing equally wide adoption in data usage. It will be crucial to increase this in the future, and I have no doubt that the plans for Wikidata are set up well for that.
What are the biggest strengths of Wikidata?
The community. ‘nuff said. They are simply amazing. Unbelievably amazing. My deepest gratitude goes to them.
The tight integration with Wikipedia and the other Wikimedia projects. This gave us a huge head start over comparable projects, and the positive effects of that cannot be overstated. Without the immediate connection to Wikipedia, be it the language links that were used to build an entity base, be it the infoboxes that are used to guide the creation of the knowledge, a lot of energy would have dissipated. And the group of people investing time and energy would have been much smaller.
The extreme flexibility of the knowledge model, and the sound grounding of its semantics in standards like OWL. Keeping Wikidata tied to wiki principles and building on the MediaWiki software gave us a huge array of tested, community-oriented tools and principles. Letting as many restrictions as possible be enforced by the community rather than by the software allowed for growth well beyond the tight limitations of the developers’ imagination.
Publishing this post is my last action as director of the Wikidata project. It was a great time, and I am looking forward to seeing how Wikidata will develop. From now on, I am a member of the community, and I am excited to see it grow.