(The German version of this article is here.)
The following is a guest post by Magnus Manske, active tool developer around Wikidata and author of the software that later evolved into MediaWiki.
Wikidata is the youngest child of the Wikimedia family. Its main purpose is to serve as a “Commons for factoids”, a central repository for key data about the topics on, and links between, the hundreds of language editions of Wikipedia. At the time of writing, Wikidata already contains about 10 million items, more than any edition of Wikipedia (English Wikipedia currently has 4.2 million entries). But while, as with Commons, its central purpose is to serve Wikipedia and its sister projects, Wikidata has significant value beyond that: it offers machine-readable, interlinked data about millions of topics in many languages via a standardized interface (API).
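To make “machine-readable via an API” a little more concrete, here is a minimal Python sketch of what a label lookup could look like. It assumes the `wbgetentities` action of the MediaWiki API at www.wikidata.org; the helper names `build_entity_url` and `extract_labels` are made up for this illustration, and the sample response is a hand-trimmed stand-in for a real reply, so no network access is needed to try it.

```python
import json
from urllib.parse import urlencode

# Assumption: the public Wikidata API endpoint and its wbgetentities action.
API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def build_entity_url(item_id, languages=("en",)):
    """Build a URL that asks for the labels of one item (hypothetical helper)."""
    params = {
        "action": "wbgetentities",
        "ids": item_id,
        "props": "labels",
        "languages": "|".join(languages),
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def extract_labels(response, item_id):
    """Map language code to label text from a parsed wbgetentities response."""
    labels = response.get("entities", {}).get(item_id, {}).get("labels", {})
    return {lang: entry["value"] for lang, entry in labels.items()}


# Hand-trimmed sample in the shape the API returns; not fetched live.
SAMPLE_JSON = """
{"entities": {"Q42": {"id": "Q42", "labels": {
    "en": {"language": "en", "value": "Douglas Adams"},
    "de": {"language": "de", "value": "Douglas Adams"}}}}}
"""

url = build_entity_url("Q42", languages=("en", "de"))
labels = extract_labels(json.loads(SAMPLE_JSON), "Q42")
print(url)
print(labels)
```

Fetching `url` with any HTTP client and passing the parsed JSON to `extract_labels` should give the same kind of result for a live item.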
Such a structured data repository has long been a “holy grail” in computer science, from the humble beginnings of research into artificial intelligence, to current applications like Google’s Knowledge Graph and Wolfram Alpha, and on towards future systems like “intelligent” user agents or (who knows?) the Singularity.
The scale of any such data collection is a daunting one, and while some companies can afford to pour money into it, other groups, such as DBpedia, have tried to harvest the free-form data stored in Wikipedia. However, Wikidata’s mixture of human and bot editing, the knowledge of Wikipedia as a resource, and evolving features such as multiple property types, source annotation, and qualifiers add a new quality to the web of knowledge, and several tools have already sprung up to take advantage of these, and to demonstrate its potential. A fairly complete list is available.
Views on Wikidata
For a straightforward example of such a tool, have a look at Mozart. This tool does not merely pull and display data about an item; it “understands” that this item is a person, and queries additional, person-specific items, such as relatives. It also shows person-specific information that does not refer to other items, such as Authority Control data. Mozart’s compositions are listed, and can be played right on the page if a file exists on Commons. To a degree, it can also use the language information in Wikidata, so you can request the same page in German (mostly).
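The “understanding” step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of the tool itself: items are reduced to plain dictionaries of property ID to target item IDs, the property and item IDs (P31 “instance of”, Q5 “human”, P22, P25, P26, P40) follow present-day Wikidata and may differ from what was in use when this was written, and the `Q9xxx` item IDs are invented for the example.

```python
# Assumed IDs, following present-day Wikidata conventions.
INSTANCE_OF = "P31"
HUMAN = "Q5"
PERSON_PROPERTIES = {
    "P22": "father",
    "P25": "mother",
    "P26": "spouse",
    "P40": "child",
}


def person_view(item, labels):
    """Return person-specific relations, or None if the item is not a person."""
    if HUMAN not in item.get(INSTANCE_OF, []):
        return None
    view = {}
    for prop, name in PERSON_PROPERTIES.items():
        targets = item.get(prop, [])
        if targets:
            # Fall back to the raw ID when no label is known.
            view[name] = [labels.get(target, target) for target in targets]
    return view


# Toy data; the Q9xxx IDs are hypothetical placeholders.
ITEMS = {
    "Q9001": {"P31": ["Q5"], "P22": ["Q9002"], "P25": ["Q9003"]},
    "Q9004": {"P31": ["Q9005"]},  # not a person, e.g. a composition
}
LABELS = {"Q9002": "Leopold Mozart", "Q9003": "Anna Maria Mozart"}

print(person_view(ITEMS["Q9001"], LABELS))
print(person_view(ITEMS["Q9004"], LABELS))
```

The same dispatch idea extends to other item types: check what kind of thing the item is first, then decide which properties are worth showing.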
Instead of looking only for direct relatives, a tool can also follow a “chain” of certain properties between items, and retrieve an “item cluster”, such as a genealogical tree (pretty and heavy-duty tree for Mozart). The Wikidata family tree around John F. Kennedy contains over 10,000 people at the time of writing. In similar fashion, a tool can follow taxonomic connections between species up to their taxonomic roots, and generate an entire tree of life (warning: huge page!).
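Following a chain of properties is essentially a graph traversal. The sketch below uses a breadth-first search over a toy in-memory data set; the function name `collect_cluster`, the readable property names, and the single-letter item names are all made up for the illustration, and a real tool would load the statements from Wikidata instead.

```python
from collections import deque


def collect_cluster(start, items, properties):
    """Collect every item reachable from `start` via the given properties."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for prop in properties:
            for target in items.get(current, {}).get(prop, []):
                if target not in seen:  # guards against cycles and repeats
                    seen.add(target)
                    queue.append(target)
    return seen


# Toy family: A and B are the parents of C; C is married to D;
# E is unrelated and should not end up in the cluster.
FAMILY = {
    "A": {"child": ["C"], "spouse": ["B"]},
    "B": {"child": ["C"], "spouse": ["A"]},
    "C": {"father": ["A"], "mother": ["B"], "spouse": ["D"]},
    "D": {"spouse": ["C"]},
    "E": {},
}

cluster = collect_cluster("C", FAMILY, ["father", "mother", "spouse", "child"])
print(sorted(cluster))
```

Passing a single “parent taxon” style property instead of the family properties would walk from a species up to its taxonomic root in the same way.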
These tools demonstrate that even in its early stages, Wikidata makes it possible to generate complex results with a fairly moderate amount of programming. For a more futuristic demo, talk to Wiri (Google Chrome recommended).
Edit this item
Unsurprisingly to anyone who has volunteered on Wikimedia projects before, tools to help with editing are also emerging. Some have the dual function of interrogating Wikidata and displaying results, while at the same time pointing out “things to do”. If you look at the genre of television series on Wikidata, you will notice that over half of them have no genre assigned. (Hint: Click on the “piece of pie” in the pie chart to see the items. Can you assign a genre to Lost?)
When editing Wikidata, one usually links to an item by looking for its name. Bad luck if you look for “John Taylor”, for there are currently 52 items with that name but no distinguishing description. If you want to find all items that use the same term, try the Terminator; it also has (daily updated) lists of items that have the same title but no description.
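The kind of list described here boils down to grouping items by label and keeping the groups in which several items lack a description. A rough Python sketch, with a hypothetical function name `ambiguous_labels` and invented item IDs and descriptions, might look like this:

```python
from collections import defaultdict


def ambiguous_labels(items):
    """Group description-less items by label; keep labels shared by 2+ items."""
    groups = defaultdict(list)
    for item_id, item in items.items():
        if not item.get("description"):
            groups[item["label"]].append(item_id)
    return {
        label: sorted(ids) for label, ids in groups.items() if len(ids) > 1
    }


# Toy data; IDs and descriptions are invented for the example.
ITEMS = {
    "Q9101": {"label": "John Taylor", "description": ""},
    "Q9102": {"label": "John Taylor", "description": ""},
    "Q9103": {"label": "John Taylor", "description": "English bass guitarist"},
    "Q9104": {"label": "Jane Example", "description": ""},
}

print(ambiguous_labels(ITEMS))
```

Items that already carry a description are left out, since they can be told apart when searching by name.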
Similarly, you can look for items by Wikipedia category. If you want a more complex filter, or want to write your own tool and look for something to ease your workload, there is a tool that can find, say, operas without a librettist (you will need to edit the URL to change the query, though).
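A query like “operas without a librettist” is a filter on two conditions: the item belongs to a given class, and a given property is missing. The sketch below shows that logic on toy data; it is not the query syntax of the tool mentioned above, and the function name `missing_property`, the readable property names, and the item IDs are assumptions made for the example.

```python
def missing_property(items, item_class, prop):
    """Return IDs of items of `item_class` that have no value for `prop`."""
    return sorted(
        item_id
        for item_id, item in items.items()
        if item_class in item.get("instance of", []) and not item.get(prop)
    )


# Toy data with invented IDs; real items would use property and item IDs.
ITEMS = {
    "Q9201": {"instance of": ["opera"], "librettist": ["Q9301"]},
    "Q9202": {"instance of": ["opera"]},
    "Q9203": {"instance of": ["opera"], "librettist": []},
    "Q9204": {"instance of": ["symphony"]},
}

print(missing_property(ITEMS, "opera", "librettist"))
```

Swapping the class and property arguments gives other worklists of the same shape, for example television series without a genre.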
This is only the beginning
While most of these tools are little more than demos, or primarily serve Wikidata and its editors, they nicely showcase the potential of the project. There might not be much you can learn about Archduke Ernest of Austria from Wikidata, but it is more than you would get on English Wikipedia (no article). It might be enough information to write a stub article. And with more statements being added, more property types (dates, locations) emerging, and more powerful ways to query Wikidata, I am certain we will see many more, and even more amazing, tools being written in the near future. Unless the Singularity writes them for us.