Posts Tagged ‘english’



The Wikidata tool ecosystem

(The German version of this article is here.)

The following is a guest post by Magnus Manske, active tool developer around Wikidata and author of the software that later evolved into MediaWiki.

Wikidata is the youngest child of the Wikimedia family. Its main purpose is to serve as a “Commons for factoids”: a central repository for key data about the topics of, and the links between, the hundreds of language editions of Wikipedia. At the time of writing, Wikidata already contains about 10 million items, more than any edition of Wikipedia (the English Wikipedia currently has 4.2 million articles). But while its central purpose, as with Commons, is to serve Wikipedia and its sister projects, Wikidata has significant value beyond that: it offers machine-readable, interlinked data about millions of topics in many languages via a standardized interface (API).
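As a minimal sketch of what this standardized interface looks like, the Python function below builds a request URL for the `wbgetentities` module of the Wikidata API, which returns the labels, descriptions and statements of items as JSON. It only constructs the URL; fetching it (for example with `urllib.request.urlopen`) is left out, and the parameters accepted by the API as deployed at the time of writing may have differed in detail.

```python
from urllib.parse import urlencode

# Endpoint of the Wikidata web API (MediaWiki's api.php with the Wikibase extension).
API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def entity_request_url(item_ids, languages=("en",), props=("labels", "descriptions", "claims")):
    """Build a wbgetentities request URL for one or more item IDs such as 'Q42'."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(item_ids),  # the API separates multiple values with '|'
        "props": "|".join(props),
        "languages": "|".join(languages),
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(entity_request_url(["Q42"], languages=("en", "de")))
```

Restricting `props` and `languages` keeps responses small, which matters once a tool requests many items at a time.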

Such a structured data repository has long been a “holy grail” in computer science, from the humble beginnings of research into artificial intelligence, through current applications like Google’s Knowledge Graph and Wolfram Alpha, to future systems like “intelligent” user agents or (who knows?) the Singularity.

The scale of any such data collection is daunting, and while some companies can afford to pour money into it, other groups, such as DBpedia, have tried to harvest the free-form data stored in Wikipedia. Wikidata, however, adds a new quality to the web of knowledge through its mixture of human and bot editing, the knowledge of Wikipedia as a resource, and evolving features such as multiple property types, source annotation, and qualifiers. Several tools have already sprung up to take advantage of these features and to demonstrate Wikidata’s potential. A fairly complete list is available.

Views on Wikidata


Family tree of Johann Sebastian Bach

For a straightforward example of such a tool, have a look at Mozart. This tool does not merely pull and display data about an item; it “understands” that this item is a person, and queries additional, person-specific items, such as relatives. It also shows person-specific information that does not refer to other items, such as Authority Control data. Mozart’s compositions are listed and can be played right on the page if a file exists on Commons. To a degree, it can also use the language information in Wikidata, so you can request the same page in German (mostly).

Instead of looking only for direct relatives, a tool can also follow a “chain” of certain properties between items and retrieve an “item cluster”, such as a genealogical tree (pretty and heavy-duty tree for Mozart). The Wikidata family tree around John F. Kennedy contains over 10,000 people at the time of writing. In similar fashion, a tool can follow taxonomic connections between species up to their taxonomic roots and generate an entire tree of life (warning: huge page!).
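Following such a “chain” amounts to a graph traversal. The minimal breadth-first sketch below assumes the claims have already been downloaded into a plain dictionary that maps each item ID to its properties and target items. The property names (`father`, `mother`, `child`) and item IDs are hypothetical placeholders rather than Wikidata’s actual property IDs, and this is not the code of the tools mentioned above.

```python
from collections import deque


def item_cluster(start, claims, properties, limit=None):
    """Return all items reachable from `start` by following `properties`.

    `claims` maps an item ID to {property: [target item IDs]}.
    `limit` optionally caps the cluster size, since such trees can get huge.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        item = queue.popleft()
        for prop in properties:
            for target in claims.get(item, {}).get(prop, []):
                if target in seen:
                    continue
                if limit is not None and len(seen) >= limit:
                    return seen
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical toy data: A's father is B; B has children A and D; A has child C.
claims = {
    "A": {"father": ["B"], "child": ["C"]},
    "B": {"child": ["A", "D"]},
    "E": {"child": ["F"]},
}
print(sorted(item_cluster("A", claims, ["father", "mother", "child"])))
```

Following only a single upward property such as a hypothetical `parent taxon` instead yields the path from a species to its taxonomic root.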

These tools demonstrate that even in its early stages, Wikidata makes it possible to generate complex results with a fairly moderate amount of programming. For a more futuristic demo, talk to Wiri (Google Chrome recommended).

Edit this item

Unsurprisingly to anyone who has volunteered on Wikimedia projects before, tools to help with editing are also emerging. Some serve a dual function: they query Wikidata and display the results, while at the same time pointing out “things to do”. If you look at the genres of television series on Wikidata, you will notice that over half of them have no genre assigned. (Hint: Click on the “piece of pie” in the pie chart to see the items. Can you assign a genre to Lost?)
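Such a pie chart boils down to counting the values of one property across a set of items, with a separate bucket for items that lack the property. A minimal Python sketch, assuming the items of one class (here, television series) are already loaded as dictionaries mapping property names to value lists; `genre`, the genre values and the item IDs are hypothetical placeholders.

```python
from collections import Counter


def value_distribution(items, prop):
    """Count the values of `prop` across items; items without it are counted under None."""
    counts = Counter()
    for claims in items.values():
        # An item with several values contributes to each of them.
        counts.update(claims.get(prop) or [None])
    return counts


series = {
    "S1": {"genre": ["drama"]},
    "S2": {},
    "S3": {"genre": ["drama", "science fiction"]},
    "S4": {"genre": []},
}
print(value_distribution(series, "genre"))
```

The `None` bucket is the “piece of pie” that lists the items still waiting for a genre.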

When editing Wikidata, one usually links to an item by searching for its name. Bad luck if you search for “John Taylor”: there are currently 52 items with that name but no distinguishing description. If you want to find all items that use the same term, try the Terminator; it also has (daily updated) lists of items that have the same title but no description.
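Lists like these can be approximated by grouping items by label and keeping the groups that contain more than one item, at least one of which has no description. This minimal Python sketch is not the Terminator’s actual code. It assumes a simplified layout in which `labels` and `descriptions` map a language code directly to a string (the real API wraps each in an object with a `value` field), and the item IDs and description are placeholders.

```python
from collections import defaultdict


def ambiguous_labels(items, language="en"):
    """Map each label shared by several items to those of its items lacking a description."""
    by_label = defaultdict(list)
    for item_id, entity in items.items():
        label = entity.get("labels", {}).get(language)
        if label:
            by_label[label].append(item_id)
    result = {}
    for label, ids in by_label.items():
        undescribed = sorted(
            i for i in ids if not items[i].get("descriptions", {}).get(language)
        )
        if len(ids) > 1 and undescribed:
            result[label] = undescribed
    return result


items = {
    "Q1": {"labels": {"en": "John Taylor"}},
    "Q2": {"labels": {"en": "John Taylor"}, "descriptions": {"en": "example description"}},
    "Q3": {"labels": {"en": "John Taylor"}},
    "Q4": {"labels": {"en": "Unique label"}},
}
print(ambiguous_labels(items))
```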

Similarly, you can look for items by Wikipedia category. If you want a more complex filter, or want to write your own tool and are looking for something to ease your workload, there is a tool that can find, say, operas without a librettist (you will need to edit the URL to change the query, though).
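A query like “operas without a librettist” combines two conditions: the item must have certain property values, and must have no value at all for certain other properties. A minimal Python sketch over locally loaded data, not the query tool mentioned above; the property names `instance of` and `librettist` and the item IDs are hypothetical placeholders.

```python
def find_items(items, required, missing):
    """IDs of items that have every (property, value) pair in `required`
    and no value at all for any property in `missing`."""
    return sorted(
        item_id
        for item_id, claims in items.items()
        if all(value in claims.get(prop, []) for prop, value in required)
        and not any(claims.get(prop) for prop in missing)
    )


works = {
    "O1": {"instance of": ["opera"], "librettist": ["L1"]},
    "O2": {"instance of": ["opera"]},
    "B1": {"instance of": ["book"]},
}
print(find_items(works, required=[("instance of", "opera")], missing=["librettist"]))
```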

There are also many JavaScript-based tools that work directly on Wikidata. A single click to import all language links or species taxonomy from Wikipedia, find authority control data, declare the current item to be a female football player from Bosnia, or apply the properties of the current item to all items in the same Wikipedia category — tools for all of these exist.

This is only the beginning

While most of these tools are little more than demos, or primarily serve Wikidata and its editors, they nicely showcase the potential of the project. There might not be much you can learn about Archduke Ernest of Austria from Wikidata, but it is more than you would get from the English Wikipedia (which has no article on him). It might be enough information to write a stub article. And with more statements being added, more property types (dates, locations) emerging, and more powerful ways to query Wikidata, I am certain we will see many more, and even more amazing, tools written in the near future. Unless the Singularity writes them for us.


Wikidata all around the world

(The German version of this article is here.)

For a month now, 11 Wikipedias have been able to include data from Wikidata in their articles. Two days ago, the English Wikipedia joined that group. Today the remaining 274 are following. Usage examples are in the previous blog entry. There is also an FAQ for this deployment.

This is a huge step for Wikidata and, at the same time, another beginning. It is a huge step because from now on all Wikipedias are able to collect, curate and use data together. For example, every Wikipedia can retrieve the Internet Movie Database ID of a movie and use it in its articles as soon as someone has added it to Wikidata. At the same time, it is a beginning because there is still a lot to do. Accessing the data has to be made easier. More data has to be added to Wikidata (and translated where necessary). More sources have to be added to existing claims. More data types need to be made available, for example geocoordinates and time. Your help and feedback are very welcome and important here.

We’re looking forward to the next steps!


And that makes 12

(The German version of this post is here.)

Today the English Wikipedia gained the ability to include data from Wikidata. Four weeks ago, the first 11 Wikipedias started testing this second phase of the project. This means that 12 Wikipedias can now use the shared data, for example in their infoboxes. The available data includes things like the conservation status of a species, the ISBN of a book or the top-level domain of a country.

A Request for Comments about how to use data from Wikidata is currently ongoing. Until the Request for Comments is closed you can continue to try it out on test2.wikipedia.org.

There are two ways to access the data:

  • Use a parser function like {{#property:p159}} in the wiki text of the article on the Wikimedia Foundation. This will return “San Francisco” as that is the current headquarters location of the non-profit.
  • For more complicated things you can use Lua. The documentation for this is here.
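For tool authors working outside a wiki, the following Python sketch extracts what a call like `{{#property:p159}}` conceptually looks up: the IDs of the items stored as values of one property. It is not how the parser function is implemented, and the parser function additionally renders the value’s label (such as “San Francisco”) rather than an ID. The sketch assumes the JSON layout that `wbgetentities` returns today (statements under `claims`, each with a `mainsnak`), which may differ from the format in use when this post was written; the sample entity is a trimmed, hypothetical example.

```python
def property_item_ids(entity, prop):
    """Return the IDs of the entities used as values of `prop` in `entity`.

    `entity` is one entry of the "entities" object in a wbgetentities response.
    Statements whose main snak is "somevalue" or "novalue", or whose value is
    not an entity (e.g. a string), are skipped.
    """
    ids = []
    for statement in entity.get("claims", {}).get(prop.upper(), []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue
        datavalue = snak.get("datavalue", {})
        if datavalue.get("type") != "wikibase-entityid":
            continue
        value = datavalue["value"]
        if "id" in value:
            ids.append(value["id"])
        else:
            # Responses without an "id" key only give the numeric part of the ID.
            prefix = "P" if value.get("entity-type") == "property" else "Q"
            ids.append(f"{prefix}{value['numeric-id']}")
    return ids


sample_entity = {
    "claims": {
        "P159": [
            {"mainsnak": {"snaktype": "value", "property": "P159",
                          "datavalue": {"type": "wikibase-entityid",
                                        "value": {"entity-type": "item", "numeric-id": 62}}}},
            {"mainsnak": {"snaktype": "somevalue", "property": "P159"}},
        ]
    }
}
print(property_item_ids(sample_entity, "p159"))
```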

We are working on expanding the parser function so you can for example use {{#property:headquarters location}} instead of {{#property:p159}}. The complete plan for this is here.

The next step is the deployment on all remaining 274 Wikipedias. If there are no issues they will follow on Wednesday.

There is an FAQ for this deployment. Please help us with testing and feedback. The best place to leave feedback is this discussion page.


You can have all the data!

(The German version of this post is here.)

Today the first 11 Wikipedias gained the ability to include data from Wikidata in their articles. These are the Italian, Hebrew, Hungarian, Russian, Turkish, Ukrainian, Uzbek, Croatian, Bosnian, Serbian and Serbo-Croatian Wikipedias. If you are curious, you can also try it out on test2.wikipedia.org. This means that the editors of these Wikipedias can now make use of the growing amount of structured data available in Wikidata as a common dataset. It includes things like the conservation status of a species, the ISBN of a book or the top-level domain of a country.

There are two ways to access the data:

  • Use a parser function like {{#property:p169}} in the wiki text of the article on Yahoo!. This will return “Marissa Mayer” as she is the chief executive officer of the company.
  • For more complicated things you can use Lua. The documentation for this is here.

We are working on expanding the parser function so you can for example use {{#property:chief executive officer}} instead of {{#property:p169}}. The complete plan for this is here.

The next step is the deployment on the other Wikipedias. We will carefully monitor performance and if there are no issues they will follow within a week or two.

We have prepared an FAQ for this deployment and are looking forward to your testing and feedback. The best place to leave feedback is this discussion page.


Some data on Wikidata

(The German version of this article is here.)

This weekend saw the creation of the seven millionth item on Wikidata. We had already collected some data based on the Wikidata database dumps, and have now extended the scripts so that they provide daily updates. We want to use this opportunity to publish a few statistics.

Within the last month (since statements were enabled), more than 660,000 items have received statements, resulting in more than 1.4 million statements. The item with the most statements is the United Nations (Q1065), which lists all member states. The growth in the number of statements is amazing and well beyond what we expected.

So far we have more than 22 million links to Wikipedia articles. There are about 24 to 25 million Wikipedia articles, which means that roughly 90% of all links are already in Wikidata. Assuming the bots continue working as efficiently as they have so far, all links could be transferred in a month or two, after which the rapid growth in the number of items is expected to slow down considerably.

At the same time, the number of edits on Wikidata is increasing rapidly. With more than 12.5 million edits so far, Wikidata is one of the most dynamic Wikimedia projects. One might say that this is all due to bot activity, but that would be very wrong. About 2 million edits have been made by human editors, and the percentage of edits made by humans is actually increasing. More than 4,500 human editors have been active on Wikidata in the last thirty days.

Regarding labels and descriptions, Wikidata has so far collected more than 23 million labels and more than 5 million descriptions in 333 languages. We see a great opportunity for external tools and websites to help us collect labels and descriptions, as these effectively translate the content of Wikidata and make it available in many languages simultaneously.

It is still too early to really understand what these numbers mean, but we can clearly state that the activity of the community exceeds the hopes of the development team. Although many features are still missing, the warm embrace of the Wikidata project in its current state by the Wikimedia communities is simply amazing, and I can only say “thank you” to those thousands and thousands of editors for their contribution to free knowledge.


Wikidata now live on all Wikipedias

(The German version of this entry is here.)

Today we enabled Wikidata’s language link support on all remaining Wikipedias: all 282 Wikipedias other than the Hungarian, Hebrew, Italian and English ones, which already had it. This means that the links in the sidebar pointing to articles on the same topic in other languages now come from Wikidata.

What does this mean exactly?

  • Language links in the sidebar are automatically coming from Wikidata, once the article is linked on Wikidata. No special syntax is needed for that.
  • Existing language links in the wikitext will continue to work and overwrite links from Wikidata.
  • For individual articles, language links from Wikidata can be suppressed completely with the noexternallanglinks magic word.
  • Changes on Wikidata that relate to articles on this Wikipedia show up in Recent Changes and Watchlist, if the option is enabled by the user. (There are still some issues with this when you have enhanced recent changes enabled.)
  • At the bottom of the language links list you will see a link to edit the language links that leads you to the linked page on Wikidata.
  • You can see an example of how it looks in the article about the long-eared hedgehog.
  • The second phase (which is about statements/infoboxes) is in use on Wikidata, but can’t yet be used on any Wikipedia. It is scheduled to be enabled on the first few Wikipedias at the end of the month. The rest will follow soon after that.
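The precedence rules in the list above can be modeled as a small merge of two dictionaries keyed by language code. This Python sketch is a simplified model, not the Wikibase client code: it assumes that a local link overrides the Wikidata link for the same language only, and it models the noexternallanglinks magic word as a single flag that drops all Wikidata links. The titles are placeholders.

```python
def sidebar_links(wikidata_links, local_links, own_language, suppress_wikidata=False):
    """Combine the language links shown in the sidebar of one article.

    wikidata_links: {language code: title} from the item's links on Wikidata
    local_links:    {language code: title} from [[xx:Title]] links in the wikitext
    """
    links = {} if suppress_wikidata else dict(wikidata_links)
    links.pop(own_language, None)  # the article does not link to itself
    links.update(local_links)      # local links win over Wikidata for the same language
    return links


from_wikidata = {"en": "Title (en)", "de": "Title (de)", "fr": "Title (fr)"}
in_wikitext = {"de": "Local title (de)"}
print(sidebar_links(from_wikidata, in_wikitext, own_language="en"))
```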

An FAQ for editors is here and documentation exists here.

Staying up-to-date and contributing

There are several ways to stay up-to-date on everything happening around Wikidata. The weekly status updates are the most important one. You can add yourself here to have them delivered to your talk page on-wiki. Updates are also posted on Twitter, identi.ca, Facebook and Google+.
If you’d like to contribute to Wikidata, this page is a good start.


Restricting the World

(The German version of this article is here.)

This is the first in a short series of blog entries in which I explain some of the design decisions behind Wikidata. The opinions are my own, but the decisions have a strong impact on some features, or non-features, of Wikidata, and these posts aim to explain them.

Image by Tomascastelazo (own work), CC-BY-SA-3.0, via Wikimedia Commons

One of the features (others call it a bug) of Wikidata is that you can choose any item as the value of a property. Many of these choices make no sense: on the item for Paris, for example, saying that its country is goat cheese is absurd. Wouldn’t it be great if Wikidata knew which values make sense for a country, and only allowed you to choose those instead of any possible value? Wouldn’t it be great if the community decided that a property like the widely used P107 could actually be restricted to the six possible values they agreed on?

I strongly disagree.

Another feature (others call it a bug) of Wikidata is that you can use any property on any item. If you want to add the capital city of Julius Caesar, you’re welcome to do so. Wouldn’t it be great if Wikidata knew which properties make sense for a given item, and would not only restrict you to those but even list the ones that still have missing values? Wouldn’t it be great if the community could create templates of properties that should all be filled out for a person, a city, or a country, and not allow anything else?

I strongly disagree.

I completely agree that smarter suggestions would be great. Some of these could be pretty trivial to implement: count the frequency of the values of a property and make a suggestion based on that. What about suggesting properties? There is a lot of research going on in that area, basically along the lines of “items with these properties also have these properties”, as you might have seen on certain shopping sites.
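Both kinds of suggestion are simple to sketch. The Python functions below rank the most frequent values of a property, and rank the properties that co-occur with the ones an item already has. Neither rejects anything, which is the point of suggesting instead of restricting. The dictionary layout, property names and values are hypothetical, and plain co-occurrence counting is only the simplest form of the “items with these properties also have these properties” idea.

```python
from collections import Counter


def suggest_values(items, prop, k=3):
    """Most frequent values of `prop` across all items, best first."""
    counts = Counter(value for claims in items.values() for value in claims.get(prop, []))
    return [value for value, _ in counts.most_common(k)]


def suggest_properties(items, present, k=3):
    """Properties that most often co-occur with all of `present`, best first."""
    present = set(present)
    counts = Counter()
    for claims in items.values():
        props = {p for p, values in claims.items() if values}
        if present <= props:
            counts.update(props - present)
    return [prop for prop, _ in counts.most_common(k)]


cities = {
    "Q1": {"country": ["France"], "population": [100], "mayor": ["A"]},
    "Q2": {"country": ["France"], "population": [200]},
    "Q3": {"country": ["Germany"], "population": [300], "mayor": ["B"]},
}
print(suggest_values(cities, "country", k=1))
print(suggest_properties(cities, ["country"]))
```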

I am all for better suggestions. What I strongly disagree with are strong restrictions. They provide far too much room for drama and edit wars. Does every country have a capital? What is a country anyway? What should the possible values for the property “gender” be? What are the right properties for presidents?

Anything that the system uses for building its user interface and core functionality (labels and descriptions, for example, or the links to Wikipedia pages) cannot have references. This is something the system simply “believes”. On the other hand, if you add a statement saying that Kosovo is a country, you can add a reference to it. Others might say that Kosovo is part of Serbia. You can add a reference for that too. But if you want the user interface to use this kind of information, for example when a property is restricted to countries, the system needs to decide whether Kosovo is an independent country or not. There is no room for the kind of knowledge diversity that Wikidata is built for.

I perceive a danger that some parts of Wikidata might get stuck in an ontology engineering exercise. I think these exercises can be fundamentally unresolvable, and that it should therefore not be Wikidata’s mandate to solve them. Wikidata should, in my opinion, work on a less abstract level: let us enter the authors of Aerosmith’s “I Don’t Want to Miss a Thing”, and not discuss whether authorship can apply to a song or not. Let us trace the genealogy of the British monarch, and not debate whether officials can only be persons. Are you sure that no donkey has ever become a Roman senator? Can you tell whether drinks should have inventors?

Wikidata offers a unique collaborative space for humans and bots, going much further than Wikipedia, which already sports a pretty amazing example of such an environment. In Wikipedia, we have bots checking for outdated references to websites, for correct usage of punctuation, etc. In Wikidata we can create bots that check whether a teacher has indeed lived before the death of their student, whether all Roman senators lived before the 6th century, or whether the populations of a country’s cities add up to less than the population of the country as a whole. The bots doing these checks will need to find a way to report their results to humans, who can then check whether the bots have discovered genuine inconsistencies (either in the real world or in Wikidata) or not.
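As a minimal sketch of two of these checks, the Python code below flags teacher and student pairs in which the teacher was born after the student died, and countries whose cities together have more inhabitants than the country itself. It assumes that dates have already been reduced to plain years and the data loaded into local structures. Reporting to humans is represented by returning readable messages rather than editing anything, and all names are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Person:
    name: str
    born: Optional[int] = None  # year of birth, if known
    died: Optional[int] = None  # year of death, if known


def teacher_student_issues(pairs: Iterable[Tuple[str, str]],
                           people: Dict[str, Person]) -> List[str]:
    """Report (teacher, student) pairs where the teacher was born after the student died."""
    issues = []
    for teacher_id, student_id in pairs:
        teacher, student = people[teacher_id], people[student_id]
        if teacher.born is not None and student.died is not None and teacher.born > student.died:
            issues.append(
                f"{teacher.name} (born {teacher.born}) is listed as teacher of "
                f"{student.name} (died {student.died})"
            )
    return issues


def population_issue(country: str, country_population: int,
                     city_populations: List[int]) -> Optional[str]:
    """Report a country whose cities together have more inhabitants than the country."""
    total = sum(city_populations)
    if total > country_population:
        return f"cities of {country} add up to {total}, more than {country_population}"
    return None


people = {
    "T": Person("Teacher", born=1900),
    "S": Person("Student", born=1850, died=1890),
    "U": Person("Other", born=1880, died=1950),
}
print(teacher_student_issues([("T", "S"), ("T", "U")], people))
```

Missing dates are treated as “nothing to report”, so the check stays quiet instead of guessing.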

The world is complex. Wikidata aims to collect structured knowledge about this complex world. The roots of Wikidata, as the name hints, are wikis, and wikis mean freedom. Based on this legacy, Wikidata as software does not aim to implement restricted value types for properties, or restricted sets of properties for types of items, anytime soon.

(I skipped the boring technical details about why it would be hard to implement and what kind of problems could arise from implementations of the suggested features. There are some serious problems with that, but I wanted to stick with the conceptual reasons.)


The Future of Wikidata

(The German version of this article is here.)

Almost precisely one year ago, in March 2012, Wikimedia Deutschland started a completely new Wikimedia project – Wikidata. The goal of Wikidata is to create an open knowledge base about the world that can be read and edited by everyone. Wikidata is the biggest technical project that a chapter of the Wikimedia movement has ever undertaken.

The initial development of the Wikidata project is almost completed. Much has been achieved: the language links from Wikidata are already in use in four Wikipedia language versions (Hungarian, Hebrew, Italian, and English) and the other language versions will follow in the next days. The current state of Wikidata is nicely illustrated by the example of the page about Russia. With the help of staff, volunteers, and generous donations by [ai]², the Gordon and Betty Moore Foundation and Google, the foundation for the first new Wikimedia project since 2006 has been laid: a scalable infrastructure that allows for the central management of data in a wiki in order to make them available on Wikipedia and beyond, for example on blogs or websites.

The board of Wikimedia Deutschland has decided to continue the development of Wikidata with a team of eight in 2013. Wikimedia Deutschland will fund this development by means of donations.

In the coming year, the team will be working on the further development and maintenance of Wikidata. This includes, among other things:

  • the implementation of the third phase of Wikidata: the automatic creation and updating of lists and visualizations of the data in Wikidata
  • extending Wikidata with other data types, e.g. geodata
  • supporting the community in the growth and expansion of Wikidata, also when it is used outside of the different Wikipedia language versions
  • the possibility of deploying Wikidata in further Wikimedia projects, e.g. Wikimedia Commons or Wikivoyage

We expect that Wikidata will become an integral part of the Wikimedia movement. The excellent cooperation with the Wikimedia Foundation was an essential factor in this development: the Wikimedia Foundation not only operates Wikidata but also the many tools that supported us during its development. By continuing to support Wikidata, we show our trust in the project and its goals, and we ensure its further development and maintenance.

Wikidata has the potential for more “great leaps”. Without the generous donations that funded the first year, Wikidata would not have been possible. For its further expansion, we hope to find additional partners who support us in reaching our goal of making the sum of all human knowledge accessible to every single person.


Wikidata live on the English Wikipedia

(The German version of this article is here.)

After the deployment of the first phase of Wikidata on the Hungarian, Hebrew and Italian Wikipedias, we added English to the list today.
This means that language links on the English Wikipedia now also come from Wikidata. This is yet another step toward a system in which language links are stored in one central place (as opposed to the wikitext of each article on each Wikipedia).

What is going to happen exactly?

  • Language links in the sidebar will automatically come from Wikidata, once the article is linked on Wikidata. No special syntax is needed for that.
  • Existing language links in the wikitext will continue to work and overwrite links from Wikidata.
  • For individual articles, language links from Wikidata can be suppressed completely with the noexternallanglinks magic word.
  • Changes on Wikidata that relate to articles on this Wikipedia show up in Recent Changes and Watchlist, if the option is enabled by the user. (There are still some issues with this when you have enhanced recent changes enabled.)
  • At the bottom of the language links list you will see a link to edit the language links that leads you to the linked page on Wikidata.
  • You can see an example of how it looks in the article about Maria Goeppert-Mayer.
  • The second phase of Wikidata (which is about claims/infoboxes) was started on Wikidata, but can’t yet be used on any Wikipedia. This will follow later.

An FAQ is here.

What’s next?

The first parts of phase 2 have been deployed on wikidata.org. We are working on the missing parts of phase 2 now. This includes for example the ability to enter dates and geocoordinates.

At the same time, we are preparing the deployment on all remaining Wikipedias. This was planned for February 27, but due to a number of meetings at the Wikimedia Foundation offices, the date might shift by a few days. We will keep you updated.

UPDATE: The new date is March 6.

Office hour

If you have questions, you are welcome to join one of the upcoming office hours on IRC.


First parts of phase 2 of Wikidata going live

(The German version of this article is here.)

A few days after the roll-out of the first phase of Wikidata on the first three Wikipedias (Hungarian, Hebrew and Italian) the next step for Wikidata happened today. The first parts of phase 2 (infoboxes) are now in use on wikidata.org.

What does this mean?

An example of phase 2 of Wikidata

With this deployment you will be able to create statements. To do this you connect items with properties to other items or to content on Wikimedia Commons. This sounds more complicated than it really is. Here is an example: You will be able to create a property “child”. Then you can add a statement to the item for Marie Curie using this property to say that she is the mother of Irène Joliot-Curie and Ève Curie. You can then create another property “portrait”. Using this you can add another statement to the item for Marie Curie linking to a portrait of her on Wikimedia Commons. You can support all of these statements by adding references to them. You can see the result of this in the screenshot.

Just as with the language links, this is currently only on wikidata.org and not yet used on any Wikipedia. Additionally, it is limited to two data types (items and images on Wikimedia Commons). More data types, for example coordinates and dates, will follow later. Also, references are currently overly simple and allow only a single property and value. In the future, a reference will be a list of such properties and values, so that you can make more structured references. Bear with us as we roll out iteratively.
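The statement model described in this post can be sketched with a few Python dataclasses: an item holds statements, each statement links a property to a value (another item or a Commons file, the two data types available so far), and each reference is already a list of (property, value) pairs, as planned for the future. This illustrates the concepts only and is not Wikidata’s internal data model. Items are identified by label instead of by ID, and the file name and the reference pair are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass
class ItemValue:
    item: str  # the target item (a label here; Wikidata itself uses IDs)


@dataclass
class CommonsMedia:
    file_name: str  # a file on Wikimedia Commons


Value = Union[ItemValue, CommonsMedia]
Reference = List[Tuple[str, str]]  # (property, value) pairs


@dataclass
class Statement:
    prop: str
    value: Value
    references: List[Reference] = field(default_factory=list)


@dataclass
class Item:
    label: str
    statements: List[Statement] = field(default_factory=list)

    def add(self, prop: str, value: Value, reference: Optional[Reference] = None) -> Statement:
        """Add a statement, optionally supported by one reference."""
        statement = Statement(prop, value)
        if reference:
            statement.references.append(list(reference))
        self.statements.append(statement)
        return statement


marie_curie = Item("Marie Curie")
marie_curie.add("child", ItemValue("Irène Joliot-Curie"))
marie_curie.add("child", ItemValue("Ève Curie"))
marie_curie.add("portrait", CommonsMedia("Example portrait.jpg"),
                reference=[("imported from", "Wikimedia Commons")])
```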

What’s next?

The next step is deployment of the language link part to the English language Wikipedia. This is currently planned for February 11. After that all other Wikipedias will follow. This is currently planned for February 27.

If you want to stay up-to-date on everything related to Wikidata subscribe to the weekly newsletter.
