Archive for the ‘Technical’ category



Scaling Wikidata: success means making the pie bigger

Summary: Wikidata is growing and becoming more successful. Over the next year we need to develop strategies and tools to scale Wikidata. In this post I lay out my thoughts on how to do that.


 

Wikidata is becoming more successful every single day. Every single day we cover more topics and have more data about them. Every single day new people join our community. Every single day we provide more people with more access to more knowledge. This is amazing. But with any growth comes growing pains. We need to start thinking about them and build strategies for dealing with them.

Wikidata needs to scale in two ways: socially and technically. I will not go into the details of technical scaling here but will instead focus on social scaling. By social scaling I mean enabling all of us to deal with more attention, data and people around Wikidata. There are several key things that need to be in place to make this happen:

  • A welcome wagon and good documentation for newcomers to help them become part of the community and understand our shared norms, values, policies and traditions.
  • Good tools to help us maintain our data and find issues quickly and deal with them swiftly.
  • A shared understanding that providing high-quality data and knowledge is important.
  • Communication tools like the weekly summary and Project chat that help us keep everyone on the same page.
  • Structures that scale, with enough people holding advanced rights that none of them is overwhelmed or burns out.

We have all of these in place but all of them need more work from all of us to really prepare us for what is ahead over the next months and years.

One of the biggest pressures Wikidata is facing now is organisations wanting to push large amounts of data into Wikidata. This is great if it is done correctly and if it is data we truly care about. There are key criteria I think we should consider when accepting large data donations:

  • Is the data reliable, trustworthy, current and published somewhere referenceable? We are a secondary database, meaning we state what other sources say.
  • Is the data going to be used? Data that is not used is much harder to maintain because fewer people see it.
  • Is the organization providing the data going to help keep it in good shape? Or are other people willing to do it? Data donations need champions feeling responsible for making them a success in the long run.
  • Is it helping us fix an important gap or counter a bias we have in our knowledge base?
  • Is it improving existing topics more than adding new ones? We need to improve the depth of our data before we continue to expand its breadth.

Once we have this data, how can we make sure it stays in good shape? This matters because quality of and trust in the data are crucial for scaling Wikidata. How can we ensure high data quality even at a large scale? These are the key pieces necessary to achieve this:

  • A community that cares about making sure the data we provide is correct, complete and up-to-date
  • Many eyes on the data
  • Tools that help maintenance
  • An understanding that we don’t have to have it all

Many eyes on the data. What does it mean? The idea is simple. The more people see and use the data, the more people will be able to find mistakes and correct them. The more data from Wikidata is used, the more people will come into contact with it and help keep it in good shape. More usage of Wikidata data in large Wikipedias is an obvious goal there. More and more infoboxes need to be migrated over the next year to make use of Wikidata. The development team will concentrate on making sure this is possible by removing the big remaining blockers, such as missing support for quantities with units and access to data from arbitrary items, and by providing good examples and documentation. At the same time we need to work on improving the visibility of changes on Wikidata in the Wikipedias’ watchlists and recent changes. Just as important for getting more eyes on our data are third-party users outside Wikimedia. Wikidata data is starting to be used all over the internet. It is being exposed to people even in unexpected places. What is of utmost importance in both cases is that it is easy for people to make changes and feed them back to Wikidata. This will only work with well-functioning feedback loops. We need to encourage third-party users to be good players in our ecosystem and make this happen, also for their own benefit.

Tools that help maintenance. As we scale Wikidata we also need to provide more and better tools to find issues in the data and fix them. Making sure that the data is consistent with itself is the first step. A team of students is now working with the development team on improving the system for that. This will make it easy to spot people whose date of birth is after their date of death, and so on. The next step is checking against other databases and reporting mismatches. That is the other part of the student project. When you look at an item you should immediately see statements that are flagged as potentially problematic so that you can review them. In addition, more and more visualizations are being built that make it easy to spot outliers. One recent example is the Tree of Life.
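To make the two kinds of checks described here a little more concrete, here is a minimal sketch in Python. It is not the students' actual system: the `PersonRecord` class, the function names and the item ID are all hypothetical, and a real item carries far more structure than two dates. The first function checks that a record is consistent with itself, the second compares it against values taken from some other database.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class PersonRecord:
    """Hypothetical, heavily simplified stand-in for an item about a person."""
    item_id: str
    date_of_birth: Optional[date] = None
    date_of_death: Optional[date] = None


def consistency_issues(record: PersonRecord) -> List[str]:
    """Internal check: is the record consistent with itself?"""
    issues = []
    born, died = record.date_of_birth, record.date_of_death
    if born is not None and died is not None and born > died:
        issues.append(
            f"{record.item_id}: date of birth {born} is after date of death {died}"
        )
    return issues


def external_mismatches(record: PersonRecord, external: Dict[str, date]) -> List[str]:
    """External check: report fields that differ from another database.

    `external` maps field names to the values found in that other source.
    Fields missing locally are skipped rather than reported.
    """
    mismatches = []
    for field_name, external_value in external.items():
        local_value = getattr(record, field_name, None)
        if local_value is not None and local_value != external_value:
            mismatches.append(
                f"{record.item_id}: {field_name} is {local_value} here "
                f"but {external_value} in the external source"
            )
    return mismatches


# Illustrative data only; "Q-example" is not a real item ID.
record = PersonRecord("Q-example", date(1950, 5, 1), date(1949, 1, 1))
problems = consistency_issues(record) + external_mismatches(
    record, {"date_of_birth": date(1950, 5, 2)}
)
for problem in problems:
    print(problem)
```

Both functions only report. Deciding which side is right is left to a human reviewer, which fits the idea of flagging statements as potentially problematic instead of changing them automatically.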

An understanding that we don’t have to have it all. We should not aim to be the one and only place for structured open data on the web. We should strive to be a hub that covers important ground but also gives users the ability to find other, more specialized sources. Our mission is to provide free access to knowledge for everyone. But we can do this just as well when we have pointers to other places where people can get this information. This is especially the case for niche topics and highly detailed data. We are part of a larger ecosystem, and we should help expand the pie for everyone by being a hub that points to all kinds of specialized databases. Success means making the pie bigger, not getting the whole pie for ourselves. We can’t do it all on our own.

If we keep all this in mind and preserve our welcoming culture we can continue to build something truly amazing and provide more people with more access to more knowledge every single day.

Improving the data quality and trust in the data we have will be a major development focus of the first months of 2015.


Two years of Wikidata: a celebration with presents and a prize

“Wikidata team and painting” – work of a member of the Wikidata team as part of his employment. Licensed under CC BY-SA 4.0 via Wikimedia Commons

Last week Wikidata celebrated its second birthday. With Wikidata, people collect data about the world (e.g. population figures or dates of birth) in structured form and in several hundred languages. This data is used to improve Wikipedia and its sister projects. Beyond that, it is freely available for anyone to reuse. Since the launch, more than 16,000 users in the Wikidata community have created over 12.8 million entries and filled them with data, as volunteers and collaboratively, just as in the sister project Wikipedia. Work on the software behind Wikidata was started by Wikimedia Deutschland, and it has been developed continuously as open software ever since. Over the past two years Wikidata has become one of the most successful Wikimedia projects, and it ranks ahead of many language versions of Wikipedia in the number of active users.


Establishing Wikidata as the central hub for linked open life science data

Summary: Thanks to the wonderful Wikidata community, every human gene (according to the United States National Center for Biotechnology Information) is now represented by an entry on Wikidata. Benjamin Good, Andrew Su and Andra Waagmeester have kindly provided us with a short report on their work with Wikidata.


Thanks to the amazing work of the Wikidata community, every human gene (according to the United States National Center for Biotechnology Information) now has a representative entity on Wikidata. We hope that these are the seeds for some amazing applications in biology and medicine. Here is a report from Benjamin Good, Andrew Su, and Andra Waagmeester on their work with Wikidata. Their work was supported by the National Institutes of Health under grant GM089820.

Graphical representation of the idealized human diploid karyotype, showing the organization of the genome into chromosomes. This drawing shows both the female (XX) and male (XY) versions of the 23rd chromosome pair. By Courtesy: National Human Genome Research Institute [Public domain], via Wikimedia Commons

The life sciences are awash in data.  There are countless databases that track information about human genes, mutations, drugs, diseases, etc.  This data needs to be integrated if it is to be used to produce new knowledge and thereby improve the human condition.  For more than a decade many different groups have proposed and many have implemented solutions to this challenge using standards and techniques from the Semantic Web.  Yet, today, the vast majority of biological data is still accessed from individual databases such as Entrez Gene that make no attempt to use any component of the Semantic Web or to otherwise participate in the Linked Open Data movement.  With a few notable exceptions, the data silos have only gotten larger and problems of fragmentation worse.

In parallel to the appearance of Big Data in biology (and elsewhere), Wikipedia has arisen as one of the most important sources of all information on the Web.  Within the context of Wikipedia, members of our research team have helped to foster the growth of a large collection of articles that describe the function and importance of human genes. Wikipedia and the subset of it that focuses on human genes (which we call the Gene Wiki), have flourished due to their centrality, the presence of the edit button, and the desire of the larger community to share knowledge openly.

Now, we are working to see if Wikidata can be the bridge between the open community-driven power of Wikipedia and the structured world of semantic data integration.  Can the presence of that edit button on a centralized knowledge base associated with Wikipedia help the semantic web break through into everyday use within our community?  The steps we are planning to take to test this idea within the context of the life sciences are:

  1. Establishing bots that populate Wikidata with entities representative of three key classes: genes, diseases, and drugs.
  2. Expanding the scope of these bots to include the addition of statements that link these entities together into a valuable network of knowledge.
  3. Developing applications that display this information to the public and that both encourage and enable people to contribute their knowledge back to Wikidata.  The first implementation will be to use the Wikidata information to enhance the articles in Wikipedia.
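The first two steps can be pictured with a small in-memory sketch in Python. This is an illustration only and not the bots' actual code: the `LifeScienceGraph` class, the entity IDs and the property names are hypothetical, and a real bot would write to Wikidata through its API instead of to a local list.

```python
from typing import Dict, List, Set, Tuple

# The three key classes named in step 1.
ALLOWED_CLASSES = {"gene", "disease", "drug"}


class LifeScienceGraph:
    """Hypothetical in-memory model of entities linked by statements."""

    def __init__(self) -> None:
        self.entity_class: Dict[str, str] = {}
        self.statements: List[Tuple[str, str, str]] = []

    def add_entity(self, entity_id: str, entity_class: str) -> None:
        # Step 1: create an entity for a gene, disease or drug.
        if entity_class not in ALLOWED_CLASSES:
            raise ValueError(f"unknown class: {entity_class}")
        self.entity_class[entity_id] = entity_class

    def add_statement(self, subject: str, prop: str, obj: str) -> None:
        # Step 2: link two existing entities with a statement.
        for entity_id in (subject, obj):
            if entity_id not in self.entity_class:
                raise KeyError(f"entity does not exist yet: {entity_id}")
        self.statements.append((subject, prop, obj))

    def linked_to(self, entity_id: str) -> Set[str]:
        # Everything one statement away, in either direction.
        linked = set()
        for subject, _prop, obj in self.statements:
            if subject == entity_id:
                linked.add(obj)
            elif obj == entity_id:
                linked.add(subject)
        return linked


# Illustrative IDs and property names only.
graph = LifeScienceGraph()
graph.add_entity("gene-1", "gene")
graph.add_entity("disease-1", "disease")
graph.add_entity("drug-1", "drug")
graph.add_statement("gene-1", "associated with", "disease-1")
graph.add_statement("drug-1", "treats", "disease-1")
print(sorted(graph.linked_to("disease-1")))
```

Requiring both entities to exist before a statement can link them mirrors the order of the plan: entities first, links second.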

We are excited to announce that the first step on this path has been completed!


Transatlantic work on structured data in Berlin

The English version of this post can be found here.

Last week Wikimedia Deutschland hosted visitors for a very special round of technical discussions at its Berlin office. Members of the Wikimedia Foundation’s multimedia team from San Francisco, developers working on Wikidata at Wikimedia Deutschland, and members of the volunteer community came together there to discuss Wikimedia Commons and structured data.

Structured data was an important topic in many technical talks at this year’s Wikimania in London. It is the principle behind Wikidata, a free knowledge base in which data can be filtered, sorted and queried. Because it can be edited by both humans and machines, it also goes beyond storing wikitext in one specific human language. The technology in Wikidata’s engine room is a project called Wikibase, which allows data to be stored in structured form. Ideas that Wikimedia Commons, the free repository of media files, could benefit from structured data and from Wikibase have been around for quite some time, as have thoughts on making Commons easier to use and on simplifying licence-compliant reuse of images. The week-long meeting in Berlin brought together Wikimedians from both sides of the pond and marked a starting point for the planning and discussion process.


Transatlantic work on structured data in Berlin

The German version of this post can be found here.

Last week Wikimedia Deutschland was happy to welcome guests for a special technical discussion that spanned an entire week at the headquarters in Berlin. Members of the multimedia team of the Wikimedia Foundation in San Francisco, members of the team developing software for Wikidata at Wikimedia Deutschland, and technical experts and developers from the volunteer community came together to discuss Wikimedia Commons and structured data.

Structured data was an important topic in many talks on technology at this year’s Wikimania in London. It is the principle behind Wikidata, a free knowledge base with data that can be filtered, sorted, queried, and of course edited by machines and human beings alike, all in a way that goes beyond storing wikitext in a specific human language. The technology in the engine room of Wikidata is a software project called Wikibase, which stores data in a structured way. Ideas that Wikimedia Commons, the free repository of media files, could benefit from structured data and Wikibase have been floating around for a long time, as have thoughts about making Commons more user-friendly and making license-conforming re-use of pictures easier. The week-long meeting in Berlin marked the starting point of a planning and discussion process that brought together Wikimedians from both sides of the pond.


Podcast “Source Code Berlin”: Hacks and the City to go

 

The podcast for coders interested in what is happening with open source code in Berlin. Graphic by Sven Sedivy (CC-BY-SA 4.0).

The scene around open source and the people who write code is innovative and mobile. Berlin in particular has a great many projects and opportunities to work together. So many, in fact, that it is hard to get an overview. Podcaster Mark Fonseca Rendeiro, also known to many as @bicyclemark, conducts interviews on the topic and puts together an audio podcast.

The project’s website, with episodes to download and subscribe to, is at sourcecode.berlin. The audio of the first episode can also be found on Wikimedia Commons.

New episodes are to be published every two weeks, dealing with topics around source code and open source in Berlin. Today’s Berlin has become a magnet for interesting ideas from all over the world. The first episode takes a broader, introductory view and asks whether there are historical precedents that made Berlin a place of openness and collaboration in earlier times too. It also includes an interview with Lydia Pintscher, Wikidata’s project manager, who talks about the enthusiasm for open source and its many facets.

In the coming episodes we will visit hackerspaces and other places where co-working is already part of everyday life. They will also show how JavaScript is shaping the internet, far beyond a few animated snowflakes. Other places, some of them perhaps a little hidden, will be presented as well, giving coders practical tips on what Berlin has to offer, be it a good currywurst or the steps to consider when moving to “Silicon Allee”.

Since English is the common language among programmers and Berlin increasingly lives as an international city, the Source Code Berlin podcast is also published in English. Wikimedia Deutschland produces this podcast to help coders get to know the source code of Berlin better and to develop open content.

 


Why Wikidata is so important to Histropedia

The following is a guest post we received from our friends at the Histropedia project. We met at Wikimania 2014 in London and they told us how Wikidata is useful for them. Here is their write-up.

For those who don’t yet know, Histropedia is a project using Wikipedia and Wikidata to create the world’s first timeline of everything in history.
Earlier this year I wrote on the Histropedia blog about how important Wikidata is for our project. At the time we had just switched from trying to get dates from Wikipedia articles (from the infoboxes) to using Wikidata items. We had a reasonable amount of success with the infoboxes, but encountered some major limitations. Firstly we were only able to get dates precise to a year, and in some cases we were unable to recognise the date format used to even get the year. And of course there were the articles with no infobox.
By switching to Wikidata as the primary source for dates we immediately added over 700,000 date properties to our events, often to a much better precision than just years. This was incredibly important to the project as it not only greatly improved the accuracy of our timelines, but also allowed us to increase the available zoom levels. So now, thanks to Wikidata, we can zoom right in to see a day-by-day view of history.
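To illustrate why date precision matters for zoom levels, here is a minimal sketch in Python. It assumes the JSON shape of a Wikidata time value, a `time` string such as `+1952-03-11T00:00:00Z` plus a numeric `precision` where 9 means year, 10 means month and 11 means day. The function name and the returned tuples are hypothetical, and this is not Histropedia's actual code.

```python
import re
from typing import Tuple

# Precision codes used in Wikidata time values.
PRECISION_YEAR, PRECISION_MONTH, PRECISION_DAY = 9, 10, 11

# Signed year, then two-digit month and day (either may be "00").
_TIME_RE = re.compile(r"^([+-]\d+)-(\d{2})-(\d{2})T")


def finest_unit(time_value: dict) -> Tuple:
    """Return the finest timeline unit a time value supports.

    Only as much of the date as the precision vouches for is returned.
    Precisions coarser than a year (decade, century, ...) are collapsed
    to "year" to keep this sketch short.
    """
    match = _TIME_RE.match(time_value["time"])
    if match is None:
        raise ValueError(f"unrecognised time string: {time_value['time']!r}")
    year, month, day = (int(group) for group in match.groups())
    precision = time_value["precision"]
    if precision >= PRECISION_DAY:
        return ("day", year, month, day)
    if precision == PRECISION_MONTH:
        return ("month", year, month)
    return ("year", year)


# Illustrative values only.
print(finest_unit({"time": "+1952-03-11T00:00:00Z", "precision": 11}))
print(finest_unit({"time": "+1952-03-00T00:00:00Z", "precision": 10}))
print(finest_unit({"time": "+1952-00-00T00:00:00Z", "precision": 9}))
```

A timeline can then place a day-precision event on a day-by-day view, while an event known only to the year stays on the coarser zoom levels.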


Wikidata at Wikimania 2014 in London

The German version of this blog post can be read here.

Wikidata was one of the dominant themes at Wikimania 2014. Many talks mentioned it in passing, even those that didn’t focus on technical topics. Structured data with Wikibase was a topic that was often talked about, be it in discussions on the future of Wikimedia Commons or in projects that do something with GLAM.

When it comes to Wikidata, more and more people are beginning to see the light, so to speak. It was fitting that Lydia Pintscher’s talk on Wikidata used this metaphor for the project: creating more dots of light on the map of free knowledge.

Another excellent talk on Wikidata was dedicated to the research around it. Markus Krötzsch took us on a journey through the data behind the free knowledge base that anyone can edit.

Of course, there were meetups by the Wikidata community, and hacks were developed during the hackathon. One enthusiastically celebrated project came from the Russian Wikipedia. Russian Wikipedia has had infoboxes that draw on Wikidata for quite some time now. What they added at the hackathon was the ability to edit data in the fields of these infoboxes in place and change it on Wikidata at the same time, pretty much like a visual editor for Wikidata. Read about their hack on Wikidata, or have a look at the source code (which is still a long way from being easy to adapt to other Wikipedias, but it’s a start).


Wikidata at Wikimania in London 2014

The English version of the blog post can be found here.

Wikidata was one of the dominant themes at Wikimania 2014. It was mentioned in many talks, even when they were not about technology at all. Structured data based on Wikibase was a topic in debates about the future of Wikimedia Commons, for example, and in activities in the GLAM area.

Light is dawning around Wikidata. Fittingly, Lydia Pintscher chose this metaphor for her keynote on Wikidata, which was about points of light on the world map of free knowledge.

Another excellent contribution came from Markus Krötzsch and dealt with the research on Wikidata. He took us on a journey to the figures and data behind the free knowledge repository.

There were of course meetings of the Wikidata community, and there was hacking at the hackathon. A project from the Russian Wikipedia was celebrated with particular enthusiasm. For some time now the Russian Wikipedia has had infoboxes whose content is filled from Wikidata. At the hackathon a gadget was added that allows the values in the infoboxes to be edited directly in the text and updated on Wikidata at the same time, a visual editor for Wikidata, so to speak. They have written more about their hack on Wikidata, and the source code is of course available too. It is still a long way until the code can easily be used on other Wikipedias, but a start has been made.


Guided tours and Wikidata: How to explain a complex project and encourage new editors

The following is a contribution by Bene*, admin and bureaucrat on Wikidata and author of the guided tours on Wikidata. He explains the motivation behind guided tours and how they can attract new editors to the Wikidata community:

Wikidata is no longer a brand new project but still a lot of people do not really know what it actually does. This makes it hard for new editors to get involved with the project and become active contributors. We realized that something had to change; that we had to make things easier to understand and take our newbies by the hand.

Wikidata guided tour intro

Wikidata guided tour labels

 

When it comes to planning how to help new editors, a first approach is typically to create help pages for individual topics. However, these pages are often very long and do not do a good job of explaining concepts beyond their theoretical context. Another way to explain things is to create illustrative presentations including slideshows. Unfortunately, the users still only get the theory and have to make the leap from reading to actually editing on their own. Keeping all this in mind, we decided that we needed a format that is integrated with the editing interface of Wikidata and gives users the opportunity to edit content through a series of practical exercises.
In fact, this is exactly what the GuidedTour extension does. It provides a way to create presentations, or rather interactive tutorials, in which the user can actually complete a set of actions. One great use case of Guided Tours is the Wikipedia Adventure. However, for Wikidata we needed something different because the item editing interface has very little in common with a standard wiki page. The pages contain more buttons and small text fields because an item does not simply consist of text but stores structured data instead. Therefore, we adjusted the guided tours to our needs and added an overlay feature to highlight single design elements. We also made the tours translatable, as Wikidata is a multilingual project. If you are interested in the result, just try it out for yourself: there are currently two Wikidata tours available, one on items and one on statements.

Wikidata items tour stats

Wikidata statements tour stats

As you can see from the usage statistics, the work was well worth the effort. Since the release on 11th July more than 150 users have taken the first tour and more than 100 went on to complete the second one. This shows the impact our tours have had and the great need for them. It was lots of fun to create and implement the interactive tutorials but there is still a lot of work to do. New tours are being worked on and the existing ones are also in need of translations. If you have any ideas for new tours or improvements to the existing ones, just add your comments to the coordination page. You might also want to help translate the released tours (which is just like translating any wiki page). You can translate the existing tutorials about items and about statements.

A note from Lydia (Wikidata’s product manager): Thank you so much to Bene* (Wikidata community developer) and Helen (Free Software Outreach Program for Women intern with Wikimedia) who have worked together over the past weeks to make these first guided tours a reality. It’s great to see us making progress towards making Wikidata easier to use every single day.

 
