Posts Tagged ‘english’

Teaching machines to make your life easier – quality work on Wikidata

German summary (translated): ORES is an artificial intelligence that can make suggestions for fighting vandalism. After being used successfully on several wikis, it now also helps improve quality on Wikidata. Amir Sarabadani and Aaron Halfaker describe the development and use of ORES in a guest post in English.


A post by Amir Sarabadani and Aaron Halfaker

Today we want to talk about a new web service for supporting quality control work in Wikidata. The Objective Revision Evaluation Service (ORES) is an artificial intelligence web service that will help Wikidata editors perform basic quality control work more efficiently. ORES predicts which edits will need to be reverted. This service is already used on other wikis to support quality control work, and now Wikidata editors will get to reap the benefits as well.
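ORES itself is a trained model served over the web, and the sketch below does not reproduce it. It only illustrates, under stated assumptions, how a per-edit prediction of "will this be reverted?" could be used to decide which edits a patroller looks at first. The `ScoredEdit` type, the `triage` function, the 0.8 threshold and the scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScoredEdit:
    rev_id: int
    p_revert: float  # hypothetical model output: probability the edit will be reverted


def triage(edits: list, threshold: float = 0.8) -> list:
    """Return edits whose predicted revert probability meets the threshold,
    most suspicious first, so reviewers can start with them."""
    flagged = [e for e in edits if e.p_revert >= threshold]
    return sorted(flagged, key=lambda e: e.p_revert, reverse=True)
```

Raising the threshold trades recall for fewer false alarms; where to set it is a community decision, not something the model decides.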


Q167545: Wikidata celebrated its third birthday

Wikidata celebrated its third birthday on October 29th. The project went online in 2012, and a lot has happened since.

As it happened, the birthday coincided with the project receiving an award from Land der Ideen, so a proper party for volunteers and everyone involved with the project was in order.

There was cake and there were silly birthday hats, but above all this was an occasion to look at the past, present, and future.

Denny Vrandečić and Erik Möller used a video message to talk about the genesis and development of Wikidata.

Community members Magnus Manske and Maarten Dammers talked about their work for Wikidata in GLAM and science. And Lydia Pintscher not only looked back on a successful year, but also gave us a peek into the future that lies ahead for the project.

To let guests experience Wikidata first-hand, there was a small exhibition of projects that use it: from Histropedia, which visualizes timelines, to Ask Platypus, a project that answers natural-language questions about the world's knowledge as recorded in Wikidata.

No birthday would be complete without presents. The software developers in particular had worked hard to improve parts of Wikidata for this special date. To give you just two examples:

  • A new feature shows nearby items in Wikidata and invites you to improve structured data about your neighborhood
  • A machine learning model called ORES helps identify vandalism with artificial intelligence and can be used as a tool by administrators

These are only two of the new features released for the birthday party. There is much, much more to come for the Wikidata project next year, and we'll talk about it at length in another post.

Wikidata has data in its name. However, as was more than obvious at the birthday party, it's about more than just cold numbers. As in all collaborative projects, people are at the core of it all. Those behind or around Wikidata have love in their hearts for something that may at first sound as abstract as "structured data for Wikimedia projects and beyond".

Upon leaving the party, guests could add themselves to a board and leave a tiny love letter to Wikidata. "I love Wikidata because... with machine-readable data, machines can do the heavy lifting for me," one guest wrote. The last three years were all about building a foundation for machine-readable data. Let the heavy lifting begin in all the years to come. Q167545!


Visualizing history with automated event maps

German summary (translated): Fred Johansen has created a website that makes it easy to place historical events in time and space, based on data in Wikidata. Here he talks about the site and his work on it.

The following post is a guest blog by Fred Johansen about EventZoom.

Just as today's online maps are continually updated, historical maps can be automatically generated and updated to reflect our ever-evolving knowledge about the past. As an example, allow me to tell you about a project I'm working on. Recently I implemented an event visualization site that accepts geolocation data combined with information about the time spans of events, and renders the input as points on a map that can be zoomed in both time and space. Each point is an object with a title, a description, a latitude/longitude and a time, as well as a reference back to its source. But what source should be used to fill this framework with data? Even though this tool was born outside the Wikimedia world, the best content I've found for it so far is Wikidata, more specifically the Wikidata API. Importing events that are part of larger events defined in Wikidata, restricted to those that have a start or end date as well as a location, provides all the data this kind of dynamic historical map needs.

Extracting data from the Wikidata API works like a charm. Sometimes, of course, data is missing from Wikidata; for example, an event may have an end date but no start date. What's fantastic about Wikidata is that it's easy to fill such a gap by simply adding the missing fact. Besides enriching Wikidata itself, this also improves the overall possibilities for visualization.

This very activity serves as a positive feedback loop: The visualization on a map of, for example, the events of a war makes errors or omissions quite obvious, and serves as an incentive to update Wikidata, and finally to trigger the re-generation of the map.

The site I'm referring to here is EventZoom, currently in beta and so far containing 82 major event maps, with more being added. You can extend it yourself by triggering the visualization of new maps: when you search for an event, for example a war, and the search page reports it as missing, you can add it directly. All you need is its Q-ID from Wikidata. Paste this ID into the input field, and the event will be imported automatically from the Wikidata API and a map generated, with the restriction that there must be some 'smaller' events that have time and location data and are part of (P361) the major event. Those smaller events become the points on the map, with automatic links back to their sources. As for the import itself, for the time being, it also depends on, but I expect that will change in the future.
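To make the import restriction concrete, here is a minimal sketch, not EventZoom's actual code, of how one entity could be turned into a map point. It assumes entities in the JSON shape the Wikidata API returns (`labels`, `descriptions`, `claims` with `mainsnak`/`datavalue`) and uses part of (P361) from the post plus start time (P580), end time (P582) and coordinate location (P625). The function names are hypothetical, and fetching entities over the network is left out.

```python
from typing import Optional

PART_OF = "P361"
START_TIME, END_TIME = "P580", "P582"
COORDINATES = "P625"


def first_value(entity: dict, prop: str) -> Optional[dict]:
    """Return the datavalue of the first statement with a concrete value for prop."""
    for statement in entity.get("claims", {}).get(prop, []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") == "value":
            return snak["datavalue"]["value"]
    return None


def to_map_point(entity: dict, parent_qid: str) -> Optional[dict]:
    """Build a map point if the entity is part of parent_qid and has a place and a time."""
    parents = [
        s["mainsnak"]["datavalue"]["value"]["id"]
        for s in entity.get("claims", {}).get(PART_OF, [])
        if s.get("mainsnak", {}).get("snaktype") == "value"
    ]
    if parent_qid not in parents:
        return None
    coords = first_value(entity, COORDINATES)
    start = first_value(entity, START_TIME)
    end = first_value(entity, END_TIME)
    # The restriction described above: a location plus a start or an end date.
    if coords is None or (start is None and end is None):
        return None
    return {
        "title": entity.get("labels", {}).get("en", {}).get("value", entity["id"]),
        "description": entity.get("descriptions", {}).get("en", {}).get("value", ""),
        "lat": coords["latitude"],
        "lon": coords["longitude"],
        "start": start["time"] if start else None,
        "end": end["time"] if end else None,
        "source": f"https://www.wikidata.org/wiki/{entity['id']}",
    }
```

Returning `None` for incomplete events mirrors the feedback loop described above: an event that silently drops off the map is a hint that a fact is missing in Wikidata.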

Although you can always click Import to get the latest info from Wikidata, an automatic update is also in the pipeline, to trigger a re-import whenever the event or any of its constituent parts has changed in Wikidata. As for other plans, at the very least our scope should encompass all the major events of history. Here, wars represent a practical starting point, insofar as they consist of events that are mostly bounded by very definite time spans and locations, and so can be defined by those characteristics. The next step would be to extend the map visualization to other kinds of events. As for Wikidata, it could be interesting to visualize all kinds of items that can be presented with a combination of geolocations and temporal data, and that can be grouped together in meaningful ways.


Using Wikidata to Improve the Medical Content on Wikipedia

German summary (translated): A few days ago, a scientific paper was published that examines how Wikipedia articles on medical topics can be improved through Wikidata. Here the authors present the paper and its results.


This is a guest post by Alexander Pfundner, Tobias Schönberg, John Horn, Richard D. Boyce and Matthias Samwald. They have published a paper about how medical articles on Wikipedia can be improved using Wikidata.

An example of an infobox that shows drug-drug interactions from Wikidata. Including this information could be of significant benefit to patients around the world.

The week before last, a study was published in the Journal of Medical Internet Research that investigates how Wikidata can help to improve medical information on Wikipedia. The researchers from the Medical University of Vienna, the University of Washington and the University of Pittsburgh who carried out the study are active members of the Wikidata community.

The study focuses on how potential drug-drug interactions are represented in Wikipedia entries for pharmaceutical drugs. Exposure to these potential interactions can severely diminish the safety and effectiveness of therapies. Because many patients and professionals rely on Wikipedia to read up on medical subjects, improving the quality, completeness and relevance of this information could significantly improve the situation of patients around the world.

In the course of the study, a set of high-priority potential drug-drug interactions was added to the Wikidata items of common pharmaceutical drugs (e.g. Ramelteon). The data was then compared to the existing information on the English Wikipedia, revealing that many critical interactions were not explicitly mentioned. The situation is probably worse in many other languages. Wikidata could play a major role in alleviating this: not only does a single edit benefit all 288 languages of Wikipedia, but the tools for adding and checking data are also much easier to handle. In addition, adding qualifiers (property-value pairs that further describe the statement, e.g. the severity of the interaction) and sources to each statement puts the data in context and makes cross-checking easier. The study found Wikidata capable of acting as a repository for this data.

The next part of the study investigated how information on potential drug-drug interactions in Wikipedia could be written and maintained automatically (e.g. in the form of infoboxes or within a paragraph). Working with the current API and modules, the investigators found that the interface between Wikidata and Wikipedia is already quite capable, but that large datasets still require better mechanisms to filter and format the data intelligently. If the data is displayed in an infobox, further constraints come from the differing conventions on how much information an infobox can show, and whether large datasets may be placed in tabs or collapsible cells.
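The filtering-and-formatting problem can be sketched as follows. This is a hypothetical model, not the study's modules: each interaction carries a `severity` qualifier and a list of references, unreferenced statements are dropped, and rows beyond an assumed `max_rows` limit are set aside for a collapsible cell. The severity scale, the `Interaction` type and `infobox_rows` are illustrative assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical qualifier values, ordered from most to least severe.
SEVERITY_RANK = {"high": 0, "moderate": 1, "low": 2}


@dataclass
class Interaction:
    drug: str
    qualifiers: dict = field(default_factory=dict)
    references: list = field(default_factory=list)


def infobox_rows(interactions, min_severity="moderate", max_rows=5):
    """Keep referenced interactions at or above min_severity, most severe first,
    and split them into rows shown directly and rows for a collapsible cell."""
    cutoff = SEVERITY_RANK[min_severity]
    kept = [
        i for i in interactions
        if i.references and SEVERITY_RANK.get(i.qualifiers.get("severity"), 99) <= cutoff
    ]
    kept.sort(key=lambda i: (SEVERITY_RANK[i.qualifiers["severity"]], i.drug))
    rows = [f"{i.drug} ({i.qualifiers['severity']})" for i in kept]
    return rows[:max_rows], rows[max_rows:]
```

Requiring a reference before display follows the point above that sourced statements are easier to cross-check; a wiki could of course choose a different policy.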

Overall, the study concludes that, current technical limitations aside, Wikidata is capable of improving the reliability and quality of medical information in all language editions of Wikipedia.

The authors of the study would like to thank the Wikidata and Wikipedia communities for all their help, as well as the Austrian Science Fund and the United States National Library of Medicine for funding the study.


Improving data quality on Wikidata – checking what we have

German summary (translated): A team of students from the Hasso Plattner Institute in Potsdam is currently working with Wikimedia Deutschland on tools to improve and safeguard data quality on Wikidata. In this post they present their two projects: checking Wikidata's data for internal consistency, and checking Wikidata's data against other databases.


Hello, we are the Wikidata Quality Team. We are a team of students from the Hasso Plattner Institute in Potsdam, Germany. For our bachelor's project we are working together with the Wikidata development team to ensure high quality of the data on Wikidata.

Wikidata provides a lot of structured data open to everyone. Quite a lot, actually: an enormous amount of data approaching the mark of 13.5 million items, each of which has numerous statements. The data got into the system through diligent people and through bots, and neither people nor bots are known for infallibility. Errors are made, and somehow we have to find and correct them. Besides erroneous data, incomplete data is another problem. Imagine you are a resident of Berlin and want to improve the Wikidata item about the city. You go ahead and add its highest point (Müggelberge), its sister cities (Los Angeles, Madrid, Istanbul, Warsaw and 21 others) and its new head of government (Michael Müller). As you do it the correct way, you use qualifiers and references. Good job, but did you think of adding Berlin as a sister city on each of those 25 cities? Although the data you entered is correct, it is incomplete, and you have unwittingly introduced an inconsistency. And that is assuming you used the correct items and properties and did not make a typo while entering a statement. Thirdly, things change. Population numbers vary, organizations are dissolved and artists release new albums. Wikidata has the huge advantage that such a change only has to be made in one place, but still: someone has to do it and, even more importantly, someone has to become aware of it.

Facing the problems mentioned above, two projects have emerged. The first builds on the fact that people using Wikidata are adding identifiers of external databases like GND, MusicBrainz and many more. So why not make use of them? We are developing a tool that scans an item for those identifiers, looks up the linked databases and compares the item's statements against their data. This not only helps us verify Wikidata's content and find mismatches that could indicate errors, but also makes us aware of changes. MusicBrainz specializes in artists and composers, GND in data related to people, and these specialists' data is likely to be up to date. By cross-checking against their databases, we hope to keep the data in all fields represented in Wikidata current.
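The comparison step of such a tool can be sketched in a few lines. This is a minimal illustration under assumptions, not the team's implementation: items and external records are plain dicts, the external database is a stand-in dict instead of a live GND or MusicBrainz query, and `FIELD_MAP` and `cross_check` are hypothetical names.

```python
# Which external field corresponds to which Wikidata statement (hypothetical mapping).
FIELD_MAP = {"date of birth": "birth_date", "date of death": "death_date"}


def cross_check(item: dict, external_db: dict, id_property: str) -> list:
    """Compare an item's statements with the record found via its external identifier.

    Returns (property, wikidata_value, external_value) tuples for every mismatch,
    including values the external source has but the item lacks.
    """
    external_id = item["identifiers"].get(id_property)
    record = external_db.get(external_id) if external_id else None
    if record is None:
        return []  # no identifier on the item, or no record in the external source
    mismatches = []
    for prop, ext_field in FIELD_MAP.items():
        ours, theirs = item["statements"].get(prop), record.get(ext_field)
        if theirs is not None and ours != theirs:
            mismatches.append((prop, ours, theirs))
    return mismatches
```

A mismatch is only a signal for review: either side may be the one that is wrong or outdated.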

The second project focuses on using constraints on properties. Here are some examples to illustrate what this means:

  • Items that have the property "date of death" should also have "date of birth", and their respective values should not be more than 150 years apart
  • Properties like "sister city" are symmetric, so items referenced by this statement should also have a "sister city" statement linking back to the original item
  • Analogously, properties like "has part" and "part of" are inverses of each other and in many cases should be used on both items
  • Identifiers for IMDb, ISBN, GND, MusicBrainz etc. always follow a specific pattern that we can verify
  • And so on…
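Checks like those listed above can be sketched as one function over simplified items. This is an illustration under stated assumptions, not the team's constraint checker: items are dicts keyed by hypothetical IDs, plain years stand in for full dates, and the IMDb pattern is a simplified assumption rather than the official constraint definition.

```python
import re

IMDB_PATTERN = re.compile(r"^(tt|nm)\d{7,}$")  # simplified, assumed identifier format
INVERSE = {"has part": "part of", "part of": "has part"}


def check_item(qid: str, items: dict) -> list:
    """Return human-readable constraint violations for one item."""
    item, issues = items[qid], []
    birth, death = item.get("birth"), item.get("death")
    if death is not None and birth is None:
        issues.append("has date of death but no date of birth")
    elif death is not None and not 0 <= death - birth <= 150:
        issues.append("birth and death are more than 150 years apart or out of order")
    # Symmetric property: every sister city should link back.
    for other in item.get("sister city", ()):
        if qid not in items.get(other, {}).get("sister city", ()):
            issues.append(f"sister city {other} does not link back")
    # Inverse properties: "has part" on one item implies "part of" on the other.
    for prop, inverse in INVERSE.items():
        for other in item.get(prop, ()):
            if qid not in items.get(other, {}).get(inverse, ()):
                issues.append(f"{prop} {other} lacks inverse '{inverse}'")
    imdb = item.get("IMDb ID")
    if imdb is not None and not IMDB_PATTERN.match(imdb):
        issues.append(f"IMDb ID {imdb!r} does not match the expected pattern")
    return issues
```

Each violation names the missing counterpart, which is also what a (semi-)automatic fix would need, for example adding the back-link for a sister city once a human has confirmed it.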

Checking these constraints and flagging issues when someone visits an item's page helps identify which statements should be treated with caution and encourages editors to fix errors. We are also planning to provide ways to fix issues (semi-)automatically (e.g. by adding the missing sister city when it is certain that the city really has this sister city). We also want to check these constraints when someone saves a new entry, which will hopefully prevent errors from getting into the system in the first place.

That’s about it – to keep up with the news visit our project page. We hope you are fond of our project and we appreciate your feedback! Contact information can also be found on the project page.


Platypus, a speaking interface for Wikidata

PPP (Projet Pensées Profondes) is a student project aiming to build an open question answering platform. Its demo, Platypus, is heavily based on Wikidata content.

At the École normale supérieure de Lyon, we have to do a programming project during the first part of our master's degree curriculum. Some of us were very interested in working on natural language processing and others on knowledge bases. So we looked for a project that would let us work on both, and the idea of an open source question answering tool quickly came up.
This tool has to answer many different kinds of questions, so one of the requirements of the project was to use a huge general-purpose knowledge base in order to have a usable tool quickly. As one of us was already a Wikidata contributor, and inspired by Magnus Manske's very nice but ephemeral Wiri tool, we quickly chose Wikidata as our primary data source.


Asking Ever Bigger Questions with Wikidata

German summary (translated): Maximilian Klein uses Wikidata as a pool of data for statistical analyses of the world's knowledge. In his article he describes how he searches Wikidata for answers to the big questions.

Guest post by Maximilian Klein

A New Era

Simultaneous discovery can sometimes be considered an indication of a paradigm shift in knowledge, and last month Magnus Manske and I seem to have had a very similar idea at the same time. Our ideas were to look at gender statistics in Wikidata and to slice them up by date of birth, citizenship, and language (see Magnus' blog post and my own). At first this seems like quite elementary and naïve analysis, especially 14 years into Wikipedia, but only within the last year has this type of research become feasible. Like a baby taking its first steps, Wikidata and its tools ecosystem are maturing. That challenges us to use the data in front of us creatively.

Describing 5 stages of Wikidata, Markus Krötzsch foresaw this analysis in his presentation at Wikimania 2014. The stages, which range from Know to Understand, are: Read, Browse, Query, Display, and Analyse (see image). Most likely you have read Wikidata, and perhaps even browsed it with Reasonator, queried it with autolist, or displayed it with Histropedia. I want to focus on analyse, the most understand-y of the stages. In fact, the example given for analyse was my first exploration of gender and language, in which I analysed the ratio of female biographies by Wikipedia language: English and German are around 15%, while Japanese, Chinese and Korean are each closer to 25%.
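The per-language ratio described above boils down to two counters. The sketch below is not the author's code and uses made-up input: each biography is assumed to be reduced to a (gender, set of Wikipedia languages with an article) pair, which a real analysis would derive from sex-or-gender statements and sitelinks in a Wikidata dump. `female_share_by_language` is a hypothetical name.

```python
from collections import Counter


def female_share_by_language(biographies):
    """Return {language: fraction of that language's biographies that are female}."""
    totals, female = Counter(), Counter()
    for gender, languages in biographies:
        for lang in languages:
            totals[lang] += 1
            if gender == "female":
                female[lang] += 1
    return {lang: female[lang] / totals[lang] for lang in totals}
```

Slicing by date of birth or citizenship works the same way, with the language set swapped for the attribute of interest.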


Scaling Wikidata: success means making the pie bigger

German summary (translated): Wikidata is growing and becoming more successful. Over the next year we need to develop strategies and tools to scale Wikidata. In this post I set out my thoughts on this.


Wikidata is becoming more successful every single day. Every single day we cover more topics and have more data about them. Every single day new people join our community. Every single day we provide more people with more access to more knowledge. This is amazing. But with any growth comes growing pains. We need to start thinking about them and build strategies for dealing with them.

Wikidata needs to scale in two ways: socially and technically. I will not go into the details of technical scaling here but will instead focus on social scaling. By social scaling I mean enabling all of us to deal with more attention, data and people around Wikidata. There are several key things that need to be in place to make this happen:

  • A welcome wagon and good documentation for newcomers to help them become part of the community and understand our shared norms, values, policies and traditions.
  • Good tools to help us maintain our data and find issues quickly and deal with them swiftly.
  • A shared understanding that providing high-quality data and knowledge is important.
  • Communication tools like the weekly summary and Project chat that help us keep everyone on the same page.
  • Structures that scale, with enough people holding advanced rights that no single one of them is overwhelmed or burned out.

We have all of these in place but all of them need more work from all of us to really prepare us for what is ahead over the next months and years.

One of the biggest pressures Wikidata is facing now is organisations wanting to push large amounts of data into Wikidata. This is great if it is done correctly and if it is data we truly care about. There are key criteria I think we should consider when accepting large data donations:

  • Is the data reliable, trustworthy, current and published somewhere referenceable? We are a secondary database, meaning we state what other sources say.
  • Is the data going to be used? Data that is not used is much harder to maintain because fewer people see it.
  • Is the organization providing the data going to help keep it in good shape? Or are other people willing to do it? Data donations need champions feeling responsible for making them a success in the long run.
  • Is it helping us fix an important gap or counter a bias we have in our knowledge base?
  • Is it improving existing topics more than adding new ones? We need to improve the depth of our data before we continue to expand its breadth.

So once we have this data, how can we make sure it stays in good shape? One of the crucial points for scaling Wikidata is the quality of, and trust in, its data. How can we ensure high data quality in Wikidata even at a large scale? The key pieces necessary to achieve this are:

  • A community that cares about making sure the data we provide is correct, complete and up-to-date
  • Many eyes on the data
  • Tools that help maintenance
  • An understanding that we don’t have to have it all

Many eyes on the data. What does it mean? The idea is simple. The more people see and use the data, the more people will be able to find mistakes and correct them. The more data from Wikidata is used, the more people will come into contact with it and help keep it in good shape. More usage of Wikidata data in large Wikipedias is an obvious goal here. More and more infoboxes need to be migrated over the next year to make use of Wikidata. The development team will concentrate on making sure this is possible by removing the big remaining blockers, like support for quantities with units and access to data from arbitrary items, as well as by providing good examples and documentation. At the same time we need to improve the visibility of changes on Wikidata in the Wikipedias' watchlists and recent changes. Just as important for getting more eyes on our data are 3rd-party users outside Wikimedia. Wikidata data is starting to be used all over the internet, and it is being exposed to people even in unexpected places. What is of utmost importance in both cases is that it is easy for people to make changes and feed them back to Wikidata. This will only work with well-functioning feedback loops. We need to encourage 3rd-party users to be good players in our ecosystem and make this happen, also for their own benefit.

Tools that help maintenance. As we scale Wikidata we also need to provide more and better tools to find issues in the data and fix them. Making sure that the data is consistent with itself is the first step. A team of students is now working with the development team on improving the system for that. This will make it easy to spot people whose date of birth is after their date of death, and so on. The next step is checking against other databases and reporting mismatches, which is the other part of the student project. When you look at an item, you should immediately see statements that are flagged as potentially problematic and be able to review them. In addition, more and more visualizations are being built that make it easy to spot outliers. One recent example is the Tree of Life.

An understanding that we don't have to have it all. We should not aim to be the one and only place for structured open data on the web. We should strive to be a hub that covers important ground but also gives users the ability to find other, more specialized sources. Our mission is to provide free access to knowledge for everyone, but we can do this just as well when we point to other places where people can get this information. This is especially the case for niche topics and highly detailed data. Why is this so important? We are part of a larger ecosystem, and we should help expand the pie for everyone by being a hub that points to all kinds of specialized databases. Success means making the pie bigger, not getting the whole pie for ourselves. We can't do it all on our own.

If we keep all this in mind and preserve our welcoming culture we can continue to build something truly amazing and provide more people with more access to more knowledge every single day.

Improving the data quality and trust in the data we have will be a major development focus of the first months of 2015.


Wikidata for Research – a grant proposal that anyone can edit

German summary (translated): A few weeks ago, this blog reported on an initiative that created Wikidata items for all of the roughly 40,000 human genes. Now Daniel Mietchen, a researcher at the Museum für Naturkunde Berlin and an active Wikimedian, builds on this idea and presents a European research proposal for integrating Wikidata with scientific databases, which anyone can edit via Wikidata before it is submitted in just under six weeks.

A few weeks ago, this blog was enriched with a post entitled “Establishing Wikidata as the central hub for linked open life science data”. It introduced the Gene Wiki – a wiki-based collection of information related to human genes – and reported upon the creation of Wikidata items for all human genes, along with their annotation with statements imported from a number of scientific databases. The blog post mentioned plans to extend the approach to diseases and drugs, and a few weeks later (in the meantime, Wikidata had won an Open Data award), the underlying proposal for the grant that funds these activities was made public, followed by another proposal that involves Wikidata as a hub for metadata about audiovisual materials on scientific topics.

Now it’s time to take this one step further: we plan to draft a proposal that aims at establishing Wikidata as a central hub for linked open research data more generally, so that it can facilitate fruitful interactions at scale between professional research institutions and citizen science and knowledge initiatives. We plan to draft this proposal in public – you can join us and help develop it via a dedicated page on Wikidata.

The proposal, provisionally titled "Wikidata for research", will be coordinated by the Museum für Naturkunde Berlin (for which I work), in close collaboration with Wikimedia Germany (which oversees development of Wikidata). About three or four further partners are invited to join in, and you can help determine who they may be. Maastricht University has already signaled interest in covering data related to small molecules, and we are open to suggestions from any discipline, as long as there are relevant databases suitable for integration with Wikidata.

Two aspects, technical interoperability and community engagement, are the focal points of the proposal. In terms of the former, we are interested in external scientific databases providing information to Wikidata, with the intention that both parties will be able to profit from this. Information may take the form of new items, new properties, or added statements to existing ones. One focus here would be on mapping the identifiers that different databases use to describe related concepts, and on aligning the controlled vocabularies built around them.
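Identifier mapping through a shared hub can be sketched as a simple crosswalk. This is an illustration, not part of the proposal: each Wikidata-style item is assumed to list the identifiers it holds in several external databases (the database names and IDs below are hypothetical), and two databases are linked whenever their identifiers sit on the same item. `build_crosswalk` is a hypothetical helper.

```python
from collections import defaultdict


def build_crosswalk(items: dict, source_db: str, target_db: str) -> dict:
    """Map each source_db identifier to the set of target_db identifiers on the same item."""
    crosswalk = defaultdict(set)
    for ids in items.values():
        for src in ids.get(source_db, ()):
            # An empty set records that the item has no counterpart in target_db yet.
            crosswalk[src].update(ids.get(target_db, ()))
    return dict(crosswalk)
```

Identifiers that map to an empty set are exactly the gaps a curation effort could target, and identifiers that map to several targets hint at duplicates or differing granularity between databases.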

In terms of community engagement, the focus would be on the curation of Wikidata-based information, on syncing of curation with other databases (a prototype for that is in the making) and especially on the reuse of Wikidata-based information – ideally in ways not yet possible –  be it in the context of Wikimedia projects or research, or elsewhere.

Besides the Gene Wiki project, a number of other initiatives have been active at the interface between the Wikimedia and scholarly communities. Several of these have focused on curating scholarly databases, e.g. Rfam/Pfam and WikiPathways, which would thus seem like good candidates for extending the Gene Wiki's Wikidata activities to other areas. There is also a wide range of WikiProjects on scientific topics (including within the humanities), both on Wikidata and beyond. Some of them team up with scholarly societies (e.g. Biophysical Society or International Society for Computational Biology), journals (e.g. PLOS Computational Biology) or other organizations (e.g. CrossRef). In addition to all that, research about wikis is regularly monitored in the Research Newsletter.

The work on Wikidata – including contributions by the Gene Wiki project – is being performed by volunteers (directly or through semi-automatic tools), and the underlying software is open by default. Complementing such curation work, the Wikidata Toolkit has been developed as a framework to facilitate analysis of the data contained in Wikidata. The funding proposal for that is public too and was indeed written in the open. Outside Wikidata, the proposal for Wikimedia Commons as a central hub of multimedia from open-access sources is public, as is a similar one to establish Wikisource as a central hub for open-access literature (both of these received support from Wikimedia Germany).

While such openness is customary within the Wikimedia community, it contrasts sharply with current practice within the research community. As the first calls for more transparency in research funding emerge, the integration of Wikidata with research workflows seems like a good context in which to explore the potential of drafting a research proposal in public.

Like several other Wikimedia chapters, Wikimedia Germany has experience with participation in research projects (e.g. RENDER) but it is not in a position to lead such endeavours. The interactions with the research community have intensified over the last few years, e.g. through GLAM-Wiki activities, participation in the Leibniz research network Science 2.0, in a traveling science exhibition, or in events around open science. In parallel, the interest on the part of research institutions to engage with Wikimedia projects has grown, especially so for Wikidata.

One of these institutions is the Museum für Naturkunde Berlin, which has introduced Wikidata-related ideas into a number of research proposals already (no link here – all non-public). One of the largest research museums worldwide, it curates 30 million specimens and is active in digitization, database management, development of persistent identifiers, open-access publishing, semantic integration and public engagement with science. It is involved in a number of activities aimed at bringing biodiversity-related information together from separate sources and making them available in a way compatible with research workflows.

Increasingly, this includes efforts towards more openness. For instance, it participated in the Open Up! project that fed media on natural history into Europeana, in the Europeana Creative project that explores reuse scenarios for Europeana materials, and it leads the EU BON project focused on sharing biodiversity data. Within the framework of the pro-iBiosphere project, it was also one of the major drivers behind the launch of the Bouchout Declaration for Open Biodiversity Knowledge Management, which brings the biodiversity research community together around principles of sharing and openness. Last but not least, the museum participated in the Coding da Vinci hackathon that brought developers together with data from heritage institutions.

As a target for submission of the proposal, we have chosen a call for the development of “e-infrastructures for virtual research environments”, issued by the European Commission. According to the call, “[t]hese virtual research environments (VRE) should integrate resources across all layers of the e-infrastructure (networking, computing, data, software, user interfaces), should foster cross-disciplinary data interoperability and should provide functions allowing data citation and promoting data sharing and trust.”

It is not hard to see how Wikidata could fit in there, or that making it fit still requires work. Considering that Wikidata is a global platform and that its initial funding came mainly from the United States, it would be nice to see Europe taking its turn now. The modalities of this kind of EU funding are such that funds can only be provided to certain kinds of legal entities based in Europe, but we welcome input from anywhere on how the project should be shaped.

In order to ensure compatibility with both Wikidata and academic customs, all materials produced for this proposal shall be dual-licensed under CC BY-SA 3.0 and CC BY 4.0.

The submission deadline is very soon – on January 14, 2015, 17:00 Brussels time. Let’s find out what we can come up with by then – see you over there!


Written by Daniel Mietchen


Establishing Wikidata as the central hub for linked open life science data

Summary: Thanks to the wonderful Wikidata community, every human gene (according to the United States National Center for Biotechnology Information) is now represented by an item on Wikidata. Benjamin Good, Andrew Su and Andra Waagmeester have kindly provided us with a short report on their work with Wikidata.

Thanks to the amazing work of the Wikidata community, every human gene (according to the United States National Center for Biotechnology Information) now has a representative entity on Wikidata. We hope that these are the seeds for some amazing applications in biology and medicine. Here is a report from Benjamin Good, Andrew Su, and Andra Waagmeester on their work with Wikidata. Their work was supported by the National Institutes of Health under grant GM089820.

Graphical representation of the idealized human diploid karyotype, showing the organization of the genome into chromosomes. This drawing shows both the female (XX) and male (XY) versions of the 23rd chromosome pair. Courtesy: National Human Genome Research Institute [Public domain], via Wikimedia Commons

The life sciences are awash in data. There are countless databases that track information about human genes, mutations, drugs, diseases, etc. This data needs to be integrated if it is to be used to produce new knowledge and thereby improve the human condition. For more than a decade, many different groups have proposed, and many have implemented, solutions to this challenge using standards and techniques from the Semantic Web. Yet, today, the vast majority of biological data is still accessed from individual databases such as Entrez Gene that make no attempt to use any component of the Semantic Web or to otherwise participate in the Linked Open Data movement. With a few notable exceptions, the data silos have only gotten larger and the problems of fragmentation worse.

In parallel to the appearance of Big Data in biology (and elsewhere), Wikipedia has arisen as one of the most important sources of information on the Web. Within the context of Wikipedia, members of our research team have helped to foster the growth of a large collection of articles that describe the function and importance of human genes. Wikipedia and the subset of it that focuses on human genes (which we call the Gene Wiki) have flourished due to their centrality, the presence of the edit button, and the desire of the larger community to share knowledge openly.

Now, we are working to see if Wikidata can be the bridge between the open, community-driven power of Wikipedia and the structured world of semantic data integration. Can the presence of that edit button on a centralized knowledge base associated with Wikipedia help the Semantic Web break through into everyday use within our community? The steps we are planning to take to test this idea within the context of the life sciences are:

  1. Establishing bots that populate Wikidata with entities representative of three key classes: genes, diseases, and drugs.
  2. Expanding the scope of these bots to include the addition of statements that link these entities together into a valuable network of knowledge.
  3. Developing applications that display this information to the public and that both encourage and enable people to contribute their knowledge back to Wikidata. The first implementation will use the Wikidata information to enhance articles in Wikipedia.
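Steps 1 and 2 can be illustrated with a small Python sketch. It is a toy, in-memory model built on our own assumptions. It is not the team's bots and not the Wikidata API. The `KnowledgeBase` class, the `run_bot` helper, the property name "genetic association" and the sample identifiers are all hypothetical. The sketch shows one design choice that a population bot plausibly needs: it keys each item by its identifier in the source database, so re-running the bot reuses existing items and does not create duplicates.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Item:
    """Toy stand-in for a Wikidata item (hypothetical, simplified model)."""
    label: str
    item_class: str   # e.g. "gene", "disease", "drug"
    external_id: str  # identifier in the source database (hypothetical scheme)
    statements: list[tuple[str, str]] = field(default_factory=list)  # (property, target label)


class KnowledgeBase:
    """In-memory toy knowledge base; NOT the Wikidata/Wikibase API."""

    def __init__(self) -> None:
        self.items: dict[tuple[str, str], Item] = {}

    def upsert(self, item_class: str, external_id: str, label: str) -> Item:
        # Step 1: create an item only if none exists yet for this source
        # identifier, so repeated bot runs do not produce duplicates.
        key = (item_class, external_id)
        if key not in self.items:
            self.items[key] = Item(label, item_class, external_id)
        return self.items[key]

    def link(self, source: Item, prop: str, target: Item) -> None:
        # Step 2: add a statement connecting two items, skipping exact repeats.
        statement = (prop, target.label)
        if statement not in source.statements:
            source.statements.append(statement)


def run_bot(kb: KnowledgeBase, genes, diseases, associations) -> None:
    """genes/diseases: lists of (external_id, label); associations: (gene_id, disease_id)."""
    gene_items = {gid: kb.upsert("gene", gid, label) for gid, label in genes}
    disease_items = {did: kb.upsert("disease", did, label) for did, label in diseases}
    for gid, did in associations:
        # "genetic association" is an illustrative property name only.
        kb.link(gene_items[gid], "genetic association", disease_items[did])


# Usage with made-up placeholder records:
kb = KnowledgeBase()
genes = [("g1", "GENE_A"), ("g2", "GENE_B")]
diseases = [("d1", "DISEASE_X")]
run_bot(kb, genes, diseases, [("g1", "d1")])
run_bot(kb, genes, diseases, [("g1", "d1")])  # second run adds nothing new
print(len(kb.items))                          # 3
print(kb.items[("gene", "g1")].statements)    # [('genetic association', 'DISEASE_X')]
```

Running the bot twice over the same records leaves three items and a single statement. This is the kind of idempotence that matters when bots periodically resynchronize with an upstream database.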

We are excited to announce that the first step on this path has been completed!

