Posted by Jens Ohlig
on Wednesday, 22 October 2014, 15:40
Thanks to the amazing work of the Wikidata community, every human gene (according to the United States National Center for Biotechnology Information) now has a representative entity on Wikidata. We hope that these are the seeds for some amazing applications in biology and medicine. Here is a report from Benjamin Good, Andrew Su, and Andra Waagmeester on their work with Wikidata. Their work was supported by the National Institutes of Health under grant GM089820.
Graphical representation of the idealized human diploid karyotype, showing the organization of the genome into chromosomes. This drawing shows both the female (XX) and male (XY) versions of the 23rd chromosome pair. Courtesy: National Human Genome Research Institute [Public domain], via Wikimedia Commons
The life sciences are awash in data. There are countless databases that track information about human genes, mutations, drugs, diseases, etc. This data needs to be integrated if it is to be used to produce new knowledge and thereby improve the human condition. For more than a decade, many different groups have proposed, and many have implemented, solutions to this challenge using standards and techniques from the Semantic Web. Yet today, the vast majority of biological data is still accessed from individual databases such as Entrez Gene that make no attempt to use any component of the Semantic Web or to otherwise participate in the Linked Open Data movement. With a few notable exceptions, the data silos have only grown larger and the problems of fragmentation worse.
In parallel with the appearance of Big Data in biology (and elsewhere), Wikipedia has become one of the most important sources of information on the Web. Within the context of Wikipedia, members of our research team have helped to foster the growth of a large collection of articles that describe the function and importance of human genes. Wikipedia, and the subset of it that focuses on human genes (which we call the Gene Wiki), have flourished thanks to their centrality, the presence of the edit button, and the desire of the wider community to share knowledge openly.
Now we are working to see whether Wikidata can be the bridge between the open, community-driven power of Wikipedia and the structured world of semantic data integration. Can the presence of that edit button on a centralized knowledge base associated with Wikipedia help the Semantic Web break through into everyday use within our community? The steps we plan to take to test this idea in the life sciences are:
- Establishing bots that populate Wikidata with entities representing three key classes: genes, diseases, and drugs.
- Expanding the scope of these bots to add statements that link these entities together into a valuable network of knowledge.
- Developing applications that display this information to the public and that both encourage and enable people to contribute their knowledge back to Wikidata. The first implementation will use the Wikidata information to enhance articles in Wikipedia.
We are excited to announce that the first step on this path has been completed!