A few weeks ago, this blog featured a post entitled “Establishing Wikidata as the central hub for linked open life science data”. It introduced the Gene Wiki – a wiki-based collection of information related to human genes – and reported on the creation of Wikidata items for all human genes, along with their annotation with statements imported from a number of scientific databases. The blog post mentioned plans to extend the approach to diseases and drugs. A few weeks later (in the meantime, Wikidata had won an Open Data award), the underlying proposal for the grant that funds these activities was made public, followed by another proposal that involves Wikidata as a hub for metadata about audiovisual materials on scientific topics.
Now it’s time to take this one step further: we plan to draft a proposal that aims at establishing Wikidata as a central hub for linked open research data more generally, so that it can facilitate fruitful interactions at scale between professional research institutions and citizen science and knowledge initiatives. We plan to draft this proposal in public – you can join us and help develop it via a dedicated page on Wikidata.
The proposal – provisionally titled “Wikidata for research” – will be coordinated by the Museum für Naturkunde Berlin (for which I work), in close collaboration with Wikimedia Germany (which oversees development of Wikidata). About three or four further partners are invited to join, and you can help determine who these may be. Maastricht University has already signaled interest in covering data related to small molecules, and we are open to suggestions from any discipline, as long as there are relevant databases suitable for integration with Wikidata.
The proposal focuses on two aspects: technical interoperability and community engagement. In terms of the former, we are interested in external scientific databases providing information to Wikidata in ways that benefit both parties. This information may take the form of new items, new properties, or new statements added to existing items. One focus here would be on mapping the identifiers that different databases use to describe related concepts, and on aligning the controlled vocabularies built around them.
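To make the idea of identifier mapping a little more concrete, here is a minimal sketch in Python. It assumes a simplified model in which each item carries statements that record the identifier a given external database uses for the same concept. The item names, database names (`db1_id`, `db2_id`, `db3_id`) and identifier values are all invented placeholders for illustration, and the sketch does not use the actual Wikidata data model or API.

```python
from collections import defaultdict

# Hypothetical items: each one stores the identifier that a given
# external database uses for the same concept. All names and values
# here are made-up placeholders, not real Wikidata content.
ITEMS = {
    "item-A": {"db1_id": "1001", "db2_id": "X-17"},
    "item-B": {"db1_id": "1002"},
    "item-C": {"db2_id": "X-99", "db3_id": "abc"},
}


def build_index(items):
    """Map (database, identifier) pairs to the items that carry them."""
    index = defaultdict(list)
    for item_id, statements in items.items():
        for database, identifier in statements.items():
            index[(database, identifier)].append(item_id)
    return index


def translate(index, items, source_db, identifier, target_db):
    """Translate an identifier between two databases via shared items."""
    results = []
    for item_id in index.get((source_db, identifier), []):
        target = items[item_id].get(target_db)
        if target is not None:
            results.append(target)
    return results


index = build_index(ITEMS)
print(translate(index, ITEMS, "db1_id", "1001", "db2_id"))  # ['X-17']
print(translate(index, ITEMS, "db1_id", "1002", "db2_id"))  # []
```

The second lookup returns an empty list because the hypothetical `item-B` has no statement for the second database yet. Gaps of this kind are what contributions from external databases, or from volunteer curators, could fill.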
In terms of community engagement, the focus would be on the curation of Wikidata-based information, on syncing of curation with other databases (a prototype for that is in the making) and especially on the reuse of Wikidata-based information – ideally in ways not yet possible – be it in the context of Wikimedia projects or research, or elsewhere.
Besides the Gene Wiki project, a number of other initiatives have been active at the interface between the Wikimedia and scholarly communities. Several of these have focused on curating scholarly databases, e.g. Rfam/Pfam and WikiPathways, which would thus seem like good candidates for extending the Gene Wiki’s Wikidata activities to other areas. There is also a wide range of WikiProjects on scientific topics (including within the humanities), both on Wikidata and beyond. Some of them team up with scholarly societies (e.g. Biophysical Society or International Society for Computational Biology), journals (e.g. PLOS Computational Biology) or other organizations (e.g. CrossRef). In addition to all that, research about wikis is regularly monitored in the Research Newsletter.
The work on Wikidata – including contributions by the Gene Wiki project – is being performed by volunteers (directly or through semi-automatic tools), and the underlying software is open by default. Complementing such curation work, the Wikidata Toolkit has been developed as a framework to facilitate analysis of the data contained in Wikidata. The funding proposal for that is public too and was indeed written in the open. Outside Wikidata, the proposal for Wikimedia Commons as a central hub of multimedia from open-access sources is public, as is a similar one to establish Wikisource as a central hub for open-access literature (both of these received support from Wikimedia Germany).
While such openness is customary within the Wikimedia community, it contrasts sharply with current practice within the research community. As first calls for more transparency in research funding are emerging, the integration of Wikidata with research workflows seems like a good context to explore the potential of drafting a research proposal in public.
Like several other Wikimedia chapters, Wikimedia Germany has experience with participation in research projects (e.g. RENDER), but it is not in a position to lead such endeavours. Its interactions with the research community have intensified over the last few years, e.g. through GLAM-Wiki activities, participation in the Leibniz research network Science 2.0, in a traveling science exhibition, or in events around open science. In parallel, the interest of research institutions in engaging with Wikimedia projects has grown, especially for Wikidata.
One of these institutions is the Museum für Naturkunde Berlin, which has introduced Wikidata-related ideas into a number of research proposals already (no link here – all non-public). One of the largest research museums worldwide, it curates 30 million specimens and is active in digitization, database management, development of persistent identifiers, open-access publishing, semantic integration and public engagement with science. It is involved in a number of activities aimed at bringing biodiversity-related information together from separate sources and making it available in a way compatible with research workflows.
Increasingly, this includes efforts towards more openness. For instance, the museum participated in the Open Up! project that fed media on natural history into Europeana, it takes part in the Europeana Creative project that explores reuse scenarios for Europeana materials, and it leads the EU BON project, which focuses on sharing biodiversity data. Within the framework of the pro-iBiosphere project, it was also one of the major drivers behind the launch of the Bouchout Declaration for Open Biodiversity Knowledge Management, which brings the biodiversity research community together around principles of sharing and openness. Last but not least, the museum participated in the Coding da Vinci hackathon that brought together developers with data from heritage institutions.
As a target for submission of the proposal, we have chosen a call for the development of “e-infrastructures for virtual research environments”, issued by the European Commission. According to the call, “[t]hese virtual research environments (VRE) should integrate resources across all layers of the e-infrastructure (networking, computing, data, software, user interfaces), should foster cross-disciplinary data interoperability and should provide functions allowing data citation and promoting data sharing and trust.”
It is not hard to see how Wikidata could fit in there, nor that this still requires work. Considering that Wikidata is a global platform and that initial funding came mainly from the United States, it would be nice to see Europe taking its turn now. The modalities of this kind of EU funding are such that funds can only be provided to certain kinds of legal entities based in Europe, but we appreciate input from anywhere as to how the project should be shaped.
The submission deadline is very soon – on January 14, 2015, 17:00 Brussels time. Let’s find out what we can come up with by then – see you over there!
Written by Daniel Mietchen