Could you wikify an authority file? Wikibase has been evaluated for the Integrated Authority File (GND)
4 March 2020
Jens Ohlig: With the digital transformation, there is a growing need in scientific research institutions, museums and archives for improved retrieval of digital cultural data on the Internet. For a long time, only libraries used the Integrated Authority File, or German “Gemeinsame Normdatei” (GND). In library catalogues, authority control data are primarily used to unambiguously identify items such as authors, publication locations or subject areas. This facilitates the entry of new media, but also helps to link different media with each other via standardized data as common nodes.
Barbara Fischer: In the course of digital change, the demand for reliable and interoperable standardized data is increasing. The GND, originally designed for libraries, is attracting the interest of other institutions that preserve cultural assets. They want not only to use the GND identifiers but also, where necessary, to create and maintain new GND data sets that refer to museum objects, archival documents and research data. The DFG-funded project GND for Cultural Data (GND4C) illustrates this need. Accessible software and a technically convincing platform help to define rules and exceptions for modelling and to reflect on, discuss and document relevant properties and relations.
In the context of Linked Open Data, data reliability is increasingly tied to the transparency and traceability of its provenance, and this is expected for each individual statement. Moreover, the Internet does not stop at language borders, so multilingual terms are desirable. Wikidata meets all these criteria and yet is not an authority file. However, Wikidata is based on Wikibase, software developed by Wikimedia Deutschland. In a cooperation project with Wikimedia Deutschland, the German National Library has therefore investigated the performance of Wikibase for the GND and tested it against the following questions:
- What could a modular “GND 2.0” in Wikibase look like that meets the requirements of the different sectors?
- How can the rules for the modelled properties of the entity types be mapped effectively and clearly?
- How can a stable synchronization between a GND Wikibase instance and the CBS-based master instance be implemented?
The results are encouraging. Wikibase is a bridge to the world of open, cooperative and interdisciplinary authority control data. The proof of concept shows that this bridge is viable.
What convinced us?
Jens Ohlig: Wikimedia and the German National Library are two institutions that make human knowledge available in different ways. Libraries are not only strong allies in terms of Free Knowledge; in a way they are also Wikimedia’s older, institutional kindred spirits. The GND IDs are quite comparable to the Q-numbers of Wikidata. Not much persuasion was needed for the project. In addition, the German National Library, with the GND, is a beacon that shines across virtually the entire field of library metadata, so the evaluation also aroused international interest in Wikibase. At a meeting of national libraries in Stockholm in the summer of 2019, that interest was clearly visible: today, it is no longer possible to do without a strategy for Linked Open Data and Wikibase.
Barbara Fischer: From the perspective of the German National Library, the ease with which we could create and model instances was convincing. In our mixed team of software developers and librarians, we didn’t need any lengthy training to be able to work with the software. Once set up, the instances were stable and easy to clone. For us, as operators of an authority file, it is important to ensure the reliability of the data, among other things through the transparency of its origin. That’s why we like the fact that, at any time and for any statement about an entity, it can be traced who changed something and what was changed, just as you know it from Wikipedia.
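To make the idea of statement-level traceability more concrete, here is a minimal Python sketch. It is not Wikibase code and does not use the Wikibase API; the class names `Entity` and `Revision`, the identifiers and the user names are all hypothetical, chosen only to illustrate how "who changed what, and when" can be recorded for each statement of an entity.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Revision:
    """One recorded change to a single statement (hypothetical model)."""
    user: str
    timestamp: datetime
    property_id: str
    old_value: Optional[str]
    new_value: Optional[str]


@dataclass
class Entity:
    """An entity whose statements keep a full change history."""
    entity_id: str
    statements: Dict[str, str] = field(default_factory=dict)
    history: List[Revision] = field(default_factory=list)

    def set_statement(self, user: str, property_id: str, value: str,
                      when: Optional[datetime] = None) -> None:
        # Record the previous value before overwriting it, so every
        # change stays traceable to a user and a point in time.
        when = when or datetime.now(timezone.utc)
        old = self.statements.get(property_id)
        self.statements[property_id] = value
        self.history.append(Revision(user, when, property_id, old, value))

    def history_for(self, property_id: str) -> List[Revision]:
        # All changes to one statement, oldest first.
        return [r for r in self.history if r.property_id == property_id]


# Hypothetical usage: two editors change the same statement.
person = Entity("example-1")
person.set_statement("alice", "preferred-name", "Goethe")
person.set_statement("bob", "preferred-name", "Goethe, Johann Wolfgang von")
for rev in person.history_for("preferred-name"):
    print(rev.user, rev.old_value, "->", rev.new_value)
```

In this simplified model the history lives next to the data. A real system may instead keep it in a separate revision store, but the principle of never overwriting a value without a trace is the same.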
What surprised us?
Jens Ohlig: Wikibase was created as software for Wikidata to organize and link data stocks. The data models that can be mapped with it are not explicitly geared towards libraries and their special needs: Wikibase is not primarily library software. Nevertheless, the flexible data model fits the GND very well: all relationships and properties can be mapped and expressed here. Both the accuracy of the evaluation (for example, the time taken to import data was measured exactly) and the result that Wikibase meets the requirements were a positive surprise for Wikimedia. It was surprising that Wikibase, as a software package for Linked Open Data, is so well suited “out of the box” for an institution like the German National Library and the GND project.
Barbara Fischer: We at the German National Library had imagined many things to be easier. Precisely because Wikidata itself is a very powerful database, we would have thought that importing large amounts of data was already part of the standard implementation. Perhaps it was our perspective as a cultural institution, and our lack of experience with an open application such as Wikibase, that made its close integration with the MediaWiki software seem unusual at first. Such a close connection, also inscribed in the code, is normal for Wikipedia and many other wikis, but it is unexpected when one expects database software. The development is directly driven by a volunteer community, which also explains why many of the attractive additions are still tailored to Wikidata and would first have to be adapted for generic re-use by all Wikibase users. But on the whole, Wikibase offers a useful working environment for us.
We were very pleasantly surprised by the potential of an open GND, a GND 2.0, in particular the possibility of mapping the entire network of properties and relationships, along with how these are defined. The sets of rules that are necessary to create truly binding and reliable authority control data can be transferred to a Wikibase instance. Based on our tests so far, we believe that we could use the rule sets modeled in Wikibase to control intelligent and dynamically adapting input forms. This promises to greatly simplify collaboration with non-librarians. The future will show whether this promise can be kept.
And what will the future bring?
Barbara Fischer: Since the GND is integrated into a whole system of services for the national library sector, we will continue to maintain the authority file in the current system. But with a full GND Wikibase instance, we want to give the GND a second home and set up outposts for extensions that are difficult to implement in the current environment. With Wikibase we want to create extended access to the GND for interest groups for whom the librarians’ editorial interfaces are not suitable.
The next steps are therefore the complete import of the approximately eight million GND entities, the modeling of the properties according to the GND data model, the documentation of the rules according to which the entities are captured and finally a resilient synchronization of the Wikibase instance with the “master” in the traditional CBS system.
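As an illustration of what a one-way synchronization from the “master” to a Wikibase instance involves, the following Python sketch compares two snapshots and plans the necessary changes. It is only a sketch under simplifying assumptions: both systems are represented as plain dictionaries keyed by identifier, the function name `plan_sync` and the sample identifiers are hypothetical, and nothing here reflects how CBS or Wikibase actually expose their data.

```python
from typing import Dict, List

# A record is modelled as a flat mapping of property name to value
# (an assumption made for this example).
Record = Dict[str, str]


def plan_sync(master: Dict[str, Record],
              replica: Dict[str, Record]) -> Dict[str, List[str]]:
    """Plan a one-way sync in which the master always wins."""
    # Entities that exist only in the master must be created.
    create = sorted(i for i in master if i not in replica)
    # Entities present in both but with differing content must be updated.
    update = sorted(i for i in master
                    if i in replica and master[i] != replica[i])
    # Entities that vanished from the master must be removed or deprecated.
    delete = sorted(i for i in replica if i not in master)
    return {"create": create, "update": update, "delete": delete}


# Hypothetical snapshots of both systems.
master_snapshot = {
    "id-1": {"name": "Goethe, Johann Wolfgang von"},
    "id-2": {"name": "Weimar"},
    "id-3": {"name": "Faust"},
}
replica_snapshot = {
    "id-1": {"name": "Goethe"},
    "id-2": {"name": "Weimar"},
    "id-9": {"name": "Obsolete entry"},
}

print(plan_sync(master_snapshot, replica_snapshot))
```

Comparing full snapshots is the simplest approach and is easy to verify. With around eight million entities, an incremental approach based on change timestamps would probably be more practical, at the cost of more bookkeeping.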
Jens Ohlig: There is still a lot of work ahead of us. Wikibase is a store for structured data. Much of what librarians need in their daily work is still missing: from the conversion of standard formats such as the bibliographic data formats MARC21 or PICA+ to specialized interfaces for data entry in intelligent forms. All this must now be developed and adapted to the software.
For the course of 2020, concrete plans for using Wikibase for the GND now lie before us. The partnership between Wikimedia and the German National Library has proven successful. Together we want to further develop a Wikibase-based GND and put it into use. With the GND based on Wikibase, a strong foundation is growing in the ecosystem of free linked data, which is attracting the interest of many other institutions: for example, the French national library Bibliothèque Nationale de France (BNF) and the university library association ABES in France are already working on their own prototype with Wikibase. Reason enough for the German National Library and Wikimedia to seek an exchange with the BNF and ABES at a joint meeting in Paris in the coming weeks. We are confident that Wikibase will not only become a home for authority files, but that it will evolve over time into a linked open data ecosystem.