This article describes the significance of the evaluation project for the world of open data, for the GLAM sector, and for libraries in particular.
According to statistics, people in Germany spend 196 minutes a day on the Internet, and the figure is rising. Most of us search the Internet for consumer offers, media, information or simply data, and more and more data is online. In everyday life, we rely on the algorithms of search engines. But for more in-depth scientific research, or for searches that are less dependent on the customer-driven priorities of search engine operators, alternative services are needed. This is where codified data comes into focus. The need to identify objects, persons, events and locations precisely online grows with the number of search queries and the amount of data. Wikidata is one answer to this need. Another lies in the interest in the standards of the German authority file, the Gemeinsame Normdatei (GND), an interest that extends beyond the subject boundaries of German-language libraries. Authority control, as practiced by Wikidata and the GND, supports the process of linking data and the construction of a semantic web that links items according to their meaning.
What is Authority Control and why do you need it?
Authority files assign an identifying code number to the name or label of an object and, by means of a permanent Internet link, offer a reliable node in the network. As a rule, at least one property of the object is recorded alongside the name in order to achieve disambiguation. In the case of a person, biographical dates are usually added to the name; for locations, the geo coordinates; for events, their duration. Whether other properties, and relationships to other authority records, are also recorded ultimately depends on the underlying data model. In the case of Wikidata, the volunteer project for the structured and machine-readable description of the world, the list of properties and relations of a data object is essentially unlimited. The community of GND contributors maintains a much tighter set of rules and a narrower understanding of authority control.
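The idea of an authority record can be sketched in a few lines of Python. This is only an illustration and reflects neither the GND nor the Wikidata data model: the class `AuthorityRecord`, the function `find`, the identifiers, the sample persons and the resolver address `https://example.org/authority/` are all invented for this example.

```python
from dataclasses import dataclass, field

# Hypothetical resolver address, used only for illustration.
BASE_URL = "https://example.org/authority/"


@dataclass
class AuthorityRecord:
    identifier: str  # the identifying code number
    label: str       # the name or label of the object
    properties: dict = field(default_factory=dict)  # disambiguating properties

    @property
    def uri(self) -> str:
        # The permanent link is derived from the identifier, not from the label.
        return BASE_URL + self.identifier


def find(records, label, **known):
    """Return all records with this label whose properties match `known`."""
    return [
        r for r in records
        if r.label == label
        and all(r.properties.get(k) == v for k, v in known.items())
    ]


# Two invented persons who share a name and differ only in their dates.
records = [
    AuthorityRecord("A0001", "Johann Müller", {"born": 1750, "died": 1814}),
    AuthorityRecord("A0002", "Johann Müller", {"born": 1801, "died": 1858}),
]

ambiguous = find(records, "Johann Müller")
resolved = find(records, "Johann Müller", born=1750)

print(len(ambiguous))    # the name alone matches both records
print(resolved[0].uri)   # name plus one property identifies a single node
```

The name alone matches both records, while the name combined with a single biographical date resolves to exactly one identifier and therefore to one stable link.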
Like all artefacts created over a longer period of time by a group of people, authority data systems are subject to the laws of social systems. In order to ensure quality, to simplify decision-making and work processes, and to set themselves apart from other systems, they often develop increasingly complex conventions and rules over time. This happens in Wikidata as well as in the GND, despite the fundamentally different framework conditions. In addition to the rules, there are conditions set by the respective technical prerequisites. Databases are defined in their functionality by the underlying software. Communication between two databases that use different software is difficult, and data exchange, whether small or large, cannot take place automatically. Linking and federating data, and maintaining it jointly across system boundaries, is not entirely impossible, but it is cumbersome.
The German National Library and Wikimedia Deutschland e. V. each want to promote the Semantic Web by means of Linked Data projects. For some time, we have therefore each been trying to change our own framework so that opening up our system becomes more attractive to further contributors. Independently of each other, we have learned that it is not enough to open a door. Now we are building a bridge to achieve our goals together.
What we want and how we will proceed
Together we are building a testing ground of several authority databases as Wikibase instances. This means that the team of the German National Library creates structures in the Wikibase software in order to map the GND as fully as possible from different perspectives. The first database represents the GND as it is used and processed by libraries today. A second database models the GND extended by the additional needs of cultural institutions such as museums and archives. And finally, the third database, FactGrid, is a research database for historical persons and corporations built on GND data records; it will no longer be an actual authority file. All three use the Wikibase software and the GND entities. In this testing landscape we will test and evaluate the performance and usability of Wikibase. We want to know how easy it is to synchronize data between instances. Which of Wikidata's federation features can be transferred to the GND Wikibase instances? Does Wikibase technically simplify cooperation across institutional boundaries? Can Wikibase, through fine-grained rights management based on Wikidata's role model, guarantee the reliability that library authority control requires? How much of the many practical user tools developed for Wikidata can be reused for other Wikibase instances? Wikidata is used, for example, to visualize timelines with Histropedia, from ABBA albums to Zulu kings. Wikidata also provides the data behind Crotos, a visualization and discovery tool for artworks. If Wikidata makes such user tools possible because it collects encyclopedic data on human knowledge, what exciting applications are conceivable based on specialized data from an institution's own Wikibase instance?
As a first step, we have now created three Wikibase instances, i.e. databases, and are modeling the properties and relationships that are to apply in each instance. We have just held a joint project meeting in Frankfurt, where colleagues from Wikimedia Deutschland e. V. and the German National Library coordinated the work processes for importing data from the existing GND into the new structures. In the coming weeks, we will successively import selected entity types from the existing GND, such as all data on persons and geography, into the new instances. By summer, the import should have reached a level that allows us to begin further usability and synchronization testing.
Since the GND as an authority file must be absolutely reliable, it must be ensured that users or editors assume permanent responsibility for the correctness of the data they enter. Wikibase makes it much easier than the rigid software of the existing system, in which the current GND is maintained, to attach a reference to each statement and to document the provenance of the data. Nevertheless, the GND must ensure that maintenance, updates, the legal security of the data and any necessary corrections are handled quickly. For this reason, the GND will continue to be open to participation only by a closed group of users under contract, even if this group is expanded by new groups of participants. This is one of the main differences from Wikidata, where everyone is invited to participate, where contributors must take responsibility for the data they enter, and where the community of authors exercises overarching control over the quality of the data.
Wikidata also has different roles within its community, each with different rights and responsibilities. They range from users who enter data, to admins with extended permissions, to special permission groups such as those who can create new properties. This often corresponds to the specializations and hierarchies in libraries, and yet in some places it is quite different and more closely linked to the wiki principle than to library science. One of our upcoming tasks will be to check whether the existing possibilities for role definition in Wikibase are sufficient to meet the requirements of the GND's rights management or, if they are not, which adaptations of the software would be necessary.
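Such a check can be pictured as a simple comparison of two sets. The following Python sketch is purely illustrative: the role names, the permission names and the list of requirements are invented for this example and correspond neither to actual Wikibase user rights nor to the GND's real rules.

```python
# Hypothetical roles and permissions; not actual Wikibase group or right names.
ROLE_PERMISSIONS = {
    "editor": {"edit_statement"},
    "property_creator": {"edit_statement", "create_property"},
    "admin": {"edit_statement", "create_property", "protect_entity"},
}

# Hypothetical list of what an authority file's rights management might require.
REQUIRED_FOR_GND = {"edit_statement", "create_property", "approve_record"}


def missing_permissions(role_permissions, required):
    """Return the required permissions that no existing role provides."""
    available = set().union(*role_permissions.values())
    return required - available


gaps = missing_permissions(ROLE_PERMISSIONS, REQUIRED_FOR_GND)
print(sorted(gaps))  # requirements that would call for an adaptation
```

In this invented setup the only gap is `approve_record`: every requirement that is already covered by some role drops out, and what remains is the list of candidates for software adaptation.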
By WikidataCon at the end of October, we plan to present a first evaluation report, which will show the opportunities, the challenges and the first ideas for solutions in using Wikibase for the GND. After all, we all want to make the GND more accessible to a larger group of users and to make the reuse of authority data in Wikidata and the integration of the GND into new applications as attractive as possible. We are curious to see to what extent Wikibase will meet these expectations. Follow our updates if you are curious about the results.
Please comment on this article and do not hesitate to share your questions and suggestions with us.