The Impact of Wikidata and its Community – Part 1

With Wikidata’s 10-year anniversary on the horizon, we take the opportunity to look at what made Wikidata the collaborative project it is today.

Luca Martinelli [Sannita]

18 October 2022

After the blog post about community-created tools, we wanted to explore the impact of Wikidata and its contributors in fostering the Wikimedia mission in various areas of the world.

Once again, these are just some examples of initiatives or stories about the impact our project has had. Nonetheless, they show that Wikidata is not just a powerful tool for collecting data to be reused on the Wikimedia projects, but also a hub that can attract and host people and initiatives that benefit the Wikimedia mission – and vice versa. We hope these examples will resonate with you and maybe inspire you for the next ten years.

Mapping West Bengal, From Villages to Cultural Heritage

Of course, one common way to generate impact through Wikidata is to use it to highlight a territory and its landmarks. For example, the West Bengal Wikimedians user group works on topics related to their Indian state across the Wikimedia projects, with most of their activities focusing on their cultural heritage.

Bodhisattwa Mandal, known on the projects as Bodhisattwa, is one of its main contributors: “our user group has a very loose structure, there is no paid staff and no hierarchy, just volunteers who help each other and work together.” Since 2017, the user group (WBWUG) has run a program focused on photo-documenting cultural heritage in their home state: “sometimes they are in very remote places, and sometimes they are in very bad conditions. Sometimes our documentation is the first or even the only free documentation of such heritage.”

As with any project of this kind, having a list of places is fundamental to the effort. Bodhisattwa started his own Wikidata-powered database in 2018, when the group already had a considerable number of photos. It was then that he found out that even very basic data about West Bengal, such as how many villages there were, was still missing. So he decided to fill that gap. “It took me three months to import all of the 40,000 villages of West Bengal on Wikidata,” he says, adding that he had to limit himself to his own state (“If I wanted to do it for all the over a million villages that we have in India, it would have taken me 3-4 years”).

But he didn’t stop at villages: after that first import, he started to ingest data about all the features that a city or a village has, such as the more than 100,000 schools of all levels in West Bengal, thousands of hospitals and other medical facilities, thousands of post offices, and so on. “It took me another 6-7 months to ingest all of this data. Some of them were already on Wikidata,” he continues, “but they were incomplete, so I had to manually check every Item to avoid duplicates.” The import was also complicated by the fact that government data was mostly in English while most of the existing Wikidata Items were in Bengali.

Bodhisattwa, by the way, was not alone in this huge import project: “I had some help from Mahir256 for the import of railway stations, since he was interested in that topic. There was another volunteer who worked on the administrative subdivisions’ geo-shapes, and then there were some non-Wikimedia people who made my life so much easier by cleaning up the government databases, since some datasets did not match with each other.” Work is still far from being completed, however: “We still have work to do on the villages’ geo-shapes, because uploading 40,000 of them on Commons and then linking them back from Wikidata is a lot of work to do.”

All of these imports proved, in the end, to be extremely useful for the user group’s activities: “when we started organizing Wiki Loves Monuments in India, we used Wikipedia for our first lists, but now everything comes from Wikidata. We no longer use any manual list. All of our activities are interconnected to each other, and all the activities are interconnected to Wikidata as a necessity.”

Crowdsourcing Wiki Loves Monuments in Italy

Similar stories can be found elsewhere in the world: Wikidata, together with Magnus Manske’s Listeria tool, has proven to be such a powerful ally in creating the lists of monuments eligible for the yearly contest that basically all chapters and user groups say they rely on them. And there’s no reason not to believe them; it’s just so much easier to do it this way.

Specifically in Italy, however, relying on Wikidata was a true game changer for a number of reasons. I (as in the author of this blog post series) have been directly involved in the organization of Wiki Loves Monuments Italy since 2017, mostly working on the creation of the lists of authorized monuments – where “authorized” is the key word to stress here.

In fact, Italy has no provision regarding freedom of panorama, but it does have very strict legislation about cultural heritage, which requires people to pay a fee to the relevant State administration in order to publish a photo of any monument “for commercial purposes.” Which administration has to be paid, though, is not at all clear, since management of cultural heritage in Italy is a “dispersed” matter across all levels of administration. Moreover, you might be required to pay this fee even if these “commercial purposes” are only theoretical, such as the ones that arise from publishing a photo on Wikimedia Commons.

So each year, Wikimedia Italy is forced to spend hundreds of hours convincing and advising countless local administrators, as well as public and private owners and institutions, on how they can authorize people to take photos of their relevant monuments for Wiki Loves Monuments (WLM) and to publish those photos under a CC BY-SA license. But that’s just half of the work: the other half is actually identifying the monuments for which we are asking authorization. And this is where Wikidata came into play in a major way.

This will probably come as a surprise to many, but Italy has no officially sanctioned national list of cultural heritage sites. There are, of course, many datasets from many agencies of the national and regional governments that try to map them, but there is no “one list to rule them all”. This is why, throughout the years, Wikimedia Italy has acted in two ways. One was asking all local administrations to contribute by adding their own missing monuments to their authorisations so that we can, in turn, add them to Wikidata. The other was curating several mass uploads of data from open national and regional cultural heritage databases into Wikidata.

This resulted in a staggering list of 133,184 potential monuments… of which, to date, only 25,378 actually take part in the contest (around 19% of the total number). “Better than nothing at all,” one might say. To better foster awareness about this topic, Wikimedia Italy developed a visual tool for consulting how many monuments are mapped, authorized, or photographed for the first time, down to the single municipality and with breakdowns per year.

In other words, each year’s Wiki Loves Monuments edition in Italy is a multi-pronged initiative that starts every spring and lasts well after the end of the contest, to crowdsource a Wikidata-powered list of all the relevant cultural heritage in Italy – so that people might freely take a picture of it and finally illustrate the relevant articles. This effort has been recognised to some extent by the State institutions in Italy, but the dialogue is still complicated by many factors – most of all, the possibility of reusing a picture for commercial purposes guaranteed by the CC BY-SA license.

How Wikidatans Covered the COVID-19 Pandemic

Italy was also infamously the first country to experience a nation-wide lockdown due to the global pandemic of COVID-19. As the English Wikipedia article about COVID-19 lockdowns reports, by April 2020 “about half of the world’s population was under some form of lockdown, with more than 3.9 billion people in more than 90 countries or territories having been asked or ordered to stay at home by their governments.”

This unprecedented situation, which affected virtually everyone’s life, nonetheless gave many Wikimedians the time and opportunity to cover – arguably for the first time in history – a pandemic in an open-source format. Wikidata, of course, was part of this concerted effort, through its WikiProject COVID-19.

Brazilian biomedical scientist and current PhD student Tiago Lubiana, who goes by the same name on our projects, was among its founders. For years Tiago wondered “how we could take all that information that is locked away in scientific papers and articles and put it into an interoperable format, in a way that we can share it with each other.” His first contact with the Wikimedia projects was with a Brazilian Wikipedian who was mostly translating scientific articles on Portuguese Wikipedia. And it was this editor who, in early 2019, introduced Tiago to the Wikidata Labs, an initiative of Wiki Movimento Brasil which aims to get people to know more about Wikidata.

He immediately started to see the potential of the project in relation to his dream of making scientific data interoperable – so much so that Wikidata became, in a sense, part of his PhD grant: “I read this publication, ‘Wikidata as a FAIR knowledge graph for the life sciences,’ which basically became the base of my PhD grant request, that was subsequently granted.” And then, right before his master’s dissertation, the pandemic hit: “it was a bit of a mess, because I had the grant approved but I couldn’t start my PhD right away, so I had this ‘extra time’ of my own, and that’s when I truly dedicated myself to the COVID-19 project.”

The first step, of course, was creating the community that would take care of the project: “I started checking all the Items and pages related to the coronavirus, and sending messages to their editors. I don’t know how many people I contacted at first, but some of them replied ‘hey, that’s a great idea, let’s do it.’ And then Jodi.a.schneider properly created the project, and it started just like that.”

The project was extremely busy in its first three months of life. Each participant “adopted” a certain aspect of the pandemic, from the countermeasures and social effects of the pandemic, to the modeling of proteins, genes and COVID-19 variants, to the count of cases, deaths and hospitalizations, as well as the scientific literature that was released. All different aspects of the same global phenomenon and all things that required lots of data modeling and technical work.

The core tenet of the project was, however, to be a common place where people could work together on these separate aspects of the same topic. The group even managed to have several video meetings during the pandemic – and as Tiago confirmed, “it wasn’t that easy to accommodate the different time zones and countries we were from!” The project went on, following the trends of the pandemic, slowing a bit when lockdowns started to be lifted, and then experiencing new life when contagions started going up again.

But its biggest result was already achieved by then: “one thing that Daniel Mietchen kept saying,” Tiago remembers, “was that the project was important also in terms of creating the meta-infrastructure on how do we deal with a pandemic, and how to represent knowledge about a pandemic.” According to him, there is still a lot of untapped potential for the project, since we can say that “it was the first attempt, so probably not the best, and unfortunately it will not be the last,” but overall the result was definitely a success.

More things to know about Wikidata

On October 29, Wikidata celebrates its 10th birthday! To mark the occasion, we’ve published a series of blog articles with lots of interesting facts about the history of the world’s largest free knowledge database and its unique community.