
Where did that paragraph go? This software change helps volunteers uphold Wikipedia’s high quality

High quality and reliability of content are at the heart of Wikipedia, but this requires a massive collaborative effort. Recent technical improvements have made the process easier.
Charles J Sharp (https://commons.wikimedia.org/wiki/File:Meerkat_(Suricata_suricatta)_Tswalu.jpg), https://creativecommons.org/licenses/by-sa/4.0/legalcode

Johanna Strodt

30 August 2018

Comparing page versions

One key to Wikipedia’s high quality is a system of mutual checks, based on the fact that every version of every page is stored and accessible. Thousands of community members review other contributors’ latest edits to find errors or inconsistencies. They compare each new version of a page to older versions and check whether the new content complies with guidelines for citation, style, orthography and more. Edits are also scrutinized for subjectivity, copyright violations or vandalism. If a problem is found, it usually gets corrected within minutes.

Wikipedia editors have a variety of ways to find out which articles may need monitoring. Many logged-in contributors add the pages they’re interested in to their personal watchlists and routinely check them for a quick overview of changes made to those pages. Furthermore, each language version of Wikipedia has a recent changes page, a ticker that shows the latest edits to all of its pages. Users who closely monitor this page can use filters to view only the types of edits they’re interested in, such as changes by unregistered users. Editors can also review all edits a particular user has made, or directly examine the version history of a specific Wikipedia page.

Where did that paragraph go? The new improvement adds clarity. Charles J Sharp, Meerkat (Suricata suricatta) Tswalu, CC BY-SA 4.0

The problem with moved paragraphs

A widely used tool for comparing versions of a Wikipedia page is the wikitext diff, a two-column view that shows an older version of a page on one side and the newer version on the other. The tool displays both versions in wikitext markup and highlights the differences between them with a color code.

A simple wikitext diff: In the newer version, some text was removed (highlighted in yellow) and some other text added (highlighted in blue).
“What a wikitext diff looks like” by Johanna Strodt (WMDE), under CC-BY-SA-3.0

However, comparing page versions used to be difficult and time-consuming. Due to a technical limitation, whenever a piece of text was simply moved to another position on the page, the diff displayed it as if that text had been removed and some other text had been added. Even worse, there was no easy way to see whether someone had also changed the text that was moved. As a consequence, Wikipedia editors had to spend time checking whether a text had been moved or removed, and then more time identifying the changes between the two versions:

A chunk of text was moved. Can you spot the changes inside?
“Wikidiff2 – moved paragraph goats – before” by Johanna Strodt (WMDE), under CC-BY-SA-3.0
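To make the limitation concrete, here is a minimal Python sketch using the standard library’s `difflib`. It is not wikidiff2’s code; it only illustrates how a diff that compares whole paragraphs in order reports a moved and slightly edited paragraph as one removal plus one seemingly unrelated addition. The sample paragraphs and the helper name `line_diff` are hypothetical.

```python
import difflib


def line_diff(old: list[str], new: list[str]) -> tuple[list[str], list[str]]:
    """Return (removed, added) paragraphs as a plain, order-based diff sees them."""
    removed: list[str] = []
    added: list[str] = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.extend(old[i1:i2])
        if tag in ("insert", "replace"):
            added.extend(new[j1:j2])
    return removed, added


# Hypothetical sample: the second paragraph is moved to the end and one word changes.
old_version = [
    "Goats are kept for milk and meat.",
    "Some goats are kept merely as pets.",
    "Goats can climb trees.",
]
new_version = [
    "Goats are kept for milk and meat.",
    "Goats can climb trees.",
    "Some goats are kept only as pets.",
]

removed, added = line_diff(old_version, new_version)
print(removed)  # ['Some goats are kept merely as pets.']
print(added)    # ['Some goats are kept only as pets.']
```

Nothing in this output links the two paragraphs, so a reviewer has to compare them by eye to notice that only one word changed.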

Detecting moved paragraphs made simpler

We wanted to create a wikitext diff view that would show both moved text chunks and the changes inside them. What might sound like a simple change was actually a delicate task, for two reasons: first, changes to the diff code can affect the speed of the MediaWiki software; second, detecting moved pieces of text isn’t trivial. How much can a paragraph be changed and still qualify as the same, moved piece of text?

The Technical Wishes team from Wikimedia Deutschland, the German Wikimedia chapter in Berlin, took on this task, supported by software teams from the Wikimedia Foundation. Our project aims to improve the software behind Wikipedia, so our developers dove deep into the wikidiff code and put a lot of effort into improving, fine-tuning and testing it.[1]
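One common way to approach the question of how much change is too much is a similarity score with a cut-off. The Python sketch below pairs each removed paragraph with its most similar added paragraph and treats the pair as a move only if the word-level similarity from `difflib.SequenceMatcher.ratio()` reaches a threshold. The threshold of 0.6, the greedy pairing and the helper names are assumptions for illustration, not the rules wikidiff2 actually uses.

```python
import difflib

# Hypothetical cut-off: how similar two paragraphs must be to count as "moved".
SIMILARITY_THRESHOLD = 0.6


def similarity(a: str, b: str) -> float:
    """Word-level similarity in [0, 1]: 2 * matching words / total words."""
    return difflib.SequenceMatcher(a=a.split(), b=b.split(), autojunk=False).ratio()


def pair_moved(removed: list[str], added: list[str],
               threshold: float = SIMILARITY_THRESHOLD) -> list[tuple[str, str]]:
    """Greedily pair each removed paragraph with its most similar added one.

    Pairs whose similarity is below the threshold are not treated as moves.
    """
    pairs: list[tuple[str, str]] = []
    candidates = list(added)  # added paragraphs not yet claimed by a move
    for old_par in removed:
        best = max(candidates, key=lambda new_par: similarity(old_par, new_par), default=None)
        if best is not None and similarity(old_par, best) >= threshold:
            pairs.append((old_par, best))
            candidates.remove(best)
    return pairs


# Hypothetical input, e.g. the output of a plain paragraph diff.
removed = ["Some goats are kept merely as pets.", "This paragraph was deleted."]
added = ["Some goats are kept only as pets.", "This paragraph is brand new."]
pairs = pair_moved(removed, added)
print(pairs)  # [('Some goats are kept merely as pets.', 'Some goats are kept only as pets.')]
```

Here 6 of 7 words match in the first pair (ratio about 0.86), so it counts as a move. The second pair shares only “This paragraph” (ratio about 0.44) and would still be shown as a plain removal and addition. Where exactly to set the threshold is the delicate trade-off described above.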

After lots of programming, testing and even more testing, the wikitext diff now clearly indicates moved text chunks with the help of little arrows, and highlights changes that were made within them:

Now it’s clearly indicated that two paragraphs were moved and which text was changed within them.
“Wikidiff2 – moved paragraph goats – after” by Johanna Strodt (WMDE), under CC-BY-SA-3.0
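Once a removed and an added paragraph have been paired as a move, the changes inside them can be found by diffing the two versions word by word. The following is a minimal Python sketch with `difflib` and hypothetical sample text; the helper name `inline_changes` is made up, and the arrows and color highlighting of the real diff view are not reproduced.

```python
import difflib


def inline_changes(old_par: str, new_par: str) -> list[tuple[str, str, str]]:
    """List word-level changes between two versions of one (moved) paragraph.

    Each entry is (operation, old words, new words), where the operation is
    "replace", "delete" or "insert".
    """
    old_words, new_words = old_par.split(), new_par.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # skip the unchanged stretches
            changes.append((tag, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2])))
    return changes


print(inline_changes("Some goats are kept merely as pets.",
                     "Some goats are kept only as pets."))
# [('replace', 'merely', 'only')]
```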

This change has been active for the desktop view on German Wikipedia for a few months now, and on most other wikis for a few weeks.

The most recent news from the world of diffs is on your phone: As of this week, moved text chunks are shown correctly on mobile devices as well. To make this happen, the Wikimedia Foundation’s Reading Web team took our recent changes to the diff code and developed styles for them in the mobile view:

This is what the diff view now looks like on mobile devices. In this example, the paragraph “The smallest dog […]” was moved down on the page, and the word “merely” was replaced by “only”.
“Wikidiff2 – moved paragraph mobile” by Johanna Strodt (WMDE), under CC-BY-SA-3.0


By the way, a similar technical improvement was released in early 2018 by the Wikimedia Foundation: The Visual Diff, a tool for users who prefer a visual view over wikitext, also shows changes in moved text chunks. The code behind it, however, is completely independent of the code of the wikitext diff.

We hope that all these improvements make life easier for many contributors and support them in the vital quality assurance work they do.

Footnotes

1: If you’re interested in our challenges and learnings, this post is for you.
