
Platypus, a speaking interface for Wikidata

Jens Ohlig

23 February 2015

PPP (Projet Pensées Profondes) is a student project that aims to build an open question answering platform. Its demo, Platypus (http://askplatyp.us), relies heavily on Wikidata content.

At the École normale supérieure de Lyon, we have to do a programming project during the first part of our master's degree curriculum. Some of us were very interested in natural language processing and others in knowledge bases, so we looked for a project that would let us work on both. The idea of an open source question answering tool quickly came up.

This tool has to answer many different kinds of questions, so one requirement of the project was to use a huge general-purpose knowledge base in order to have a usable tool quickly. One of us was already a Wikidata contributor, and we were inspired by Magnus Manske's very nice but short-lived Wiri tool, so we quickly chose Wikidata as our primary data source.

This is why, after four months of hard work by seven people, we are happy to introduce Platypus, the new English-speaking interface for Wikidata.

It is available as a simple web application. A getting started manual can be found here: https://projetpp.github.io/demo.html

Platypus, the true Jimbo Alpha?

Using advanced natural language processing techniques and Wikidata, Platypus is able to answer a lot of questions, from simple ones like “What is the birth date of Douglas Adams?” to strange ones like “What are the daughters of the wife of the husband of the wife of the president of the United States?”. Currently, most questions that can be answered using a single Wikidata statement are supported. Platypus also does simple spell checking, so it can answer questions like “What is the cappyttal of Franse?”.
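To make the nested question above more concrete, here is a minimal sketch of how such a question can be resolved by following one statement after another. It is not Platypus's actual code: the `STATEMENTS` table, the property names and the `follow` and `answer` helpers are all hypothetical, and the data is a tiny hand-written sample rather than real Wikidata content.

```python
from typing import Dict, List, Tuple

# Hand-written toy statements: (subject, property) -> values.
# Hypothetical sample data, not Wikidata's real data model.
STATEMENTS: Dict[Tuple[str, str], List[str]] = {
    ("United States", "president"): ["Barack Obama"],
    ("Barack Obama", "wife"): ["Michelle Obama"],
    ("Michelle Obama", "husband"): ["Barack Obama"],
    ("Michelle Obama", "daughter"): ["Malia Obama", "Sasha Obama"],
}


def follow(subjects: List[str], prop: str) -> List[str]:
    """Apply one property to every subject and collect the distinct values."""
    results: List[str] = []
    for subject in subjects:
        for value in STATEMENTS.get((subject, prop), []):
            if value not in results:
                results.append(value)
    return results


def answer(start: str, chain: List[str]) -> List[str]:
    """Resolve a nested question by following the properties in order."""
    current = [start]
    for prop in chain:
        current = follow(current, prop)
    return current


# "daughters of the wife of the husband of the wife of the president of
# the United States", read from the innermost part outwards.
chain = ["president", "wife", "husband", "wife", "daughter"]
print(answer("United States", chain))
```

A question answered by a single statement is simply a chain of length one, for example `answer("United States", ["president"])`.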
As computer scientists, we love mathematics, so Platypus is also able to simplify many mathematical formulas, whether written in a natural syntax like “sqrt(180)”, in Mathematica syntax like “Sum[1/n^42, {n,1,Infinity}]” or even in LaTeX like “\sum_{i=0}^n i^2”.
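As an illustration of what simplifying an input like “sqrt(180)” means, the sketch below pulls square factors out of a square root. We do not describe here how Platypus does this internally, so treat the `simplify_sqrt` and `format_sqrt` helpers as hypothetical stand-ins that only cover this one kind of formula.

```python
from typing import Tuple


def simplify_sqrt(n: int) -> Tuple[int, int]:
    """Return (a, b) such that sqrt(n) == a * sqrt(b) and b has no square factor."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    outside, inside = 1, n
    factor = 2
    while factor * factor <= inside:
        # Move every square factor from under the root to the outside.
        while inside % (factor * factor) == 0:
            outside *= factor
            inside //= factor * factor
        factor += 1
    return outside, inside


def format_sqrt(n: int) -> str:
    """Render the simplified form in the same natural syntax as the input."""
    outside, inside = simplify_sqrt(n)
    if inside == 1:
        return str(outside)
    if outside == 1:
        return f"sqrt({inside})"
    return f"{outside}*sqrt({inside})"


print(format_sqrt(180))
```

Since 180 = 36 × 5, the square root of 180 simplifies to 6 times the square root of 5.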

Why Wikidata is amazing

Wikidata was a very good choice: thanks to its strong database of labels and aliases, we can easily find the Wikidata entities matching a given term using Wikidata's search suggestion API. This makes it very easy to map natural language terms to Wikidata identifiers and then use the statements to answer most simple questions, like “When was X born?”.
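The lookup described above can be sketched roughly as follows, assuming that the search suggestion API is the `wbsearchentities` module of the Wikidata web API. The helper names and the two sample payloads are hypothetical: the payloads are heavily trimmed by hand so that the example runs without network access, and real responses contain many more fields.

```python
import json
from typing import List, Optional
from urllib.parse import urlencode

# Public MediaWiki API endpoint of Wikidata.
API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def build_search_url(term: str, language: str = "en", limit: int = 5) -> str:
    """Build a wbsearchentities request URL for a natural language term."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def extract_ids(response_text: str) -> List[str]:
    """Return the entity identifiers listed in a search response, best match first."""
    data = json.loads(response_text)
    return [hit["id"] for hit in data.get("search", [])]


def first_time_value(entity: dict, property_id: str) -> Optional[str]:
    """Return the first time value stored for a property on an entity, if any."""
    for claim in entity.get("claims", {}).get(property_id, []):
        datavalue = claim.get("mainsnak", {}).get("datavalue")
        if datavalue is not None:
            return datavalue["value"]["time"]
    return None


# Hand-trimmed sample payloads (assumption: only the fields used here are kept).
SAMPLE_SEARCH = '{"search": [{"id": "Q42", "label": "Douglas Adams"}]}'
SAMPLE_ENTITY = {
    "id": "Q42",
    "claims": {
        "P569": [  # P569 is the "date of birth" property
            {"mainsnak": {"datavalue": {"value": {"time": "+1952-03-11T00:00:00Z"}}}}
        ]
    },
}

print(build_search_url("Douglas Adams"))
print(extract_ids(SAMPLE_SEARCH))
print(first_time_value(SAMPLE_ENTITY, "P569"))
```

In a real client, the URL would be fetched over HTTP and the entity data retrieved in a second request. Both steps are left out here to keep the sketch self-contained.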
Platypus is also an amazing excuse to improve Wikidata: questions that Platypus cannot answer are often an occasion to add relevant data to Wikidata, and different formulations of a question are sometimes an occasion to add aliases to properties to make them easier to discover. It also helped us discover vandalism on various Wikidata items. For example, the result of the query “Barack Obama” was broken for a day because the English label of his Wikidata item had been changed. After the vandalism was reverted and the cache purged, Wikidata was clean again and the question worked.
We are also looking forward to improvements to Wikidata, such as support for quantities with units, which would increase the number of answerable questions.

Conclusion

The student project ended a few weeks ago, but the open source project continues. We are currently working on supporting other languages such as French, improving overall performance, and investigating how to add context to questions, so that Platypus can answer things like “What is his birthdate?” after “Who is the president of the United States?”, or “Where is the closest Wikimedia user group?”. People are welcome to help us on these points or, more generally, to improve Platypus.
