Volume 6 - 2014

Abstract

Wiktionary is a rich source of linguistic knowledge and an example of a successful application of the crowdsourcing model. Knowledge in Wiktionary is only weakly structured, so to make it usable it must be represented in a structured form that can be searched and processed automatically. Semantic web structures are especially suitable for this task because of the well-developed standards for interlinking different semantic web knowledge bases. Basic Wiktionary extraction has already been done as part of the DBpedia project. We present the extraction of detailed grammatical data, obtained by merging unstructured content from different pages of the MediaWiki XML dump file. As an example, we process French verb conjugations, currently one of the few examples of sufficient complexity found on Wiktionary. The main problem we solve is analyzing and parsing a subset of the MediaWiki template system and its control structures. Based on that, we generate RDF triples that completely cover all domain data currently included in Wiktionary.

Keywords: Crowdsourcing, semantic web, Wiktionary

Published on website: 4.2.2014

Attached files: ekstrakcija-gramatickih-podataka-na-primeru-wiktionary-projekta.pdf

GRAMMATICAL DATA EXTRACTION FROM WIKTIONARY
