User:Chiarcos/sandbox

In natural language processing, linguistics and neighboring fields, Linguistic Linked Open Data (LLOD) describes a method and an interdisciplinary community concerned with creating, sharing and (re-)using language resources in accordance with Linked Data principles. The Linguistic Linked Open Data cloud was conceived and is maintained by the Open Linguistics Working Group (OWLG) of the Open Knowledge Foundation, and has since become a focal point of activity for several W3C community groups, research projects and infrastructure efforts.

Linguistic Linked Open Data

Linguistic Linked Open Data is a movement for publishing data for linguistics and natural language processing according to the following principles:

Data should be openly licensed, using licenses such as the Creative Commons licenses.
The elements in a dataset should be uniquely identified by means of a URI.
The URI should resolve, so users can access more information using web browsers.
Resolving an LLOD resource should return results using web standards such as RDF.
Links to other resources should be included to help users discover new resources and provide semantics.
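
The principles above can be illustrated with a minimal sketch. It uses only the Python standard library and models RDF statements as plain tuples rather than relying on an RDF library. All example.org URIs, including the class and the external entry, are hypothetical placeholders, and the serializer is deliberately simplified: it handles only URIs and plain string literals.

```python
# Minimal sketch: one lexical entry described as linked data.
# All example.org URIs are hypothetical placeholders, not real resources.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

ENTRY = "http://example.org/lexicon/entry/cat"         # unique identifier (assumed)
ENTRY_CLASS = "http://example.org/vocab#LexicalEntry"  # hypothetical class
EXTERNAL = "http://example.org/other-lexicon/cat-n"    # entry in another dataset (assumed)


class Literal(str):
    """Marks a string as an RDF literal rather than a URI."""


triples = [
    (ENTRY, RDF_TYPE, ENTRY_CLASS),
    (ENTRY, RDFS_LABEL, Literal("cat")),
    (ENTRY, OWL_SAME_AS, EXTERNAL),  # link to another resource
]


def to_ntriples(statements):
    """Serialize (subject, predicate, object) tuples in N-Triples style."""
    lines = []
    for s, p, o in statements:
        if isinstance(o, Literal):
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


def outgoing_links(statements, base):
    """Return object URIs that point outside the dataset rooted at `base`."""
    return [
        o
        for _, p, o in statements
        if not isinstance(o, Literal) and p != RDF_TYPE and not o.startswith(base)
    ]


print(to_ntriples(triples))
print(outgoing_links(triples, "http://example.org/lexicon/"))
```

In a real deployment, a request for the entry URI would return such statements in an RDF serialization; the sketch only shows what the returned data might contain.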

The primary benefits of LLOD have been identified as:

Representation: Linked graphs are a more flexible representation format for linguistic data.
Interoperability: Common RDF models can easily be integrated.
Federation: Data from multiple sources can trivially be combined.
Ecosystem: Tools for RDF and linked data are widely available under open source licenses.
Expressivity: Existing vocabularies help express linguistic resources.
Semantics: Links to shared vocabularies make the intended meaning of the data explicit.
Dynamicity: Web data can be continuously improved.
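
The federation benefit can be sketched as follows. This is an illustration under simplifying assumptions, not a description of any particular LLOD tool: statements are modeled as plain tuples, the two datasets and all example.org URIs are hypothetical, and the sketch ignores blank nodes, which need extra care when real RDF graphs are merged.

```python
# Minimal sketch: combining two datasets that describe the same URI.
# All example.org URIs and both datasets are hypothetical.

CAT = "http://example.org/lexicon/entry/cat"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
POS = "http://example.org/vocab#partOfSpeech"
TRANSLATION = "http://example.org/vocab#translation"

# Dataset A: a monolingual lexicon.
lexicon_a = {
    (CAT, LABEL, "cat"),
    (CAT, POS, "http://example.org/vocab#Noun"),
}

# Dataset B: a translation resource that reuses the same entry URI.
lexicon_b = {
    (CAT, TRANSLATION, "Katze"),
    (CAT, LABEL, "cat"),  # duplicate statement, collapses on merge
}

# Because both datasets use the same identifier, merging is a set union.
merged = lexicon_a | lexicon_b


def describe(graph, subject):
    """Collect all property values stated about `subject`."""
    result = {}
    for s, p, o in graph:
        if s == subject:
            result.setdefault(p, set()).add(o)
    return result


description = describe(merged, CAT)
print(sorted(description))
```

The union works only because both sources identify the entry by the same URI; where they do not, explicit links between the two identifiers are needed first.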

Uses of LLOD

Linguistic Linked Open Data has been applied to a number of scientific research problems:

In all areas of empirical linguistics, computational philology and natural language processing, linguistic annotation and linguistic markup represent central elements of analysis, but progress in this field is being hampered by interoperability challenges, most notably differences in vocabularies and annotation schemes used for different resources and tools. Using Linked Data to connect language resources and ontologies/terminology repositories facilitates re-using shared vocabularies and interpreting them against a common basis.
In corpus linguistics and computational philology, overlapping markup is a notorious problem for conventional XML formats. Hence, graph-based data models have been suggested since the late 1990s. These are traditionally represented by means of multiple, interlinked XML files, which are poorly supported by off-the-shelf technology. Modeling such complex annotations as Linked Data yields a semantically equivalent representation, but eliminates the need for special-purpose technology and relies on the existing RDF ecosystem instead.
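
The overlapping-markup problem can be made concrete with a small sketch. The sentence, the two annotation layers, their spans, and the example.org URIs are all invented for illustration. The point is only that two spans which cross each other cannot both be nested elements of a single XML tree, while a graph can attach both annotation nodes to the same tokens.

```python
# Minimal sketch: two annotation layers whose spans cross each other.
# Spans, layers, and all example.org URIs are invented for illustration.
from collections import namedtuple

# Token offsets; `end` is exclusive.
Annotation = namedtuple("Annotation", "id layer start end")

tokens = ["I", "said", "that", "we", "should", "go"]

annotations = [
    Annotation("a1", "syntax", 2, 6),   # "that we should go"
    Annotation("a2", "prosody", 0, 4),  # "I said that we"
]


def crosses(a, b):
    """True if the spans overlap but neither contains the other."""
    overlap = a.start < b.end and b.start < a.end
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return overlap and not (a_in_b or b_in_a)


def to_triples(anns, base="http://example.org/corpus/"):
    """Represent each annotation as a node linked to the tokens it covers."""
    triples = []
    for ann in anns:
        node = f"{base}{ann.id}"
        triples.append((node, f"{base}vocab#layer", ann.layer))
        for i in range(ann.start, ann.end):
            triples.append((node, f"{base}vocab#covers", f"{base}token/{i}"))
    return triples


graph = to_triples(annotations)
print(crosses(annotations[0], annotations[1]))
print(len(graph))
```

Tokens 2 and 3 are covered by both annotation nodes, which is unproblematic in a graph but would require workarounds such as milestones or standoff files in a single XML hierarchy.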

Selected LLOD resources

lemon, a community-maintained vocabulary for lexical resources
OLiA, terminology repository for linguistic annotations and grammatical metadata
lexinfo, vocabulary for dictionary metadata
UBY lemonUBY, multilingual lexical resource for German and English
Cross-Lingual Linguistic Data (CLLD), a collection of datasets from linguistic typology
Glottolog/LangDoc, linguistic bibliography and fine-grained language identifiers

LLOD cloud development and community activities

The LLOD cloud diagram is maintained by the Open Linguistics Working Group (OWLG) of the Open Knowledge Foundation (since 2014 Open Knowledge), an open and interdisciplinary group of experts in language resources.

The OWLG organizes community events, coordinates LLOD developments, and facilitates interdisciplinary communication among LLOD contributors and users.

Several W3C Business and Community Groups focus on specialized aspects of LLOD:

The Ontology-Lexica Community Group develops and maintains specifications for machine-readable dictionaries in the LLOD cloud.
The Best Practices for Multilingual Linked Open Data Community Group gathers information on best practices for producing multilingual linked open data.
The Linked Data for Language Technology Community Group assembles use cases and requirements for language technology applications that use Linked Data.

LLOD development is driven forward by, and documented in, a series of international workshops, datathons, and associated publications. These include, among others:

Linked Data in Linguistics (LDL), annual scientific workshop, since 2012
Multilingual Linked Open Data for Enterprises (MLODE), bi-annual community meeting, since 2012
Summer Datathon on Linguistic Linked Open Data (SD-LLOD), bi-annual datathon, since 2015

The use and development of LLOD have been the subject of several research projects, including:

LOD2 (11 EU countries + Korea, 2010-2014)
MONNET (5 EU countries, 2010-2013)
LIDER (5 EU countries, 2013-2015)
QTLeap (6 EU countries, 2013-2016)