Round Table
Linguistically Annotated Computational Multilingual Resources for NLP and Language Typology

Discussant: Leonel de Alencar (UFC)

 

Daniel Zeman (Charles University)

Corpus-based language comparison: From morphology to dependencies and beyond

In my talk, I will focus on multilingual dependency treebanks, i.e., corpora annotated with part-of-speech categories, morphological features and syntactic dependency relations. Of particular interest will be the Universal Dependencies project, which currently features data from 148 languages annotated in a common scheme. This collection was originally created as a resource to support tools for computational processing of natural languages; however, it has also proven useful for linguistics and other areas of digital humanities. I will suggest that the UD framework helps field workers document endangered languages and dialects, and I will show how the treebanks can provide a basis for language comparison and typology.


Magda Sevcikova (Charles University)

What can we learn about morphology from available multilingual resources?

In recent decades, considerable effort has been devoted to the creation of linguistic data resources, including those incorporating information about morphology. In my talk, I will focus on morphological resources covering multiple languages. I will present different types of multilingual resources that contain morphological annotation, ranging from lexical and typological databases to text corpora. The examples I provide will illustrate how the resources differ, overlap and complement each other. The question I would like to raise for discussion is whether and how language comparison based on multilingual resources could be advanced by merging features from different morphological resources, by integrating morphological information with linguistic information of other types, or both.