Centrum für Informations- und Sprachverarbeitung


Paraphrase and Coreference in Monolingual and Bilingual Parallel Corpora

This project is a cooperation between the Center for Information and Language Processing (Centrum für Informations- und Sprachverarbeitung - CIS) at the Ludwig Maximilian University (LMU) Munich (Dr. Desislava Zhekova) and the Department of Slavic Linguistics at LMU (Prof. Ulrich Schweier).

One of the main aims of this project is to advance the state of the art in both paraphrase and coreference resolution and to deepen our understanding of their mutual dependence. To make this investigation possible, we select a new type of corpus, constructed from multiple aligned monolingual translations. We assume that this data will provide a reliable parallel corpus with rich variation in paraphrased coreferential pairs (both entity and event), which allows us to investigate the connection between paraphrase and coreference.
Another objective is to use this novel data to create a dataset automatically annotated for both coreference and paraphrase for an under-resourced language (Russian), for which such annotations are not yet freely available, and for German.


Desislava Zhekova
Ulrich Schweier
Robert Zangenfeind
Maximilian Hadersbeck
Alena Mikhaylova
Tetiana Nikolaienko

