ERC Starting Grant: Domain Adaptation for Statistical Machine Translation (DASMT)

Rapid translation between European languages is a cornerstone of good governance in the EU, and of great academic and commercial interest. Statistical approaches to machine translation constitute the state of the art. The basic knowledge source is a parallel corpus, that is, a collection of texts and their translations. For domains where large parallel corpora are available, such as the proceedings of the European Parliament, a high level of translation quality is reached. However, in countless other domains where large parallel corpora are not available, such as medical literature or legal decisions, translation quality is unacceptably poor.

Domain adaptation as a problem of statistical machine translation (SMT) is a relatively new research area, and there are no standard solutions. The literature contains inconsistent results, and heuristics are widely used. We will solve the problem of domain adaptation for SMT on a larger scale than has been previously attempted, and base our results on standardized corpora and open source translation systems.

Our work will lead to a breakthrough in translation quality for the vast number of domains with little parallel text available, and will have a direct impact on SMEs providing translation services. The academic impact of our work will be large because solutions to the challenge of domain adaptation apply to all natural language processing systems and to numerous other areas of artificial intelligence research based on machine learning approaches.

Funded by the European Research Council


Principal Investigator

Prof. Dr. Alexander Fraser

Present Senior Staff

Dr. Jindrich Libovicky

Dr. Viktor Hangya

Dr. Dario Stojanovski

Dr. Marion Di Marco (Weller)

Past Senior Staff

Dr. Matthias Huck

Dr. Fabienne Braune

Dr. Aleš Tamchyna

Dr. Tsuyoshi Okita



This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 640550).