WPCom 1: Lexicon, Syntax, Semantics: WSD and MT

Summary

Word Sense Disambiguation (WSD) and Machine Translation (MT) are two key problems of natural language processing where the role of the lexicon is critical. While there are many different inventories of word senses for a particular language, a minimal set of word senses can be defined by looking at how a word is translated into other languages: translations that are not synonyms of each other indicate distinct senses.
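The idea of reading word senses off translations can be illustrated with a small sketch. It is not part of the seminar materials: the aligned word pairs, the synonym list, and the function name `sense_inventory` are all hypothetical toy assumptions. Each distinct translation starts as its own sense, and translations that are listed as synonyms are merged into one sense.

```python
from collections import defaultdict

# Toy word-aligned data (hypothetical): (English word, German translation).
ALIGNED_PAIRS = [
    ("bank", "Bank"),
    ("bank", "Ufer"),
    ("bank", "Bank"),
    ("bank", "Geldinstitut"),
    ("plant", "Pflanze"),
    ("plant", "Fabrik"),
    ("plant", "Werk"),
]

# Hypothetical German synonym pairs; translations linked here count as one sense.
SYNONYMS = [("Bank", "Geldinstitut"), ("Fabrik", "Werk")]


def sense_inventory(aligned_pairs, synonyms):
    """Map each source word to a list of sense clusters (sets of translations)."""
    translations = defaultdict(set)
    for source, target in aligned_pairs:
        translations[source].add(target)

    inventory = {}
    for source, targets in translations.items():
        # Start with one cluster per distinct translation.
        clusters = [{t} for t in sorted(targets)]
        # Merge clusters whose members are connected by a synonym pair.
        for a, b in synonyms:
            ca = next((c for c in clusters if a in c), None)
            cb = next((c for c in clusters if b in c), None)
            if ca is not None and cb is not None and ca is not cb:
                clusters.remove(cb)
                ca |= cb
        inventory[source] = clusters
    return inventory


if __name__ == "__main__":
    for word, senses in sense_inventory(ALIGNED_PAIRS, SYNONYMS).items():
        print(word, len(senses), "senses:", [sorted(s) for s in senses])
```

In this toy data "bank" ends up with two senses (financial institution, river bank), because "Bank" and "Geldinstitut" are treated as synonyms while "Ufer" is not. Real parallel corpora would need word alignment first, and a real synonym resource instead of a hand-written list.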

Content:

The seminar will begin with the basics of Statistical Machine Translation and Word Sense Disambiguation, and then look at attempts to use approaches taken from the WSD literature in MT.

Goals:

The goal of the seminar is to understand the basics of MT and WSD, and in particular the important role the lexicon plays in both problems.

Instructor

Alexander Fraser

Email Address: SubstituteMyLastName@cis.uni-muenchen.de

CIS, LMU Munich


Schedule


Room C003, Tuesdays, 16:00 to 18:00 (c.t.)


Date | Topic | Reading (DO BEFORE THE MEETING!) | Slides
October 13th | Organizational Meeting, Personal Information, Orientation Test | |
October 20th | Introduction to Statistical Machine Translation | | ppt pdf
October 27th | Bitext alignment (extracting lexical knowledge from parallel corpora) | | ppt pdf
November 3rd | Many-to-many alignments (also, Referat!) | | ppt pdf
November 10th | Phrase-based model; Log-linear model and Minimum Error Rate Training (two slide sets) | | ppt pdf; ppt pdf
November 17th | Decoding (Guest Lecture from Ales Tamchyna) | | pdf
November 24th | Advanced Word Alignment, Morphology, Syntax | | ppt pdf
December 1st | Introduction to Word Sense Disambiguation | Start reading Navigli (see below) | ppt pdf
December 8th | Introduction to Linear Models | Navigli, Sections 1 and 2 | pptx pdf
December 15th | 2 Referat presentations (see below) | |
December 22nd | *Kalahari* computer lab (confirmed; this is near the new lecture halls). Referat, followed by a computer lab. | Navigli, Sections 3 and 5 | tar.gz (see the included file Slides.pdf)

Note on the December 22nd lab: the labels used in this classification problem are 0 and 1 (meaning false and true), but wapiti does multiclass classification, so you can use any string as a label.



Presentation topics (Referatsthemen) (name: topic)


Date | Topic | Materials | Term paper (Hausarbeit) received
December 15th | Neuburg: Literature Supervised WSD | | yes
December 15th | Krammer: Literature Dictionary-based WSD | | yes
December 22nd | Andreyeva: Literature Unsupervised WSD | | yes
January 12th | Höps: Project 6 Moses EN-DE | | yes
January 12th | Siilivask: Project 2 Cross-lingual substitution | | yes
January 19th | Handelshauser: Project 1 Supervised WSD | | yes
January 19th | Moiseeva: Project 4 Wikification | | yes
January 26th | Ling: Project 7 Google Translate German Compounds | | yes
January 26th | Conforti: Project WSD for Venetian | | yes


Literature:

Philipp Koehn's book Statistical Machine Translation

Kevin Knight's tutorial on SMT (particularly look at IBM Model 1)

Roberto Navigli's tutorial on WSD (here is a local copy)