Using the Middle High German RNNTagger in WebLicht

Part-of-speech tagging and lemmatization of Middle High German texts using WebLicht.

On the result page you will find

Sentence Boundaries

The detection of sentence boundaries does not work very well for Middle High German texts because they often lack unambiguous sentence-final punctuation.
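The problem can be illustrated with a deliberately naive splitter. This is only a sketch and not the method WebLicht uses: the function name, the splitting rule, and the two sample strings are assumptions made for illustration. The second sample assumes a transcription that marks verse boundaries with a dot, so the dot does not signal the end of a sentence.

```python
import re


def naive_sentence_split(text: str) -> list[str]:
    """Split after '.', '!' or '?' followed by whitespace (a deliberately naive rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


# Illustrative input without any punctuation (opening of the Nibelungenlied).
unpunctuated = (
    "uns ist in alten mæren wunders vil geseit "
    "von helden lobebæren von grôzer arebeit"
)

# Same text, assuming a transcription that puts a dot after each half-line.
dotted = (
    "uns ist in alten mæren. wunders vil geseit. "
    "von helden lobebæren. von grôzer arebeit."
)

# No punctuation: the whole passage comes back as a single "sentence".
print(len(naive_sentence_split(unpunctuated)))

# Verse-marking dots: every half-line is wrongly treated as a sentence.
print(len(naive_sentence_split(dotted)))
```

In both cases the punctuation alone says nothing reliable about where a sentence ends, which is why sentence boundaries in the output should be treated with caution.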


The lemmatizer was trained on the Middle High German Reference Corpus (ReM) and follows its conventions: the basis for lemmatization is Early Middle High German. Unlike the well-known Lexer list, the lemmas show neither final-obstruent devoicing (Auslautverhärtung) nor degemination (Geminatenkürzung). The "Umlaut"-e is represented as "è".
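When lemmas from the tagger are to be looked up in a Lexer-style word list, the three differences named above have to be bridged. The following is a rough sketch under stated assumptions, not part of RNNTagger or WebLicht: the helper `to_lexer_style` is hypothetical, the word list is assumed to write the "Umlaut"-e as plain "e", and the rules cover only word-final b, d, g, v and word-final double consonants. Real lemmas may differ in other ways, so the result should be checked against the list actually used.

```python
import re

# Word-final voiced obstruent -> voiceless counterpart in common normalized
# Middle High German spelling (assumption: g is written c when devoiced).
DEVOICE = {"b": "p", "d": "t", "g": "c", "v": "f"}

VOWELS = "aeiouâêîôûæœ"


def to_lexer_style(lemma: str) -> str:
    """Approximate a Lexer-style headword from a tagger lemma (hypothetical helper)."""
    # The "Umlaut"-e is written "è" in the lemmas; plain "e" is assumed in the list.
    form = lemma.replace("è", "e")
    # Degemination: reduce a word-final double consonant to a single one.
    form = re.sub(rf"([^{VOWELS}])\1$", r"\1", form)
    # Final-obstruent devoicing: devoice a word-final b, d, g or v.
    if form and form[-1] in DEVOICE:
        form = form[:-1] + DEVOICE[form[-1]]
    return form


# Illustrative inputs; the lemma forms are assumed, not taken from ReM.
for lemma in ["tag", "wîb", "kind", "mann", "nemen"]:
    print(lemma, "->", to_lexer_style(lemma))
```

The mapping only works in this direction. Going from a Lexer-style headword back to the tagger's convention is not possible by rule, because a final "c" or a single final consonant does not reveal whether devoicing or degemination has applied.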

Please send questions, comments, suggestions and bug reports to Helmut Schmid at