Why don't people use character-level machine translation?

Jindřich Libovický, Helmut Schmid, Alexander Fraser 

Findings of ACL 2022

We present a literature and empirical survey that critically assesses
the state of the art in character-level modeling for machine
translation (MT). Despite evidence in the literature that
character-level systems are comparable with subword systems, they are
virtually never used in competitive setups in WMT competitions. We
empirically show that even with recent modeling innovations in
character-level natural language processing, character-level MT
systems still struggle to match their subword-based counterparts.
Character-level MT systems show neither better domain robustness nor
better morphological generalization, although both are often cited as
motivation for them. However, we do show that they are robust to
source-side noise and that their translation quality does not degrade
with increasing beam size at decoding time.