Alexander Fraser
fraser@cis.uni-muenchen.de

AER was defined in Och and Ney 2003. I showed that the simple AER
formula is incorrect (not derived from F-Measure correctly) in my 2007
journal paper on word alignment quality. I also showed that the AER
score of an alignment doesn't predict downstream SMT performance very
well.

This simple example shows how AER and F-Measure do not
track. F-Measure is worse for the second alignment, which is
correct. But the AER is the same for the second alignment, this cannot
be correct.

In my work on discriminative alignment I have observed that, in
practice, systems that optimize for AER tend to predict very few word
alignment links, and get unreasonably good AER scores by doing this
(e.g., like 5% on the Aachen French/English alignment). But the
alignments produced are very poor for SMT, and probably most other
purposes.

Sure: A-A B-B Y-Y Z-Z
Possible: A-A B-B Y-Y Z-Z Q-Q R-R

Alignment 1
A-A B-B D-D E-E   P=0.50 R=0.50 F=0.50    AER = 1 - ((2+2)/8) = 0.50

Alignment 2
A-A Q-Q R-R D-D   P=0.75 R=0.25 F<0.50    AER = 1 - ((3+1)/(4+4)) = 0.50