ALF - What do Resemblance and Normalised Score mean in the XML output?

Question
What do Resemblance and Normalised Score mean in the XML output?

Answer
The Live Forms XML output includes information about the "confidence factor" the character recognition system has in its choice of word or character. The recognition engine produces two types of score: a resemblance score and a normalised score.

Resemblance Score
This score measures the engine's objective opinion as to how closely the candidates suggested by the recogniser match the actual ink provided.
It is calculated on a scale of 0 -> 1, with 1 being the highest level of confidence.

Example:

Ink sample    Recogniser's candidate    Resemblance score
[a circle]    0 (zero)                  0.99
[zigzag]      m                         0.22

The first sample achieves an excellent resemblance score, as it bears an obvious resemblance to "0" and the engine can confidently say that the ink represents "0". The second sample has a very poor score as the engine has chosen "m" but it cannot say in all honesty that the sample truly resembles the letter "m".

Normalised recognition score
This score indicates how much confusion may exist between the different candidates in the top list (the list of possible words). It is based on all the elements that the engine has had at its disposal to carry out recognition. We could say that it answers the question, "Considering the other candidates in the list, how confident are you that the selected candidate is the best match for the ink given?". It is influenced by the linguistic knowledge you give to the recogniser, which means that you can still have a high normalised recognition score even if the resemblance score is close to zero.

It is calculated on a scale of 0 -> 1, with 1 being the highest level of confidence.
Note: the sum of the normalised scores for a given segment does not have to add up to 1.

Example:

Ink sample    Normalised recognition score    Resemblance score
[a circle]    0 (zero) = 0.52                 0.99 (for "0", zero)
              O = 0.47
[zigzag]      m = 0.9                         0.22 (for "m")
              µ = 0.08

Let's look at the circle first... It achieved an excellent resemblance score for "zero", meaning that the engine considers that the ink really looks like a "0". But the normalised scores are lower.

Why?
The second candidate is "O" (letter o) which is very similar to "0" (zero); the engine cannot be completely confident of its choice because these two characters are so similar, confusion is quite likely.

Now let's look at the zigzag. It achieved a poor resemblance score because, objectively, it cannot be said that it immediately resembles the letter "m". Yet, it has a good normalised recognition score.
The engine must make a list of possible top candidates. It chose "m" as the first candidate because the scribble is drawn more like an "m" than any other character. It then chose "µ", which seemed to be one of the few other characters that could potentially be written that way. The symbol "µ" seems an even more unlikely match for the scribble than "m". Because no other candidate comes close, there is little risk of confusion, and so "m" receives a high normalised score despite its poor resemblance.
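One rough way to see the difference between the two samples is to compare the gap between the best and second-best normalised scores. The sketch below assumes the normalised scores have already been read out of the XML into a plain dictionary; the function name `top_margin` and the dictionary layout are illustrative only and are not part of Live Forms.

```python
def top_margin(normalised_scores):
    """Gap between the best and second-best normalised scores.

    normalised_scores maps each candidate to its normalised score
    (hypothetical layout, filled in by hand from the example table).
    """
    ranked = sorted(normalised_scores.values(), reverse=True)
    if not ranked:
        return 0.0
    if len(ranked) == 1:
        return ranked[0]
    return ranked[0] - ranked[1]


# Values taken from the example table above.
circle = {"0": 0.52, "O": 0.47}
zigzag = {"m": 0.9, "µ": 0.08}

print(round(top_margin(circle), 2))  # 0.05: "0" and "O" are easily confused
print(round(top_margin(zigzag), 2))  # 0.82: "m" has no serious rival
```

A small gap suggests the engine had trouble separating its top candidates, whatever the resemblance score says.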

This example shows the importance of understanding the basic mechanism of scoring and what the two scores can mean. A "good" score is only good within a specific context.
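As an illustration of reading the two scores together, here is a minimal sketch that flags a result for manual review when either score is weak. The thresholds and the function name `review_reason` are assumptions made up for this example; they are not values or functions provided by Live Forms, and suitable thresholds depend on your own forms.

```python
# Hypothetical thresholds, chosen only for illustration.
MIN_RESEMBLANCE = 0.5
MIN_NORMALISED = 0.8


def review_reason(resemblance, normalised):
    """Return why a result deserves a manual check, or None if both scores pass."""
    if resemblance < MIN_RESEMBLANCE:
        return "ink does not look much like the chosen candidate"
    if normalised < MIN_NORMALISED:
        return "another candidate is nearly as likely"
    return None


# Scores of the top candidate for each sample in the example above.
print(review_reason(0.99, 0.52))  # circle read as "0"
print(review_reason(0.22, 0.9))   # zigzag read as "m"
```

With these assumed thresholds both samples are flagged, but for different reasons: the circle because "0" and "O" are hard to tell apart, the zigzag because the ink barely resembles "m".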
