Recently while trying to complete my goal of reading the entire Interwebs, I stumbled upon something I thought the rest of you might be intrigued by: The Levenshtein Distance.
According to Wikipedia, it is “a metric for measuring the amount of difference between two sequences. The Levenshtien distance between two strings is given by the minimum number of operations needed to transform one string into the other.”
An operation, in this case, is an insertion, deletion, or substitution of a single character.
It’s named after Vladimir Iosifovich Levenshtein, the Russian scientist who developed it in 1965.
Jeff, why should I care?
The reason you should be interested in this is because this is how spell-checkers generally recommend words for you. Have you ever wondered how they figure out what you meant?
So how’s it work?
Let’s look at example to see how this measurement works. Let’s figure out the distance from the word “developer” to the word “development.”
1) developer –> developen (substituted the ‘r’ for the ‘n’)
2) developen –> developmen (inserted the ‘m’)
3) developmen –> development (inserted the ‘t’)
So, the Levenshtein distance for those two words is: 3
Are there any other special rules?
Why yes, there are. Here’s a couple of things to keep in mind about this measurement:
1) It can’t be less than the difference between the length of the two strings.
2) It can’t be greater than the longer of the two strings.
3) It can only be zero if the strings are identical.
So what does this look like in code?
I’ve put together a quick little C# console application to demonstrate this logic. You can download it here. Just change out the two strings I used above for anything.
So there you go!
Now you have a simple way to find words that are close matches to a user-entered string. Just zip through your term dictionary, picking out the words with the smallest distances. You’ll probably find their match. Certainly there are many applications for this other than spell checking. Think about recommending customer names as a user tries to search for one. Or account numbers.
Let me know how YOU would use this!
Leave a Reply