Fuzzy Logic:
- Coding without glasses.
- A halfway house for Boolean values that won’t go all the way.
- The essay answer to a true-or-false question.
Boolean values that won’t go all the way
We’re impressed when computers, defined by switches that can either be on or off, do something that seems like they come up with the “maybe” answer.
Well, the maybe is left to interpretation or rest on someone implying that it means something other than Yes or No. And when computers are involved, how much do we value or trust their maybe?
Fuzzy logic is just a fancy name for a return value that isn’t a boolean true or false. When dinosaurs roamed the earth, false was represented with a zero (0) and true was represented as a one (1). By convention, a fuzzy value is considered to be a decimal value between zero (0) and one (1).
If fuzzy logic presents its answers as either zero or one or something in-between, what does 0.93 mean? What does 0.37 mean?
Love/Hate Relationship
There’s the real question. Real developers love this question and its answer. Managers hate this question and its answer – and for exactly the same reason: it depends on the domain.
Managers especially hate this part: it largely involves the data, too. Poor data mean erratic fuzzy results. Precise data mean reliable fuzzy results. Maybe I’ll create a post on reliability one day. Today, my head already hurts.
But Fuzzy Logic is what happens when Booleans aren’t the right answer.
Why Fuzz?
One of the more useful computer tools judges whether a word or a string is like another word or string. A very simple, but popular algorithm for comparing words is Soundex, invented in the early 1900s and patented in 1919. (Use your research staff — Google and Wikipedia — to learn more.)
The idea of true-or-false comes easy to computers. This is because they are simple On/Off switches to which we’ve ascribed super-human traits. But most of our world consists of probabilities, likelihoods, nearness of value, or patterns that we want a computer to recognize.
Soundex is good for phonetics; American phonetics. This obviously has limits in today’s marketplace. It’s also limited to phonics, which won’t help much when trying to compare addresses or phone numbers, or the almighty social security number — or the verboten credit card number.
Soundex is, therefore, tolerable in Fuzzy 101, but it won’t get a good grade in most sophisticated problems today. Customers from different countries need flexible ways to compare input values.
So what does it mean to have a high input value?
A data element with a high input value means a fuzzy result that may be imbued with a reliable importance. The year of a birthdate may be considered more reliable than the day. Folks tend to assure that data entry gets their year right and pays less attention to the day (or even the month).
In the following example, I’ll use demographic data: names and addresses. In a name, the first name has a lower data value than the last name (in the US; switch that in China). That’s because many, myself included, don’t use our first name, or use a nickname instead of a first name. Few, if any, mistreat their family names in the same, cavalier manner. A first name has a lower data value than a last name. Folks tend to make sure the data entry person gets the last name right, anyway.
Usage
Different groups of numbers and letters must be compared differently. A name is not an address is not a phone number is not a date. Each should have different algorithms for comparison. These differences include the following:
- Phonetic substitution
- Use of diagraphs
- Use of special characters
- Vowels
- Keyboard proximity
“CRUNCH“ | C | R | N | C | H |
“CRASH“ | C | R | S | H | |
Outcome | Match | Match | NoMatch | NoMatch | Match – 1 |
Valuation | 1 | 1 | 0 | 0 | .5 |
Weighting | 60 | 10 | 10 | 10 | 10 |
Totals | 60 | 10 | 0 | 0 | 5 |
From a possible value of 1, the weighted calculation for this possible match is 0.75. Is this a match? I doubt it from the result, but in some cases, with some data, it might be worth a human look.
Diagraphs
Diagraphs, or blends, produce one phonetic sound from a blend of two letters. It’s important to consider the use of substitute symbols when comparing words with diagraphs.
Rejecting vowel diagraphs because most fuzzy matching eliminates vowels (except at the beginning of a word or name), let’s look at some consonant diagraphs.
Diagraph | Replacement | Sample Comparisons |
CH | ♠ | ♠ :: C = 0.8 ; ♠ :: S = 0.7 |
SH | ♖ | ♖ :: S = 0.8 ; ♖ :: ♠ = 0.6 |
TH | ⛉ | ⛉ :: T = 0.9 |
These examples are incomplete, but give a good feel for symbolic substitution so that a diagraph can be used in a letter-by-letter comparison of two strings.
Word-by-Word, too
The same things can be done by splitting two strings into their individual words and comparing them by proximity (just like we did to the characters above). This can increase our likelihood of catching transposed word placement.
Weighting
When making near-hit comparisons, the nearly-right answer must be weighted: given a factor by which the comparison can be calculated. There are multiple ways to weight a comparsion:
- Phonetic sounds of letters
- Proximity on the keyboard
- Probability of existence
- Partial accuracy (like a month in a date when the day and year are ideal matches)
- Common substitution use
- Prominence in data entry
10-Key
Take the 10-key layout for example:
The 1 and the 2 are close, but so are the 1 and the 4. In a like manner, the five is close to everything except the 0. the 6 is close to the 3, the 5, and the 9. In an algorithm weighting, while not a match, a string that compares a 4 and a 7 might be considered a fuzzy match with a value less than 1 (a match) but not 0 (no match).
Keyboard
In a similar example, an “m” and an “n” might also be considered a fuzzy match because they are very close to each other on a QWERTY keyboard. (They also have similar visual properties that could be overlooked by a trivial visual comparison.)
Many similar key-proximity associations can be made. These range far beyond the simple Soundex substitutions based on sound.