Fuzzy Logic – void static abstract

Fuzzy Logic:

Coding without glasses.
A halfway house for Boolean values that won’t go all the way.
The essay answer to a true-or-false question.

Boolean values that won’t go all the way

We’re impressed when computers, defined by switches that can either be on or off, do something that seems like they come up with the “maybe” answer.

Well, the maybe is left to interpretation, or rests on someone implying that it means something other than Yes or No. And when computers are involved, how much do we value or trust their maybe?

Fuzzy logic is just a fancy name for a return value that isn’t a boolean true or false. When dinosaurs roamed the earth, false was represented with a zero (0) and true was represented as a one (1). By convention, a fuzzy value is considered to be a decimal value between zero (0) and one (1).

If fuzzy logic presents its answers as either zero or one or something in-between, what does 0.93 mean? What does 0.37 mean?
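To make the contrast concrete, here is a minimal sketch that puts a Boolean answer next to a fuzzy one. It borrows `difflib.SequenceMatcher` from Python's standard library as a stand-in scorer; that is my assumption for illustration, not the comparison method worked through later in this post, and the sample names are made up.

```python
from difflib import SequenceMatcher


def is_same(a: str, b: str) -> bool:
    """Boolean answer: the strings are identical or they are not."""
    return a == b


def similarity(a: str, b: str) -> float:
    """Fuzzy answer: a score from 0.0 (nothing alike) to 1.0 (identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


print(is_same("Jon Smith", "John Smith"))      # the Boolean view
print(similarity("Jon Smith", "John Smith"))   # the fuzzy view
```

The Boolean says the two names differ and stops there. The score lands somewhere between 0 and 1, and what to do with it is still up to the domain.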

Love/Hate Relationship

There’s the real question. Real developers love this question and its answer. Managers hate this question and its answer – and for exactly the same reason: it depends on the domain.

Don’t wait for the translation. Answer the question!

Managers especially hate this part: it largely involves the data, too. Poor data mean erratic fuzzy results. Precise data mean reliable fuzzy results. Maybe I’ll create a post on reliability one day. Today, my head already hurts.

But Fuzzy Logic is what happens when Booleans aren’t the right answer.

Why Fuzz?

One of the more useful computer tools judges whether a word or a string is like another word or string. A very simple but popular algorithm for comparing words is Soundex, developed in the early 1900s and patented in 1918 and 1922. (Use your research staff, Google and Wikipedia, to learn more.)

The idea of true-or-false comes easy to computers. This is because they are simple On/Off switches to which we’ve ascribed super-human traits. But most of our world consists of probabilities, likelihoods, nearness of value, or patterns that we want a computer to recognize.

Soundex is good for phonetics: American English phonetics. This obviously has limits in today’s marketplace. It’s also limited to phonetics, which won’t help much when trying to compare addresses or phone numbers, or the almighty social security number, or the verboten credit card number.

Soundex is, therefore, tolerable in Fuzzy 101, but it won’t get a good grade in most sophisticated problems today. Customers from different countries need flexible ways to compare input values.
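For reference, here is a minimal sketch of American Soundex as it is commonly described: keep the first letter, turn the remaining consonants into digits, skip repeats of the same digit, and pad to four characters. The function and variable names are mine, and the sketch only handles the letters A to Z.

```python
# Consonant groups used by American Soundex; vowels, H, W and Y get no digit.
_CODES = {}
for digit, letters in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for letter in letters:
        _CODES[letter] = str(digit)


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a name."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        if ch not in "HW":  # H and W do not break a run of equal codes
            previous = digit
    return (code + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

Two names either share a code or they don't, so Soundex on its own still hands back a Boolean. Anything fuzzier has to be layered on top.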

So what does it mean to have a high input value?

A data element with a high input value is one we can trust, so a fuzzy result built on it carries more weight. The year of a birthdate may be considered more reliable than the day. Folks tend to make sure that data entry gets their year right and pay less attention to the day (or even the month).

In the following example, I’ll use demographic data: names and addresses. In a name, the first name has a lower data value than the last name (in the US; switch that in China). That’s because many, myself included, don’t use our first name, or use a nickname instead of a first name. Few, if any, mistreat their family names in the same cavalier manner. Folks tend to make sure the data entry person gets the last name right, anyway.
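One way to express "data value" in code is as a per-field weight. The sketch below is only that, a sketch: the 0.3 and 0.7 weights are numbers I picked for illustration, and `difflib.SequenceMatcher` stands in for whatever field comparison you actually use.

```python
from difflib import SequenceMatcher

# Hypothetical weights: the family name counts for more than the given name.
NAME_WEIGHTS = {"first": 0.3, "last": 0.7}


def field_score(a: str, b: str) -> float:
    """Similarity of two field values, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def name_score(a: dict, b: dict, weights: dict = NAME_WEIGHTS) -> float:
    """Weighted average of the per-field similarities."""
    total = sum(weights.values())
    return sum(w * field_score(a[f], b[f]) for f, w in weights.items()) / total


on_file = {"first": "William", "last": "Harrison"}
nickname = {"first": "Bob", "last": "Harrison"}
stranger = {"first": "William", "last": "Nguyen"}

print(name_score(on_file, nickname))  # first name differs, last name agrees
print(name_score(on_file, stranger))  # first name agrees, last name differs
```

With these weights, a different first name costs less than a different last name, which is the point of giving the last name the higher data value.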

Usage

Different groups of numbers and letters must be compared differently. A name is not an address is not a phone number is not a date. Each should have different algorithms for comparison. These differences include the following:

Phonetic substitution
Use of digraphs
Use of special characters
Vowels
Keyboard proximity

Position	1	2	3	4	5
“CRUNCH”	C	R	N	C	H
“CRASH”	C	R	S	H	(none)
Outcome	Match	Match	NoMatch	NoMatch	Match – 1
Valuation	1	1	0	0	.5
Weighting	60	10	10	10	10
Totals	60	10	0	0	5

I worked it out so the totals would amount to “100” if it were an ideal match. This is not necessary in the real world where everything is relative.

Out of a possible value of 1, the weighted calculation for this possible match is 0.75 (75 of the 100 available points). Is this a match? I doubt it from the result, but in some cases, with some data, it might be worth a human look.
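The arithmetic behind that 0.75 is short enough to write down. This sketch takes the valuation and weighting rows from the table as given and divides by the sum of the weights, so the weights don't have to add up to 100. The function name is mine.

```python
def weighted_score(valuations, weights):
    """Sum of valuation * weight per position, scaled to the range 0..1."""
    if len(valuations) != len(weights):
        raise ValueError("one weight is needed per valuation")
    return sum(v * w for v, w in zip(valuations, weights)) / sum(weights)


# Rows from the CRUNCH / CRASH table.
valuations = [1, 1, 0, 0, 0.5]
weights = [60, 10, 10, 10, 10]

print(weighted_score(valuations, weights))  # 0.75
```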

Digraphs

Digraphs produce one phonetic sound from a pair of letters. It’s important to consider the use of substitute symbols when comparing words with digraphs.

Setting aside vowel digraphs, since most fuzzy matching eliminates vowels (except at the beginning of a word or name), let’s look at some consonant digraphs.

Digraph	Replacement	Sample Comparisons
CH	♠	♠ :: C = 0.8 ; ♠ :: S = 0.7
SH	♖	♖ :: S = 0.8 ; ♖ :: ♠ = 0.6
TH	⛉	⛉ :: T = 0.9

These examples are incomplete, but give a good feel for symbolic substitution so that a digraph can be used in a letter-by-letter comparison of two strings.
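Here is a small sketch of that substitution. The symbols and scores come straight from the table; treating each score as symmetric (♠ against C is the same as C against ♠) and scoring every unlisted pair as 0 are my assumptions.

```python
# Two-letter pairs and the single symbol that stands in for each.
DIGRAPHS = {"CH": "♠", "SH": "♖", "TH": "⛉"}

# Sample comparison scores from the table, stored as unordered pairs.
SIMILARITY = {
    frozenset(("♠", "C")): 0.8,
    frozenset(("♠", "S")): 0.7,
    frozenset(("♖", "S")): 0.8,
    frozenset(("♖", "♠")): 0.6,
    frozenset(("⛉", "T")): 0.9,
}


def encode(word: str) -> str:
    """Upper-case the word and swap each digraph for its symbol."""
    word = word.upper()
    for digraph, symbol in DIGRAPHS.items():
        word = word.replace(digraph, symbol)
    return word


def char_score(a: str, b: str) -> float:
    """1.0 for identical characters, a table score for known pairs, else 0.0."""
    if a == b:
        return 1.0
    return SIMILARITY.get(frozenset((a, b)), 0.0)


print(encode("crunch"), encode("crash"))  # CRUN♠ CRA♖
print(char_score("♠", "♖"))               # 0.6
```

Once the words are encoded, the CH in one and the SH in the other line up as single characters, and the letter-by-letter comparison can give them partial credit instead of a flat NoMatch.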

Word-by-Word, too

The same things can be done by splitting two strings into their individual words and comparing them by proximity (just like we did to the characters above). This can increase our likelihood of catching transposed word placement.
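A rough sketch of the word-level idea: split both strings, then score each word against its best partner on the other side, wherever that partner sits. Ignoring position entirely is a simplification I chose so that transposed words still score well, and `difflib.SequenceMatcher` is again only a stand-in for the word comparison.

```python
from difflib import SequenceMatcher


def word_score(a: str, b: str) -> float:
    """Similarity of two words, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def words_score(left: str, right: str) -> float:
    """Average best-match score for each word of the longer string."""
    a, b = left.lower().split(), right.lower().split()
    if not a or not b:
        return 0.0
    if len(a) < len(b):
        a, b = b, a  # iterate over the side with more words
    return sum(max(word_score(w, other) for other in b) for w in a) / len(a)


print(words_score("123 Main Street", "Main Street 123"))  # 1.0
print(words_score("Main Street", "Maine Street"))
```

One known weakness: nothing stops two words from claiming the same partner, so a stricter version would pair words off one-to-one.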

Weighting

When making near-hit comparisons, the nearly-right answer must be weighted: given a factor by which the comparison can be calculated. There are multiple ways to weight a comparison:

Phonetic sounds of letters
Proximity on the keyboard
Probability of existence
Partial accuracy (like a month in a date when the day and year are ideal matches)
Common substitution use
Prominence in data entry
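Partial accuracy is the easiest of these to sketch. Below, two dates are compared part by part, and each part that agrees contributes its weight. The 5/3/2 split is a number I made up; all it encodes is the earlier point that the year tends to be the most reliable part.

```python
from datetime import date

# Hypothetical weights; the year is treated as the most reliable part.
DATE_WEIGHTS = {"year": 5, "month": 3, "day": 2}


def date_score(a: date, b: date, weights: dict = DATE_WEIGHTS) -> float:
    """Share of the total weight carried by the parts that agree."""
    total = sum(weights.values())
    matched = sum(w for part, w in weights.items()
                  if getattr(a, part) == getattr(b, part))
    return matched / total


# Day and year agree, the month does not.
print(date_score(date(1970, 3, 14), date(1970, 4, 14)))  # 0.7
```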

10-Key

Take the 10-key layout for example:

The 1 and the 2 are close, but so are the 1 and the 4. In a like manner, the 5 is close to everything except the 0. The 6 is close to the 3, the 5, and the 9. When weighting an algorithm, a comparison of a 4 and a 7, while not a match, might be considered a fuzzy match with a value less than 1 (a match) but greater than 0 (no match).
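A sketch of 10-key proximity, under a few assumptions of mine: the pad has 7-8-9 on the top row with the 0 sitting under the 1, keys that touch (including diagonally) count as near, and a near key is worth 0.5. Counting diagonals is what makes the 5 near everything except the 0.

```python
# (row, column) positions on an assumed 10-key pad; 0 is placed under the 1.
KEYPAD = {
    "7": (0, 0), "8": (0, 1), "9": (0, 2),
    "4": (1, 0), "5": (1, 1), "6": (1, 2),
    "1": (2, 0), "2": (2, 1), "3": (2, 2),
    "0": (3, 0),
}


def key_score(a: str, b: str, near: float = 0.5) -> float:
    """1.0 for the same key, `near` for touching keys, otherwise 0.0."""
    if a == b:
        return 1.0
    if a not in KEYPAD or b not in KEYPAD:
        return 0.0
    (r1, c1), (r2, c2) = KEYPAD[a], KEYPAD[b]
    touching = max(abs(r1 - r2), abs(c1 - c2)) == 1
    return near if touching else 0.0


def digits_score(left: str, right: str) -> float:
    """Average key score over two digit strings of equal length."""
    if not left or len(left) != len(right):
        return 0.0
    return sum(key_score(a, b) for a, b in zip(left, right)) / len(left)


print(key_score("4", "7"))  # 0.5
print(key_score("5", "0"))  # 0.0
print(digits_score("5551234", "5551237"))
```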

Keyboard

In a similar example, an “m” and an “n” might also be considered a fuzzy match because they are very close to each other on a QWERTY keyboard. (They also have similar visual properties that could be overlooked by a trivial visual comparison.)

Many similar key-proximity associations can be made. These range far beyond the simple Soundex substitutions based on sound.
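The same trick carries over to letters. This sketch covers only the three letter rows of a QWERTY keyboard and approximates the stagger by saying each key touches the key directly above it and the one above and to the right. Both the layout model and the 0.5 score are assumptions for illustration.

```python
# The three letter rows of a QWERTY keyboard, top to bottom.
ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")
POSITIONS = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}


def qwerty_score(a: str, b: str, near: float = 0.5) -> float:
    """1.0 for the same letter, `near` for neighbouring keys, otherwise 0.0."""
    a, b = a.lower(), b.lower()
    if a == b:
        return 1.0
    if a not in POSITIONS or b not in POSITIONS:
        return 0.0
    (r1, c1), (r2, c2) = POSITIONS[a], POSITIONS[b]
    if r1 == r2:
        return near if abs(c1 - c2) == 1 else 0.0
    if abs(r1 - r2) == 1:
        # Rows are staggered: a lower-row key sits between two upper-row keys.
        upper_col, lower_col = (c1, c2) if r1 < r2 else (c2, c1)
        return near if upper_col - lower_col in (0, 1) else 0.0
    return 0.0


print(qwerty_score("m", "n"))  # 0.5
print(qwerty_score("q", "p"))  # 0.0
```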