** **

**Chapter 4. Icon
Generation**

Icons and the NeoCore XML Database

Transforms, Remainders, and Modulo 2 Division

A Software-Based Transform Generator

The NeoCore Multiple-Table Implementation

Material in this chapter is taken from the following materials:

a. US Patent 4,527,239, July 2, 1985, “Digital data storage methods and apparatus”

b. US Patent 5,942,002, August 24, 1999, “Method and apparatus for generating a transform”

c. US Patent 6,324,636, November 27, 2001, “Memory management system and method”

d. US Patent Application 20020038413, March 28, 2002, “Memory management system and method”

e. US Patent Application 20020049922, April 25, 2002, “Search engine system and method”

f. US Patent Application 20020053002, May 2, 2002, “System for associative processing”

g. US Patent Application 20020069232, June 6, 2002, “Method and system for generating a transform”

h. Brandin, Christopher, NeoCore Technical Paper, “Optimized Coding Methods for Icon Generation and Manipulation in DPP,” Release 1.0, February 7, 2001

i. Direen, Harry, Ph.D., and Phillips, Keith, Ph.D., NeoCore Technical Paper, “Finite Fields and Properties of the NeoCore Icon Generator, Associative Processing Unit, and Associative Memory Controller Used in Digital Pattern Processing,” Release 1.0, September 18, 2000.

j. Williams, Ross, “A Paper on CRCs,” at http://www.repairfaq.org/filipg/LINK/F_crc_v3.html.

An icon is any object which is used to represent or symbolize another object. So in religious art an “icon” is a painting which represents a holy person. The little miniature pictures on the Windows desktop are called icons; they represent the applications that are called by clicking on them.

In the NeoCore lexicon, an “icon” (also called a “transform,” “hash,” or ”polynomial code”) is a binary string which is used to represent another string. The original string, perhaps a word or phrase to be searched for in the database, is first “transformed” into a icon of, say[1], 64 bits. There are advantages to be gained by using the icon instead of the original string: (1) the icon is of a fixed length, (2) it is usually shorter than the original string, and (3) it can be mathematically manipulated. We will see in future chapters how NeoCore makes extensive use of these features of icons in its indexing and searching methodologies.

If we use 64-bit binary strings for our icons, we have 1.84
times 10^{19 }different icons available. For each input string, the
icon generator will compute the specific icon which will be used to represent
that string. We have to be careful designing the icon generator. At the extreme
of foolishness, we might design an icon generator which, for all input strings,
always generates the same icon. It takes a fair amount of sophisticated
mathematics to create an icon generator which provides a maximum of useful
features. Here are some features we would want to design into an icon generator
for a native XML database:

a.
Absolutely essential is an icon generator which spreads the
icons evenly across all the 18(10)^{18} that are available. That is, we
don’t want the binary equivalent of 35 million to be the output icon 17 times
more frequently than, say, the binary equivalent of 14 quadrillion. The odds
that any particular icon will be generated need to be identical to the odds for
every other icon.

b. We don’t want similar input strings to generate similar output strings. That is, if “Jean,” “Jane,” “John,” “Jon,” “Jeanne,” “June,” and “Joan” are inputs to the icon generator, we want the output icons to have no relation to each other. If sequential arguments are input, the resulting icons should spread all over the available values.

c. The icon generator must be fast: some tasks could require the generation (or computation) of hundreds of icons. Virtually every task the NeoCore database does will require the generation of at least one icon.

d. The icon generator must handle input strings of variable length. (Note that when we allow input strings longer than 64 bits, we necessarily have some risk that two input strings will generate identical icons. Such “collisions” must be handled efficiently by the processes that rely on the icons. Traditional hash-based search algorithms, for example, get so bogged down by collision resolution that they are awkward at best for even medium-sized databases.)

e.
If the input strings are 64 bits or shorter, every possible
input string should produce a different icon. If we input all possible 18(10)^{18}
64-bit strings into the icon generator, we want to get out all possible 18(10)^{18}
64-bit strings—except the sequence of output icons will be different from the
sequence of input strings.

f.
If we want to use sub-strings of the 64 bits (for example, if
we want to use the first 20 bits for one purpose and the remaining 44 bits for
another purpose), the features of items a – e, above, should hold for the
sub-strings also, *mutatis mutandis*.

g. The creation of icons should be based on a mathematical system such that we can mathematically manipulate the icons. For example, if we know the icon generated by “Phone_book>entry>name>first>” and if we know the icon generated by “John,” then we should be able to mathematically compute—without generating a new icon from scratch—the icon for “Phone_book>entry>name>first>John”.

It has become common in the software industry to use polynomial division mod 2 to generate transforms. The input string is used to create the dividend, and a carefully selected number is used as the divisor. The quotient is irrelevant to the transform generator; instead, it returns the remainder as the transform for the given input string.[2]

Before we explain the specific workings of the NeoCore icon generator, we need to have a basic understanding of polynomial division mod 2.[3]

1) A binary number, perhaps 10010, is interpreted as a polynomial by using the individual bits as coefficients:

1x^{4} + 0x^{3} + 0x^{2}
+1x^{1} + 0x^{0}.

For another example, 01000001 would be interpreted as:

0x^{7} + 1x^{6} + 0x^{5}
+ 0x^{4} + 0x^{3} + 0x^{2} + 0x^{1} + 1x^{0}.

And 1011 would be interpreted as:

1x^{3} + 0x^{2} + 1x^{1}
+ 1x^{0}.

2) In adding two polynomials—just as we learned in Junior High School—we simply add the coefficients of those terms which have the same order: 10010 plus 01000001 is done as follows:

1x^{4}
+ 0x^{3} + 0x^{2} + 1x^{1} + 0x^{0}

+ __0x ^{7} +
1x^{6} + 0x^{5} + 0x^{4} + 0x^{3} + 0x^{2}
+ 0x^{1} + 1x^{0}__

0x^{7}
+ 1x^{6} + 0x^{5} + 1x^{4} + 0x^{3} + 0x^{2}
+ 1x^{1} + 1x^{0}

3) In doing the addition in 2), we do not do carries; we use modulo 2 arithmetic instead, where 0+0=0, 0+1=1, 1+0=1, and 1+1=0. Thus 1011 plus 01000001 is done as follows:

1x^{3} + 0x^{2} + 1x^{1}
+ 1x^{0}

+
__0x ^{7} + 1x^{6} + 0x^{5} + 0x^{4} + 0x^{3}
+ 0x^{2} + 0x^{1} + 1x^{0}__

0x^{7}
+ 1x^{6} + 0x^{5} + 0x^{4} + 0x^{3} + 0x^{2}
+ 1x^{1} + 0x^{0}

4) Similarly, in doing subtraction, we use modulo 2 arithmetic, where 0-0=0, 0-1=1, 1-0=1, and 1-1=0. But this means the subtraction function is absolutely identical to the addition function. If you need to subtract one number from another in polynomial mod 2 arithmetic, simply add them. And, in fact, the addition and the subtraction functions are identical to the XOR function; to add or subtract in polynomial mod 2 arithmetic, simply use the XOR function, which is available in the common programming languages.

5) For
the sake of physical brevity, in this paper from this point on, we will drop the explicit presentation of the
polynomials, writing “1011” instead of “1x^{3} + 0x^{2} + 1x^{1}
+ 1x^{0” }and “10010”^{ }instead of ”1x^{4} + 0x^{3}
+ 0x^{2} + 1x^{1} + 0x^{0}”. Thus we will write
our polynomial mod 2 arithmetic as follows:

1100 1100 1010 1010 1000 1000

__ +1010 -1010 +1100 -1100 +1001 -1001__

0110 0110 0110 0110 0001 0001

Just remember that even though these look like binary
numbers and even though we might actually program these operations using a
simple XOR, in principle they are polynomials. If this were not so, there would
be no theoretical justification for some of the mathematical manipulations we
will do later. There would be no proofs that NeoCore icons satisfy criteria a–g
above. Further without the theory, NeoCore
probably would not have discovered many of the efficiencies that are built into
the system.

6) Since addition and subtraction yield the same results in polynomial mod 2 arithmetic, our normal concepts of ordering by and large must be abandoned. That is, since 1000 minus 1001 equals 0001 and since 1000 plus 1001 also equals 0001, we cannot say that 1001 is larger than, or smaller than, 1000. If any two binary numbers are the same length and both begin with a “1” then neither is to be considered larger than the other in doing the polynomial mod 2 arithmetic. If, however, we drop all leading zeroes and the two binary numbers are then of different length, then the longer of the two numbers is considered larger than the shorter. This is relevant in doing mod 2 division:

1 __ 110__

2
101)11100

3
__101__

4
100

5
__101__

6
010

7
__000__

8
10

Notice that at
lines 4-6 of this division example, 100 divided by 101 equals 1 with a
remainder of 1. In “normal” arithmetic the answer would be 0 with a remainder
of 100, because in “normal” arithmetic 101 is larger than 100. But because in
our “transform” arithmetic 101 is not larger than 100, we get the answer
indicated.

With that introduction to polynomial mod 2 arithmetic, we
have laid the foundation for generating transforms. Suppose you want to create
a transform for a particular binary string; we will call the binary string the **input**
to the transform generator. Within the transform generator we are going to do a
polynomial mod 2 division. The divisor for the polynomial mod 2 division is a
carefully selected binary number that is fixed for that particular transform
generator. If you want the generated transform to have ** n** bits,
the length of the divisor will be

If, in a transform generator, the single binary digit “1” were to be used as the divisor, the polynomial mod 2 division remainder would always be zero; therefore, the transform for every input string would be zero. Such a transform generator would be useless. The point is that not just any number can be used as the divisor in the transform generator. Some divisors work better than others in generating transforms that meet an application’s needs—in our case features “a” through “g” above. In brief, if you want a 64-bit transform, you need to use a 65-bit divisor which starts with a binary “1” and ends with a binary “1”, which contains several additional “1” values, and which is a prime number. The math needed to figure out suitable divisors—and prove they satisfy stated criteria—is based on finite fields and polynomial math. While we do not need to examine such theoretical foundations of the NeoCore icon generator, we do want to understand how the icon generator works in practice, since NeoCore’s search techniques depend on the use of, and mathematical manipulation of, icons.

Let’s manually do some simplified examples. To keep the arithmetic manageable, we will generate only 16-bit transforms using 1A533 (hex) as the divisor. In the first example, we will have the entire input be the letter “A” (ASCII 01000001). To form the divisor, we augment the input by appending sixteen zeroes.

__ 1000001__

11010010100110011)010000010000000000000000

__11010010100110011__......

10100001001100110.....

__11010010100110011__.....

11100111010101000....

__11010010100110011__....

01101011100110110...

__00000000000000000__...

11010111001101100..

__11010010100110011__..

00001011010111110.

__00000000000000000__.

00010110101111100

__00000000000000000__

0010110101111100

The remainder is 0010110101111100; that is the transform of “A”, using this particular generator.

Now let’s do the same thing, using “B” as the input. ASCII is 01000010

__ ____ ____ 1001111__

11010010100110011) 010000100000000000000000

__11010010100110011__......

10101101001100110.....

__11010010100110011__.....

11111111010101010....

__11010010100110011__....

01011011100110010...

__00000000000000000__...

10110111001100100..

__11010010100110011__..

11001011010101110.

__11010010100110011__.

00110011100111010

__00000000000000000__

0110011100111010

The remainder is
0110011100111010; that is the transform of “B”, using this generator with a
divisor of 11010010100110011 and appending 16 zeroes to form the dividend.

One of the
advantages of the mod 2 arithmetic is that it can be done relatively quickly in
hardware, using linear feedback shift registers and XOR gates. For example,
standard Ethernet cards include the hardware to implement the CRC‑32
checksum functionality.

Because everything
is done in registers and XOR gates, each individual action takes place very
quickly in a hardware-implemented transform generator. However, because the
processing is done one bit at a time, the standard hardware-implemented
algorithm is not the fastest possible. (On a standard Ethernet card, the time
to compute the CRC code is irrelevant on the receive side: ordinary
transmission delays mean the bits are arriving at a slower pace than the
checksum is being computed.)

The other obvious
problem with hardware-implemented transform generators is that we normally
don’t have such specialized hardware available on our general-purpose
computers. So, except when we are using specialized cards, we are normally
going to implement the transform generator in software.

*Conceptually*, it seems simple to develop a
software-based transform generator to do the mod 2 division directly. It would
work like this, for our 16-bit transform using “11010010100110011” as the
divisor:

a.
Create a null
16-bit binary “remainder.”

b.
Copy the
leftmost bit of the augmented input string into the rightmost bit of the
remainder, shifting the previous bits in the remainder left. One bit is dropped
off the left of the remainder.

c.
If the bit
that dropped off the left of the remainder was a “1” do the following:

remainder
:= {remainder XOR 1010010100110011}

d.
If more bits
remain in the augmented input string, loop to b.

*Practically*, this approach is not as efficient as we
would like. For one thing, the loop must be executed once for every bit in the
augmented input stream. Further, each loop is slower in the software
implementation than in the hardware implementation. Instead, the standard approach
is to read a *byte* at a time from the input. The mod 2 arithmetic is
precalculated and held in a 256-element table. There are 256 elements because
that is the number of values that can be represented by an 8-bit byte. In such
an approach the input is not augmented by the *n* bits; the extra zeroes
or ones are precalculated into the 256-element table. One algorithm would work
like this:

1) Create a null *n*-byte transform, T.

2) Create an 8-bit pointer, P, into the
pre-calculated 256-element Table.

3) Read the next byte in the input stream. Call
it “i” for “input.”

4) Read the leftmost (most significant) byte in
T. Call it “T_{L}”.

5) P := (( i XOR T_{L} ))

6) Read the *n*-byte value, V, contained
at Table[P].

7) Shift T left one byte, losing what had been
T_{L} and filling the rightmost byte with zeroes.

8) T := ( T XOR V).

9) If there are bytes remaining in the input,
then loop back to 3).

Using C, steps 3) –
8) are typically combined into one (complex) command. Nonetheless, while there
is a single C command, the actions indicated in 3) through 8) must all be done.
The efficiency gained is because we are processing eight bits at a time using a
table.

Let’s illustrate
how this is done, using 16-bit transforms to keep our computations relatively
simple. We will read 4 bits at a time, rather than the typical 8-bit byte; this
will permit us to have a simplified 16-element table (See Appendix A, Table A-1) rather
than the standard 256-element table. The table is created with 11010010100110011
as the divisor and with the input augmented by 16 zeroes; other variations
would require their own tables.

Now let’s calculate
the transform for “A” (01000001), using the table.

1.
Initialize the
transform, T, to 0000000000000000 and the Table Pointer, P, to 0000.

2.
Process the
first 4 bits:

a.
Read the four
left bits, T_{L}, of the transform T: 0000.

b.
Read i, the
next four bits in the input: 0100.

c.
P := T_{L}
XOR i = 0000 XOR 0100 = 0100.

d.
Read V, the
value at Table[P], namely 0111101110011001.

e.
Shift the
transform register 4 places to the left, losing the four leftmost bits and
inserting four zeroes into the four rightmost bits. (Because the transform
register contained 16 zeroes, there is no change as a result of this step, for
this particular iteration.)

f.
XOR the
contents of the transform register, T, and V, the value read from the table.
Assign this new value to T

T := 0000000000000000 XOR
0111101110011001

T := 0111101110011001

3.
Process the
second four bits:

a.
Read the four
left bits, T_{L,} in the transform register: 0111.

b.
Read i, the
next 4 bits in the input: 0001.

c.
P := T_{L}
XOR i = 0111 XOR 0001 = 0110.

d.
Read V, the
value at Table[P], namely 1001010011001100.

e.
Shift the
transform register 4 places to the left, losing the four leftmost bits and
inserting four zeroes into the four rightmost bits: 1011100110010000

f.
XOR the
contents of the transform register, T, and V, the value read from the table.
Assign this new value to T

T := 1011100110010000 XOR 1001010011001100

T := 0010110101011100

4.
The input is
now empty; exit.

This example seems
a bit contrived because we are reading only four bits at a time and we have
only a 16-bit transform and we had only an eight-bit input. In a more typical
transform generator, we would read a byte at a time and be generating a 32-bit
transform. (As we will see shortly, NeoCore reads four bytes at a time and
generates a 64-bit transform.)

If we process one
byte instead of one bit at a time, we reduce the number of loops we have to
execute by a factor of 8.

We could speed up
the process of generating an icon if instead of reading one byte at a time, we could
read two bytes, or three, or more. The problem is that if we used the identical
technique except to read two bytes instead of one, it would take some 65
thousand table entries instead of 256. And this then would take a lot of
resources, especially since we want to keep the tables in RAM for fast access.
The large numbers that would be needed if we were to read four bytes at a time
and generate 64-bit icons would make such an approach extremely awkward with
current technology. You would need 256 gigabytes of space for such a table.

NeoCore gets around
this problem by using multiple tables. For example, if we were to use two
256-entry tables, and XOR their outputs, we would have the effect of a single
65,536-entry table (256^{2} = 65,536). And if we used four 256-entry
tables, we would have the equivalent of a single table with 4,294,967,296
entries. Four 256-entry tables means we have provided 1024 entry points; by
XORing the four lookups, the 1024 entry points have the effect of 4,294,967,296
entry points.

Suppose we were
processing four bytes at a time. Further suppose the next four input bytes are
CCDDEEFF (hex). We would look up CC in a “triple-shifted” table, DD in a
“double-shifted” table, EE in a “shifted” table, and FF in a standard transform
look-up table.

In computing the
table values, we would calculate the triple-shifted table’s 256 values based on
the transforms for 00000000 (hex), 01000000, 02000000, 03000000, ... ,
FC000000, FD000000, FE000000, FF000000. The double-shifted table would have its
256 values based on transforms for 000000, 010000, 020000, 030000, ... ,
FD0000, FE0000, FF0000. The shifted table would have its 256 values based on
transforms for 0000, 0100, 0200, ... , FD00, FE00, FF00. Finally, the standard
transform look-up table has its 256 values based on transforms for 00, 01, 02,
... , FD, FE, FF.

Let’s illustrate
the use of multiple tables by using our own 16-bit transforms, reading four
4-bit units at a time, and accessing four 16-element tables. The look-up tables
are found in Appendix A. For this example, we will be using tables A‑1 through A‑4. Our input string will
be “ABCD”, i.e., 01000001010000100100001101000100.

1) Initialize T with 16 zeroes; initialize P_{1}
through P_{4} for tables one through four.

2) Process the first four four-bit units:

a) Read the leftmost four four-bit units in T:
0000000000000000.

b) Read the next four four-bit units from the
input: 0100000101000010.

c)
XOR the
results of a) and b): 0100000101000010.

d) From the value obtained at step c) read the
appropriate values from the four tables:

i.
The rightmost
4-bit unit is assigned to P_{1}; we read Table A–1 at 0010:
1110111101010101.

ii.
The
second-right 4-bit unit is assigned to P_{2}; we read Table A–2 at 0100:
1000100001101111.

iii.
The
third-right 4-bit unit is assigned to P_{3}; we read Table A–3 at 0001:
1100111011101001.

iv.
The
fourth-right 4-bit unit is assigned to P_{4}; we read Table A–4 at 0100:
0010110111011111.

e) XOR the four values from step d):

i.
1110111101010101

ii.
1000100001101111

iii.
1100111011101001

iv.
__0010110111011111__

1000010000001100

f)
Shift the bits
in T, four 4-bit units (i.e., 16 bits total) to the left, losing the leftmost
16 bits and filling the rightmost 16 bits with zeroes: 0000000000000000. (Note:
since we are using 16-bit transforms and reading 16 bits at a time, we could
skip steps f) and g) and merely assign the result of e) to the current value of
the transform; however, conceptually, we need to include steps f) and g), and
we certainly have to include them if our transform is larger than the number of
bits we are reading at a time.)

g) XOR the values from e) and f): 1000010000001100; assign this value to T.

3) Process the next four four-bit units

a) Read the leftmost four four-bit units in T:
1000010000001100.

b) Read the next four four-bit units in from
the input: 0100001101000100.

c)
XOR the
results of a) and b): 1100011101001000.

d) From the value obtained at step c) read the
appropriate values from the four tables:

i.
The rightmost
4-bit unit is assigned to P_{1}; we read Table A–1 at 1000:
1111011100110010.

ii.
The
second-right 4-bit unit is assigned to P_{2}; we read Table A–2 at 0100:
1000100001101111.

iii.
The third-right
4-bit unit is assigned to P_{3}; we read Table A–3 at 0111:
1000011111001010.

iv.
The
fourth-right 4-bit unit is assigned to P_{4}; we read Table A–4 at 1100:
0111011001100001.

e) XOR the four values from step d):

i.
1111011100110010

ii.
1000100001101111

iii.
1000011111001010

iv.
__0111011001100001__

1000111011110110

f)
Shift the bits
in T, four 4-bit units (i.e., 16 bits total) to the left, losing the leftmost
16 bits and filling the rightmost 16 bits with zeroes.

g) XOR the values from e) and f):
1000111011110110; assign this value to T. (It is the transform of “ABCD”).

4) Observe there is no more input and exit.

While our example
reads only four four-bit units at a time, the NeoCore icon generator would
typically read four bytes at a time. Reading four bytes at a time is not four
times faster than reading one byte at a time because (1) there is some
additional processing to do the extra XOR operation, because (2) the last set
of bytes would contain the full four bytes only 25 percent of the time, and
because (3) some of the specific steps (like reading a table) have to be done
four times if we are processing four bytes. But we have made significant
gains—if only in loop control—processing 32 bits at a time instead of one bit
at a time (as we did when we were near the beginning of the chapter). Note that
NeoCore has a patent application covering the technique of using multiple
tables to generate a transform.

Sometimes we can
avoid much of the work of generating a transform by doing some mathematical
calculations on existing transforms.

One formula which is useful is:

R_{AB} = R(x^{z}R_{A})
Å R_{B}

where
“R_{A}”, “R_{B}”, and “R_{AB}” refer to the remainders
(i.e., the transforms) of strings “A”, “B”, and “AB” respectively; where “Å”
indicates a mod 2 addition (i.e., an XOR); where “Z” refers to the number of
bits in string “B”; where “x^{z}R_{A}” refers to a string
consisting of “R_{A}” followed by “Z” zeroes; and where “R(x^{z}R_{A})”
is the transform of x^{z}R_{A}.

Let’s illustrate the use of this formula,
using our 16-bit transforms, our same divisor (11010010100110011), and __Table A-1__. Suppose we have a
quite lengthy string A and we have already computed the transform of A: assume
it is 0111101110011001. Now suppose for some reason we need to append “0001” to
the end of string A and compute the new transform. The brute force inclination
is to generate from scratch the transform of “A0001”. Since “A” is assumed to
be a lengthy string, we have a significant task in front of us.

Instead of generating the transform from
scratch, we can do the following:

1)
Compute the
transform for 0001. From __Table A-1__, we know it is
1010010100110011. This is the value of “R_{B}” in the formula
above.

2)
Compute the
value of “R(x^{z}R_{A})”.

a.
We are assuming that R_{A} is 0111101110011001.

b.
Using the
techniques explained in previous sections of this chapter, we consider
0111101110011001 to be the current value of the transform, T. We want to read x^{z}
(i.e., “0000”) as the next value in the input stream. Take the leftmost 4 bits
of T, 0111, and read the corresponding value in Table
A-1: 0011000111111111.
Shift T left 4 bits: 1011100110010000. XOR those previous two values, getting
1000100001101111.

3)
XOR the values
from 1) and 2): 0010110101011100. This is the transform for string AB.

Notice in 1) through 3), we never did
consider the content of string A. In fact we don’t know what it is—only that
its transform is 0111101110011001.

As another example of icon algebra, suppose
we know the transform for string AB, the transform for string B, and the length
of string B; further suppose we want to get the transform for string A. By
using the formula above, we can subtract R_{B} from R_{AB}
getting R(x^{z}R_{A}). If the length of B is, for example, four
bits, then R(x^{z}R_{A}) represents the transform of A0000,
i.e., the transform of the string obtained when four zeroes are appended to
string A. Now if we had some way of calculating the transform of A from the
transform of A0000, we could solve our problem without calculating the
transform of A from scratch—in fact without even looking at the content of
string A. Fortunately, this can be done fairly easily; let’s perform such a
calculation using our simplified 16-bit transforms. We will again use Table A-1. This time we will also
use Table A-5, a “Reverse
Look-Up Table.”

Suppose
we have a transform for string AB: 0010110101011100.
Suppose we also have a transform for string B: 1010010100110011. Further suppose
that string B is 4 bits in length. We want to compute the transform for string
A. Here’s how we do it:

1)
Subtract the
transform for string B from the transform for string AB (this is identical to
an XOR): 1000100001101111. This is the transform for the string A0000 (A with
four appended zeroes).

2)
From step 1),
read the rightmost four bits from the transform for string A0000: 1111.

3) Using the 1111 from step 2) as our index, access Table A-5, the Reverse Transform Look-Up Table; the value we read is 0111. Note that we will use this value, 0111, twice; once at step 4) and once at step 6).

4) Using the 0111 value from step 3), access Table A-1, Transform Look-up table. Get the value, 0011000111111111.

5) XOR the transform for string AB (from step 1) ) and the value from Table A-1 (from step 4) ): 1011100110010000. Assign this value to T.

6) From step 3), take the value 0111, move it into the leftmost 4 bits of T, shifting the previous 16 bits of T four bits to the right, losing the rightmost four bits: 0111101110011001. The is the transform of A.

7)
Note that if
string B is longer than four bits, we will have to execute steps 2) through 6)
repeatedly, four bits on each loop.

The NeoCore icon generator of course will
work with a full byte at a time instead of our four bits; it will have
256-entry tables instead of our 16, and it will generate 64-bit icons instead
of our 16-bit transforms. We should expect that implementation details will
also vary. Further, we should expect that the Icon Generator has been designed
to use additional techniques of icon algebra besides the ones we just
explained. But the basic principles of icon generation explained in this
chapter have been implemented in the NeoCore Icon Generator: the Icon Generator
uses multiple tables to generate icons quickly; furthermore, it provides a
great deal of flexibility in doing icon algebra to calculate new icons from
previously existing icons.

In future chapters, we will see how the
NeoCore native XML database uses the icons in its indexing and searching.

[1] We will assume that NeoCore icons are 64 bits in length. (To keep the math simple, we will often provide specific illustrations using our own 16-bit icons.) But in principle, they can be any length desired. I suspect they will be longer when NeoCore begins supporting huge databases.

[2] A frequent
use of this technique is to generate *cyclic redundancy code,* commonly
used as a checksum in determining whether a file has been corrupted, e.g.,
because of transmission errors, viruses, etc. One common version (CRC-32),
generates a 32-bit transform. When used as a checksum, the probability of
failing to detect a corrupted file is (½)^{32}.

[3] We are going
to scratch the surface, ignoring virtually all the theoretical math. For a more
complete treatment, see Rudolf Lidl and Harald Niederreiter, *Introduction
to Finite Fields and Their Applications* (Cambridge University Press,
1994).

[4] There are several variations for preconditioning the dividend. Common CRC algorithms append zeroes, and that’s the way I present it in this section. NeoCore uses a different variation not disclosed in the patent materials, but functionally the two ways would be the same.