<< Chapter_3_Flattening

Index of HTML Docs

Chapter_5_Confirmers >>

 

 

 

Chapter 4.  Icon Generation

 

Icons and the NeoCore XML Database

Transforms, Remainders, and Modulo 2 Division

Implementing in Hardware

A Software-Based Transform Generator

The NeoCore Multiple-Table Implementation

Icon Algebra

 

 

Material in this chapter is taken from the following materials:

a.      US Patent 4,527,239, July 2, 1985, “Digital data storage methods and apparatus”

b.      US Patent 5,942,002, August 24, 1999, “Method and apparatus for generating a transform”

c.      US Patent 6,324,636, November 27, 2001, “Memory management system and method”

d.      US Patent Application 20020038413, March 28, 2002, “Memory management system and method”

e.      US Patent Application 20020049922, April 25, 2002, “Search engine system and method”

f.        US Patent Application 20020053002, May 2, 2002, “System for associative processing”

g.      US Patent Application 20020069232, June 6, 2002, “Method and system for generating a transform”

h.      Brandin, Christopher, NeoCore Technical Paper, “Optimized Coding Methods for Icon Generation and Manipulation in DPP,” Release 1.0, February 7, 2001

i.        Direen, Harry, Ph.D., and Phillips, Keith, Ph.D., NeoCore Technical Paper, “Finite Fields and Properties of the NeoCore Icon Generator, Associative Processing Unit, and Associative Memory Controller Used in Digital Pattern Processing,” Release 1.0, September 18, 2000.

j.         Williams, Ross, “A Paper on CRCs,” at http://www.repairfaq.org/filipg/LINK/F_crc_v3.html.

 

 

Icons and the NeoCore XML Database

 

An icon is any object which is used to represent or symbolize another object. So in religious art an “icon” is a painting which represents a holy person. The little miniature pictures on the Windows desktop are called icons; they represent the applications that are called by clicking on them.

 

In the NeoCore lexicon, an “icon” (also called a “transform,” “hash,” or ”polynomial code”) is a binary string which is used to represent another string. The original string, perhaps a word or phrase to be searched for in the database, is first “transformed” into a icon of, say[1], 64 bits. There are advantages to be gained by using the icon instead of the original string: (1) the icon is of a fixed length, (2) it is usually shorter than the original string, and (3) it can be mathematically manipulated. We will see in future chapters how NeoCore makes extensive use of these features of icons in its indexing and searching methodologies.

 

If we use 64-bit binary strings for our icons, we have 1.84 times 1019 different icons available. For each input string, the icon generator will compute the specific icon which will be used to represent that string. We have to be careful designing the icon generator. At the extreme of foolishness, we might design an icon generator which, for all input strings, always generates the same icon. It takes a fair amount of sophisticated mathematics to create an icon generator which provides a maximum of useful features. Here are some features we would want to design into an icon generator for a native XML database:

 

a.      Absolutely essential is an icon generator which spreads the icons evenly across all the 18(10)18 that are available. That is, we don’t want the binary equivalent of 35 million to be the output icon 17 times more frequently than, say, the binary equivalent of 14 quadrillion. The odds that any particular icon will be generated need to be identical to the odds for every other icon.

b.      We don’t want similar input strings to generate similar output strings. That is, if “Jean,” “Jane,” “John,” “Jon,” “Jeanne,” “June,” and “Joan” are inputs to the icon generator, we want the output icons to have no relation to each other. If sequential arguments are input, the resulting icons should spread all over the available values.

c.      The icon generator must be fast: some tasks could require the generation (or computation) of hundreds of icons. Virtually every task the NeoCore database does will require the generation of at least one icon.

d.      The icon generator must handle input strings of variable length. (Note that when we allow input strings longer than 64 bits, we necessarily have some risk that two input strings will generate identical icons. Such “collisions” must be handled efficiently by the processes that rely on the icons. Traditional hash-based search algorithms, for example, get so bogged down by collision resolution that they are awkward at best for even medium-sized databases.)

e.      If the input strings are 64 bits or shorter, every possible input string should produce a different icon. If we input all possible 18(10)18 64-bit strings into the icon generator, we want to get out all possible 18(10)18 64-bit strings—except the sequence of output icons will be different from the sequence of input strings.

f.        If we want to use sub-strings of the 64 bits (for example, if we want to use the first 20 bits for one purpose and the remaining 44 bits for another purpose), the features of items a – e, above, should hold for the sub-strings also, mutatis mutandis.

g.      The creation of icons should be based on a mathematical system such that we can mathematically manipulate the icons. For example, if we know the icon generated by “Phone_book>entry>name>first>” and if we know the icon generated by “John,” then we should be able to mathematically compute—without generating a new icon from scratch—the icon for “Phone_book>entry>name>first>John”.

 

Return to Top

 

 

Transforms, Remainders, and Modulo 2 Division

 

It has become common in the software industry to use polynomial division mod 2 to generate transforms. The input string is used to create the dividend, and a carefully selected number is used as the divisor. The quotient is irrelevant to the transform generator; instead, it returns the remainder as the transform for the given input string.[2]

 

Before we explain the specific workings of the NeoCore icon generator, we need to have a basic understanding of polynomial division mod 2.[3]

1)     A binary number, perhaps 10010, is interpreted as a polynomial by using the individual bits as coefficients:

 

          1x4 + 0x3 + 0x2 +1x1 + 0x0.

 

            For another example, 01000001 would be interpreted as:

 

          0x7 + 1x6 + 0x5 + 0x4 + 0x3 + 0x2 + 0x1 + 1x0.

 

            And 1011 would be interpreted as:

 

          1x3 + 0x2 + 1x1 + 1x0.

2)     In adding two polynomials—just as we learned in Junior High School—we simply add the coefficients of those terms which have the same order: 10010 plus 01000001 is done as follows:

 

                       1x4 + 0x3 + 0x2 + 1x1 + 0x0

    + 0x7 + 1x6 + 0x5 + 0x4 + 0x3 + 0x2 + 0x1 + 1x0

                0x7 + 1x6 + 0x5 + 1x4 + 0x3 + 0x2 + 1x1 + 1x0

3)     In doing the addition in 2), we do not do carries; we use modulo 2 arithmetic instead, where 0+0=0, 0+1=1, 1+0=1, and 1+1=0. Thus 1011 plus 01000001 is done as follows:

 

                                                                        1x3 + 0x2 + 1x1 + 1x0

           + 0x7 + 1x6 + 0x5 + 0x4 + 0x3 + 0x2 + 0x1 + 1x0

                0x7 + 1x6 + 0x5 + 0x4 + 0x3 + 0x2 + 1x1 + 0x0

4)     Similarly, in doing subtraction, we use modulo 2 arithmetic, where 0-0=0, 0-1=1, 1-0=1, and 1-1=0. But this means the subtraction function is absolutely identical to the addition function. If you need to subtract one number from another in polynomial mod 2 arithmetic, simply add them. And, in fact, the addition and the subtraction functions are identical to the XOR function; to add or subtract in polynomial mod 2 arithmetic, simply use the XOR function, which is available in the common programming languages.

5)     For the sake of physical brevity, in this paper from this point on, we will drop the explicit presentation of the polynomials, writing “1011” instead of “1x3 + 0x2 + 1x1 + 1x0” and “10010” instead of ”1x4 + 0x3 + 0x2 + 1x1 + 0x0”. Thus we will write our polynomial mod 2 arithmetic as follows:

 

         1100               1100                1010               1010               1000               1000

       +1010             -1010              +1100             -1100             +1001              -1001

         0110               0110                0110               0110               0001               0001

 

Just remember that even though these look like binary numbers and even though we might actually program these operations using a simple XOR, in principle they are polynomials. If this were not so, there would be no theoretical justification for some of the mathematical manipulations we will do later. There would be no proofs that NeoCore icons satisfy criteria a–g above. Further without the theory, NeoCore probably would not have discovered many of the efficiencies that are built into the system.

6)     Since addition and subtraction yield the same results in polynomial mod 2 arithmetic, our normal concepts of ordering by and large must be abandoned. That is, since 1000 minus 1001 equals 0001 and since 1000 plus 1001 also equals 0001, we cannot say that 1001 is larger than, or smaller than, 1000. If any two binary numbers are the same length and both begin with a “1” then neither is to be considered larger than the other in doing the polynomial mod 2 arithmetic. If, however, we drop all leading zeroes and the two binary numbers are then of different length, then the longer of the two numbers is considered larger than the shorter. This is relevant in doing mod 2 division:

 

1          110

2                   101)11100

3                       101

4                        100

5                        101

6                         010

7                         000

8                          10

 

Notice that at lines 4-6 of this division example, 100 divided by 101 equals 1 with a remainder of 1. In “normal” arithmetic the answer would be 0 with a remainder of 100, because in “normal” arithmetic 101 is larger than 100. But because in our “transform” arithmetic 101 is not larger than 100, we get the answer indicated.

 

With that introduction to polynomial mod 2 arithmetic, we have laid the foundation for generating transforms. Suppose you want to create a transform for a particular binary string; we will call the binary string the input to the transform generator. Within the transform generator we are going to do a polynomial mod 2 division. The divisor for the polynomial mod 2 division is a carefully selected binary number that is fixed for that particular transform generator. If you want the generated transform to have n bits, the length of the divisor will be n + 1 bits. The dividend for the polynomial mod 2 division is the input string augmented by appending n zeroes.[4] In the transform generator the quotient of the polynomial mod 2 division is ignored. The remainder of the polynomial mod 2 division is returned as the output of the transform generator. The remainder becomes the transform for the given input string, using that particular transform generator.

 

If, in a transform generator, the single binary digit “1” were to be used as the divisor, the polynomial mod 2 division remainder would always be zero; therefore, the transform for every input string would be zero. Such a transform generator would be useless. The point is that not just any number can be used as the divisor in the transform generator. Some divisors work better than others in generating transforms that meet an application’s needs—in our case features “a” through “g” above. In brief, if you want a 64-bit transform, you need to use a 65-bit divisor which starts with a binary “1” and ends with a binary “1”, which contains several additional “1” values, and which is a prime number. The math needed to figure out suitable divisors—and prove they satisfy stated criteria—is based on finite fields and polynomial math. While we do not need to examine such theoretical foundations of the NeoCore icon generator, we do want to understand how the icon generator works in practice, since NeoCore’s search techniques depend on the use of, and mathematical manipulation of, icons.

 

Let’s manually do some simplified examples. To keep the arithmetic manageable, we will generate only 16-bit transforms using 1A533 (hex) as the divisor. In the first example, we will have the entire input be the letter “A” (ASCII 01000001). To form the divisor, we augment the input by appending sixteen zeroes.

 

                                       1000001

    11010010100110011)010000010000000000000000

                      11010010100110011......

                       10100001001100110.....

                       11010010100110011.....

                        11100111010101000....

                        11010010100110011....

                         01101011100110110...

                         00000000000000000...

                          11010111001101100..

                          11010010100110011..

                           00001011010111110.

                           00000000000000000.

                            00010110101111100

                            00000000000000000

                             0010110101111100

 

The remainder is 0010110101111100; that is the transform of “A”, using this particular generator.

 

Now let’s do the same thing, using “B” as the input. ASCII is 01000010

 

                                         1001111

     11010010100110011) 010000100000000000000000

                        11010010100110011......

                         10101101001100110.....

                         11010010100110011.....

                          11111111010101010....

                          11010010100110011....

                           01011011100110010...

                           00000000000000000...

                            10110111001100100..

                            11010010100110011..

                             11001011010101110.

                             11010010100110011.

                              00110011100111010

                              00000000000000000

                               0110011100111010

 

The remainder is 0110011100111010; that is the transform of “B”, using this generator with a divisor of 11010010100110011 and appending 16 zeroes to form the dividend.

 

Return to Top

 

 

Implementing in Hardware

 

One of the advantages of the mod 2 arithmetic is that it can be done relatively quickly in hardware, using linear feedback shift registers and XOR gates. For example, standard Ethernet cards include the hardware to implement the CRC‑32 checksum functionality.

 

Because everything is done in registers and XOR gates, each individual action takes place very quickly in a hardware-implemented transform generator. However, because the processing is done one bit at a time, the standard hardware-implemented algorithm is not the fastest possible. (On a standard Ethernet card, the time to compute the CRC code is irrelevant on the receive side: ordinary transmission delays mean the bits are arriving at a slower pace than the checksum is being computed.)

 

The other obvious problem with hardware-implemented transform generators is that we normally don’t have such specialized hardware available on our general-purpose computers. So, except when we are using specialized cards, we are normally going to implement the transform generator in software.

 

Return to Top

 

 

A Software-Based Transform Generator

 

Conceptually, it seems simple to develop a software-based transform generator to do the mod 2 division directly. It would work like this, for our 16-bit transform using “11010010100110011” as the divisor:

a.      Create a null 16-bit binary “remainder.”

b.      Copy the leftmost bit of the augmented input string into the rightmost bit of the remainder, shifting the previous bits in the remainder left. One bit is dropped off the left of the remainder.

c.      If the bit that dropped off the left of the remainder was a “1” do the following:

remainder := {remainder XOR 1010010100110011}

d.      If more bits remain in the augmented input string, loop to b.

 

Practically, this approach is not as efficient as we would like. For one thing, the loop must be executed once for every bit in the augmented input stream. Further, each loop is slower in the software implementation than in the hardware implementation. Instead, the standard approach is to read a byte at a time from the input. The mod 2 arithmetic is precalculated and held in a 256-element table. There are 256 elements because that is the number of values that can be represented by an 8-bit byte. In such an approach the input is not augmented by the n bits; the extra zeroes or ones are precalculated into the 256-element table. One algorithm would work like this:

 

1)     Create a null n-byte transform, T.

2)     Create an 8-bit pointer, P, into the pre-calculated 256-element Table.

3)     Read the next byte in the input stream. Call it “i” for “input.”

4)     Read the leftmost (most significant) byte in T. Call it “TL”.

5)     P := (( i XOR TL ))

6)     Read the n-byte value, V, contained at Table[P].

7)     Shift T left one byte, losing what had been TL and filling the rightmost byte with zeroes.

8)     T := ( T XOR V).

9)     If there are bytes remaining in the input, then loop back to 3).

 

Using C, steps 3) – 8) are typically combined into one (complex) command. Nonetheless, while there is a single C command, the actions indicated in 3) through 8) must all be done. The efficiency gained is because we are processing eight bits at a time using a table.

 

Let’s illustrate how this is done, using 16-bit transforms to keep our computations relatively simple. We will read 4 bits at a time, rather than the typical 8-bit byte; this will permit us to have a simplified 16-element table (See Appendix A, Table A-1) rather than the standard 256-element table. The table is created with 11010010100110011 as the divisor and with the input augmented by 16 zeroes; other variations would require their own tables.

 

Now let’s calculate the transform for “A” (01000001), using the table.

1.      Initialize the transform, T, to 0000000000000000 and the Table Pointer, P, to 0000.

2.      Process the first 4 bits:

a.      Read the four left bits, TL, of the transform T: 0000.

b.      Read i, the next four bits in the input: 0100.

c.      P := TL XOR i = 0000 XOR 0100 = 0100.

d.      Read V, the value at Table[P], namely 0111101110011001.

e.      Shift the transform register 4 places to the left, losing the four leftmost bits and inserting four zeroes into the four rightmost bits. (Because the transform register contained 16 zeroes, there is no change as a result of this step, for this particular iteration.)

f.        XOR the contents of the transform register, T, and V, the value read from the table. Assign this new value to T
     T := 0000000000000000 XOR 0111101110011001
     T := 0111101110011001

3.      Process the second four bits:

a.      Read the four left bits, TL, in the transform register: 0111.

b.      Read i, the next 4 bits in the input: 0001.

c.      P := TL XOR i = 0111 XOR 0001 = 0110.

d.      Read V, the value at Table[P], namely 1001010011001100.

e.      Shift the transform register 4 places to the left, losing the four leftmost bits and inserting four zeroes into the four rightmost bits:  1011100110010000

f.        XOR the contents of the transform register, T, and V, the value read from the table. Assign this new value to T
     T := 1011100110010000 XOR 1001010011001100
     T := 0010110101011100

4.      The input is now empty; exit.

 

This example seems a bit contrived because we are reading only four bits at a time and we have only a 16-bit transform and we had only an eight-bit input. In a more typical transform generator, we would read a byte at a time and be generating a 32-bit transform. (As we will see shortly, NeoCore reads four bytes at a time and generates a 64-bit transform.)

 

If we process one byte instead of one bit at a time, we reduce the number of loops we have to execute by a factor of 8.

 

Return to Top

 

 

The NeoCore Multiple-Table Implementation

 

We could speed up the process of generating an icon if instead of reading one byte at a time, we could read two bytes, or three, or more. The problem is that if we used the identical technique except to read two bytes instead of one, it would take some 65 thousand table entries instead of 256. And this then would take a lot of resources, especially since we want to keep the tables in RAM for fast access. The large numbers that would be needed if we were to read four bytes at a time and generate 64-bit icons would make such an approach extremely awkward with current technology. You would need 256 gigabytes of space for such a table.

 

NeoCore gets around this problem by using multiple tables. For example, if we were to use two 256-entry tables, and XOR their outputs, we would have the effect of a single 65,536-entry table (2562 = 65,536). And if we used four 256-entry tables, we would have the equivalent of a single table with 4,294,967,296 entries. Four 256-entry tables means we have provided 1024 entry points; by XORing the four lookups, the 1024 entry points have the effect of 4,294,967,296 entry points.

 

Suppose we were processing four bytes at a time. Further suppose the next four input bytes are CCDDEEFF (hex). We would look up CC in a “triple-shifted” table, DD in a “double-shifted” table, EE in a “shifted” table, and FF in a standard transform look-up table.

 

In computing the table values, we would calculate the triple-shifted table’s 256 values based on the transforms for 00000000 (hex), 01000000, 02000000, 03000000, ... , FC000000, FD000000, FE000000, FF000000. The double-shifted table would have its 256 values based on transforms for 000000, 010000, 020000, 030000, ... , FD0000, FE0000, FF0000. The shifted table would have its 256 values based on transforms for 0000, 0100, 0200, ... , FD00, FE00, FF00. Finally, the standard transform look-up table has its 256 values based on transforms for 00, 01, 02, ... , FD, FE, FF.

 

Let’s illustrate the use of multiple tables by using our own 16-bit transforms, reading four 4-bit units at a time, and accessing four 16-element tables. The look-up tables are found in Appendix A. For this example, we will be using tables A‑1 through A‑4. Our input string will be “ABCD”, i.e., 01000001010000100100001101000100.

 

1)     Initialize T with 16 zeroes; initialize P1 through P4 for tables one through four.

2)     Process the first four four-bit units:

a)     Read the leftmost four four-bit units in T: 0000000000000000.

b)     Read the next four four-bit units from the input: 0100000101000010.

c)      XOR the results of a) and b): 0100000101000010.

d)     From the value obtained at step c) read the appropriate values from the four tables:

                                                                          i.      The rightmost 4-bit unit is assigned to P1; we read Table A–1 at 0010: 1110111101010101.

                                                                        ii.      The second-right 4-bit unit is assigned to P2; we read Table A–2 at 0100: 1000100001101111.

                                                                      iii.      The third-right 4-bit unit is assigned to P3; we read Table A–3 at 0001: 1100111011101001.

                                                                       iv.      The fourth-right 4-bit unit is assigned to P4; we read Table A–4 at 0100: 0010110111011111.

e)     XOR the four values from step d):

                                                                          i.      1110111101010101

                                                                        ii.      1000100001101111

                                                                      iii.      1100111011101001

                                                                       iv.      0010110111011111

   1000010000001100

f)        Shift the bits in T, four 4-bit units (i.e., 16 bits total) to the left, losing the leftmost 16 bits and filling the rightmost 16 bits with zeroes: 0000000000000000. (Note: since we are using 16-bit transforms and reading 16 bits at a time, we could skip steps f) and g) and merely assign the result of e) to the current value of the transform; however, conceptually, we need to include steps f) and g), and we certainly have to include them if our transform is larger than the number of bits we are reading at a time.)

g)     XOR the values from e) and f):  1000010000001100; assign this value to T.

3)     Process the next four four-bit units

a)     Read the leftmost four four-bit units in T: 1000010000001100.

b)     Read the next four four-bit units in from the input: 0100001101000100.

c)      XOR the results of a) and b): 1100011101001000.

d)     From the value obtained at step c) read the appropriate values from the four tables:

                                                                          i.      The rightmost 4-bit unit is assigned to P1; we read Table A–1 at 1000: 1111011100110010.

                                                                        ii.      The second-right 4-bit unit is assigned to P2; we read Table A–2 at 0100: 1000100001101111.

                                                                      iii.      The third-right 4-bit unit is assigned to P3; we read Table A–3 at 0111: 1000011111001010.

                                                                       iv.      The fourth-right 4-bit unit is assigned to P4; we read Table A–4 at 1100: 0111011001100001.

e)     XOR the four values from step d):

                                                                          i.      1111011100110010

                                                                        ii.      1000100001101111

                                                                      iii.      1000011111001010

                                                                       iv.      0111011001100001

   1000111011110110

f)        Shift the bits in T, four 4-bit units (i.e., 16 bits total) to the left, losing the leftmost 16 bits and filling the rightmost 16 bits with zeroes.

g)     XOR the values from e) and f): 1000111011110110; assign this value to T. (It is the transform of “ABCD”).

4)     Observe there is no more input and exit.

 

While our example reads only four four-bit units at a time, the NeoCore icon generator would typically read four bytes at a time. Reading four bytes at a time is not four times faster than reading one byte at a time because (1) there is some additional processing to do the extra XOR operation, because (2) the last set of bytes would contain the full four bytes only 25 percent of the time, and because (3) some of the specific steps (like reading a table) have to be done four times if we are processing four bytes. But we have made significant gains—if only in loop control—processing 32 bits at a time instead of one bit at a time (as we did when we were near the beginning of the chapter). Note that NeoCore has a patent application covering the technique of using multiple tables to generate a transform.

 

Return to Top

 

 

Icon Algebra

 

Sometimes we can avoid much of the work of generating a transform by doing some mathematical calculations on existing transforms.

 

One formula which is useful is: 

                        RAB = R(xzRA) Å RB

where “RA”, “RB”, and “RAB” refer to the remainders (i.e., the transforms) of strings “A”, “B”, and “AB” respectively; where “Å” indicates a mod 2 addition (i.e., an XOR); where “Z” refers to the number of bits in string “B”; where “xzRA” refers to a string consisting of “RA” followed by “Z” zeroes; and where “R(xzRA)” is the transform of xzRA.

 

Let’s illustrate the use of this formula, using our 16-bit transforms, our same divisor (11010010100110011), and Table A-1. Suppose we have a quite lengthy string A and we have already computed the transform of A: assume it is 0111101110011001. Now suppose for some reason we need to append “0001” to the end of string A and compute the new transform. The brute force inclination is to generate from scratch the transform of “A0001”. Since “A” is assumed to be a lengthy string, we have a significant task in front of us.

 

Instead of generating the transform from scratch, we can do the following:

1)     Compute the transform for 0001. From Table A-1, we know it is 1010010100110011. This is the value of “RB” in the formula above.

2)     Compute the value of “R(xzRA)”.

a.      We are assuming that RA is 0111101110011001.

b.      Using the techniques explained in previous sections of this chapter, we consider 0111101110011001 to be the current value of the transform, T. We want to read xz (i.e., “0000”) as the next value in the input stream. Take the leftmost 4 bits of T, 0111, and read the corresponding value in Table A-1: 0011000111111111. Shift T left 4 bits: 1011100110010000. XOR those previous two values, getting 1000100001101111.

3)     XOR the values from 1) and 2): 0010110101011100. This is the transform for string AB.

 

Notice in 1) through 3), we never did consider the content of string A. In fact we don’t know what it is—only that its transform is 0111101110011001.

 

As another example of icon algebra, suppose we know the transform for string AB, the transform for string B, and the length of string B; further suppose we want to get the transform for string A. By using the formula above, we can subtract RB from RAB getting R(xzRA). If the length of B is, for example, four bits, then R(xzRA) represents the transform of A0000, i.e., the transform of the string obtained when four zeroes are appended to string A. Now if we had some way of calculating the transform of A from the transform of A0000, we could solve our problem without calculating the transform of A from scratch—in fact without even looking at the content of string A. Fortunately, this can be done fairly easily; let’s perform such a calculation using our simplified 16-bit transforms. We will again use Table A-1. This time we will also use Table A-5, a “Reverse Look-Up Table.”

 

Suppose we have a transform for string AB: 0010110101011100. Suppose we also have a transform for string B: 1010010100110011. Further suppose that string B is 4 bits in length. We want to compute the transform for string A. Here’s how we do it:

1)     Subtract the transform for string B from the transform for string AB (this is identical to an XOR): 1000100001101111. This is the transform for the string A0000 (A with four appended zeroes).

2)     From step 1), read the rightmost four bits from the transform for string A0000: 1111.

3)     Using the 1111 from step 2) as our index, access Table A-5, the Reverse Transform Look-Up Table; the value we read is 0111. Note that we will use this value, 0111, twice; once at step 4) and once at step 6).

4)     Using the 0111 value from step 3), access Table A-1, Transform Look-up table. Get the value, 0011000111111111.

5)     XOR the transform for string AB (from step 1) ) and the value from Table A-1 (from step 4) ): 1011100110010000. Assign this value to T.

6)     From step 3), take the value 0111, move it into the leftmost 4 bits of T, shifting the previous 16 bits of T four bits to the right, losing the rightmost four bits: 0111101110011001. The is the transform of A.

7)     Note that if string B is longer than four bits, we will have to execute steps 2) through 6) repeatedly, four bits on each loop.

 

The NeoCore icon generator of course will work with a full byte at a time instead of our four bits; it will have 256-entry tables instead of our 16, and it will generate 64-bit icons instead of our 16-bit transforms. We should expect that implementation details will also vary. Further, we should expect that the Icon Generator has been designed to use additional techniques of icon algebra besides the ones we just explained. But the basic principles of icon generation explained in this chapter have been implemented in the NeoCore Icon Generator: the Icon Generator uses multiple tables to generate icons quickly; furthermore, it provides a great deal of flexibility in doing icon algebra to calculate new icons from previously existing icons.

 

In future chapters, we will see how the NeoCore native XML database uses the icons in its indexing and searching.

 

Return to Top

 

 

 

<< Chapter_3_Flattening

Index of HTML Docs

Chapter_5_Confirmers >>

 

 



[1] We will assume that NeoCore icons are 64 bits in length. (To keep the math simple, we will often provide specific illustrations using our own 16-bit icons.) But in principle, they can be any length desired. I suspect they will be longer when NeoCore begins supporting huge databases.

[2] A frequent use of this technique is to generate cyclic redundancy code, commonly used as a checksum in determining whether a file has been corrupted, e.g., because of transmission errors, viruses, etc. One common version (CRC-32), generates a 32-bit transform. When used as a checksum, the probability of failing to detect a corrupted file is (½)32.

[3] We are going to scratch the surface, ignoring virtually all the theoretical math. For a more complete treatment, see Rudolf Lidl and Harald Niederreiter, Introduction to Finite Fields and Their Applications (Cambridge University Press, 1994).

[4] There are several variations for preconditioning the dividend. Common CRC algorithms append zeroes, and  that’s the way I present it in this section. NeoCore uses a different variation not disclosed in the patent materials, but functionally the two ways would be the same.