Chapter 9. Conclusion
Our project was to evaluate the NeoCore XMS core storage architecture. Because so much of the architecture is patented and hence part of the public record, we were able to understand and explain in some depth how the database is put together. In addition, we conducted several weeks of hands-on-the-keyboard testing.
Essentially, we were able to verify that the claims on the NeoCore website concerning the excellence of its core database are not just marketing hype: the product is based on a creative and yet fundamentally sound architecture. We did not evaluate the broader management system which is built around the core database storage.
The architecture of the NeoCore XMS deals effectively with the extensibility of XML. It provides a storage methodology that accepts changing context as easily as traditional databases accept changing data. As the tag structures of incoming XML documents change, the database storage structures do not. The tags and attribute names contained in XML documents are stored in one database structure; data items and attribute values are stored in another. This is done without extensive structural analysis and without the assistance of DTDs, DOCTYPE declarations, or any other external structure definition; the system uses only the actual XML tagset. A separate storage structure contains a document map that enables the system to quickly find a document’s tags and data and to reconstruct the original document.
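The separation described above can be illustrated with a minimal Python sketch. It is a toy model, not NeoCore's implementation: the class name TinyXmlStore, its fields, and the nested-tuple document map are all hypothetical and chosen only for illustration. Names go in one list, values in another, and the document map holds only integer positions, so a document with a new tag adds a name but leaves the structures unchanged.

```python
import xml.etree.ElementTree as ET

class TinyXmlStore:
    """Toy store (hypothetical): names, values, and document maps are kept apart."""

    def __init__(self):
        self.names = []       # tag and attribute names, each stored once
        self.name_ids = {}    # name -> position in self.names
        self.values = []      # data items and attribute values, each stored once
        self.value_ids = {}   # value -> position in self.values
        self.doc_maps = {}    # doc_id -> nested tuples of integer ids only

    def _intern(self, item, table, ids):
        # Store the string once and hand back its position.
        if item not in ids:
            ids[item] = len(table)
            table.append(item)
        return ids[item]

    def _map_element(self, elem):
        name_id = self._intern(elem.tag, self.names, self.name_ids)
        attrs = [(self._intern(k, self.names, self.name_ids),
                  self._intern(v, self.values, self.value_ids))
                 for k, v in elem.attrib.items()]
        # Surrounding white space and mixed-content tails are dropped in this toy.
        text = (elem.text or "").strip()
        text_id = self._intern(text, self.values, self.value_ids) if text else None
        children = [self._map_element(child) for child in elem]
        return (name_id, attrs, text_id, children)

    def store(self, doc_id, xml_text):
        # No DTD or schema is consulted; only the tags actually present are used.
        self.doc_maps[doc_id] = self._map_element(ET.fromstring(xml_text))

    def _build(self, node):
        name_id, attrs, text_id, children = node
        elem = ET.Element(self.names[name_id],
                          {self.names[k]: self.values[v] for k, v in attrs})
        if text_id is not None:
            elem.text = self.values[text_id]
        elem.extend([self._build(child) for child in children])
        return elem

    def reconstruct(self, doc_id):
        # Walk the document map and look every id up again.
        return ET.tostring(self._build(self.doc_maps[doc_id]), encoding="unicode")

store = TinyXmlStore()
store.store("a", '<order id="7"><item>bolt</item></order>')
# The second document introduces a new tag; no storage structure has to change.
store.store("b", '<order id="8"><item>bolt</item><note>rush</note></order>')
print(store.reconstruct("b"))
```

Because the toy strips white space around text, it shares the data-centric bias discussed later in this chapter; an exact round trip would require the document map to record white space as well.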
The system is able to efficiently store, update, query, insert, and delete both data and tagsets. (Most of the XPath/XQuery syntax is supported; it appears each version will support more.)
XOO7 Benchmark XML documents with structures similar to those used in common business-to-business applications were stored on our limited testing system in five or six seconds per megabyte. Text-heavy XML documents with simpler tag structures were stored in about four seconds per megabyte. (The test system ran Windows XP Professional on a Pentium IV 3.0 GHz with 1.5 GB of RAM.)
Fourteen of the 19 XOO7 Benchmark queries completed in less than one second, most of those in less than a quarter of a second. Two took two seconds; the other three took about 4, 10, and 18 seconds. The indexing structures allow searches to be conducted with binary numbers and pointers rather than with strings. This is one of the features that allow the queries to complete so quickly.
Each action by the database is initiated by transforming the strings to be stored or queried into long binary numbers which NeoCore calls “icons”. NeoCore is able to do this with great speed by using a patented algorithm that uses four tables to transform four bytes at a time. We evaluated and tested the algorithm; it is indeed fast.
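To give a feel for what "four tables, four bytes at a time" can look like, here is a minimal Python sketch. It is not the patented NeoCore algorithm, whose details we do not reproduce here. As a stand-in it uses a publicly known technique of the same general shape, the "slicing-by-4" form of CRC-32, and it yields a 32-bit value rather than a long icon. The function and variable names are our own.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial, the same one zlib uses

def build_tables():
    """Build four 256-entry tables so four input bytes can be folded in per step."""
    t0 = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        t0.append(crc)
    tables = [t0]
    for k in range(1, 4):
        prev = tables[k - 1]
        # Table k gives the effect of a byte that sits k positions further back.
        tables.append([(prev[i] >> 8) ^ t0[prev[i] & 0xFF] for i in range(256)])
    return tables

TABLES = build_tables()

def hash_four_at_a_time(data: bytes) -> int:
    """CRC-32 of data, consuming four bytes per loop step with four table lookups."""
    t0, t1, t2, t3 = TABLES
    crc = 0xFFFFFFFF
    full = len(data) - len(data) % 4
    for i in range(0, full, 4):
        crc ^= int.from_bytes(data[i:i + 4], "little")
        crc = (t3[crc & 0xFF] ^ t2[(crc >> 8) & 0xFF]
               ^ t1[(crc >> 16) & 0xFF] ^ t0[crc >> 24])
    # Any one to three leftover bytes are handled one at a time.
    for byte in data[full:]:
        crc = (crc >> 8) ^ t0[(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF

sample = b"<price>19.95</price>"
print(hex(hash_four_at_a_time(sample)), hex(zlib.crc32(sample)))
```

The two printed values should agree, since the table-driven version computes the same function as zlib.crc32. The point of the technique is that the inner loop does four lookups and a few XORs per four bytes instead of processing the input bit by bit or byte by byte.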
Another NeoCore patent provides a structure for associative indices such that a relatively modest-sized index (we frequently used indices with as few as 2^12 entry locations) can handle 18 × 10^18 different entry keys. Further, even with a full index, it takes only an average of 1.5 table lookups to find the desired row; the collision problem of traditional hash tables has been effectively solved. In testing a core index, we were unable to detect any consistently measurable difference between the time it took to find an item in a near-empty index and the time it took to find an item when the index was completely full.
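The figure of about 1.5 lookups at full load can be made plausible with a small simulation. The sketch below does not reproduce the patented NeoCore index. It uses ordinary separate chaining, a textbook technique, under three assumptions of our own: an index of 2^12 (4,096) entry locations, 64-bit keys (2^64 is roughly 18 × 10^18), and uniformly random keys. For chaining, the expected cost of a successful search is about 1 + load/2, which is 1.5 when there is one key per entry location.

```python
import random

SLOTS = 2 ** 12   # assumed index size: 4,096 entry locations
KEY_BITS = 64     # assumed key width: 2**64 is roughly 18 x 10**18 possible keys

def build_index(keys):
    """Place each key in the bucket chosen by its low 12 bits; collisions share a bucket."""
    buckets = [[] for _ in range(SLOTS)]
    for key in keys:
        buckets[key & (SLOTS - 1)].append(key)
    return buckets

def lookups_needed(buckets, key):
    """Count how many stored entries are examined before the key is found."""
    bucket = buckets[key & (SLOTS - 1)]
    return bucket.index(key) + 1

rng = random.Random(2024)  # fixed seed so the run is repeatable
keys = [rng.getrandbits(KEY_BITS) for _ in range(SLOTS)]  # one key per slot: a "full" index
buckets = build_index(keys)
average = sum(lookups_needed(buckets, k) for k in keys) / len(keys)
print(f"average lookups at full load: {average:.2f}")
```

The average depends on the load and on how evenly the keys spread, not on how large the key space is, which is why a small index can serve a very large number of possible keys.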
The NeoCore XMS is not currently configured to handle huge databases effectively. NeoCore indicates this capability is planned for a future release.
It is critical to operate the NeoCore database with plenty of RAM. Ideally, all indexes would be held in RAM at all times. There also needs to be plenty of RAM committed to full-time buffers—and that need increases as you get more and more clients trying to use the database at the same time. Performance deteriorates significantly if you try to run the system with insufficient RAM.
The XMS handles data-centric XML well. Its lack of attention to white-space details could cause some awkwardness when storing document-centric XML, such as newspaper archives or a poetry database. In such cases you would probably want to store the documents as explicit CDATA. Nothing in the architecture forces white-space problems; NeoCore has simply not emphasized white-space accuracy in the past because of its data-centric customer base. The NeoCore architecture could support 100% accurate round-tripping of all documents, whereas many competitors could not. We recommend that NeoCore work to provide that functionality.
In some ways the NeoCore XMS is still a work in progress. We saw first-hand that some of the anticipated improvements are indeed needed: more efficient duplicate indices, a convenient method for resizing the core indices when we need to “grow” the database, and more efficient searches for unindexed data substrings.
Our testing revealed the limitations or issues noted in the previous four paragraphs. But we shouldn’t finish this report with those issues, because they are relatively minor problems compared to the overwhelming strengths of the NeoCore XMS. If we could summarize all our findings in one sentence, it would be this: “The NeoCore XMS offers an excellent solution to the database problems associated with the extensibility of XML.”