APPENDIX I

TEST 9: Non-Indexed Text Searches

 

1. Here are the documents we stored initially. Each line gives the stored file name, its size in bytes, and its source path.

 

ALICE

wrk1001.xml     11424  ALICE/Alice1.xml

wrk1002.xml     11480  ALICE/Alice10.xml

wrk1003.xml     10502  ALICE/Alice11.xml

wrk1004.xml     11713  ALICE/Alice12.xml

wrk1005.xml     11002  ALICE/Alice2.xml

wrk1006.xml      9297  ALICE/Alice3.xml

wrk1007.xml     13983  ALICE/Alice4.xml

wrk1008.xml     11951  ALICE/Alice5.xml

wrk1009.xml     13946  ALICE/Alice6.xml

wrk1010.xml     12837  ALICE/Alice7.xml

wrk1011.xml     13796  ALICE/Alice8.xml

wrk1012.xml     12779  ALICE/Alice9.xml

wrk1013.xml      1341  ALICE/AliceProlog.xml

DIR_TOTAL:     146051

 

GLASS

wrk1014.xml     17337  GLASS/Glass1.xml

wrk1015.xml       416  GLASS/Glass10.xml

wrk1016.xml       127  GLASS/Glass11.xml

wrk1017.xml      3245  GLASS/Glass12.xml

wrk1018.xml     15632  GLASS/Glass2.xml

wrk1019.xml     15822  GLASS/Glass3.xml

wrk1020.xml     16346  GLASS/Glass4.xml

wrk1021.xml     17061  GLASS/Glass5.xml

wrk1022.xml     16945  GLASS/Glass6.xml

wrk1023.xml     14471  GLASS/Glass7.xml

wrk1024.xml     21398  GLASS/Glass8.xml

wrk1025.xml     22563  GLASS/Glass9.xml

wrk1026.xml      1243  GLASS/GlassPoem.xml

wrk1027.xml       695  GLASS/GlassPoem2.xml

wrk1028.xml      1043  GLASS/GlassPreface.xml

DIR_TOTAL:     164344

 

GROUND

wrk1029.xml     20642  GROUND/Ground1.xml

wrk1030.xml     16706  GROUND/Ground2.xml

wrk1031.xml     13728  GROUND/Ground3.xml

wrk1032.xml     17576  GROUND/Ground4.xml

DIR_TOTAL:     68652

 

PLATO

wrk1033.xml     59186  PLATO/Apology.xml

wrk1034.xml    138107  PLATO/Cratylus.xml

wrk1035.xml     28164  PLATO/Crito.xml

wrk1036.xml     37504  PLATO/Euthyphro.xml

wrk1037.xml     49032  PLATO/Lysis.xml

wrk1038.xml    237268  PLATO/Phaedo.xml

wrk1039.xml    661862  PLATO/Republic.xml

wrk1040.xml    130980  PLATO/Sophist.xml

wrk1041.xml    137527  PLATO/Statesman.xml

DIR_TOTAL:     1479630

 

SHAKESPEARE

wrk1042.xml   4580871  SHAKESPEARE/CompleteWorks.xml

DIR_TOTAL:     4580871

 

GRAND_TOTAL: 6439548
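
The totals above can be cross-checked with a short script. This is only a sketch: the sizes are copied from the listing, the unit is assumed to be bytes, and the names `SIZES`, `directory_totals`, and `grand_total` are illustrative rather than part of the benchmark tooling.

```python
# File sizes copied from the listing above, grouped by directory.
# Assumption: the unit is bytes.
SIZES = {
    "ALICE": [11424, 11480, 10502, 11713, 11002, 9297, 13983,
              11951, 13946, 12837, 13796, 12779, 1341],
    "GLASS": [17337, 416, 127, 3245, 15632, 15822, 16346, 17061,
              16945, 14471, 21398, 22563, 1243, 695, 1043],
    "GROUND": [20642, 16706, 13728, 17576],
    "PLATO": [59186, 138107, 28164, 37504, 49032, 237268,
              661862, 130980, 137527],
    "SHAKESPEARE": [4580871],
}


def directory_totals(sizes):
    """Sum the file sizes within each directory."""
    return {name: sum(files) for name, files in sizes.items()}


def grand_total(sizes):
    """Sum the sizes of all files across all directories."""
    return sum(directory_totals(sizes).values())


if __name__ == "__main__":
    for name, total in directory_totals(SIZES).items():
        print(f"{name:<12} {total:>8}")
    print(f"{'GRAND_TOTAL':<12} {grand_total(SIZES):>8}")
```

The computed sums agree with the DIR_TOTAL and GRAND_TOTAL lines in the listing.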

 

2. After storing the above documents and running our queries on them, we added enough “nonsense” text to bring the total to about 100 MB of text files.

 

3. Here are the eight queries we ran:

 

neo1001.24.qml  QML/Broncos.qml

for $a in /ND/text_document, $b in $a[contains(.,"Denver Broncos") ]  return $a/@*

 

 

neo1002.24.qml  QML/FourNamesOneDoc.qml

for $a in /ND/text_document, $b in $a[contains(.,"Euthyphro") and contains(.,"Socrates") and contains(.,"Crito") and contains(.,"Plato")] return $a/@*

 

 

neo1003.24.qml  QML/Hamlet.qml

for $a in /ND/text_document, $b in $a[contains(.,"Hamlet") ]  return $a/@*

 

 

neo1004.24.qml  QML/ManyTermsNoHits.qml

for $a in /ND/text_document, $b in $a[contains(.,"if") and contains(.,"but") and contains(.,"every") and contains(.,"person") and contains(.,"what") and contains(.,"necessary") and contains(.,"therefore") and contains(.,"issue") and contains(.,"Denver Broncoes")] return $a/@*

 

 

neo1005.24.qml  QML/ManyTermsTwoHits.qml

for $a in /ND/text_document, $b in $a[contains(.,"if") and contains(.,"but") and contains(.,"every") and contains(.,"person") and contains(.,"what") and contains(.,"necessary") and contains(.,"therefore") and contains(.,"issue")] return $a/@*

 

 

neo1006.24.qml  QML/PrologOnlySecunda.qml

for $a in document("neo1013.xml")/text_document[contains(.,"Secunda") ] return $a/@*

 

 

neo1007.24.qml  QML/ShakesOnlyBroncos.qml

for $a in document("neo1042.xml")/text_document, $b in $a[contains(.,"Denver Broncos") ]  return $a/@*

 

 

neo1008.24.qml  QML/ShakesOnlyHamlet.qml

for $a in document("neo1042.xml")/text_document, $b in $a[contains(.,"Hamlet") ]  return $a/@*
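
For readers unfamiliar with the query syntax, the following Python sketch shows roughly what these non-indexed searches ask for. It assumes contains() has its usual XPath meaning, a case-sensitive substring test, and it models the collection as an in-memory dictionary. The function `matching_documents` and the two sample documents are hypothetical; this is not how the database evaluates the queries, and it returns document names rather than attribute values.

```python
def matching_documents(documents, terms):
    """Return the names of documents whose text contains every term.

    Each term is tested with a case-sensitive substring search, and a
    document matches only if all terms are found (the "and" in the queries).
    """
    return [
        name
        for name, text in documents.items()
        if all(term in text for term in terms)
    ]


# Hypothetical stand-ins for stored documents, for illustration only.
sample = {
    "doc_a.xml": "Socrates spoke with Crito and Euthyphro, as Plato records.",
    "doc_b.xml": "Hamlet, Prince of Denmark",
}

if __name__ == "__main__":
    print(matching_documents(sample, ["Hamlet"]))
    print(matching_documents(sample, ["Euthyphro", "Socrates", "Crito", "Plato"]))
```

Without an index, every term has to be looked for by scanning the text of each candidate document, which is why the multi-term queries are expected to cost more than the single-term ones.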

 

 

4. Here are the query results, in milliseconds. Queries are numbered in the order listed above. The first column gives the times with only the regular text documents stored (about 6.4 MB); the second column gives the times with the full 100 MB stored.

 

            Original docs (ms)    Full ~100 MB (ms)

Query 1:                  81.2               1082.8

Query 2:                 295.3               3932.8

Query 3:                  73.5               1060.9

Query 4:                 614.1               8635.9

Query 5:                 537.5               7728.1

Query 6:                   0.0                  0.0

Query 7:                  54.7                104.7

Query 8:                  50.0                 95.3
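
One way to read these numbers is to compare each query's slowdown with the growth of the stored data. The sketch below uses only the timings reported above. It assumes "about 100M" means roughly 100,000,000 bytes, so the growth factor is approximate, and the names `TIMINGS_MS`, `slowdown`, and `DATA_GROWTH` are illustrative.

```python
# (small-collection ms, full-collection ms) for each query, from the results above.
TIMINGS_MS = {
    1: (81.2, 1082.8),
    2: (295.3, 3932.8),
    3: (73.5, 1060.9),
    4: (614.1, 8635.9),
    5: (537.5, 7728.1),
    6: (0.0, 0.0),
    7: (54.7, 104.7),
    8: (50.0, 95.3),
}

# Assumption: the full collection is about 100,000,000 bytes.
DATA_GROWTH = 100_000_000 / 6_439_548


def slowdown(small_ms, large_ms):
    """Ratio of large-collection time to small-collection time.

    Returns None when the small-collection time is zero, since the
    ratio is undefined in that case (Query 6).
    """
    if small_ms == 0:
        return None
    return large_ms / small_ms


if __name__ == "__main__":
    print(f"data growth: about {DATA_GROWTH:.1f}x")
    for query, (small, large) in TIMINGS_MS.items():
        ratio = slowdown(small, large)
        label = "n/a" if ratio is None else f"{ratio:.1f}x"
        print(f"Query {query}: {label}")
```

On these figures, the five queries that scan the whole collection (1 through 5) slow down by a factor of roughly 13 to 14.5, a little less than the approximately 15.5-fold growth in stored text. Queries 7 and 8, which are restricted to a single document, slow down by less than a factor of 2. Query 6 is reported as 0.0 in both runs, so no ratio can be computed for it.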

 

 

 
