Classes of Jericko Tejido | basset-ir | README.markdown | Download |
Basset

Basset is a full-text PHP Information Retrieval library. It is a collection of developments from the field of IR, ported to PHP for research purposes. Basset provides different ways of searching through the documents in a collection (ad-hoc retrieval) by applying advanced and experimental IR algorithms and techniques gathered from different research studies and conferences, most notably:

Documentation

You can read about it here

Using the Cranfield Collection and the sample.php file

The Cranfield Collection was the pioneering test collection in information retrieval for validating a system's effectiveness. I've included the 1400-abstract Cranfield Collection as an XML file that you can parse into separate files. The test file at tests/sample.php can be executed right away; it does the parsing and runs a search for a single test query. Customize it as needed. You can read Cranfield/cranfield-collection/cranqrel for Glasgow's qrels results. I've also included the SMART system's stopword list for standardization (see stopwords/stopwords.txt).
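
The parsing step and the stopword list described above can be sketched in plain PHP. This is a minimal illustration and not Basset's own API or the code in tests/sample.php: the function names (`splitCollection`, `loadStopwords`, `removeStopwords`) are hypothetical, the XML element names `<doc>`, `<docno>`, and `<text>` are assumptions about the file layout, and the stopword file is assumed to hold one word per line. Check the bundled files and adjust accordingly.

```php
<?php
// Sketch only. The element names <doc>, <docno>, and <text> are assumptions,
// not the confirmed layout of the XML file shipped with Basset.

/**
 * Split an XML collection into one plain-text file per document.
 * Returns the number of files written.
 */
function splitCollection(string $xmlPath, string $outDir): int
{
    $xml = simplexml_load_file($xmlPath);
    if ($xml === false) {
        throw new RuntimeException("Cannot parse XML: $xmlPath");
    }
    if (!is_dir($outDir) && !mkdir($outDir, 0777, true)) {
        throw new RuntimeException("Cannot create directory: $outDir");
    }

    $count = 0;
    foreach ($xml->doc as $doc) {
        $id = trim((string) $doc->docno);
        if ($id === '') {
            continue; // skip documents without an identifier
        }
        // basename() guards against identifiers containing path separators.
        $target = $outDir . '/' . basename($id) . '.txt';
        file_put_contents($target, trim((string) $doc->text));
        $count++;
    }
    return $count;
}

/**
 * Load a stopword list (assumed: one word per line) into a lookup table
 * whose keys are the lowercased stopwords.
 */
function loadStopwords(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read stopword list: $path");
    }
    $words = array_map('strtolower', array_map('trim', $lines));
    return array_fill_keys($words, true);
}

/**
 * Lowercase the text, split it on non-alphanumeric characters,
 * and drop every token found in the stopword table.
 */
function removeStopwords(string $text, array $stopwords): array
{
    $tokens = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_filter($tokens, fn($t) => !isset($stopwords[$t])));
}
```

A possible way to call it, where `path/to/collection.xml` is a placeholder for wherever the XML file lives in your checkout:

```php
$written   = splitCollection('path/to/collection.xml', 'path/to/output');
$stopwords = loadStopwords('stopwords/stopwords.txt');
$terms     = removeStopwords('What is the effect of wing loading?', $stopwords);
```

Keeping the stopwords as array keys makes each lookup a constant-time `isset()` check rather than a scan of the whole list.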