
Commercial Retrieval Systems and Document Ranking

The Boolean retrieval model has been used by most commercial information retrieval systems since the 1960s, although researchers in the field of information retrieval have suggested a number of other retrieval models, such as the vector space model [SWY75], the probabilistic model [MK60], and the fuzzy set retrieval model [Boo85,SM83]. In 1993, however, two new commercial retrieval engines became available through LEXIS-NEXIS and DIALOG. These retrieval engines, called Freestyle (LEXIS-NEXIS) and Target (DIALOG), have been called natural language search systems because they do not require the user to enter Boolean search statements. As Turtle (1994) stated, a natural language search system "accepts as input a description of an information need in some natural language (such as English) and produces as output a list of documents ranked by the likelihood that they will be judged to match the information need" (p. 212). Although Freestyle and Target offer natural language searching, neither employs true artificial intelligence, which would parse or "understand" a subject domain well enough to paraphrase or draw conclusions. In fact, the natural language interface in these systems stems from a common automatic indexing strategy that involves three steps (a minimal sketch follows the list):
1. identification of key concepts,
2. removal of stop words, and
3. determination and expansion of root words.
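As an illustration only, the following Python sketch mimics this three-step strategy on a short query. The stop-word list and the crude suffix-stripping rules are stand-ins invented for the example; the actual lists and stemming rules used by Freestyle and Target are proprietary and are not described in their documentation.

    # Illustrative sketch of the three-step indexing strategy described above.
    # The stop-word list and suffix rules are invented stand-ins, not the
    # vendors' actual resources.

    STOP_WORDS = {"the", "a", "an", "of", "in", "to", "about", "me", "tell"}

    def strip_suffix(term):
        """Crude root-word reduction; a real system would use a full stemmer."""
        for suffix in ("ing", "ed", "es", "s"):
            if term.endswith(suffix) and len(term) > len(suffix) + 2:
                return term[:-len(suffix)]
        return term

    def index_terms(text):
        # Step 1: identify key concepts (here, simply the individual words).
        terms = text.lower().split()
        # Step 2: remove stop words.
        terms = [t for t in terms if t not in STOP_WORDS]
        # Step 3: reduce each remaining term to its root form.
        return [strip_suffix(t) for t in terms]

    print(index_terms("Tell me more about the Gulf War"))
    # -> ['more', 'gulf', 'war']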
Freestyle and Target are direct competitors, and each system has unique aspects. Target eliminates the use of Boolean operators but does not actually process natural language queries. Instead, the system asks users to enter a list of important terms and phrases and then produces a ranked list of documents. A document's rank is based on the number of search terms in the document, the proximity of the search terms to one another, the frequency of each search term in the database, and the length of the document (a sketch of this style of statistical ranking follows the Freestyle steps below). To shed light on how a document's rank is determined, Target displays the frequency of each search term in each document and the relevance weight assigned to each term. Target's creators chose to limit the retrieved list to the 50 documents with the highest ranks, a decision that limits the user's ability to conduct a comprehensive search in Target.

As for search aids, Target does not offer a thesaurus, probably because a thesaurus that covered all of DIALOG's databases would have to be painstakingly exhaustive. Target does provide unlimited truncation, but not automatic stemming or automatic identification of phrases. Parentheses approximate a Boolean OR, and an asterisk marks terms that must be present in every document retrieved. The system defaults to searching for articles published in the past two years, but the desired date range can be changed if necessary.

Perhaps the most obvious difference between Target and Freestyle is that Freestyle does allow the user to enter natural language queries, such as "Tell me more about the Gulf War." In fact, Duval and Main (1994) suggest that this feature makes Freestyle particularly appropriate for novice users and for users with a vague or ill-defined search topic. The user manual for Freestyle explains that the system is based on "a mathematical concept called associative retrieval" [Mea94, p. 1]. As the manual states:
Searches using the FREESTYLE feature rely on statistical algorithms that examine your search query, identify and rank relevant search terms and phrases, drop out the "noise" words that won't be searched and compare the relevant terms with every document in the library and file being searched. The FREESTYLE feature then retrieves the top 25 documents that have the best statistical fit with your search terms. [Mea94, p. 1]
The manual goes on to explain that Freestyle assigns a weight to each query term and then retrieves documents that match the query. A Freestyle query thus goes through five steps:
1. identification of significant terms and phrases from the query,
2. removal of stop words from the query,
3. calculation of the statistical importance of the terms and phrases in the query and comparison of those terms and phrases to each document in the database,
4. retrieval of documents with the highest probability of matching the query, and
5. ranking of each retrieved document based on the number of query terms in the document and the statistical importance of each query term.
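Neither LEXIS-NEXIS nor DIALOG has published its exact scoring formula, but the factors both systems cite (the number of query terms in a document, the rarity of each term in the database, and document length) suggest a weighting scheme along the following lines. The sketch below is a generic TF-IDF-style stand-in, not either vendor's actual algorithm; Target's proximity factor is omitted for brevity, and stop-word removal is assumed to have already happened.

    import math
    from collections import Counter

    # A TF-IDF-flavored sketch of the kind of "statistical fit" both manuals
    # describe. The precise weighting is an assumption for illustration.

    def term_weights(query_terms, documents):
        """Weight each query term by its rarity across the database
        (an inverse-document-frequency style weight)."""
        n = len(documents)
        weights = {}
        for term in set(query_terms):
            df = sum(1 for doc in documents if term in doc)
            weights[term] = math.log((n + 1) / (df + 1)) + 1
        return weights

    def rank(query_terms, documents, cutoff=25):
        """Score every document and keep the `cutoff` best
        (25 in Freestyle, 50 in Target)."""
        weights = term_weights(query_terms, documents)
        scored = []
        for doc_id, doc in enumerate(documents):
            counts = Counter(doc)
            # Query-term occurrences, weighted by term importance and
            # normalized by document length.
            score = sum(counts[t] * weights[t] for t in set(query_terms)) / len(doc)
            scored.append((round(score, 3), doc_id))
        scored.sort(reverse=True)
        return scored[:cutoff]

    docs = [["gulf", "war", "began", "1990"],
            ["economic", "policy", "during", "wartime"],
            ["gulf", "war", "veterans", "gulf", "war", "aftermath"]]
    print(rank(["gulf", "war"], docs))
    # -> [(0.858, 2), (0.644, 0), (0.0, 1)]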
Freestyle also provides date limiting and an online thesaurus. As in Target, mandatory terms may be indicated with an asterisk, and a Boolean OR may be indicated with parentheses. The system automatically searches singular and plural forms, but does not offer automatic truncation. Freestyle recognizes more than 300,000 phrases, and the user may mark unusual phrases (e.g., "fatty acids") with quotation marks. Relevance feedback is not an option in either Freestyle or Target.

The .WHERE and .WHY screens in Freestyle are particularly helpful because they offer some clues about the algorithms used to weight query terms and rank retrieved documents. The .WHERE screen displays a grid showing the presence or absence of each query term in each document retrieved (sketched at the end of this section), while the .WHY screen displays the weight assigned to each query term, the number of retrieved documents containing each query term, and the number of documents matched in the database.

LEXIS-NEXIS researchers participated in TREC-2, TREC-3, TREC-4, and TREC-5, but did not employ Freestyle as the retrieval engine for any of these conferences. In TREC-3, LEXIS-NEXIS instead used the SMART system with manual query expansion, and was ranked third in the manually formed ad hoc queries category. In reporting on their TREC-3 system, LEXIS-NEXIS researchers asserted that automatic query expansion is not a "viable option in the on-line service environment because automatic query expansion largely excludes the user from the query formulation process" [LK95]. This argument offers a possible explanation for the lack of relevance feedback in Freestyle. Lu and Keefer also stated that the typical real-world query is extremely short, based on the finding that the average Freestyle query was seven terms long.
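Returning to the .WHERE display described above, the sketch below builds a small term-by-document presence grid of that sort. The layout and the sample documents are invented for illustration; only the idea of the grid comes from Freestyle's documentation.

    # A rough approximation of a .WHERE-style display: a grid marking the
    # presence or absence of each query term in each retrieved document.
    # The screen format is an assumption, not the actual LEXIS-NEXIS layout.

    def where_grid(query_terms, retrieved_docs):
        width = max(len(t) for t in query_terms) + 2
        header = "".ljust(width) + "  ".join(
            f"doc{i + 1}" for i in range(len(retrieved_docs)))
        rows = [header]
        for term in query_terms:
            marks = "  ".join(
                "yes " if term in doc else "no  " for doc in retrieved_docs)
            rows.append(term.ljust(width) + marks)
        return "\n".join(rows)

    docs = [["gulf", "war", "began"],
            ["economic", "policy"],
            ["gulf", "war", "veterans"]]
    print(where_grid(["gulf", "war"], docs))
    # ->       doc1  doc2  doc3
    #    gulf  yes   no    yes
    #    war   yes   no    yes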