Next: Measuring Retrieval Performance
Up: Measuring Search Engine Quality
Previous: Introduction
The Boolean retrieval model has been used by most commercial information
retrieval systems since the 1960s, although researchers in the field of
information retrieval have suggested a number of other retrieval models,
such as the vector space model [SWY75], the
probabilistic model [MK60], and the fuzzy set
retrieval model [Boo85,SM83]. In 1993, however,
two new commercial retrieval engines became available through LEXIS-NEXIS
and DIALOG. These retrieval engines, called Freestyle (LEXIS-NEXIS) and
Target (DIALOG), have been called natural language search systems because
they do not require the user to enter Boolean search statements. As Turtle
(1994) stated, a natural language search system ``accepts as input a
description of an information need in some natural language (such as
English) and produces as output a list of documents ranked by the likelihood
that they will be judged to match the information need" (p. 212).
Although Freestyle and Target offer natural language searching, neither
employs true artificial intelligence, which would parse or ``understand"
a subject domain well enough to paraphrase or draw conclusions. In fact, the
natural language interface in these systems stems from a common automatic
indexing strategy that involves three steps:
1. identification of key concepts,
2. removal of stop words, and
3. determination and expansion of root words.
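The three steps above can be sketched in a few lines of Python. The stop word list and the suffix-stripping rules here are illustrative assumptions, not the actual lists or stemmers used by Freestyle or Target.

```python
import re

# Illustrative stop word list (not the one used by either system).
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "about", "tell", "me"}

def crude_stem(term):
    """Strip a few common English suffixes to approximate a root word.

    The suffix rules are a toy assumption, far simpler than a real stemmer.
    """
    for suffix in ("ing", "ed", "es", "s"):
        if term.endswith(suffix) and len(term) > len(suffix) + 2:
            return term[: -len(suffix)]
    return term

def index_terms(text):
    # Step 1: identify candidate concepts (here, simple word tokens).
    tokens = re.findall(r"[a-z]+", text.lower())
    # Step 2: remove stop words.
    content = [t for t in tokens if t not in STOP_WORDS]
    # Step 3: reduce each remaining term to its root form.
    return [crude_stem(t) for t in content]

print(index_terms("Tell me about the indexing of searched documents"))
# → ['index', 'search', 'document']
```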
Freestyle and Target are direct competitors and each
system has unique aspects. For example, Target eliminates the use of Boolean
operators but does not actually process natural language queries. Instead,
the system asks users to enter a list of important terms and phrases and
then produces a ranked list of documents. A document's rank is based on the
number of search terms in the document, the proximity of the search terms to
each other in the document, the frequency of a search term in the database,
and the length of the document. In order to shed light on how a document's
rank is determined, Target provides the frequency of each search term in
each document and the relevance weight for each term. Target's creators
chose to limit the list of documents retrieved to the 50 documents with the
highest ranks, a decision that limits the user's ability to do a
comprehensive search in Target.
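The four ranking factors just listed can be combined in many ways; Target's actual formula is unpublished. The scoring function below is a hypothetical sketch that reflects each factor: term counts in the document, proximity of the terms to one another, rarity of each term in the database (an inverse-document-frequency weight), and a document-length normalization.

```python
import math

def rank_score(doc_terms, query_terms, db_doc_freq, n_docs):
    """Hypothetical Target-style score; the formula is an assumption.

    doc_terms    -- list of tokens in the document
    query_terms  -- list of search terms
    db_doc_freq  -- how many database documents contain each term
    n_docs       -- total number of documents in the database
    """
    positions = {t: [i for i, d in enumerate(doc_terms) if d == t]
                 for t in query_terms}
    hits = [t for t in query_terms if positions[t]]
    if not hits:
        return 0.0
    # Rare database terms contribute more, scaled by in-document count.
    weight = sum(math.log(n_docs / db_doc_freq[t]) * len(positions[t])
                 for t in hits)
    # Reward query terms that occur close together in the document.
    firsts = [positions[t][0] for t in hits]
    proximity = 1.0 / (1 + max(firsts) - min(firsts)) if len(firsts) > 1 else 1.0
    # Normalize by document length so long documents are not favored.
    return weight * proximity / math.sqrt(len(doc_terms))
```

Under this sketch, a short document containing both ``gulf" and ``war" adjacent to each other outranks a longer one where the same terms are far apart, matching the qualitative behavior described above.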
As for search aids, Target does not offer a thesaurus, probably
because a thesaurus that could be used for all of DIALOG's databases would
have to be painstakingly exhaustive. Target does provide unlimited
truncation but not automatic stemming or automatic identification of
phrases. Parentheses approximate a Boolean OR, and an asterisk indicates
terms that must be present in all documents retrieved. The system defaults
to searching for articles published in the past two years, but the desired
date of publication can be changed if necessary.
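One plausible reading of the Target syntax just described (parentheses for an OR group, a leading asterisk for a mandatory term) is sketched below. The parsing rules are an assumption for illustration, not Target's documented grammar.

```python
import re

def parse_target_query(query):
    """Split a Target-style query into OR groups, mandatory terms,
    and ordinary optional terms (hypothetical interpretation)."""
    # Terms inside parentheses form a Boolean OR group.
    or_groups = [g.split() for g in re.findall(r"\(([^)]*)\)", query)]
    rest = re.sub(r"\([^)]*\)", " ", query).split()
    # A leading asterisk marks a term as required in every document.
    mandatory = [t.lstrip("*") for t in rest if t.startswith("*")]
    optional = [t for t in rest if not t.startswith("*")]
    return {"or_groups": or_groups, "mandatory": mandatory, "optional": optional}

print(parse_target_query("*retrieval (search query) engine"))
```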
Perhaps the most obvious difference between Target and Freestyle is
that Freestyle does allow the user to enter natural language queries, such
as ``Tell me more about the Gulf War."
In fact, Duval and Main (1994) suggest
that this feature makes Freestyle particularly appropriate for novice users
and users with a vague or ill-defined search topic. The user manual for
Freestyle explains that the system is based on ``a mathematical concept
called associative retrieval" [Mea94, p. 1].
As the manual states:
Searches using the FREESTYLE feature rely on statistical algorithms that
examine your search query, identify and rank relevant search terms and
phrases, drop out the ``noise" words that won't be searched and compare the
relevant terms with every document in the library and file being searched.
The FREESTYLE feature then retrieves the top 25 documents that have the best
statistical fit with your search terms. [Mea94, p. 1]
The manual goes on to explain that Freestyle assigns a weight to each
query term and then retrieves documents that match the query. A Freestyle
query, then, goes through five steps:
1. identification of significant terms and phrases from the query,
2. removal of stop words from the query,
3. calculation of the statistical importance of the terms and phrases in the
query and comparison of those terms and phrases to each document in the
database,
4. retrieval of documents with the highest probability of matching the query, and
5. ranking of each retrieved document based on the number of query terms in
the document and the statistical importance of each query term.
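The five steps above can be sketched end to end. Freestyle's actual statistical algorithms are proprietary; this sketch assumes a simple inverse-document-frequency weight for step 3 and returns the top-ranked document indices, mirroring the top-25 cutoff the manual describes.

```python
import math
from collections import Counter

# Illustrative stop word list, not Freestyle's.
STOP_WORDS = {"tell", "me", "more", "about", "the", "a", "of", "in"}

def freestyle_search(query, docs, top_n=25):
    # Steps 1-2: significant terms from the query, minus stop words.
    terms = [t for t in query.lower().split() if t not in STOP_WORDS]
    n = len(docs)
    # Step 3: statistical importance of each term (assumed here to be
    # an inverse-document-frequency weight), compared against every
    # document in the collection.
    df = {t: sum(t in d.lower().split() for d in docs) for t in terms}
    weight = {t: math.log((n + 1) / (df[t] + 1)) for t in terms}
    # Steps 4-5: score each document by the query terms it contains
    # and their weights, then return the best-matching documents.
    scored = []
    for i, d in enumerate(docs):
        toks = Counter(d.lower().split())
        score = sum(weight[t] * toks[t] for t in terms)
        if score > 0:
            scored.append((score, i))
    scored.sort(reverse=True)
    return [i for _, i in scored[:top_n]]
```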
Freestyle also provides date limiting and an online thesaurus. As in Target,
mandatory terms may be indicated by an asterisk and a Boolean OR may be
indicated through the use of parentheses. The system automatically searches
singulars and plurals, but does not offer automatic truncation. Freestyle
does recognize more than 300,000 phrases, but the user may indicate unusual
phrases (e.g., ``fatty acids") by using quotation marks. Relevance feedback
is not an option in Freestyle or Target.
The .WHERE and .WHY screens in Freestyle are particularly helpful
because they offer some clues about the algorithms used to weight query
terms and rank retrieved documents. For example, the .WHERE screen displays
a grid that shows the presence or absence of each query term in each
document retrieved. On the other hand, the .WHY screen displays the weight
assigned to each query term, the number of retrieved documents containing
each of the query terms, and the number of documents matched in the
database.
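The kind of presence/absence grid the .WHERE screen displays can be illustrated as follows. The layout is an assumption about the display; the matching here is plain token lookup, not Freestyle's internals.

```python
def where_grid(query_terms, docs):
    """Render a grid marking which query terms appear in which documents:
    'X' for present, '.' for absent (hypothetical .WHERE-style display)."""
    rows = []
    for i, doc in enumerate(docs, start=1):
        toks = set(doc.lower().split())
        marks = ["X" if t in toks else "." for t in query_terms]
        rows.append(f"doc {i}: " + " ".join(marks))
    return "\n".join(rows)

print(where_grid(["gulf", "war"],
                 ["the gulf war began", "war reporting", "gulf coast"]))
# → doc 1: X X
#   doc 2: . X
#   doc 3: X .
```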
LEXIS-NEXIS researchers participated in TREC-2, TREC-3, TREC-4, and TREC-5,
but did not employ Freestyle as an information retrieval engine for any of
these conferences. Instead, in TREC-3, LEXIS-NEXIS used the SMART system and
manual expansion of queries as a retrieval engine, with the result that
LEXIS-NEXIS was ranked third in the manually formed ad hoc questions
category. In reporting on their TREC-3 system, LEXIS-NEXIS researchers
asserted that automatic query expansion is not a ``viable option in the
on-line service environment because automatic query expansion largely
excludes the user from the query formulation process"
[LK95].
This argument provides a possible explanation for the lack of
relevance feedback in Freestyle. Lu and Keefer
also stated that the
typical real-world query is extremely short, based on the finding that the
average length of a Freestyle query was seven terms.
Bob Losee
1999-07-29