Next: Document Rankings from Commercial
Up: Measuring Search Engine Quality
Previous: Measuring Retrieval Performance
The performance of a retrieval or filtering system may be computed analytically using probabilistic methods, taking probabilistic parameters as input, rather than by extrapolating performance from repeated experiments [Los98].
Given the characteristics of a set of documents, the performance may be computed directly.
The analysis here computes the parameters associated with the ranked lists of documents produced by commercial search engines.
Using this model can lead to an understanding of the direct relationship between retrieval performance and changes
in queries, documents, relevance judgments, database size, and document rankings.
Following the development in Losee [Los98],
the average search length (ASL), for the case of a single binary feature, is

    ASL = N[QA + (1 - Q)(1 - A)],    (1)

where N is the number of documents in the database,
Q is the probability that ranking is optimal (the quality measure),
and
A is
the expected proportion of documents examined in an optimal ranking
if one examines all the documents up to the document in the average position
of a relevant document,
or the optimal ASL rescaled to the range from 0 to 1.
We compute

    A = (1 - p + t)/2,    (2)

where p is the probability that the feature occurs in relevant documents and
t is the unconditional probability that the feature occurs.
When A=0, for example, the relevant documents are at the front of the
ordered list of documents, while when A=0.5,
the average position of a relevant document is in the middle of the ordered list of
documents.
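A minimal numeric sketch of this model, assuming the standard forms ASL = N[QA + (1 - Q)(1 - A)] and A = (1 - p + t)/2 with illustrative parameter values:

```python
def A_single_feature(p, t):
    """Expected proportion of documents examined in an optimal ranking,
    for a single binary feature: A = (1 - p + t) / 2."""
    return (1.0 - p + t) / 2.0

def asl(N, Q, A):
    """Expected (average) search length: ASL = N[QA + (1 - Q)(1 - A)]."""
    return N * (Q * A + (1.0 - Q) * (1.0 - A))

# A strongly discriminating feature (p near 1, t small) with an optimal
# ranking (Q=1) places relevant documents near the front of the list.
print(asl(1000, Q=1.0, A=A_single_feature(p=0.95, t=0.10)))  # 75.0

# An uninformative feature (p == t) gives A = 0.5, so ASL = N/2
# regardless of ranking quality Q.
print(asl(1000, Q=1.0, A=A_single_feature(p=0.10, t=0.10)))  # 500.0
```

The database size N = 1000 and the p, t values here are hypothetical, chosen only to show how ASL shrinks as the feature discriminates more sharply.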
Interestingly,
a very bad A value, such as A=1,
works well with a worst-case retrieval engine (Q=0).
This is equivalent to placing the best documents at the end of the list of
ranked documents (A=1) and then retrieving them backwards (Q=0).
When discussing results below, only the positive results are given;
cases where such a ``double negative" produces a positive result are
ignored, and the positive-positive process,
which also produces a positive retrieval result, is presented instead.
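The double-negative symmetry can be checked numerically; this sketch again assumes the form ASL = N[QA + (1 - Q)(1 - A)]:

```python
def asl(N, Q, A):
    # Expected search length: ASL = N[QA + (1 - Q)(1 - A)]
    return N * (Q * A + (1.0 - Q) * (1.0 - A))

N = 1000  # hypothetical database size
# A worst-case ordering (A=1) traversed by a worst-case engine (Q=0)
# yields the same search length as an optimal ordering (A=0)
# traversed by an optimal engine (Q=1).
assert asl(N, Q=0.0, A=1.0) == asl(N, Q=1.0, A=0.0)
```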
We will use this simple retrieval model to capture the performance of the
search engines and queries being studied.
If we consider the queries as each representing an information need or
concept cluster, we may treat this concept cluster as a single feature upon
which we desire to discriminate.
This cluster also has probabilities of occurring in various classes of documents.
One could examine the number of terms in each query and attempt to model the
system using that many Q values, which might then be averaged in some
way, but accurate estimates would require far more documents
with relevance judgments than are typically available in experimental
databases.
We believe this single-concept model is an adequate approximation of what would be obtained with a multivariate model, whose accurate estimation would require very large numbers of document rankings.
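Applying the model to an actual search engine requires recovering Q from an observed average search length. A sketch of the inversion of the ASL formula, assuming ASL = N[QA + (1 - Q)(1 - A)] and using hypothetical numbers:

```python
def estimate_Q(asl_observed, N, A):
    """Invert ASL = N[QA + (1 - Q)(1 - A)] to solve for Q.
    The inversion is indeterminate when A = 0.5, since then
    ASL = N/2 for every Q."""
    if A == 0.5:
        raise ValueError("Q is indeterminate when A = 1/2")
    return ((1.0 - A) - asl_observed / N) / (1.0 - 2.0 * A)

# Round trip with hypothetical values N=1000, A=0.075, Q=0.8:
# ASL = 1000 * (0.8*0.075 + 0.2*0.925) = 245.0, and inverting
# recovers Q = 0.8 (up to floating-point rounding).
print(estimate_Q(245.0, N=1000, A=0.075))
```

The guard on A = 0.5 reflects the model itself: an uninformative feature gives the same ASL for every ranking quality, so no observation can distinguish the Q values.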
Bob Losee
1999-07-29