
Analytic Models of Performance

The performance of a retrieval or filtering system may be computed analytically with probabilistic methods, taking probabilistic parameters as input, rather than extrapolated from repeated experimentation [Los98]. Given the characteristics of a set of documents, performance may be computed directly. The analysis here estimates these parameters from the ranked lists of documents produced by searches with commercial search engines. The model makes explicit the direct relationship between retrieval performance and changes in queries, documents, relevance judgments, database size, and document rankings. Following the development in Losee [Los98], the average search length (ASL), for the case of a single binary feature, is

\begin{displaymath}ASL = N\left[Q A + (1-Q)(1-A)\right]
\end{displaymath} (1)

where N is the number of documents in the database, Q is the probability that the ranking is optimal (the quality measure), and A is the expected proportion of documents examined under an optimal ranking when one examines all documents up to the average position of a relevant document; equivalently, A is the optimal ASL rescaled to the range from 0 to 1.
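As a concrete illustration, the following sketch evaluates Equation 1 for hypothetical parameter values; the numbers are illustrative only and are not drawn from the experiments reported here.

    def asl(N, Q, A):
        """Average search length (Equation 1), for a database of N
        documents, a ranking that is optimal with probability Q, and
        an optimal rescaled ASL of A."""
        return N * (Q * A + (1 - Q) * (1 - A))

    # Hypothetical example: 1000 documents, an engine whose ranking is
    # optimal three quarters of the time, and relevant documents
    # expected a quarter of the way down an optimal ranking.
    print(asl(N=1000, Q=0.75, A=0.25))  # 375.0

We compute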

\begin{displaymath}A = \frac{1 - p + t}{2},
\end{displaymath} (2)

where p is the probability that the feature occurs in relevant documents and t is the unconditional probability that the feature occurs. When A = 0, for example, the relevant documents would be at the front of the ordered list of documents, while when A = 0.5, the average position of a relevant document is the middle of the ordered list. Interestingly, a very bad A value, such as A = 1, performs well with a worst-case retrieval engine, where $Q \rightarrow 0$; this is equivalent to placing the best documents at the end of the ranked list (A = 1) and then retrieving the list backwards (Q = 0). When discussing results below, only the positive results are given: cases where such a ``double negative" produces a positive result are ignored, and the equivalent positive-positive process, which produces the same positive retrieval result, is presented instead.

We use this simple retrieval model to capture the performance of the search engines and queries being studied. If we consider each query as representing an information need or concept cluster, we may treat that concept cluster as a single feature on which we wish to discriminate; the cluster then has its own probabilities of occurring in the various classes of documents. One could instead count the terms in each query and model the system with that many Q values, averaged in some way, but accurate estimates would require far more documents with relevance judgments than are typically available in experimental databases. We believe this single-concept model is an adequate approximation of what a multivariate model would yield, since accurate multivariate estimates would require very large numbers of document rankings.
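To make these cases concrete, the following sketch computes A from Equation 2 and checks the ``double negative" noted above; the values of p and t are hypothetical.

    def optimal_rescaled_position(p, t):
        """Equation 2: A = (1 - p + t) / 2, where p is the probability
        that the feature occurs in relevant documents and t is its
        unconditional probability of occurrence."""
        return (1 - p + t) / 2

    def asl(N, Q, A):
        """Equation 1, repeated so the sketch is self-contained."""
        return N * (Q * A + (1 - Q) * (1 - A))

    # Hypothetical feature: present in 75% of relevant documents but
    # only 25% of all documents, so A = (1 - 0.75 + 0.25) / 2 = 0.25.
    print(optimal_rescaled_position(p=0.75, t=0.25))  # 0.25

    # The double negative: the worst placement (A = 1) under a
    # worst-case engine (Q = 0) yields the same ASL as the best
    # placement (A = 0) under an optimal engine (Q = 1).
    assert asl(N=1000, Q=0.0, A=1.0) == asl(N=1000, Q=1.0, A=0.0)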