User-Centered Evaluation of Digital Libraries
Gary Marchionini
University of North Carolina
march@ils.unc.edu

Evaluation Perspective
Need to choose
product testing
controlled comparisons
Need to assess
system performance
outcome research (e.g., social programs)
Need to understand
basic research

Existing Models
Library Effectiveness
circulation
collection size
reference encounters
satisfaction
Information Retrieval
recall/precision tradeoff
satisfaction

Library Effectiveness
Count stuff
Volumes, circulations, reference questions
Transaction log equivalents in DLs?
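One possible digital library equivalent of "counting stuff" is to derive sessions and session lengths from a transaction log. The sketch below is illustrative only: the log records, the field layout, and the 30-minute inactivity cutoff are assumptions, not data or methods from the BLS or LC studies.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical log records: (user_id, ISO timestamp). Not real BLS or LC data.
LOG = [
    ("u1", "2000-10-01T09:00:00"),
    ("u1", "2000-10-01T09:05:00"),
    ("u1", "2000-10-01T11:00:00"),
    ("u2", "2000-10-01T09:30:00"),
    ("u2", "2000-10-01T09:31:00"),
]

SESSION_GAP = timedelta(minutes=30)  # assumed inactivity cutoff between sessions


def session_lengths(log, gap=SESSION_GAP):
    """Return one duration in seconds per session, across all users."""
    by_user = defaultdict(list)
    for user, stamp in log:
        by_user[user].append(datetime.fromisoformat(stamp))

    lengths = []
    for stamps in by_user.values():
        stamps.sort()
        start = prev = stamps[0]
        for t in stamps[1:]:
            if t - prev > gap:
                # A long pause closes the current session and opens a new one.
                lengths.append((prev - start).total_seconds())
                start = t
            prev = t
        lengths.append((prev - start).total_seconds())
    return lengths


lengths = session_lengths(LOG)
print(len(lengths), "sessions; lengths in seconds:", sorted(lengths))
```

A session with a single request has length zero under this definition, which is one reason such counts need to be interpreted in context.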

Increased Usage at BLS

Decreased Rates of Growth at LC and BLS

Length of session at BLS October 2000

User Centered Library Evaluation
D’Elia & Walsh Library Quarterly article (physical libraries)
Satisfaction complexity (direct, indirect)
Results must be contextualized
See LibQual (www.libqual.org) for ARL/TAMU
See http://www.vuw.ac.nz/~agsmith/evaln/
See http://www.library.ucla.edu/libraries/college/help/critical/
MIT Press book, Fall 2003

IR Evaluation
Recall and Precision metrics
System performance (e.g., response time, broken links, etc.)
Satisfaction
Usability?
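For reference, precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. A minimal sketch with set-based relevance judgments follows; the document IDs and the function name are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Compute set-based precision and recall for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 judged relevant, 2 in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Both measures assume binary relevance judgments fixed in advance, and they say nothing about response time, satisfaction, or usability.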

Claims
Today’s IR systems are not comparable to paper-based systems.
bibliographic, full-text, and multimedia IR systems are not comparable
Complex systems are greater than the sum of their respective components.
systems that include human components are inherently complex
Information seeking is an interactive process.
different users, domains, and settings require distinct IR system capabilities

Retrieval as Matching Documents to Queries

Information-Seeking Process

Evaluate Systems
TREC ad hoc and routing evaluations
TREC interactive track
introduces the user as a component but not the problems, perceived needs, and actions
Hybrid solutions
human + automatic
statistical + natural language processing

Evaluate Actions: Medical Case
Does the patient recover?
Were good decisions made?
patient, physician, hospital, HMO views?
Difficult (impossible?) to disambiguate component effects
Task-oriented studies (e.g., Hersh’s medical student decisions)

Evaluate Interactions
Think aloud protocols
Observations, Transaction log analysis
Interviews, Stimulated recall
Error analysis
Time on task
Cost-benefit analysis
Questionnaires
Simulations
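Several of these measures reduce to simple summaries once observations are recorded per participant. The sketch below summarizes time on task, error counts, and task completion; the records and field names are hypothetical, and real studies would pair such numbers with the qualitative methods listed above.

```python
from statistics import mean

# Hypothetical observation records, one per participant for a single task.
SESSIONS = [
    {"participant": "p1", "seconds": 120, "errors": 1, "completed": True},
    {"participant": "p2", "seconds": 300, "errors": 4, "completed": False},
    {"participant": "p3", "seconds": 180, "errors": 1, "completed": True},
]


def summarize(sessions):
    """Return mean time on task, mean error count, and completion rate."""
    return {
        "mean_time_on_task": mean(s["seconds"] for s in sessions),
        "mean_errors": mean(s["errors"] for s in sessions),
        "completion_rate": sum(s["completed"] for s in sessions) / len(sessions),
    }


print(summarize(SESSIONS))
```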

New, User-oriented Questions
Given many relevant documents, which can be most easily processed/understood?
What are the cost-benefits to different stakeholders?
What are the organizational/institutional changes due to a system?
What are the most useful surrogates (representations) for multimedia objects?
How can results best be integrated?
multiple retrieved sets
multiple evaluation efforts

Alternative Strategies
Consider the information seeker’s context
Cognitive accessibility (it does not matter how good the results are if the information cannot be easily understood)
Cost-benefit assessment (it does not matter how good the results are if there is no time to use them)
Study special populations (cell biologist vs. practicing physician)
Usability testing approach (iterative, impressionistic)
Systematic case studies
Epidemiology approach (start with outcomes and trace influences)
Develop an IR interaction model

The Perseus Case
Multiple stakeholders, methods, and components
A set of evaluation questions (learning, teaching, system, publishing)
Longitudinal effects
mechanical advantages
side effects
new types of learning and teaching
systemic change

Evaluating New Systems
“We may never know quantitatively the impact of these combined effects, partly because we don’t know what would have happened without the collaboratory.”
  William Wulf, The National Collaboratory--A White Paper, 1989