User-Centered Evaluation of Digital Libraries

Gary Marchionini
University of North Carolina
march@ils.unc.edu
|
Evaluation Perspective

Need to choose
  product testing
  controlled comparisons
Need to assess
  system performance
  outcome research (e.g., social programs)
Need to understand
  basic research

Existing Models

Library effectiveness
  circulation
  collection size
  reference encounters
  satisfaction
Information retrieval
  recall/precision tradeoff
  satisfaction

Library Effectiveness

Count stuff
  volumes, circulations, reference questions
Transaction log equivalents in DLs?

[Figure: Increased usage at BLS]
[Figure: Decreased rates of growth at LC and BLS]
[Figure: Length of session at BLS, October 2000]

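Answering "transaction log equivalents in DLs?" with a measure such as session length depends on how sessions are cut out of a raw web log. A minimal sketch, assuming a hypothetical log of (visitor_id, timestamp_in_seconds) pairs and an assumed 30-minute inactivity timeout (a common convention, not necessarily the rule used in the BLS analysis):

```python
from collections import defaultdict

# Assumed inactivity threshold: a gap longer than this starts a new session.
TIMEOUT_SECONDS = 30 * 60


def session_lengths(log, timeout=TIMEOUT_SECONDS):
    """Group (visitor_id, timestamp_seconds) entries into sessions and
    return the duration of each session in seconds."""
    by_visitor = defaultdict(list)
    for visitor, ts in log:
        by_visitor[visitor].append(ts)

    durations = []
    for stamps in by_visitor.values():
        stamps.sort()
        start = prev = stamps[0]
        for ts in stamps[1:]:
            if ts - prev > timeout:
                # Gap too long: close the current session, open a new one.
                durations.append(prev - start)
                start = ts
            prev = ts
        durations.append(prev - start)  # close the visitor's last session
    return durations


log = [("a", 0), ("a", 120), ("a", 600), ("a", 5000), ("b", 100), ("b", 160)]
print(session_lengths(log))
```

Single-request sessions get a length of 0 under this rule, which is one reason session-length distributions from logs need careful interpretation.
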
User-Centered Library Evaluation

D'Elia & Walsh LQ article (physical libraries)
Satisfaction complexity (direct, indirect)
Results must be contextualized
See LibQual (www.libqual.org) for ARL/TAMU
See http://www.vuw.ac.nz/~agsmith/evaln/
See http://www.library.ucla.edu/libraries/college/help/critical/
MIT Press book Fall 03

IR Evaluation

Recall and precision metrics
System performance (e.g., response time, broken links)
Satisfaction
Usability?

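Recall and precision, and the tradeoff between them, can be computed directly from a set of relevance judgments. A minimal sketch, assuming documents are identified by hypothetical string ids and that a complete set of relevant ids is known for the query:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: ids of documents returned by the system
    relevant:  ids of documents judged relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def precision_recall_at_k(ranking, relevant):
    """(precision, recall) after each cutoff k of a ranked list,
    which traces out the recall/precision tradeoff."""
    return [precision_recall(ranking[:k], relevant)
            for k in range(1, len(ranking) + 1)]


relevant = {"d2", "d4", "d7"}
for k, (p, r) in enumerate(precision_recall_at_k(["d2", "d5", "d4", "d1", "d7"], relevant), 1):
    print(f"k={k}: precision={p:.2f} recall={r:.2f}")
```

Going deeper in the ranking can only raise recall, while precision typically falls as non-relevant documents accumulate.
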
Claims

Today's IR systems are not comparable to paper-based systems.
  Bibliographic, full-text, and multimedia IR systems are not comparable.
Complex systems are greater than the sum of their components.
  Systems that include human components are inherently complex.
Information seeking is an interactive process.
  Different users, domains, and settings require distinct IR system capabilities.

[Figure: Retrieval as matching documents to queries]
[Figure: Information-seeking process]

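One simple way to operationalize retrieval as matching documents to queries is the vector-space model: documents and queries become term-frequency vectors and are ranked by cosine similarity. This sketch is an illustrative assumption (whitespace tokenization, raw term counts, no stemming or idf weighting), not the model depicted on the original slide:

```python
import math
from collections import Counter


def cosine(query, doc):
    """Cosine similarity between raw term-frequency vectors of two strings."""
    q = Counter(query.lower().split())
    d = Counter(doc.lower().split())
    dot = sum(q[t] * d[t] for t in q)  # Counter returns 0 for absent terms
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0


def rank(query, docs):
    """Return (doc_id, score) pairs sorted by decreasing similarity."""
    scores = [(doc_id, cosine(query, text)) for doc_id, text in docs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = {
    "d1": "digital library evaluation",
    "d2": "patient recovery outcomes",
    "d3": "library circulation counts",
}
print(rank("digital library", docs))
```

The contrast with the information-seeking view is that this model scores a single query against a fixed collection and says nothing about how the user's need or query changes during interaction.
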
Evaluate Systems

TREC ad hoc and routing evaluations
TREC interactive track
  introduces the user as a component, but not the user's problems, perceived needs, and actions
Hybrid solutions
  human + automatic
  statistical + natural language processing

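TREC ad hoc runs are commonly scored with mean average precision (MAP) over a set of topics, using pooled relevance judgments. A simplified sketch of the uninterpolated measure; it is an illustration, not the trec_eval implementation, and the function names are hypothetical:

```python
def average_precision(ranking, relevant):
    """Sum the precision at each rank where a relevant document appears,
    then divide by the total number of relevant documents
    (relevant documents that are never retrieved contribute 0)."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)


def mean_average_precision(runs):
    """runs: list of (ranking, relevant) pairs, one per topic."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


print(average_precision(["d2", "d5", "d4", "d1", "d7"], {"d2", "d4", "d7"}))
```

A single number like this summarizes system ranking quality across topics, which is precisely the system-centered view that the interactive track and user-centered methods try to complement.
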
Evaluate Actions: Medical Case

Does the patient recover?
Were good decisions made?
  from whose view: patient, physician, hospital, HMO?
Difficult (impossible?) to disambiguate component effects
Task-oriented studies (e.g., Hersh's medical student decisions)

Evaluate Interactions

Think-aloud protocols
Observations, transaction log analysis
Interviews, stimulated recall
Error analysis
Time on task
Cost-benefit analysis
Questionnaires
Simulations

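Several of these methods (observation, time on task, error analysis) end in summarizing coded observation records per task. A minimal sketch, assuming a hypothetical record format in which an observer has already coded each attempt's start and end times, error count, and completion:

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskAttempt:
    """One observed attempt at a task (hypothetical record format)."""
    participant: str
    task: str
    start_s: float   # seconds from session start
    end_s: float
    errors: int      # errors coded by the observer
    completed: bool


def summarize(attempts):
    """Per-task mean time on task, mean errors, and completion rate."""
    by_task = {}
    for a in attempts:
        by_task.setdefault(a.task, []).append(a)
    summary = {}
    for task, rows in by_task.items():
        summary[task] = {
            "mean_time_s": mean(r.end_s - r.start_s for r in rows),
            "mean_errors": mean(r.errors for r in rows),
            "completion_rate": sum(r.completed for r in rows) / len(rows),
        }
    return summary


attempts = [
    TaskAttempt("p1", "find-table", 0, 120, 1, True),
    TaskAttempt("p2", "find-table", 10, 70, 3, False),
]
print(summarize(attempts))
```

Numbers like these are most useful alongside the qualitative methods in the list (think-aloud, interviews), which explain why an attempt was slow or failed.
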
New, User-Oriented Questions

Given many relevant documents, which can be most easily processed and understood?
What are the costs and benefits to different stakeholders?
What organizational and institutional changes does a system bring about?
What are the most useful surrogates (representations) for multimedia objects?
How can results best be integrated?
  multiple retrieved sets
  multiple evaluation efforts

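For integrating multiple retrieved sets, one widely used technique (not one proposed in these slides) is reciprocal rank fusion, which scores each document by the sum of 1/(k + rank) over the lists that contain it. A sketch, assuming k = 60, a value often used as a default:

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with 1-based ranks; k (assumed 60) damps the weight of top ranks.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


print(reciprocal_rank_fusion([["a", "b", "c"], ["b", "c", "a"]]))
```

Because it uses only ranks, not raw scores, this approach can combine result sets from systems whose scoring scales are not comparable.
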
Alternative Strategies

Consider the information seeker's context
Cognitive accessibility (it does not matter how good the results are if the information cannot be easily understood)
Cost-benefit assessment (it does not matter how good the results are if there is no time to use them)
Study special populations (cell biologist vs. practicing physician)
Usability testing approach (iterative, impressionistic)
Systematic case studies
Epidemiology approach (start with outcomes and trace influences)
Develop an IR interaction model

The Perseus Case

Multiple stakeholders, methods, and components
A set of evaluation questions (learning, teaching, system, publishing)
Longitudinal effects
  mechanical advantages
  side effects
  new types of learning and teaching
  systemic change

Evaluating New Systems

"We may never know quantitatively the impact of these combined effects, partly because we don't know what would have happened without the collaboratory."

William Wulf, The National Collaboratory: A White Paper, 1989