1	User-Centered Evaluation of Digital Libraries. Gary Marchionini, University of North Carolina, march@ils.unc.edu
2	Evaluation Perspective. Need to choose: product testing, controlled comparisons. Need to assess: system performance, outcome research (e.g., social programs). Need to understand: basic research.
3
4	Existing Models. Library Effectiveness: circulation, collection size, reference encounters, satisfaction. Information Retrieval: recall/precision tradeoff, satisfaction.
5	Library Effectiveness. Count stuff: volumes, circulations, reference questions. Transaction log equivalents in DLs?
6	Increased Usage at BLS
7	Decreased Rates of Growth at LC and BLS
8	Length of session at BLS October 2000
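Session length, as a transaction-log counterpart to the traditional library counts, depends on how sessions are cut out of the log. The slides do not say how the BLS sessions were defined, so the following is only a minimal sketch under stated assumptions: the log is reduced to hypothetical `(user_id, timestamp)` pairs, and a gap of more than 30 minutes between requests is assumed to start a new session. Both the record layout and the timeout are illustrative choices, not the method behind the BLS figures.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def session_lengths(events, timeout=timedelta(minutes=30)):
    """Return session durations in seconds.

    events  -- iterable of (user_id, timestamp) pairs (hypothetical log format)
    timeout -- assumed inactivity gap that ends a session
    """
    by_user = defaultdict(list)
    for user_id, ts in events:
        by_user[user_id].append(ts)

    lengths = []
    for stamps in by_user.values():
        stamps.sort()
        start = prev = stamps[0]
        for ts in stamps[1:]:
            if ts - prev > timeout:
                # Gap too long: close the current session, open a new one.
                lengths.append((prev - start).total_seconds())
                start = ts
            prev = ts
        # Close the last open session for this user.
        lengths.append((prev - start).total_seconds())
    return lengths


# Made-up example log, not real BLS data.
log = [
    ("a", datetime(2000, 10, 2, 10, 0)),
    ("a", datetime(2000, 10, 2, 10, 5)),
    ("a", datetime(2000, 10, 2, 10, 20)),
    ("a", datetime(2000, 10, 2, 11, 30)),
    ("b", datetime(2000, 10, 2, 9, 0)),
]
print(sorted(session_lengths(log)))
```

A single-request session gets length zero under this definition, which is one reason such counts need the contextual interpretation that user-centered evaluation calls for.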
9	User-Centered Library Evaluation. D’Elia & Walsh LQ article (physical libraries). Satisfaction complexity (direct, indirect). Results must be contextualized. See LibQual (www.libqual.org) for ARL/TAMU. See http://www.vuw.ac.nz/~agsmith/evaln/. See http://www.library.ucla.edu/libraries/college/help/critical/. MIT Press book, Fall 03.
10	IR Evaluation. Recall and precision metrics. System performance (e.g., response time, broken links). Satisfaction. Usability?
11	Claims. Today’s IR systems are not comparable to paper-based systems: bibliographic, full-text, and multimedia IR systems are not comparable. Complex systems are greater than the sum of their respective components: systems that include human components are inherently complex. Information seeking is an interactive process: different users, domains, and settings require distinct IR system capabilities.
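For reference, recall and precision for a single query can be computed from the set of retrieved documents and the set of documents judged relevant. The sketch below uses the standard set-based definitions; the function name and the document identifiers are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant retrieved / all retrieved
    recall    = relevant retrieved / all relevant
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 judged relevant, 2 in common.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d2", "d4", "d7"]
precision, recall = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.50 recall=0.67
```

Both numbers describe the match between documents and a query only; they say nothing about satisfaction or usability, which is why those appear as separate items.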
12	Retrieval as Matching Documents to Queries
13	Information-Seeking Process
14	Evaluate Systems. TREC ad hoc and routing evaluations. TREC interactive track introduces the user as a component, but not the problems, perceived needs, and actions. Hybrid solutions: human + automatic; statistical + natural language processing.
15	Evaluate Actions: Medical Case. Does the patient recover? Were good decisions made? Patient, physician, hospital, HMO views? Difficult (impossible?) to disambiguate component effects. Task-oriented studies (e.g., Hersh’s medical student decisions).
16	Evaluate Interactions. Think-aloud protocols; observations, transaction log analysis; interviews, stimulated recall; error analysis; time on task; cost-benefit analysis; questionnaires; simulations.
17	New, User-oriented Questions. Given many relevant documents, which can be most easily processed/understood? What are the cost-benefits to different stakeholders? What are the organizational/institutional changes due to a system? What are the most useful surrogates (representations) for multimedia objects? How to best integrate results: multiple retrieved sets, multiple evaluation efforts?
18	Alternative Strategies. Consider the information seeker’s context. Cognitive accessibility (it does not matter how good the results are if the information cannot be easily understood). Cost-benefit assessment (it does not matter how good the results are if there is no time to use them). Study special populations (cell biologist vs. practicing physician). Usability testing approach (iterative, impressionistic). Systematic case studies. Epidemiology approach (start with outcomes and trace influences). Develop an IR interaction model.
19	The Perseus Case. Multiple stakeholders, methods, and components. A set of evaluation questions (learning, teaching, system, publishing). Longitudinal effects: mechanical advantages, side effects, new types of learning and teaching, systemic change.
20	Evaluating New Systems. “We may never know quantitatively the impact of these combined effects, partly because we don’t know what would have happened without the collaboratory.” William Wulf, The National Collaboratory--A White Paper, 1989