User-Centered Evaluation of Digital Libraries

Gary Marchionini

University of North Carolina

march@ils.unc.edu

Evaluation Perspective

Need to choose

product testing

controlled comparisons

Need to assess

system performance

outcome research (e.g., social programs)

Need to understand

basic research

Existing Models

Library Effectiveness

circulation

collection size

reference encounters

satisfaction

Information Retrieval

recall/precision tradeoff

satisfaction

Library Effectiveness

Count stuff

Volumes, circulations, reference questions

Transaction log equivalents in DLs?

Increased Usage at BLS

Decreased Rates of Growth at LC and BLS

Length of session at BLS October 2000
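One possible digital-library counterpart to counting volumes and circulations is to derive session counts and session lengths from a transaction log. The sketch below is illustrative only: the `(user_id, timestamp)` record layout, the function name `session_lengths`, and the 30-minute inactivity timeout are assumptions, not details of the BLS or LC logs.

```python
from datetime import datetime, timedelta

def session_lengths(events, timeout=timedelta(minutes=30)):
    """Return session durations in seconds.

    events: iterable of (user_id, datetime) pairs (hypothetical log format).
    A gap longer than `timeout` between a user's requests starts a new session.
    """
    by_user = {}
    for user, ts in events:
        by_user.setdefault(user, []).append(ts)

    lengths = []
    for stamps in by_user.values():
        stamps.sort()
        start = prev = stamps[0]
        for ts in stamps[1:]:
            if ts - prev > timeout:
                # Close the current session and open a new one.
                lengths.append((prev - start).total_seconds())
                start = ts
            prev = ts
        lengths.append((prev - start).total_seconds())
    return lengths

# Made-up example log, not real data.
log = [
    ("a", datetime(2000, 10, 2, 10, 0)),
    ("a", datetime(2000, 10, 2, 10, 5)),
    ("a", datetime(2000, 10, 2, 10, 20)),
    ("a", datetime(2000, 10, 2, 11, 30)),
    ("b", datetime(2000, 10, 2, 10, 10)),
    ("b", datetime(2000, 10, 2, 10, 12)),
]
lengths = session_lengths(log)
```

Single-request sessions come out with length zero here, which is one reason such counts need to be interpreted in context.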

User Centered Library Evaluation

D’Elia & Walsh LQ article (physical libraries)

Satisfaction complexity (direct, indirect)

Results must be contextualized

See LibQual (www.libqual.org) for ARL/TAMU

See http://www.vuw.ac.nz/~agsmith/evaln/

See http://www.library.ucla.edu/libraries/college/help/critical/

MIT Press book Fall 03

IR Evaluation

Recall and Precision metrics

System performance (e.g., response time, broken links, etc.)

Satisfaction

Usability?
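As a reminder of what the recall and precision metrics measure, here is a minimal sketch using the standard set-based definitions (precision = relevant retrieved / retrieved; recall = relevant retrieved / all relevant). The function name and document identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Compute set-based precision and recall for one query."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Made-up example: four documents retrieved, three judged relevant.
retrieved_ids = ["d1", "d2", "d3", "d4"]
relevant_ids = ["d2", "d4", "d7"]
p, r = precision_recall(retrieved_ids, relevant_ids)
```

Both numbers depend on relevance judgments made ahead of time, and neither says anything about whether the searcher could understand or use what was retrieved.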

Claims

Today’s IR systems are not comparable to paper-based systems.

bibliographic, full-text, and multimedia IR systems are not comparable

Complex systems are greater than the sum of their respective components.

systems that include human components are inherently complex

Information seeking is an interactive process.

different users, domains, and settings require distinct IR system capabilities

Retrieval as Matching Documents to Queries

Information-Seeking Process

Evaluate Systems

TREC ad hoc and routing evaluations

TREC interactive track

introduces the user as a component but not the problems, perceived needs, and actions

Hybrid solutions

human + automatic

statistical + natural language processing

Evaluate Actions: Medical Case

Does the patient recover?

Were good decisions made?

patient, physician, hospital, HMO views?

Difficult (impossible?) to disambiguate component effects

Task-oriented studies (e.g., Hersh’s medical student decisions)

Evaluate Interactions

Think aloud protocols

Observations, Transaction log analysis

Interviews, Stimulated recall

Error analysis

Time on task

Cost-benefit analysis

Questionnaires

Simulations
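Two of the measures listed above, time on task and error analysis, reduce to simple per-condition summaries once observations are recorded. The following is a sketch under assumed conditions: the record fields (`system`, `seconds`, `errors`) and the trial values are invented for illustration.

```python
from statistics import mean

def summarize(trials):
    """Group trial records by system and average time on task and error count."""
    grouped = {}
    for t in trials:
        grouped.setdefault(t["system"], []).append(t)
    return {
        system: {
            "mean_seconds": mean(row["seconds"] for row in rows),
            "errors_per_trial": mean(row["errors"] for row in rows),
        }
        for system, rows in grouped.items()
    }

# Made-up observations for two hypothetical systems.
trials = [
    {"system": "A", "seconds": 90, "errors": 1},
    {"system": "A", "seconds": 110, "errors": 3},
    {"system": "B", "seconds": 60, "errors": 0},
    {"system": "B", "seconds": 80, "errors": 1},
]
summary = summarize(trials)
```

Averages like these are easier to interpret alongside the qualitative methods in the same list, such as think-aloud protocols and interviews.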

New, User-oriented Questions

Given many relevant documents, which can be most easily processed/understood?

What are the cost-benefits to different stakeholders?

What are the organizational/institutional changes due to a system?

What are the most useful surrogates (representations) for multimedia objects?

How to best integrate results

multiple retrieved sets

multiple evaluation efforts

Alternative Strategies

Consider the information seeker’s context

Cognitive accessibility (it does not matter how good the results are if the information cannot be easily understood)

Cost-benefit assessment (it does not matter how good the results are if there is no time to use them)

Study special populations (cell biologist vs. practicing physician)

Usability testing approach (iterative, impressionistic)

Systematic case studies

Epidemiology approach (start with outcomes and trace influences)

Develop an IR interaction model

The Perseus Case

Multiple stakeholders, methods, and components

A set of evaluation questions (learning, teaching, system, publishing)

Longitudinal effects

mechanical advantages

side effects

new types of learning and teaching

systemic change

Evaluating New Systems

“We may never know quantitatively the impact of these combined effects, partly because we don’t know what would have happened without the collaboratory.”

William Wulf, The National Collaboratory--A White Paper, 1989

