1	Design and Evaluation Challenges 2004 Digital Library Colloquium Series University of Pittsburgh-Carnegie Mellon University April 16, 2004
2	Preview Interplay between basic research; system development and evaluation; system operation and sustainability Overview of Open Video DL as a system Focus on user studies that have informed redesign and future systems, contributed to our understanding of how people make sense of video
3	Top View Digital video a burgeoning DL challenge Substantial activity on storage, retrieval Many large-scale DLs Informedia, Físchlár, ECHO, Internet Archive Prelinger Collection, Open Video Most attention on system/collection building Commercial attention on system and management IBM, MERL, Microsoft, Artesia, Virage NIST TREC Video Track for retrieval evaluation Crucial need for evaluation that includes human factors
4	Open Video Vision/Contributions An open repository of video files that can be re-used in a variety of ways by the education and research communities Encourages contributions A testbed for interactive interfaces An easy to use DL based upon the agile views interface design framework Multiple, cascading, easy to control views (pre, over, re, shared, peripheral) Views based upon empirically validated surrogates An environment for building theory of human information interaction A set of methods and metrics that reveal how people understand digital video through surrogates
5	Background & Status Begun 1995 with colleagues at UMD & BCPS Current funding: NSF# IIS-0099538 Collaborators/Contributors: I2-DSI, ibiblio, CMU, UMD, NIST, Internet Archive, NASA, CHI community ~2000+ video segments ~1400 different titles ~24000 unique visitors per month (March 04) ~3,000,000 hits/month (March 04) I2-DSI video channel MPEG-1, MPEG-2, MPEG-4, QT OAI provider Ongoing user studies
6	Open Video As a System
8	Backend Tools and Services Workstations, servers, disk arrays Tape players (VHS, Beta SP, PAL), digitization boards (e.g., Broadway), and software for AVI/MOV to MPEG-1, MPEG-2, and QuickTime (Media Cleaner, Adobe Premiere, Final Cut Pro) Bandwidth (UNC-CH switched ethernet) Linux OS, PHP scripting language, MySQL DBMS, Apache server
9	Backend Tools and Services (cont’d) Merit (UMCP UMIACS), ported to Linux to extract candidate keyframes Speech to text (e.g., Sphinx at CMU) VAST keyframe/posterframe extraction, selection, and management Transaction logs and scripts (for evaluation and for recommenders) Peer to peer exchange ISEE (shared remote video use, e.g., DE) Indexer workstation (VIVO)
10	Tools and Services for User Studies Database driven web pages for user interaction Usability workstation (multiple camera, mixer, VCR) eye tracking system Speech synthesis (for audio keywords) Java and Perl scripts for managing, moving files, managing server (security, upgrades, etc.)
17	Agile Views Interface Research Provide a variety of access representations (e.g., indexes) and control mechanisms Usual search and browse capabilities Leverage both visual and linguistic cues Create and test surrogates for overview, preview, shared, and history views
18	Digital Video Surrogates Classes Textual Visual Audio Cost benefit analysis: maximize ‘meaning’ per unit time Transmission time Compaction rate Cognitive processing time Performance vs. Preference
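A minimal sketch of the compaction-rate idea, assuming it means the ratio of full-video duration to surrogate viewing time (the slides do not define it formally). The function name `compaction_rate` and the example durations are hypothetical, chosen only for illustration.

```python
def compaction_rate(full_seconds, surrogate_seconds):
    """Ratio of full-video duration to surrogate viewing time (assumed definition)."""
    if full_seconds <= 0 or surrogate_seconds <= 0:
        raise ValueError("durations must be positive")
    return full_seconds / surrogate_seconds


# Hypothetical example: a 5-minute segment summarized by a 15-second surrogate.
rate = compaction_rate(5 * 60, 15)
print(f"{rate:.0f}:1")  # prints "20:1"
```

A higher ratio saves viewing time, but the performance results decide whether the surrogate still conveys enough meaning for the task.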
19	Research Framework
20	Surrogates Examined Storyboard with text keywords (20-36 per board @ 500 ms) Storyboard with audio keywords Slide show with text keywords (250 ms repeated once) Slide show with audio keywords Fast forwards 32X, 64X, 128X, 256X Poster frames (1-3) Real time clips/excerpts (7 sec) Text Visual features (e.g., in/out, people, etc.)
21	Surrogate Examples
22	Metrics
23	User Studies Qualitative Comparison of Surrogates (Spring 02, ECDL 02) Fast Forwards (Fall 02, JCDL 03) Text or Pictures (Spring 03, CIVR 03) Narrativity (CHI 02, ASIST 03) Shared views and History Views (Geisler dissertation) TREC evaluation (Spring/summer 03) ViSOR (Gruss Master’s paper) Look vs Read (Hughes Master’s paper) Current studies
24	Exploratory Study to Constrain Surrogate Design Space (Spring 02) What are the strengths and weaknesses of different surrogates from the users’ perspective? Are any of the surrogates better than the others in supporting user performance?
25	The Surrogates Storyboard with text keywords (20-36 per board @ 500 ms) Storyboard with audio keywords Slide show with text keywords (250 ms repeated once) Slide show with audio keywords Fast forward (~ 4X)
26	Method 7 video segments (2-10 min), 5 surrogates created for each 10 subjects with high video and computer experience Three phases (all multi-camera videotaped) View full video then use 3 surrogates, repeat Participant observation and debriefing Do NOT view full video, use 3 surrogates, repeat Participant observation and debriefing Complete 3 assigned tasks with surrogates of choice Think aloud and debriefing
27	Tasks Gist determination—free text Gist determination—multiple choice Object recognition—textual Object recognition—graphical Action recognition (2-3 second clips) Visual gist (predict which frames belong)
28	Performance No SRD on gist (both free text and multiple choice) SRD on action recognition favoring ff ‘Near’ SRD on text object recognition favoring SB w/ audio keywords 4:1 to 29:1 compaction rates suitable for tasks Psychometric and face validity support for the tasks (means and variances; relevant to real tasks) SRD in gist and visual gist for one video → Homogeneity of frames diminishes surrogate value → Keywords help when visual variability decreases
29	Qualitative Results Subjects suggested different surrogates for different tasks (e.g., ff for judging kid safe, sb for identifying images, ff for video styles) Three senses of gist Topic (T) Narrativity (N) T+N+visual style Individual preferences and experiences influence surrogate effectiveness
30	Fast Forward Study (Fall 02) How fast can we make fast forwards? 4 ff conditions (32X, 64X, 128X, 256X) Four video segments for each condition 45 subjects (1/2 UG, 1/2 grad, 2/3 female) 6 tasks (full text gist, multiple choice gist, word object recognition, graphical object recognition, action recognition, visual gist) Counterbalance speed and videos Web-driven experimental condition, 3-camera video tapes, single subject at a time in usability laboratory
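The slide says speed and videos were counterbalanced but not how. One common scheme is a cyclic Latin square, sketched below as an assumption rather than the study's documented procedure. The video labels and the function name `latin_square_assignment` are hypothetical.

```python
SPEEDS = ["32X", "64X", "128X", "256X"]
VIDEOS = ["A", "B", "C", "D"]  # hypothetical labels for four video segments


def latin_square_assignment(subject_index):
    """Pair each video with a speed by rotating the speed list once per subject."""
    shift = subject_index % len(SPEEDS)
    rotated = SPEEDS[shift:] + SPEEDS[:shift]
    return list(zip(VIDEOS, rotated))


# Over any four consecutive subjects, every video is seen at every speed once.
for subject in range(4):
    print(subject, latin_square_assignment(subject))
```

With 45 subjects the rotation cannot balance perfectly (45 is not a multiple of 4), so a real design would need to handle the remainder, for example by random assignment.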
31	Sample A. 9:19 at 32X
32	Sample B. 19:48 at 64X
33	Sample C. 14:00 at 128X
34	Sample D. 14:09 at 256X
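As a rough worked example, the viewing time of each fast-forward sample can be estimated by dividing the original length by the speed-up factor. This assumes the times on the sample slides are full-video durations in minutes:seconds, which the slides do not state explicitly. The helper names are hypothetical.

```python
# (duration as "mm:ss", speed-up factor) taken from the four sample slides
SAMPLES = {
    "A": ("9:19", 32),
    "B": ("19:48", 64),
    "C": ("14:00", 128),
    "D": ("14:09", 256),
}


def to_seconds(mmss):
    """Convert an "mm:ss" string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def ff_seconds(mmss, speedup):
    """Estimated playback time of a fast-forward surrogate, in seconds."""
    return to_seconds(mmss) / speedup


for name, (length, speedup) in SAMPLES.items():
    print(f"Sample {name}: {ff_seconds(length, speedup):.1f} s at {speedup}X")
```

Under that assumption the surrogates run from roughly 18 seconds (Samples A and B) down to about 3 seconds (Sample D).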
35	Example Image Recognition Stimulus
36	Results SRD on 4 of 6 tasks as speed increases; however, reasonable performance at even the highest rate Video content/genre interacts with performance Preference does not parallel performance (people can perform well under extreme conditions but do not like/enjoy them) No user characteristic differences (age, sex) → Give users control but select appropriate defaults Caveat: controlled, independent focus on FF, likely a lower bound on performance
37	Speed Effects on Performance
38	Text or Pictures? (Spring 03) Research Questions: Given both textual and visual metadata; which surrogate will be utilized, which surrogate will be preferred? Does the placement of the surrogates affect how they are used? Does the assigned task affect how surrogates are used? Does personal preference play a role in how surrogates are used?
39	Study Methods / Procedures 12 undergraduate students (paid volunteers) Pre-Study questionnaire Demographics Visual vs. Verbal learning style (VVQ) 10 search problems Counter-balanced Design 1 and 2 1 : text on left / visuals on right 2 : visuals on left / text on right Eyetracking Post-study questionnaire Follow up questions
40	Results All participants over all tasks: Mean time looking at text = 29.7 sec. Mean time looking at pics = 6.8 sec. 75% of fixations over text 18% of fixations over pics First fixations over text = 65 First fixations over pics = 54 Text requires and gets more user attention
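A short calculation using only the figures reported on this slide puts the attention gap in proportion. The variable names are illustrative, and treating the leftover fixations as falling on neither surrogate is an assumption.

```python
# Figures reported on the slide
mean_text_sec, mean_pics_sec = 29.7, 6.8
pct_fix_text, pct_fix_pics = 75, 18
first_fix_text, first_fix_pics = 65, 54

# How many times longer participants looked at text than at pictures
time_ratio = mean_text_sec / mean_pics_sec

# Fixations on neither surrogate (assumed to be elsewhere on the page)
pct_fix_other = 100 - pct_fix_text - pct_fix_pics

# Share of first fixations that landed on text
first_fix_text_share = first_fix_text / (first_fix_text + first_fix_pics)

print(f"time ratio text:pics = {time_ratio:.1f}:1")
print(f"fixations elsewhere  = {pct_fix_other}%")
print(f"first fixations on text = {first_fix_text_share:.0%}")
```

Text drew about 4.4 times the viewing time and 75% of all fixations, but only about 55% of first fixations, so pictures attracted initial glances nearly as often as text did.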
41	Results cont’d Design 1 vs. Design 2 When text was placed on the left, mean time per fixation was slightly higher VVQ Balanced group spent more time looking at text Tasks Varied by task: Time spent looking at text Time spent per fixation over text Frequency of fixations over text
42	Screen Shots
43	Screen Shots
44	Screen Shots
45	Tasks Please find a video that discusses the destruction earthquakes can do to buildings. These search results are from a search on the word “Earthquake”. Please find a video that discusses nurses and their contributions to the United States Army. These search results are from a search on the word “Work”. Please choose a video from the following list that you think would be entertaining for you and your friends to watch.
46	Discussion In this restricted situation (i.e., pre-formulated results page) participants used text as the main anchor point. Because text is a better surrogate? Because text contains more information? Because text is more familiar to people? Because tasks directed users to text?
47	Text or Pictures? Text was reported as: Being the search anchor Containing significant topical information Taking longer to read than pictures Visuals were reported as: Being globally liked Being used to quickly narrow down choices Taking less time to decode than text All participants said the results page would be weaker without them Often lacking in reference points
48	Conclusion Visual metadata was used to make (confirm???) relevance judgments Combination of visual & verbal stronger than one or the other Generalize with caution: Small number of study participants Specific set of search results pages Ten specific search tasks.
49	Narrativity Study (CHI 02) CHI walk up kiosk, 20 people used 20 one-minute clips (half b&w, no audio) selected on 2 criteria: contain characters, have cause/effect relations between scenes (5 in each category) SRD on chars, cause, and interaction
50	Shared Views and History Views Studies (02-03) Evaluate AV Design Framework by instantiating and evaluating a design Shared (based on recommendations) and History Views (based on logs) Phase 1: compare OV to Views interface (28 participants). OV>accuracy; NSRD on time, but learning effect; AV>navigation/efficiency; AV>satisfaction Phase 2: qualitative analysis of shared and history views
51	VisOR study (Fall 03) Interface effects of automatically extracted features (TREC 02 features); 17 subjects each doing 14 search tasks Sliders to adjust weights of different features did not affect performance Keywords, indoors/outdoors and cityscape/landscape most useful Use of color and brightness helped with exact match searches General satisfaction with using different features
52	Look vs Read Study (Sp 03) Twelve subjects think aloud while viewing results pages for five search tasks with text (titles, descriptions) or visual (3 keyframes, storyboard) surrogates Surrogates used differently depending on task; neither primary with considerable switching and combining (e.g., find airplane, most used visual first) Time a factor in deciding which to use and when
53	TREC 03 Study Compare transcript only, feature only, and combined surrogates with 36 subjects NSRD in precision across the 3 surrogates; transcript only and combined yielded SR higher recall in less time and SR greater satisfaction.
54	Current Studies Relative value of surrogates in context Four sets of surrogates (ff, sb, excerpt, combined) compared (Spring 04) Mu dissertation: cognitive load effects on collaborative learning with video (ISEE) Investigation of tasks Yang dissertation: how do people make relevance judgments about video?
55	Take Away Summary User studies inform good design Give people multiple views and easy control mechanisms No silver bullets (many factors determine performance and preference) Video offers new kinds of potentials for learning and communication