Open Video Project
Overview
Open Video Project
Goals
Create an open source digital library (DL) for use by researchers, students, and the public.
A testbed for interactive interfaces
An environment for building a theory of human information interaction
Ongoing work: begun in 1995 with colleagues at UMD
Current funding: NSF #IIS-0099538, NCNI
Collaborators/Contributors: I2-DSI, ibiblio, CMU, UMD, NIST, Internet Archive, NASA
www.open-video.org
Current Status
~0.5 TB of content
~2000 video segments
~1200 different titles
~1800 unique visitors per month
I2-DSI video channel
OAI provider
Ongoing user studies
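As a rough illustration of what acting as an OAI provider involves, the sketch below builds OAI-PMH harvesting request URLs. The base URL is a hypothetical placeholder, not the project's actual endpoint; the `verb` and `metadataPrefix` arguments are standard OAI-PMH request parameters.

```python
from urllib.parse import urlencode

# Hypothetical endpoint for illustration only; not the project's real OAI base URL.
BASE_URL = "http://example.org/oai"


def oai_request(verb, **params):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    query.update(params)
    return BASE_URL + "?" + urlencode(query)


# Ask the repository to describe itself.
print(oai_request("Identify"))
# Harvest all records as unqualified Dublin Core.
print(oai_request("ListRecords", metadataPrefix="oai_dc"))
```

A harvester would fetch these URLs over HTTP and parse the XML responses; that step is omitted here.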
Backend Tools and Services
Workstations, servers, disk arrays
Tape players (VHS, Beta SP), digitization boards (e.g., Broadway), and software for converting AVI/MOV to MPEG-1, MPEG-2, and QuickTime
Bandwidth (UNC-CH switched Ethernet)
Linux OS, PHP scripting language, MySQL DBMS, Apache server
Backend Tools and Services (cont.)
Merit (UMCP UMIACS), ported to Linux to extract candidate keyframes
Speech to text (e.g., Sphinx at CMU)
VAST keyframe/posterframe extraction, selection, and management
Transaction logs and scripts (for evaluation and for recommenders)
Peer to peer exchange
ISEE (shared remote video use, e.g., DE)
Indexer workstation
Tools and Services for User Studies
Database driven web pages for user interaction
Usability workstation (multiple cameras, mixer, VCR)
Eye tracking system
Speech synthesis (for audio keywords)
Java and Perl scripts for managing and moving files and for managing the server (security, upgrades, etc.)
Agile Views Interface
Provide a variety of access representations (e.g., indexes) and control mechanisms
Usual search and browse capabilities
Leverage both visual and linguistic cues
Create and test surrogates for overview and preview
Browse: by Categories & Attributes
Search: by Category & Attribute
Search: by Free Text & Keyword
Search Results
Segment Details
Video Transcript Text
Video Segment Preview
AgileViews Overview – Genre: Documentary
AgileViews Overview – Genre: Education
AgileViews Overview – Color/B&W
Previews
Agile Views Preview – Faces
Agile Views Preview – Faces
Agile Views Preview – Superimposition
Agile Views Preview – Brightness
User Study Research Agenda
Exploratory Study
What are the strengths and weaknesses of different surrogates from the users’ perspective?
Are any of the surrogates better than the others in supporting user performance?
The Surrogates
Storyboard with text keywords (20-36 per board @ 500 ms)
Storyboard with audio keywords
Slide show with text keywords (250 ms, repeated once)
Slide show with audio keywords
Fast forward (~4X)
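The timing figures above translate directly into viewing times. The sketch below is a minimal illustration, assuming that "repeated once" means the keyframe sequence is shown twice in total and that the slide show uses a keyframe count in the same 20-36 range quoted for storyboards; both are assumptions, and the function names are hypothetical.

```python
def slide_show_seconds(n_keyframes, ms_per_frame=250, passes=2):
    """Viewing time for a slide show of keyframes.

    Assumes each keyframe is displayed for ms_per_frame milliseconds and
    the whole sequence is shown `passes` times (2 = shown, then repeated once).
    """
    return n_keyframes * ms_per_frame * passes / 1000.0


def fast_forward_seconds(video_seconds, speedup=4):
    """Viewing time for a fast forward played at `speedup` times normal speed."""
    return video_seconds / speedup


# Hypothetical example: 36 keyframes, and a 10-minute (600 s) segment.
print(slide_show_seconds(36))     # 18.0
print(fast_forward_seconds(600))  # 150.0
```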
Method
7 video segments (2-10 min), 5 surrogates created for each
10 subjects with high video and computer experience
Three phases (all multi-camera videotaped)
Phase 1: View full video, then use 3 surrogates; repeat. Participant observation and debriefing.
Phase 2: Do NOT view full video; use 3 surrogates; repeat. Participant observation and debriefing.
Phase 3: Complete 3 assigned tasks with surrogates of choice. Think aloud and debriefing.
http://www.open-video.org/experiments/chi-2002/methods/study1.mov
Tasks
Gist determination: free text
Gist determination: multiple choice
Object recognition: textual
Object recognition: graphical
Action recognition (2-3 second clips)
Visual gist (predict which frames belong)
http://www.open-video.org/experiments/chi-2002/surrogates/index.html
Preferences
In debriefing after each phase, subjects were asked about their preferences.
Some preferences changed over the phases
2 subjects preferred fast forward (ff)
4 subjects said they would prefer ff if audio keywords were added
1 preferred storyboard with audio keywords
2 preferred slide show with audio keywords
→ Drop slide show (ss) with text keywords; develop ff
Performance
No SRD on gist (both free text and multiple choice)
SRD on action recognition favoring ff
‘Near’ SRD on text object recognition favoring SB w/ audio keywords
4:1 to 29:1 compaction rates suitable for tasks
Psychometric and face validity support for the tasks (means and variances; relevant to real tasks)
SRD in gist and visual gist for one video
→ Homogeneity of frames diminishes surrogate value
→ Keywords help when visual variability decreases
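A compaction rate here can be read as the ratio of the full video's running time to the time needed to view its surrogate. The sketch below is a minimal illustration under that reading; the function names and the example durations are hypothetical, and only the 4:1 to 29:1 range comes from the study.

```python
def compaction_ratio(full_seconds, surrogate_seconds):
    """Ratio of full-video running time to surrogate viewing time."""
    if surrogate_seconds <= 0:
        raise ValueError("surrogate_seconds must be positive")
    return full_seconds / surrogate_seconds


def within_reported_range(ratio, low=4.0, high=29.0):
    """True if the ratio falls inside the 4:1 to 29:1 range reported above."""
    return low <= ratio <= high


# Hypothetical example: a 10-minute segment viewed through a 150-second surrogate.
ratio = compaction_ratio(600, 150)
print(ratio, within_reported_range(ratio))
```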
Qualitative Results
Subjects suggested different surrogates for different tasks (e.g., ff for judging kid safe, sb for identifying images, ff for video styles)
Three senses of gist: Topic (T); Narrativity (N); T+N+visual style
Individual preferences and experiences influence surrogate effectiveness
Fast Forward Study
How fast can we make fast forwards?
4 ff conditions (32X, 64X, 128X, 256X)
Four video segments for each condition
45 subjects
6 tasks (full text gist, multiple choice gist, word object recognition, graphical object recognition, action recognition, visual gist)
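To get a feel for the four speed conditions, the sketch below models a fast forward as keeping every Nth frame and playing the kept frames at the normal frame rate. This is an assumption for illustration; the slides do not state how the fast forwards were produced. The 30 fps rate and the 5-minute segment are likewise hypothetical.

```python
def fast_forward_indices(n_frames, speedup):
    """Indices of the frames kept when sampling every `speedup`-th frame."""
    if speedup < 1:
        raise ValueError("speedup must be at least 1")
    return list(range(0, n_frames, speedup))


def playback_seconds(n_frames, speedup, fps=30.0):
    """Playing time of the sampled frames at the original frame rate."""
    return len(fast_forward_indices(n_frames, speedup)) / fps


# Hypothetical 5-minute segment at 30 fps (9000 frames), under each condition.
for speedup in (32, 64, 128, 256):
    print(f"{speedup}X: {playback_seconds(9000, speedup):.2f} s")
```

Under these assumptions, even the slowest condition reduces a 5-minute segment to under ten seconds of viewing.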
Preliminary Results
SRD on 4 of 6 tasks as speed increases; however, performance is reasonable even at the highest rate
Video content/genre interacts with performance
Preference does not parallel performance (people can perform well under extreme conditions but do not like or enjoy them)
→ Give users control but select appropriate defaults
Next Steps
Poster frame and keyword placement effects using eye-tracking
Integrate surrogates into production system
User studies with overall system
New tools
Shared video study environment (ISEE)
Peer to peer sharing
Indexer’s toolkit
Audio??
Continue to build and sustain Open Video
Summary
Give people many ‘views’ to look ahead
Make these views easy to manipulate (agile)
Challenges
Mapping video characteristics to surrogates (e.g., keyframes, keywords), mapping surrogates to control mechanisms (e.g., mouse actions)
Automating production processes