Open Video Project Overview

INLS 235

Spring 2003

Open Video Project

Goals

Create an open source DL for use by researchers, students, and the public.

A testbed for interactive interfaces

An environment for building theory of human information interaction

Ongoing work: begun in 1995 with colleagues at UMD

Current funding: NSF# IIS-0099538, NCNI

Collaborators/Contributors: I2-DSI, ibiblio, CMU, UMD, NIST, Internet Archive, NASA

www.open-video.org


Current Status

~ 0.5 TB of content

~2000 video segments

~1200 different titles

~1800 unique visitors per month

I2-DSI video channel

OAI provider

Ongoing user studies
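As a hedged illustration of what "OAI provider" means for a harvester: an OAI-PMH repository answers HTTP requests whose `verb` argument names the operation (for example `Identify` or `ListRecords`). The sketch below only builds request URLs. The base URL and the helper name `oai_request_url` are placeholder assumptions; the slides do not give the project's actual OAI endpoint.

```python
from urllib.parse import urlencode

# Hypothetical base URL; the slides do not state the real OAI endpoint.
BASE_URL = "http://example.org/oai"


def oai_request_url(verb, base_url=BASE_URL, **arguments):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    params = {"verb": verb}
    params.update(arguments)
    return base_url + "?" + urlencode(params)


# Ask the repository to describe itself.
identify_url = oai_request_url("Identify")

# Ask for all records in unqualified Dublin Core, the metadata format
# every OAI-PMH repository is required to support.
records_url = oai_request_url("ListRecords", metadataPrefix="oai_dc")

print(identify_url)
print(records_url)
```

A harvester would fetch these URLs and parse the XML responses; that step is left out here because it depends on the real endpoint.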


Backend Tools and Services

Workstations, servers, disk arrays

Tape players (VHS, Beta SP), digitization boards (e.g., Broadway), and software for AVI/MOV to MPEG-1, MPEG-2, and QuickTime

Bandwidth (UNC-CH switched ethernet)

Linux OS, PHP scripting language, MySQL DBMS, Apache server

Backend Tools and Services (cont’d)

Merit (UMCP UMIACS), ported to Linux to extract candidate keyframes

Speech to text (e.g., Sphinx at CMU)

VAST keyframe/posterframe extraction, selection, and management

Transaction logs and scripts (for evaluation and for recommenders)

Peer to peer exchange

ISEE (shared remote video use, e.g., DE)

Indexer workstation

Tools and Services for User Studies

Database driven web pages for user interaction

Usability workstation (multiple camera, mixer, VCR)

Eye tracking system

Speech synthesis (for audio keywords)

Java and Perl scripts for managing and moving files and for managing the server (security, upgrades, etc.)

Agile Views Interface

Provide a variety of access representations (e.g., indexes) and control mechanisms

Usual search and browse capabilities

Leverage both visual and linguistic cues

Create and test surrogates for overview and preview

Browse: by Categories & Attributes

Search: by Category & Attribute

Search: by Free Text & Keyword

Search Results

Segment Details

Video Transcript Text

Video Segment Preview

AgileViews Overview – Genre: Documentary

AgileViews Overview – Genre: Education

AgileViews Overview – Color/B&W

Previews

Agile Views Preview – Faces

Agile Views Preview – Superimposition

Agile Views Preview – Brightness

User Study Research Agenda

Exploratory Study

What are the strengths and weaknesses of different surrogates from the users’ perspective?

Are any of the surrogates better than the others in supporting user performance?

The Surrogates

Storyboard with text keywords (20-36 per board @ 500 ms)

Storyboard with audio keywords

Slide show with text keywords (250 ms, repeated once)

Slide show with audio keywords

Fast forward (~ 4X)

Method

7 video segments (2-10 min), 5 surrogates created for each

10 subjects with high video and computer experience

Three phases (all multi-camera videotaped)

View full video then use 3 surrogates, repeat

Participant observation and debriefing

Do NOT view full video, use 3 surrogates, repeat

Participant observation and debriefing

Complete 3 assigned tasks with surrogates of choice

Think aloud and debriefing

http://www.open-video.org/experiments/chi-2002/methods/study1.mov

Tasks

Gist determination—free text

Gist determination—multiple choice

Object recognition—textual

Object recognition—graphical

Action recognition (2-3 second clips)

Visual gist (predict which frames belong)

http://www.open-video.org/experiments/chi-2002/surrogates/index.html

Preferences

In debriefing after each phase, subjects were asked about their preferences.

Some preferences changed over the phases

2 subjects preferred fast forward (ff)

4 subjects said they would prefer ff if audio keywords were added

1 subject preferred storyboard with audio keywords

2 subjects preferred slide show (ss) with audio keywords

→ Drop ss with text keywords; develop ff

Performance

No SRD on gist (both free text and multiple choice)

SRD on action recognition favoring ff

‘Near’ SRD on text object recognition favoring storyboard (sb) with audio keywords

4:1 to 29:1 compaction rates suitable for tasks

Psychometric and face validity support for the tasks (means and variances; relevant to real tasks)

SRD in gist and visual gist for one video

→ Homogeneity of frames diminishes surrogate value

→ Keywords help when visual variability decreases
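To make the compaction rates above concrete, here is a minimal sketch. It assumes that "compaction rate" means full-video duration divided by surrogate viewing time, which the slides do not define explicitly, and that "repeated once" means the slide show is played twice in total. The function names and the example segment length and keyframe count are illustrative, not figures from the study.

```python
def compaction_ratio(full_seconds, surrogate_seconds):
    """Assumed definition: full video duration / surrogate viewing time."""
    if surrogate_seconds <= 0:
        raise ValueError("surrogate duration must be positive")
    return full_seconds / surrogate_seconds


def fast_forward_seconds(full_seconds, speedup):
    """Viewing time of a fast forward played at `speedup` times normal speed."""
    return full_seconds / speedup


def slide_show_seconds(n_keyframes, ms_per_frame=250, passes=2):
    """Viewing time of a slide show; passes=2 models 'repeated once' (assumption)."""
    return n_keyframes * ms_per_frame * passes / 1000.0


# Illustrative values only: a 10-minute segment and 36 keyframes.
full = 10 * 60
ff_ratio = compaction_ratio(full, fast_forward_seconds(full, 4))
ss_ratio = compaction_ratio(full, slide_show_seconds(36))

print(f"fast forward at 4X: {ff_ratio:.1f}:1")
print(f"slide show, 36 keyframes: {ss_ratio:.1f}:1")
```

Under this assumed definition, a 4X fast forward always yields 4:1 regardless of segment length, while the slide show ratio grows with the length of the source video.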

Qualitative Results

Subjects suggested different surrogates for different tasks (e.g., ff for judging kid safe, sb for identifying images, ff for video styles)

Three senses of gist

Topic (T)

Narrativity (N)

T+N+visual style

Individual preferences and experiences influence surrogate effectiveness

Fast Forward Study

How fast can we make fast forwards?

4 ff conditions (32X, 64X, 128X, 256X)

Four video segments for each condition

45 subjects

6 tasks (full text gist, multiple choice gist, word object recognition, graphical object recognition, action recognition, visual gist)
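One simple way to produce the four fast forward conditions listed above is to keep every Nth frame of the source video. The sketch below assumes that approach, and it also assumes a 30 frames-per-second source and a 5-minute segment; the slides do not say how the surrogates were generated or give these figures, and `fast_forward_indices` is a hypothetical helper.

```python
def fast_forward_indices(n_frames, speedup):
    """Indices of the frames kept when sampling every `speedup`-th frame."""
    if speedup < 1:
        raise ValueError("speedup must be at least 1")
    return list(range(0, n_frames, speedup))


FPS = 30                 # assumed source frame rate
N_FRAMES = 5 * 60 * FPS  # assumed 5-minute segment

for speedup in (32, 64, 128, 256):
    kept = fast_forward_indices(N_FRAMES, speedup)
    # Played back at the source frame rate, the surrogate lasts len(kept) / FPS seconds.
    print(f"{speedup}X: {len(kept)} frames, {len(kept) / FPS:.1f} s")
```

Because each doubling of the rate halves the number of frames shown, this makes it easy to see how little visual material remains at the highest rates.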

Preliminary Results

SRD on 4 of 6 tasks as speed increases; however, performance remains reasonable even at the highest rate

Video content/genre interacts with performance

Preference does not parallel performance (people can perform well under extreme conditions but do not enjoy them)

→ Give users control but select appropriate defaults

Next Steps

Poster frame and keyword placement effects using eye-tracking

Integrate surrogates into production system

User studies with overall system

New tools

Shared video study environment (ISEE)

Peer to peer sharing

Indexer’s toolkit

Audio??

Continue to build and sustain Open Video

Summary

Give people many ‘views’ to look ahead

Make these views easy to manipulate (agile)

Challenges

Mapping video characteristics to surrogates (e.g., keyframes, keywords), mapping surrogates to control mechanisms (e.g., mouse actions)

Automating production processes

