INLS 609: Experimental Information Retrieval

Description: Information Retrieval (IR) is a broad field, encompassing a wide range of information-seeking tasks, such as web search, patent search, medical record search, and micro-blog search. While the underlying goal is the same (i.e., to retrieve relevant or useful content in response to an information request), different tasks require different solutions and methods of evaluation.

This course takes an in-depth look at experimental IR systems evaluated in annual community-wide evaluation forums such as TREC. Through weekly readings and in-class discussions, students will gain an understanding of different search problems and their best-performing solutions. Through a semester-long project, students will gain practical experience in putting together and evaluating an information retrieval system that addresses a particular information-seeking task.

Student groups will be strongly encouraged to put together a system that can participate in TREC 2017. However, this is not a requirement to do well in the course.

Prerequisites: INLS 509 (Information Retrieval) or consent of the instructor
In-Class Discussions: This is an individual assignment. Each student will be assigned a search task (or track) from TREC 2016 and will lead two back-to-back in-class discussions of 1 hour and 15 minutes each.

These two sessions will be divided into two parts. In the first part, the student will present a historical overview of the track and a survey of the best-performing systems from TREC 2016. In the second part, the student will lead a brainstorming session on new experimental solutions that might be competitive with the best-performing systems. This will account for 30% of the total grade. See the discussion leadership guidelines for helpful tips on being a good presenter and discussion moderator.

Term Projects: Each term project will focus on a particular information-seeking task and will use data (documents + relevance judgements) provided by TREC or INEX (2016 or earlier). The goal of each project will be to investigate and evaluate at least one "special sauce" component that might improve a baseline system's performance. Each project should be associated with a hypothesis of the form: System A + "special sauce" will outperform System A without "special sauce". It is not crucial for the "special sauce" to work in order for the project to be successful. It is more important to determine why it does or doesn't work.
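
To make the hypothesis concrete, the usual workflow is to score both runs on the same topics (e.g., with trec_eval against the track's relevance judgements) and then compare the per-topic scores with a paired significance test. The sketch below is purely illustrative and not a required part of the project; it assumes you have already exported per-topic scores (here, average precision) to two whitespace-delimited "topic_id score" files, and the file names are hypothetical placeholders.

```python
# Minimal sketch: compare System A (baseline) against System A + "special sauce"
# using per-topic average precision and a paired t-test.
# Assumes two whitespace-delimited files of "topic_id score" lines; the file
# names below are hypothetical placeholders, not course-provided data.
from scipy.stats import ttest_rel


def load_scores(path):
    """Read per-topic scores into a dict mapping topic_id -> score."""
    scores = {}
    with open(path) as f:
        for line in f:
            topic_id, score = line.split()
            scores[topic_id] = float(score)
    return scores


baseline = load_scores("baseline_ap.txt")       # System A
experimental = load_scores("sauce_ap.txt")      # System A + "special sauce"

# Compare only topics scored by both runs, in a fixed order.
topics = sorted(set(baseline) & set(experimental))
a = [baseline[t] for t in topics]
b = [experimental[t] for t in topics]

print(f"mean AP: baseline={sum(a)/len(a):.4f}, experimental={sum(b)/len(b):.4f}")

# Paired t-test over per-topic differences; a small p-value suggests the
# observed difference is unlikely to be due to chance alone.
t_stat, p_value = ttest_rel(b, a)
print(f"paired t-test: t={t_stat:.3f}, p={p_value:.4f}")
```

Whichever direction the comparison goes, the per-topic breakdown (which topics the "special sauce" helps and which it hurts) is what will support your explanation of why it does or doesn't work.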

Students must work in groups of two or three. Projects with three students will be expected to be more ambitious than projects with two students.

Time & Location: M/W 10:10am-11:25am, Manning 303
Instructor: Jaime Arguello
Office Hours: By Appointment, Manning 10 (Garden Level)
Recommended Textbook: Search Engines: Information Retrieval in Practice. W. B. Croft, D. Metzler, and T. Strohman. Addison-Wesley, 2009. Available online.
Additional Resources: Foundations of Statistical Natural Language Processing. C. Manning and H. Schütze. MIT Press, 1999.

Introduction to Information Retrieval. C. Manning, P. Raghavan, and H. Schütze. Cambridge University Press, 2008.

TREC 2016 Tracks:
Clinical Decision Support Track
Contextual Suggestion Track
Dynamic Domain Track
Live Question-Answering Track
Open Search Track
Real-time Summarization Track
Tasks Track
Total Recall Track
Course Policies: Laptops, Attendance, Participation, Collaboration, Plagiarism & Cheating, Late Policy
Grading: 20% class participation
30% in-class discussion (15% survey presentation, 15% brainstorming discussion leadership). See the Track Overview and Brainstorming Session Guidelines.
50% final project (10% project proposal, 30% project report, 10% project presentation)
Grade Assignments: Undergraduate grading scale: A+ 97-100%, A 94-96%, A- 90-93%, B+ 87-89%, B 84-86%, B- 80-83%, C+ 77-79%, C 74-76%, C- 70-73%, D+ 67-69%, D 64-66%, D- 60-63%, F 0-59%

Graduate grading scale: H 95-100%, P 80-94%, L 60-79%, and F 0-59%.

All graded components will be graded on a curve.
Acknowledgement: The structure of this course is inspired by Jamie Callan's Experimental Information Retrieval course at Carnegie Mellon University, which I took as a PhD student.
Schedule: Subject to change!
Lecture Date Events Topic
1 Wed. 1/11   Course Overview
2 Mon. 1/16 MLK Day (No Class)  
3 Wed. 1/18   History of TREC
4 Mon. 1/23   Overview of TREC 2016
5 Wed. 1/25   Experimentation Review
6 Mon. 1/30   Open-Source Toolkits
7 Wed. 2/1   Using the Killdevil Computer Cluster
8 Mon. 2/6   Real-time Summarization Track (Jaime)
9 Wed. 2/8   Real-time Summarization Track (Jaime)
10 Mon. 2/13   Tasks Track (Albert)
11 Wed. 2/15   Tasks Track (Albert)
12 Mon. 2/20   Clinical Decision Support Track (Jeff)
13 Wed. 2/22   Clinical Decision Support Track (Jeff)
14 Mon. 2/27   Contextual Suggestion Track (Katherine)
15 Wed. 3/1   Contextual Suggestion Track (Katherine)
16 Mon. 3/6   Dynamic Domain Track (Tripp)
17 Wed. 3/8   Dynamic Domain Track (Tripp)
18 Mon. 3/13 Spring Break (No Class)
19 Wed. 3/15 Spring Break (No Class)
20 Mon. 3/20   Clinical Decision Support - Experimentation
21 Wed. 3/22   Project Discussion
22 Mon. 3/27   Live QA Track (Pamela)
23 Wed. 3/29   Live QA Track (Pamela)
24 Mon. 4/3   OpenSearch Track (Jaime)
25 Wed. 4/5   Using Query Performance Predictors to Reduce Spoken Queries
26 Mon. 4/10 No Class
27 Wed. 4/12 No Class
28 Mon. 4/17   Total Recall Track (Bogeum)
29 Wed. 4/19   Total Recall Track (Bogeum)
30 Mon. 4/24   Student Presentations
31 Wed. 4/26   Student Presentations
32 Mon. 5/1 Final Project Due