INLS 509:
Information Retrieval

Description: The field of information retrieval (IR) is concerned with the analysis, organization, storage, and retrieval of unstructured and semi-structured data. In this course, we will focus mostly on text. While IR systems are often associated with Web search engines (e.g., Google), IR applications also include digital library search, patent search, search for local businesses, and expert search, to name a few. Likewise, IR techniques (the underlying technology behind IR systems) are used to solve a wide range of problems, such as organizing documents into an ontology, recommending news stories to users, detecting spam, and predicting reading difficulty. This course will provide an overview of the theory, implementation, and evaluation of IR systems and IR techniques. In particular, we will explore how search engines work, how they "interpret" human language, what different users expect from them, how they are evaluated, why they sometimes fail, and how they might be improved in the future.
Prerequisites: There are no prerequisites for this course.
Expectations: Information retrieval is the study of computer-based solutions to a human problem. Thus, the first half of the course will be system-focused, while the second half will be user-focused. During the first half, you should expect to see some math (e.g., basic probability and statistics and some linear algebra). However, we will focus on the concepts rather than the details.

Students will have an opportunity to explore their interests with an open-ended literature review.

Time & Location: M, W 12:20-1:35 pm, Manning 307
Instructor: Jaime Arguello (email, web)
Office Hours: T, Th 11:00am-12:00pm, Manning 10 (Garden Level)
Required Textbook: Search Engines - Information Retrieval in Practice, W. B. Croft, D. Metzler, and T. Strohman. Cambridge University Press. 2009. Available on-line.
Additional Resources: Foundations of Statistical Natural Language Processing. C. Manning and H. Schutze. 1999.

Introduction to Information Retrieval. C. Manning, P. Raghavan and H. Schutze. 2008.
Other Readings: Selected papers and chapters from other books will sometimes be assigned for reading. These will be available online.
Course Policies: Laptops, Attendance, Participation, Collaboration, Plagiarism & Cheating, Late Policy
Grading: 30% homework (10% each)
15% midterm exam
15% final exam
30% literature review (5% proposal, 10% presentation, 15% paper)
10% participation
Grade Assignments: Undergraduate grading scale: A+ 97-100%, A 94-96%, A- 90-93%, B+ 87-89%, B 84-86%, B- 80-83%, C+ 77-79%, C 74-76%, C- 70-73%, D+ 67-69%, D 64-66%, D- 60-63%, F 0-59%

Graduate grading scale: H 95-100%, P 80-94%, L 60-79%, and F 0-59%.

All assignments, exams, and the literature review will be graded on a curve.
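As a rough illustration of how the grading weights and scales above fit together, the following Python sketch computes a weighted course percentage and looks up the corresponding letter. It is not an official grade calculator: the function names and the example scores are hypothetical, it assumes each component is scored out of 100, it treats the lower end of each range as a cutoff for fractional scores, and it does not model the curve.

```python
# Component weights from the Grading section (they sum to 100%).
WEIGHTS = {
    "homework": 0.30,           # three assignments at 10% each
    "midterm": 0.15,
    "final": 0.15,
    "literature_review": 0.30,  # proposal 5%, presentation 10%, paper 15%
    "participation": 0.10,
}

# Grading scales from the syllabus as (minimum percentage, letter) pairs,
# ordered from highest to lowest cutoff.
UNDERGRAD_SCALE = [
    (97, "A+"), (94, "A"), (90, "A-"),
    (87, "B+"), (84, "B"), (80, "B-"),
    (77, "C+"), (74, "C"), (70, "C-"),
    (67, "D+"), (64, "D"), (60, "D-"),
    (0, "F"),
]
GRAD_SCALE = [(95, "H"), (80, "P"), (60, "L"), (0, "F")]


def weighted_score(scores):
    """Combine per-component percentages (0-100) into one course percentage."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(percentage, scale):
    """Return the first letter whose cutoff the percentage meets.

    Assumption: a fractional score belongs to the band whose lower bound
    it reaches; the syllabus lists whole-number ranges only.
    """
    for minimum, letter in scale:
        if percentage >= minimum:
            return letter
    return scale[-1][1]


# Hypothetical scores, for illustration only.
example = {
    "homework": 90,
    "midterm": 80,
    "final": 85,
    "literature_review": 92,
    "participation": 100,
}
total = weighted_score(example)
print(round(total, 2), letter_grade(total, UNDERGRAD_SCALE))
```

Because all graded work is curved, the percentage that actually enters the letter lookup may differ from a raw weighted average like this one.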
Schedule: Subject to change! The required textbook (Croft, Metzler, and Strohman) is denoted as CMS below. Introduction to Information Retrieval (Manning, Raghavan, and Schutze) is denoted as MRS.
Lecture | Date | Events | Topic | Reading Due
1 | Wed. 8/24 | | Introduction to Information Retrieval: The Big Picture |
2 | Mon. 8/29 | | Course Overview: Roadmap and Expectations | CMS Ch. 1
3 | Wed. 8/31 | | Introduction To Ad-hoc Retrieval I | CMS Ch. 2, 5.3.0-5.3.3, 7.1.0-7.1.1
4 | Mon. 9/5 | Labor Day (No class) | |
5 | Wed. 9/7 | HW1 Out | Introduction To Ad-hoc Retrieval II |
6 | Mon. 9/12 | | Indexing and Query Processing |
7 | Wed. 9/14 | | Statistical Properties of Text | CMS Ch. 4.1-4.2
8 | Mon. 9/19 | | Text Representation I | CMS Ch. 4.3-4.7, MRS Ch. 2
9 | Wed. 9/21 | HW1 Due, HW2 Out | Text Representation II |
10 | Mon. 9/26 | | Retrieval Models: Vector Space I | CMS Ch. 7.0-7.1.2
11 | Wed. 9/28 | | Retrieval Models: Vector Space II |
12 | Mon. 10/3 | Literature Review Proposal Due | Retrieval Models: Query-likelihood I | CMS Ch. 7.3
13 | Wed. 10/5 | | Retrieval Models: Query-likelihood II |
14 | Mon. 10/10 | HW2 Due | Document Priors |
15 | Wed. 10/12 | | Evaluation Overview | CMS Ch. 8
16 | Mon. 10/17 | Midterm Review | Midterm Review |
17 | Wed. 10/19 | Midterm Exam | Midterm Exam |
18 | Mon. 10/24 | | Test Collection-based Evaluation I | Robertson '08, Sanderson '10 (pages 248-298)
19 | Wed. 10/26 | | Test Collection-based Evaluation II |
20 | Mon. 10/31 | HW3 Out | Evaluation Metrics | Hersh et al., '00, Turpin & Hersh '01, Sanderson '10 (pages 308-350)
21 | Wed. 11/2 | | Experimentation I | Smucker et al., '07, Cross-Validation, Parameter Tuning and Overfitting
22 | Mon. 11/7 | | Experimentation II |
23 | Wed. 11/9 | | Relevance | Saracevic '07
24 | Mon. 11/14 | HW3 Due | User Studies in Information Retrieval | Kelly '09 Chapter 10 (pgs. 99-125), Tombros et al., '05
25 | Wed. 11/16 | | Search-log Analysis | Joachims et al., '05, Dumais et al., '14
26 | Mon. 11/21 | | Federated Search I | Shokouhi and Si '11
27 | Wed. 11/23 | Thanksgiving (No class) | |
28 | Mon. 11/28 | | Federated Search II |
29 | Wed. 11/30 | | Student Presentations |
30 | Mon. 12/5 | | Student Presentations |
31 | Wed. 12/7 | | Student Presentations |
32 | Fri. 12/9 | Literature Review Due | |
33 | TBD | | Final Exam Review |
34 | Fri. 12/16 | Final Exam, Manning 307, 12-3pm | Final Exam |