INLS 509_002:
Information Retrieval

Description: The field of information retrieval (IR) is concerned with the analysis, organization, storage, and retrieval of unstructured and semi-structured data. In this course, we will focus mostly on text. While IR systems are often associated with Web search engines (e.g., Google), IR applications also include digital library search, patent search, search for local businesses, and expert search, to name a few. Likewise, IR techniques (the underlying technology behind IR systems) are used to solve a wide range of problems, such as organizing documents into an ontology, recommending news stories to users, detecting spam, and predicting reading difficulty. This course will provide an overview of the theory, implementation, and evaluation of IR systems and IR techniques. In particular, we will explore how search engines work, how they "interpret" human language, what different users expect from them, how they are evaluated, why they sometimes fail, and how they might be improved in the future.
Prerequisites: There are no prerequisites for this course.
Expectations: Information retrieval is the study of computer-based solutions to a human problem. Thus, the first half of the course will be system-focused, while the second half will be user-focused. During the first half, you should expect to see some math (e.g., basic probability and statistics and some linear algebra). However, we will focus on the concepts rather than the details.

Students will have an opportunity to explore their interests with an open-ended literature review.

Time & Location: M,W 2:30-3:45pm, Manning 208 (in-person starting on 1/31). Please review our Community Standards and Mask Use Policy.
Instructor: Jaime Arguello (email, web)
Office Hours: By appointment, Manning 10 (Garden Level)
Required Textbook: Search Engines - Information Retrieval in Practice, W. B. Croft, D. Metzler, and T. Strohman. Cambridge University Press. 2009. Available online.
Additional Resources: Foundations of Statistical Natural Language Processing. C. Manning and H. Schutze. 1999.

Introduction to Information Retrieval. C. Manning, P. Raghavan and H. Schutze. 2008.
Other Readings: Selected papers and chapters from other books will sometimes be assigned for reading. These will be available online.
Course Policies: Laptops, Attendance, Participation, Collaboration, Plagiarism & Cheating, Late Policy
Grading: 30% homework (10% each)
15% midterm exam
15% final exam
30% literature review (5% proposal, 10% presentation, 15% paper)
10% participation
Grade Assignments: Undergraduate grading scale: A+ 97-100%, A 94-96%, A- 90-93%, B+ 87-89%, B 84-86%, B- 80-83%, C+ 77-79%, C 74-76%, C- 70-73%, D+ 67-69%, D 64-66%, D- 60-63%, F 0-59%

Graduate grading scale: H 95-100%, P 80-94%, L 60-79%, and F 0-59%.

All assignments, exams, and the literature review will be graded on a curve.
Schedule: Subject to change! The required textbook (Croft, Metzler, and Strohman) is denoted as CMS below.
Lecture | Date | Events | Topic | Reading Due
1 | Mon. 1/10 | | Introduction to Text Mining: The Big Picture |
2 | Wed. 1/12 | | Course Overview: Roadmap and Expectations | CMS Ch. 1
3 | Mon. 1/17 | MLK Day (No Class) | |
4 | Wed. 1/19 | | Introduction To Ad-hoc Retrieval I | CMS Ch. 2, CMS 7.0-7.1
5 | Mon. 1/24 | HW1 Out | Introduction To Ad-hoc Retrieval II |
6 | Wed. 1/26 | | Indexing and Query Processing | CMS Ch. 5.0-5.3
7 | Mon. 1/31 | | Statistical Properties of Text I | CMS Ch. 4.0-4.2
8 | Wed. 2/2 | | Statistical Properties of Text II |
9 | Mon. 2/7 | HW1 Due | Text Representation | CMS Ch. 4.3-4.7, MRS Ch. 2
10 | Wed. 2/9 | HW2 Out | Vector Space Model I | CMS Ch. 7.0-7.1.2
11 | Mon. 2/14 | | Vector Space Model II |
12 | Wed. 2/16 | Literature Review Proposal Due | Query Likelihood Model I | CMS Ch. 7.3, CMS 4.5
13 | Mon. 2/21 | | Query Likelihood Model II |
14 | Wed. 2/23 | HW2 Due | Class Cancelled |
15 | Mon. 2/28 | | Document Priors |
16 | Wed. 3/2 | | Evaluation Overview | CMS Ch. 8
21 | Mon. 3/7 | Midterm Review | Midterm Review |
22 | Wed. 3/9 | Midterm | Midterm |
23 | Mon. 3/14 | Spring Break (No Class) | |
24 | Wed. 3/16 | Spring Break (No Class) | |
25 | Mon. 3/21 | | Test Collection Evaluation | Sanderson '10 (pages 248-298), Hersh et al., '00, Turpin & Hersh '01
26 | Wed. 3/23 | | Evaluation Metrics I | Sanderson '10 (pages 308-350)
27 | Mon. 3/28 | HW3 Out | Evaluation Metrics II |
28 | Wed. 3/30 | | Experimentation I | Smucker et al., '07, Cross-Validation, Parameter Tuning and Overfitting
29 | Mon. 4/4 | | Experimentation II |
30 | Wed. 4/6 | | A/B Testing I | Dmitriev et al., '17, Video Tutorial (Kohavi et al. '17)
31 | Mon. 4/11 | HW3 Due | Class Cancelled |
32 | Wed. 4/13 | | A/B Testing II |
33 | Mon. 4/18 | | Interactive Information Retrieval | Arguello & Crescenzi '19
34 | Wed. 4/20 | | Literature Review Presentations I |
35 | Mon. 4/25 | | Literature Review Presentations II |
36 | Wed. 4/27 | | Literature Review Presentations III |
37 | Fri. 4/29 | Literature Review Due | |
38 | Fri. 5/6 | Final Exam | |