INLS 613: Text Mining

Objective: Gain experience with both the theoretical and practical aspects of text mining. Learn how to build and evaluate computer programs that generate new knowledge from natural language text.
Description: Changes in technology and publishing practices have eased the task of recording and sharing textual information electronically. This increased quantity of information has spurred the development of a new field called text mining. The overarching goal of this new field is to use computers to automatically learn new things from textual data.

The course is divided into three modules: basics, principles, and applications (see details below). The third part of the course will focus on several applications of text mining: methods for automatically organizing textual documents for sense-making and navigation (clustering and classification), methods for detecting opinion and bias, methods for detecting and resolving specific entities in text (information extraction and resolution), and methods for learning new relations between entities (relation extraction). Throughout the course, a strong emphasis will be placed on evaluation. Students will develop a deep understanding of one particular method through a course project.

Prerequisites: Students should have a reasonable background in programming in a structured or object-oriented programming language, such as Java or C++. "Reasonable" means either coursework or equivalent practical experience. You should be able to design, implement, debug, and test small- to medium-sized programs. If you would like to take this course but do not know whether you meet these prerequisites, please send me an email.
Time & Location: M,W 10:10-11:25, Manning 01
Instructor: Jaime Arguello (email, web)
Office Hours: By Appointment, Manning 10 (Garden Level)
Required Textbook: Data Mining: Practical Machine Learning Tools and Techniques (Fourth Edition). Ian H. Witten, Eibe Frank, Mark A. Hall, and Christopher J. Pal. 2017. Morgan Kaufmann. ISBN 978-0128042915. Available online.
Additional Resources: Foundations of Statistical Natural Language Processing. C. Manning and H. Schütze. 1999.

Introduction to Information Retrieval. C. Manning, P. Raghavan, and H. Schütze. 2008.
Course Policies: Laptops, Attendance, Participation, Collaboration, Plagiarism & Cheating, Late Policy
Grading: 10% Class participation
20% Midterm Exam
30% Homework (10% each)
40% Final project (5% project proposal, 25% project report, 10% project presentation)
Grade Assignments: Undergraduate grading scale: A+ 97-100%, A 94-96%, A- 90-93%, B+ 87-89%, B 84-86%, B- 80-83%, C+ 77-79%, C 74-76%, C- 70-73%, D+ 67-69%, D 64-66%, D- 60-63%, F 0-59%

Graduate grading scale: H 95-100%, P 80-94%, L 60-79%, and F 0-59%.
Topics: Subject to change! Readings from the required textbook (Witten, Frank, Hall, and Pal) are marked with WFH below.
Lecture Date Events Topic Reading Due
1 Wed. 1/8   Introduction to Text Mining: The Big Picture  
2 Mon. 1/13   Course Overview: Roadmap and Expectations WFH Ch. 1, Mitchell '06, Hearst '99
3 Wed. 1/15 HW1 Out Predictive Analysis: Concepts, Features, and Instances I WFH Ch. 2, Domingos '12
4 Mon. 1/20 MLK Day (No class)    
5 Wed. 1/22   Predictive Analysis: Concepts, Features, and Instances II  
6 Mon. 1/27   Text Representation I  
7 Wed. 1/29 HW1 Due Text Representation II  
8 Mon. 2/3   Machine Learning Algorithms: Naïve Bayes WFH Ch. 4.2, Mitchell Sections 1 and 2
9 Wed. 2/5   Machine Learning Algorithms: Linear Classifiers I WFH 3.2 and 4.6
10 Mon. 2/10 HW2 Out LightSIDE Tutorial LightSIDE User's Manual
11 Wed. 2/12   Machine Learning Algorithms: Linear Classifiers II  
12 Mon. 2/17 Final Project Proposal Due Machine Learning Algorithms: Instance-based Classification I WFH Ch. 4.7
13 Wed. 2/19   Predictive Analysis: Experimentation and Evaluation I WFH Ch. 5
14 Mon. 2/24 HW2 Due Weka Tutorial WFH Appendix B
15 Wed. 2/26   Predictive Analysis: Experimentation and Evaluation II Smucker et al., '07, Cross-Validation, Parameter Tuning and Overfitting
16 Mon. 3/2   Exploratory Analysis: Clustering I Manning Ch. 16
17 Wed. 3/4   Exploratory Analysis: Clustering II
18 Mon. 3/9 Spring Break (No class)
19 Wed. 3/11 Spring Break (No class)
20 Mon. 3/16 Spring Break II (No class)
21 Wed. 3/18 Spring Break II (No class)
22 Mon. 3/23 Midterm Review Midterm Review
23 Wed. 3/25 Midterm Midterm
24 Mon. 3/30 HW3 Out Project Report Status Meetings
25 Wed. 4/1   Sentiment Analysis Pang and Lee, '08 (skip Section 5 and only skim Section 6), Pang and Lee, '02
26 Mon. 4/6   Discourse Analysis Arguello '15
27 Wed. 4/8   Detecting Viewpoint and Perspective Yano et al., '10, Wiebe '10
28 Mon. 4/13 HW3 Due Text-based Forecasting O'Connor et al., '10, Lerman et al., '08
29 Wed. 4/15   TBD
30 Mon. 4/20   Student Presentations
31 Wed. 4/22   Student Presentations
32 Fri. 5/1 Final Project Due 11:55pm