Schedule
Applications of Natural Language Processing
INLS 512_001, Spring 2008
Notes:
This tentative schedule
lists the assignments and readings for
each of the topics we'll cover.
I will add readings as topics are identified.
Each reading is labeled with its location: see
Reading Locations for more information
on where to find them.
There is no textbook for this class,
but I have
put some useful books on reserve in the SILS library
(see the
Resources page).
Other readings or assignments may be added
as appropriate.
Thursday, 1/10/08: Introductions, business.
Tuesday, 1/15/08: What is NLP? Where is NLP?
Thursday, 1/17/08: What do we know about NLP? What would we like
to know?
- Jurafsky, D. & Martin, J. (2000).
Ch. 1, Introduction. In
Speech and Language Processing. Upper Saddle River, NJ: Prentice Hall.
1-18.
- Manning, C. & Schutze, H. (1999).
Ch. 1, Introduction.
In Foundations of Statistical Natural Language Processing.
Cambridge, MA: The MIT Press. 3-36.
- For a quick introduction or review of basic linguistics,
see Manning, C. & Schutze, H. (1999) Ch. 3, Linguistic
Essentials. In
Foundations of Statistical Natural Language Processing.
Cambridge, MA: The MIT Press, 81-115.
[SILS reserve, P98.5.S83 M36 1999]
- Slides: NLP Basics
Tuesday, 1/22/08: What we know and would like to know, cont.
Thursday, 1/24/08: No class today
Tuesday, 1/29/08: Topic 1: Extraction
- Cowie, J. & Wilks, Y. (2000). Information Extraction. In
R. Dale, H. Moisl & H. Somers (eds.). Handbook of Natural Language
Processing, Ch. 10, 241-260. [e-book]
- Grishman, R. (2003). Information Extraction. In
R. Mitkov (ed.) The Oxford Handbook of Computational
Linguistics. Oxford University Press.
Ch. 30, 545-559.
[e-reserves]
-
Introduction to Information Extraction, from
the National Institute of Standards and Technology (NIST), which
runs MUC, TREC, DUC, and other such conferences.
Be sure to look at the examples. [web]
-
Automatic
Content Extraction (ACE). Gives overview of ACE, and you can look
at earlier conferences' tasks.
-
ACE08 Evaluation Plan, especially sections 1 & 2, the
task description.
- You can also look at
ACE Overview for a general introduction to the ACE tasks.
- Ahn, D. (2006).
The stages of event extraction. Proceedings of the
Workshop on Annotating and Reasoning about Time and Events, 1-8.
This is an example of an ACE application. [ACL]
-
Slides: Information Extraction
Thursday, 1/31/08: Extraction, cont.
Tuesday, 2/5/08: Extraction, cont.
Thursday, 2/7/08: Grammars and Rules
- Jurafsky, D. & Martin, J. (2000).
Ch. 2, Regular Expressions and Automata. In
Speech and Language Processing. Upper Saddle River, NJ: Prentice Hall.
21-56. [e-reserves]
- If needed, review Manning, C. & Schutze, H. (1999) Ch. 3, Linguistic
Essentials. In
Foundations of Statistical Natural Language Processing.
Cambridge, MA: The MIT Press, 81-115.
[SILS reserve, P98.5.S83 M36 1999]
- Slides: Grammars etc.
Tuesday, 2/12/08: Grammars and Rules, cont.
Thursday, 2/14/08: Grammars and Rules, cont.
- Mitkov, R. (2003). Anaphora Resolution. In
R. Mitkov (ed.) The Oxford Handbook of Computational
Linguistics. Oxford University Press.
Ch. 14, 266-283.
[e-reserves]
Tuesday, 2/19/08: In-class exercise part 1: Semantic analysis
Thursday, 2/21/08: In-class exercise part 2: Generation
Tuesday, 2/26/08: Forget the rules – it's all statistics!
- Manning, C. & Schutze, H. (1999) Ch. 2.1 Mathematical
Foundations: Elementary
Probability
Theory,
In
Foundations of Statistical Natural Language Processing.
Cambridge, MA: The MIT Press, 39-59.
[e-reserves]
[SILS reserve, P98.5.S83 M36 1999]
- Bod, R. (2003).
Ch. 2.1-2.3, Introduction to Elementary Probability Theory
and Formal Stochastic Language Theory. In Bod, R.,
Hay, J. & Jannedy, S. (eds.)
Probabilistic Linguistics. Cambridge, MA: MIT Press.
11-18. [e-reserves]
- For another take on features and language models, I also recommend
Berger, A., Della Pietra, S., Della Pietra, V. (1996).
A maximum entropy approach to natural language processing.
Computational Linguistics, 22(1), 39-71, especially
sections 1-3.
- Slides: statistical
approaches
Thursday, 2/28/08: Statistical approaches, cont.
- Mooney, R. (2003). Machine Learning.
In Mitkov, R. (ed.) The Oxford Handbook of Computational
Linguistics.
New York: Oxford University Press. 376-394.
[e-reserves] [SILS reserve P98 .095 2003]
-
Paper or project proposals due.
Tuesday, 3/4/08: Statistical approaches, cont.
- Pang, B., Lee, L. & Vaithyanathan, S. (2002).
Thumbs up? Sentiment
classification using machine learning techniques. Proceedings of the
Workshop on Empirical Methods in Natural Language Processing (EMNLP),
79-86.
- Glickman, O., Shnarch, E. & Dagan, I. (2006).
Lexical reference:
A semantic matching subtask.
Proceedings of the 2006 Conference on Empirical Methods in Natural
Language
Processing, 172-179.
- Glickman et al. refer to Lin, D. (1998).
Automatic retrieval and clustering of similar words.
Proceedings of COLING, 768-774. You may want to
look at it to learn more about the methods.
Thursday, 3/6/08: Tools
- Review Manning, C. & Schutze, H. (1999).
Ch. 1.4, Dirty Hands.
In Foundations of Statistical Natural Language Processing.
Cambridge, MA: The MIT Press. 19-34.
- Please read the GATE
Overview
- GATE demo.
- We'll be doing an
in-class exercise
on Tuesday,
3/18/08.
In preparation, please
load GATE on your laptop so you can bring it to class
on Tuesday.
- Topic 3,
Machine Translation outline due.
Tuesday, 3/11/08 & Thursday, 3/13/08: Spring Break, no class
Tuesday, 3/18/08: Tools, cont.
Thursday, 3/20/08: Topic 2: Word Sense Disambiguation (WSD)
- Navigli, R., Litkowski, K. & Hargraves, O. (2007).
SemEval-2007 Task 07: Coarse-Grained English All-Words Task.
Proceedings of the 4th International Workshop on Semantic Evaluations
(SemEval-2007), 30-35. [ACL Anthology]
- SemEval-2007 Task 06: Word-Sense Disambiguation of Prepositions.
Proceedings of the 4th
International Workshop on Semantic
Evaluations (SemEval-2007), 24-29. [ACL Anthology]
- If you want more background on WSD, I recommend
Ide, N. & Veronis, J. (1998).
Introduction to the Special Issue on Word Sense Disambiguation:
The State of the Art.
Computational
Linguistics,
24(1), 1-40. [ACL Anthology]
- Slides: Word Sense Disambiguation
Tuesday, 3/25/08: Topic 3: Machine Translation
- Topic 3,
Machine Translation,
presentation
- Somers, H. (2000). Machine Translation. Ch. 13, 329-346.
In Dale, R.,
Moisl, H. & Somers, H. (2000). Handbook of Natural Language Processing.
New York: Marcel Dekker. [e-book, SILS reserve QA76.9 .N38 H363 2000]
Link to catalog record
- Also read one of the following:
- D. Vickrey, L. Biewald, M. Teyssier, and D. Koller. Word-Sense
Disambiguation for Machine Translation.
Human Language Technology
Conference and Conference on Empirical Methods in Natural Language Processing,
6-8 October 2005; Vancouver, Canada.
Available online
Description: In word sense disambiguation, a system attempts to determine
the sense of a word from contextual features. Major barriers to building a
high-performing word sense disambiguation system include the difficulty of
labeling data for this task and of predicting fine-grained sense
distinctions. These issues stem partly from the fact that the task is
being treated in isolation from possible uses of automatically
disambiguated data. In this paper, we consider the related task of word
translation, where we wish to determine the correct translation of a word
from context. [from the abstract]
- Y Chen, C Zong. A Structure-Based Model for Chinese Organization Name
Translation. ACM Transactions on Asian Language Information Processing
(TALIP). 2008;7(1).
Available online.
Description: Named entity (NE) translation is a fundamental task in
multilingual natural language processing. The performance of a machine
translation system depends heavily on precise translation of the inclusive
NEs. Furthermore, organization name (ON) is the most complex NE for
translation among all the NEs. In this article, the structure formulation
of ONs is investigated and a hierarchical structure-based ON translation
model for Chinese-to-English translation system is presented. [from the
abstract]
- W Lam, S Chan, R Huang. Named Entity Translation Matching and
Learning: With Application for Mining Unseen Translations. ACM
Transactions on Information Systems (TOIS). 2007;25(1).
Available online
Description: This article introduces a named entity matching model that
makes use of both semantic and phonetic evidence. The matching of semantic
and phonetic information is captured by a unified framework via a
bipartite graph model. By considering various technical challenges of the
problem, including order insensitivity and partial matching, this approach
is less rigid than existing approaches and highly robust. [from the
abstract]
-
Slides: Machine Translation
WARNING! Do not look at the exercise in advance!
- Topic 5,
Question Answering, outline due.
Thursday, 3/27/08: class canceled
Topic 3: Machine Translation, cont.
Tuesday, 4/1/08: Topic 4: Figurative Language
- Topic 4,
Figurative Language,
presentation
- Gedigian, M., Bryant, J., Narayanan, S., & Ciric, B. (2006). Catching
Metaphors. Proceedings of the Third Workshop on Scalable Natural Language
Understanding, New York, NY, 41-48. [ACL Anthology]
- Lakoff, G. (1992). The Contemporary Theory of Metaphor. In A. Ortony
(Ed.),
Metaphor and Thought (pp. 202-251). Cambridge, UK: Cambridge University
Press. [see Laura's email]
(We recommend reading pages 1-13 and stopping before the Duality
heading, unless you feel inspired to read the whole thing.)
- Slides: Figurative
Language
- Schedule poster presentations:
Thursday, 4/17/08 or Tuesday, 4/22/08.
- Tools assignment due.
Thursday, 4/3/08: More MT (SWH)
Tuesday, 4/8/08: Topic 5, Question Answering
Thursday, 4/10/08: Generation (SWH)
Tuesday, 4/15/08: Genre and Sublanguage (SWH)
- Friedman, C., Kra, P. & Rzhetsky, A. (2002).
Two biomedical sublanguages: a description based on
the theories of Zellig Harris.
Journal of Biomedical Informatics, 35(4), 222-235.
[ejournal]
This comes from a special issue on sublanguage.
- Travers, D. & Haas, S. (2003).
Using nurses'
natural language entries
to build a concept-oriented terminology for patients' chief complaints
in the emergency department.
Journal of Biomedical Informatics, 36(4-5), 260-270. [ejournal]
- Swales, J. (1990). Genre Analysis. Cambridge University Press.
Ch. 3, The Concept of Genre, 33-67.
[
e-reserves]
- Slides: Sublanguage and
genre
Thursday, 4/17/08: Poster Session 1
Tuesday, 4/22/08: Poster Session 2
- Poster presentations:
Presenters: Laura Christopherson,
Maureen Dostert, Lauren Kearns, Shayne Muelling
Thursday, 4/24/08: Wrap up
Thursday, 5/1/08, 4:00 p.m.:
Final papers due.
This page was last modified on January 4, 2008
by Stephanie W. Haas.
Address questions and comments about this
page to Stephanie W. Haas:
stephani at ils dot unc dot edu
© Stephanie W. Haas, 2008 All
rights reserved.