Schedule
Applications of Natural Language Processing
INLS 512_001, Fall 2006
Notes:
This tentative schedule
lists the assignments and readings for
each of the topics we'll cover.
There is no required textbook for this class.
Readings are mostly available online, but I have also
put some useful books, including a textbook, on reserve in the SILS library
(see the
Resources page).
Each reading is marked with its location.
- [e-reserves] Online reserves.
- [e-journal] Electronic journals are available through
the UNC-CH library
e-journal list.
- [ACM DL] Journals in the ACM Digital Library are available
through
the UNC-CH library
e-journal list, or
you can purchase your own subscription.
- [ACL] Articles published by the Association for Computational
Linguistics
may be reached through the
ACL Anthology.
- [e-book] Readings from electronic books are available through
netLibrary, or from a link
in their records in the UNC library catalogue.
- [web] Articles on the web.
- [SILS PAM] Photocopied articles are in the SILS library. PAM boxes
are on the far side of the low bookcase in front of the reference desk.
- [SILS book] Reserve books are behind the reference desk in the SILS
library.
Other readings or assignments may be assigned
as appropriate.
Thursday, 8/24/06: Introduction and overview.
Tuesday, 8/29/06: Introduction, cont.
Thursday, 8/31/06: Language, software, and corpora
- Fellbaum, C. (1998). Introduction. In
Fellbaum, C. (ed.). (1998). WordNet: An Electronic Lexical Database.
MIT Press, 1-19. [e-reserves] [SILS book,
P325.5.D38 W67 1998]
- Miller, G. (1998). Nouns in WordNet. In
Fellbaum, C. (ed.). (1998). WordNet: An Electronic Lexical Database.
MIT Press, 23-46. [e-reserves] [SILS book,
P325.5.D38 W67 1998]
- Demo:
WordNet
- Download and install
GATE. As the semester
goes on, you may wish/need to download other tools as well.
Tuesday, 9/5/06: Language, software, corpora, cont.
- Thompson, H. (2000). Corpus creation for data-intensive linguistics. In
R. Dale, H. Moisl & H. Somers (eds.). Handbook of Natural Language
Processing, Ch. 16, 385-401. [e-book]
- Demo: Linguistic Data
Consortium, British National
Corpus, other corpora.
Thursday, 9/7/06: Tokenisation and sentence identification.
- Palmer, D. (2000). Tokenisation and sentence segmentation. In
R. Dale, H. Moisl & H. Somers (eds.). Handbook of Natural Language
Processing, Ch. 2, 11-35. [e-book]
- Mikheev, A. (2003). Text segmentation. In
R. Mitkov (ed.) The Oxford Handbook of Computational
Linguistics. Oxford University Press.
Ch. 10, 201-218. [e-reserves] [SILS book, P98 .O95 2003]
Tuesday, 9/12/06: Tokenisation and sentence identification, cont.
Thursday, 9/14/06: Part-of-speech tagging.
- Sproat, R. (2000). Lexical Analysis. In
R. Dale, H. Moisl & H. Somers (eds.). Handbook of Natural Language
Processing, Ch. 3, 37-57. [e-book]
Read section 1, skim the first part of section 3, skim section 4, read section 5,
and skim sections 7 and 8.
- Voutilainen, A. (2003). Part-of-Speech Tagging. In
R. Mitkov (ed.) The Oxford Handbook of Computational
Linguistics. Oxford University Press.
Ch. 11, 219-232. [e-reserves] [SILS PAM] [SILS book, P98 .O95 2003]
recommended
- Brill, E. (1992).
A simple
rule-based part of speech tagger. Proceedings
of the Third Conference on Applied Natural Language Processing, 152-155.
highly recommended
- POS slides
- A list of the
Penn Treebank Word Level Tags
- Exercise 1 due.
Tuesday, 9/19/06: POS tagging, cont.
Thursday, 9/21/06: Happy 75th Birthday to SILS!
No class today.
Please attend the
Student Day events.
Tuesday, 9/26/06: Sentence structure, grammars, and parsers.
Thursday, 9/28/06: Grammars & parsers, cont.
Tuesday, 10/3/06: Grammars and Parsing, cont.
Thursday, 10/5/06: Information Extraction
- Cowie, J. & Wilks, Y. (2000). Information Extraction. In
R. Dale, H. Moisl & H. Somers (eds.). Handbook of Natural Language
Processing, Ch. 10, 241-260. [e-book]
- Grishman, R. (2003). Information Extraction. In
R. Mitkov (ed.) The Oxford Handbook of Computational
Linguistics. Oxford University Press.
Ch. 30, 545-559. [e-reserves] [SILS book, P98 .O95 2003]
- Introduction to Information Extraction, from
the National Institute of Standards and Technology (NIST), which
runs MUC, TREC, DUC, and other such conferences. [web]
- What is Information Extraction?, including the examples. [web]
- MUC-6. Skim the rules and look at the
scores to see what performance was like in 1995. [web] skim
- Chinchor, N. (1998).
Overview of MUC-7 gives a brief history of MUC and a description
of the tasks. [web] recommended
- Exercise 3 due.
- Discuss
final projects or papers.
Tuesday, 10/10/06: Information extraction, cont.
- Mitkov, R. (2003). Anaphora Resolution. In
R. Mitkov (ed.) The Oxford Handbook of Computational
Linguistics. Oxford University Press.
Ch. 14, 266-283. [e-reserves] [SILS book, P98 .O95 2003] skim
- The ACE 2006 (ACE06) Evaluation Plan describes the tasks of the
current evaluation conference. [web] skim
- The
Automatic Content Extraction (ACE) homepage. [web] skim
- Ahn, D. (2006).
The stages of event extraction. Proceedings of the
Workshop on Annotating and Reasoning about Time and Events, 1-8.
This is an example of an ACE application. [ACL]
- Jacquemin, C. & Bush, C. (2000).
Combining lexical and formatting cues
for named entity acquisition from the Web. Joint SIGDAT Workshop on
Empirical Methods in NLP and Very Large Corpora, 181-189. [ACL]
- Minkov, E., Wang, R. & Cohen, W. (2005).
Extracting personal names from
email: Applying named entity recognition to informal text.
Proceedings
of Human Language Technology Conference and Conference on Empirical
Methods in Natural Language Processing (HLT/EMNLP), 443-450. [ACL]
- GATE assignment handed out, due Thursday, 11/2/06.
Thursday, 10/12/06: University Day, no class.
Tuesday, 10/17/06: Information extraction, cont.
Thursday, 10/19/06: Fall Break, no class.
Tuesday, 10/24/06: Summarization
- Mani, I. (2001). Automatic Summarization. Amsterdam: John Benjamins
Publishing Co.
- Chapter 1, Preliminaries, 1-25. [e-reserves]
- Chapter 7, Multi-document summarization, 169-220.
[e-reserves]
- Moens, M. (2000). Automatic Indexing and Abstracting of Document
Texts. Boston: Kluwer Academic Publishers.
- Ch. 7, Text structuring and categorization when summarizing legal
cases,
157-172. [e-reserves]
- Ch. 8, Clustering of paragraphs when summarizing legal cases, 173-190.
[e-reserves]
- Radev, D. et al. (2003).
Experiments in single and multi-document
summarization using MEAD. Proceedings of the Document Understanding
Conference Workshop on Summarization. [web]
- The
Document Understanding Conference (DUC) homepage. [web] skim
- Demo:
NewsInEssence,
MEAD
- MEAD demo results
- Exercise 4 due today.
Thursday, 10/26/06: Summarization, cont.
Tuesday, 10/31/06: Sublanguage and genre
- Kittredge, R. (2003). Sublanguages and controlled languages. In
R. Mitkov (ed.) The Oxford Handbook of Computational
Linguistics. Oxford University Press.
Ch. 23, 430-447. [e-reserves] [SILS book, P98 .O95 2003]
- Travers, D. & Haas, S. (2003). Using nurses' natural language entries
to build a concept-oriented terminology for patients' chief complaints in
the emergency department. Journal of Biomedical Informatics, 36(4-5),
260-270. [e-journal]
- Haas, S. & Travers, D. (2004). Issues in the development of a
thesaurus for patients' chief complaints in the hospital emergency
department. Proceedings of the 2004 Annual Meeting of the American
Society for Information Science and Technology, 411-417. [e-reserves]
- Slides for this week.
- Exercise 5 due.
- Project proposals due.
Thursday, 11/2/06: Sublanguage and genre, cont.
Tuesday, 11/7/06: Sentiment identification
Thursday, 11/9/06: Machine translation.
- Somers, H. (2000). Machine translation. In
R. Dale, H. Moisl & H. Somers (eds.). Handbook of Natural Language
Processing, Ch. 13, 329-346. [e-book]
- Hutchins, J. (2003). Machine translation: General overview. In
R. Mitkov (ed.) The Oxford Handbook of Computational
Linguistics. Oxford University Press.
Ch. 27, 501-511. [e-reserves] [SILS book, P98 .O95 2003]
- Somers, H. (2003). Machine translation: Latest developments. In
R. Mitkov (ed.) The Oxford Handbook of Computational
Linguistics. Oxford University Press.
Ch. 28, 512-528. [e-reserves] [SILS book, P98 .O95 2003]
- Demo: back translation
- Slides for today.
Tuesday, 11/14/06: MT cont.: Cross-Language Information Retrieval
Thursday, 11/16/06: Generation
Tuesday, 11/21/06: Spoken Dialogue Systems and Chatterbots
Thursday, 11/23/06: Thanksgiving, no class.
Tuesday, 11/28/06: Wrap-up: Future applications of NLP. Course
evaluation.
Thursday, 11/30/06: Poster session
- Presentations: Paul Greitzer, Tyler Kendall, Genya O'gara, Joshua
Vossler.
Tuesday, 12/5/06: Poster session
- Presentations: Anna Craft, Philip Fulcher, Martina Gargard, Ann
Irvine.
Tuesday, 12/12/06, 12:00 noon: Final papers due.
This page was last modified on August 16, 2006,
by Stephanie W. Haas.
Address questions and comments about this
page to Stephanie W. Haas:
stephani at ils dot unc dot edu
© Stephanie W. Haas, 2006. All
rights reserved.