Resources

Applications of Natural Language Processing
INLS 170, Fall 2004

This page contains links to useful resources, both literature and tools. I've added a few to start, but this is a cooperative effort. As you come across interesting or useful Web sites, please email the URL and a brief description to me or to the class list. I will add them to the page.
PRIMARILY LITERATURE

The Association for Computational Linguistics is one of the important organizations for those who work in NLP or CL. Look at its list of resources. I've included direct links to some on this page, as well.

ACL Anthology contains almost all of the journals and conference proceedings from the Association for Computational Linguistics. This is a good place to find ideas for your research project summary, and references for your own project.

The ACM Digital Library is available through the UNC library. It contains all the ACM journals, proceedings, etc.

The Survey of the State of the Art in Human Language Technology. Although this is a little dated, the survey provides a good overview of most NLP topics. In past years, I've used this as a textbook.

Allen, J. (1995). Natural Language Understanding. Redwood City, CA: Benjamin/Cummings Publishing Company. A good basic textbook for NLP. [SILS reserve, QA76.6 .A44 1995]

Dale, R., Moisl, H. & Somers, H. (2000). Handbook of Natural Language Processing. New York: Marcel Dekker. An outstanding encyclopedia of linguistic phenomena, NLP tools, and applications. If it weren't so expensive, I'd have used this for our textbook. As it is, we'll be reading a few chapters from it. [e-book, SILS reserve QA76.9 .N38 H363 2000]

Darling, C. Guide to Grammar and Writing is a very helpful collection of articles, examples, quizzes, etc. on basic English grammar and usage. We'll read some in class, but if you have any questions on language usage (including writing papers, avoiding plagiarism, etc.), this is a very good site.

Huddleston, R. & Pullum, G. (2002). The Cambridge Grammar of the English Language. Cambridge University Press. A good descriptive grammar of English, similar to Quirk et al. [Davis Reference, PE1106 .H74 2002]

Jurafsky, D. & Martin, J. (2000). Speech and Language Processing. Upper Saddle River, NJ: Prentice Hall. [SILS reserve P98 .J87 2000] Also see associated links at http://www.cs.colorado.edu/~martin/SLP/slp-web-resources.html

Manning, C. & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. MIT Press. [SILS reserve P98.5.S83 M36 1999] All you ever needed to know about statistical approaches.

Mitkov, R. (ed.) (2003). The Oxford Handbook of Computational Linguistics. Oxford University Press. [SILS reserve, P98 .O95 2003] Another very good collection of reference/overview articles on CL topics and techniques. We'll be reading some chapters from this, too.

Quirk, R., Greenbaum, S., Leech, G. & Svartvik, J. (1985). A Comprehensive Grammar of the English Language. London: Longman. [SILS reserve, PE1106 .C65 1985] The precursor to Huddleston, another very good descriptive grammar of English.

Sowa, J. (2000). Knowledge Representation: Logical, Philosophical, and Computational Foundations. Pacific Grove: Brooks/Cole. [Q387 .S68 2000] A very interesting book on all kinds of representational issues.

PRIMARILY TOOLS OR OTHER RESOURCES
Disclaimer: I have not tried all of these tools, and so make no claims as to their usability, effectiveness, etc.
Collections | Corpora and related tools
Dictionaries and Ontologies | Analysis Tools | Chatterbots

Collections

The ACL Universe is a collection of links to tools, projects, departments, etc., maintained by members of the Association for Computational Linguistics.

The Linguistic Data Consortium is one of the major providers of corpora. It's interesting to browse through the catalogue of what's available.

Natural Language Processing and AI is a long list of links to organizations, resources, people, and other goodies.

NIST Information Access Division sponsors the TREC conferences, among others. This site includes conference requirements and proceedings.

"Colibri is an electronic newsletter and WWW service aimed at people interested in the fields of natural language processing, speech processing and/or logic." (http://colibri.let.uu.nl/INFO). Contains links to software, articles, dictionaries, etc.

The Natural Language Software Registry is a good directory of NLP software.

Text Analysis Info Page focuses primarily on content analysis software, although there are other types also listed.

Dan Melamed's NLP Research Software Library.

Software and demos from the University of Zurich's Institute for Computational Linguistics.

Demos from the Sheffield NLP group.

Dependency and similarity demos from Dekang Lin.

Statistical NLP and corpus-based CL: an annotated list of resources.

Corpora and related tools

The Linguistic Data Consortium is one of the major providers of corpora. It's interesting to browse through the catalogue of what's available.

Web Term Document Frequency Form lets you retrieve the frequency of a term from a collection of web pages.

Reuters Corpus statistics.

Michigan Corpus of Spoken Academic English.

Phrases in English is a database of phrases drawn from the British National Corpus.

British National Corpus can be searched online.

Sense Tagged Text in Senseval format from Ted Pedersen.

Linguist's Search Engine allows one to do "searches involving syntactic structure, non-contiguous constructions, and the like."

Przemek Kaszubski's home page has a link to search the PICLE corpus, and also a bibliography.

Unitex, a multilingual corpus processing system, available for download.

Dictionaries and Ontologies

You can look up words in The Oxford English Dictionary online through the UNC Library.

Information about the Collins COBUILD Dictionary and the underlying Bank of English. You can query the Bank for concordances and collocations of a word.

WordNet is a freely available dictionary that is used in many NLP applications.

EuroWordNet was a multilingual version of WordNet.

"MultiWordNet is a multilingual lexical database in which the Italian WordNet is strictly aligned with Princeton WordNet 1.6."

The Unified Medical Language System (UMLS) is a product of the NLM that combines many controlled vocabularies into a single thesaurus.

CyCorp. Cyc started as an early attempt to record all the world knowledge that an NLP system would need. Now we'd call it an ontology. The upper levels are freely available, but Cyc is now a commercial product.

Analysis Tools

TextQuest software does vocabulary, readability, content, and style analysis.

The Brill taggers are available from his homepage.

Kolokacje, a web crawler and collocation finder.

Collocation and co-occurrence analysis software.

Collocate, by Michael Barlow. Demo version can be downloaded.

TAPoR text analysis tools.

Natural Language Toolkit is a collection of tools that requires Python.

SenseClusters "takes a user through the entire process of unsupervised learning of word senses."

Ngram Statistics Package from Ted Pedersen.

Stuttgart Finite State Transducer Tools by Helmut Schmid. Can be downloaded; includes a morphological analyser.

SVMTool, a POS tagger based on Support Vector Machines.

Machine Translation

Open Translation Engine

Chatterbots

Loebner Prize Competition. Home page of annual "conversation program" competition, based on the Turing Test. Has links to rules, results, and transcripts.

BotSpot. An annotated list of chatterbots.

The Simon Laven Page. A site devoted to chatterbots: links, discussions, etc.


This page was last modified on August 11, 2004, by Stephanie W. Haas. Address questions and comments about this page to Stephanie W. Haas at stephani@ils.unc.edu
© 2001, 2004 Stephanie W. Haas