Resources

Applications of Natural Language Processing
INLS 512_001, Spring, 2008

This page links to useful resources, both literature and tools. I've listed a few to start, but this is a cooperative effort. As you come across interesting or useful Web sites, please email the URL and a brief description to me or to the class list, and I will add them to the page.
PRIMARILY LITERATURE

The Association for Computational Linguistics is one of the important organizations for those who work in NLP or CL. Look at its wiki for more resources. I've included direct links to some on this page, as well.

ACL Anthology contains almost all of the journals and conference proceedings from the Association for Computational Linguistics. This is a good place to find ideas for your research project summary, and references for your own project.

The ACM Digital Library is available through the UNC library (e-journals). It contains all the ACM journals, proceedings, etc.

The Survey of the State of the Art in Human Language Technology. Although this is a little dated, the survey provides a good overview of most NLP topics. In past years, I've used this as a textbook.

Allen, J. (1995). Natural Language Understanding. Redwood City, CA: Benjamin/Cummings Publishing Company. A good basic textbook for NLP. [SILS reserve, QA76.6 .A44 1995]

Dale, R., Moisl, H. & Somers, H. (2000). Handbook of Natural Language Processing. New York: Marcel Dekker. An outstanding encyclopedia of linguistic phenomena, NLP tools, and applications. If it weren't so expensive, I'd have used this for our textbook. As it is, we'll be reading a few chapters from it. [e-book, SILS reserve QA76.9 .N38 H363 2000]

Darling, C. Guide to Grammar and Writing is a very helpful collection of articles, examples, quizzes, etc. on basic English grammar and usage.

Friedl, J. (1997). Mastering Regular Expressions: Powerful Techniques for Perl and Other Tools. Cambridge; Sebastopol: O'Reilly & Associates. Available as e-book through UNC library.

Huddleston, R. & Pullum, G. (2002). The Cambridge Grammar of the English Language. Cambridge University Press. A good descriptive grammar of English, similar to Quirk et al. [Davis Reference, PE1106 .H74 2002]

Jurafsky, D. & Martin, J. (2000). Speech and Language Processing. Upper Saddle River, NJ: Prentice Hall. [Davis P98 .J87 2000] Also see associated links at http://www.cs.colorado.edu/~martin/SLP/slp-web-resources.html

Manning, C. & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. MIT Press. [SILS P98.5.S83 M36 1999] All you ever needed to know about statistical approaches.

Mitkov, R. (ed.) (2003). The Oxford Handbook of Computational Linguistics. Oxford University Press. [missing from library, but I have a copy, P98 .O95 2003] Another very good collection of reference/overview articles on CL topics and techniques. We'll be reading some chapters from this, too.

Quirk, R., Greenbaum, S., Leech, G. & Svartvik, J. (1985). A Comprehensive Grammar of the English Language. London: Longman. [Davis, PE1106 .C65 1985] The precursor to Huddleston & Pullum, another very good descriptive grammar of English.

Sowa, J. (2000). Knowledge Representation: Logical, Philosophical, and Computational Foundations. Pacific Grove: Brooks/Cole. [Q387 .S68 2000] A very interesting book on all kinds of representational issues.

Lists and Blogs
Language Log. Observations on language usage by some of my favorite linguists.

Corpora list discusses working with corpora: tools, algorithms, collections, etc. The associated information page links to the archives, as well as several other useful pages.

PRIMARILY TOOLS OR OTHER RESOURCES
Disclaimer: I have not tried all of these tools, and so make no claims as to their usability, effectiveness, etc.

Collections

The Association for Computational Linguistics is one of the important organizations in NLP. The homepage has links to other resources as well.

The Linguistic Data Consortium is one of the major providers of corpora. It's interesting to browse through the catalogue of what's available.

NIST Information Access Division sponsors the TREC conferences, among others. This site includes conference requirements and proceedings.

"Colibri is an electronic newsletter and WWW service aimed at people interested in the fields of natural language processing, speech processing and/or logic." (http://colibri.let.uu.nl/INFO). Contains links to software, articles, dictionaries, etc.

The Natural Language Software Registry is a good directory of NLP software.

Text Analysis Info Page focuses primarily on content analysis software, although there are other types also listed.

Dan Melamed's NLP Research Software Library

Software and demos from the University of Zurich's Institute for Computational Linguistics.

GATE Demos from the Sheffield NLP group.

Word dependency and similarity demos from Dekang Lin.

Proxem Resources for NLP. Links to tools, articles, collections, projects.

Corpora and related tools

The Linguistic Data Consortium is one of the major providers of corpora. It's interesting to browse through the catalogue of what's available.

The Text Encoding Initiative (TEI) develops and provides markup standards for many kinds of texts for literary and linguistic uses.

Reuters Corpus statistics.

Michigan Corpus of Spoken Academic English.

Phrases in English is a database of phrases drawn from the British National Corpus.

British National Corpus can be searched online.

Sense Tagged Text in Senseval format from Ted Pedersen.

Linguist's Search Engine allows one to do "searches involving syntactic structure, non-contiguous constructions, and the like."

Przemek Kaszubski's home page has a link to search the PICLE corpus, and also a bibliography.

Unitex, a multilingual corpus processing system, available for download.

A corpus of Enron email messages is available to researchers.

Dictionaries and Ontologies

You can look up words in The Oxford English Dictionary online through the UNC Library.

The Concordance and Collocation Sampler lets you search the Collins WordbanksOnline English.

WordNet is a freely available dictionary that is used in many NLP applications.

EuroWordNet was a multilingual version of WordNet.

"MultiWordNet is a multilingual lexical database in which the Italian WordNet is strictly aligned with Princeton WordNet 1.6."

The Unified Medical Language System (UMLS) is a product of the NLM that combines many controlled vocabularies into a single thesaurus.

Cycorp. Cyc started as an early attempt to record all the world knowledge that an NLP system would need. Now we'd call it an ontology. The upper levels are freely available, but Cyc is now a commercial product.

Universal Biological Indexer and Organizer (uBio) has tools for looking up and finding names of biological species.

Analysis Tools

TextQuest software does vocabulary, readability, content, and style analysis.

The Brill taggers are available from Eric Brill's homepage.

Kolokacje, a web crawler and collocation finder.

AntConc, software that includes concordance, collocation, and keyword tools.

Collocation and co-occurrence analysis software.

Collocate, by Michael Barlow. Demo version can be downloaded.

TAPoR text analysis tools.

GATE, General Architecture for Text Engineering, from the University of Sheffield.

LingPipe Java libraries for NLP.

Natural Language Toolkit is a collection of tools that requires Python.

Visual Interactive Syntax Learning (VISL) has tools for sentence-level analysis (AKA parsing).

Swesum Automatic Text Summarizer demo.

Open Text Summarizer, available for download.

Senseclusters "takes a user through the entire process of unsupervised learning of word senses."

Ngram Statistics Package from Ted Pedersen

Stuttgart Finite State Transducer Tools by Helmut Schmid. Can be downloaded; includes a morphological analyser.

SVMTool, a POS tagger based on Support Vector Machines.

Weka, a data mining tool that can be used for text mining.

TextSTAT - Simple Text Analysis Tool is concordance software.

The Preposition Project "is designed to provide a comprehensive characterization of English preposition senses suitable for use in natural language processing."

Machine Translation

Open Translation Engine

Chatterbots

Loebner Prize Competition. Home page of annual "conversation program" competition, based on the Turing Test. Has links to rules, results, and transcripts.

BotSpot. An annotated list of chatterbots.

The Simon Laven Page. A site devoted to chatterbots: links, discussions, etc.


This page was last modified on January 4, 2008, by Stephanie W. Haas. Address questions and comments about this page to Stephanie W. Haas: stephani at ils dot unc dot edu.
© 2001, 2004, 2006, 2008 Stephanie W. Haas