The Association for Computational Linguistics is one of the important organizations for those who work in NLP or CL. Look at its wiki for more resources. I've included direct links to some on this page, as well.
ACL Anthology contains almost all of the journals and conference proceedings from the Association for Computational Linguistics. This is a good place to find ideas for your research project summary, and references for your own project.
The ACM Digital Library is available through the UNC library (e-journals). It contains all the ACM journals, proceedings, etc.
The Survey of the State of the Art in Human Language Technology. Although this is a little dated, the survey provides a good overview of most NLP topics. In past years, I've used this as a textbook.
Allen, J. (1995). Natural Language Understanding. Redwood City, CA: Benjamin/Cummings Publishing Company. A good basic textbook for NLP. [SILS reserve, QA76.6 .A44 1995]
Dale, R., Moisl, H. & Somers, H. (2000). Handbook of Natural Language Processing. New York: Marcel Dekker. An outstanding encyclopedia of linguistic phenomena, NLP tools, and applications. If it weren't so expensive, I'd have used this for our textbook. As it is, we'll be reading a few chapters from it. [e-book, SILS reserve QA76.9 .N38 H363 2000]
Darling, C. Guide to Grammar and Writing is a very helpful collection of articles, examples, quizzes, etc. on basic English grammar and usage.
Friedl, J. (1997). Mastering Regular Expressions: Powerful Techniques for Perl and Other Tools. Cambridge; Sebastopol: O'Reilly & Associates. Available as an e-book through the UNC library.
Huddleston, R. & Pullum, G. (2002). The Cambridge Grammar of the English Language. Cambridge University Press. A good descriptive grammar of English, similar to Quirk et al. [Davis Reference, PE1106 .H74 2002]
Jurafsky, D. & Martin, J. (2000). Speech and Language Processing. Upper Saddle River, NJ: Prentice Hall. [Davis P98 .J87 2000] Also see associated links at http://www.cs.colorado.edu/~martin/SLP/slp-web-resources.html
Manning, C. & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. MIT Press. [SILS P98.5.S83 M36 1999] All you ever needed to know about statistical approaches.
Mitkov, R. (ed.) (2003). The Oxford Handbook of Computational Linguistics. Oxford University Press. [missing from library, but I have a copy, P98 .O95 2003] Another very good collection of reference/overview articles on CL topics and techniques. We'll be reading some chapters from this, too.
Quirk, R., Greenbaum, S., Leech, G. & Svartvik, J. (1985). A Comprehensive Grammar of the English Language. London: Longman. [Davis, PE1106 .C65 1985] The precursor to Huddleston & Pullum, and another very good descriptive grammar of English.
Sowa, J. (2000). Knowledge Representation: Logical, Philosophical, and Computational Foundations. Pacific Grove: Brooks/Cole. [Q387 .S68 2000] A very interesting book on all kinds of representational issues.
Lists and Blogs
Language Log. Observations on language usage by some of my favorite linguists.
Corpora list discusses working with corpora: tools, algorithms, collections, etc. The associated information page links to the archives, as well as several other useful pages.
The Association for Computational Linguistics is one of the important organizations in NLP. The homepage has links to other resources as well.
The Linguistic Data Consortium is one of the major providers of corpora. It's interesting to browse through the catalogue of what's available.
NIST Information Access Division sponsors the TREC conferences, among others. This site includes conference requirements and proceedings.
"Colibri is an electronic newsletter and WWW service aimed at people interested in the fields of natural language processing, speech processing and/or logic." (http://colibri.let.uu.nl/INFO). Contains links to software, articles, dictionaries, etc.
The Natural Language Software Registry is a good directory of NLP software.
Text Analysis Info Page focuses primarily on content analysis software, although there are other types also listed.
Dan Melamed's NLP Research Software Library.
Software and demos from the University of Zurich's Institute for Computational Linguistics.
GATE Demos from the Sheffield NLP group.
Word dependency and similarity demos from Dekang Lin.
Proxem Resources for NLP. Links to tools, articles, collections, projects.
The Linguistic Data Consortium is one of the major providers of corpora. It's interesting to browse through the catalogue of what's available.
The Text Encoding Initiative (TEI) develops and provides markup standards for many kinds of texts for literary and linguistic uses.
Reuters Corpus statistics.
Michigan Corpus of Spoken Academic English.
Phrases in English is a database of phrases drawn from the British National Corpus.
British National Corpus can be searched online.
Sense Tagged Text in Senseval format from Ted Pedersen.
The Linguist's Search Engine allows one to do "searches involving syntactic structure, non-contiguous constructions, and the like."
Przemek Kaszubski's home page has a link to search the PICLE corpus, and also a bibliography.
Unitex, a multilingual corpus processing system, available for download.
A corpus of Enron email messages is available to researchers.
You can look up words in The Oxford English Dictionary online through the UNC Library.
The Concordance and Collocation Sampler lets you search the Collins WordbanksOnline English corpus.
WordNet is a freely available dictionary that is used in many NLP applications.
EuroWordNet was a multilingual version of WordNet.
"MultiWordNet is a multilingual lexical database in which the Italian WordNet is strictly aligned with Princeton WordNet 1.6."
The Unified Medical Language System (UMLS) is a product of the NLM that combines many controlled vocabularies into a single thesaurus.
CyCorp. Cyc started as an early attempt to record all the world knowledge that an NLP system would need. Now we'd call it an ontology. The upper levels are freely available, but Cyc is now a commercial product.
Universal Biological Indexer and Organizer (uBio) has tools for looking up and finding names of biological species.
Analysis Tools
TextQuest software does vocabulary, readability, content, and style analysis.
The Brill taggers are available from Eric Brill's homepage.
Kolokacje, a web crawler and collocation finder.
AntConc, software that includes concordance, collocation, and keyword tools.
Collocation and co-occurrence analysis software.
Collocate, by Michael Barlow. Demo version can be downloaded.
TAPoR text analysis tools.
GATE, General Architecture for Text Engineering, from the University of Sheffield.
LingPipe Java libraries for NLP.
Natural Language Toolkit is a collection of tools that requires Python.
Visual Interactive Syntax Learning (VISL) has tools for sentence-level analysis (AKA parsing).
Swesum Automatic Text Summarizer demo.
Open Text Summarizer, available for download.
Senseclusters "takes a user through the entire process of unsupervised learning of word senses."
Ngram Statistics Package from Ted Pedersen.
Stuttgart Finite State Transducer Tools by Helmut Schmid. Can be downloaded; includes a morphological analyser.
SVMTool, a POS tagger based on Support Vector Machines.
Weka, a data mining tool that can be used for text mining.
TextSTAT - Simple Text Analysis Tool is concordance software.
The Preposition Project "is designed to provide a comprehensive characterization of English preposition senses suitable for use in natural language processing."
Machine Translation
Loebner Prize Competition. Home page of annual "conversation program" competition, based on the Turing Test. Has links to rules, results, and transcripts.
BotSpot. An annotated list of chatterbots.
The Simon Laven Page. A site devoted to chatterbots: links, discussions, etc.
This page was last modified on January 4, 2008, by Stephanie W. Haas.
Address questions and comments about this page to Stephanie W. Haas: stephani at ils dot unc dot edu
© 2001, 2004, 2006, 2008 Stephanie W. Haas