The Association for Computational Linguistics is one of the important organizations for those who work in NLP or CL. Look at its list of resources. I've included direct links to some on this page, as well.
ACL Anthology contains almost all of the journals and conference proceedings from the Association for Computational Linguistics. This is a good place to find ideas for your research project summary, and references for your own project.
The ACM Digital Library is available through the UNC library. It contains all the ACM journals, proceedings, etc.
The Survey of the State of the Art in Human Language Technology. Although this is a little dated, the survey provides a good overview of most NLP topics. In past years, I've used this as a textbook.
Allen, J. (1995). Natural Language Understanding. Redwood City, CA: Benjamin/Cummings Publishing Company. A good basic textbook for NLP. [SILS reserve, QA76.6 .A44 1995]
Dale, R., Moisl, H. & Somers, H. (2000). Handbook of Natural Language Processing. New York: Marcel Dekker. An outstanding encyclopedia of linguistic phenomena, NLP tools, and applications. If it weren't so expensive, I'd have used this for our textbook. As it is, we'll be reading a few chapters from it. [e-book, SILS reserve QA76.9 .N38 H363 2000]
Darling, C. Guide to Grammar and Writing is a very helpful collection of articles, examples, quizzes, etc. on basic English grammar and usage. We'll read some in class, but if you have any questions on language usage (including writing papers, avoiding plagiarism, etc.), this is a very good site.
Huddleston, R. & Pullum, G. (2002). The Cambridge Grammar of the English Language. Cambridge University Press. A good descriptive grammar of English, similar to Quirk et al. [Davis Reference, PE1106 .H74 2002]
Jurafsky, D. & Martin, J. (2000). Speech and Language Processing. Upper Saddle River, NJ: Prentice Hall. [SILS reserve P98 .J87 2000] Also see associated links at http://www.cs.colorado.edu/~martin/SLP/slp-web-resources.html
Manning, C. & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. MIT Press. [SILS reserve P98.5.S83 M36 1999] All you ever needed to know about statistical approaches.
Mitkov, R. (ed.) (2003). The Oxford Handbook of Computational Linguistics. Oxford University Press. [SILS reserve, P98 .O95 2003] Another very good collection of reference/overview articles on CL topics and techniques. We'll be reading some chapters from this, too.
Quirk, R., Greenbaum, S., Leech, G. & Svartvik, J. (1985). A Comprehensive Grammar of the English Language. London: Longman. [SILS reserve, PE1106 .C65 1985] The precursor to Huddleston and Pullum, another very good descriptive grammar of English.
Sowa, J. (2000). Knowledge Representation: Logical, Philosophical, and Computational Foundations. Pacific Grove: Brooks/Cole. [Q387 .S68 2000] A very interesting book on all kinds of representational issues.
The ACL Universe is a collection of links to tools, projects, departments, etc., maintained by members of the Association for Computational Linguistics.
The Linguistic Data Consortium is one of the major providers of corpora. It's interesting to browse through the catalogue of what's available.
Natural Language Processing and AI is a long list of links to organizations, resources, people, and other goodies.
NIST Information Access Division sponsors the TREC conferences, among others. This site includes conference requirements and proceedings.
"Colibri is an electronic newsletter and WWW service aimed at people interested in the fields of natural language processing, speech processing and/or logic." (http://colibri.let.uu.nl/INFO). Contains links to software, articles, dictionaries, etc.
The Natural Language Software Registry is a good directory of NLP software.
Text Analysis Info Page focuses primarily on content analysis software, although other types are also listed.
Dan Melamed's NLP Research Software Library.
Software and demos from the University of Zurich's Institute for Computational Linguistics.
Demos from the Sheffield NLP group.
Dependency and similarity demos from Dekang Lin.
An annotated list of resources for statistical NLP and corpus-based CL.
Web Term Document Frequency Form lets you retrieve the frequency of a term from a collection of web pages.
Reuters Corpus statistics.
Michigan Corpus of Spoken Academic English.
Phrases in English is a database of phrases drawn from the British National Corpus.
British National Corpus can be searched online.
Sense Tagged Text in Senseval format from Ted Pedersen.
Linguist's Search Engine allows one to do "searches involving syntactic structure, non-contiguous constructions, and the like."
Przemek Kaszubski's home page has a link to search the PICLE corpus, and also a bibliography.
Unitex, a multilingual corpus processing system, available for download.
You can look up words in The Oxford English Dictionary online through the UNC Library.
Information about the Collins COBUILD Dictionary and the underlying Bank of English. You can query the Bank for concordances and collocations of a word.
WordNet is a freely available dictionary that is used in many NLP applications.
EuroWordNet was a multilingual version of WordNet.
"MultiWordNet is a multilingual lexical database in which the Italian WordNet is strictly aligned with Princeton WordNet 1.6."
The Unified Medical Language System (UMLS) is a product of the NLM that combines many controlled vocabularies into a single thesaurus.
CyCorp. Cyc started as an early attempt to record all the world knowledge that an NLP system would need. Now we'd call it an ontology. The upper levels are freely available, but Cyc is now a commercial product.
Analysis Tools
TextQuest software does vocabulary, readability, content, and style analysis.
The Brill taggers are available from Eric Brill's homepage.
Kolokacje, a web crawler and collocation finder.
Collocation and co-occurrence analysis software.
Collocate, by Michael Barlow. A demo version can be downloaded.
TAPoR text analysis tools.
Natural Language Toolkit is a collection of tools that requires Python.
SenseClusters "takes a user through the entire process of unsupervised learning of word senses."
Ngram Statistics Package from Ted Pedersen.
Stuttgart Finite State Transducer Tools by Helmut Schmid. Available for download; includes a morphological analyser.
SVMTool, a POS tagger based on Support Vector Machines.
Machine Translation
Loebner Prize Competition. Home page of the annual "conversation program" competition, based on the Turing Test. Has links to rules, results, and transcripts.
BotSpot. An annotated list of chatterbots.
The Simon Laven Page. A site devoted to chatterbots: links, discussions, etc.
This page was last modified on August 11, 2004, by Stephanie W. Haas.
Address questions and comments about this page to Stephanie W. Haas at stephani@ils.unc.edu
© 2001, 2004 Stephanie W. Haas