The Natural Language Toolkit (NLTK) is a collection of Python-based software tools and libraries designed for processing and analyzing natural language data. It combines symbolic and statistical approaches to support a range of natural language processing tasks.
The real power of the NLTK, however, comes from its extensive documentation. There's plenty of material available from the NLTK home page, including tutorials that provide insight into the underlying concepts behind the language processing tasks the toolkit supports. These tutorials are ideal for anyone who wants to learn how to use the toolkit.
Additionally, the toolkit's reference documentation describes every aspect of the toolkit in detail, covering its modules, interfaces, classes, methods, functions, and variables. This documentation serves both users and developers, so that everyone has access to the information they need.
If you're interested in why the toolkit is designed the way it is, you can consult the technical reports available from the NLTK home page. These reports explain and justify the toolkit's design and implementation, and are used by the developers to guide and document the toolkit's construction.
The latest release of the NLTK includes a range of exciting new features and improvements, such as expanded semantics packages for first-order logic, linear logic, glue semantics, DRT, and LFG. There's also a new WordSense class in wordnet.synset that supports looking up synsets by sense key and retrieving sense counts, as well as an interface to Mallet's linear-chain CRF implementation.
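To see what a sense key carries, here is a minimal sketch that parses WordNet's sense-key format (`lemma%ss_type:lex_filenum:lex_id:head_word:head_id`). This illustrates the key format itself, not NLTK's WordSense API; the function name and returned fields are illustrative choices.

```python
# Sketch: splitting a WordNet sense key into its named fields.
# Format: lemma%ss_type:lex_filenum:lex_id:head_word:head_id
SS_TYPES = {"1": "noun", "2": "verb", "3": "adjective",
            "4": "adverb", "5": "adjective satellite"}

def parse_sense_key(key: str) -> dict:
    """Split a sense key such as 'dog%1:05:00::' into its fields."""
    lemma, rest = key.split("%", 1)
    ss_type, lex_filenum, lex_id, head_word, _head_id = rest.split(":")
    return {
        "lemma": lemma,
        "pos": SS_TYPES[ss_type],
        "lex_filenum": int(lex_filenum),
        "lex_id": int(lex_id),
        "head_word": head_word or None,   # only set for adjective satellites
    }

print(parse_sense_key("dog%1:05:00::"))
# → {'lemma': 'dog', 'pos': 'noun', 'lex_filenum': 5, 'lex_id': 0, 'head_word': None}
```

Given a parsed key like this, a lookup layer can then map the lemma and sense number onto the corresponding synset.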
Other improvements in the release include better support for chunkers, with a flexible chunk corpus reader and a new rule type, ChunkRuleWithContext, as well as new GUIs for POS-tagged concordancing and for developing regexp chunkers.
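The core idea behind regexp chunking is to encode a sentence's POS-tag sequence as a string and match tag patterns against it. The sketch below shows that idea for noun phrases only; it is a conceptual toy, not NLTK's RegexpParser, and the function name and pattern are illustrative.

```python
import re

def chunk_nps(tagged):
    """Group DT? JJ* NN+ runs of a POS-tagged sentence into NP chunks."""
    # Encode the tag sequence as a string, e.g. "<DT><JJ><NN><VBD>..."
    tags = "".join(f"<{t}>" for _, t in tagged)
    chunks = []
    for m in re.finditer(r"(<DT>)?(<JJ>)*(<NN>)+", tags):
        # Map the string match back to token positions by counting tags.
        start = tags[:m.start()].count("<")
        end = start + m.group(0).count("<")
        chunks.append([w for w, _ in tagged[start:end]])
    return chunks

sent = [("the", "DT"), ("little", "JJ"), ("dog", "NN"),
        ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
print(chunk_nps(sent))  # → [['the', 'little', 'dog'], ['the', 'cat']]
```

A rule "with context" extends this by letting the pattern mention tags to the left or right of the chunk without including them in it, which regular expressions express naturally with lookaround.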
The toolkit also features new packages for n-gram language modeling with Katz backoff and improved support for lazy sequences. Additionally, probability distributions now include a generate() method, and the toolkit's API documentation has seen further work, including fixes to docstrings.
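To make the two ideas concrete, here is a toy bigram model with naive backoff to unigram counts, plus a generate() method that samples from the resulting distribution. This is a conceptual sketch, not NLTK's implementation: Katz backoff proper uses discounted estimates to reserve probability mass for the backed-off distribution, which this toy omits.

```python
import random
from collections import Counter, defaultdict

class BigramModel:
    """Toy bigram model: backs off to unigram counts for unseen contexts."""

    def __init__(self, tokens):
        self.unigrams = Counter(tokens)
        self.bigrams = defaultdict(Counter)
        for a, b in zip(tokens, tokens[1:]):
            self.bigrams[a][b] += 1
        self.total = sum(self.unigrams.values())

    def prob(self, word, prev=None):
        """P(word | prev), backing off to the unigram estimate."""
        if prev is not None and self.bigrams[prev]:
            return self.bigrams[prev][word] / sum(self.bigrams[prev].values())
        return self.unigrams[word] / self.total

    def generate(self, prev=None):
        """Sample one word from the (backed-off) distribution."""
        words = list(self.unigrams)
        weights = [self.prob(w, prev) for w in words]
        return random.choices(words, weights=weights)[0]

lm = BigramModel("the dog saw the cat".split())
print(lm.prob("dog", "the"))  # → 0.5: 'the' is followed by 'dog' and 'cat'
```

Sampling word by word with generate(), feeding each output back in as the next context, yields random text in the style of the training data.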
Along with the NLTK release, work has also been done on the Contrib package, which gains new NLG, dependency parsing, coreference, CCG parsing, and first-order resolution theorem proving packages.
Finally, data resources have also been improved, including the addition of the new NPS Chat Corpus and its corpus reader, improvements to the ConllCorpusReader for reading the CoNLL 2004 and 2005 corpora, and an HMM-based Treebank POS tagger and phrase chunker for nltk_contrib.coref in api.py.
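CoNLL corpora are column-oriented text files: one token per line, blank lines between sentences. The sketch below reads a simplified three-column word/POS/chunk layout (as in CoNLL-2000); the 2004 and 2005 shared-task files carry additional columns such as semantic-role labels, and this reader, its name included, is an illustrative sketch rather than NLTK's ConllCorpusReader.

```python
def read_conll(text):
    """Yield sentences as lists of (word, pos, chunk) tuples."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():          # blank line ends a sentence
            if sentence:
                yield sentence
                sentence = []
        else:
            word, pos, chunk = line.split()
            sentence.append((word, pos, chunk))
    if sentence:                      # flush a trailing sentence
        yield sentence

sample = "He PRP B-NP\nruns VBZ B-VP\n\nDogs NNS B-NP\nbark VBP B-VP\n"
print(list(read_conll(sample)))
# → [[('He', 'PRP', 'B-NP'), ('runs', 'VBZ', 'B-VP')],
#    [('Dogs', 'NNS', 'B-NP'), ('bark', 'VBP', 'B-VP')]]
```

Supporting a new CoNLL year is then mostly a matter of declaring which columns a file provides and in what order.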
If you're interested in learning more and want to get started with NLP using Python, the NLTK is definitely worth investigating.
Version 0.9.9 / 2.0 Beta 7