EngTagger is a part-of-speech (PoS) tagger for English text. It labels each word with its grammatical category, which makes it useful for language analysis and text processing.
The tagger is a port of Perl's Lingua::EN::Tagger, a probability-based tagger trained on a corpus. Given a lookup dictionary and a set of probability values, it assigns a tag to each word of English text. It uses conditional probabilities: the tag assigned to the preceding word is taken into account when choosing the tag for the current word.
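The idea of choosing a tag from a lookup dictionary plus the preceding tag can be sketched as follows. This is a minimal illustration in Python, not the tagger's actual code: the lexicon, the transition table, the probability values, the floor value, and the function name tag_words are all hypothetical, and the greedy left-to-right choice is an assumption about how such a tagger might work.

```python
# Hypothetical lookup dictionary: word -> {tag: probability of that tag for the word}.
LEXICON = {
    "the": {"det": 1.0},
    "dog": {"nn": 0.9, "vb": 0.1},
    "can": {"md": 0.6, "nn": 0.3, "vb": 0.1},
    "run": {"vb": 0.7, "nn": 0.3},
}

# Hypothetical conditional probabilities: previous tag -> {tag: P(tag | previous tag)}.
# "<s>" marks the start of the sentence.
TRANSITIONS = {
    "<s>": {"det": 0.6, "nn": 0.2, "md": 0.1, "vb": 0.1},
    "det": {"nn": 0.9, "vb": 0.05, "md": 0.05},
    "nn": {"md": 0.4, "nn": 0.2, "vb": 0.1},
    "md": {"vb": 0.9, "nn": 0.1},
}

FLOOR = 1e-4          # assumed score for tag pairs missing from the table
UNKNOWN_TAG = "nn"    # assumed fallback tag for words missing from the lexicon


def tag_words(words):
    """Greedily tag each word using its lexicon entry and the previous tag."""
    previous = "<s>"
    tagged = []
    for word in words:
        candidates = LEXICON.get(word.lower(), {UNKNOWN_TAG: 1.0})
        # Score = P(tag for this word) * P(tag | previous tag); keep the best.
        best = max(
            candidates,
            key=lambda t: candidates[t] * TRANSITIONS.get(previous, {}).get(t, FLOOR),
        )
        tagged.append((word, best))
        previous = best
    return tagged


print(tag_words(["the", "can"]))
print(tag_words(["dog", "can", "run"]))
```

With these made-up numbers, "can" is tagged as a noun after a determiner but as a modal after a noun, which is the effect of conditioning on the preceding tag.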
Unknown words are handled either through word morphology or by assigning them a default part of speech, such as noun, as needed. The tagger also extracts as many nouns and noun phrases as possible, using a set of regular expressions.
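Both techniques can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the suffix rules, the tag names, the noun-phrase pattern, and the function names guess_tag and noun_phrases are hypothetical and are not taken from the tagger itself, whose rules are likely more extensive.

```python
import re


def guess_tag(word, default="nn"):
    """Guess a tag for an unknown word from its shape, else use the default."""
    if re.fullmatch(r"\d+(?:[.,]\d+)*", word):
        return "cd"    # number
    if word.endswith("ly"):
        return "rb"    # adverb
    if word.endswith("ing"):
        return "vbg"   # gerund or present participle
    if word.endswith("ed"):
        return "vbd"   # past-tense verb
    if word.endswith("s") and not word.endswith("ss"):
        return "nns"   # plural noun
    return default     # configurable fallback part of speech


# Assumed noun-phrase shape: optional determiner, any adjectives, one or more nouns.
# The pattern runs over a "word/tag " encoding of the tagged sentence.
NP_PATTERN = re.compile(r"(?<!\S)(?:\S+/det )?(?:\S+/jj )*(?:\S+/nns? )+")


def noun_phrases(tagged):
    """Return noun phrases found in a list of (word, tag) pairs."""
    encoded = "".join(f"{word}/{tag} " for word, tag in tagged)
    phrases = []
    for match in NP_PATTERN.finditer(encoded):
        words = [token.rsplit("/", 1)[0] for token in match.group().split()]
        phrases.append(" ".join(words))
    return phrases


sentence = [
    ("the", "det"), ("quick", "jj"), ("brown", "jj"), ("fox", "nn"),
    ("jumps", "vbz"), ("over", "in"),
    ("the", "det"), ("lazy", "jj"), ("dog", "nn"),
]
print(guess_tag("blorfing"))
print(noun_phrases(sentence))
```

Passing a different default to guess_tag mirrors the option of treating unknown words as some part of speech other than noun.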
Overall, EngTagger simplifies PoS tagging for English text, and the noun phrases it extracts can be used directly in downstream NLP applications.
Version 0.1.1: N/A