April 3, 2008

Uplug provides a set of tools for processing linguistic corpora, including word alignment and term extraction from parallel corpora.

Version 0.2.0c

License GPL

Platform Linux

Supported Languages English

Homepage sourceforge.net

Developed by Joerg Tiedemann

Uplug is a toolkit for linguistic corpus processing, word alignment, and term extraction from parallel corpora. The package integrates several tools for preprocessing data: a sentence splitter, a tokenizer, and interfaces to external part-of-speech taggers and shallow parsers.

The Grok system, used for English tagging and chunking, and the morphological analyzer ChaSen, used for Japanese, are two examples of the external tools that Uplug supports. Users can add further tools, such as the popular TreeTagger.

The software also supports sentence alignment using a length-based approach. Words and phrases can be aligned either with the clue alignment approach or by training statistical alignment models with GIZA++.
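To illustrate what a length-based approach means, here is a minimal sketch in Python. It is not Uplug's implementation (Uplug is written in Perl, and its aligner is not reproduced here); it is a simplified dynamic program that uses only character lengths. The function name align_by_length, the cost function, and the penalty values are all assumptions chosen for illustration.

```python
def align_by_length(src, tgt, skip_penalty=10.0, merge_penalty=3.0):
    """Align two lists of sentences using character lengths only.

    Illustrative sketch: the cost function and penalties are assumptions,
    not the statistical model used by any particular aligner.
    Returns a list of (source_indices, target_indices) pairs.
    """
    n, m = len(src), len(tgt)
    src_len = [len(s) for s in src]
    tgt_len = [len(t) for t in tgt]

    def mismatch(a, b):
        # Relative length difference, scaled to 0..10; 0 when lengths match.
        return abs(a - b) / max(a + b, 1) * 10.0

    # Allowed alignment shapes: (source sentences, target sentences, penalty).
    beads = [
        (1, 1, 0.0),            # one-to-one
        (1, 0, skip_penalty),   # source sentence with no counterpart
        (0, 1, skip_penalty),   # target sentence with no counterpart
        (2, 1, merge_penalty),  # two source sentences to one target
        (1, 2, merge_penalty),  # one source sentence to two targets
    ]

    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    # Fill the table row by row; every predecessor of a cell is final
    # by the time the cell itself is expanded.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj, penalty in beads:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + penalty + mismatch(
                    sum(src_len[i:ni]), sum(tgt_len[j:nj])
                )
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)

    # Walk the back-pointers from the end to recover the alignment.
    result = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        result.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    result.reverse()
    return result


if __name__ == "__main__":
    english = ["Hello world.", "How are you today?", "Fine."]
    german = ["Hallo Welt.", "Wie geht es dir heute?", "Gut."]
    for s_idx, t_idx in align_by_length(english, german):
        print([english[k] for k in s_idx], "<->", [german[k] for k in t_idx])
```

The sketch pairs sentences whose lengths are similar and allows a sentence to be skipped or two sentences to be merged at a fixed cost. A production aligner would typically replace the ad hoc mismatch score with a statistical model of length ratios.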

This release makes several improvements to the software. Encoding conversion in tag.pl, toktag.pl and chunk.pl is now more robust. There are new TreeTagger startup scripts for Spanish (es) and Dutch (nl). The startup script for other TreeTagger models has also been updated to correspond to the latest TreeTagger distribution.

The release also fixes a bug in the conversion of hunalign alignment output to XML and adds a missing semicolon at line 40 in Uplug.pm.

Overall, Uplug is a capable toolkit for anyone who needs to align and extract linguistic data from parallel corpora.

What's New

Version 0.2.0c: N/A

Free Download 21.9M
