Cuneiform is a multi-language OCR (Optical Character Recognition) system that Cognitive Technologies released as open-source software. It recognizes text in images and supports a range of languages.
Cuneiform can auto-detect ImageMagick++ on the system at build time and build against it. When built this way, it can process any image format that ImageMagick supports. Without ImageMagick++, it can only read uncompressed BMP images. If you want to run Cuneiform without installing it on your system, set the CF_DATADIR environment variable to a directory that contains the .dat files found in the "datafiles" directory of the source package.
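As a rough illustration of the CF_DATADIR setup, the following Python sketch prepares an environment for launching an uninstalled Cuneiform binary. The function name cuneiform_env and the early check for .dat files are assumptions made for this example, not part of Cuneiform itself.

```python
import os
from pathlib import Path


def cuneiform_env(datadir):
    """Return a copy of the current environment with CF_DATADIR set.

    `datadir` is assumed to be a directory holding the .dat files
    copied from the "datafiles" directory of the source package.
    """
    datadir = Path(datadir).resolve()
    # Fail early if the directory does not contain any .dat files.
    if not any(datadir.glob("*.dat")):
        raise FileNotFoundError(f"no .dat files found in {datadir}")
    env = os.environ.copy()
    env["CF_DATADIR"] = str(datadir)
    return env
```

The returned mapping can be passed as the env argument of subprocess.run when starting the cuneiform binary from its build directory.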
After installing the software, running Cuneiform is simple. Run the command "cuneiform [-l language -o result_file --html --dotmatrix --fax] <image_file>", and it will write the recognized text to the output file. Cuneiform assumes that your image contains only a single column of text. By default, it recognizes English text, but you can change the language with the command line switch "-l" followed by your preferred language string. To get a list of supported languages, run "cuneiform -l".
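The command line described above can be assembled programmatically. This is a minimal sketch that uses only the switches named in this document. The helper name build_cuneiform_command and the example language string "ger" are assumptions. Check the output of "cuneiform -l" for the language strings your build actually accepts.

```python
def build_cuneiform_command(image_file, language=None, output=None,
                            html=False, dotmatrix=False, fax=False):
    """Build the argument list for one Cuneiform run (hypothetical helper)."""
    cmd = ["cuneiform"]
    if language:
        cmd += ["-l", language]   # omit to use the default, English
    if output:
        cmd += ["-o", output]     # explicit result file
    if html:
        cmd.append("--html")      # HTML instead of plain text
    if dotmatrix:
        cmd.append("--dotmatrix")
    if fax:
        cmd.append("--fax")
    cmd.append(image_file)        # the image file comes last
    return cmd


if __name__ == "__main__":
    # "ger" and the file names are illustrative assumptions.
    print(" ".join(build_cuneiform_command("page.bmp", language="ger",
                                           output="page.txt")))
```

The list can be handed to subprocess.run to perform the actual recognition on a system where Cuneiform is installed.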
Cuneiform outputs plain text by default, but you can specify the "--html" switch to make it output HTML instead. If you do not define an output file using the -o switch, Cuneiform will write the result to a file named "cuneiform-out.[format]", where [format] is either "txt" or "html" depending on your output format. Overall, Cuneiform is easy to install and use.
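The output naming rule can be captured in a few lines, which is handy when a script needs to find the result after a run. This sketch follows the rule as stated in this document. The function name result_path is an assumption made for illustration.

```python
def result_path(output=None, html=False):
    """Return the file that should hold the recognition result."""
    if output:
        # An explicit -o argument takes precedence.
        return output
    # Otherwise the default name depends on the output format.
    extension = "html" if html else "txt"
    return f"cuneiform-out.{extension}"
```

For example, a plain run with no switches would be expected to leave its result in "cuneiform-out.txt" in the current directory.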
Version 0.8: N/A