Sphinx is speech recognition software that recognizes continuous speech without being trained on a specific speaker's voice (speaker-independent recognition). It supports a large vocabulary and can handle complex language.
To get Sphinx up and running, first download SphinxBase from the official website, unpack it, and place it in the same parent directory as PocketSphinx. On Windows, rename the unpacked SphinxBase directory to drop the version number, so that it is named simply "sphinxbase". In a Unix-like environment such as Linux or Solaris, you need to build SphinxBase; pass the --enable-fixed option to its configure script if you want to use fixed-point arithmetic.
If you downloaded directly from the CVS repository, you must generate the "configure" file by running ./autogen.sh at least once. You can then compile and install with the following commands:
- ./configure
- make clean all
- make test
- make install
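The build steps above can be collected into a small script. This is a minimal sketch, not part of the Sphinx distribution: the DRY_RUN variable and the run and build_tree helpers are hypothetical names introduced here for illustration, and the source directory is whatever you unpacked. By default it only prints the commands it would run, so you can review the sequence before executing anything.

```shell
#!/bin/sh
# Sketch only: prints (or runs) the build sequence for one source tree.
# DRY_RUN, run and build_tree are illustrative names, not part of Sphinx.

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only; 0 = print and execute them

# Print a command and, outside dry-run mode, execute it.
run() {
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Usage: build_tree DIR [CONFIGURE_ARGS...]
build_tree() {
    (
        cd "$1" || exit 1
        shift
        # A repository checkout has no "configure" file yet; generate it first.
        if [ ! -f configure ]; then
            run ./autogen.sh || exit 1
        fi
        run ./configure "$@" || exit 1
        run make clean all || exit 1
        run make test || exit 1
        run make install || exit 1
    )
}
```

For example, `build_tree sphinxbase --enable-fixed` would list the steps for a fixed-point SphinxBase build, assuming the directory is named "sphinxbase"; setting DRY_RUN=0 runs them. The subshell keeps the directory change from leaking into the calling shell.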
This release adds several key features, including a new, re-entrant API, a GStreamer plugin, and a Python module that provides access to most of the API. It also adds support for continuous density models (currently slow), more flexible semi-continuous modeling, and reduced code size and memory footprint. The software is up to 18% faster on ARM platforms, and word posterior probabilities (confidence scores) are now available.
Sphinx also offers experimental support for JSGF grammar files and comes with a new default WSJ language model that is better suited to dictation. As open-source software, it is aimed at anyone looking to build their own speech recognition system.
Version 0.5: N/A