I decided to try installing PocketSphinx on the Raspberry Pi for offline speech-to-text processing (until now, I’ve been using the Google Translate APIs for speech-to-text).
The short version is that the out-of-the-box experience isn’t very good. It can recognize “Hello” pretty reliably, but not much else! I’m hopeful that matters will improve with a custom dictionary.
Anyway, here’s a HOWTO for getting it up and running on the RPi:
The latest source for pocketsphinx can be obtained here.
You’ll need to download both sphinxbase and pocketsphinx.
You’ll also need to ensure that ALSA is installed. There’s a pretty good primer about how to accomplish this here.
The short version is that you must, at a minimum, install alsa-utils:
sudo apt-get install alsa-utils
You’ll also need bison and libasound2-dev:
sudo apt-get install bison
sudo apt-get install libasound2-dev
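Before moving on, it can be worth confirming that the prerequisites actually landed. The following is a minimal sketch, not part of the original steps: have_cmd is a hypothetical helper name, and /usr/include/alsa/asoundlib.h is assumed to be where libasound2-dev puts the ALSA header (the usual location on Debian-based systems such as Raspbian). It only reports what it finds and does not install or change anything.

```shell
# Hypothetical helper: succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# arecord ships with alsa-utils; bison is needed for the build.
for cmd in arecord bison; do
    if have_cmd "$cmd"; then
        echo "ok: $cmd"
    else
        echo "missing: $cmd"
    fi
done

# Assumed header location for libasound2-dev on Debian-based systems.
if [ -e /usr/include/alsa/asoundlib.h ]; then
    echo "ok: ALSA development header"
else
    echo "missing: libasound2-dev (no /usr/include/alsa/asoundlib.h)"
fi
```

Any line starting with "missing" points at the apt-get install command that still needs to be run.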
Note that the Raspberry Pi has no built-in microphone input, so you’ll have to attach your own microphone. I am using the microphone in the Logitech C920 webcam for this purpose.
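To find out which ALSA device name your microphone ended up with, you can list the capture devices with arecord -l (part of alsa-utils). The sketch below turns that listing into plughw:CARD,DEVICE names. list_plughw is a hypothetical helper, and it assumes the English "card N: ..., device M: ..." line format, which is why arecord is run with LC_ALL=C.

```shell
# Hypothetical helper: convert "card N: ..., device M: ..." lines
# (as printed by `arecord -l`) into plughw:N,M device names.
list_plughw() {
    sed -n 's/^card \([0-9][0-9]*\):.*, device \([0-9][0-9]*\):.*/plughw:\1,\2/p'
}

if command -v arecord >/dev/null 2>&1; then
    # Print one plughw name per capture device; prints nothing if none exist.
    LC_ALL=C arecord -l 2>/dev/null | list_plughw || true
else
    echo "arecord not found; install alsa-utils first" >&2
fi
```

On my setup the webcam shows up as card 1, device 0, which is where the plughw:1,0 passed to -adcdev later on comes from. If your microphone is listed under different numbers, substitute them accordingly.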
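After you’ve accomplished all of this, you should be able to build and install sphinxbase, followed by pocketsphinx. Each one is extracted, configured, compiled, and then installed:
gzip -d sphinxbase-0.8.tar.gz
tar -xvf sphinxbase-0.8.tar
cd sphinxbase-0.8
./configure
make
sudo make install
cd ..
gzip -d pocketsphinx-0.8.tar.gz
tar -xvf pocketsphinx-0.8.tar
cd pocketsphinx-0.8
./configure
make
sudo make install
Then, from the pocketsphinx-0.8 directory, start continuous recognition:
./src/programs/pocketsphinx_continuous -adcdev plughw:1,0 -nfft 2048 -samprate 48000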
Note, if you receive an error such as the following:
Error opening audio device plughw:1,0 for capture: Connection refused
Mixer load failed: Invalid argument
FATAL_ERROR: "continuous.c", line 246: Failed to open audio device
You likely have pulseaudio installed, which is causing sphinxbase to attempt to use pulse instead of alsa.
The workaround (if you do wish to use ALSA) is to remove PulseAudio, and then rebuild and re-install sphinxbase by repeating the steps above:
sudo apt-get remove pulseaudio -y
sudo aptitude purge pulseaudio -y
sudo mv /usr/include/pulse/pulseaudio.h /usr/include/pulse/pulseaudio.h.old
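Judging from the workaround, the sphinxbase configure step seems to key off the presence of the PulseAudio header, so it is worth checking that the header is really gone before re-running ./configure. This is a small sketch under that assumption; pulse_header_present is a hypothetical helper, and it only checks the path used in the mv command above unless you pass a different one.

```shell
# Hypothetical helper: succeeds if the PulseAudio header exists at the
# given path (defaults to the path renamed in the mv command).
pulse_header_present() {
    [ -e "${1:-/usr/include/pulse/pulseaudio.h}" ]
}

if pulse_header_present; then
    echo "pulseaudio.h is still present; rename it before re-running ./configure"
else
    echo "pulseaudio.h not found; safe to re-run ./configure for sphinxbase"
fi
```

If the header is reported as still present, the rename did not take effect (or a package re-installed it), and a rebuilt sphinxbase will most likely hit the same error.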
If you’ve done everything correctly, running ./configure on sphinxbase should give you output that looks like the following:
I primarily posted this because I ran into the pulseaudio issue, and there were no good resources on the subject (most people just recommended installing and configuring pulseaudio, and allowing sphinxbase to use that instead of alsa). I wanted to avoid the overhead and keep things simple, so I dug deeper. After reading through the application and library source code (<3 open source!), making some sandbox code to attempt to reproduce the error, etc., I finally figured out how to get sphinxbase to use alsa as intended.
I am hopeful that this post may save someone else the trouble I went through!