In this article, we will be trying a transcriber made using Mozilla's DeepSpeech.
Installation
Let's start with an example.
# Create and activate a virtualenv
virtualenv -p python3 $HOME/tmp/deepspeech-venv/
source $HOME/tmp/deepspeech-venv/bin/activate
# Install DeepSpeech
pip3 install deepspeech
# Download pre-trained English model files
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.7.3/deepspeech-0.7.3-models.pbmm
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.7.3/deepspeech-0.7.3-models.scorerDemo - transcribing an audio file
Now let's transcribe an audio file.
# Download example audio files
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.7.3/audio-0.7.3.tar.gz
tar xvf audio-0.7.3.tar.gz
# Transcribe an audio file
deepspeech --model deepspeech-0.7.3-models.pbmm --scorer deepspeech-0.7.3-models.scorer --audio audio/2830-3980-0043.wavThe output should look similar to the verbose below. Notice the line "experience proves this" which shows this is working.
Loading model from file deepspeech-0.7.3-models.pbmm
TensorFlow: v1.15.0-24-gceb46aa
DeepSpeech: v0.7.3-0-g8858494
Loaded model in 0.00997s.
Loading scorer from files deepspeech-0.7.3-models.scorer
Loaded scorer in 0.000207s.
Running inference.
experience proves this
Inference took 1.007s for 1.975s audio file.The last line shows something note worthy - inference took less time than the audio file length.
Streaming to DeepSpeech
Make sure you have the prerequisites before continuing.
# Prerequisites
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.7.0/deepspeech-0.7.0-models.pbmm
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.7.0/deepspeech-0.7.0-models.scorer
# Dependancies
sudo apt-get install libasound2-dev
# Clone and move into working directory
git clone https://github.com/mozilla/DeepSpeech-examples moz-ds-examples
cd moz-ds-examples/nodejs_mic_vad_streaming
# Install npm packages
npm install
# Start the server
node start.jsI had an issue where no verboise was given. This is because it uses the default mic which may not exist.
So update microphone variable to the following - notice device property was
added to the options. Run
arecord -l to see the list of microphones installed.
var microphone = mic({
rate: "16000",
channels: "1",
debug: false,
fileType: "wav",
device: "plughw:2,0",
});Now start streaming.
Next steps
Mozilla's DeepSpeech has an API in C,.NET, Java, JavaScript (NodeJS/ElectronJS) and Python so i'm sure you could integrate this the next time you need speech-to-text - I know I will be. You may want to train your own models. You could use this for an IoT device.