Mozilla's DeepSpeech: an open source speech-to-text engine

In this article, we will try out a transcriber built with Mozilla's DeepSpeech.

Installation

Let's start by installing DeepSpeech in a virtual environment and downloading the pre-trained English model.

# Create and activate a virtualenv
virtualenv -p python3 $HOME/tmp/deepspeech-venv/
source $HOME/tmp/deepspeech-venv/bin/activate

# Install DeepSpeech
pip3 install deepspeech

# Download pre-trained English model files
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.7.3/deepspeech-0.7.3-models.pbmm
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.7.3/deepspeech-0.7.3-models.scorer

Demo - transcribing an audio file

Now let's transcribe an audio file.

# Download example audio files
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.7.3/audio-0.7.3.tar.gz
tar xvf audio-0.7.3.tar.gz

# Transcribe an audio file
deepspeech --model deepspeech-0.7.3-models.pbmm --scorer deepspeech-0.7.3-models.scorer --audio audio/2830-3980-0043.wav

The output should look similar to the verbose log below. The line "experience proves this" is the transcript, which shows that the engine is working.

Loading model from file deepspeech-0.7.3-models.pbmm
TensorFlow: v1.15.0-24-gceb46aa
DeepSpeech: v0.7.3-0-g8858494
Loaded model in 0.00997s.
Loading scorer from files deepspeech-0.7.3-models.scorer
Loaded scorer in 0.000207s.
Running inference.
experience proves this
Inference took 1.007s for 1.975s audio file.

The last line shows something noteworthy: inference took about 1.0 s for a 1.975 s audio file, so transcription ran roughly twice as fast as real time.
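The comparison between inference time and audio length is often expressed as a real-time factor (inference time divided by audio duration), where a value below 1 means faster than real time. Here is a small sketch of that arithmetic using the two numbers from the log above; the function name `real_time_factor` is made up for illustration and is not part of DeepSpeech.

```python
def real_time_factor(inference_seconds, audio_seconds):
    """Return inference time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return inference_seconds / audio_seconds


# Values taken from the "Inference took 1.007s for 1.975s audio file." line.
rtf = real_time_factor(1.007, 1.975)
print(f"real-time factor: {rtf:.2f}")      # 0.51
print(f"speed-up over real time: {1 / rtf:.2f}x")  # 1.96x
```

Timings will differ from machine to machine, so treat these numbers as one sample rather than a benchmark.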

Mozilla's DeepSpeech has APIs for C, .NET, Java, JavaScript (NodeJS/ElectronJS) and Python, so I'm sure you could integrate it the next time you need speech-to-text. I know I will. You may also want to train your own models, or run it on an IoT device.
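To give a feel for the Python API mentioned above, here is a minimal sketch of the same transcription done from a script instead of the command line. It rests on some assumptions: the `deepspeech` and `numpy` packages are installed in the virtualenv, the input is a mono 16-bit WAV at the model's sample rate (as the example audio files appear to be), and `Model`, `enableExternalScorer`, `sampleRate` and `stt` behave as they do in the 0.7.x Python package. The helper names `load_pcm16` and `transcribe` are hypothetical.

```python
import wave


def load_pcm16(path):
    """Read a mono 16-bit WAV file and return (sample_rate, raw_pcm_bytes)."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        return wav.getframerate(), wav.readframes(wav.getnframes())


def transcribe(model_path, scorer_path, wav_path):
    """Run DeepSpeech on one WAV file and return the transcript as a string."""
    # Imported here so the WAV helper above works without these packages.
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)
    model.enableExternalScorer(scorer_path)

    rate, pcm = load_pcm16(wav_path)
    if rate != model.sampleRate():
        raise ValueError(f"audio is {rate} Hz, model expects {model.sampleRate()} Hz")

    # stt() takes the samples as a 1-D array of 16-bit integers.
    return model.stt(np.frombuffer(pcm, dtype=np.int16))
```

Calling `transcribe("deepspeech-0.7.3-models.pbmm", "deepspeech-0.7.3-models.scorer", "audio/2830-3980-0043.wav")` should return the same transcript as the command-line run, provided the files downloaded earlier are in the current directory.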
