This article walks through converting speech to text in Python using the SpeechRecognition library.
Speech recognition is the process of recognizing spoken words and representing them as text. It is useful in many applications, such as self-driving cars and home surveillance.
Prerequisites for Python speech to text conversion
Before diving into Python speech to text conversion, we need to install the necessary libraries.
Step 1: Install SpeechRecognition library
pip install speechrecognition
The SpeechRecognition library is used for speech to text conversion. It supports several online and offline speech recognition engines and APIs.
Step 2: Install PyAudio module
pip install pyaudio
The PyAudio library provides Python bindings for PortAudio, a cross-platform audio input/output library. It lets the user record and play audio in the same way on every supported platform.
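Before moving on, it can help to confirm that both installs succeeded. The following is a minimal sketch using only the standard library; the helper name missing_modules is hypothetical, and the only assumption is that the packages are imported as speech_recognition and pyaudio, which differ from their pip names.

```python
import importlib.util

# Import names differ from the pip package names:
# SpeechRecognition -> speech_recognition, PyAudio -> pyaudio.
REQUIRED_MODULES = ["speech_recognition", "pyaudio"]


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required modules are installed.")
```

The check only looks the modules up and does not import them, so it runs quickly and does not touch any audio device.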
Understanding Python speech to text conversion using SpeechRecognition module
Step 1: Import the necessary library/module
To convert speech to text with the SpeechRecognition module, we first import it in our program so that its classes and functions are available.
import speech_recognition
Step 2: Initialize the Speech Recognizer
variable = speech_recognition.Recognizer()
A Recognizer instance is needed to capture the audio input and to recognize the speech in it.
Step 3: Set the source of input audio/voice
The input to the SpeechRecognition module can be of two types:
- Pre-recorded audio file
- Voice input through default Microphone
with speech_recognition.Microphone() as source:
In the above statement, the input is recorded directly through the default microphone: the Microphone() object is used to fetch the audio from it.
Note: The PyAudio module must be installed in order to accept audio input from the default microphone.
To convert a pre-recorded audio file to text instead, use the following statement:
with speech_recognition.AudioFile("name of the audio file") as source:
Step 4: Define the time limit for recording the audio from the microphone.
The record() method takes the source of the input and the length of time for which audio should be recorded from it.
record(source, duration)
- source: Defines the source of the input, such as an audio file or the microphone.
- duration: The time period (in seconds) for which the microphone stays active and accepts the user's voice.
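The effect of duration can be tried without a microphone by recording from a file. The sketch below is illustrative only: write_tone_wav and record_clip are hypothetical helper names, and the generated file holds a plain sine tone rather than speech, so it is suitable for exercising record() but not for getting a meaningful transcription.

```python
import math
import struct
import wave


def write_tone_wav(path, seconds=2.0, sample_rate=16000, frequency=440.0):
    """Write a mono, 16-bit PCM sine tone to `path` and return the frame count."""
    frame_count = int(seconds * sample_rate)
    samples = (
        int(32767 * 0.3 * math.sin(2 * math.pi * frequency * i / sample_rate))
        for i in range(frame_count)
    )
    frames = b"".join(struct.pack("<h", sample) for sample in samples)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)
    return frame_count


def record_clip(path, duration=None):
    """Read up to `duration` seconds of the file at `path` into an AudioData object."""
    # Imported here so the WAV helper above works without the library installed.
    import speech_recognition

    recognizer = speech_recognition.Recognizer()
    with speech_recognition.AudioFile(path) as source:
        # duration=None reads until the end of the file.
        return recognizer.record(source, duration=duration)
```

A call such as write_tone_wav("tone.wav") followed by record_clip("tone.wav", duration=1) would read roughly the first second of the two-second file; with a microphone as the source, the same duration argument bounds how long the recording stays open.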
Step 5: Convert the speech to text using a speech recognition engine or an API
The record() function captures the voice of the user, and the recorded audio is then uploaded to a speech recognition engine, such as the Google voice recognition engine, for recognition. The system must stay connected to the Internet in order to use the Google recognition engine.
The recognize_google() function recognizes the speech in the audio passed to it as a parameter and returns it as text. To recognize speech in another language, such as Spanish or Japanese, pass the language parameter to the function.
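As a rough illustration of the language parameter, the sketch below maps a few language names to the tags passed to recognize_google(). The lookup table and the helper name transcribe_in are hypothetical, and the tags shown are assumed to be accepted by the Google engine; check the list of supported languages before relying on them.

```python
# Hypothetical lookup table. The tags follow the "es-ES" style that
# recognize_google() takes in its `language` argument (the default is "en-US").
LANGUAGE_TAGS = {
    "english": "en-US",
    "spanish": "es-ES",
    "japanese": "ja-JP",
}


def transcribe_in(recognizer, audio, language_name="english"):
    """Transcribe `audio` with recognize_google() in the named language."""
    tag = LANGUAGE_TAGS[language_name.lower()]
    return recognizer.recognize_google(audio, language=tag)


# Usage, assuming `store` is a Recognizer and `audio_input` came from record():
# text = transcribe_in(store, audio_input, "spanish")
```

Keeping the tags in one table avoids scattering string literals through the program, and an unsupported name fails early with a KeyError instead of a confusing response from the engine.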
Implementation of Python Speech to text conversion using SpeechRecognition library
import speech_recognition as SRG
import time

store = SRG.Recognizer()

with SRG.Microphone() as s:
    print("Speak...")
    # Record 7 seconds of audio from the default microphone.
    audio_input = store.record(s, duration=7)
    print("Recording time:", time.strftime("%I:%M:%S"))

    try:
        # Send the recorded audio to the Google recognition engine.
        text_output = store.recognize_google(audio_input)
        print("Text converted from audio:\n")
        print(text_output)
        print("Finished!!")
        print("Execution time:", time.strftime("%I:%M:%S"))
    except Exception:
        # Reached when the speech is unintelligible or the engine cannot be contacted.
        print("Couldn't process the audio input.")
Output:
Speak...
Recording time: 01:13:27
Text converted from audio:

Python on Journaldev!
Finished!!
Execution time: 01:13:34
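The implementation above reports every failure with the same message. One possible refinement, shown here only as a sketch, is to tell apart the two exceptions the SpeechRecognition library raises during recognition: UnknownValueError when the speech cannot be understood and RequestError when the engine cannot be reached. The helper name transcribe_or_explain and the message wording are hypothetical.

```python
def transcribe_or_explain(recognizer, audio):
    """Return (text, error_message); exactly one of the two is None."""
    # Imported lazily so that defining the helper does not need the library.
    import speech_recognition as SRG

    try:
        return recognizer.recognize_google(audio), None
    except SRG.UnknownValueError:
        # Raised when the engine could not make out any speech in the audio.
        return None, "The speech was unintelligible."
    except SRG.RequestError as error:
        # Raised when the API is unreachable or rejects the request.
        return None, f"Could not reach the recognition service: {error}"


# Usage, assuming `store` and `audio_input` from the implementation above:
# text_output, problem = transcribe_or_explain(store, audio_input)
# print(text_output if problem is None else problem)
```

Separating the two cases tells the user whether to speak again or to check the Internet connection.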
Conclusion
In this article, we have seen how to convert speech to text in Python using the SpeechRecognition library.