Whisper Open Source Speech Recognition You Can Use
Written by Mike James   
Wednesday, 28 September 2022

OpenAI has released a very usable speech recognition and translation program that you can install and use on any machine that runs Python. It could well be useful for more than just research.

OpenAI has received some criticism in the past for not being quite as open as its name suggests. However, with the release of Whisper under an MIT licence, it has done us all a huge favour.

"Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. We show that the use of such a large and diverse dataset leads to improved robustness to accents, background noise and technical language. Moreover, it enables transcription in multiple languages, as well as translation from those languages into English."

Put simply, you put a voice recording in and out comes a text transcription, perhaps translated into English. Unlike many research groups, OpenAI has made the code available in a form that makes it very usable. All you need is Python with PyTorch installed, plus a few additional packages and a copy of ffmpeg and its Python bindings, ffmpeg-python. The ffmpeg library handles the audio file input, so Whisper will work with files in any format that ffmpeg can handle. Even if you aren't already working with Python for AI, installation should be relatively easy.
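As a quick way to see whether a machine already has the pieces listed above, the following is a minimal sketch rather than an official installer check. The function name missing_prerequisites is hypothetical, and it assumes the usual import names: torch for PyTorch, whisper for Whisper and ffmpeg for ffmpeg-python.

```python
import importlib.util
import shutil


def missing_prerequisites(modules=("torch", "whisper", "ffmpeg"), tools=("ffmpeg",)):
    """Return descriptions of any Python modules or executables that cannot be found."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not importable
        if importlib.util.find_spec(name) is None:
            missing.append(f"Python module '{name}'")
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH
        if shutil.which(tool) is None:
            missing.append(f"executable '{tool}'")
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    if problems:
        print("Missing: " + ", ".join(problems))
    else:
        print("All prerequisites found")
```

An empty list means everything was found. The check only confirms that the modules and the ffmpeg executable can be located, not that they are working versions.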

The model comes in five sizes:

Size     Params    English-only   Multilingual   VRAM      Relative speed
tiny     39 M      tiny.en        tiny           ~1 GB     ~32x
base     74 M      base.en        base           ~1 GB     ~16x
small    244 M     small.en       small          ~2 GB     ~6x
medium   769 M     medium.en      medium         ~5 GB     ~2x
large    1550 M    N/A            large          ~10 GB    1x

The speeds are relative to the large model and, of course, the smaller models don't perform as well as the larger ones.
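The trade-off in the table can be turned into a simple selection rule. This is only a sketch: the helper largest_model_that_fits is hypothetical and not part of Whisper, and the VRAM figures are the approximate values from the table, so treat the result as a starting point rather than a guarantee.

```python
# (name, approximate VRAM in GB, speed relative to large), copied from the table
MODELS = [
    ("tiny", 1, 32),
    ("base", 1, 16),
    ("small", 2, 6),
    ("medium", 5, 2),
    ("large", 10, 1),
]


def largest_model_that_fits(vram_gb, english_only=False):
    """Pick the largest model whose approximate VRAM need is within vram_gb."""
    candidates = [name for name, need, _speed in MODELS if need <= vram_gb]
    if not candidates:
        raise ValueError(f"no model fits in {vram_gb} GB")
    name = candidates[-1]  # MODELS is ordered from smallest to largest
    # English-only variants exist for every size except large
    if english_only and name != "large":
        name += ".en"
    return name
```

For example, largest_model_that_fits(6) selects the medium model, and the returned name can be passed to the --model option or to whisper.load_model.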

Using Whisper from the command line is also easy:

whisper audio.mp3 --model medium

and if you want a translation:

whisper japanese.wav --language Japanese --task translate
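If you want to run the same commands over several files from a script, one option is to assemble the argument list and hand it to subprocess. This is a sketch, not part of Whisper: whisper_command and run_whisper are hypothetical helper names, and only the --model, --language and --task options shown above are used.

```python
import subprocess


def whisper_command(audio_path, model=None, language=None, task=None):
    """Build the argument list for the whisper command-line tool."""
    cmd = ["whisper", audio_path]
    if model:
        cmd += ["--model", model]
    if language:
        cmd += ["--language", language]
    if task:
        cmd += ["--task", task]
    return cmd


def run_whisper(audio_path, **options):
    """Run the command; raises CalledProcessError if whisper exits with an error."""
    subprocess.run(whisper_command(audio_path, **options), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.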

Using it from Python is just as easy:

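import whisper

# Load the multilingual "base" model (downloaded on first use)
model = whisper.load_model("base")
# Transcribe the file; ffmpeg decodes the audio
result = model.transcribe("audio.mp3")
# The full transcript is stored under the "text" key
print(result["text"])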
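The result is a dictionary that holds more than the full text. As a sketch, assuming the "segments" list with "start", "end" and "text" entries that transcribe returns, the following prints a timestamped transcript. The names transcribe_file and format_segments are hypothetical helpers, not part of Whisper.

```python
def transcribe_file(path, model_name="base", **options):
    """Load a model and transcribe one file; options are passed on to transcribe()."""
    import whisper  # imported here so format_segments works without Whisper installed

    model = whisper.load_model(model_name)
    return model.transcribe(path, **options)


def format_segments(result):
    """Turn the segments of a transcription result into timestamped lines."""
    lines = []
    for seg in result.get("segments", []):
        # start and end are offsets into the audio, in seconds
        lines.append(f"[{seg['start']:.2f}s -> {seg['end']:.2f}s] {seg['text'].strip()}")
    return lines


if __name__ == "__main__":
    import sys

    # Only runs when an audio file is given on the command line
    if len(sys.argv) > 1:
        for line in format_segments(transcribe_file(sys.argv[1])):
            print(line)
```

Calling transcribe_file("japanese.wav", task="translate") should give the Python counterpart of the command-line translation shown earlier, since the extra keyword arguments are used as decoding options.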

What can one say - amazing!

High-quality speech recognition and translation on a desktop machine was unthinkable just a short time ago and now it's open source. On an average desktop machine it takes about 2 minutes to transcribe 1 minute of speech. At the moment it is better at English than other languages, which is hardly surprising given that only about a third of the training dataset was non-English.

You can find out more about the model from the published paper. It is a transformer model, again hardly surprising given how much this approach has revolutionised language processing.


Training used 680,000 hours of multilingual voice data.

Both Apple and Google have similar systems, but they haven't made them generally available or easy to use. Whisper might well force the pace in making offline speech recognition available.

OpenAI's final comment on the release is:

"We hope Whisper’s high accuracy and ease of use will allow developers to add voice interfaces to a much wider set of applications."

What are you waiting for? This is truly open AI.


More Information

Introducing Whisper

Related Articles

Mozilla Updates Voice Recognition Project

Microsoft researchers achieve speech recognition milestone

Introducing DeepSpeech

Mozilla Wants Your Voice

Mozilla DeepSpeech Gets Smaller

Speech Recognition Breakthrough

Google's Deep Learning - Speech Recognition

Last Updated ( Wednesday, 28 September 2022 )