Whisper Open Source Speech Recognition You Can Use
Written by Mike James
Wednesday, 28 September 2022
OpenAI has released a very usable speech recognition and translation program that you can install and use on any machine that runs Python. It could well be useful for more than just research.

OpenAI has received some criticism in the past for not being quite as open as its name suggests. However, with the release of Whisper under an MIT licence it has done us all a huge favour.

"Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. We show that the use of such a large and diverse dataset leads to improved robustness to accents, background noise and technical language. Moreover, it enables transcription in multiple languages, as well as translation from those languages into English."

Put simply, you put a voice recording in and out comes a text transcription, perhaps in a different language.

Unlike many research groups, OpenAI has made the code available in a form that makes it very usable. All you need is Python with PyTorch installed, plus a few additional packages, and a copy of ffmpeg and its Python bindings, ffmpeg-python. The ffmpeg library handles the audio file input, so Whisper will work with files in any format it can handle. Even if you aren't working with Python for AI, installation should be relatively easy.
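Getting it running is just a handful of commands. Something along these lines should do it - treat it as a sketch rather than gospel, as the ffmpeg step shown here is for a Debian-style Linux and you would use your own platform's package manager instead:

# Whisper itself, installed straight from the GitHub repository
# (this assumes PyTorch is already installed)
pip install git+https://github.com/openai/whisper.git

# the Python bindings for ffmpeg
pip install ffmpeg-python

# the ffmpeg binary that does the actual audio decoding
sudo apt install ffmpeg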
The model comes in five sizes:

Model     Parameters   Relative speed
tiny      39 M         ~32x
base      74 M         ~16x
small     244 M        ~6x
medium    769 M        ~2x
large     1550 M       1x

The speeds are relative to the large model and, of course, the smaller models don't perform as well as the larger ones. Using Whisper from the command line is also easy:
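A typical invocation looks something like this - audio.mp3 and the choice of the base model are just placeholders for your own file and preferred model:

# transcribe a recording using the base model
whisper audio.mp3 --model base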
and if you want a translation:
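the --task option does the job - japanese.wav here is just a stand-in for whatever non-English recording you have:

# transcribe the speech and translate the result into English
whisper japanese.wav --language Japanese --task translate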
Using it from Python is just as easy:

import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])

What can one say - amazing! High-quality speech recognition and translation on a desktop machine was unthinkable just a short time ago and now it's open source. On an average desktop machine it takes about 2 minutes to transcribe 1 minute of speech. At the moment it is better at English than other languages, which is hardly surprising given that only about a third of the training dataset was non-English.

You can find out more about the model from the published paper. It is a transformer model, again hardly surprising given how much this approach has revolutionised language processing. Training used 680,000 hours of multilingual voice data.

Both Apple and Google have similar systems which they haven't made generally available or easy to use. Whisper might well force the pace in making off-line speech recognition available. OpenAI's final comment on the release is:

"We hope Whisper’s high accuracy and ease of use will allow developers to add voice interfaces to a much wider set of applications."

What are you waiting for? This is truly open AI.
Last Updated (Wednesday, 28 September 2022)