open source audio transcription
A few months ago, a colleague asked me for recommendations for open source audio transcription software. Audio transcription isn’t something I’ve ever really investigated so I was completely unfamiliar with the software available. But I found that the software can broadly be divided into two types: tools to help with manual transcription of audio files and tools that use machine learning models (mostly OpenAI’s Whisper[1]) to automate transcription. The former is more reliable but more time-consuming; the latter is easier but risks the hallucinations and other errors that plague machine learning models.
Here are the open source audio transcription tools I found.
manual transcription
oTranscribe is a free web app distributed under the MIT license. It helps with manual transcription by placing a media player (for audio or video files, or links to YouTube) next to a text pad, so you can type while the media plays. oTranscribe makes use of a number of useful software components, which its developers have also released as open source. oTranscribe was developed by Elliot Bentley and is maintained by the MuckRock Foundation.
Parlatype is a free and open source Linux application distributed under the GNU GPL-3.0 license. It’s a minimal audio player with keyboard shortcuts for audio playback control so you can easily scroll backwards or slow down while transcribing into whichever text editor you prefer. Parlatype was developed by Gabor Karsay.
There is a range of further open source and proprietary manual transcription software tools at https://www.sosciso.de/de/software/datenumwandlung/transcription/ (which uses a query to pull linked open data from Wikidata).
automated transcription
Transcription Stream is a transcription and diarization service that is distributed under the GNU GPL-3.0 license. The community edition released on GitHub offers a drag and drop interface for diarization and transcription via SSH and a web interface for importing audio files. It uses OpenAI Whisper for transcription and diarization, the Ollama and Mistral models for text summaries, and the open source Meilisearch search engine for full-text search. Transcription Stream is developed by Affordable Magic.
transcribee is in closed beta but is distributed under the AGPL-3.0 license. It will be a tool that lets you import audio files, converts them to text using OpenAI Whisper, and offers a collaborative writing interface for manually improving the transcript afterwards. transcribee is developed by Engelhardt, Habiger, Heinemann & Mandler GbR.
Both tools use OpenAI’s Whisper machine learning model, which is available under the MIT license. From OpenAI’s website: “Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. We show that the use of such a large and diverse dataset leads to improved robustness to accents, background noise and technical language. Moreover, it enables transcription in multiple languages, as well as translation from those languages into English.” Whisper is available as a Python package that developers can incorporate into other audio recognition software or run standalone for command-line transcription.
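To give a rough idea of what using Whisper as a Python package looks like, here is a minimal sketch. It assumes the `openai-whisper` package and `ffmpeg` are installed. The helper names (`format_timestamp`, `segments_to_lines`, `transcribe_file`) are my own inventions for illustration and are not part of Whisper; only `whisper.load_model` and `model.transcribe` come from the package.

```python
from typing import Iterable, List, Mapping


def format_timestamp(seconds: float) -> str:
    """Render an offset in seconds as HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_lines(segments: Iterable[Mapping]) -> List[str]:
    """Turn Whisper-style segments (dicts with start, end, text) into lines."""
    return [
        f"[{format_timestamp(seg['start'])} - {format_timestamp(seg['end'])}] "
        f"{seg['text'].strip()}"
        for seg in segments
    ]


def transcribe_file(path: str, model_name: str = "base") -> List[str]:
    """Transcribe an audio file and return timestamped lines.

    Hypothetical helper: the import is done lazily so the formatting
    functions above can be used without Whisper installed.
    """
    import whisper  # provided by the openai-whisper package

    model = whisper.load_model(model_name)  # downloads the model on first use
    result = model.transcribe(path)  # dict with "text" and "segments"
    return segments_to_lines(result["segments"])
```

Calling `transcribe_file("interview.mp3")` (the file name is a placeholder) should return one timestamped line per segment. The timestamps make it easier to go back to the audio and check passages by hand, which matters given the risk of hallucinations. For standalone use, the package also installs a `whisper` command, for example `whisper interview.mp3 --model base`.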
[1] For all the tools that use OpenAI Whisper, it’s worth noting that OpenAI has been sued for copyright infringement by multiple authors and is implicated in multiple class action lawsuits. Specifically related to Whisper, OpenAI has been accused of violating YouTube’s terms of service by using one million hours of YouTube videos to train the model. OpenAI also provides services to the Artificial Intelligence Cyber Challenge (AIxCC) which is sponsored by the US Department of Defence’s DARPA and the US Government’s Advanced Research Projects Agency for Health (ARPA-H). The company’s board includes former NSA head Paul Nakasone.