Automatic speech recognition with Nvidia NeMo

Overview

Combine Label Studio with Nvidia’s NeMo to support researchers and practitioners working on automatic speech recognition (ASR), text-to-speech synthesis (TTS), large language models (LLMs), and natural language processing (NLP).

With this community-created integration, you can generate audio pre-annotations and automatic transcriptions for a selected speech region in Label Studio.
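To make the idea of a pre-annotation concrete, here is a minimal sketch of how a transcript could be wrapped in the prediction structure that Label Studio imports. It does not call NeMo: the transcript string stands in for whatever the ASR model returns. The helper name `build_transcription_prediction`, the control names `transcription` and `audio`, and the `model_version` label are assumptions for illustration and must match your own labeling configuration. Linking a transcript to a specific audio region is not covered here.

```python
from typing import Any, Dict, Optional


def build_transcription_prediction(
    text: str,
    from_name: str = "transcription",  # assumed name of the TextArea control
    to_name: str = "audio",            # assumed name of the Audio object
    score: Optional[float] = None,
    model_version: str = "nemo-asr-sketch",  # hypothetical version label
) -> Dict[str, Any]:
    """Wrap one transcript in a Label Studio style prediction dict."""
    result_item = {
        "from_name": from_name,
        "to_name": to_name,
        "type": "textarea",
        # TextArea values carry the text as a list of strings.
        "value": {"text": [text]},
    }
    prediction: Dict[str, Any] = {
        "model_version": model_version,
        "result": [result_item],
    }
    # Only attach a confidence score when the caller supplies one.
    if score is not None:
        prediction["score"] = score
    return prediction


if __name__ == "__main__":
    # The transcript here is a placeholder, not real model output.
    print(build_transcription_prediction("hello world", score=0.9))
```

Annotators then see the transcript pre-filled in the text area and only need to correct it, which is where the time savings come from.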

Benefits

Using NeMo for pre-annotation in Label Studio has the following benefits:

Reliability: NeMo can provide highly accurate speech-to-text transcriptions.
Speed: Using NeMo for transcriptions speeds up the labeling process by offloading most transcription tasks to an ML model, freeing annotators to focus on more difficult transcriptions.
