Automatic speech recognition with Nvidia NeMo

Overview

Combine the power of Label Studio with Nvidia’s NeMo to enhance work done by researchers and practitioners in automatic speech recognition (ASR), text-to-speech synthesis (TTS), large language models (LLMs), and natural language processing (NLP).

With this community-created integration, you can generate audio pre-annotations and automatic transcriptions for a selected speech region in Label Studio.
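As a rough illustration of what a transcription pre-annotation for a selected region might look like, the sketch below wraps a transcript in a Label Studio-style prediction dictionary. It is a minimal sketch under stated assumptions, not the integration's actual code: the function names, the `model_version` string, and the control names `transcription` and `audio` are hypothetical and would need to match your labeling configuration, and `fake_transcribe` is a stand-in for a real NeMo model call.

```python
from typing import Callable, Dict


def build_transcription_prediction(
    audio_path: str,
    start: float,
    end: float,
    transcribe: Callable[[str, float, float], str],
    from_name: str = "transcription",  # hypothetical TextArea control name
    to_name: str = "audio",            # hypothetical Audio object name
) -> Dict:
    """Wrap the transcript of one audio region in a prediction-style dict."""
    # The transcriber is injected so any ASR model can be plugged in here.
    text = transcribe(audio_path, start, end)
    return {
        "model_version": "nemo-asr-sketch",  # hypothetical label
        "result": [
            {
                "from_name": from_name,
                "to_name": to_name,
                "type": "textarea",
                # start/end are the region bounds in seconds
                "value": {"start": start, "end": end, "text": [text]},
            }
        ],
    }


def fake_transcribe(audio_path: str, start: float, end: float) -> str:
    """Stand-in for an ASR model; always returns a fixed transcript."""
    return "hello world"


prediction = build_transcription_prediction("clip.wav", 1.0, 2.5, fake_transcribe)
```

Passing the transcriber in as a callable keeps the sketch independent of any particular model, so the same wrapper could be reused if the ASR model changes.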

Benefits

Using NeMo for pre-annotation in Label Studio has the following benefits:

Reliability: NeMo can produce highly accurate speech-to-text transcriptions.
Speed: Using NeMo for transcription speeds up labeling by offloading most transcription tasks to an ML model, freeing annotators to focus on the more difficult transcriptions.

Related Integrations

Flair: Named entity recognition
Grounding DINO: Text-driven object detection model
Ultralytics YOLO: Computer vision models
Scikit Learn: Machine learning toolkit