Automatic Speech Recognition

Use this template to listen to an audio file and transcribe its content in natural language for speech recognition tasks.
Labeling Configuration
<View>
  <Audio name="audio" value="$audio" zoom="true" hotkey="ctrl+enter" />
  <Header value="Provide Transcription" />
  <TextArea name="transcription" toName="audio" rows="4" editable="true" maxSubmissions="1" />
</View>
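Before loading a configuration like this into a project, it can help to confirm that it is well-formed and that every toName points at an existing tag. The following is a minimal sketch using only the Python standard library; check_config is a hypothetical helper written for this illustration, not part of Label Studio.

```python
import xml.etree.ElementTree as ET

# The labeling configuration from this page, copied verbatim.
CONFIG = """
<View>
  <Audio name="audio" value="$audio" zoom="true" hotkey="ctrl+enter" />
  <Header value="Provide Transcription" />
  <TextArea name="transcription" toName="audio" rows="4" editable="true" maxSubmissions="1" />
</View>
"""


def check_config(xml_text):
    """Hypothetical helper: parse a config and report name/toName consistency.

    Returns a tuple (names, dangling) where `names` is the set of all
    `name` attributes and `dangling` lists every `toName` value that
    does not match any declared name.
    """
    root = ET.fromstring(xml_text.strip())  # raises ParseError if malformed
    if root.tag != "View":
        raise ValueError("configuration must be wrapped in View tags")
    names = {el.get("name") for el in root.iter() if el.get("name")}
    dangling = [
        el.get("toName")
        for el in root.iter()
        if el.get("toName") and el.get("toName") not in names
    ]
    return names, dangling


names, dangling = check_config(CONFIG)
print(sorted(names), dangling)
```

An empty dangling list means the TextArea control tag is attached to an object tag that actually exists in the configuration.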
About the labeling configuration
All labeling configurations must be wrapped in View tags.
Use the Audio object tag with the hotkey argument to let annotators play back audio in the labeling interface using a specific hotkey, and use the zoom="true" argument to let annotators zoom in on the audio waveform:
<Audio name="audio" value="$audio" zoom="true" hotkey="ctrl+enter" />
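The value="$audio" attribute refers to a field named audio in each task's data. As a rough sketch, a task file for this template could be generated as follows. It assumes the common import shape of a JSON list of objects with a data key; the URLs are placeholders and make_tasks is a hypothetical helper.

```python
import json


def make_tasks(audio_urls):
    """Hypothetical helper: wrap each audio URL in a task dictionary.

    The "audio" key must match the variable used in value="$audio".
    """
    return [{"data": {"audio": url}} for url in audio_urls]


# Placeholder URLs, for illustration only.
tasks = make_tasks([
    "https://example.com/clips/clip-001.wav",
    "https://example.com/clips/clip-002.wav",
])

# Serialize to JSON text that could be saved to a file and imported.
tasks_json = json.dumps(tasks, indent=2)
print(tasks_json)
```

If the key in the task data were named differently, for example url, the Audio tag would need value="$url" instead.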
You can add a header to provide instructions to the annotator:
<Header value="Provide Transcription" />
Use the TextArea control tag to prompt annotators to add a transcript for the audio:
<TextArea name="transcription" toName="audio" rows="4" editable="true" maxSubmissions="1" />
The rows="4" argument sets the size of the text box shown in the labeling interface. The maxSubmissions="1" argument limits each annotator to one submitted transcript per audio clip, while the editable="true" argument lets annotators edit the transcript.
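Once annotators submit transcripts, the text can be read back from the exported annotations. The sketch below assumes a JSON export in which each task has an annotations list, each annotation has a result list, and a TextArea result stores its strings under value.text. The sample record is invented for illustration, and extract_transcripts is a hypothetical helper.

```python
def extract_transcripts(task, from_name="transcription"):
    """Hypothetical helper: collect transcript strings from one exported task.

    `from_name` must match the name attribute of the TextArea tag.
    """
    texts = []
    for annotation in task.get("annotations", []):
        for region in annotation.get("result", []):
            if region.get("from_name") == from_name and region.get("type") == "textarea":
                texts.extend(region.get("value", {}).get("text", []))
    return texts


# Invented sample record in the assumed export shape.
sample_task = {
    "data": {"audio": "https://example.com/clips/clip-001.wav"},
    "annotations": [
        {
            "result": [
                {
                    "from_name": "transcription",
                    "to_name": "audio",
                    "type": "textarea",
                    "value": {"text": ["hello world"]},
                }
            ]
        }
    ],
}

transcripts = extract_transcripts(sample_task)
print(transcripts)
```

With maxSubmissions="1", each annotation is expected to contribute at most one string to this list.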
Related tags
- Audio
- Header
- TextArea
