Automatic Speech Recognition using Segments

Listen to an audio file, split it into segments, and transcribe the contents of each segment in natural language.
Labeling Configuration
<View>
  <Labels name="labels" toName="audio">
    <Label value="Speech" />
    <Label value="Noise" />
  </Labels>
  <AudioPlus name="audio" value="$audio"/>
  <TextArea name="transcription" toName="audio"
            rows="2" editable="true"
            perRegion="true" required="true" />
</View>
About the labeling configuration
All labeling configurations must be wrapped in View tags.
Use the Labels control tag to allow annotators to highlight portions of the audio and mark each one as speech or noise:
<Labels name="labels" toName="audio">
  <Label value="Speech" />
  <Label value="Noise" />
</Labels>
Use the AudioPlus object tag to display a waveform of audio that can be labeled:
<AudioPlus name="audio" value="$audio"/>
Use the TextArea control tag to prompt annotators to provide a transcript for each segment of audio:
<TextArea name="transcription" toName="audio"
          rows="2" editable="true"
          perRegion="true" required="true" />
The editable="true" argument specifies that the transcript can be edited, and required="true" sets the transcript as a required field for the annotator. Because perRegion="true" attaches a separate transcript to each segment of the audio clip, the annotation can't be submitted until every segment has a transcript.
Enhance this template
Add context to specific audio segments
If you want to prompt annotators to add context to specific audio segments, such as by selecting the accent or assumed gender of the speakers in a given audio clip, you can add the following to your labeling configuration:
<View visibleWhen="region-selected">
  <Header value="Select the assumed gender of the speaker:" />
  <Choices name="gender" toName="audio"
           perRegion="true" required="true">
    <Choice value="Man" />
    <Choice value="Woman" />
  </Choices>
</View>
The visibleWhen parameter for the View tag means that the choice is only visible when a specific audio segment is selected. The Header tag provides instructions to the annotator. The Choices tag includes the perRegion parameter to apply the selected choice only to the selected audio segment.
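As a rough sketch of how the pieces fit together, the following configuration places the gender block inside the base template and adds a second block for accent, which is mentioned earlier but not shown. The accent control's name and its choice values are hypothetical placeholders, not part of the original template; replace them with categories that fit your data.

```xml
<View>
  <Labels name="labels" toName="audio">
    <Label value="Speech" />
    <Label value="Noise" />
  </Labels>
  <AudioPlus name="audio" value="$audio"/>
  <TextArea name="transcription" toName="audio"
            rows="2" editable="true"
            perRegion="true" required="true" />
  <!-- Shown only while an audio segment is selected -->
  <View visibleWhen="region-selected">
    <Header value="Select the assumed gender of the speaker:" />
    <Choices name="gender" toName="audio"
             perRegion="true" required="true">
      <Choice value="Man" />
      <Choice value="Woman" />
    </Choices>
    <!-- Hypothetical accent block: the name and values are placeholders -->
    <Header value="Select the accent of the speaker:" />
    <Choices name="accent" toName="audio"
             perRegion="true" required="true">
      <Choice value="Accent A" />
      <Choice value="Accent B" />
    </Choices>
  </View>
</View>
```

Each control tag in this sketch has its own name (labels, transcription, gender, accent), so the values recorded for a segment stay separate from one another.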