Voice Activity Detection

Use this template to train a voice activity detection (VAD) model, for example to automate call center interactions or improve voice-activated assistants. Annotators first mark segments of the audio clip that might contain speech, then classify each segment as speech, a specific wake word (for voice-activated assistant use cases), or noise.
Labeling Configuration
<View>
  <Header value="Listen to the audio:"></Header>
  <AudioPlus name="audio" value="$url" zoom="true"></AudioPlus>
  <Header value="Highlight segments with sound louder than baseline:"></Header>
  <Labels name="label" toName="audio" choice="multiple">
    <Label value="Above Baseline Sound" background="green" alias="possible-speech"></Label>
  </Labels>
  <Header value="Select a segment and classify it:"></Header>
  <Choices name="voice" toName="audio" choice="multiple" showInline="true" perRegion="true">
    <Choice value="Wake Word" alias="wake-word"></Choice>
    <Choice value="Speech" alias="plain-speech"></Choice>
    <Choice value="Noise" alias="not-speech"></Choice>
  </Choices>
</View>
About the labeling configuration
All labeling configurations must be wrapped in View tags.
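As a quick sanity check before loading a configuration, the sketch below uses Python's standard XML parser to confirm that the root tag is View and that every toName attribute points at a tag that declares a matching name. The function name check_config is hypothetical and this is not a replacement for the validation done by the labeling tool itself; it only illustrates how the name and toName attributes tie the tags together.

```python
import xml.etree.ElementTree as ET

CONFIG = """
<View>
  <Header value="Listen to the audio:"></Header>
  <AudioPlus name="audio" value="$url" zoom="true"></AudioPlus>
  <Labels name="label" toName="audio" choice="multiple">
    <Label value="Above Baseline Sound" background="green" alias="possible-speech"></Label>
  </Labels>
  <Choices name="voice" toName="audio" choice="multiple" showInline="true" perRegion="true">
    <Choice value="Wake Word" alias="wake-word"></Choice>
    <Choice value="Speech" alias="plain-speech"></Choice>
    <Choice value="Noise" alias="not-speech"></Choice>
  </Choices>
</View>
"""

def check_config(xml_text):
    """Return a list of problems found in a labeling configuration string."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "View":
        problems.append(f"root tag is <{root.tag}>, expected <View>")
    # Every tag that declares a name is a possible target for toName.
    names = {el.get("name") for el in root.iter() if el.get("name")}
    for el in root.iter():
        target = el.get("toName")
        if target is not None and target not in names:
            problems.append(f"<{el.tag}> points at unknown name '{target}'")
    return problems

print(check_config(CONFIG))  # an empty list means no problems were found
```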
You can add a header to provide instructions to the annotator:
<Header value="Listen to the audio:"></Header>
Use the AudioPlus object tag to specify the location of the audio file to process:
<AudioPlus name="audio" value="$url"></AudioPlus>
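The $url value refers to a field named url in each task's data. As a minimal sketch, assuming tasks are imported as a JSON list in which each task carries a data object, the following Python builds such a file. The helper make_tasks and the example.com URLs are hypothetical placeholders.

```python
import json

def make_tasks(audio_urls):
    """Build one task per audio URL; the "url" key matches value="$url"."""
    return [{"data": {"url": u}} for u in audio_urls]

# Hypothetical audio locations, for illustration only.
tasks = make_tasks([
    "https://example.com/audio/call-001.wav",
    "https://example.com/audio/call-002.wav",
])
print(json.dumps(tasks, indent=2))
```

If the configuration used a different variable, such as value="$audio", the key inside data would need to change to match.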
Use the Labels control tag to let annotators segment the audio and mark spots where speech might be present:
<Labels name="label" toName="audio" choice="multiple">
  <Label value="Above Baseline Sound" background="green" alias="possible-speech"></Label>
</Labels>
The choice="multiple" parameter lets a single audio segment carry more than one label. The alias parameter lets you specify a name for the label in the exported annotations that is different from what appears to annotators.
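To see which exported name corresponds to which on-screen label, the sketch below reads the value and alias attributes straight from the configuration. The helper alias_map is hypothetical, and falling back to value when no alias is set is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

LABELS = """
<Labels name="label" toName="audio" choice="multiple">
  <Label value="Above Baseline Sound" background="green" alias="possible-speech"></Label>
</Labels>
"""

def alias_map(xml_text):
    """Map each displayed value to its alias (or to itself if no alias is set)."""
    root = ET.fromstring(xml_text)
    return {
        el.get("value"): el.get("alias", el.get("value"))
        for el in root.iter()
        if el.tag in ("Label", "Choice")
    }

print(alias_map(LABELS))  # {'Above Baseline Sound': 'possible-speech'}
```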
Use the Choices control tag to prompt annotators to classify the type of sound in each audio segment:
<Choices name="voice" toName="audio" choice="multiple" showInline="true" perRegion="true">
  <Choice value="Wake Word" alias="wake-word"></Choice>
  <Choice value="Speech" alias="plain-speech"></Choice>
  <Choice value="Noise" alias="not-speech"></Choice>
</Choices>
The choice="multiple" parameter allows annotators to select both “Speech” and “Wake Word” as options for a specific segment. The perRegion parameter means that each classification applies to a specific audio segment.
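Because classifications are attached per region, downstream code usually needs to pair each segment with its classes. The following is only a sketch: it assumes each exported result entry has an id shared by the segment and its per-region choices, plus a value holding start, end, and either labels or choices. The sample data and the helper segments_with_classes are hypothetical, so check them against a real export before relying on this.

```python
from collections import defaultdict

# Hypothetical results for one task; the exact export layout is an assumption.
results = [
    {"id": "r1", "from_name": "label", "type": "labels",
     "value": {"start": 0.5, "end": 1.2, "labels": ["possible-speech"]}},
    {"id": "r1", "from_name": "voice", "type": "choices",
     "value": {"start": 0.5, "end": 1.2, "choices": ["wake-word", "plain-speech"]}},
    {"id": "r2", "from_name": "label", "type": "labels",
     "value": {"start": 3.0, "end": 4.1, "labels": ["possible-speech"]}},
    {"id": "r2", "from_name": "voice", "type": "choices",
     "value": {"start": 3.0, "end": 4.1, "choices": ["not-speech"]}},
]

def segments_with_classes(items):
    """Merge entries that share a region id into one record per segment."""
    merged = defaultdict(lambda: {"start": None, "end": None, "classes": []})
    for item in items:
        seg = merged[item["id"]]
        seg["start"] = item["value"]["start"]
        seg["end"] = item["value"]["end"]
        # Only the Choices entries carry a "choices" list.
        seg["classes"].extend(item["value"].get("choices", []))
    return dict(merged)

for region_id, seg in segments_with_classes(results).items():
    print(region_id, seg)
```

Segment r1 ends up with both wake-word and plain-speech, which is the case that choice="multiple" on the Choices tag makes possible.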