How to build a labeling tool for two-level sentiment and aspect annotation
Quality training data drives modern machine learning. Building an interface for two-level sentiment and aspect annotation requires more than a simple text classification form. You need a system that captures the overall document mood and granular span-level opinions. It must do this without slowing down the human reviewer. This guide shows you how to deploy a complete annotation environment using an autonomous coding agent.
Use a coding agent to generate an optimized configuration instead of writing custom frontend code.
Bind classification controls directly to highlighted text regions to keep annotators focused.
Import model predictions with confidence scores to bootstrap the human review process.
Measure span overlap and document-level agreement to maintain high training data quality.
Deploy self-hosted instances to comply with strict social media data retention policies.
The problem
Two-level sentiment and aspect annotation requires a complex data shape: annotators assess the entire document while also highlighting specific entity spans. Annotators struggle when they have to switch context between a global classification menu and a disconnected text-highlighting tool. Furthermore, when you pull source data from platforms like YouTube or Reddit, their data API terms prohibit sending that data to unauthorized third-party platforms. Building a custom frontend that meets these security and user-experience requirements takes weeks of expensive engineering time that you would rather spend on model development.
The short answer
With Label Studio, you establish a secure foundation while a coding agent builds the application. Rather than building a new labeling application from scratch, agents generate the interface from your spec and deploy it into Label Studio in one pass. The agent uses the XML labeling config builder skill to produce optimized Label Studio interface configurations from a plain-language spec. It also uses the Label Studio SDK/CLI to wire the config into a real project programmatically.
Docs: Task format guide → https://labelstud.io/guide/task_format
Docs: Import predictions → https://labelstud.io/guide/predictions
Docs: Export formats → https://labelstud.io/guide/export.html
LLM-friendly docs (markdown) → https://labelstud.io/llms.txt
What you're building
A central text panel shows the source document for easy reading and highlighting during two-level sentiment and aspect annotation.
A document-level radio button group captures the overall sentiment of the text.
An inline span picker lets annotators highlight specific aspects like customer support or battery life.
A dynamic classification menu appears automatically when a user clicks a highlighted aspect span.
A free-text area collects optional rationale for each aspect-level sentiment decision.
Pre-annotated model suggestions appear on the screen to speed up the human review process.
How to build it in Label Studio
1. Set up the project
Start by installing a self-hosted instance of Label Studio so that you stay compliant with the data retention policies of your source APIs. Each task is a single JSON object whose text field holds the raw document. Add fields for the source URL and the original author alongside the text so that you can filter tasks in the Data Manager. Prepare your domain-specific aspect taxonomy before you begin so that annotators see the correct labels from the first task.
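A minimal sketch of one task in the import format. The source_url and author field names and the example text are hypothetical placeholders; the sketch assumes the fields sit under "data" next to the text, since that is where the Data Manager reads its columns from.

```python
import json


def make_task(text: str, source_url: str, author: str) -> dict:
    """Build one Label Studio task for two-level sentiment and aspect annotation."""
    return {
        "data": {
            "text": text,              # matches value="$text" in the labeling config
            "source_url": source_url,  # hypothetical metadata field, filterable as a column
            "author": author,          # hypothetical metadata field, filterable as a column
        }
    }


# Illustrative document; replace with text pulled from your source API.
tasks = [
    make_task(
        "Battery life is great, but customer support never answered my ticket.",
        "https://example.com/post/123",
        "example_user",
    )
]

# Label Studio accepts a JSON list of task objects for import.
payload = json.dumps(tasks, ensure_ascii=False, indent=2)
print(payload)
```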
2. Generate the labeling interface with the XML config skill
Instruct your coding agent to process the feature list from the previous section. The agent uses the XML labeling config builder skill to translate those requirements into a valid layout. This skill outputs a completed configuration that applies the correct tags for two-level sentiment and aspect annotation.
<Text name="text" value="$text"> - Displays the raw document that the annotator reads.
<Labels name="aspect_spans" toName="text"> - Lets annotators highlight inline text spans that mark aspects; each aspect is a nested <Label> child.
<Choices name="doc_sentiment" toName="text" choice="single-radio"> - Captures the overall document sentiment as a single-select radio button group.
<Choices name="aspect_sentiment" toName="text" perRegion="true"> - Binds a polarity menu to the selected span so that each aspect receives its own sentiment.
<TextArea name="aspect_rationale" toName="text" perRegion="true"> - Adds a per-region text box that collects optional reasoning for each span-level decision.
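The tags above can be assembled into one config. This is a sketch, not the skill's actual output: the aspect names and polarity values are placeholders, and wrapping the per-region controls in a View with visibleWhen="region-selected" is an assumption about how you want the conditional menu to behave. The small check confirms that every control's toName points at an existing Text tag before you upload the config.

```python
import xml.etree.ElementTree as ET

# Illustrative config; swap the Label and Choice values for your own taxonomy.
LABEL_CONFIG = """
<View>
  <Labels name="aspect_spans" toName="text">
    <Label value="Battery life"/>
    <Label value="Customer support"/>
  </Labels>
  <Text name="text" value="$text"/>
  <Choices name="doc_sentiment" toName="text" choice="single-radio">
    <Choice value="Positive"/>
    <Choice value="Neutral"/>
    <Choice value="Negative"/>
    <Choice value="Mixed"/>
  </Choices>
  <View visibleWhen="region-selected">
    <Choices name="aspect_sentiment" toName="text" perRegion="true">
      <Choice value="Positive"/>
      <Choice value="Neutral"/>
      <Choice value="Negative"/>
    </Choices>
    <TextArea name="aspect_rationale" toName="text" perRegion="true"/>
  </View>
</View>
"""


def check_config(xml: str) -> list[str]:
    """Return a list of problems: controls whose toName matches no Text tag."""
    root = ET.fromstring(xml.strip())
    object_names = {el.get("name") for el in root.iter("Text")}
    problems = []
    for el in root.iter():
        to_name = el.get("toName")
        if to_name is not None and to_name not in object_names:
            problems.append(f"{el.tag} {el.get('name')!r} points at unknown {to_name!r}")
    return problems


print(check_config(LABEL_CONFIG) or "config OK")
```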
3. Wire it into a project with the SDK
Have the agent use the Label Studio SDK/CLI to create a new project from the generated configuration. The agent uploads the source text as JSON tasks and imports existing model predictions as pre-annotations. Run a small batch of data through this setup to observe the annotator workflow. If reviewers struggle with the layout, have the agent regenerate the XML and redeploy the project in seconds.
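The wiring step might look roughly like this. It assumes label-studio-sdk 1.x, where LabelStudio exposes projects.create and projects.import_tasks; check the signatures against the SDK version you install. The SDK import is deferred so the pure pilot_batch helper (a hypothetical name) can be used without the package installed.

```python
def pilot_batch(tasks: list[dict], size: int = 20) -> list[dict]:
    """Return the first `size` tasks for a trial run, rejecting tasks with no text."""
    missing = [i for i, t in enumerate(tasks) if not t.get("data", {}).get("text")]
    if missing:
        raise ValueError(f"tasks without data.text at positions {missing}")
    return tasks[:size]


def deploy_project(base_url: str, api_key: str, label_config: str, tasks: list[dict]) -> int:
    """Create a project and import tasks; assumes label-studio-sdk 1.x."""
    from label_studio_sdk.client import LabelStudio  # deferred: only needed at deploy time

    client = LabelStudio(base_url=base_url, api_key=api_key)
    project = client.projects.create(
        title="Two-level sentiment and aspect annotation",
        label_config=label_config,
    )
    # Tasks may carry a "predictions" list, which Label Studio shows as pre-annotations.
    client.projects.import_tasks(id=project.id, request=tasks)
    return project.id


# Example (needs a running instance and a real API key):
# project_id = deploy_project("http://localhost:8080", "YOUR_API_KEY", label_config, pilot_batch(tasks))
```

Keeping the pilot batch small makes it cheap to regenerate the XML and redeploy when annotators report layout problems.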
4. Set up review and quality workflows
Configure the project so that each task receives more than one annotation (task overlap), which lets multiple annotators evaluate the same document. Reviewers then work through dedicated review queues to resolve disagreements between annotators. For two-level sentiment and aspect annotation, track Cohen's kappa between pairs of annotators for the document-level choices and span-overlap F1 for the aspect boundaries. Disagreements often occur on exact word boundaries or on subjective mixed sentiment, so these metrics show whether annotators need better guidelines or whether the text itself is ambiguous.
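Both metrics can be computed offline from exported annotations. This sketch uses one common definition of span-overlap F1 (character-level agreement on labeled characters); Label Studio's built-in agreement metric may define overlap differently, so treat these numbers as a separate check rather than a reproduction of it.

```python
from collections import Counter

Span = tuple[int, int, str]  # (start inclusive, end exclusive, aspect label)


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two annotators' document-level labels on the same tasks."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined,
        # so treat it as perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


def span_overlap_f1(a: list[Span], b: list[Span]) -> float:
    """Character-level F1 between two annotators' labeled aspect spans."""
    chars_a = {(i, label) for start, end, label in a for i in range(start, end)}
    chars_b = {(i, label) for start, end, label in b for i in range(start, end)}
    if not chars_a and not chars_b:
        return 1.0  # neither annotator marked any aspect
    return 2 * len(chars_a & chars_b) / (len(chars_a) + len(chars_b))


# Two annotators disagree on one document and on where a span ends.
print(cohens_kappa(["pos", "pos", "neg", "neg"], ["pos", "neg", "neg", "neg"]))
print(span_overlap_f1([(0, 12, "Battery life")], [(0, 7, "Battery life")]))
```

Here kappa is 0.5 (observed agreement 0.75 against 0.5 expected by chance) and the span F1 is 14/19, because one annotator stopped the span five characters early.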
5. Export and integrate
Export your completed tasks in the default JSON format. In this format, each per-region sentiment choice is stored as its own result entry that shares an ID with the span it describes, so the link between a span and its sentiment survives the export. Downstream consumers of two-level sentiment and aspect annotation need the span boundaries, the per-region classifications, and the document-level labels. You typically hand this payload to a training pipeline to fine-tune your models or load it into an analytics warehouse.
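A minimal sketch of flattening the JSON export into one row per aspect, assuming the control names from the config in step 2 (doc_sentiment, aspect_spans, aspect_sentiment, aspect_rationale). Results are grouped by their shared region ID; tasks with no aspect spans produce no rows in this sketch.

```python
def flatten_annotation(result: list[dict]) -> dict:
    """Group one annotation's results by region ID into aspect records."""
    doc_sentiment = None
    aspects: dict[str, dict] = {}
    for item in result:
        name, value = item.get("from_name"), item.get("value", {})
        if name == "doc_sentiment":
            doc_sentiment = value["choices"][0]
        elif name == "aspect_spans":
            region = aspects.setdefault(item["id"], {"region_id": item["id"]})
            region.update(start=value["start"], end=value["end"],
                          text=value.get("text"), aspect=value["labels"][0])
        elif name == "aspect_sentiment":
            aspects.setdefault(item["id"], {"region_id": item["id"]})["sentiment"] = value["choices"][0]
        elif name == "aspect_rationale":
            texts = value.get("text") or [None]
            aspects.setdefault(item["id"], {"region_id": item["id"]})["rationale"] = texts[0]
    return {"doc_sentiment": doc_sentiment, "aspects": list(aspects.values())}


def rows_from_export(export: list[dict]) -> list[dict]:
    """One row per (task, annotation, aspect) from a Label Studio JSON export."""
    rows = []
    for task in export:
        for ann in task.get("annotations", []):
            flat = flatten_annotation(ann.get("result", []))
            for aspect in flat["aspects"]:
                rows.append({"task_id": task["id"], "annotation_id": ann["id"],
                             "doc_sentiment": flat["doc_sentiment"], **aspect})
    return rows
```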
Why Label Studio for two-level sentiment and aspect annotation
Self-hosted deployment options help you comply with strict third-party data API terms.
The perRegion attribute attaches sentiment choices directly to spans, which eliminates context switching.
Unified task JSON structures organize document-level labels and span-level labels in a single readable payload.
Pre-annotation imports allow you to display existing model outputs as a starting point for human reviewers.
Built-in agreement metrics measure span overlap so you can find and resolve ambiguous entity boundaries.
Common variations
Plain sentiment classification tasks collect only the document-level score without requiring granular span highlights.
Generic named entity recognition extracts specific terms from the text without linking them to polarity choices.
Topic taxonomy tagging applies multiple categorical labels to an entire document instead of mapping aspects.
Next steps
XML labeling config builder skill → https://github.com/HumanSignal/create-xml-labeling-config-skill
Label Studio SDK/CLI → https://api.labelstud.io/api-reference/introduction/getting-started
LLM-friendly docs (markdown) → https://labelstud.io/llms.txt
Task format guide → https://labelstud.io/guide/task_format
Pre-annotations format → https://labelstud.io/guide/predictions
Export formats → https://labelstud.io/guide/export.html
How do you manage data retention mandates when using the Reddit or YouTube APIs?
You must delete or refresh user-derived data according to each platform's developer policies. Reddit Data API Terms prohibit storing data beyond their specified limits and ban the circumvention of rate restrictions. A self-hosted Label Studio instance gives you direct control over your storage architecture, so you can purge tasks quickly to remain compliant.
How do you bind aspect-level sentiment choices to specific text spans?
Add perRegion="true" to the <Choices> tag in your Label Studio XML configuration. The choice then applies to the selected span rather than to the whole document, and placing the per-region controls inside a <View visibleWhen="region-selected"> container shows the polarity menu only when an annotator selects a highlighted aspect span. In the JSON result, the span's labels entry and its per-region choices entry share the same region ID.
What is the correct JSON structure for importing aspect pre-annotations?
Include a predictions array in each task whose results use the same format as your annotation output. Give each span result and its per-region classification identical string IDs so the interface binds them together. You can also set a score on each prediction to help reviewers prioritize uncertain model outputs during the annotation workflow.
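A minimal sketch of building one prediction, assuming the control names doc_sentiment, aspect_spans, and aspect_sentiment and a hypothetical model_version string. Each span and its per-region choice reuse one generated region ID, which is what makes the interface show them as a single region.

```python
import uuid


def make_prediction(doc_sentiment: str, aspects: list[tuple[int, int, str, str, str]],
                    score: float, model_version: str = "aspect-model-v1") -> dict:
    """Build one prediction; aspects are (start, end, text, aspect, sentiment) tuples."""
    result = [{
        "id": uuid.uuid4().hex[:10],
        "from_name": "doc_sentiment", "to_name": "text", "type": "choices",
        "value": {"choices": [doc_sentiment]},
    }]
    for start, end, text, aspect, sentiment in aspects:
        region_id = uuid.uuid4().hex[:10]  # shared by the span and its sentiment
        span = {"start": start, "end": end, "text": text}
        result.append({"id": region_id, "from_name": "aspect_spans", "to_name": "text",
                       "type": "labels", "value": {**span, "labels": [aspect]}})
        result.append({"id": region_id, "from_name": "aspect_sentiment", "to_name": "text",
                       "type": "choices", "value": {**span, "choices": [sentiment]}})
    return {"model_version": model_version, "score": score, "result": result}


text = "Battery life is great, but customer support never answered my ticket."
task = {
    "data": {"text": text},
    "predictions": [make_prediction(
        "Mixed",
        [(0, 12, "Battery life", "Battery life", "Positive"),
         (27, 43, "customer support", "Customer support", "Negative")],
        score=0.42,
    )],
}
```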
How do you design an efficient workspace for two-level sentiment tasks?
Assign hotkeys, using the hotkey attribute on each <Label> and <Choice>, to both the document-level polarity choices and the inline aspect labels. Place the span-level controls inside a <View visibleWhen="region-selected"> container so that they appear only when a user selects a text region. This prevents visual clutter and lets annotators keep their hands on the keyboard while they work through long documents.
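A sketch of a keyboard-friendly layout plus a check for clashing keys. The hotkey letters and digits are arbitrary placeholders; giving every Label an explicit hotkey is a choice made here to avoid collisions with any keys assigned automatically.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Illustrative config; hotkeys and label values are placeholders.
CONFIG_WITH_HOTKEYS = """
<View>
  <Labels name="aspect_spans" toName="text">
    <Label value="Battery life" hotkey="b"/>
    <Label value="Customer support" hotkey="c"/>
  </Labels>
  <Text name="text" value="$text"/>
  <Choices name="doc_sentiment" toName="text" choice="single-radio">
    <Choice value="Positive" hotkey="1"/>
    <Choice value="Neutral" hotkey="2"/>
    <Choice value="Negative" hotkey="3"/>
  </Choices>
  <View visibleWhen="region-selected">
    <Choices name="aspect_sentiment" toName="text" perRegion="true">
      <Choice value="Positive" hotkey="7"/>
      <Choice value="Neutral" hotkey="8"/>
      <Choice value="Negative" hotkey="9"/>
    </Choices>
  </View>
</View>
"""


def duplicate_hotkeys(xml: str) -> list[str]:
    """Return hotkeys assigned to more than one element, sorted."""
    root = ET.fromstring(xml.strip())
    keys = Counter(el.get("hotkey") for el in root.iter() if el.get("hotkey"))
    return sorted(k for k, n in keys.items() if n > 1)


print(duplicate_hotkeys(CONFIG_WITH_HOTKEYS) or "no hotkey clashes")
```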
How do you calculate inter-annotator agreement for aspect spans?
Track span-overlap F1 for the aspect boundaries and Cohen's kappa between pairs of annotators for the document-level choices. Because reviewers often disagree on exact word boundaries, collect more than one annotation per task and route low-agreement tasks to senior reviewers. Track these metrics continuously so you can identify ambiguous text edges early.