How to build a labeling tool for document layout and reading order annotation
When you train machine learning models for document intelligence, you need exact ground truth for visual regions and the sequence in which text is read. This guide shows you how to use a coding agent to build a custom labeling interface for document layout and reading order annotation on top of Label Studio.
Define the interface specifications to handle multi-page images and bounding box drawing.
Generate an optimized labeling configuration using an automated XML builder skill.
Deploy the configuration and task data programmatically using the Python SDK.
Establish review workflows with overlap thresholds to measure bounding box agreement.
Export the structured spatial data to evaluate external extraction pipelines.
The problem
Training multimodal pipelines like LayoutLMv3 requires explicit geometric boundaries for structural elements and ordered sequences that capture human reading patterns. Annotators struggle when they have to switch between drawing precise bounding boxes and clicking tiny elements in sequence to establish relationships on dense document images. Regulated documents like medical records or financial filings add compliance constraints that prohibit syncing files to external platforms. Building an interactive, compliant spatial tool from scratch can easily cost hundreds of engineering hours before the first label is placed.
The short answer
Use Label Studio as the foundation and have a coding agent generate the labeling interface directly from your specifications. The agent uses the XML labeling config builder skill to produce an optimized configuration, and it relies on the Label Studio SDK/CLI to wire that configuration into a real project programmatically. Rather than building a new labeling application from scratch, agents generate the interface from your spec and deploy it into Label Studio in one pass.
Docs: Multi-page document annotation → https://labelstud.io/templates/multi-page-document-annotation
Docs: Import pre-annotated tasks → https://labelstud.io/guide/predictions.html
Docs: Export structured data → https://labelstud.io/guide/export.html
LLM-friendly docs (markdown) → https://labelstud.io/llms.txt
What you're building
Display multi-page document images within a unified viewer using a dedicated page navigator.
Provide a categorical label picker to classify structural blocks like titles and tables.
Enable annotators to draw precise bounding boxes around distinct layout regions on each page.
Apply sequential numerical indices to each drawn region to establish an explicit reading order.
Show model-generated pre-annotations as interactive bounding boxes that reviewers can accept or correct.
Include a text filter input to help annotators quickly find specific categories within large label taxonomies.
How to build it in Label Studio
1. Set up the project
Install Label Studio locally or use a self-hosted Enterprise instance if your document layout and reading order annotation data involves strict compliance constraints. Structure each task as a JSON object containing an array of presigned image URLs that represent the individual pages of a single document. Include metadata fields like the original document filename and the total page count to help reviewers filter tasks in the data manager. Pre-load reference dictionaries or labeling guidelines into the project instructions so annotators understand the required taxonomy before they begin.
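A single task might look like the following sketch. The pages key is what the $pages variable in the labeling config reads. The filename and page_count keys are hypothetical metadata field names, and the URLs are placeholders standing in for presigned links from your own bucket.

```python
import json


def build_task(filename: str, page_urls: list[str]) -> dict:
    """One Label Studio task per document; each page is one presigned image URL."""
    return {
        "data": {
            "pages": page_urls,  # read by <Image valueList="$pages"> in the config
            "filename": filename,  # hypothetical metadata field for Data Manager filters
            "page_count": len(page_urls),  # hypothetical metadata field
        }
    }


# Placeholder URLs; in practice these are presigned links from your storage bucket.
task = build_task(
    "10-K_2023.pdf",
    [
        "https://storage.example.com/10-K_2023/page-1.png",
        "https://storage.example.com/10-K_2023/page-2.png",
    ],
)

# Label Studio imports a JSON list of tasks.
print(json.dumps([task], indent=2))
```

Keeping one document per task, rather than one page per task, is what lets the page navigator and a single reading-order sequence span the whole document.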
2. Generate the labeling interface with the XML config skill
Direct your coding agent to turn the requirements in your interface specification into a configuration using the XML labeling config builder skill. The skill takes the document layout and reading order annotation constraints and outputs a validated Label Studio XML configuration. Confirm that the valueList variable on the Image tag matches the key that holds the page URL array in your task data, so the interface binds to your multi-page tasks.
<Image name="doc" valueList="$pages"> — renders the array of document pages and provides a navigation interface for the annotator.
<RectangleLabels name="blocks" toName="doc"> — enables users to draw bounding boxes and assign layout classification tags to specific visual regions.
<Label value="Paragraph" background="blue"> — defines a distinct classification option within the layout taxonomy and assigns it a color indicator.
<Number name="order" toName="doc" perRegion="true"> — applies explicit numerical indices to individual bounding boxes to record human reading order.
<Filter name="search" toName="blocks"> — provides a text input field to help annotators quickly locate specific tags in complex hierarchies.
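Assembled into one View, the tags above might look like the config this sketch produces. It is an illustration of what the skill's output could resemble, not its actual output: the four labels and colors are a hypothetical taxonomy, and the min, required, minlength, and placeholder attributes are assumptions worth checking against the Label Studio tag reference. The unbound_targets helper is a hypothetical sanity check that every toName points at a tag that exists.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Hypothetical taxonomy and colors; substitute the classes from your own spec.
LAYOUT_LABELS = {"Title": "red", "Paragraph": "blue", "Table": "green", "Figure": "orange"}


def build_layout_config(labels: dict[str, str]) -> str:
    """Assemble the Image, Filter, RectangleLabels, and Number tags into one <View>."""
    label_tags = "\n".join(
        f"    <Label value={quoteattr(value)} background={quoteattr(color)}/>"
        for value, color in labels.items()
    )
    return (
        "<View>\n"
        '  <Image name="doc" valueList="$pages"/>\n'
        '  <Filter name="search" toName="blocks" minlength="0" placeholder="Filter labels"/>\n'
        '  <RectangleLabels name="blocks" toName="doc">\n'
        f"{label_tags}\n"
        "  </RectangleLabels>\n"
        '  <Number name="order" toName="doc" perRegion="true" min="1" required="true"/>\n'
        "</View>"
    )


def unbound_targets(config: str) -> list[str]:
    """Return toName values that match no tag name; an empty list means every binding resolves."""
    root = ET.fromstring(config)
    names = {el.get("name") for el in root.iter() if el.get("name")}
    return [
        el.get("toName")
        for el in root.iter()
        if el.get("toName") and el.get("toName") not in names
    ]


config = build_layout_config(LAYOUT_LABELS)
print(config)
print(unbound_targets(config))  # → []
```

Using quoteattr keeps label values with quotes or ampersands from breaking the XML.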
3. Wire it into a project with the SDK
Instruct the agent to use the Label Studio SDK/CLI to create a new project and apply the generated configuration. Have the agent upload the JSON task files and import existing layout predictions from your external models as interactive pre-annotations. The same agent loop lets you iterate on the configuration: run a small batch of documents, note the regions where annotators struggle, ask the agent to regenerate the XML with improved taxonomy choices, and redeploy the update.
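A rough sketch of what the agent might produce for this step, assuming the label-studio-sdk 1.x Python client (LabelStudio, projects.create, projects.import_tasks) and the tag names doc, blocks, and order from the config tags listed above. The result layout mirrors Label Studio's JSON export as we understand it, with the rectangle and its per-region number sharing one region id; compare it against a real export from your own project before relying on it. The model_version string and the example values are hypothetical.

```python
def region_results(
    region_id: str,
    page: int,
    label: str,
    box: tuple[float, float, float, float],
    order: int,
    size: tuple[int, int],
) -> list[dict]:
    """Result items for one predicted block: its rectangle plus its per-region order number.

    box is (x, y, width, height) in percent of the page image; size is (width, height) in pixels.
    Both items share "id", which is how a per-region value is tied to its box.
    """
    x, y, w, h = box
    geometry = {"x": x, "y": y, "width": w, "height": h, "rotation": 0}
    common = {
        "id": region_id,
        "to_name": "doc",
        "item_index": page,  # which entry of $pages the box sits on
        "original_width": size[0],
        "original_height": size[1],
    }
    return [
        {**common, "from_name": "blocks", "type": "rectanglelabels",
         "value": {**geometry, "rectanglelabels": [label]}},
        {**common, "from_name": "order", "type": "number",
         "value": {**geometry, "number": order}},
    ]


def task_with_predictions(data: dict, results: list[dict], model_version: str) -> dict:
    """Attach external model output as a prediction, which annotators see as editable boxes."""
    return {"data": data, "predictions": [{"model_version": model_version, "result": results}]}


def deploy(base_url: str, api_key: str, label_config: str, tasks: list[dict]) -> int:
    """Create the project and import tasks (assumes the label-studio-sdk 1.x client)."""
    from label_studio_sdk.client import LabelStudio  # lazy import: helpers above need no SDK

    ls = LabelStudio(base_url=base_url, api_key=api_key)
    project = ls.projects.create(title="Document layout and reading order", label_config=label_config)
    ls.projects.import_tasks(id=project.id, request=tasks)
    return project.id


# Example: one predicted title block on the first page (values are illustrative).
results = region_results("r1", 0, "Title", (8.0, 4.5, 84.0, 6.0), 1, (1700, 2200))
task = task_with_predictions(
    {"pages": ["https://storage.example.com/doc/page-1.png"]}, results, "layout-model-v1"
)
# deploy("http://localhost:8080", "YOUR_API_KEY", label_config, [task])
```

Keeping the payload builders free of SDK imports means the agent can regenerate and unit-test them on every iteration without a running Label Studio instance.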
4. Set up review and quality workflows
Establish an annotation overlap percentage so that multiple team members process the same complex documents. Route tasks with conflicting annotations into a reviewer queue so a domain expert can resolve disagreements. For document layout and reading order annotation, configure the agreement metrics to evaluate bounding box intersection over union (IoU) and exact match for the categorical tags. Set an IoU threshold between 0.5 and 0.7 for spatial blocks so the system flags misaligned regions before they enter the training pipeline.
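The platform computes agreement itself once metrics are configured, but knowing what the number means helps when choosing a threshold. Here is a minimal IoU sketch for boxes in Label Studio's percentage coordinates. Because IoU is a ratio of areas, converting x and y to pixels scales intersection and union by the same factor, so percentages give the same result as pixels for boxes on the same page. The 0.6 default is a hypothetical midpoint of the 0.5 to 0.7 range.

```python
def iou(a: tuple[float, float, float, float], b: tuple[float, float, float, float]) -> float:
    """Intersection over union for (x, y, width, height) boxes on the same page."""
    inter_w = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def misaligned(a, b, threshold: float = 0.6) -> bool:
    """Flag a pair of boxes whose overlap falls below the review threshold."""
    return iou(a, b) < threshold


# Two 10x10 boxes offset by half their width.
print(round(iou((0, 0, 10, 10), (5, 0, 10, 10)), 3))  # → 0.333
print(misaligned((0, 0, 10, 10), (5, 0, 10, 10)))  # → True
```

The example shows why 0.5 is already demanding: two boxes shifted by half their width score only about 0.33.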
5. Export and integrate
Export the completed project data in the default JSON format. Downstream consumers of document layout and reading order annotation need the percentage coordinates, the page index (item_index), and the reading-order numbers. Pass this structured output to your data warehouse for analytics or feed it directly into an evaluation harness to validate the performance of your document intelligence models.
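A minimal sketch of reading that export, assuming each region's rectangle and its per-region order number arrive as separate result items that share the same id, and that the export file is a JSON list of tasks. The function names and the export.json path are hypothetical.

```python
import json
from collections import defaultdict


def load_export(path: str) -> list[dict]:
    """Load a Label Studio JSON export (a list of task objects)."""
    with open(path) as f:
        return json.load(f)


def blocks_in_reading_order(task: dict, annotation_index: int = 0) -> list[dict]:
    """Merge each region's rectangle and order number, then sort by page and order."""
    regions: dict[str, dict] = defaultdict(dict)
    for item in task["annotations"][annotation_index]["result"]:
        region = regions[item["id"]]
        value = item["value"]
        if item["type"] == "rectanglelabels":
            region.update(
                page=item.get("item_index", 0),
                label=value["rectanglelabels"][0],
                box=(value["x"], value["y"], value["width"], value["height"]),
            )
        elif item["type"] == "number":
            region["order"] = value["number"]
    blocks = [r for r in regions.values() if "label" in r]
    # Blocks that never received an order number sort to the end of their page.
    return sorted(blocks, key=lambda r: (r["page"], r.get("order", float("inf"))))


# for task in load_export("export.json"):
#     print(task["data"].get("filename"), blocks_in_reading_order(task))
```

Sorting by page first works whether annotators restart numbering on each page or count continuously through the document.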
Why Label Studio for document layout and reading order annotation
Connect directly to secured cloud storage buckets to satisfy strict compliance constraints without syncing physical files.
Display multi-page tasks within a single annotation view to eliminate the friction of switching files during sequential clicking.
Supply dedicated bounding box tools to speed up the complex geometric drawing required for structural elements.
Attach numerical attributes directly to spatial regions to record sequence data without requiring separate relation extraction steps.
Provide an automated XML builder to eliminate the high build cost of creating interactive spatial tools from scratch.
Common variations
Draw precise bounding boxes around charts and graphs to train specialized figure extraction models.
Annotate individual table cells and link them sequentially to evaluate tabular data extraction pipelines.
Mark specific signature blocks and date fields to speed up automated contract processing workflows.
Identify irregular text blobs on scanned historical documents using polygonal drawing tools.
Next steps
XML labeling config builder skill → https://github.com/HumanSignal/create-xml-labeling-config-skill
Label Studio SDK/CLI → https://api.labelstud.io/api-reference/introduction/getting-started
LLM-friendly docs (markdown) → https://labelstud.io/llms.txt
Label Studio tags → https://labelstud.io/tags
Multi-page document annotation → https://labelstud.io/templates/multi-page-document-annotation
Exporting results → https://labelstud.io/guide/export.html
GitHub → https://github.com/HumanSignal/label-studio
How do you manage API rate limits when ingesting public document datasets?
When you pull financial filings to build your training corpus, respect platform-specific quotas to avoid access bans. The SEC EDGAR system caps automated requests at 10 requests per second and requires an identifying user agent string. If you pull data from Google Drive, exceeding the API usage limits returns 403 (or 429) rate-limit errors, so your ingestion script needs retries with exponential backoff.
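A standard-library sketch of both safeguards: a limiter that spaces request start times and a fetch loop that retries 403 and 429 responses with jittered exponential backoff. The User-Agent value, the rate of 8 per second (headroom under EDGAR's ceiling), and the retry limits are hypothetical choices; the injectable clock and sleep exist only so the limiter can be tested without real waiting.

```python
import random
import time
import urllib.error
import urllib.request

# EDGAR asks automated clients to identify themselves; this value is a placeholder.
HEADERS = {"User-Agent": "ExampleCo Data Team data-team@example.com"}


class RateLimiter:
    """Spaces request start times so no more than `rate` begin per second."""

    def __init__(self, rate: float, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate
        self.clock, self.sleep = clock, sleep
        self.next_allowed = float("-inf")

    def wait(self) -> None:
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter: a random wait in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def fetch(url: str, limiter: RateLimiter, max_retries: int = 5) -> bytes:
    """GET a URL under the rate limit, retrying on 403/429 responses."""
    for attempt in range(max_retries + 1):
        limiter.wait()
        request = urllib.request.Request(url, headers=HEADERS)
        try:
            with urllib.request.urlopen(request, timeout=30) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # 403 can also mean "permission denied", so the retry count stays bounded.
            if err.code not in (403, 429) or attempt == max_retries:
                raise
            time.sleep(backoff_delay(attempt))
    raise RuntimeError("max_retries must be non-negative")


edgar_limiter = RateLimiter(rate=8)  # headroom under EDGAR's 10 requests/second cap
```

Full jitter spreads retries from parallel workers apart, so they do not all hit the quota again at the same instant.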
What is the correct way to encode reading order across layout regions?
Do not use ranking interfaces to sort spatial elements. Configure a Number tag with perRegion="true" so annotators attach explicit indices directly to bounding boxes. Alternatively, use the Relations tag to draw directional arrows between text blocks that map the exact path a human reader follows.
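If you use arrows instead of numbers, the export holds pairs of region ids rather than indices. Here is a minimal sketch that flattens those chains into a reading sequence, assuming each exported relation looks like {"from_id": ..., "to_id": ..., "type": "relation"} and that every block has at most one outgoing arrow. The function name is hypothetical.

```python
def order_from_relations(region_ids: list[str], relations: list[dict]) -> list[str]:
    """Turn "A -> B" reading-order arrows into a flat sequence of region ids."""
    successor: dict[str, str] = {}
    has_predecessor: set[str] = set()
    for rel in relations:
        if rel.get("type") != "relation":
            continue  # skip region results mixed into the same list
        src, dst = rel["from_id"], rel["to_id"]
        if src in successor:
            raise ValueError(f"region {src} has more than one outgoing arrow")
        successor[src] = dst
        has_predecessor.add(dst)

    ordered: list[str] = []
    seen: set[str] = set()
    for start in region_ids:
        if start in has_predecessor:
            continue  # not the head of a chain
        node = start
        while node is not None and node not in seen:
            ordered.append(node)
            seen.add(node)
            node = successor.get(node)
    # Regions caught in a cycle have no chain head; append them so nothing is silently dropped.
    ordered.extend(r for r in region_ids if r not in seen)
    return ordered


arrows = [
    {"from_id": "a", "to_id": "b", "type": "relation"},
    {"from_id": "b", "to_id": "c", "type": "relation"},
]
print(order_from_relations(["c", "a", "b", "d"], arrows))  # → ['a', 'b', 'c', 'd']
```

A block with no arrows at all (d above) becomes its own one-element chain, which is a useful signal that an annotator skipped it.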
Why do multi-page document images fail to load in the labeling interface?
Multi-page workflows rely on an array of presigned image URLs mapped to a page navigator component. If your images fail to load, configure Cross-Origin Resource Sharing (CORS) rules on the storage bucket or content delivery network that serves them. Without headers that explicitly allow cross-origin requests from your Label Studio domain, the browser blocks the labeling tool from loading the document pages.
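For pages stored in Amazon S3, a sketch of the fix might look like this. It assumes S3 specifically (other providers and CDNs use their own CORS settings), and the bucket name and Label Studio origin are hypothetical. The rule builder is separate from the boto3 call so the rules can be reviewed before they are applied.

```python
def cors_configuration(origins: list[str]) -> dict:
    """S3-style CORS rules that let the Label Studio origin read page images."""
    return {
        "CORSRules": [
            {
                "AllowedOrigins": origins,
                "AllowedMethods": ["GET", "HEAD"],
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": 3600,  # how long browsers may cache the preflight response
            }
        ]
    }


def apply_cors(bucket: str, origins: list[str]) -> None:
    """Write the rules to the bucket (requires boto3 and AWS credentials)."""
    import boto3  # lazy import so the rule builder above works without boto3

    boto3.client("s3").put_bucket_cors(Bucket=bucket, CORSConfiguration=cors_configuration(origins))


# apply_cors("doc-layout-pages", ["https://labelstudio.example.com"])
```

Listing the exact Label Studio origin instead of "*" keeps the bucket readable only from your annotation UI, which matters for the compliance-sensitive documents discussed earlier.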
How do you convert layout annotations for LayoutLMv3 or Detectron2 training?
Label Studio exports spatial coordinates as percentages of the image size rather than absolute pixel values. Before you feed this data into document intelligence models, divide each value by 100 and multiply it by the original image width or height (exported as original_width and original_height). For multi-page files, read the item_index field to map each bounding box back to its source page.
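Here is a sketch of both conversions for an unrotated box. The first produces absolute-pixel [x, y, width, height], the COCO-style layout that Detectron2 dataset dicts accept with BoxMode.XYWH_ABS. The second produces the [x0, y0, x1, y1] boxes on a 0 to 1000 grid that LayoutLM-family models expect. Since the percentages already run from 0 to 100, the LayoutLM conversion needs no image size. The function names are hypothetical.

```python
def to_pixel_xywh(value: dict, original_width: int, original_height: int) -> tuple[float, float, float, float]:
    """Percent box -> absolute pixels (x, y, width, height), COCO style."""
    return (
        value["x"] * original_width / 100,
        value["y"] * original_height / 100,
        value["width"] * original_width / 100,
        value["height"] * original_height / 100,
    )


def to_layoutlm_bbox(value: dict) -> list[int]:
    """Percent box -> [x0, y0, x1, y1] on the 0-1000 grid used by LayoutLM-family models."""
    def scale(p: float) -> int:
        return min(1000, max(0, round(p * 10)))  # clamp boxes that overhang the page

    return [
        scale(value["x"]),
        scale(value["y"]),
        scale(value["x"] + value["width"]),
        scale(value["y"] + value["height"]),
    ]


region = {"x": 10, "y": 20, "width": 50, "height": 25}
print(to_pixel_xywh(region, 1000, 2000))  # → (100.0, 400.0, 500.0, 500.0)
print(to_layoutlm_bbox(region))  # → [100, 200, 600, 450]
```

Use the original_width and original_height from the same result item rather than a per-document constant, because scanned pages in one file can differ in size.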
How do you measure inter-annotator agreement for spatial document blocks?
Evaluate geometric accuracy using Intersection over Union metrics rather than simple categorical matching. Set an overlap threshold between 0.5 and 0.7 for bounding boxes to automatically flag misaligned regions. The review system combines these spatial scores with exact-match metrics for the block classifications to produce a single agreement score.
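To reproduce a comparable number offline, for example on an exported batch, the following sketch greedily matches two annotators' boxes by IoU and counts only matches whose labels also agree. This is a hypothetical scoring scheme for illustration, not the agreement metric Label Studio computes internally, and the (page, label, x, y, width, height) tuple layout is an assumption.

```python
Box = tuple[int, str, float, float, float, float]  # (page, label, x, y, width, height)


def _iou(a, b) -> float:
    """IoU of two (x, y, width, height) boxes."""
    w = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    union = a[2] * a[3] + b[2] * b[3] - w * h
    return w * h / union if union > 0 else 0.0


def agreement(boxes_a: list[Box], boxes_b: list[Box], threshold: float = 0.5) -> float:
    """Share of boxes matched one-to-one by IoU >= threshold with the same label."""
    pairs = [
        (_iou(a[2:], b[2:]), i, j)
        for i, a in enumerate(boxes_a)
        for j, b in enumerate(boxes_b)
        if a[0] == b[0]  # only compare boxes on the same page
    ]
    pairs.sort(reverse=True)  # best overlaps claim their partners first
    used_a, used_b, matched = set(), set(), 0
    for score, i, j in pairs:
        if score < threshold:
            break
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        if boxes_a[i][1] == boxes_b[j][1]:  # geometric match must also agree on the label
            matched += 1
    denom = max(len(boxes_a), len(boxes_b))
    return matched / denom if denom else 1.0


annotator_1 = [(0, "Title", 0, 0, 10, 10), (0, "Paragraph", 0, 20, 10, 10)]
annotator_2 = [(0, "Title", 1, 0, 10, 10), (0, "Table", 0, 20, 10, 10)]
print(agreement(annotator_1, annotator_2))  # → 0.5
```

Dividing by the larger box count penalizes a region that one annotator drew and the other missed, which a pairwise-only average would hide.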