HumanSignal Joins the DocLang Ecosystem to Support Human-in-the-Loop Document AI
Documents are still one of the biggest sources of enterprise knowledge. But most document formats were built for people to read, not for AI systems to understand. That gap creates problems for modern AI workflows.
PDFs, scans, presentations, forms, and reports often contain critical structure and context that get lost during extraction. Tables break. Reading order disappears. Layout relationships become ambiguous. Governance policies around what models can or cannot use are difficult to preserve downstream.
Those challenges are part of the reason the Linux Foundation AI & Data Foundation recently announced the new DocLang Specification Working Group, an open effort focused on creating more interoperable standards for document understanding workflows.
That broader effort also connects closely with HumanSignal’s collaboration with the Docling team on structured document understanding workflows inside Label Studio. These workflows include reviewing layouts, refining reading order, and improving model-generated document structures through human feedback loops.
The Role of Open Standards in Document AI
The DocLang effort is being developed alongside Docling, the open source document processing toolkit originally developed by IBM Research Zurich and now hosted by the Linux Foundation AI & Data Foundation.
Docling focuses on converting documents into structured, AI-ready representations across formats like PDFs, DOCX, PPTX, HTML, spreadsheets, and images.
DocLang complements that effort by defining a more standardized way to express and exchange those structured document representations across tools and systems.
Together, these efforts point to a broader shift toward interoperable infrastructure for document AI workflows.
What is DocLang?
DocLang is designed to represent both the semantic meaning and geometric structure of documents in a format AI systems can consistently interpret.
Instead of flattening a document into plain text, DocLang preserves structural context such as:
- headings and hierarchy
- tables and structured regions
- layout and positioning
- reading order
- document relationships
- governance metadata
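To make the contrast with plain-text flattening concrete, here is a minimal sketch in Python. It is not the DocLang format or the Docling API: the `Element` and `BBox` classes, their field names, and the `allow_training` flag are all hypothetical, chosen only to show what is lost when structure is dropped.

```python
from dataclasses import dataclass, field


@dataclass
class BBox:
    # Page-space rectangle; this coordinate convention is an assumption.
    left: float
    top: float
    right: float
    bottom: float


@dataclass
class Element:
    kind: str          # e.g. "heading", "paragraph", "table"
    text: str
    page: int
    bbox: BBox         # layout and positioning
    order: int         # position in reading order
    level: int = 0     # heading depth; 0 for non-headings
    metadata: dict = field(default_factory=dict)  # e.g. governance flags


def flatten(elements):
    """What plain-text extraction keeps: text only, in storage order."""
    return "\n".join(e.text for e in elements)


def in_reading_order(elements):
    """Sort elements by their explicit reading-order index."""
    return sorted(elements, key=lambda e: e.order)


def outline(elements):
    """Rebuild the heading hierarchy, indenting by heading level."""
    return [
        "  " * (e.level - 1) + e.text
        for e in in_reading_order(elements)
        if e.kind == "heading"
    ]


def usable_for_training(elements):
    """Filter on a hypothetical governance flag (default: allowed)."""
    return [e for e in elements if e.metadata.get("allow_training", True)]


# A two-column page whose storage order differs from its reading order.
page = [
    Element("paragraph", "Right column text.", 1, BBox(320, 100, 580, 300), order=2),
    Element("heading", "1. Overview", 1, BBox(40, 40, 580, 70), order=0, level=1),
    Element("paragraph", "Left column text.", 1, BBox(40, 100, 300, 300), order=1,
            metadata={"allow_training": False}),
]
```

In this sketch, `flatten(page)` returns the text in storage order with the right column first, while `in_reading_order(page)` recovers the intended sequence and `usable_for_training(page)` drops the element flagged as off-limits.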
Document understanding extends far beyond OCR.
Modern AI systems increasingly need to interpret how information is organized across a page, which elements belong together, and how content flows through complex layouts. That becomes especially important in workflows involving:
- enterprise search and retrieval
- agentic AI systems
- document classification
- structured extraction
- fine-tuning document models
- human review and correction loops
Building Human-in-the-Loop Workflows for Document Understanding
At HumanSignal, we’ve been exploring how human-in-the-loop workflows can help teams adapt and improve document understanding systems for their own specialized data.
One of the biggest challenges in document AI is that enterprise documents are rarely standardized. Teams often work with highly customized layouts, domain-specific structures, and edge cases that generic models struggle to interpret consistently.
We’ve been collaborating with the Docling team on workflows for visualizing, reviewing, and refining DocLang document structures inside Label Studio. That includes enabling teams to validate layouts, adjust reading order, correct document regions, and improve model outputs over time through structured review workflows.
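As a rough illustration of one review step, the sketch below applies a reviewer's corrected reading order to model-predicted regions. It uses plain Python dictionaries with hypothetical keys (`id`, `label`, `order`, `reviewed`); it does not reflect Label Studio's task or annotation format, or the actual integration with Docling.

```python
def apply_reading_order(regions, corrected_ids):
    """Return new region records renumbered to match a reviewer's ordering.

    `regions` is a list of dicts with an "id" key; `corrected_ids` lists
    every region id exactly once, in the order a human reviewer chose.
    """
    by_id = {r["id"]: r for r in regions}
    if len(corrected_ids) != len(by_id) or set(corrected_ids) != set(by_id):
        raise ValueError("correction must list every region exactly once")
    # Copy each record so the model's original prediction stays untouched.
    return [
        {**by_id[rid], "order": i, "reviewed": True}
        for i, rid in enumerate(corrected_ids)
    ]


# Model output that reads the sidebar before the title.
predicted = [
    {"id": "sidebar", "label": "text", "order": 0},
    {"id": "title", "label": "heading", "order": 1},
    {"id": "body", "label": "text", "order": 2},
]

# The reviewer moves the title first and the sidebar last.
reviewed = apply_reading_order(predicted, ["title", "body", "sidebar"])
```

Keeping the prediction and the reviewed result as separate records is a design choice in this sketch: pairs of model output and human correction are what a team would need later to measure or fine-tune a model.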
As document AI systems become more integrated into production workflows, organizations will need infrastructure that supports interoperability, transparency, and continuous improvement across both models and data pipelines.
We’re excited to support the broader ecosystem effort and look forward to sharing more soon.
Stay tuned.