
Testing SmolDocling with Label Studio: Evaluating OCR for Document Conversion


SmolDocling isn’t your typical OCR tool. Where most optical character recognition models struggle with tables, charts, and structured formatting, SmolDocling stands out by offering a lightweight, all-in-one solution for full-document conversion.

In this post, we walk through how to test SmolDocling’s OCR capabilities using Label Studio, helping you evaluate how well it extracts text, layout, and structure from complex documents.

What is SmolDocling?

As introduced in the paper SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion, SmolDocling is designed to process entire pages while retaining structure, spatial location, and formatting. Unlike traditional OCR models that require multiple specialized components, SmolDocling generates DocTags, a universal markup format that captures all document elements in full context. This makes it more efficient and scalable for a wide range of document types, including business reports, academic papers, patents, and technical documents.
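To make that concrete, here is a minimal sketch of running SmolDocling on a single page image and printing the DocTags it produces, using Hugging Face Transformers. It assumes the publicly released "ds4sd/SmolDocling-256M-preview" checkpoint; the prompt wording and generation settings are illustrative and may need adjusting for your setup.

```python
# Minimal sketch: convert one page image to DocTags with SmolDocling.
# Assumes the "ds4sd/SmolDocling-256M-preview" checkpoint from Hugging Face;
# adjust the model ID and prompt to match the release you are using.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

MODEL_ID = "ds4sd/SmolDocling-256M-preview"

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForVision2Seq.from_pretrained(MODEL_ID)

page = Image.open("sample_page.png").convert("RGB")

# Chat-style prompt asking the model to convert the page to DocTags.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Convert this page to docling."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[page], return_tensors="pt")

with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=1024)

# Strip the prompt tokens and decode only the newly generated DocTags.
doctags = processor.batch_decode(
    generated[:, inputs["input_ids"].shape[1]:], skip_special_tokens=False
)[0]
print(doctags)
```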

But how well does it perform on real-world data? To help answer that, we have created a Jupyter Notebook that walks you through testing SmolDocling’s OCR capabilities using Label Studio.

Why Evaluating OCR Models Matters

OCR models have improved significantly, but they still face major challenges.

  • Inconsistent recognition of tables, formulas, and charts
  • Misaligned bounding boxes that affect structured data extraction
  • Formatting errors that disrupt readability

SmolDocling aims to solve these issues by providing a compact, vision-language model that processes full-page documents with structured outputs. However, evaluation is critical to measure accuracy and fine-tune results for real-world use.
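One common way to quantify that accuracy is character error rate (CER): the edit distance between the model's output and a reference transcription, normalized by the reference length. The notebook may use other metrics, but the sketch below shows the idea with placeholder strings.

```python
# Character error rate (CER): Levenshtein distance between predicted and
# reference text, divided by the reference length. Texts are placeholders.

def character_error_rate(reference: str, prediction: str) -> float:
    """Edit distance between strings, normalized by reference length."""
    prev = list(range(len(prediction) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, pred_char in enumerate(prediction, start=1):
            cost = 0 if ref_char == pred_char else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / max(len(reference), 1)

reference = "Total revenue: $1,234,567"
prediction = "Total revenue: $1,234.567"
print(f"CER: {character_error_rate(reference, prediction):.3f}")
```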

Try It Yourself

To get started, check out the step-by-step notebook.

By integrating SmolDocling with Label Studio, you can gain insights into how well the model performs and fine-tune results to improve document understanding.
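As a rough sketch of that integration, the snippet below pushes a page image and SmolDocling's transcription into Label Studio as a pre-annotation for human review, using the Label Studio Python SDK. The URL, API key, labeling config, and field names ("image", "transcription") are illustrative assumptions and must match your own project setup.

```python
# Minimal sketch: import a page image plus SmolDocling's transcription into
# Label Studio as a pre-annotation for review. Assumes a running Label Studio
# instance and the label_studio_sdk package.
from label_studio_sdk import Client

ls = Client(url="http://localhost:8080", api_key="YOUR_API_KEY")

project = ls.start_project(
    title="SmolDocling OCR evaluation",
    label_config="""
    <View>
      <Image name="image" value="$image"/>
      <TextArea name="transcription" toName="image"
                editable="true" rows="10"/>
    </View>
    """,
)

# "doctags_text" stands in for the text you extract from SmolDocling's output.
doctags_text = "Example text extracted by SmolDocling"

project.import_tasks([
    {
        "data": {"image": "https://example.com/sample_page.png"},
        "predictions": [
            {
                "model_version": "smoldocling-preview",
                "result": [
                    {
                        "from_name": "transcription",
                        "to_name": "image",
                        "type": "textarea",
                        "value": {"text": [doctags_text]},
                    }
                ],
            }
        ],
    }
])
```

Once imported, reviewers can correct the model's transcription directly in the Label Studio UI, which gives you both an accuracy signal and cleaned-up ground truth for fine-tuning.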
