Get Started With Label Studio Integrations
Introduction
Label Studio is one of the most popular open source data labeling platforms, supporting hundreds of thousands of data scientists around the world and used to create hundreds of millions of annotations.
One of the primary reasons Label Studio has been so widely adopted is its flexibility, including an extensible interface and broad support for technical integrations. Data scientists can import a wide range of data types, configure the labeling interface to meet their exact needs, accelerate their workflow with machine learning integrations, and build Label Studio into their preferred data science pipeline.
This flexibility is made possible by the broad range of integration points built into Label Studio. Label Studio's collection of extensible APIs allows you to connect services for data storage, machine learning, and platform automation. This article outlines options for writing your own integrations for Label Studio.
Storage APIs
Label Studio offers two primary means of storing and managing data for labeling. The first is Local Storage, with data held locally in the Label Studio installation. This method is appropriate for starting with Label Studio but is not recommended for production data labeling work.
The preferred method for managing data is through Label Studio’s “Source Storage” and “Target Storage” interfaces. Label Studio ships with many different source and target storage connector types, including:
- Amazon S3.
- Google Cloud Storage.
- Microsoft Azure Blob Storage.
- Redis Database.
- Local File Storage.
You can add a new storage interface by patching Label Studio with a library that conforms to the connector API, which is defined in the Label Studio source repository. The Pachyderm community used this approach with their custom storage integration, which you can read more about in this article.
Machine Learning APIs
Label Studio offers a public API that enables you to integrate machine learning models in various ways with your labeling workflow. These integration points include:
- Pre-labeling: models predict data labels that are uploaded manually to Label Studio, with annotators performing further manual refinements.
- Auto-labeling: models automatically predict annotations within Label Studio.
- Online Learning: models are automatically updated as new annotations are created, retraining the model as part of the labeling process.
- Active Learning: users annotate example tasks that are difficult for models to make predictions on, targeting retraining to improve the performance of a model for a specific task.
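As a rough illustration of the active learning idea above, the sketch below ranks tasks by model confidence and picks the least confident ones for annotation first. The function name, the task IDs, and the scores are all hypothetical; this is plain uncertainty sampling, not Label Studio's own task-ordering logic.

```python
from typing import Dict, List


def pick_uncertain_tasks(scores: Dict[int, float], batch_size: int) -> List[int]:
    """Return the IDs of the tasks the model is least confident about."""
    # Sort task IDs by ascending confidence; ties are broken by task ID
    # so the ordering is deterministic.
    ranked = sorted(scores, key=lambda task_id: (scores[task_id], task_id))
    return ranked[:batch_size]


# Hypothetical model confidence scores keyed by task ID.
confidence = {101: 0.97, 102: 0.51, 103: 0.88, 104: 0.55}
next_batch = pick_uncertain_tasks(confidence, batch_size=2)
```

Here `next_batch` holds the two tasks with the lowest scores, which are the ones an annotator would label next before the model is retrained.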
You only need two methods to integrate your machine learning pipeline with Label Studio:
- A predict method that generates a label based on the input data.
- A fit method that updates a model based on a user-provided annotation.
The input and output types depend on the data and project setup and will vary based on your needs.
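To make the two methods concrete, here is a minimal, dependency-free sketch of a model exposing `predict` and `fit`. The class name, the `sentiment` and `text` tag names, and the toy "most frequent label" logic are assumptions made for illustration. The prediction dictionaries follow the general shape Label Studio uses for choice-based results, but check the machine learning integration guide for the exact base class and signatures your version expects.

```python
from collections import Counter
from typing import Any, Dict, List


class MajorityLabelModel:
    """Toy model: always predicts the label it has seen most often."""

    def __init__(self, default_label: str = "Neutral") -> None:
        self.label_counts: Counter = Counter()
        self.default_label = default_label

    def _current_guess(self):
        # Return (label, confidence) based on the annotations seen so far.
        if not self.label_counts:
            return self.default_label, 0.0
        label, count = self.label_counts.most_common(1)[0]
        return label, count / sum(self.label_counts.values())

    def predict(self, tasks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """Generate one prediction per input task."""
        label, score = self._current_guess()
        return [
            {
                "result": [{
                    "from_name": "sentiment",  # hypothetical control tag name
                    "to_name": "text",         # hypothetical object tag name
                    "type": "choices",
                    "value": {"choices": [label]},
                }],
                "score": score,
            }
            for _ in tasks
        ]

    def fit(self, annotation: Dict[str, Any]) -> None:
        """Update the model from a single user-provided annotation."""
        for region in annotation.get("result", []):
            for choice in region.get("value", {}).get("choices", []):
                self.label_counts[choice] += 1


model = MajorityLabelModel()
model.fit({"result": [{"value": {"choices": ["Positive"]}}]})
predictions = model.predict([{"data": {"text": "Great product"}}])
```

A real integration would replace the counter with an actual model and would read the task data inside `predict`, but the division of labor between the two methods stays the same.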
You can learn how to integrate Label Studio into your machine learning pipeline and check out an entire collection of machine learning integration examples for inspiration.
Label Studio Platform API
Almost every part of Label Studio can be automated through its API, allowing you to integrate it into your machine learning and data science workflows.
With the API, you can:
- Manage projects and users.
- Configure storage.
- Attach machine learning integrations.
- Export annotations.
The Label Studio Community provides a Python SDK for the API, and you can get started by following along with the Label Studio Python SDK tutorial.
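If you would rather call the REST API directly, the sketch below builds requests with Python's standard library. It assumes a local instance at `http://localhost:8080`, token-based authentication through an `Authorization: Token ...` header, and a `/api/projects` endpoint; the token value and project title are placeholders. Verify the endpoints and authentication scheme against the API reference for your Label Studio version.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumed local Label Studio instance
API_TOKEN = "your-api-token"        # placeholder; use your own access token


def build_request(path, method="GET", payload=None):
    """Build (but do not send) an authenticated API request."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Authorization": f"Token {API_TOKEN}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


# Request that lists existing projects.
list_projects = build_request("/api/projects")

# Request that creates a project with a hypothetical title.
create_project = build_request(
    "/api/projects", method="POST", payload={"title": "Sentiment review"}
)

# To actually send a request against a running instance:
# with urllib.request.urlopen(list_projects) as response:
#     projects = json.load(response)
```

The Python SDK wraps calls like these in higher-level objects, so it is usually the more convenient choice once you move beyond a quick script.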
Label Studio Webhooks
Webhooks in Label Studio let you set up integrations that subscribe to certain events that occur inside Label Studio. Label Studio sends an HTTP POST request to the configured webhook URL when an event is triggered.
For example, you might use events sent to your webhook URL to do the following:
- Start training a machine learning model after annotating a certain number of tasks.
- Perform active learning after a task has been annotated.
- Prompt annotators to begin working on a project after it is fully set up.
- Create a new version of training data in a dataset versioning repository.
You can find a complete guide for connecting webhooks in the Label Studio documentation.
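As a sketch of the first use case, the receiver below counts annotation events and signals when a training job should start. It assumes the POST body is JSON with an `action` field whose value is `ANNOTATION_CREATED` for new annotations; the threshold, port, and function names are hypothetical, so confirm the payload format in the webhook guide before relying on it.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

RETRAIN_EVERY = 100  # hypothetical threshold
state = {"annotations_seen": 0}


def handle_event(payload, state, retrain_every=RETRAIN_EVERY):
    """Count annotation events; return True when retraining should start."""
    # Ignore malformed payloads and events other than new annotations.
    if not isinstance(payload, dict):
        return False
    if payload.get("action") != "ANNOTATION_CREATED":
        return False
    state["annotations_seen"] += 1
    return state["annotations_seen"] % retrain_every == 0


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_response(400)
            self.end_headers()
            return
        if handle_event(payload, state):
            print("Threshold reached: start a training job here")
        self.send_response(200)
        self.end_headers()


def serve(port=8000):
    """Run the receiver until interrupted."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()


# serve()  # uncomment to start listening, then register the URL as a webhook
```

Keeping the decision logic in `handle_event` separate from the HTTP handler makes it easy to test without starting a server.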
Frontend Configuration and Integration
Label Studio consists of a backend for managing annotations and your annotation workflow and a frontend that serves as a highly extensible and configurable view for the backend. Every data labeling workflow is unique, and Label Studio’s flexible interface allows users to tailor their workflow to meet their exact needs and maximize annotator productivity. The Label Studio Playground showcases over 50 annotation templates across 10 categories and has an interactive design platform for creating custom templates.
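Labeling interfaces are described with an XML-style configuration. The sketch below stores a small text classification config in a Python string and runs a simple sanity check on it. The tag and attribute names (`View`, `Text`, `Choices`, `Choice`, `name`, `toName`, `value`) follow the commonly documented template format, while the `dangling_targets` helper is a hypothetical check written for this example, not part of Label Studio. It also assumes each `toName` refers to a single tag.

```python
import xml.etree.ElementTree as ET

# A minimal text classification interface: one text object, one choice control.
LABEL_CONFIG = """
<View>
  <Text name="text" value="$text"/>
  <Choices name="sentiment" toName="text" choice="single">
    <Choice value="Positive"/>
    <Choice value="Negative"/>
    <Choice value="Neutral"/>
  </Choices>
</View>
"""


def dangling_targets(config):
    """Return toName values that do not match any tag's name attribute."""
    root = ET.fromstring(config.strip())
    names = {el.get("name") for el in root.iter() if el.get("name")}
    targets = {el.get("toName") for el in root.iter() if el.get("toName")}
    return sorted(targets - names)


problems = dangling_targets(LABEL_CONFIG)
```

An empty `problems` list means every control tag points at an object tag that exists, which catches a common typo before the config is pasted into a project.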
The frontend can also be embedded as a component in other data annotation platforms, allowing you to use its flexibility with your own data management platform. For more information, check out the Label Studio frontend integration guide and how the Streamlit community has created a Label Studio Frontend Component for Streamlit applications.
Additional MLOps Integration Points
Label Studio provides many tools to help you manage the application within your MLOps framework. These include:
- The official PyPI and Docker releases.
- Helm charts for deploying to Kubernetes.
- Terraform scripts for deploying to managed infrastructure.
Several open platforms utilize these packages to deploy and integrate with Label Studio, making Label Studio a critical component of a rich MLOps ecosystem.
Sharing Your Integration
Need help integrating Label Studio with your project or workflow? Join the Label Studio Slack, where you can get help from the development team and the active community with thousands of members.
Have an integration that you want to share? Send the team a request describing your integration, and we’ll add it to the collection. Or connect with the community team on Slack with information and a link, and we’ll work with you to get it featured.