Set up machine learning

Set up machine learning with your labeling process by setting up a machine learning backend for Label Studio.

With Label Studio, you can set up your favorite machine learning models to do the following:

With these capabilities, you can use Label Studio as part of a production-ready prediction service.

How to set up machine learning with Label Studio?

Use the Label Studio ML backend to integrate Label Studio with machine learning models. The Label Studio ML backend is an SDK that you can use to wrap your machine learning code and turn it into a web server. You can then connect that server to a Label Studio instance to perform 2 tasks:

To set up a Label Studio ML backend, perform the following steps:

  1. Get your model code, either by writing one from scratch or using an existing model.
  2. Wrap it with the Label Studio SDK.
  3. Create a running server script.
  4. Launch the script.
  5. Connect Label Studio to the ML backend in the Label Studio UI.
    For an example, follow the Quickstart. For help with steps 1-3, see how to create your own machine learning backend.
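
To make step 2 more concrete, the following is a minimal, self-contained sketch of the shape that a wrapped model might take. It deliberately does not import the Label Studio ML SDK so that it runs on its own; in a real backend, the class would extend the base class that the SDK provides. The class name `KeywordSentimentModel`, the tag names `sentiment` and `text`, the label names, and the keyword rule are all hypothetical and chosen only for illustration.

```python
class KeywordSentimentModel:
    """Hypothetical stand-in for a wrapped model (not the SDK base class)."""

    def __init__(self, from_name="sentiment", to_name="text"):
        # Assumed names of the Choices and Text tags in the labeling config.
        self.from_name = from_name
        self.to_name = to_name

    def predict(self, tasks):
        """Return one prediction per task, shaped like a Label Studio prediction."""
        predictions = []
        for task in tasks:
            text = task["data"]["text"].lower()
            # Toy rule standing in for real model inference.
            label = "Positive" if "good" in text else "Negative"
            predictions.append({
                "result": [{
                    "from_name": self.from_name,
                    "to_name": self.to_name,
                    "type": "choices",
                    "value": {"choices": [label]},
                }],
                "score": 0.5,  # placeholder confidence, not a real probability
            })
        return predictions
```

Calling `KeywordSentimentModel().predict([{"data": {"text": "A good day"}}])` returns a list with one prediction whose choice is `Positive`. A real model would replace the keyword rule with its own inference code.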

Quickstart

Get started with a machine learning (ML) backend with Label Studio. You need to start both the machine learning backend and Label Studio to start labeling. You can review examples in the label-studio-ml/examples section of the Label Studio ML backend repository or in the machine learning tutorials.

Follow these steps to set up an example text classifier ML backend with Label Studio:

  1. Clone the Label Studio Machine Learning Backend git repository.

    git clone https://github.com/heartexlabs/label-studio-ml-backend
  2. Set up the environment.

    We highly recommend using a venv, virtualenv, or conda Python environment. You can use the same environment as Label Studio. Read more in the Python documentation about creating virtual environments with venv.

    cd label-studio-ml-backend
    
    # Install label-studio-ml and its dependencies
    pip install -U -e .
    
    # Install example dependencies
    pip install -r label_studio_ml/examples/requirements.txt
  3. Initialize an ML backend based on an example script:

    label-studio-ml init my_ml_backend \
      --script label_studio_ml/examples/simple_text_classifier.py

    This ML backend is an example provided by Label Studio. See how to create your own ML backend.

  4. Start the ML backend server.

    label-studio-ml start my_ml_backend
  5. Start Label Studio. Run the following:

    label-studio start
  6. Create a project and import text data. Set up the labeling interface to use the Text Classification template.

  7. In the Machine Learning section of the project settings page, add the link http://localhost:9090 to your machine learning model backend.


If you run into any issues, see Troubleshoot machine learning.

Train a model

After you connect a model to Label Studio as a machine learning backend, you can start training the model:

You must have at least one task annotated before you can start training.

In development mode, training logs appear in the web browser console. In production mode, you can find runtime logs in my_backend/logs/uwsgi.log and RQ training logs in my_backend/logs/rq.log on the server running the ML backend, which might be different from the Label Studio server. To see more detailed logs, start the ML backend server with the --debug option.

Get predictions from a model

After you connect a model to Label Studio as a machine learning backend, you can see model predictions in the labeling interface if the model is pre-trained, or right after it finishes training.

If the model has not been trained yet, do the following to get predictions to appear:

  1. Start labeling data in Label Studio.
  2. Return to the Machine Learning settings for your project and click Start Training to start training the model.
  3. In the data manager for your project, select the tasks that you want to get predictions for and select Retrieve predictions using the drop-down actions menu. Label Studio sends the selected tasks to your ML backend.
  4. After the predictions are retrieved, they appear in the task preview and Label stream modes for the selected tasks.

You can also retrieve predictions automatically by loading tasks. To do this, enable Retrieve predictions when loading a task automatically on the Machine Learning settings for your project. When you scroll through tasks in the data manager for a project, the predictions for those tasks are automatically retrieved from the ML backend. Predictions also appear when labeling tasks in the Label stream workflow.

Note: For a large dataset, the HTTP request to retrieve predictions might be interrupted by a timeout. If you want to get all predictions for all tasks in a dataset from connected machine learning backends, make a POST call to the predictions endpoint of the Label Studio API to prompt the machine learning backend to create predictions for the tasks.

If you want to retrieve predictions manually for a list of tasks using only an ML backend, make a POST request to the /predict URL of your ML backend with a payload of the tasks that you want to see predictions for, formatted like the following example:

{
  "tasks": [
    {"data": {"text":"some text"}}
  ]
}
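
As a rough sketch of how such a request could be assembled, the following Python code builds the payload shown above and prepares an HTTP request for the /predict URL using only the standard library. The function names `build_predict_request` and `fetch_predictions` are hypothetical, the backend address assumes the default http://localhost:9090 from the Quickstart, and sending the payload as a JSON body in a POST request is an assumption about how the backend accepts it.

```python
import json
import urllib.request


def build_predict_request(texts, backend_url="http://localhost:9090"):
    """Build (but do not send) a request carrying tasks for the /predict URL."""
    payload = {"tasks": [{"data": {"text": t}} for t in texts]}
    return urllib.request.Request(
        url=backend_url.rstrip("/") + "/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: the payload travels as a JSON request body
    )


def fetch_predictions(request, timeout=30):
    """Send the prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With an ML backend running, `fetch_predictions(build_predict_request(["some text"]))` sends the example payload and returns the decoded response. Building and sending are kept separate so that you can inspect the request before any network call is made.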

Get interactive preannotations

ML-assisted labeling with interactive preannotations works with image segmentation and object detection tasks using rectangles, ellipses, polygons, brush masks, and keypoints, as well as with HTML and text named entity recognition tasks. Your ML backend must support the type of labeling that you’re performing: it must recognize the input that you create and respond with the relevant output for a prediction.

  1. Set up your machine learning backend for ML-assisted labeling.
    1. For your project, open Settings > Machine Learning.
    2. Click Add Model or select Edit for an existing machine learning backend.
    3. Type a Title for the machine learning backend.
    4. Enter the URL for the running machine learning backend. For example, http://example.com:9090.
    5. Enable Use for interactive preannotation.
    6. Click Validate and Save.
  2. For image labeling, you can update your labeling configuration to include the smart="true" option for the type of labeling you’re performing. Smart tools appear by default if auto-annotation is enabled.
    This option is supported for Rectangle, Ellipse, Polygon, Keypoint, and Brush tags. See the tag documentation. If you only want the smart option to appear and don’t want to perform manual labeling at all, use smartOnly="true".
    1. For your project, open Settings > Labeling Interface.
    2. Click Code to view the XML labeling configuration.
    3. For the relevant tag type that you want to use to generate predictions with your ML backend, add the smart="true" parameter. For example:
      <Brush name="brush" toName="img" smart="true" showInline="true"/>
    4. Save your changes.
  3. After you start labeling, enable Auto-Annotation to see and use the smart option to draw a shape, mask, or assign a keypoint.
  4. For image labeling, after you enable auto-annotation you can choose whether to Auto accept annotation suggestions. If you automatically accept annotation suggestions, regions show up automatically and are immediately created. If you don’t automatically accept suggestions, the regions appear but you can reject or approve them manually, either individually or all at once.


Delete predictions

If you want to delete all predictions from Label Studio, you can do it using the UI or the API:

Choose which predictions to display to annotators

After setting up an ML backend with Label Studio Enterprise, you can choose which model predictions to display to annotators by default. You must have multiple ML backends configured with Label Studio in order to choose which predictions to display.

  1. For a specific project, open the Settings and select Machine Learning.
  2. Under Model Version, select the version of the model that you want to use to display predictions to annotators by default. Your changes save automatically.

When annotators start labeling, they’ll see the predictions from that model version for each task, which they can then modify as needed. If there are no predictions for a task from the model version selected, no predictions display to the annotator even if another model version has predictions for the task.
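
The display rule described above can be illustrated with a short sketch. This is not Label Studio's implementation; it only mirrors the stated behavior, assuming each prediction is a dictionary with a `model_version` field. The function name `predictions_to_display` and the version strings are hypothetical.

```python
def predictions_to_display(task_predictions, selected_version):
    """Keep only the predictions produced by the selected model version."""
    return [
        prediction
        for prediction in task_predictions
        if prediction.get("model_version") == selected_version
    ]


# Hypothetical predictions attached to a single task.
task_predictions = [
    {"model_version": "v1", "score": 0.71},
    {"model_version": "v2", "score": 0.88},
]

shown = predictions_to_display(task_predictions, "v2")
hidden_case = predictions_to_display(task_predictions, "v3")
```

Here `shown` contains only the `v2` prediction, while `hidden_case` is empty: because no prediction comes from `v3`, nothing is displayed even though other versions have predictions for the task.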

Set up a machine learning backend with Docker Compose

Label Studio includes everything you need to set up a production-ready ML backend server powered by Docker.

The Label Studio machine learning server uses uWSGI and supervisord and handles background training jobs with RQ.

Prerequisites

Complete these prerequisites to make sure your server starts successfully.

  1. Specify all requirements in a my-ml-backend/requirements.txt file. For example, to specify scikit-learn as a requirement for your model, do the following:
    scikit-learn
  2. Make sure ports 9090 and 6379 are available and do not have services running on them. To use different ports, update the default ports in my-ml-backend/docker-compose.yml, created after you start the machine learning backend.
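
One way to check the port prerequisite before starting the containers is a small script like the following sketch. It uses only the Python standard library and assumes the services would listen on the local machine; the function names `port_is_free` and `busy_ports` are hypothetical.

```python
import socket


def port_is_free(port, host="127.0.0.1", timeout=1.0):
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 when a service accepted the connection.
        return sock.connect_ex((host, port)) != 0


def busy_ports(ports=(9090, 6379)):
    """List the ports from `ports` that already have a listener."""
    return [port for port in ports if not port_is_free(port)]
```

If `busy_ports()` returns an empty list, both default ports are available. Otherwise, stop the listed services or choose different ports as described in the step above.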

Start with Docker Compose

  1. Start the machine learning backend with an example model or your custom machine learning backend.

    label-studio-ml init my-ml-backend --script label_studio_ml/examples/simple_text_classifier.py

    The my-ml-backend/ directory now contains the configuration files that you need to build and run a Docker image using Docker Compose.

  2. From the my-ml-backend/ directory, start Docker Compose.

    docker-compose up

    The machine learning backend server starts listening on port 9090.

  3. Connect the machine learning backend to Label Studio on the Machine Learning settings page for your project in the Label Studio UI.

If you run into any issues, see Troubleshoot machine learning.

Active Learning

Creating annotated training data for supervised machine learning models is often expensive and time-consuming. Active Learning is a branch of machine learning that seeks to minimize the amount of data that must be labeled by strategically sampling observations that provide new insight into the problem. In particular, Active Learning algorithms use prediction scores to select diverse and informative data for annotation from a pool of unlabeled data, rather than selecting observations at random. For more theory, read our article on Towards Data Science.

You can select a task ordering such as Predictions score in the Data Manager so that the sampling strategy fits the active learning scenario. Label Studio automatically sends a train signal to the ML backend each time an annotation is submitted or updated. You can enable these train signals on the Machine Learning settings page for your project.
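
The idea of ordering tasks by prediction score can be sketched as simple uncertainty sampling: label the tasks that the model is least confident about first. The following is an illustration of that technique only, not how Label Studio orders tasks internally. The function name `pick_most_uncertain` and the task dictionaries with `id` and `score` keys are hypothetical.

```python
def pick_most_uncertain(tasks, batch_size=2):
    """Return up to `batch_size` tasks, lowest prediction score first."""
    # Tasks without a score have no prediction yet and are skipped here.
    scored = [task for task in tasks if task.get("score") is not None]
    return sorted(scored, key=lambda task: task["score"])[:batch_size]


# Hypothetical unlabeled tasks with model confidence scores.
pool = [
    {"id": 1, "score": 0.95},
    {"id": 2, "score": 0.51},
    {"id": 3, "score": None},
    {"id": 4, "score": 0.62},
]

next_batch = pick_most_uncertain(pool)
```

In this example, `next_batch` holds tasks 2 and 4, the two with the lowest scores. After annotators label them and the model retrains, the remaining pool can be scored and sampled again.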