
Set up machine learning with your labeling process

Integrate machine learning into your labeling process by setting up a machine learning backend for Label Studio.

With Label Studio, you can set up your favorite machine learning models to do the following:

With these capabilities, you can use Label Studio as part of a production-ready Prediction Service.

Quickstart

Get started with a machine learning (ML) backend for Label Studio. You need to start both the ML backend and Label Studio before you can start labeling. You can review examples in the label_studio_ml/examples directory of the repository.

Follow these steps to set up an example text classifier ML backend with Label Studio:

  1. Clone the Label Studio Machine Learning Backend git repository.
    git clone https://github.com/heartexlabs/label-studio-ml-backend
  2. Set up the environment.
    cd label-studio-ml-backend
    pip install -e .
    pip install -r label_studio_ml/examples/requirements.txt
  3. From the repository root, create a new ML backend using the example simple text classifier included in the repository.
    label-studio-ml init my_ml_backend --script label_studio_ml/examples/simple_text_classifier.py
  4. Start the ML backend server.
    label-studio-ml start my_ml_backend
  5. Start Label Studio and connect your project to the running ML backend using the Label Studio UI.

Tutorials

For other example machine learning model setups with Label Studio, see the following tutorials:

Getting predictions from a machine learning model

After connecting a model as a machine learning backend, you see model predictions in the labeling interface and on the Tasks page that you use to manage your data.

For example, for an image classification task, the model pre-selects an image class for data labelers to verify.

You can also get a prediction for specific data using the API. For example, to get a prediction for task data of {"text":"some text"} from a Label Studio installation accessible at http://localhost:8080, run the following cURL command:

curl -X POST -d '{"text":"some text"}' -H "Content-Type: application/json" http://localhost:8080/api/models/predictions
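
The same request can also be built in Python. The following is a minimal sketch that uses only the standard library and mirrors the cURL command above. The helper names `build_prediction_request` and `fetch_prediction` are hypothetical, and the URL and payload are the ones from the example, so adjust them for your installation.

```python
import json
import urllib.request


def build_prediction_request(task_data, base_url="http://localhost:8080"):
    """Build a POST request equivalent to the cURL example (hypothetical helper)."""
    body = json.dumps(task_data).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/models/predictions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_prediction(task_data, base_url="http://localhost:8080", timeout=10):
    """Send the request and decode the JSON response.

    Assumes Label Studio is running at base_url with a connected ML backend.
    """
    request = build_prediction_request(task_data, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_prediction({"text": "some text"})` only succeeds when Label Studio is reachable at the given URL. The shape of the response depends on your model.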

Train a model with Label Studio

After you connect a model to Label Studio as a machine learning backend, you can start model training from the UI or using the API.

In development mode, training logs appear in the web browser console.
In production mode, you can find runtime logs in my_backend/logs/uwsgi.log and RQ training logs in my_backend/logs/rq.log.

Set up a machine learning backend for Label Studio with Docker Compose

Label Studio includes everything you need to set up a production-ready ML backend server powered by Docker.

The Label Studio machine learning server uses uWSGI and supervisord and handles background training jobs with RQ.

Prerequisites

Complete these prerequisites to make sure your server starts successfully.

  1. Specify all requirements in a my-ml-backend/requirements.txt file. For example, to specify scikit-learn as a requirement for your model, add the following line:

    scikit-learn
  2. Make sure ports 9090 and 6379 are available and do not have services running on them. To use different ports, update the default ports in my-ml-backend/docker-compose.yml, created after you start the machine learning backend.
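
One way to confirm that the ports are available before starting Docker Compose is to try connecting to them. The sketch below assumes the backend will run on the local machine. The function names `port_is_free` and `busy_ports` are hypothetical, and the port numbers are the defaults named above.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 only when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


def busy_ports(ports=(9090, 6379), host="127.0.0.1"):
    """Return the subset of ports that already have a listener."""
    return [port for port in ports if not port_is_free(port, host)]
```

If `busy_ports()` returns a non-empty list, stop the services on those ports or change the ports in my-ml-backend/docker-compose.yml.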

Start a machine learning backend with Docker Compose

  1. Start the machine learning backend with an example model, or your custom machine learning backend.

    label-studio-ml init my-ml-backend --script label_studio_ml/examples/simple_text_classifier.py

    The my-ml-backend/ directory now contains the configuration files that you need to build and run a Docker image using Docker Compose.

  2. From the my-ml-backend/ directory, start Docker Compose.

    docker-compose up

    The machine learning backend server starts listening on port 9090.

  3. Connect the machine learning backend to Label Studio on the Model page in the Label Studio UI, or run the following command:

    label-studio start text_classification_project --init --template text_classification --ml-backends http://localhost:9090

Active Learning

The process of creating annotated training data for supervised machine learning models is often expensive and time-consuming. Active Learning is a branch of machine learning that seeks to minimize the total amount of data required for labeling by strategically sampling observations that provide new insight into the problem. In particular, Active Learning algorithms aim to select diverse and informative data for annotation, rather than random observations, from a pool of unlabeled data using prediction scores.

Depending on score types you can select a sampling strategy:

See more about active learning sampling in Set up task sampling for your project.
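
To illustrate the idea of sampling by prediction score, here is a minimal sketch of least-confidence (uncertainty) sampling, one common active learning strategy. It is not necessarily how Label Studio implements sampling. The `task_id` and `score` keys and the function name are hypothetical.

```python
def least_confident_first(predictions, batch_size):
    """Return the ids of the batch_size tasks with the lowest prediction score."""
    ranked = sorted(predictions, key=lambda item: item["score"])
    return [item["task_id"] for item in ranked[:batch_size]]


# Hypothetical pool of unlabeled tasks with model confidence scores.
example_predictions = [
    {"task_id": 1, "score": 0.95},
    {"task_id": 2, "score": 0.51},
    {"task_id": 3, "score": 0.70},
    {"task_id": 4, "score": 0.55},
]

next_batch = least_confident_first(example_predictions, batch_size=2)
print(next_batch)  # [2, 4]
```

Tasks 2 and 4 are selected first because the model is least certain about them, so labeling them is expected to be more informative than labeling a random pair.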

Troubleshooting

When you encounter an error, you can investigate most problems using the server console log. The machine learning backend runs as a separate server from Label Studio, so make sure you check the correct server console logs while troubleshooting.

Note: When you start an ML backend using Docker Compose, the logs are located in:

  • main process / inference logs: logs/uwsgi.log
  • training logs: logs/rq.log

I launched the ML backend, but it appears as Disconnected after adding it in the Label Studio UI

Your ML backend server might not have started properly.

  1. First, try to do a health check by running the following:
    curl -X GET http://localhost:9090/health
  2. If the health check doesn’t respond, or you see errors, check the server logs.
  3. If you used Docker Compose to start the ML backend, check for requirements missing from the requirements.txt file used to set up the environment inside Docker.
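
The health check in step 1 can also be scripted, for example to wait for the backend before connecting it. This is a minimal sketch using the standard library. The function name `backend_is_healthy` is hypothetical, and it assumes only that a healthy backend answers GET /health with HTTP 200.

```python
import urllib.error
import urllib.request


def backend_is_healthy(base_url="http://localhost:9090", timeout=5):
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False
```

A `False` result means the server is unreachable or unhealthy. Check the server logs for the cause.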

The ML backend seems to be connected, but after I click “Start Training”, I see an “Error. Click here for details.” message

Click the error message to view the traceback. Common causes are too few annotations or memory issues.
If you can’t resolve the error yourself, contact us on Slack.

My predictions are wrong or I can’t see the model prediction results on the labeling page

Most likely, the format of the predictions you’re trying to view is incorrect. The ML backend prediction format follows the same structure as predictions in imported preannotations.
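
As a debugging aid, the sketch below checks a prediction against the layout commonly used for Label Studio preannotations: a `result` list whose items carry `from_name`, `to_name`, `type`, and `value`, plus an optional numeric `score`. Treat that layout as an assumption and confirm it against the preannotation documentation for your version. The function name and the `sentiment` and `text` control names are hypothetical and must match your labeling configuration.

```python
REQUIRED_RESULT_KEYS = {"from_name", "to_name", "type", "value"}


def prediction_problems(prediction):
    """Return a list of problems found; an empty list means the shape looks plausible."""
    results = prediction.get("result")
    if not isinstance(results, list):
        return ["'result' must be a list of labeling results"]
    problems = []
    for index, item in enumerate(results):
        if not isinstance(item, dict):
            problems.append(f"result[{index}] must be an object")
            continue
        missing = REQUIRED_RESULT_KEYS - set(item)
        if missing:
            problems.append(f"result[{index}] is missing: {', '.join(sorted(missing))}")
    score = prediction.get("score")
    if score is not None and not isinstance(score, (int, float)):
        problems.append("'score' must be a number when present")
    return problems


# Hypothetical text classification prediction.
example_prediction = {
    "result": [
        {
            "from_name": "sentiment",
            "to_name": "text",
            "type": "choices",
            "value": {"choices": ["Positive"]},
        }
    ],
    "score": 0.87,
}
```

Running `prediction_problems` on the output of your model before returning it can narrow down which field is malformed.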