
Import pre-annotated data into Label Studio

If a model has generated predictions for your dataset, you can import those predictions with your dataset into Label Studio as pre-annotated (pre-labeled) tasks for review and correction. Label Studio automatically displays the pre-annotations that you import on the Labeling page for each task.

To generate interactive pre-annotations with a machine learning model while labeling, see Set up machine learning with Label Studio.

You can import pre-annotated tasks into Label Studio using the UI or using the API.

Format pre-annotations for Label Studio

To import predicted labels into Label Studio, you must use the Basic Label Studio JSON format and set up your tasks with the predictions JSON key. The Label Studio ML backend also outputs tasks in this format.

Specific examples for pre-annotations

Refer to the following examples on this page for sample pre-annotation formats:

  • Import pre-annotations for images
  • Import pre-annotated regions for images
  • Import pre-annotations for text
  • Import brush segmentation pre-annotations in RLE format
  • Import OCR pre-annotations

To format pre-annotations for labeling types not represented in these examples, refer to the sample results JSON for the relevant object and control tags in your labeling configuration, such as the Audio tag for audio classification tasks. Each tag's results must be represented in the pre-annotation JSON for the predictions to render in the Label Studio UI. Not all object and control tags list sample results JSON.

You can also use the Label Studio Playground to preview the output JSON for a specific labeling configuration.

JSON format for pre-annotations

Label Studio JSON format for pre-annotations must contain two sections:

  • A data object that references the source of the data that the pre-annotations apply to.
  • A predictions array that contains the pre-annotation results for that data.

The JSON format for pre-annotations must match the labeling configuration used for your data labeling project.

Specify the data object

Use the data object to reference the value of the data specified by the Object tag in your labeling configuration. For example, the following excerpt of a time series labeling configuration:

...
    <TimeSeries name="ts" value="$csv" valueType="url">
        <Channel column="first_column"/>
    </TimeSeries>
...

This excerpt specifies value="$csv" in the TimeSeries Object tag. As a result, the data object in the pre-annotations JSON file for this labeling configuration must use "csv" as the key that specifies the location of the CSV data for the time series pre-annotations, as in the following example:

[
  {
    "data": {
      "csv": "https://app.heartex.ai/samples/time-series.csv?time=None&values=first_column"    },
    "predictions": []
  }
]
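The mapping from the Object tag's value attribute to the data key can be sketched in a few lines of Python using the standard library; the helper name here is illustrative, not part of Label Studio:

```python
import xml.etree.ElementTree as ET

config = """
<View>
    <TimeSeries name="ts" value="$csv" valueType="url">
        <Channel column="first_column"/>
    </TimeSeries>
</View>
"""

def data_keys_from_config(config_xml):
    """Collect the data keys implied by value="$..." attributes in a labeling config."""
    root = ET.fromstring(config_xml)
    keys = []
    for tag in root.iter():
        value = tag.attrib.get("value", "")
        if value.startswith("$"):
            keys.append(value[1:])  # "$csv" -> "csv"
    return keys

print(data_keys_from_config(config))  # ['csv']
```

Each key this returns must appear in the data object of every imported task.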

Add results to the predictions array

The predictions array also depends on the labeling configuration. Some pre-annotation fields are only relevant for certain types of labeling. The following table describes the JSON objects and arrays that exist for all pre-annotations:

  • predictions (array) - Contains all pre-annotations for a specific task.
  • predictions.model_version (string) - Specifies the model version that produced the prediction.
  • predictions.result (array) - Contains all pre-annotated results for a specific task as JSON objects.
  • result.value (object) - Contains details for a specific labeled region.
  • result.id (string) - Arbitrary string used to identify a labeled region. Must match related regions, such as in OCR pre-annotations.
  • result.from_name (string) - References the labeling configuration from_name for the type of labeling being performed. Must match the labeling configuration.
  • result.to_name (string) - References the labeling configuration to_name for the type of labeling being performed. Must match the labeling configuration.
  • result.type (string) - Specifies the labeling tag for the type of labeling being performed. For example, a named entity recognition task has a type of labels.
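As an illustrative sketch (not a Label Studio API), a small helper can enforce these required keys when assembling predictions programmatically:

```python
def make_prediction(model_version, results):
    """Wrap a list of result dicts in the predictions structure Label Studio expects."""
    for r in results:
        # every result needs these keys, matched to the labeling configuration
        missing = {"id", "from_name", "to_name", "type", "value"} - r.keys()
        if missing:
            raise ValueError(f"result {r.get('id')} missing keys: {missing}")
    return {"model_version": model_version, "result": results}

task = {
    "data": {"text": "Example text"},
    "predictions": [
        make_prediction("v1", [{
            "id": "r1",
            "from_name": "label",
            "to_name": "text",
            "type": "labels",
            "value": {"start": 0, "end": 7, "labels": ["Product"]},
        }])
    ],
}
print(task["predictions"][0]["model_version"])  # v1
```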

Other types of annotation contain specific fields. You can review the examples on this page, or review the tag documentation for the Object and Control tags in your labeling configuration for labeling-specific result objects. For example, the Audio tag, HyperText tag, Paragraphs tag, KeyPointLabels and more all contain sample result JSON examples.

Note: If you’re generating pre-annotations for a custom ML backend, you can use the self.parsed_label_config variable to retrieve the labeling configuration for a project and generate pre-annotations. See the custom ML backend documentation for more details.

Import pre-annotations for images

For example, import predicted labels for tasks to determine whether an item in an image is an airplane or a car.

For image pre-annotations, Label Studio expects the x, y, width, and height of image annotations to be provided in percentages of overall image dimension. See Units for image annotations on this page for more about how to convert formats.

Use the following labeling configuration:

<View>
  <Choices name="choice" toName="image" showInLine="true">
    <Choice value="Boeing" background="blue"/>
    <Choice value="Airbus" background="green" />
  </Choices>

  <RectangleLabels name="label" toName="image">
    <Label value="Airplane" background="green"/>
    <Label value="Car" background="blue"/>
  </RectangleLabels>

  <Image name="image" value="$image"/>
</View>

Example JSON

After you set up an example project, create example tasks that match the following format.



Save this example JSON as a file to import it into Label Studio, for example, example_prediction_task.json.

[{
  "data": {
    "image": "http://localhost:8080/static/samples/sample.jpg" 
  },

  "predictions": [{
    "result": [
      {
        "id": "result1",
        "type": "rectanglelabels",        
        "from_name": "label", "to_name": "image",
        "original_width": 600, "original_height": 403,
        "image_rotation": 0,
        "value": {
          "rotation": 0,          
          "x": 4.98, "y": 12.82,
          "width": 32.52, "height": 44.91,
          "rectanglelabels": ["Airplane"]
        }
      },
      {
        "id": "result2",
        "type": "rectanglelabels",        
        "from_name": "label", "to_name": "image",
        "original_width": 600, "original_height": 403,
        "image_rotation": 0,
        "value": {
          "rotation": 0,          
          "x": 75.47, "y": 82.33,
          "width": 5.74, "height": 7.40,
          "rectanglelabels": ["Car"]
        }
      },
      {
        "id": "result3",
        "type": "choices",
        "from_name": "choice", "to_name": "image",
        "value": {
          "choices": ["Airbus"]
        }
      }
    ],
    "score": 0.95
  }]
}]

In this example there are three results inside one prediction, or pre-annotation:

  • result1 - the first bounding box
  • result2 - the second bounding box
  • result3 - the choice selection

The prediction score applies to the entire prediction.

Import pre-annotated tasks into Label Studio using the UI or using the API.

In the Label Studio UI, the imported prediction for this task looks like the following:

screenshot of the Label Studio UI showing an image of airplanes with bounding boxes covering each airplane.

Import pre-annotated regions for images

If you want to import images with pre-annotated regions without labels assigned to them, follow this example.

Use the following labeling configuration:

<View>
  <View style="display:flex;align-items:start;gap:8px;flex-direction:row">
    <Image name="image" value="$image" zoom="true" zoomControl="true" rotateControl="false"/>
    <Rectangle name="rect" toName="image" showInline="false"/>
  </View>
  <Ellipse name="ellipse" toName="image"/>
  <KeyPoint name="kp" toName="image"/>
  <Polygon name="polygon" toName="image"/>
  <Brush name="brush" toName="image"/>
  <Labels name="labels" toName="image" fillOpacity="0.5" strokeWidth="5">
    <Label value="Vehicle" background="green"/>
    <Label value="Building" background="blue"/>
    <Label value="Pavement" background="red"/>
  </Labels>
</View>

Example JSON

After you set up an example project, create example tasks that match the following format.



Save this example JSON as a file to import it into Label Studio, for example, example_prediction_task.json.

[{
    "id":8,
    "predictions":[
        {
            "id":10,
            "result":[
               {
                  "original_width":800,
                  "original_height":450,
                  "image_rotation":0,
                  "value":{
                     "x":55.46666666666667,
                     "y":2.3696682464454977,
                     "width":35.86666666666667,
                     "height":46.91943127962085,
                     "rotation":0
                  },
                  "id":"ABC",
                  "from_name":"rect",
                  "to_name":"image",
                  "type":"rectangle"
               },
               {
                  "original_width":800,
                  "original_height":450,
                  "image_rotation":0,
                  "value":{
                     "x":58.4,
                     "y":64.21800947867298,
                     "width":30.533333333333335,
                     "height":19.90521327014218,
                     "rotation":0
                  },
                  "id":"DEF",
                  "from_name":"rect",
                  "to_name":"image",
                  "type":"rectangle"
               },
               {
                  "original_width":800,
                  "original_height":450,
                  "image_rotation":0,
                  "value":{
                     "points":[
                        [
                           20.933333333333334,
                           28.90995260663507
                        ],
                        [
                           25.866666666666667,
                           64.69194312796209
                        ],
                        [
                           38.4,
                           62.796208530805686
                        ],
                        [
                           34.13333333333333,
                           27.488151658767773
                        ]
                    ]
                },
                "id":"GHI",
                "from_name":"polygon",
                "to_name":"image",
                "type":"polygon"
                },
                {
                "original_width":800,
                "original_height":450,
                "image_rotation":0,
                "value":{
                    "x":8.4,
                    "y":20.14218009478673,
                    "radiusX":4,
                    "radiusY":7.109004739336493,
                    "rotation":0
                    },
                "id":"JKL",
                "from_name":"ellipse",
                "to_name":"image",
                "type":"ellipse"
                }
            ],
            "task":8
        }
    ],
    "data":{
    "image":"/data/upload/31159626248_d0362d027c_c.jpg"
    },
    "project":4
}]

In this example there are four regions inside the result field of one prediction, or pre-annotation:

  • region ABC - a rectangle bounding box
  • region DEF - a second rectangle bounding box
  • region GHI - a polygonal segmentation
  • region JKL - an ellipse

None of the regions have labels applied. The labeling configuration must use the Rectangle tag instead of the RectangleLabels tag to support this type of prediction. Even though the labeling configuration for this example includes a Labels tag, the predictions do not need to specify labels for the regions.

Units of image annotations

The x, y, width, and height of image annotations are provided in percentages of the overall image dimensions.

Use the following conversion formulas for x, y, width, height:

pixel_x = x / 100.0 * original_width
pixel_y = y / 100.0 * original_height
pixel_width = width / 100.0 * original_width
pixel_height = height / 100.0 * original_height

For example:

task = {
    "annotations": [{
        "result": [
            {
                "...": "...",

                "original_width": 600,
                "original_height": 403,
                "image_rotation": 0,

                "value": {
                    "x": 5.33,
                    "y": 23.57,
                    "width": 29.16,
                    "height": 31.26,
                    "rotation": 0,
                    "rectanglelabels": [
                        "Airplane"
                    ]
                }
            }
        ]
    }]
}

# convert from LS percent units to pixels 
def convert_from_ls(result):
    if 'original_width' not in result or 'original_height' not in result:
        return None

    value = result['value']
    w, h = result['original_width'], result['original_height']

    if all([key in value for key in ['x', 'y', 'width', 'height']]):
        return w * value['x'] / 100.0, \
               h * value['y'] / 100.0, \
               w * value['width'] / 100.0, \
               h * value['height'] / 100.0

# convert from pixels to LS percent units 
def convert_to_ls(x, y, width, height, original_width, original_height):
    return x / original_width * 100.0, y / original_height * 100.0, \
           width / original_width * 100.0, height / original_height * 100.0


# convert from LS
output = convert_from_ls(task['annotations'][0]['result'][0])
if output is None:
    raise Exception('Wrong convert') 
pixel_x, pixel_y, pixel_width, pixel_height = output
print(pixel_x, pixel_y, pixel_width, pixel_height)

# convert back to LS 
x, y, width, height = convert_to_ls(pixel_x, pixel_y, pixel_width, pixel_height, 600, 403)
print(x, y, width, height)

Import pre-annotations for text

In this example, import pre-annotations for text using the named entity recognition template:

<View>
  <Labels name="label" toName="text">
    <Label value="Person"></Label>
    <Label value="Organization"></Label>
    <Label value="Fact"></Label>
    <Label value="Money"></Label>
    <Label value="Date"></Label>
    <Label value="Time"></Label>
    <Label value="Ordinal"></Label>
    <Label value="Percent"></Label>
    <Label value="Product"></Label>
    <Label value="Language"></Label>
    <Label value="Location"></Label>
  </Labels>
  <Text name="text" value="$text"></Text>
</View>

Example JSON

This example JSON file contains two tasks, each with two sets of pre-annotations from different models. The first task also contains prediction scores for each NER span.



Save this example JSON as a file, for example: example_preannotated_ner_tasks.json.

[
  {
    "data": {
      "text": "All that changed when he was 27 and he came to Jerusalem. It was the weekend of both Easter and Passover, and the city was flooded with tourists."
    },
    "predictions": [
      {
        "model_version": "one",
        "result": [
          {
            "id": "abc",
            "from_name": "label",
            "to_name": "text",
            "type": "labels",
            "value": {
              "start": 29,
              "end": 31,
              "score": 0.70,
              "text": "27",
              "labels": [
                "Date"
              ]
            }
          },
          {
            "id": "def",
            "from_name": "label",
            "to_name": "text",
            "type": "labels",
            "value": {
              "start": 47,
              "end": 56,
              "score": 0.65,
              "text": "Jerusalem",
              "labels": [
                "Location"
              ]
            }
          },
          {
            "id": "ghi",
            "from_name": "label",
            "to_name": "text",
            "type": "labels",
            "value": {
              "start": 65,
              "end": 76,
              "score": 0.95,
              "text": "the weekend",
              "labels": [
                "Date"
              ]
            }
          },
          {
            "id": "jkl",
            "from_name": "label",
            "to_name": "text",
            "type": "labels",
            "value": {
              "start": 85,
              "end": 91,
              "score": 0.50,
              "text": "Easter",
              "labels": [
                "Date"
              ]
            }
          }
        ]
      },
      {
        "model_version": "two",
        "result": [
          {
            "id": "mno",
            "from_name": "label",
            "to_name": "text",
            "type": "labels",
            "value": {
              "start": 29,
              "end": 31,
              "score": 0.55,
              "text": "27",
              "labels": [
                "Date"
              ]
            }
          },
          {
            "id": "pqr",
            "from_name": "label",
            "to_name": "text",
            "type": "labels",
            "value": {
              "start": 47,
              "end": 56,
              "score": 0.40,
              "text": "Jerusalem",
              "labels": [
                "Location"
              ]
            }
          },
          {
            "id": "stu",
            "from_name": "label",
            "to_name": "text",
            "type": "labels",
            "value": {
              "start": 65,
              "end": 76,
              "score": 0.32,
              "text": "the weekend",
              "labels": [
                "Time"
              ]
            }
          },
          {
            "id": "vwx",
            "from_name": "label",
            "to_name": "text",
            "type": "labels",
            "value": {
              "start": 85,
              "end": 91,
              "score": 0.22,
              "text": "Easter",
              "labels": [
                "Location"
              ]
            }
          },
          {
            "id": "yz1",
            "from_name": "label",
            "to_name": "text",
            "type": "labels",
            "value": {
              "start": 96,
              "end": 104,
              "score": 0.96,
              "text": "Passover",
              "labels": [
                "Date"
              ]
            }
          }
        ]
      }
    ]
  },
  {
    "data": {
      "text": " Each journal was several inches thick and bound in leather. On one page are drawn portraits of Sunny in a flowery, Easter dress and sun hat. On another page are hundreds of sketches of leaves that Niyati saw in her yard."
    },
    "predictions": [
      {
        "model_version": "one",
        "result": [
          {
            "id": "ab2",
            "from_name": "label",
            "to_name": "text",
            "type": "labels",
            "value": {
              "start": 18,
              "end": 32,
              "text": "several inches",
              "labels": [
                "Product"
              ]
            }
          },
          {
            "id": "cd3",
            "from_name": "label",
            "to_name": "text",
            "type": "labels",
            "value": {
              "start": 64,
              "end": 67,
              "text": "one",
              "labels": [
                "Percent"
              ]
            }
          },
          {
            "id": "ef4",
            "from_name": "label",
            "to_name": "text",
            "type": "labels",
            "value": {
              "start": 96,
              "end": 101,
              "text": "Sunny",
              "labels": [
                "Person"
              ]
            }
          },
          {
            "id": "gh5",
            "from_name": "label",
            "to_name": "text",
            "type": "labels",
            "value": {
              "start": 162,
              "end": 170,
              "text": "hundreds",
              "labels": [
                "Percent"
              ]
            }
          },
          {
            "id": "ij6",
            "from_name": "label",
            "to_name": "text",
            "type": "labels",
            "value": {
              "start": 198,
              "end": 204,
              "text": "Niyati",
              "labels": [
                "Person"
              ]
            }
          }
        ]
      },
      {
        "model_version": "two",
        "result": [
          {
            "id": "kl7",
            "from_name": "label",
            "to_name": "text",
            "type": "labels",
            "value": {
              "start": 18,
              "end": 32,
              "text": "several inches",
              "labels": [
                "Fact"
              ]
            }
          },
          {
            "id": "mn8",
            "from_name": "label",
            "to_name": "text",
            "type": "labels",
            "value": {
              "start": 64,
              "end": 67,
              "text": "one",
              "labels": [
                "Percent"
              ]
            }
          },
          {
            "id": "op9",
            "from_name": "label",
            "to_name": "text",
            "type": "labels",
            "value": {
              "start": 96,
              "end": 101,
              "text": "Sunny",
              "labels": [
                "Time"
              ]
            }
          },
          {
            "id": "qr10",
            "from_name": "label",
            "to_name": "text",
            "type": "labels",
            "value": {
              "start": 116,
              "end": 122,
              "text": "Easter",
              "labels": [
                "Location"
              ]
            }
          },
          {
            "id": "st11",
            "from_name": "label",
            "to_name": "text",
            "type": "labels",
            "value": {
              "start": 162,
              "end": 170,
              "text": "hundreds",
              "labels": [
                "Money"
              ]
            }
          },
          {
            "id": "uv12",
            "from_name": "label",
            "to_name": "text",
            "type": "labels",
            "value": {
              "start": 198,
              "end": 204,
              "text": "Niyati",
              "labels": [
                "Person"
              ]
            }
          }
        ]
      }
    ]
  }
]

Import pre-annotated tasks into Label Studio using the UI or using the API.

In the Label Studio UI, the imported prediction for the first task looks like the following:

screenshot of the Label Studio UI showing the text with highlighted text labels and prediction scores visible.

You can sort the prediction scores for each labeled region using the Regions pane options.
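Converting model output into this span format can be sketched as follows; the (start, end, text, label, score) tuples and the helper name are assumptions for illustration, not part of any library:

```python
def spans_to_results(spans, from_name="label", to_name="text"):
    """Convert (start, end, text, label, score) tuples from a hypothetical
    NER model into Label Studio result dicts for the Labels tag."""
    results = []
    for i, (start, end, text, label, score) in enumerate(spans):
        results.append({
            "id": f"span{i}",
            "from_name": from_name,  # must match the Labels tag name
            "to_name": to_name,      # must match the Text tag name
            "type": "labels",
            "value": {"start": start, "end": end, "score": score,
                      "text": text, "labels": [label]},
        })
    return results

spans = [(47, 56, "Jerusalem", "Location", 0.65)]
print(spans_to_results(spans)[0]["value"]["labels"])  # ['Location']
```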

Import brush segmentation pre-annotations in RLE format

If you want to import pre-annotations for brush mask image segmentation using the BrushLabels tag, you must convert the masks to RLE format first. The Label Studio Converter package has some helper functions for this. See the following for common conversion cases and guidance.

Install Label Studio Converter:

pip install -U label-studio-converter

Import brush tools:

from label_studio_converter import brush

For more assistance, review this example code that creates a Label Studio task with pre-annotations for brush labels.
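As a rough sketch of the result structure only: the tag names ("tag", "image") and the helper below are assumptions, and the RLE values are stand-ins that you would generate with the converter's brush helpers (such as brush.mask2rle) from a real mask:

```python
def brush_result(rle, label, width, height, from_name="tag", to_name="image"):
    """Assemble a BrushLabels result. `rle` is the run-length-encoded mask."""
    return {
        "id": "mask1",
        "from_name": from_name,   # must match the BrushLabels tag name
        "to_name": to_name,       # must match the Image tag name
        "type": "brushlabels",
        "original_width": width,
        "original_height": height,
        "image_rotation": 0,
        "value": {"format": "rle", "rle": rle, "brushlabels": [label]},
    }

# the rle list here is a placeholder, not a real encoded mask
result = brush_result(rle=[0, 1, 2], label="Vehicle", width=800, height=450)
print(result["value"]["format"])  # rle
```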

Import OCR pre-annotations

Import pre-annotations for optical character recognition (OCR), such as output from Tesseract, as shown in this example blog post.

In this example, import pre-annotations for OCR tasks using the OCR template:

<View>
  <Image name="image" value="$ocr"/>
  <Labels name="label" toName="image">
    <Label value="Text" background="green"/>
    <Label value="Handwriting" background="blue"/>
  </Labels>
  <Rectangle name="bbox" toName="image" strokeWidth="3"/>
  <Polygon name="poly" toName="image" strokeWidth="3"/>
  <TextArea name="transcription" toName="image" editable="true" perRegion="true" required="true" maxSubmissions="1" rows="5" placeholder="Recognized Text" displayMode="region-list"/>
</View>

Example JSON

This example JSON contains one task with three result dictionaries, one for each type of tag in the labeling configuration: Rectangle, Labels, and TextArea.



Save this example JSON as a file to import it into Label Studio, for example, example_prediction_task.json.

{
   "data": {
      "ocr": "/data/upload/receipt_00523.png"
   },
   "predictions": [
      {
         "model_version": "best_ocr_model_1_final",
         "result": [
            {
               "original_width": 864,
               "original_height": 1296,
               "image_rotation": 0,
               "value": {
                  "x": 48.93333333333333,
                  "y": 61.333333333333336,
                  "width": 9.733333333333333,
                  "height": 2.8444444444444446,
                  "rotation": 0
               },
               "id": "bb1",
               "from_name": "bbox",
               "to_name": "image",
               "type": "rectangle"
            },
            {
               "original_width": 864,
               "original_height": 1296,
               "image_rotation": 0,
               "value": {
                  "x": 48.93333333333333,
                  "y": 61.333333333333336,
                  "width": 9.733333333333333,
                  "height": 2.8444444444444446,
                  "rotation": 0,
                  "labels": [
                     "Text"
                  ]
               },
               "id": "bb1",
               "from_name": "label",
               "to_name": "image",
               "type": "labels"
            },
            {
               "original_width": 864,
               "original_height": 1296,
               "image_rotation": 0,
               "value": {
                  "x": 48.93333333333333,
                  "y": 61.333333333333336,
                  "width": 9.733333333333333,
                  "height": 2.8444444444444446,
                  "rotation": 0,
                  "text": [
                     "TOTAL"
                  ]
               },
               "id": "bb1",
               "from_name": "transcription",
               "to_name": "image",
               "type": "textarea"
            }
         ],
         "score": 0.89
      }
   ]
}

This example JSON also includes a prediction score for the task. The same ID ("bb1") links the rectangle result to the label assigned to the region and to the text area transcription for the region.

The image data in this example task references an uploaded file, identified by the source_filename assigned by Label Studio after uploading the image. The best way to reference image data is using presigned URLs for images stored in cloud storage, or absolute paths to image data stored in local storage and added to Label Studio by syncing storage.
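The shared-ID pattern can be sketched as a small helper (ours, not part of Label Studio) that expands one OCR detection into the three linked results; the tag names bbox, label, and transcription match the example labeling configuration above:

```python
def ocr_results(region_id, bbox, label, text, width, height):
    """Expand one OCR detection into three results (rectangle, labels,
    textarea) that share the same region id so they link in the UI."""
    base = {
        "id": region_id,
        "to_name": "image",
        "original_width": width,
        "original_height": height,
        "image_rotation": 0,
    }
    geometry = dict(bbox, rotation=0)  # x, y, width, height in percent
    return [
        {**base, "from_name": "bbox", "type": "rectangle",
         "value": dict(geometry)},
        {**base, "from_name": "label", "type": "labels",
         "value": {**geometry, "labels": [label]}},
        {**base, "from_name": "transcription", "type": "textarea",
         "value": {**geometry, "text": [text]}},
    ]

results = ocr_results("bb2", {"x": 10.0, "y": 20.0, "width": 8.0, "height": 3.0},
                      "Text", "SUBTOTAL", 864, 1296)
print([r["type"] for r in results])  # ['rectangle', 'labels', 'textarea']
```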

Import pre-annotated tasks into Label Studio using the UI or using the API.

Troubleshoot pre-annotations

If annotators can’t see predictions or if you encounter unexpected behavior after you import pre-annotations into Label Studio, review this guidance to resolve the issues.

Make sure the predictions are visible to annotators

In the Settings > Machine Learning section for your project, make sure that the option to show predictions to annotators is enabled.

Check the configuration values of your labeling configuration and tasks

The from_name of the pre-annotation task JSON must match the value of the name in the <Labels name="label" toName="text"> portion of the labeling configuration. The to_name must match the toName value.

In the text example on this page, the JSON includes "from_name": "label" to correspond with the <Labels name="label" and "to_name": "text" to correspond with the toName="text" portions of the labeling configuration. The default template might contain <Labels name="ner" toName="text">. To work with this example JSON, you need to update the values to match.

In the image example on this page, the XML includes

...
<Choices name="choice" toName="image" showInLine="true">`
...
<RectangleLabels name="label" toName="image">
...

To correspond with the following portions of the example JSON:

...
"type": "rectanglelabels",        
"from_name": "label", "to_name": "image",
...
type": "choices",
"from_name": "choice", "to_name": "image",
...
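One way to catch mismatches before importing is a quick check against the configuration, sketched here with the standard library (the helper is illustrative, not a Label Studio API):

```python
import xml.etree.ElementTree as ET

def check_from_names(config_xml, results):
    """Flag results whose from_name/to_name pair doesn't appear as a
    name/toName pair in the labeling configuration."""
    root = ET.fromstring(config_xml)
    pairs = {(t.attrib["name"], t.attrib["toName"])
             for t in root.iter() if "toName" in t.attrib}
    return [r["from_name"] for r in results
            if (r["from_name"], r["to_name"]) not in pairs]

config = """<View>
  <Choices name="choice" toName="image"/>
  <RectangleLabels name="label" toName="image"/>
  <Image name="image" value="$image"/>
</View>"""

results = [
    {"from_name": "label", "to_name": "image", "type": "rectanglelabels"},
    {"from_name": "ner", "to_name": "image", "type": "labels"},  # no such tag
]
print(check_from_names(config, results))  # ['ner']
```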

Check the labels in your configuration and your tasks

Make sure that you have a labeling configuration set up for the labeling interface, and that the labels in your JSON file exactly match the labels in your configuration. If you’re using a tool to transform your model output, make sure that the labels aren’t altered by the tool.

Check the IDs and toName values

If you’re performing nested labeling, such as displaying a TextArea tag for specific Label or Choice values, the IDs for those results must match.

For example, if you want to transcribe text alongside a named entity recognition task, you might have the following labeling configuration:

<View>
  <Labels name="label" toName="text">
    <Label value="PER" background="red"/>
    <Label value="ORG" background="darkorange"/>
    <Label value="LOC" background="orange"/>
    <Label value="MISC" background="green"/>
  </Labels>
  <Text name="text" value="$text"/>
  <TextArea name="entity" toName="text" perRegion="true"/>
</View>

If you wanted to add predicted text and suggested transcriptions for this labeling configuration, you might use the following example JSON.

{
"data":{
         "text":"The world that we live in is a broad expanse of nothingness, said the existential philosopher, before he rode away with his cat on his motorbike. "
      },
   "predictions":[
      {
            "result":[
               {
                  "value":{
                     "start":135,
                     "end":144,
                     "text":"motorbike",
                     "labels":[
                        "ORG"
                     ]
                  },
                  "id":"def",
                  "from_name":"ner",
                  "to_name":"text",
                  "type":"labels"
               },
               {
                  "value":{
                     "start":135,
                     "end":144,
                     "text":[
                        "yay"
                     ]
                  },
                  "id":"def",
                  "from_name":"entity",
                  "to_name":"text",
                  "type":"textarea"
               }
            ]
      }
   ]
}

Because the TextArea tag applies to each labeled region, the IDs for the label results and the textarea results must match.
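A minimal sketch of checking that pairing before import (the helper name is ours):

```python
def unmatched_textarea_ids(results):
    """For per-region TextArea predictions, every textarea result must reuse
    the id of the labeled region it attaches to; return the orphans."""
    label_ids = {r["id"] for r in results if r["type"] == "labels"}
    return [r["id"] for r in results
            if r["type"] == "textarea" and r["id"] not in label_ids]

results = [
    {"id": "def", "type": "labels"},
    {"id": "def", "type": "textarea"},  # matches -> renders per-region
    {"id": "xyz", "type": "textarea"},  # orphan -> won't attach anywhere
]
print(unmatched_textarea_ids(results))  # ['xyz']
```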

Read-only and hidden regions

In some situations, it's helpful to make regions (bounding boxes, text spans, audio segments, and so on) read-only or hidden. To do this, add "readonly": true or "hidden": true to the region dictionaries inside the annotations.result list.
In some situations it’s very helpful to hide or to make read-only bounding boxes, text spans, audio segments, etc. You can put "readonly": true or "hidden": true in regions to achieve this (the dict inside of annotations.result list).