Data format

There are two possible ways to import data to your labeling project:

The --input-path argument points to a file or a directory where your labeling tasks reside. By default it expects JSON-formatted tasks,
but you can specify any of the other formats listed below with the --input-format option.

Import data

JSON file

label-studio init --input-path=my_tasks.json

The most common format for input tasks is a single JSON-formatted file, such as tasks.json, containing a list of objects like:

[
  {"my_key": "my_value_1"},
  {"my_key": "my_value_2"}
]

"my_key" is the name referenced by the value attribute of an object tag in the label config, e.g. <Text value="$my_key"/>. Any other fields in a task are optional.
Depending on the object tag type, field values are interpreted differently.
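
If your values already live in a Python list, a task file of the shape shown above can be generated with the standard json module. This is a minimal sketch: the helper name write_tasks and the default key "my_key" are illustrative and not part of Label Studio.

```python
import json


def write_tasks(values, path, key="my_key"):
    """Write one task object per value into a single JSON list file.

    `key` must match the name used in the label config,
    e.g. <Text value="$my_key"/>.
    """
    tasks = [{key: value} for value in values]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(tasks, f, ensure_ascii=False, indent=2)
    return tasks
```

For example, write_tasks(["my_value_1", "my_value_2"], "my_tasks.json") produces a file that can be passed to --input-path.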

Predefined completions and predictions

If you want to import predefined completions and/or predictions for labeling (e.g. after exporting them from another Label Studio project in JSON format),
use the following high-level task structure:

{
  "data": {"my_key": "my_value_1"},
  "completions": [...],
  "predictions": [...]
}

where "completions" and "predictions" are taken from the raw completion format.
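
As an illustration, the sketch below assembles such a task with one bounding-box prediction. The field names follow the raw completion example in the Completion format section; the helper names make_task and rectangle_prediction are hypothetical, and whether fields omitted here (such as "id" or "created_ago") are required on import is not stated in this document.

```python
import json


def rectangle_prediction(label, x, y, width, height, model_version,
                         from_name="tag", to_name="img"):
    """Build one prediction holding a single rectangle result.

    `from_name` and `to_name` are assumed to match the control and
    object tag names in your label config ("tag" and "img" are taken
    from the completion example in this document).
    """
    return {
        "model_version": model_version,
        "result": [{
            "from_name": from_name,
            "to_name": to_name,
            "type": "rectanglelabels",
            "value": {
                "x": x, "y": y, "width": width, "height": height,
                "rotation": 0,
                "rectanglelabels": [label],
            },
        }],
    }


def make_task(data, completions=None, predictions=None):
    """Wrap input data and optional completions/predictions into a task."""
    task = {"data": data}
    if completions:
        task["completions"] = completions
    if predictions:
        task["predictions"] = predictions
    return task
```

A list of such tasks can then be written with json.dump, in the same way as the plain task list.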

Directory with JSON files

label-studio init --input-path=dir/with/json/files --input-format=json-dir

Instead of putting all tasks into one file, you can split your input data into several JSON files and specify the directory path. Each JSON file is formatted the same as above.
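
One way to produce such a directory from a long task list is to write fixed-size chunks, as sketched below. The function name split_tasks and the tasks_0000.json naming pattern are assumptions made for illustration; this document does not prescribe file names.

```python
import json
from pathlib import Path


def split_tasks(tasks, out_dir, chunk_size):
    """Write `tasks` into several JSON list files inside `out_dir`."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for start in range(0, len(tasks), chunk_size):
        # Hypothetical naming scheme: tasks_0000.json, tasks_0001.json, ...
        path = out / f"tasks_{start // chunk_size:04d}.json"
        chunk = tasks[start:start + chunk_size]
        path.write_text(json.dumps(chunk, indent=2), encoding="utf-8")
        paths.append(path)
    return paths
```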

CSV, TSV

When a CSV/TSV-formatted text file is used, column names are interpreted in the same manner as keys in a JSON-formatted file, i.e.

my_key
my_value_1
my_value_2

Here "my_key" is the name referenced by the value attribute of an object tag in the label config, e.g. <Text value="$my_key"/>.
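
A file like this can be written with Python's csv module, which also takes care of quoting values that contain the delimiter. The sketch below is illustrative; write_tabular_tasks is a hypothetical helper, and every row is assumed to have the same keys.

```python
import csv


def write_tabular_tasks(rows, path, delimiter=","):
    """Write a list of dicts as CSV (or TSV with delimiter="\\t").

    The header row is taken from the keys of the first dict, so the
    column names become the task keys.
    """
    fieldnames = list(rows[0].keys())
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter=delimiter)
        writer.writeheader()
        writer.writerows(rows)
```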

Plain text file

label-studio init --input-path=my_tasks.txt --input-format=text

In a common scenario, you may use only one input data stream (in other words, only one object tag is specified in the label config). In this case, you don’t need to use the JSON format; simply write your values in a plain text file, line by line, e.g.

my_value_1
my_value_2
...

Values are interpreted in the same way as in the single JSON file case.
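
Because each line becomes one value, a value that itself contains a line break would be split into several tasks. The sketch below writes such a file and rejects multi-line values up front; write_text_tasks is a hypothetical helper name.

```python
def write_text_tasks(values, path):
    """Write one value per line, refusing values that span several lines."""
    for value in values:
        if "\n" in value or "\r" in value:
            raise ValueError(f"value contains a line break: {value!r}")
    with open(path, "w", encoding="utf-8") as f:
        for value in values:
            f.write(value + "\n")
```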

Directory with text files

label-studio init --input-path=dir/with/text/files --input-format=text-dir

You can split your input data into several plain text files, and specify the directory path. Each plain text file is formatted the same as above.

Directory with image files

label-studio init --input-path=dir/with/images --input-format=image-dir

You can point to a local directory, which is scanned recursively for image files. Each file is used to create one task.

Supported formats are .png, .jpg, .jpeg, .tiff, .bmp, .gif
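
To preview which files in a directory tree have one of these extensions before initializing a project, you can walk it with pathlib. This is only a rough check: the case-insensitive comparison is an assumption, and Label Studio's own scan may behave differently.

```python
from pathlib import Path

# Extensions copied from the list above.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".gif"}


def find_files(root, extensions):
    """Return all files under `root` (recursively) with a matching suffix."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in extensions  # assumption: ignore case
    )
```

len(find_files("dir/with/images", IMAGE_EXTENSIONS)) then gives the number of matching files, which you can compare against the number of tasks you expect.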

Directory with audio files

label-studio init --input-path=my/audios/dir --input-format=audio-dir

You can point to a local directory, which is scanned recursively for audio files. Each file is used to create one task.

Supported formats are .wav, .aiff, .mp3, .au, .flac

Export data

Your annotation results are stored in raw completion format inside my_project_name/completions directory, one file per labeled task named as task_id.json.
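
Since each labeled task is a separate JSON file, the raw results can be read back with a few lines of Python. The sketch below assumes the layout described above (a completions directory inside the project directory); load_completions is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_completions(project_dir):
    """Load every <task_id>.json file from <project_dir>/completions.

    Returns a dict mapping the file stem (the task id as a string)
    to the parsed JSON content.
    """
    completions_dir = Path(project_dir) / "completions"
    tasks = {}
    for path in sorted(completions_dir.glob("*.json")):
        with path.open(encoding="utf-8") as f:
            tasks[path.stem] = json.load(f)
    return tasks
```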

You can optionally convert and export raw completions to a more common format.

Several export formats are supported:

JSON

List of items in raw completion format, stored in a JSON file

JSON_MIN

List of items where only the "from_name" and "to_name" values from the raw completion format are kept:

{
  "image": "https://htx-misc.s3.amazonaws.com/opensource/label-studio/examples/images/nick-owuor-astro-nic-visuals-wDifg5xc9Z4-unsplash.jpg",
  "tag": [{
    "height": 10.458911419423693,
    "rectanglelabels": [
        "Moonwalker"
    ],
    "rotation": 0,
    "width": 12.4,
    "x": 50.8,
    "y": 5.869797225186766
  }]
}
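
The relationship between the raw format and this reduced form can be sketched as follows. This is not Label Studio's actual converter; it only reproduces the shape of the example above (input data fields plus one list of values per "from_name"), and merging the results of several completions into one list is an assumption.

```python
def to_json_min(task):
    """Reduce a raw completion task to a flat item (illustrative only)."""
    item = dict(task.get("data", {}))  # keep the input fields, e.g. "image"
    for completion in task.get("completions", []):
        for result in completion.get("result", []):
            # Group the tag-specific values under the control tag name.
            item.setdefault(result["from_name"], []).append(result["value"])
    return item
```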

CSV

Results are stored in a comma-separated tabular file with column names specified by the "from_name" and "to_name" values

TSV

Results are stored in a tab-separated tabular file with column names specified by the "from_name" and "to_name" values

CONLL2003

Popular format used for CoNLL-2003 named entity recognition challenge

COCO

Popular machine learning format used by COCO dataset for object detection and image segmentation tasks

Pascal VOC XML

Popular XML-formatted task data used for object detection and image segmentation tasks

Completion format

The output data is stored in completions: JSON-formatted files, one per completed task, saved in the completions folder of the project directory or in the location given by the "output_dir" option. An example completion has the following structure:

{
    "completions": [
        {
            "id": "1001",
            "lead_time": 15.053,
            "result": [
                {
                    "from_name": "tag",
                    "id": "Dx_aB91ISN",
                    "source": "$image",
                    "to_name": "img",
                    "type": "rectanglelabels",
                    "value": {
                        "height": 10.458911419423693,
                        "rectanglelabels": [
                            "Moonwalker"
                        ],
                        "rotation": 0,
                        "width": 12.4,
                        "x": 50.8,
                        "y": 5.869797225186766
                    }
                }
            ]
        }
    ],
    "data": {
        "image": "https://htx-misc.s3.amazonaws.com/opensource/label-studio/examples/images/nick-owuor-astro-nic-visuals-wDifg5xc9Z4-unsplash.jpg"
    },
    "id": 1,
    "predictions": [
        {
            "created_ago": "3 hours",
            "model_version": "model 1",
            "result": [
                {
                    "from_name": "tag",
                    "id": "t5sp3TyXPo",
                    "source": "$image",
                    "to_name": "img",
                    "type": "rectanglelabels",
                    "value": {
                        "height": 11.612284069097889,
                        "rectanglelabels": [
                            "Moonwalker"
                        ],
                        "rotation": 0,
                        "width": 39.6,
                        "x": 13.2,
                        "y": 34.702495201535505
                    }
                }
            ]
        },
        {
            "created_ago": "4 hours",
            "model_version": "model 2",
            "result": [
                {
                    "from_name": "tag",
                    "id": "t5sp3TyXPo",
                    "source": "$image",
                    "to_name": "img",
                    "type": "rectanglelabels",
                    "value": {
                        "height": 33.61228406909789,
                        "rectanglelabels": [
                            "Moonwalker"
                        ],
                        "rotation": 0,
                        "width": 39.6,
                        "x": 13.2,
                        "y": 54.702495201535505
                    }
                }
            ]
        }
    ]
}

completions

The list of labeling results for one task is stored here.

id

Unique completion identifier

lead_time

Time in seconds spent to create this completion

result

Completion result data

id

Unique completion result identifier

from_name

Name of the tag that was used to label the region (control tags)

to_name

Name of the object tag that provided the region to be labeled (object tags)

type

Type of the labeling/tag

value

Tag specific value that includes the labeling result details. The exact structure of value depends on the chosen labeling tag.
Explore each tag for more details.
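
To work with these fields programmatically, the nested structure can be flattened into one record per result. The sketch below relies only on the fields described above; iter_results is a hypothetical helper name, and the output keys "task_id" and "completion_id" are chosen here for illustration.

```python
def iter_results(task):
    """Yield one flat record per result in every completion of a task."""
    for completion in task.get("completions", []):
        for result in completion.get("result", []):
            yield {
                "task_id": task.get("id"),
                "completion_id": completion.get("id"),
                "lead_time": completion.get("lead_time"),
                "from_name": result.get("from_name"),
                "to_name": result.get("to_name"),
                "type": result.get("type"),
                "value": result.get("value"),
            }
```

For the example task above, list(iter_results(task)) contains a single record whose "type" is "rectanglelabels" and whose "value" holds the rectangle coordinates and label.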

data

Data copied from input task

id

Task identifier

predictions

Machine learning predictions (aka pre-labeling results)
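
When a task carries predictions from several models, as in the example above, it can help to group the predicted values by "model_version". This is a minimal sketch; predictions_by_model is a hypothetical helper name.

```python
def predictions_by_model(task):
    """Map each model_version to the list of values it predicted."""
    grouped = {}
    for prediction in task.get("predictions", []):
        values = [r["value"] for r in prediction.get("result", [])]
        grouped.setdefault(prediction.get("model_version"), []).extend(values)
    return grouped
```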