Get data into Label Studio

Get data into Label Studio by importing files, referencing URLs, or syncing with cloud or database storage.

Types of data you can import into Label Studio

You can import many different types of data, including text, time series, audio, and image data. The supported file types depend on the type of data.

Data type        Supported file types
Audio            .aiff, .au, .flac, .m4a, .mp3, .ogg, .wav
HTML             .html, .htm, .xml
Images           .bmp, .gif, .jpg, .png, .svg, .webp
Structured data  .csv, .tsv, .json
Text             .txt
Time series      .csv, .tsv

If you don’t see a supported data or file type that you want to import, reach out in the Label Studio Slack community.

How to format your data to import it

Label Studio treats different file types differently.

If you want to import multiple types of data to label at the same time, for example, images with captions or audio recordings with transcripts, you must use the basic Label Studio JSON format.

If you need to import thousands of files, you can use a CSV file or a JSON list of tasks that points to URLs with the data instead of importing the data directly. You can import files containing up to 250,000 tasks or up to 50MB in size into Label Studio.
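As a rough sketch of the URL-based approach, the Python below wraps a list of file URLs in a JSON task list and checks the result against the limits stated above. The `image` data key, the example.com URLs, and both helper names are hypothetical; use the key that your labeling config expects. Treating 50MB as 50 * 1024 * 1024 bytes is also an assumption.

```python
import json

MAX_TASKS = 250_000            # task limit stated in the text
MAX_BYTES = 50 * 1024 * 1024   # 50MB, assumed here to mean 50 MiB


def build_url_tasks(urls, data_key="image"):
    """Wrap each URL in a task whose "data" dict uses the given key."""
    return [{"data": {data_key: url}} for url in urls]


def fits_import_limits(tasks):
    """Return True if the serialized task list stays within both limits."""
    payload = json.dumps(tasks).encode("utf-8")
    return len(tasks) <= MAX_TASKS and len(payload) <= MAX_BYTES


# Hypothetical URLs, used only for illustration.
urls = [f"https://example.com/images/{i}.jpg" for i in range(3)]
tasks = build_url_tasks(urls)
print(json.dumps(tasks, indent=2))
print(fits_import_limits(tasks))
```

If the check fails, one option is to split the task list across several files and import them separately.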

If you’re specifying data in a cloud storage bucket or container, and you don’t want to sync cloud storage, create and specify presigned URLs for Amazon S3 storage, signed URLs for Google Cloud Storage, or shared access signatures for Microsoft Azure in a JSON, CSV, or TXT file.

Basic Label Studio JSON format

One way to import data into Label Studio is using a JSON-formatted list of tasks. The data key of the JSON file references each task as an entry in a JSON dictionary. If there is no data key, Label Studio interprets the entire JSON file as one task.

In the data JSON dictionary, use key-value pairs that correspond to the source key expected by the object tag in the label config that you set up for your dataset.

Depending on the type of object tag, Label Studio interprets field values differently:

You can add other, optional keys to the JSON file.

JSON key     Description
id           Optional. Integer to use as the task ID.
annotations  Optional. List of annotations exported from Label Studio. Label Studio’s annotation format allows you to import annotation results in order to use them in subsequent labeling tasks.
predictions  Optional. List of model prediction results, where each result is saved using Label Studio’s prediction format. Import predictions for automatic task pre-labeling and active learning. See Import predicted labels into Label Studio.

Example JSON format

For an example text classification project, you can set up a label config like the following:

<View>
  <Text name="message" value="$my_text"/>
  <Choices name="sentiment_class" toName="message">
    <Choice value="Positive"/>
    <Choice value="Neutral"/>
    <Choice value="Negative"/>
  </Choices>
</View>

You can then import text tasks to label that match the following JSON format:

[{
  # "data" must contain the "my_text" field defined in the text labeling config as the value and can optionally include other fields
  "data": {
    "my_text": "Opossums are great",
    "ref_id": 456,
    "meta_info": {
      "timestamp": "2020-03-09 18:15:28.212882",
      "location": "North Pole"
    } 
  },

  # annotations are not required and are the list of annotation results matching the labeling config schema
  "annotations": [{
    "result": [{
      "from_name": "sentiment_class",
      "to_name": "message",
      "type": "choices",
      "value": {
        "choices": ["Positive"]
      }
    }]
  }],

  # "predictions" are pretty similar to "annotations" 
  # except that they also include some ML-related fields like a prediction "score"
  "predictions": [{
    "result": [{
      "from_name": "sentiment_class",
      "to_name": "message",
      "type": "choices",
      "value": {
        "choices": ["Neutral"]
      }
    }],
    # score is used for active learning sampling mode
    "score": 0.95
  }]
}]
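The `#` comments in the example are explanatory only; strict JSON has no comment syntax, so a file built this way would need them removed. The sketch below builds a similar task in Python, serializes it as plain JSON, and runs a simple consistency check against the label config shown earlier. The `check_task` helper is hypothetical and is not part of Label Studio. It assumes that data keys appear in the config as `value="$key"`, as in the `Text` tag above.

```python
import json
import xml.etree.ElementTree as ET

LABEL_CONFIG = """<View>
  <Text name="message" value="$my_text"/>
  <Choices name="sentiment_class" toName="message">
    <Choice value="Positive"/>
    <Choice value="Neutral"/>
    <Choice value="Negative"/>
  </Choices>
</View>"""


def choice_result(label):
    """Build one result entry for the sentiment_class choices tag."""
    return {"from_name": "sentiment_class", "to_name": "message",
            "type": "choices", "value": {"choices": [label]}}


def check_task(task, config_xml):
    """Return a list of mismatches between a task and the label config."""
    root = ET.fromstring(config_xml)
    names = {el.get("name") for el in root.iter() if el.get("name")}
    # Assumption: data keys are referenced in the config as value="$key".
    data_keys = {el.get("value")[1:] for el in root.iter()
                 if (el.get("value") or "").startswith("$")}
    missing = sorted(data_keys - set(task.get("data", {})))
    problems = [f"missing data key: {key}" for key in missing]
    for section in ("annotations", "predictions"):
        for item in task.get(section, []):
            for result in item.get("result", []):
                for field in ("from_name", "to_name"):
                    if result.get(field) not in names:
                        problems.append(f"{section}: unknown {field}")
    return problems


task = {
    "data": {"my_text": "Opossums are great", "ref_id": 456},
    "annotations": [{"result": [choice_result("Positive")]}],
    "predictions": [{"result": [choice_result("Neutral")], "score": 0.95}],
}

print(json.dumps([task], indent=2))      # valid JSON, no comments
print(check_task(task, LABEL_CONFIG))    # an empty list means no mismatches
```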

If you’re placing JSON files in cloud storage, place one task in each JSON file in the storage bucket. If you want to upload a JSON file from your machine directly into Label Studio, you can place multiple tasks in one JSON file.

Example JSON with multiple tasks

You can place multiple tasks in one JSON file if you’re uploading the JSON file to Label Studio.


This example contains multiple text classification tasks with no annotations or predictions.

The “data” parameter must contain the “my_text” field defined in the text labeling config and can optionally include other fields. The “id” parameter is not required.

[
   {
      "id":1,
      "data":{
         "my_text":"Opossums like to be aloft in trees."
      }
   },
   {
      "id":2,
      "data":{
         "my_text":"Opossums are opportunistic."
      }
   },
   {
      "id":3,
      "data":{
         "my_text":"Opossums like to forage for food."
      }
   }
]
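A file in this shape can be generated from a list of strings. The following is a minimal sketch; the `texts_to_tasks` and `write_tasks` helpers and the `tasks.json` output name are hypothetical, and the sequential IDs are optional, as noted above.

```python
import json
import tempfile
from pathlib import Path


def texts_to_tasks(texts, data_key="my_text", start_id=1):
    """Turn a list of strings into a list of tasks with sequential IDs."""
    return [{"id": task_id, "data": {data_key: text}}
            for task_id, text in enumerate(texts, start=start_id)]


def write_tasks(tasks, path):
    """Write the task list to a single JSON file."""
    Path(path).write_text(json.dumps(tasks, indent=3), encoding="utf-8")


texts = [
    "Opossums like to be aloft in trees.",
    "Opossums are opportunistic.",
    "Opossums like to forage for food.",
]
tasks = texts_to_tasks(texts)

# Written to the system temp directory here; choose any path you like.
out_path = Path(tempfile.gettempdir()) / "tasks.json"
write_tasks(tasks, out_path)
print(out_path.read_text(encoding="utf-8"))
```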

Example JSON for older versions of Label Studio

If you’re using a version of Label Studio earlier than version 1.0.0, import tasks that match the following JSON format:

[{
  # "data" must contain the "my_text" field defined by labeling config,
  # and can optionally include other fields
  "data": {
    "my_text": "Opossums are great",
    "ref_id": 456,
    "meta_info": {
      "timestamp": "2020-03-09 18:15:28.212882",
      "location": "North Pole"
    } 
  },

  # completions are the list of annotation results matching the labeling config schema
  "completions": [{
    "result": [{
      "from_name": "sentiment_class",
      "to_name": "message",
      "type": "choices",
      "value": {
        "choices": ["Positive"]
      }
    }]
  }],

  # "predictions" are pretty similar to "completions" 
  # except that they also include some ML-related fields like a prediction "score"
  "predictions": [{
    "result": [{
      "from_name": "sentiment_class",
      "to_name": "message",
      "type": "choices",
      "value": {
        "choices": ["Neutral"]
      }
    }],
    # score is used for active learning sampling mode
    "score": 0.95
  }]
}]
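Comparing this example with the current format shown earlier, the visible difference is that the older format uses a "completions" key where the current one uses "annotations". The sketch below renames that key when converting older task files. It is based only on these two examples, so other differences between versions may exist; the `completions_to_annotations` helper is hypothetical.

```python
import copy


def completions_to_annotations(tasks):
    """Return copies of the tasks with "completions" renamed to "annotations"."""
    converted = []
    for task in tasks:
        new_task = copy.deepcopy(task)  # leave the input untouched
        if "completions" in new_task:
            new_task["annotations"] = new_task.pop("completions")
        converted.append(new_task)
    return converted


old_tasks = [{
    "data": {"my_text": "Opossums are great"},
    "completions": [{
        "result": [{
            "from_name": "sentiment_class",
            "to_name": "message",
            "type": "choices",
            "value": {"choices": ["Positive"]},
        }]
    }],
}]

new_tasks = completions_to_annotations(old_tasks)
print(sorted(new_tasks[0].keys()))
```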

Import CSV or TSV data

When you import a CSV or TSV formatted text file, Label Studio interprets the column names as task data keys that correspond to the labeling config you set up:

my_text,optional_field
this is a first task,123
this is a second task,456
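To illustrate the mapping from columns to data keys, this sketch reads the sample above with Python's csv module and produces one task per row. It mimics the behavior described here and is not Label Studio's own parser. For example, every value stays a string in this sketch, which may differ from what Label Studio does. The `csv_to_tasks` helper is hypothetical.

```python
import csv
import io

CSV_TEXT = (
    "my_text,optional_field\n"
    "this is a first task,123\n"
    "this is a second task,456\n"
)


def csv_to_tasks(text, delimiter=","):
    """Treat the header row as data keys and each later row as one task."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [{"data": dict(row)} for row in reader]


for task in csv_to_tasks(CSV_TEXT):
    print(task)
```

For TSV input, the same function can be called with `delimiter="\t"`.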

Note: If your labeling config has a TimeSeries tag, Label Studio interprets the CSV/TSV as time series data when you import it. This CSV/TSV is hosted as a resource file and Label Studio automatically creates a task with a link to the uploaded CSV/TSV.

Plain text

Import data as plain text. Label Studio interprets each line in a plain text file as a separate data labeling task.

You might use plain text for labeling tasks if you have only one stream of input data, and only one object tag specified in your label config.

this is a first task
this is a second task
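The line-per-task interpretation can be sketched as follows. The `lines_to_tasks` helper and the `text` data key are hypothetical, and skipping blank lines is an assumption of this sketch rather than documented behavior.

```python
def lines_to_tasks(text, data_key="text"):
    """Create one task per non-empty line of a plain text file."""
    return [{"data": {data_key: line}}
            for line in text.splitlines()
            if line.strip()]  # assumption: blank lines are ignored


sample = "this is a first task\nthis is a second task\n"
for task in lines_to_tasks(sample):
    print(task)
```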

If you want to import entire plain text files without each line becoming a new labeling task, customize the labeling configuration to specify valueType="url" in the Text tag. See the Text tag documentation. Using this setting means that when you export your tasks, you export the links to the raw data created by Label Studio, rather than the raw data. If you want to export tasks with data, and label text files with new lines, use the Label Studio JSON format.

Import data from a local directory

To import data from a local directory, you have two options:

Run a web server to generate URLs to local files

To run a web server to generate URLs for the files, you can use the helper shell script provided in the Label Studio repository or write your own script.
Use that script to do the following:

  1. On the machine with the file directory that you want Label Studio to import, call the helper script and specify a regex pattern to match the files that you want to import. In this example, the script identifies files with the JPG file extension:
    ./script/serve_local_files.sh <directory/with/files> *.jpg
    The script collects the links to the files provided by that HTTP server and saves them to a files.txt file with one URL per line.
  2. Import the file with URLs into Label Studio using the Label Studio UI.

Note: You must keep the web server running while you perform your data labeling so that the URLs remain accessible to Label Studio.
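If you write your own script, the URL-list step might look like the sketch below. It is not the helper script from the repository. The `build_file_urls` function, the base URL, and port 8081 are assumptions, and the pattern is a shell-style wildcard such as `*.jpg`. A web server still has to serve the same directory at that base URL.

```python
from pathlib import Path
from urllib.parse import quote


def build_file_urls(directory, pattern="*.jpg", base_url="http://localhost:8081"):
    """List files under `directory` matching `pattern` as URLs, one per file."""
    root = Path(directory)
    urls = []
    for path in sorted(root.rglob(pattern)):
        if path.is_file():
            # Percent-encode the relative path so spaces and similar survive.
            relative = path.relative_to(root).as_posix()
            urls.append(f"{base_url.rstrip('/')}/{quote(relative)}")
    return urls


def write_url_list(urls, out_path="files.txt"):
    """Save the URLs to a text file with one URL per line."""
    Path(out_path).write_text("\n".join(urls) + "\n", encoding="utf-8")


# Example: print the URLs for JPG files below the current directory.
for url in build_file_urls(".", "*.jpg"):
    print(url)
```

Calling `write_url_list(build_file_urls("<directory/with/files>"))` would then produce a files.txt that can be imported through the Label Studio UI.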

If your labeling configuration supports HyperText or multiple data types, use the Label Studio JSON format to specify the local file locations instead of a txt file. See an example of this format.

If you serve your data from an HTTP server started with a command such as python -m http.server 8081 -d, you might need to set up CORS for that server so that Label Studio can access the data files. One option is to serve the files with the Node.js http-server package with CORS enabled instead. To do so, run the following from the command line:

npm install http-server -g
http-server -p 3000 --cors
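If you would rather stay with Python, a small wrapper around the standard library's http.server can add the CORS header itself. This is a minimal sketch intended for local use only: the wildcard origin is permissive, and the `CORSRequestHandler` and `make_server` names, the port, and the bind address are assumptions.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class CORSRequestHandler(SimpleHTTPRequestHandler):
    """Static file handler that adds a CORS header to every response."""

    def end_headers(self):
        # Allow any origin to read the files; acceptable for local labeling only.
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()


def make_server(directory, port=8081, host="127.0.0.1"):
    """Create (but do not start) a server for the files in `directory`."""
    handler = partial(CORSRequestHandler, directory=directory)
    return ThreadingHTTPServer((host, port), handler)
```

Calling `make_server("<directory/with/files>").serve_forever()` starts the server and blocks until interrupted.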

Add the file directory as source storage in the Label Studio UI

If you’re running Label Studio on Docker and want to add local file storage, you need to mount the file directory and set up environment variables. See Run Label Studio on Docker and use local storage.

Import data from the Label Studio UI

To import data from the Label Studio UI, do the following:

  1. On the Label Studio UI, open a specific project.
  2. Click Import to open the import page available at http://localhost:8080/import.
  3. Import your data from files or URLs.

Data that you import is project-specific.

Import data using the API

Import your data using the Label Studio API. See the API documentation for importing tasks.
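As a hedged illustration, the sketch below prepares an import request with Python's standard library. The endpoint path `/api/projects/<id>/import` and the `Authorization: Token ...` header reflect my understanding of the API and should be checked against the API documentation for your version. The `build_import_request` helper, the project ID, and the token value are hypothetical.

```python
import json
import urllib.request


def build_import_request(base_url, project_id, token, tasks):
    """Prepare a POST request that sends a JSON list of tasks to a project."""
    # Assumed endpoint; verify against the API documentation.
    url = f"{base_url.rstrip('/')}/api/projects/{project_id}/import"
    body = json.dumps(tasks).encode("utf-8")
    headers = {
        "Authorization": f"Token {token}",  # assumed auth scheme
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


tasks = [{"data": {"my_text": "Opossums are great"}}]
request = build_import_request("http://localhost:8080", 1, "YOUR_API_TOKEN", tasks)
print(request.get_method(), request.full_url)

# To actually send it against a running instance:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```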

Import data from the command line

In versions of Label Studio earlier than 1.0.0, you can import data from a local directory using the command line.

To import data from the command line, do the following:

  1. Start Label Studio and use command line arguments to specify the path to the data and format of the data.
    For example:
    label-studio init --input-path my_tasks.json --input-format json
  2. Open the Label Studio UI and confirm that your data was properly imported.

You can use the --input-path argument to specify a file or directory with the data that you want to label. You can specify other data formats using the --input-format argument. For example, run the following command to start Label Studio and import audio files from a local directory:

label-studio init my-project --input-path=my/audios/dir --input-format=audio-dir --label-config=config.xml --allow-serving-local-files

WARNING: The --allow-serving-local-files argument is intended for use only with locally running instances of Label Studio. Avoid using it on remote servers unless you are sure of what you’re doing.

By default, Label Studio expects JSON-formatted tasks using the Basic Label Studio JSON format.

If you add more files to a local directory after Label Studio starts, you must restart Label Studio to import the tasks in the additional files.