
Get data into Label Studio

Get data into Label Studio by importing files, referencing URLs, or syncing with cloud or database storage.

Types of data you can import into Label Studio

You can import many different types of data, including text, timeseries, audio, and image data. The file types supported depend on the type of data.

Data type         Supported file types
Audio             .aiff, .au, .flac, .m4a, .mp3, .ogg, .wav
HTML              .html, .htm, .xml
Images            .bmp, .gif, .jpg, .png, .svg, .tiff, .webp
Structured data   .csv, .tsv, .json
Text              .txt
Time series       .csv, .tsv

If you don’t see a supported data or file type that you want to import, reach out in the Label Studio Slack community.

How to format your data to import it

Label Studio treats different file types in different ways.

If you want to import multiple types of data to label at the same time, for example, images with captions or audio recordings with transcripts, you must use the basic Label Studio JSON format.

If you need to import thousands of files, you can also use a CSV file or a JSON list of tasks that points to URLs with the data, rather than importing the data directly.

Basic Label Studio JSON format

One way to import data into Label Studio is using a JSON-formatted list of tasks. The data key of the JSON file references each task as an entry in a JSON dictionary. If there is no data key, Label Studio interprets the entire JSON file as one task.

In the data JSON dictionary, use key-value pairs that correspond to the source key expected by the object tag in the label config that you set up for your dataset.

Depending on the type of object tag, Label Studio interprets field values differently:

You can add other, optional keys to the JSON file.

JSON key Description
id Optional. Integer to use as the task ID.
annotations Optional. List of annotations exported from Label Studio. Label Studio’s annotation format allows you to import annotation results in order to use them in subsequent labeling tasks.
predictions Optional. List of model prediction results, where each result is saved using Label Studio’s prediction format. Import predictions for automatic task pre-labeling and active learning. See Import predicted labels into Label Studio.

Example JSON format

For an example text classification project, you can set up a label config like the following:

<View>
  <Text name="message" value="$my_text"/>
  <Choices name="sentiment_class" toName="message">
    <Choice value="Positive"/>
    <Choice value="Neutral"/>
    <Choice value="Negative"/>
  </Choices>
</View>

You can then import tasks to label that match the following JSON format:

[{
  # "data" must contain the "my_text" field defined by labeling config,
  # and can optionally include other fields
  "data": {
    "my_text": "Opossums are great",
    "ref_id": 456,
    "meta_info": {
      "timestamp": "2020-03-09 18:15:28.212882",
      "location": "North Pole"
    } 
  },

  # annotations are the list of annotation results matching the labeling config schema
  "annotations": [{
    "result": [{
      "from_name": "sentiment_class",
      "to_name": "message",
      "type": "choices",
      "value": {
        "choices": ["Positive"]
      }
    }]
  }],

  # "predictions" are pretty similar to "annotations" 
  # except that they also include some ML-related fields like a prediction "score"
  "predictions": [{
    "result": [{
      "from_name": "sentiment_class",
      "to_name": "message",
      "type": "choices",
      "value": {
        "choices": ["Neutral"]
      }
    }],
    # score is used for active learning sampling mode
    "score": 0.95
  }]
}]
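
The # comments in the example above are explanatory only. Strict JSON parsers reject comments, so a file that you import should leave them out. As a minimal sketch, assuming you generate the task list with Python, the following builds one task with the same keys as the example and serializes it to valid JSON. The variable names are illustrative and not part of Label Studio.

```python
import json

# One task in the basic Label Studio JSON format. The "my_text" key must
# match the $my_text variable used by the Text tag in the label config.
tasks = [{
    "data": {
        "my_text": "Opossums are great",
        "ref_id": 456,
    },
    # Optional pre-labeling result with a prediction score.
    "predictions": [{
        "result": [{
            "from_name": "sentiment_class",
            "to_name": "message",
            "type": "choices",
            "value": {"choices": ["Neutral"]},
        }],
        "score": 0.95,
    }],
}]

# json.dumps emits strict JSON: no comments and no trailing commas.
tasks_json = json.dumps(tasks, indent=2)
print(tasks_json)
```

You can write the resulting string to a file, for example one named tasks.json, and import that file.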

Note: If you’re using a version of Label Studio earlier than version 1.0.0, import tasks that match the following JSON format, which uses the completions key in place of annotations:

[{
  # "data" must contain the "my_text" field defined by labeling config,
  # and can optionally include other fields
  "data": {
    "my_text": "Opossums are great",
    "ref_id": 456,
    "meta_info": {
      "timestamp": "2020-03-09 18:15:28.212882",
      "location": "North Pole"
    } 
  },

  # completions are the list of annotation results matching the labeling config schema
  "completions": [{
    "result": [{
      "from_name": "sentiment_class",
      "to_name": "message",
      "type": "choices",
      "value": {
        "choices": ["Positive"]
      }
    }]
  }],

  # "predictions" are pretty similar to "completions" 
  # except that they also include some ML-related fields like a prediction "score"
  "predictions": [{
    "result": [{
      "from_name": "sentiment_class",
      "to_name": "message",
      "type": "choices",
      "value": {
        "choices": ["Neutral"]
      }
    }],
    # score is used for active learning sampling mode
    "score": 0.95
  }]
}]

Import CSV or TSV data

When you import a CSV or TSV formatted text file, Label Studio interprets the column names as task data keys that correspond to the labeling config you set up:

my_text,optional_field
this is a first task,123
this is a second task,456

Note: If your labeling config has a TimeSeries tag, Label Studio interprets the CSV/TSV as time series data when you import it. This CSV/TSV is hosted as a resource file and Label Studio automatically creates a task with a link to the uploaded CSV/TSV.
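
To make the column-to-key mapping concrete, here is a minimal sketch in Python that turns the CSV example above into a list of task dictionaries. It illustrates the mapping only and is not Label Studio’s own parser. The function name csv_to_tasks is hypothetical, and every value stays a string in this sketch.

```python
import csv
import io

csv_text = """my_text,optional_field
this is a first task,123
this is a second task,456
"""

def csv_to_tasks(text, delimiter=","):
    """Map each row to a task whose data keys are the column names."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [{"data": dict(row)} for row in reader]

tasks = csv_to_tasks(csv_text)
for task in tasks:
    print(task)
```

For a TSV file, pass delimiter="\t" instead.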

Plain text

Import data as plain text. Label Studio interprets each line in a plain text file as a separate data labeling task.

You might use plain text for labeling tasks if you have only one stream of input data, and only one object tag specified in your label config.

this is a first task
this is a second task
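
As a rough illustration of the line-per-task behavior, the following Python sketch converts plain text into one task per line. The key name my_text is an assumption that stands in for whatever variable your object tag uses, and skipping blank lines is a choice made for this sketch rather than documented Label Studio behavior.

```python
def text_to_tasks(text, key="my_text"):
    """Create one task per non-blank line, stored under the given data key."""
    return [{"data": {key: line}} for line in text.splitlines() if line.strip()]

plain_text = "this is a first task\nthis is a second task\n"
line_tasks = text_to_tasks(plain_text)
print(line_tasks)
```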

If you want to import entire plain text files without each line becoming a new labeling task, customize the labeling configuration to specify valueType="url" in the Text tag. See the Text tag documentation.

Import data from a local directory

To import data from a local directory, you have two options: run a web server to generate URLs for the files, or set up the directory as a local storage connection.

To run a web server to generate URLs for the files, you can use the helper shell script provided in the Label Studio repository or write your own script.
Use the helper script to do the following:

  1. On the machine with the file directory that you want Label Studio to import, call the helper script and specify a wildcard pattern to match the files that you want to import. In this example, the script identifies files with the JPG file extension:

    ./script/serve_local_files.sh <directory/with/files> *.jpg

    The script collects the links to the files provided by that HTTP server and saves them to a files.txt file with one URL per line.

  2. Import the file with URLs into Label Studio using the Label Studio UI.

To import tasks from a file directory on a remote server, make sure the remote server is accessible to the server running Label Studio and set up the directory as a local storage connection.
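
If you use the web server option and want to attach extra fields to each file, you can wrap the URLs from files.txt in a JSON list of tasks instead of importing the URL list directly. The following Python sketch shows one way to do that. The key name image and the example URLs are hypothetical: the key must match the variable used by the object tag in your label config, and the URLs depend on how your web server is set up.

```python
import json

def urls_to_tasks(lines, key="image"):
    """Wrap each non-empty URL line in a task dictionary under `key`."""
    return [{"data": {key: url.strip()}} for url in lines if url.strip()]

# Hypothetical contents of files.txt, one URL per line.
example_lines = [
    "http://localhost:8081/photos/1.jpg\n",
    "http://localhost:8081/photos/2.jpg\n",
]

url_tasks = urls_to_tasks(example_lines)
url_tasks_json = json.dumps(url_tasks, indent=2)
print(url_tasks_json)
```

To read a real file, pass an open file object, such as open("files.txt"), as the lines argument.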

Import data from the Label Studio UI

To import data from the Label Studio UI, do the following:

  1. On the Label Studio UI, open a specific project.
  2. Click Import to open the import page available at http://localhost:8080/import.
  3. Import your data from files or URLs.

Data that you import is project-specific.

Import data using the API

Import your data using the Label Studio server API. See the API documentation.
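
As a sketch of what an API import might look like from Python, the following builds a POST request that carries a JSON list of tasks. The endpoint path /api/projects/<id>/import, the Token authorization header, the project ID, and the token value are all assumptions for illustration. Confirm them against the API documentation for your version of Label Studio before relying on them. The request is only constructed here, not sent.

```python
import json
import urllib.request

def build_import_request(base_url, project_id, token, tasks):
    """Build (but do not send) a POST request carrying a JSON list of tasks."""
    # Assumed endpoint path; verify it in the API documentation.
    url = f"{base_url.rstrip('/')}/api/projects/{project_id}/import"
    body = json.dumps(tasks).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # Assumed token-based authorization scheme.
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
    )

req = build_import_request(
    "http://localhost:8080",  # address of a locally running Label Studio
    1,                        # hypothetical project ID
    "YOUR_TOKEN",             # placeholder for your access token
    [{"data": {"my_text": "Opossums are great"}}],
)
print(req.get_method(), req.full_url)
```

Calling urllib.request.urlopen(req) would send the request. It is left out so that the sketch has no side effects.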

Import data from the command line

In versions of Label Studio earlier than 1.0.0, you can import data from a local directory using the command line.

To import data from the command line, do the following:

  1. Start Label Studio and use command line arguments to specify the path to the data and format of the data.
    For example:
    label-studio init --input-path my_tasks.json --input-format json
  2. Open the Label Studio UI and confirm that your data was properly imported.

You can use the --input-path argument to specify a file or directory with the data that you want to label. You can specify other data formats using the --input-format argument. For example, run the following command to start Label Studio and import audio files from a local directory:

label-studio init my-project --input-path=my/audios/dir --input-format=audio-dir --label-config=config.xml --allow-serving-local-files

WARNING: The --allow-serving-local-files argument is intended for use only with locally running instances of Label Studio. Avoid using it on remote servers unless you are sure of what you’re doing.

By default, Label Studio expects JSON-formatted tasks using the Basic Label Studio JSON format.

If you add more files to a local directory after Label Studio starts, you must restart Label Studio to import the tasks in the additional files.