
Sync data from cloud or Redis storage

Integrate popular cloud storage systems with Label Studio to collect new items uploaded to the buckets and return the annotation results so that you can use them in your machine learning pipelines.

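Set up any of the following cloud and other storage systems with Label Studio: Amazon S3, Google Cloud Storage, Microsoft Azure Blob storage, a Redis database, or local storage.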

Each source and target storage setup is project-specific. You can connect multiple buckets as source or target storage for a project.

If you upload new data to a connected cloud storage bucket, sync the storage connection to add the new labeling tasks to Label Studio without restarting.

Note: Choose your target storage carefully. When you start the labeling project, the target storage must be empty or contain only annotations that match tasks previously created or imported from source storage. Tasks are synced with annotations based on internal IDs, so if you accidentally connect to target storage that already contains annotations with the same IDs, the connection might fail or behave unpredictably.

Amazon S3

To connect your S3 bucket with Label Studio, make sure you have programmatic access enabled. See the Amazon Boto3 configuration documentation for more on how to set up access to your S3 bucket.
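Boto3 reads credentials from several sources, including the standard AWS environment variables. The following is a minimal sketch, assuming Label Studio inherits the environment of the shell that starts it. All values are placeholders, and the region is an assumption to replace with your bucket's region.

```shell
#!/bin/sh
# Placeholder credentials: replace them with the keys of an IAM user or role
# that can list and read the bucket (and write to it, for target storage).
export AWS_ACCESS_KEY_ID="your-access-key-id"
export AWS_SECRET_ACCESS_KEY="your-secret-access-key"
# Assumed region; use the region where your bucket lives.
export AWS_DEFAULT_REGION="us-east-1"

# Optional check with the AWS CLI (if installed) before starting Label Studio:
# aws s3 ls s3://my-s3-bucket
# label-studio start my_project
```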

Set up connection in the Label Studio UI

In the Label Studio UI, do the following to set up the connection:

  1. Open Label Studio in your web browser.
  2. For a specific project, open Settings > Cloud Storage.
  3. Click Add Source Storage.
  4. In the dialog box that appears, select Amazon S3 as the storage type.
  5. Specify the name of the S3 bucket.
  6. (Optional) Adjust the remaining parameters. See Optional parameters on this page for more details.
  7. Click Add Storage.
  8. Repeat these steps for Target Storage to sync completed data annotations to a bucket.

Optional parameters

You can specify additional parameters from the Label Studio UI.

Parameter | Description | Default
prefix | Specify an internal folder or container. | empty
regex | Specify a regular expression to filter bucket objects. Use ".*" to collect all objects. | empty (skips all bucket objects)
use_blob_urls | If true, treat every bucket object as a source file. Use for resources like JPG, MP3, or similar file types. If false, bucket objects are interpreted as tasks in Label Studio JSON format with one object per task. | false
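To preview which object keys a regex selects before you save the storage settings, you can run the pattern against a sample list of keys. This is only a local approximation: the keys are hypothetical, and grep -E stands in for the regex engine that Label Studio uses. Starting the pattern with .* is a safer habit in case the regex is matched from the beginning of the key rather than searched anywhere in it.

```shell
#!/bin/sh
# Hypothetical object keys, as they might appear in a bucket.
keys='images/cat.jpg
images/dog.png
images/readme.txt
tasks/task-1.json'

# Keep only JPG and PNG images. The leading .* lets the pattern match
# from the start of the key as well as anywhere within it.
matched=$(printf '%s\n' "$keys" | grep -E '.*\.(jpg|png)$')
printf '%s\n' "$matched"
```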

Create connection on startup

For Label Studio versions earlier than 1.0.0, you can use command-line arguments to start Label Studio, configure the connection to your S3 bucket, scan for existing tasks, and load them into the labeling app.

Starting in Label Studio 1.0.0, you can configure cloud storage only from the Label Studio UI, because the settings are per-project.

Read a bucket with JSON-formatted tasks

label-studio start my_project --init --source s3 --source-path my-s3-bucket
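When use_blob_urls is false, Label Studio reads each bucket object as a task in Label Studio JSON format. The following is a minimal sketch of one such task file. It assumes a labeling config that references a hypothetical text field as $text. The file name task-1.json is also only illustrative.

```shell
#!/bin/sh
# One task per file. "data" holds the fields that your labeling config
# references (here, a hypothetical "text" field used as $text).
cat > task-1.json <<'EOF'
{"data": {"text": "Label Studio makes labeling easier."}}
EOF

# Upload it with the AWS CLI (requires credentials), then sync the storage:
# aws s3 cp task-1.json s3://my-s3-bucket/task-1.json
```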

Write annotations to the bucket

label-studio start my_project --init --target s3-completions --target-path my-s3-bucket

Troubleshoot CORS and access problems

If you have trouble accessing bucket objects in Label Studio, check your web browser console for errors, such as CORS errors or access-denied responses.
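If the console shows CORS errors, the bucket might need a CORS rule that allows your Label Studio origin to fetch objects. The following is a minimal sketch. It assumes that Label Studio is served from http://localhost:8080 and that the browser only needs GET access. The file name cors.json is arbitrary.

```shell
#!/bin/sh
# Assumption: Label Studio is served from http://localhost:8080.
# Replace the origin with the URL your annotators use.
cat > cors.json <<'EOF'
{
  "CORSRules": [
    {
      "AllowedOrigins": ["http://localhost:8080"],
      "AllowedMethods": ["GET"],
      "AllowedHeaders": ["*"],
      "MaxAgeSeconds": 3000
    }
  ]
}
EOF

# Apply it with the AWS CLI (requires permission to change the bucket's CORS):
# aws s3api put-bucket-cors --bucket my-s3-bucket --cors-configuration file://cors.json
```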

Working with Binary Large OBjects (BLOBs)

When you store BLOBs such as images or audio files in your S3 bucket, you might want to use them as is by generating URLs that point to those objects, for example s3://my-s3-bucket/image.jpg.

Label Studio lets you generate input tasks with corresponding URLs when you set up cloud storage sync in the Label Studio UI. When you set up the cloud storage, enable the "Treat every bucket object as a source file" option (the use_blob_urls parameter).

For versions of Label Studio earlier than 1.0.0, you can generate task URLs from the command line by specifying --source-params when launching the app:

label-studio start my_project --init --source s3 --source-path my-s3-bucket --source-params '{"data_key": "my-object-tag-$value", "use_blob_urls": true, "regex": ".*"}'
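Quoting JSON for --source-params is easy to get wrong. Inside double quotes, the shell expands $value and ends the string at the first unescaped double quote. The following is a minimal sketch of one way to avoid this: keep the JSON in a single-quoted shell variable. The variable name SOURCE_PARAMS is only illustrative.

```shell
#!/bin/sh
# Single quotes keep the inner double quotes and the literal $value intact.
SOURCE_PARAMS='{"data_key": "my-object-tag-$value", "use_blob_urls": true, "regex": ".*"}'
printf '%s\n' "$SOURCE_PARAMS"

# Pass the variable in double quotes so that it stays a single argument:
# label-studio start my_project --init --source s3 --source-path my-s3-bucket --source-params "$SOURCE_PARAMS"
```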

You can also omit the data_key parameter or leave it empty. Label Studio then generates input tasks from the first task key in the labeling config, which is useful when you only have one object tag exposed.

Google Cloud Storage

To connect your GCS bucket with Label Studio, make sure you have programmatic access enabled. See Cloud Storage Client Libraries in the Google Cloud Storage documentation for how to set up access to your GCS bucket.
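Google Cloud client libraries look for Application Default Credentials. You can point them at a service account key file with the GOOGLE_APPLICATION_CREDENTIALS environment variable. The following is a minimal sketch, assuming Label Studio inherits the environment of the shell that starts it. The key file path is a placeholder.

```shell
#!/bin/sh
# Placeholder path to a service account key file with access to the bucket.
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/keys/label-studio-gcs.json"

# Alternatively, on a development machine, use your own user credentials:
# gcloud auth application-default login

# Optional check with gsutil before starting Label Studio:
# gsutil ls gs://my-gcs-bucket
```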

Set up connection in the Label Studio UI

In the Label Studio UI, do the following to set up the connection:

  1. Open Label Studio in your web browser.
  2. For a specific project, open Settings > Cloud Storage.
  3. Click Add Source Storage.
  4. In the dialog box that appears, select Google Cloud Storage as the storage type.
  5. Specify the name of the GCS bucket.
  6. (Optional) Adjust the remaining parameters. See Optional parameters on this page for more details.
  7. Click Add Storage.
  8. Repeat these steps for Target Storage to sync completed data annotations to a bucket.

Optional parameters

You can specify additional parameters from the Label Studio UI.

Parameter | Description | Default
prefix | Specify an internal folder or container. | empty
regex | Specify a regular expression to filter bucket objects. Use ".*" to collect all objects. | empty (skips all bucket objects)
create_local_copy | If true, creates a local copy of the remote storage. | true
use_blob_urls | If true, treat every bucket object as a source file. Use for resources like JPG, MP3, or similar file types. If false, bucket objects are interpreted as tasks in Label Studio JSON format with one object per task. | false

Create connection on startup

For Label Studio versions earlier than 1.0.0, you can use command-line arguments to start Label Studio, configure the connection to your GCS bucket, scan for existing tasks, and load them into the app for labeling.

Starting in Label Studio 1.0.0, you can configure cloud storage only from the Label Studio UI, because the settings are per-project.

Read a bucket with JSON-formatted tasks

label-studio start my_project --init --source gcs --source-path my-gcs-bucket

Write annotations to a bucket

label-studio start my_project --init --target gcs-completions --target-path my-gcs-bucket

Troubleshoot CORS and access problems

If you have trouble accessing bucket objects in Label Studio, check your web browser console for errors, such as CORS errors or access-denied responses.
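For GCS, CORS rules use a different JSON layout than S3. The following is a minimal sketch. It assumes that Label Studio is served from http://localhost:8080 and that the browser only needs GET access. The file name gcs-cors.json is arbitrary.

```shell
#!/bin/sh
# Assumption: Label Studio is served from http://localhost:8080.
# Replace the origin with the URL your annotators use.
cat > gcs-cors.json <<'EOF'
[
  {
    "origin": ["http://localhost:8080"],
    "method": ["GET"],
    "responseHeader": ["Content-Type"],
    "maxAgeSeconds": 3600
  }
]
EOF

# Apply it (requires permission to update the bucket):
# gsutil cors set gcs-cors.json gs://my-gcs-bucket
```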

Working with Binary Large OBjects (BLOBs)

When you store BLOBs in your GCS bucket, like images or audio files, you might want to use them as is and generate URLs pointing to those objects. For example, gs://my-gcs-bucket/image.jpg.

Label Studio lets you generate input tasks with corresponding URLs when you set up cloud storage sync in the Label Studio UI. When you set up the cloud storage, enable the "Treat every bucket object as a source file" option (the use_blob_urls parameter).

For versions of Label Studio earlier than 1.0.0, you can generate task URLs from the command line by specifying --source-params when launching the app:

label-studio start my_project --init --source gcs --source-path my-gcs-bucket --source-params '{"data_key": "my-object-tag-$value", "use_blob_urls": true, "regex": ".*"}'

You can also omit the data_key parameter or leave it empty. Label Studio then generates input tasks from the first task key in the labeling config, which is useful when you only have one object tag exposed.

Microsoft Azure Blob storage

Connect your Microsoft Azure Blob storage container with Label Studio.

You must set two environment variables in Label Studio to connect to Azure Blob storage:

Configure the specific Azure Blob container that you want Label Studio to use in the UI or with the command-line interface parameters.
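The following is a minimal sketch of setting the Azure credentials in the shell that starts Label Studio. It assumes that your Label Studio version reads the storage account name and key from variables named AZURE_BLOB_ACCOUNT_NAME and AZURE_BLOB_ACCOUNT_KEY, so confirm these names for your version. The values are placeholders.

```shell
#!/bin/sh
# Assumed variable names. The values are placeholders for your storage account.
export AZURE_BLOB_ACCOUNT_NAME="mystorageaccount"
export AZURE_BLOB_ACCOUNT_KEY="my-storage-account-key"

# Start Label Studio from this shell so that it inherits the variables:
# label-studio start my_project
```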

Set up connection in the Label Studio UI

In the Label Studio UI, do the following to set up the connection:

  1. Open Label Studio in your web browser.
  2. For a specific project, open Settings > Cloud Storage.
  3. Click Add Source Storage.
  4. In the dialog box that appears, select Microsoft Azure as the storage type.
  5. Specify the name of the Azure Blob container.
  6. (Optional) Adjust the remaining parameters. See Optional parameters on this page for more details.
  7. Click Add Storage.
  8. Repeat these steps for Target Storage to sync completed data annotations to a container.

Optional parameters

You can specify additional parameters from the Label Studio UI.

Parameter | Description | Default
prefix | Specify an internal folder or container. | empty
regex | Specify a regular expression to filter container objects. Use ".*" to collect all objects. | empty (skips all container objects)
create_local_copy | If true, creates a local copy of the remote storage. | true
use_blob_urls | If true, treat every container object as a source file. Use for resources like JPG, MP3, or similar file types. If false, container objects are interpreted as tasks in Label Studio JSON format with one object per task. | false

Create connection on startup

For Label Studio versions earlier than 1.0.0, you can use command-line arguments to start Label Studio, configure the connection to your Azure Blob storage, scan for existing tasks, and load them into the app for labeling.

Starting in Label Studio 1.0.0, you can configure cloud storage only from the Label Studio UI, because the settings are per-project.

Read an Azure storage container with JSON-formatted tasks

label-studio start my_project --init --source azure-blob --source-path my-az-container-name

Write annotations to an Azure storage container

label-studio start my_project --init --target azure-blob --target-path my-az-container-name

Working with Binary Large OBjects (BLOBs)

When you store BLOBs such as images or audio files in your Azure storage container, you might want to use them as is by generating URLs that point to those objects, for example azure-blob://container-name/image.jpg.

Label Studio lets you generate input tasks with corresponding URLs when you set up cloud storage sync in the Label Studio UI. When you set up the cloud storage, enable the "Treat every bucket object as a source file" option (the use_blob_urls parameter).

For versions of Label Studio earlier than 1.0.0, you can generate task URLs from the command line by specifying --source-params when launching the app:

label-studio start my_project --init --source azure-blob --source-path my-az-container-name --source-params '{"data_key": "my-object-tag-$value", "use_blob_urls": true, "regex": ".*"}'

You can also omit the data_key parameter or leave it empty. Label Studio then generates input tasks from the first task key in the labeling config, which is useful when you only have one object tag exposed.

Redis database

You can also store your tasks and annotations in a Redis database. You must store the tasks and annotations in different databases.

You might want to use a Redis database if you find that relying on a file-based cloud storage connection is slow for your datasets.

Currently, this is only supported if the Redis database is hosted in the default mode, with the default IP address.

You can integrate Label Studio with Redis, but Label Studio does not manage the Redis database for you. See the Redis Quick Start for details about hosting and managing your own Redis database.

Because Redis is an in-memory database, data saved in Redis is not guaranteed to persist across restarts. To make sure you don’t lose data, set up Redis persistence or use another method to persist the data, such as a managed Redis service on Microsoft Azure or Amazon Web Services.
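One way to keep tasks and annotations across restarts is to turn on Redis persistence. The following is a minimal sketch of a redis.conf fragment that enables both the append-only file and RDB snapshots. The file name and the snapshot interval are assumptions to tune for your workload.

```shell
#!/bin/sh
# Write a minimal redis.conf fragment that turns on both persistence modes.
cat > redis-persist.conf <<'EOF'
# Append-only file: log every write so that data survives a restart.
appendonly yes
appendfsync everysec
# RDB snapshot: save if at least 1 key changed within 900 seconds.
save 900 1
EOF

# Start Redis with this file (requires redis-server to be installed):
# redis-server ./redis-persist.conf
```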

Set up connection in the Label Studio UI

In the Label Studio UI, do the following to set up the connection:

  1. Open Label Studio in your web browser.
  2. For a specific project, open Settings > Cloud Storage.
  3. Click Add Source Storage.
  4. In the dialog box that appears, select Redis Database as the storage type.
  5. (Optional) Update Redis configuration parameters. See Optional Redis configuration parameters on this page for the list.
  6. Click Add Storage.
  7. Repeat these steps for Target Storage to sync completed data annotations to a Redis database.

Optional Redis configuration parameters

You can specify additional parameters from the Label Studio UI.

Parameter | Description | Default
project_path | Path to the Label Studio project. |
path | Specify the path to the database. | None
db | The Redis database to use. | 1 (for source) or 2 (for target)
host | IP address of the server hosting the database. | None
port | Port of the server hosting the database. | None
password | Server password. | None

Create connection on startup

Run the following command to launch Label Studio, configure the connection to your Redis database, scan for existing tasks, and load them into the app for labeling for a specific project.

label-studio start my_project --init --db redis

Local storage

If you have local files that you want to add to Label Studio from a specific directory, you can set up a specific local directory as source or target storage.

Set up connection in the Label Studio UI

In the Label Studio UI, do the following to set up the connection:

  1. Open Label Studio in your web browser.
  2. For a specific project, open Settings > Cloud Storage.
  3. Click Add Source Storage.
  4. In the dialog box that appears, select Local Files as the storage type.
  5. Specify the name of the local directory.
  6. (Optional) Adjust the remaining parameters. See Optional parameters on this page for more details.
  7. Click Add Storage.

Warning: Serving data from the local file system can be a security risk. If you’re sure you know what you’re doing, set LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true in your environment setup. Without this setting, URLs that point to local files won’t work.
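The following is a minimal sketch of enabling local file serving for a single session. It assumes that Label Studio reads the variable from the environment of the shell that starts it.

```shell
#!/bin/sh
# Opt in to serving local files. Leave this unset unless you need it.
export LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true

# Start Label Studio from the same shell so that the setting takes effect:
# label-studio start my_project
```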

Optional parameters

You can specify additional parameters from the Label Studio UI.

Parameter | Description | Default
prefix | Specify an internal folder or container. | empty
regex | Specify a regular expression to filter directory objects. Use ".*" to collect all objects. | empty (skips all directory objects)
use_blob_urls | If true, treat every directory object as a source file. Use for resources like JPG, MP3, or similar file types. If false, directory objects are interpreted as tasks in Label Studio JSON format with one object per task. | false