Sync data from cloud or Redis storage
Integrate popular cloud storage systems with Label Studio to collect new items uploaded to the buckets and return the annotation results so that you can use them in your machine learning pipelines.
Set up the following cloud and other storage systems with Label Studio:
- Amazon S3
- Google Cloud Storage
- Microsoft Azure Blob storage
- Redis database
- Local storage
Each source and target storage setup is project-specific. You can connect multiple buckets as source or target storage for a project.
If you upload new data to a connected cloud storage bucket, sync the storage connection to add the new labeling tasks to Label Studio without restarting.
Note: Choose your target storage carefully. When you start the labeling project, the target storage must be empty or contain only annotations that match previously created or imported tasks from source storage. Tasks are synced with annotations based on internal IDs, so if you accidentally connect to target storage that holds existing annotations with the same IDs, the connection might fail or behave unpredictably.
Amazon S3
To connect your S3 bucket with Label Studio, make sure you have programmatic access enabled. See the Amazon Boto3 configuration documentation for more on how to set up access to your S3 bucket.
Set up connection in the Label Studio UI
In the Label Studio UI, do the following to set up the connection:
- Open Label Studio in your web browser.
- For a specific project, open Settings > Cloud Storage.
- Click Add Source Storage.
- In the dialog box that appears, select Amazon S3 as the storage type.
- Specify the name of the S3 bucket.
- (Optional) Adjust the remaining parameters. See Optional parameters on this page for more details.
- Click Add Storage.
- Repeat these steps for Target Storage to sync completed data annotations to a bucket.
Optional parameters
You can specify additional parameters from the Label Studio UI.
Parameter | Description | Default |
---|---|---|
prefix | Specify an internal folder or container | empty |
regex | Specify a regular expression to filter bucket objects. Use “.*” to collect all objects. | Skips all bucket objects. |
use_blob_urls | If true, treat every bucket object as a source file. Use for resources like JPG, MP3, or similar file types. If false, bucket objects are interpreted as tasks in Label Studio JSON format with one object per task. | false |
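To make the use_blob_urls distinction concrete, here is a minimal sketch of what a single bucket object could contain when the option is false. The "image" key and the s3://my-s3-bucket/image.jpg URL are assumptions for illustration; use whichever key your labeling configuration expects.

```python
import json

# Hypothetical task: one JSON object per bucket object, as the table describes.
# The "image" key is an assumption; it should match your labeling config.
task = {"data": {"image": "s3://my-s3-bucket/image.jpg"}}

# Serialize the task. The result is what you would upload as one bucket object.
task_json = json.dumps(task, indent=2)
print(task_json)
```

When use_blob_urls is true, you would upload image.jpg itself instead of a JSON file like this one.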
Create connection on startup
For Label Studio versions earlier than 1.0.0, you can use command line arguments to start Label Studio and configure the connection to your S3 bucket, scan for existing tasks, and load them into the labeling app.
Starting in Label Studio 1.0.0 you can only configure cloud storage from the Label Studio UI because the settings are per-project.
Read a bucket with JSON-formatted tasks
label-studio start my_project --init --source s3 --source-path my-s3-bucket
Write annotations to the bucket
label-studio start my_project --init --target s3-completions --target-path my-s3-bucket
Troubleshoot CORS and access problems
If you have trouble accessing bucket objects in Label Studio, check your web browser console for errors.
If you see CORS problems, see Configuring and using cross-origin resource sharing (CORS) in the Amazon S3 User Guide.
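As a rough illustration of what a permissive CORS rule for an S3 bucket can look like, the sketch below builds one rule in the JSON layout that S3 uses and prints it. The origin http://localhost:8080 is an assumption (a locally hosted Label Studio); replace it with the address your browser uses to reach Label Studio.

```python
import json

# One CORS rule allowing read-only browser access from an assumed origin.
cors_rules = [
    {
        "AllowedHeaders": ["*"],
        "AllowedMethods": ["GET"],
        "AllowedOrigins": ["http://localhost:8080"],  # assumption: local Label Studio
        "MaxAgeSeconds": 3600,
    }
]

# Print the rule as JSON so it can be reviewed before applying it to the bucket.
cors_json = json.dumps(cors_rules, indent=2)
print(cors_json)
```

If you manage buckets with boto3, the same list can be passed as the CORSRules value of put_bucket_cors; treat the values here as a starting point rather than a recommended policy.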
Make sure you specified the region when creating a new bucket. Don’t forget to change it in your source or target storage settings or the .aws/config file, otherwise you might have problems accessing your bucket objects. For example, in ~/.aws/config:
[default]
# change to the region of your bucket
region=us-east-2
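To double-check which region a config file sets, a small sketch like the following can help. It parses sample text rather than your real file, and us-east-2 is only the placeholder region used in this section.

```python
import configparser

# Sample contents of ~/.aws/config; the region value is a placeholder.
sample = """
[default]
region = us-east-2
"""

parser = configparser.ConfigParser()
parser.read_string(sample)
# To inspect a real file instead, call parser.read() with the path to ~/.aws/config.

# Look up the region of the default profile, or None if it is not set.
region = parser.get("default", "region", fallback=None)
print(region)
```

Compare the printed value with the region shown for your bucket in the AWS console.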
If you’re using an older version of Label Studio, upgrade to version 0.7.5 or later, which uses signature version s3v4 to support more AWS regions.
If you see 403 errors, make sure you have the correct credentials configured. See Configuration and credential file settings in the Amazon AWS Command Line Interface User Guide.
Working with Binary Large OBjects (BLOBs)
When you store BLOBs in your S3 bucket, like images or audio files, you might want to use them as is and generate URLs pointing to those objects. For example, s3://my-s3-bucket/image.jpg.
Label Studio lets you generate input tasks with corresponding URLs when you set up cloud storage sync in the Label Studio UI. Set the "Treat every bucket object as a source file" option to true when setting up the cloud storage.
For versions of Label Studio earlier than 1.0.0, you can generate task URLs from the command line by specifying --source-params when launching the app:
label-studio start my_project --init --source s3 --source-path my-s3-bucket --source-params "{\"data_key\": \"my-object-tag-$value\", \"use_blob_urls\": true, \"regex\": \".*\"}"
You can also skip the "data_key" parameter or leave it empty, and Label Studio automatically generates input tasks from the first task key in the label config, which can be useful when you only have one object tag exposed.
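Hand-escaping JSON inside a shell string is error-prone, so one option is to let a short script build the --source-params argument. This is a sketch for a POSIX shell only; the "image" value for data_key is a hypothetical key and should match your labeling config.

```python
import json
import shlex

# Parameters for --source-params. "image" is a hypothetical data key.
params = {
    "data_key": "image",
    "use_blob_urls": True,
    "regex": ".*",
}

# json.dumps yields valid JSON; shlex.quote wraps it safely for a POSIX shell.
quoted = shlex.quote(json.dumps(params))
command = (
    "label-studio start my_project --init "
    "--source s3 --source-path my-s3-bucket "
    f"--source-params {quoted}"
)
print(command)
```

The printed command can be pasted into a terminal. The same approach applies to the other storage types by changing the --source and --source-path values.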
Google Cloud Storage
To connect your GCS bucket with Label Studio, make sure you have programmatic access enabled. See Cloud Storage Client Libraries in the Google Cloud Storage documentation for how to set up access to your GCS bucket.
Set up connection in the Label Studio UI
In the Label Studio UI, do the following to set up the connection:
- Open Label Studio in your web browser.
- For a specific project, open Settings > Cloud Storage.
- Click Add Source Storage.
- In the dialog box that appears, select Google Cloud Storage as the storage type.
- Specify the name of the GCS bucket.
- (Optional) Adjust the remaining parameters. See Optional parameters on this page for more details.
- Click Add Storage.
- Repeat these steps for Target Storage to sync completed data annotations to a bucket.
Optional parameters
You can specify additional parameters from the Label Studio UI.
Parameter | Description | Default |
---|---|---|
prefix | Specify an internal folder or container | empty |
regex | Specify a regular expression to filter bucket objects. Use “.*” to collect all objects. | Skips all bucket objects. |
create_local_copy | If true, creates a local copy of the remote storage. | true |
use_blob_urls | If true, treat every bucket object as a source file. Use for resources like JPG, MP3, or similar file types. If false, bucket objects are interpreted as tasks in Label Studio JSON format with one object per task. | false |
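The interaction between prefix and regex can be pictured with a small stand-alone sketch. The select_keys helper below is hypothetical and is not Label Studio's implementation; in particular, matching from the start of the object name with re.match is an assumption.

```python
import re

def select_keys(keys, prefix="", regex=None):
    """Return the keys under `prefix` whose names match `regex`.

    Hypothetical helper. With no regex, nothing is selected, mirroring the
    documented default of skipping all bucket objects.
    """
    if regex is None:
        return []
    pattern = re.compile(regex)
    return [k for k in keys if k.startswith(prefix) and pattern.match(k)]

keys = ["images/cat.jpg", "images/dog.png", "notes/readme.txt"]
print(select_keys(keys))                                # no regex: nothing selected
print(select_keys(keys, prefix="images/", regex=".*"))  # everything under images/
print(select_keys(keys, regex=r".*\.jpg"))              # only names ending in .jpg
```

Testing a pattern against a few sample object names this way can save a sync that silently imports nothing.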
Create connection on startup
For Label Studio versions earlier than 1.0.0, you can use command line arguments to start Label Studio, configure the connection to your GCS bucket, scan for existing tasks, and load them into the app for labeling.
Starting in Label Studio 1.0.0 you can only configure cloud storage from the Label Studio UI because the settings are per-project.
Read a bucket with JSON-formatted tasks
label-studio start my_project --init --source gcs --source-path my-gcs-bucket
Write annotations to a bucket
label-studio start my_project --init --target gcs-completions --target-path my-gcs-bucket
Troubleshoot CORS and access problems
If you have trouble accessing bucket objects in Label Studio, check your web browser console for errors.
If you see CORS problems, see Configuring cross-origin resource sharing (CORS) in the Google Cloud Storage documentation.
If you see 403 errors, make sure you have the correct credentials configured. See Setting up authentication in the Google Cloud Storage documentation.
Working with Binary Large OBjects (BLOBs)
When you store BLOBs in your GCS bucket, like images or audio files, you might want to use them as is and generate URLs pointing to those objects. For example, gs://my-gcs-bucket/image.jpg.
Label Studio lets you generate input tasks with corresponding URLs when you set up cloud storage sync in the Label Studio UI. Set the "Treat every bucket object as a source file" option to true when setting up the cloud storage.
For versions of Label Studio earlier than 1.0.0, you can generate task URLs from the command line by specifying --source-params when launching the app:
label-studio start my_project --init --source gcs --source-path my-gcs-bucket --source-params "{\"data_key\": \"my-object-tag-$value\", \"use_blob_urls\": true, \"regex\": \".*\"}"
You can also skip the "data_key" parameter or leave it empty, and Label Studio automatically generates input tasks from the first task key in the label config, which can be useful when you only have one object tag exposed.
Microsoft Azure Blob storage
Connect your Microsoft Azure Blob storage container with Label Studio.
You must set two environment variables in Label Studio to connect to Azure Blob storage:
- AZURE_BLOB_ACCOUNT_NAME - The name of the storage account
- AZURE_BLOB_ACCOUNT_KEY - The secret key to the storage account
Configure the specific Azure Blob container that you want Label Studio to use in the UI or with the command-line interface parameters.
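Before starting Label Studio, it can help to confirm that both variables are visible to the process. The check below is a minimal sketch; the missing_azure_vars helper is hypothetical and only inspects the environment without contacting Azure.

```python
import os

# The two variables this section says Label Studio needs for Azure Blob storage.
REQUIRED = ("AZURE_BLOB_ACCOUNT_NAME", "AZURE_BLOB_ACCOUNT_KEY")

def missing_azure_vars(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED if not environ.get(name)]

missing = missing_azure_vars()
if missing:
    print("Set these before starting Label Studio:", ", ".join(missing))
else:
    print("Azure Blob credentials found in the environment.")
```

Run it in the same shell or service environment that launches Label Studio, because variables set elsewhere are not inherited.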
Set up connection in the Label Studio UI
In the Label Studio UI, do the following to set up the connection:
- Open Label Studio in your web browser.
- For a specific project, open Settings > Cloud Storage.
- Click Add Source Storage.
- In the dialog box that appears, select Microsoft Azure as the storage type.
- Specify the name of the Azure Blob container.
- (Optional) Adjust the remaining parameters. See Optional parameters on this page for more details.
- Click Add Storage.
- Repeat these steps for Target Storage to sync completed data annotations to a container.
Optional parameters
You can specify additional parameters from the Label Studio UI.
Parameter | Description | Default |
---|---|---|
prefix | Specify an internal folder or container | empty |
regex | Specify a regular expression to filter bucket objects. Use “.*” to collect all objects. | Skips all bucket objects. |
create_local_copy | If true, creates a local copy of the remote storage. | true |
use_blob_urls | If true, treat every bucket object as a source file. Use for resources like JPG, MP3, or similar file types. If false, bucket objects are interpreted as tasks in Label Studio JSON format with one object per task. | false |
Create connection on startup
For Label Studio versions earlier than 1.0.0, you can use command line arguments to start Label Studio, configure the connection to your Azure Blob storage, scan for existing tasks, and load them into the app for labeling.
Starting in Label Studio 1.0.0 you can only configure cloud storage from the Label Studio UI because the settings are per-project.
Read an Azure storage container with JSON-formatted tasks
label-studio start my_project --init --source azure-blob --source-path my-az-container-name
Write annotations to an Azure storage container
label-studio start my_project --init --target azure-blob --target-path my-az-container-name
Working with Binary Large OBjects (BLOBs)
When you store BLOBs in your Azure storage container, like images or audio files, you might want to use them as is and generate URLs pointing to those objects. For example, azure-blob://container-name/image.jpg.
Label Studio lets you generate input tasks with corresponding URLs when you set up cloud storage sync in the Label Studio UI. Set the "Treat every bucket object as a source file" option to true when setting up the cloud storage.
For versions of Label Studio earlier than 1.0.0, you can generate task URLs from the command line by specifying --source-params when launching the app:
label-studio start my_project --init --source azure-blob --source-path my-az-container-name --source-params "{\"data_key\": \"my-object-tag-$value\", \"use_blob_urls\": true, \"regex\": \".*\"}"
You can also skip the "data_key" parameter or leave it empty, and Label Studio automatically generates input tasks from the first task key in the label config, which can be useful when you only have one object tag exposed.
Redis database
You can also store your tasks and annotations in a Redis database. You must store the tasks and annotations in different databases.
You might want to use a Redis database if you find that relying on a file-based cloud storage connection is slow for your datasets.
Currently, Redis storage is supported only if the Redis database is hosted in the default mode, with the default IP address.
You can integrate Label Studio with Redis, but Label Studio does not manage the Redis database for you. See the Redis Quick Start for details about hosting and managing your own Redis database.
Because Redis is an in-memory database, data saved in Redis does not persist. To make sure you don’t lose data, set up Redis persistence or use another method to persist the data, such as using Redis in the cloud with Microsoft Azure or Amazon AWS.
Set up connection in the Label Studio UI
In the Label Studio UI, do the following to set up the connection:
- Open Label Studio in your web browser.
- For a specific project, open Settings > Cloud Storage.
- Click Add Source Storage.
- In the dialog box that appears, select Redis Database as the storage type.
- (Optional) Update Redis configuration parameters. See Optional Redis configuration parameters on this page for the list.
- Click Add Storage.
- Repeat these steps for Target Storage to sync completed data annotations to a Redis database.
Optional Redis configuration parameters
You can specify additional parameters from the Label Studio UI.
Parameter | Description | Default |
---|---|---|
project_path | Path to the Label Studio project | |
path | Specify the path to the database | None |
db | The Redis database to use | 1 (for source) or 2 (for target) |
host | IP of the server hosting the database | None |
port | Port of the server hosting the database | None |
password | Server password | None |
Create connection on startup
Run the following command to launch Label Studio, configure the connection to your Redis database, scan for existing tasks, and load them into the app for labeling for a specific project.
label-studio start my_project --init --db redis
Local storage
If you have local files that you want to add to Label Studio, you can set up a specific local directory as source or target storage.
Set up connection in the Label Studio UI
In the Label Studio UI, do the following to set up the connection:
- Open Label Studio in your web browser.
- For a specific project, open Settings > Cloud Storage.
- Click Add Source Storage.
- In the dialog box that appears, select Local Files as the storage type.
- Specify the name of the local directory.
- (Optional) Adjust the remaining parameters. See Optional parameters on this page for more details.
- Click Add Storage.
Warning: Serving data from the local file system can be a security risk. If you’re sure you know what you’re doing, set LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true in your environment setup. Without this setting, URLs that point to local files won’t work.
Optional parameters
You can specify additional parameters from the Label Studio UI.
Parameter | Description | Default |
---|---|---|
prefix | Specify an internal folder or container | empty |
regex | Specify a regular expression to filter directory objects. Use “.*” to collect all objects. | Skips all directory objects. |
use_blob_urls | If true, treat every directory object as a source file. Use for resources like JPG, MP3, or similar file types. If false, directory objects are interpreted as tasks in Label Studio JSON format with one object per task. | false |