Sync data from external storage

Integrate popular cloud and external storage systems with Label Studio to collect new items uploaded to the buckets, containers, databases, or directories and return the annotation results so that you can use them in your machine learning pipelines.

Set up the following cloud and other storage systems with Label Studio:

Amazon S3
Google Cloud Storage
Microsoft Azure Blob storage
Redis database
Local storage

Troubleshooting

When working with an external cloud storage connection, keep the following in mind:

For Source storage:
- When “Treat every bucket object as a source file” is checked, Label Studio doesn’t import the data stored in the bucket, but instead creates references to the objects. Therefore, you have full access control over the data to be synced and shown on the labeling screen.
- When “Treat every bucket object as a source file” is unchecked, bucket files are assumed to be immutable; the only way to push an updated file’s state to Label Studio is to upload it with a new filename or delete all tasks that are associated with that file and resync.
Sync operations with external buckets only go one way. A sync either creates tasks from objects in the bucket (Source storage) or pushes annotations to the output bucket (Target storage). Changing something on the bucket side doesn’t guarantee consistency in results.
We recommend using a separate bucket folder for each Label Studio project.
Storage Regions: To minimize latency and improve efficiency, store data in cloud storage buckets that are geographically closer to your team rather than near the Label Studio server.

For more troubleshooting information, see Troubleshooting Label Studio.

How external storage connections and sync work

You can add source storage connections to sync data from an external source to a Label Studio project, and add target storage connections to sync annotations from Label Studio to external storage. Each source and target storage setup is project-specific. You can connect multiple buckets, containers, databases, or directories as source or target storage for a project.

Source storage

Label Studio does not automatically sync data from source storage. If you upload new data to a connected cloud storage bucket, sync the storage connection using the UI to add the new labeling tasks to Label Studio without restarting. You can also use the API to set up or sync storage connections. See Label Studio API and locate the relevant storage connection type.

Task data synced from cloud storage is not stored in Label Studio. Instead, the data is accessed using presigned URLs. You can also secure access to cloud storage using VPC and IP restrictions for your storage. For details, see Secure access to cloud storage.

Source storage permissions

If you set the import method to “Files”, the Label Studio backend only needs LIST permissions and won’t download any data from your buckets.
If you set the import method to “JSON”, the Label Studio backend requires GET permissions to read JSON files and convert them to Label Studio tasks.

When your users access labeling, the backend will attempt to resolve URI (e.g., s3://) to URL (https://) links. URLs will be returned to the frontend and loaded by the user’s browser. To load these URLs, the browser will require HEAD and GET permissions from your Cloud Storage. The HEAD request is made at the beginning and allows the browser to determine the size of the audio, video, or other files. The browser then makes a GET request to retrieve the file body.

Source storage Sync and URI resolving

Source storage functionality can be divided into two parts:

Sync - when Label Studio scans your storage and imports tasks from it.
URI resolving - when the Label Studio backend requests Cloud Storage to resolve URI links (e.g., s3://bucket/1.jpg) into HTTPS links (https://aws.amazon.com/bucket/1.jpg). This way, users’ browsers are able to load media.
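
As a rough illustration of the first half of URI resolving, the sketch below splits a storage URI into its scheme, bucket, and object key using only the Python standard library. The function name `split_storage_uri` is hypothetical and this is not Label Studio's own resolver. Producing the final HTTPS link also requires a signing call to the cloud provider's SDK, which is left out here.

```python
from urllib.parse import urlparse


def split_storage_uri(uri: str) -> tuple[str, str, str]:
    """Split a URI such as s3://bucket/path/1.jpg into (scheme, bucket, key)."""
    parsed = urlparse(uri)
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(f"not a storage URI: {uri!r}")
    # netloc holds the bucket or container name; path holds the object key.
    return parsed.scheme, parsed.netloc, parsed.path.lstrip("/")
```

For example, `split_storage_uri("s3://bucket/1.jpg")` separates the bucket name `bucket` from the key `1.jpg`, which are the two values a signing call would typically need.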

Import method

Label Studio Source Storages feature an “Import method” dropdown. This setting enables two different methods of loading tasks into Label Studio.

JSON

When set to “JSON”, tasks in JSON or JSONL/NDJSON format can be loaded directly from storage buckets into Label Studio. This approach is particularly helpful when dealing with complex tasks that involve multiple media sources.

You may put multiple tasks inside the same JSON file, but do not mix task formats inside the same file.

Example with bare tasks

task_01.json

{
  "image": "s3://bucket/1.jpg",
  "text": "opossums are awesome"
}

task_02.json

{
  "image": "s3://bucket/2.jpg",
  "text": "cats are awesome"
}

Or:

tasks.json

[
  {
    "image": "s3://bucket/1.jpg",
    "text": "opossums are awesome"
  },
  {
    "image": "s3://bucket/2.jpg",
    "text": "cats are awesome"
  }
]

Example with tasks, annotations and predictions

task_with_predictions_and_annotations_01.json

{
    "data": {
        "image": "s3://bucket/1.jpg",
        "text": "opossums are awesome"
    },
    "annotations": [...],  
    "predictions": [...]
}

task_with_predictions_and_annotations_02.json

{
    "data": {
      "image": "s3://bucket/2.jpg",
      "text": "cats are awesome"
    },
    "annotations": [...],
    "predictions": [...]
}

Or:

tasks_with_predictions_and_annotations.json

[
  {
      "data": {
          "image": "s3://bucket/1.jpg",
          "text": "opossums are awesome"
      },
      "annotations": [...],  
      "predictions": [...]
  },
  {
      "data": {
        "image": "s3://bucket/2.jpg",
        "text": "cats are awesome"
      },
      "annotations": [...],
      "predictions": [...]
  }
]

Example with JSONL

tasks.jsonl

{ "image": "s3://bucket/1.jpg", "text": "opossums are awesome" }
{ "image": "s3://bucket/2.jpg", "text": "cats are awesome" }

In Label Studio Enterprise and Starter Cloud editions, Parquet files can also be used to import tasks in the same way as JSON and JSONL.
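
To check files before uploading them, the following sketch parses one JSON or JSONL/NDJSON file into a list of tasks and rejects files that mix bare tasks with tasks wrapped in a "data" key. The function name `parse_tasks` and the "data"-key heuristic are assumptions made for illustration; this is not Label Studio's importer, and Parquet is not covered.

```python
import json


def parse_tasks(filename: str, text: str) -> list[dict]:
    """Return the tasks found in one JSON or JSONL/NDJSON file."""
    if filename.endswith((".jsonl", ".ndjson")):
        # One JSON object per non-empty line.
        tasks = [json.loads(line) for line in text.splitlines() if line.strip()]
    else:
        loaded = json.loads(text)
        # A .json file holds either a single task or a list of tasks.
        tasks = loaded if isinstance(loaded, list) else [loaded]

    # Heuristic: a task counts as "wrapped" when its fields sit under a
    # "data" key, next to optional "annotations" and "predictions".
    wrapped = {"data" in task for task in tasks}
    if len(wrapped) > 1:
        raise ValueError(f"{filename}: bare and wrapped tasks are mixed")
    return tasks
```

Running this over each file in a bucket prefix before a sync is one way to catch a malformed file, such as a missing comma, before it reaches the import step.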

Files

When set to “Files”, Label Studio automatically lists files from the storage bucket and constructs tasks. This is only possible for simple labeling tasks that involve a single media source (such as an image, text, etc.).
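
Conceptually, the “Files” method turns a bucket listing into one task per object. The sketch below imitates that idea under stated assumptions: the data key `image`, the `s3://` scheme, and the use of a full-string regex match are all illustrative choices, and `tasks_from_listing` is a hypothetical helper rather than Label Studio code.

```python
import re


def tasks_from_listing(bucket: str, keys: list[str], regex: str = ".*",
                       data_key: str = "image") -> list[dict]:
    """Build one task per listed object whose key matches the filter."""
    pattern = re.compile(regex)
    return [
        {data_key: f"s3://{bucket}/{key}"}
        for key in keys
        # Skip "folder" placeholder keys and anything the filter rejects.
        if not key.endswith("/") and pattern.fullmatch(key)
    ]
```

Because each task carries only a reference to a single object, a task that needs several media sources cannot be expressed this way, which is why the JSON method exists.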

Pre-signed URLs vs. Storage proxies

Label Studio can fetch media data from cloud storage through two secure mechanisms: pre-signed URLs and a proxy. Which one is used depends on whether Use pre-signed URLs is toggled on or off when you set up your source storage. Use pre-signed URLs is on by default. Proxy storage is enabled when Use pre-signed URLs is OFF.

Pre-signed URLs

In this scenario, your browser receives an HTTP 303 redirect to a time-limited S3/GCS/Azure presigned URL. This is the default behavior.

The main benefit of pre-signed URLs is that your media files stay isolated from the Label Studio network as much as possible.

The permissions required for this are already included in the cloud storage configuration documentation below.

Proxy storage

When in proxy mode, the Label Studio backend fetches objects server-side and streams them directly to the browser.

This has multiple benefits, including:

Security
- Access to media files is further restricted based on Label Studio user roles and project access.
- This access is applied to cached files. This means that even if the media is cached, access will be restricted to that file if a user’s access to the task is revoked.
- Data stays within the Label Studio network boundary. This is especially useful for on-prem environments that need to maintain a single entry point for their network traffic.
Configuration
- No CORS settings are needed.
- No pre-signed permissions are needed.

To allow proxy storage, you need to ensure your permissions include the following:

AWS S3

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::your-bucket-name",
                "arn:aws:s3:::your-bucket-name/*"
            ]
        }
    ]
}

Google Cloud Storage

storage.objects.get - Read object data and metadata
storage.objects.list - List objects in the bucket (if using prefix)

Azure Blob Storage

Add the Storage Blob Data Reader role, which includes:

Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read
Microsoft.Storage/storageAccounts/blobServices/containers/blobs/getTags/action

Note for on-prem deployments

Large media files are streamed in sequential 8 MB chunks, which are split into different GET requests. This can result in frequent requests to the backend to get the next portion of data and uses additional resources.

You can configure this using the following environment variables:

RESOLVER_PROXY_MAX_RANGE_SIZE - Defaults to 8 MB, and defines the largest chunk size returned per request.
RESOLVER_PROXY_TIMEOUT - Defaults to 20 seconds, and defines the maximum time uWSGI workers spend on a single request.
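
To estimate how many backend requests a large file produces in proxy mode, the sketch below lists the HTTP Range header values a client would send when each response is capped at a maximum range size. It assumes 1 MB means 1024 * 1024 bytes and that the chunks are strictly sequential; `chunk_ranges` is a hypothetical helper, not part of Label Studio.

```python
MB = 1024 * 1024


def chunk_ranges(file_size: int, max_range_size: int = 8 * MB) -> list[str]:
    """Return the HTTP Range header values needed to read a file in order."""
    ranges = []
    start = 0
    while start < file_size:
        # Range end offsets are inclusive, hence the -1.
        end = min(start + max_range_size, file_size) - 1
        ranges.append(f"bytes={start}-{end}")
        start = end + 1
    return ranges
```

Under these assumptions a 20 MB file needs three requests with the default setting, and raising the maximum range size lowers the request count at the cost of longer individual requests.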

Target storage

When annotators click Submit or Update while labeling tasks, Label Studio saves annotations in the Label Studio database.

If you configure target storage, annotations are sent to target storage after you click Sync for the configured target storage connection. The target storage receives a JSON-formatted export of each annotation. See Label Studio JSON format of annotated tasks for details about how exported tasks appear in target storage.

You can also delete annotations in target storage when they are deleted in Label Studio. See Set up target storage connection in the Label Studio UI for more details.

Target storage permissions

Target storage requires PUT permission. DELETE permission is optional.

Amazon S3

Connect your Amazon S3 bucket to Label Studio to retrieve labeling tasks or store completed annotations.

For details about how Label Studio secures access to cloud storage, see Secure access to cloud storage.

Configure access to your S3 bucket

Before you set up your S3 bucket or buckets with Label Studio, configure access and permissions. These steps assume that you’re using the same AWS role to manage both source and target storage with Label Studio. If you only use S3 for source storage, Label Studio does not need PUT access to the bucket.

Enable programmatic access to your bucket. See the Amazon Boto3 configuration documentation for more on how to set up access to your S3 bucket.

note

A session token is only required in case of temporary security credentials. See the AWS Identity and Access Management documentation on Requesting temporary security credentials.

Assign the following role policy to an account you set up to retrieve source tasks and store annotations in S3, replacing <your_bucket_name> with your bucket name:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::<your_bucket_name>",
                "arn:aws:s3:::<your_bucket_name>/*"
            ]
        }
    ]
}
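
The policy above can also be generated per connection type. The following sketch assumes that `s3:PutObject` is only needed for target storage and `s3:DeleteObject` only when deleted annotations should also be removed from the bucket. The function name and its parameters are hypothetical, and the generated document should be reviewed before it is attached to a role.

```python
import json


def build_s3_policy(bucket: str, target: bool = False,
                    sync_deletes: bool = False) -> dict:
    """Return an IAM policy document for a Label Studio S3 connection."""
    actions = ["s3:ListBucket", "s3:GetObject"]
    if target:
        actions.append("s3:PutObject")  # write annotations to the bucket
        if sync_deletes:
            actions.append("s3:DeleteObject")  # mirror deleted annotations
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": actions,
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        }],
    }


# Print a read-only policy for a bucket used as source storage only.
print(json.dumps(build_s3_policy("your-bucket-name"), indent=4))
```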

note

"s3:PutObject" is only needed for target storage connections, and "s3:DeleteObject" is only needed for target storage connections in Label Studio Enterprise where you want to allow deleted annotations in Label Studio to also be deleted in the target S3 bucket.

Set up cross-origin resource sharing (CORS) access to your bucket, using a policy that allows GET access from the same host name as your Label Studio deployment. See Configuring cross-origin resource sharing (CORS) in the Amazon S3 User Guide. Use or modify the following example:

[
    {
        "AllowedHeaders": [
            "*"
        ],
        "AllowedMethods": [
            "GET"
        ],
        "AllowedOrigins": [
            "*"
        ],
        "ExposeHeaders": [
            "x-amz-server-side-encryption",
            "x-amz-request-id",
            "x-amz-id-2"
        ],
        "MaxAgeSeconds": 3000
    }
]

Set up connection in the Label Studio UI

After you configure access to your S3 bucket, do the following to set up Amazon S3 as a data source connection:

Open Label Studio in your web browser.
For a specific project, open Settings > Cloud Storage.
Click Add Source Storage.
In the dialog box that appears, select Amazon S3 as the storage type.
In the Storage Name field, type a name for the storage to appear in the Label Studio UI.
Specify the name of the S3 bucket, and if relevant, the bucket prefix to specify an internal folder or container.
Adjust the remaining parameters:
- In the File Filter Regex field, specify a regular expression to filter bucket objects. Use .* to collect all objects.
- In the Region Name field, specify the AWS region name. For example us-east-1.
- (Optional) In the S3 Endpoint field, specify an S3 endpoint if you want to override the URL created by S3 to access your bucket.
- In the Access Key ID field, specify the access key ID of the temporary security credentials for an AWS account with access to your S3 bucket.
- In the Secret Access Key field, specify the secret key of the temporary security credentials for an AWS account with access to your S3 bucket.
- In the Session Token field, specify a session token of the temporary security credentials for an AWS account with access to your S3 bucket.
- In the Import method dropdown, choose how to import your data:
  - Files - Automatically creates a task for each storage object (e.g. JPG, MP3, TXT). Use this if your bucket contains BLOB storage files such as JPG, MP3, or similar file types.
  - JSON - Treat each JSON or JSONL file as a task definition (one or more tasks per file). Use this if your bucket contains JSON or JSONL files that define your tasks.
- (Optional) Enable Scan all sub-folders to include files from all nested folders within your S3 bucket prefix.
- In the Use pre-signed URLs (On) / Proxy through Label Studio (Off) toggle, choose how media is loaded:
  - ON (Pre-signed URLs) - All data bypasses the platform and user browsers directly read data from storage.
  - OFF (Proxy) - The platform proxies media using its own backend.
- Set the Expire pre-signed URLs (minutes) counter to control how long pre-signed URLs remain valid.
Click Add Storage.

After adding the storage, click Sync to collect tasks from the bucket, or make an API call to sync import storage.

Set up target storage connection in the Label Studio UI

After you configure access to your S3 bucket, do the following to set up Amazon S3 as a target storage connection:

Open Label Studio in your web browser.
For a specific project, open Settings > Cloud Storage.
Click Add Target Storage.
In the dialog box that appears, select Amazon S3 as the storage type.
In the Storage Name field, type a name for the storage to appear in the Label Studio UI.
Specify the name of the S3 bucket, and if relevant, the bucket prefix to specify an internal folder or container.
Adjust the remaining parameters:
- In the Region Name field, specify the AWS region name. For example us-east-1.
- (Optional) In the S3 Endpoint field, specify an S3 endpoint if you want to override the URL created by S3 to access your bucket.
- In the Access Key ID field, specify the access key ID of the temporary security credentials for an AWS account with access to your S3 bucket.
- In the Secret Access Key field, specify the secret key of the temporary security credentials for an AWS account with access to your S3 bucket.
- In the Session Token field, specify a session token of the temporary security credentials for an AWS account with access to your S3 bucket.
Click Add Storage.

After adding the storage, click Sync to send annotations to the bucket, or make an API call to sync export storage.

S3 connection with IAM role access

In Label Studio Enterprise, you can use an IAM role configured with an external ID to access S3 bucket contents securely. An ‘external ID’ is a unique identifier that enhances security by ensuring that only trusted entities can assume the role, reducing the risk of unauthorized access. See how to Set up an S3 connection with IAM role access in the Enterprise documentation.

Add storage with the Label Studio API

You can also create a storage connection using the Label Studio API.

See Create new import storage then sync the import storage.
See Create export storage and after annotating, sync the export storage.
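
As a minimal sketch of what such API calls might look like, the code below builds (but does not send) two authenticated POST requests with the Python standard library: one to create an S3 import storage and one to sync it. The endpoint paths, the payload field names, the `Token` authorization scheme, the host, and the storage id `42` are all assumptions for illustration; confirm each of them against the Label Studio API reference for your version.

```python
import json
from urllib.request import Request


def storage_request(base_url: str, token: str, path: str, payload: dict) -> Request:
    """Build (but do not send) an authenticated POST request."""
    return Request(
        url=f"{base_url.rstrip('/')}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Create an S3 import storage for project 1 (field names are assumptions).
create = storage_request(
    "http://localhost:8080", "YOUR_API_TOKEN", "/api/storages/s3",
    {"project": 1, "title": "My S3 source", "bucket": "your-bucket-name",
     "prefix": "dataset1", "regex_filter": ".*", "use_blob_urls": True},
)
# Sync the storage once it exists; 42 stands in for the returned storage id.
sync = storage_request("http://localhost:8080", "YOUR_API_TOKEN",
                       "/api/storages/s3/42/sync", {})
# To send either request: urllib.request.urlopen(create)
```

Building the request separately from sending it keeps the example free of side effects and makes the URL, headers, and body easy to inspect before anything reaches the server.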

IP Filtering and VPN for Enhanced Security for S3 storage

To maximize security and data isolation behind a VPC, restrict access to the Label Studio backend and internal network users by setting IP restrictions for storage, allowing only trusted networks to perform task synchronization and generate pre-signed URLs. Additionally, establish a secure connection between storage and users’ browsers by configuring a VPC private endpoint or limiting storage access to specific IPs or VPCs.

Read more about Source storage behind your VPC.

Bucket Policy Example for S3 storage

warning

These example bucket policies explicitly deny access to any requests outside the allowed IP addresses. Even the user that entered the bucket policy can be denied access to the bucket if the user doesn't meet the conditions. Therefore, make sure to review the bucket policy carefully before saving it. If you get accidentally locked out, see How to regain access to an Amazon S3 bucket.

Go to your S3 bucket and then Permissions > Bucket Policy in the AWS management console. Add the following policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyAccessUnlessFromSaaSIPsForListAndGet",
            "Effect": "Deny",
            "Principal": {
                "AWS": "arn:aws:iam::490065312183:role/label-studio-app-production"
            },
            "Action": [
                "s3:ListBucket",
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::YOUR_BUCKET_NAME",
                "arn:aws:s3:::YOUR_BUCKET_NAME/*"
            ],
            "Condition": {
                "NotIpAddress": {
                    "aws:SourceIp": [
                      //// IP ranges for app.humansignal.com from the documentation
                        "x.x.x.x/32",
                        "x.x.x.x/32",
                        "x.x.x.x/32"
                    ]
                }
            }
        },
//// Optional
        {
            "Sid": "DenyAccessUnlessFromVPNForGetObject",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*",
            "Condition": {
                "NotIpAddress": {
                    "aws:SourceIp": "YOUR_VPN_SUBNET/32"
                }
            }
        }
    ]
}
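
Because a mistake in the allowed ranges can lock you out, it can help to test candidate IP addresses before saving the policy. The sketch below mimics how a Deny statement guarded by `NotIpAddress` on `aws:SourceIp` behaves: the request is denied only when the source IP falls outside every listed range. It is a local approximation using Python's `ipaddress` module, not an AWS policy evaluator, and the addresses used with it are placeholders from documentation ranges.

```python
import ipaddress


def is_denied(source_ip: str, allowed_cidrs: list[str]) -> bool:
    """Mimic a Deny statement guarded by NotIpAddress on aws:SourceIp."""
    address = ipaddress.ip_address(source_ip)
    networks = [ipaddress.ip_network(cidr) for cidr in allowed_cidrs]
    # The Deny applies only when the caller is outside every allowed range.
    return not any(address in network for network in networks)
```

For instance, with `["203.0.113.10/32", "198.51.100.0/24"]` as the allowed ranges, a request from `198.51.100.77` passes while one from `192.0.2.1` is denied.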

Google Cloud Storage

Dynamically import tasks and export annotations to Google Cloud Storage (GCS) buckets in Label Studio. For details about how Label Studio secures access to cloud storage, see Secure access to cloud storage.

Prerequisites

To connect your GCS bucket with Label Studio, set up the following:

Enable programmatic access to your bucket. See Cloud Storage Client Libraries in the Google Cloud Storage documentation for how to set up access to your GCS bucket.
Set up authentication to your bucket. Your account must have the Service Account Token Creator and Storage Object Viewer roles and storage.buckets.get access permission. See Setting up authentication and IAM permissions for Cloud Storage in the Google Cloud Storage documentation.
If you’re using a service account to authorize access to the Google Cloud Platform, make sure to activate it. See gcloud auth activate-service-account in the Google Cloud SDK: Command Line Interface documentation.
Set up cross-origin resource sharing (CORS) access to your bucket, using a policy that allows GET access from the same host name as your Label Studio deployment. See Configuring cross-origin resource sharing (CORS) in the Google Cloud User Guide. Use or modify the following example:
```
echo '[
   {
      "origin": ["*"],
      "method": ["GET"],
      "responseHeader": ["Content-Type","Access-Control-Allow-Origin"],
      "maxAgeSeconds": 3600
   }
]' > cors-config.json
```

Replace YOUR_BUCKET_NAME with your actual bucket name in the following command to update CORS for your bucket:

gsutil cors set cors-config.json gs://YOUR_BUCKET_NAME

Google Cloud Storage with application credentials

Set up connection in the Label Studio UI

In the Label Studio UI, do the following to set up the connection:

Open Label Studio in your web browser.
For a specific project, open Settings > Cloud Storage.
Click Add Source Storage.
In the dialog box that appears, select Google Cloud Storage as the storage type.
Type a name for the storage to appear in the Label Studio UI.
Specify the name of the GCS bucket, and if relevant, the bucket prefix to specify an internal folder or container.
Adjust the remaining optional parameters:
- In the File Filter Regex field, specify a regular expression to filter bucket objects. Use .* to collect all objects.
- In the Import method dropdown, choose how to import your data:
  - Files - Automatically creates a task for each storage object (e.g. JPG, MP3, TXT). Use this if your bucket contains BLOB storage files such as JPG, MP3, or similar file types.
  - JSON - Treat each JSON or JSONL file as a task definition (one or more tasks per file). Use this if your bucket contains JSON or JSONL files that define your tasks.
- In the Use pre-signed URLs (On) / Proxy through Label Studio (Off) toggle, choose how media is loaded:
  - ON (Pre-signed URLs) - All data bypasses the platform and user browsers directly read data from storage.
  - OFF (Proxy) - The platform proxies media using its own backend.
- Set the Expire pre-signed URLs (minutes) counter to control how long pre-signed URLs remain valid.
In the Google Application Credentials field, add a JSON file with the GCS credentials you created to manage authentication for your bucket.

On-prem users: Alternatively, you can use the GOOGLE_APPLICATION_CREDENTIALS environment variable and/or set up Application Default Credentials, so that users do not need to configure credentials manually. See Application Default Credentials for enhanced security below.
Click Add Storage.
Repeat these steps for Target Storage to sync completed data annotations to a bucket.

After adding the storage, click Sync to collect tasks from the bucket, or make an API call to sync import storage.

Application Default Credentials for enhanced security for GCS

If you use Label Studio on-premises with Google Cloud Storage, you can set up Application Default Credentials to provide cloud storage authentication globally for all projects, so users do not need to configure credentials manually.

The recommended way to do this is by using the GOOGLE_APPLICATION_CREDENTIALS environment variable. For example:

export GOOGLE_APPLICATION_CREDENTIALS=json-file-with-GCP-creds-23441-8f8sd99vsd115a.json
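
Before starting Label Studio, you may want to confirm that the variable points to a readable key file. The sketch below is one way to do that, assuming the file is a service account key in Google's JSON key format (with `type` and `client_email` fields). The function name `check_gcp_credentials` is hypothetical, and other credential types supported by Application Default Credentials are not handled.

```python
import json
import os
from collections.abc import Mapping


def check_gcp_credentials(env: Mapping[str, str] = os.environ) -> str:
    """Return the service account email from the configured key file."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    with open(path, encoding="utf-8") as handle:
        info = json.load(handle)
    if info.get("type") != "service_account":
        raise RuntimeError(f"{path} is not a service account key file")
    return info["client_email"]
```

The returned email is the identity that needs the roles listed in the prerequisites, so printing it is a quick way to confirm you granted permissions to the right account.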

Add storage with the Label Studio API

See our API documentation.

IP filtering for enhanced security for GCS

Google Cloud Storage offers bucket IP filtering as a powerful security mechanism to restrict access to your data based on source IP addresses. This feature helps prevent unauthorized access and provides fine-grained control over who can interact with your storage buckets.

Read more about Source storage behind your VPC.

Common Use Cases:

Restrict bucket access to only your organization’s IP ranges
Allow access only from specific VPC networks in your infrastructure
Secure sensitive data by limiting access to known IP addresses
Control access for third-party integrations by whitelisting their IPs

How to Set Up IP Filtering

First, create your GCS bucket through the console or CLI.

Create a JSON configuration file to define IP filtering rules. You have two options.

For public IP ranges:

{
  "mode": "Enabled", 
  "publicNetworkSource": {
    "allowedIpCidrRanges": [
      "xxx.xxx.xxx.xxx", // Your first IP address
      "xxx.xxx.xxx.xxx", // Your second IP address
      "xxx.xxx.xxx.xxx/xx" // Your IP range in CIDR notation
    ]
  }
}

For VPC network sources:

{
  "mode": "Enabled",
  "vpcNetworkSources": [
    {
      "network": "projects/PROJECT_ID/global/networks/NETWORK_NAME",
      "allowedIpCidrRanges": [
        RANGE_CIDR
      ]
    }
  ]
}
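
Standard JSON has no comment syntax, so one cautious option is to generate the configuration file programmatically instead of editing a template by hand. The sketch below builds the public-network variant and validates each entry with Python's `ipaddress` module first. The function name `public_ip_filter` is hypothetical; the key names are copied from the example above.

```python
import ipaddress
import json


def public_ip_filter(cidrs: list[str]) -> str:
    """Return a comment-free IP filter config for public network sources."""
    # ip_network raises ValueError on malformed input; a bare address
    # such as "203.0.113.10" is accepted and treated as a /32.
    for cidr in cidrs:
        ipaddress.ip_network(cidr)
    config = {
        "mode": "Enabled",
        "publicNetworkSource": {"allowedIpCidrRanges": cidrs},
    }
    return json.dumps(config, indent=2)
```

Writing the returned string to a file gives you the IP filter configuration file to pass to the bucket update command.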

Apply the IP filtering rules to your bucket using the following command:

gcloud alpha storage buckets update gs://BUCKET_NAME --ip-filter-file=IP_FILTER_CONFIG_FILE

To remove IP filtering rules when no longer needed:

gcloud alpha storage buckets update gs://BUCKET_NAME --clear-ip-filter

Limitations to Consider

Maximum of 200 IP CIDR blocks across all rules
Maximum of 25 VPC networks in the IP filter rules
Not supported for dual-regional buckets
May affect access from certain Google Cloud services

Microsoft Azure Blob storage

Connect your Microsoft Azure Blob storage container with Label Studio. For details about how Label Studio secures access to cloud storage, see Secure access to cloud storage.

Prerequisites

You must set two environment variables in Label Studio to connect to Azure Blob storage:

AZURE_BLOB_ACCOUNT_NAME to specify the name of the storage account.
AZURE_BLOB_ACCOUNT_KEY to specify the secret key for the storage account.

Configure the specific Azure Blob container that you want Label Studio to use in the UI. In most cases involving CORS issues, the GET permission (*/GET/*/Access-Control-Allow-Origin/3600) is necessary within the Resource Sharing tab.

Set up connection in the Label Studio UI

In the Label Studio UI, do the following to set up the connection:

Open Label Studio in your web browser.
For a specific project, open Settings > Cloud Storage.
Click Add Source Storage.
In the dialog box that appears, select Microsoft Azure as the storage type.
Type a name for the storage to appear in the Label Studio UI.
Specify the name of the Azure Blob container, and if relevant, the container prefix to specify an internal folder or container.
Adjust the remaining optional parameters:
- In the File Filter Regex field, specify a regular expression to filter bucket objects. Use .* to collect all objects.
- In the Account Name field, specify the account name for the Azure storage. You can also set this field as an environment variable, AZURE_BLOB_ACCOUNT_NAME.
- In the Account Key field, specify the secret key to access the storage account. You can also set this field as an environment variable, AZURE_BLOB_ACCOUNT_KEY.
- Enable Treat every bucket object as a source file if your bucket contains BLOB storage files such as JPG, MP3, or similar file types. This setting creates a URL for each bucket object to use for labeling, for example azure-blob://container-name/image.jpg. Leave this option disabled if you have multiple JSON files in the bucket with one task per JSON file.
- Choose whether to disable Use pre-signed URLs, or shared access signatures.
  - ON - Label Studio generates a pre-signed URL to load media.
  - OFF - The platform proxies media using its own backend.
- Adjust the counter for how many minutes the shared access signatures are valid.
Click Add Storage.
Repeat these steps for Target Storage to sync completed data annotations to a container.

After adding the storage, click Sync to collect tasks from the container, or make an API call to sync import storage.

Add storage with the Label Studio API

You can also create a storage connection using the Label Studio API.

See Create new import storage then sync the import storage.
See Create export storage and after annotating, sync the export storage.

Redis database

You can also store your tasks and annotations in a Redis database. You must store the tasks and annotations in different databases. You might want to use a Redis database if you find that relying on a file-based cloud storage connection is slow for your datasets.

Currently, this configuration is only supported if you host the Redis database in the default mode, with the default IP address.

Label Studio does not manage the Redis database for you. See the Redis Quick Start for details about hosting and managing your own Redis database. Because Redis is an in-memory database, data saved in Redis does not persist. To make sure you don’t lose data, set up Redis persistence or use another method to persist the data, such as using Redis in the cloud with Microsoft Azure or Amazon AWS.

Task format for Source Redis Storage

Label Studio only supports string values for Redis databases, which should represent Label Studio tasks in JSON format.

For example:

'ls-task-1': '{"image": "http://example.com/1.jpg"}'
'ls-task-2': '{"image": "http://example.com/2.jpg"}'
...

> redis-cli -n 1
127.0.0.1:6379[1]> SET ls-task-1 '{"image": "http://example.com/1.jpg"}'
OK
127.0.0.1:6379[1]> GET ls-task-1
"{\"image\": \"http://example.com/1.jpg\"}"
127.0.0.1:6379[1]> TYPE ls-task-1
string
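
To load many tasks at once, the same key and value layout can be prepared in Python. The sketch below only builds the mapping of keys to JSON strings; the `ls-task-` prefix follows the example above, and `to_redis_items` is a hypothetical helper. Writing the items to Redis requires a client library, which is mentioned only in a comment.

```python
import json


def to_redis_items(tasks: list[dict], prefix: str = "ls-task-") -> dict[str, str]:
    """Map each task to a key and a JSON string value, numbered from 1."""
    return {
        f"{prefix}{index}": json.dumps(task)
        for index, task in enumerate(tasks, start=1)
    }


items = to_redis_items([
    {"image": "http://example.com/1.jpg"},
    {"image": "http://example.com/2.jpg"},
])
# With the third-party redis-py client installed, the items could be written
# to database 1 with: redis.Redis(host="localhost", port=6379, db=1).mset(items)
```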

Set up connection in the Label Studio UI

In the Label Studio UI, do the following to set up the connection:

Open Label Studio in your web browser.
For a specific project, open Settings > Cloud Storage.
Click Add Source Storage.
In the dialog box that appears, select Redis Database as the storage type.
Update the Redis configuration parameters:
- In the Path field, specify the path to the database. Used as the keys prefix, values under this path are scanned for tasks.
- In the Password field, specify the server password.
- In the Host field, specify the IP of the server hosting the database, or localhost.
- In the Port field, specify the port that you can use to access the database.
- In the File Filter Regex field, specify a regular expression to filter database objects. Use .* to collect all objects.
- Enable Treat every bucket object as a source file if your database contains files such as JPG, MP3, or similar file types. This setting creates a URL for each database object to use for labeling. Leave this option disabled if you have multiple JSON files in the database, with one task per JSON file.
Click Add Storage.
Repeat these steps for Target Storage to sync completed data annotations to a database.

After adding the storage, click Sync to collect tasks from the database, or make an API call to sync import storage.

Add storage with the Label Studio API

You can also create a storage connection using the Label Studio API.

See Create new import storage then sync the import storage.
See Create export storage and after annotating, sync the export storage.

Local storage

If you have local files that you want to add to Label Studio from a specific directory, you can set up a specific local directory on the machine where Label Studio is running as source or target storage. Label Studio steps through the directory recursively to read tasks.

Prerequisites

Add these variables to your environment setup:

LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true
LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=/home/user (or LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=C:\\data\\media for Windows).

Without these settings, Local storage and URLs in tasks that point to local files won’t work. Keep in mind that serving data from the local file system can be a security risk. See Set environment variables for more about using environment variables.

Set up connection in the Label Studio UI

In the Label Studio UI, do the following to set up the connection:

Open Label Studio in your web browser.
For a specific project, open Settings > Cloud Storage.
Click Add Source Storage.

In the dialog box that appears, select Local Files as the storage type.
In the Storage Title field, type a name for the storage to appear in the Label Studio UI.
Specify an Absolute local path to the directory with your files. The local path must be an absolute path and include the LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT value.

For example, if LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=/home/user, then your local path must be /home/user/dataset1. For more about that environment variable, see Run Label Studio on Docker and use local storage.

Note: If you are using Windows, ensure that you use backslashes when entering your Absolute local path.

(Optional) In the File Filter Regex field, specify a regular expression to filter the files in the directory. Use .* to collect all files.
(Optional) Toggle Treat every bucket object as a source file.
- Enable this option if you want to create Label Studio tasks from media files automatically, such as JPG, MP3, or similar file types. Use this option for labeling configurations with one source tag.
- Disable this option if you want to import tasks in Label Studio JSON format directly from your storage. Use this option for complex labeling configurations with HyperText or multiple source tags.
Click Add Storage.
Repeat these steps for Add Target Storage to use a local file directory for exporting.

After adding the storage, click Sync to collect tasks from the directory, or make an API call to sync import storage.
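The path and filter rules from the steps above can be checked before you fill in the dialog. The following is a minimal sketch with hypothetical helper names. It assumes POSIX-style paths (use `PureWindowsPath` on Windows) and assumes that the File Filter Regex is matched from the start of each file name, which may differ from how Label Studio applies the pattern internally.

```python
import re
from pathlib import PurePosixPath


def is_under_document_root(local_path, document_root):
    """True if local_path is absolute and equal to or nested inside document_root.

    Purely lexical check: it does not resolve ".." segments or symlinks.
    """
    path = PurePosixPath(local_path)
    root = PurePosixPath(document_root)
    if not path.is_absolute():
        return False
    return path == root or root in path.parents


def preview_filter(file_names, pattern=".*"):
    """Return the names that the given filter pattern would keep.

    Assumption: the pattern is anchored at the start of the name (re.match).
    """
    regex = re.compile(pattern)
    return [name for name in file_names if regex.match(name)]


if __name__ == "__main__":
    print(is_under_document_root("/home/user/dataset1", "/home/user"))
    print(preview_filter(["1.jpg", "2.jpg", "tasks.json"], r".*\.jpg"))
```

Previewing the filter on a handful of real file names is a cheap way to catch a pattern that is too broad or too narrow before tasks are created by a sync.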

Tasks with local storage file references

In cases where your tasks have multiple or complex input sources, such as multiple object tags in the labeling config or a HyperText tag with custom data values, you must prepare tasks manually.

In those cases, repeat the steps above to create the local storage, but skip the optional steps. The Absolute local path must point to the directory that contains the files (not the tasks) that you want to reference from your tasks. The directory can also contain other directories or files; you specify which ones to use inside each task.

Differences from the instructions above:

7. File Filter Regex: leave empty (because you specify the files inside the tasks).
8. Treat every bucket object as a source file: disable (because you specify the files inside the tasks).

The dialog should look like this: Screenshot of the local storage settings for user task.

Click Add Storage, but do not sync the storage after creating it (do not click Sync Storage). This avoids automatically creating tasks from the storage files.

When referencing your files within a task, adhere to the following guidelines:

“Absolute local path” must be a sub-directory of LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT (see step 6).
All file paths must begin with /data/local-files/?d=.
The rest of each path is relative to LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT. In the following example, the first directory is dataset1. For instance, if you have mixed data types in tasks, including
- audio files 1.wav and 2.wav within an audio folder, and
- image files 1.jpg and 2.jpg within an images folder, construct the paths as follows:

[
  {
    "id": 1,
    "data": {
      "audio": "/data/local-files/?d=dataset1/audio/1.wav",
      "image": "/data/local-files/?d=dataset1/images/1.jpg"
    }
  },
  {
    "id": 2,
    "data": {
      "audio": "/data/local-files/?d=dataset1/audio/2.wav",
      "image": "/data/local-files/?d=dataset1/images/2.jpg"
    }
  }
]
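For more than a few files, writing these paths by hand is error-prone. The sketch below generates the same task list from file locations on disk. It is an illustration only: the helper names `local_file_url` and `build_tasks` are hypothetical, the `audio` and `images` folder layout is taken from the example above, and the code assumes POSIX-style paths with a document root of `/home/user`.

```python
import json
from pathlib import PurePosixPath

# Prefix required for all local storage file references in tasks.
LOCAL_FILES_PREFIX = "/data/local-files/?d="


def local_file_url(file_path, document_root):
    """Build a task URL for a file stored under the document root.

    Raises ValueError if file_path is not inside document_root.
    """
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(document_root))
    return LOCAL_FILES_PREFIX + str(relative)


def build_tasks(document_root, dataset_dir, names):
    """Create one task per name, pairing <name>.wav with <name>.jpg."""
    dataset = PurePosixPath(dataset_dir)
    tasks = []
    for task_id, name in enumerate(names, start=1):
        tasks.append({
            "id": task_id,
            "data": {
                "audio": local_file_url(dataset / "audio" / f"{name}.wav", document_root),
                "image": local_file_url(dataset / "images" / f"{name}.jpg", document_root),
            },
        })
    return tasks


if __name__ == "__main__":
    tasks = build_tasks("/home/user", "/home/user/dataset1", ["1", "2"])
    print(json.dumps(tasks, indent=2))
```

Redirecting the output to a file gives a JSON document with the same structure as the example above.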

There are several ways to add your custom tasks: the API, the web interface, or another storage. The simplest is to use the Import button on the Data Manager page. Drag and drop your JSON file into the window, then click the blue Import button.

Local Storage with Custom Task Format

This video tutorial demonstrates how to set up Local Storage from scratch and import JSON tasks in a complex task format that are linked to the Local Storage files.

Add storage with the Label Studio API

You can also create a storage connection using the Label Studio API.

See Create new import storage then sync the import storage.
See Create export storage and after annotating, sync the export storage.
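The two API steps for source storage (create, then sync) might be prepared as follows. Treat this as a hedged sketch rather than a reference: the endpoint paths `/api/storages/localfiles` and `/api/storages/localfiles/{id}/sync`, the payload field names, and the `Token` authorization scheme are assumptions that you should verify against the API reference for your Label Studio version. The helper names, base URL, and token are placeholders. The code only builds the requests; nothing is sent until you pass them to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_create_storage_request(base_url, token, project_id, path, title,
                                 regex_filter=".*", use_blob_urls=True):
    """Build (but do not send) a request that creates a local files import storage.

    Assumed endpoint and field names; check them against the API reference.
    """
    payload = {
        "project": project_id,
        "title": title,
        "path": path,                    # Absolute local path under the document root
        "regex_filter": regex_filter,    # File Filter Regex
        "use_blob_urls": use_blob_urls,  # Treat every bucket object as a source file
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/storages/localfiles",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Token {token}", "Content-Type": "application/json"},
        method="POST",
    )


def build_sync_request(base_url, token, storage_id):
    """Build (but do not send) a request that syncs an existing import storage."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/storages/localfiles/{storage_id}/sync",
        data=b"",
        headers={"Authorization": f"Token {token}"},
        method="POST",
    )
```

Keeping request construction separate from sending makes it easy to inspect the URL and payload first. For tasks with local storage file references, create the storage but skip the sync request, as described earlier in this section.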

Set up local storage with Docker

If you’re using Label Studio in Docker, you need to mount the local directory that you want to access as a volume when you start the Docker container. See Run Label Studio on Docker and use local storage.

Troubleshooting cloud storage

For more troubleshooting information, see Troubleshooting Label Studio.
