Secure Label Studio

Beta documentation: Label Studio Enterprise v2.0.0 is currently in Beta. As a result, this documentation might not reflect the current functionality of the product.

Label Studio provides many ways to secure access to your data and your deployment architecture.

All application component interactions are encrypted using the TLS protocol.

Role-based access control and federated access to cloud storage using SAML are available only in Label Studio Enterprise deployments. Label Studio Enterprise is available as on-premises software that you manage, or as a Software-as-a-Service (SaaS) offering.

If you’re running the open source version in production, restrict access to the Label Studio server by opening only the required ports on the server. Label Studio secures connections to the web application by enforcing HTTPS and secure cookies.
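As a rough way to check that only the required ports are open, the sketch below probes a list of TCP ports and reports any that accept connections but are not on your allowlist. The function names and the port numbers in the usage note are hypothetical and not part of Label Studio. Run it only against servers you manage.

```python
import socket


def open_ports(host, ports, timeout=0.5):
    """Return the subset of `ports` that accept a TCP connection on `host`."""
    reachable = []
    for port in ports:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=timeout):
                reachable.append(port)
        except OSError:
            # Refused, filtered, or timed out: treat the port as closed.
            pass
    return reachable


def unexpected_ports(host, candidate_ports, allowed_ports, timeout=0.5):
    """Return open ports on `host` that are not in `allowed_ports`."""
    allowed = set(allowed_ports)
    return [p for p in open_ports(host, candidate_ports, timeout) if p not in allowed]
```

For example, `unexpected_ports("labelstudio.internal", range(1, 1025), allowed_ports=[443])` would list any low-numbered port other than 443 that is reachable, assuming 443 is the only port you intend to expose.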

Secure user access to Label Studio

Secure user access to Label Studio to protect data integrity and ensure that only authorized users can make changes.

Each user must create an account with a password of at least 8 characters. Individual accounts let you track who has access to Label Studio and which actions they perform.

You can restrict signup to users who have the invitation link to the signup page, and you can reset that link. See Set up user accounts for Label Studio for more.

If you’re using Label Studio Enterprise, you can further secure user access in many ways:

Secure API access to Label Studio

Access to the REST API is restricted by user role and requires an access token that is specific to a user account. Access tokens can be reset at any time from the Label Studio UI or using the API.
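A minimal sketch of an authenticated API request, using only the Python standard library. It assumes the access token is sent in an `Authorization: Token <token>` header and uses `/api/projects` as an example path. The base URL and the `build_api_request` helper are placeholders for illustration.

```python
import urllib.request


def build_api_request(base_url, token, path="/api/projects"):
    """Build a GET request that carries a user-specific access token.

    Assumption: the token is passed as 'Authorization: Token <token>'.
    """
    url = base_url.rstrip("/") + path
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Token {token}"},
        method="GET",
    )


# Usage (not executed here): read the token from the environment rather than
# hard-coding it, then send the request over HTTPS.
#   import os
#   req = build_api_request("https://label-studio.example.com", os.environ["LS_TOKEN"])
#   with urllib.request.urlopen(req) as resp:
#       print(resp.status)
```

Because the token identifies a single user account, resetting it invalidates any script that still uses the old value.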

Secure access to data in Label Studio

Data in Label Studio is stored in one or two places, depending on your deployment configuration.

Secure database access

To prevent SQL injection attacks and other data exfiltration attempts, Label Studio does not permit direct access to the SQLite or PostgreSQL databases from the app.

Instead, the app uses URIs to access the data stored in the database. Only the Label Studio labeling interface and API can access these URIs, because requests that retrieve data through them are verified and proxied using Basic Authentication headers.

All specific object properties that are exposed with a REST API are added to an allowlist. The API endpoints can only be accessed with specific HTTP verbs and must be accessed by browser-based clients that implement a proper Cross-Origin Resource Sharing (CORS) policy. API tokens are user-specific and can be reset at any time.
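The allowlist idea can be illustrated with a short, generic sketch. This is not Label Studio's implementation. The field names, the set of verbs, and both helper functions are hypothetical.

```python
# Hypothetical allowlists, for illustration only.
ALLOWED_FIELDS = {"id", "title", "created_at"}
ALLOWED_METHODS = {"GET", "POST", "PATCH", "DELETE"}


def serialize(obj, allowed=ALLOWED_FIELDS):
    """Expose only allowlisted properties; everything else is dropped."""
    return {key: value for key, value in obj.items() if key in allowed}


def is_method_allowed(method, allowed=ALLOWED_METHODS):
    """Accept a request only if its HTTP verb is on the allowlist."""
    return method.upper() in allowed
```

With an allowlist, a property added to an object later stays hidden until someone explicitly exposes it, which is the safer default compared with a blocklist.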

The PostgreSQL database has SSL mode enabled and requires valid certificates.
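As an illustration of a certificate-verifying connection, the sketch below builds a libpq-style keyword/value connection string with `sslmode` and `sslrootcert`. Mapping "requires valid certificates" to `sslmode=verify-full` is an assumption, and the host, database, user, and file path are placeholders. How Label Studio itself passes these settings is not shown here.

```python
def _quote(value):
    """Quote a value for a libpq keyword/value connection string."""
    text = str(value)
    if text == "" or any(ch.isspace() or ch in "'\\" for ch in text):
        # libpq requires backslashes and single quotes to be backslash-escaped.
        escaped = text.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    return text


def build_postgres_dsn(host, dbname, user, sslrootcert,
                       sslmode="verify-full", port=5432):
    """Build a DSN that makes the client verify the server certificate."""
    if sslmode not in ("verify-ca", "verify-full"):
        # Weaker modes (disable, allow, prefer, require) skip certificate checks.
        raise ValueError("sslmode must be 'verify-ca' or 'verify-full'")
    params = {
        "host": host,
        "port": port,
        "dbname": dbname,
        "user": user,
        "sslmode": sslmode,
        "sslrootcert": sslrootcert,
    }
    return " ".join(f"{key}={_quote(val)}" for key, val in params.items())
```

The resulting string can be passed to any libpq-based client. The password is deliberately left out so that it can be supplied separately, for example through a password file or an environment variable.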

Secure access to cloud storage

When using Label Studio, users don’t have direct access to cloud storage. Objects are retrieved from and stored in cloud storage buckets according to the cloud storage settings for each project.

The best way to secure access to cloud storage is to federate access with SAML:

  1. Set up identity and access management (IAM) policies with your SAML SSO identity provider (IdP).
  2. Restrict bucket access in Amazon S3 or other cloud storage providers based on the SAML-asserted roles.
  3. Set up Label Studio Enterprise with the same SAML SSO IdP as the cloud storage provider.
  4. When Label Studio Enterprise accesses cloud storage buckets on behalf of users, it uses the SAML-asserted roles to retrieve temporary access tokens that match the user permissions.
    See Federate access to data in Label Studio using SAML roles.
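For Amazon S3, steps 1 and 2 above might look like the following sketch, which generates two IAM policy documents: a trust policy that lets identities from a SAML provider assume a role, and a read-only permissions policy for one bucket. The account ID, provider name, bucket name, and function names are placeholders. The `audience` value is an assumption that depends on how your IdP is configured, so check it against your own setup.

```python
import json


def saml_trust_policy(account_id, provider_name,
                      audience="https://signin.aws.amazon.com/saml"):
    """Trust policy: identities asserted by the SAML provider may assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:saml-provider/{provider_name}"
            },
            "Action": "sts:AssumeRoleWithSAML",
            # Assumption: the audience must match what your IdP sends.
            "Condition": {"StringEquals": {"SAML:aud": audience}},
        }],
    }


def read_only_bucket_policy(bucket):
    """Permissions policy: list the bucket and read its objects, nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "s3:ListBucket",
             "Resource": f"arn:aws:s3:::{bucket}"},
            {"Effect": "Allow", "Action": "s3:GetObject",
             "Resource": f"arn:aws:s3:::{bucket}/*"},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(saml_trust_policy("123456789012", "ExampleIdP"), indent=2))
    print(json.dumps(read_only_bucket_policy("example-labeling-data"), indent=2))
```

A role that combines these two policies grants temporary, read-only credentials to users whose SAML assertion maps to it, which matches the temporary access tokens described in step 4.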

You can provide cloud storage authentication credentials globally for all projects in Label Studio, or use different credentials for access to different buckets on a per-project basis. Label Studio allows you to configure different cloud storage buckets for different projects, making it easier to manage access to the data. See Sync data from external storage.

Label Studio accesses the data stored in remote cloud storage using URLs, so place the data in cloud storage buckets near where your team works, rather than near where you host Label Studio.

Secure access to Redis storage

If you use Redis as an external storage database for data and annotations, the setup supports TLS/SSL and requires the Label Studio client to be authenticated to the database with a valid certificate.
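A sketch of what a certificate-authenticated Redis connection could look like from a Python client. The helper below only assembles keyword arguments in the form accepted by the redis-py `redis.Redis` constructor. Whether Label Studio configures its Redis client this way is an assumption, and the host and file paths are placeholders.

```python
def redis_tls_kwargs(host, ca_certs, certfile, keyfile, port=6379):
    """Keyword arguments for a TLS connection with a client certificate."""
    return {
        "host": host,
        "port": port,
        "ssl": True,                  # encrypt the connection with TLS
        "ssl_ca_certs": ca_certs,     # CA bundle used to verify the server
        "ssl_certfile": certfile,     # client certificate presented to Redis
        "ssl_keyfile": keyfile,       # private key for the client certificate
        "ssl_cert_reqs": "required",  # fail if the server certificate is invalid
    }


# Usage (requires the third-party redis package; not executed here):
#   import redis
#   client = redis.Redis(**redis_tls_kwargs(
#       "redis.example.com", "/etc/ssl/ca.pem",
#       "/etc/ssl/client.crt", "/etc/ssl/client.key"))
#   client.ping()
```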

Audit logging

Label Studio Enterprise automatically logs all user activity so that you can monitor what is being done in the application.