Secure Label Studio
Label Studio provides many ways to secure access to your data and your deployment architecture.
All application component interactions are encrypted using the TLS protocol.
Label Studio establishes secure connections to the web application by enforcing HTTPS and secured cookies.
If you’re running the open source version in production, restrict access to the Label Studio server by opening only the required ports on the server.
Secure user access to Label Studio to protect data integrity and allow changes to be performed only by those with access to the system.
Each user must create an account with a password of at least 8 characters, allowing you to track who has access to Label Studio and which actions they perform.
You can restrict signup to only those with a link to the signup page, and you can reset that invitation link at any time. See Set up user accounts for Label Studio for details.
Access to the REST API is restricted by user role and requires an access token that is specific to a user account. Access tokens can be reset at any time from the Label Studio UI or using the API.
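As a rough illustration of token-based API access, the sketch below builds an authenticated request using only the Python standard library. The host name, the token value, and the helper name build_api_request are placeholders. The "Authorization: Token ..." header format and the /api/projects path are assumptions based on the commonly documented token scheme, so check them against the API reference for your Label Studio version.

```python
from urllib.request import Request


def build_api_request(base_url: str, token: str, path: str = "/api/projects") -> Request:
    """Build (but do not send) a GET request carrying a user-specific token."""
    # Refuse plain HTTP so the token is never sent in clear text.
    if not base_url.lower().startswith("https://"):
        raise ValueError("use an https:// base URL")
    url = base_url.rstrip("/") + path
    # Assumption: the server expects "Authorization: Token <value>".
    return Request(url, headers={"Authorization": f"Token {token}"}, method="GET")


# Hypothetical host and token, for illustration only.
req = build_api_request("https://label-studio.example.com", "YOUR_ACCESS_TOKEN")
```

Passing req to urllib.request.urlopen would send the request. Because tokens are tied to a user account, a client has to be updated with the new value after that user's token is reset.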
When deploying Label Studio into a production environment, set the SSRF_PROTECTION_ENABLED environment variable to true. This variable is disabled by default to support users who work with data in their local environments, but it should be enabled for production use.
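A deployment script can check this setting before the server starts. The following is a minimal sketch, not part of Label Studio itself: the function name and the set of spellings treated as "enabled" are assumptions, since the exact values Label Studio accepts are not listed here.

```python
import os

# Assumption: spellings this sketch treats as "enabled".
TRUTHY = {"1", "true", "yes", "on"}


def ssrf_protection_enabled(env=None) -> bool:
    """Return True if SSRF_PROTECTION_ENABLED looks enabled in the given mapping."""
    env = os.environ if env is None else env
    value = env.get("SSRF_PROTECTION_ENABLED", "")
    return value.strip().lower() in TRUTHY


# Example: a production preflight check against an explicit mapping.
production_env = {"SSRF_PROTECTION_ENABLED": "true"}
ok = ssrf_protection_enabled(production_env)
```

Calling ssrf_protection_enabled() with no argument reads the real process environment, so a launcher could abort when it returns False.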
Data in Label Studio is stored in one or two places, depending on your deployment configuration.
- Project settings and configuration details are stored in Label Studio’s internal database.
- Input data (texts, images, audio files) is hosted by external data storage and provided to Label Studio through URI links. The data is not stored in Label Studio directly; the content is retrieved client-side only.
- Project annotations are stored in the internal database, and optionally can be stored in a local file directory, a Redis database, or cloud storage buckets on Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure.
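To make the "data by reference" model concrete, the sketch below builds a small task list in which each task carries only a URI, never the file contents. The {"data": {...}} wrapper follows the task JSON layout commonly used for Label Studio imports, but treat it as an assumption here. The field name image and both URIs are hypothetical, and the field name must match whatever your labeling configuration expects.

```python
import json


def make_tasks(uris, field="image"):
    """Wrap each URI in a task dict; no file content is read or embedded."""
    return [{"data": {field: uri}} for uri in uris]


# Hypothetical URIs pointing at externally hosted files.
tasks = make_tasks([
    "s3://example-bucket/images/0001.jpg",
    "https://example.com/images/0002.jpg",
])
payload = json.dumps(tasks, indent=2)
```

The resulting payload contains references only, which is why access to the underlying storage has to be secured separately.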
Label Studio does not permit direct access to the internal databases from the app to prevent SQL injection attacks and other data exfiltration attempts.
Instead, the app uses URIs to access the data stored in the database. These URIs can only be accessed by the Label Studio labeling interface and API because the requests to retrieve the data using those URIs are verified and proxied by Basic Authentication headers.
All specific object properties that are exposed with a REST API are added to an allowlist. The API endpoints can only be accessed with specific HTTP verbs and must be accessed by browser-based clients that implement a proper Cross-Origin Resource Sharing (CORS) policy. API tokens are user-specific and can be reset at any time.
The PostgreSQL database has SSL mode enabled and requires valid certificates.
When using Label Studio, users don’t have direct access to cloud storage. Objects are retrieved from and stored in cloud storage buckets according to the cloud storage settings for each project.
Label Studio accesses the data stored in remote cloud storage using URLs, so place the data in cloud storage buckets near where your team works, rather than near where you host Label Studio.
Use workspaces, projects, and roles to further secure access to cloud storage and data accessed using URLs by setting up cloud storage credentials. You can provide cloud storage authentication credentials globally for all projects in Label Studio, or use different credentials for access to different buckets on a per-project basis. Label Studio allows you to configure different cloud storage buckets for different projects, making it easier to manage access to the data. See Sync data from external storage.
If you use Redis as an external storage database for data and annotations, the setup supports TLS/SSL and requires the Label Studio client to be authenticated to the database with a valid certificate.
Label Studio collects anonymous usage statistics about the number of page visits and data types being used in labeling configurations that you set up. No sensitive information is included in the information we collect. The information we collect helps us improve the experience of labeling data in Label Studio and helps us plan future data types and labeling configurations to support.
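To add a self-signed certificate to the trust store in a container deployment such as Docker Compose:
- Mount your self-signed certificate as a read-only volume into the container:
    volumes:
      - ./my.cert:/tmp/my.cert:ro
- Add an environment variable named CUSTOM_CA_CERTS that lists, separated by commas, every certificate that should be added to the trust store. With the mount above, that path is /tmp/my.cert.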
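When several certificates are involved, the comma-separated value for CUSTOM_CA_CERTS can be generated rather than typed by hand. This is a minimal sketch: the helper name is made up for illustration, and /tmp/other.cert is a hypothetical second certificate that does not appear in the steps above.

```python
def build_custom_ca_certs(paths):
    """Join in-container certificate paths into a comma-separated value."""
    cleaned = [p.strip() for p in paths if p.strip()]
    for path in cleaned:
        # A comma inside a path would be read as a separator.
        if "," in path:
            raise ValueError(f"path must not contain a comma: {path!r}")
    return ",".join(cleaned)


# /tmp/my.cert is the mount target from the volume definition;
# /tmp/other.cert is a hypothetical additional certificate.
value = build_custom_ca_certs(["/tmp/my.cert", "/tmp/other.cert"])
env_line = f"CUSTOM_CA_CERTS={value}"
```

Each path must be the location inside the container (the right-hand side of the volume mapping), not the path on the host.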
In Kubernetes, upload your self-signed certificate as a secret. For example, upload my.cert as a secret named test-my-root-cert:
kubectl create secret generic test-my-root-cert --from-file=file=my.cert
Add the volumes to your values.yaml file and reference them in global.customCaCerts:
global:
  customCaCerts:
    - /opt/heartex/secrets/ca_certs/file/file
app:
  extraVolumes:
    - name: foo
      secret:
        secretName: test-my-root-cert
  extraVolumeMounts:
    - name: foo
      mountPath: "/opt/heartex/secrets/ca_certs/file"
      readOnly: true
rqworker:
  extraVolumes:
    - name: foo
      secret:
        secretName: test-my-root-cert
  extraVolumeMounts:
    - name: foo
      mountPath: "/opt/heartex/secrets/ca_certs/file"
      readOnly: true