Announcing Label Studio v1.0
Hooray! The day is finally here. After almost a year and a half of development, 1000+ commits, 40 releases, and 50+ developers contributing minor and major parts, we’re happy to announce a big milestone – Label Studio v1.0!
We’ve rebuilt Label Studio from the ground up, thanks to a lot of feedback over the last year. Say hello to a multi-user, multi-project system. Plus it’s scalable!
Label Studio is an open source data labeling tool. It’s the most robust and easiest to use solution for labeling a raw dataset or a previously labeled dataset that you want to improve. If you work on computer vision, NLP, conversational AI, audio/speech processing, and time-series projects, you can get up and running in minutes. Improving training data with Label Studio becomes a more transparent and easily-managed process.
Label Studio also natively integrates with ML models. You can connect your models and keep updating them with new labeled data, as well as perform quality assurance on model predictions.
For the last four months, the Label Studio team has been on a nonstop journey of rethinking interfaces, the annotation flow, and the robustness of the overall system.
As a result, we’ve re-engineered almost everything! Data labeling is heavily dependent on the simplicity and ease-of-use of the user interface, so we streamlined and updated the entire Label Studio UI. In addition to the UI improvements, we have also improved the speed and performance of working with large datasets. Now you can work efficiently with datasets containing millions of items. Let’s dive into the details!
One of the biggest changes we’ve introduced in this version is the addition of user accounts. Now, multiple users can create accounts in the same Label Studio instance. User’s can work off of the same datasets and each user’s annotations are tied to their account.
Label Studio Projects enable you to create and save labeling configurations for different datasets or projects. Label Studio Projects streamline managing and working on different datasets, can be shared with other users, and can be reused for similar projects in the future.
We also restructured the project settings and made it easier to configure the labeling interface for each project. A few updates worth mentioning:
ML models can help pre-label data and optimize the data labeling process. For example, connect a segmentation model like Mask RCNN to give its prediction, then you can adjust the prediction to make it perfect. Another example is if you connect an ASR model to provide speech transcription for further labeling.
If you store your data in the cloud, Label Studio can natively sync with it. Out of the box you can configure Label Studio to read data from AWS S3, GCP, or Microsoft Azure. You can sync from multiple cloud providers or buckets at the same time, and each project can connect to a different cloud storage location.
You can start labeling quickly by using a template to configure the labeling interface for your project, or enjoy a greater level of flexibility with custom tags.
We migrated to a more robust backend from our enterprise version based on Django. We also transitioned from a filesystem-based to SQL-based data storage for tasks and annotations. While a filesystem is probably the simplest approach for storage, it doesn’t scale well when you work on datasets with more than 10,000 items. With SQLite as a storage backend, we can now easily upload datasets of hundreds of thousands of items.
For production deployments, we recommend using PostgreSQL instead of SQLite, especially if you expect to create a large number of users or projects in parallel, because SQLite doesn’t support parallel writes.
We hope you’re excited to try out this new version of Label Studio!
For the next month we will be focusing on fixing bugs and issues based on your feedback, so we’d like to ask you to join our Slack channel. The community is very active and no questions go unanswered and no feedback goes unnoticed.
Next, we’ll be releasing an update of our Label Studio Enterprise and then working on a new version of Label Studio. The focus for that version is more performance improvements and seamless integration into various ML pipelines. See you in Slack!