January 2024 Community News!
🤝 New Release! Label Studio 1.11.0
The Label Studio community is excited to announce Label Studio 1.11.0! Beginning with this release, the Label Studio Frontend and Data Manager repositories have been consolidated into the main Label Studio repository. Our primary goals are to:
- Lower the barrier to entry for open source contributions by making the underlying architecture and codebase easier to understand and QA, and
- Gain efficiencies that let the HumanSignal engineering team deliver new features to the community faster.
In addition to this major architectural change, this release includes several important security updates, including:
- Improved server-side request forgery (SSRF) protections, which block requests to IP addresses in reserved ranges and to user-configurable banned CIDR blocks. Error messages now make it clear when an action is blocked by SSRF protection.
- Comprehensive HTML sanitization to safeguard against common code injection and cross-site scripting vulnerabilities.
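At its core, an SSRF check like the one described above tests every IP address a URL resolves to against reserved ranges and a user-supplied deny list. The sketch below shows the idea with Python's standard `ipaddress` module. It is not Label Studio's implementation; `USER_BANNED_CIDRS`, `is_ip_blocked`, and `check_url` are hypothetical names chosen for illustration.

```python
import ipaddress
import socket
from urllib.parse import urlparse

# Hypothetical deny list a user might configure; not a real Label Studio setting.
USER_BANNED_CIDRS = ["8.8.4.0/24"]


def is_ip_blocked(ip_text, banned_cidrs=USER_BANNED_CIDRS):
    """Return True if the address is in a reserved/special range or a banned CIDR block."""
    ip = ipaddress.ip_address(ip_text)
    # Treat IPv4-mapped IPv6 addresses (::ffff:a.b.c.d) as their IPv4 form.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    # Reserved and special-purpose ranges: private, loopback, link-local, etc.
    if (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    ):
        return True
    # User-configured CIDR bans (a v4/v6 mismatch simply does not match).
    return any(ip in ipaddress.ip_network(cidr) for cidr in banned_cidrs)


def check_url(url, banned_cidrs=USER_BANNED_CIDRS):
    """Raise PermissionError if any address the URL's host resolves to is blocked."""
    host = urlparse(url).hostname
    if host is None:
        raise ValueError(f"URL has no host: {url!r}")
    # Check every address the hostname resolves to, not just the first one.
    for *_, sockaddr in socket.getaddrinfo(host, None):
        addr = sockaddr[0]
        if is_ip_blocked(addr, banned_cidrs):
            # Say explicitly that SSRF protection blocked the request.
            raise PermissionError(f"Request to {host} blocked by SSRF protection ({addr})")
```

A production check would also need to pin the resolved address when actually making the request, so that a changed DNS answer between the check and the fetch (DNS rebinding) cannot slip past it.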
This release also delivers several bug fixes and quality improvements. Thanks to community member @Skier23 for their contributions to improving labeling hotkeys.
You can read more in the full release notes and dive deeper into the new monorepo architecture on the Label Studio Blog. Download the latest version of Label Studio from all the usual places, including GitHub, Docker Hub, PyPI, and Hugging Face Spaces!
5️⃣ Label Studio API Tips and Tricks
Label Studio is known for its flexibility and user-friendly interface, but one of its often unsung heroes is the powerful API and SDK behind it, which can significantly enhance your data labeling workflow. Whether you are setting up a new project, managing large datasets, or configuring complex labeling tasks, these tools provide the control and robustness needed for high-quality data preparation.
In this post, we share five essential tips and tricks for getting more out of the API and SDK. From automating project creation to modifying labeling schemas and running exports, we address some of the most common beginner problems as well as a few advanced situations. By the end of the guide, you'll have practical insights and best practices for using Label Studio's capabilities to their fullest and keeping your datasets high quality.
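As a small taste of automating project creation, here is a minimal sketch that builds a request to create a project using only Python's standard library. It assumes a server at `http://localhost:8080`, an API key sent in a `Token` authorization header, and the `/api/projects/` endpoint accepting `title` and `label_config`; check these against the API reference for your Label Studio version. `build_create_project_request` and the labeling config are illustrative and not taken from the post.

```python
import json
import urllib.request

# Illustrative labeling config: classify a text as Positive or Negative.
LABEL_CONFIG = """<View>
  <Text name="text" value="$text"/>
  <Choices name="sentiment" toName="text">
    <Choice value="Positive"/>
    <Choice value="Negative"/>
  </Choices>
</View>"""


def build_create_project_request(base_url, api_key, title, label_config):
    """Build (but do not send) a POST request that creates a Label Studio project."""
    body = json.dumps({"title": title, "label_config": label_config}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/projects/",
        data=body,
        headers={
            "Authorization": f"Token {api_key}",  # assumed token-based auth
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_create_project_request(
        "http://localhost:8080", "YOUR_API_KEY", "Sentiment demo", LABEL_CONFIG
    )
    # Sending requires a running Label Studio server and a valid key:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["id"])
    print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes it easy to inspect or test the payload before pointing it at a live server; the official SDK wraps the same kind of call.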
📖 Revisiting Tesseract OCR
In the last two years, foundation models have exploded onto the scene with impressive zero-shot and generative capabilities that can be applied to a wide range of novel use cases. New multimodal models combine image recognition with natural language processing to give us even more powerful tools for annotation (see, for example, our recent article on Grounding DINO).
Despite these advances, more traditional deep-learning models still have a place in the modern machine learning and data science workflow. With decades of research behind them, they are reliable, fast, and inexpensive to use in production. In this spirit, we revisit one of the most popular Label Studio integrations: Tesseract OCR.
The Label Studio community recently updated its Tesseract OCR example backend, which now supports the latest version of Label Studio and includes a basic example of integrating cloud storage with the Label Studio Machine Learning Backend. Check it out to see how you can bootstrap your OCR labeling with Tesseract and Label Studio.
🏞️ As data enthusiasts, we're not surprised that Data is Plural is one of our favorite newsletters. The newsletter describes itself as a weekly collection “of useful/curious datasets, published by Jeremy Singer-Vine.” It’s a consistent source of interesting public datasets covering topics such as policy (a database of 4000+ environmental treaties), public health (RSV infection rates as collected by the CDC), and miscellanea (a table of commercially available furniture that has appeared in various Star Trek series). With more than 350 published editions, it has become a dependable source of insight and inspiration.
🥣 If you’re looking for some visual pizzazz to accompany your data, The Pudding reliably publishes interesting datasets paired with deep journalistic analysis. With an archive of over 180 stories, including a late-2023 series covering different aspects of the modern music business, it lets you take a data-driven, visually rich deep dive into the cultural issues of today.
Thank You For Being Part of the Community!
Do you know of a game-changing ML integration that's improved your labeling workflow? Do you have your own Label Studio tips and tricks to share? Head on over to the Label Studio Slack Community!