Accelerate Preprocessing of Natural Language Data for Label Studio with Unstructured
Overview
Unstructured is an open-source platform designed to accelerate the preprocessing of unstructured data. With an initial focus on natural language data, Unstructured provides open-source libraries and APIs for building custom preprocessing pipelines that feed labeling, training, or production machine learning workflows.
The library includes three kinds of bricks: partitioning bricks that break documents into their constituent parts, cleaning bricks that remove unwanted text such as boilerplate and sentence fragments, and staging bricks that format outputs for downstream tasks such as data labeling in Label Studio or inference with Hugging Face.
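To make the partition, clean, and stage flow concrete, here is a minimal sketch that uses only the Python standard library. It is not the Unstructured API: the function names (partition, clean, is_fragment, stage_for_labeling), the blank-line splitting rule, and the four-word fragment threshold are all assumptions made for illustration. The one external detail it relies on is that Label Studio accepts imported tasks as a JSON list of objects with a "data" key.

```python
import json
import re


def partition(raw):
    """Hypothetical partitioning step: split raw text into blocks on blank lines."""
    return [block for block in re.split(r"\n\s*\n", raw) if block.strip()]


def clean(element):
    """Hypothetical cleaning step: collapse whitespace runs into single spaces."""
    return re.sub(r"\s+", " ", element).strip()


def is_fragment(element, min_words=4):
    """Treat very short blocks (titles, page markers) as fragments.

    The four-word threshold is an arbitrary choice for this sketch.
    """
    return len(element.split()) < min_words


def stage_for_labeling(elements):
    """Hypothetical staging step: wrap each element as a Label Studio task dict."""
    return [{"data": {"text": element}} for element in elements]


raw = """Quarterly   Report

Revenue grew in the third quarter,
driven by new enterprise customers.

Page 3

Operating costs were flat compared to the prior year."""

# Partition, clean each element, drop fragments, then stage for labeling.
elements = [clean(element) for element in partition(raw)]
kept = [element for element in elements if not is_fragment(element)]
tasks = stage_for_labeling(kept)

# The resulting JSON can be saved to a file and imported as labeling tasks.
print(json.dumps(tasks, indent=2))
```

In a real pipeline, each of these hand-written helpers would be replaced by the corresponding brick from the library, while the overall shape of the pipeline stays the same.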