BERT-based text classification
NewModel is a BERT-based text classification model designed to work with Label Studio. It uses the Hugging Face Transformers library to fine-tune a BERT model on the labeled data from Label Studio, and then uses the fine-tuned model to make predictions on new data. With this model connected to Label Studio, you can:
- Train a BERT model on your labeled data directly from Label Studio.
- Use any model from the Hugging Face model hub that is compatible with AutoModelForSequenceClassification.
- Fine-tune the model on your specific task and use it to make predictions on new data.
- Automatically download the labeled tasks from Label Studio and prepare the data for training.
- Customize the training parameters such as learning rate, number of epochs, and weight decay.
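The data preparation step mentioned above can be sketched in plain Python. This is a minimal illustration, not the backend's actual code: the function name `tasks_to_examples` and its parameters are hypothetical, and the task layout assumes Label Studio's JSON export format with a single-choice `<Choices>` control named `label` and a text field named `text`.

```python
from typing import Dict, List, Tuple


def tasks_to_examples(
    tasks: List[dict],
    label2id: Dict[str, int],
    text_key: str = "text",      # assumed name of the text field in task data
    from_name: str = "label",    # assumed name of the <Choices> control
) -> Tuple[List[str], List[int]]:
    """Turn exported Label Studio tasks into parallel lists of texts and label ids."""
    texts, labels = [], []
    for task in tasks:
        for annotation in task.get("annotations", []):
            # Skipped (cancelled) annotations carry no usable label.
            if annotation.get("was_cancelled"):
                continue
            for region in annotation.get("result", []):
                if region.get("type") != "choices":
                    continue
                if region.get("from_name") != from_name:
                    continue
                choices = region["value"]["choices"]
                # Single-choice setup: use the first choice if it is a known label.
                if choices and choices[0] in label2id:
                    texts.append(task["data"][text_key])
                    labels.append(label2id[choices[0]])
    return texts, labels


sample_tasks = [
    {
        "data": {"text": "great product"},
        "annotations": [
            {
                "result": [
                    {
                        "from_name": "label",
                        "to_name": "text",
                        "type": "choices",
                        "value": {"choices": ["label one"]},
                    }
                ]
            }
        ],
    },
    {"data": {"text": "skipped task"}, "annotations": [{"was_cancelled": True, "result": []}]},
]

texts, labels = tasks_to_examples(sample_tasks, {"label one": 0, "label two": 1})
```

The resulting `texts` and `labels` lists are the kind of input a tokenizer and a sequence classification model would consume during fine-tuning.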
Before you begin
Before you begin, you must install the Label Studio ML backend.
This tutorial uses the bert_classifier example.
Running with Docker (recommended)
- Start the Machine Learning backend on http://localhost:9090 with the prebuilt image:
docker-compose up
- Validate that the backend is running:
$ curl http://localhost:9090/
{"status":"UP"}
- Create a project in Label Studio. Then from the Model page in the project settings, connect the model. The default URL is http://localhost:9090.
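The curl check in the steps above can also be scripted, which is handy when the backend needs a moment to start. The following is a minimal sketch using only the Python standard library; the helper names `is_up` and `wait_for_backend`, the retry count, and the delay are illustrative assumptions, while the URL and the `{"status":"UP"}` response come from the steps above.

```python
import json
import time
import urllib.error
import urllib.request


def is_up(body: str) -> bool:
    """Return True if the response body is a JSON object with status "UP"."""
    try:
        return json.loads(body).get("status") == "UP"
    except (ValueError, AttributeError):
        # Not JSON at all, or JSON that is not an object.
        return False


def wait_for_backend(
    url: str = "http://localhost:9090/",
    attempts: int = 5,    # assumed retry count
    delay: float = 2.0,   # assumed pause between attempts, in seconds
) -> bool:
    """Poll the health endpoint until it reports UP or the attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if is_up(resp.read().decode("utf-8")):
                    return True
        except (urllib.error.URLError, OSError):
            pass  # backend not reachable yet; retry after the delay
        time.sleep(delay)
    return False
```

Calling `wait_for_backend()` after `docker-compose up` returns `True` once the backend answers with the expected status.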
Warning! Note the current limitation of the ML backend: models are loaded dynamically from huggingface.co, so the first prediction request may be slow. You may also need to provide the HF_TOKEN environment variable. If you experience timeouts on the Label Studio side (for example, no predictions are visible when opening a task), check the ML backend logs for errors and refresh the page after a few minutes.
Building from source (advanced)
To build the ML backend from source, you have to clone the repository and build the Docker image:
docker-compose build
Running without Docker (advanced)
To run the ML backend without Docker, you have to clone the repository and install all dependencies using pip:
python -m venv ml-backend
source ml-backend/bin/activate
pip install -r requirements.txt
Then you can start the ML backend:
label-studio-ml start ./dir_with_your_model
Labeling configuration
In project Settings > Labeling Interface > Browse Templates > Natural Language Processing > Text Classification, you can find the default labeling configuration for text classification in Label Studio. This configuration includes a single <Choices> output and a single <Text> input.
Feel free to modify the set of labels in the <Choices> tag to match your specific task, for example:
<View>
  <Text name="text" value="$text" />
  <Choices name="label" toName="text" choice="single" showInLine="true">
    <Choice value="label one" />
    <Choice value="label two" />
    <Choice value="label three" />
  </Choices>
</View>
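Since the labels in the `<Choices>` tag define the classes the model predicts, it can help to see how a label-to-index mapping might be derived from the configuration. This is a sketch using the standard library XML parser; the function name `label_maps` is hypothetical, and the backend may obtain the labels differently.

```python
import xml.etree.ElementTree as ET

CONFIG = """
<View>
  <Text name="text" value="$text" />
  <Choices name="label" toName="text" choice="single" showInLine="true">
    <Choice value="label one" />
    <Choice value="label two" />
    <Choice value="label three" />
  </Choices>
</View>
"""


def label_maps(config_xml: str):
    """Build label-to-id and id-to-label dictionaries from a labeling config."""
    root = ET.fromstring(config_xml)
    # Collect the value of every <Choice> element, in document order.
    labels = [choice.get("value") for choice in root.iter("Choice")]
    label2id = {label: i for i, label in enumerate(labels)}
    id2label = {i: label for label, i in label2id.items()}
    return label2id, id2label


label2id, id2label = label_maps(CONFIG)
print(label2id)  # {'label one': 0, 'label two': 1, 'label three': 2}
```

Mappings of this shape are what sequence classification models in Transformers typically take as `label2id` and `id2label`, with the number of labels equal to the number of `<Choice>` entries.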
Configuration
Parameters can be set in docker-compose.yml before running the container.
The following common parameters are available:
- BASIC_AUTH_USER: Specify the basic auth user for the model server.
- BASIC_AUTH_PASS: Specify the basic auth password for the model server.
- LOG_LEVEL: Set the log level for the model server.
- WORKERS: Specify the number of workers for the model server.
- THREADS: Specify the number of threads for the model server.
- BASELINE_MODEL_NAME: The name of the baseline model to use for training. Default is bert-base-multilingual-cased.
Training
The following parameters are available for training:
- LABEL_STUDIO_HOST (required): The URL of the Label Studio instance. Default is http://localhost:8080.
- LABEL_STUDIO_API_KEY (required): The API key for the Label Studio instance.
- START_TRAINING_EACH_N_UPDATES: The number of labeled tasks to download from Label Studio before starting training. Default is 10.
- LEARNING_RATE: The learning rate for the model training. Default is 2e-5.
- NUM_TRAIN_EPOCHS: The number of epochs for model training. Default is 3.
- WEIGHT_DECAY: The weight decay for the model training. Default is 0.01.
- FINETUNED_MODEL_NAME: The name of the fine-tuned model. Default is finetuned_model. Checkpoints will be saved under this name.
Note: The LABEL_STUDIO_API_KEY is required for training the model. You can find the API key in Label Studio under the Account & Settings page.
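To make the defaults listed above concrete, here is a minimal sketch of reading the training parameters from environment variables. The `TrainingSettings` class and `load_settings` function are hypothetical names used for illustration and are not part of the backend; only the variable names and default values are taken from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TrainingSettings:
    label_studio_host: str
    label_studio_api_key: str
    start_training_each_n_updates: int
    learning_rate: float
    num_train_epochs: int
    weight_decay: float
    finetuned_model_name: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> TrainingSettings:
    """Read training parameters from the environment, applying documented defaults."""
    env = os.environ if env is None else env
    api_key = env.get("LABEL_STUDIO_API_KEY")
    if not api_key:
        # The API key has no default and is required for training.
        raise ValueError("LABEL_STUDIO_API_KEY is required for training")
    return TrainingSettings(
        label_studio_host=env.get("LABEL_STUDIO_HOST", "http://localhost:8080"),
        label_studio_api_key=api_key,
        start_training_each_n_updates=int(env.get("START_TRAINING_EACH_N_UPDATES", "10")),
        learning_rate=float(env.get("LEARNING_RATE", "2e-5")),
        num_train_epochs=int(env.get("NUM_TRAIN_EPOCHS", "3")),
        weight_decay=float(env.get("WEIGHT_DECAY", "0.01")),
        finetuned_model_name=env.get("FINETUNED_MODEL_NAME", "finetuned_model"),
    )
```

Passing a plain dictionary instead of relying on `os.environ` (for example `load_settings({"LABEL_STUDIO_API_KEY": "your-key"})`) makes it easy to check which defaults apply before starting the container.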
Customization
The ML backend can be customized by adding your own models and logic inside the ./bert_classifier directory.
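Whatever custom logic you add, its predictions need to reach Label Studio in the prediction JSON layout. The helper below is a sketch of that layout for a single-choice classification result; the function name `to_prediction` and the default `model_version` value are hypothetical, and the control names `label` and `text` assume the labeling configuration shown earlier.

```python
from typing import Dict


def to_prediction(
    label: str,
    score: float,
    from_name: str = "label",           # name of the <Choices> control
    to_name: str = "text",              # name of the <Text> object
    model_version: str = "custom-v1",   # hypothetical version string
) -> Dict:
    """Wrap a predicted label and confidence in Label Studio's prediction format."""
    return {
        "model_version": model_version,
        "score": float(score),
        "result": [
            {
                "from_name": from_name,
                "to_name": to_name,
                "type": "choices",
                "value": {"choices": [label]},
            }
        ],
    }


prediction = to_prediction("label two", 0.87)
```

Keeping this formatting in one small function means a swapped-in model only has to produce a label and a score.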