
Fine-Tuning OpenAI Models with Label Studio

This tutorial walks through how to fine-tune a large language model using Label Studio and Wikipedia-based data. From QA dataset generation to validation and evaluation, you’ll see how to ensure both model quality and relevance for your specific use case.

Transcript

Speaker:
Fine-tuning large language models is a powerful way to tailor AI to your specific needs. While it’s not always the right choice, fine-tuning can be helpful in cases where you have a specific dataset with prompts and expected outputs, need the model to follow a consistent format, or want to incorporate domain-specific or time-sensitive data.

In this video, we’ll walk through how to fine-tune an OpenAI model using Label Studio to ensure dataset quality and strong model performance.

First, you’ll need to download our examples repository. I’ve already done that, opened the Jupyter notebook in my local instance, and installed all dependencies.

We’ll be using Wikipedia data as our example dataset—specifically hurricanes from the 2024 season. This works well because the information was published after the model’s original knowledge cutoff, allowing us to test the real effect of fine-tuning.

After setting the list of topics, we use a function to pull Wikipedia revisions after a specific date. These revisions are collected in a list of dictionaries. At this point, we can examine metrics like the total number of examples—about 10,000 in our case—and view sample data to better understand what we’ve gathered.
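
The repository wraps this step in its own helper; as a rough illustration only, here is a minimal sketch of pulling post-cutoff revisions through the MediaWiki API with requests. The `get_revisions_after` helper, the topic list, and the cutoff date are hypothetical stand-ins, not the repository's actual code.

```python
import requests

API_URL = "https://en.wikipedia.org/w/api.php"

def get_revisions_after(title, start="2024-06-01T00:00:00Z", limit=50):
    """Fetch revisions of a Wikipedia page made on or after `start` (hypothetical helper)."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "timestamp|comment|content",
        "rvslots": "main",
        "rvdir": "newer",      # walk forward in time from rvstart
        "rvstart": start,
        "rvlimit": limit,
        "format": "json",
        "formatversion": "2",
    }
    data = requests.get(API_URL, params=params).json()
    revisions = []
    for page in data["query"]["pages"]:
        for rev in page.get("revisions", []):
            revisions.append({
                "title": page["title"],
                "timestamp": rev["timestamp"],
                "text": rev["slots"]["main"]["content"],
            })
    return revisions

topics = ["Hurricane Beryl", "Hurricane Helene"]  # illustrative topic list
examples = [rev for topic in topics for rev in get_revisions_after(topic)]
print(f"Collected {len(examples)} examples")
```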

Next, we generate question-and-answer pairs using boilerplate code. We run the generation function and skip the test section. Once we’ve created the QA dataset, we’re ready to review it. While you could inspect the questions directly in the notebook, it’s more efficient to review them in Label Studio for quality control.
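
The boilerplate generation step is essentially a prompt to the OpenAI chat API asking for question-and-answer pairs grounded in each passage. A hedged sketch of that pattern, continuing from the `examples` list above; the prompt wording, model name, and `generate_qa_pairs` helper are illustrative rather than the notebook's exact code.

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_qa_pairs(passage, model="gpt-4o-mini", n_pairs=3):
    """Ask the model for question/answer pairs grounded only in `passage`."""
    prompt = (
        f"Write {n_pairs} question-and-answer pairs based only on the text below. "
        'Return JSON of the form {"pairs": [{"question": "...", "answer": "..."}]}.\n\n'
        f"{passage}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)["pairs"]

qa_dataset = []
for example in examples[:100]:          # a subset keeps the demo fast and cheap
    qa_dataset.extend(generate_qa_pairs(example["text"][:4000]))
```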

We format the dataset for Label Studio and export it as a JSON file. Then, we create a new Label Studio project and import the tasks. In my local instance, I’ve created a project called “Hurricane Data Project.” If we click into a task, we can see a model-generated response. In practice, you’d review and edit these manually to ensure correctness, but for this demo, we’ll proceed as-is.
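
The exact task schema depends on your labeling configuration, but the general shape is a list of dicts whose raw inputs live under a `data` key. Below is a minimal sketch of exporting the QA pairs as tasks and importing them with the Label Studio SDK's Client interface; the URL, API key, project title, and labeling config are placeholders, not the repository's exact setup.

```python
import json
from label_studio_sdk import Client

# Each Label Studio task holds its raw inputs under "data".
tasks = [{"data": {"question": qa["question"], "answer": qa["answer"]}} for qa in qa_dataset]

with open("hurricane_qa_tasks.json", "w") as f:
    json.dump(tasks, f, indent=2)

# Point these at your own Label Studio instance.
ls = Client(url="http://localhost:8080", api_key="YOUR_API_KEY")
project = ls.start_project(
    title="Hurricane Data Project",
    label_config="""
    <View>
      <Text name="question" value="$question"/>
      <TextArea name="response" toName="question" value="$answer" editable="true"/>
    </View>
    """,
)
project.import_tasks(tasks)
```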

Back in the notebook, we pull the annotated data using a script. Since we didn't annotate anything in this example, I set filter_view to true, though normally you'd leave it as false. We download the labeled examples, 23 in total, and save them to JSON.
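
The repository ships a download script for this step; the sketch below does roughly the same thing through the SDK's project handle from the earlier snippet. The `filter_view` flag mirrors the setting mentioned above, but its exact behavior in the actual script may differ.

```python
import json

# With nothing hand-annotated in this demo, pull every task back out;
# normally you would export only the tasks that reviewers actually touched.
filter_view = True  # illustrative flag, mirroring the notebook's setting

labeled = project.get_tasks() if filter_view else project.get_labeled_tasks()

print(f"Downloaded {len(labeled)} examples")
with open("labeled_hurricane_qa.json", "w") as f:
    json.dump(labeled, f, indent=2)
```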

Next, we convert the Label Studio-format annotations into OpenAI’s fine-tuning format. Then we run some basic data analysis and cost estimates. With 23 examples, we calculate message and token counts. Everything is within limits, so we’re good to go.
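
A hedged sketch of that conversion: each fine-tuning example becomes one chat exchange in OpenAI's JSONL format, and tiktoken gives a rough token count for cost estimation. The system prompt and field names follow the task schema assumed in the earlier snippets.

```python
import json
import tiktoken

SYSTEM_PROMPT = "You answer questions about the 2024 Atlantic hurricane season."

def to_openai_format(task):
    """One Label Studio task -> one chat exchange in OpenAI's fine-tuning format."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": task["data"]["question"]},
            {"role": "assistant", "content": task["data"]["answer"]},
        ]
    }

finetune_examples = [to_openai_format(t) for t in labeled]

# Fine-tuning data is JSONL: one JSON object per line.
with open("hurricane_finetune.jsonl", "w") as f:
    for ex in finetune_examples:
        f.write(json.dumps(ex) + "\n")

# Rough token count for cost estimation.
enc = tiktoken.get_encoding("cl100k_base")
total_tokens = sum(
    len(enc.encode(m["content"])) for ex in finetune_examples for m in ex["messages"]
)
print(f"{len(finetune_examples)} examples, ~{total_tokens} tokens")
```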

We upload the data to OpenAI using a script that returns a file ID. We set that ID as our training file and create a fine-tuning job. You can check the job status by updating the appropriate ID field.
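
A minimal sketch of those calls with the official openai Python client; the base model name here is illustrative, and in practice you re-run the status check until the job finishes.

```python
from openai import OpenAI

client = OpenAI()

# Upload the JSONL training file; the response carries the file ID.
upload = client.files.create(
    file=open("hurricane_finetune.jsonl", "rb"),
    purpose="fine-tune",
)

# Start the fine-tuning job (base model name is illustrative).
job = client.fine_tuning.jobs.create(
    training_file=upload.id,
    model="gpt-4o-mini-2024-07-18",
)

# Re-run this cell to poll the job until it reports "succeeded".
status = client.fine_tuning.jobs.retrieve(job.id)
print(status.status)
```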

After some time, we check the job again and see that it’s complete. The newly fine-tuned model name includes “ft,” which stands for “fine-tuned.” We paste that model ID into the final step and test it.
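
Querying the fine-tuned model is a plain chat completion against the new model name. A short sketch, assuming the `status` object from the previous step has completed:

```python
# Once the job succeeds, the job object carries the new model name,
# which starts with "ft:" followed by the base model and a suffix.
fine_tuned_model = status.fine_tuned_model

response = client.chat.completions.create(
    model=fine_tuned_model,
    messages=[
        {"role": "user", "content": "What's the most recent hurricane to hit the US in 2024?"}
    ],
)
print(response.choices[0].message.content)
```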

We ask: “What’s the most recent hurricane to hit the US in 2024?” The fine-tuned model correctly responds with “Hurricane Beryl,” confirming that fine-tuning worked.

That’s it! You’ve now successfully fine-tuned an OpenAI model using Label Studio.
