
Annotation statistics

Beta documentation: Label Studio Enterprise v2.0.0 is currently in Beta. As a result, this documentation might not reflect the current functionality of the product.

Label Studio Enterprise Edition includes various annotation and labeling statistics. The open source Community Edition of Label Studio does not perform these statistical calculations. If you're using Label Studio Community Edition, see Label Studio Features to learn more.

Annotation statistics help you assess the quality of your dataset, its readiness for training models, and the performance of your annotators.

Task agreement

Task agreement shows the consensus among multiple annotators when labeling the same task. There are several types of task agreement in Label Studio Enterprise.

You can also see how the annotations from a specific annotator compare to the prediction scores for a task, or how they compare to the ground truth labels for a task.

For more about viewing agreement in Label Studio Enterprise, see Verify model and annotator performance.

Agreement method

The agreement method defines how the matching scores across all annotations for a task are combined into a single inter-annotator agreement score. Label Studio uses the mean of the inter-annotation matching scores across all annotation pairs as the final task agreement score.

Review the diagram for a full explanation:

Diagram: annotations are collected for each task, matching scores are computed for each annotation pair, and the resulting scores are averaged to produce the task agreement score.
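
As a rough sketch of this aggregation, assuming each pairwise matching score is already available, the task agreement score can be computed as below. The function and argument names are illustrative, not Label Studio's internal API.

```python
from itertools import combinations
from statistics import mean

def task_agreement(annotations, matching_fn):
    """Combine pairwise matching scores into one task agreement score.

    `annotations` is a list of annotation results for one task and
    `matching_fn` is the matching function configured for the project;
    both names are placeholders for illustration.
    """
    pairs = list(combinations(annotations, 2))
    if not pairs:
        return 1.0  # a task with a single annotation has nothing to disagree with
    return mean(matching_fn(x, y) for x, y in pairs)
```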

Example

Consider a task with three annotations: one labels the text span “Excellent tool” as “positive”, a second labels the span “tool” as “positive”, and a third labels the span “tool” as “negative”.


The matching score for the first two annotations is 50%, based on the intersection of the text spans. The matching score comparing the second annotation with the third annotation is 0%, because the same text span was labeled differently.

The task agreement conditions use a threshold of 40% to group annotations based on the matching score, so the first and second annotations are matched with each other, and the third annotation is considered mismatched. In this case, task agreement exists for 2 of the 3 annotations, so the overall task agreement score is 67%.
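
A short sketch of this threshold-based grouping, using the pairwise scores from the example. The score between the first and third annotations is assumed to be 0% as well, because the span “tool” is labeled differently.

```python
# Pairwise matching scores from the example (annotation indices 1-3).
scores = {(1, 2): 0.5, (2, 3): 0.0, (1, 3): 0.0}
threshold = 0.4  # the 40% grouping threshold

# An annotation is "matched" if it agrees with at least one other
# annotation above the threshold.
matched = set()
for (a, b), score in scores.items():
    if score >= threshold:
        matched.update({a, b})

annotations = {1, 2, 3}
task_agreement = len(matched) / len(annotations)
print(round(task_agreement, 2))  # 0.67 -> reported as 67%
```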

Matching score

Depending on the type of labeling that you perform, you can select a different type of matching function to use to calculate the matching score used in task agreement statistics. See Define the matching function for annotation statistics.

The matching score assesses the similarity of annotations for a specific task. For example, for two given annotations x and y, a naive matching function simply compares the results for equality: the score is 1 when the results are identical and 0 otherwise.
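
In code, such a naive comparison is just an equality check on the result payloads. This is a minimal sketch; the argument structure is illustrative.

```python
def naive_match(x, y):
    """Naive matching: 1 if the two annotation results are identical,
    0 otherwise. `x` and `y` stand for the result payloads of two
    annotations on the same task (illustrative representation)."""
    return 1.0 if x == y else 0.0
```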

The following examples describe how the matching scores for various labeling configuration tags can be computed.

Choices

For data labeling tasks where annotators select a choice, such as image or text classification, you can select from multiple matching functions.

If you select the Exact matching choices matching function, the matching score for two given task annotations x and y is 1 when both annotations contain exactly the same selected choices and 0 otherwise.
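
A minimal sketch of exact choice matching, assuming each annotation's result is represented as the list of selected choice values (the representation is illustrative):

```python
def exact_choices_match(x_choices, y_choices):
    """Exact matching for Choices results: the score is 1 only when the
    two annotations selected exactly the same set of choices, which also
    covers multi-select classification."""
    return 1.0 if set(x_choices) == set(y_choices) else 0.0
```

For example, exact_choices_match(["positive"], ["positive"]) returns 1.0, while exact_choices_match(["positive"], ["positive", "spam"]) returns 0.0.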

TextArea

For data labeling tasks where annotators transcribe text in a text area, the resulting annotations contain a list of text.

You can select a matching function based on the intersection of one-dimensional text spans, splitting the text area by words or characters, or one based on an edit distance algorithm. Decide which method to use based on your use case and how important precision is for your data labels.

The matching score for two given task annotations x and y is then computed with the comparison method that you select.
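
As one sketch, assuming each annotation's result is a list of transcribed strings compared item by item with an edit-distance-style similarity and then averaged; the exact combination depends on the matching function you select.

```python
from difflib import SequenceMatcher

def textarea_match(x_texts, y_texts):
    """Sketch of a TextArea matching score: compare transcriptions
    position by position with an edit-distance-style ratio and average
    the per-item scores; unmatched trailing items count as 0."""
    length = max(len(x_texts), len(y_texts))
    if length == 0:
        return 1.0  # two empty transcriptions trivially agree
    scores = [SequenceMatcher(None, a, b).ratio()
              for a, b in zip(x_texts, y_texts)]
    return sum(scores) / length
```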

Labels

For data labeling tasks where annotators assign specific labels to regions or text spans, the matching score is calculated by comparing the intersection of annotations over the result spans, normalized by the length of each span.

For two given task annotations x and y, the matching score is based on the size of the overlap between the spans, |spans(x) ∩ spans(y)|, normalized by the span lengths.
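
A sketch of such a span-overlap score for a single pair of labeled spans, using intersection over union as the normalization. That normalization is an assumption, but it reproduces the 50% score for “Excellent tool” versus “tool” when spans are measured in words.

```python
def span_match(x_span, y_span):
    """Overlap score for two labeled spans given as (start, end, label)
    tuples (an assumed representation). Different labels score 0;
    otherwise the overlap length is normalized by the combined length."""
    x_start, x_end, x_label = x_span
    y_start, y_end, y_label = y_span
    if x_label != y_label:
        return 0.0
    overlap = max(0, min(x_end, y_end) - max(x_start, y_start))
    union = (x_end - x_start) + (y_end - y_start) - overlap
    return overlap / union if union else 0.0

# "Excellent tool" (words 0-2) vs "tool" (word 1-2), both labeled "positive":
print(span_match((0, 2, "positive"), (1, 2, "positive")))  # 0.5
```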

Rating

For data labeling tasks where annotators select a rating, the matching score for two given task annotations x and y uses an exact matching function: the score is 1 when both annotations contain the same rating and 0 otherwise.

Ranker

For data labeling tasks where annotators perform ranking, the matching score is based on the mean average precision (mAP) of the annotation results.
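
A hedged sketch of an average-precision-style comparison between two rankings follows. How Label Studio maps a pair of ranking annotations onto retrieved and relevant items is an assumption here; this sketch treats the other annotation's top-k items as the relevant set and averages both directions.

```python
def average_precision(ranked, relevant):
    """Average precision of a ranked list against a set of relevant items."""
    hits, precisions = 0, []
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def ranker_match(x_ranked, y_ranked, top_k=3):
    """Symmetric ranking agreement: score each annotator's ranking
    against the other's top-k items and average the two directions."""
    score_xy = average_precision(x_ranked, set(y_ranked[:top_k]))
    score_yx = average_precision(y_ranked, set(x_ranked[:top_k]))
    return (score_xy + score_yx) / 2
```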

RectangleLabels

For data labeling tasks where annotators draw rectangular bounding boxes and label those rectangles, the matching score calculation depends on what you select as the Metric name on the Annotation Settings page. Select one of the available metrics.
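
For illustration, one common overlap measure for rectangular regions is intersection over union (IoU); whether and how it applies depends on the metric you select, so treat this as a sketch only.

```python
def bbox_iou(a, b):
    """Intersection over union of two axis-aligned boxes given as
    (x, y, width, height) tuples; boxes that do not overlap score 0."""
    ax1, ay1, aw, ah = a
    bx1, by1, bw, bh = b
    inter_w = max(0.0, min(ax1 + aw, bx1 + bw) - max(ax1, bx1))
    inter_h = max(0.0, min(ay1 + ah, by1 + bh) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0
```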

PolygonLabels

For data labeling tasks where annotators create polygons and label those polygonal regions, the matching score calculation depends on what you select as the Metric name on the Annotation Settings page. Select one of the available metrics.
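
Similarly, an overlap score for polygonal regions can be sketched with the third-party shapely package; the metric actually applied depends on the Metric name you select, so this is illustrative only.

```python
from shapely.geometry import Polygon

def polygon_iou(points_a, points_b):
    """Intersection over union of two polygons given as lists of
    (x, y) vertices; an illustrative overlap measure only."""
    a, b = Polygon(points_a), Polygon(points_b)
    union = a.union(b).area
    return a.intersection(b).area / union if union else 0.0
```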