Open Source Tools for Sentiment Analysis

Imagine you run a business that makes coffee grinders and just launched your newest product. You’re getting tons of reviews sent to your company inbox and don’t have time to read them all and suss out how customers feel about their new coffee grinder. This is a perfect use case for sentiment analysis.

So what is sentiment analysis?

Sentiment analysis refers to a program taking in information (usually text) and then using a machine learning model to determine the sentiment of that information. That program could report if a customer is happy, neutral, or unhappy with your product without you having to read every message. But, before your program can accomplish that feat, it needs to be built, trained, and tested. This can be a daunting task, but there are a number of open source sentiment analysis tools available to help you get there.
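To make the "built, trained, and tested" part a little more concrete, here is a minimal sketch of a sentiment classifier: a tiny Naive Bayes model written with only the Python standard library. The training reviews, the label names, and the function names (`tokenize`, `train`, `predict`) are all made up for illustration, and a real project would need far more labeled data than this.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical labeled reviews, invented for this sketch.
TRAIN = [
    ("love this grinder, great consistent grind", "happy"),
    ("fantastic product, works great", "happy"),
    ("it grinds coffee, nothing special", "neutral"),
    ("average grinder, does the job", "neutral"),
    ("terrible, broke after a week", "unhappy"),
    ("awful noise and the motor broke", "unhappy"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Count words per label; these counts are the whole 'model'."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def predict(text, model):
    """Return the label with the highest Naive Bayes log-probability."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # skip words never seen in training
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(predict("great grinder, love it", model))     # happy
print(predict("the motor broke, terrible", model))  # unhappy
```

With only six training reviews this is a toy, but the shape is the same at any scale: labeled examples go in, a model comes out, and new text gets a sentiment label.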

Why choose open source tools for sentiment analysis?

Some developers and data scientists just want to grab the code, download the repo and go. If that’s your style, choosing a fully-featured open source sentiment tool might be the right choice for you.

You’ll traditionally find open source tools on GitHub with a minimal README doc explaining how to use the software that particular programmer has built. You can deploy these tools and modify the code to fit your use case without worrying about licensing fees or dealing with corporate red tape. That level of independence can be a plus for programmers. But, with that independence comes responsibility.

Developers of open-source tools might keep a close eye on the latest security patches, operating system changes, and programming language updates, and update their projects accordingly. But then again, they might not. If they don’t, the onus is on the developer using the tool to fill in the gaps and make sure that the software stays up-to-date and secure.

On the other hand, when it comes to paid sentiment analysis tools, you, the developer, are often paying for that company to keep up with all the changes that might affect the program or application that employs their tool. It’s the company’s responsibility to make sure you, the user, are getting what you paid for in a secure and polished fashion. This can eliminate a whole host of headaches.

The downsides with paid sentiment analysis platforms can be that they’re more expensive from a software licensing standpoint, you often don’t have as much flexibility with how and when features are added and deployed, and you’re also at the mercy of the vendor when it comes to future product development.

Let’s say you’re using a full-service sentiment analysis tool. That tool might be able to parse text from your customers and churn out a report on their sentiment. But it might be unable to parse audio because the company supplying that tool hasn't built out transcription engines and NLP audio services. If you need that functionality anytime soon, you’ll be at the mercy of that company’s product road map.

In a case like Label Studio where there are both open source and enterprise versions of the tool, often organizations and developers will install the open source version first as a sort of “proof of concept” before adopting the enterprise version. This allows users and stakeholders to test the software out with relatively few risks.

Open source options for sentiment analysis tools

Label Studio plays nice with the tools developers love and rely on to run all sorts of sentiment analysis. We take a “more the merrier” approach to open source tools, consistently using Label Studio’s software to improve on existing solutions that programmers use every day.

For example, if you’re using Amazon Transcribe to turn speech into text, you can use Label Studio to improve the accuracy of those transcriptions. If you’re using the wildly popular NLP tool, spaCy, you can add Label Studio into the mix to suss out the accuracy of Named Entity Recognition taggers.

These are all great options if you’ve got a sentiment analysis program up and running. But, where do you start when you’re at square one of your sentiment analysis program? Here are a few key factors to consider:

What to look for in an open source sentiment analysis labeling solution

  • Scalability: If you’re building a massive sentiment analysis program running hundreds, if not thousands, of API requests daily, you’ll want to make sure the solution you choose can keep up with that kind of scale.
  • Usability: Is the open-source tool easy to use? Does it have the documentation you need to use it in a production-ready use case? Is the interface easy to use, both for developers and labelers?
  • Flexibility: Is the tool flexible enough to be useful with a variety of technologies and data types? While you may have a specific data type and use case in mind when you get started, often, other projects crop up and being able to use a single platform for all of them can be a big plus.
  • Language: We’re not all COBOL Cowboys, experts in archaic programming languages. You should make sure that the tool you choose is written in a popular, practical programming language (Python, for example).
  • Up-to-date: You want to make sure the developer who built the tool you adopt is not taking a watchmaker-style approach in which they build a tool and then disappear off the face of the earth. A solid community of developers who use the tool regularly and rely on it is a good sign that it’s well-maintained.

Choosing open source APIs for sentiment analysis

Choosing the right open source tool for you is easier when you know exactly what you’re trying to accomplish with your app. Let’s say you’ve built out the infrastructure for your sentiment analysis app. You have the proverbial programming plumbing to take in data and return a sentiment analysis. But your application needs some work when it comes to the accuracy of that analysis. In that case, you might be looking for libraries to help your program brush up on its skills. Pattern, TextBlob, or NLTK would be great places to start, particularly if you’re a Python programmer.

If your application is still having trouble understanding language, you could use Aylien, spaCy, or Gensim to make that level-up happen sooner. The tools you pick will always be situation-dependent.
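Whichever library you reach for, it helps to know how accurate your app is before and after the change. One way to check, sketched below, is to compare the app’s predictions against labels a human assigned to the same reviews. The `evaluate` function and the sample labels are hypothetical, and the sketch uses only the Python standard library.

```python
from collections import Counter


def evaluate(predicted, gold):
    """Return overall accuracy and a Counter of (gold, predicted) pairs."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must be the same length")
    confusion = Counter(zip(gold, predicted))
    correct = sum(n for (g, p), n in confusion.items() if g == p)
    accuracy = correct / len(gold) if gold else 0.0
    return accuracy, confusion


# Hypothetical data: human labels (gold) vs. what the app predicted.
gold = ["happy", "unhappy", "neutral", "happy", "unhappy"]
predicted = ["happy", "neutral", "neutral", "happy", "unhappy"]

accuracy, confusion = evaluate(predicted, gold)
print(f"accuracy: {accuracy:.2f}")

# List only the disagreements, which show where the app goes wrong.
for (g, p), n in sorted(confusion.items()):
    if g != p:
        print(f"labeled {g!r} but predicted {p!r}: {n}")
```

The disagreement list is often more useful than the single accuracy number, because it shows which sentiments the app confuses with each other.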

Open source labeling tools for sentiment analysis

Sentiment analysis is largely unconfined. If you’re looking to build out your sentiment analysis tool belt, here are some tools we love:

doccano

Self-described as “an open source text annotation tool for humans,” doccano offers a good user interface and fairly straightforward installation. It’s well-developed and has a strong focus on usability. You’ll get features like collaborative annotation, multi-language support, mobile support, emoji support, and a RESTful API.

You can learn more and give it a try at https://github.com/doccano/doccano

brat rapid annotation tool

brat is a fairly basic but serviceable text annotation tool. While it’s light on documentation, brat does offer some nice benefits: a sensible UI that presents text as it would appear to a reader and keeps annotations close to the text, and an easy setup process for annotators that leaves configuration and server/data maintenance to engineers. Please note that while it has had a few updates over the past few months, it doesn’t seem to be under active development.

You can download brat at https://github.com/nlplab/brat

Poplar

Inspired by brat (see above), Poplar is a web-based annotation tool for text labeling. We haven’t used it, and it appears to be updated only intermittently, but it does have some corporate support and is a decent alternative to brat if you want to give it a try. There is a demo available so you can test it out before you download and install it.

You can try Poplar at https://github.com/synyi/poplar

YEDDA

YEDDA is a solid text labeling project focused on efficiency and ease of use. It supports shortcut annotation and offers a command annotation model that annotates multiple entities as a batch, and it can export annotated text as sequence text. It also includes “intelligent recommendation modeling,” which uses an existing model to suggest annotation entities and spans, and it has some of the best-developed administration features of all the projects we looked at.

You can download YEDDA at https://github.com/jiesutd/YEDDA

Label Studio

You didn’t think we’d leave ourselves off the list, did you? Label Studio is one of the most popular labeling solutions on this list, in part because it works for not only text labeling, but also video, images, audio, time-series, and more.

Check out Label Studio at https://github.com/heartexlabs/label-studio/ or read on for more info about Label Studio…

Why go with the Label Studio approach to sentiment analysis?

Label Studio is always a solid choice when looking to label data for sentiment analysis. It features configurable layouts and templates that can adapt to your dataset and workflow. It also integrates easily with your ML/AI pipeline using webhooks, the Python SDK, and APIs, and you can connect to cloud object storage and label data directly from S3 and GCP. You can save time by connecting an ML backend and using Label Studio’s ML-assisted predictions to speed up your labeling process. And you can do this across multiple projects, use cases, and data types using a single platform. But don’t take our word for it, give it a try!
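If you do give it a try, one of the first steps is getting your reviews into Label Studio as tasks. The sketch below builds a JSON task file with optional model predictions attached as pre-annotations. It assumes the JSON task format described in Label Studio’s documentation, and a labeling config with a text object named `text` and a choices control named `sentiment`. Those two names, the `to_task` helper, the model version string, and the sample reviews are all assumptions, so check them against your own project before importing.

```python
import json

# Assumed names from the labeling config; change them to match yours.
FROM_NAME = "sentiment"  # the Choices control
TO_NAME = "text"         # the Text object being labeled


def to_task(review, predicted_label=None, score=None):
    """Build one task dict, with a pre-annotation if a label is given."""
    task = {"data": {"text": review}}
    if predicted_label is not None:
        prediction = {
            "model_version": "example-model-v0",  # hypothetical identifier
            "result": [
                {
                    "from_name": FROM_NAME,
                    "to_name": TO_NAME,
                    "type": "choices",
                    "value": {"choices": [predicted_label]},
                }
            ],
        }
        if score is not None:
            prediction["score"] = score
        task["predictions"] = [prediction]
    return task


# Hypothetical reviews: (text, model's guess, model's confidence).
reviews = [
    ("Best grinder I've owned.", "Positive", 0.93),
    ("It stopped working after a week.", "Negative", 0.88),
    ("Arrived on Tuesday.", None, None),  # no prediction for this one
]

tasks = [to_task(text, label, score) for text, label, score in reviews]
tasks_json = json.dumps(tasks, indent=2)
print(tasks_json)
```

Annotators then confirm or correct the suggested sentiment instead of labeling every review from scratch, which is where the time savings come from.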

Ready to dive deeper into sentiment analysis and NLP? Here are ten important considerations for NLP labeling to get you on your way.
