How To Do Sentiment Analysis with Amazon SageMaker

Understanding one another can be difficult, but what if machines could accurately comprehend and interpret human emotions, attitudes, opinions and intentions simply by “listening” to online conversations? Just envision the possibilities unlocked by sentiment analysis as a tool for your business in today’s data-driven world.

Nearly every business sector needs to understand public sentiment to make informed decisions that drive innovation. Recognizing the business value sentiment analysis unlocks, we used Amazon SageMaker, AWS’s machine learning platform, to analyze tweets (now called posts on X) discussing the 2020 U.S. presidential election candidates and gauge public sentiment toward Trump and Biden.

What is Sentiment Analysis?

Sentiment analysis is the process of determining emotional tone from text data. It uses machine learning techniques to identify and extract subjective information, such as whether the tone is positive, negative or neutral.

Sentiment analysis tools use natural language processing technologies to automatically pore over large volumes of text data like emails, customer support chat transcripts, social media interactions and product reviews, quickly ascertaining attitudes toward or trends about particular topics. This tech-driven process has diverse applications across industries:

  • Social media monitoring to understand the sentiment around brands, products, public figures or even marketing strategies.
  • Analyzing customer feedback and reviews to identify areas for improvement.
  • Tracking market trends in near real time via news articles, transcripts or other content.
  • Detecting harassment, hate speech or other concerning speech patterns.
  • Investigating competitors to benchmark your performance against theirs.

Think about it: With sentiment analysis, you can gauge public opinion, conduct nuanced market research, monitor brand reputation and enhance customer experiences — all by analyzing the emotions behind the words.

Unfortunately, sentiment analysis projects, like many involving AI and machine learning, often face significant challenges related to data quality, model selection, scalability and more. Unstructured text data is difficult to clean and process in a way that retains important semantic meaning. And as datasets grow, having sufficient computing power becomes paramount: processing millions of social media posts is no easy task.

Given these challenges, practical sentiment analysis demands advanced, robust tools and technologies to facilitate machine learning workflows. Enter Amazon SageMaker.

Why We Chose Amazon SageMaker

As an AWS IoT consulting partner, we chose to use SageMaker for its comprehensive set of built-in capabilities tailored for a sentiment analysis project like ours.

Amazon SageMaker is a fully managed service that lets developers, even those without machine learning experience, create, train and deploy machine learning models on the AWS cloud. Many companies can’t afford to bring in specialists and maintain resources dedicated to AI development, and even experienced developers struggle to deploy machine learning models effectively. Amazon SageMaker simplifies machine learning by providing common algorithms and other tools that accelerate the process.

Amazon SageMaker creates a fully managed ML instance in Amazon EC2 and supports Jupyter notebooks that include drivers, packages and libraries for common deep learning platforms and frameworks. With fully managed infrastructure, tools and workflows, SageMaker enables high-performance, low-cost machine learning for any use case.

Other factors in our selection:

  • Seamless integration with a choice of ML tools and other AWS data services like S3 for storage and QuickSight for visualization.
  • Elastic compute resources that can scale up or down based on changing model complexity and data volumes.
  • A library of pretrained models for rapid deployment without training from scratch.
  • Automated machine learning features to assist with data preprocessing and model building.

In short, Amazon SageMaker simplifies building, training and deploying machine learning models, making sentiment analysis more accessible.

Our Amazon SageMaker Sentiment Analysis Project: The 2020 U.S. Presidential Election

As mentioned, our sentiment analysis project using SageMaker aimed to parse through tweets and see what people were saying about the presidential candidates. Let’s walk through the main steps involved.

Step 1: Import Data Using S3

We began by importing our collection of tweets mentioning Trump and Biden into an S3 bucket to make it accessible for SageMaker training. This gave us over 1.7 million rows of unstructured text data to analyze.

Using Amazon SageMaker: Import the Data (Excerpt)
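
The original notebook code is not reproduced in this text, so here is a minimal sketch of what the import step might look like. It assumes the tweets are stored as CSV objects in S3 and that the notebook's role can read them. The helper names, the `candidate` tag and the column names in the sample are illustrative assumptions, not details from the project.

```python
import csv
import io


def parse_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key)."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI needs a bucket and a key: {uri!r}")
    return bucket, key


def read_tweets(text_stream, candidate: str):
    """Yield one dict per CSV row, tagged with the candidate it mentions."""
    for row in csv.DictReader(text_stream):
        row["candidate"] = candidate  # hypothetical tag added for later grouping
        yield row


def load_from_s3(uri: str, candidate: str) -> list[dict]:
    """Download one CSV object from S3 and parse it into rows."""
    import boto3  # third-party AWS SDK, imported lazily

    bucket, key = parse_s3_uri(uri)
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"]
    text = body.read().decode("utf-8")
    # newline="" lets the csv module handle line breaks inside quoted tweets
    return list(read_tweets(io.StringIO(text, newline=""), candidate))
```

A call such as `load_from_s3("s3://my-bucket/tweets/trump.csv", "trump")` would return the rows for one candidate; the bucket and key here are placeholders. Reading the whole object into memory keeps the sketch short, but a dataset of this size may be better served by streaming or chunked reads.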

Step 2: Data Preprocessing

With so much information, cleaning and preparing this data for analysis proved to be one of the biggest challenges. Unstructured text is notoriously messy, with duplicate tweets, misspellings, special characters and redundant data that must be removed carefully. After combining the Trump and Biden datasets, we dropped unwanted columns and then checked for and removed duplicates, empty tweets and similar problems.

Using Amazon SageMaker: Data Preprocessing (Excerpt)
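
As a rough illustration of the cleaning steps described above (dropping unwanted columns, removing empty tweets and removing duplicates), the sketch below works on plain dictionaries. The column names, the URL-stripping rule and the case-insensitive duplicate check are assumptions made for the example; the project's actual rules are not given in this text.

```python
import re

URL_RE = re.compile(r"https?://\S+")
SPACE_RE = re.compile(r"\s+")
KEEP_COLUMNS = ("tweet", "likes", "candidate")  # hypothetical column names


def clean_text(text: str) -> str:
    """Strip URLs and collapse runs of whitespace."""
    text = URL_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def preprocess(rows, keep=KEEP_COLUMNS):
    """Keep selected columns and drop empty or duplicate tweets."""
    seen = set()
    cleaned = []
    for row in rows:
        text = clean_text(row.get("tweet") or "")
        if not text:
            continue  # empty after cleaning
        key = (row.get("candidate"), text.lower())
        if key in seen:
            continue  # same text already kept for this candidate
        seen.add(key)
        slim = {col: row.get(col) for col in keep}
        slim["tweet"] = text
        cleaned.append(slim)
    return cleaned
```

Whether retweets or near-identical posts should count as duplicates is a judgment call that affects the final sentiment distribution, so the deduplication key is worth deciding deliberately.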

Step 3: Model Training with SageMaker

We began by creating a SageMaker notebook instance. The managed service lets developers use a variety of built-in training algorithms or bring their own custom algorithms. In this case, we did not use a pretrained model, so we specified the location of the data in an S3 bucket and initiated training. One of Amazon SageMaker’s key advantages is automatic model tuning, which searches for the hyperparameter values that produce the best-performing model.

One key challenge was the sheer computational cost of processing and modeling 1.7 million text entries efficiently. This is where SageMaker’s scalable architecture paid dividends, dynamically provisioning more powerful compute resources when needed.
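
To make the training and tuning step more concrete, here is a hedged sketch using version 2 of the SageMaker Python SDK. The post does not say which algorithm was trained, so the built-in BlazingText text classifier is assumed purely for illustration. The S3 layout, instance type, hyperparameter ranges and job counts are likewise placeholders, not the project's settings.

```python
def s3_layout(bucket: str, prefix: str) -> dict[str, str]:
    """Build the S3 URIs used for input channels and model output."""
    base = f"s3://{bucket}/{prefix.strip('/')}"
    return {
        "train": f"{base}/train",
        "validation": f"{base}/validation",
        "output": f"{base}/output",
    }


def launch_tuning_job(role_arn: str, bucket: str, prefix: str, region: str):
    """Start a hyperparameter tuning job (assumes SageMaker Python SDK v2)."""
    # Third-party imports stay local so s3_layout works without the SDK.
    from sagemaker import image_uris
    from sagemaker.estimator import Estimator
    from sagemaker.inputs import TrainingInput
    from sagemaker.tuner import (
        ContinuousParameter,
        HyperparameterTuner,
        IntegerParameter,
    )

    layout = s3_layout(bucket, prefix)
    estimator = Estimator(
        image_uri=image_uris.retrieve("blazingtext", region),  # assumed algorithm
        role=role_arn,
        instance_count=1,
        instance_type="ml.c5.2xlarge",  # illustrative instance size
        output_path=layout["output"],
    )
    estimator.set_hyperparameters(mode="supervised")  # text classification mode

    tuner = HyperparameterTuner(
        estimator=estimator,
        objective_metric_name="validation:accuracy",
        hyperparameter_ranges={
            "learning_rate": ContinuousParameter(0.005, 0.01),
            "epochs": IntegerParameter(5, 15),
        },
        max_jobs=6,           # total training jobs to try
        max_parallel_jobs=2,  # jobs running at the same time
    )
    tuner.fit({
        name: TrainingInput(layout[name], content_type="text/plain")
        for name in ("train", "validation")
    })
    return tuner
```

Each tuning job launches its own training instances, so `max_jobs` and the instance type are the main levers on cost.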

Step 4: Exploratory Data Analysis

With a clearer view of the data, we could start exploring characteristics like tweet volume and likes over time, hashtag usage and reactions to each candidate’s mentions. During this stage, our team looked at the different “moods” in the data to derive a story from it. Unsurprisingly, our initial look revealed that most of the analyzed tweets were about Trump, while only about 770,000 mentioned Biden. We also examined:

  • Likes and retweets correlation
  • Top countries, states and cities
  • Tweet length
  • Number of words in a tweet
  • Average word length
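
The text-based measures in this list are simple to compute. The sketch below derives tweet length, word count and average word length for a single tweet, and includes a small Pearson correlation helper of the kind that could relate likes to retweets. It uses only the standard library; the function names are illustrative, and the project may well have used a data-frame library instead.

```python
from math import sqrt


def text_features(tweet: str) -> dict:
    """Compute length-based features for one tweet."""
    words = tweet.split()
    return {
        "length": len(tweet),
        "word_count": len(words),
        "avg_word_length": sum(map(len, words)) / len(words) if words else 0.0,
    }


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

Passing the per-tweet like counts and retweet counts to `pearson` gives a single number between -1 and 1 that summarizes how closely the two move together.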

Correlating Likes and Retweets

Number of Tweets Per Country

Number of Tweets Per State

Number of Tweets Per City

Comparing Tweet Length

Step 5: Sentiment Analysis & Visualization

Once our model was trained and optimized, we deployed it for sentiment analysis in just a few clicks. Many sentiment libraries are available, but we used TextBlob. Polarity, the score that indicates sentiment, ranges from -1 (most negative) to 1 (most positive). Within that range, our algorithm calculates an average to determine whether a tweet conveys a strongly positive or negative sentiment. SageMaker’s integration with QuickSight also allowed us to visualize model outputs easily without additional coding.
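
For readers who have not used TextBlob, the following minimal sketch shows how a polarity score can be read from a piece of text and turned into a label. The `sentiment.polarity` attribute is part of TextBlob; the `neutral_band` threshold and the label names are assumptions for this example, since the post does not state the cutoffs that were used.

```python
def polarity(text: str) -> float:
    """Return TextBlob's polarity score for the text, from -1.0 to 1.0."""
    from textblob import TextBlob  # third-party, imported lazily

    return TextBlob(text).sentiment.polarity


def label_sentiment(score: float, neutral_band: float = 0.05) -> str:
    """Map a polarity score to a label; neutral_band is a hypothetical cutoff."""
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"
```

Calling `label_sentiment(polarity(tweet))` on each cleaned tweet yields the kind of positive, negative and neutral breakdown discussed in the findings. A wider neutral band classifies more tweets as neutral, so the chosen cutoff directly shapes the headline numbers.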

The integrated notebooks, managed computing resources and prebuilt algorithms/frameworks in SageMaker significantly streamline the entire sentiment analysis workflow compared to coding everything from scratch. By providing an end-to-end environment for the entire machine learning workflow, SageMaker allowed us to handle data ingestion, exploratory analysis, modeling and result interpretation all in one consistent ecosystem.

What Did Our Sentiment Analysis Find?

Most tweets were neutral, with polarity scores settling in the middle of the range, and Trump saw slightly more positive sentiment overall. Next, we looked at how polarity changed over time but saw no noteworthy variation throughout our analysis. Of course, an analysis like ours is never fully conclusive because the underlying data can be manipulated in many ways. Consider bots that push out, and interact with, social media posts carrying the hashtags we’re looking for: that false data inevitably skews the results.
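
One way to check how polarity changes over time is to average the scores per day and per candidate. The sketch below does this with the standard library; the record layout (ISO timestamp, candidate, polarity score) is an assumption for the example rather than the project's actual schema.

```python
from collections import defaultdict
from datetime import datetime


def daily_mean_polarity(records):
    """Average polarity per (day, candidate).

    records: iterable of (iso_timestamp, candidate, polarity) tuples.
    Returns a dict keyed by (YYYY-MM-DD, candidate), sorted by key.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of scores, count]
    for timestamp, candidate, score in records:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        bucket = totals[(day, candidate)]
        bucket[0] += score
        bucket[1] += 1
    return {key: total / count for key, (total, count) in sorted(totals.items())}
```

Plotting these daily means for each candidate produces a polarity-over-time chart, and a flat line corresponds to the lack of variation described above.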

Tweet Polarity

Sentiment Polarity Over Time

Positive Polarity Shift Over Time

Negative Polarity Shift Over Time

Sentiments of Tweets

Word Cloud for Tweets Using Trump Hashtags

Word Cloud for Tweets Using Biden Hashtags

While no machine learning project comes without obstacles, SageMaker provided a comprehensive, managed environment that empowered our team to take on this sentiment analysis initiative successfully — without coding everything from scratch. If your team or organization is looking to leverage SageMaker for similar use cases, we advise the following:

  • Test the tool first with smaller datasets and simpler models to gauge if SageMaker fits with your workflows (and it will if you’re running other AWS services!).
  • Clean your data thoroughly upfront — unstructured text is messy.
  • Explore pretrained models before building custom solutions.
  • Monitor cloud computing costs vigilantly — they can spiral quickly when unattended.

SageMaker’s comprehensive capabilities made the managed service an ideal solution for our sentiment analysis needs. As natural language processing and understanding grow in importance, we’re well positioned as an AWS consulting partner with skills in hardware development, embedded software, cloud and mobile applications to leverage AWS tools like SageMaker for more innovative use cases in the future.

Let us know how we can help you harness the power of sentiment analysis to boost your business.

Additional Resources

For more information, check out our other blog posts on training machine learning models: