Is the 2016 presidential campaign as negative as it feels?

2016 has been called the most contentious presidential campaign in US history. Is it? Rosette takes the sentiment temperature on Twitter to find out.

party symbols

Fervor on social media is peaking as both candidates’ approval ratings sit at record-breaking lows. If you’re active on any major social network, you’ve seen the surge of heated political posts, clickbait articles, and poll updates filling your feed. The 2016 election is also unique in that the candidates themselves are turning to Facebook and Twitter posts as their primary medium for sharing campaign news, rather than the emails and website announcements of past elections.

Just days before the dreaded and anticipated November 8th, we decided to take a closer look at the voluminous stash of social data generated by the election. These posts offer a window into the opinions of millennial voters, few of whom have a landline that pollsters can reach. Using Rosette, we pulled out the major themes, sentiments, and entities. The results surprised us!

Methodology

To keep things simple, we limited our analysis to Twitter data published between 10/11/2016 and 10/28/2016. For our first dataset, we used Twitter’s streaming APIs to collect a random subset of tweets containing the words “Hillary” or “Trump.” For the second dataset, we changed our search to tweets that mentioned the handles @realDonaldTrump, @HillaryClinton, @GovPenceIN, @timkaine, and @USAElection2016.
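
The two collection rules can be sketched as simple local filters. This is only an illustration, not the Twitter streaming API itself: the function names and sample tweets are made up, and it assumes a tweet qualified for the second dataset if it mentioned any one of the listed handles.

```python
import re

# Keywords for the first dataset and handles for the second, as listed above.
KEYWORDS = ("hillary", "trump")
HANDLES = (
    "@realDonaldTrump",
    "@HillaryClinton",
    "@GovPenceIN",
    "@timkaine",
    "@USAElection2016",
)

# Match keywords as whole words, ignoring case.
KEYWORD_RE = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)
# Match a handle only when it is not embedded in a longer token.
HANDLE_RE = re.compile(
    r"(?<!\w)(" + "|".join(re.escape(h) for h in HANDLES) + r")(?!\w)",
    re.IGNORECASE,
)


def in_first_dataset(text: str) -> bool:
    """True if the tweet text contains one of the keywords."""
    return KEYWORD_RE.search(text) is not None


def in_second_dataset(text: str) -> bool:
    """True if the tweet text mentions one of the handles."""
    return HANDLE_RE.search(text) is not None


# Hypothetical sample tweets, invented for this sketch.
samples = [
    "Watching the debate, Hillary looks confident",
    "Hearts are trump this round",
    "Thanks @timkaine for visiting Ohio",
    "Nothing political here",
]
first = [t for t in samples if in_first_dataset(t)]
second = [t for t in samples if in_second_dataset(t)]
```

Note that the card-game tweet passes the keyword filter. Plain keyword matching cannot tell a candidate from a playing card, so a collection step like this needs a later pass to weed out such tweets.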

What we learned, part I

We used the first dataset to analyze overall election sentiment. Rosette’s entity extraction identified the sentiment-related keywords, and Rosette’s sentiment analysis determined the positivity or negativity of the corresponding text.

Election sentiment
Note: Rosette can calculate degrees of sentiment, which is not reflected in this chart.

Overall, these tweets reflect a much more positive view of the election than you might have expected: almost half the tweets discussing the two main candidates are positive.

But how accurate is this picture? Your results are only as good as the data you analyze. If you used a basic data scraping or search tool to look for “Hillary” and “Trump” tweets, your results would include references to many different Hillarys and to other senses of the word “trump,” including the playing card and the verb “to trump.” Rosette’s entity extraction and resolution functions, however, use context, rules, and machine learning to filter out these erroneous instances, keeping the data you want while removing the data you don’t.

What we learned, part II

We used the second dataset to examine entity-specific sentiment within the target tweets: those that contained the Twitter handles @realDonaldTrump, @HillaryClinton, @GovPenceIN, @timkaine, and @USAElection2016.

Again, we used Rosette’s entity extraction, this time to pull out the most popular and relevant topics and issues in the tweets. Rosette automatically resolved multiple spellings and representations into single entities. For example, mentions of “Secretary Clinton” would resolve to the “Hillary Clinton” entity (Q6294).
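
To make the idea of resolving several surface forms to one entity concrete, here is a toy lookup-table sketch. It is not how Rosette works internally (the post describes context, rules, and machine learning); the alias list and the `resolve` helper are illustrative assumptions, and only the ID Q6294 comes from the example above.

```python
from typing import Optional

# Toy alias table. The ID Q6294 is taken from the example above; the
# aliases themselves are assumptions made for this sketch.
ALIASES = {
    "hillary clinton": "Q6294",
    "secretary clinton": "Q6294",
    "hillary rodham clinton": "Q6294",
    "@hillaryclinton": "Q6294",
}
CANONICAL_NAMES = {"Q6294": "Hillary Clinton"}


def resolve(mention: str) -> Optional[str]:
    """Return the entity ID for a mention, or None if it is unknown."""
    key = " ".join(mention.lower().split())  # normalize case and spacing
    return ALIASES.get(key)


# Hypothetical mentions: two refer to the same person, one does not.
mentions = ["Secretary Clinton", "hillary  clinton", "Hillary Duff"]
resolved = [resolve(m) for m in mentions]
names = [CANONICAL_NAMES.get(entity_id) for entity_id in resolved]
```

A fixed table like this only handles spellings it has already seen and cannot use surrounding context, which is why it serves here as an illustration of the output shape rather than of the method.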

We then used Rosette’s sentiment analysis to assign a sentiment score to the top relevant entities, tallied below:

Entity frequencies
Note: If two entities are mentioned in the same tweet (e.g., Clinton and DNC), Rosette can assign a separate sentiment score to each entity, in addition to the score for the document (here, the tweet) as a whole.

The results show that Clinton is mentioned more frequently in our dataset than Trump. However, while about a third of the tweets that discuss Clinton are positive, Trump’s positive mentions are closer to half.
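
A tally of this kind can be sketched in a few lines. The records below are made up purely for illustration and are not the study’s data; the record layout and the label strings "pos", "neg", and "neu" are assumptions, as are the helper names.

```python
from collections import defaultdict

# Made-up records for illustration only; these are not the study's data.
# Each tweet carries a document-level label plus one label per entity.
# The label strings "pos", "neg", and "neu" are an assumed convention.
tweets = [
    {"document": "neg", "entities": {"Clinton": "neg", "DNC": "neg"}},
    {"document": "pos", "entities": {"Clinton": "pos"}},
    {"document": "neu", "entities": {"Clinton": "neg", "Trump": "pos"}},
    {"document": "neg", "entities": {"Trump": "neg"}},
]


def tally(records):
    """Count mentions and positive mentions for each entity."""
    counts = defaultdict(lambda: {"mentions": 0, "positive": 0})
    for record in records:
        for entity, label in record["entities"].items():
            counts[entity]["mentions"] += 1
            if label == "pos":
                counts[entity]["positive"] += 1
    return dict(counts)


def positive_share(stats):
    """Fraction of an entity's mentions that are positive."""
    return stats["positive"] / stats["mentions"]


summary = tally(tweets)
for entity, stats in sorted(summary.items()):
    print(f"{entity}: {stats['mentions']} mentions, "
          f"{positive_share(stats):.0%} positive")
```

Keeping the per-entity labels separate from the document label matters here: the third toy tweet is neutral overall but counts as a negative mention for one entity and a positive mention for the other.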

As with any data analysis, there are a number of biases to keep in mind. For example, Trump, on average, posts three tweets for every one of Clinton’s. The content of these tweets, whether self-promotion or opponent criticism, undoubtedly feeds into the results we recorded.

Drumpf Twitter

Clinton Twitter

Why bother?

With every major news agency in the country constantly running polls, what can social media data add to the conversation? With rapid changes in technology, it has become harder to reach the electorate, particularly younger generations who may not have a landline. This is evident in the decreasing accuracy of even polling veterans like Gallup. However, these same phone-less citizens are also the most active on social media. As our relationship with technology continues to evolve, pollsters would do well to adjust their methodology accordingly.

Social media data is also invaluable to political candidates and their campaign teams. Campaign managers can filter posts by geographic location — such as an important swing state — to stay abreast of changes in public opinion where it matters most. Knowing what people are saying about a candidate in real time allows a campaign to focus its efforts where they’re needed most, potentially shaping the course of the election itself.

On a more personal level, text analyses like David Robinson’s viral post breaking down the authorship of Trump’s tweets shed light on the character of a candidate. Social media is both human and personal, and can provide insights far beyond the scope of traditional surveys.

Try it yourself

Want to go further? Sign up for a free Rosette API key and try analyzing election social data yourself. Check out our RapidMiner tutorial to help you get started code-free. Let us know what you find; we’d love to feature your results on our blog!