Election 2016

What Twitter says!
This project explores how Twitter can be used to analyze the popularity of political parties in the 2016 Presidential Election, based purely on data collected from Twitter. Our aim is to present an unbiased analysis of popularity, based on sentiment analysis for each party (Republican and Democratic), at the basic geospatial unit, which in our case is the county (not the Electoral Vote region).
We also present a time-range analysis of the data: end users can select a date-time range of their choice, and the analysis is then based only on the data from that range.

Motivation

The motivation for this project is to build a deep learning sentiment analysis model that can identify the sentiment behind every tweet. Politics is an area with an abundance of positive and negative tweets, and election season is a good time to gather them. Our aim is to associate the correct sentiment with each tweet. In addition, we used a machine learning classifier to predict whether a tweet is affiliated with the Republicans or the Democrats.
A bursty event detection system is used to detect when surges of tweets occur. Our bursty event detection correctly identified the tweet bursts and associated each burst with the event that happened at that time.

Methods

We have millions of general tweets collected through the Twitter APIs, and our aim is to find the politics- and election-related tweets in this large dataset.

Learning Keywords Associated with Politics

We crawled election-related news articles from The New York Times and Reuters and used these articles to learn the keywords/features associated with politics. Once we had a handful of seed keywords, we trained Word2Vec on the general tweet dataset and used it to find keywords similar to the ones obtained earlier.
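The keyword-expansion step can be sketched as follows. This is a minimal, hypothetical illustration rather than the project's code: it assumes word vectors have already been trained on the general tweet corpus (for example with gensim's `Word2Vec`, whose `model.wv.most_similar(positive=[...], topn=...)` performs a comparable cosine-similarity lookup). The toy vectors, the `expand_keywords` helper, and the 0.8 similarity threshold are all assumptions chosen for illustration.

```python
import math
from typing import Dict, Iterable, List, Set

# Hypothetical toy embeddings; in practice these would come from a Word2Vec
# model trained on the general tweet corpus.
TOY_VECTORS: Dict[str, List[float]] = {
    "election": [0.9, 0.1, 0.0],
    "vote": [0.85, 0.2, 0.05],
    "ballot": [0.8, 0.15, 0.1],
    "pizza": [0.0, 0.1, 0.95],
    "weekend": [0.1, 0.9, 0.2],
}


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_keywords(seeds: Iterable[str], vectors: Dict[str, List[float]],
                    threshold: float = 0.8, topn: int = 5) -> Set[str]:
    """Add up to `topn` neighbours per seed whose similarity is >= `threshold`."""
    seed_set = set(seeds)
    expanded = set(seed_set)
    for seed in seed_set:
        if seed not in vectors:
            continue  # seed never seen in the tweet corpus
        scored = sorted(
            ((cosine(vectors[seed], vec), word)
             for word, vec in vectors.items() if word not in seed_set),
            reverse=True,
        )
        for similarity, word in scored[:topn]:
            if similarity >= threshold:
                expanded.add(word)
    return expanded


print(sorted(expand_keywords(["election"], TOY_VECTORS)))  # prints ['ballot', 'election', 'vote']
```

Raising the threshold keeps the expanded list closer to the seed vocabulary; lowering it pulls in more loosely related words.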

Classifier for Political Tweets

A machine learning classifier is used to separate political from non-political tweets. The same technique is also used to classify political affiliation (Democratic or Republican).
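The text above does not name the classifier. As one possible instance, the sketch below uses a multinomial Naive Bayes text classifier with Laplace smoothing, written in plain Python; the class name `NaiveBayesTweetClassifier`, the whitespace tokenizer, and the toy training tweets are hypothetical. The same class could be trained a second time on political tweets labeled Democratic or Republican to mirror the affiliation step.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def tokenize(text: str) -> List[str]:
    """Hypothetical minimal tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


class NaiveBayesTweetClassifier:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def fit(self, texts: List[str], labels: List[str]) -> "NaiveBayesTweetClassifier":
        self.class_counts = Counter(labels)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, text: str) -> Optional[str]:
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # log P(label) + sum of log P(token | label)
            score = math.log(doc_count / n_docs)
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # ignore words never seen in training
                numerator = self.word_counts[label][token] + self.alpha
                denominator = self.total_words[label] + self.alpha * vocab_size
                score += math.log(numerator / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data for the political / non-political step.
train_texts = [
    "vote in the election", "presidential debate tonight", "election polls and debate",
    "great pizza tonight", "football game this weekend", "pizza and football",
]
train_labels = ["political"] * 3 + ["non-political"] * 3
model = NaiveBayesTweetClassifier().fit(train_texts, train_labels)
print(model.predict("election debate"))  # prints political
```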

Sentiment Analysis

A deep learning LSTM network is used to evaluate the sentiment behind each tweet. The sentiment score ranges from 0 to 1, where 0 is the most negative and 1 the most positive sentiment; 0.5 is treated as neutral.
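To show how an LSTM can emit a score in the 0 to 1 range described above, here is a minimal forward pass of a single-layer LSTM followed by a sigmoid output. It is an untrained sketch with random placeholder weights (the class `TinyLSTMSentiment` and all dimensions are hypothetical), not the project's trained network; real inputs would be word-embedding vectors for the tokens of a tweet.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix: List[Vector], vec: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class TinyLSTMSentiment:
    """Untrained single-layer LSTM with a sigmoid output head (random placeholder weights)."""

    GATES = ("input", "forget", "output", "candidate")

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def rand_matrix(rows: int, cols: int) -> List[Vector]:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        # Per-gate input weights W, recurrent weights U and bias b.
        self.W = {g: rand_matrix(hidden_dim, input_dim) for g in self.GATES}
        self.U = {g: rand_matrix(hidden_dim, hidden_dim) for g in self.GATES}
        self.b = {g: [0.0] * hidden_dim for g in self.GATES}
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
        self.b_out = 0.0

    def _preactivation(self, gate: str, x: Sequence[float], h: Vector) -> Vector:
        return [wx + uh + bias for wx, uh, bias in
                zip(matvec(self.W[gate], x), matvec(self.U[gate], h), self.b[gate])]

    def score(self, sequence: Sequence[Sequence[float]]) -> float:
        """Run the LSTM over a sequence of token vectors; return a score in (0, 1)."""
        h = [0.0] * self.hidden_dim  # hidden state
        c = [0.0] * self.hidden_dim  # cell state
        for x in sequence:
            i = [sigmoid(z) for z in self._preactivation("input", x, h)]
            f = [sigmoid(z) for z in self._preactivation("forget", x, h)]
            o = [sigmoid(z) for z in self._preactivation("output", x, h)]
            g = [math.tanh(z) for z in self._preactivation("candidate", x, h)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        # The sigmoid squashes the final hidden state's logit into (0, 1).
        logit = sum(w * hj for w, hj in zip(self.w_out, h)) + self.b_out
        return sigmoid(logit)


model = TinyLSTMSentiment(input_dim=3, hidden_dim=4)
print(model.score([[0.1, 0.2, 0.3], [0.0, -0.1, 0.5]]))
```

With an empty input sequence the hidden state stays at zero, so the sigmoid receives 0 and the score is exactly 0.5, the neutral point.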

Time and Geotagging

Each tweet has a timestamp and a geo-location attached to it. We use the geo-location information to associate each tweet with a county. (Note: this is unrelated to Electoral Vote regions.)
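How coordinates are mapped to counties is not specified here. One common approach, assumed in this sketch, is a point-in-polygon test against county boundary polygons using the even-odd ray-casting rule; the sketch also shows the inclusive date-time range filter behind the time-range analysis. The toy square "counties", the helper names, and the (longitude, latitude) tuple layout are hypothetical; a real system would load actual county boundary data.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Even-odd ray casting: count edge crossings of a ray cast to the right."""
    x, y = point
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y-coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_county(point: Point, counties: Dict[str, List[Point]]) -> Optional[str]:
    """Return the first county whose boundary contains the point, else None."""
    for name, boundary in counties.items():
        if point_in_polygon(point, boundary):
            return name
    return None


def in_time_range(timestamp: datetime, start: datetime, end: datetime) -> bool:
    """Keep only tweets inside the user-selected date-time range (inclusive)."""
    return start <= timestamp <= end


# Hypothetical toy "counties": two adjacent unit squares.
COUNTIES = {
    "County A": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "County B": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}

print(assign_county((0.5, 0.5), COUNTIES))  # prints County A
print(in_time_range(datetime(2016, 10, 9, 21), datetime(2016, 10, 9), datetime(2016, 10, 10)))  # prints True
```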

Sentiment Score

The sentiment score for each county combines the average scores of the positive and negative sentiment tweets in that region, computed separately for each party.
The Positive Score is the average score of all tweets with a sentiment score greater than a threshold (e.g., 0.7).
Similarly, the Negative Score is the average score of all tweets with a sentiment score less than 1.0 - threshold (e.g., 0.3).
The Final Sentiment Score is the sum of the Positive and Negative Scores obtained above.
  • County Level Score: The political party with the maximum Final Sentiment Score is considered the winner for that county.
  • State Level Score: The political party with the maximum number of winning counties is considered the winner for that state.
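The scoring and winner rules above can be sketched as follows, applying the stated rules literally: for each party, average the scores above the threshold, average the scores below 1.0 - threshold, and add the two averages. Returning 0.0 when no tweet falls in a band, and breaking ties alphabetically, are assumptions not specified above; the helper names are hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List


def _mean_or_zero(values: List[float]) -> float:
    # Assumption: an empty band contributes 0.0.
    return mean(values) if values else 0.0


def final_sentiment_score(scores: List[float], threshold: float = 0.7) -> float:
    """Positive Score + Negative Score for one party's tweets in one county."""
    positive = _mean_or_zero([s for s in scores if s > threshold])
    negative = _mean_or_zero([s for s in scores if s < 1.0 - threshold])
    return positive + negative


def county_winner(party_scores: Dict[str, List[float]], threshold: float = 0.7) -> str:
    """Party with the maximum Final Sentiment Score (ties: alphabetically first)."""
    finals = {party: final_sentiment_score(s, threshold) for party, s in party_scores.items()}
    return max(sorted(finals), key=finals.__getitem__)


def state_winner(county_winners: List[str]) -> str:
    """Party that won the most counties (ties: alphabetically first)."""
    wins = Counter(county_winners)
    return max(sorted(wins), key=wins.__getitem__)


county = {"Republican": [0.9, 0.8, 0.1, 0.5], "Democratic": [0.95, 0.2, 0.6]}
print(county_winner(county))  # prints Democratic (1.15 vs 0.95)
print(state_winner(["Democratic", "Republican", "Republican"]))  # prints Republican
```

Because each band is averaged, this rule depends on how strongly positive or negative the tweets are, not on how many tweets fall into each band.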

Facts and Figures

(updated Oct 16th, 2016)
  • General Tweets: 286 million (worldwide)
  • Political Tweets: 1.5+ million (all geotagged)
  • Republican Tweets: 822,062
  • Democrat Tweets: 702,042

Credits

This project is an effort by InitialDLab, School of Computing, University of Utah, in collaboration with Prof. Jian Li of Tsinghua University.
Team Members: