Election 2016: What Twitter Says!
This project explores how Twitter can be used to analyze the popularity of political parties in the 2016 Presidential Election. It is based purely on data collected from Twitter. Our aim is to present an unbiased analysis of popularity, based on sentiment analysis for each party (Republican and Democratic), at the basic geospatial unit, which in our case is the county (not the Electoral Vote region).
We also present a time-range analysis of the data. End users can select a date-time range of their choice, and the analysis is then based only on data from that range.
The motivation for this project is to build a Deep Learning sentiment analysis model that can identify the sentiment behind every tweet. Politics is an area where many positive and negative tweets are available, and the election season is a good time to gather them. Our aim is to associate the correct sentiment with each tweet. In addition, we used a Machine Learning classifier to predict the affiliation of each tweet with either the Republicans or the Democrats.
A Bursty Event Detection System is used to detect when surges of tweets occurred. Our bursty event detection correctly identified the tweet bursts and associated them with the events that happened at those times.
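The source does not describe how the Bursty Event Detection System works internally. As an illustration of the general idea only, the sketch below flags an hour as a burst when its tweet count exceeds the mean of the preceding hours by several standard deviations. The function name `detect_bursts`, the `window` and `k` parameters, and the sample counts are all hypothetical and are not taken from the project.

```python
from statistics import mean, pstdev


def detect_bursts(counts, window=6, k=3.0):
    """Return indices whose count is far above the recent average.

    counts: tweet counts per time bucket (e.g. per hour), oldest first.
    window: number of preceding buckets used as the baseline (assumed value).
    k: number of standard deviations above the baseline mean that counts
       as a burst (assumed value).
    """
    bursts = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        baseline = mean(history)
        # Floor the spread at 1.0 so a perfectly flat history does not
        # turn every tiny increase into a burst.
        spread = max(pstdev(history), 1.0)
        if counts[i] > baseline + k * spread:
            bursts.append(i)
    return bursts


# Hypothetical hourly counts with one obvious surge at index 7.
hourly_counts = [100, 110, 95, 105, 100, 98, 102, 900, 104, 99]
burst_hours = detect_bursts(hourly_counts)
```

A detected index can then be matched against a timeline of known events (for example, a debate) to label the burst.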
We have millions of general tweets collected from the Twitter APIs, and our aim is to find politics- and election-related tweets in this huge dataset.
Learn Keywords Associated with Politics
We crawled news articles categorized under Election from the New York Times and Reuters. We used these articles to learn the keywords/features associated with politics.
Once we had a handful of keywords, we trained Word2Vec on the general tweet dataset to find keywords similar to the ones obtained earlier.
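The keyword expansion step can be sketched as a nearest-neighbor search over word vectors. The example below assumes the vectors have already been trained (for instance by Word2Vec) and ranks candidate words by cosine similarity to the seed keywords. The tiny two-dimensional vectors, the function names, and the `top_n` parameter are made up for illustration and do not come from the project.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_keywords(seeds, embeddings, top_n=2):
    """Return the top_n non-seed words closest to any seed keyword."""
    known_seeds = [s for s in seeds if s in embeddings]
    best_similarity = {}
    for word, vector in embeddings.items():
        if word in seeds:
            continue
        # Score each candidate by its closest seed keyword.
        best_similarity[word] = max(
            (cosine(vector, embeddings[s]) for s in known_seeds),
            default=0.0,
        )
    ranked = sorted(best_similarity, key=best_similarity.get, reverse=True)
    return ranked[:top_n]


# Hypothetical toy embeddings; real Word2Vec vectors have hundreds of dimensions.
toy_embeddings = {
    "election": (1.0, 0.0),
    "vote": (0.9, 0.1),
    "ballot": (0.8, 0.3),
    "pizza": (0.0, 1.0),
}
similar = expand_keywords({"election"}, toy_embeddings)
```

With the toy vectors above, "vote" and "ballot" rank ahead of "pizza" because their vectors point in nearly the same direction as "election".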
Classifier for Political Tweets
A machine learning classifier is used to separate political tweets from non-political ones. The same technique is then used to classify the political affiliation of each tweet (Democratic or Republican).
A Deep Learning LSTM network is used to evaluate the sentiment behind each tweet. The sentiment score ranges between 0 and 1, where 0 is the most negative sentiment and 1 is the most positive. A score of 0.5 is assumed to be neutral.
Time and Geotagging
Each tweet has a time and a geo-location attached to it. We use the geo-location information to associate the tweet with a county. (Note: this is not related to the Electoral Vote region.)
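The source does not say how a tweet's coordinates are matched to a county. One common approach, shown here only as a sketch under that assumption, is a ray-casting point-in-polygon test against county boundary polygons. The square "counties" and all function names below are hypothetical; real county boundaries would come from a boundary dataset.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside the polygon?

    polygon: list of (lon, lat) vertices in order, without repeating the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        # by a horizontal ray cast from the point.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def county_for_tweet(lon, lat, counties):
    """Return the first county whose polygon contains the point, else None."""
    for name, polygon in counties.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Hypothetical unit-square "counties" for illustration only.
toy_counties = {
    "County A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "County B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
```

For example, `county_for_tweet(0.5, 0.5, toy_counties)` falls in "County A", while a point outside both squares maps to no county.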
The sentiment score associated with each county is derived from the average scores of the positive and negative sentiment tweets for that region.
Positive Score is the average score of all tweets with a sentiment score greater than a threshold.
Similarly, Negative Score is the average score of all tweets with a sentiment score less than 1.0 - threshold.
Final Sentiment Score is the sum of the Positive and Negative Scores obtained above.
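The scoring rules above can be sketched in a few lines. The threshold value is not given in the source, so the default of 0.6 below is an assumption, as is treating an empty positive or negative group as contributing 0.0. The function name is hypothetical.

```python
from statistics import mean


def final_sentiment_score(scores, threshold=0.6):
    """Combine per-tweet sentiment scores (each in [0, 1]) for one party
    in one county into a final sentiment score.

    threshold: assumed value; the source does not state the one used.
    """
    positive = [s for s in scores if s > threshold]
    negative = [s for s in scores if s < 1.0 - threshold]
    # Assumption: a group with no tweets contributes 0.0 to the sum.
    positive_score = mean(positive) if positive else 0.0
    negative_score = mean(negative) if negative else 0.0
    return positive_score + negative_score


# Hypothetical scores: two positive (0.9, 0.7), one near-neutral (0.5),
# and two negative (0.1, 0.3) tweets.
example_score = final_sentiment_score([0.9, 0.7, 0.5, 0.1, 0.3])
```

In this example the positive tweets average 0.8 and the negative tweets average 0.2, so the final score is about 1.0. The near-neutral tweet at 0.5 falls between the two cutoffs and is ignored.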
County Level Score: The political party with the highest final sentiment score is considered the winner for that county.
State Level Score: The political party that wins the most counties is considered the winner for that state.
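The county-level and state-level rules can be illustrated as follows. The source does not say how ties are broken, so this sketch simply keeps whichever party Python encounters first, which is an assumption. The county names, scores, and function names are invented for the example.

```python
from collections import Counter


def county_winner(party_scores):
    """Return the party with the highest final sentiment score.

    party_scores: dict mapping party name to its final sentiment score.
    Ties go to the party listed first (assumption; not stated in the source).
    """
    return max(party_scores, key=party_scores.get)


def state_winner(county_scores):
    """Return the party that wins the most counties in a state.

    county_scores: dict mapping county name to a party_scores dict.
    """
    wins = Counter(county_winner(scores) for scores in county_scores.values())
    return wins.most_common(1)[0][0]


# Hypothetical final sentiment scores for three counties in one state.
toy_state = {
    "County A": {"Republican": 1.10, "Democratic": 0.95},
    "County B": {"Republican": 0.90, "Democratic": 1.05},
    "County C": {"Republican": 1.20, "Democratic": 1.00},
}
winner = state_winner(toy_state)
```

Here the Republican party wins two of the three counties, so it is the state-level winner even though the Democratic party wins "County B".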
Facts and Figures (updated Oct 16th, 2016)
General Tweets: 286 million (Worldwide)
Political Tweets: 1.5+ million (all geotagged)
Republican Tweets: 822,062
Democrat Tweets: 702,042
This project is an effort from InitialDLab, School of Computing, University of Utah, in collaboration with Prof. Jian Li from Tsinghua University.
- Prof. Feifei Li
- Debjyoti Paul (Project Lead)
- Yu Xin
- Murali Krishna Teja
- Richie Frost
- Prof. Jian Li (Tsinghua University)
- Dyllon Gagnier
- Elijah Grubb
- Jun Tang