Text Mining Project: Sentiment Analysis for the Presidential Candidates #9
Open: awu1 wants to merge 9 commits into sd16fall:master from awu1:master
9 commits
ed6363f Alyssa Wu: Mini Project 1
a52948d Alyssa Wu: Mini Project 1
220e0e8 Updating mini proj 1 for ninja review
0a4d0c8 turning in my mini project for ninja review
f8d5a63 Turning in mini project for ninja review
d8fe44e Turning in my Mini Project #1
07ac862 Donald Trump Histogram Exhibit
0dd54a9 Hilary Clinton Histogram Exhibit
fe64f1a Mini Project Write Up
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,111 @@ | ||
| { | ||
| "cells": [ | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": 7, | ||
| "metadata": { | ||
| "collapsed": false | ||
| }, | ||
| "outputs": [ | ||
| { | ||
| "name": "stdout", | ||
| "output_type": "stream", | ||
| "text": [ | ||
| "The last 100 tweets about Donald Trump has an average polarity of 0.0383908730159\n", | ||
| "The last 100 tweets about Hilary Clinton has an average polarity of -0.00256539101331\n", | ||
| "\n", | ||
| "The last 100 tweets about each of the presidential candidates show that Donald Trump is the more favorable presidential candidate\n" | ||
| ] | ||
| } | ||
| ], | ||
| "source": [ | ||
| "from pattern.web import Twitter\n", | ||
| "from pattern.en import sentiment\n", | ||
| "\n", | ||
| "import matplotlib.pyplot as plt\n", | ||
| "\n", | ||
| "t = Twitter() #twitter object created\n", | ||
| "i = None\n", | ||
| "\n", | ||
| "def draw_histogram(polarity_list, name):\n", | ||
| "    # the histogram of the data\n", | ||
| "    plt.figure() #opens a new window\n", | ||
| "    n, bins, patches = plt.hist(polarity_list, 50, normed=1, facecolor='green', alpha=0.75)\n", | ||
| "    plt.xlabel('Polarity')\n", | ||
| "    plt.ylabel('Frequency')\n", | ||
| "    plt.title('Histogram of '+name)\n", | ||
| "    plt.axis([-1,1,0,40])\n", | ||
| "    plt.grid(True)\n", | ||
| " \n", | ||
| "\n", | ||
| "def get_polarity_list(phrase):\n", | ||
| "    polarity_list = []\n", | ||
| "    last_id = None  # id of the last tweet seen; passed as start to fetch the next page\n", | ||
| "    for _ in range(10):  # 10 pages of up to 10 tweets each\n", | ||
| "        for tweet in t.search(phrase, start=last_id, count=10):  # search method of the Twitter object; t.search returns a list\n", | ||
| "            #print tweet.text\n", | ||
| "            sentiment_tuple = sentiment(tweet.text)  # (polarity, subjectivity)\n", | ||
| "            polarity_list.append(sentiment_tuple[0])  # keep only the polarity\n", | ||
| "            last_id = tweet.id\n", | ||
| "    return polarity_list\n", | ||
| "\n", | ||
| "dt_pol_list = get_polarity_list('donald trump')\n", | ||
| "hc_pol_list = get_polarity_list('hilary clinton')\n", | ||
| "\n", | ||
| "def find_avg_polarity(new_list):\n", | ||
| "    total = 0.0  # float accumulator; avoids shadowing the built-in sum\n", | ||
| "    for value in new_list:\n", | ||
| "        total += value\n", | ||
| "    return total / len(new_list)\n", | ||
| " \n", | ||
| "print \"The last 100 tweets about Donald Trump has an average polarity of\", find_avg_polarity(dt_pol_list)\n", | ||
| "print \"The last 100 tweets about Hilary Clinton has an average polarity of\", find_avg_polarity(hc_pol_list)\n", | ||
| "print\n", | ||
| "\n", | ||
| "if(find_avg_polarity(dt_pol_list) > find_avg_polarity(hc_pol_list)):\n", | ||
| "    print \"The last 100 tweets about each of the presidential candidates show that Donald Trump is the more favorable presidential candidate\"\n", | ||
| "else:\n", | ||
| "    print \"The last 100 tweets about each of the presidential candidates show that Hilary Clinton is the more favorable presidential candidate\"\n", | ||
| " \n", | ||
| "draw_histogram(dt_pol_list, 'Donald Trump') \n", | ||
| "draw_histogram(hc_pol_list, 'Hilary Clinton')\n", | ||
| "\n", | ||
| "plt.show()" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "metadata": { | ||
| "collapsed": true | ||
| }, | ||
| "outputs": [], | ||
| "source": [] | ||
| } | ||
| ], | ||
| "metadata": { | ||
| "kernelspec": { | ||
| "display_name": "Python 2", | ||
| "language": "python", | ||
| "name": "python2" | ||
| }, | ||
| "language_info": { | ||
| "codemirror_mode": { | ||
| "name": "ipython", | ||
| "version": 2 | ||
| }, | ||
| "file_extension": ".py", | ||
| "mimetype": "text/x-python", | ||
| "name": "python", | ||
| "nbconvert_exporter": "python", | ||
| "pygments_lexer": "ipython2", | ||
| "version": "2.7.6" | ||
| } | ||
| }, | ||
| "nbformat": 4, | ||
| "nbformat_minor": 1 | ||
| } | ||
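The paging pattern used by `get_polarity_list` can be tried without network access. The following is a minimal sketch in Python 3, not the notebook's code: `fake_search` is a hypothetical stand-in for a paged tweet search, `score` is a toy stand-in for a sentiment function, and the corpus is made up. It illustrates carrying the id of the last tweet seen between pages in its own variable, and an average that tolerates an empty list.

```python
from collections import namedtuple

# Hypothetical stand-in for a search result; only the fields used below.
Tweet = namedtuple("Tweet", ["id", "text"])

# Made-up corpus, newest first (assumption: not real tweets).
CORPUS = [Tweet(id=100 - n, text=text) for n, text in enumerate(
    ["great rally", "terrible debate", "speech today", "good plan", "bad idea"]
)]


def fake_search(phrase, start=None, count=2):
    """Hypothetical paged search: return up to `count` tweets that are
    older than the tweet whose id is `start` (or the newest if None)."""
    older = [tw for tw in CORPUS if start is None or tw.id < start]
    return older[:count]


def score(text):
    """Toy polarity in [-1, 1]; an assumption standing in for a real
    sentiment function."""
    positive = {"great", "good"}
    negative = {"terrible", "bad"}
    words = text.split()
    hits = sum(w in positive for w in words) - sum(w in negative for w in words)
    return hits / len(words)


def collect_polarities(phrase, pages=3, per_page=2):
    """Collect one polarity per tweet across several result pages."""
    polarities = []
    last_id = None  # carried between pages; never reused as a loop counter
    for _ in range(pages):
        for tweet in fake_search(phrase, start=last_id, count=per_page):
            polarities.append(score(tweet.text))
            last_id = tweet.id
    return polarities


def average(values):
    """Mean of `values`, or 0.0 when there is nothing to average."""
    return sum(values) / len(values) if values else 0.0


polarities = collect_polarities("example")
print(len(polarities), average(polarities))
```

Because `last_id` is separate from the page counter, each page starts where the previous one ended, so no tweet in the toy corpus is scored twice.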
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,32 @@ | ||
| Project Overview [Maximum 100 words] | ||
| What data source(s) did you use and what technique(s) did you use to analyze/process them? What did you hope to learn/create? | ||
|
|
||
| I used Twitter as my data source and performed sentiment analysis on recent tweets about each of the presidential candidates, Donald Trump and Hillary Clinton. I wanted to see which candidate Twitter users viewed more favorably, and whether this might align with who wins the election in November. I also used histograms to compare the distributions of negative, neutral, and positive polarities of the tweets about each candidate. | ||
|
|
||
|
|
||
|
|
||
| Implementation [~2-3 paragraphs] | ||
| Describe your implementation at a system architecture level. You should NOT walk through your code line by line, or explain every function (we can get that from your docstrings). Instead, talk about the major components, algorithms, data structures and how they fit together. You should also discuss at least one design decision where you had to choose between multiple alternatives, and explain why you made the choice you did. | ||
|
|
||
| My code had two goals: (1) compare the average polarity of tweets about each presidential candidate, and (2) compare the distributions of those polarities in histograms. For goal #1, I needed a list of polarities for each candidate in order to compute the average. I used the Twitter search function and the sentiment function from the pattern module. The search function retrieves the tweets, and the sentiment function returns the polarity and subjectivity of each tweet as a tuple. I used only the polarity, which is the first element of the tuple. I placed this code in a function called get_polarity_list so that it can be called for both candidates without repeating code. I then wrote the find_avg_polarity function, which takes the list returned by get_polarity_list and computes the average polarity. | ||
| For goal #2, I wrote a function called draw_histogram, starting from some of the example code in the Machine Learning toolbox that uses the matplotlib.pyplot module. I pass in the polarity list generated by get_polarity_list and draw a histogram, using functions such as xlabel, ylabel, title, axis, grid, and show. I call the figure function so that each candidate's histogram opens in its own window and the two can be compared. With these histograms, the distributions of polarities can be compared. | ||
|
|
||
|
|
||
|
|
||
| Results [~2-3 paragraphs + figures/examples] | ||
| Present what you accomplished: | ||
|
|
||
| For one run of the code, I got the output below (the two histograms from this run are attached as exhibits in the folder, named 'donald trump' and 'hilary clinton'): | ||
|
|
||
| The last 100 tweets about Donald Trump has an average polarity of 0.0383908730159 | ||
| The last 100 tweets about Hilary Clinton has an average polarity of -0.00256539101331 | ||
| The last 100 tweets about each of the presidential candidates show that Donald Trump is the more favorable presidential candidate | ||
|
|
||
| In this run, Donald Trump appears to be the more favorable presidential candidate: the average polarity of the last 100 tweets about him is 0.038, which is higher than Hillary Clinton's average of -0.0026, although both averages are close to zero on a scale from -1.0 to 1.0. Comparing the two histograms, both candidates have a large number of neutral tweets. Clinton has more neutral tweets than Trump, and her remaining tweets are spread evenly across the negative and positive polarities, from -1.0 to 1.0. Trump has fewer neutral tweets, and the rest are spread more evenly within a narrower range, from -0.5 to 0.5. This may indicate that while ~15-20% of the sample of 100 tweets view the presidential candidates neutrally, the remaining tweets express more extreme views of Clinton than of Trump. | ||
|
|
||
|
|
||
|
|
||
| Reflection [~1 paragraph] | ||
| From a process point of view, what went well? What could you improve? Other possible reflection topics: Was your project appropriately scoped? Did you have a good plan for unit testing? How will you use what you learned going forward? What do you wish you knew before you started that would have helped you succeed? | ||
|
|
||
| Throughout this mini project, I learned to use the online Python community to help me write code and to use modules and functions that I had not learned yet. I was able to test my code piece by piece before adding more to it. For example, I printed polarity_list to check that it worked before using it in another part of my code. I will use the functions and modules I learned here to help me complete the toolboxes, and I will continue to draw on the Python community as well. I have also learned to write more concise code. This still needs work, but overall there was improvement: for example, iterating over objects directly instead of using indices. |
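The histogram comparison in the Results section can also be summarized as numbers. As a minimal sketch in Python 3 (not part of the submitted notebook), the hypothetical helper `polarity_shares` below reports the fraction of negative, neutral, and positive polarities in a list. The `neutral_band` threshold and the sample values are assumptions chosen for illustration, not data from the project.

```python
def polarity_shares(polarities, neutral_band=0.05):
    """Return the fraction of negative, neutral, and positive scores.

    A score counts as neutral when its absolute value is at most
    `neutral_band` (an illustrative threshold, not one from the write-up).
    """
    if not polarities:
        return {"negative": 0.0, "neutral": 0.0, "positive": 0.0}
    counts = {"negative": 0, "neutral": 0, "positive": 0}
    for p in polarities:
        if abs(p) <= neutral_band:
            counts["neutral"] += 1
        elif p < 0:
            counts["negative"] += 1
        else:
            counts["positive"] += 1
    total = len(polarities)
    return {label: n / total for label, n in counts.items()}


# Made-up sample values, only to show the call.
sample = [-0.8, -0.2, 0.0, 0.0, 0.03, 0.4, 0.5, 1.0]
shares = polarity_shares(sample)
print(shares)
```

Calling the helper on each candidate's polarity list would give three comparable fractions per candidate, which is one way to check a statement such as "~15-20% of the tweets are neutral" against the data.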
Overall, good job!
A few comments: