User Research Project
Sentiment Analysis
Project Overview:
This project conducted a sentiment analysis of roughly 325,000 comments relating to Apple's flagship devices (a brand's latest and most prominent products) launched between September 15, 2020 and January 1, 2021.
The devices:
- iPhone 12 series (Mini, 12, 12 Pro, 12 Pro Max)
- iPad Air (4th generation)
- Apple Watch Series 6 and SE
- MacBook Pro with M1 processor
We ran an analysis comparing week-to-week overall sentiment against Apple's stock price and were able to uncover some trends. Our results showed that, although there was a light correlation, the overall sentiment of YouTube comments is not enough to accurately predict stock prices. Despite this, the project revealed some interesting patterns.
The Team
Taranjot Singh Samra - Team Lead, Collected data, NLTK, Vader, Transformed and cleaned Data, Visualizations, TF-IDF Analysis
Jaycob Carswell - NLTK, Write-up
Karam Singh - Write up and Analysis, Vader, Cleaned Data
Daniel Sandoval - Write ups for plots
Nathan Krause - Database design, created scraping module, time-series visualizations
Background:
Social media is a gold mine for data mining: it offers insight into the keywords and concerns consumers express about a brand and its products.
Natural language processing is a powerful tool that allows us to find trends within the data and assign each comment a numerical sentiment score on a scale from -1 (most negative) to 1 (most positive).
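As a simplified illustration of the idea (a toy lexicon with hypothetical word valences, not VADER's actual lexicon or rules), a lexicon-based scorer that squashes a summed score into [-1, 1] might look like:

```python
import math

# Hypothetical word valences; real lexicons contain thousands of entries.
LEXICON = {"love": 3.0, "great": 2.5, "good": 1.5, "bad": -2.0, "hate": -3.0}

def compound_score(comment: str, alpha: float = 15.0) -> float:
    """Sum word valences and normalize the total into [-1, 1]."""
    total = sum(LEXICON.get(w, 0.0) for w in comment.lower().split())
    return total / math.sqrt(total * total + alpha)  # VADER-style squashing

print(compound_score("i love this great phone"))  # positive score
print(compound_score("i hate the bad battery"))   # negative score
```

The normalization means the score saturates toward ±1 as strong words accumulate, which is why most short, mild comments cluster near 0.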
Hypothesis:
Our team hypothesizes that the value of a company’s stock price is strongly correlated to the overall sentiment of YouTube comments on videos related to said company’s product. Consumer perspective, expressed through public opinion in this consumer platform, should play an influence on the value of Apple as a company. In the same vein, general consent in the YouTube community should impact the behavior of investors in the stock market; hence reflecting appropriate fluctuation of Apple stock price according to overall sentiment.
Exploratory Data Analysis
Compound Sentiment Distribution
The distribution is clearly concentrated around 0 (the neutral score), so a randomly sampled comment is most likely to be neutral. The density in the right half-plane is greater than in the left, indicating that positive comments are the second most likely to occur.
Sentiment Distribution Without Neutral Comments
Comment Frequency - Exponential Backoff
The following plot illustrates the exponential decay of the mean comment count per day. It shows the difficulty we experienced in collecting enough comments to form an accurate sample of consumer sentiment about a company.
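This decay can be modeled as c(t) = c0 · e^(-λt). A sketch of estimating the decay rate from daily counts (the counts below are illustrative stand-ins, not our actual scrape):

```python
import math

# Hypothetical daily comment counts after a video's release.
days   = [0, 1, 2, 3, 4, 5]
counts = [800, 430, 240, 130, 70, 40]

# Fit c(t) = c0 * exp(-lam * t) via linear regression on log-counts.
logs = [math.log(c) for c in counts]
n = len(days)
mean_t = sum(days) / n
mean_y = sum(logs) / n
slope = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs)) \
        / sum((t - mean_t) ** 2 for t in days)
lam = -slope                    # decay rate per day
half_life = math.log(2) / lam   # days for the comment rate to halve
print(f"decay rate ~ {lam:.2f}/day, half-life ~ {half_life:.2f} days")
```

A half-life of roughly a day means a video contributes most of its comments almost immediately, so a steady stream of new videos is needed to keep the daily sample size stable.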
Comments and Video Release Time Series
Compound Sentiment and Stock Price
This plot visualizes the correlation between video-comment sentiment and stock price over the period. It indicates an overall upward trend in both the compound scores and the stock's adjusted close price. There are clear gaps in the sentiment series, so the next step would be to collect more comments from additional videos.
OLS Regression - Adj Close ~ Sentiment
Ordinary least squares (OLS) regression estimates the relationship between one or more independent variables and a dependent variable. In our case, the positive, neutral, and negative sentiment scores are the independent variables, and the stock price is the dependent variable.
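A minimal sketch of this regression with hypothetical weekly numbers (not our actual dataset). One subtlety: since the positive, neutral, and negative shares sum to 1, keeping all three alongside an intercept would be perfectly collinear, so the neutral share is dropped here:

```python
import numpy as np

# Hypothetical weekly mean sentiment shares (positive, negative)
# and adjusted close prices; illustrative only.
X = np.array([
    [0.30, 0.15],
    [0.35, 0.15],
    [0.28, 0.12],
    [0.40, 0.15],
    [0.33, 0.15],
])
y = np.array([115.0, 119.0, 113.5, 124.0, 118.0])

A = np.column_stack([np.ones(len(X)), X])     # intercept + predictors
coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # OLS solution

resid = y - A @ coef
r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
print("coefficients:", coef.round(2))
print("R^2:", round(r2, 3))
```

On real sentiment data the R² is far lower than in this toy fit, which is exactly the poor-fit result discussed below.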
The low R-squared value shows a poor model fit, which indicates poor predictions of future stock prices. Collecting more data from multiple social media sites could produce a better model for predicting stock prices.
TF-IDF Algorithm
Term Frequency - Inverse Document Frequency (TF-IDF) evaluates how relevant each word is to a document within a collection of documents. TF surfaces the most frequent words within a document, while IDF highlights the words that are most unique across the set of documents.
To gain more insight into each product, we can use this algorithm to help us focus on what to redesign and where each product is doing well.
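A minimal TF-IDF sketch over toy comment "documents" (illustrative only; libraries like scikit-learn add smoothing and normalization on top of this):

```python
import math

docs = [
    "battery life is great great",
    "camera is great",
    "battery drains fast",
]
tokenized = [d.split() for d in docs]

def tf(word, doc):
    return doc.count(word) / len(doc)           # term frequency in one doc

def idf(word):
    df = sum(word in doc for doc in tokenized)  # number of docs containing word
    return math.log(len(tokenized) / df)        # rarer word -> higher idf

scores = [{w: tf(w, doc) * idf(w) for w in doc} for doc in tokenized]
# 'drains' appears in only one document, so it outweighs 'battery',
# which appears in two -- TF-IDF surfaces what is distinctive per doc.
print(scores[2])
```

Running this per product corpus is what lets the distinctive complaints (e.g. a word like "drains") rise above generic vocabulary shared across all products.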
Product Insights
I ran this algorithm on each of the specific products; you can read each analysis by clicking on the product names here: iPhone 12 Series, iPad, Mac M1, and the Apple Watch.
Ethics and Privacy
All data collected was voluntarily made available by users when they decided to post comments on YouTube videos. However, posting publicly is not a universal sign of consent, so the data was anonymized in an effort to maintain the highest possible degree of privacy for the users who become part of the dataset, given that no explicit individual consent is obtained when their data is collected.
Furthermore, the usernames and channel IDs of comment authors have no bearing on the analyses performed, and will remain private in any future observations, visuals, or statistics produced for the project, thereby protecting the anonymity of the users whose data we collected. In addition, reviewing YouTube's Terms of Service shows that we are not using the service for commercial purposes or on behalf of a third party, nor in a manner that would cause harm to any user, other third party, YouTube, or its affiliates. We can therefore conclude that we are within the guidelines YouTube outlines in its Terms of Service: https://www.youtube.com/static?template=terms
Overall, we conclude that using this data for our intended analysis does not pose any major breach of data privacy or ethical concern. Biases in this project come down to human decisions made to reduce development complexity, such as manually choosing which videos to scrape instead of automating that process, or designing the analysis in a flawed way. We can only account for this by ensuring there is enough volume behind our decisions (enough videos chosen) and by putting extra effort into verifying code to remove human biases.
Another factor that could have biased our data is that much of it was acquired during the COVID-19 pandemic. There may have been a rise in impulse buys. Furthermore, during the height of the pandemic many users could not physically walk into an Apple Store and test the product they were going to buy, which could have led to weaker expressed consumer sentiment than normal: https://www.washingtonpost.com/technology/2020/04/30/apple-earnings-coronavirus/
Considering the stakeholders involved, the primary group that stands to benefit from this analysis is the companies (and their respective investors) whose products' sentiments we are analyzing. If our analysis showed a strong correlation between sentiment and changes in stock value, companies could attempt to use this information for nefarious purposes; for example, they could propagate false sentiment online through bot accounts in an attempt to artificially manipulate their stock price (similar to what the Chinese government has been documented doing in the past: https://techcrunch.com/2020/09/22/facebook-gans-takes-down-networks-of-fake-accounts-originating-in-china-and-the-philippines/). Another factor to consider is that all of the data we collected is freely available to Google, which owns YouTube. If Google believed our hypothesis was true, it could in principle use the data for similarly nefarious purposes, such as trying to manipulate the value of certain companies' stocks.
Pitfalls of VADER
VADER doesn't handle contemporary language very well. For example, "That daft punk poster is so freaking rad"
has a compound score of -0.5598 despite the comment's clear intention to compliment the Daft Punk poster. Furthermore, sarcasm and other contextually dependent language (e.g., memes) is hard to quantify programmatically. It would be better to train a decision tree classification model to determine whether a comment is positive or negative, with a training set that includes a large portion of contemporary language.
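Short of training a classifier, one lightweight mitigation is extending the sentiment lexicon with slang entries. The sketch below uses a toy lexicon with hypothetical valences (not VADER's real lexicon) to show how an unknown slang word plus a misread intensifier produces a wrong sign, and how patching the lexicon recovers the intended sentiment:

```python
import math

# Toy lexicon: 'rad' is missing and 'freaking' is misread as profanity.
# All valences here are hypothetical; VADER's actual rules differ.
lexicon = {"poster": 0.0, "freaking": -1.2}

def score(text: str) -> float:
    total = sum(lexicon.get(w, 0.0) for w in text.lower().split())
    return total / math.sqrt(total * total + 15)  # squash into [-1, 1]

comment = "that daft punk poster is so freaking rad"
before = score(comment)   # negative: 'rad' unknown, 'freaking' penalized

# Patch the lexicon with slang entries, then rescore.
lexicon.update({"rad": 2.5, "freaking": 0.4})
after = score(comment)    # now positive, matching the intended compliment
print(before, after)
```

The catch, of course, is that slang evolves faster than any hand-maintained lexicon, which is why the write-up leans toward a trained classifier instead.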
Comment Frequency
The difficulty of scraping comments was entirely underestimated and was eventually recognized as an issue because the variance in daily mean comment count is too high. An ideal dataset would have roughly the same mean number of comments per day, allowing for a more consistent sample of mean sentiment. Collecting comments is difficult because of the exponential decay in comments per day for a single video: the comment rate is highest when a video is first released, and as time progresses the rate decays exponentially. A drastically larger number of videos would therefore need to be scraped to reduce the comment-frequency variance.
Conclusion and Discussion
Due to a lack of data, scope, and thus experimental rigor, we cannot accept our hypothesis as true. That is to say, there is not enough information to substantiate a claim in either direction. By extension, our model does not currently serve as a credible basis for predictive analysis of the market.
That aside, we have still demonstrated that this type of analysis does indeed have value, in that our TF-IDF analyses actually seem to empirically substantiate the “consensus” on these products that can be gathered from hearsay; thus providing a more scientific approach for gathering information that is usually reliant on gossip and spoken word.
Skills Showcased
Python, NLTK, VADER, Sentiment Analysis, Data Visualization, SQL, Google API, TF-IDF, OLS Regression Analysis, Git