The Dark Side of Sentiment Analysis: An Exploratory Review Using Lexicons, Dictionaries, and a Statistical Monkey and Chimp

Abstract

This article discusses the inconsistencies, inaccuracies and challenges, namely the ‘dark side’, of sentiment analysis and then demonstrates problems with using sentiment analysis lexicons or dictionaries for estimating sentiment in textual artifacts. Sentiment analysis, an important dimension of natural language processing (NLP), has seen an exponential adoption rate across research and practitioner disciplines. Many interesting developments in NLP methods continue to improve the accuracy of sentiment analysis. However, the plethora of sentiment analysis methods, dictionaries and lexicons, tools, open source code for machine learning based sentiment analysis, and off-the-shelf sentiment analysis solutions has led to a flurry of research and applied solutions without sufficient concern for the limitations, context, and inaccuracies of sentiment analysis, and the inherent ambiguities associated with the unaddressed sentiment analysis domain challenges. Scant attention is given, especially in applied research and industry usage, to the inherent ambiguities associated with the unanswered questions pertaining to the science of sentiment analysis. This study reviews known issues with sentiment analysis as documented by prior research and then compares the application of multiple off-the-shelf lexicon and dictionary methods to stock market and vaccine tweets. The intention is not in any way to improve the accuracy of sentiment analysis as compared to prior benchmarks but to identify and discuss critical aspects of the dark side and develop a conceptual discussion of the characteristics of the dark side of sentiment analysis. We conclude with notes on conceptual solutions for the dark side of sentiment analysis and point to future strategies that could be used to improve the accuracy of sentiment analysis and understanding. This research will also help align researcher and practitioner expectations to understanding the limits and boundaries of natural language processing based solutions for sentiment analysis and estimation.

Gavin Rozzi
Pushing the boundaries of data, technology & public policy

Gavin Rozzi is a data scientist from New Jersey with expertise in leveraging public sector datasets, spatial data & mapping and emerging technologies to inform public policy development.