What is Sentiment Analysis?

Sentiment Analysis: Computerized quantification of subjective states from text

Sentiment analysis is a subfield of Natural Language Processing. It can be combined with other tools like Named-Entity Recognition or Topic Modelling to contextualize the sentiment, for example by finding its origin or targets. Here we focus on how to quantify sentiment from text, especially in social media and other kinds of digital traces.

There has been a scientific boom in sentiment analysis with several workshops, journal issues, and books devoted to the topic. Every year there are hundreds of research papers on the topic. You can see this rise in the Google Trends volume for the term “sentiment analysis”:

While peak interest seems to have been reached in 2019, there is still a lot of interest in sentiment analysis and many research questions remain open.

Supervised vs Unsupervised sentiment analysis

Unsupervised sentiment analysis:

Supervised sentiment analysis:

Both approaches can be combined in what are called semi-supervised or ensemble methods. Some of these approaches mix supervised and unsupervised models in one classifier.

Evaluation and generalizability are key criteria when choosing a sentiment analysis method. You can learn more about them at the end of the Supervised Sentiment analysis topic and compare supervised and unsupervised methods yourself in the Evaluating sentiment analysis methods exercise.

In this topic we are going to cover various approaches to unsupervised sentiment analysis with examples of methods and software you can use.

General Inquirer

The General Inquirer

The pioneering work of Philip Stone in 1966 proposed processing text with a computer to detect the use of words from various categories. This set the basis for dictionary methods in unsupervised sentiment analysis, which count how many times the words of a list appear in a text. The original version of the General Inquirer contained many word classes, including parts of speech and topics, as well as terms for emotions and evaluative language.

The original dictionaries of the General Inquirer were merged with other later dictionaries and an updated version was released in the 1990s. You can access the lists of positive words and of negative words of this version, which served as input for later methods like SentiStrength.

The SentimentAnalysis R package contains the General Inquirer (GI) dictionary and methods to match words in text.

Linguistic Inquiry and Word Count (LIWC)

LIWC (pronounced “Luke”) was developed as click-and-run software by James Pennebaker in 2001. Inspired by the General Inquirer, it contains a set of word lists that are matched against words in the text to compute frequencies for each list. The word lists of LIWC were designed both to cover linguistic classes and to capture psychological processes such as cognitive processes, social processes, and emotions. Word lists for LIWC are produced by groups of experts who compare their individual word lists and expand them with synonyms. There have been three versions of LIWC in English (2001, 2007 and 2015) and dictionaries have been generated with the same method for several languages including German, Spanish, French, Arabic, and Chinese.

Here you can see an example of how LIWC works on a text:

LIWC first tokenizes the text, i.e. it identifies words by looking for separators like whitespace and punctuation. Then LIWC iterates over each token (word) and checks if it matches any word list in the dictionary. These matches can be “hard” matches for the same exact character string, or “soft” matches against stems, which are dictionary entries that end with a star symbol (“*”) and match any word that starts with that prefix. You see this in the example for the entry “worr*” that matches “worry” and for “pizza*” that matches “pizza”.

In the example above you can see that words can belong to several word lists, for example the entry for “worr*” is in the “affect” list, in the “negemo” list, and in the “anxiety” list. After running these matchings, LIWC produces a list of frequency measures, each one the percentage of words in the whole text that are matched against a word list. In the example above, 12.5% of the words match the “negemo” list and 0% match the “posemo” list.
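The matching procedure described above can be sketched in a few lines of Python. This is only an illustration and not LIWC's own code: the toy dictionary, the example sentence, and the function names are assumptions made for this sketch, and the real LIWC dictionaries are far larger.

```python
import re
from collections import Counter

# Hypothetical toy dictionary: entry -> word lists it belongs to.
# Entries ending in "*" are prefix ("soft") matches, all others are exact ("hard") matches.
TOY_DICTIONARY = {
    "worr*": ["affect", "negemo", "anxiety"],
    "pizza*": ["ingest"],
    "i": ["pronoun"],
}

def tokenize(text):
    """Lowercase the text and split it on anything that is not a letter or apostrophe."""
    return re.findall(r"[a-z']+", text.lower())

def entry_matches(entry, token):
    """Prefix match for entries ending in '*', exact match otherwise."""
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry

def liwc_style_frequencies(text, dictionary):
    """Return, for each word list, the percentage of tokens that match it."""
    tokens = tokenize(text)
    counts = Counter()
    for token in tokens:
        matched_lists = set()  # a token counts at most once per word list
        for entry, word_lists in dictionary.items():
            if entry_matches(entry, token):
                matched_lists.update(word_lists)
        counts.update(matched_lists)
    if not tokens:
        return {}
    return {name: 100 * n / len(tokens) for name, n in counts.items()}

# Hypothetical eight-word sentence with one "negemo" match.
freqs = liwc_style_frequencies("I worry that the pizza will be cold", TOY_DICTIONARY)
print(freqs["negemo"])           # 12.5
print(freqs.get("posemo", 0.0))  # 0.0
```

Because one of the eight tokens matches “worr*”, the “negemo”, “affect”, and “anxiety” lists each get 12.5%, while a list with no matches, such as “posemo”, stays at 0%.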

The 2015 version of LIWC includes netspeak terms such as “WTF” or “LOL” and emoticons like “:)”. LIWC is a very popular tool because it is easy to use; for example, it offers a way to visualize which words are matched. It is very important to look at these matches to understand LIWC emotion word frequencies, as you can learn in the Social Data Science story about 9/11 pagers.

SentiStrength

Mike Thelwall developed the SentiStrength method in 2010: a sentiment analysis method designed to quantify positive and negative sentiment from short, informal social media text.

SentiStrength processes text in three steps:

  1. Text preprocessing: correcting misspellings, vowel repetitions, translating emoticons and idioms, etc.
  2. Match words against a list of words scored on the scale [-5,+5]
  3. Apply modifiers (negation, amplification, de-amplification). These modifiers change the polarity of words and their strength. The final scores are an aggregate of these polarities.

SentiStrength takes two sources of expert input: a word list with sentiment scores and a list of modifier rules including terms for negation, amplification, etc.

SentiStrength outputs two scores: a positive score [+1,+5] and a negative score [-1,-5]. This design matches the PANAS scales; you can learn more about them in the Measuring Emotions topic.

SentiStrength has been adapted and validated for various languages including Spanish, German, and Russian. It is distributed as a Java executable with available code and can be run from the command line with text files as input.
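To make the three steps and the two-score output more concrete, here is a heavily simplified Python sketch in the spirit of SentiStrength. It is not the SentiStrength implementation: the word scores, booster values, negation handling (a plain sign flip), and the choice of the strongest word as aggregate are all assumptions made for illustration.

```python
import re

# Hypothetical toy resources; the real expert lists are much larger.
WORD_SCORES = {"love": 3, "good": 2, "hate": -4, "bad": -2}
BOOSTERS = {"very": 1, "slightly": -1}  # amplification / de-amplification
NEGATIONS = {"not", "never"}

def preprocess(text):
    """Step 1: lowercase, collapse characters repeated 3+ times, tokenize."""
    text = re.sub(r"(.)\1{2,}", r"\1", text.lower())  # "loooove" -> "love"
    return re.findall(r"[a-z']+", text)

def score(text):
    """Return a (positive, negative) pair in [+1,+5] and [-1,-5]."""
    tokens = preprocess(text)
    positive, negative = 1, -1  # neutral baseline on both scales
    for i, token in enumerate(tokens):
        if token not in WORD_SCORES:
            continue
        value = WORD_SCORES[token]  # Step 2: match against the scored word list
        previous = tokens[i - 1] if i > 0 else None
        before_previous = tokens[i - 2] if i > 1 else None
        # Step 3: apply modifiers found just before the sentiment word
        if previous in BOOSTERS:
            sign = 1 if value > 0 else -1
            value += sign * BOOSTERS[previous]  # push away from or toward zero
            negator = before_previous
        else:
            negator = previous
        if negator in NEGATIONS:
            value = -value  # simplifying assumption: negation flips polarity
        # Aggregate: keep the strongest positive and the strongest negative word
        positive = max(positive, min(value, 5))
        negative = min(negative, max(value, -5))
    return positive, negative

print(score("I loooove this but the ending was very bad"))  # (3, -3)
print(score("not good"))                                    # (1, -2)
```

Keeping positive and negative scores separate lets a text like the first example register as both positive and negative at once, instead of cancelling out to a neutral value.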

VADER (Valence Aware Dictionary and sEntiment Reasoner)

VADER (figure from the VADER Python tutorial)

VADER is a tool very similar to SentiStrength, developed by C.J. Hutto and Eric Gilbert in 2014 and tailored to detect sentiment on Twitter. It applies the same three steps as SentiStrength:

  1. Text preprocessing
  2. Word matching from a lexicon of positive/negative scored words
  3. Application of modifiers to the scores based on language rules

VADER’s name suggests it is the “dark version” of LIWC (“Luke”). As the authors of VADER say: “VADER distinguishes itself from LIWC in that it is more sensitive to sentiment expressions in social media contexts.”

VADER was implemented in Python and distributed as an open source package on GitHub and as part of the NLTK Python library for NLP. Its performance was validated against annotated tweets, correlating the scores given by tweet readers with the output of VADER. VADER can be run in R with the package vader. You can learn more about how to use it in the Running sentiment analysis tutorial.
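The kind of validation mentioned above, correlating reader ratings with a method's output, can be sketched with a Pearson correlation. The ratings and scores below are invented for illustration and are not taken from the VADER validation; the function name is also an assumption of this sketch.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)

# Hypothetical data: mean reader rating per tweet and the score a method gave it.
human_ratings = [-3, -1, 0, 2, 4]
method_scores = [-0.8, -0.2, 0.1, 0.4, 0.9]

r = pearson(human_ratings, method_scores)
print(round(r, 2))
```

A correlation close to 1 means the method orders texts in much the same way as human readers do, even if the two use different scales.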

Evaluating Unsupervised Sentiment Analysis

SentiBench

Sentiment analysis methods are very easy to use, but that does not mean that their output is always accurate. It is important to choose a sentiment analysis method that captures the type of expression we want and that has been validated for the text we plan to analyze. Even if it has been validated before, you can always do a small validation yourself by annotating a random sample of texts from your dataset and following the steps of the evaluating sentiment analysis exercise.
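A small validation of this kind could look like the following Python sketch. It assumes you have a list of texts, that you annotate the sampled texts by hand, and that the method's output has been mapped to the same labels; the function names and the labels shown are hypothetical.

```python
import random

def draw_annotation_sample(texts, n, seed=42):
    """Draw a reproducible random sample of texts to annotate by hand."""
    rng = random.Random(seed)  # fixed seed so the same sample can be redrawn
    return rng.sample(texts, min(n, len(texts)))

def accuracy(predicted, annotated):
    """Share of texts where the method's label equals the human annotation."""
    if len(predicted) != len(annotated) or not annotated:
        raise ValueError("need two non-empty label lists of equal length")
    hits = sum(p == a for p, a in zip(predicted, annotated))
    return hits / len(annotated)

# Hypothetical dataset and labels, for illustration only.
texts = [f"text {i}" for i in range(100)]
sample = draw_annotation_sample(texts, 5)

annotated = ["pos", "neg", "neu", "neg", "pos"]  # labels from a human annotator
predicted = ["pos", "neg", "pos", "neg", "pos"]  # labels from the method

print(accuracy(predicted, annotated))  # 0.8
```

Sampling at random, rather than picking texts by hand, keeps the annotated subset representative of the dataset the method will be applied to.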

SentiBench offers an overview of off-the-shelf (i.e. ready to use) sentiment analysis methods. The figure shows a summary of the accuracy of 24 methods, and the original article reports various quality metrics for different kinds of texts like movie reviews, newspaper comments, and tweets. The review includes SentiStrength, LIWC, and VADER, which for some datasets can be the best performing among the surveyed methods.

Once we choose a method, we run the analysis and assess the results. It is important not to try a lot of different methods to see “which one works” in the statistical analysis and then report only the ones that give results that we like. This is often called p-hacking and can give you misleading results. It is best to run an evaluation like the one in our evaluating sentiment analysis exercise to make an informed decision about which method to use before applying it.