In my previous post, I talked about Blockspring and how it offers easy access to a host of APIs for fetching data from numerous web resources, for visualization, and, most importantly, for tasks such as entity extraction, image tagging, and sentiment analysis that would normally require mastery of natural language processing, machine learning, and computer vision. With Blockspring offering access to a large number of third-party APIs, building intelligent applications is now possible even for Excel users who do not like coding. Using an API via Blockspring is just like writing an Excel formula.
As I was going over the different APIs available through Blockspring, I noticed three vendors offering APIs for sentiment analysis of text. I thought it would be interesting to compare their results on different pieces of text. This post presents such a comparison.
Sentiment Dataset with Ground Truth
Before jumping into the comparison, I needed a text data set that already had ground truth attached to each entry. Luckily, one such data set, the Sentiment Labelled Sentences Data Set, is available at the UCI Machine Learning Repository. This data set contains sentences from three websites (imdb.com, amazon.com, and yelp.com) dealing with reviews of movies, products, and restaurants, respectively. There are several thousand reviews from each site, with about 500 of them, selected at random, marked positive, indicating that the sentiment of the underlying sentence is positive. Similarly, about 500 sentences selected at random from each site are marked negative. According to the contributors of the data set, the positive or negative labels were assigned manually to sentences with clear non-neutral sentiment. After downloading and unzipping the data, I read it into an Excel file, with the data for each website in a separate sheet. Next, I filtered the data to keep only the labelled sentences. A screenshot of an Excel sheet after filtering is shown below; a zero (one) in column two indicates negative (positive) sentiment for the sentence in column one.
APIs for Sentiment Analysis
The three APIs for sentiment analysis used in the comparison come from AlchemyAPI, Aylien, and Indico. Each API requires a key, obtained by registering at the respective vendor's site. The Alchemy sentiment analysis API offers several options for sentiment determination; for example, you can request sentiment for individual entities in a document or for the entire document. According to the AlchemyAPI documentation, its sentiment analysis algorithm works by looking for positive and negative words and then aggregating them to yield the output. The document-level sentiment is output as a score between -1 and +1: a positive score implies positive sentiment, a negative score indicates negative sentiment, and neutral sentiment is scored as zero. Along with the sentiment score, the Alchemy API also outputs a score for another indicator, called “mixed”. A value of 1 for “mixed” indicates the presence of both positive and negative sentiments in the text.
The Aylien API for sentiment analysis of text returns two pairs of output values. The first pair consists of a “polarity” indicator and the “confidence” in this indicator. The polarity indicator takes on positive, neutral, or negative as values, and the polarity confidence is a number between 0 and 1, with a value close to 1 indicating higher confidence. The other output pair is a “subjectivity” indicator and its “confidence” value. The subjectivity indicator is returned as either “subjective”, meaning the text expresses an opinion, as in “Jack is a good listener”, or “objective”, meaning the text states a fact, as in “cancer is dreadful”. The subjectivity confidence value also lies in the range 0–1.
The sentiment analysis API from Indico uses multinomial logistic regression on n-grams with tf-idf features to extract the sentiment of a document. The API returns a single number in the range of 0-1 as the output. This number represents the likelihood that the analyzed text is positive or negative. Values greater than 0.5 indicate positive sentiment, while values less than 0.5 indicate negative sentiment. Values near 0.5 stand for neutral sentiment.
Making API Calls in Excel
Making API calls in Excel using Blockspring, or running a block, is straightforward. For AlchemyAPI, we use the following formula, say in cell C3, to determine the sentiment of the sentence in cell A3, and copy it down the column to get sentiment scores for all of the sentences in our spreadsheet. The “_sleep” parameter in the formula ensures that API requests are spaced five seconds apart so as not to overwhelm the system.
=BLOCKSPRING("sentiment-analysis-from-text-with-alchemyapi", "text", A3, "_sleep", 5)
The AlchemyAPI block for text sentiment returns four output values in the Blockspring browser. Blockspring also provides the function bGetKey(result, key), which we can use to bring individual results into the spreadsheet. Thus, I used the following formulas to get the sentiment type and score in cells D3 and E3, respectively.
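The exact formulas are not reproduced in the text; assuming the AlchemyAPI block exposes its outputs under the keys "type" and "score" (consistent with the formulas discussed later), they would look like this:

```
In D3:  =bGetKey(C3, "type")
In E3:  =bGetKey(C3, "score")
```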
Copying these formulas down gives us the results from AlchemyAPI for the rest of the sentences. It turns out that the scores returned by Blockspring are treated as text by Excel. I therefore modified the formula for the score to =VALUE(bGetKey(C3, "score")), wrapping it in Excel's VALUE function to convert the score to a number.
Running the Aylien block is similar, with the following formulas:
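The original formulas are not shown here; assuming a Blockspring block named "sentiment-analysis-with-aylien" (the actual block name may differ) and output keys "polarity" and "polarity_confidence", the formulas would take this form:

```
In F3:  =BLOCKSPRING("sentiment-analysis-with-aylien", "text", A3, "_sleep", 5)
In G3:  =bGetKey(F3, "polarity")
In H3:  =VALUE(bGetKey(F3, "polarity_confidence"))
```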
In the case of Indico, the output is returned directly in the cell containing the block-running formula. Thus, I used the following formula in cell I3:
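Again, the block name is an assumption ("sentiment-analysis-with-indico" here stands in for whatever the actual Blockspring block is called); since the score comes back directly, a single formula suffices:

```
In I3:  =VALUE(BLOCKSPRING("sentiment-analysis-with-indico", "text", A3, "_sleep", 5))
```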
A screenshot of the results for Yelp reviews is shown below. The same procedure was repeated for the Amazon and IMDB reviews.
After receiving the sentiment scores, I tabulated the results using appropriate Excel formulas. Although the ground truth had only positive or negative labels, Alchemy and Aylien returned neutral polarity for many review sentences. While Indico doesn't return type/polarity labels, a sentiment score near 0.5 reflects neutral sentiment according to the Indico documentation. Thus, while analyzing Indico's output, a score above 0.55 was considered positive and a score below 0.45 was considered negative; any score between 0.45 and 0.55 was taken as neutral. A summary of sentiment scoring performance by the three APIs is shown in the graph below.
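The Indico thresholds just described can be expressed as a single Excel formula; assuming the Indico score is in cell I3 and the derived label goes in J3:

```
In J3:  =IF(I3>0.55, "positive", IF(I3<0.45, "negative", "neutral"))
```

Copying this down the column converts every Indico score into a label directly comparable with the type/polarity outputs of the other two APIs.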
Looking at the results, it is clear that Indico outperforms the other two APIs, not only in correctly predicting the sentiment but also in labelling the fewest sentences as neutral. To get an idea of the error patterns, I examined the sentences where errors were made. A compilation of some of these sentences, with the ground truth and the sentiment labels from the three APIs, is given below.
Although Indico seems to perform better here, one shouldn't conclude that it will always outperform the other two APIs, because many of the labelled sentences were quite short and thus may not have provided enough context to accurately assess the sentiment.