RELEVANCE & RESEARCH QUESTION
On e-commerce websites such as Amazon, customers readily comment on product highlights and flaws. This provides an important opportunity for companies to collect customer feedback. Fine-grained analyses of customer reviews can support managerial decision-making, especially in marketing. The vast amount of user-generated content necessitates the application of automated analysis techniques. One method of computationally processing unstructured text data is sentiment analysis, which examines people’s opinions, evaluations, emotions, and attitudes towards products, services, organizations, or other topics (Liu 2012, p. 1). Past studies have primarily focused on document-level or sentence-level sentiment analysis. For practical applications, however, there is a substantial need for finer-grained analyses that determine what exactly customers like or dislike, that is, for aspect-based sentiment analysis (ABSA). Yet implementing ABSA is challenging.
The need for ABSA in marketing, the limitations of traditional sentiment analysis methods, and recent progress in the field of artificial neural networks make the latter’s application to ABSA a relevant research topic.
The objectives of this thesis are to
- propose and evaluate a novel model architecture for ABSA that combines gated recurrent units and convolutional neural networks;
- apply the proposed model to laptop reviews to gain insight into customer requirements and satisfaction;
- discuss limitations of the proposed model, also from a market research perspective.
METHODS & DATA
ABSA is divided into the subtasks of aspect term extraction and aspect sentiment classification. One neural network was trained to predict for each word of every sentence whether the word is an aspect. All aspects and their corresponding sentences were passed on to a second neural network, which predicts whether the sentiment expressed regarding the aspect is positive, negative, or neutral. Building on previous research, this thesis suggests combining two artificial neural network types, namely gated recurrent units and convolutional neural networks.
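The hand-off between the two subtasks can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function names `extract_aspects` and `stage_two_inputs` are hypothetical, the per-word flags stand in for the output of the first network, and merging consecutive flagged words into one multi-word aspect term is an assumption.

```python
from typing import List, Tuple


def extract_aspects(tokens: List[str], is_aspect: List[bool]) -> List[str]:
    """Merge consecutive words flagged as aspect words into aspect terms.

    Assumption: adjacent flagged words form a single multi-word aspect.
    """
    aspects, current = [], []
    for token, flag in zip(tokens, is_aspect):
        if flag:
            current.append(token)
        elif current:
            # A non-aspect word closes the aspect term collected so far.
            aspects.append(" ".join(current))
            current = []
    if current:
        aspects.append(" ".join(current))
    return aspects


def stage_two_inputs(tokens: List[str], is_aspect: List[bool]) -> List[Tuple[str, str]]:
    """Pair every extracted aspect with its sentence for sentiment classification."""
    sentence = " ".join(tokens)
    return [(aspect, sentence) for aspect in extract_aspects(tokens, is_aspect)]


# Hypothetical per-word predictions for one review sentence.
tokens = ["The", "battery", "life", "is", "great", "but", "the", "screen", "flickers"]
flags = [False, True, True, False, False, False, False, True, False]
pairs = stage_two_inputs(tokens, flags)
```

Each (aspect, sentence) pair in `pairs` would then be passed to the second network, which assigns one of the three sentiment classes.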
The proposed model was trained using the SemEval 2014 laptop review dataset (SemEval 2014). In addition, the thesis author manually annotated laptop reviews. The combined dataset consists of 5,165 sentences, totaling approximately 72,900 words. To evaluate model performance compared to previous research, the proposed model was trained only on the original SemEval training set and tested on the SemEval test set.
The evaluation results (Fig. 1 and 2) are promising. Without using sentiment lexica, handcrafted rules, or manual feature engineering, the proposed system achieves competitive results on the benchmark dataset. It is especially effective at extracting aspects.
Model | F1 score
Proposed model | 81.63%
Filho and Pardo (2014) | 25.19%
Pontiki et al. (2014) | 35.64%
Toh and Wang (2014) | 70.41%
Chernyshevich (2014) | 74.55% †
Liu, Joty, and Meng (2015) | 74.56%
Poria, Cambria, and Gelbukh (2016) | 77.32%
Xu et al. (2018) | 77.67%
Fig. 1: Performance on aspect term extraction on the SemEval test set
†: trained on twice as much training data, use of an additional training set
To ensure comparability, only the performance of models with publicly available word embeddings is reported in Fig. 1. With domain-specific word embeddings and a set of linguistic patterns, Poria, Cambria, and Gelbukh (2016) reached an F1 score of 82.32%, which appears to be the current state of the art for this task.
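For readers unfamiliar with the metric in Fig. 1, the F1 score is the harmonic mean of precision and recall over the extracted aspect terms. The sketch below assumes exact string matching between gold and predicted terms per sentence; the function name `extraction_f1` and the toy data are hypothetical and do not reproduce the thesis evaluation script.

```python
from collections import Counter
from typing import List, Tuple


def extraction_f1(gold: List[List[str]], predicted: List[List[str]]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for aspect term extraction.

    gold and predicted hold one list of aspect terms per sentence.
    Assumption: a prediction counts as correct only on an exact match.
    """
    tp = fp = fn = 0
    for gold_terms, predicted_terms in zip(gold, predicted):
        gold_count, pred_count = Counter(gold_terms), Counter(predicted_terms)
        overlap = sum((gold_count & pred_count).values())
        tp += overlap
        fp += sum(pred_count.values()) - overlap  # predicted but not in gold
        fn += sum(gold_count.values()) - overlap  # in gold but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example with two sentences (not taken from the SemEval data).
gold = [["battery life", "screen"], ["keyboard"]]
predicted = [["battery life"], ["keyboard", "price"]]
precision, recall, f1 = extraction_f1(gold, predicted)
```

In the toy example, two of three predicted terms are correct and two of three gold terms are found, so precision, recall, and F1 all equal 2/3.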
Model | Accuracy | Macro F1 score
Proposed model | 68.45% | 63.92%
Pontiki et al. (2014) | 51.37% | n/a
Negi and Buitelaar (2014) | 57.03% | n/a
Wang et al. (2016) | 68.90% | n/a
Wagner et al. (2014) | 70.48% | n/a
Kiritchenko et al. (2014) | 70.48% | n/a
Tang et al. (2016) | 71.83% | 68.43%
Chen et al. (2017) | 74.49% | 71.35%
Fig. 2: Performance on aspect sentiment classification on the SemEval test set
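The two metrics in Fig. 2 differ in how they treat class imbalance: accuracy counts every prediction equally, whereas the macro F1 score averages the per-class F1 scores so that each sentiment class carries the same weight. The following sketch illustrates both under the assumption of exactly three classes; the function name `accuracy_and_macro_f1` and the toy labels are hypothetical.

```python
from typing import Sequence, Tuple


def accuracy_and_macro_f1(
    gold: Sequence[str],
    predicted: Sequence[str],
    labels: Sequence[str] = ("positive", "negative", "neutral"),
) -> Tuple[float, float]:
    """Return (accuracy, macro-averaged F1) for sentiment classification."""
    pairs = list(zip(gold, predicted))
    accuracy = sum(g == p for g, p in pairs) / len(pairs)

    f1_scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)

    # Unweighted mean: rare classes count as much as frequent ones.
    macro_f1 = sum(f1_scores) / len(labels)
    return accuracy, macro_f1


# Toy example with four aspects (not taken from the SemEval data).
gold = ["positive", "positive", "negative", "neutral"]
predicted = ["positive", "neutral", "negative", "neutral"]
accuracy, macro_f1 = accuracy_and_macro_f1(gold, predicted)
```

Here three of four predictions are correct (accuracy 0.75), and the per-class F1 scores of 2/3, 1, and 2/3 average to a macro F1 of 7/9.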
Sentiment misclassifications can be grouped into three types of mistakes: predicting the opposite sentiment, predicting a strong sentiment instead of neutrality, and predicting neutrality instead of a strong sentiment. For marketers who interpret the model predictions, the first type of mistake would be the most severe. The third type of mistake was most common. It is arguably the least severe mistake and shows that the proposed system tends to be conservative in its predictions.
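The three error types can be tallied mechanically from gold and predicted labels. The sketch below is one possible way to do so, assuming the labels are restricted to positive, negative, and neutral; the names `error_type` and `error_profile` and the toy labels are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence

STRONG = {"positive", "negative"}


def error_type(gold: str, predicted: str) -> Optional[str]:
    """Name the type of mistake, or return None for a correct prediction."""
    if gold == predicted:
        return None
    if gold in STRONG and predicted in STRONG:
        return "opposite sentiment"
    if gold == "neutral":
        return "strong instead of neutral"
    return "neutral instead of strong"


def error_profile(gold_labels: Sequence[str], predicted_labels: Sequence[str]) -> Counter:
    """Count how often each type of mistake occurs."""
    profile = Counter()
    for gold, predicted in zip(gold_labels, predicted_labels):
        kind = error_type(gold, predicted)
        if kind is not None:
            profile[kind] += 1
    return profile


# Toy example (not taken from the thesis results).
gold_labels = ["positive", "negative", "neutral", "positive", "negative"]
predicted_labels = ["negative", "neutral", "positive", "neutral", "negative"]
profile = error_profile(gold_labels, predicted_labels)
```

A profile dominated by "neutral instead of strong" would correspond to the conservative behavior described above.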
Overall, model performance is encouraging, especially because the model uses only two input features: words and part-of-speech tags. This is in sharp contrast to traditional methods, which typically depend on handcrafted rules or manual feature engineering. Moreover, no specialized knowledge of linguistics was needed to develop the proposed system. In addition, it does not use any sentiment lexica, which is especially beneficial when considering languages other than English.
A case study in the laptop domain illustrates how and to what degree the proposed ABSA system is useful for practical purposes in market research.
The paper’s contributions are
- provision of a labeled dataset for ABSA, which could enhance other models;
- provision of refined annotation guidelines that consider marketing needs;
- proposal and implementation of a system that combines gated recurrent units and convolutional neural networks;
- performance evaluation of the system;
- error analyses, which can help practitioners to interpret the model output and may allow academics to improve future models;
- model outputs that summarize the customer opinions voiced in unlabeled and unstructured reviews;
- some insight into customer satisfaction and preferences (regarding the case study laptops), which might facilitate decision-making in marketing;
- guidance on why, how, and under what limitations to use ABSA, especially for marketing purposes.
With only words and part-of-speech tags as inputs, the proposed system achieves competitive results on the benchmark dataset. Sentiment lexica, handcrafted rules or manual feature engineering are not required. The system can be readily used to analyze English customer reviews of laptops. Given appropriate training data, the approach may also be applicable to other product categories and languages.
ABSA offers a structured representation of the most frequently mentioned positive and negative aspects in customer reviews. Moreover, it does so in a timely manner. The output can help to determine what reviewers like and dislike about a product. Given a large amount of review text, ABSA provides a detailed picture of customer satisfaction and can stimulate product improvements. It can also support marketers in inferring the reviewers’ reasons to purchase the product and the purposes for which they use it. Moreover, ABSA can complement traditional marketing research, especially as a preliminary study or by providing up-to-date information. In short, it can help companies to understand customers.
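How such a structured representation might be derived from model output can be sketched as a simple frequency count. This is an illustration only: the function name `summarize`, the lowercasing of aspect terms, and the toy predictions are assumptions and do not reflect the case study data.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def summarize(
    predictions: Iterable[Tuple[str, str]], top_n: int = 3
) -> Dict[str, List[Tuple[str, int]]]:
    """Return the most frequently mentioned aspects per sentiment class.

    predictions holds (aspect, sentiment) pairs as produced by an ABSA system.
    Assumption: aspect terms are lowercased so that spelling variants merge.
    """
    counts = defaultdict(Counter)
    for aspect, sentiment in predictions:
        counts[sentiment][aspect.lower()] += 1
    return {sentiment: counter.most_common(top_n) for sentiment, counter in counts.items()}


# Hypothetical model output for a handful of review sentences.
predictions = [
    ("battery life", "positive"),
    ("Battery life", "positive"),
    ("screen", "negative"),
    ("keyboard", "positive"),
    ("screen", "negative"),
    ("price", "neutral"),
]
summary = summarize(predictions)
```

In this toy output, battery life is the most frequently praised aspect and the screen the most frequently criticized one, which is the kind of overview a marketer could read off the summary.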