Multi-Class Sentiment Analysis on Twitter: Classification Performance and Challenges

Mondher Bouazizi; Tomoaki Ohtsuki

doi:10.26599/BDMA.2019.9020002


Open Access


∙ Department of Information and Computer Science, Keio University, Yokohama 223-8542, Japan.


Abstract

Sentiment analysis refers to the automatic collection, aggregation, and classification of online data into different emotion classes. While most work on sentiment analysis of text focuses on binary and ternary classification, multi-class classification has received less attention. It remains a challenging task given the complexity of natural language and the difficulty of understanding and mathematically "quantifying" how humans express their feelings. In this paper, we study the multi-class classification of posts by Twitter users, and show how far such classification can go as well as its limitations and difficulties. The proposed approach achieves an accuracy of 60.2% for 7 sentiment classes which, compared with an accuracy of 81.3% for binary classification, highlights how the number of classes affects classification performance. We also propose a novel model to represent the different sentiments and show how it helps to understand how sentiments are related. The model is then used to analyze the challenges of multi-class classification and to highlight possible future improvements to its accuracy.

Keywords

Twitter; sentiment analysis; machine learning


Big Data Mining and Analytics, Volume 2, Issue 3, September 2019, Pages 181-194


Cite this article:

Bouazizi M, Ohtsuki T. Multi-Class Sentiment Analysis on Twitter: Classification Performance and Challenges. Big Data Mining and Analytics, 2019, 2(3): 181-194. https://doi.org/10.26599/BDMA.2019.9020002


Table 1. Structure of the dataset used.

Class	Training set	Test set
Fun	3000	2643
Happiness	3000	2963
Love	3000	1945
Neutral	3000	4989
Sadness	3000	4528
Anger	3000	1558
Hate	3000	1115
Total	21 000	19 740

Table 2. Accuracy, Precision, Recall, and F-measure of the binary classification.

Class	Accuracy (%)	Precision (%)	Recall (%)	F-measure (%)
Fun	80.1	88.4	80.1	84.0
Anger	82.2	70.9	82.2	76.1
Fun vs Anger	80.9	81.9	80.9	81.1
Happiness	81.9	74.3	81.9	77.9
Sadness	81.5	87.3	81.5	84.3
Happiness vs Sadness	81.6	82.2	81.6	81.8
Love	93.8	98.9	93.8	96.3
Hate	98.1	90.1	98.1	93.9
Love vs Hate	95.4	95.7	95.4	95.4
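
The F-measure column can be cross-checked against the Precision and Recall columns. The sketch below assumes the F-measure is the usual harmonic mean of precision and recall (the page does not state the formula), and the function and variable names are illustrative only. Note also that in these tables the per-class Accuracy column coincides with the Recall column.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) copied from the per-class rows of Table 2.
table2_rows = {
    "Fun": (88.4, 80.1, 84.0),
    "Anger": (70.9, 82.2, 76.1),
    "Happiness": (74.3, 81.9, 77.9),
    "Sadness": (87.3, 81.5, 84.3),
    "Love": (98.9, 93.8, 96.3),
    "Hate": (90.1, 98.1, 93.9),
}

# Recompute each F-measure and round to one decimal, as in the table.
recomputed = {
    name: round(f_measure(p, r), 1) for name, (p, r, _) in table2_rows.items()
}

for name, (_, _, reported) in table2_rows.items():
    print(f"{name}: recomputed {recomputed[name]}, reported {reported}")
```

Under that assumption, the recomputed values agree with the reported per-class F-measures to one decimal place.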

Table 3. Accuracy, Precision, Recall, and F-measure of the ternary classification.

Class	Accuracy (%)	Precision (%)	Recall (%)	F-measure (%)
Fun (F)	50.0	63.2	50.0	55.8
Neutral (N)	74.5	73.6	74.5	74.1
Anger (A)	70.9	54.0	70.9	61.3
(F) vs (N) vs (A)	66.9	67.3	66.9	66.7
Happiness (Hp)	68.2	64.0	68.2	66.0
Neutral (N)	69.3	62.5	69.3	65.8
Sadness (S)	59.2	70.7	59.2	64.4
(Hp) vs (N) vs (S)	65.4	65.8	65.4	65.3
Love (L)	82.0	75.4	82.0	78.6
Neutral (N)	84.8	92.2	84.8	88.4
Hate (Ht)	93.0	77.2	93.0	84.3
(L) vs (N) vs (Ht)	85.3	86.1	85.3	85.5

Table 4. Accuracy, Precision, Recall, and F-measure of the 4-class classification.

Class	Accuracy (%)	Precision (%)	Recall (%)	F-measure (%)
(F)-(A)-(Hp)-(S)	60.4	60.7	60.4	60.2
(F)-(A)-(L)-(Ht)	74.9	75.9	74.9	74.5
(Hp)-(S)-(L)-(Ht)	74.5	75.2	74.5	74.7

Table 5. Accuracy, Precision, Recall, and F-measure of the 5-class classification.

Class	Accuracy (%)	Precision (%)	Recall (%)	F-measure (%)
(F)-(A)-(Hp)-(S)-(N)	54.4	55.4	54.4	54.1
(F)-(A)-(L)-(Ht)-(N)	66.9	66.9	66.9	66.3
(Hp)-(S)-(L)-(Ht)-(N)	64.1	64.6	64.1	63.8

Table 6. Accuracy, Precision, Recall, and F-measure for the classification of tweets of 6 classes.

Class	Accuracy (%)	Precision (%)	Recall (%)	F-measure (%)
Fun	39.1	56.8	39.1	46.3
Anger	59.3	52.4	59.3	55.6
Happiness	57.6	54.6	57.6	56.0
Sadness	63.9	68.6	63.9	66.1
Love	71.1	55.5	71.1	62.3
Hate	86.8	73.2	86.8	79.4
Overall	60.4	60.5	60.4	60.0

Table 7. Accuracy, Precision, Recall, and F-measure for the classification of tweets of 7 classes.

Class	Accuracy (%)	Precision (%)	Recall (%)	F-measure (%)
Fun	40.7	60.5	40.7	48.7
Anger	62.2	63.0	62.2	62.6
Happiness	54.3	58.6	54.3	56.4
Sadness	52.1	65.3	52.1	58.0
Love	75.2	62.9	75.2	68.5
Hate	90.9	80.4	90.9	85.4
Neutral	67.8	52.3	67.8	59.0
Overall	60.2	60.8	60.2	59.7
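
The relation between the per-class rows and the Overall row can be illustrated with a short sketch. It assumes that the 7-class results were obtained on the test set of Table 1 and that overall accuracy is the test-size-weighted mean of the per-class accuracies; neither assumption is stated on this page, and the names below are illustrative only.

```python
# Test-set sizes copied from Table 1.
test_size = {
    "Fun": 2643, "Happiness": 2963, "Love": 1945, "Neutral": 4989,
    "Sadness": 4528, "Anger": 1558, "Hate": 1115,
}

# Per-class accuracy (%) copied from Table 7.
class_accuracy = {
    "Fun": 40.7, "Anger": 62.2, "Happiness": 54.3, "Sadness": 52.1,
    "Love": 75.2, "Hate": 90.9, "Neutral": 67.8,
}


def weighted_accuracy(accuracy: dict, size: dict) -> float:
    """Mean of per-class accuracies, weighted by the number of test tweets."""
    total = sum(size[c] for c in accuracy)
    return sum(accuracy[c] * size[c] for c in accuracy) / total


def macro_accuracy(accuracy: dict) -> float:
    """Unweighted mean of per-class accuracies."""
    return sum(accuracy.values()) / len(accuracy)


weighted = weighted_accuracy(class_accuracy, test_size)
macro = macro_accuracy(class_accuracy)
print(f"weighted: {weighted:.2f}  macro: {macro:.2f}")
```

Under these assumptions the weighted mean lands within about 0.1 percentage points of the reported 60.2%, while the unweighted mean is roughly 3 points higher. This suggests the Overall row reflects the uneven class sizes of the test set rather than a plain average over classes.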

Table 8. Values of $\delta(a, b)$ for different depths.

(a, b)	δ(a, b)
(0, 0)	1/2*
(0, 1), (1, 0)	1/8
(1, 3), (2, 2), (3, 1)	1/24
(1, 4), (2, 3), (3, 2), (4, 1)	1/64
(2, 4), (3, 3), (4, 2)	1/96
(3, 4), (4, 3), (4, 4)	1/128

Note: * To make sure that all the coefficients sum to 1, this coefficient is set to 65/128 instead of 1/2.
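
The normalization in the note can be checked with exact fractions. The sketch below uses only the (a, b) pairs listed in Table 8 as it appears here; the helper name `total_weight` is illustrative and not taken from the paper.

```python
from fractions import Fraction

# Each row of Table 8: the (a, b) pairs sharing a coefficient, and that coefficient.
delta_rows = [
    ([(0, 0)], Fraction(1, 2)),
    ([(0, 1), (1, 0)], Fraction(1, 8)),
    ([(1, 3), (2, 2), (3, 1)], Fraction(1, 24)),
    ([(1, 4), (2, 3), (3, 2), (4, 1)], Fraction(1, 64)),
    ([(2, 4), (3, 3), (4, 2)], Fraction(1, 96)),
    ([(3, 4), (4, 3), (4, 4)], Fraction(1, 128)),
]


def total_weight(rows, override=None):
    """Sum delta(a, b) over every listed pair, with optional per-pair overrides."""
    override = override or {}
    return sum(
        override.get(pair, value) for pairs, value in rows for pair in pairs
    )


raw_total = total_weight(delta_rows)
adjusted_total = total_weight(delta_rows, {(0, 0): Fraction(65, 128)})
print(raw_total, adjusted_total)
```

With the listed pairs, the unadjusted coefficients sum to 127/128, so raising the (0, 0) coefficient from 64/128 to 65/128 is exactly what brings the total to 1.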

Table 9. Distance between the different sentiments as measured with $D_{U}$.

	(F)	(Hp)	(L)	(N)	(A)	(S)	(Ht)
(F)	0	0.61	0.85	-	1	1	1
(Hp)	0.61	0	0.79	-	1	1	1
(L)	0.85	0.79	0	-	1	1	1
(N)	-	-	-	0	-	-	-
(A)	1	1	1	-	0	0.83	0.71
(S)	1	1	1	-	0.83	0	0.84
(Ht)	1	1	1	-	0.71	0.84	0
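
One way to read a distance table like this is to look up, for each sentiment, the closest other sentiment. The sketch below transcribes Table 9, treats the "-" entries as missing values (an assumption about what the dashes mean), and uses illustrative helper names that are not from the paper.

```python
# Labels follow the abbreviations used in the tables:
# F = Fun, Hp = Happiness, L = Love, N = Neutral, A = Anger, S = Sadness, Ht = Hate.
labels = ["F", "Hp", "L", "N", "A", "S", "Ht"]
NA = None  # stands in for the "-" entries of Table 9 (assumed to mean "not defined")

d_u = [
    [0,    0.61, 0.85, NA, 1,    1,    1],
    [0.61, 0,    0.79, NA, 1,    1,    1],
    [0.85, 0.79, 0,    NA, 1,    1,    1],
    [NA,   NA,   NA,   0,  NA,   NA,   NA],
    [1,    1,    1,    NA, 0,    0.83, 0.71],
    [1,    1,    1,    NA, 0.83, 0,    0.84],
    [1,    1,    1,    NA, 0.71, 0.84, 0],
]


def is_symmetric(matrix) -> bool:
    """True when matrix[i][j] equals matrix[j][i] for every pair of indices."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


def nearest_neighbours(matrix, names) -> dict:
    """Map each label to its closest other label, skipping missing distances."""
    result = {}
    for i, name in enumerate(names):
        candidates = [
            (dist, names[j])
            for j, dist in enumerate(matrix[i])
            if j != i and dist is not None
        ]
        if candidates:  # Neutral has no defined distances, so it is left out
            result[name] = min(candidates)[1]
    return result


nearest = nearest_neighbours(d_u, labels)
print(is_symmetric(d_u), nearest)
```

In this reading, every positive sentiment has another positive sentiment as its nearest neighbour, and likewise for the negative ones, which matches the block structure visible in the table.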

Table 10. Distance between the different sentiments as measured with $D_{P}$.

	(F)	(Hp)	(L)	(N)	(A)	(S)	(Ht)
(F)	0	0.95	0.94	0.98	1	1	1
(Hp)	0.95	0	0.95	0.99	1	1	1
(L)	0.94	0.95	0	0.99	1	1	1
(N)	0.98	0.99	0.99	0	0.99	0.99	0.99
(A)	1	1	1	0.99	0	0.96	0.97
(S)	1	1	1	0.99	0.96	0	0.96
(Ht)	1	1	1	0.99	0.97	0.96	0