Multimodal Sentiment Analysis for Understanding Customer Emotions in the Retail Sector
Year 2024, Volume: 1, Issue: 1, 44-64, 25.12.2024
Özlem Hakdağlı, Erdem Hakdağlı
Abstract
This study aims to develop a multimodal sentiment analysis method to comprehensively and accurately understand emotional responses during customer experiences. Measuring customer emotions is critical not only for understanding customer satisfaction but also for delivering personalized services, optimizing marketing strategies, and enhancing customer loyalty. In this context, emotional cues derived from audio, visual, and textual data were independently analyzed using deep learning-based models, and the outputs from these modalities were integrated through a weight-based method. The Xception model was utilized for analyzing audio data, while Xception, VGG16, and VGG19 models were employed to capture micro-level differences in facial expressions. For textual data, BERT and ALBERT models were used to evaluate contextual relationships. The proposed method was tested on open-source datasets such as RAVDESS, FER2013, LFW, TESS, and Beyazperde, and the sentiment analysis results from each modality were normalized between -1 and 1 for integration. Experimental findings demonstrated that the method achieved an F1 score of 98.25% for visual data, 94.30% for textual data, and 90.71% for audio data, showcasing its high performance. While the developed method has been successfully tested on open-source datasets, it has not yet been applied to real customer data. This approach surpasses the limitations of unimodal analyses by integrating diverse data types and provides an innovative solution for a comprehensive understanding of customer emotions. The study also paves the way for deeper analyses of customer experiences and the development of personalized solutions.
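The abstract states that each modality's output is normalized to a score between -1 and 1 and that the three modalities are then merged through a weight-based (coefficient) scheme. The following is a minimal sketch of what such a late-fusion step could look like; the emotion-to-polarity mapping, the class names, and the weights are illustrative assumptions and are not taken from the paper.

```python
# Minimal sketch of coefficient-based late fusion, assuming each modality's
# classifier output is first collapsed to a sentiment score in [-1, 1].
# The polarity mapping, class names, and weights below are illustrative
# assumptions, not values from the paper.

# Hypothetical polarity assigned to each emotion class.
CLASS_POLARITY = {"happy": 1.0, "surprise": 0.5, "neutral": 0.0, "sad": -0.7, "angry": -1.0}

def to_sentiment_score(class_probs: dict) -> float:
    """Collapse a per-class probability distribution into a score in [-1, 1]."""
    score = sum(CLASS_POLARITY[c] * p for c, p in class_probs.items())
    return max(-1.0, min(1.0, score))

def fuse(scores: dict, weights: dict) -> float:
    """Coefficient-based fusion of per-modality scores, kept in [-1, 1]."""
    total = sum(weights[m] for m in scores)
    fused = sum(weights[m] * scores[m] for m in scores) / total
    return max(-1.0, min(1.0, fused))

# Example with made-up classifier outputs for one customer interaction.
modality_scores = {
    "visual": to_sentiment_score({"happy": 0.7, "neutral": 0.2, "sad": 0.1}),  # = 0.63
    "text": to_sentiment_score({"happy": 0.4, "neutral": 0.5, "sad": 0.1}),    # = 0.33
    "audio": to_sentiment_score({"neutral": 0.6, "sad": 0.3, "angry": 0.1}),   # = -0.31
}
weights = {"visual": 0.4, "text": 0.35, "audio": 0.25}  # hypothetical coefficients
print(fuse(modality_scores, weights))  # = 0.29
```

Mapping every branch onto the same [-1, 1] scale is what allows the audio (Xception), visual (Xception/VGG16/VGG19), and text (BERT/ALBERT) outputs to be combined directly, regardless of how many emotion classes each classifier predicts.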
Keywords
multimodal sentiment analysis, classification
References
Aydoğan, M., & Kocaman, V. (2022). TRSAv1: A new benchmark dataset for classifying user reviews on Turkish e-commerce websites. Journal of Information Science, 49(6), 1711-1725. https://doi.org/10.1177/01655515221074328
Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W. F., & Weiss, B. (2005). A database of German emotional speech. Interspeech 2005, 1517-1520. https://doi.org/10.21437/interspeech.2005-446
Busso, C., Bulut, M., Lee, C.-C., Kazemzadeh, A., Mower, E., Kim, S., Chang, J. N., Lee, S., & Narayanan, S. S. (2008). IEMOCAP: Interactive emotional dyadic motion capture database. Language Resources and Evaluation, 42(4), 335-359. https://doi.org/10.1007/s10579-008-9076-6
Cao, H., Cooper, D. G., Keutmann, M. K., Gur, R. C., Nenkova, A., & Verma, R. (2014). CREMA-D: Crowd-sourced emotional multimodal actors dataset. IEEE Transactions on Affective Computing, 5(4), 377-390. https://doi.org/10.1109/TAFFC.2014.2336244
Chollet, F. (2017, July 21-26). Xception: Deep learning with depthwise separable convolutions [Paper]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1251-1258, Honolulu, HI, USA. IEEE. https://doi.org/10.1109/CVPR.2017.195
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273-297. https://doi.org/10.1007/BF00994018
Demirtaş, S. C., & Hakdağlı, Ö. (2022, November 24-26). Dönüştürücü-CNN modeli ile Türkçe konuşma verisi üzerinde duygu tanıma [Emotion recognition on Turkish speech data with a Transformer-CNN model] [Paper]. ELECO 2022 Elektrik-Elektronik ve Bilgisayar Mühendisliği Sempozyumu, Bursa, Türkiye. IEEE.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database [Paper]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 248-255, Miami, FL, USA. IEEE. https://doi.org/10.1109/CVPR.2009.5206848
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019, June 2-7). BERT: Pre-training of deep bidirectional transformers for language understanding [Paper]. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA. Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1423
Dupuis, K., & Pichora-Fuller, M. K. (2010). Toronto emotional speech set (TESS) [Dataset]. https://doi.org/10.5683/SP2/E8H2MF
Ekman, P., & Friesen, W. V. (1978). Facial action coding system: A technique for the measurement of facial movement. Consulting Psychologists Press. https://doi.org/10.1037/t27734-000
Goodfellow, I., Erhan, D., Carrier, P. L., Courville, A., Mirza, M., Hamner, B., Cukierski, W., Tang, Y., Thaler, D., Lee, D.-H., Zhou, Y., Ramaiah, C., Feng, F., Li, R., Wang, X., Athanasakis, D., Shawe-Taylor, J., Milakov, M., Park, J., & Bengio, Y. (2013). Challenges in representation learning: A report on three machine learning contests. Proceedings of the Neural Information Processing Systems (NIPS) Workshop.
Huang, G. B., Ramesh, M., Berg, T., & Learned-Miller, E. (2007). Labeled Faces in the Wild: A database for studying face recognition in unconstrained environments (Technical Report 07-49). University of Massachusetts, Amherst.
Jackson, P. J. B., & Haq, S. (2014). Surrey Audio-Visual Expressed Emotion (SAVEE) database [Dataset]. University of Surrey.
Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., & Soricut, R. (2020). ALBERT: A lite BERT for self-supervised learning of language representations [Paper]. International Conference on Learning Representations (ICLR), Addis Ababa, Ethiopia. https://doi.org/10.48550/arXiv.1909.11942
Livingstone, S. R., & Russo, F. A. (2018). The Ryerson audio-visual database of emotional speech and song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLOS ONE, 13(5), e0196391. https://doi.org/10.1371/journal.pone.0196391
Lotfian, R., & Busso, C. (2019). Curriculum learning for speech emotion recognition from crowdsourced labels. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 27(4), 815-826. https://doi.org/10.1109/TASLP.2019.2898816
Lucey, P., Cohn, J. F., Kanade, T., Saragih, J., Ambadar, Z., & Matthews, I. (2010). The Extended Cohn-Kanade Dataset (CK+) [Paper]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), San Francisco, CA, USA. https://doi.org/10.1109/CVPRW.2010.5543262
Lyons, M., Akamatsu, S., Kamachi, M., & Gyoba, J. (1998). Coding facial expressions with Gabor wavelets [Paper]. Proceedings of the Third IEEE International Conference on Automatic Face and Gesture Recognition, 200-205, Nara, Japan. IEEE. https://doi.org/10.1109/AFGR.1998.670949
Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2), 1-135. https://doi.org/10.1561/1500000011
Poria, S., Cambria, E., Hazarika, D., & Mazumder, N. (2017, November 18-21). Multi-level multiple attentions for contextual multimodal sentiment analysis [Paper]. Proceedings of the IEEE International Conference on Data Mining (ICDM), New Orleans, LA, USA. IEEE. https://doi.org/10.1109/ICDM.2017.104
Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition [Paper]. International Conference on Learning Representations (ICLR), San Diego, CA, USA. https://doi.org/10.48550/arXiv.1409.1556
Sun, C., Huang, L., & Qiu, X. (2021, November 7-11). Utilizing BERT and ALBERT models for sentiment analysis on social media text [Paper]. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), Punta Cana, Dominican Republic. Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1035
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2015). Going deeper with convolutions [Paper]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1-9, Boston, MA, USA. IEEE. https://doi.org/10.1109/CVPR.2015.7298594
Xu, H., Liu, B., Shu, L., & Yu, P. S. (2019, June 2-7). BERT post-training for review reading comprehension and aspect-based sentiment analysis [Paper]. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Minneapolis, MN, USA. Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1242
Zadeh, A., Chen, M., Poria, S., Cambria, E., & Morency, L.-P. (2018). Multimodal sentiment intensity analysis in videos: Facial gestures and verbal messages. IEEE Intelligent Systems, 34(1), 82-88. https://doi.org/10.1109/MIS.2018.2888673