SENTIMENT ANALYSIS OF KOMERCE APP USER REVIEWS USING TF-IDF AND MULTINOMIAL NAIVE BAYES WITH CLASS IMBALANCE HANDLING

Authors

  • Syahdana Salim STMIK IKMI Cirebon, Indonesia
  • Dian Ade Kurnia STMIK IKMI Cirebon, Indonesia
  • Yudhistira Arie Wijaya STMIK IKMI Cirebon, Indonesia
  • Ahmad Faqih STMIK IKMI Cirebon, Indonesia
  • Rudi Kurniawan STMIK IKMI Cirebon, Indonesia

Keywords:

TF-IDF, Multinomial Naive Bayes, SMOTE, Sentiment Analysis, User Reviews

Abstract

This study analyzes the sentiment of user reviews of the Komerce application, using TF-IDF for text representation, Multinomial Naive Bayes as the classification model, and SMOTE to handle class imbalance. A total of 1,000 user reviews were collected from the Google Play Store and processed through labeling, text preprocessing, TF-IDF feature construction, data splitting, balancing with SMOTE, and model training. Evaluation used accuracy, precision, recall, F1-score, and the confusion matrix. The results show that TF-IDF produced informative feature representations, with a highest value of 1.2049, but the Multinomial Naive Bayes model reached an accuracy of only 0.50 both before and after SMOTE. After balancing, the model was still unable to recognize both classes stably, owing to the small effective data size and limited linguistic variation. The main contribution of this study is empirical evidence that the combination of TF-IDF, Multinomial Naive Bayes, and SMOTE is not yet adequate for Indonesian-language sentiment analysis on small datasets, so richer modeling and data approaches are needed. These findings provide a basis for developing more adaptive sentiment classification methods for local digital applications.
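The feature, balancing, and training stages named in the abstract can be illustrated with a minimal standard-library sketch. It is not the authors' code: the toy reviews, the whitespace tokenizer, the smoothed IDF formula, and every function name here are assumptions made for illustration, and the SMOTE step is a simplified binary-class version (interpolation between a minority sample and one of its nearest minority neighbours). The train/test split is omitted for brevity; in practice oversampling would be applied to the training portion only.

```python
import math
import random
from collections import Counter

def tfidf(docs):
    """Return (vocab, rows) with weights tf * (ln((1 + n) / (1 + df)) + 1)."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(docs)
    df = Counter(t for doc in tokenized for t in set(doc))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in tokenized:
        tf = Counter(doc)
        rows.append([tf[t] * idf[t] for t in vocab])
    return vocab, rows

def smote(X, y, minority, k=2, seed=0):
    """Simplified binary SMOTE: add synthetic minority rows until classes match."""
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(X) - len(minority_rows)) - len(minority_rows)
    X_new, y_new = list(X), list(y)
    for _ in range(max(0, n_needed)):
        base = rng.choice(minority_rows)
        # k nearest minority neighbours of the chosen sample (Euclidean distance)
        neighbours = sorted((r for r in minority_rows if r is not base),
                            key=lambda r: math.dist(base, r))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        X_new.append([b + gap * (o - b) for b, o in zip(base, other)])
        y_new.append(minority)
    return X_new, y_new

def train_nb(X, y, alpha=1.0):
    """Multinomial Naive Bayes with Laplace smoothing; returns log parameters."""
    n_features = len(X[0])
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        totals = [sum(col) for col in zip(*rows)]
        denom = sum(totals) + alpha * n_features
        log_prior = math.log(len(rows) / len(X))
        log_likelihood = [math.log((t + alpha) / denom) for t in totals]
        model[c] = (log_prior, log_likelihood)
    return model

def predict_nb(model, x):
    """Return the class with the highest log posterior score for one row."""
    return max(model, key=lambda c: model[c][0]
               + sum(v * lp for v, lp in zip(x, model[c][1])))

# Made-up toy reviews (not from the study's dataset): 6 positive, 2 negative.
reviews = [
    "good app fast delivery", "good service helpful", "great app easy",
    "fast and helpful", "good easy fast", "great service",
    "bad app slow", "slow and buggy",
]
labels = ["pos"] * 6 + ["neg"] * 2

vocab, X = tfidf(reviews)
X_bal, y_bal = smote(X, labels, minority="neg")
model = train_nb(X_bal, y_bal)
predictions = [predict_nb(model, x) for x in X]
accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)
print(Counter(y_bal), accuracy)
```

The accuracy printed here is measured on the same toy rows the model was trained on, so it only confirms that the pieces fit together; it says nothing about the 0.50 accuracy reported in the abstract, which comes from the study's own held-out evaluation.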

Published

2026-01-28
