FATALITY RISK CLASSIFICATION OF MOUNTAIN EXPEDITIONS BASED ON DEMOGRAPHIC AND GEOGRAPHIC ATTRIBUTES USING NATURAL LANGUAGE PROCESSING AND SUPERVISED LEARNING

Imbaraga Gempar Guna Laksana; Dian Ade Kurnia; Yudhistira Arie Wijaya; Mulyawan Mulyawan; Gifthera Dwilestari

Authors

Imbaraga Gempar Guna Laksana STMIK IKMI Cirebon, Indonesia
Dian Ade Kurnia STMIK IKMI Cirebon, Indonesia
Yudhistira Arie Wijaya STMIK IKMI Cirebon, Indonesia
Mulyawan Mulyawan STMIK IKMI Cirebon, Indonesia
Gifthera Dwilestari STMIK IKMI Cirebon, Indonesia

Keywords:

Natural Language Processing, Supervised Learning, Expedition Fatality, Risk Classification, Machine Learning

Abstract

This study develops a fatality risk classification model for mountain expeditions, using Natural Language Processing and supervised learning to process cause-of-death text together with demographic attributes. It addresses the difficulty of handling unstructured data, which often contains spelling variation and ambiguity and therefore requires computational methods that can capture the relevant information accurately. The method comprises text preprocessing, TF-IDF weighting, frequency encoding of the nationality attribute, and the training of Random Forest and Support Vector Machine (SVM) models. The models are evaluated with Accuracy, Precision, Recall, and F1-score. The results show that Random Forest achieved an accuracy of 0.98 and was more stable than SVM when handling class imbalance. Text features contributed most to determining the fatality risk category, while demographic attributes had a smaller but still relevant additional effect. These findings indicate that NLP-based analysis can improve the understanding of fatality risk patterns and could support the development of decision support systems for mountaineering safety. The approach makes it easier to identify risk factors that were previously hard to detect because of the limits of manual analysis. The study provides a foundation for more comprehensive risk models and can be adapted to other safety domains.
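The feature steps and evaluation metrics named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the function names, the whitespace tokenizer, the unsmoothed `tf * log(N / df)` weighting, and the sample values are all assumptions made for illustration. Libraries such as scikit-learn use a smoothed IDF variant, so their weights would differ numerically.

```python
import math
from collections import Counter


def frequency_encode(values):
    """Replace each category (e.g. a nationality) with its relative frequency."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]


def tfidf(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df).

    Assumption: lowercase whitespace tokenization stands in for the
    preprocessing step, whose details the abstract does not specify.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights


def precision_recall_f1(y_true, y_pred, positive):
    """Per-class precision, recall, and F1 for the label `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical sample values, not taken from the study's dataset.
    print(frequency_encode(["Nepal", "Nepal", "Japan", "UK"]))
    print(tfidf(["avalanche fall", "fall"]))
    print(precision_recall_f1(["high", "high", "low", "low"],
                              ["high", "low", "low", "low"], "high"))
```

Reporting per-class precision and recall alongside accuracy matters under class imbalance, because a model can reach high accuracy while rarely predicting the minority class.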
