IMPROVED HYBRID MODEL FOR CERVICAL CANCER RISK PREDICTION BASED ON ENSEMBLE LEARNING METHOD
Keywords:
Cervical Cancer, Cervical Magnate, Ensemble Learning, Pipeline, SmoteTomek, XGBoost.
Abstract
Cervical cancer, also known as cervical magnate, is one of the most prominent health issues faced by women globally, particularly in developing nations. The cancer starts in the cervix and often goes unnoticed in its early stages; symptoms tend to manifest later, potentially indicating metastasis. Early detection significantly increases the chances of successful treatment and cure. This study develops a model to help women assess their risk of cervical cancer based on demographic data and medical history. It introduces an enhanced hybrid model that employs ensemble learning techniques to improve predictive accuracy. The model uses a dataset of demographic data, lifestyle habits, and medical histories of 858 patients, obtained from the UC Irvine Machine Learning Repository. A pipeline combining a transformer, a sampler, and an estimator was developed to mitigate overfitting and data leakage while enhancing model performance. The pipeline uses StandardScaler for transformation, SmoteTomek for sampling, and the XGBoost classifier as the estimator. A conventional XGBoost classifier was trained to identify the 12 most important features affecting the performance of the classification model. The proposed model identified 100% of at-risk women (a 100% recall rate), with a reported accuracy of 99%. Overall, the hybrid model outperforms existing methods in detecting women at risk of developing cervical cancer, yielding superior accuracy, sensitivity, and specificity in cervical cancer risk prediction.