AI Tool for Early Breast Cancer Detection

Source: nature.com

Published on October 2, 2025

Updated on October 2, 2025

An AI-based tool has been developed to support the early detection of breast cancer malignancy. It uses explainable machine learning models to provide rapid pre-screening from minimal clinical data. The tool is intended to improve patient outcomes by enabling faster and more reliable diagnosis, particularly in resource-limited settings.

The tool was developed using a dataset from the University of Calabar Teaching Hospital that contains clinical and demographic features from 213 patients. Eight machine learning algorithms were compared; decision tree and ensemble models achieved the highest accuracy, 91.7%. The decision tree model was selected for its high explainability and low computational cost, which make it practical for clinical use.

Key Features of the AI Tool

The AI tool produces verbal decision rules that classify malignancy based on key clinical parameters such as lymph node involvement, tumor size, and metastasis. SHapley Additive exPlanations (SHAP) analysis was used to examine the model’s decision-making process and check that it relies on clinically plausible variables. The tool could be integrated into clinical decision support systems to provide rapid pre-screening with minimal data requirements.
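To make the idea of verbal decision rules concrete, here is a minimal sketch of how a fitted decision tree can be turned into plain-language IF/THEN statements by walking every root-to-leaf path. The tree below is hypothetical: the feature names echo the parameters mentioned above, but the thresholds, labels, and the names `TOY_TREE`, `tree_to_rules`, and `classify` are invented for illustration and are not taken from the study.

```python
# Hypothetical toy tree. Thresholds and leaf labels are made up for
# illustration only; they are NOT the rules learned in the study.
TOY_TREE = {
    "feature": "involved_lymph_nodes",
    "threshold": 0.5,
    "left": {"label": "benign"},  # branch taken when value <= threshold
    "right": {
        "feature": "tumor_size_cm",
        "threshold": 2.0,
        "left": {"label": "benign"},
        "right": {"label": "malignant"},
    },
}


def tree_to_rules(node, conditions=()):
    """Return one plain-language rule per leaf by walking each root-to-leaf path."""
    if "label" in node:
        premise = " AND ".join(conditions) if conditions else "always"
        return [f"IF {premise} THEN {node['label']}"]
    name, thr = node["feature"], node["threshold"]
    return (
        tree_to_rules(node["left"], conditions + (f"{name} <= {thr}",))
        + tree_to_rules(node["right"], conditions + (f"{name} > {thr}",))
    )


def classify(node, patient):
    """Follow the tree from the root until a leaf label is reached."""
    while "label" not in node:
        go_left = patient[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]


rules = tree_to_rules(TOY_TREE)
for rule in rules:
    print(rule)

# A made-up patient record, used only to show the lookup.
example = {"involved_lymph_nodes": 2, "tumor_size_cm": 3.5}
print(classify(TOY_TREE, example))
```

Because each rule is just the conjunction of the tests along one path, a clinician can read the whole model in a handful of sentences, which is the property that makes shallow trees attractive for decision support.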

Importance of Early Detection

Breast cancer is the most commonly diagnosed cancer among women and a leading cause of cancer-related mortality worldwide. Early detection significantly improves survival rates and enables less invasive treatments. However, access to early diagnosis remains limited in low- and middle-income countries because of cost, infrastructure gaps, and staff shortages. The AI tool aims to address these challenges with a low-cost, scalable approach that could raise early detection rates in resource-constrained settings.

Explainable AI Model Proposal

The AI model focuses on clinical parameters such as age, menopausal status, tumor size, lymph node involvement, and breast quadrant localization, which are strongly associated with breast cancer malignancy and prognosis. By relying on these parameters, the model reduces dependency on mammography and provides transparent decision support reports to clinicians. The approach aims to increase early detection rates and reduce breast cancer-related mortality in resource-limited regions.

Dataset and Methods

The dataset used for this study is publicly available on Kaggle and includes nine clinical and demographic features from 213 patients. Numerical variables such as age and tumor size were used in their original form, while categorical variables were encoded. The dataset was split into training and testing sets, and tenfold cross-validation was used to obtain a less biased estimate of how well each model generalizes. Hyperparameter optimization was performed to improve model performance while avoiding overfitting.
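The preprocessing and validation steps described above can be sketched with the standard library alone. The article does not say which encoding scheme or fold assignment the authors used, so the integer coding in `encode_categories` and the stratified round-robin in `stratified_kfold` are assumptions, and the class balance in the demo is a placeholder rather than the real label distribution. Only the sample count (213) and the number of folds (10) come from the text.

```python
import random
from collections import defaultdict


def encode_categories(values):
    """Map each distinct category to an integer code (sorted for reproducibility)."""
    mapping = {cat: code for code, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def stratified_kfold(labels, k=10, seed=0):
    """Return k (train_idx, test_idx) pairs, keeping class ratios similar per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # Deal indices round-robin, continuing where the previous class
        # stopped, so that fold sizes differ by at most one sample.
        for j, idx in enumerate(idxs):
            folds[(offset + j) % k].append(idx)
        offset += len(idxs)

    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        splits.append((train, test))
    return splits


# Encoding demo on a made-up categorical column.
codes, mapping = encode_categories(["pre", "post", "post", "pre"])
print(codes, mapping)

# Placeholder labels: 213 samples as in the article, with an arbitrary
# class balance that is NOT taken from the dataset.
labels = [0] * 120 + [1] * 93
splits = stratified_kfold(labels, k=10, seed=0)
for fold_no, (train, test) in enumerate(splits, start=1):
    print(fold_no, len(train), len(test))
```

With 213 samples and ten folds, three folds hold 22 test samples and seven hold 21, so every patient is used for testing exactly once across the ten rounds.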

Machine Learning Algorithms

Eight machine learning algorithms were evaluated: decision trees, discriminant analysis, logistic regression, support vector machines (SVM), Naive Bayes, K-nearest neighbors (K-NN), ensemble learning, and artificial neural networks (ANN). Decision trees and ensemble models achieved the highest performance, and decision trees were prioritized for their interpretability and clinical practicality. SHAP analysis quantified each variable’s contribution to the model output, which supports transparency and interpretability.
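SHAP attributions are built on Shapley values from cooperative game theory. The sketch below computes exact Shapley values by brute force over all feature subsets, which is only feasible for a handful of features and is not the algorithm the `shap` library uses for trees. The function `shapley_values`, the rule inside `toy_model`, the background rows, and the patient record are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values of predict at x.

    Features outside a coalition are replaced by the values found in each
    background row, and the predictions are averaged over those rows.
    """
    n = len(x)

    def value(subset):
        total = 0.0
        for row in background:
            mixed = [x[i] if i in subset else row[i] for i in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = frozenset(combo)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(v):
    """Hypothetical scorer over [lymph_nodes, tumor_size_cm, age]; age is ignored."""
    nodes, size_cm, _age = v
    return 1.0 if nodes > 0 and size_cm > 2.0 else 0.0


# Made-up background rows and patient record, for illustration only.
background = [[0, 1.0, 40], [0, 3.0, 55], [2, 1.5, 60], [1, 4.0, 35]]
patient = [3, 3.5, 50]

phi = shapley_values(toy_model, patient, background)
baseline = sum(toy_model(row) for row in background) / len(background)
print(phi, baseline)
```

Two properties are worth checking on any such attribution: the values sum to the difference between the patient's prediction and the average background prediction, and a feature the model never reads (here, age) receives a contribution of zero.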

Results and Discussion

The decision tree model achieved 91.7% accuracy, with lymph node involvement and tumor size being the most influential variables. The model’s classification logic was illustrated through decision tree structures, and SHAP analysis quantified the impact of each variable on the model’s decisions. The optimized model is reported to overfit less and generalize better, which makes it a practical candidate for clinical decision support systems.

Limitations and Future Directions

The study acknowledges limitations such as the small dataset size and the lack of molecular markers. Future studies should validate the model on larger, multi-center datasets and incorporate more comprehensive clinical and biological variables. The model’s effect on physicians’ decision-making and on patient confidence should also be evaluated in field applications. Despite these limitations, the tool offers a practical and scalable option for early breast cancer detection in resource-limited settings.