Article, Volume 40, No. 51

Development of a pathophysiological model for multi-class cancer classification, based on biomedical imaging of patient population using AI technologies

Author(s)

Óscar Magna, Luciano Grandi, Leonardo Badinez, Beatriz Reggio

ABSTRACT

This study develops and evaluates a novel deep-learning approach for multiclass cancer classification using medical imaging. The research objective was to formulate a pathophysiological model capable of classifying 12 different categories of lung, breast, and colon cancer using computed tomography (CT) images. Two prominent convolutional neural network architectures, ResNet50 and DenseNet121, were adapted and compared for this task.
A dataset comprising 81 patient examinations, including various stages of breast, colon, and lung cancer, was considered. The images were meticulously preprocessed, including anonymization, range adjustment, and data augmentation techniques, resulting in a dataset of 3703 images. The DenseNet121 model, modified for 12-class output, demonstrated superior performance with 91% accuracy, AUC-ROC of 1.00, recall of 0.87, and F1-score of 0.89 on the test set, outperforming the adapted ResNet50 model (86% accuracy, AUC-ROC of 0.99, recall of 0.83, F1-score of 0.83).
Gradient Weighted Class Activation Mapping (Grad-CAM) was employed to improve model interpretability by revealing areas of model focus for predictions. Although effective in identifying healthy colon regions, limitations were observed in accurately localizing lung cancer abnormalities, highlighting areas for future improvement.
The proposed model not only classifies the presence of cancer, but also provides information on cancer staging for breast and colon cases, and specific types of lung cancer. This advance represents a significant step towards more complete and nuanced cancer diagnoses, potentially improving early detection and treatment planning.
However, the study acknowledges limitations, particularly in the size and diversity of the dataset. Future work should focus on expanding the dataset across multiple imaging modalities, incorporating advanced radiomic techniques, and refining the model’s ability to discern subtle differences in stages and types of cancer.
This research contributes to the growing body of AI applications in oncology, demonstrating the potential of deep learning in improving cancer staging and classification. While promising, it emphasizes the need for extensive clinical trials to validate the efficacy of the model in real-world medical settings.

1. INTRODUCTION

Cancer is one of the most significant public health challenges worldwide and the second leading cause of death in Chile, surpassed only by cardiovascular diseases. Early and accurate detection of malignant neoplasms is crucial to improve survival rates and optimize therapeutic strategies. This study addresses the formulation of a pathophysiological model and the development of a functional prototype for classifying twelve different types of cancer, using computed tomography (CT) images and advanced artificial intelligence (AI) techniques.

The central innovation of this research lies in the implementation of deep neural network architectures, specifically, ResNet50 and DenseNet121, modified to improve oncology classification efficiency and provide complementary tools for healthcare professionals. This approach not only facilitates a deeper understanding of neoplastic pathophysiology but also contributes to more accurate and efficient diagnosis and treatment of the disease.

In Chile, neoplasms such as breast carcinoma, lung cancer, and colorectal adenocarcinoma present a high prevalence and constitute a considerable challenge in terms of diagnosis and therapeutic management. Breast cancer is the leading cause of oncological mortality among Chilean women, followed by lung cancer, which also occupies an alarming place among causes of death in men, along with colorectal cancer (Globocan, 2020; MINSAL, 2019).

The classification and diagnosis of cancer from medical imaging is a complex process. Traditional diagnostic techniques are largely effective, and AI can enhance them further. Neural networks can be trained to recognize specific patterns and features in large datasets, which makes them particularly suitable for analyzing medical images and detecting early signs of malignancy.

1.1. Contextualization of Cancer Classification Research by AI

The use of artificial intelligence (AI) technologies in biomedical image-based cancer classification has experienced significant advances in recent years. Convolutional neural networks (CNNs), such as ResNet and DenseNet, have demonstrated high accuracy in medical image classification, outperforming traditional methods in several studies (Han et al., 2018). These architectures enable the detection and classification of different types of cancer, providing an accurate and reliable diagnostic tool.

One of the main advances in this field is the ability of CNNs to handle large volumes of data and learn complex features from medical images. For example, DenseNet121 has shown superior performance in biomedical image classification, achieving accuracies above 90% in breast and colon cancer detection (Lång et al., 2023). Furthermore, the integration of deep learning techniques with traditional methods, such as support vector machines (SVMs), has led to improved generalization and robustness of models (Han et al., 2018).

The PRIMAGE project, led by Davatzikos et al. (2020), focuses on early detection of pediatric cancer using imaging biomarkers. This project illustrates the importance of predictive in silico multiscale analytics to personalize the treatment of childhood cancer, highlighting the relevance of AI in improving diagnostic and therapeutic protocols.

Han et al. (2018) demonstrated the efficacy of the ResNet model for classifying clinical images of benign and malignant skin neoplasms. This study is particularly notable for its ability to guide healthcare professionals in clinical decision-making, improving treatment outcomes, and providing an accurate and reliable diagnostic tool.

Lång et al. (2023) conducted a randomized controlled clinical trial in Sweden to evaluate the efficacy of AI in breast cancer screening in women aged 40 to 80 years. Using the Transpara 1.7.0 system, the risk of malignancy on mammograms was scored. The AI intervention group detected 244 cases of breast cancer, while the control group detected 203 cases. This study stands out for reducing the screen-reading workload by 44.3%, offering an accurate and efficient diagnostic tool that improves breast cancer detection and treatment.

Finally, research by Pattanaik et al. (2022) explores how the combination of DenseNet121, an advanced neural network model, with an Extreme Learning Machine (ELM), improves breast cancer diagnosis and treatment. This study highlights not only the accuracy and performance of the model but also how these technologies can be effectively integrated to improve clinical outcomes in oncology.

However, the use of AI in medicine presents several challenges. Obtaining large and diverse datasets remains a significant hurdle due to the sensitive nature of medical information (Field et al., 2021). In addition, image quality and variability in acquisition techniques can affect the accuracy of models. The lack of uniform operational standards for lesion detection and segmentation also limits the consistency of results (Zheng et al., 2023).

Another major challenge is the interpretation of the results provided by AI models. Although these models can provide accurate predictions, healthcare professionals must validate and interpret these results to avoid bias and diagnostic errors (Haug & Drazen, 2023). Furthermore, the actual clinical impact of AI on cancer diagnosis has yet to be confirmed by large randomized clinical trials.

In summary, although AI has proven to be a promising tool in biomedical imaging-based cancer staging, it is essential to address challenges related to data collection, image quality, and interpretation of results. Continued collaboration between researchers and healthcare professionals will be key to overcoming these obstacles and advancing cancer diagnosis and treatment.

1.2. Limitations and justification of the investigation

Research in the oncology field has been predominantly focused on the binary classification of neoplasms, i.e., determining whether they are benign or malignant. However, a significant challenge in this field is the paucity of models that address the classification of different types of cancer in a detailed manner.

This gap prompted the formulation of a model that not only identifies the presence of abnormalities but also classifies staging in cases of mammary or colorectal carcinoma and identifies the specific type of lung neoplasm.

The need for this broader approach is based on the use of emerging technologies such as artificial intelligence and machine learning. The development of a comprehensive prototype involves extensive research and information gathering on how these technologies are applied in the medical field. In addition, a variety of convolutional neural network architectures are explored with a specific focus on oncology classification and staging.

A key challenge facing this project is the inherent limitation of the available datasets. To address this limitation, a representative dataset has been created that includes computed tomography images. These images are critical for training and evaluating the investigated neural networks, thus allowing for improved accuracy and generalizability of the model.

1.3. Research contributions

The main contributions of this study are outlined below:

• Modified versions of the ResNet50 and DenseNet121 models were developed, adapting the final layer for an output that allows classification into twelve distinct cancer categories.
• The developed models were trained using a database of breast, colon, and lung computed tomography images, classifying cancer types from the patient’s clinical history and following advanced image processing strategies (Gordillo et al., 2013; Pereira et al., 2016).
• The proposed model can accurately identify distinctive features in tomographic images of breast, colon, and lung in twelve particular classes, allowing an approach to staging and classification of neoplastic pathologies. The performance of the multiclass models has been evaluated by metrics of accuracy, sensitivity, area under the ROC curve (AUC-ROC), and F1-Score.
• A comparative performance analysis was performed between the modified DenseNet121 model and modified ResNet50 for multiclass classification of mammary, colorectal, and lung carcinoma.

1.4. Structure of the article

The paper is organized into sections. Section 2 presents the construction of the formulated model, starting from the basis of medical imaging and its manipulation with artificial intelligence (AI) technologies, to outline the architecture of the prototype based on convolutional neural networks ResNet50 and DenseNet121, both architectures used for medical image classification. Section 3 outlines the results obtained regarding the proposed model, detailing the predictions made for cancer types under study, and providing insights for cancer staging in patients with mammary carcinoma, colorectal carcinoma, and specific subtypes of lung neoplasia (Tseng et al., 2018). Finally, Section 4 synthesizes relevant observations and conclusions based on the results obtained, which support the implementation of artificial intelligence models in medical image interpretation to advance oncological diagnosis and treatment.

2. METHOD

This study is based on the implementation and experimental evaluation of modified deep learning architectures, specifically ResNet50 and DenseNet121, optimized for multiclass classification in diagnostic imaging oncology. The system has been meticulously designed to detect and categorize medical images associated with lung, colorectal, and breast neoplasms, including twelve distinct diagnostic categories.

The DenseNet121 architecture, recognized for its efficiency in the accurate classification of oncological medical images, represents the core of this model. This convolutional neural network (CNN) is trained on a rigorously labeled dataset, producing categorized output that reflects both the presence and staging of the oncologic conditions under study. This approach provides an advanced diagnostic tool, facilitating clinical decision-making through detailed and accurate classification of medical images, thus contributing to the optimization of the diagnostic process in oncology.

2.1. Foundational knowledge

Imaging for this study comprised cases of breast, colorectal, and pulmonary neoplasms, facilitated through collaboration with relevant points of contact. Priority was given to the inclusion of tomographic images in DICOM format, covering a spectrum of diagnoses including different disease stages and healthy controls, to ensure a representative and balanced data set.

Image classification was performed according to standardized criteria, considering:
• Breast and Colorectal Neoplasia: Based on TNM staging (NIH), with categories “b0”/“c0” (T=0, N=0, M=0), “b1”/“c1” (T>0, N=0, M=0), “b2”/“c2” (T>0, N>0, M=0), and “b3”/“c3” (T>0, N>0, M=1).
• Pulmonary Neoplasia: Categorized according to tumor histology.
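
The labeling rule for breast and colorectal cases can be illustrated with a minimal sketch. The function name `tnm_to_class` and the organ keys are hypothetical, and the handling of TNM combinations outside the four listed categories (for example T>0, N=0, M=1) is an assumption, since the text does not say how such cases were treated.

```python
def tnm_to_class(organ: str, t: int, n: int, m: int) -> str:
    """Map TNM values to the class labels b0..b3 / c0..c3 described above.

    Only zero vs. non-zero matters for each of T, N, and M.
    """
    prefixes = {"breast": "b", "colon": "c"}  # hypothetical organ keys
    if organ not in prefixes:
        raise ValueError(f"unsupported organ: {organ!r}")

    if t == 0 and n == 0 and m == 0:
        stage = 0  # healthy control
    elif t > 0 and n == 0 and m == 0:
        stage = 1  # tumor only
    elif t > 0 and n > 0 and m == 0:
        stage = 2  # tumor with lymph node involvement
    elif t > 0 and n > 0 and m > 0:
        stage = 3  # tumor, lymph nodes, and distant metastasis
    else:
        # Assumption: combinations not listed in the text are rejected.
        raise ValueError(f"TNM combination not covered: T={t}, N={n}, M={m}")
    return f"{prefixes[organ]}{stage}"


print(tnm_to_class("colon", 2, 1, 0))
```

Rejecting uncovered combinations, rather than guessing a stage, keeps labeling errors visible during dataset curation.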

The study comprised 81 unique cases, distributed as detailed in Table 1, highlighting a predominance of colorectal neoplasia cases (n=40), followed by pulmonary (n=22) and mammary (n=19) neoplasia.

2.2. Neural network architectures for medical image identification

The prototype implements two state-of-the-art CNN architectures: ResNet50 and DenseNet121, adapted for multiclass classification in medical oncology imaging.

– ResNet50: A pre-trained version of ResNet50 was implemented and modified with an output layer tailored to the 12 classes in the study. This architecture (Figure 1) is noted for its ability to differentiate benign and malignant lesions in various medical imaging modalities (Han et al., 2018).

– DenseNet121: Characterized by dense connections, this architecture ensures efficient feature propagation through the network, mitigating the vanishing gradient problem and reducing overfitting (Figure 2). DenseNet121 has demonstrated exceptional performance in mammographic image classification, especially when integrated with extreme learning techniques (Pattanaik et al., 2022).

Both architectures were optimized for this work and their performance was evaluated using accuracy, sensitivity (recall), AUC-ROC, and F1-score metrics.

2.3. Prototype Workflow

The development of the prototype followed a rigorous workflow, characteristic of artificial intelligence applications in medical diagnostics (Figure 3), highlighting the use of deep learning and transfer learning techniques with pre-trained convolutional models:

1. Image Preprocessing: Normalization, data augmentation, and segmentation to optimize the quality and representativeness of the training dataset, thereby enhancing the training and performance of the model. The following key steps are highlighted:
– Selection of Relevant Images: Collaborating with medical experts to focus on regions affected by breast, colon, and lung cancer, ensuring the clinical relevance of the data.
– Image Adjustment: Utilizing specialized software like RadiAnt DICOM Viewer to modify parameters such as window and level of visualization (see Table 2), in partnership with medical technologists.
– Image Conversion: Transforming images from DICOM to PNG format for easier manipulation.
– Image Classification: Organizing images based on their diagnosis.
– Exploratory Data Analysis (EDA): Analyzing to identify patterns within the data.
– Dataset Splitting: Dividing the dataset into training (70%), validation (20%), and testing (10%) sets.
– Image Normalization: Normalizing images with a mean of 0.1906 and a standard deviation of 0.5948.
– Data Augmentation: Applying techniques such as random cropping (0% to 25%), rotation (-7 to 7 degrees), and brightness adjustment (-30% to +30%).
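
Three of the preprocessing steps above (window/level adjustment, normalization, and dataset splitting) can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the window values in the usage line are illustrative and are not taken from Table 2, and the split is done per image index because the text does not state whether splitting was per image or per patient.

```python
import numpy as np


def apply_window(pixels, center, width):
    """Clip intensities to a window/level range and rescale them to [0, 1]."""
    low = center - width / 2.0
    high = center + width / 2.0
    return (np.clip(pixels, low, high) - low) / (high - low)


def normalize(image, mean=0.1906, std=0.5948):
    """Standardize an image with the mean and standard deviation given in the text."""
    return (image - mean) / std


def split_indices(n_images, fractions=(0.7, 0.2, 0.1), seed=0):
    """Shuffle image indices and split them into train/validation/test sets."""
    rng = np.random.default_rng(seed)  # fixed seed for a reproducible split
    order = rng.permutation(n_images)
    n_train = int(round(fractions[0] * n_images))
    n_val = int(round(fractions[1] * n_images))
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Illustrative window values only (not from Table 2).
windowed = apply_window(np.array([-2000.0, -600.0, 1000.0]), center=-600, width=1500)
train, val, test = split_indices(3703)
print(windowed, len(train), len(val), len(test))
```

If several images come from the same patient, splitting by patient rather than by image would be a safer design, since it prevents near-identical slices from appearing in both the training and test sets.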

3. RESULTS

This section presents the key results of the study on cancer diagnosis by deep learning. It begins with an exploratory analysis that identifies the relevant historical and environmental factors in Chile. Subsequently, the ResNet50 and DenseNet121 neural architectures are evaluated using standard performance metrics. In addition, a detailed interpretation of the classification of medical images is presented, complemented by visual analysis through Grad-CAM. Finally, the proposed DenseNet121-based model is presented, highlighting its effectiveness in oncological classification.

3.1. Exploratory data analysis

The exploratory data analysis revealed a significant correlation between the increasing cancer rates in Chile and historical events such as industrialization and the intensification of agricultural practices. Rapid urbanization and industrial growth since the mid-20th century exposed large population segments to a higher concentration of carcinogenic substances in the air and water. At the same time, the adoption of pesticides and chemical fertilizers in agriculture (Alavanja et al., 2004), as well as the increase in tobacco consumption, have been recognized as risk factors for various types of cancer, particularly lung cancer (Erratum, 2020; Raza et al., 2013; Jemal et al., 2011).

Moreover, the shift towards diets richer in processed foods and saturated fats (Torre et al., 2015) indicates that the historical, environmental, and social context has contributed to modifying the population’s risk profile. This, in turn, has influenced the underlying dynamics of cancer prevalence, highlighting the importance of identifying patterns and establishing more precise associations between risk factors and cancer.

3.2. Training results

The training results (Table 3) show that the DenseNet121 model outperformed the ResNet50 model on all metrics evaluated. DenseNet121 achieved an accuracy of 91%, meaning it correctly classified 91% of the images in the test set. The reported area under the ROC curve (AUC-ROC) of 1.00 indicates that, at two-decimal precision, the model’s class scores separate the twelve categories almost perfectly. In addition, the recall of 0.87 suggests that, on average across classes, the model correctly identified 87% of the images belonging to each class.
The superiority of DenseNet121 can be attributed to its dense architecture, which allows for greater feature extraction and more efficient gradient propagation. However, it is important to note that both models obtained promising results, especially considering the complexity of the multiclass medical image classification task.
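
The relationship between accuracy, recall, and F1-score in a multiclass setting can be made concrete with a small sketch that derives all three from a confusion matrix. The function name is hypothetical, the 3-class matrix is toy data rather than the study's results, and macro-averaging is an assumption: the text does not state which averaging was used, although a recall below the accuracy is consistent with an unweighted average over classes.

```python
def metrics_from_confusion(cm):
    """Compute accuracy, macro recall, and macro F1.

    cm[i][j] is the number of images of true class i predicted as class j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))

    recalls, f1_scores = [], []
    for i in range(k):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                        # class i images predicted as another class
        fp = sum(cm[r][i] for r in range(k)) - tp   # other images predicted as class i
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        recalls.append(recall)
        f1_scores.append(f1)

    return {
        "accuracy": correct / total,
        "macro_recall": sum(recalls) / k,   # unweighted mean over classes
        "macro_f1": sum(f1_scores) / k,
    }


# Toy 3-class confusion matrix (illustrative only).
toy_cm = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
print(metrics_from_confusion(toy_cm))
```

Because macro-averaging weights every class equally, classes with few test images can pull recall and F1 below the overall accuracy.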

A more detailed analysis of performance curves and confusion matrices (Figure 4) reveals that the two models performed generally satisfactorily in most classes, with high values on the main diagonal (true positives) reflecting the model’s ability to adequately identify classes.

The models exhibit strong predictive performance; however, they show some imprecision in classifying certain cancer subclasses. This suggests the need to collect a larger and more diverse dataset to improve the generalizability of the models.

3.3. Interpretation of results

To analyze the DenseNet121 model’s performance, each image was labeled with the true diagnosis (labeled “True”), the model’s prediction (labeled “Pred”), and the probability associated with that prediction (see Table 1, class and subclass).

As illustrated in Figure 5, the model successfully predicted a lung adenocarcinoma with a probability of 100%. It also accurately identified a locally advanced colon carcinoma (T>0, N>0, M=0, where T refers to tumor size, N indicates the presence of affected lymph nodes, and M denotes distant metastasis) with a probability of 84%. Lastly, the model correctly classified an image of healthy breast tissue (class b0: T=0, N=0, M=0) with a probability of 78%.

While the results are promising, it is important to emphasize that predictions with a probability of less than 100% carry a degree of uncertainty. Furthermore, the clinical application of this model would require thorough validation on a larger and more diverse dataset.
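
How a predicted class and its probability might be obtained from the network output can be sketched as follows. This assumes the probabilities come from a softmax over the 12 output scores, which is the usual choice but is not stated in the text. The names `softmax`, `predict`, and `CLASS_NAMES` are hypothetical, and the lung labels `l0` to `l3` are placeholders because the four histological types are not all named in the text.

```python
import math

# b0-b3 and c0-c3 follow the text; l0-l3 are placeholder lung labels.
CLASS_NAMES = ["b0", "b1", "b2", "b3", "c0", "c1", "c2", "c3", "l0", "l1", "l2", "l3"]


def softmax(logits):
    """Convert raw output scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, class_names=CLASS_NAMES):
    """Return the most probable class label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


# Illustrative scores: the seventh output (class c2) dominates.
scores = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0]
label, probability = predict(scores)
print(label, f"{probability:.0%}")
```

Under this assumption, a displayed probability of 100% is a rounded value: a softmax output is always strictly below 1 for finite scores, so it signals high model confidence rather than certainty.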

3.4. Exploratory Results

In this section, we explore the interpretability of the DenseNet121 model using the Grad-CAM technique (Gradient-weighted Class Activation Mapping, Selvaraju et al., 2020). This technique generates heatmaps that visualize the regions of an image that most influence a prediction, providing insight into the model’s internal logic.
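
The core Grad-CAM computation can be sketched with NumPy. For a target class c, each feature map A^k of the last convolutional layer receives a weight equal to the spatial average of the gradient of the class score with respect to that map, and the heatmap is the ReLU of the weighted sum of the maps. The sketch assumes the activations and gradients have already been extracted from the network (for example with framework hooks); the function name is hypothetical, and the final rescaling to [0, 1] is a display convenience rather than part of the method's definition.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap.

    activations: array of shape (K, H, W), feature maps of the last conv layer.
    gradients:   array of shape (K, H, W), gradient of the target class score
                 with respect to those feature maps.
    """
    # One weight per feature map: global average of its gradient.
    weights = gradients.mean(axis=(1, 2))                 # shape (K,)
    # Weighted sum of the feature maps.
    cam = np.tensordot(weights, activations, axes=1)      # shape (H, W)
    # Keep only regions with a positive influence on the class score.
    cam = np.maximum(cam, 0.0)
    # Rescale to [0, 1] for display (assumption, not part of the definition).
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam


# Toy example with two 2x2 feature maps.
acts = np.stack([np.array([[1.0, 0.0], [0.0, 0.0]]),
                 np.array([[0.0, 0.0], [0.0, 1.0]])])
grads = np.stack([np.ones((2, 2)), -np.ones((2, 2))])
print(grad_cam(acts, grads))
```

The heatmap has the resolution of the last convolutional layer and is upsampled to the input size for overlay, which is one reason small lesions can be hard to localize precisely.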

When applying Grad-CAM to images of healthy colon tissue (see Figure 6), a strong correlation is observed between the heatmaps and relevant anatomical regions, indicating that the model has effectively learned to identify the distinctive features of a healthy colon.

However, when analyzing images of lungs with large cell carcinoma (see Figure 7), we encountered limitations in accurately localizing the lesions. This difficulty may be attributed to the heterogeneity of the lesions, the complexity of lung anatomy, and the inherent limitations of Grad-CAM.

Clinical Implications and Future Directions

The results obtained underscore the importance of interpretability in deep learning models. While the model has proven effective in classifying images of healthy colon tissue, there is a need to enhance its ability to accurately localize lesions in more complex cases. To address these limitations, we propose exploring more advanced preprocessing techniques, investigating neural network architectures that incorporate spatial attention modules, and examining post-processing techniques to refine the heatmaps generated by Grad-CAM.

In conclusion, the application of Grad-CAM has provided valuable insights into the internal workings of our DenseNet121 model. While we have achieved promising results, further research is necessary to improve the model’s capability to detect more complex lesions. The findings of this study open new avenues for developing more precise and reliable artificial intelligence tools for computer-aided diagnosis.

3.5. Proposed Model

Based on the experimental results obtained, a customized DenseNet121 architecture was selected as the definitive model for medical image classification. This choice is based on superior performance compared to ResNet50, as detailed in Section 3.2.

Model Architecture:
The proposed model (see Figure 7) consists of two main stages:

1. Feature Extraction:
– A pre-trained DenseNet121 convolutional neural network is used to extract high-level features from the input images.
– The weights of the convolutional layers in DenseNet121 are frozen to leverage prior knowledge and accelerate training.
– The final fully connected layer of DenseNet121 is replaced with a new custom layer featuring 12 output neurons, corresponding to the classes to be classified.

2. Classification:
– The feature vector extracted by DenseNet121 is fed into a classifier based on Support Vector Machines (SVM).
– The SVM performs the final classification, assigning each image to one of the 12 predefined diagnostic categories.

Model Innovation and Implementation

The main innovation of this model consists of adapting the DenseNet121 architecture for the specific task of multiclass classification of medical images. By combining the advantages of deep convolutional neural networks with the generalization capability of SVMs, a robust and efficient model for the detection and classification of different types of cancer has been obtained.

The implementation of the model, resulting from experimentation and analysis of modified architectures based on ResNet50 and DenseNet121, was realized using the torchvision library. In the adapted version of DenseNet121, a last fully connected layer was included to adapt it to 12 output classes. This approach allowed optimizing the model for the accurate classification of medical images into different cancer categories.

The proposed model, based on DenseNet121, has proven to be an effective solution for medical image classification. Its robust architecture and combination of deep learning and traditional machine learning techniques allow obtaining accurate and reliable results, standing out as a valuable tool in cancer diagnosis and treatment.

4. CONCLUSIONS, LIMITATIONS AND RECOMMENDATIONS

4.1. Conclusions

The use of bibliometric tools has become an essential element for understanding and guiding research projects in the field of biomedical image classification. In the development of this applied study, bibliometric analysis not only established an initial foundation but also provided a detailed overview of current trends and advancements in this area. This information laid the groundwork for the project and allowed for the identification of the relevance of ResNet50 and DenseNet121 neural network architectures due to their performance.

Upon evaluating both architectures with the dataset, DenseNet121 demonstrated superior performance, achieving an accuracy of 91%, compared to the 86% obtained by ResNet50. This confirms the potential of DenseNet121 as an effective tool in the classification of biomedical images. The importance of this research lies in the ongoing quest to improve cancer diagnosis and treatment, as the creation of pathophysiological models from medical images represents a considerable advancement in this direction.

Accurate predictions are crucial, as they not only provide insights for cancer staging in patients with breast, colon, and lung cancer but also offer details about the specific type of neoplasm present in a patient. Considering the high prevalence and severity of breast, colon, and lung cancer, advancements in their diagnosis are of vital importance, especially those works that contribute to distinguishing between benign and malignant lesions in colorectal cancer and recognizing different types of lung cancer.

This project goes beyond binary classification in colorectal cancer, achieving not only differentiation between benign and malignant tumors but also staging the disease’s progression. Regarding lung cancer, it manages to classify up to four types of anomalies. Therefore, this research marks significant progress in the classification of cancer types, predicting not only the presence of cancer but also its staging, which constitutes a crucial advancement in the timely diagnosis and treatment of patients.

However, it is essential to emphasize that the use of Artificial Intelligence (AI) in medicine requires caution. Although AI can offer valuable results, its outputs must always be interpreted and validated by medical professionals, as there is a risk that the results may present certain biases. While the model has shown promise, it is crucial to recognize that science and technology are constantly evolving, and the path to excellence in cancer diagnosis is a continuous journey. This project is a crucial step in that direction.

4.2. Limitations

The project faced significant challenges, the most prominent being obtaining data sets (Field et al., 2021; Bertsimas & Wiberg, 2020). The confidential and sensitive nature of patients’ medical information restricts access to a large and diverse amount of data. This limitation posed difficulties in the generalization and robustness of the model, as working with a limited data set may not reflect the diversity and variability of real cases in the population.

In addition, current techniques for image detection, lesion segmentation, and qualitative analysis still present challenges. There is a lack of a uniform operational standard to ensure consistent results from detection to feature extraction. Current algorithms require greater specificity to address different types and qualities of images, as well as individual patient variations. The accuracy of AI-assisted diagnostics remains strongly linked to the quality of the images obtained and the improvement of existing algorithms (Zheng et al., 2023). It is essential to emphasize that the true clinical impact of AI-assisted diagnosis has yet to be confirmed by large randomized clinical trials.

4.3. Recommendations

To enhance the predictive efficacy and practical applicability of the model in future research and clinical applications, it is imperative to focus on the growth and diversification of the dataset. The expansion should encompass a greater quantity and variety of diagnostic imaging modalities, including magnetic resonance imaging (MRI), positron emission tomography (PET), computed tomography (CT), mammography, and radiography. Incorporating these different modalities enables the model to interpret the varied contrasts, resolutions, and anatomical details each imaging technique provides with greater precision.

To ensure the dataset is representative of diverse clinical presentations, it is crucial to select images that represent a variety of cancer subtypes and treatment responses. A data enrichment strategy that considers these variables will enhance the model’s accuracy and robustness.

Maintaining a high standard of quality in the images and the information they provide is key. Utilizing radiogenomic techniques (Parmar et al., 2015; Zhang et al., 2023) can be highly beneficial for achieving advanced segmentation of regions of interest (ROI), facilitating precise cancer detection and classification.

Close collaboration with healthcare professionals is indispensable for the curation and validation of the dataset, ensuring that the features selected for training are clinically relevant. This not only improves the model’s performance but also optimizes the training process through a more targeted and specific approach.

For greater accuracy in staging, the number of labels should be expanded based on the TNM system, allowing for detailed classification of disease phases (Nicora et al., 2020). The complexity and heterogeneity of different cancer types justify the exploration of specific staging and classification models that correspond to the particular characteristics of each cancer type.
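One possible way to expand the label set is sketched below: each image is annotated with a composite label (cancer site plus simplified T, N, and M categories), and the distinct labels are mapped to class indices. The `TNMLabel` type, the `build_label_index` helper, the simplified category ranges, and the example annotations are all hypothetical and are not part of the model described in this study.

```python
from typing import NamedTuple


class TNMLabel(NamedTuple):
    """Hypothetical composite label: cancer site plus simplified TNM categories."""
    site: str  # e.g. "breast", "colon", "lung"
    t: int     # primary tumor category, 1-4 (simplified; omits TX, T0, Tis)
    n: int     # regional lymph node category, 0-3 (simplified)
    m: int     # distant metastasis, 0 or 1


def build_label_index(annotations):
    """Map each distinct label found in the annotations to a class index."""
    distinct = sorted(set(annotations))
    return {label: idx for idx, label in enumerate(distinct)}


# Made-up annotations for illustration only; not the study's data.
example_annotations = [
    TNMLabel("breast", 1, 0, 0),
    TNMLabel("breast", 2, 1, 0),
    TNMLabel("colon", 3, 1, 0),
    TNMLabel("breast", 1, 0, 0),  # repeated label maps to the same class
    TNMLabel("lung", 4, 2, 1),
]

label_index = build_label_index(example_annotations)
# Number of outputs a classifier head would need for this label set.
num_classes = len(label_index)
```

Because the number of site and TNM combinations grows quickly, a flat label space of this kind may leave many classes with few examples, which is one reason to consider separate staging models per cancer type.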

In summary, the expansion of the dataset should strive for a balance between improving the model’s generalization capability and maintaining precision in predicting cancer stages. A methodology that combines an extensive and varied dataset, advanced radiomic techniques, a collaborative approach with medical professionals, and specialized attention to precise staging could lead to significant advancements in early detection and personalized cancer treatment.

5. REFERENCES

1. Alavanja, M. C. R., Hoppin, J. A., & Kamel, F. (2004). Health effects of chronic pesticide exposure: Cancer and neurotoxicity. Annual Review of Public Health, 25, 155–197.
2. Barragán-Montero, A. (2021). Artificial intelligence and machine learning for medical imaging: A technology review. Physica Medica – European Journal of Medical Physics. https://www.physicamedica.com/article/S1120-1797(21)00173-3/fulltext
3. Bertsimas, D., & Wiberg, H. (2020). Machine learning in oncology: Methods, applications, and challenges. JCO Clinical Cancer Informatics, 4, 885–894. https://doi.org/10.1200/cci.20.00072
4. Davatzikos, C. (2020). PRIMAGE project: Predictive in silico multiscale analytics to support childhood cancer personalized evaluation empowered by imaging biomarkers. ProQuest. https://www.proquest.com/docview/2385884442
5. Erratum: Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. (2020). CA: A Cancer Journal for Clinicians, 70(4), 313. https://doi.org/10.3322/caac.21609
6. Field, M., Hardcastle, N., Jameson, M., Aherne, N. J., & Holloway, L. (2021). Machine learning applications in radiation oncology. Physics and Imaging in Radiation Oncology, 19. https://doi.org/10.1016/j.phro.2021.05.007
7. Gordillo, N., Montseny, E., & Sobrevilla, P. (2013). State-of-the-art survey on MRI brain tumor segmentation. Magnetic Resonance Imaging, 31(8), 1426–1438. https://doi.org/10.1016/j.mri.2013.05.002
8. Han, S. S., Kim, M. S., Lim, W., Park, G. H., Park, I., & Chang, S. E. (2018). Classification of the Clinical Images for Benign and Malignant Cutaneous Tumors Using a Deep Learning Algorithm. Journal of Investigative Dermatology, 138(7), 1529–1538. https://doi.org/10.1016/j.jid.2018.01.028
9. Haug, C., & Drazen, J. M. (2023). Artificial Intelligence and Machine Learning in Clinical Medicine, 2023. The New England Journal of Medicine, 388(13), 1201-1208. https://doi.org/10.1056/nejmra2302038
10. Ijaz, M., Ashraf, I., Zahid, U., Yasin, A., Ali, S., Attique Khan, M., … & Zhang, Y. D. (2023). A Decision Support System for Lung Colon Cancer Classification using Fusion of Deep Neural Networks and Normal Distribution based Gray Wolf Optimization. ACM Transactions on Asian and Low-Resource Language Information Processing. https://doi.org/10.1145/3625096
11. Jemal, A., Bray, F., Center, M. M., Ferlay, J., Ward, E., & Forman, D. (2011). Global cancer statistics. CA: A Cancer Journal for Clinicians, 61(2), 69–90. https://doi.org/10.3322/caac.20107
12. Lang, K., et al. (2023). Mammography Screening with Artificial Intelligence trial (MASAI): a clinical safety analysis. The Lancet Oncology. https://doi.org/10.1016/S1470-2045(23)00298-X
13. MINSAL. (2019). Plan Nacional de Cáncer. https://www.minsal.cl/wp-content/uploads/2019/01/2019.01.23_PLAN-NACIONAL-DE-CANCER_web.pdf
14. Nicora, G., Vitali, F., Dagliati, A., Geifman, N., & Bellazzi, R. (2020). Integrated Multi-Omics Analyses in Oncology: A review of machine learning methods and tools. Frontiers in Oncology, 10. https://doi.org/10.3389/fonc.2020.01030
15. NIH (2022). Cancer Staging. NCI – National Cancer Institute. https://www.cancer.gov/about-cancer/diagnosis-staging/staging
16. Parmar, C., Großmann, P., Bussink, J., Lambin, P., & Aerts, H. J. (2015). Machine learning methods for quantitative radiomic biomarkers. Scientific Reports, 5(1). https://doi.org/10.1038/srep13087
17. Pattanaik, R. K., Mishra, S., Siddique, M., GopiKrishna, T., & Satapathy, S. (2022). Breast cancer classification from mammogram images using Extreme Learning Machine-Based DenseNet121 model. Journal of Sensors, 2022, 1-12. https://doi.org/10.1155/2022/2731364
18. Pereira, S., Pinto, A., Alves, V., & Silva, C. A. (2016). Brain Tumor Segmentation Using Convolutional Neural Networks in MRI Images. IEEE Transactions on Medical Imaging, 35(5), 1240–1251. https://doi.org/10.1109/TMI.2016.2538465
19. PyTorch. (2023, August). DenseNet121 architecture. DenseNet by PyTorch Team: Dense Convolutional Network (DenseNet) connects each layer to every other layer in a feed-forward fashion. https://pytorch.org/hub/pytorch_vision_densenet/
20. Raza, H., John, A., & Nemmar, A. (2013). Short-term effects of nose-only cigarette smoke exposure on glutathione redox homeostasis, cytochrome P450 1A1/2 and respiratory enzyme activities in mice tissues. Cellular Physiology and Biochemistry: International Journal of Experimental Cellular Physiology, Biochemistry, and Pharmacology, 31(4–5), 683–692. https://doi.org/10.1159/000350087
21. Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2020). Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. International Journal of Computer Vision, 128(2), 336–359. https://doi.org/10.1007/s11263-019-01228-7
22. Torre, L. A., Bray, F., Siegel, R. L., Ferlay, J., Lortet-Tieulent, J., & Jemal, A. (2015). Global cancer statistics, 2012. CA: A Cancer Journal for Clinicians, 65(2), 87–108. https://doi.org/10.3322/caac.21262
23. Tseng, H. E., Wei, L., Cui, S., Luo, Y., Haken, R. T., & Naqa, I. E. (2018). Machine learning and imaging informatics in oncology. Oncology, 98(6), 344-362. https://doi.org/10.1159/000493575
24. Zhang, T., Tan, T., Samperna, R., Li, Z., Gao, Y., Wang, X., Han, L., Yu, Q., Beets‐Tan, R. G. H., & Mann, R. M. (2023). Radiomics and Artificial intelligence in breast imaging: a survey. Artificial Intelligence Review. https://doi.org/10.1007/s10462-023-10543-y
25. Zheng, D., He, X., & Jing, J. (2023). Overview of artificial intelligence in breast cancer medical imaging. Journal of Clinical Medicine, 12(2), 419. https://doi.org/10.3390/jcm12020419