Screening Applications & Diagnostics

Poster Session B

(1031-B) Diagnostic of mammary carcinoma in dogs: a machine learning spectroscopy approach

Wednesday, May 29, 2024

10:30 - 11:15 CEST

Location: Exhibit Hall

Poster Presenter(s)

Maria I. da Silva

Postdoctoral researcher
Universidade Federal de Minas Gerais
Belo Horizonte, Minas Gerais, Brazil

Abstract: The development of new techniques for cancer diagnosis is of great importance. In this work we present a new approach to diagnosing mammary carcinoma from canine mammary gland samples using machine learning spectroscopy. Mammary gland samples from female dogs were measured by variable angle spectroscopic ellipsometry (VASE). Measurements were carried out at 6 different incidence angles over a spectral range from 245 nm to 1700 nm. Samples were obtained from 12 different dogs and measured in duplicate. In this way, a large data set of approximately 0.6 million measured data points was obtained for each sample. The traditional approach to ellipsometry data analysis, which involves constructing and fitting an optical model of the sample, was not pursued in this case because of the great complexity of the samples. Instead, a machine learning approach was used for binary classification of the samples. Five machine learning models were considered: K-Nearest Neighbors (KNN) [1], Logistic Regression (LR) [2], Support Vector Machine (SVM) [3], eXtreme Gradient Boosting (XGBoost) [4], and Multilayer Perceptron (MLP) [5]. Model parameters and hyperparameters were optimized with the Optuna library to improve accuracy. Within the cross-validation procedure, several treatments were applied to ensure reliable results: outlier identification using the KNN algorithm, data rebalancing with SMOTE, and feature selection with the Select From Model technique. After optimization, each model was evaluated with 30 repetitions of 5-fold cross-validation, yielding scores for a range of validation metrics. The best classification model achieved an AUC of 0.89.
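The evaluation loop mentioned in the abstract (30 repetitions of 5-fold cross-validation scored by AUC) can be illustrated with a minimal sketch. This is not the authors' code: it uses only the Python standard library, toy Gaussian data unrelated to the study, and a plain nearest-neighbor scorer as a stand-in classifier. The Optuna search, SMOTE rebalancing, outlier removal, and feature selection steps are not reproduced, and all function names (`auc_score`, `repeated_stratified_kfold`, `knn_score`) are hypothetical.

```python
import random
from collections import defaultdict

def auc_score(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def repeated_stratified_kfold(labels, n_splits=5, n_repeats=30, seed=0):
    """Yield (train_idx, test_idx); every repeat reshuffles the samples within each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for idx in by_class.values():
            shuffled = idx[:]
            rng.shuffle(shuffled)
            # Deal class members round-robin so each fold keeps the class ratio.
            for j, i in enumerate(shuffled):
                folds[j % n_splits].append(i)
        for k in range(n_splits):
            test = sorted(folds[k])
            train = sorted(i for f in folds[:k] + folds[k + 1:] for i in f)
            yield train, test

def knn_score(train_X, train_y, x, k=3):
    """Score a sample as the fraction of positive labels among its k nearest neighbors."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(train_X, train_y)
    )
    return sum(y for _, y in dists[:k]) / k

# Toy data (an assumption for illustration): two 2-D Gaussian classes, 20 samples each.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(20)]
X += [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(20)]
y = [0] * 20 + [1] * 20

aucs = []
for train, test in repeated_stratified_kfold(y, n_splits=5, n_repeats=30, seed=1):
    scores = [knn_score([X[i] for i in train], [y[i] for i in train], X[i]) for i in test]
    aucs.append(auc_score([y[i] for i in test], scores))

mean_auc = sum(aucs) / len(aucs)
print(f"{len(aucs)} folds, mean AUC = {mean_auc:.3f}")
```

Repeating the split with different shuffles yields 150 fold-level scores instead of 5, which gives a steadier estimate of the mean and its spread on small data sets. In a full pipeline, steps such as rebalancing and feature selection would normally be fitted on the training indices of each fold only, so that no information from the test fold leaks into the model.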

We would like to acknowledge the Brazilian agencies CNPq (grants 409327/2022-0 and 302632/2022-0) and FAPEMIG (grant RED-00135-22) for the financial support of this work.

References
1. Fix, E. and J. L. Hodges, “Discriminatory Analysis. Nonparametric Discrimination: Consistency Properties,” California Univ., Berkeley. Electronics Research Laboratory, 1951.
2. Cox, D. R., “The regression analysis of binary sequences,” Journal of the Royal Statistical Society: Series B (Methodological), Vol. 20, No. 2, 215–232, 1958.
3. Cortes, C. and V. Vapnik, “Support-Vector Networks,” Mach. Learn., Vol. 20, No. 3, 273–297, 1995.
4. Chen, T. and C. Guestrin, “XGBoost: A scalable tree boosting system,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794, 2016.
5. Haykin, S., “Neural networks: a comprehensive foundation,” Prentice Hall PTR, 1998.