Full professor Universidade Federal De Minas Gerais Belo Horizonte, Minas Gerais, Brazil
Abstract: The development of new techniques for recognition of tumoral and non-tumoral tissues at the cellular level are of great importance for the diagnostic of cancer. In this work we present an important step in that direction with a new in-vitro label-free spectroscopic approach capable of differentiating samples of tumoral and non-tumoral cell lines by using machine learning spectroscopy techniques. Cultures of epidermoid carcinoma A431 cells, hypopharyngeal carcinoma FaDU cells, squamous carcinoma SCC-4 cells, and epidermal keratinocyte HaCat cells were used in this study. These cells were grown on 22 mm x 22 mm Corning #1½ glass coverslips and measured by variable angle spectroscopic ellipsometry in the spectral range from 240 nm to 1700 nm. Ellipsometry properties such as , , Re(), Im(), and Depolarization were retrieved. In this way, a large amount of data with approximately 0.6 million measured data points was obtained for each sample. Due to the large complexity of this biological samples the traditional approach for the analysis of ellipsometry data, by building and fitting an optical model for the sample, was not considered. As an alternative, a machine learning approach was used to classify the samples in the different cell lines. Interesting results were obtained by focusing on the machine learning processing of Re() data at an angle of incidence of 55 degrees. Three types of ML models were used: Support Vector Machine (SVM) [1], eXtreme Gradient Boosting (XGBoost) [2], and Multilayer Perceptron (MLP) [3]. Two-class and multi-class classification processes were employed, with a robust parameter and hyperparameter optimization method implemented to increase the accuracy of results using the Optuna library. During the cross-validation process, important treatments were taken to ensure the reliability and effectiveness of the results, including outlier detection with the K-Nearest Neighbors (KNN) algorithm, data balancing with SMOTE, and feature selection with the Select From technique Model. After optimizing the machine learning models, a series of 30 10-fold cross-validation experiments were conducted, generating scores for 6 validation metrics. For a multiclass scenario, a code was developed that classifies the four types of cells by comparing information from each one with the others. These results demonstrate the potential of machine learning spectroscopy as a label-free, facile and accurate platform for the future diagnostic of cancer at the cellular level.
We would like to acknowledge the Brazilian agencies CNPq (grant 409327/2022-0, grant302632/2022-0) and FAPEMIG (grant RED-00135-22) for the financial support of this work.
References 1. Cortes, C. and V. Vapnik, “Support-Vector Networks,” Mach. Learn., Vol. 20, No. 3, 273–297, 1995. 2. Chen, T. and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, 785–794, 2016. 3. Haykin, S., “Neural networks: a comprehensive foundation,” Prentice Hall PTR, 1998.