INTERNATIONAL JOURNAL OF SCIENTIFIC DEVELOPMENT AND RESEARCH International Peer Reviewed & Refereed Journals, Open Access Journal ISSN Approved Journal No: 2455-2631 | Impact factor: 8.15 | ESTD Year: 2016
A machine learning integrated bioinformatics analysis for tissue specific breast cancer gene classification
Authors Name:
Ghazala Sultan, Dr. Swaleha Zubair
Unique Id:
IJSDR2301017
Published In:
Volume 8 Issue 1, January-2023
Abstract:
Machine learning techniques have been used extensively in the early stages of biomedical research to analyze large datasets. This study aimed to develop machine learning models with strong predictive power and interpretability for classifying genes between normal and cancer samples based on their expression levels in tissues of different origin. We collected candidate features from the clinical features of the samples and generated a filtered set of related features from the original feature set. The best features were then selected through feature evaluation for the classification of cancer-specific genes. We used 70% of the data for training and validation and the remaining 30% as a test dataset, with 7110 features from epithelial and stromal tissue. To develop the cancer gene prediction model, we considered five machine learning algorithms: logistic regression, random forest (RF), support vector machine (SVM), k-nearest neighbor (KNN), and C5.0. The random forest model performed best, producing the highest validation accuracy: a classification accuracy of 95%, sensitivity of 0.926, specificity of 0.915, and AUC of 0.970. The developed prediction models show high accuracy, sensitivity, specificity, and AUC in distinguishing cancerous from healthy samples. This model could be used to predict BRCA in other patients with cancer of epithelial or stromal origin. The study suggests that combining multiple learning models may increase cancer prediction accuracy.
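The abstract reports accuracy, sensitivity, specificity, and AUC on a 30% held-out test set. The following is a minimal sketch of how such figures are conventionally computed; it is not the authors' code. The function names, the 0.5 decision threshold, the random seed, and the toy labels and scores are all illustrative assumptions, and the sketch assumes both classes are present in the test labels.

```python
import random


def train_test_split(samples, test_fraction=0.3, seed=0):
    """Shuffle a list and return (train, test), holding out test_fraction."""
    rng = random.Random(seed)      # fixed seed is an assumption, for repeatability
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, 1 = cancer, 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true, y_score, threshold=0.5):
    """Compute the four metrics named in the abstract from predicted scores."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "auc": auc(y_true, y_score),
    }


# Toy example (hypothetical labels and classifier scores, not study data).
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
print(evaluate(labels, scores))
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two kinds of figures are usually reported together.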
Keywords:
Breast Cancer, Machine Learning, Supervised Learning, Unsupervised Clustering, Gene Classification
Cite Article:
"A machine learning integrated bioinformatics analysis for tissue specific breast cancer gene classification", International Journal of Science & Engineering Development Research (www.ijsdr.org), ISSN:2455-2631, Vol.8, Issue 1, page no.93 - 98, January-2023, Available :http://www.ijsdr.org/papers/IJSDR2301017.pdf
Publication Details:
Published Paper ID: IJSDR2301017
Registration ID:203361
Published In: Volume 8 Issue 1, January-2023
Page No: 93 - 98
Publisher: IJSDR | www.ijsdr.org
ISSN Number: 2455-2631