Abstract
Lung cancer is among the deadliest diseases worldwide, and improving survival rates requires early detection. Diagnosing lung cancer from imaging modalities is inherently subjective, which motivates deep-learning-assisted computer-aided techniques. The accuracy of such techniques, however, remains the major concern. This work aims to improve lung cancer diagnosis from histopathological images using a previously unexplored combination of multiple self-supervised learning techniques, filter-based feature selection, and a Vision Graph Convolutional Network. The main contribution is an optimized feature fusion that brings together the complementary strengths of three self-supervised learning approaches: contrastive alignment, redundancy reduction, and semantic grouping. First, key features are extracted from the histopathological images using a custom Convolutional Neural Network. Three complementary self-supervised learning methods (DeepCluster, Bootstrap Your Own Latent, and Simple Framework for Contrastive Learning of Visual Representations) are then used to refine these features separately. The refined features are combined, and the most significant features of the combined set are identified and retained using a filter-based feature selection technique, Minimum Redundancy Maximum Relevance. A Vision Graph Convolutional Network serves as the classifier. Initial experiments are carried out on 691 histopathological images extracted from the publicly available LungHist700 dataset, which fall into three categories: 151 normal, 280 lung adenocarcinoma, and 260 lung squamous cell carcinoma. The proposed approach achieves a balanced accuracy of 97%, whereas the plain Vision Graph Convolutional Network achieves only 86%.
Further validation is performed on two additional publicly available datasets, LC25000 and TCGA UT. The experimental results show that the proposed approach improves lung cancer prediction performance on all three datasets.
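To make the fusion and selection step concrete, the following is a minimal sketch of greedy Minimum Redundancy Maximum Relevance selection over a fused feature set. It is an illustration under stated assumptions, not the implementation used in this work: fusion is assumed to be plain concatenation of the per-branch feature columns, features are assumed to be discretized, the difference form of the criterion (relevance minus mean redundancy) is used, and the names `mutual_info`, `mrmr_select`, `branch_a`, `branch_b`, and `branch_c` as well as the toy data are hypothetical.

```python
from collections import Counter
from math import log


def mutual_info(x, y):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # sum over observed pairs of p(a,b) * log(p(a,b) / (p(a) * p(b)))
    return sum((c / n) * log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def mrmr_select(columns, labels, k):
    """Greedy mRMR (difference form): at each step pick the feature that
    maximizes relevance to the labels minus its mean mutual information
    with the features already selected. Returns selected column indices."""
    relevance = [mutual_info(col, labels) for col in columns]
    selected, remaining = [], list(range(len(columns)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(mutual_info(columns[j], columns[s])
                         for s in selected) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (invented for illustration): 8 samples, binary labels.
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Hypothetical discretized features from three refinement branches,
# stored column-wise (one list per feature).
branch_a = [[0, 0, 0, 1, 1, 1, 1, 1]]            # informative
branch_b = [[0, 0, 0, 1, 1, 1, 1, 1],            # exact duplicate of the first
            [0, 0, 1, 0, 1, 1, 0, 1]]            # weaker, less redundant
branch_c = [[0, 1, 0, 1, 0, 1, 0, 1]]            # independent of the labels

# Assumed fusion: concatenate the feature columns of all branches.
fused = branch_a + branch_b + branch_c

selected = mrmr_select(fused, labels, k=2)
print(selected)  # [0, 2]
```

In this toy case the duplicate column (index 1) is as relevant as the first one, but it is skipped in favor of the weaker, less redundant column (index 2), which is the behavior the redundancy term is meant to produce.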