Classification of Lung Cancer using Vision Transformer on Histopathological Images
DOI: https://doi.org/10.30865/json.v7i3.9399

Keywords: Lung Cancer Classification, Histopathological Image Analysis, Vision Transformer (ViT), Deep Learning in Medical Imaging, Computer-Aided Diagnosis (CAD)

Abstract
Lung cancer is the leading cause of cancer-related deaths worldwide, and early diagnosis is often hindered by morphological variation in histopathological images. The central problem is accurately and rapidly distinguishing cancer types such as adenocarcinoma and squamous cell carcinoma from benign tissue. This research takes histopathological images as input and produces a three-class classification: adenocarcinoma, squamous cell carcinoma, and benign tissue. Early detection of lung cancer can improve survival rates by up to 50%, but manual diagnosis by pathologists depends on subjective experience, causing error rates of up to 20% in ambiguous cases. In developing countries such as Indonesia, for example, the shortage of pathologists exacerbates treatment delays. This gap demands a reliable automated approach to support more timely clinical decisions. The proposed solution implements the Vision Transformer (ViT) in two architectures: ViT-B/16 (base model, 86 million parameters) and ViT-L/16 (large model, 304 million parameters). Histopathological images are normalized and divided into 16×16-pixel patch embeddings, and features are extracted with the self-attention mechanism. The models are trained with transfer learning from ImageNet-21k and fine-tuned on a lung cancer histopathology image dataset. The pipeline includes splitting the data into training (70%), validation (15%), and testing (15%) sets, as well as data augmentation to improve robustness. The ViT-B/16 model achieved a testing accuracy of 98.40% with an F1-score of 0.984, while ViT-L/16 achieved an accuracy of 98.18% with an F1-score of 0.982. Both models detected benign tissue perfectly (precision 1.00). The average AUC-ROC reached 0.999 for ViT-B/16 and 0.998 for ViT-L/16, indicating very high discriminative power.
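The patch-embedding step described in the abstract can be illustrated with a short sketch. This is not the authors' code: the 224×224 input resolution is an assumption (the standard ViT pretraining size), and the image is a random stand-in for a normalized histopathology slide.

```python
import numpy as np

# Illustrative sketch (not the paper's implementation): cutting an image into
# the 16x16-pixel patches that ViT-B/16 and ViT-L/16 consume.
H = W = 224          # assumed input resolution (standard for ViT pretraining)
P = 16               # patch size for both ViT-B/16 and ViT-L/16
img = np.random.rand(H, W, 3)  # stand-in for a normalized RGB histopathology image

# Split into non-overlapping PxP patches, then flatten each patch to a vector.
patches = (img.reshape(H // P, P, W // P, P, 3)
              .swapaxes(1, 2)
              .reshape(-1, P * P * 3))
print(patches.shape)  # (196, 768): 14x14 = 196 patches of 16*16*3 = 768 values
```

A learned linear projection then maps each flattened patch into the transformer's embedding width (768 dimensions for ViT-B/16, 1024 for ViT-L/16) before the self-attention layers operate on the resulting sequence.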
The main contribution of this research is a comprehensive comparison between two scales of Vision Transformer for automated lung cancer diagnosis, demonstrating that the smaller model (ViT-B/16) can match or exceed the larger model's performance at a lower computational cost.
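The F1-scores reported above (0.984 and 0.982) are macro-averages over the three classes. As a small, self-contained sketch of how that metric is computed, the following uses made-up toy predictions, not the paper's data:

```python
# Toy illustration of the macro F1-score for a three-class task.
# Labels: 0 = adenocarcinoma, 1 = squamous cell carcinoma, 2 = benign tissue.
y_true = [0, 0, 1, 1, 2, 2]  # hypothetical ground-truth labels
y_pred = [0, 1, 1, 1, 2, 2]  # hypothetical model predictions

def macro_f1(y_true, y_pred, n_classes=3):
    """Average the per-class F1 over all classes (macro averaging)."""
    f1s = []
    for c in range(n_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / n_classes

print(round(macro_f1(y_true, y_pred), 3))  # 0.822
```

Macro averaging weights each class equally, which matters here because a model could otherwise score well while failing on a minority class.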
License
Copyright (c) 2026 Jurnal Sistem Komputer dan Informatika (JSON)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).

