Lung cancer is the second most common cancer type and the leading cause of cancer death globally, with smoking and advancing age as the leading causal risk factors. The USPSTF guidelines for lung cancer screening recommend annual screening for select current or former smokers over 50 years of age. While annual screening via low-dose CT has been demonstrated to decrease lung cancer mortality, compliance with screening guidelines remains low. Additional prognostic tools for stratifying future lung cancer risk, particularly those that do not rely on immutable demographic and health-history factors, may help increase screening compliance and monitor changing risk over time.
Using modified-aptamer proteomics technology, the SomaScan® assay v4.0 (Fig 1), we measured ~5000 proteins in 6085 EDTA plasma samples from “Ever Smokers” (current or former smokers, aged 50-73) with no known prevalent cancer at visit 3 of the Atherosclerosis Risk in Communities (ARIC) study, for a total of ~30 million protein measurements. A total of 348 incident lung cancer diagnoses occurred in this sample set, 75 of them within 5 years of the visit 3 blood draw. Time to lung cancer diagnosis was modeled from protein measurements using machine learning methods in 70% of the ARIC visit 3 ever smokers. A model was selected based on performance in a 15% holdout subset and validated in the remaining 15% of ARIC visit 3 samples, which were not used for model training or selection.
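The 70% / 15% / 15% partition described above can be illustrated with a minimal sketch. The abstract does not say how samples were assigned to each subset, so the simple random shuffle, the seed, and the function name `split_indices` are all assumptions made for illustration only.

```python
import random

def split_indices(n_samples, seed=0):
    """Shuffle sample indices and split them roughly 70% / 15% / 15%.

    Assumption: a plain random shuffle. The source does not state whether
    the actual split was stratified (e.g., by event status).
    """
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)

    n_train = int(0.70 * n_samples)   # model training
    n_select = int(0.15 * n_samples)  # model selection (holdout)

    train = order[:n_train]
    select = order[n_train:n_train + n_select]
    validate = order[n_train + n_select:]  # remainder, used once for validation
    return train, select, validate

# 6085 is the number of plasma samples reported in the abstract.
train_idx, select_idx, validate_idx = split_indices(6085)
```

Keeping the validation subset out of both training and model selection is what allows its AUC to be read as an estimate of performance on unseen samples.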
A 7-feature protein-only accelerated failure time (AFT) Weibull model was developed to predict the probability of a lung cancer diagnosis within 5 years of blood draw. The model achieved AUCs of 0.76, 0.72, and 0.83 in the training, model selection, and validation datasets, respectively. Based on predicted probabilities from the model, individuals were stratified into 3 risk bins (low, medium, and high), with 5-year event rates of 0.49% and 2.74% in the low- and high-risk bins, respectively.
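As a rough illustration of how a Weibull AFT model yields a 5-year event probability, the sketch below uses one common parameterization: survival S(t) = exp(-(t / scale)^shape), with scale = exp(intercept + sum of coefficient × feature). The abstract does not report the parameterization, the fitted coefficients, or the bin cutoffs, so every number and name here (`weibull_aft_event_probability`, `risk_bin`, the placeholder coefficients and cutoffs) is hypothetical.

```python
import math

def weibull_aft_event_probability(features, coefficients, intercept, shape,
                                  horizon_years=5.0):
    """Probability of an event within `horizon_years` under a Weibull AFT model.

    Assumed parameterization: S(t) = exp(-(t / scale) ** shape),
    with scale = exp(intercept + coefficients . features).
    """
    linear_predictor = intercept + sum(b * x for b, x in zip(coefficients, features))
    scale = math.exp(linear_predictor)
    survival = math.exp(-((horizon_years / scale) ** shape))
    return 1.0 - survival

def risk_bin(probability, low_cutoff, high_cutoff):
    """Map a predicted probability to one of three risk bins.

    The cutoffs are hypothetical; the source does not state them.
    """
    if probability < low_cutoff:
        return "low"
    if probability < high_cutoff:
        return "medium"
    return "high"

# Placeholder values for a 7-feature model (NOT the fitted model).
example_features = [0.2, -0.1, 0.5, 0.0, -0.3, 0.1, 0.4]
example_coefficients = [-0.3, 0.2, -0.4, 0.1, 0.25, -0.15, -0.2]
example_probability = weibull_aft_event_probability(
    example_features, example_coefficients, intercept=5.0, shape=1.2)
example_bin = risk_bin(example_probability, low_cutoff=0.005, high_cutoff=0.02)
```

In an AFT model a larger linear predictor stretches the expected time to diagnosis, so it lowers the predicted probability of an event within the fixed 5-year horizon.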
We successfully developed a blood-based protein-only model that predicts the risk of developing lung cancer in ever smokers. The protein model outperforms traditional risk factors for lung cancer and, because it does not depend on immutable factors, has the potential to provide real-time risk estimates that can be reassessed over time. Proteomics-driven risk stratification may increase adherence to lung cancer screening guidelines and/or encourage positive change in modifiable risk-related behaviors.