Efficient development of prognostic tests for detecting cancer risk using proteomic technology

Background

Prognostic models for assessing future health outcomes can be developed using time-to-event (also known as “survival”) data. This methodology is ubiquitous in the statistical literature and in the analysis of cancer outcomes, but its use in high-dimensional analyses tends to be limited because the methods are difficult to implement in a machine learning environment. Additionally, developing certified prognostic clinical tests that use proteomic biomarkers to detect future cancer risk can be time-consuming, prone to overfitting, and difficult to navigate. We demonstrate the utility of combining SomaScan® proteomic data with pipeline machine learning tools and survival analysis methodology to identify powerful and robust prognostic tests for assessing future risk of cancer that can be certified as laboratory-developed tests (LDTs).

Methods

The data pipeline and analysis tools were developed in R. In addition to standard machine learning techniques, the statistical methods include elastic net accelerated failure time (AFT) models, subsampling survival techniques, and metrics for assessing predictive survival models. The pipeline takes the analyst from data processing and quality control (QC), through identification of optimal models for predicting clinical endpoints, to validation on a hold-out test set.
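
The abstract does not say which metrics were used to assess the survival models. One widely used choice is Harrell's concordance index (C-index), which measures how often a model ranks the higher-risk subject of a pair as the one who has the event sooner. The following is a minimal sketch of that metric in Python rather than the authors' R pipeline. The function name `concordance_index` and the toy data are illustrative assumptions, and pairs with tied observed times are skipped for simplicity.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C-index (hypothetical helper, not the authors' code).

    times       -- observed follow-up time per subject
    events      -- 1/True if the event was observed, 0/False if censored
    risk_scores -- model output where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this simplified sketch
        # a is the subject with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier event had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up for illustration): four subjects, two of them censored.
toy_times = [5, 8, 12, 20]
toy_events = [1, 0, 1, 0]
toy_risk = [0.9, 0.4, 0.6, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risk)
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to random ranking, and 0.0 means every comparable pair is ranked in reverse. In a hold-out validation setting, a metric like this would be computed on the test set only.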

Results

Analysis time, from identification of the optimal proteomic model through validation, was reduced by at least 80%, and prediction variability decreased by up to 90%. This tool has led to seven LDT-certified SomaLogic prognostic tests (using survival methodology) in the last 3 years.

Conclusions

Powerful, proteomics-driven diagnostic tests are not only realizable; they can also be LDT-certified in an efficient, reproducible manner and made robust to real-life variability. Efficient analysis tools allow us to leverage proteomic technology in new ways, leading to tests that can be used for precision medicine applications.

Authors

Yolanda Hagar
Leigh Alexander
Jessica Chadwick
Gargi Datta
Joe Gogain
Rachel Ostroff
Clare Paterson
Laura Sampson
Caleb Scheidel
Sama Shrestha
Chi Zhang
Michael Hinterberg

SomaLogic Operating Co., Inc., Boulder, CO USA
