Prognostic proteomic models for low event rates: A case study with myocardial infarction
Background
Large-scale clinical proteomics provide increasing opportunities for patient risk stratification, especially with multi-marker models derived using machine learning techniques. Prognostic models can be developed as binary risk classifiers, or by using time-to-event data. Survival modeling is ubiquitous in statistical literature, but support for machine learning optimization is more limited in comparison to other regression techniques. We have developed and assessed a novel prognostic model development method combining two statistical techniques – survival analysis and subsampling – using existing machine learning tools in R. These methods were applied to a clinical dataset to identify a highly predictive proteomic model for myocardial infarction (MI) despite a low observed event rate.
Methods
Tools for fitting Cox elastic net models with subsampling were developed in R. Simulations were used to demonstrate the utility and accuracy of subsampling in a survival data context, with comparisons made to logistic regression, Cox elastic net (Coxnet), and SVM models. Following validation of the approach via simulations, models were developed and assessed on the HUNT3 data set (n = 756), which had 61 (8.1%) MI events within four years of blood draw. Proteomic measurements were performed using SomaScan® v4.0 technology.
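The subsampling step described above can be illustrated with a small sketch. The abstract does not specify the subsampling scheme, so this assumes one common variant: every event is kept and non-events are drawn without replacement to reach a chosen ratio, repeated over several random draws. The authors' tools were written in R; this sketch uses Python with only the standard library, and the function names, the `ratio` parameter, and the toy cohort are hypothetical. The cohort is synthetic and only mirrors the reported HUNT3 counts (756 subjects, 61 events within four years).

```python
import random


def subsample_majority(records, ratio=1.0, seed=0):
    """Keep all events and draw non-events without replacement.

    Assumption (not from the source): the number of non-events kept is
    about ratio * number of events, capped at the non-events available.
    """
    rng = random.Random(seed)
    events = [r for r in records if r["event"] == 1]
    non_events = [r for r in records if r["event"] == 0]
    k = min(len(non_events), round(ratio * len(events)))
    return events + rng.sample(non_events, k)


def make_subsamples(records, n_subsamples=10, ratio=1.0, seed=0):
    """Build several independent subsamples, one per seed offset."""
    return [
        subsample_majority(records, ratio=ratio, seed=seed + i)
        for i in range(n_subsamples)
    ]


def event_rate(records):
    """Fraction of records with an observed event."""
    return sum(r["event"] for r in records) / len(records)


# Synthetic cohort with the reported counts: 61 events among 756 subjects.
# Event times are random within four years; non-events are censored at 4.0.
_rng = random.Random(42)
cohort = [
    {
        "id": i,
        "time": _rng.uniform(0.1, 4.0) if i < 61 else 4.0,
        "event": 1 if i < 61 else 0,
    }
    for i in range(756)
]

subsamples = make_subsamples(cohort, n_subsamples=10, ratio=1.0, seed=0)

print(f"full cohort event rate: {event_rate(cohort):.3f}")
print(f"subsample event rate:   {event_rate(subsamples[0]):.3f}")
# Each subsample would then be passed to a penalized Cox fit, keeping the
# (time, event) pairs so that censoring information is preserved.
```

Under these assumptions each subsample keeps the time-to-event information for the retained subjects, so a survival model can still be fit to it. How the per-subsample fits are combined (for example, by averaging coefficients or by counting how often a protein is selected) is a design choice that the abstract does not describe.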
Results
Both the simulation results and the analysis of the HUNT3 data set show improved performance metrics with the subsampled survival method.
Conclusions
Survival analysis with subsampling is a novel combination of techniques that can be applied to proteomic data to improve biomarker discovery and predictive modeling in the context of relatively low incidence rates. Using these methods, sensitivity and specificity were more balanced on real-world hold-out test sets, and simulation results showed improved discrimination metrics for subsampled survival analysis. Additionally, simulations showed that the proteins most highly correlated with MI were selected for the final models, indicating that this method is a promising tool for clinical discovery and prognostic/diagnostic development.
Authors
Y. Hagar
L.E. Alexander
J. Chadwick
G. Datta
M.A. Hinterberg
SomaLogic Operating Co., Inc., Boulder, CO USA