Prognostic proteomic models for low event rates: A case study with myocardial infarction

Background

Large-scale clinical proteomics provides increasing opportunities for patient risk stratification, especially with multi-marker models derived using machine learning techniques. Prognostic models can be developed as binary risk classifiers or from time-to-event data. Survival modeling is ubiquitous in the statistical literature, but support for machine learning optimization is more limited than for other regression techniques. We have developed and assessed a novel prognostic model development method that combines two statistical techniques, survival analysis and subsampling, using existing machine learning tools in R. These methods were applied to a clinical dataset to identify a highly predictive proteomic model for myocardial infarction (MI) despite a low observed event rate.

Methods

Tools for Cox elastic net modeling with subsampling were developed in R. Simulations were used to demonstrate the utility and accuracy of subsampling in a survival data context, with comparisons made to logistic regression, Cox elastic net (Coxnet), and SVM models. Following validation of the approach via simulations, models were developed and assessed on the HUNT3 data set (n = 756), which had 61 (8.1%) MI events within four years of blood draw. Proteomic measurements were performed using SomaScan® v4.0 technology.
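
The abstract does not spell out the subsampling scheme, so the following is only a minimal sketch under a stated assumption: that each subsample keeps every event and draws a random subset of non-events, and that a survival model is then fit on each subsample. It is written in Python for illustration, whereas the authors' tools were developed in R. The function name `balanced_subsamples` and the `ratio` parameter are hypothetical, and the Cox elastic net fit itself is omitted.

```python
import random

def balanced_subsamples(events, n_subsamples=5, ratio=1.0, seed=0):
    """Yield index lists that keep every event plus a random draw of non-events.

    events: list of 0/1 indicators (1 = event observed during follow-up).
    ratio:  non-events drawn per event (hypothetical tuning parameter).
    ASSUMPTION: majority-class undersampling; the source does not confirm
    that this is the scheme the authors used.
    """
    rng = random.Random(seed)
    event_idx = [i for i, e in enumerate(events) if e == 1]
    nonevent_idx = [i for i, e in enumerate(events) if e == 0]
    n_draw = min(len(nonevent_idx), round(ratio * len(event_idx)))
    for _ in range(n_subsamples):
        drawn = rng.sample(nonevent_idx, n_draw)  # sampled without replacement
        yield sorted(event_idx + drawn)

# Toy indicators matching the reported counts: 61 events among 756 samples.
events = [1] * 61 + [0] * (756 - 61)
subsamples = list(balanced_subsamples(events, n_subsamples=5, ratio=1.0, seed=0))

event_rate_full = sum(events) / len(events)
event_rate_sub = sum(events[i] for i in subsamples[0]) / len(subsamples[0])
print(f"full data event rate: {event_rate_full:.3f}")
print(f"subsample event rate: {event_rate_sub:.3f}")
```

Under this assumption, each subsample would be passed to a penalized Cox fit and the resulting models or selected features aggregated across subsamples; how the authors aggregate is not stated in the abstract.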

Results

Simulation results and analysis of the HUNT3 data set both showed improved performance metrics with the subsampled survival method.

Conclusions

Survival analysis with subsampling is a novel combination of techniques that can be applied to proteomic data to improve biomarker discovery and predictive modeling in the context of relatively low incidence rates. Using these methods, sensitivity and specificity metrics were more balanced on real-world hold-out test sets, and simulation results showed improved discrimination metrics using subsampled survival analysis. Additionally, simulations showed that the proteins most highly correlated with MI were selected for final models, indicating that this method is a promising tool for clinical discovery and prognostic/diagnostic development.

Authors

Y. Hagar
L.E. Alexander
J. Chadwick
G. Datta
M.A. Hinterberg

SomaLogic Operating Co., Inc., Boulder, CO USA

