Proteomic models to predict pre-analytical variation


In biomarker discovery, it is critical to assess any pre-analytical variation (PAV) in order to avoid artificial bias in the intended measurements. PAV may arise from both avoidable and unavoidable factors, resulting in misleading data and incorrect conclusions. Proteins, in particular, are vulnerable to variation in collection methods, storage temperatures, and processing protocols. It is vitally important to understand this PAV when analyzing samples using protein assays.


Human EDTA plasma and serum samples were processed using standardized methods with distinct excursions from ideal collection conditions, then assayed on the SomaScan® Platform, which measures ~7,000 analytes. Using machine-learning methods, these quantitative protein measurements were compared with sample-processing truth standards (e.g., time-to-spin) to create predictive models. These models, termed SomaSignal™ tests (SSTs), were developed to enable the assessment of PAV related to processing methods.


SomaSignal tests (SSTs) have been developed to predict time-to-spin, time-to-decant, and time-to-freeze, each reported in hours, for both plasma and serum. Models that predict the number of freeze/thaw cycles a sample has been subjected to have also been developed. All eight models had Lin’s concordance correlation coefficient (CCC) and R² values greater than 0.90 in hold-out validation datasets. In addition to these sample-handling predictions, effect sizes for all ~7,000 measurements have been calculated at multiple time points, or numbers of freeze/thaw cycles, for each model.
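For readers unfamiliar with the two agreement metrics named above, the following is a minimal sketch of how Lin’s CCC and R² can be computed for a set of true and predicted processing times. It is not the code used for the SSTs; the function names and the example values are assumptions made for illustration only.

```python
from statistics import fmean


def lins_ccc(observed, predicted):
    """Lin's concordance correlation coefficient, using population (1/n) moments."""
    n = len(observed)
    mean_o, mean_p = fmean(observed), fmean(predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    # Penalizes both poor correlation and systematic shift between the two series.
    return 2 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_o = fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical time-to-spin values in hours (illustrative, not study data).
true_hours = [0.0, 1.0, 2.0, 3.0]
shifted_hours = [h + 1.0 for h in true_hours]  # perfectly correlated, but biased by +1 h

print(lins_ccc(true_hours, true_hours))     # perfect agreement
print(lins_ccc(true_hours, shifted_hours))  # lower: CCC penalizes the constant bias
print(r_squared(true_hours, shifted_hours))
```

Unlike Pearson correlation, CCC drops when predictions are consistently offset from the truth, which is why it is a natural companion to R² when predictions are reported in absolute units such as hours.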


SomaLogic has developed a unique class of PAV models that assess variation related to processing methods. During biomarker evaluations, the resulting predictions can be used to exclude samples with apparent excessive processing delays, to identify collection-site bias in current and future analyses, and to identify sample groupings that may affect analysis. Further, the effect size metrics for all measurements could be used to remove specific analytes from modeling or could serve as covariates in model development.
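As a rough illustration of how such predictions might be used downstream, the sketch below flags samples whose predicted processing delay exceeds a cutoff and summarizes predictions by collection site. The record fields (`sample_id`, `site`, `predicted_hours`), the 2-hour cutoff, and the data are all hypothetical; they are not part of the SST output format or any SomaLogic recommendation.

```python
from collections import defaultdict
from statistics import median


def flag_delayed(predictions, max_hours):
    """Return IDs of samples whose predicted delay exceeds max_hours."""
    return [p["sample_id"] for p in predictions if p["predicted_hours"] > max_hours]


def median_hours_by_site(predictions):
    """Median predicted delay per collection site, to surface site-level bias."""
    by_site = defaultdict(list)
    for p in predictions:
        by_site[p["site"]].append(p["predicted_hours"])
    return {site: median(hours) for site, hours in by_site.items()}


# Hypothetical predicted time-to-spin values (hours); illustrative only.
predictions = [
    {"sample_id": "S1", "site": "A", "predicted_hours": 0.5},
    {"sample_id": "S2", "site": "A", "predicted_hours": 1.0},
    {"sample_id": "S3", "site": "B", "predicted_hours": 6.0},
]

print(flag_delayed(predictions, max_hours=2.0))
print(median_hours_by_site(predictions))
```

An appropriate cutoff would depend on the study and on how sensitive the analytes of interest are to delay, which is where the per-analyte effect sizes would come in.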


David Astling
Dan Drolet
Joe Gogain
Yolanda Hagar
Laura Sampson
Kaitlin Soucie
Kinsey Trinder
Ira von Carlowitz
Matthew Westacott

SomaLogic Operating Co., Inc., Boulder, CO USA
