Biases introduced by choosing controls to match risk factors of cases in biomarker research.
Abstract
Selecting controls that match cases on risk factors for the outcome is a pervasive practice in biomarker research studies. Such matching, however, biases estimates of biomarker prediction performance. The magnitudes of these biases are unknown.
We examined the prediction performance of biomarkers and the improvement in prediction gained by adding biomarkers to risk factor information. We used data simulated from bivariate normal statistical models and data from a study to identify critically ill patients. We compared true performance with that estimated from case-control studies that did or did not use matching. ROC curves were used to quantify performance. We propose a new statistical method to estimate prediction performance from matched studies for which data on the matching factors are available for subjects in the population.
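The kind of comparison described above can be illustrated with a minimal Python sketch. It is not the authors' simulation code. It assumes a risk factor X and a biomarker Y that are bivariate normal with correlation rho in both cases and controls, with means shifted in cases. The parameter values, the 20 matching strata, and the function names `auc` and `simulate` are all hypothetical choices made for illustration. Under these assumptions the true area under the ROC curve (AUC) for the marker alone is Phi(mu_y / sqrt(2)), about 0.76 for mu_y = 1.

```python
import math
import numpy as np


def auc(case_vals, control_vals):
    """Empirical AUC: P(case value > control value), via the rank-sum statistic."""
    combined = np.concatenate([case_vals, control_vals])
    ranks = combined.argsort().argsort() + 1  # ranks 1..n (continuous data, no ties)
    n1, n0 = len(case_vals), len(control_vals)
    u = ranks[:n1].sum() - n1 * (n1 + 1) / 2  # Mann-Whitney U for the cases
    return u / (n1 * n0)


def simulate(rho=0.7, mu_x=1.0, mu_y=1.0, n_cases=2000,
             n_pool=200_000, n_bins=20, seed=0):
    """Compare marker AUC under random versus risk-factor-matched control sampling.

    All defaults are illustrative assumptions, not values from the study.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    # Column 0 is the risk factor X, column 1 is the biomarker Y.
    cases = rng.multivariate_normal([mu_x, mu_y], cov, size=n_cases)
    pool = rng.multivariate_normal([0.0, 0.0], cov, size=n_pool)  # population controls

    # Unmatched design: a simple random sample of controls.
    random_idx = rng.choice(n_pool, size=n_cases, replace=False)

    # Matched design: frequency-match controls to cases on strata of X.
    edges = np.quantile(cases[:, 0], np.linspace(0, 1, n_bins + 1)[1:-1])
    case_bin = np.digitize(cases[:, 0], edges)
    pool_bin = np.digitize(pool[:, 0], edges)
    matched_idx = np.concatenate([
        rng.choice(np.flatnonzero(pool_bin == b),
                   size=int(np.sum(case_bin == b)), replace=False)
        for b in range(n_bins)
    ])

    return {
        # Phi(mu_y / sqrt(2)) written with the error function.
        "true": 0.5 * (1.0 + math.erf(mu_y / 2.0)),
        "unmatched": auc(cases[:, 1], pool[random_idx, 1]),
        "matched": auc(cases[:, 1], pool[matched_idx, 1]),
    }


if __name__ == "__main__":
    print(simulate())
```

In this sketch the matched controls inherit the cases' distribution of X, and because Y is correlated with X their marker values shift toward those of the cases. The naive matched AUC for the marker alone therefore falls below the unmatched estimate.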
Performance estimated with standard analyses can be grossly biased by matching, especially when biomarkers are highly correlated with matching risk factors. In our studies, the performance of the biomarker alone was underestimated, whereas the improvement in performance gained by adding the marker to risk factors was overestimated by 2- to 10-fold. We found examples for which the relative ranking of 2 biomarkers for prediction was inappropriately reversed by use of a matched design. The new approach to estimation corrected for bias in matched studies.
To properly gauge prediction performance in the population or the improvement gained by adding a biomarker to known risk factors, matched case-control studies must be supplemented with risk factor information from the population and must be analyzed with nonstandard statistical methods.
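One generic way to use population risk factor information is to reweight the matched controls so that their risk factor distribution resembles that of controls in the population. The Python sketch below illustrates that idea only. It is a stratum-reweighting assumption and should not be read as the estimation method proposed by the authors, which the abstract does not specify. It assumes the matching factor X is observed for all population controls. The bivariate normal setup, parameter values, and the names `weighted_auc` and `run_demo` are hypothetical.

```python
import math
import numpy as np


def weighted_auc(case_vals, control_vals, control_weights):
    """AUC with weighted controls: weighted share of controls below each case, averaged."""
    order = np.argsort(control_vals)
    sorted_vals = control_vals[order]
    cum_w = np.concatenate([[0.0], np.cumsum(control_weights[order])])
    # Total weight of controls strictly below each case value.
    below = cum_w[np.searchsorted(sorted_vals, case_vals, side="left")]
    return below.mean() / cum_w[-1]


def run_demo(rho=0.7, mu_x=1.0, mu_y=1.0, n_cases=2000,
             n_pool=200_000, n_bins=20, seed=1):
    """Naive versus stratum-reweighted AUC from a matched sample (illustrative defaults)."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    # Column 0 is the matching risk factor X, column 1 is the biomarker Y.
    cases = rng.multivariate_normal([mu_x, mu_y], cov, size=n_cases)
    pool = rng.multivariate_normal([0.0, 0.0], cov, size=n_pool)  # population controls

    # Frequency-match controls to cases on strata of X.
    edges = np.quantile(cases[:, 0], np.linspace(0, 1, n_bins + 1)[1:-1])
    case_bin = np.digitize(cases[:, 0], edges)
    pool_bin = np.digitize(pool[:, 0], edges)
    matched_idx = np.concatenate([
        rng.choice(np.flatnonzero(pool_bin == b),
                   size=int(np.sum(case_bin == b)), replace=False)
        for b in range(n_bins)
    ])
    matched_y = pool[matched_idx, 1]
    matched_bin = pool_bin[matched_idx]

    # Weight for stratum b: population share of controls / matched-sample share.
    # Only X (through pool_bin) is needed from the population, not Y.
    pop_frac = np.bincount(pool_bin, minlength=n_bins) / n_pool
    matched_frac = np.bincount(matched_bin, minlength=n_bins) / len(matched_idx)
    weights = (pop_frac / matched_frac)[matched_bin]

    return {
        "true": 0.5 * (1.0 + math.erf(mu_y / 2.0)),  # Phi(mu_y / sqrt(2))
        "naive_matched": weighted_auc(cases[:, 1], matched_y, np.ones(len(matched_y))),
        "reweighted": weighted_auc(cases[:, 1], matched_y, weights),
    }


if __name__ == "__main__":
    print(run_demo())
```

Because controls are sampled at random within each stratum of X, weighting each stratum back to its population share recovers the population distribution of the marker among controls in this sketch. The cost is a smaller effective sample size when some strata receive large weights.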
Authors
- Fan J
- Feng Z
- Huang Y
- Li C
- Pepe MS
- Seymour CW