Aspects of the design and analysis of high-dimensional SNP studies for disease risk estimation.
Abstract
The state of readiness for high-dimensional single nucleotide polymorphism (SNP) epidemiologic association studies is described, as background for a discussion of statistical aspects of case-control study design and analysis. Specifically, the important role that multistage designs can play in the elimination of false-positive associations and in the control of study costs will be noted. Also, the trade-offs associated with using pooled DNA at early design stages for additional important cost reductions will be discussed in some detail. An odds ratio approach to relating SNP alleles to disease risk using pooled DNA will be proposed, in conjunction with a simple empirical variance estimator, based on comparisons among log-odds ratio estimators from distinct pairs of case and control pools. Simulation studies will be presented to evaluate the moderate sample size properties of such multistage designs and estimation procedures. The design of an ongoing three-stage study in the Women's Health Initiative to relate 250,000 SNPs to the risk of coronary heart disease, stroke, and breast cancer will provide illustration, and will be used to motivate the choice of simulation configurations.
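As a rough illustration of the pooled-DNA odds ratio idea summarized above, the following minimal sketch computes a log-odds ratio for one SNP allele from estimated allele frequencies in pooled case and control DNA, with an empirical standard error taken from the spread of the pair-specific estimates. It is an assumption-laden sketch, not the estimator as specified in the paper: the function name, the one-to-one pairing of case and control pools, the simple averaging across pairs, and the example frequencies are all hypothetical.

```python
import math
from statistics import mean, variance


def pooled_log_odds_ratio(case_freqs, control_freqs):
    """Hypothetical sketch: log-odds ratio from paired case/control DNA pools.

    case_freqs[i] and control_freqs[i] are estimated allele frequencies
    (strictly between 0 and 1) in the i-th case pool and the i-th control pool.
    The pairing scheme and the averaging are illustrative assumptions.
    """
    if len(case_freqs) != len(control_freqs):
        raise ValueError("need the same number of case and control pools")
    if len(case_freqs) < 2:
        raise ValueError("need at least two pool pairs for a variance estimate")

    def logit(p):
        return math.log(p / (1.0 - p))

    # One log-odds ratio estimate per distinct (case pool, control pool) pair.
    pair_estimates = [
        logit(p_case) - logit(p_ctrl)
        for p_case, p_ctrl in zip(case_freqs, control_freqs)
    ]

    # Combined estimate: simple average over pairs (an assumption).
    estimate = mean(pair_estimates)

    # Empirical variance of the average: sample variance across pairs / K.
    k = len(pair_estimates)
    std_error = math.sqrt(variance(pair_estimates) / k)
    return estimate, std_error


# Made-up allele frequencies for four case pools and four control pools.
est, se = pooled_log_odds_ratio([0.30, 0.32, 0.28, 0.31],
                                [0.25, 0.26, 0.24, 0.27])
print(f"log-odds ratio = {est:.3f}, empirical SE = {se:.3f}")
```

Because the standard error here comes from variation among the pair-specific estimates rather than from a model-based formula, it would also absorb pool-to-pool measurement error, which is one plausible reason to prefer an empirical variance when working with pooled DNA.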