Sample size calculations for comparative studies of medical tests for detecting presence of disease.
Abstract
Technologic advances give rise to new tests for detecting disease in many fields, including cancer and sexually transmitted diseases. Before a new disease screening test is approved for public use, its accuracy should be shown to be better than, or at least not inferior to, that of an existing test. Standards do not yet exist for designing and analysing studies to address this issue. Established principles for the design of therapeutic studies can be adapted for studies of screening tests. In particular, drawing upon methods for superiority and non-inferiority studies of therapeutic agents, we propose that confidence intervals for the relative accuracy of dichotomous tests drive the design of comparative studies of disease screening tests. We derive sample size formulae for a variety of designs, including studies where patients undergo several tests and studies where patients receive only one of the tests under evaluation. Both cohort and case-control study designs are considered. Modifications to the confidence intervals and sample size formulae are discussed to accommodate studies where, because of the invasive nature of definitive testing, true disease status can only be obtained for subjects who are positive on one or more of the screening tests. The proposed methods are applied to a study comparing a modified Pap test with the conventional Pap test for cervical cancer screening. The impact of error in the gold standard reference test on the design and evaluation of comparative screening test studies is also discussed.
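To make the confidence-interval-driven design concrete, the following is a minimal sketch of one such calculation, not necessarily the formula derived in the paper. It assumes an unpaired design (each diseased subject receives only one test), relative accuracy measured as the ratio of true positive rates, and a one-sided lower confidence limit built on the log scale with the usual delta-method variance. The function name `diseased_per_arm` and all numerical inputs are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def diseased_per_arm(tpr_a, tpr_b, delta0, alpha=0.05, power=0.80):
    """Approximate number of diseased subjects needed per arm (unpaired design).

    Assumption: the study succeeds when the one-sided (1 - alpha) lower
    confidence limit for TPR_A / TPR_B, computed on the log scale, exceeds
    delta0. tpr_a and tpr_b are the anticipated true positive rates.
    """
    if not (0 < tpr_a <= 1 and 0 < tpr_b <= 1):
        raise ValueError("true positive rates must lie in (0, 1]")
    delta1 = tpr_a / tpr_b  # anticipated relative true positive rate
    if delta1 <= delta0:
        raise ValueError("anticipated ratio must exceed the margin delta0")

    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)

    # Delta-method variance of log(TPR_A_hat / TPR_B_hat), multiplied by n,
    # for two independent binomial samples of n diseased subjects each.
    unit_var = (1 - tpr_a) / tpr_a + (1 - tpr_b) / tpr_b

    n = (z_alpha + z_beta) ** 2 * unit_var / math.log(delta1 / delta0) ** 2
    return math.ceil(n)


# Hypothetical non-inferiority setting: both tests detect 80% of disease,
# and the new test must be shown to retain at least 90% of that sensitivity.
n_diseased = diseased_per_arm(tpr_a=0.80, tpr_b=0.80, delta0=0.90)
```

In a case-control design this count applies directly to the cases enrolled in each arm. In a cohort design the total enrolment would have to be inflated by the anticipated disease prevalence, and an analogous calculation for false positive rates would be based on the non-diseased subjects.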