Rapid Association Test for SNPs and Haplotypes Using Importance Sampling and Linkage Disequilibrium Decay

Gadi Kimmel, School of Computer Science, TAU

Due to the rapid progress in genotyping techniques, many large-scale, genome-wide disease association studies are now under way. Typically, the disorders examined are multi-factorial, so researchers seeking association between the trait and the loci must consider interactions among loci and between loci and other factors. One of the challenges of large disease association studies is obtaining accurate estimates of the significance of discovered associations. The linkage disequilibrium between SNPs makes the tests highly dependent, and the dependency worsens when interactions are tested. The standard way of assigning significance (a p-value) is a permutation test. Unfortunately, in large studies it is prohibitively slow to compute low p-values by this method, since resolving a p-value of p requires on the order of 1/p permutations.
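
To make the cost of the standard approach concrete, here is a minimal sketch of a case-control permutation test, assuming genotypes coded 0/1/2 in a samples-by-SNPs array. The statistic (largest absolute difference in mean genotype between cases and controls) and the names `max_stat` and `permutation_p_value` are illustrative assumptions, not the statistic or code used in the work described here.

```python
import numpy as np

def max_stat(genotypes, labels):
    """Largest absolute case/control difference in mean genotype over all SNPs.

    Taking the maximum over SNPs lets the permutation test account for
    multiple testing and for the dependence between SNPs.
    """
    cases = genotypes[labels == 1]
    controls = genotypes[labels == 0]
    return np.abs(cases.mean(axis=0) - controls.mean(axis=0)).max()

def permutation_p_value(genotypes, labels, n_perm=1000, seed=0):
    """Estimate the p-value by shuffling the case/control labels."""
    rng = np.random.default_rng(seed)
    observed = max_stat(genotypes, labels)
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(labels)  # keeps the number of cases fixed
        if max_stat(genotypes, shuffled) >= observed:
            hits += 1
    # Add-one correction: the estimate can never fall below 1 / (n_perm + 1).
    return (hits + 1) / (n_perm + 1)
```

Because the estimate cannot fall below 1 / (n_perm + 1), resolving a p-value near one in a million takes roughly a million label shuffles, each of which recomputes the statistic over every SNP.
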
We present here a faster algorithm for accurately calculating the p-value of a case-control association permutation test. It is based on importance sampling and on accounting for the decay in linkage disequilibrium along the chromosome. The algorithm is dramatically faster than the standard permutation test. For example, when testing single marker-trait association in simulations with a thousand SNPs and a thousand cases and controls, it was over 10,000 times faster. When testing pairwise interactions among 300 SNPs, our algorithm was about 100,000 times faster. On 10,000 SNPs from Chromosome 1, a 60,000-fold speed-up was achieved. Our method significantly extends the range of problem sizes for which accurate, meaningful association results are attainable.
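
The general idea behind importance sampling can be seen in a generic textbook example, which is not the algorithm of this talk: estimating a small tail probability of a standard normal variable. Instead of drawing from the original distribution, where the rare event almost never occurs, the sketch below draws from a distribution shifted toward the event and reweights each draw by the likelihood ratio. The function names and the choice of proposal distribution are assumptions made for illustration.

```python
import math
import numpy as np

def tail_prob_importance_sampling(threshold, n_samples=100_000, seed=0):
    """Estimate P(Z > threshold) for standard normal Z.

    Samples come from N(threshold, 1), so about half of them land in the
    rare region; each is reweighted by the ratio of the two densities.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=threshold, scale=1.0, size=n_samples)
    # Likelihood ratio phi(x) / phi(x - threshold)
    #   = exp(-threshold * x + threshold**2 / 2)
    weights = np.exp(-threshold * x + 0.5 * threshold**2)
    return float(np.mean(weights * (x > threshold)))

def tail_prob_exact(threshold):
    """Closed-form reference value via the complementary error function."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))
```

For a threshold of 4 the event has probability of about 3 in 100,000, so plain sampling with 100,000 draws would see only a handful of hits, while the reweighted estimate uses roughly half of its draws. The same principle, applied to sampling permutations rather than normal variates, is what makes small permutation p-values reachable with far fewer samples.
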

Joint work with Ron Shamir.