The gcatest package provides an implementation of the Genotype Conditional Association Test (GCAT) (Song, Hao, and Storey 2015). GCAT is a test for genetic association that is powered by Logistic Factor Analysis (LFA) (Hao, Song, and Storey 2016), a method for modeling population structure in genome-wide association studies. GCAT tests for association between each SNP and a trait, which may be either quantitative or binary. We have shown that GCAT is robust to confounding from population structure (Song, Hao, and Storey 2015).
We include a sample dataset with the package: sim_geno is a simulated genotype matrix and sim_trait is a simulated trait. There are 10,000 SNPs and 1,000 individuals, and the first five SNPs are associated with the trait. The simulation was done under the Pritchard-Stephens-Donnelly (PSD) model with \(K=3\), Dirichlet parameter \(\alpha=0.1\), and a trait variance split of \(5\%\) genetic, \(5\%\) environmental, and \(90\%\) noise. These are the same parameters as the PSD simulation in Figure 2 of the paper (Song, Hao, and Storey 2015), except that we reduced the size of the simulation to suit a small demo.
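A quick check of the sample data is sketched below. It assumes that sim_geno and sim_trait become available as soon as gcatest is loaded (lazy-loaded package data); if they do not, they may need to be loaded with data() first. The text does not say whether SNPs are the rows or the columns of sim_geno, so the sketch simply prints both dimensions.

```r
library(gcatest)

# Assumption: sim_geno and sim_trait are lazy-loaded with the package.
dim(sim_geno)       # the two dimensions should be 10,000 SNPs and 1,000 individuals
length(sim_trait)   # one trait value per individual
summary(sim_trait)  # quick look at the distribution of the simulated trait
```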
The first step of GCAT is to estimate the logistic factors with LFA:
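A minimal sketch of this step, assuming the lfa() function from the lfa package (Hao, Song, and Storey 2016), called with the genotype matrix first and the number of factors d second. Setting d = 3 is our assumption, chosen to match \(K=3\) in the simulation; the lfa documentation counts the intercept as one of the d factors.

```r
library(gcatest)
library(lfa)

# Estimate d = 3 logistic factors (assumed to include the intercept column).
LF <- lfa(sim_geno, 3)

# One row per individual, one column per logistic factor.
dim(LF)
```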
Then, we call the gcat function, which returns a vector of p-values, one per SNP:
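The sketch below re-estimates the logistic factors so that it runs on its own, then passes the genotypes, the factors, and the trait to gcat(). The argument order gcat(X, LF, trait) is an assumption based on the package's usual usage, and d = 3 is again our choice to match the simulation.

```r
library(gcatest)
library(lfa)

# Logistic factors capturing population structure (d = 3 is an assumption).
LF <- lfa(sim_geno, 3)

# Test each SNP for association with the trait, adjusting for the factors.
gcat_p <- gcat(sim_geno, LF, sim_trait)

length(gcat_p)  # one p-value per SNP
```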
We can look at the p-values for the associated SNPs:
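To inspect the five associated SNPs, the sketch below repeats the factor estimation and the gcat call so that it runs on its own, then prints their p-values. The Bonferroni count at the end is purely illustrative and not part of the original analysis: it compares each p-value with 0.05 divided by the number of SNPs tested.

```r
library(gcatest)
library(lfa)

LF <- lfa(sim_geno, 3)                   # d = 3 is an assumption
gcat_p <- gcat(sim_geno, LF, sim_trait)

# p-values of the first five SNPs, which were simulated to be associated
gcat_p[1:5]

# Illustrative only: how many of the five pass a Bonferroni cut-off at level 0.05
n_bonf <- sum(gcat_p[1:5] < 0.05 / length(gcat_p))
n_bonf
```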
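We can also plot a histogram of the p-values for the unassociated SNPs (6 through 10,000). For a well-calibrated test, these null p-values should be approximately uniform on [0, 1]: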
```r
library(ggplot2)
# gcat_p holds the p-values returned by gcat(); SNPs 6 to 10,000 are unassociated
dat <- data.frame(p = gcat_p[6:10000])
ggplot(dat, aes(p, after_stat(density))) +
  geom_histogram(binwidth = 1/20) + theme_bw()
```

The genio package provides the function read_plink for parsing PLINK binary genotypes (extension: .bed) into an R object in the format needed by the gcat function. A BEDMatrix object (from the eponymous function and package) is also supported and can reduce memory usage, at a small runtime cost.
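A minimal end-to-end sketch of the PLINK route follows. Because no real .bed file accompanies this text, it writes a small random toy dataset to a temporary path with genio's write_plink() and reads it back with read_plink(), which returns a list whose X element is the genotype matrix (loci as rows, individuals as columns). The toy genotypes, the random trait, and d = 3 are hypothetical placeholders; with real data, replace the prefix with your own PLINK file path and take the trait from your own phenotype data. Passing BEDMatrix(prefix) in place of plink_data$X is the lower-memory alternative described above.

```r
library(genio)
library(lfa)
library(gcatest)

set.seed(1)
# Hypothetical toy data standing in for a real PLINK dataset.
n_snps <- 500
n_ind <- 100
X_toy <- matrix(rbinom(n_snps * n_ind, 2, 0.3), nrow = n_snps, ncol = n_ind)
prefix <- tempfile("toy")
write_plink(prefix, X_toy, verbose = FALSE)  # writes prefix.bed, .bim and .fam

# Parse the binary genotypes; X has loci as rows and individuals as columns.
plink_data <- read_plink(prefix, verbose = FALSE)
X <- plink_data$X

trait <- rnorm(n_ind)        # hypothetical quantitative trait
LF <- lfa(X, 3)              # d = 3 is an assumption
gcat_p <- gcat(X, LF, trait)
```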
Hao, Wei, Minsun Song, and John D. Storey. 2016. “Probabilistic Models of Genetic Variation in Structured Populations Applied to Global Human Studies.” Bioinformatics 32 (5): 713–21. https://doi.org/10.1093/bioinformatics/btv641.
Song, Minsun, Wei Hao, and John D. Storey. 2015. “Testing for Genetic Associations in Arbitrarily Structured Populations.” Nat. Genet. 47 (5): 550–54. https://doi.org/10.1038/ng.3244.