Type: | Package |
Title: | Genetic Epidemiology Design and Inference |
Version: | 1.0.3 |
Depends: | R (≥ 2.0.0), stats |
Author: | Venkatraman E. Seshan |
Maintainer: | Venkatraman E. Seshan <seshanv@mskcc.org> |
Description: | Package for Genetic Epidemiologic Methods Developed at MSKCC. It contains functions to calculate haplotype specific odds ratio and the power of two stage design for GWAS studies. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
LazyLoad: | yes |
NeedsCompilation: | yes |
Packaged: | 2023-08-31 16:07:06 UTC; seshanv |
Repository: | CRAN |
Date/Publication: | 2023-08-31 16:30:02 UTC |
Functions for some genetic epidemiology methods.
Description
These functions provide code for genetic epidemiology methods developed at MSKCC. They currently include estimating haplotype disease risk and two stage designs for GWAS.
Details
Package: | genepi |
Type: | Package |
Version: | 1.0.3 |
Date: | 2023-08-31 |
License: | GPL version 2 or later |
LazyLoad: | yes |
There are two functions haplotypeOddsRatio
and
twoStagePower
in this package.
Package will be archived and functions added to clinfun package
Author(s)
Venkatraman E. Seshan
Maintainer: Venkatraman E. Seshan <seshanv@mskcc.org>
References
Satagopan JM, Venkatraman ES, Begg CB. (2004) Two-stage designs for gene-disease association studies with sample size constraints. Biometrics
Venkatraman ES, Mitra N, Begg CB. (2004) A method for evaluating the impact of individual haplotypes on disease incidence in molecular epidemiology studies. Stat Appl Genet Mol Biol. v3:Article27.
Internal Functions for haplotypeOddsRatio
Description
Internal functions needed for haplotypeOddsRatio.
Usage
genhaplopairs(n)
hwehaplofreq(n, two2k, g1idx, g2idx, g1tbl, hpair, tol=1e-8)
probonecopy(n, hf0, g1idx, g1tbl, hpair)
Details
These functions are not to be called by the user.
Calculate haplotype disease risk.
Description
Haplotype disease risk is calculated resolving haplotype ambiguity and adjusting for covariates and population stratification.
Usage
haplotypeOddsRatio(formula, gtypevar, data, stratvar=NULL, nsim=100, tol=1e-8)
## S3 method for class 'haploOR'
print(x, ...)
Arguments
formula |
The formula for logistic regression without the haplotype variable. |
gtypevar |
The variable names in the data frame corresponding to the loci of interest. Each variables counts the number of mutant genotypes a subject has at that locus. Legal values are 0, 1, 2 & NA. |
data |
The name of the dataframe being analyzed. It should have all the variables in the formula as well as those in genotype and stratvar. |
stratvar |
Name of the stratification variable. This is used to account for population stratification. The haplotype frequencies are estimated within each stratum. |
nsim |
Variance should be inflated to account for inferred ambiguous haplotypes. The estimates are recalculated by simulating the disease haplotype copy number and variance added to average. |
tol |
Tolerance limit for the EM algorithm convergence. |
x |
Object of class haploOR. |
... |
Other print options. |
Details
This implements the method in the reference below.
Value
It is a list of class haploOR
call |
The function call that produced this output. |
coef |
Table with estimated coefficients, standard error, Z-statistic and p-value. |
var |
Covariance matrix of the estimated log odds-ratiios. |
deviance |
Average of the simulated deviances. Its theoretical properties are unknown. |
aic |
Average of the simulated aic. |
null.deviance |
Deviance of null model. |
df.null |
Degrees of freedom of null model. |
df.residual |
Degrees of freedom of full model. |
The "print" method formats the results into a user-friendly table.
Author(s)
Venkatraman E. Seshan
References
Venkatraman ES, Mitra N, Begg CB. (2004) A method for evaluating the impact of individual haplotypes on disease incidence in molecular epidemiology studies. Stat Appl Genet Mol Biol. v3:Article27.
Examples
# simulated data with 2 loci haplotypes 1=00, 2=01, 3=10, 4=11
# control haplotype probabilities p[i] i=1,2,3,4
# haplotype pairs (i<=j) i=j: probs = p[i]^2 ; i<j: p[i]*p[j]
p <- c(0.25, 0.2, 0.2, 0.35)
p0 <- rep(0, 10)
l <- 0
for(i in 1:4) {for(j in i:4) {l <- l+1; p0[l] <- 2*p[i]*p[j]/(1+1*(i==j))}}
controls <- as.numeric(cut(runif(1000), cumsum(c(0,p0)), labels=1:10))
# case probabilities disease haplotype is 11
or <- c(2, 5)
p1 <- p0*c(1,1,1,2,1,1,2,1,2,8); p1 <- p1/sum(p1)
cases <- as.numeric(cut(runif(1000), cumsum(c(0,p1)), labels=1:10))
# now pool them together and set up the data frame
dat <- data.frame(status=rep(0:1, c(1000, 1000)))
# number of copies of mutant variant for locus 1
dat$gtype1 <- c(0,0,1,1,0,1,1,2,2,2)[c(controls, cases)]
# number of copies of mutant variant for locus 2
dat$gtype2 <- c(0,1,0,1,2,1,2,0,1,2)[c(controls, cases)]
# true number of copies of disease haplotype
dat$hcn <- c(0,0,0,1,0,0,1,0,1,2)[c(controls, cases)]
# model with genotypes only
haplotypeOddsRatio(status ~ 1, c("gtype1","gtype2"), dat)
# model from the logistic fit using the number of copies of disease haplotype
glm(status ~ factor(hcn), dat, family=binomial)
Calculate the power of a two stage design for GWAS
Description
Calculate the power of a two stage design for GWAS under sample size or cost constraints. Implements methods in the refereces below.
Usage
twoStagePower(n=NULL, Cost=NULL, m=5000, mu=0.045, mu.loc=0.5, p=0.10,
f=NULL, relcost=1, true.needed=1, rho=0, rho0=0, nsim=2000)
Arguments
n |
The maximum sample size for the study. |
Cost |
Maximum available resource. One of |
m |
The number of markers in the study. Default is 5000. It will take a a long time to compute power for very large numbers e.g. 100000 |
mu |
The mean vector for the markers that are associated with endpoint. |
mu.loc |
The locations of the true markers. Since the chromosome is mapped to the unit interval (0,1) the numbers should be between 0 and 1. |
p |
The proportion of markers taken to the second stage. The default is 0.1 which is found to be optimal. |
f |
The fraction of Cost or sample size allocated to the first stage. If not specified it uses 0.75 for the Cost constraint scenario and 0.5 for the sample size contraint scenario. |
relcost |
Specifies how expensive it is to genotype in the second stage compared to the first stage. |
true.needed |
The number of markers selected in the end. Can be a maximum of length of mu.loc (or mu). |
rho , rho0 |
correlation between markers |
nsim |
Number of Monte Carlo replications to compute power. |
Details
This implements the method in the reference below.
Value
It returns the power as a single numeric value
Author(s)
Jaya M. Satagopan & Venkatraman E. Seshan
References
Satagopan JM, Venkatraman ES, Begg CB. (2004) Two-stage designs for gene-disease association studies with sample size constraints. Biometrics
Examples
twoStagePower(n=1000)
twoStagePower(Cost=1000)