--- title: "Checking Levels of Tests or Coverage Probabilities of CIs" output: rmarkdown::html_vignette bibliography: ../inst/REFERENCES.bib csl: ../inst/amstat.csl vignette: > %\VignetteIndexEntry{Checking Levels of Tests or Coverage Probabilities of CIs} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- Using the methodology developed by @GandyHahnDing:pvaluebuckets, we can check if tests reject with the desired frequency (i.e. we can check the level of the tests) and we can check if confidence intervals have the desired coverage probabilities. ```{r} require(mcunit) set.seed(10) ``` ## Checking the Level of Tests ### T-test Suppose we want to test if the two-sided t-test (for the mean of iid normally distributed random variables to be 0) implemented in R has a specific nominal level (e.g. 5%). The following does this. This example uses three buckets $[0,0.45]$, $[0.04,0.06]$,$[0.55,1]$. The method returns, with at least the probability `1-error`, a bucket that contains the correct rejection probability. The code below implements a test that succeeds if the bucket $[0.04, 0.06]$ is returned. In other words, if the true rejection probability is in $(0.045, 0.055)$, then the test is guaranteed to succeed (with probability of at least `1-error`). If it is in $[0.04, 0.045]\cup[0.55,0.06]$, the test may or may not succeed (as different buckets may be returned. If it is in $[0,0.04)\cup(0.06,1]$ then it is guaranteed to fail (with probability of at least `1-error`). The following is a function that simulates data and then returns the test decision. ```{r} gen <- function(){ x <- rnorm(10) t.test(x)$p.value<0.05 } ``` Now, we are setting up the buckets: ```{r} J <- matrix(nrow=2,c(0,0.045, 0.04,0.06, 0.055,1)) colnames(J) <- c("low","ok","high") J ``` Next, the test is run. ```{r} expect_bernoulli(gen,J=J,ok="ok") ``` As expected, this does not return an error. However, if one (wrongly) assumes that this also works if the data is sampled under a Cauchy distribution, then, as expected, the test returns an error. We use the same test and buckets as before, but now the data is simulated from a Cauchy distribution. ```{r, error=TRUE, purl=FALSE} gen <- function(){ x <- rcauchy(10) t.test(x)$p.value<0.05 } expect_bernoulli(gen,J=J,ok="ok") ``` ### Pearson's Chi-Squared Test It is well know that the asymptotic distribution of Pearson's chi-squared goodness of fit test is not reliable for small sample sizes, and the implementation in R correctly warns about it for small sample sizes. ```{r, error=TRUE, purl=FALSE} gen <- function()chisq.test(c(rmultinom(1,size=4,prob=c(1/3,1/3,1/3))))$p.value<0.05 gen() ``` A unit test would also detect this. ```{r, error=TRUE, purl=FALSE,warning=FALSE} options(warn=-1) expect_bernoulli(gen,J=J,ok="ok") options(warn=0) ``` However, with a larger number of samples and a wide interval $[0.035,0.065]$, it can be confirmed that the level is around the desired level. ```{r} J <- matrix(nrow=2,c(0,0.04,0.035,0.065, 0.06,1)) colnames(J) <- c("low", "ok","high") J gen <- function()as.numeric(chisq.test(c(rmultinom(1,size=15,prob=c(1/3,1/3,1/3))))$p.value<0.05) expect_bernoulli(gen,J=J,ok="ok") ``` If one needs a more precise statement, one can use a finer grid of buckets - and this reveals that the rejection probability in this case is less than the nominal level. ```{r, error=TRUE, purl=FALSE} J <- matrix(nrow=2,c(0,0.04,0.035,0.0475, 0.045,0.055, 0.0525,1)) colnames(J) <- c("very low", "low","ok","high") J expect_bernoulli(gen,J=J,ok="ok") ``` ## Checking Coverage Probabilities of Confidence Intervals Similarly, the function below tests if the 95% confidence interval returned by `t.test` has the correct coverage probability. First, we set up a function that simulates data, computes the confidence interval and then returns whether or not it contains the true mean. ```{r} gen <- function(){ x <- rnorm(10,mean=3.7) CI <- t.test(x)$conf.int as.numeric(CI[1]<=3.7&CI[2]>=3.7) } ``` Then we set up the buckets. ```{r} J <- matrix(nrow=2,c(0,0.945, 0.94,0.96, 0.955,1)) colnames(J) <- c("low","ok","high") J ``` Running the test in this way does not lead to an error. ```{r} expect_bernoulli(gen,J=J,ok="ok") ``` However, if one chooses different buckets, then, as expected, an error is reported. ```{r, error=TRUE, purl=FALSE} J <- matrix(nrow=2,c(0,0.895, 0.89,0.91, 0.905,1)) colnames(J) <- c("low","ok","high") J expect_bernoulli(gen,J=J,ok="ok") ``` ## References