---
title: "Checking Levels of Tests or Coverage Probabilities of CIs"
output: rmarkdown::html_vignette
bibliography: ../inst/REFERENCES.bib
csl: ../inst/amstat.csl
vignette: >
  %\VignetteIndexEntry{Checking Levels of Tests or Coverage Probabilities of CIs}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

Using the methodology developed by @GandyHahnDing:pvaluebuckets, we
can check if tests reject with the desired frequency (i.e. we can
check the level of the tests) and we can check if confidence intervals
have the desired coverage probabilities.


```{r}
require(mcunit)
set.seed(10)
```

## Checking the Level of Tests


### T-test
Suppose we want to test if the two-sided t-test (for the mean of iid normally distributed random variables to be 0) implemented in R has a specific  nominal level (e.g. 5%). 
The following does this. 

This example uses three buckets $[0,0.45]$, $[0.04,0.06]$,$[0.55,1]$. The method returns, with at least the probability `1-error`, a bucket that contains the correct rejection probability. 

The code below implements a test that succeeds if the bucket $[0.04,
0.06]$ is returned. In other words, if the true rejection probability
is in $(0.045, 0.055)$, then the test is guaranteed to succeed (with
probability of at least `1-error`). If it is in $[0.04,
0.045]\cup[0.55,0.06]$, the test may or may not succeed (as different
buckets may be returned.  If it is in $[0,0.04)\cup(0.06,1]$ then it is
guaranteed to fail (with probability of at least `1-error`).

The following is a function that simulates data and then returns the test decision.
```{r}
gen <- function(){
    x <- rnorm(10)
    t.test(x)$p.value<0.05
}
```

Now, we are setting up the buckets:
```{r}
J <- matrix(nrow=2,c(0,0.045, 0.04,0.06, 0.055,1))
colnames(J) <- c("low","ok","high")
J
```

Next, the test is run. 
```{r}
expect_bernoulli(gen,J=J,ok="ok")
```
As expected, this does not return an error. 

However, if one (wrongly) assumes that this also works if the data is
sampled under a Cauchy distribution, then, as expected, the test
returns an error. We use the same test and buckets as before, but now
the data is simulated from a Cauchy distribution.

```{r, error=TRUE, purl=FALSE}
gen <- function(){
    x <- rcauchy(10)
    t.test(x)$p.value<0.05
}
expect_bernoulli(gen,J=J,ok="ok")
```

### Pearson's Chi-Squared Test

It is well know that the asymptotic distribution of Pearson's
chi-squared goodness of fit test is not reliable for small sample
sizes, and the implementation in R correctly warns about it for small sample sizes. 
```{r, error=TRUE, purl=FALSE}
gen <- function()chisq.test(c(rmultinom(1,size=4,prob=c(1/3,1/3,1/3))))$p.value<0.05
gen()
```

A unit test would also detect this.
```{r, error=TRUE, purl=FALSE,warning=FALSE}
options(warn=-1)
expect_bernoulli(gen,J=J,ok="ok")
options(warn=0)
```

However, with a larger number of samples and a wide interval $[0.035,0.065]$, it can be confirmed that the level is around the desired level.
```{r}
J <- matrix(nrow=2,c(0,0.04,0.035,0.065, 0.06,1))
colnames(J) <- c("low", "ok","high")
J
gen <- function()as.numeric(chisq.test(c(rmultinom(1,size=15,prob=c(1/3,1/3,1/3))))$p.value<0.05)
expect_bernoulli(gen,J=J,ok="ok")
```

If one needs a more precise statement, one can use a finer grid of
buckets - and this reveals that the rejection probability in this case
is less than the nominal level.
```{r, error=TRUE, purl=FALSE}
J <- matrix(nrow=2,c(0,0.04,0.035,0.0475, 0.045,0.055, 0.0525,1))
colnames(J) <- c("very low", "low","ok","high")
J
expect_bernoulli(gen,J=J,ok="ok")
```


## Checking Coverage Probabilities of Confidence Intervals

Similarly, the function below tests if the 95% confidence interval returned by `t.test` has the correct coverage probability. 

First, we set up a function that simulates data, computes the confidence interval and then returns whether or not it contains the true mean.

```{r}
gen <- function(){
    x <- rnorm(10,mean=3.7)
    CI <- t.test(x)$conf.int
    as.numeric(CI[1]<=3.7&CI[2]>=3.7)
}
```

Then we set up the buckets.
```{r}
J <- matrix(nrow=2,c(0,0.945, 0.94,0.96, 0.955,1))
colnames(J) <- c("low","ok","high")
J
```

Running the test in this way does not lead to an error.
```{r}
expect_bernoulli(gen,J=J,ok="ok")
```

However, if one chooses different buckets, then, as expected, an error is reported. 
```{r, error=TRUE, purl=FALSE}
J <- matrix(nrow=2,c(0,0.895, 0.89,0.91, 0.905,1))
colnames(J) <- c("low","ok","high")
J
expect_bernoulli(gen,J=J,ok="ok")
```

## References