\documentclass{article}
\usepackage{amstext}
\usepackage{amsfonts}
\usepackage{hyperref}
\usepackage[round]{natbib}
\renewcommand{\refname}{REFERENCES}
\usepackage{graphicx}
\usepackage{rotating}
%%\usepackage[nolists]{endfloat}
%%\usepackage{Sweave}
%%\VignetteIndexEntry{A Lego System for Conditional Inference}
\newcommand{\Rpackage}[1]{{\normalfont\fontseries{b}\selectfont #1}}
\newcommand{\Robject}[1]{\texttt{#1}}
\newcommand{\Rclass}[1]{\textit{#1}}
\newcommand{\Rcmd}[1]{\texttt{#1}}
\newcommand{\Roperator}[1]{\texttt{#1}}
\newcommand{\Rarg}[1]{\texttt{#1}}
\newcommand{\Rlevel}[1]{\texttt{#1}}
\newcommand{\RR}{\textsf{R}}
\renewcommand{\S}{\textsf{S}}
\newcommand{\R}{\mathbb{R} }
\newcommand{\Prob}{\mathbb{P} }
\newcommand{\N}{\mathbb{N} }
%%\newcommand{\C}{\mathbb{C} }
\newcommand{\V}{\mathbb{V}} %% cal{\mbox{\textnormal{Var}}} }
\newcommand{\E}{\mathbb{E}} %%mathcal{\mbox{\textnormal{E}}} }
\newcommand{\Var}{\mathbb{V}} %%mathcal{\mbox{\textnormal{Var}}} }
\newcommand{\argmin}{\operatorname{argmin}\displaylimits}
\newcommand{\argmax}{\operatorname{argmax}\displaylimits}
\newcommand{\LS}{\mathcal{L}_n}
\newcommand{\TS}{\mathcal{T}_n}
\newcommand{\LSc}{\mathcal{L}_{\text{comb},n}}
\newcommand{\LSbc}{\mathcal{L}^*_{\text{comb},n}}
\newcommand{\F}{\mathcal{F}}
\newcommand{\A}{\mathcal{A}}
\newcommand{\yn}{y_{\text{new}}}
\newcommand{\z}{\mathbf{z}}
\newcommand{\X}{\mathbf{X}}
\newcommand{\Y}{\mathbf{Y}}
\newcommand{\sX}{\mathcal{X}}
\newcommand{\sY}{\mathcal{Y}}
\newcommand{\T}{\mathbf{T}}
\newcommand{\x}{\mathbf{x}}
\renewcommand{\a}{\mathbf{a}}
\newcommand{\xn}{\mathbf{x}_{\text{new}}}
\newcommand{\y}{\mathbf{y}}
\newcommand{\w}{\mathbf{w}}
\newcommand{\ws}{\mathbf{w}_\cdot}
\renewcommand{\t}{\mathbf{t}}
\newcommand{\M}{\mathbf{M}}
\renewcommand{\vec}{\text{vec}}
\newcommand{\B}{\mathbf{B}}
\newcommand{\K}{\mathbf{K}}
\newcommand{\W}{\mathbf{W}}
\newcommand{\D}{\mathbf{D}}
\newcommand{\I}{\mathbf{I}}
\newcommand{\bS}{\mathbf{S}}
\newcommand{\cellx}{\pi_n[\x]}
\newcommand{\partn}{\pi_n(\mathcal{L}_n)}
\newcommand{\err}{\text{Err}}
\newcommand{\ea}{\widehat{\text{Err}}^{(a)}}
\newcommand{\ecv}{\widehat{\text{Err}}^{(cv1)}}
\newcommand{\ecvten}{\widehat{\text{Err}}^{(cv10)}}
\newcommand{\eone}{\widehat{\text{Err}}^{(1)}}
\newcommand{\eplus}{\widehat{\text{Err}}^{(.632+)}}
\newcommand{\eoob}{\widehat{\text{Err}}^{(oob)}}
\RequirePackage[T1]{fontenc}
\RequirePackage{graphicx,ae,fancyvrb}
\IfFileExists{upquote.sty}{\RequirePackage{upquote}}{}
\usepackage{relsize}
\DefineVerbatimEnvironment{Sinput}{Verbatim}{baselinestretch=1}
\DefineVerbatimEnvironment{Soutput}{Verbatim}{fontfamily=courier,
  baselinestretch=1, fontshape=it, fontsize=\relsize{-1}}
\DefineVerbatimEnvironment{Scode}{Verbatim}{}
\newenvironment{Schunk}{}{}
\renewcommand{\baselinestretch}{1}
\SweaveOpts{engine=R,eps=FALSE}
\hypersetup{%
  pdftitle = {A Lego System for Conditional Inference},
  pdfsubject = {Manuscript},
  pdfauthor = {Torsten Hothorn, Kurt Hornik, Mark A.
van de Wiel and Achim Zeileis},
  %% change colorlinks to false for pretty printing
  colorlinks = {true},
  linkcolor = {blue},
  citecolor = {blue},
  urlcolor = {red},
  hyperindex = {true},
  linktocpage = {true},
}

<<echo = FALSE, results = hide>>=
options(width = 65, prompt = "R> ", continue = " ")
require("coin")
set.seed(290875)
### get rid of the NAMESPACE
#load(file.path(.find.package("coin"), "R", "all.rda"))
anonymous <- FALSE
@

\begin{document}

\title{A Lego System for Conditional Inference
\footnote{This is a preprint of an article published in The American
Statistician, Volume 60, Number 3, Pages 257--263. Copyright
\copyright{} 2006 American Statistical Association; available online at
\url{http://www.amstat.org/publications/tas/}.}}

<<echo = FALSE, results = tex>>=
if(!anonymous) {
    cat("\\author{Torsten Hothorn$^1$, Kurt Hornik$^2$, \\\\
         Mark A. van de Wiel$^3$ and Achim Zeileis$^2$}\n")
} else {
    cat("\\author{TAS MS05-239, Revision}\n")
}
@

\setkeys{Gin}{width=\textwidth}
\date{}
\maketitle
\thispagestyle{empty}

<<echo = FALSE, results = tex>>=
if(!anonymous)
    cat("\\noindent$^1$ Institut f\\\"ur Medizininformatik, Biometrie und
Epidemiologie\\\\ Friedrich-Alexander-Universit\\\"at Erlangen-N\\\"urnberg\\\\
Waldstra{\\ss}e 6, D-91054 Erlangen, Germany \\\\
\\texttt{Torsten.Hothorn@R-project.org} \\newline
\\noindent$^2$ Department f\\\"ur Statistik und Mathematik,
Wirtschaftsuniversit\\\"at Wien \\\\ Augasse 2-6, A-1090 Wien, Austria \\\\
\\texttt{Kurt.Hornik@R-project.org} \\\\
\\texttt{Achim.Zeileis@R-project.org} \\newline
\\noindent$^3$ Department of Mathematics, Vrije Universiteit \\\\
De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands \\\\
\\texttt{mark.vdwiel@vumc.nl} \\newline\n")
@

\noindent
\begin{abstract}
Conditioning on the observed data is an important and flexible design
principle for statistical test procedures. Although generally applicable,
permutation tests currently in use are limited to the treatment of special
cases, such as contingency tables or $K$-sample problems. A new theoretical
framework for permutation tests opens up the way to a unified and
generalized view. We argue that the transfer of such a theory to practical
data analysis has important implications in many applications and requires
tools that enable the data analyst to compute on the theoretical concepts
as closely as possible. We re-analyze four data sets by adapting the
general conceptual framework to these challenging inference problems and
utilizing the \Rpackage{coin} add-on package in the \RR{} system for
statistical computing to show what one can gain from going beyond the
`classical' test procedures.
\end{abstract}

\noindent KEY WORDS: Permutation tests; Independence; Asymptotic
distribution; Software.
\newline

%% \newpage

\section{INTRODUCTION}

The distribution of a test statistic under the null hypothesis clearly
depends on the unknown distribution of the data and thus is unknown as
well. Two concepts are commonly applied to dispose of this dependency.
Unconditional tests impose assumptions on the distribution of the data
such that the null distribution of a test statistic can be derived
analytically. In contrast, conditional tests replace the unknown null
distribution by the conditional null distribution, i.e., the distribution
of the test statistic given the observed data. The latter approach is
known as \textit{permutation testing} and was developed by R.~A.~Fisher
more than 70 years ago \citep{Fisher1935}.
The pros and cons of both approaches in different fields of application
have been widely discussed
\citep[e.g.~by][]{why-permut:1998,pros-and-c:2000,Shuster2005}. Here, we
focus on the practical aspects of permutation testing rather than dealing
with its methodological foundations. For the construction of permutation
tests it is common practice to `recycle' test statistics well known from
the unconditional world, such as linear rank statistics, ANOVA $F$
statistics or $\chi^2$ statistics for contingency tables, and to replace
the unconditional null distribution with the conditional distribution of
the test statistic under the null hypothesis
\citep{Edgington1987,Good2000,Pesarin2001,Ernst2004}. Because the choice
of the test statistic is the only `degree of freedom' for the data
analyst, the classical view on permutation tests requires a `cook book'
classification of inference problems (categorical data analysis,
multivariate analysis, $K$-sample location problems, correlation, etc.),
each being associated with a `natural' form of the test statistic.

The theoretical advances of the last decade
\citep[notably][]{StrasserWeber1999,Pesarin2001,JanssenPauls2003} give us
a much better understanding of the strong connections between the
`classical' permutation tests defined for different inference problems.
As we will argue in this paper, the new theoretical tools open up the way
to a simple construction principle for test procedures in new and
challenging inference problems. Especially attractive for this purpose is
the theoretical framework for permutation tests developed by
\cite{StrasserWeber1999}. This unifying theory is based on a flexible form
of multivariate linear statistics for the general independence problem.
This framework provides us with a conceptual Lego system for the
construction of permutation tests consisting of Lego bricks for linear
statistics suitable for different inference problems (contingency tables,
multivariate problems, etc.), different forms of test statistics (such as
quadratic forms for global tests or test statistics suitable for multiple
comparison procedures), and several ways to derive the conditional null
distribution (by means of exact computations or approximations). The
classical procedures, such as a permutation $t$ test, are part of this
framework and, even more interestingly, new test procedures can be
embedded into the same theory, whose main ideas are sketched in
Section~\ref{CI}.

Currently, the statistician's toolbox consists of rather specialized
spanners, such as the Wilcoxon-Mann-Whitney test for comparing two
distributions or the Cochran-Mantel-Haenszel $\chi^2$ test for
independence in contingency tables. With this work, we add an adjustable
spanner to the statistician's toolbox which helps to address both common
and new or unusual inference problems with the appropriate conditional
test procedures. In the main part of this paper we show how one can
construct and implement permutation tests `on the fly' by plugging
together Lego bricks for the multivariate linear statistic, the test
statistic and the conditional null distribution, both conceptually and
practically by means of the \Rpackage{coin} add-on package
<<echo = FALSE, results = tex>>=
if (anonymous) {
    cat(" \\citep{PKG:coina} ")
} else {
    cat(" \\citep{PKG:coin} ")
}
@
in the \RR{} system for statistical computing \citep{Rcore2005}.
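The \Rpackage{coin} package is distributed via the Comprehensive \RR{}
Archive Network (CRAN) and can be installed and attached in the usual
way, so that all analyses shown below are immediately reproducible:
\begin{Schunk}
\begin{Sinput}
R> install.packages("coin")
R> library("coin")
\end{Sinput}
\end{Schunk}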
\section{A CONCEPTUAL LEGO SYSTEM \label{CI}}

To fix notation, we assume that we are provided with independent and
identically distributed observations $(\Y_i, \X_i)$ for $i = 1, \dots,
n$. The variables $\Y$ and $\X$ from sample spaces $\mathcal{Y}$ and
$\mathcal{X}$ may be measured at arbitrary scales and may be multivariate
as well. We are interested in testing the null hypothesis of independence
of $\Y$ and $\X$
\begin{eqnarray*}
H_0: D(\Y | \X) = D(\Y)
\end{eqnarray*}
against arbitrary alternatives. \cite{StrasserWeber1999} suggest deriving
\textit{scalar} test statistics for testing $H_0$ from
\textit{multivariate} linear statistics of the form
\begin{eqnarray*}
\T = \vec\left(\sum_{i = 1}^n g(\X_i) h(\Y_i)^\top\right) \in \R^{pq \times 1}.
\end{eqnarray*}
Here, $g: \mathcal{X} \rightarrow \R^{p \times 1}$ is a transformation of
the $\X$ measurements and $h: \mathcal{Y} \rightarrow \R^{q \times 1}$ is
called the \emph{influence function}. The function $h(\Y_i) = h(\Y_i,
(\Y_1, \dots, \Y_n))$ may depend on the full vector of responses $(\Y_1,
\dots, \Y_n)$, although only in a permutation-symmetric way, i.e., the
value of the function must not depend on the order in which $\Y_1, \dots,
\Y_n$ appear. We will give several examples of how to choose $g$ and $h$
for specific inference problems in Section~\ref{play}.

The distribution of $\T$ depends on the joint distribution of $\Y$ and
$\X$, which is unknown under almost all practical circumstances. At least
under the null hypothesis one can dispose of this dependency by fixing
$\X_1, \dots, \X_n$ and conditioning on all possible permutations $S$ of
the responses $\Y_1, \dots, \Y_n$. Tests that have been constructed by
means of this conditioning principle are called \emph{permutation tests}.
The conditional expectation $\mu \in \R^{pq \times 1}$ and covariance
$\Sigma \in \R^{pq \times pq}$ of $\T$ under $H_0$ given all permutations
$\sigma \in S$ of the responses are derived by \cite{StrasserWeber1999}:
\begin{eqnarray*}
\mu = \E(\T | S) & = & \vec \left( \left( \sum_{i = 1}^n g(\X_i) \right)
    \E(h | S)^\top \right) \\
\Sigma = \V(\T | S) & = & \frac{n}{n - 1} \V(h | S) \otimes
    \left(\sum_i g(\X_i) \otimes g(\X_i)^\top \right) \\
& - & \frac{1}{n - 1} \V(h | S) \otimes \left( \sum_i g(\X_i) \right)
    \otimes \left( \sum_i g(\X_i)\right)^\top \nonumber
\end{eqnarray*}
where $\otimes$ denotes the Kronecker product, and the conditional
expectation of the influence function is
$\E(h | S) = n^{-1} \sum_i h(\Y_i)$ with corresponding $q \times q$
covariance matrix
\begin{eqnarray*}
\V(h | S) = n^{-1} \sum_i \left(h(\Y_i) - \E(h | S) \right)
\left(h(\Y_i) - \E(h | S)\right)^\top.
\end{eqnarray*}

The key step for the construction of test statistics based on the
multivariate linear statistic $\T$ is its standardization utilizing the
conditional expectation $\mu$ and covariance matrix $\Sigma$. Univariate
test statistics~$c$ mapping a linear statistic $\T \in \R^{pq \times 1}$
into the real line can be of arbitrary form. Obvious choices are the
maximum of the absolute values of the standardized linear statistic or a
quadratic form:
\begin{eqnarray*}
c_\text{max}(\T, \mu, \Sigma) & = & \max \left| \frac{\T -
    \mu}{\text{diag}(\Sigma)^{1/2}} \right|, \\
c_\text{quad}(\T, \mu, \Sigma) & = & (\T - \mu)^\top \Sigma^+ (\T - \mu),
\end{eqnarray*}
involving the Moore-Penrose inverse $\Sigma^+$ of $\Sigma$.
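To make these quantities concrete, consider a minimal sketch for a
univariate two-sample problem ($p = q = 1$), where $g$ is a zero-one
group indicator and $h$ is the identity; all object names below are
illustrative and not part of any package. The formulae above reduce to
scalars, and the conditional $p$-value of a $c_\text{max}$-type statistic
can be approximated by brute force over random permutations:
\begin{Schunk}
\begin{Sinput}
R> y <- c(1.2, 0.8, 1.9, 2.3, 1.4, 2.8, 3.1, 2.2)  ### h(Y_i) = Y_i
R> g <- rep(c(1, 0), each = 4)                     ### g(X_i): indicator
R> n <- length(y)
R> Tstat <- sum(g * y)                             ### linear statistic T
R> Eh <- mean(y)                                   ### E(h | S)
R> Vh <- mean((y - Eh)^2)                          ### V(h | S)
R> mu <- sum(g) * Eh
R> Sigma <- n / (n - 1) * Vh * sum(g^2) -
       1 / (n - 1) * Vh * sum(g)^2
R> c_obs <- abs(Tstat - mu) / sqrt(Sigma)          ### c_max for pq = 1
R> Tperm <- replicate(10000, sum(g * sample(y)))   ### permute responses
R> mean(abs(Tperm - mu) / sqrt(Sigma) >= c_obs)    ### two-sided p-value
\end{Sinput}
\end{Schunk}
For multivariate $g$ and $h$ the same computation carries over with the
Kronecker products shown above; automating it, together with the exact
and asymptotic approximations discussed next, is exactly what the
software presented in Section~\ref{play} does.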
%%The definition of one- and two-sided $p$-values used for the computations in
%%the \Rpackage{coin} package is
%%\begin{eqnarray*}
%%& & P(c(\T, \mu, \Sigma)\le c(\mathbf{t}, \mu, \Sigma)) \quad \text{(less)} \\
%%& & P(c(\T, \mu, \Sigma) \ge c(\mathbf{t}, \mu, \Sigma)) \quad \text{(greater)}\\
%%& & P(|c(\T, \mu, \Sigma)| \le |c(\mathbf{t}, \mu, \Sigma)|) \quad
%%\text{(two-sided).}
%%\end{eqnarray*}
%%Note that for quadratic forms only two-sided $p$-values are available
%%and that in the one-sided case maximum type test statistics are replaced by
%%\begin{eqnarray*}
%%\min \left( \frac{\mathbf{t} - \mu}{\text{diag}(\Sigma)^{1/2}} \right)
%% \quad \text{(less) and }
%%\max \left( \frac{\mathbf{t} - \mu}{\text{diag}(\Sigma)^{1/2}} \right)
%% \quad \text{(greater).}
%%\end{eqnarray*}

The conditional distribution $\Prob(c(\T, \mu, \Sigma) \le z | S)$ is the
number of permutations $\sigma \in S$ of the data with corresponding test
statistic not exceeding $z$ divided by the total number of permutations
in $S$. For some special forms of the multivariate linear statistic, the
exact distribution of certain test statistics is tractable for small and
moderate sample sizes.
%%For two-sample problems, the shift-algorithm by \cite{axact-dist:1986}
%%and \cite{exakte-ver:1987} and the split-up algorithm by
%%\cite{vdWiel2001} are implemented as part of the package.
In principle, resampling procedures can always be used to approximate the
exact distribution up to any desired accuracy by evaluating the test
statistic for a random sample from the set of all permutations $S$. It is
important to note that in the presence of a grouping of the observations
into independent blocks, only permutations within blocks are eligible and
that the conditional expectation and covariance matrix need to be
computed separately for each block.

Less well known is the fact that a normal approximation of the
conditional distribution can be computed for arbitrary choices of $g$ and
$h$. \cite{StrasserWeber1999} showed in their Theorem~2.3 that the
conditional distribution of linear statistics $\T$ with conditional
expectation $\mu$ and covariance $\Sigma$ tends to a multivariate normal
distribution with parameters $\mu$ and $\Sigma$ as $n \rightarrow
\infty$. Thus, the asymptotic conditional distribution of test statistics
of the form $c_\text{max}$ is normal and can be computed directly in the
univariate case ($pq = 1$) and by numerical algorithms in the
multivariate case \citep{numerical-:1992}. For quadratic forms
$c_\text{quad}$, which follow a $\chi^2$ distribution with degrees of
freedom given by the rank of $\Sigma$ \citep[see][Chapter
29]{johnsonkotz1970}, exact probabilities can be computed efficiently.

\section{PLAYING LEGO \label{play}}

The Lego system sketched in the previous section consists of Lego bricks
for the multivariate linear statistic $\T$, namely the transformation $g$
and influence function $h$, multiple forms of the test statistic $c$ and
several choices of approximations of the null distribution. In this
section, we will show how classical procedures, starting with the
conditional Kruskal-Wallis test and the Cochran-Mantel-Haenszel test, can
be embedded into this general theory and, much more interestingly from
our point of view, how new conditional test procedures can be constructed
conceptually \textit{and} practically.
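The three bricks for the null distribution map directly onto the
\Rcmd{distribution} argument of the \Rcmd{independence\_test} function
introduced below. As a preview, a minimal sketch with illustrative toy
data (small integer responses keep the exact computation feasible):
\begin{Schunk}
\begin{Sinput}
R> dta <- data.frame(y = c(1, 3, 2, 4, 5, 7, 6, 8), x = gl(2, 4))
R> pvalue(independence_test(y ~ x, data = dta, distribution = exact()))
R> pvalue(independence_test(y ~ x, data = dta,
       distribution = approximate(10000)))
R> pvalue(independence_test(y ~ x, data = dta))  ### asymptotic, the default
\end{Sinput}
\end{Schunk}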
To this end, each inference problem comes with \RR{} code performing the
appropriate conditional test using the \Rpackage{coin} functionality,
which enables the data analyst to benefit from this simple methodology in
typical data analyses. All following analyses are reproducible from the
\Rpackage{coin} package vignette; this document can be accessed via
\begin{Schunk}
\begin{Sinput}
R> vignette("LegoCondInf", package = "coin")
\end{Sinput}
\end{Schunk}
directly in \RR{}.

\paragraph{Independent $K$-Samples: Genetic Components of Alcoholism.}

Various studies have linked alcohol dependence phenotypes to chromosome
4. One candidate gene is \textit{NACP} (non-amyloid component of
plaques), coding for alpha synuclein. \cite{Boenscheta2005} found longer
alleles of \textit{NACP}-REP1 in alcohol-dependent patients compared with
healthy controls and report that the allele lengths show some association
with levels of expressed alpha synuclein mRNA in alcohol-dependent
subjects (see Figure~\ref{alpha-box}). Allele length is measured as a sum
score built from additive dinucleotide repeat length and categorized into
three groups: short ($0-4$, $n = 24$), intermediate ($5-9$, $n = 58$),
and long ($10-12$, $n = 15$).

\setkeys{Gin}{width=0.6\textwidth}
\begin{figure}
\begin{center}
<<fig = TRUE, echo = FALSE>>=
n <- table(alpha$alength)
par(cex.lab = 1.3, cex.axis = 1.3)
boxplot(elevel ~ alength, data = alpha, ylab = "Expression Level",
        xlab = "NACP-REP1 Allele Length", varwidth = TRUE)
axis(3, at = 1:3, labels = paste("n = ", n))
rankif <- function(data) trafo(data, numeric_trafo = rank_trafo)
@
\caption{\Robject{alpha} data: Distribution of levels of expressed alpha
synuclein mRNA in three groups defined by the \textit{NACP}-REP1 allele
lengths. \label{alpha-box}}
\end{center}
\end{figure}

Our first attempt to test for different levels of gene expression in the
three groups is the classical Kruskal-Wallis test. Here, the
transformation $g$ is a dummy coding of the allele length ($g(\X_i) = (0,
1, 0)^\top$ for intermediate length, for example) and the value of the
influence function $h(\Y_i)$ is the rank of $\Y_i$ in $\Y_1, \dots,
\Y_n$. Thus, the linear statistic $\T$ is the vector of rank sums in each
of the three groups and the test statistic is the quadratic form $(\T -
\mu)^\top \Sigma^+ (\T - \mu)$ utilizing the conditional expectation~$\mu$
and covariance matrix~$\Sigma$. For computing $p$-values, the limiting
$\chi^2$ distribution is typically used.

In \RR{}, this specific test is readily implemented in the well
established function \Rcmd{kruskal.test} which takes a symbolic formula
description of the inference problem and a data set containing the actual
observations as its main arguments. Here, the independence of expression
levels (\Robject{elevel}) and allele lengths (\Robject{alength}) is
formulated as \verb|elevel ~ alength|, the associated observations are
available in the data frame \Robject{alpha}:
<<>>=
kruskal.test(elevel ~ alength, data = alpha)
@
Alternatively, the same result can be obtained by embedding the classical
Kruskal-Wallis test into the more general conditional inference framework
implemented in the \Rcmd{independence\_test} function in the
\Rpackage{coin} package.
This also takes a formula and a data frame as its main arguments and
additionally allows for the specification of the transformations $g$ and
$h$ via \Rcmd{xtrafo} and \Rcmd{ytrafo}, respectively, as well as setting
\Rcmd{teststat} to \Rcmd{"maximum"} or \Rcmd{"quadratic"} (for
$c_\text{max}$ or $c_\text{quad}$, respectively) and the
\Rcmd{distribution} to be used. Thus, for computing the Kruskal-Wallis
test, \Rcmd{ytrafo} is set to the function \Rcmd{rank\_trafo} for
computing ranks, \Rcmd{xtrafo} is left at its default, a dummy coding for
categorical variables, \Rcmd{teststat} is set to the \Rcmd{"quadratic"}
type statistic $c_\text{quad}$, and the default asymptotic
\Rcmd{distribution} is applied:
<<>>=
independence_test(elevel ~ alength, data = alpha,
    ytrafo = rank_trafo, teststat = "quadratic")
@
The output gives results equivalent to those reported by
\Rcmd{kruskal.test} above. So what is the advantage of using
\Rcmd{independence\_test}? Going beyond the classical functionality in
\Rcmd{kruskal.test} would require extensive programming but is easily
possible in \Rcmd{independence\_test}. For example, the resampling
distribution instead of the asymptotic distribution could be used by
setting \Rcmd{distribution = approximate()}.

More interestingly, ignoring the ordinal structure of the allele length
is suboptimal, especially when we have an ordered alternative in mind. An
intuitive idea for capturing the ordinal information would be to assign
numeric scores to the allele length categories in the transformation $g$
rather than the dummy codings used above. A natural choice of scores
would be the mid-points of the intervals originally used to categorize
the allele lengths, i.e., $g(\X_i) = 2$ for short ($\in [0, 4]$), $7$ for
intermediate ($\in [5, 9]$) and $11$ for long ($\in [10, 12]$) alleles.
In \RR{}, such a function $g$ is easily implemented as
<<>>=
mpoints <- function(x) c(2, 7, 11)[unlist(x)]
@
which returns an $n$-vector and can then be passed as the \Rcmd{xtrafo}
argument to \Rcmd{independence\_test}:
<<>>=
independence_test(elevel ~ alength, data = alpha,
    ytrafo = rank_trafo, xtrafo = mpoints)
@
This $p$-value emphasizes the impression from Figure~\ref{alpha-box} that
the expression levels increase with increasing allele lengths. Note that
due to the use of scalar transformations $g$ and $h$, the $c_\text{max}$-
and $c_\text{quad}$-type test statistics are equivalent and hence
\Rcmd{teststat} is not set (defaulting to \Rcmd{"maximum"}). Furthermore,
it should be pointed out that a test based on such a numerical
transformation for ordinal variables is equivalent to linear-by-linear
association tests \citep{Agresti2002} for which further convenience
infrastructure is available in the \Rcmd{independence\_test} function via
the \Rcmd{scores} argument.

\paragraph{Contingency Tables: Smoking and Alzheimer's Disease.}

\setkeys{Gin}{width=\textwidth}
<<echo = FALSE, results = hide>>=
total <- nrow(alzheimer)
stopifnot(total == 538)
male <- sum(alzheimer$gender == "Male")
stopifnot(male == 200)
female <- sum(alzheimer$gender == "Female")
stopifnot(female == 338)
disease <- table(alzheimer$disease)
smoked <- sum(alzheimer$smoking != "None")
atab <- xtabs(~ smoking + disease + gender, data = alzheimer)
### there is a discrepancy between Table 1 (32% smokers of 117 women
### suffering from other diagnoses) and Table 4 (63% non-smokers).
### We used the data as given in Table 4.
@

\cite{SalibHillier1997} report results of a case-control study on
Alzheimer's disease and smoking behavior of
$\Sexpr{disease["Alzheimer"]}$ female and male Alzheimer patients and
$\Sexpr{sum(disease[names(disease) != "Alzheimer"])}$ controls. The data
shown in Table~\ref{alzheimertab} have been re-constructed from Table~4
in \cite{SalibHillier1997} and are depicted in Figure~\ref{alz-plot}. The
authors conclude that `cigarette smoking is less frequent in men with
Alzheimer's disease.'

\begin{table}[h]
\begin{center}
\caption{\Robject{alzheimer} data: Smoking and Alzheimer's disease.
\label{alzheimertab}}
\begin{tabular}{lrrrr}
\hline \hline
& \multicolumn{4}{c}{No. of cigarettes daily} \\
& None & <10 & 10--20 & >20 \\ \hline
\textit{Female} & & & & \\
<<echo = FALSE, results = tex>>=
x <- t(atab[,,"Female"])
lines <- paste(paste(dimnames(x)$disease, " & "),
               paste(apply(x, 1, function(l) paste(l, collapse = " & ")),
                     "\\\\"))
for (i in 1:length(lines)) cat(lines[i], "\n")
@
& & & & \\
\textit{Male} & & & & \\
<<echo = FALSE, results = tex>>=
x <- t(atab[,,"Male"])
lines <- paste(paste(dimnames(x)$disease, " & "),
               paste(apply(x, 1, function(l) paste(l, collapse = " & ")),
                     "\\\\"))
for (i in 1:length(lines)) cat(lines[i], "\n")
@
\hline
\end{tabular}
\end{center}
\end{table}

\begin{figure}
\begin{center}
<<fig = TRUE, echo = FALSE>>=
layout(matrix(1:2, ncol = 2))
spineplot(disease ~ smoking, data = alzheimer,
          subset = gender == "Male", main = "Male",
          xlab = "Smoking", ylab = "Disease", tol = 1)
spineplot(disease ~ smoking, data = alzheimer,
          subset = gender == "Female", main = "Female",
          xlab = "Smoking", ylab = "Disease", tol = 1)
@
\caption{\Robject{alzheimer} data: Association of smoking behavior and
disease status stratified by gender. \label{alz-plot}}
\end{center}
\end{figure}

We are interested in assessing whether there is any association between
smoking and Alzheimer's disease (or other dementias) and, in a second
step, how a potential association can be described. First, the global
null hypothesis of independence between smoking behavior and disease
status for both females and males, i.e., treating gender as a block
factor, can be tested with a $c_\text{quad}$-type test statistic, i.e.,
the Cochran-Mantel-Haenszel test:
<<>>=
it_alz <- independence_test(disease ~ smoking | gender,
    data = alzheimer, teststat = "quadratic")
it_alz
@
which suggests that there is a clear deviation from independence. By
default, the influence function $h$ (the \Rcmd{ytrafo} argument) and the
transformation $g$ (the \Rcmd{xtrafo} argument) are dummy codings of the
factors disease status $\Y$ and smoking behavior $\X$, i.e., $h(\Y_i) =
(1, 0, 0)^\top$ and $g(\X_i) = (1, 0, 0, 0)^\top$ for a non-smoking
Alzheimer patient. Consequently, the linear multivariate statistic $\T$
based on $g$ and $h$ is the contingency table of both variables
<<>>=
statistic(it_alz, type = "linear")
@
with conditional expectation \Rcmd{expectation(it\_alz)} and conditional
covariance \Rcmd{covariance(it\_alz)}, which are available for
standardizing the contingency table $\T$. The conditional distribution is
approximated by its limiting $\chi^2$ distribution by default. Given that
there is a significant departure from independence, we further
investigate the structure of association between smoking and Alzheimer's
disease.
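The standardization described in Section~\ref{CI} can also be carried out
by hand from these extractor functions. The following sketch (assuming
that \Rcmd{expectation} and \Rcmd{variance}, the latter extracting the
diagonal of the conditional covariance, return values conformable with
the vectorized linear statistic) reproduces the standardized contingency
table:
\begin{Schunk}
\begin{Sinput}
R> Tlin <- as.vector(statistic(it_alz, type = "linear"))
R> (Tlin - as.vector(expectation(it_alz))) /
       sqrt(as.vector(variance(it_alz)))
\end{Sinput}
\end{Schunk}
The same table, with row and column names attached, is available directly
via \Rcmd{statistic(it\_alz, type = "standardized")}.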
First we assess for which gender the violation of independence occurred,
i.e., perform independence tests for female and male subjects separately
<<>>=
females <- alzheimer$gender == "Female"
males <- alzheimer$gender == "Male"
pvalue(independence_test(disease ~ smoking,
    data = alzheimer, subset = females, teststat = "quadratic"))
pvalue(independence_test(disease ~ smoking,
    data = alzheimer, subset = males, teststat = "quadratic"))
@
where it turns out that the association is due to the male patients only
(see also Figure~\ref{alz-plot}) and we therefore focus on the male
patients in the following.

A standardized contingency table is useful for gaining insight into the
association structure. Thus, a test statistic based on the standardized
linear statistic $\T$ (and thus the standardized contingency table) would
be more useful than a $c_\text{quad}$-type test statistic where the
contributions of all cells are collapsed into a single quadratic form.
Therefore, we choose the maximum of the standardized contingency table as
a $c_\text{max}$-type test statistic via
<<>>=
it_alzmax <- independence_test(disease ~ smoking,
    data = alzheimer, subset = males, teststat = "maximum")
it_alzmax
@
where the underlying standardized contingency table highlights the cells
with deviations from independence
<<>>=
statistic(it_alzmax, type = "standardized")
@
This leads to the impression that heavy smokers suffer less frequently
from Alzheimer's disease but more frequently from other dementias than
expected under independence. However, interpreting the standardized
contingency table requires knowledge about the distribution of the
standardized statistics. An approximation of the joint distribution of
all elements of the standardized contingency table can be obtained from
the $\Sexpr{length(statistic(it_alzmax, type = "standardized"))}$-dimensional
multivariate limiting normal distribution of the linear statistic $\T$.
Most useful is an approximation of the $95\%$ quantile of the permutation
null distribution, which is available from
<<>>=
qperm(it_alzmax, 0.95)
@
Alternatively, and more conveniently, one can switch to $p$-values
adjusted for multiple testing by a single-step max-$T$ multiple testing
approach:
<<>>=
pvalue(it_alzmax, method = "single-step")
@
These results support the conclusion that the rejection of the null
hypothesis of independence is due to a large number of patients with
other dementias and a small number with Alzheimer's disease in the heavy
smoking group. In addition, there is some evidence that, for the small
group of men smoking less than ten cigarettes per day, the reverse
association is true.

\paragraph{Multivariate Response: Photococarcinogenicity Experiments.}

In photococarcinogenicity experiments, where carcinogenic doses of
ultraviolet radiation (UVR) are administered, treatment effects on tumor
frequency and latency are measured by means of (at least) three response
variables: the survival time, the time to first tumor and the total
number of tumors of animals in different treatment groups. The main
interest is testing the global null hypothesis of no treatment effect
with respect to any of the three responses survival time, time to first
tumor or number of tumors \citep[][who additionally analyze the detection
time of tumors; these data are not given here]{Molefeetal2005}. If the
global null hypothesis can be rejected, the deviations from the partial
hypotheses are of special interest.
\cite{Molefeetal2005} report data of an experiment where
$\Sexpr{nrow(photocar)}$ female mice were exposed to different levels of
UVR (group A: $n = 36$ with topical vehicle and 600 Robertson--Berger
units of UVR, group B: $n = 36$ without topical vehicle and 600
Robertson--Berger units of UVR, and group C: $n = 36$ without topical
vehicle and 1200 Robertson--Berger units of UVR). The data are taken from
Tables~1--3 in \cite{Molefeetal2005}, where a parametric test procedure
is proposed. Figure~\ref{photocarfig} depicts the group effects for all
three response variables.

\begin{figure}
\begin{center}
<<fig = TRUE, echo = FALSE>>=
par(cex.lab = 1.3, cex.axis = 1.3)
layout(matrix(1:3, ncol = 3))
plot(survfit(Surv(time, event) ~ group, data = photocar), xmax = 50,
     xlab = "Survival Time (in weeks)", ylab = "Probability", lty = 1:3)
legend("bottomleft", lty = 1:3, levels(photocar$group), bty = "n")
plot(survfit(Surv(dmin, tumor) ~ group, data = photocar), xmax = 50,
     xlab = "Time to First Tumor (in weeks)", ylab = "Probability",
     lty = 1:3)
legend("bottomleft", lty = 1:3, levels(photocar$group), bty = "n")
boxplot(ntumor ~ group, data = photocar, ylab = "Number of Tumors",
        xlab = "Treatment Group", varwidth = TRUE)
@
\caption{\Robject{photocar} data: Kaplan-Meier estimates of time to death
and time to first tumor as well as boxplots of the total number of tumors
in three treatment groups. \label{photocarfig}}
\end{center}
\end{figure}

First, we construct a global test for the null hypothesis of independence
of treatment and \textit{all} three response variables. A
$c_\text{max}$-type test based on the standardized multivariate linear
statistic and an approximation of the conditional distribution utilizing
the asymptotic distribution simply reads
<<>>=
it_ph <- independence_test(Surv(time, event) + Surv(dmin, tumor) +
    ntumor ~ group, data = photocar)
it_ph
@
Here, the influence function $h$ consists of the logrank scores (the
default \Rcmd{ytrafo} argument for censored observations) of the survival
time and time to first tumor as well as the number of tumors, i.e., for
the first animal in the first group
$h(\Y_1) = \Sexpr{paste("(", paste(round(it_ph@statistic@ytrans[1,], 2), collapse = ","), ")")}^\top$
and $g(\X_1) = (1, 0, 0)^\top$. The multivariate linear statistic $\T$ is
the sum of each of the three components of the influence function $h$ in
each of the groups, i.e.,
<<>>=
statistic(it_ph, type = "linear")
@
It is important to note that this global test utilizes the complete
covariance structure $\Sigma$ when $p$-values are computed.
Alternatively, a test statistic based on the quadratic form
$c_\text{quad}$ directly incorporates the covariance matrix and leads to
a very similar $p$-value.
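In \Rpackage{coin}, this alternative amounts to nothing more than setting
\Rcmd{teststat} in the call above; a sketch:
\begin{Schunk}
\begin{Sinput}
R> independence_test(Surv(time, event) + Surv(dmin, tumor) +
       ntumor ~ group, data = photocar, teststat = "quadratic")
\end{Sinput}
\end{Schunk}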
The deviations from the partial null hypotheses, i.e., independence of
each single response and treatment groups, can be inspected by comparing
the standardized linear statistic $\T$ to its critical value
\Sexpr{round(qperm(it_ph, 0.95), digits = 3)} (which can be obtained by
\verb+qperm(it_ph, 0.95)+)
<<>>=
statistic(it_ph, type = "standardized")
@
or again by means of the corresponding adjusted $p$-values
<<echo = TRUE, results = hide>>=
pvalue(it_ph, method = "single-step")
@
<<echo = FALSE>>=
round(pvalue(it_ph, method = "single-step"), 5)
@
%%Of course, the goodness of the asymptotic procedure can be checked against
%%the Monte Carlo approximation which is computed by
%%<<>>=
%%it <- independence_test(Surv(time, event) + Surv(dmin, tumor) + ntumor ~ group,
%%    data = photocar, distribution = approximate(50000))
%%pvalue(it, method = "single-step")
%%@
%%The more powerful step-down multiple testing adjusted $p$-values
%%\citep[Algorithm 2.8 in][]{WestfallYoung1993} are
%%<<>>=
%%pvalue(it, method = "step-down")
%%@
Clearly, the rejection of the global null hypothesis is due to the group
differences in both survival time and time to first tumor, whereas no
treatment effect on the total number of tumors can be observed.

\paragraph{Independent Two-Samples: Contaminated Fish Consumption.}

In the three applications above, pre-fabricated Lego bricks---i.e.,
standard transformations for $g$ and $h$ such as dummy codings, ranks and
logrank scores---have been employed. In the fourth application, we will
show how the Lego system can be used to construct new bricks and
implement a newly invented test procedure. \cite{Rosenbaum1994a} proposed
to compare groups by means of a \textit{coherence criterion} and studied
a data set of subjects who ate contaminated fish for more than three
years in the `exposed' group ($n = 23$) and a control group ($n = 16$).
Three response variables are available: the mercury level of the blood,
the percentage of cells with structural abnormalities and the proportion
of cells with asymmetrical or incomplete-symmetrical chromosome
aberrations (see Figure~\ref{mercurybox}). The coherence criterion
defines a partial ordering: an observation is said to be smaller than
another when all three variables are smaller. The rank score for
observation $i$ is the number of observations that are smaller (following
the partial ordering sketched above) than observation $i$ minus the
number of observations that are larger. The distributions of the rank
scores in the two groups are then compared; the corresponding test,
called the `POSET test' (partially ordered sets test), may be viewed as a
multivariate form of the Wilcoxon-Mann-Whitney test.

\begin{figure}
\begin{center}
<<fig = TRUE, echo = FALSE>>=
par(cex.lab = 1.3, cex.axis = 1.3)
layout(matrix(1:3, ncol = 3))
boxplot(I(log(mercury)) ~ group, data = mercuryfish,
        ylab = "Mercury Blood Level (in logs)", varwidth = TRUE)
boxplot(abnormal ~ group, data = mercuryfish,
        ylab = "Abnormal Cells (in %)", varwidth = TRUE)
boxplot(ccells ~ group, data = mercuryfish,
        ylab = "Chromosome Aberrations (in %)", varwidth = TRUE)
@
\caption{\Robject{mercuryfish} data: Distribution of all three response
variables in the exposed group and control group.
\label{mercurybox}}
\end{center}
\end{figure}

The coherence criterion can be formulated in a simple function by
utilizing column-wise sums of indicator functions applied to all
individuals
<<>>=
coherence <- function(data) {
    x <- t(as.matrix(data))
    apply(x, 2, function(y)
        sum(colSums(x < y) == nrow(x)) - sum(colSums(x > y) == nrow(x)))
}
@
which is now defined as the influence function $h$ via the \Rcmd{ytrafo}
argument
<<>>=
poset <- independence_test(mercury + abnormal + ccells ~ group,
    data = mercuryfish, ytrafo = coherence, distribution = exact())
@
Once the transformations $g$ (the default zero-one coding of the exposed
and control group) and $h$ (the coherence criterion) are defined, we
enjoy the whole functionality of the framework, including an exact
two-sided $p$-value
<<>>=
pvalue(poset)
@
and density (\Rcmd{dperm}), distribution (\Rcmd{pperm}) and quantile
functions (\Rcmd{qperm}) of the conditional distribution. When only a
small number of observations is available, it might be interesting to
compare the exact conditional distribution and its approximation via the
limiting distribution. For the \Robject{mercuryfish} data, the relevant
parts of both distribution functions are shown in Figure~\ref{distplot}.
It turns out that the quality of the normal approximation is excellent
for this particular problem, and using the normal approximation would be
sufficient for all practical purposes in this application.

\begin{figure}
\begin{center}
<<fig = TRUE, echo = FALSE>>=
par(cex.lab = 1.3, cex.axis = 1.1)
ite <- poset
ita <- independence_test(mercury + abnormal + ccells ~ group,
    data = mercuryfish, ytrafo = coherence)
site <- support(ite)
layout(matrix(1:2, ncol = 2))
site <- site[site <= qperm(ite, 0.1) & site > -3]
pite <- sapply(site, function(x) pperm(ite, x))
pita <- sapply(site, function(x) pperm(ita, x))
plot(site, pite, type = "S", ylab = "Probability",
     xlab = "Standardized Statistic")
lines(site, pita, lty = 3)
legend("topleft", lty = c(1,3),
       legend = c("Conditional Distribution", "Approximation"), bty = "n")
site <- support(ite)
site <- site[site >= qperm(ite, 0.9) & site < 3]
pite <- sapply(site, function(x) pperm(ite, x))
pita <- sapply(site, function(x) pperm(ita, x))
plot(site, pite, type = "S", ylab = "Probability",
     xlab = "Standardized Statistic")
lines(site, pita, lty = 3)
@
\caption{\Robject{mercuryfish} data: Conditional distribution and
asymptotic normal approximation for the POSET test. \label{distplot}}
\end{center}
\end{figure}

\section{DISCUSSION}

Conditioning on the observed data is a simple, yet powerful, design
principle for statistical tests. Conceptually, one only needs to choose
an appropriate test statistic and evaluate it for all admissible
permutations of the data \citep[see][for some examples]{Ernst2004}. In
practical setups, an implementation of this two-step procedure requires a
certain amount of programming and computing time. Sometimes, permutation
tests are even regarded as being `computationally impractical' for larger
sample sizes \citep{BalkinMallows2001}.

The permutation test framework by \cite{StrasserWeber1999} helps us to
take a fresh look at conditional inference procedures and makes at least
two important contributions: analytic formulae for the conditional
expectation and covariance, and the limiting normal distribution of a
class of multivariate linear statistics. Thus, test statistics can be
defined for appropriately standardized linear statistics, and a fast
approximation of the conditional distribution is available, especially
for large sample sizes.
It is one mission, if not \textit{the} mission, of statistical computing
to transform new theoretical developments into flexible software tools
for the data analyst. The \Rpackage{coin} package is an attempt to
translate the theoretical concepts of \cite{StrasserWeber1999} into
software tools preserving the simplicity and flexibility of the theory as
closely as possible. With this package, the specialized spanners
currently in use, such as \Rcmd{wilcox.test} for the
Wilcoxon-Mann-Whitney test or \Rcmd{mantelhaen.test} for the
Cochran-Mantel-Haenszel $\chi^2$ test in the \S{} language and
\texttt{NPAR1WAY} for linear rank statistics in \textsf{SAS} as well as
the tools implemented in \textsf{StatXact}, \textsf{LogXact},
\textsf{Stata}, and \textsf{Testimate} \citep[see][for an
overview]{Oster2002,Oster2003}, are extended by
\Rcmd{independence\_test}, a much more flexible and adjustable spanner.

But who stands to benefit from such a software infrastructure? We argue
that an improved data analysis is possible in cases where the appropriate
conditional test is not available from standard software packages.
Statisticians can modify existing test procedures or even try new ideas
by computing directly on the theory. A high-level Lego system is
attractive for both researchers and software developers, because only the
transformation $g$ and influence function $h$ need to be newly defined,
but the burden of implementing a resampling procedure, or even deriving
the limiting distribution of a newly invented test statistic, is waived.
%%Since the \Rpackage{coin} package consists
%%of only a few core functions that need to be tested, the setup of quality
%%assurance tools is rather simple in this case \citep[the need for such tests
%%is obvious,[]{different-:2000}. Many text books
%%\citep[e.g.][]{HollanderWolfe1999} or software manuals \citep[first of all
%%the excellent StatXact handbook by][]{StatXact6} include examples and results
%%of the associated test procedures which have been reproduced with
%%\Rpackage{coin}.
%%Since the \Rpackage{coin} package is part of the Comprehensive \RR{} Archive
%%Network (CRAN, \url{http://CRAN.R-project.org/}) we have been able to help
%%several people asking `Is the xyz-test available in \RR{}' on the
%%\texttt{r-help} email list with the answer `No, but its only whose
%%two lines of \RR{} code'.

With a unifying conceptual framework in mind and a software
implementation, such as \Rpackage{coin}, at hand, we are no longer
limited to already published and implemented permutation test procedures:
we are free to define our own transformations and influence functions, to
choose among several forms of suitable test statistics, and to utilize
several methods for the computation or approximation of the conditional
distribution of the test statistic of interest. Thus, the construction of
an appropriate permutation test, for both classical and new inference
problems, is only a matter of putting together adequate Lego bricks.

%%\section*{ACKNOWLEDGEMENTS}
%%Strasser, Dominikus, editor / AE + referees

\renewcommand{\baselinestretch}{1}
%%\setlength{\bibsep}{2mm}
\bibliographystyle{jss}
\bibliography{LegoCondInf}

\clearpage
\setkeys{Gin}{width=0.9\textheight}
\pagestyle{empty}

\end{document}