--- documentclass: jss author: - name: Anestis Touloumis affiliation: University of Brighton address: | | School of Architecture, Technology and Engineering | University of Brighton | Brighton, BN2 4GJ, UK email: \email{A.Touloumis@brighton.ac.uk} title: formatted: "\\proglang{R} Package \\pkg{multgee}: A Generalized Estimating Equations Solver for Multinomial Responses" # If you use tex in the formatted title, also supply version without plain: "R Package multgee: A Generalized Estimating Equations Solver for Multinomial Responses" # For running headers, if needed short: "\\pkg{multgee}: GEE for Multinomial Responses" abstract: > This vignette of the \proglang{R} package \pkg{multgee} is an updated version of @Touloumis2015, published in the Journal of Statistical Software. To cite \pkg{multgee} in publications, please use @Touloumis2015. To cite the GEE methodology implemented in \pkg{multgee}, please use @Touloumis2012. The \proglang{R} package \pkg{multgee} implements the local odds ratios generalized estimating equations (GEE) approach proposed by @Touloumis2012, a GEE approach for correlated multinomial responses that circumvents theoretical and practical limitations of the GEE method. A main strength of \pkg{multgee} is that it provides GEE routines for both ordinal (\code{ordLORgee}) and nominal (\code{nomLORgee}) responses, while relevant software in \proglang{R} and \proglang{SAS} are restricted to ordinal responses under a marginal cumulative link model specification. In addition, \pkg{multgee} offers a marginal adjacent categories logit model for ordinal responses and a marginal baseline category logit model for nominal. Further, utility functions are available to ease the local odds ratios structure selection (\code{intrinsic.pars}), to constract confidence intervals (\code{confint}), to extract the variance-covariance matrix of the regression parameters (\code{vcov}) and to perform a Wald--type goodness-of-fit test between two nested GEE models (\code{waldts}). We demonstrate the application of \pkg{multgee} through a clinical trial with clustered ordinal multinomial responses. keywords: # at least one keyword must be supplied formatted: [generalized estimating equations, nominal and ordinal multinomial responses, local odds ratios, "\\proglang{R}"] plain: [generalized estimating equations, nominal and ordinal multinomial responses, local odds ratios, R] preamble: > \usepackage{amsmath} \usepackage{amsbsy} \Volume{64} \Issue{8} \Month{March} \Year{2015} \Submitdate{2013-06-08} \Acceptdate{2014-07-24} \DOI{10.18637/jss.v064.i08} output: if (rmarkdown::pandoc_version() >= "2") rticles::jss_article else rmarkdown::html_vignette bibliography: References.bib vignette: > %\VignetteIndexEntry{GEE for Multinomial Responses} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- # Introduction In several studies, the interest lies in drawing inference about the regression parameters of a marginal model for correlated, repeated or clustered multinomial variables with ordinal or nominal response categories while the association structure between the dependent responses is of secondary importance. The lack of a convenient multivariate distribution for multinomial responses and the sensitivity of ordinary maximum likelihood methods to misspecification of the association structure led researchers to modify the GEE method of @Liang1986 in order to account for multinomial responses [@Miller1993; @Lipsitz1994; @Williamson1995; @Lumley1996; @Heagerty1996; @Parsons2006]. These GEE approaches estimate the marginal regression parameter vector by solving the same set of estimating equations as in @Liang1986, but differ in the way they parametrize and/or estimate $\boldsymbol \alpha$, a parameter vector that is usually defined to describe a "working" assumption about the association structure. @Touloumis2012 showed that the joint existence of the estimated marginal regression parameter vector and $\hat{\boldsymbol \alpha}$ cannot be assured in existing approaches. This is because the parametric space of the proposed parameterizations of the association structure depends on the marginal model specification even in the simple case of bivariate multinomial responses. To address this issue, @Touloumis2012 defined $\boldsymbol \alpha$ as a "nuisance" parameter vector that contains the marginalized local odds ratios structure, that is the local odds ratios as if no covariates were recorded, and they employed the family of association models [@Goodman1985] to develop parsimonious and meaningful structures regardless of the response scale. The practical advantage of the local odds ratios GEE approach is that it is applicable to both ordinal and nominal multinomial responses without being restricted by the marginal model specification. Simulations in @Touloumis2012 imply that the local odds ratios GEE approach captures a significant portion of the underlying correlation structure, and compared to the independence "working" model (i.e., assuming no correlation structure in the GEE methodology), simple local odds ratios structures can substantially increase the efficiency gains in estimating the regression vector of the marginal model. Note that low convergence rates for the GEE approach of @Lipsitz1994 and @Heagerty1996 did not allow the authors to compare these approaches with the local odds ratios GEE approach while the GEE approach of @Parsons2006 was excluded from the simulation design because its use is restricted to a cumulative logit marginal model specification. The \proglang{R} [@RCoreTeam2013] package \pkg{multgee} implements the local odds ratios GEE approach and it is available from the Comprehensive \proglang{R} Archive Network at \url{https://CRAN.R-project.org/package=multgee}. To emphasize the importance of reflecting the nature of the response scale on the marginal model specification and on the marginalized local odds ratios structure, two core functions are available in \pkg{multgee}: \code{nomLORgee} which is appropriate for GEE analysis of nominal multinomial responses and \code{ordLORgee} which is appropriate for ordinal multinomial responses. In particular, options for the marginal model specification include a baseline category logit model for nominal response categories and a cumulative link model or an adjacent categories logit model for ordinal response categories. In addition, there are six utility functions that enable the user to: i) Perform goodness-of-fit tests between two nested GEE models (\code{waldts}), ii) construct Wald type confidence intervals (\code{confint}), iii) select the local odds ratios structure based on the rule of thumb discussed in @Touloumis2012 (\code{intrinsic.pars}), iv) obtain the estimated covariance matrix of the parameter estimates (\code{vcov}), v) construct a probability table (to be passed in the core functions) that satisfies a desired local odds ratios structure (\code{matrixLOR}), and vi) obtain commonly used criteria for variables and/or association structure selection in the GEE literature. To appreciate the features of \pkg{multgee}, we briefly review GEE software for multinomial responses in \proglang{SAS} [@SAS] and \proglang{R}. The current version of \proglang{SAS} supports only the independence "working" model under a marginal cumulative probit or logit model for ordinal multinomial responses. To the best of our knowledge, \proglang{SAS} macros [@Williamson1998; @Yu2004] implementing the approach of @Williamson1995 are not publicly available. The \proglang{R} package \pkg{repolr} [@Parsons2013] implements the approach of @Parsons2006 but it is restricted to using a cumulative logit model. Another option for ordinal responses is the function \code{ordgee} in the \proglang{R} package \pkg{geepack} [@Hojsgaard2006]. This function implements the GEE approach of @Heagerty1996 but it seems to produce unreliable results for multinomial responses. To illustrate this, we simulated independent multinomial responses under a cumulative probit model specification with a single time-stationary covariate for each subject and we employed \code{ordgee} to obtain the GEE estimates from the independence "working" model. Description of the generative process can be found in Scenario 1 of @Touloumis2012 except that we used the values $-3,-1,1$ and $3$ for the four category specific intercepts in order to make the problem more evident. Based on $1000$ simulation runs when the sample size $N=500$, we found that the bias of the GEE estimate of $\beta=1$ was $\approx 4.8 \times 10^{28}$, indicating the presence of a bug or -at least- of numerical problems for some situations. Similar problems occurred for the alternative global odds ratios structures in \code{ordgee}. In contrast to existing software, \pkg{multgee} offers greater variety of GEE models for ordinal responses, implements a GEE model for nominal responses and is not limited to the independence "working" model, which might lead to significant efficiency losses. Further, one can assess the goodness of fit for two or more nested GEE models. This paper is organized as follows. In Section [2](#GEENotation), we present the theoretical background of the local odds ratios GEE approach that is necessary for the use of \pkg{multgee}. We introduce the marginal models implemented in \pkg{multgee}, the estimation procedure for the "nuisance" parameter vector $\boldsymbol \alpha$ and the asymptotic theory on which GEE inference is based. We describe the arguments of the core GEE functions (\code{nomLORgee}, \code{ordLORgee}) in Section [3](#Description1) while the utility functions (\code{waldts}, \code{confint}, \code{intrinsic.pars}, \code{vcov}, \code{matrixLOR}, \code{gee_criteria}) are described in Section [4](#Description2). In Section [5](#Example), we illustrate the use of \pkg{multgee} in a longitudinal study with correlated ordinal multinomial responses. We summarize the features of the package and provide a few practical guidelines in Section [6](#Summary). # Local odds ratios GEE approach {#GEENotation} For notational ease, suppose the data arise from a longitudinal study with no missing observations. However, note that the local odds ratios GEE approach is not limited neither to longitudinal studies nor to balanced designs, under the strong assumption that missing observations are missing completely at random [@Rubin1976]. Let $Y_{it}$ be the multinomial response for subject $i$ $(i=1,\ldots,N)$ at time $t$ $(t=1,\ldots,T)$ that takes values in $\{1,2,\ldots,J\}$, $J>2$. Define the response vector for subject $i$ $$\mathbf {Y}_{i}=(Y_{it1},\ldots,Y_{i1(J-1)},Y_{i21},\ldots,Y_{i2(J-1)},\ldots,Y_{iT1},\ldots,Y_{iT(J-1)})^{\top},$$ where $Y_{itj}=1$ if the response for subject $i$ at time $t$ falls at category $j$ and $Y_{itj}=0$ otherwise. Denote by $\mathbf{x}_{it}$ the covariates vector associated with $Y_{it}$, and let $\mathbf x_{i}=(\mathbf x^{\top}_{i1},\ldots,\mathbf x^{\top}_{iT})^{\top}$ be the covariates matrix for subject $i$. Define $\pi_{itj}= \E(Y_{itj}|\mathbf x_i)=\Prob(Y_{itj}=1| \mathbf x_i)=\Prob(Y_{it}=j| \mathbf x_i)$ as the probability of the response category $j$ for subject $i$ time $t$, and let $\boldsymbol \pi_{i}=(\boldsymbol \pi^{\top}_{i1},\ldots,\boldsymbol \pi^{\top}_{iT})^{\top}$ be the mean vector of $\mathbf Y_i$, where $\boldsymbol{\pi}_{it} = (\pi_{it1},\ldots,\pi_{it(J-1)})^{\top}$. It follows from the above that $Y_{itJ}=1-\sum_{j=1}^{J-1} Y_{itj}$ and $\pi_{itJ}=1-\sum_{j=1}^{J-1} \pi_{itj}$. ## Marginal models for correlated multinomial responses The choice of the marginal model depends on the nature of the response scale. For ordinal multinomial responses, the family of cumulative link models \begin{equation} F^{-1}\left[\Prob(Y_{it}\leq j|\mathbf x_i)\right]=\beta_{j0}+ {\boldsymbol \beta}_{\ast}^{\top} \mathbf{x}_{it} \label{ABMCLM} \end{equation} or the adjacent categories logit model \begin{equation} \log\left(\frac{\pi_{itj}}{\pi_{it(j+1)}} \right)=\beta_{j0}+ {\boldsymbol \beta}_{\ast}^{\top} \mathbf{x}_{it} \label{ABMACLM} \end{equation} can be used, where $F$ is the cumulative distribution function of a continuous distribution and $\{\beta_{j0}:j=1,\ldots,J-1\}$ are the category specific intercepts. For nominal multinomial responses, the baseline category logit model \begin{equation} \log\left(\frac{\pi_{itj}}{\pi_{itJ}}\right)=\beta_{j0}+{\boldsymbol {\beta}}_{j}^{\top} \mathbf{x}_{it} \label{ABMBCLM} \end{equation} can be used, where $\boldsymbol {\beta}_{j}$ is the $j$-th category specific parameter vector. It is worth mentioning that the linear predictor differs in the above marginal models. First, the category specific intercepts need to satisfy a monotonicity condition $\beta_{10}\leq\beta_{20}\leq \ldots \leq \beta_{(J-1)0}$ only when the family of cumulative link models in (\ref{ABMCLM}) is employed. Second, the regression parameter coefficients of the covariates $\mathbf x_{it}$ are category specific only in the marginal baseline category logit model (\ref{ABMBCLM}) and not in the ordinal marginal models (\ref{ABMCLM}) and (\ref{ABMACLM}). ## Estimation of the marginal regression parameter vector To unify the notation, let $\boldsymbol \beta$ be the $p$--variate parameter vector that includes all the regression parameters in (\ref{ABMCLM}), (\ref{ABMACLM}) or (\ref{ABMBCLM}). To obtain $\boldsymbol {\widehat \beta_G}$, a GEE estimator of $\boldsymbol \beta$, @Touloumis2012 solved the estimating equations \begin{equation} \mathbf{U}(\boldsymbol \beta,\widehat{\boldsymbol \alpha})=\frac{1}{N}\sum_{i=1}^N \mathbf{D}_i \mathbf V^{-1}_{i} (\mathbf {Y}_i-\boldsymbol{\pi}_i)=\mathbf{0} \label{EEbeta} \end{equation} where $\mathbf{D}_i=\partial \boldsymbol{\pi}_i/\partial \boldsymbol{\beta}$ and $\mathbf V_i$ is a $T(J-1) \times T(J-1)$ "weight" matrix that depends on $\boldsymbol \beta$ and on $\widehat{\boldsymbol \alpha}$, an estimate of the "nuisance" parameter vector $\boldsymbol \alpha$ defined formally in Section [2.3](#Alpha). Succinctly, $\mathbf V_i$ is a block matrix that mimics the form of $\COV(\mathbf{Y}_i|\mathbf x_i)$, the true covariance matrix for subject $i$. The $t$-th diagonal matrix of $\mathbf V_i$ is the covariance matrix of $Y_{it}$ determined by the marginal model. The $(t,t^{\prime})$-th off-diagonal block matrix describes the marginal pseudo-association of $(Y_{it},Y_{it^{\prime}})$, which is a function of the marginal model and of the pseudo-probabilities $\{\Prob(Y_{it}=j,Y_{it^{\prime}}=j^{\prime}|\mathbf x_i):j,j^{\prime}=1,\ldots,J-1\}$ calculated based on $(\widehat{\boldsymbol \alpha},\boldsymbol \beta)$. We should emphasize that $\mathbf V_i$ is a "weight" matrix because $\boldsymbol \alpha$ is defined as a "nuisance" parameter vector and it is unlikely to describe a valid "working" assumption about the association structure for all subjects. ## Estimation of the nuisance parameter vector and of the weight matrix {#Alpha} Order the $L=T(T-1)/2$ time-pairs with the rightmost element of the pair most rapidly varying as $(1,2),(1,3),\ldots,(T-1,T)$, and let $G$ be the group variable with levels the $L$ ordered pairs. For each time-pair $(t,t^{\prime})$, ignore the covariates and cross-classify the responses across subjects to form an $J \times J$ contingency table such that the row totals correspond to the observed totals at time $t$ and the column totals to the observed totals at time $t^{\prime}$, and let $\theta_{tjt^{\prime}j^{\prime}}$ be the local odds ratio at the cutpoint $(j,j^{\prime})$ based on the expected frequencies $\{f_{tjt^{\prime}j^{\prime}}:j,j^{\prime}=1,\ldots,J\}$. For notational reasons, let $A$ and $B$ be the row and column variable respectively. Assuming a Poisson sampling scheme to the $L$ sets of $J \times J$ contingency tables, fit the RC-G(1) type model [@Becker1989a] \begin{equation} \log f_{tjt^{\prime}j^{\prime}}=\lambda+\lambda^{A}_{j}+\lambda^{B}_{j^{\prime}}+\lambda^{G}_{(t,t^{\prime})}+\lambda^{AG}_{j(t,t^{\prime})}+\lambda^{BG}_{j^{\prime}(t,t^{\prime})}+\phi^{(t,t^{\prime})}\mu^{(t,t^{\prime})}_j \mu^{(t,t^{\prime})}_{j^{\prime}}, \label{RCGmodel} \end{equation} where $\{\mu^{(t,t^{\prime})}_{j}:j=1,\ldots,J\}$ are the score parameters for the $J$ response categories at the time-pair $(t,t^{\prime})$. After imposing identifiability constraints on the regression parameters in (\ref{RCGmodel}), the log local odds ratios structure is given by \begin{equation} \log \theta_{tjt^{\prime}j^{\prime}}=\phi^{(t,t^{\prime})}\left(\mu^{(t,t^{\prime})}_{j}-\mu^{(t,t^{\prime})}_{j+1}\right)\left(\mu^{(t,t^{\prime})}_{j^{\prime}}-\mu^{(t,t^{\prime})}_{j^{\prime}+1}\right). \label{RCstructure2} \end{equation} At each time-pair, (\ref{RCstructure2}) summarizes the local odds ratios structure in terms of the $J$ score parameters and the intrinsic parameter $\phi^{(t,t^{\prime})}$ that measures the average association of the marginalized contingency table. Since the score parameters do not need to be fixed or monotonic, the local odds ratios structure is applicable to both nominal and ordinal multinomial responses. @Touloumis2012 defined $\boldsymbol \alpha$ as the parameter vector that contains the marginalized local odds ratios structure $$\alpha=\left(\theta_{1121},\ldots,\theta_{1(J-1)2(J-1)},\ldots,\theta_{(T-1)1T1},\ldots,\theta_{(T-1)(J-1)T(J-1)}\right)^{\top}$$ where $\theta_{tjt^{\prime}j^{\prime}}$ satisfy (\ref{RCstructure2}). To increase the parsimony of the local odds ratios structures for ordinal responses, they proposed to use common unit-spaced score parameters $\left(\mu^{(t,t^{\prime})}_{j}=j\right)$ and/or common intrinsic parameters $\left(\phi^{(t,t^{\prime})}=\phi\right)$ across time-pairs. For a nominal response scale, they proposed to apply a homogeneity constraint on the score parameters $\left(\mu^{(t,t^{\prime})}_{j}=\mu_{j}\right)$ and use common intrinsic parameters across time-pairs. To estimate $\boldsymbol \alpha$ maximum likelihood methods are involved by treating the $L$ marginalized contingency tables as independent. Technical details and justification about this estimation procedure can be found in @Touloumis2011a and @Touloumis2012. Conditional on the estimated marginalized local odds ratios structure $\widehat{\boldsymbol \alpha}$ and the marginal model specification at times $t$ and $t^{\prime}$, $\{\Prob(Y_{it}=j,Y_{it^{\prime}}=j^{\prime}|\mathbf x_i):t2$ observed response categories are sorted in an ascending order and then mapped onto $\{1,2,\ldots,J\}$. To account for a covariate \code{x} with a constrained parameter coefficient fixed to 1 in the linear predictor, the term \code{offset(x)} must be inserted on the right hand side of \code{formula}. ## Data representation The \code{id} argument identifies the $N$ subjects by assigning a unique label to each subject. If required, the observed \code{id} labels are sorted in an ascending order and then relabeled as $1,\ldots,N$, respectively. The \code{repeated} argument identifies the times at which the multinomial responses are recorded by treating the $T$ unique observed times in the same manner as in \code{id}. The purpose of \code{repeated} is dual: To identify the $T$ distinct time points and to construct the full marginalized contingency table for each time-pair by aggregating the relevant/available responses across subjects. The \code{repeated} argument is optional and it can be safely ignored in balanced designs or in unbalanced designs in which if the $t$-th response is missing for a particular subject then all subsequent responses at times $t^{\prime}>t$ are missing for that subject. Otherwise, it is recommended to provide the \code{repeated} argument in order to ensure proper construction of the full marginalized contingency table. To this end, note that if the measurement occasions are not recorded in a numerical mode, then the user should create \code{repeated} by mapping the $T$ distinct measurement occasions onto the set $\{1,\ldots,T\}$ in such a way that the temporal order of the measurement occasions is preserved. For example, if the measurements occasions are recorded as "before", "baseline", "after", then the levels for \code{repeated} should be coded as $1,2$ and $3$, respectively. The dataset is imported via the \code{data} argument in "long" format, meaning that each row contains all the information provided by a subject at a given measurement occasion. This implies that \code{data} must include the variables specified in the mandatory arguments \code{formula} and \code{id}, as well as the optional argument \code{repeated} when this is specified by the user. If no \code{data} is provided then the above variables are extracted from the \code{environment} that \code{nomLORgee} and \code{ordLORgee} are called. Currently missing observations, identified by \code{NA} in \code{data}, are ignored. ## Marginalized local odds ratios structure specification The marginalized local odds ratios is specified via the \code{LORstr} argument. Table \ref{tab:LOR} displays the structures proposed by @Touloumis2012. Currently the default option is the time exchangeability structure (\code{"time.exch"}) in \code{nomLORgee} and the category exchangeability (\code{"category.exch"}) structure in \code{ordLORgee}. The uniform (\code{"uniform"}) and category exchangeability structures are not allowed in \code{nomLORgee} because given unit-spaced parameter scores are not meaningful for nominal response categories. The user can also fit the independence "working" model (\code{LORstr = "independence"}) or even provide the local odds ratios structure (\code{LORstr = "fixed"}) using the \code{LORterm} argument. In this case, an $L \times J^2$ matrix must be constructed such that the $g$-th row contains the vectorized form of a probability table that satisfies the desired local odds ratios structure at the time-pair corresponding to the $g$-th level of $G$. @Touloumis2011a discussed two further versions of the \code{"time.exch"} and the RC (\code{"RC"}) structures based on using: i) Heterogeneous score parameters (\code{homogeneous = FALSE}) at each time-pair, and/or ii) monotone score parameters (\code{restricted = TRUE}), an option applicable only for ordinal response categories. However, it is sensible to employ these additional options only when the local odds ratios structures in Table \ref{tab:LOR} do not seem adequate. It is important to mention that the user must provide only the arguments required for the specified local odds ratios structure. For example, the arguments \code{homogeneous}, \code{restricted} and \code{LORterm} are ignored when \code{LORstr = "uniform"}. \begin{table} \centering \begin{tabular}{cccccl} \hline \hline $\log \theta_{tjt^{\prime}j^{\prime}}$ & \code{LORstr} & Functions & Parameters \\ \hline $\phi$ & \code{"uniform"} & \code{ordLORgee} & 1\\ $\phi^{(t,t^{\prime})}$ &\code{"category.exch"} & \code{ordLORgee} &L\\ $\phi \left(\mu_{j}-\mu_{j+1}\right)\left(\mu_{j^{\prime}}-\mu_{j^{\prime}+1}\right)$ & \code{"time.exch"} & Both & $J-1$ \\ $\phi^{(t,t^{\prime})}\left(\mu^{(t,t^{\prime})}_{j}-\mu^{(t,t^{\prime})}_{j+1}\right)\left(\mu^{(t,t^{\prime})}_{j^{\prime}}-\mu^{(t,t^{\prime})}_{j^{\prime}+1}\right)$ & \code{"RC"} & Both & $L(J-1)$ \\ \hline \end{tabular} \caption{The main options for the marginalized local odds ratios structures in \pkg{multgee}.} \label{tab:LOR} \end{table} ## Computational details The default estimation procedure for the marginalized local odds ratios structure is to fit model (\ref{RCGmodel}) to the full marginalized contingency table (\code{LORem = "3way"}) after imposing the desired restrictions on the intrinsic and the score parameters. @Touloumis2011a noticed that the estimated local odds ratios structure under model (\ref{RCGmodel}) is identical to that obtained by fitting independently a row and columns (RC) effect model [@Goodman1985] with homogeneous score parameters to each of the $L$ contingency tables. Motivated by this, an alternative estimation procedure (\code{LORem = "2way"}) for estimating the structures \code{"uniform"} and \code{"time.exch"} was proposed. In particular, one can estimate the single parameter of the \code{"uniform"} structure as the average of the $L$ intrinsic parameters $\phi^{(t,t^{\prime})}$ obtained by fitting the linear-by-linear association model [@Agresti2002] independently to each of the $L$ marginalized contingency tables. For the \code{"time.exch"} structure, one can fit $L$ RC effects models with homogeneous (\code{homogeneous = TRUE})/heterogeneous (\code{homogeneous = FALSE}) score parameters and then estimate the log local odds ratio at each cutpoint $(j,j^{\prime})$ by averaging $\log \hat{\theta}_{tjt^{\prime}j^{\prime}}$ for $t