\documentclass[a4paper]{article} \usepackage[round]{natbib} \usepackage{amssymb} %% \usepackage{RJournal} \usepackage{fancyvrb} \usepackage{Sweave, url} \usepackage{hyperref} \DefineVerbatimEnvironment{Sinput}{Verbatim}{fontsize=\small,fontshape=sl} \DefineVerbatimEnvironment{Soutput}{Verbatim}{fontsize=\small} \DefineVerbatimEnvironment{Scode}{Verbatim}{fontsize=\small,fontshape=sl} \SweaveOpts{echo = TRUE, results = verbatim, keep.source = TRUE} \bibliographystyle{abbrvnat} \begin{document} \title{Performance Attribution for Equity Portfolios} \author{Yang Lu\thanks{\href{mailto:yang.lu2014@gmail.com}{yang.lu2014@gmail.com}} and David Kane\thanks{\href{mailto:dave.kane@gmail.com}{dave.kane@gmail.com}}} %%\VignetteIndexEntry{Using the pa package} %%\VignetteDepends{pa} \maketitle \setkeys{Gin}{width=0.95\textwidth} \section{Introduction} Many portfolio managers measure performance with reference to a benchmark. The difference in return between a portfolio and its benchmark is the active return of the portfolio. Portfolio managers and their clients want to know what caused this active return. Performance attribution decomposes the active return. The two most common approaches are the Brinson-Hood-Beebower (hereafter referred to as the Brinson model) and a regression-based analysis.\footnote{See \cite{brinson:gary} and \cite{jpmreg} for more information.} Portfolio managers use different variations of the two models to assess the performance of their portfolios. Managers of fixed income portfolios include yield-curve movements in the model \cite{lord} while equity managers who focus on the effect of currency movements use variations of the Brinson model to incorporate ``local risk premium'' \cite{localrisk}. In contrast, in this paper we focus on attribution models for equity portfolios without considering any currency effect. The \textbf{pa} package provides tools for conducting both methods for equity portfolios. 
The Brinson model takes an ANOVA-type approach and decomposes the active return of any portfolio into asset allocation, stock selection, and interaction effects. The regression-based analysis utilizes estimated coefficients from a linear model to attribute active return to different factors. After describing the Brinson and regression approaches and demonstrating their use via the \texttt{pa} package, we show that the Brinson model is just a special case of the regression approach. \section{Data} We demonstrate the use of the \textbf{pa} package with a series of examples based on real-world data sets from MSCI Barra's Global Equity Model II(GEM2).\footnote{See www.msci.com and \cite{gem2} for more information.} MSCI Barra is a leading provider of investment decision support tools to investment institutions worldwide. According to the company: % Note that the font size for this quote small and then go % back to normal. \small \begin{quote} GEM2 is the latest Barra global multi-factor equity model. It provides a foundation for investment decision support tools via a broad range of insightful analytics for developed and emerging market portfolios. The latest model version provides: \begin{itemize} \item Improved accuracy of risk forecasts and increased explanatory power. \item An intuitive structure that accommodates different investment processes in developed vs. emerging markets. \item Greater responsiveness to market dynamics. \item Comprehensive market coverage. \end{itemize} GEM2 leverages the decades of experience that MSCI Barra has in developing and maintaining global equity multi-factor models and indices, and offers important enhancements over GEM, which is utilized by hundreds of institutional fund managers worldwide. \end{quote} \normalsize The original data set contains selected attributes such as industry, size, country, and various style factors for a universe of approximately 48,000 securities on a monthly basis. 
For illustrative purposes, this article uses three modified versions of the original data set, containing 3000 securities, namely \texttt{year}, \texttt{quarter}, and \texttt{jan}. The data frame, \texttt{quarter}, is a subset of \texttt{year}, containing the data of the first quarter. The data frame, \texttt{jan}, is a subset of \texttt{quarter} with the data from January, 2010. <>= options(width = 50, digits = 2, scipen = 5) library(pa) @ <<>>= data(year) names(year) @ \begin{itemize} \item barrid: security identifier by Barra. \item name: name of a security. \item return: monthly total return in trading currency. \item date: the starting date of the month to which the data belong. \item sector: consolidated sector categories based on the GICS.\footnote{Global Industry Classification Standard} \item momentum: capture sustained relative performance. \item value: capture the extent to which a stock is priced inexpensively in the market. \item size: differentiate between large and small cap companies. \item growth: capture stock's growth prospects. \item cap.usd: capitalization in model base currency USD. \item yield: dividend of a security. \item country: the country in which the company is traded. \item currency: currency of exposure. \item portfolio: top 200 securities based on \texttt{value} scores in January are selected as portfolio holdings and are held through December 2010. This is to avoid the complexity of trading in the analyses. \item benchmark: top 1000 securities based on size each month. The benchmark is cap-weighted. 
\end{itemize} Here is a sample of rows and columns from the data frame \texttt{year}: <>= sample.mat <- rbind(which(year$portfolio == 0 & year$benchmark == 0)[2], which(year$portfolio == 0 & year$benchmark > 0)[2], which(year$portfolio > 0 & year$benchmark == 0)[2], which(year$return > 0 & year$portfolio > 0 & year$benchmark > 0)[500], which(year$barrid == "CANAITC")[11]) year[sample.mat, c(-1, -6, -7, -9: -11, -13) ] @ The portfolio has 200 equal-weighted holdings. The row for Canadian Imperial Bank of Commerce indicates that it is one of the 200 portfolio holdings with a weight of 0.5\% in 2010. Its return was 2.61\% in August, and almost flat in November. \section{The Brinson Model} \subsection{Single-Period Brinson Model} Consider an equity portfolio manager who uses the S\&P 500 as the benchmark. In a given month, she outperformed the S\&P by 3\%. Part of that performance was due to the fact that she allocated more weight of the portfolio to certain sectors that performed well. Call this the \emph{allocation effect}. Part of her outperformance was due to the fact that some of the stocks she selected did better than their sector as a whole. Call this the \emph{selection effect}. The residual can then be attributed to an interaction between allocation and selection -- the \emph{interaction effect}. The Brinson model provides mathematical definitions for these terms and methods for calculating them. The example above uses sector as the classification scheme when calculating the allocation effect. But the same approach can work with any other variable which places each security into one, and only one, discrete category: country, industry, and so on. In fact, a similar approach can work with continuous variables that are split into discrete ranges: the highest quintile of market cap, the second highest quintile and so forth. 
For generality, we will use the term ``category'' to describe any classification scheme which places each security in one, and only one, category.

Notation:
\begin{itemize}
\item $w^B_i$ is the weight of security $i$ in the benchmark.
\item $w^P_i$ is the weight of security $i$ in the portfolio.
\item $W^B_j$ is the weight of category $j$ in the benchmark. $W^B_j = \sum\limits_{i \in j} w^B_i$.
\item $W^P_j$ is the weight of category $j$ in the portfolio. $W^P_j = \sum\limits_{i \in j} w^P_i$.
\item Each set of weights, $w^B_i$, $w^P_i$, $W^B_j$, and $W^P_j$, sums to 1.
\item $r_i$ is the return of security $i$.
\item $R^B_j$ is the return of category $j$ in the benchmark, that is, the weighted average return of its constituents. $R^B_j = \sum\limits_{i \in j} w^B_ir_i / W^B_j$.
\item $R^P_j$ is the return of category $j$ in the portfolio. $R^P_j = \sum\limits_{i \in j} w^P_ir_i / W^P_j$.
\end{itemize}

The return of a portfolio, $R_P$, can be calculated in two ways:
\begin{itemize}
\item On an individual security level by summing over $n$ stocks: $R_P = \sum\limits_{i = 1}^n w^P_ir_i$.
\item On a category level by summing over $N$ categories: $R_P = \sum\limits_{j = 1}^N W^P_jR^P_j$.
\end{itemize}

Similar definitions apply to the return of the benchmark, $R_B$:
\begin{itemize}
\item $R_B = \sum\limits_{i = 1}^n w^B_ir_i$.
\item $R_B = \sum\limits_{j = 1}^N W^B_jR^B_j$.
\end{itemize}

The active return of a portfolio, $R_{active}$, measures the performance of a portfolio relative to its benchmark. The two conventional measures of active return are arithmetic and geometric. The \textbf{pa} package implements the arithmetic measure of the active return for a single-period Brinson model because an arithmetic difference is more intuitive than a ratio over a single period. The arithmetic active return of a portfolio, $R_{active}$, is the portfolio return $R_P$ less the benchmark return $R_B$:
\begin{center}
$R_{active} = R_P - R_B$.
\end{center}

Since the category allocation of the portfolio is generally different from that of the benchmark, allocation plays a role in the active return, $R_{active}$. The same applies to stock selection: even if the portfolio had exactly the same category exposures as the benchmark, the securities held within each category would generally differ. This contributes to $R_{active}$ as well. The allocation effect $R_{allocation}$ and the selection effect $R_{selection}$ over $N$ categories are defined as:
\begin{center}
$R_{allocation} = \sum\limits_{j = 1}^N W^P_jR^B_j - \sum\limits_{j = 1}^N W^B_jR^B_j$,
\end{center}
\begin{center}
$R_{selection} = \sum\limits_{j = 1}^N W^B_jR^P_j - \sum\limits_{j = 1}^N W^B_jR^B_j$.
\end{center}

The intuition behind the allocation effect is that a portfolio would produce different returns with different allocation schemes ($W^P_j$ vs. $W^B_j$) while having the same stock selection and thus the same return ($R^B_j$) for each category. The difference between the two returns, caused by the allocation scheme, is called the allocation effect ($R_{allocation}$). Similarly, two different returns can be produced when two portfolios have the same allocation ($W^B_j$) yet dissimilar returns due to differences in stock selection within each category ($R^P_j$ vs. $R^B_j$). This difference is the selection effect ($R_{selection}$).

The interaction effect, $R_{interaction}$, is what remains after subtracting the return due to allocation $R_{allocation}$ and the return due to selection $R_{selection}$ from the active return $R_{active}$:
\begin{center}
$R_{interaction} = R_{active} - R_{allocation} - R_{selection}$.
\end{center}

\subsection{Weakness of the Brinson Model}

The Brinson model allows portfolio managers to analyze the relative return of a portfolio using any attribute of a security, such as country or sector.
One weakness of the model is that it is difficult to expand the analysis beyond two categories.\footnote{\citet{garytwo} proposed a framework to include two variables in the Brinson analysis.} As the number of categories increases, this procedure is subject to the \textit{curse of dimensionality}.

Suppose an equity portfolio manager wants to find out the contributions of any two categories (for instance, country and sector) to her portfolio based on the Brinson model. She can decompose the active return into three broad terms -- $R_{allocation}$, $R_{selection}$, and $R_{interaction}$.

The allocation effect can be further split into the country allocation effect, the sector allocation effect, and the product of the country and sector allocation effects:
\begin{center}
$R_{allocation} = R_{country \, allocation} + R_{sector \, allocation} + R_{country \, allocation}R_{sector \, allocation}$.
\end{center}

Specifically, the country allocation effect is the return caused by the difference between the actual country allocation and the benchmark country allocation while assuming the same benchmark return within each level of the category country, that is,
\begin{center}
$R_{country \, allocation} = \sum\limits_{j = 1}^{N_C} {}_CW^P_j{}_CR^B_j - \sum\limits_{j = 1}^{N_C} {}_CW^B_j{}_CR^B_j$,
\end{center}
where
\begin{itemize}
\item ${}_CW^P_j$ and ${}_CW^B_j$ refer to the weight of each country $j$ ($N_C$ countries in total) in the portfolio and that in the benchmark, respectively.
\item ${}_CR^B_j$ refers to the benchmark return of any country $j$.
\end{itemize}

Similarly, the sector allocation effect is the difference in return between a portfolio's sector allocation and the benchmark's sector allocation while having the same benchmark returns:
\begin{center}
$R_{sector \, allocation} = \sum\limits_{j = 1}^{N_S} {}_SW^P_j{}_SR^B_j - \sum\limits_{j = 1}^{N_S} {}_SW^B_j{}_SR^B_j$,
\end{center}
where ${}_SW^P_j$ and ${}_SW^B_j$ refer to the weight of sector $j$ in the portfolio and the weight of sector $j$ in the benchmark, respectively, and ${}_SR^B_j$ is the benchmark return of any given sector $j$ of all $N_S$ sectors.

In the same vein, the return as a result of the selection effect $R_{selection}$ is the sum of the country selection effect, the sector selection effect, and the product of the country and sector selection effects:
\begin{eqnarray*}
R_{selection} & = & R_{country \, selection} + R_{sector \, selection} \\
& & + R_{country \, selection} \times R_{sector \, selection}\\
& = & \sum\limits_{j = 1}^{N_C} {}_CW^B_j{}_CR^P_j - \sum\limits_{j = 1}^{N_C} {}_CW^B_j{}_CR^B_j \\
& + & \sum\limits_{j = 1}^{N_S} {}_SW^B_j{}_SR^P_j - \sum\limits_{j = 1}^{N_S} {}_SW^B_j{}_SR^B_j \\
& + & (\sum\limits_{j = 1}^{N_C} {}_CW^B_j{}_CR^P_j - \sum\limits_{j = 1}^{N_C} {}_CW^B_j{}_CR^B_j) \\
& \times & (\sum\limits_{j = 1}^{N_S} {}_SW^B_j{}_SR^P_j - \sum\limits_{j = 1}^{N_S} {}_SW^B_j{}_SR^B_j).
\end{eqnarray*}

The interaction effect, $R_{interaction}$, includes the interaction between country allocation and sector selection and that between country selection and sector allocation.
Therefore, with $Q > 1$ categories, the Brinson model becomes very complex. For $Q \ge 3$:
\begin{eqnarray*}
R_{allocation} & = & \sum\limits_{j = 1}^Q R_{allocation_j} + \sum\limits_{j = 1}^Q\sum\limits_{k = 1}^Q R_{allocation_j}R_{allocation_k} \\
& + & \sum\limits_{j = 1}^Q\sum\limits_{k = 1}^Q\sum\limits_{p = 1}^Q R_{allocation_j}R_{allocation_k}R_{allocation_p} \\
& = & \dots,
\end{eqnarray*}
\begin{eqnarray*}
R_{selection} & = & \sum\limits_{j = 1}^Q R_{selection_j} + \sum\limits_{j = 1}^Q\sum\limits_{k = 1}^Q R_{selection_j}R_{selection_k} \\
& + & \sum\limits_{j = 1}^Q \sum\limits_{k = 1}^Q\sum\limits_{p = 1}^Q R_{selection_j}R_{selection_k}R_{selection_p} \\
& = & \dots,
\end{eqnarray*}
where $R_{allocation_j}$ is the allocation effect of any given category $j$, $j = 1, \dots, Q$, and $R_{selection_j}$ is the selection effect of any given category $j$. The indices $j$, $k$, and $p$ take distinct values.

As the number of categories grows, the numbers of terms for the allocation and the selection effects grow exponentially. $Q$ categories will result in $2^Q - 1$ terms for each of the allocation and selection effects.

Due to the interaction between allocation and selection of each of the $Q$ categories (it could be an interaction between 2, 3, or even all $Q$ categories), the number of terms included in the interaction effect also grows exponentially, since it has to account for all the interaction effects among all categories.
\begin{eqnarray*}
R_{interaction} & = & \sum\limits_{j = 1}^Q \sum\limits_{k = 1}^Q R_{allocation_j}R_{selection_k} \\
& + & \sum\limits_{j = 1}^Q \sum\limits_{k = 1}^Q \sum\limits_{p = 1}^Q R_{allocation_j}R_{selection_k}R_{allocation_p} \\
& + & \dots.
\end{eqnarray*}

%% In the case of $Q$ categories where $Q > 1$,
%% there are $3^Q - 2Q - 1$ terms for the interaction effect between any
%% of the two categories.
%% The interaction effect can be written as,
%% Another way to think about a multivariate Brinson model is to
%% visualize the effects in cells. Three different characteristics can be
%% assigned to each of the $Q$ categories. They are allocation effect,
%% selection effect and \textit{no effect}. The allocation and selection
%% effects are the same as those in the univariate Brinson model. No
%% effect means a category has no effect on itself or other categories
%% when calculating overall allocation, selection, and interaction
%% effects. Thus, $Q$ categories can break the portfolio into $3^Q$
%% cells. Each cell has a combination of allocation, selection or no
%% effect for each of the $Q$ categories. In the case of 3 categories, if
%% a cell consists of the allocation effect of category $i$, no effect of
%% category $j$, and no effect of category $k$, it represents the
%% allocation effect of category $i$. Should the cell include the
%% allocation effect of category $i$, the selection effect of category
%% $j$, and no effect of category $k$, the cell represents the
%% interaction between the allocation effect of category $i$ and the
%% selection effect of category $j$. To calculate the allocation effect,
%% one can sum up all the cells which contain one of the allocation
%% effects from the $Q$ categories and no effect from other
%% categories. The selection effect is the sum of the returns in all the
%% cells which have one of the selection effects from the $Q$ categories
%% and no effect from other categories. The interaction effect is the sum
%% of the returns in all cells which have more than one allocation or
%% selection effect.

With $Q$ categories, the interaction effect has $2^{2Q} - 2^{Q + 1} + 1$ terms. This equals $(2^Q - 1)^2$, that is, one term for each pairing of an allocation term with a selection term. For instance, when there are 3 categories, the allocation effect and the selection effect each have $2^3 - 1 = 7$ terms. The interaction effect has $2^6 - 2^4 + 1 = 49$ terms.
When there are 4 categories, $2^4 - 1 = 15$ terms have to be estimated for each of the allocation and selection effects, and $2^8 - 2^5 + 1 = 225$ terms have to be calculated for the interaction effect. This poses a significant computational challenge when a portfolio manager performs a multivariate Brinson analysis. To some extent, the regression-based model detailed later solves the problem of multivariate attribution.

\subsection{Single-Period Brinson Tools}

Brinson analysis is run by calling the function \texttt{brinson} to produce an object of class \texttt{brinson}. Below we show the tools provided in the \textbf{pa} package to analyze a single-period portfolio based on the Brinson model.

<<>>=
data(jan)
br.single <- brinson(x = jan, date.var = "date",
                     cat.var = "sector",
                     bench.weight = "benchmark",
                     portfolio.weight = "portfolio",
                     ret.var = "return")
@

The data frame, \texttt{jan}, contains all the information necessary to conduct a single-period Brinson analysis. \texttt{date.var}, \texttt{cat.var}, and \texttt{ret.var} identify the columns containing the date, the factor to be analyzed, and the return variable, respectively. \texttt{bench.weight} and \texttt{portfolio.weight} specify the name of the benchmark weight column and that of the portfolio weight column in the data frame. Calling \texttt{summary} on the resulting object \texttt{br.single} of class \texttt{brinson} reports essential information about the input portfolio (including the number of securities in the portfolio and the benchmark as well as sector exposures) and the results of the Brinson analysis.

<<>>=
summary(br.single)
@

The \texttt{br.single} summary shows that the active return of the portfolio in January, 2010 was 1.47\%. This return can be decomposed into allocation effect (-0.14\%), selection effect (1.42\%), and interaction effect (0.19\%).
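
As a hand check of the single-period decomposition, consider a hypothetical two-sector case. The numbers below are invented for illustration and are not taken from \texttt{jan}. Suppose the portfolio weights are $W^P = (0.6, 0.4)$ and the benchmark weights are $W^B = (0.5, 0.5)$ in sectors 1 and 2, with sector returns $R^P = (5\%, 1\%)$ in the portfolio and $R^B = (2\%, 3\%)$ in the benchmark. Then
\begin{eqnarray*}
R_P & = & 0.6 \times 5\% + 0.4 \times 1\% = 3.4\%, \\
R_B & = & 0.5 \times 2\% + 0.5 \times 3\% = 2.5\%, \\
R_{active} & = & 3.4\% - 2.5\% = 0.9\%, \\
R_{allocation} & = & (0.6 \times 2\% + 0.4 \times 3\%) - 2.5\% = -0.1\%, \\
R_{selection} & = & (0.5 \times 5\% + 0.5 \times 1\%) - 2.5\% = 0.5\%, \\
R_{interaction} & = & 0.9\% - (-0.1\%) - 0.5\% = 0.5\%.
\end{eqnarray*}
The three effects add back to the active return by construction, just as $-0.14\% + 1.42\% + 0.19\% = 1.47\%$ for the January portfolio.
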
\begin{figure} \centering \vspace*{.1in} <>= plot(br.single, var = "sector", type = "return") @ \caption{\label{figure:return} Sector Return.} \end{figure} Figure \ref{figure:return} is a visual representation of the return of both the portfolio and the benchmark sector by sector in January, 2010. This plot shows that in absolute terms, Utilities performed the best with a gain of more than 5\% and Consumer Discretionary, the worst performing sector, lost more than 10\%. Utilities was also the sector with the highest active return in the portfolio. \subsection{Multi-Period Brinson Model} To obtain Brinson attribution on a multi-period data set, one calculates allocation, selection and interaction within each period and aggregates them across time. There are five methods for this -- arithmetic, geometric, optimized linking by \citet{multi}, linking by \citet{davies}, and linking by \citet{frongello}. We focus on the first three methods in this paper. Arithmetic measure calculates relative performance of a portfolio and its benchmark by a difference; geometric measure does so by a ratio. Arithmetic measure is more intuitive but a well-known challenge in arithmetic attribution is that active returns do not add up over multiple periods due to geometric compounding.\footnote{See \cite{practical} for a complete discussion of the complexity involved.} Geometric is able to circumvent the adding-up problem. \cite{multi} discussed various linking algorithms to connect arithmetic return with geometric return and argued that the \emph{optimized linking algorithm} is the best way to link attribution over time. \texttt{Arithmetic Attribution.} The arithmetic attribution model calculates active return and contributions due to allocation, selection, and interaction in each period and sums them over multiple periods. 
The arithmetic active return over $T$ periods, $R_{arithmetic}$, is expressed as:
\begin{center}
$R_{arithmetic} = \sum\limits_{t = 1}^T R_t^{active}$,
\end{center}
where $R_t^{active}$ is the active return in a single period $t$.

\texttt{Geometric Attribution.} Geometric attribution compounds returns over $T$ periods, where
\begin{center}
$1 + R_P = \prod\limits_{t = 1}^T (1 + R_t^P)$,
\end{center}
\begin{center}
$1 + R_B = \prod\limits_{t = 1}^T (1 + R_t^B)$,
\end{center}
and $R_t^P$ and $R_t^B$ are portfolio and benchmark returns in a single period $t$, respectively. The geometric return $R_{geometric}$ is thus the difference between $R_P$ and $R_B$:
\begin{center}
$R_{geometric} = R_P - R_B$.
\end{center}

\texttt{Optimized Linking Algorithm.} The well-known challenge faced in arithmetic attribution is that the actual active return over time is not equal to the arithmetic summation of single-period active returns,
\begin{center}
$R_{geometric} \neq R_{arithmetic}$,
\end{center}
i.e.,
\begin{center}
$R_P - R_B \neq \sum\limits_{t = 1}^T R_t^{active}$.
\end{center}
\cite{multi} proposed an optimized linking coefficient $b^{opt}$ to link arithmetic returns of individual periods with geometric returns over time,
\begin{center}
$R_P - R_B = \sum\limits_{t = 1}^T b_t^{opt}R_t^{active}$,
\end{center}
where $b_t^{opt}$ is the optimized linking coefficient in a single period $t$.

The optimized linking coefficient $b_t^{opt}$ is the sum of a \emph{natural scaling} $A$ and an \emph{adjustment} $a_t$ specific to a time period $t$,
\begin{center}
$b_t^{opt} = A + a_t$,
\end{center}
where $A$ is a coefficient for linking from the single-period to the multi-period return and $a_t$ is an adjustment to eliminate residuals.\footnote{See \cite{linking} for more information on the optimized linking coefficients.}
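
To see the size of the gap that the linking coefficients must close, consider a hypothetical two-period case with invented numbers: the portfolio returns $10\%$ and the benchmark returns $5\%$ in each period, so that $R_1^{active} = R_2^{active} = 5\%$. Then
\begin{eqnarray*}
R_{arithmetic} & = & 5\% + 5\% = 10\%, \\
R_P & = & 1.10^2 - 1 = 21\%, \\
R_B & = & 1.05^2 - 1 = 10.25\%, \\
R_{geometric} & = & 21\% - 10.25\% = 10.75\%.
\end{eqnarray*}
The arithmetic sum understates the compounded active return by $0.75\%$. In this symmetric case any pair of coefficients with $b_1 + b_2 = 2.15$ satisfies the linking identity, for example $b_1 = b_2 = 1.075$, since $1.075 \times 5\% + 1.075 \times 5\% = 10.75\%$. This only illustrates the identity; the values of $A$ and $a_t$ themselves are defined in \cite{multi} and are not derived here.
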
Since the active return over time, $R_P - R_B$, is a sum of the active returns in each period scaled by the optimized linking coefficients, the following is true:
\begin{center}
$R_P - R_B = \sum\limits_{t = 1}^T b_t^{opt}(R_t^{allocation} + R_t^{selection} + R_t^{interaction})$,
\end{center}
where $R_t^{allocation}$, $R_t^{selection}$, and $R_t^{interaction}$ represent allocation, selection, and interaction in each period $t$, respectively.

Within each period $t$, the adjusted attribution is thus expressed as
\begin{center}
$\hat{R}_t^{allocation} = b_t^{opt}R_t^{allocation}$,
\end{center}
\begin{center}
$\hat{R}_t^{selection} = b_t^{opt}R_t^{selection}$,
\end{center}
and
\begin{center}
$\hat{R}_t^{interaction} = b_t^{opt}R_t^{interaction}$.
\end{center}

Therefore, across $T$ periods, the active return $R_{active}$, the difference between the portfolio return $R_P$ and the benchmark return $R_B$, can be written as
\begin{center}
$R_{active} = \sum\limits_{t= 1}^T (\hat{R}_t^{allocation} + \hat{R}_t^{selection} + \hat{R}_t^{interaction})$,
\end{center}
where $R_{active} = R_P - R_B$.

\subsection{Multi-Period Brinson Tools}

In practice, a single-period analysis is of limited use on its own, as portfolio managers and their clients are more interested in the performance of a portfolio over multiple periods. To apply the Brinson model over time, we can use the function \texttt{brinson} and input a multi-period data set (for instance, \texttt{quarter}) as shown below.

<<>>=
data(quarter)
br.multi <- brinson(quarter, date.var = "date",
                    cat.var = "sector",
                    bench.weight = "benchmark",
                    portfolio.weight = "portfolio",
                    ret.var = "return")
@

The object \texttt{br.multi} of class \texttt{brinsonMulti} is an example of a multi-period Brinson analysis.

<<>>=
exposure(br.multi, var = "size")
@

The \texttt{exposure} method applied to the \texttt{br.multi} object shows the exposure of the portfolio and the benchmark based on a user-defined category. Here, it shows the exposure on \texttt{size}.
We can see that the portfolio is overweight relative to the benchmark in the lowest quintile of \texttt{size} and underweight in the highest quintile.

<<>>=
returns(br.multi, type = "linking")
@

The \texttt{returns} method shows the results of the Brinson analysis applied to the data from January, 2010 through March, 2010. The optimized linking algorithm is applied here by setting \texttt{type} to \texttt{"linking"}. The first portion of the \texttt{returns} output shows the Brinson attribution in individual periods. The second portion shows the aggregate attribution results. The portfolio formed by the top 200 value securities in January had an active return of 1.27\% over the first quarter of 2010. The allocation and selection effects contributed 0.95\% and 1.73\%, respectively; the interaction effect detracted 1.42\%.

\begin{figure}
\centering
\vspace*{.1in}
<>=
plot(br.multi, type = "return")
@
\caption{\label{figure:multireturn} Sector Return Across Time.}
\end{figure}

Figure \ref{figure:multireturn} depicts the sector returns of both the portfolio and the benchmark from January, 2010 through March, 2010. This plot shows that for the portfolio, \texttt{Utilities} performed the best with a gain of more than 5\% in January and February, 2010 but fell sharply in March, 2010.

\section{Regression}

\subsection{Single-Period Regression Model}

One advantage of a regression-based approach is that it lets analysts define their own attribution model by incorporating multiple variables in the regression formula. These variables can be either discrete or continuous.

Suppose a portfolio manager wants to find out how much each of the value, growth, and momentum scores of her holdings contributes to the overall performance of the portfolio.
Consider the following linear regression without the intercept term based on a single-period portfolio of $n$ securities with $k$ different variables:
\begin{center}
$\mathbf{r}_n = \mathbf{X}_{n,k}\mathbf{f}_k + \mathbf{u}_n$
\end{center}
where
\begin{itemize}
\item $\mathbf{r}_n$ is a column vector of length $n$. Each element in $\mathbf{r}_n$ represents the return of a security in the portfolio.
\item $\mathbf{X}_{n,k}$ is an $n$ by $k$ matrix. Each row represents $k$ attributes of a security. There are $n$ securities in the portfolio.
\item $\mathbf{f}_k$ is a column vector of length $k$. The elements are the estimated coefficients from the regression. Each element represents the \emph{factor return} of an attribute.
\item $\mathbf{u}_n$ is a column vector of length $n$ with residuals from the regression.
\end{itemize}

In the case of this portfolio manager, suppose that she only has three holdings in her portfolio. $\mathbf{r}_3$ is thus a 3 by 1 matrix with the returns of her three holdings. The matrix $\mathbf{X}_{3,3}$ records the score for each of the three factors (value, growth, and momentum) in each row. $\mathbf{f}_3$ contains the estimated coefficients of a regression of $\mathbf{r}_3$ on $\mathbf{X}_{3, 3}$.

The active exposure of each of the $k$ variables, $X_{i}$, $i = 1, \dots, k$, is expressed as
\begin{center}
$X_i = \mathbf{w}_{active}^{\prime} \mathbf{x}_{n,i}$,
\end{center}
where $X_i$ is the value representing the active exposure of the attribute $i$ in the portfolio, $\mathbf{w}_{active}$ is a column vector of length $n$ containing the active weight of every security in the portfolio, and $\mathbf{x}_{n, i}$ is a column vector of length $n$ with attribute $i$ for all securities in the portfolio. The active weight of a security is defined as the difference between the portfolio weight of the security and its benchmark weight.
Using the example mentioned above, the active exposure of the attribute \texttt{value}, $X_{value}$, is the product of $\mathbf{w}_{active}^{\prime}$ (containing the active weight of each of the three holdings) and $\mathbf{x}_{3, value}$ (containing the value scores of the three holdings).

The contribution of a variable $i$, $R_i$, is thus the product of the factor return for the variable $i$, $f_i$, and the active exposure of the variable $i$, $X_i$. That is,
\begin{center}
$R_i = f_iX_i$.
\end{center}
Continuing the example, the contribution of value is the product of $f_{value}$ (the estimated coefficient for value from the linear regression) and $X_{value}$ (the active exposure of value as shown above).

Therefore, the active return of the portfolio $R_{active}$ is the sum of the contributions of all $k$ variables and the residual $u$ (a.k.a. the interaction effect),
\begin{center}
$R_{active} = \sum\limits_{i = 1}^kR_i + u$.
\end{center}

For instance, a hypothetical portfolio has three holdings (A, B, and C), each of which has two attributes -- size and value.

<>=
test.df <- data.frame(Return = c(0.3, 0.4, 0.5),
                      Name = c('A', 'B', 'C'),
                      Size = c(1.2, 2, 0.8),
                      Value = c(3, 2, 1.5),
                      Active_Weight = c(0.5, 0.1, -0.6))
test.df
## Calculations behind the numbers quoted in the text (left commented out):
## model <- lm(Return ~ Size + Value, data = test.df)
## test.df[,1] %*% test.df[,5] ## active return
## model$coefficients[2] * t(test.df[,3]) %*% test.df[,5] ## contribution of size
## model$coefficients[3] * t(test.df[,4]) %*% test.df[,5] ## contribution of value
@

Following the procedure above, the factor returns for size and value are $-0.0313$ and $-0.1250$. The active exposure of size is 0.32 and that of value is 0.80. The active return of the portfolio is $-11\%$, which can be decomposed into the contribution of size and that of value based on the regression model. Size accounts for 1 percentage point of the negative active return and value for the other 10 percentage points.
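
The numbers in this hypothetical example can be reproduced by hand. The sketch below assumes that the factor returns come from the commented \texttt{lm} call in the chunk above, which, as written, includes an intercept:
\begin{eqnarray*}
X_{size} & = & 0.5 \times 1.2 + 0.1 \times 2 - 0.6 \times 0.8 = 0.32, \\
X_{value} & = & 0.5 \times 3 + 0.1 \times 2 - 0.6 \times 1.5 = 0.80, \\
R_{size} & = & -0.03125 \times 0.32 = -0.01, \\
R_{value} & = & -0.125 \times 0.80 = -0.10, \\
R_{active} & = & 0.5 \times 0.3 + 0.1 \times 0.4 - 0.6 \times 0.5 = -0.11.
\end{eqnarray*}
Here $R_{size} + R_{value} = -0.11 = R_{active}$ exactly. With three holdings and three fitted coefficients the regression leaves zero residuals, and the intercept drops out of the active return because the active weights sum to zero ($0.5 + 0.1 - 0.6 = 0$).
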
\subsection{Single-Period Regression Tools}

Another conventional attribution methodology is the regression-based analysis. As mentioned, the \textbf{pa} package provides tools to analyze both single-period and multi-period data frames.

<<>>=
rb.single <- regress(jan, date.var = "date",
                     ret.var = "return",
                     reg.var = c("sector", "growth", "size"),
                     benchmark.weight = "benchmark",
                     portfolio.weight = "portfolio")
exposure(rb.single, var = "growth")
@

\texttt{reg.var} specifies the columns containing variables whose contributions are to be analyzed. Calling \texttt{exposure} with a specified \texttt{var} yields information on the exposure of both the portfolio and the benchmark by that variable. If \texttt{var} is a continuous variable, for instance, \texttt{growth}, the exposure will be shown in 5 quantiles. The majority of the high \texttt{value} securities in the portfolio in January have relatively low \texttt{growth} scores.

<<>>=
summary(rb.single)
@

The \texttt{summary} method shows the number of securities in the portfolio and the benchmark, and the contribution of each input variable according to the regression-based analysis. In this case, the portfolio lost 2.91\% and the benchmark lost 4.38\%. Therefore, the portfolio outperformed the benchmark by 1.47\%. \texttt{Sector}, \texttt{growth}, and \texttt{size} contributed 0.32\%, 0.05\%, and 0.29\%, respectively.

\subsection{Multi-Period Regression Model}

The same challenge of linking arithmetic and geometric returns is present in the multi-period regression model. We apply the optimized linking algorithm proposed by \cite{linking} in the regression attribution as well.

Within each period $t$,
\begin{center}
$R_t^{active} = \sum\limits_{i = 1}^kR_{i,t} + u_t$,
\end{center}
where $R_{i,t}$ represents the contribution of a variable $i$ in the time period $t$ and $u_t$ is the residual in that period.
Across $T$ periods, the contribution of each of the $k$ attributes in
period $t$ is scaled by the optimized linking coefficient $b_t^{opt}$
of that period. The adjusted contribution of variable $i$ in period
$t$, $\hat{R}_{i,t}$, is expressed by

\begin{center}
$\hat{R}_{i,t} = b_t^{opt}R_{i,t}$.
\end{center}

Thus, the overall active return $R_{active}$ can be decomposed into

\begin{center}
$R_{active} = \sum\limits_{t = 1}^T\sum\limits_{i = 1}^k\hat{R}_{i,t} + U$,
\end{center}

where $U$ is the residual across $T$ periods.

\subsection{Multi-Period Regression Tools}

<<>>=
## multi-period regression attribution on the quarterly data
rb.multi <- regress(quarter, date.var = "date",
                    ret.var = "return",
                    reg.var = c("sector", "growth", "size"),
                    benchmark.weight = "benchmark",
                    portfolio.weight = "portfolio")
rb.multi
@

Regression-based analysis can be applied to a multi-period data frame
by calling the same method, \texttt{regress}. Typing the name of the
object \texttt{rb.multi} directly prints a short summary of the
analysis, showing the starting and ending period of the analysis, the
methodology, and the average number of securities in both the
portfolio and the benchmark.

<<>>=
summary(rb.multi)
@

The regression-based summary shows the contribution of each input
variable in addition to the basic information on the portfolio. The
summary suggests that the active return of the portfolio over the
quarter is 1.27\%. The \texttt{Residual} number indicates the
contribution of the interaction among the variables \texttt{sector},
\texttt{growth}, and \texttt{size}.

Visual representation of the relative performance of a portfolio
against its benchmark is best viewed across a longer time span. Here,
we use the data frame \texttt{year} for illustrative purposes.
<<>>=
## multi-period regression attribution on the full-year data
rb.multi2 <- regress(year, date.var = "date",
                     ret.var = "return",
                     reg.var = c("sector", "growth", "size"),
                     benchmark.weight = "benchmark",
                     portfolio.weight = "portfolio")
returns(rb.multi2, type = "linking")
@

We obtained an object \texttt{rb.multi2} of class \texttt{regressMulti}
based on the data set from January 2010 through December 2010. The
portfolio beat the benchmark by 10.1\% over this period. Based on the
regression model, \texttt{size} contributed the lion's share of the
active return.

\begin{figure}
\centering
\vspace*{.1in}
<<fig = TRUE>>=
plot(rb.multi2, var = "sector", type = "return")
@
\caption{\label{figure:regmultiattrib} Performance Attribution.}
\end{figure}

Figure \ref{figure:regmultiattrib} displays both the cumulative
portfolio and benchmark returns from January 2010 through December
2010. It suggests that the portfolio, which consisted of
high-\texttt{value} securities in January, consistently outperformed
the benchmark in 2010. Outperformance in May and June accounted for a
large part of the positive active return in 2010.

\section{Brinson as Regression}

Another way to think about the analysis of \cite{brinson:gary} is to
consider it in the context of a regression model. Conducting a Brinson
attribution is similar to running a linear regression without the
intercept term. The estimated coefficients are then the mean returns
of the categories of the attribute specified in the universe, a.k.a.
the factor return of each category. The mean return of each category
also appears in the Brinson analysis.

The equivalent of the allocation effect for the universe in the
Brinson model is the sum, over categories, of the product of the
estimated coefficient and the active weight of each category.
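The statement that the coefficients equal the category means follows
from the normal equations. As a brief sketch, assume that each
security belongs to exactly one of $p$ categories and that the
regression is fitted by ordinary least squares; let $\mathbf{X}$ be
the $n \times p$ matrix of category indicators, $\mathbf{r}$ the
vector of security returns, and $n_j$ the number of securities in
category $j$ (the symbol $n_j$ is introduced here for illustration
only). Then

\begin{eqnarray*}
\mathbf{f} & = & (\mathbf{X}^\prime\mathbf{X})^{-1}\mathbf{X}^\prime\mathbf{r}, \\
\mathbf{X}^\prime\mathbf{X} & = & \mathrm{diag}(n_1, \ldots, n_p), \\
f_j & = & \frac{1}{n_j}\sum\limits_{s \in j} r_s,
\end{eqnarray*}

because the $j$-th element of $\mathbf{X}^\prime\mathbf{r}$ is the sum
of the returns of the securities in category $j$. Each coefficient is
therefore the simple mean return of its category.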
Using the same regression model as before,

\begin{eqnarray*}
R_{allocation} & = & \sum\limits_{j = 1}^N W^P_jR^B_j - \sum\limits_{j = 1}^N W^B_jR^B_j \\
               & = & (\mathbf{W}^P - \mathbf{W}^B)^\prime \mathbf{f},
\end{eqnarray*}

where $\mathbf{W}^P$ is a column vector indicating the portfolio
weight of each category within the attribute specified by the manager;
$\mathbf{W}^B$ is a column vector indicating the benchmark weight of
each category; and $\mathbf{f}$ is the column vector of the benchmark
returns of all the categories.

Assuming that the benchmark is the universe and that all portfolio
holdings come from the benchmark, $\mathbf{R}^B$ can be estimated by
regressing returns on the attribute specified by the portfolio
manager:

\begin{center}
$\mathbf{r}_n = \mathbf{X}_{n,p}\mathbf{f} + \mathbf{U}$,
\end{center}

where
\begin{itemize}
\item $\mathbf{r}_n$ is a column vector of length $n$. Each element in
  $\mathbf{r}_n$ represents the return of a security in the universe.
\item $\mathbf{X}_{n, p}$ is an $n$ by $p$ matrix where $n$ refers to
  the number of securities in the universe and $p$ refers to the
  number of levels within the attribute specified.
\item $\mathbf{f}$ is the vector of estimated coefficients from the
  regression without the intercept term. The estimated coefficient of
  each category is the mean return of the securities in that category.
\item $\mathbf{U}$ is the column vector with all the residual terms.
\end{itemize}

Since $\mathbf{R}^B$ is the same as $\mathbf{f}$, the allocation
effect in the Brinson model is a special case of the regression
approach.

In order to estimate the selection effect in the Brinson model, one
can calculate the mean return of each category within the attribute in
both the portfolio and the benchmark under a regression framework and
use the benchmark weights to calculate the selection effect.
\begin{eqnarray*}
R_{selection} & = & \sum\limits_{j = 1}^N W^B_jR^P_j - \sum\limits_{j = 1}^N W^B_jR^B_j \\
              & = & (\mathbf{W}^B)^\prime (\mathbf{f}^P - \mathbf{f}^B),
\end{eqnarray*}

where $\mathbf{W}^B$ is the column vector with the benchmark weight of
each category within the attribute specified; $\mathbf{f}^P$ and
$\mathbf{f}^B$ are the column vectors indicating the mean return of
each category in the portfolio and in the benchmark, respectively. As
mentioned above, $\mathbf{f}^P$ and $\mathbf{f}^B$ can be estimated by
running a linear regression without the intercept term on the stocks
in the portfolio and on those in the benchmark separately. Hence, the
selection effect in the Brinson model can be calculated by using
linear regression.

The interaction effect is the difference between a portfolio's active
return and the sum of the allocation and selection effects.

A numerical example showing that the Brinson model is a special case
of the regression approach is as follows. Suppose that an equity
portfolio manager has a portfolio named \texttt{test} with the
universe as the benchmark.

%% allocation effect
<<>>=
data(test)
## single-period Brinson attribution by sector
test.br <- brinson(x = test, date.var = "date",
                   cat.var = "sector",
                   bench.weight = "benchmark",
                   portfolio.weight = "portfolio",
                   ret.var = "return")
returns(test.br)
@

When we apply the standard single-period Brinson analysis, we obtain
an active return of $-35.9$ bps, which can be further decomposed into
allocation ($-3.4$ bps), selection ($-42.5$ bps), and interaction
(10.1 bps). We can also show the allocation effect by running a
regression model based on sector only.

<<>>=
## regression attribution with sector as the only variable
test.reg <- regress(x = test,
                    date.var = "date",
                    ret.var = "return",
                    reg.var = "sector",
                    benchmark.weight = "benchmark",
                    portfolio.weight = "portfolio")
returns(test.reg)
@

The contribution from sector based on the regression approach ($-3.4$
bps) matches the allocation effect from the Brinson model as shown
above.
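The same identity can be checked by hand on a small made-up data set
using base R only. The chunk below is a sketch under stated
assumptions, not the internals of \texttt{brinson} or
\texttt{regress}: the data frame \texttt{toy.alloc} and every other
object name are hypothetical, and the benchmark is taken to be equally
weighted so that the weighted category return in the Brinson formula
equals the simple mean estimated by an unweighted regression.

<<>>=
## Illustrative sketch with made-up data; all object names are hypothetical.
toy.alloc <- data.frame(sector = c("A", "A", "B", "B", "B"),
                        ret = c(0.02, 0.04, -0.01, 0.01, 0.03),
                        benchmark = rep(0.2, 5),
                        portfolio = c(0.5, 0, 0.25, 0.25, 0))
## Brinson route: sum over categories of (W^P_j - W^B_j) R^B_j
W.P <- tapply(toy.alloc$portfolio, toy.alloc$sector, sum)
W.B <- tapply(toy.alloc$benchmark, toy.alloc$sector, sum)
R.B <- tapply(toy.alloc$benchmark * toy.alloc$ret,
              toy.alloc$sector, sum) / W.B
alloc.brinson <- sum((W.P - W.B) * R.B)
## Regression route: no-intercept fit on the sector indicators
f <- coef(lm(ret ~ sector - 1, data = toy.alloc))
alloc.reg <- sum((W.P - W.B) * f)
c(brinson = alloc.brinson, regression = alloc.reg)
@

Both routes give an allocation effect of 0.002 (20 bps) for these
made-up numbers.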
%% <<>>=
%% lm.1 <- lm(return ~ sector - 1, data = test)
%% exposure(br.single, var = "sector")
%% (exposure(br.single, var = "sector")[ , 1] -
%%  exposure(br.single, var = "sector")[ , 2]) %*%
%%  lm.1$coefficients
%% @

However, in order to calculate the selection effect from the
regression approach, we need to apply another regression model to a
universe limited to the securities held in the portfolio. Using the
factor returns from the regress class object, \texttt{test.reg}, and
those from the linear regression, we can obtain the selection effect
($-42.5$ bps) via the regression approach.

%% selection effect
<<>>=
## factor returns estimated from the portfolio holdings only
lm.test <- lm(return ~ sector - 1,
              data = test[test$portfolio != 0, ])
lm.test$coefficients
## benchmark sector weights times the difference in factor returns
exposure(test.br, var = "sector")[ , 2] %*%
    (lm.test$coefficients - test.reg@coefficients)
@

\section{Conclusion}

In this paper, we describe two widely used methods for performance
attribution -- the Brinson model and the regression-based approach --
and provide a simple collection of tools to implement these two
methods in \texttt{R} with the \texttt{pa} package. We also show that
the Brinson model is a special case of the regression method. A
comprehensive package, \texttt{portfolio} \cite{kane:david}, provides
facilities to calculate exposures and returns for equity portfolios.
It is possible to use the \texttt{pa} package based on the output from
the \texttt{portfolio} package. Further, the flexibility of
\texttt{R} itself allows users to extend and modify these packages to
suit their own needs and/or execute their preferred attribution
methodology. Before reaching that level of complexity, however,
\texttt{pa} provides a good starting point for basic performance
attribution.

\bibliography{pa}

\end{document}