%% LyX 2.3.7 created this file. For more info, see http://www.lyx.org/. %% Do not edit unless you really know what you are doing. \documentclass[english,american,noae]{scrartcl} \usepackage{lmodern} \renewcommand{\sfdefault}{lmss} \renewcommand{\ttdefault}{lmtt} \usepackage[T1]{fontenc} \usepackage[utf8]{inputenc} \usepackage{geometry} \geometry{verbose,tmargin=1in,bmargin=1in,lmargin=1in,rmargin=1in} \usepackage{float} \usepackage{setspace} \usepackage[authoryear]{natbib} \makeatletter %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% LyX specific LaTeX commands. %% Because html converters don't know tabularnewline \providecommand{\tabularnewline}{\\} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Textclass specific LaTeX commands. <>= if(exists(".orig.enc")) options(encoding = .orig.enc) @ \providecommand*{\code}[1]{\texttt{#1}} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% User specified LaTeX commands. %\VignetteIndexEntry{variablekey} \usepackage{booktabs} \usepackage{Sweavel} \usepackage{graphicx} \usepackage{color} \usepackage[samesize]{cancel} \usepackage{ifthen} \makeatletter \renewenvironment{figure}[1][]{% \ifthenelse{\equal{#1}{}}{% \@float{figure} }{% \@float{figure}[#1]% }% \centering }{% \end@float } \renewenvironment{table}[1][]{% \ifthenelse{\equal{#1}{}}{% \@float{table} }{% \@float{table}[#1]% }% \centering % \setlength{\@tempdima}{\abovecaptionskip}% % \setlength{\abovecaptionskip}{\belowcaptionskip}% % \setlength{\belowcaptionskip}{\@tempdima}% }{% \end@float } %\usepackage{listings} % Make ordinary listings look as if they come from Sweave \lstset{tabsize=2, breaklines=true, %,style=Rstyle} fancyvrb=false,escapechar=`,language=R,% %%basicstyle={\Rcolor\Sweavesize},% backgroundcolor=\Rbackground,% showstringspaces=false,% keywordstyle=\Rcolor,% commentstyle={\Rcommentcolor\ttfamily\itshape},% literate={<<-}{{$\twoheadleftarrow$}}2{~}{{$\sim$}}1{<=}{{$\leq$}}2{>=}{{$\geq$}}2{^}{{$^{\scriptstyle\wedge}$}}1{==}{{=\,=}}1,% alsoother={$},% alsoletter={.<-},% 
otherkeywords={!,!=,~,$,*,\&,\%/\%,\%*\%,\%\%,<-,<<-,/},%
escapeinside={(*}{*)}}%
% In document Latex options:
\fvset{listparameters={\setlength{\topsep}{0em}}}
\def\Sweavesize{\scriptsize}
\def\Rcolor{\color{black}}
\def\Rbackground{\color[gray]{0.95}}
% for sideways table
\usepackage{rotating}
\makeatother
\usepackage{babel}
\begin{document}
\title{The Variable Key Data Management Framework}
\author{\selectlanguage{english}%
Paul E. Johnson\\
Benjamin A. Kite \\
Center for Research Methods and Data Analysis\\
University of Kansas}
\date{\selectlanguage{english}%
\singlespacing{}\noindent \textbf{\today}}
\maketitle
\selectlanguage{american}%
\begin{abstract}
This essay describes the ``variable key'' approach to importing and recoding data. This method has been developed in the Center for Research Methods and Data Analysis at the University of Kansas to deal with the importation of large, complicated data sets. This approach improves teamwork, keeps better records, and reduces slippage between the intentions of principal investigators and the implementation by code writers. The framework is implemented in the R \citep{RCore} package \code{kutils}.
\end{abstract}
<>=
if(!dir.exists("plots")) dir.create("plots")
@
\selectlanguage{english}%
% In document Latex options:
\fvset{listparameters={\setlength{\topsep}{0em}}}
\SweaveOpts{prefix.string=plots/t,split=F,ae=F,height=4,width=5.5}
<>=
options(device = pdf)
options(width=100, prompt=" ", continue=" ")
options(useFancyQuotes = FALSE)
options(SweaveHooks=list(fig=function() par(ps=10)))
pdf.options(onefile=F,family="Times",pointsize=10)
@
\selectlanguage{american}%

\section{Introduction}

The staff of the Center for Research Methods and Data Analysis has been asked to help with data importation and recoding from time to time. In one very large project, we were asked to combine, recode, and integrate variables from 21 different files. The various files used different variable names and unique coding schemes.
A skeptic might have thought that the firm which created the data sets intentionally obfuscated the records to prevent the comparison of variables across a series of surveys. In projects like that, the challenge of importing and fixing the data seems overwhelming. The graduate research assistants are asked to cobble together thousands of lines of ``recodes'' as they rename, regroup, and otherwise harmonize the information.

From a managerial point of view, that is not the main problem. We expect to spend the time of research assistants. While it may be tedious to read a codebook and write recodes, social scientists have been doing that since the 1960s. It is not all that difficult. The truly difficult part is mustering confidence in the resulting recoded data. How can a supervisor check thousands of recode statements for accuracy? The very extensibility of R itself (its openness to new functions and language elements) makes proof-reading more difficult. We might shift some of the proof-reading duty to the principal investigators, but they sometimes are not interested in details. In the end, the responsibility for verifying the recodes falls on the project supervisors. While most supervisors with whom we are personally acquainted have nearly super-human reading skills and almost-perfect comprehension, we have documented a case in which one of them was unable to catch an error on line 827 of an R file with 2119 lines.

To reduce the risk of misunderstanding and error, we propose the \emph{variable key procedure}. It is a systematic way to separate code writing from the process of renaming variables and re-designating their values. The characteristics of the data are summarized in a table, a simple-looking structure that might be edited in a text editor or a spreadsheet program. This simple structure, which we call the variable key, can be used by principal investigators and supervisors to designate the desired results.
Once the key is created, the data set can be imported and recoded by applying the key's information. This does not eliminate the need to proof-read the renaming and recoding of the variables; it simply shifts that chore into a simpler, more workable setting.

This essay proceeds in three parts. First, the general concepts behind the variable key system are explored. Second, the four stages in the variable key procedure are outlined and illustrated with examples. Third, we offer some examples of ways to double-check the results.

\section{Enter the Variable Key}

The variable key process was first developed for a very large project for which we were hired by a commercial consulting company. As it happened, the project manager who hired us was an Excel user who did not know about R. He was given several SPSS datasets. After going through the usual R process of importing and recoding data from 6 files, the aggregate of which included more than 40,000 observations on 150 variables, we arrived at a renamed set of columns. Unfortunately, the research assistant who had done most of the work resigned in order to pursue a career as a magician.\footnote{Or graduated, we are not sure which.} With the unavailability of our key asset, it was difficult to know for sure what was in which column. There was nobody to quickly answer questions like ``which column is the respondent's sexual identity?'' and ``if sex is V23419, did we change 1 to male or female?'' The only way to find out was by hunting and pecking through a giant R file. In order to communicate better about that project, we developed a table that looked like Table \ref{tab:A-Small-Variable-key}.
\begin{table}[H]
\caption{A Small Variable Key\label{tab:A-Small-Variable-key}}
\begin{tabular}{|c|c|c|c|}
\hline 
name\_old & name\_new & values\_old & values\_new\tabularnewline
\hline 
\hline 
V23419 & sex & 1|2|3 & ``male''|``female''|``neither''\tabularnewline
\hline 
V32422 & education & 1|2|3|4|5 & ``elem''<``hs''<``somecoll''<``ba''<``post''\tabularnewline
\hline 
V54532 & income & . & numeric\tabularnewline
\hline 
\end{tabular}
\end{table}

It was tedious to assemble that table, but it helped quite a bit in our discussions. The vertical bars were used to indicate that the original data had discrete values. When a variable had a natural ordering, the new values were placed in order, separated by the symbol ``<''. That table grew quite large, since it had one row per variable, but it was otherwise workable. It was popular with the client.

In the middle of preparing that summary table of recoded values, we realized that it was possible to write an R program to import the key table and use its information to recode and rename the variables. The recodes would \emph{just happen}. If we prepared the functions properly, we had not just a table masquerading as a codebook; we had a \emph{programmable codebook}. We wrote some functions that could import variables (as named in column 1), apply the new values (from columns 3 and 4), then apply the new name from column 2. Such functions are somewhat difficult to prepare, but they are very appealing from a supervisor's point of view. There will be less proof-reading to do, at least in the R code itself. Once we validate the functions, we never have to proof-read them again; they can be applied, row by row, to create a new data frame. Instead, we can concentrate our attention on the substance of the problem: the specification of the new names and values in the table.
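The delimited notation in a key like Table \ref{tab:A-Small-Variable-key} is easy to unpack programmatically, which is what makes the table ``programmable.'' The base-R sketch below shows the general idea; \code{splitKeyValues} is a hypothetical helper written for this essay, not a function from \code{kutils}.

```r
## Unpack the delimited value columns of one key row into an
## old-value/new-value lookup table (hypothetical helper, base R only).
splitKeyValues <- function(values_old, values_new,
                           sep_old = "|", sep_new = sep_old) {
    old <- strsplit(values_old, sep_old, fixed = TRUE)[[1]]
    new <- strsplit(values_new, sep_new, fixed = TRUE)[[1]]
    ## every old value must have exactly one new counterpart
    stopifnot(length(old) == length(new))
    data.frame(value_old = old, value_new = new, stringsAsFactors = FALSE)
}

## the sex row of the small variable key
splitKeyValues("1|2|3", "male|female|neither")
## the ordered education row uses "<" to separate its new values
splitKeyValues("1|2|3|4|5", "elem<hs<somecoll<ba<post", sep_new = "<")
```

The \code{stopifnot} check mirrors the basic sanity requirement on any key row: the old and new value lists must line up one-to-one.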
In the projects where we have employed this system, we adjusted the key and the R functions to suit the particular demands of the project and the client. That was unfortunate, because we had very little accumulation of code from one project to another. However, we did accumulate experience; there were concepts and vocabulary that allowed us to understand the various challenges that might be faced. The effort to develop a standardized framework for the variable key began in 2016 with the creation of the \code{kutils} package for R.

The variable key process allows project supervisors to create a table that instructs the research assistants in the importation, renaming, and recoding of data. There is still a daunting problem, however, because the supervisors must create that variable key table. In a large data set, it might be arduous to simply type the old names of the variables and their observed values. In 2015, one of the graduate assistants in our lab was asked to type up a variable key and he could not quite believe that was a good use of his time. After some discussion, we realized that it was not necessary to type the variable key at all. We could write a function to do it for us. If R can import the candidate data set, then R can certainly output its column names and a roster of observed values. This lightened the workload considerably. By tabulating all of the observed variables and their values, the most tedious part of the process was done mechanically. In the remainder of this essay, we discuss the process of creating a variable key template, revising it, and putting it to use.

\section{Four Simple Steps}

The variable key process has four steps. First, inspect an R data.frame object and create a key template file. The key template summarizes the existing state of the variables and creates ``placeholders'' where we might like to specify revisions. Second, edit the key template file in a spreadsheet or other program that can work with comma-separated values.
Change the names and values, and designate other recodes (which we will describe next). Third, import the revised key into R. Fourth, apply the key to the data to generate a new, improved data frame. Then run some diagnostic routines. If all goes well, we should end up with a new data frame in which
\begin{enumerate}
\item The columns are renamed in accordance with the instructions of the principal investigator (or supervisor).
\item The values of all variables have been recoded according to the instructions of the principal investigator (or supervisor).
\end{enumerate}
Diagnostic tables are reported to clearly demonstrate the effect of each coding change, mapping out the difference between the input and the output variables.

For purposes of illustration, we have created an example data frame with various types of variables. This data frame, \code{mydf}, has most of the challenges that we see in actual projects. It has integer variables that need to be reorganized and turned into character or factor variables. It has character variables that might become integers or factors.

<>=
set.seed(234234)
N <- 200
mydf <- data.frame(
    x5 = rnorm(N),
    x4 = rpois(N, lambda = 3),
    x3 = ordered(sample(c("lo", "med", "hi"), size = N, replace = TRUE),
                 levels = c("med", "lo", "hi")),
    x2 = letters[sample(c(1:4, 6), N, replace = TRUE)],
    x1 = factor(sample(c("cindy", "jan", "marcia"), N, replace = TRUE)),
    x7 = ordered(letters[sample(c(1:4, 6), N, replace = TRUE)]),
    x6 = sample(c(1:5), N, replace = TRUE),
    stringsAsFactors = FALSE)
mydf$x4[sample(1:N, 10)] <- 999
mydf$x5[sample(1:N, 10)] <- -999
@

\subsection{Step 1. Create a Key Template}

The function \code{keyTemplate} scans a data frame and generates a new key template. The key has 8 pieces of information about each variable. The rows of the key are named by the variables of the data frame.
The 8 columns in the key are \code{name\_old}, \code{name\_new}, \code{class\_old}, \code{class\_new}, \code{value\_old}, \code{value\_new}, \code{missings}, and \code{recodes}. \code{keyTemplate} fills \code{name\_old}, \code{class\_old}, and \code{value\_old} with values drawn from the input data, while the \code{new} columns begin as copies of those old values. The last two, \code{missings} and \code{recodes}, will be empty.

There are two formats for the key template, \code{long} and \code{wide} (determined by the parameter \code{long}). These names are drawn from terminology in R's \code{reshape} function. The long format has one row per value of each variable, while the wide format has all of a variable's information in one row. The two key formats are intended to be interchangeable in functionality; they differ solely for convenience, since some users may prefer to edit variable information in one style or the other. The re-importation of the key should deal gracefully with either type of variable key. A wide format key can be produced with a call to the \code{keyTemplate} function like so:

<>=
key_wide <- keyTemplate(mydf, file = "key_wide.csv", max.levels = 5)
@

\noindent If the \code{long} argument is not specified, a wide key is the default. One can ask for a long format:

<>=
key_long <- keyTemplate(mydf, long = TRUE, file = "key_long.csv", max.levels = 5)
@

\noindent The key object is a data.frame. Apart from the \code{long} argument, the \code{keyTemplate} function has two especially noteworthy arguments, \code{file} and \code{max.levels}. If the \code{file} argument is supplied, \code{keyTemplate} uses the suffix to determine the storage format. Legal suffixes are \code{.csv}, \code{.xlsx}, and \code{.rds} (for comma-separated values, Excel spreadsheets, and R serialization data structures, respectively). The \code{max.levels} argument is also important; it is used in the same sense that functions like \code{read.spss} in the \code{foreign} package use that term.
There is guessing involved in deciding whether we should enumerate a character or integer variable. We do want to enumerate the ``Strongly Disagree'' to ``Strongly Agree'' values of a 7 point scale, but we do not want to enumerate the first names of all study participants. If the number of discrete values exceeds \code{max.levels}, for which the default is 15, then the key will not enumerate them.

Table \ref{tab:The-Wide-Key} demonstrates a wide key template as it is produced by \code{keyTemplate}. We see now why it is called a wide key: the recoding information is tightly packed into \code{value\_old} and \code{value\_new}. The key includes the more or less obvious columns for the old and new variable names, their classes, and the values of the variables. Note that the values of \code{x5} and \code{x4} are not enumerated because we set \code{max.levels} at 5. Because \code{max.levels} defaults to 15, an integer variable with fewer than 15 distinct values would ordinarily have each value displayed; for this example, the display of that variable key was too wide for the page, so we reduced the number of values. When the observed number of scores is above \code{max.levels}, the key does not try to list the individual values (compare the treatment of variables \code{x4} and \code{x6}).

<>=
library(kutils)
library(xtable)
@

<>=
key_wide <- keyTemplate(mydf, max.levels = 5)
@

A long key template is displayed in Table \ref{tab:Long-Key}. The benefit of the long key is that the cells \code{value\_old} and \code{value\_new} are easier to navigate.
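The \code{max.levels} cutoff described above amounts to a simple counting rule. A base-R sketch of that rule follows; \code{enumerateValues} is a hypothetical stand-in for \code{keyTemplate}'s internal logic, shown only to make the heuristic concrete.

```r
## List a column's distinct values, pipe-delimited, but only when there
## are no more than max.levels of them (sketch of the cutoff rule).
enumerateValues <- function(x, max.levels = 15) {
    vals <- sort(unique(x[!is.na(x)]))
    if (length(vals) > max.levels) return("")  # too many: leave value_old empty
    paste(vals, collapse = "|")
}

enumerateValues(c(2L, 1L, 2L, 3L))           # a 3-level integer: "1|2|3"
enumerateValues(rnorm(100), max.levels = 5)  # floating point: ""
```

A 7 point agreement scale passes the rule and is enumerated; a column of 100 distinct floating point draws does not.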
<>=
key_long <- keyTemplate(mydf, long = TRUE, max.levels = 5)
@

\begin{table}[h]
\caption{The Wide Key Template\label{tab:The-Wide-Key}}
\def\Sweavesize{\tiny}
<>=
print(xtable(key_wide), include.rownames = FALSE, size = "small", floating = FALSE)
@
\end{table}

\begin{table}
\caption{The Long Key Template\label{tab:Long-Key}}
\def\Sweavesize{\tiny}
<>=
print(xtable(key_long), include.rownames = FALSE, size = "small", floating = FALSE)
@
\end{table}

The value of \code{class\_old} in the key is the first element returned by the \code{class} function for a variable. There is one exception, where we have tried to differentiate integer variables from numeric variables. This is a confusing issue in the history of R, as discussed in the R help page for the function \code{as.double}. In the \emph{Note on names} section, that page explains an ``anomaly'' in the usage of the term \code{numeric}: the R function \code{as.numeric} creates a double precision floating point value, not an integer, yet the \code{is.numeric} function responds \code{TRUE} for both integers and floating point values. For purposes of editing the key, it is useful to differentiate integers from floating point numbers. \code{kutils} includes a function named \code{safeInteger}. It checks the observed values of a variable to find out whether any floating point values are present. If the aggregate deviations from integer values are minuscule, then the variable is classified as an integer. As a result, the \code{keyTemplate} function's column \code{class\_old} should be ``integer'' or ``numeric'', and by the latter we mean a floating point number.

In some of our early projects, the variable key was in the wide format. Difficulty editing that format caused us to shift some projects to the long key. The idea that we would glide back and forth between keys created in the wide and long formats dawned on us only recently.
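The integer-detection idea behind \code{safeInteger}, described above, can be sketched in a few lines of base R. The helper \code{looksInteger} and its tolerance are our own illustrative choices, not the package's actual implementation.

```r
## Classify a numeric vector as integer-valued when every observation is
## within a small tolerance of a whole number (illustrative tolerance).
looksInteger <- function(x, tol = sqrt(.Machine$double.eps)) {
    x <- x[!is.na(x)]
    is.numeric(x) && all(abs(x - round(x)) < tol)
}

looksInteger(c(1, 2, 999))  # TRUE: stored as double, but integer-valued
looksInteger(c(98.6, 37))   # FALSE: genuinely floating point
```

Under such a rule, a column like \code{mydf\$x4} (Poisson counts plus 999s) would be labeled ``integer'' in \code{class\_old}, while \code{mydf\$x5} would be labeled ``numeric''.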
To ease the conversion back and forth between the formats, we developed the functions \code{wide2long} and \code{long2wide}. We believe these functions work effectively, but we have experienced some troubles related to the way spreadsheets store character strings. If the key in long format has a column of values ``Yes'', ``No'', and ``'', the wide representation should be ``Yes|No|'', but there is some inclination to say we should have nested quotation marks, as in \code{"Yes"|"No"|""}. That kind of string will not generally survive importation to and export from a spreadsheet at the current time.

\subsection{Step 2. Edit the variable key}

If the \code{file} argument was specified in \code{keyTemplate}, the work is laid out for us. One can edit a csv file in any text editor or in a spreadsheet. An xlsx file can be edited in LibreOffice or Microsoft Office. It is not necessary to change all of the values in \code{name\_new}, \code{class\_new}, and \code{value\_new}. In fact, one might change just a few elements, and the unaltered variables will remain as they were when the data is re-imported. We suggest users start small by making a few edits in the key. A principal investigator might change just a few variable names or values. In a large project, there may be quite a bit of work involved.

The \code{name\_old} column must never be edited. Generally, \code{class\_old} and \code{value\_old} will not be edited either (the only exception might arise if \code{class\_new} is either ``\code{factor}'' or ``\code{ordered}''). The \code{name\_new} column should include legal R variable names (do not begin \code{name\_new} with a numeral or include mathematical symbols like ``+'' or ``-''). We use R's \code{make.names} function to clean up errant entries, so incorrect names are not fatal. The difficult user decisions will concern the contents of \code{class\_new} and \code{value\_new}. The desired variable type, of course, influences the values that are meaningful.
To convert a character variable to an integer, for example, it should go without saying that the \code{value\_new} element should include integer values, not text strings. The conversion of information from one type of variable into another may be more complicated than it seems. It is a bit more tricky, for instance, to convert a factor into a numeric variable that uses the factor's levels as numeric values. After experimenting with a number of cases, we believe that if \code{class\_old} and \code{class\_new} are elements of the \emph{safe class set}: \code{character}, \code{logical}, \code{integer}, \code{numeric} (same as \code{double}), \code{factor}, or \code{ordered}, then the re-importation and recoding of data will be more-or-less automatic. If \code{class\_new} differs from \code{class\_old}, and \code{class\_new} is not an element of that six-element set, then the user must supply a recode function that creates a variable of the requested class. Most commonly, we expect that will be used to incorporate \code{Date} variables.

The enumerated values in the \code{value\_new} column should be specified in the more or less obvious way. If \code{class\_new} is equal to \code{character}, \code{factor}, or \code{ordered}, then the new values can be arbitrary strings. The \code{missings} and \code{recodes} columns are empty in the key template; the user will need to fill in those values if they are to be used.

When the key is later put to use, the order of processing will be as follows. First, the values declared as missings are applied (the designated observed values are converted to R's \code{NA} symbol). Second, if there is a recode function in the key, it is applied to the variable. Third, if no recode function was supplied, the discrete values are converted from \code{value\_old} into \code{value\_new}. Note that the discrete value conversion is applied only if the recode cell is empty in the key.
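The three-stage order of processing just described can be made concrete with a small base-R sketch. \code{applyKeyRow} is a hypothetical single-variable helper written for this essay; \code{keyApply} performs the real work across the whole key.

```r
## Sketch of the processing order for one variable:
## 1. missings, 2. recode function (if any), 3. enumerated value mapping.
applyKeyRow <- function(x, missings = NULL, recode = NULL,
                        value_old = NULL, value_new = NULL) {
    if (!is.null(missings)) x[x %in% missings] <- NA              # step 1
    if (!is.null(recode)) return(recode(x))                       # step 2 wins
    if (!is.null(value_old)) x <- value_new[match(x, value_old)]  # step 3
    x
}

## 999 becomes NA first, then 1/2 are mapped to labels
applyKeyRow(c(1, 2, 999, 1), missings = 999,
            value_old = c(1, 2), value_new = c("male", "female"))
## with a recode function, any enumerated values would be ignored
applyKeyRow(c(50, 999, 86), missings = 999,
            recode = function(x) (x - 32) * 5/9)
```

The early \code{return} in step 2 is what makes a recode function take precedence over the enumerated values, matching the rule stated above.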
The decision of whether to approach a given variable via value enumeration or a recode function is, to an extent, simply a matter of taste. Some chores might be handled either way. If a variable includes floating point numbers (temperatures, dollar values, etc.), then we would not rely on new assignments in \code{value\_old} and \code{value\_new}. Truly numeric variables of that sort almost certainly call for the assignment of missings and recodes by the last two cells in the variable key. However, if a column includes integers or characters (1 = male, 2 = female), one might use the enumerated values (\code{value\_old} and \code{value\_new}) or one could design a recode function to produce the same result. It is important to remember that if a recode function is applied, the enumerated value recoding is not. If one decides to use a recode statement, then the elements in \code{value\_old} and \code{value\_new} are ignored entirely; they could be deleted manually to simplify the key. (That is to say, the \code{max.levels} parameter is just a way of guessing how many unique values are ``too many'' for an enumeration. Users are free to delete values if recodes are used.)

Despite the possibility that a factor (or ordered) variable may have many values, we believe that all of the levels of those variables should usually be included in the key. If a variable is declared as a factor, it means the researcher has assigned meaning to the various observed values, and we are reluctant to ignore them. There is a more important reason to enumerate all of the legal values for factor variables: if a value is omitted from the key, that value will be omitted from the dataset.

Among our users, we find opinion is roughly balanced between the long and the wide key formats. One might simply try both. If the number of observed values is more than 5 or 10, editing the key in a program like Microsoft Excel is less error-prone with the long key. This is simply a matter of taste, however.
The disadvantage of the long format is that it is somewhat verbose, with repeated entries in the name and class columns. If an editor misassigns a block of rows, hard-to-find errors may result. Because editing the key can be a rather involved process, we will wait to discuss the details until Section \ref{sec:Editing-the-key}.

\subsection{Step 3. keyImport}

Once any desired changes are entered in the variable key, the file needs to be imported back into R. For that purpose, we supply the \code{keyImport} function. As in \code{keyTemplate}, the \code{file} argument's suffix is used to discern whether the input should be read as \code{.csv}, \code{.xlsx}, or \code{.rds}. It is not necessary to specify whether the key being imported is in the long or wide format; \code{keyImport} includes heuristics that have classified user-edited keys very accurately. The returned value is an R data frame that should be very similar to the template, except that the new values of \code{name\_new}, \code{class\_new}, and \code{value\_new} will be visible.

In order to test this function, the \code{kutils} package includes some variable keys. The usage of those keys is demonstrated in the help page for \code{keyImport}. In addition to the \code{mydf} toy data frame created above, we also include a subset of the US National Longitudinal Survey in a data frame named \code{natlongsurv}.

\subsection{Step 4. Apply the imported key to the data}

The final step is to apply the key to the data frame (or some other data frame that may have arrived in the interim). The syntax is simple:

<>=
mydf.cleaned <- keyApply(mydf, mydf.keylist)
@

Because the default value of the argument \code{diagnostic} is \code{TRUE}, the output from \code{keyApply} is somewhat verbose. After we have more feedback from test users, we will be able to quiet some of that output. The diagnostic output will include information about mismatches between the key and the data itself.
If variables included in the key are not present in the new data set, there will not be an error, but a gentle warning will appear. Similarly, if the observed values of an enumerated variable are not included in the variable key, there will be a warning. The diagnostic will also create a cross tabulation of each new variable against its older counterpart. This works very well with discrete variables that have 10 or so values, but for variables with more values it is rather unmanageable.

\section{Editing the variable key}

\label{sec:Editing-the-key}

The work of revising the variable key can be driven by the separation of variables into two types. The variables with enumerated values (the ones for which we intend to rename or re-assign values one-by-one) are treated in a very different way than the others. The enumerated value strategy works well with variables for which we simply need to rename categories (e.g., ``cons'' to ``Conservative''). Variables for which we do not (e.g., converting Fahrenheit to Celsius) are treated differently. As we will see in this section, the revision of variables of the enumerated value type emphasizes the \code{value\_old} and \code{value\_new} columns in the key, while the other type depends on writing correctly formatted statements in the recode column of the variable key.

\subsection{Enumerated variables}

All of the values observed for \code{logical}, \code{factor}, and \code{ordered} variables will appear in the key template. Do not delete them unless the exclusion of those values from the analysis is intended. For character and integer variables with fewer than \code{max.levels} discrete values, the observed scores will be included in \code{value\_old}. If one wishes to convert a variable from an enumerated type to a numeric type, one can delete all values from \code{value\_old} and \code{value\_new}.

The recoding of discrete variables is a fairly obvious chore.
For each old value, a new value must be specified. We first consider the case of a variable that enters as a character variable, which we might like to recode and from which we might also create factor and integer variants. In the \code{mydf} variable key (Table \ref{tab:The-Wide-Key}), we have the variable \code{x2}, which is coded \code{a} through \code{f}. We demonstrate ways to spawn new character, factor, or integer variables in Table \ref{tab:Change-Type-Example1}. As long as \code{name\_old} is preserved, as many lines as desired can be used to create variables of different types. Here we show the middle section of the revised key, in which we have spawned three new variants of \code{x2}, each with its own name.

\begin{table}[H]
\caption{Change Class, Example 1\label{tab:Change-Type-Example1}}
\begin{tabular}{cccccc}
\hline 
name\_old & name\_new & class\_old & class\_new & value\_old & value\_new\tabularnewline
\hline 
x2 & x2.char & character & character & a|b|c|d|f & Excellent|Proficient|Good|Fair|Poor\tabularnewline
x2 & x2.fac & character & factor & a|b|c|d|f & Excellent|Proficient|Good|Fair|Poor\tabularnewline
x2 & x2.gpa & character & integer & a|b|c|d|f & 4|3|2|1|0\tabularnewline
\hline 
\end{tabular}
\end{table}

In line one of Table \ref{tab:Change-Type-Example1}, the class \code{character} remains the same. That line will produce a new character variable with embellished values. Line two demonstrates how to create an R factor variable, \code{x2.fac}, and line three converts the character to an integer variable. Remember that it is important to match the value of \code{class\_new} with the content proposed for \code{value\_new}: do not include character values in a variable for which the new class will be numeric or integer. Similarly, it is easy to see how an integer input can be converted into an integer, character, or factor variable by employing any of the rows seen in Table \ref{tab:Change-Type-Example2}.
\begin{table}[H]
\caption{Change Class, Example 2\label{tab:Change-Type-Example2}}
\begin{tabular}{cccccc}
\hline 
name\_old & name\_new & class\_old & class\_new & value\_old & value\_new\tabularnewline
\hline 
x6 & x6.i100 & integer & integer & 1|2|3|4|5 & 100|200|300|400|500\tabularnewline
x6 & x6.c & integer & character & 1|2|3|4|5 & Austin|Denver|Nashville|Provo|Miami\tabularnewline
x6 & x6.f & integer & factor & 1|2|3|4|5 & F|D|C|B|A\tabularnewline
\hline 
\end{tabular}
\end{table}

If a variable's \code{class\_old} is ordered, and we simply want to relabel the existing levels, the work is also easy (see Table \ref{tab:Change-Type-Example3}). The second row in Table \ref{tab:Change-Type-Example3} shows that factor levels can be ``combined'' by assigning the same character string to several ``<''-separated values.

\begin{table}[H]
\caption{Change Class, Example 3\label{tab:Change-Type-Example3}}
\begin{tabular}{cccccc}
\hline 
name\_old & name\_new & class\_old & class\_new & value\_old & value\_new\tabularnewline
\hline 
x7 & x7.grades & ordered & ordered & a<b<c<d<f & A<B<C<D<F\tabularnewline
x7 & x7.passfail & ordered & ordered & a<b<c<d<f & pass<pass<pass<pass<fail\tabularnewline
\hline 
\end{tabular}
\end{table}

The \code{missings} column designates observed values that are to be converted to R's \code{NA} symbol. Entries may be individual values, such as ``999'', ``f'', or ``c''. These are illustrated in Table \ref{tab:Recode-Examples}. The symbols ``<='' and ``>='' are accepted in the obvious way.

\begin{table}[H]
\caption{Missings Examples\label{tab:Recode-Examples}}
\begin{tabular}{|c|c|c|}
\hline 
missings & interpretation: NA will be assigned to & example\tabularnewline
\hline 
> t & values greater than t & > 99\tabularnewline
>= t & values greater than or equal to t & >=99\tabularnewline