--- title: "Recreating a Royston and Altman paper" author: "Terry Therneau" date: '`r format(Sys.time(),"%d %B, %Y")`' bibliography: refer.bib output: bookdown::html_document2: base_format: rmarkdown::html_vignette number_sections: true toc: true vignette: > %\VignetteIndexEntry{Recreating a Royston and Altman paper} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} options(continue=" ", width=70, prompt="> ") options(contrasts=c("contr.treatment", "contr.poly")) #ensure default options(show.significant.stars = FALSE) #statistical intelligence knitr::opts_chunk$set( collapse = TRUE, warning=FALSE, error=FALSE, tidy=FALSE, fig.asp=.75, fig.align="center", comment = "#>", highlight=TRUE, echo=TRUE, prompt=FALSE, fig.width=7, fig.height=5.5 ) library("survival") ``` # Introduction The paper by Royston and Altman @Royston13b on Cox model validation is a delightfully clear exposition of the important principles. As well, they made use of, and documented, datasets that others can use. The `rotterdam` and `gbsg` datasets from their work have been incorporated into the survival package. This short note describes that process, and more importantly, attempts to recreate some of the results of their paper as a way of validating that the data was created correctly. \section{Rotterdam data} The main paper refers to a web site www.stata-press.com/data/fpsaus.html. That in turn hosts all of the datasets and Stata code for a book by Royston and Lampert @Royston11. Following the URL leads to a dataset `rott2`, imported from a Stata file `rott2.dta`. Variable labels make it fairly easy to set up the new dataset. There were a few variables that we did not copy. * `mf, mfi`: time to metastasis * `enodes`: exp(-0.12*nodes) * `pr\_1` : log(pr +1) * `enod_1`: enodes*enodes * `recent`: year of surgery dichotomized as $\le 1987$ (0) vs $\ge 1988$ (1) * `_st`: 1 for all rows * `_d` : equal to `recur` if rtime < 10 years, 0 otherwise * `_t` : recurrence time, truncated at 10 years * `_t0`: 0 for all rows The Rotterdam data contains 2982 subjects. The article makes the statement that the endpoint for fits to the Rotterdam data is relapse-free survival (RFS), the earlier of death or relapse, censored at 84 months, and that they will omit those with nodes=0. It also has the statement that "The Rotterdam data (with RFS truncated to 120 months) can be downloaded from www.stat-press.com.", I expect this latter statement has caused many to assume that the `(_t, _d)` pair in that dataset contains RFS values, but examination shows that they encode 10 year relapse. Table 1 shows the distribution for 1546 subjects, the number with nodes > 0. The distribution below, from the R dataset, matches the table with the exception of the mean estrogen receptor (ER) level. If I winsorize that at 2000 to exclude large outliers, however, I match the table. ```{r table1, echo= TRUE} rotterdam2 <- subset(rotterdam, nodes>0) table(rotterdam2$size) table(rotterdam2$meno) table(rotterdam2$hormon) round(c(mean(rotterdam2$age), sd(rotterdam2$age)), 1) round(c(mean(rotterdam2$nodes),sd(rotterdam2$nodes)), 1) round(c(mean(rotterdam2$pgr), sd(rotterdam2$pgr)), 1) round(c(mean(rotterdam2$er), sd(rotterdam2$er)),1) ``` The Cox model in the paper was fit using fractional polynomials. If we use the RPS at 84 months = 7 years, the model fit looks like the following. As noted in the help file for the rotterdam data, there are 43 subjects who have died without recurrence, but whose death time is greater than the censoring time for recurrence; and exactly how best to handle these is debatable. Consider subject 191 who is censored for recurrence at 13 months, with death at 52 months. We could code this subject as censored at 13 or an RFS event at 52. The latter implicitly assumes that there was no recurrence in the unobserved period from 13 to 52 months, over 3 years. For a tertiary care center in the US, where a subject is likely to return to their home physician after the primary cancer treatment, this is not a tenable assumption. If they had recurred, we would not have found out about it due to disjoint medical records systems, and the safer course is to censor at 13. In a national health care system such as the Netherlands, where this study was conducted, such an assumption is more reasonable. In this case, ignoring the gaps and using the later death leads to 965 events, which matches the description on page 3 of the paper. ```{r table2, echo=TRUE} y7 <- round(7*365.25) # 7 years or 84 months r7 <- rotterdam2 r7$recur <- ifelse(r7$rtime > y7, 0, r7$recur) r7$rtime <- pmin(r7$rtime, y7) r7$death <- ifelse(r7$dtime > y7, 0, r7$death) r7$dtime <- pmin(r7$dtime, y7) r7$rfstime <- with(r7, ifelse(recur==1, rtime, dtime)) # time to recur or death r7$rfs <- with(r7, pmax(death, recur)) table(r7$rfs) agefun <- function(x) cbind((x/100)^3, (x/100)^3 * log(x/100)) cfit <- coxph(Surv(rfstime, rfs) ~ agefun(age) + meno + size + I(1/sqrt(nodes)) + I(er/1000) + hormon, data= r7, ties="breslow") cbind("cfit"= round(coef(cfit),3), "paper"= c(1.07, 9.13, 0.46, 0.23, 0.31, -1.74, -0.34, -0.35)) ``` The dataset is also available from Sauerbrei et al @Sauerbrei07. That version does not have the overall survival and death variables, but includes two categorical variables `sized1` = 1 if size $>20$ and `sized2` = 1 if size $> 50$, which match the variable names found in table 2, and resolves the coefficient difference we see above for the size > 50 group: our coefficient is the sum of the `sized1` and `sized2` coefficients in their table. They had used contrasts of group 2 vs. 1 and 3 vs. 2, while we have 2 vs. 1 and 3 vs. 1. A termplot reveals the non-linear age effect. Both young and old subjects are at higher risk. ```{r termplot} termplot(cfit, term=1, ylab="Estimated age effect", col.term=1, col.se=1, se=TRUE) ``` # GBSG data The GBSG dataset found in the reference is somewhat simpler, in that it only contains the RFS outcome. The GBSG dataset has no node-negative subjects, and tumor size is continuous rather than being categorical. The dataset has again, exact agreement with table 1. ```{r gbsg1, echo=TRUE} table(cut(gbsg$size, c(0, 20, 50, 150), c("<=20", "20-50", ">50"))) table(gbsg$meno) table(gbsg$hormon) round(c(mean(gbsg$age), sd(gbsg$age)), 1) round(c(mean(gbsg$nodes),sd(gbsg$nodes)), 1) round(c(mean(gbsg$pgr), sd(gbsg$pgr)), 1) round(c(mean(gbsg$er), sd(gbsg$er)), 1) ``` Validation is made easier if we create a version of the GBSG dataset whose variable names exactly match the Rotterdam derivation dataset. Then create a PI variable using the rotterdam fit. ```{r gbsg2} gbsg2 <- gbsg gbsg2$size <- cut(gbsg$size, c(0, 20, 50, 150), c("<=20", "20-50", ">50")) gbsg2$rfs <- gbsg2$status gbsg2$PI <- predict(cfit, newdata=gbsg2) ``` The histogram of the risk scores appears to match that in Figure 1. ```{r hist1, echo=FALSE} PI1 <- predict(cfit) - mean(predict(cfit)) oldpar <- par(mfrow=c(2,1), mar=c(6,6,1,1)) hist(PI1, breaks=seq(-1.6, 1.6, by=.1), xlab="PI in derivation data", main=NULL) abline(v = quantile(PI1, c(.16, .5 , .84)), col=2) PI2 <- gbsg2$PI - mean(predict(cfit)) hist(PI2, breaks=seq(-1.6, 1.6, by=.1), xlab="PI in validation data", main=NULL) abline(v = quantile(PI2, c(.16, .5 , .84)), col=2) par(oldpar) ``` And here is figure 2, comparing the KM curves, by PI group. ```{r fig2} grp1 <- cut(PI1, quantile(PI1, c(0, .16, .5 , .84, 1)), include.lowest=TRUE) km1 <- survfit(Surv(rfstime, rfs) ~ grp1, r7) plot(km1, xscale=365.25, xlab="Years since enrollment", ylab="Relapse free survival") grp2 <- cut(PI2, quantile(PI2, c(0, .16, .5 , .84, 1)), include.lowest=TRUE) km2 <- survfit(Surv(rfstime, rfs) ~ grp2, gbsg2) lines(km2, col=2) ``` The curves for the validation group have similar spread to those for the development dataset. The GBSG dataset shows fewer early deaths than predicted; however, this is not surprising. The Rotterdam study enrolled all breast cancer subjects while the GBSG data is from a clinical trial, and cancer clinical trial subjects very often have an initial "death honeymoon", for a variety of reasons: eligibility criteria often explicitly ask for subjects with a minimal expected survival, or implicitly so by excluding those with low Karnofsky scores, while subjects who are in extremis are themselves less likely to volunteer. # References