Exercise 3. Localised melanoma: Comparing estimates of cause-specific survival between periods; first graphically and then using the log rank test
You may have to install the required packages the first time you use
them. You can install a package by
install.packages("package_of_interest") for each package
you require.
library(biostat3)
library(survival)
library(dplyr)    # for data manipulation
library(survminer) # for ggsurvplot
library(bshazard) # smooth hazardsWe will now analyse the full data set of patients diagnosed with localised skin melanoma. We start by reading the data selecting those with a localised stage and then define a 1/0 varible for the events that we are interested in.
melanoma <- biostat3::melanoma |>
    subset(stage=="Localised") |>
    transform(death_cancer = ifelse(status == "Dead: cancer", 1, 0),
              death_all = ifelse(status == "Dead: cancer" |
                                 status == "Dead: other", 1, 0))(a)
survfit(Surv(surv_mm/12, death_cancer==1) ~ year8594, data = melanoma) |>
    plot(col = 1:2,
         xlab = "Time since diagnosis (years)",
         ylab = "Survival",
         main = "Kaplan-Meier survival estimates")
legend("bottomleft", levels(melanoma$year8594), col=1:2, lty = 1)There seems to be a clear difference in survival between the two periods. Patients diagnosed during 1985–94 have superior survival to those diagnosed 1975–84.
(b)
The plot shows the instantaneous cancer-specific mortality rate (the hazard) as a function of time. It appears that mortality is highest approximately 3.5 years following diagnosis. Remember that all patients were classified as having localised cancer at the time of diagnosis so we would not expect mortality to be high directly following diagnosis.
The plot of the hazard clearly illustrates the pattern of cancer-specific mortality as a function of time whereas this pattern is not obvious in the plot of the survivor function.
par(mfrow=1:2)
for(level in levels(melanoma$year8594))
    plot(bshazard(Surv(surv_mm/12,death_cancer)~1, subset(melanoma,year8594==level),
                  verbose=FALSE), main=level, xlim=c(0,20), ylim=c(0,0.1),
         xlab="Time since diagnosis\n(years)")(c)
There is strong evidence that survival differs between the two periods. The log-rank and the Wilcoxon tests give very similar results. The Wilcoxon test gives more weight to differences in survival in the early period of follow-up (where there are more individuals at risk) whereas the log rank test gives equal weight to all points in the follow-up. Both tests assume that, if there is a difference, a proportional hazards assumption is appropriate.
## Log-rank test for equality of survivor functions
survdiff(Surv(surv_mm, death_cancer) ~ year8594, data=melanoma)## Call:
## survdiff(formula = Surv(surv_mm, death_cancer) ~ year8594, data = melanoma)
## 
##                             N Observed Expected (O-E)^2/E (O-E)^2/V
## year8594=Diagnosed 75-84 2145      572      512      7.03      15.5
## year8594=Diagnosed 85-94 3173      441      501      7.18      15.5
## 
##  Chisq= 15.5  on 1 degrees of freedom, p= 8e-05## Equivalent to the Peto & Peto modfication of the Gehan-Wilcoxon test
survdiff(Surv(surv_mm, death_cancer) ~ year8594, data=melanoma, rho=1)## Call:
## survdiff(formula = Surv(surv_mm, death_cancer) ~ year8594, data = melanoma, 
##     rho = 1)
## 
##                             N Observed Expected (O-E)^2/E (O-E)^2/V
## year8594=Diagnosed 75-84 2145      502      447      6.82      16.2
## year8594=Diagnosed 85-94 3173      400      455      6.70      16.2
## 
##  Chisq= 16.2  on 1 degrees of freedom, p= 6e-05(d)
We see that mortality increases with age at diagnosis (and survival decreases).
##              agegrp    tstop event     rate    lower    upper
## agegrp=0-44    0-44 157.1215   217 1.381097 1.203441 1.577593
## agegrp=45-59  45-59 148.8215   282 1.894887 1.680159 2.129453
## agegrp=60-74  60-74 121.3380   333 2.744400 2.457518 3.055574
## agegrp=75+      75+  36.2380   181 4.994757 4.293585 5.777786## melanoma |>
##     select(death_cancer, surv_mm, year8594) |>
##     group_by(year8594) |>
##     summarise(D = sum(death_cancer), Y = sum(surv_mm)/1000, Rate = D/Y,
##               lower.ci = stats::poisson.test(D,Y)$conf.int[1],
##               upper.ci = stats::poisson.test(D,Y)$conf.int[2]) 
survfit(Surv(surv_mm, death_cancer) ~ agegrp, data = melanoma) |>
    plot(col = 1:4,
         xlab = "Months since diagnosis",
         ylab = "Survival",
         main = "Kaplan-Meier survival estimates")
legend("bottomleft", levels(melanoma$agegrp), col=1:4, lty = 1)The rates are (cause-specific) deaths per 1000 person-years. When we
use Surv we defined time as time in months divided by 12
and then asked for rates per 1000 units of time.
What are the units of the estimated hazard rates? HINT: look at how you defined time.
(e)
survfit(Surv(surv_mm/12, death_cancer) ~ agegrp, data = melanoma) |>
    plot(col = 1:4,
         xlab = "Years since diagnosis",
         ylab = "Survival",
         main = "Kaplan-Meier survival estimates")
legend("bottomleft", levels(melanoma$agegrp), col=1:4, lty = 1)##                                 year8594    tstop event     rate    lower    upper
## year8594=Diagnosed 75-84 Diagnosed 75-84 22.66279   572 25.23961 23.21332 27.39540
## year8594=Diagnosed 85-94 Diagnosed 85-94 15.96379   441 27.62502 25.10656 30.32768(f)
aplot = function() {
    survfit(Surv(surv_mm/12, death_cancer) ~ sex, data = melanoma) |>
    plot(col = 1:2,
         xlab = "Time since diagnosis (years)",
         ylab = "Survival",
         main = "Kaplan-Meier survival estimates")
}
if (requireNamespace("muhaz", quietly=TRUE)) {
    par(mfrow=c(1, 2))
    aplot()
    plot(muhaz2(Surv(surv_mm/12,death_cancer)~sex, data=melanoma), lty=1,
         xlab="Time since diagnosis (years)")
} else {
    par(mfrow=c(1, 1))
    aplot()
    legend("bottomleft", legend=levels(melanoma$sex), lty=1, col=1:2)
}## Log-rank test for equality of survivor functions
survdiff(Surv(surv_mm, death_cancer==1) ~ sex, data=melanoma)## Call:
## survdiff(formula = Surv(surv_mm, death_cancer == 1) ~ sex, data = melanoma)
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## sex=Male   2405      542      433      27.7      48.6
## sex=Female 2913      471      580      20.6      48.6
## 
##  Chisq= 48.6  on 1 degrees of freedom, p= 3e-12