Data

We will use the same data that was used in Scrucca et al (2010). The data is available on the main author’s website; it is also available as part of this package.

set.seed(12345)
library(casebase)
## See example usage at http://sahirbhatnagar.com/casebase/
data(bmtcrr)
head(bmtcrr)
##   Sex   D   Phase Age Status Source  ftime
## 1   M ALL Relapse  48      2  BM+PB   0.67
## 2   F AML     CR2  23      1  BM+PB   9.50
## 3   M ALL     CR3   7      0  BM+PB 131.77
## 4   F ALL     CR2  26      2  BM+PB  24.03
## 5   F ALL     CR2  36      2  BM+PB   1.47
## 6   M ALL Relapse  17      2  BM+PB   2.23

We will perform a competing risk analysis on data from 177 patients who received a stem cell transplant for acute leukemia. The event of interest in relapse, but other competing causes (e.g. transplant-related death) need to be taken into account. We also want to take into account the effect of several covariates such as Sex, Disease (lymphoblastic or myeloblastic leukemia, abbreviated as ALL and AML, respectively), Phase at transplant (Relapse, CR1, CR2, CR3), Source of stem cells (bone marrow and peripheral blood, coded as BM+PB, or peripheral blood, coded as PB), and Age. Below, we reproduce their Table 1:

Variable Description Statistical summary
Sex Sex M=Male (100)
F=Female (77)
D Disease ALL (73)
AML (104)
Phase Phase CR1 (47)
CR2 (45)
CR3 (12)
Relapse (73)
Source Type of transplant BM+PB (21)
PB (156)
Age Age of patient (years) 4–62
30.47 (13.04)
Ftime Failure time (months) 0.13–131.77
20.28 (30.78)
Status Status indicator 0=censored (46)
1=relapse (56)
2=competing event (75)

The statistical summary is generated differently for continuous and categorical variables:

  • For continuous variables, we are given the range, followed by the mean and standard deviation.

  • For categorical variables, we are given the counts for each category.

Note that failure time can also correspond to censoring.

Population-time plots

In order to try and visualize the incidence density of relapse, we can look at a population-time plot: on the X-axis we have time, and on the Y-axis we have the size of the risk set at a particular time point. Failure times associated to the event of interest can then be highlighted on the plot using red dots.

nobs <- nrow(bmtcrr)
ftime <- bmtcrr$ftime
ord <- order(ftime, decreasing = FALSE)

# We split the person-moments in four categories:
# 1) at-risk
# 2) main event
# 3) competing event
# 4) censored
yCoords <- cbind(cumsum(bmtcrr[ord, "Status"] == 2), 
                 cumsum(bmtcrr[ord, "Status"] == 1),
                 cumsum(bmtcrr[ord, "Status"] == 0))
yCoords <- cbind(yCoords, nobs - rowSums(yCoords))

# Plot only at-risk
plot(0, type = 'n', xlim = c(0, max(ftime)), ylim = c(0, nobs), 
     xlab = 'Follow-up time', ylab = 'Population')
polygon(c(0, 0, ftime[ord], max(ftime), 0),
        c(0, nobs, yCoords[,4], 0, 0), col = "grey90")
cases <- bmtcrr[, "Status"] == 1

# randomly move the cases vertically
moved_cases <- yCoords[cases[ord], 4] * runif(sum(cases))
points((ftime[ord])[cases[ord]], moved_cases, pch = 20, 
       col = "red", cex = 1)