We will use the same data that was used in Scrucca et al (2010). The data is available on the main author’s website; it is also available as part of this package.
set.seed(12345)
library(casebase)
## See example usage at http://sahirbhatnagar.com/casebase/
data(bmtcrr)
head(bmtcrr)
## Sex D Phase Age Status Source ftime
## 1 M ALL Relapse 48 2 BM+PB 0.67
## 2 F AML CR2 23 1 BM+PB 9.50
## 3 M ALL CR3 7 0 BM+PB 131.77
## 4 F ALL CR2 26 2 BM+PB 24.03
## 5 F ALL CR2 36 2 BM+PB 1.47
## 6 M ALL Relapse 17 2 BM+PB 2.23
We will perform a competing risk analysis on data from 177 patients who received a stem cell transplant for acute leukemia. The event of interest in relapse, but other competing causes (e.g. transplant-related death) need to be taken into account. We also want to take into account the effect of several covariates such as Sex, Disease (lymphoblastic or myeloblastic leukemia, abbreviated as ALL and AML, respectively), Phase at transplant (Relapse, CR1, CR2, CR3), Source of stem cells (bone marrow and peripheral blood, coded as BM+PB, or peripheral blood, coded as PB), and Age. Below, we reproduce their Table 1:
Variable | Description | Statistical summary |
---|---|---|
Sex | Sex | M=Male (100) F=Female (77) |
D | Disease | ALL (73) AML (104) |
Phase | Phase | CR1 (47) CR2 (45) CR3 (12) Relapse (73) |
Source | Type of transplant | BM+PB (21) PB (156) |
Age | Age of patient (years) | 4–62 30.47 (13.04) |
Ftime | Failure time (months) | 0.13–131.77 20.28 (30.78) |
Status | Status indicator | 0=censored (46) 1=relapse (56) 2=competing event (75) |
The statistical summary is generated differently for continuous and categorical variables:
For continuous variables, we are given the range, followed by the mean and standard deviation.
For categorical variables, we are given the counts for each category.
Note that failure time can also correspond to censoring.
In order to try and visualize the incidence density of relapse, we can look at a population-time plot: on the X-axis we have time, and on the Y-axis we have the size of the risk set at a particular time point. Failure times associated to the event of interest can then be highlighted on the plot using red dots.
nobs <- nrow(bmtcrr)
ftime <- bmtcrr$ftime
ord <- order(ftime, decreasing = FALSE)
# We split the person-moments in four categories:
# 1) at-risk
# 2) main event
# 3) competing event
# 4) censored
yCoords <- cbind(cumsum(bmtcrr[ord, "Status"] == 2),
cumsum(bmtcrr[ord, "Status"] == 1),
cumsum(bmtcrr[ord, "Status"] == 0))
yCoords <- cbind(yCoords, nobs - rowSums(yCoords))
# Plot only at-risk
plot(0, type = 'n', xlim = c(0, max(ftime)), ylim = c(0, nobs),
xlab = 'Follow-up time', ylab = 'Population')
polygon(c(0, 0, ftime[ord], max(ftime), 0),
c(0, nobs, yCoords[,4], 0, 0), col = "grey90")
cases <- bmtcrr[, "Status"] == 1
# randomly move the cases vertically
moved_cases <- yCoords[cases[ord], 4] * runif(sum(cases))
points((ftime[ord])[cases[ord]], moved_cases, pch = 20,
col = "red", cex = 1)