Case-base sampling was proposed by Hanley and Miettinen, 2009 as a way to fit smooth-in-time parametric hazard functions via logistic regression. The main idea, which was first proposed by Mantel, 1973 and later developed by Efron, 1977, is to sample person-moments, i.e. discrete time points along a subject’s follow-up time, in order to construct a base series against which the case series can be compared.
Because time enters the model explicitly as a covariate, the user can fit a wide class of parametric hazard functions. For example, including time linearly in the log-hazard recovers the Gompertz hazard, whereas including the logarithm of time recovers the Weibull hazard; not including time at all corresponds to the exponential (constant) hazard.
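The three cases above can be written out explicitly. The display below is a sketch that assumes the linear predictor of the model is the log-hazard; the symbols h(t), \beta_0 and \beta_1 are notation introduced here for illustration and do not appear in the original text.

```latex
\begin{align*}
\log h(t) &= \beta_0
  &&\Rightarrow\quad h(t) = e^{\beta_0}
  &&\text{(exponential: constant hazard)} \\
\log h(t) &= \beta_0 + \beta_1 t
  &&\Rightarrow\quad h(t) = e^{\beta_0}\, e^{\beta_1 t}
  &&\text{(Gompertz)} \\
\log h(t) &= \beta_0 + \beta_1 \log t
  &&\Rightarrow\quad h(t) = e^{\beta_0}\, t^{\beta_1}
  &&\text{(Weibull)}
\end{align*}
```

In the Weibull case the exponent \beta_1 corresponds to the usual shape parameter minus one, so \beta_1 = 0 reduces to the exponential hazard.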
The theoretical properties of this approach have been studied in Saarela and Arjas, 2015 and Saarela, 2015.
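To make the sampling scheme concrete, here is a minimal base-R sketch on simulated data. It is not the implementation used by any package: the simulated cohort, the variable names (ftime, cases, base, off) and the choice of 100 base moments per case are all assumptions made for illustration. Person-moments are drawn by picking a subject with probability proportional to follow-up time and then a uniform time point within that follow-up; the logistic regression uses the offset log(B/b), where B is the total person-time and b the size of the base series.

```r
set.seed(2024)

# Simulated cohort (assumption for illustration): exponential event times
# with true rate 0.1 and administrative censoring at t = 20.
n       <- 2000
event_t <- rexp(n, rate = 0.1)
cens_t  <- 20
ftime   <- pmin(event_t, cens_t)           # observed follow-up time
status  <- as.integer(event_t <= cens_t)   # 1 = event, 0 = censored

# Case series: the person-moments at which events occurred.
cases <- data.frame(y = 1, t = ftime[status == 1])

# Base series: b person-moments sampled uniformly over all person-time.
B   <- sum(ftime)                # total person-time
b   <- 100 * nrow(cases)         # arbitrary choice: 100 base moments per case
who <- sample(n, size = b, replace = TRUE, prob = ftime)
base <- data.frame(y = 0, t = runif(b, min = 0, max = ftime[who]))

# Stack both series and attach the offset log(B / b).
cb     <- rbind(cases, base)
cb$off <- log(B / b)

# No time term: exponential hazard. Linear time term: Gompertz hazard.
fit_exp  <- glm(y ~ 1 + offset(off), family = binomial, data = cb)
fit_gomp <- glm(y ~ t + offset(off), family = binomial, data = cb)

# Estimated constant hazard from the intercept-only model.
rate_hat <- unname(exp(coef(fit_exp)))
```

For the intercept-only fit, rate_hat reproduces the familiar exponential rate estimate (number of events divided by total person-time) up to numerical tolerance, which is a convenient sanity check on the offset.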
The first example we discuss uses the well-known veteran dataset, which is part of the survival package. As we can see below, there is almost no censoring, and therefore a histogram of the event times gives a good visual representation of the distribution of survival times:
set.seed(12345)
library(survival)
data(veteran)
table(veteran$status)
##
##   0   1
##   9 128
# Keep the observed event times only (status == 1)
evtimes <- veteran$time[veteran$status == 1]
hist(evtimes, nclass = 30, main = '', xlab = 'Survival time (days)',
     col = 'gray90', probability = TRUE)
# Overlay an exponential density whose rate is 1 / (mean event time)
tgrid <- seq(0, 1000, by = 10)
lines(tgrid, dexp(tgrid, rate = 1.0/mean(evtimes)),
      lwd = 2, lty = 2, col = 'red')