Simulating a simulation study

Alessandro Gasparini

2018-06-20

Introduction

In this vignette, we show how the simulated data included as an example dataset in simsum has been generated.

Motivation

Say we want to run a simulation study in which we want to compare the sensitivity of parametric and semiparametric survival models on relative risk estimates.

Data generating mechanisms

We simulate an hypotetical trial with a binary treatment. We fix the log-treatment effect to \(-0.50\), and we generate a treatment indicator variable for each simulated individual via a \(Binom(1, 0.5)\) random variable. We simulate two different sample sizes (50 and 250 individuals) and we assume two different baseline hazard functions: exponential with scale parameter \(\lambda = 0.5\), and Weibull with scale parameter \(\lambda = 0.5\) and shape parameter \(\gamma = 1.5\). Finally, we apply administrative censoring at time \(t = 5\).

exp_basehaz <- function(t, lambda = 0.5) lambda * 1 * t^0
exp_weibull <- function(t, lambda = 0.5, gamma = 1.5) lambda * gamma * t^(gamma - 1)
curve(exp_basehaz, from = 0, to = 5, lty = 1, ylim = c(0, 2), ylab = expression(h[0](t)), xlab = "Follow-up time t")
curve(exp_weibull, from = 0, to = 5, lty = 2, add = TRUE)
legend(x = "topleft", lty = 1:2, legend = c("Exponential baseline hazard", "Weibull baseline hazard"), bty = "n")

The survival times are estimated using the approach of Bender et al. (2005), based on drawing from a \(U(0, 1)\) random variable and applying the following transformations:

  1. for an exponential baseline hazard, the survival time \(t\) is simulated as: \[t = -\frac{log(U)}{\lambda \exp(\beta ^ T X)}\]

  2. for a Weibull baseline hazard, the survival time \(t\) is simulated as: \[t = \left(-\frac{log(U)}{\lambda \exp(\beta ^ T X)}\right) ^ {1 / \gamma}\]

The R function to simulate a dataset for our simulation study is defined as follows:

simulate_data <- function(dataset, n, baseline, params = list(), coveff = -0.50) {
  # Simulate treatment indicator variable
  x <- rbinom(n = n, size = 1, prob = 0.5)
  # Draw from a U(0,1) random variable
  u <- runif(n)
  # Simulate survival times depending on the baseline hazard
  if (baseline == "Exponential") {
    t <- -log(u) / (params$lambda * exp(x * coveff))
  } else {
    t <- (-log(u) / (params$lambda * exp(x * coveff)))^(1 / params$gamma)
  }
  # Winsorising tiny values for t (smaller than one day on a yearly-scale, e.g. 1 / 365.242), and adding a tiny amount of white noise not to have too many concurrent values
  t <- ifelse(t < 1 / 365.242, 1 / 365.242, t)
  t[t == 1 / 365.242] <- t[t == 1 / 365.242] + rnorm(length(t[t == 1 / 365.242]), mean = 0, sd = 1e-4)
  # ...and make sure that the resulting value is positive
  t <- abs(t)

  # Make event indicator variable applying administrative censoring at t = 5
  d <- as.numeric(t < 5)
  t <- pmin(t, 5)
  # Return a data.frame
  data.frame(dataset = dataset, x = x, t = t, d = d, n = n, baseline = baseline, stringsAsFactors = FALSE)
}

Methods

We compare the Cox model (Cox, 1972) with a fully parametric survival model assuming an exponential baseline hazard and a flexible parametric model with 2 degrees of freedom for modelling the baseline hazard (Royston and Parmar, 2002). The Cox model can be fit via the coxph function from the survival package, the exponential model can be fit via the phreg function from the eha package, and the Royston-Parmar model can be fixed via the stpm2 function from the rstpm2 package.

Performance measures

Say we are interested in the following performance measures:

Sample size

We are primarily interested in bias, and assume that the variance of the estimated log-treatment effect is \(0.1\). The Monte Carlo standard error for the bias is:

\[\text{MCSE} = \sqrt{\frac{\text{variance}}{\# \text{simulations}}}\]

Aiming for a Monte Carlo standard error of 0.01 on the estimated bias, we would require \(1,000\) replications.

The Monte Carlo standard error for converage is:

\[\text{MCSE} = \sqrt{\frac{\text{coverage} \times (1 - \text{coverage})}{\# \text{simulations}}}\]

This Monte Carlo standard error is maximised for a coverage = \(0.5\). In that setting, the Monte Carlo standard error with \(1,000\) replications would be \(0.01581139\), which is deemed to be acceptable.

Therefore, we will run \(1,000\) replications of this simulation study.

Running the simulation study

Generate data

We generate \(1,000\) datasets for each data-generating mechanism.

First, we set a random seed for reproducibility:

Then, we simulate the data:

Run models

We define a function to fit the models of interest:

We now run the models for each simulated dataset:

results <- list()
results[["n = 50, baseline = Exp, model = Cox"]] <- do.call(
  rbind.data.frame,
  lapply(
    X = data[["n = 50, baseline = Exp"]],
    FUN = fit_models,
    model = "Cox"
  )
)
results[["n = 250, baseline = Exp, model = Cox"]] <- do.call(
  rbind.data.frame,
  lapply(
    X = data[["n = 250, baseline = Exp"]],
    FUN = fit_models,
    model = "Cox"
  )
)
results[["n = 50, baseline = Wei, model = Cox"]] <- do.call(
  rbind.data.frame,
  lapply(
    X = data[["n = 50, baseline = Wei"]],
    FUN = fit_models,
    model = "Cox"
  )
)
results[["n = 250, baseline = Wei, model = Cox"]] <- do.call(
  rbind.data.frame,
  lapply(
    X = data[["n = 250, baseline = Wei"]],
    FUN = fit_models,
    model = "Cox"
  )
)

results[["n = 50, baseline = Exp, model = Exp"]] <- do.call(
  rbind.data.frame,
  lapply(
    X = data[["n = 50, baseline = Exp"]],
    FUN = fit_models,
    model = "Exp"
  )
)
results[["n = 250, baseline = Exp, model = Exp"]] <- do.call(
  rbind.data.frame,
  lapply(
    X = data[["n = 250, baseline = Exp"]],
    FUN = fit_models,
    model = "Exp"
  )
)
results[["n = 50, baseline = Wei, model = Exp"]] <- do.call(
  rbind.data.frame,
  lapply(
    X = data[["n = 50, baseline = Wei"]],
    FUN = fit_models,
    model = "Exp"
  )
)
results[["n = 250, baseline = Wei, model = Exp"]] <- do.call(
  rbind.data.frame,
  lapply(
    X = data[["n = 250, baseline = Wei"]],
    FUN = fit_models,
    model = "Exp"
  )
)

results[["n = 50, baseline = Exp, model = RP(2)"]] <- do.call(
  rbind.data.frame,
  lapply(
    X = data[["n = 50, baseline = Exp"]],
    FUN = fit_models,
    model = "RP(2)"
  )
)
results[["n = 250, baseline = Exp, model = RP(2)"]] <- do.call(
  rbind.data.frame,
  lapply(
    X = data[["n = 250, baseline = Exp"]],
    FUN = fit_models,
    model = "RP(2)"
  )
)
results[["n = 50, baseline = Wei, model = RP(2)"]] <- do.call(
  rbind.data.frame,
  lapply(
    X = data[["n = 50, baseline = Wei"]],
    FUN = fit_models,
    model = "RP(2)"
  )
)
results[["n = 250, baseline = Wei, model = RP(2)"]] <- do.call(
  rbind.data.frame,
  lapply(
    X = data[["n = 250, baseline = Wei"]],
    FUN = fit_models,
    model = "RP(2)"
  )
)

Aggregating results

We save the final results, that will be included as an example in the R package rsimsum.

Summarising results

Finally, we obtain summary statistics by calling the simsum function:

summary(s)
#> 
#> Call:
#>  rsimsum::simsum(data = relhaz, estvarname = "theta", true = -0.5, 
#>     se = "se", methodvar = "model", ref = "Cox", by = c("n", 
#>         "baseline"), mcse = TRUE)
#> 
#> Method variable: model 
#>  Unique methods: Cox, Exp, RP(2) 
#>  Reference method: Cox 
#> By factors: n, baseline 
#> 
#> Summary statistics:
#> 
#>  Method = Cox, n = 250, baseline = Exponential 
#>                                              Estimate   MCSE Lower 2.5% Upper 97.5%
#>  Simulations with non-missing estimates/SEs 1000.0000     NA         NA          NA
#>                      Average point estimate   -0.5004     NA         NA          NA
#>                       Median point estimate   -0.4969     NA         NA          NA
#>                      Average standard error    0.0194     NA         NA          NA
#>                       Median standard error    0.0194     NA         NA          NA
#>                      Bias in point estimate   -0.0004 0.0043    -0.0088      0.0081
#>                    Empirical standard error    0.1363 0.0030     0.1303      0.1422
#>                          Mean squared error    0.0185 0.0009     0.0168      0.0203
#>  % gain in precision relative to method Cox    1.0000 0.0000     1.0000      1.0000
#>                  Model-based standard error    0.1394 0.0001     0.1393      0.1396
#>          Relative % error in standard error    2.3293 2.2894    -2.1578      6.8164
#>                  Coverage of nominal 95% CI    0.9510 0.0068     0.9376      0.9644
#>   Bias corrected coverage of nominal 95% CI    0.9520 0.0068     0.9388      0.9652
#>                      Power of 5% level test    0.9570 0.0064     0.9444      0.9696
#> 
#>  Method = Exp, n = 250, baseline = Exponential 
#>                                              Estimate   MCSE Lower 2.5% Upper 97.5%
#>  Simulations with non-missing estimates/SEs 1000.0000     NA         NA          NA
#>                      Average point estimate   -0.4996     NA         NA          NA
#>                       Median point estimate   -0.5004     NA         NA          NA
#>                      Average standard error    0.0191     NA         NA          NA
#>                       Median standard error    0.0190     NA         NA          NA
#>                      Bias in point estimate    0.0004 0.0043    -0.0080      0.0088
#>                    Empirical standard error    0.1350 0.0030     0.1291      0.1409
#>                          Mean squared error    0.0182 0.0009     0.0165      0.0199
#>  % gain in precision relative to method Cox    1.0192 0.0093     1.0009      1.0375
#>                  Model-based standard error    0.1381 0.0001     0.1379      0.1382
#>          Relative % error in standard error    2.2813 2.2882    -2.2035      6.7661
#>                  Coverage of nominal 95% CI    0.9540 0.0066     0.9410      0.9670
#>   Bias corrected coverage of nominal 95% CI    0.9540 0.0066     0.9410      0.9670
#>                      Power of 5% level test    0.9570 0.0064     0.9444      0.9696
#> 
#>  Method = RP(2), n = 250, baseline = Exponential 
#>                                              Estimate   MCSE Lower 2.5% Upper 97.5%
#>  Simulations with non-missing estimates/SEs 1000.0000     NA         NA          NA
#>                      Average point estimate   -0.5015     NA         NA          NA
#>                       Median point estimate   -0.5000     NA         NA          NA
#>                      Average standard error    0.0194     NA         NA          NA
#>                       Median standard error    0.0193     NA         NA          NA
#>                      Bias in point estimate   -0.0015 0.0043    -0.0100      0.0069
#>                    Empirical standard error    0.1363 0.0030     0.1303      0.1423
#>                          Mean squared error    0.0186 0.0009     0.0168      0.0203
#>  % gain in precision relative to method Cox    0.9996 0.0034     0.9929      1.0063
#>                  Model-based standard error    0.1392 0.0001     0.1391      0.1394
#>          Relative % error in standard error    2.1570 2.2855    -2.3225      6.6365
#>                  Coverage of nominal 95% CI    0.9520 0.0068     0.9388      0.9652
#>   Bias corrected coverage of nominal 95% CI    0.9520 0.0068     0.9388      0.9652
#>                      Power of 5% level test    0.9590 0.0063     0.9467      0.9713
#> 
#>  Method = Cox, n = 50, baseline = Exponential 
#>                                              Estimate   MCSE Lower 2.5% Upper 97.5%
#>  Simulations with non-missing estimates/SEs 1000.0000     NA         NA          NA
#>                      Average point estimate   -0.5255     NA         NA          NA
#>                       Median point estimate   -0.5226     NA         NA          NA
#>                      Average standard error    0.1016     NA         NA          NA
#>                       Median standard error    0.1003     NA         NA          NA
#>                      Bias in point estimate   -0.0255 0.0104    -0.0458     -0.0052
#>                    Empirical standard error    0.3277 0.0073     0.3133      0.3421
#>                          Mean squared error    0.1079 0.0047     0.0988      0.1170
#>  % gain in precision relative to method Cox    1.0000 0.0000     1.0000      1.0000
#>                  Model-based standard error    0.3188 0.0004     0.3180      0.3197
#>          Relative % error in standard error   -2.7072 2.1801    -6.9802      1.5658
#>                  Coverage of nominal 95% CI    0.9410 0.0075     0.9264      0.9556
#>   Bias corrected coverage of nominal 95% CI    0.9400 0.0075     0.9253      0.9547
#>                      Power of 5% level test    0.3740 0.0153     0.3440      0.4040
#> 
#>  Method = Exp, n = 50, baseline = Exponential 
#>                                              Estimate   MCSE Lower 2.5% Upper 97.5%
#>  Simulations with non-missing estimates/SEs 1000.0000     NA         NA          NA
#>                      Average point estimate   -0.5223     NA         NA          NA
#>                       Median point estimate   -0.5181     NA         NA          NA
#>                      Average standard error    0.0977     NA         NA          NA
#>                       Median standard error    0.0970     NA         NA          NA
#>                      Bias in point estimate   -0.0223 0.0101    -0.0422     -0.0024
#>                    Empirical standard error    0.3208 0.0072     0.3068      0.3349
#>                          Mean squared error    0.1033 0.0045     0.0945      0.1122
#>  % gain in precision relative to method Cox    1.0431 0.0126     1.0183      1.0679
#>                  Model-based standard error    0.3126 0.0004     0.3118      0.3133
#>          Relative % error in standard error   -2.5836 2.1821    -6.8603      1.6932
#>                  Coverage of nominal 95% CI    0.9370 0.0077     0.9219      0.9521
#>   Bias corrected coverage of nominal 95% CI    0.9390 0.0076     0.9242      0.9538
#>                      Power of 5% level test    0.3850 0.0154     0.3548      0.4152
#> 
#>  Method = RP(2), n = 50, baseline = Exponential 
#>                                              Estimate   MCSE Lower 2.5% Upper 97.5%
#>  Simulations with non-missing estimates/SEs 1000.0000     NA         NA          NA
#>                      Average point estimate   -0.5337     NA         NA          NA
#>                       Median point estimate   -0.5269     NA         NA          NA
#>                      Average standard error    0.1005     NA         NA          NA
#>                       Median standard error    0.0993     NA         NA          NA
#>                      Bias in point estimate   -0.0337 0.0105    -0.0543     -0.0131
#>                    Empirical standard error    0.3323 0.0074     0.3177      0.3468
#>                          Mean squared error    0.1114 0.0049     0.1018      0.1210
#>  % gain in precision relative to method Cox    0.9726 0.0066     0.9597      0.9855
#>                  Model-based standard error    0.3171 0.0004     0.3162      0.3179
#>          Relative % error in standard error   -4.5711 2.1383    -8.7620     -0.3802
#>                  Coverage of nominal 95% CI    0.9370 0.0077     0.9219      0.9521
#>   Bias corrected coverage of nominal 95% CI    0.9390 0.0076     0.9242      0.9538
#>                      Power of 5% level test    0.3920 0.0154     0.3617      0.4223
#> 
#>  Method = Cox, n = 250, baseline = Weibull 
#>                                              Estimate   MCSE Lower 2.5% Upper 97.5%
#>  Simulations with non-missing estimates/SEs 1000.0000     NA         NA          NA
#>                      Average point estimate   -0.5057     NA         NA          NA
#>                       Median point estimate   -0.5031     NA         NA          NA
#>                      Average standard error    0.0173     NA         NA          NA
#>                       Median standard error    0.0172     NA         NA          NA
#>                      Bias in point estimate   -0.0057 0.0040    -0.0136      0.0022
#>                    Empirical standard error    0.1274 0.0029     0.1219      0.1330
#>                          Mean squared error    0.0163 0.0008     0.0147      0.0178
#>  % gain in precision relative to method Cox    1.0000 0.0000     1.0000      1.0000
#>                  Model-based standard error    0.1316 0.0001     0.1315      0.1317
#>          Relative % error in standard error    3.2717 2.3104    -1.2566      7.8001
#>                  Coverage of nominal 95% CI    0.9440 0.0073     0.9297      0.9583
#>   Bias corrected coverage of nominal 95% CI    0.9500 0.0069     0.9365      0.9635
#>                      Power of 5% level test    0.9740 0.0050     0.9641      0.9839
#> 
#>  Method = Exp, n = 250, baseline = Weibull 
#>                                              Estimate   MCSE Lower 2.5% Upper 97.5%
#>  Simulations with non-missing estimates/SEs 1000.0000     NA         NA          NA
#>                      Average point estimate   -0.3522     NA         NA          NA
#>                       Median point estimate   -0.3500     NA         NA          NA
#>                      Average standard error    0.0164     NA         NA          NA
#>                       Median standard error    0.0163     NA         NA          NA
#>                      Bias in point estimate    0.1478 0.0028     0.1423      0.1533
#>                    Empirical standard error    0.0886 0.0020     0.0847      0.0925
#>                          Mean squared error    0.0297 0.0009     0.0279      0.0314
#>  % gain in precision relative to method Cox    2.0701 0.0381     1.9954      2.1448
#>                  Model-based standard error    0.1280 0.0000     0.1279      0.1280
#>          Relative % error in standard error   44.4489 3.2309    38.1165     50.7813
#>                  Coverage of nominal 95% CI    0.8900 0.0099     0.8706      0.9094
#>   Bias corrected coverage of nominal 95% CI    0.9910 0.0030     0.9851      0.9969
#>                      Power of 5% level test    0.8870 0.0100     0.8674      0.9066
#> 
#>  Method = RP(2), n = 250, baseline = Weibull 
#>                                              Estimate   MCSE Lower 2.5% Upper 97.5%
#>  Simulations with non-missing estimates/SEs 1000.0000     NA         NA          NA
#>                      Average point estimate   -0.5072     NA         NA          NA
#>                       Median point estimate   -0.5055     NA         NA          NA
#>                      Average standard error    0.0171     NA         NA          NA
#>                       Median standard error    0.0171     NA         NA          NA
#>                      Bias in point estimate   -0.0072 0.0040    -0.0150      0.0007
#>                    Empirical standard error    0.1267 0.0028     0.1212      0.1323
#>                          Mean squared error    0.0161 0.0008     0.0146      0.0176
#>  % gain in precision relative to method Cox    1.0116 0.0067     0.9984      1.0248
#>                  Model-based standard error    0.1309 0.0001     0.1308      0.1310
#>          Relative % error in standard error    3.3058 2.3110    -1.2238      7.8353
#>                  Coverage of nominal 95% CI    0.9470 0.0071     0.9331      0.9609
#>   Bias corrected coverage of nominal 95% CI    0.9500 0.0069     0.9365      0.9635
#>                      Power of 5% level test    0.9740 0.0050     0.9641      0.9839
#> 
#>  Method = Cox, n = 50, baseline = Weibull 
#>                                              Estimate   MCSE Lower 2.5% Upper 97.5%
#>  Simulations with non-missing estimates/SEs 1000.0000     NA         NA          NA
#>                      Average point estimate   -0.5123     NA         NA          NA
#>                       Median point estimate   -0.5133     NA         NA          NA
#>                      Average standard error    0.0926     NA         NA          NA
#>                       Median standard error    0.0900     NA         NA          NA
#>                      Bias in point estimate   -0.0123 0.0100    -0.0318      0.0073
#>                    Empirical standard error    0.3158 0.0071     0.3020      0.3297
#>                          Mean squared error    0.0998 0.0049     0.0903      0.1094
#>  % gain in precision relative to method Cox    1.0000 0.0000     1.0000      1.0000
#>                  Model-based standard error    0.3043 0.0005     0.3035      0.3052
#>          Relative % error in standard error   -3.6377 2.1601    -7.8714      0.5960
#>                  Coverage of nominal 95% CI    0.9450 0.0072     0.9309      0.9591
#>   Bias corrected coverage of nominal 95% CI    0.9440 0.0073     0.9297      0.9583
#>                      Power of 5% level test    0.3820 0.0154     0.3519      0.4121
#> 
#>  Method = Exp, n = 50, baseline = Weibull 
#>                                              Estimate   MCSE Lower 2.5% Upper 97.5%
#>  Simulations with non-missing estimates/SEs 1000.0000     NA         NA          NA
#>                      Average point estimate   -0.3492     NA         NA          NA
#>                       Median point estimate   -0.3506     NA         NA          NA
#>                      Average standard error    0.0834     NA         NA          NA
#>                       Median standard error    0.0825     NA         NA          NA
#>                      Bias in point estimate    0.1508 0.0066     0.1379      0.1636
#>                    Empirical standard error    0.2073 0.0046     0.1982      0.2163
#>                          Mean squared error    0.0656 0.0028     0.0602      0.0711
#>  % gain in precision relative to method Cox    2.3224 0.0503     2.2238      2.4209
#>                  Model-based standard error    0.2888 0.0002     0.2884      0.2891
#>          Relative % error in standard error   39.3334 3.1175    33.2231     45.4436
#>                  Coverage of nominal 95% CI    0.9740 0.0050     0.9641      0.9839
#>   Bias corrected coverage of nominal 95% CI    0.9950 0.0022     0.9906      0.9994
#>                      Power of 5% level test    0.1450 0.0111     0.1232      0.1668
#> 
#>  Method = RP(2), n = 50, baseline = Weibull 
#>                                              Estimate   MCSE Lower 2.5% Upper 97.5%
#>  Simulations with non-missing estimates/SEs 1000.0000     NA         NA          NA
#>                      Average point estimate   -0.5187     NA         NA          NA
#>                       Median point estimate   -0.5120     NA         NA          NA
#>                      Average standard error    0.0894     NA         NA          NA
#>                       Median standard error    0.0873     NA         NA          NA
#>                      Bias in point estimate   -0.0187 0.0100    -0.0382      0.0009
#>                    Empirical standard error    0.3155 0.0071     0.3017      0.3294
#>                          Mean squared error    0.0998 0.0047     0.0905      0.1091
#>  % gain in precision relative to method Cox    1.0020 0.0110     0.9804      1.0237
#>                  Model-based standard error    0.2989 0.0004     0.2982      0.2997
#>          Relative % error in standard error   -5.2531 2.1230    -9.4140     -1.0922
#>                  Coverage of nominal 95% CI    0.9380 0.0076     0.9231      0.9529
#>   Bias corrected coverage of nominal 95% CI    0.9430 0.0073     0.9286      0.9574
#>                      Power of 5% level test    0.4140 0.0156     0.3835      0.4445

Conclusions

With this vignette we showed how to simulate survival data and run a small, simple simulation study.

References