rsimsum is an R package that computes summary statistics from simulation studies. It is a port to R of the user-written command simsum in Stata (White, 2010).
The aim of rsimsum is to help with the reporting of simulation studies, including understanding the role of chance in their results. Specifically, rsimsum can compute Monte Carlo standard errors of summary statistics, defined as the standard deviation of the estimated summary statistic.
Formulae for the summary statistics and their Monte Carlo standard errors are presented in the next section.
We will use the following notation throughout this vignette:
- \(\theta\), the true value of the estimand;
- \(\hat{\theta}_i\), the point estimate from the \(i\)-th replication;
- \(\widehat{\text{Var}}(\hat{\theta}_i)\), the estimated variance of \(\hat{\theta}_i\);
- \(n_{\text{sim}}\), the number of replications;
- \(\bar{\theta} = \frac{1}{n_{\text{sim}}} \sum_{i = 1} ^ {n_{\text{sim}}} \hat{\theta}_i\), the average point estimate across replications.
The first summary statistic of interest is bias, which quantifies whether the estimator targets the true value \(\theta\) on average. Bias is calculated as:
\[\text{Bias} = \frac{1}{n_{\text{sim}}} \sum_{i = 1} ^ {n_{\text{sim}}} \hat{\theta}_i - \theta\]
The Monte Carlo standard error of bias is calculated as:
\[\text{MCSE(Bias)} = \sqrt{\frac{\frac{1}{n_{\text{sim}} - 1} \sum_{i = 1} ^ {n_{\text{sim}}} (\hat{\theta}_i - \bar{\theta}) ^ 2}{n_{\text{sim}}}}\]
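For illustration, bias and its Monte Carlo standard error can be computed by hand in base R. This is a minimal sketch using simulated toy estimates; theta_hat and theta below are hypothetical objects, not part of rsimsum:
set.seed(42)
theta <- 0.50 # true value of the estimand
theta_hat <- rnorm(1000, mean = theta, sd = 0.15) # toy point estimates
n_sim <- length(theta_hat)
bias <- mean(theta_hat) - theta
mcse_bias <- sqrt(var(theta_hat) / n_sim) # var() uses the n - 1 denominator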
The empirical standard error of \(\theta\) depends only on \(\hat{\theta}\) and does not require any knowledge of \(\theta\). It estimates the standard deviation of \(\hat{\theta}\) over the \(n_{\text{sim}}\) replications:
\[\text{Empirical SE} = \sqrt{\frac{1}{n_{\text{sim}} - 1} \sum_{i = 1} ^ {n_{\text{sim}}} (\hat{\theta}_i - \bar{\theta}) ^ 2}\]
The Monte Carlo standard error is calculated as:
\[\text{MCSE(Emp. SE)} = \frac{\text{Emp. SE}}{\sqrt{2 (n_{\text{sim}} - 1)}}\]
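Continuing the toy example above, the empirical standard error and its Monte Carlo standard error are:
emp_se <- sd(theta_hat)
mcse_emp_se <- emp_se / sqrt(2 * (n_sim - 1))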
When comparing different methods, the relative precision of a given method B against a reference method A is computed as:
\[\text{Relative % increase in precision} = \left( \frac{\text{Emp. SE}_A}{\text{Emp. SE}_B} \right) ^ 2\]
Its (approximated) Monte Carlo standard error is:
\[\text{MCSE(Relative % increase in precision)} \simeq 2 \frac{\text{Var}(\hat{\theta}_A)}{\text{Var}(\hat{\theta}_B)} \sqrt{\frac{1 - \rho^2_{AB}}{n_{\text{sim}} - 1}}\]
where \(\rho_{AB}\) is the correlation of \(\hat{\theta}_A\) and \(\hat{\theta}_B\).
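Again as a sketch, with a second toy vector of estimates playing the role of method B (and the estimates above playing the role of the reference method A):
theta_hat_B <- theta_hat + rnorm(n_sim, sd = 0.05) # toy estimates, correlated with A
rel_prec <- (sd(theta_hat) / sd(theta_hat_B))^2
rho <- cor(theta_hat, theta_hat_B)
mcse_rel_prec <- 2 * (var(theta_hat) / var(theta_hat_B)) *
  sqrt((1 - rho^2) / (n_sim - 1))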
A measure that takes into account both precision and accuracy of a method is the mean squared error, which is the sum of the squared bias and variance of \(\hat{\theta}\):
\[\text{MSE} = \frac{1}{n_{\text{sim}}} \sum_{i = 1} ^ {n_{\text{sim}}} (\hat{\theta}_i - \theta) ^ 2\]
The Monte Carlo standard error is:
\[\text{MCSE(MSE)} = \sqrt{\frac{\sum_{i = 1} ^ {n_{\text{sim}}} \left[ (\hat{\theta}_i - \theta) ^2 - \text{MSE} \right] ^ 2}{n_{\text{sim}} (n_{\text{sim}} - 1)}}\]
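In the toy example, both quantities translate directly into base R:
mse <- mean((theta_hat - theta)^2)
mcse_mse <- sqrt(sum(((theta_hat - theta)^2 - mse)^2) / (n_sim * (n_sim - 1)))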
The model-based standard error is computed by averaging the estimated variances across replications, back-transformed to the standard-error scale:
\[\text{Model SE} = \sqrt{\frac{1}{n_{\text{sim}}} \sum_{i = 1} ^ {n_{\text{sim}}} \widehat{\text{Var}}(\hat{\theta}_i)}\]
Its (approximated) Monte Carlo standard error is computed as:
\[\text{MCSE(Model SE)} \simeq \sqrt{\frac{\text{Var}[\widehat{\text{Var}}(\hat{\theta}_i)]}{4 n_{\text{sim}} \widehat{\text{Model SE}} ^ 2}}\]
The model standard error targets the empirical standard error. Hence, the relative error in the model standard error is an informative performance measure:
\[\text{Relative % error in model SE} = 100 \left( \frac{\text{Model SE}}{\text{Empirical SE}} - 1\right)\]
Its Monte Carlo standard error is computed as:
\[\text{MCSE(Relative % error in model SE)} = 100 \left( \frac{\text{Model SE}}{\text{Empirical SE}} \right) \sqrt{\frac{\text{Var}[\widehat{\text{Var}}(\hat{\theta}_i)]}{4 n_{\text{sim}} \widehat{\text{Model SE}} ^ 4} + \frac{1}{2(n_{\text{sim}} - 1)}}\]
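As a sketch of these last two statistics, assume se_hat is a toy vector of estimated standard errors, one per replication (so that se_hat squared estimates \(\widehat{\text{Var}}(\hat{\theta}_i)\)):
se_hat <- rnorm(n_sim, mean = 0.15, sd = 0.01) # toy estimated standard errors
model_se <- sqrt(mean(se_hat^2))
mcse_model_se <- sqrt(var(se_hat^2) / (4 * n_sim * model_se^2))
rel_error <- 100 * (model_se / emp_se - 1)
mcse_rel_error <- 100 * (model_se / emp_se) *
  sqrt(var(se_hat^2) / (4 * n_sim * model_se^4) + 1 / (2 * (n_sim - 1)))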
Coverage is another key property of an estimator. It is defined as the probability that a confidence interval contains the true value \(\theta\), and computed as:
\[\text{Coverage} = \frac{1}{n_{\text{sim}}} \sum_{i = 1} ^ {n_{\text{sim}}} I(\hat{\theta}_{i, \text{low}} \le \theta \le \hat{\theta}_{i, \text{upp}})\]
where \(I(\cdot)\) is the indicator function. The Monte Carlo standard error is computed as:
\[\text{MCSE(Coverage)} = \sqrt{\frac{\text{Coverage} \times (1 - \text{Coverage})}{n_{\text{sim}}}}\]
Undercoverage is to be expected if the estimator is biased or if \(\text{Model SE} < \text{Empirical SE}\).
Overcoverage occurs as a result of \(\text{Model SE} > \text{Empirical SE}\).
As undercoverage may be a result of bias, another useful summary statistic is the bias-corrected coverage, which replaces the true value \(\theta\) with the average point estimate \(\bar{\theta}\):
\[\text{Bias corrected coverage} = \frac{1}{n_{\text{sim}}} \sum_{i = 1} ^ {n_{\text{sim}}} I(\hat{\theta}_{i, \text{low}} \le \bar{\theta} \le \hat{\theta}_{i, \text{upp}}) \]
The Monte Carlo standard error is computed analogously to coverage:
\[\text{MCSE(Bias corrected coverage)} = \sqrt{\frac{\text{Bias corrected coverage} \times (1 - \text{Bias corrected coverage})}{n_{\text{sim}}}}\]
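Both coverage statistics can be sketched in base R, using Wald-type 95% confidence intervals built from the toy estimates and standard errors introduced above:
low <- theta_hat - qnorm(0.975) * se_hat
upp <- theta_hat + qnorm(0.975) * se_hat
cover <- mean(low <= theta & theta <= upp)
mcse_cover <- sqrt(cover * (1 - cover) / n_sim)
theta_bar <- mean(theta_hat)
bc_cover <- mean(low <= theta_bar & theta_bar <= upp)
mcse_bc_cover <- sqrt(bc_cover * (1 - bc_cover) / n_sim)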
Finally, the power of a significance test at the \(\alpha\) level is defined as:
\[\text{Power} = \frac{1}{n_{\text{sim}}} \sum_{i = 1} ^ {n_{\text{sim}}} I \left[ |\hat{\theta}_i| \ge z_{\alpha/2} \times \sqrt{\widehat{\text{Var}}(\hat{\theta}_i)} \right]\]
The Monte Carlo standard error is computed analogously to coverage:
\[\text{MCSE(Power)} = \sqrt{\frac{\text{Power} \times (1 - \text{Power})}{n_{\text{sim}}}}\]
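In the toy example, with \(\alpha = 0.05\):
alpha <- 0.05
power <- mean(abs(theta_hat) >= qnorm(1 - alpha / 2) * se_hat)
mcse_power <- sqrt(power * (1 - power) / n_sim)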
Further information on summary statistics for simulation studies can be found in White (2010) and Morris, White, and Crowther (2017).
With this example dataset, included in rsimsum, we aim to summarise a simulation study comparing different ways of handling missing covariates when fitting a Cox model (White and Royston, 2009). One thousand datasets were simulated, each containing normally distributed covariates \(x\) and \(z\) and a time-to-event outcome. Both covariates had \(20\%\) of their values deleted independently of all other variables, so that the values were missing completely at random (Little and Rubin, 2002). Each simulated dataset was analysed in three ways. A Cox model was fitted to the complete cases (CC). Then, two methods of multiple imputation using chained equations (van Buuren, Boshuizen, and Knook, 1999) were used. The MI_LOGT method multiply imputes the missing values of \(x\) and \(z\) with the outcome included as \(\log(t)\) and \(d\), where \(t\) is the survival time and \(d\) is the event indicator. The MI_T method is the same, except that \(\log(t)\) is replaced by \(t\) in the imputation model.
We load the data in the usual way:
library(rsimsum)
#>
#> Attaching package: 'rsimsum'
#> The following object is masked from 'package:utils':
#>
#> zip
data("MIsim", package = "rsimsum")
Let’s have a look at the first 10 rows of the dataset:
head(MIsim, n = 10)
#> dataset method b se
#> 1 1 CC 0.7067682 0.1465100
#> 2 1 MI_T 0.6841882 0.1255043
#> 3 1 MI_LOGT 0.7124795 0.1410814
#> 4 2 CC 0.3485008 0.1599879
#> 5 2 MI_T 0.4060082 0.1409831
#> 6 2 MI_LOGT 0.4287003 0.1358589
#> 7 3 CC 0.6495075 0.1521568
#> 8 3 MI_T 0.5028701 0.1300780
#> 9 3 MI_LOGT 0.5604051 0.1168512
#> 10 4 CC 0.4320534 0.1262853
The included variables are:
str(MIsim)
#> Classes 'tbl_df', 'tbl' and 'data.frame': 3000 obs. of 4 variables:
#> $ dataset: num 1 1 1 2 2 2 3 3 3 4 ...
#> $ method : chr "CC" "MI_T" "MI_LOGT" "CC" ...
#> $ b : num 0.707 0.684 0.712 0.349 0.406 ...
#> $ se : num 0.147 0.126 0.141 0.16 0.141 ...
#> - attr(*, "label")= chr "simsum example: data from a simulation study comparing 3 ways to handle missing"
- dataset, the number of the simulated dataset;
- method, the method used (CC, MI_LOGT, or MI_T);
- b, the point estimate;
- se, the standard error of the point estimate.
We summarise the results of the simulation study by method using the simsum function:
s1 <- simsum(data = MIsim, estvarname = "b", true = 0.50, se = "se", methodvar = "method", ref = "CC")
Using the default settings, Monte Carlo standard errors are computed and returned.
ss1 <- summary(s1)
ss1
#>
#> Call:
#> simsum(data = MIsim, estvarname = "b", true = 0.5, se = "se",
#> methodvar = "method", ref = "CC")
#>
#> Method variable: method
#> Unique methods: CC, MI_LOGT, MI_T
#> Reference method: CC
#> By factors: none
#>
#> Summary statistics:
#>
#> Method = CC
#> Estimate MCSE Lower 95% Upper 95%
#> Simulations with non-missing estimates/SEs 1000.0000 NA NA NA
#> Average point estimate 0.5168 NA NA NA
#> Median point estimate 0.5070 NA NA NA
#> Average standard error 0.0216 NA NA NA
#> Median standard error 0.0211 NA NA NA
#> Bias in point estimate 0.0168 0.0048 0.0074 0.0261
#> Empirical standard error 0.1511 0.0034 0.1445 0.1577
#> Mean squared error 0.0231 0.0011 0.0209 0.0253
#> % gain in precision relative to method CC 1.0000 0.0000 1.0000 1.0000
#> Model-based standard error 0.1471 0.0005 0.1461 0.1481
#> Relative % error in standard error -2.6594 2.2049 -6.9810 1.6622
#> Coverage of nominal 95% CI 0.9430 0.0073 0.9286 0.9574
#> Bias corrected coverage of nominal 95% CI 0.9400 0.0075 0.9253 0.9547
#> Power of 5% level test 0.9460 0.0071 0.9320 0.9600
#>
#> Method = MI_LOGT
#> Estimate MCSE Lower 95% Upper 95%
#> Simulations with non-missing estimates/SEs 1000.0000 NA NA NA
#> Average point estimate 0.5009 NA NA NA
#> Median point estimate 0.4969 NA NA NA
#> Average standard error 0.0182 NA NA NA
#> Median standard error 0.0172 NA NA NA
#> Bias in point estimate 0.0009 0.0042 -0.0073 0.0091
#> Empirical standard error 0.1320 0.0030 0.1262 0.1378
#> Mean squared error 0.0174 0.0009 0.0157 0.0191
#> % gain in precision relative to method CC 1.3105 0.0394 1.2333 1.3876
#> Model-based standard error 0.1349 0.0006 0.1338 0.1361
#> Relative % error in standard error 2.2233 2.3318 -2.3469 6.7935
#> Coverage of nominal 95% CI 0.9490 0.0070 0.9354 0.9626
#> Bias corrected coverage of nominal 95% CI 0.9490 0.0070 0.9354 0.9626
#> Power of 5% level test 0.9690 0.0055 0.9583 0.9797
#>
#> Method = MI_T
#> Estimate MCSE Lower 95% Upper 95%
#> Simulations with non-missing estimates/SEs 1000.0000 NA NA NA
#> Average point estimate 0.4988 NA NA NA
#> Median point estimate 0.4939 NA NA NA
#> Average standard error 0.0179 NA NA NA
#> Median standard error 0.0169 NA NA NA
#> Bias in point estimate -0.0012 0.0043 -0.0095 0.0071
#> Empirical standard error 0.1344 0.0030 0.1285 0.1403
#> Mean squared error 0.0181 0.0009 0.0163 0.0198
#> % gain in precision relative to method CC 1.2637 0.0384 1.1884 1.3390
#> Model-based standard error 0.1338 0.0006 0.1327 0.1350
#> Relative % error in standard error -0.4412 2.2690 -4.8883 4.0059
#> Coverage of nominal 95% CI 0.9430 0.0073 0.9286 0.9574
#> Bias corrected coverage of nominal 95% CI 0.9430 0.0073 0.9286 0.9574
#> Power of 5% level test 0.9630 0.0060 0.9513 0.9747
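As a quick sanity check, the bias for the CC method (and its Monte Carlo standard error) can be reproduced by hand from the raw estimates:
b_cc <- MIsim$b[MIsim$method == "CC"]
mean(b_cc) - 0.50 # bias; matches the 0.0168 reported above
sqrt(var(b_cc) / length(b_cc)) # MCSE of bias; matches the 0.0048 reported above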
It is straightforward to produce a table of summary statistics for use in an R Markdown document:
library(knitr)
kable(get_data(ss1))
stat | est | mcse | method | lower | upper |
---|---|---|---|---|---|
nsim | 1000.0000000 | NA | CC | NA | NA |
thetamean | 0.5167662 | NA | CC | NA | NA |
thetamedian | 0.5069935 | NA | CC | NA | NA |
se2mean | 0.0216373 | NA | CC | NA | NA |
se2median | 0.0211425 | NA | CC | NA | NA |
bias | 0.0167662 | 0.0047787 | CC | 0.0074001 | 0.0261322 |
empse | 0.1511150 | 0.0033807 | CC | 0.1444889 | 0.1577411 |
mse | 0.0230940 | 0.0011338 | CC | 0.0208717 | 0.0253163 |
relprec | 1.0000000 | 0.0000000 | CC | 1.0000000 | 1.0000000 |
modelse | 0.1470963 | 0.0005274 | CC | 0.1460626 | 0.1481300 |
relerror | -2.6593842 | 2.2049438 | CC | -6.9809947 | 1.6622263 |
cover | 0.9430000 | 0.0073315 | CC | 0.9286305 | 0.9573695 |
bccover | 0.9400000 | 0.0075100 | CC | 0.9252807 | 0.9547193 |
power | 0.9460000 | 0.0071473 | CC | 0.9319915 | 0.9600085 |
nsim | 1000.0000000 | NA | MI_LOGT | NA | NA |
thetamean | 0.5009231 | NA | MI_LOGT | NA | NA |
thetamedian | 0.4969223 | NA | MI_LOGT | NA | NA |
se2mean | 0.0182091 | NA | MI_LOGT | NA | NA |
se2median | 0.0172157 | NA | MI_LOGT | NA | NA |
bias | 0.0009231 | 0.0041744 | MI_LOGT | -0.0072586 | 0.0091048 |
empse | 0.1320064 | 0.0029532 | MI_LOGT | 0.1262182 | 0.1377947 |
mse | 0.0174091 | 0.0008813 | MI_LOGT | 0.0156818 | 0.0191364 |
relprec | 1.3104634 | 0.0393747 | MI_LOGT | 1.2332904 | 1.3876365 |
modelse | 0.1349413 | 0.0006046 | MI_LOGT | 0.1337563 | 0.1361263 |
relerror | 2.2232593 | 2.3317773 | MI_LOGT | -2.3469401 | 6.7934588 |
cover | 0.9490000 | 0.0069569 | MI_LOGT | 0.9353647 | 0.9626353 |
bccover | 0.9490000 | 0.0069569 | MI_LOGT | 0.9353647 | 0.9626353 |
power | 0.9690000 | 0.0054808 | MI_LOGT | 0.9582579 | 0.9797421 |
nsim | 1000.0000000 | NA | MI_T | NA | NA |
thetamean | 0.4988092 | NA | MI_T | NA | NA |
thetamedian | 0.4939111 | NA | MI_T | NA | NA |
se2mean | 0.0179117 | NA | MI_T | NA | NA |
se2median | 0.0169319 | NA | MI_T | NA | NA |
bias | -0.0011908 | 0.0042510 | MI_T | -0.0095226 | 0.0071409 |
empse | 0.1344277 | 0.0030074 | MI_T | 0.1285333 | 0.1403221 |
mse | 0.0180542 | 0.0009112 | MI_T | 0.0162682 | 0.0198401 |
relprec | 1.2636816 | 0.0384238 | MI_T | 1.1883724 | 1.3389909 |
modelse | 0.1338346 | 0.0005856 | MI_T | 0.1326867 | 0.1349824 |
relerror | -0.4412233 | 2.2689748 | MI_T | -4.8883321 | 4.0058856 |
cover | 0.9430000 | 0.0073315 | MI_T | 0.9286305 | 0.9573695 |
bccover | 0.9430000 | 0.0073315 | MI_T | 0.9286305 | 0.9573695 |
power | 0.9630000 | 0.0059692 | MI_T | 0.9513006 | 0.9746994 |
Using get_data() in combination with R packages such as xtable, kableExtra, and tables can yield a variety of tables that should suit most purposes.
More information on producing tables directly from R can be found in the CRAN Task View on Reproducible Research.
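For instance, a LaTeX table could be produced with xtable (a sketch, assuming the xtable package is installed):
library(xtable)
print(xtable(get_data(ss1)), include.rownames = FALSE)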
In this section, we show how to plot and compare summary statistics using the popular R package ggplot2.
Plotting bias by method with \(95\%\) confidence intervals based on Monte Carlo standard errors:
library(ggplot2)
ggplot(get_data(ss1, sstat = "bias"), aes(x = method, y = est, ymin = lower, ymax = upper)) +
geom_hline(yintercept = 0, color = "red", lty = "dashed") +
geom_point() +
geom_errorbar(width = 1 / 3) +
theme_bw() +
labs(x = "Method", y = "Bias")
Conversely, say we want to visually compare coverage for the three methods compared in this simulation study:
ggplot(get_data(ss1, sstat = "cover"), aes(x = method, y = est, ymin = lower, ymax = upper)) +
geom_hline(yintercept = 0.95, color = "red", lty = "dashed") +
geom_point() +
geom_errorbar(width = 1 / 3) +
coord_cartesian(ylim = c(0, 1)) +
theme_bw() +
labs(x = "Method", y = "Coverage")
rsimsum can automatically drop estimates and standard errors that are larger than a predefined value. Specifically, the arguments of simsum that control this behaviour are dropbig, max, and semax.
Set dropbig to TRUE, and standardised estimates larger than max in absolute value will be dropped; standard errors larger than semax times the average standard error will be dropped too.
For instance, say we want to drop standardised estimates larger than \(4\) in absolute value and standard errors larger than \(1.5\) times the average standard error:
s1.2 <- simsum(data = MIsim, estvarname = "b", true = 0.50, se = "se", methodvar = "method", ref = "CC", dropbig = TRUE, max = 4, semax = 1.5)
Were any estimates dropped?
dropbig(s1.2)
#> Dropped point estimates:
#> # A tibble: 2 x 4
#> dataset method b se
#> <dbl> <chr> <dbl> <dbl>
#> 1 761. MI_T -0.148 0.143
#> 2 761. MI_LOGT -0.108 0.159
#>
#> Dropped standard errors:
#> # A tibble: 8 x 4
#> dataset method b se
#> <dbl> <chr> <dbl> <dbl>
#> 1 19. MI_T 0.460 0.216
#> 2 159. MI_T 0.860 0.228
#> 3 211. MI_T 0.464 0.215
#> 4 256. CC 0.278 0.219
#> 5 417. CC 0.601 0.210
#> 6 417. MI_LOGT 0.653 0.223
#> 7 551. MI_LOGT 0.543 0.227
#> 8 585. MI_LOGT 0.737 0.207
Yes, they were!
Everything else works as before; for instance, to summarise the results:
summary(s1.2)
#>
#> Call:
#> simsum(data = MIsim, estvarname = "b", true = 0.5, se = "se",
#> methodvar = "method", ref = "CC", dropbig = TRUE, max = 4,
#> semax = 1.5)
#>
#> Method variable: method
#> Unique methods: CC, MI_LOGT, MI_T
#> Reference method: CC
#> By factors: none
#>
#> Summary statistics:
#>
#> Method = CC
#> Estimate MCSE Lower 95% Upper 95%
#> Simulations with non-missing estimates/SEs 998.0000 NA NA NA
#> Average point estimate 0.5169 NA NA NA
#> Median point estimate 0.5070 NA NA NA
#> Average standard error 0.0216 NA NA NA
#> Median standard error 0.0211 NA NA NA
#> Bias in point estimate 0.0169 0.0048 0.0076 0.0263
#> Empirical standard error 0.1511 0.0034 0.1444 0.1577
#> Mean squared error 0.0231 0.0011 0.0209 0.0253
#> % gain in precision relative to method CC 1.0000 0.0000 1.0000 1.0000
#> Model-based standard error 0.1469 0.0005 0.1459 0.1479
#> Relative % error in standard error -2.7304 2.2043 -7.0508 1.5900
#> Coverage of nominal 95% CI 0.9429 0.0073 0.9285 0.9573
#> Bias corrected coverage of nominal 95% CI 0.9399 0.0075 0.9251 0.9546
#> Power of 5% level test 0.9469 0.0071 0.9330 0.9608
#>
#> Method = MI_LOGT
#> Estimate MCSE Lower 95% Upper 95%
#> Simulations with non-missing estimates/SEs 996.0000 NA NA NA
#> Average point estimate 0.5011 NA NA NA
#> Median point estimate 0.4960 NA NA NA
#> Average standard error 0.0181 NA NA NA
#> Median standard error 0.0172 NA NA NA
#> Bias in point estimate 0.0011 0.0041 -0.0070 0.0092
#> Empirical standard error 0.1305 0.0029 0.1248 0.1363
#> Mean squared error 0.0170 0.0008 0.0154 0.0186
#> % gain in precision relative to method CC 1.3389 0.0404 1.2596 1.4181
#> Model-based standard error 0.1346 0.0006 0.1335 0.1357
#> Relative % error in standard error 3.0930 2.3522 -1.5173 7.7033
#> Coverage of nominal 95% CI 0.9498 0.0069 0.9362 0.9634
#> Bias corrected coverage of nominal 95% CI 0.9498 0.0069 0.9362 0.9634
#> Power of 5% level test 0.9699 0.0054 0.9593 0.9805
#>
#> Method = MI_T
#> Estimate MCSE Lower 95% Upper 95%
#> Simulations with non-missing estimates/SEs 996.0000 NA NA NA
#> Average point estimate 0.4992 NA NA NA
#> Median point estimate 0.4941 NA NA NA
#> Average standard error 0.0178 NA NA NA
#> Median standard error 0.0169 NA NA NA
#> Bias in point estimate -0.0008 0.0042 -0.0091 0.0074
#> Empirical standard error 0.1326 0.0030 0.1268 0.1384
#> Mean squared error 0.0176 0.0008 0.0160 0.0192
#> % gain in precision relative to method CC 1.2973 0.0398 1.2193 1.3752
#> Model-based standard error 0.1335 0.0006 0.1324 0.1346
#> Relative % error in standard error 0.6505 2.2942 -3.8460 5.1471
#> Coverage of nominal 95% CI 0.9438 0.0073 0.9295 0.9581
#> Bias corrected coverage of nominal 95% CI 0.9438 0.0073 0.9295 0.9581
#> Power of 5% level test 0.9639 0.0059 0.9523 0.9754
data("relhaz", package = "rsimsum")
Let’s have a look at the first 10 rows of the dataset:
head(relhaz, n = 10)
#> dataset n baseline theta se model
#> 1 1 50 Exponential -0.88006151 0.3330172 Cox
#> 2 2 50 Exponential -0.81460242 0.3253010 Cox
#> 3 3 50 Exponential -0.14262887 0.3050516 Cox
#> 4 4 50 Exponential -0.33251820 0.3144033 Cox
#> 5 5 50 Exponential -0.48269940 0.3064726 Cox
#> 6 6 50 Exponential -0.03160756 0.3097203 Cox
#> 7 7 50 Exponential -0.23578090 0.3121350 Cox
#> 8 8 50 Exponential -0.05046332 0.3136058 Cox
#> 9 9 50 Exponential -0.22378715 0.3066037 Cox
#> 10 10 50 Exponential -0.45326446 0.3330173 Cox
The included variables are:
str(relhaz)
#> 'data.frame': 12000 obs. of 6 variables:
#> $ dataset : int 1 2 3 4 5 6 7 8 9 10 ...
#> $ n : num 50 50 50 50 50 50 50 50 50 50 ...
#> $ baseline: chr "Exponential" "Exponential" "Exponential" "Exponential" ...
#> $ theta : num -0.88 -0.815 -0.143 -0.333 -0.483 ...
#> $ se : num 0.333 0.325 0.305 0.314 0.306 ...
#> $ model : chr "Cox" "Cox" "Cox" "Cox" ...
- dataset, simulated dataset number;
- n, sample size of the simulated dataset;
- baseline, baseline hazard function of the simulated dataset;
- model, method used (Cox model or Royston-Parmar model with 2 degrees of freedom);
- theta, point estimate for the log-hazard ratio;
- se, standard error of the point estimate.
rsimsum can summarise results from simulation studies with multiple data-generating mechanisms. For instance, with this example we show how to compute summary statistics by baseline hazard function and sample size.
In order to summarise results by data-generating factors, it is sufficient to define the "by" factors in the call to simsum:
s2 <- simsum(data = relhaz, estvarname = "theta", true = -0.50, se = "se", methodvar = "model", by = c("baseline", "n"))
#> `ref` was not specified, Cox set as the reference
s2
#>
#> Call:
#> simsum(data = relhaz, estvarname = "theta", true = -0.5, se = "se",
#> methodvar = "model", by = c("baseline", "n"))
#>
#> Method variable: model
#> Unique methods: Cox, Exp, RP(2)
#> Reference method: Cox
#>
#> By factors: baseline, n
#>
#> Monte Carlo standard errors were computed.
When summarising the results, summary statistics will be printed for each method and each combination of the data-generating factors:
ss2 <- summary(s2)
ss2
#>
#> Call:
#> simsum(data = relhaz, estvarname = "theta", true = -0.5, se = "se",
#> methodvar = "model", by = c("baseline", "n"))
#>
#> Method variable: model
#> Unique methods: Cox, Exp, RP(2)
#> Reference method: Cox
#> By factors: baseline, n
#>
#> Summary statistics:
#>
#> Method = Cox, baseline = Exponential, n = 250
#> Estimate MCSE Lower 95% Upper 95%
#> Simulations with non-missing estimates/SEs 1000.0000 NA NA NA
#> Average point estimate -0.5004 NA NA NA
#> Median point estimate -0.4969 NA NA NA
#> Average standard error 0.0194 NA NA NA
#> Median standard error 0.0194 NA NA NA
#> Bias in point estimate -0.0004 0.0043 -0.0088 0.0081
#> Empirical standard error 0.1363 0.0030 0.1303 0.1422
#> Mean squared error 0.0185 0.0009 0.0168 0.0203
#> % gain in precision relative to method Cox 1.0000 0.0000 1.0000 1.0000
#> Model-based standard error 0.1394 0.0001 0.1393 0.1396
#> Relative % error in standard error 2.3293 2.2894 -2.1578 6.8164
#> Coverage of nominal 95% CI 0.9510 0.0068 0.9376 0.9644
#> Bias corrected coverage of nominal 95% CI 0.9520 0.0068 0.9388 0.9652
#> Power of 5% level test 0.9570 0.0064 0.9444 0.9696
#>
#> Method = Exp, baseline = Exponential, n = 250
#> Estimate MCSE Lower 95% Upper 95%
#> Simulations with non-missing estimates/SEs 1000.0000 NA NA NA
#> Average point estimate -0.4996 NA NA NA
#> Median point estimate -0.5004 NA NA NA
#> Average standard error 0.0191 NA NA NA
#> Median standard error 0.0190 NA NA NA
#> Bias in point estimate 0.0004 0.0043 -0.0080 0.0088
#> Empirical standard error 0.1350 0.0030 0.1291 0.1409
#> Mean squared error 0.0182 0.0009 0.0165 0.0199
#> % gain in precision relative to method Cox 1.0192 0.0093 1.0009 1.0375
#> Model-based standard error 0.1381 0.0001 0.1379 0.1382
#> Relative % error in standard error 2.2813 2.2882 -2.2035 6.7661
#> Coverage of nominal 95% CI 0.9540 0.0066 0.9410 0.9670
#> Bias corrected coverage of nominal 95% CI 0.9540 0.0066 0.9410 0.9670
#> Power of 5% level test 0.9570 0.0064 0.9444 0.9696
#>
#> Method = RP(2), baseline = Exponential, n = 250
#> Estimate MCSE Lower 95% Upper 95%
#> Simulations with non-missing estimates/SEs 1000.0000 NA NA NA
#> Average point estimate -0.5015 NA NA NA
#> Median point estimate -0.5000 NA NA NA
#> Average standard error 0.0194 NA NA NA
#> Median standard error 0.0193 NA NA NA
#> Bias in point estimate -0.0015 0.0043 -0.0100 0.0069
#> Empirical standard error 0.1363 0.0030 0.1303 0.1423
#> Mean squared error 0.0186 0.0009 0.0168 0.0203
#> % gain in precision relative to method Cox 0.9996 0.0034 0.9929 1.0063
#> Model-based standard error 0.1392 0.0001 0.1391 0.1394
#> Relative % error in standard error 2.1570 2.2855 -2.3225 6.6365
#> Coverage of nominal 95% CI 0.9520 0.0068 0.9388 0.9652
#> Bias corrected coverage of nominal 95% CI 0.9520 0.0068 0.9388 0.9652
#> Power of 5% level test 0.9590 0.0063 0.9467 0.9713
#>
#> Method = Cox, baseline = Weibull, n = 250
#> Estimate MCSE Lower 95% Upper 95%
#> Simulations with non-missing estimates/SEs 1000.0000 NA NA NA
#> Average point estimate -0.5057 NA NA NA
#> Median point estimate -0.5031 NA NA NA
#> Average standard error 0.0173 NA NA NA
#> Median standard error 0.0172 NA NA NA
#> Bias in point estimate -0.0057 0.0040 -0.0136 0.0022
#> Empirical standard error 0.1274 0.0029 0.1219 0.1330
#> Mean squared error 0.0163 0.0008 0.0147 0.0178
#> % gain in precision relative to method Cox 1.0000 0.0000 1.0000 1.0000
#> Model-based standard error 0.1316 0.0001 0.1315 0.1317
#> Relative % error in standard error 3.2717 2.3104 -1.2566 7.8001
#> Coverage of nominal 95% CI 0.9440 0.0073 0.9297 0.9583
#> Bias corrected coverage of nominal 95% CI 0.9500 0.0069 0.9365 0.9635
#> Power of 5% level test 0.9740 0.0050 0.9641 0.9839
#>
#> Method = Exp, baseline = Weibull, n = 250
#> Estimate MCSE Lower 95% Upper 95%
#> Simulations with non-missing estimates/SEs 1000.0000 NA NA NA
#> Average point estimate -0.3522 NA NA NA
#> Median point estimate -0.3500 NA NA NA
#> Average standard error 0.0164 NA NA NA
#> Median standard error 0.0163 NA NA NA
#> Bias in point estimate 0.1478 0.0028 0.1423 0.1533
#> Empirical standard error 0.0886 0.0020 0.0847 0.0925
#> Mean squared error 0.0297 0.0009 0.0279 0.0314
#> % gain in precision relative to method Cox 2.0701 0.0381 1.9954 2.1448
#> Model-based standard error 0.1280 0.0000 0.1279 0.1280
#> Relative % error in standard error 44.4489 3.2309 38.1165 50.7813
#> Coverage of nominal 95% CI 0.8900 0.0099 0.8706 0.9094
#> Bias corrected coverage of nominal 95% CI 0.9910 0.0030 0.9851 0.9969
#> Power of 5% level test 0.8870 0.0100 0.8674 0.9066
#>
#> Method = RP(2), baseline = Weibull, n = 250
#> Estimate MCSE Lower 95% Upper 95%
#> Simulations with non-missing estimates/SEs 1000.0000 NA NA NA
#> Average point estimate -0.5072 NA NA NA
#> Median point estimate -0.5055 NA NA NA
#> Average standard error 0.0171 NA NA NA
#> Median standard error 0.0171 NA NA NA
#> Bias in point estimate -0.0072 0.0040 -0.0150 0.0007
#> Empirical standard error 0.1267 0.0028 0.1212 0.1323
#> Mean squared error 0.0161 0.0008 0.0146 0.0176
#> % gain in precision relative to method Cox 1.0116 0.0067 0.9984 1.0248
#> Model-based standard error 0.1309 0.0001 0.1308 0.1310
#> Relative % error in standard error 3.3058 2.3110 -1.2238 7.8353
#> Coverage of nominal 95% CI 0.9470 0.0071 0.9331 0.9609
#> Bias corrected coverage of nominal 95% CI 0.9500 0.0069 0.9365 0.9635
#> Power of 5% level test 0.9740 0.0050 0.9641 0.9839
#>
#> Method = Cox, baseline = Exponential, n = 50
#> Estimate MCSE Lower 95% Upper 95%
#> Simulations with non-missing estimates/SEs 1000.0000 NA NA NA
#> Average point estimate -0.5255 NA NA NA
#> Median point estimate -0.5226 NA NA NA
#> Average standard error 0.1016 NA NA NA
#> Median standard error 0.1003 NA NA NA
#> Bias in point estimate -0.0255 0.0104 -0.0458 -0.0052
#> Empirical standard error 0.3277 0.0073 0.3133 0.3421
#> Mean squared error 0.1079 0.0047 0.0988 0.1170
#> % gain in precision relative to method Cox 1.0000 0.0000 1.0000 1.0000
#> Model-based standard error 0.3188 0.0004 0.3180 0.3197
#> Relative % error in standard error -2.7072 2.1801 -6.9802 1.5658
#> Coverage of nominal 95% CI 0.9410 0.0075 0.9264 0.9556
#> Bias corrected coverage of nominal 95% CI 0.9400 0.0075 0.9253 0.9547
#> Power of 5% level test 0.3740 0.0153 0.3440 0.4040
#>
#> Method = Exp, baseline = Exponential, n = 50
#> Estimate MCSE Lower 95% Upper 95%
#> Simulations with non-missing estimates/SEs 1000.0000 NA NA NA
#> Average point estimate -0.5223 NA NA NA
#> Median point estimate -0.5181 NA NA NA
#> Average standard error 0.0977 NA NA NA
#> Median standard error 0.0970 NA NA NA
#> Bias in point estimate -0.0223 0.0101 -0.0422 -0.0024
#> Empirical standard error 0.3208 0.0072 0.3068 0.3349
#> Mean squared error 0.1033 0.0045 0.0945 0.1122
#> % gain in precision relative to method Cox 1.0431 0.0126 1.0183 1.0679
#> Model-based standard error 0.3126 0.0004 0.3118 0.3133
#> Relative % error in standard error -2.5836 2.1821 -6.8603 1.6932
#> Coverage of nominal 95% CI 0.9370 0.0077 0.9219 0.9521
#> Bias corrected coverage of nominal 95% CI 0.9390 0.0076 0.9242 0.9538
#> Power of 5% level test 0.3850 0.0154 0.3548 0.4152
#>
#> Method = RP(2), baseline = Exponential, n = 50
#> Estimate MCSE Lower 95% Upper 95%
#> Simulations with non-missing estimates/SEs 1000.0000 NA NA NA
#> Average point estimate -0.5337 NA NA NA
#> Median point estimate -0.5269 NA NA NA
#> Average standard error 0.1005 NA NA NA
#> Median standard error 0.0993 NA NA NA
#> Bias in point estimate -0.0337 0.0105 -0.0543 -0.0131
#> Empirical standard error 0.3323 0.0074 0.3177 0.3468
#> Mean squared error 0.1114 0.0049 0.1018 0.1210
#> % gain in precision relative to method Cox 0.9726 0.0066 0.9597 0.9855
#> Model-based standard error 0.3171 0.0004 0.3162 0.3179
#> Relative % error in standard error -4.5711 2.1383 -8.7620 -0.3802
#> Coverage of nominal 95% CI 0.9370 0.0077 0.9219 0.9521
#> Bias corrected coverage of nominal 95% CI 0.9390 0.0076 0.9242 0.9538
#> Power of 5% level test 0.3920 0.0154 0.3617 0.4223
#>
#> Method = Cox, baseline = Weibull, n = 50
#> Estimate MCSE Lower 95% Upper 95%
#> Simulations with non-missing estimates/SEs 1000.0000 NA NA NA
#> Average point estimate -0.5123 NA NA NA
#> Median point estimate -0.5133 NA NA NA
#> Average standard error 0.0926 NA NA NA
#> Median standard error 0.0900 NA NA NA
#> Bias in point estimate -0.0123 0.0100 -0.0318 0.0073
#> Empirical standard error 0.3158 0.0071 0.3020 0.3297
#> Mean squared error 0.0998 0.0049 0.0903 0.1094
#> % gain in precision relative to method Cox 1.0000 0.0000 1.0000 1.0000
#> Model-based standard error 0.3043 0.0005 0.3035 0.3052
#> Relative % error in standard error -3.6377 2.1601 -7.8714 0.5960
#> Coverage of nominal 95% CI 0.9450 0.0072 0.9309 0.9591
#> Bias corrected coverage of nominal 95% CI 0.9440 0.0073 0.9297 0.9583
#> Power of 5% level test 0.3820 0.0154 0.3519 0.4121
#>
#> Method = Exp, baseline = Weibull, n = 50
#> Estimate MCSE Lower 95% Upper 95%
#> Simulations with non-missing estimates/SEs 1000.0000 NA NA NA
#> Average point estimate -0.3492 NA NA NA
#> Median point estimate -0.3506 NA NA NA
#> Average standard error 0.0834 NA NA NA
#> Median standard error 0.0825 NA NA NA
#> Bias in point estimate 0.1508 0.0066 0.1379 0.1636
#> Empirical standard error 0.2073 0.0046 0.1982 0.2163
#> Mean squared error 0.0656 0.0028 0.0602 0.0711
#> % gain in precision relative to method Cox 2.3224 0.0503 2.2238 2.4209
#> Model-based standard error 0.2888 0.0002 0.2884 0.2891
#> Relative % error in standard error 39.3334 3.1175 33.2231 45.4436
#> Coverage of nominal 95% CI 0.9740 0.0050 0.9641 0.9839
#> Bias corrected coverage of nominal 95% CI 0.9950 0.0022 0.9906 0.9994
#> Power of 5% level test 0.1450 0.0111 0.1232 0.1668
#>
#> Method = RP(2), baseline = Weibull, n = 50
#> Estimate MCSE Lower 95% Upper 95%
#> Simulations with non-missing estimates/SEs 1000.0000 NA NA NA
#> Average point estimate -0.5187 NA NA NA
#> Median point estimate -0.5120 NA NA NA
#> Average standard error 0.0894 NA NA NA
#> Median standard error 0.0873 NA NA NA
#> Bias in point estimate -0.0187 0.0100 -0.0382 0.0009
#> Empirical standard error 0.3155 0.0071 0.3017 0.3294
#> Mean squared error 0.0998 0.0047 0.0905 0.1091
#> % gain in precision relative to method Cox 1.0020 0.0110 0.9804 1.0237
#> Model-based standard error 0.2989 0.0004 0.2982 0.2997
#> Relative % error in standard error -5.2531 2.1230 -9.4140 -1.0922
#> Coverage of nominal 95% CI 0.9380 0.0076 0.9231 0.9529
#> Bias corrected coverage of nominal 95% CI 0.9430 0.0073 0.9286 0.9574
#> Power of 5% level test 0.4140 0.0156 0.3835 0.4445
Tables could get cumbersome when there are many different data-generating mechanisms. Plots are generally easier to interpret, and can be generated as easily as before.
Say we want to compare bias for each method by baseline hazard function and sample size using faceting:
ggplot(get_data(ss2, sstat = "bias"), aes(x = model, y = est, ymin = lower, ymax = upper)) +
geom_hline(yintercept = 0, color = "red", lty = "dashed") +
geom_point() +
geom_errorbar(width = 1 / 3) +
facet_grid(baseline ~ n) +
theme_bw() +
labs(x = "Method", y = "Bias")