EpiNow2: Estimate real-time case counts and time-varying epidemiological parameters

R-CMD-check Codecov test coverage DOI

This package estimates the time-varying reproduction number, rate of spread, and doubling time using a range of open-source tools (Abbott et al.), and current best practices (Gostic et al.). It aims to help users avoid some of the limitations of naive implementations in a framework that is informed by community feedback and is under active development.

It estimates the time-varying reproduction number on cases by date of infection (using a similar approach to that implemented in the {EpiEstim}). Imputed infections are then mapped to observed data (for example cases by date of report) via a series of uncertain delay distributions (in the examples in the package documentation these are an incubation period and a reporting delay) and a reporting model that can include weekly periodicity.

Uncertainty is propagated from all inputs into the final parameter estimates, helping to mitigate spurious findings. This is handled internally. The time-varying reproduction estimates and the uncertain generation time also give time-varying estimates of the rate of growth.

The default model uses a non-stationary Gaussian process to estimate the time-varying reproduction number. Other options include:

The documentation for estimate_infections provides examples of the different options available.

Forecasting is also supported for the time-varying reproduction number, infections and reported cases. The time-varying reproduction number can be forecast forwards in time using an integration with the {EpiSoon} package, and converted to a case forecast using the renewal equation. Alternatively, the time-varying reproduction number and cases can be forecast using a Gaussian process.

As a standalone tool, non-parametric back-calculation is also supported. This uses a novel formulation based on a smoothed mean delay shift of reported cases combined with a Gaussian process to determine the most likely outbreak trajectory. Again see the documentation for estimate_infections for an example.

Installation

Install the stable version of the package:

install.packages("EpiNow2")

Install the stable development version of the package using {drat}:

install.packages("drat")
drat:::add("epiforecasts")
install.packages("EpiNow2")

Install the unstable development version of the package with:

remotes::install_github("epiforecasts/EpiNow2")

Windows users will need a working installation of Rtools in order to build the package from source. See here for a guide to installing Rtools for use with Stan (which is the statistical modeling platform used for the underlying model). For simple deployment/development a prebuilt docker image is also available (see documentation here).

Quick start

{EpiNow2} is designed to be used with a single function call or to be used in an ad-hoc fashion via individual function calls. In the following section we give an overview of the simple use case. For more on using each function see the function documentation. The core functions are: epinow, regional_epinow, estimate_infections, and forecast_infections. estimate_infections can be use on its own to infer the underlying infection case curve from reported cases with Rt optionally returned (on by default). Estimating the underlying infection case curve alone is substantially less computationally demanding than also estimating Rt.

Reporting delays, incubation period and generation time

Distributions can either be fitted using package functionality or determined elsewhere and then defined with uncertainty for use in {EpiNow2}. When data is supplied a subsampled bootstrapped lognormal will be fit (to account for uncertainty in the observed data without being biased by changes in incidence). An arbitrary number of delay distributions are supported with the most common use case likely to be a incubation period followed by a reporting delay.

reporting_delay <- EpiNow2::bootstrapped_dist_fit(rlnorm(100, log(6), 1))
## Set max allowed delay to 30 days to truncate computation
reporting_delay$max <- 30

reporting_delay
#> $mean
#> [1] 1.680835
#> 
#> $mean_sd
#> [1] 0.1177547
#> 
#> $sd
#> [1] 1.036805
#> 
#> $sd_sd
#> [1] 0.113312
#> 
#> $max
#> [1] 30

Here we define the incubation period and generation time based on literature estimates for Covid-19 (see here for the code that generates these estimates).

generation_time <- list(mean = EpiNow2::covid_generation_times[1, ]$mean,
                        mean_sd = EpiNow2::covid_generation_times[1, ]$mean_sd,
                         sd = EpiNow2::covid_generation_times[1, ]$sd,
                         sd_sd = EpiNow2::covid_generation_times[1, ]$sd_sd,
                         max = 30)

incubation_period <- list(mean = EpiNow2::covid_incubation_period[1, ]$mean,
                          mean_sd = EpiNow2::covid_incubation_period[1, ]$mean_sd,
                          sd = EpiNow2::covid_incubation_period[1, ]$sd,
                          sd_sd = EpiNow2::covid_incubation_period[1, ]$sd_sd,
                          max = 30)

epinow

This function represents the core functionality of the package and includes results reporting, plotting and optional saving. It requires a data frame of cases by date of report and the distributions defined above. An additional forecasting module is supported via EpiSoon and companion packages (see documentation for an example).

Load example case data from {EpiNow2}.

reported_cases <- EpiNow2::example_confirmed[1:50]

head(reported_cases)
#>          date confirm
#> 1: 2020-02-22      14
#> 2: 2020-02-23      62
#> 3: 2020-02-24      53
#> 4: 2020-02-25      97
#> 5: 2020-02-26      93
#> 6: 2020-02-27      78

Estimate cases by date of infection, the time-varying reproduction number, the rate of growth and forecast these estimates into the future by 7 days. Summarise the posterior and return a summary table and plots for reporting purposes. If a target_folder is supplied results can be internally saved (with the option to also turn off explicit returning of results). Note that for real use cases more samples and a longer warm up may be needed.

estimates <- EpiNow2::epinow(reported_cases = reported_cases, 
                             generation_time = generation_time,
                             delays = list(incubation_period, reporting_delay),
                             horizon = 7, samples = 1000, warmup = 200, 
                             cores = 4, chains = 4, verbose = TRUE, 
                             adapt_delta = 0.95)
#> DEBUG [2020-08-26 15:22:46] Running for 1000 samples (across 4 chains each with a warm up of 200 iterations each) and 71 time steps of which 7 are a forecast

names(estimates)
#> [1] "estimates"                "estimated_reported_cases"
#> [3] "summary"                  "plots"

Both summary measures and posterior samples are returned for all parameters in an easily explored format.

estimates$estimates
#> $samples
#>                 variable        parameter time       date sample       value
#>      1:       infections       infections    1 2020-02-08      1    2.206644
#>      2:       infections       infections    2 2020-02-09      1    9.431478
#>      3:       infections       infections    3 2020-02-10      1   21.395270
#>      4:       infections       infections    4 2020-02-11      1   30.846361
#>      5:       infections       infections    5 2020-02-12      1   46.237732
#>     ---                                                                     
#> 255067: prior_infections prior_infections   67 2020-04-14      1 2647.130393
#> 255068: prior_infections prior_infections   68 2020-04-15      1 2581.665553
#> 255069: prior_infections prior_infections   69 2020-04-16      1 2517.819692
#> 255070: prior_infections prior_infections   70 2020-04-17      1 2455.552770
#> 255071: prior_infections prior_infections   71 2020-04-18      1 2394.825740
#>         strat     type
#>      1:  <NA> estimate
#>      2:  <NA> estimate
#>      3:  <NA> estimate
#>      4:  <NA> estimate
#>      5:  <NA> estimate
#>     ---               
#> 255067:  <NA> forecast
#> 255068:  <NA> forecast
#> 255069:  <NA> forecast
#> 255070:  <NA> forecast
#> 255071:  <NA> forecast
#> 
#> $summarised
#>            date       variable strat     type     bottom         top
#>   1: 2020-02-22              R  <NA> estimate   0.906830    1.568452
#>   2: 2020-02-23              R  <NA> estimate   1.017340    1.560870
#>   3: 2020-02-24              R  <NA> estimate   1.101175    1.552501
#>   4: 2020-02-25              R  <NA> estimate   1.196505    1.554018
#>   5: 2020-02-26              R  <NA> estimate   1.316793    1.607314
#>  ---                                                                
#> 322: 2020-04-14 reported_cases  <NA> forecast 695.000000 6529.000000
#> 323: 2020-04-15 reported_cases  <NA> forecast 419.000000 5621.000000
#> 324: 2020-04-16 reported_cases  <NA> forecast 364.000000 6466.000000
#> 325: 2020-04-17 reported_cases  <NA> forecast 397.000000 8587.000000
#> 326: 2020-04-18 reported_cases  <NA> forecast 289.000000 7005.000000
#>            lower       upper      median        mean           sd
#>   1:    1.044737    1.309596    1.223438    1.222365 2.022448e-01
#>   2:    1.139059    1.368749    1.266639    1.270122 1.696516e-01
#>   3:    1.230104    1.417863    1.323299    1.325053 1.385959e-01
#>   4:    1.312648    1.467412    1.381972    1.385947 1.109667e-01
#>   5:    1.402101    1.527912    1.448480    1.451127 9.032870e-02
#>  ---                                                             
#> 322: 1254.000000 3183.000000 2772.000000 3386.243000 2.378292e+03
#> 323:  903.000000 2562.000000 2342.000000 3033.583000 2.574506e+03
#> 324:  924.000000 2942.000000 2685.500000 3657.875000 4.357484e+03
#> 325: 1191.000000 3496.000000 3096.000000 5197.417000 1.469759e+04
#> 326:  854.000000 2691.000000 2355.000000 5417.498000 3.578082e+04

Reported cases are returned separately in order to ease reporting of forecasts and model evaluation.

estimates$estimated_reported_cases
#> $samples
#>              date sample cases  type
#>     1: 2020-02-22      1    93 gp_rt
#>     2: 2020-02-23      1   170 gp_rt
#>     3: 2020-02-24      1   119 gp_rt
#>     4: 2020-02-25      1   200 gp_rt
#>     5: 2020-02-26      1    78 gp_rt
#>    ---                              
#> 56996: 2020-04-14   1000  1464 gp_rt
#> 56997: 2020-04-15   1000  1581 gp_rt
#> 56998: 2020-04-16   1000  1439 gp_rt
#> 56999: 2020-04-17   1000  1499 gp_rt
#> 57000: 2020-04-18   1000  2037 gp_rt
#> 
#> $summarised
#>           date  type bottom   top lower upper median     mean          sd
#>  1: 2020-02-22 gp_rt     31   175    48   108   96.5  104.307    50.51481
#>  2: 2020-02-23 gp_rt     43   243    83   163  139.5  148.957    69.30538
#>  3: 2020-02-24 gp_rt     46   269    96   187  159.0  168.320    74.53631
#>  4: 2020-02-25 gp_rt     56   303    91   186  161.0  175.871    84.52076
#>  5: 2020-02-26 gp_rt     60   297    85   175  156.5  169.978    76.10236
#>  6: 2020-02-27 gp_rt     58   397   123   247  215.0  237.670   119.01744
#>  7: 2020-02-28 gp_rt     74   524   161   332  287.5  320.237   158.99453
#>  8: 2020-02-29 gp_rt     91   504   155   313  272.0  298.097   139.30512
#>  9: 2020-03-01 gp_rt    112   652   186   399  350.0  381.556   183.33621
#> 10: 2020-03-02 gp_rt    102   697   235   473  398.0  426.200   210.03135
#> 11: 2020-03-03 gp_rt    119   745   244   476  393.0  431.998   219.71267
#> 12: 2020-03-04 gp_rt     98   708   214   447  389.0  428.876   212.76694
#> 13: 2020-03-05 gp_rt    190  1019   302   633  567.0  615.846   294.66908
#> 14: 2020-03-06 gp_rt    259  1528   403   885  784.5  879.237   444.90818
#> 15: 2020-03-07 gp_rt    261  1474   404   846  765.0  846.184   425.04547
#> 16: 2020-03-08 gp_rt    311  1963   633  1268 1070.5 1167.124   609.49882
#> 17: 2020-03-09 gp_rt    352  2301   753  1511 1252.5 1354.985   656.34476
#> 18: 2020-03-10 gp_rt    406  2451   796  1609 1322.5 1427.365   694.09473
#> 19: 2020-03-11 gp_rt    300  2380   726  1502 1323.0 1463.005   759.85016
#> 20: 2020-03-12 gp_rt    565  3445  1198  2362 1911.0 2086.378  1040.60247
#> 21: 2020-03-13 gp_rt    924  5119  1399  2979 2636.0 2919.580  1477.38699
#> 22: 2020-03-14 gp_rt    806  4664  1450  2936 2463.0 2705.797  1303.82767
#> 23: 2020-03-15 gp_rt    831  5917  2178  4188 3215.5 3537.972  1715.04663
#> 24: 2020-03-16 gp_rt    897  6518  2137  4256 3554.0 3883.479  1899.95173
#> 25: 2020-03-17 gp_rt   1176  6454  1931  3889 3351.5 3703.498  1747.03212
#> 26: 2020-03-18 gp_rt   1209  6201  1929  3789 3153.0 3491.560  1735.67271
#> 27: 2020-03-19 gp_rt   1262  7499  2709  5265 4191.0 4512.308  2115.62561
#> 28: 2020-03-20 gp_rt   1370  9582  3310  6452 5533.0 5950.403  2785.43400
#> 29: 2020-03-21 gp_rt   1392  8500  2949  5727 4694.5 5008.814  2342.54236
#> 30: 2020-03-22 gp_rt   1689 10023  3075  6316 5376.5 5891.644  2845.84631
#> 31: 2020-03-23 gp_rt   1820 10445  3198  6456 5507.0 6033.768  2923.72504
#> 32: 2020-03-24 gp_rt   1773  8987  3287  6124 5014.5 5363.948  2520.06870
#> 33: 2020-03-25 gp_rt   1332  7680  2535  4977 4247.0 4606.321  2182.78039
#> 34: 2020-03-26 gp_rt   1830  9924  3106  6214 5407.0 5803.149  2741.00916
#> 35: 2020-03-27 gp_rt   2308 11236  3566  7045 6161.0 6619.664  3299.92875
#> 36: 2020-03-28 gp_rt   1623  9285  3015  5923 5047.0 5427.988  2553.89665
#> 37: 2020-03-29 gp_rt   1677 10351  3857  7317 5761.5 6183.392  3004.75043
#> 38: 2020-03-30 gp_rt   1426  9718  3435  6368 5267.0 5739.116  2743.96398
#> 39: 2020-03-31 gp_rt   1328  8490  2644  5417 4719.5 5104.631  2417.34093
#> 40: 2020-04-01 gp_rt   1069  6956  2727  5083 4007.0 4277.459  1925.56661
#> 41: 2020-04-02 gp_rt   1260  8153  2611  5088 4376.0 4823.408  2350.00527
#> 42: 2020-04-03 gp_rt   1493 10095  2620  5668 5183.0 5747.842  3014.38984
#> 43: 2020-04-04 gp_rt   1205  7391  2393  4770 4106.5 4460.568  2107.54760
#> 44: 2020-04-05 gp_rt   1366  8411  2658  5458 4610.0 4964.977  2407.32976
#> 45: 2020-04-06 gp_rt   1229  7727  2552  4974 4203.0 4636.208  2238.66048
#> 46: 2020-04-07 gp_rt   1153  6839  2220  4397 3573.5 3992.999  2067.80000
#> 47: 2020-04-08 gp_rt   1208  5882  1661  3414 3025.0 3343.165  1639.06683
#> 48: 2020-04-09 gp_rt    789  6664  2286  4486 3515.5 3924.625  2061.85895
#> 49: 2020-04-10 gp_rt   1279  7899  2743  5200 4164.5 4597.328  2373.24618
#> 50: 2020-04-11 gp_rt    761  5915  2123  4139 3261.5 3613.558  1814.91540
#> 51: 2020-04-12 gp_rt    896  7087  2086  4609 3600.5 4110.504  2342.33555
#> 52: 2020-04-13 gp_rt    825  7385  1686  3904 3265.5 3911.608  2376.82439
#> 53: 2020-04-14 gp_rt    695  6529  1254  3183 2772.0 3386.243  2378.29173
#> 54: 2020-04-15 gp_rt    419  5621   903  2562 2342.0 3033.583  2574.50603
#> 55: 2020-04-16 gp_rt    364  6466   924  2942 2685.5 3657.875  4357.48423
#> 56: 2020-04-17 gp_rt    397  8587  1191  3496 3096.0 5197.417 14697.58863
#> 57: 2020-04-18 gp_rt    289  7005   854  2691 2355.0 5417.498 35780.82391
#>           date  type bottom   top lower upper median     mean          sd

A summary table is returned for rapidly understanding the results and for reporting purposes.

estimates$summary
#>                                  measure              estimate
#> 1: New confirmed cases by infection date     2180 (85 -- 7609)
#> 2:        Expected change in daily cases                Unsure
#> 3:            Effective reproduction no.    0.8 (0.31 -- 1.49)
#> 4:                        Rate of growth -0.06 (-0.22 -- 0.12)
#> 5:          Doubling/halving time (days)   -12.1 (5.6 -- -3.2)
#>     numeric_estimate
#> 1: <data.table[1x5]>
#> 2:              0.69
#> 3: <data.table[1x5]>
#> 4: <data.table[1x5]>
#> 5: <data.table[1x3]>

A range of plots are returned (with the single summary plot shown below).

estimates$plots$summary

Regional epinow

This function runs the the epinow function across multiple regions in an efficient manner.

Define cases in multiple regions delineated by the region variable.

reported_cases <- data.table::rbindlist(list(
   data.table::copy(reported_cases)[, region := "testland"],
   reported_cases[, region := "realland"]))

head(reported_cases)
#>          date confirm   region
#> 1: 2020-02-22      14 testland
#> 2: 2020-02-23      62 testland
#> 3: 2020-02-24      53 testland
#> 4: 2020-02-25      97 testland
#> 5: 2020-02-26      93 testland
#> 6: 2020-02-27      78 testland

Run the pipeline on each region in turn. The commented code (requires the {future} package) can be used to run regions in parallel (when in most scenarios cores should be set to 1).

## future::plan("multisession")
estimates <- EpiNow2::regional_epinow(reported_cases = reported_cases, 
                                      generation_time = generation_time,
                                      delays = list(incubation_period, reporting_delay),
                                      horizon = 7, samples = 1000, warmup = 200,
                                      cores = 4, chains = 4, adapt_delta = 0.95,
                                      verbose = TRUE)
#> INFO [2020-08-26 15:25:06] Reporting estimates using data up to: 2020-04-11
#> INFO [2020-08-26 15:25:06] Producing estimates for: testland, realland
#> INFO [2020-08-26 15:25:06] Initialising estimates for: testland
#> DEBUG [2020-08-26 15:25:06] Running for 1000 samples (across 4 chains each with a warm up of 200 iterations each) and 71 time steps of which 7 are a forecast
#> INFO [2020-08-26 15:27:15] Completed estimates for: testland
#> INFO [2020-08-26 15:27:15] Initialising estimates for: realland
#> DEBUG [2020-08-26 15:27:15] Running for 1000 samples (across 4 chains each with a warm up of 200 iterations each) and 71 time steps of which 7 are a forecast
#> INFO [2020-08-26 15:29:58] Completed estimates for: realland

Results from each region are stored in a regional list with across region summary measures and plots stored in a summary list. All results can be set to be internally saved by setting the target_folder and summary_dir arguments.

Summary measures that are returned include a table formatted for reporting (along with raw results for further processing).

estimates$summary$summarised_results$table
#>      Region New confirmed cases by infection date
#> 1: realland                    2236 (125 -- 7642)
#> 2: testland                     2066 (37 -- 7156)
#>    Expected change in daily cases Effective reproduction no.
#> 1:                         Unsure         0.81 (0.23 -- 1.4)
#> 2:                         Unsure        0.78 (0.18 -- 1.39)
#>           Rate of growth Doubling/halving time (days)
#> 1:  -0.05 (-0.2 -- 0.14)            -13.2 (5 -- -3.4)
#> 2: -0.06 (-0.25 -- 0.12)          -11.2 (5.8 -- -2.8)

A range of plots are again returned (with the single summary plot shown below).

estimates$summary$summary_plot

Reporting templates

Rmarkdown templates are provided in the package (templates) for semi-automated reporting of estimates. These are currently undocumented but an example integration can be seen here. If using these templates to report your results please highlight our limitations as these are key to understanding the results from {EpiNow2} .

Contributing

File an issue here if you have identified an issue with the package. Please note that due to operational constraints priority will be given to users informing government policy or offering methodological insights. We welcome all contributions, in particular those that improve the approach or the robustness of the code base.