Estimating incidence from cross-sectional data

29 April 2017


This vignette covers the use of the functions incprops, inccounts, and prevcounts. It is strongly recommended that the vignette “Introduction” be read before any use of this package.

The two primary functions for HIV incidence estimation are incprops and inccounts. They take as arguments a summary of arbitrarily complex survey data sets capturing HIV prevalence and the prevalence of recent HIV infection among HIV-positive subjects. They return estimates of incidence and, if multiple cross-sectional surveys are specified, incidence differences (point estimates, confidence intervals, p-values, and subsidiary output). The principal reference for the methodology underlying this implementation is Kassanjee et al., Epidemiology, 2012 [1]. Further guidance is provided in Kassanjee, McWalter and Welte, AIDS Research and Human Retroviruses, 2014 [2], and some hitherto unpublished technical details are in the appendix of the vignette “Introduction”.

Analytical Paradigm

A fundamental element in the design of inctools is that the primary entry point into the core methodology it implements is the function incprops. This function takes, as a summary of the population state, estimates of HIV prevalence and of the prevalence of recent infection amongst HIV-positive subjects (including their variances and covariance). These estimates, in turn, would usually be best derived by (potentially complex) preliminary analysis of the raw survey data documenting individual subjects’ status ascertainment, cluster and stratum membership, weighting, etc.

The derivation of these prevalence estimates from raw data is, in principle, facilitated by various algorithms which are implemented in other packages and are essentially independent of any of the innovation captured in this package. Use of inctools does not imply any specific approach to the preliminary analysis, but the widely used package survey (maintained entirely independently, with no link to inctools) may be suitable for many typical data sets. Additionally, to facilitate a ‘naive’ analysis that is self-contained within the package, the ancillary function prevcounts is provided. It takes survey counts and produces prevalence estimates (for both HIV and recent infection, including variance) under the simplifying assumption of individual-level random selection of subjects from a single population group.

Using functions incprops and inccounts

The functions incprops and inccounts provide a near-identical interface, as further detailed in the help pages. Both functions take considerably pre-processed data specifying a recent infection test and a survey in which it is used:

* test characteristics: estimates of the false recency rate (FRR, \(\beta\)) and the mean duration of recent infection (MDRI, \(\Omega_T\)), their respective relative standard errors, and the recency time cutoff (T);
* survey data: proportions (counts, if using function inccounts) of HIV positives (PrevH) and of positives for recency (PrevR), and their relative standard errors.

A critical distinction is that with incprops the variances of the prevalences, including their covariance, are supplied explicitly, whereas with inccounts the variances are derived from the counts and design effects, and the covariance is taken to be zero.

The output for a single survey is an estimate of incidence with its confidence interval and relative standard error (RSE), the estimated annual risk of infection (ARI) with its confidence interval, and confidence intervals for the parameters MDRI and FRR, which are deduced from the input parameters.
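As a rough illustration of how the MDRI and FRR intervals can be deduced from a point estimate and its relative standard error, the base R sketch below uses a symmetric normal approximation. The helper name param_ci and the input values (MDRI of 200 days with RSE 5%, FRR of 1% with RSE 20%) are assumptions made for this example; they were chosen because they are consistent with the intervals printed in the single-survey output in this vignette. This is not the package's internal code.

```r
# Hypothetical helper (not part of inctools): normal-approximation
# confidence interval from a point estimate and its relative standard error.
param_ci <- function(estimate, rse, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)           # about 1.96 for a 95% interval
  half_width <- z * estimate * rse    # the standard error is estimate * rse
  c(CI.low = estimate - half_width, CI.up = estimate + half_width)
}

# Assumed inputs: MDRI = 200 days (RSE 5%), FRR = 0.01 (RSE 20%)
mdri_ci <- param_ci(estimate = 200, rse = 0.05)
frr_ci  <- param_ci(estimate = 0.01, rse = 0.20)

print(round(mdri_ci, 1))
print(round(frr_ci, 4))
```

Under these assumptions the rounded intervals agree with the MDRI and FRR rows of the single-survey output (180.4 to 219.6 days, and 0.0061 to 0.0139).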

The output for multiple surveys is the same as for a single survey, together with pairwise comparisons of incidence: differences, their confidence intervals and RSEs, and p-values for tests of equality.


Consider a single cross sectional survey summarised by:

and proposed to be processed by 10,000 bootstrap iterations. Function incprops will calculate:

## $Incidence.Statistics
##   Incidence  CI.low   CI.up     RSE  Cov.PrevH.I Cor.PrevH.I
## 1   0.04265 0.03311 0.05314 0.11943 8.696208e-06   0.3074634
## $Annual.Risk.of.Infection
##      ARI ARI.CI.low ARI.CI.up
## 1 0.0418     0.0326    0.0518
## $MDRI.CI
##   CI.low CI.up
## 1  180.4 219.6
## $FRR.CI
##   CI.low  CI.up
## 1 0.0061 0.0139
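To make the link between inputs and point estimates concrete, the following base R sketch evaluates the estimator of Kassanjee et al. by hand, written here in terms of the prevalence of recency among HIV positives. The inputs (PrevH = 0.20, PrevR = 0.10, MDRI = 200 days, FRR = 0.01, T = 730 days), the 365.25-day year, and the conversion ARI = 1 - exp(-incidence) are assumptions made for this illustration. They are consistent with the point estimates printed above, but the sketch is not the package's implementation and does not reproduce the bootstrap confidence intervals.

```r
# Assumed illustrative inputs (not taken from the survey summary)
prev_h     <- 0.20    # HIV prevalence
prev_r     <- 0.10    # prevalence of recency among HIV positives
mdri_days  <- 200     # mean duration of recent infection, in days
frr        <- 0.01    # false recency rate
big_t_days <- 730     # recency time cutoff T, in days

# Work in years so that incidence is an annual rate (assumes 365.25 days/year)
days_per_year <- 365.25
mdri  <- mdri_days / days_per_year
big_t <- big_t_days / days_per_year

# Point estimate of incidence
incidence <- prev_h * (prev_r - frr) /
  ((1 - prev_h) * (mdri - frr * big_t))

# Annual risk of infection implied by a constant incidence rate (assumed form)
ari <- 1 - exp(-incidence)

print(round(incidence, 5))
print(round(ari, 4))
```

With these assumed inputs the rounded results are 0.04265 and 0.0418, matching the Incidence and ARI columns of the output above.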

Multiple surveys can be processed in a single call to incprops by supplying vectors of the parameters. Note that:

## $Incidence.Statistics
##   survey Incidence  CI.low   CI.up     RSE RSE.Inf.SS
## 1      1   0.04265  0.0324  0.0529 0.12264    0.05392
## 2      2   0.06774 0.05033 0.08515 0.13111    0.07302
## 3      3   0.04847 0.03961 0.05734 0.09332    0.06625
## $Incidence.Difference.Statistics
##   compare     Diff CI.Diff.low CI.Diff.up RSE.Diff RSE.Diff.Inf.SS p.value
## 1  1 vs 2 -0.02509    -0.04529   -0.00489  0.41077         0.21738 0.01492
## 2  1 vs 3 -0.00583    -0.01938    0.00773  1.18669         0.67779 0.39941
## 3  2 vs 1  0.02509     0.00489    0.04529  0.41077         0.21738 0.01492
## 4  2 vs 3  0.01927    -0.00027    0.03880  0.51737         0.30610 0.05326
## 5  3 vs 1  0.00583    -0.00773    0.01938  1.18669         0.67779 0.39941
## 6  3 vs 2 -0.01927    -0.03880    0.00027  0.51737         0.30610 0.05326
##   p.value.Inf.SS
## 1        <0.0001
## 2        0.14011
## 3        <0.0001
## 4        0.00109
## 5        0.14011
## 6        0.00109
## $MDRI.CI
##    CI.low   CI.up
## 1 180.400 219.600
## 2 155.304 204.696
## 3 158.832 201.168
## $FRR.CI
##   CI.low  CI.up
## 1 0.0061 0.0139
## 2 0.0055 0.0125
## 3 0.0161 0.0239
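The relationship between the columns of the difference table can be checked by hand. The base R sketch below takes the two incidence estimates and the RSE of their difference from the “1 vs 2” row above, and recomputes the confidence interval and p-value under a normal approximation. That approximation is an assumption of this sketch; it happens to reproduce the printed values, but it is not taken from the package source.

```r
# Values copied from the multi-survey output above
inc_1    <- 0.04265   # incidence, survey 1
inc_2    <- 0.06774   # incidence, survey 2
rse_diff <- 0.41077   # RSE.Diff for the "1 vs 2" comparison

diff_12 <- inc_1 - inc_2              # point estimate of the difference
se_diff <- rse_diff * abs(diff_12)    # standard error implied by the RSE

# 95% interval and two-sided p-value under a normal approximation (assumed)
z       <- qnorm(0.975)
ci_diff <- diff_12 + c(-1, 1) * z * se_diff
p_value <- 2 * pnorm(-abs(diff_12) / se_diff)

print(round(diff_12, 5))
print(round(ci_diff, 5))
print(round(p_value, 4))
```

The recomputed interval is close to the printed (-0.04529, -0.00489), and the p-value is close to the printed 0.01492; small discrepancies in the last digit can arise because the inputs here are already rounded.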

Function prevcounts

Function prevcounts, while not strictly necessary (and indeed not recommended for final inference on incidence based on real survey data, presumably obtained at great cost and with considerable complex sampling structure) turns counts of:

into (point) estimates (and variances) of the prevalence of HIV and the prevalence of recent infection among the HIV positives. At heart, this is a relatively simple multinomial analysis (trinomial, in the case of complete coverage of recency testing amongst HIV positives) and could be accomplished without any significant innovation arising from the core methods of this package. Function prevcounts nevertheless provides a consistent entry point into this analysis, with arguments named to align with the other functions, including appropriate design effects. The most likely use of prevcounts is indirect, through inccounts, but it is exposed to the user for its intuitive supportive value and for reuse in customisations beyond routine primary incidence estimation. Note that the use of prevcounts implies an interpretation of these counts which precludes non-zero covariance between the prevalence of HIV and the prevalence of recency.

For a single survey:

prevcounts(N = 5000, N_H = 1000, N_testR = 1000, N_R = 70, DE_H = 1.1,
           DE_R = 1.5)
##   PrevH PrevR  RSE_PrevH RSE_PrevR
## 1   0.2  0.07 0.02966479 0.1411686

Note that:

Input can be provided for two or more surveys in vector form, using the concatenation expression c():

prevcounts(N = c(5000, 6000), N_H = c(1000, 1100), N_testR = c(950, 1060),
           N_R = c(100, 70), DE_H = c(1.1, 1.2), DE_R = c(1.2, 1.3))
##       PrevH      PrevR  RSE_PrevH RSE_PrevR
## 1 0.2000000 0.10526316 0.02966479 0.1036187
## 2 0.1833333 0.06603774 0.02984810 0.1317005
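The arithmetic behind this output can be sketched in base R as binomial proportions whose variances are inflated by the design effects. The helper name prev_with_rse is hypothetical, and the interpretation of the arguments is an assumption: N is the number of subjects tested for HIV, N_H the number found HIV positive, N_testR the number of HIV positives tested for recency, and N_R the number classified as recent. The formula appears consistent with the prevcounts output shown above, but it is a sketch rather than the package's code.

```r
# Hypothetical helper (not part of inctools): prevalences and relative
# standard errors from counts, with binomial variance scaled by design effect.
prev_with_rse <- function(N, N_H, N_testR, N_R, DE_H = 1, DE_R = 1) {
  prev_h <- N_H / N          # HIV prevalence
  prev_r <- N_R / N_testR    # prevalence of recency among tested positives
  # RSE = sqrt(design effect * p * (1 - p) / n) / p
  rse_h <- sqrt(DE_H * prev_h * (1 - prev_h) / N) / prev_h
  rse_r <- sqrt(DE_R * prev_r * (1 - prev_r) / N_testR) / prev_r
  data.frame(PrevH = prev_h, PrevR = prev_r,
             RSE_PrevH = rse_h, RSE_PrevR = rse_r)
}

# Same inputs as the two-survey prevcounts call above; arguments are vectorised
res <- prev_with_rse(N = c(5000, 6000), N_H = c(1000, 1100),
                     N_testR = c(950, 1060), N_R = c(100, 70),
                     DE_H = c(1.1, 1.2), DE_R = c(1.2, 1.3))
print(res)
```

Because the arithmetic is vectorised, each element of the input vectors corresponds to one survey and one row of the result.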

  1. Kassanjee, R., McWalter, T.A., Baernighausen, T. and Welte, A. “A new general biomarker-based incidence estimator.” Epidemiology; 2012, 23(5): 721-728.

  2. Kassanjee, R., McWalter, T.A. and Welte, A. “Short Communication: Defining Optimality of a Test for Recent Infection for HIV Incidence Surveillance.” AIDS Research and Human Retroviruses; 2014, 30(1): 45-49.