`WeightIt`

contains several functions for estimating and assessing balancing weights for observational studies. These weights can be used to estimate the causal parameters of marginal structural models. I will not go into the basics of causal inference methods here. For good introductory articles, see Austin (2011), Austin & Stuart (2015), Robins, Hernan & Brumback (2000), or Thoemmes & Ong (2016).

Typically, the analysis of an observation study might proceed as follows: identify the covariates for which balance is required; assess the quality of the data available, including missingness and measurement error; estimate weights that balance the covariates adequately; and estimate a treatment effect and corresponding standard error or confidence interval. This guide will go through all these steps for two observational studies: estimatign the causal effect of a point treatment on an outcome, and estimating the causal parameters of a marginal structural model with multiple treatment periods. This is not meant to be a difinitive guide, but rather an introduction to the relevant issues.

First we will use the Lalonde dataset to estimate the effect of a point treatment. We’ll use the version of the data set that resides withn the `cobalt`

package, which we will use later on as well. Here, we are interested in the average treatment effect on the treated (ATT).

treat | age | educ | race | married | nodegree | re74 | re75 | re78 |
---|---|---|---|---|---|---|---|---|

1 | 37 | 11 | black | 1 | 1 | 0 | 0 | 9930.0460 |

1 | 22 | 9 | hispan | 0 | 1 | 0 | 0 | 3595.8940 |

1 | 30 | 12 | black | 0 | 0 | 0 | 0 | 24909.4500 |

1 | 27 | 11 | black | 0 | 1 | 0 | 0 | 7506.1460 |

1 | 33 | 8 | black | 0 | 1 | 0 | 0 | 289.7899 |

1 | 22 | 9 | black | 0 | 1 | 0 | 0 | 4056.4940 |

We have our outcome (`re78`

), our treatment (`treat`

), and the covariates for which balance is desired (`age`

, `educ`

, `race`

, `married`

, `nodegree`

, `re74`

, and `re75`

). Using `cobalt`

, we can examine the initial imbalance on the covariates:

```
library("cobalt")
bal.tab(treat ~ age + educ + race + married + nodegree + re74 + re75,
data = lalonde, estimand = "ATT", m.threshold = .1)
```

```
## Balance Measures
## Type Diff.Un M.Threshold.Un
## age Contin. -0.3094 Not Balanced, >0.1
## educ Contin. 0.0550 Balanced, <0.1
## race_black Binary 0.6404 Not Balanced, >0.1
## race_hispan Binary -0.0827 Balanced, <0.1
## race_white Binary -0.5577 Not Balanced, >0.1
## married Binary -0.3236 Not Balanced, >0.1
## nodegree Binary 0.1114 Not Balanced, >0.1
## re74 Contin. -0.7211 Not Balanced, >0.1
## re75 Contin. -0.2903 Not Balanced, >0.1
##
## Balance tally for mean differences
## count
## Balanced, <0.1 2
## Not Balanced, >0.1 7
##
## Variable with the greatest mean difference
## Variable Diff.Un M.Threshold.Un
## re74 -0.7211 Not Balanced, >0.1
##
## Sample sizes
## Control Treated
## All 429 185
```

Based on this output, we can see that most variables are imbalanced in the sense that the standardized mean differences (for continuous variables) and differences in proportion (for binary variables) are greater than .1 for most variables. In particular, `re74`

and `re75`

are quite imabalanced, which is troubling given that they are likely strong predictors of the outcome. We will estimate weights using `weightit()`

to try to attain balance on these covariates.

First, we’ll start simple, and use inverse probability weights from propensity scores generated through logistic regression. We need to supply `weightit()`

with the formula for the model, the data set, the estimand (ATT), and the method of estimation (`"ps"`

) for propensity score weights).

```
library("WeightIt")
W.out <- weightit(treat ~ age + educ + race + married + nodegree + re74 + re75,
data = lalonde, estimand = "ATT", method = "ps")
W.out #print the ooutput
```

```
## A weightit object
## - method: "ps" (propensity score weighting)
## - number of obs.: 614
## - sampling weights: none
## - treatment: 2-category
## - estimand: ATT (focal: 1)
## - covariates: age, educ, race, married, nodegree, re74, re75
```

Printing the output of `weightit()`

displays a summary of how the weights were estimated. Let’s examine the quality of the weights using `summary()`

. Weights with low variability are desirable because they improve the precision of the estimator. This variability is presented in several ways: by the ratio of the largest weight to the smallest in each group, the coefficient of variation (standard deviation divided by the mean) of the weights in each group, and the effective sample size computed from the weights. We want a small ratio, a smaller coefficient of variation, and a large effective sample size (ESS). What consitutes these values is mostly relative, though, and must be balanced with other constraints, including covariate balance. These metrics are best used when comparing weighting methods, but the ESS can give a sense of how much information remains in the weighted sample on a familiar scale.

```
## Summary of weights:
##
## - Weight ranges:
## Min Max
## treated 1.0000 || 1.0000
## control 0.0092 |---------------------------| 3.7432
##
## - Units with 5 greatest weights by group:
##
## 1 3 4 6 7
## treated 1 1 1 1 1
## 597 573 411 381 303
## control 3.0301 3.0592 3.2397 3.5231 3.7432
##
## Ratio Coef of Var
## treated 1.0000 0.0000
## control 408.4971 1.8181
## overall 408.4971 1.1737
##
## - Effective Sample Sizes:
## Control Treated
## Unweighted 429.000 185
## Weighted 99.815 185
```

These weights have quite high variability, and yield an ESS of close to 100 in the control group. Let’s see if these weights managed to yield balance on our covariates.

```
## Call
## weightit(formula = treat ~ age + educ + race + married + nodegree +
## re74 + re75, data = lalonde, method = "ps", estimand = "ATT")
##
## Balance Measures
## Type Diff.Adj M.Threshold V.Ratio.Adj
## prop.score Distance -0.0205 1.0324
## age Contin. 0.1188 Not Balanced, >0.1 2.1846
## educ Contin. -0.0284 Balanced, <0.1 1.5068
## race_black Binary -0.0022 Balanced, <0.1
## race_hispan Binary 0.0002 Balanced, <0.1
## race_white Binary 0.0021 Balanced, <0.1
## married Binary 0.0186 Balanced, <0.1
## nodegree Binary 0.0184 Balanced, <0.1
## re74 Contin. -0.0021 Balanced, <0.1 1.3206
## re75 Contin. 0.0110 Balanced, <0.1 1.3938
##
## Balance tally for mean differences
## count
## Balanced, <0.1 8
## Not Balanced, >0.1 1
##
## Variable with the greatest mean difference
## Variable Diff.Adj M.Threshold
## age 0.1188 Not Balanced, >0.1
##
## Effective sample sizes
## Control Treated
## Unadjusted 429.000 185
## Adjusted 99.815 185
```

For nearly all the covariates, these weights yielded very good balance. Only `age`

remained imbalanced, with a standardized mean difference greater than .1 and a variance ratio greater than 2. Let’s see if we can do better. We’ll choose a different method: entropy balancing, which guarantees perfect balance on specified moments of the covariates while minimizing the entropy (a measure of dispersion) of the weights.

```
W.out <- weightit(treat ~ age + educ + race + married + nodegree + re74 + re75,
data = lalonde, estimand = "ATT", method = "ebal")
summary(W.out)
```

```
## Summary of weights:
##
## - Weight ranges:
## Min Max
## treated 1.0000 || 1.0000
## control 0.0081 |---------------------------| 4.0624
##
## - Units with 5 greatest weights by group:
##
## 1 2 3 4 5
## treated 1 1 1 1 1
## 608 597 411 381 303
## control 3.0735 3.2348 3.4498 3.8965 4.0624
##
## Ratio Coef of Var
## treated 1.0000 0.0000
## control 502.3782 1.8344
## overall 502.3782 1.1794
##
## - Effective Sample Sizes:
## Control Treated
## Unweighted 429.000 185
## Weighted 98.459 185
```

The variability of the weights has not changed much, but let’s see if there are any gains in terms of balance:

```
## Call
## weightit(formula = treat ~ age + educ + race + married + nodegree +
## re74 + re75, data = lalonde, method = "ebal", estimand = "ATT")
##
## Balance Measures
## Type Diff.Adj M.Threshold V.Ratio.Adj
## age Contin. -0 Balanced, <0.1 2.4412
## educ Contin. -0 Balanced, <0.1 1.5070
## race_black Binary 0 Balanced, <0.1
## race_hispan Binary 0 Balanced, <0.1
## race_white Binary -0 Balanced, <0.1
## married Binary -0 Balanced, <0.1
## nodegree Binary 0 Balanced, <0.1
## re74 Contin. -0 Balanced, <0.1 1.3265
## re75 Contin. -0 Balanced, <0.1 1.3351
##
## Balance tally for mean differences
## count
## Balanced, <0.1 9
## Not Balanced, >0.1 0
##
## Variable with the greatest mean difference
## Variable Diff.Adj M.Threshold
## race_white -0 Balanced, <0.1
##
## Effective sample sizes
## Control Treated
## Unadjusted 429.000 185
## Adjusted 98.459 185
```

Indeed, we have achieved perfect balance on the means of the covariates. However, the variance ratio of `age`

is still quite high. We could continue to try to adjust for this imbalance, but if there is reason to believe it is unlikely to affect the outcome, it may be best to leave it as is. (You can try adding `I(age^2)`

to the formula and see what changes this causes.)

Now that we have our weights stored in `W.out`

, let’s extract them and estimate our treatment effect. We’ll use the `survey`

package to incorprate the weights into our analysis.

`## Warning: package 'survey' was built under R version 3.4.4`

`## Warning: package 'Matrix' was built under R version 3.4.4`

`## Warning: package 'survival' was built under R version 3.4.4`

```
d.w <- svydesign(ids = ~1, weights = get.w(W.out),
data = lalonde)
fit <- svyglm(re78 ~ treat, design = d.w)
coef(fit)
```

```
## (Intercept) treat
## 5075.888 1273.255
```

Now let’s do some inference. Although some authors recommend using “robust” sandwich standard errors to adjust for the weights (Robins et al., 2000; Hainmueller, 2012), others believe these can misleading and recommend bootstrapping instead (e.g., Chan, Yam, & Zhang, 2016). We’ll examine both approaches.

`svyglm()`

in the survey package produces robust standard errors. We’ll use the `jtools`

package to provide clean summaries of our results. Similar output can be generated using `summary()`

without installing `jtools`

.

`## Warning: package 'jtools' was built under R version 3.4.4`

```
## Standard errors: Robust
## Est. 2.5% 97.5% t val. p
## (Intercept) 5075.89 3918.36 6233.41 8.61 0.00 ***
## treat 1273.26 -347.03 2893.54 1.54 0.12
##
## Estimated dispersion parameter = 49765840
```

Our confidence interval for `treat`

contains 0, so there isn’t evidence that `treat`

has an effect on `re78`

.

Next let’s use bootstrapping to estimate confidence intervals. We don’t need to use `svyglm()`

and can simply use `glm()`

to compute the effect estimates in each bootstrapped sample because we are not computing standard errors, and the estimated treatment effect estimates will be the same.

```
#Bootstrapping
library("boot")
est.fun <- function(data, index) {
W.out <- weightit(treat ~ age + educ + race + married + nodegree + re74 + re75,
data = data[index,], estimand = "ATT", method = "ebal")
fit <- glm(re78 ~ treat, data = data[index,], weights = get.w(W.out))
return(coef(fit)["treat"])
}
boot.out <- boot(est.fun, data = lalonde, R = 999)
boot.ci(boot.out, type = "bca") #type shouldn't matter so much
```

```
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 999 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = boot.out, type = "bca")
##
## Intervals :
## Level BCa
## 95% (-347, 2855 )
## Calculations and Intervals on Original Scale
```

In this case, our confidence intervals were similar. Bootstrapping can take some time, especially with weight estimation methods that take longer, such as generalized boosted modeling (`method = "gbm"`

), covariate balancing propensity score estimation (`method = "cbps"`

), or entropy balancing with `stabilize = TRUE`

.

If we wanted to produce a “doubly-robust” treatment effect estimate, we could add baseline covariates to the `glm()`

(or `svyglm()`

) model (in both the original effect estimation and the confidence interval estimation).

`WeightIt`

can estimate weights for longitudinal treatment marginal structural models as well. This time, we’ll use the sample data set from `twang`

to estimate our weights. Data must be in “wide” format; to go from long to wide, see the example at `?weightitMSM`

.

outcome | gender | age | use0 | use1 | use2 | tx1 | tx2 | tx3 |
---|---|---|---|---|---|---|---|---|

-0.2782802 | 0 | 43 | 1.1349651 | 0.4674825 | 0.3174825 | 1 | 1 | 1 |

0.5319329 | 0 | 50 | 1.1119318 | 0.4559659 | 0.4059659 | 1 | 0 | 1 |

-0.8173614 | 1 | 36 | -0.8707776 | -0.5353888 | -0.5853888 | 1 | 0 | 0 |

-0.1530853 | 1 | 63 | 0.2107316 | 0.0053658 | -0.1446342 | 1 | 1 | 1 |

-0.7344267 | 0 | 24 | 0.0693956 | -0.0653022 | -0.1153022 | 1 | 0 | 1 |

-0.8519376 | 1 | 20 | -1.6626489 | -0.9313244 | -1.0813244 | 1 | 1 | 1 |

We have our outcome variable (`outcome`

), our time-stable baseline variables (`gender`

and `age`

), our pre-treatment time-varying variables (`use0`

, measured before the first treatment, `use1`

, and `use2`

), and our three time-varying treatment variables (`tx1`

, `tx2`

, and `tx3`

). We are interested in the joint, unique, causal effects of each treatment period on the outcome. At each treatment time point, we need to achieve balance on all variables measured prior to that treatment, including previous treatments.

Using `cobalt`

, we can examine the initial imbalance at each time point and overall:

```
library("cobalt") #if not already attached
bal.tab(list(tx1 ~ age + gender + use0,
tx2 ~ tx1 + use1 + age + gender + use0,
tx3 ~ tx2 + use2 + tx1 + use1 + age + gender + use0),
data = iptwExWide, m.threshold = .1)
```

```
## Balance by Time Point
##
## - - - Time: 1 - - -
## Balance Measures
## Type Diff.Un M.Threshold.Un
## age Contin. 0.3799 Not Balanced, >0.1
## gender Binary 0.2945 Not Balanced, >0.1
## use0 Contin. 0.2668 Not Balanced, >0.1
##
## Balance tally for mean differences
## count
## Balanced, <0.1 0
## Not Balanced, >0.1 3
##
## Variable with the greatest mean difference
## Variable Diff.Un M.Threshold.Un
## age 0.3799 Not Balanced, >0.1
##
## Sample sizes
## Control Treated
## All 294 706
##
## - - - Time: 2 - - -
## Balance Measures
## Type Diff.Un M.Threshold.Un
## age Contin. 0.2240 Not Balanced, >0.1
## gender Binary 0.1927 Not Balanced, >0.1
## use0 Contin. 0.1169 Not Balanced, >0.1
## tx1 Binary 0.1695 Not Balanced, >0.1
## use1 Contin. 0.0848 Balanced, <0.1
##
## Balance tally for mean differences
## count
## Balanced, <0.1 1
## Not Balanced, >0.1 4
##
## Variable with the greatest mean difference
## Variable Diff.Un M.Threshold.Un
## age 0.224 Not Balanced, >0.1
##
## Sample sizes
## Control Treated
## All 492 508
##
## - - - Time: 3 - - -
## Balance Measures
## Type Diff.Un M.Threshold.Un
## age Contin. 0.3431 Not Balanced, >0.1
## gender Binary 0.1532 Not Balanced, >0.1
## use0 Contin. 0.1859 Not Balanced, >0.1
## tx1 Binary 0.1071 Not Balanced, >0.1
## use1 Contin. 0.1662 Not Balanced, >0.1
## tx2 Binary 0.2423 Not Balanced, >0.1
## use2 Contin. 0.1087 Not Balanced, >0.1
##
## Balance tally for mean differences
## count
## Balanced, <0.1 0
## Not Balanced, >0.1 7
##
## Variable with the greatest mean difference
## Variable Diff.Un M.Threshold.Un
## age 0.3431 Not Balanced, >0.1
##
## Sample sizes
## Control Treated
## All 415 585
## - - - - - - - - - - -
##
## Balance summary across all time points
## Times Type Max.Diff.Un M.Threshold.Un
## age 1, 2, 3 Contin. 0.3799 Not Balanced, >0.1
## gender 1, 2, 3 Binary 0.2945 Not Balanced, >0.1
## use0 1, 2, 3 Contin. 0.2668 Not Balanced, >0.1
## tx1 2, 3 Binary 0.1695 Not Balanced, >0.1
## use1 2, 3 Contin. 0.1662 Not Balanced, >0.1
## tx2 3 Binary 0.2423 Not Balanced, >0.1
## use2 3 Contin. 0.1087 Not Balanced, >0.1
##
## Sample sizes
## - Time 1
## Control Treated
## All 294 706
## - Time 2
## Control Treated
## All 492 508
## - Time 3
## Control Treated
## All 415 585
```

`bal.tab()`

indicates significant imbalance on most covariates at most time points, so we need to do some work to eliminate that imbalance in our weighted data set. We’ll use the `weightitMSM()`

function to specifiy our weight models. The syntax is similar both to that of `weightit()`

for point treatments and to that of `bal.tab()`

for longitduinal treatments. We’ll use `method = "ps"`

for propensity score weights estimated using logistic regression.

```
Wmsm.out <- weightitMSM(list(tx1 ~ age + gender + use0,
tx2 ~ tx1 + use1 + age + gender + use0,
tx3 ~ tx2 + use2 + tx1 + use1 + age + gender + use0),
data = iptwExWide, method = "ps")
Wmsm.out
```

```
## A weightitMSM object
## - method: "ps" (propensity score weighting)
## - number of obs.: 1000
## - sampling weights: none
## - number of time points: 3 (tx1, tx2, tx3)
## - treatment:
## + time 1: 2-category
## + time 2: 2-category
## + time 3: 2-category
## - covariates:
## + baseline: age, gender, use0
## + after time 1: use1
## + after time 2: use2
```

No matter which method is selected, `weightitMSM()`

estimates seperate weights for each time period and then take the product of the weights for each individual to arrive at the final estimated weights. Printing the output of `weightitMSM()`

provides some details about the function call and the output. We can take a look at the quality of the weights with `summary()`

, just as we could for point treatments.

```
## Summary of weights:
##
## - - - - - - - - - - Time 1 - - - - - - - - - -
## - Weight ranges:
## Min Max
## treated 1.6667 |---| 15.5292
## control 2.6123 |---------------------------| 82.2349
##
## - Units with 5 greatest weights by group:
##
## 642 350 324 299 228
## treated 13.4464 13.6233 13.7059 14.436 15.5292
## 288 176 147 72 52
## control 44.8027 46.1355 58.4164 79.8919 82.2349
##
## Ratio Coef of Var
## treated 9.3171 0.4808
## control 31.4795 0.7679
## overall 49.3385 0.8569
##
## - Effective Sample Sizes:
## Control Treated
## Unweighted 294.00 706.000
## Weighted 185.18 573.598
##
## - - - - - - - - - - Time 2 - - - - - - - - - -
## - Weight ranges:
## Min Max
## treated 1.6667 |---------------------------| 82.2349
## control 2.6123 |--------------------------| 79.8919
##
## - Units with 5 greatest weights by group:
##
## 497 338 290 286 22
## treated 33.371 34.7405 42.9442 58.4164 82.2349
## 265 139 103 47 39
## control 39.7862 42.2256 44.8027 46.1355 79.8919
##
## Ratio Coef of Var
## treated 49.3385 0.9605
## control 30.5826 0.7375
## overall 49.3385 0.8569
##
## - Effective Sample Sizes:
## Control Treated
## Unweighted 492.000 508.000
## Weighted 318.903 264.487
##
## - - - - - - - - - - Time 3 - - - - - - - - - -
## - Weight ranges:
## Min Max
## treated 1.6667 |-------------| 44.8027
## control 2.6123 |---------------------------| 82.2349
##
## - Units with 5 greatest weights by group:
##
## 548 450 382 124 67
## treated 26.003 26.471 28.0582 39.7862 44.8027
## 403 264 226 224 110
## control 42.9442 46.1355 58.4164 79.8919 82.2349
##
## Ratio Coef of Var
## treated 26.8803 0.7731
## control 31.4795 0.8734
## overall 49.3385 0.8569
##
## - Effective Sample Sizes:
## Control Treated
## Unweighted 415.000 585.000
## Weighted 235.672 366.399
```

Displayed are summaries of how the weights perform at each time point with respect to variability. Next, we’ll examine how well they perform with respect to covariate balance.

```
## Balance summary across all time points
## Times Type Max.Diff.Adj M.Threshold Max.V.Ratio.Adj
## prop.score 1, 2, 3 Distance 0.0251
## age 1, 2, 3 Contin. 0.0703 Balanced, <0.1 1.2506
## gender 1, 2, 3 Binary 0.0263 Balanced, <0.1
## use0 1, 2, 3 Contin. 0.0558 Balanced, <0.1 1.3596
## tx1 2, 3 Binary 0.0171 Balanced, <0.1
## use1 2, 3 Contin. 0.0316 Balanced, <0.1 1.2105
## tx2 3 Binary 0.0085 Balanced, <0.1
## use2 3 Contin. 0.0315 Balanced, <0.1 1.0397
##
## Effective sample sizes
## - Time 1
## Control Treated
## Unadjusted 294.00 706.000
## Adjusted 185.18 573.598
## - Time 2
## Control Treated
## Unadjusted 492.000 508.000
## Adjusted 318.903 264.487
## - Time 3
## Control Treated
## Unadjusted 415.000 585.000
## Adjusted 235.672 366.399
```

By setting `which.time = NA`

in `bal.tab()`

, we can focus on the overall balance assessment, which displays the greatest imbalance for each covariate across time points. We can see that our estimated weights balance all covariates all time points with respect to means and variances. Now we can estimate our treatment effects. We’ll sequentially simplify our model by checking whether interaction terms are needed (implying that specific patterns of treatment yield different outcomes), then by checking whether different coefficients are needed for the treatments (implying that outcomes depend on which treatments are received).

```
library("survey")
d.w.msm <- svydesign(ids = ~1, weights = get.w(Wmsm.out),
data = iptwExWide)
full.fit <- svyglm(outcome ~ tx1*tx2*tx3, design = d.w.msm)
main.effects.fit <- svyglm(outcome ~ tx1 + tx2 + tx3, design = d.w.msm)
anova(full.fit, main.effects.fit)
```

```
## Working (Rao-Scott+F) LRT for tx1:tx2 tx1:tx3 tx2:tx3 tx1:tx2:tx3
## in svyglm(formula = outcome ~ tx1 * tx2 * tx3, design = d.w.msm)
## Working 2logLR = 7.798048 p= 0.10163
## (scale factors: 1.2 1.1 0.91 0.72 ); denominator df= 992
```

Based on the non-significant p-value, we don’t have to assume specific treatment patterns yield different outcomes, but rather only that which treatments received or the number of treatments received are sufficient to explain variation in the outcome. Next we’ll narrow down these options by comparing the main effects fit to one that constrains the coefficients to be equal (implying that the cumulative number of treatments received is what matters), as Robins et al. (2000) describe.

```
## Working (Rao-Scott+F) LRT for tx1 tx2 tx3 - I(tx1 + tx2 + tx3)
## in svyglm(formula = outcome ~ tx1 + tx2 + tx3, design = d.w.msm)
## Working 2logLR = 3.756596 p= 0.15446
## (scale factors: 1.1 0.94 ); denominator df= 996
```

Based on the non-significant p-value, we can assume the effects of each treatment are close enough to be treated as the same, indicating that the number of treatments received is the relevant predictor of the outcome. Now we can examine what that treatment effect is with `summ()`

in `jtools`

(or `summary()`

).

```
## Standard errors: Robust
## Est. 2.5% 97.5% t val. p
## (Intercept) 0.11 0.00 0.22 1.96 0.05 *
## I(tx1 + tx2 + tx3) -0.15 -0.20 -0.09 -5.18 0.00 ***
##
## Estimated dispersion parameter = 0.5
```

For each additional treatment received, the outcome is expected to decrease by 0.15 points. The confidence interval excludes 0, so there is evidence of a treatment effect in the population.

There is more we can do, as well. We could have fit different types of models to estimate the weights, and we could have stabilized the weights with `stabilize = TRUE`

or by including stabilization factors in our weights using `num.formula`

(see Cole & Hernán, 2008, for more details on doing so). There are other ways of computing confidence intervals for our effect estimates (although model comparison is the most straightforward with the method we used).

Austin, P. C. (2011). An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies. Multivariate Behavioral Research, 46(3), 399–424.

Austin, P. C., & Stuart, E. A. (2015). Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Statistics in Medicine, 34(28), 3661–3679.

Chan, K. C. G., Yam, S. C. P., & Zhang, Z. (2016). Globally efficient non-parametric inference of average treatment effects by empirical balancing calibration weighting. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78(3), 673–700.

Cole, S. R., & Hernán, M. A. (2008). Constructing Inverse Probability Weights for Marginal Structural Models. American Journal of Epidemiology, 168(6), 656–664.

Hainmueller, J. (2012). Entropy Balancing for Causal Effects: A Multivariate Reweighting Method to Produce Balanced Samples in Observational Studies. Political Analysis, 20(1), 25–46.

Robins, J. M., Hernán, M. Á., & Brumback, B. (2000). Marginal Structural Models and Causal Inference in Epidemiology. Epidemiology, 11(5), 550–560.

Thoemmes, F., & Ong, A. D. (2016). A Primer on Inverse Probability of Treatment Weighting and Marginal Structural Models. Emerging Adulthood, 4(1), 40–59.