This is a guide for the use of `cobalt`

with more complicated data than is typical in studies using propensity scores and similar methods. In particular, this guide will explain `cobalt`

’s features for handling multilevel or grouped data and data arising from multiple imputation. The features described here set `cobalt`

apart from other packages that assess balance because they exist only in `cobalt`

. It will be assumed that the basic functions of `cobalt`

are understood; this guide will only address issues that are unique to these data scenarios.

`cobalt`

and Complicated DataFirst, let’s understand multilevel or grouped data and data arising from multiple imputation.

Multilevel data occurs when units are nested within groups, and each unit belongs to one group. Examples of multilevel data include students within schools, patients within hospitals, and transactions within companies. Estimating treatment effects with multilevel data can involve multilevel modeling, generalized estimating equations, and regression with cluster-robust standard errors (see McNeish, Stapleton, & Silverman, 2017, for a comparison of these methods). There has also been a burgeoning literature on the use of propensity score methods for multilevel data (e.g., Arpino & Mealli, 2011; Thoemmes & West, 2011). The reason multilevel data demands its own attention is that cluster characteristics may cause confounding, and handling cluster characteristics in the propensity score process is a bit more challenging than handling unit characteristics only.

Most research into propensity score methods with multilevel data has focused on how to model the propensity score (e.g., with multilevel modeling or with standard regression) and how to condition on the propensity score (e.g., by matching or weighting) to arrive at the best balance (e.g., conditioning within clusters or across clusters). It is critical that balance must be assured on all unit-level and cluster-level characteristics to ensure the removal of confounding. The advantage of conditioning within clusters is that all cluster-level characteristics are automatically balanced (but unit-level characteristics may remain imbalanced). When conditioning across clusters, the analyst must assess and ensure balance on both unit-level and cluster-level characteristics. See Leite et al. (2015) for an excellent review of the literature on propensity scores with multilevel data.

Grouped data is similar to clustered data but has been treated differently in the methodological literature. Groups can be thought of as strata for which units are only comparable within each stratum. For example, in a study with multiple cohorts, each cohort can be considered a group. Green and Stuart (2014) discuss the application of propensity scores in this context to examine the potential for effect moderation by the grouping variable. As with multilevel data, units fall within one group only. To ensure balance in grouped data, balance must be achieved within each group. Thus, as Green and Stuart (2014) recommend, separate propensity score models should be estimated within each group, and conditioning should occur separately within each group. In matching scenarios, this involves exact matching on group. Multilevel and grouped data are treated the same in `cobalt`

, and so will be unified as “clustered” data.

Multiply imputed data occurs when some values of the covariates are missing, and multiple imputation has been performed to generate several data sets each with their own plausible values for the missing values. Missing data is a ubiquitous problem with large longitudinal data sets, which are popular for propensity score analysis. There are several choices researchers must make when addressing missing data; Cham and West (2016) discuss options researchers have to address the missingness. They recommend multiple imputation in most circumstances. Mitra and Reiter (2016) describe two ways to use propensity scores with multiply imputed data. The first, called the “within” method, is to perform a traditional analysis within each data set, and combine the effect estimates using the recommended rules for doing so. The second, called the “across” method, is to generate propensity scores within each imputed data set, but then to average the propensity score across data sets for each individual and use the averaged propensity scores to estimate a single effect. Guidelines for assessing balance in either scenario are absent; however, we recommend that balance assessment should involve all imputed data sets, by, for example, ensuring the average balance measure for each covariate across data sets is within some range of tolerance. `cobalt`

allows for this kind of assessment, as well as balance assessment within each imputed data set.

Each of `cobalt`

’s primary functions (`bal.tab()`

, `bal.plot()`

, and `love.plot()`

) have features to handle clustered and multiply imputed data sets, including data sets that are both clustered and multiply imputed (as occurred in Hughes, Chen, Thoemmes, & Kwok, 2010). The following sections describe for each data scenario the relevant features of each function. It should be noted that all examples and descriptions will be for data with binary treatments, but all the methods for continuous treatments also work in exactly the same way. Additionally, all examples use matching, either by specifying a `matchit`

object or by specifying weights extracted from a `matchit`

object; these methods are also applicable to weighting and matching strata as well, but not to subclassification. Subclassification and multinomial and longitudinal treatments are not yet supported for clustered or multiply imputed data.

In clustered data, the data set must contain a variable denoting the group each individual belongs to. Again, this may be a group considered a nuisance that must be accounted for to eliminate confounding (e.g., hospitals in a medical treatment study), or a group of concern for effect moderation (e.g., race or gender). In the examples below, we will imagine that we are interested in the effect of `treat`

on `re78`

stratified by `race`

. Thus, we will condition on the propensity score within each cluster.

First, let’s estimate propensity scores and perform matching within each race group. We can do this by performing separate analyses within each cluster, but we can also use exact matching in MatchIt to ensure matches occur within clusters. It is important to note that this analysis does not necessarily represent a sound statistical analysis and is being used for illustrative purposes only.

```
library("MatchIt"); library("cobalt")
data("lalonde", package = "cobalt")
m.out <- matchit(treat ~ race*(age + educ + married + nodegree + re74 + re75),
data = lalonde, method = "nearest", exact = "race",
replace = TRUE, ratio = 2)
```

`bal.tab()`

The output produced by `bal.tab()`

with clustered data contains balance tables for each cluster and a summary of balance across clusters. To use `bal.tab()`

with groups, there are four arguments that should be considered. These are `cluster`

, `which.cluster`

, `cluster.summary`

, and `cluster.fun`

.

`cluster`

is a vector of group membership for each unit or the name of a variable in a provided data set containing group membership.`which.cluster`

determines for which clusters balance tables are to be displayed, if any. (Default: display all clusters)`cluster.summary`

determines whether the cluster summary is to be displayed or not. The arguments are in addition to the other arguments that are used with`bal.tab()`

to display balance. (Default: display the cluster summary)`cluster.fun`

determines which function(s) are used to combine balance statistics across clusters for the cluster summary. (Default: when`abs = FALSE`

, minimum, mean, and maximum; when`abs = TRUE`

, mean and maximum)

Lets examine balance on our data within each race group.

```
## Call
## matchit(formula = treat ~ race * (age + educ + married + nodegree +
## re74 + re75), data = lalonde, method = "nearest", exact = "race",
## replace = TRUE, ratio = 2)
##
## Balance by cluster
##
## - - - Cluster: black - - -
## Balance measures
## Type Diff.Adj
## distance Distance 0.0150
## age Contin. -0.1001
## educ Contin. 0.0794
## married Binary 0.0288
## nodegree Binary -0.0032
## re74 Contin. -0.1501
## re75 Contin. -0.1406
##
## Sample sizes
## Control Treated
## All 87.0000 156
## Matched 105.4054 156
## Matched (Unweighted) 76.0000 156
## Unmatched 11.0000 0
##
## - - - Cluster: hispan - - -
## Balance measures
## Type Diff.Adj
## distance Distance 0.0947
## age Contin. 0.1914
## educ Contin. -0.4159
## married Binary 0.1364
## nodegree Binary 0.2273
## re74 Contin. 0.1161
## re75 Contin. 0.0683
##
## Sample sizes
## Control Treated
## All 61.0000 11
## Matched 7.4324 11
## Matched (Unweighted) 18.0000 11
## Unmatched 43.0000 0
##
## - - - Cluster: white - - -
## Balance measures
## Type Diff.Adj
## distance Distance 0.0216
## age Contin. -0.4201
## educ Contin. -0.1403
## married Binary -0.0556
## nodegree Binary 0.1111
## re74 Contin. -0.0417
## re75 Contin. 0.0298
##
## Sample sizes
## Control Treated
## All 281.0000 18
## Matched 12.1622 18
## Matched (Unweighted) 31.0000 18
## Unmatched 250.0000 0
## - - - - - - - - - - - - - -
##
## Balance summary across all clusters
## Type Min.Diff.Adj Mean.Diff.Adj Max.Diff.Adj
## distance Distance 0.0150 0.0438 0.0947
## age Contin. -0.4201 -0.1096 0.1914
## educ Contin. -0.4159 -0.1590 0.0794
## married Binary -0.0556 0.0366 0.1364
## nodegree Binary -0.0032 0.1117 0.2273
## re74 Contin. -0.1501 -0.0252 0.1161
## re75 Contin. -0.1406 -0.0142 0.0683
##
## Total sample sizes across clusters
## Control Treated
## All 429 185
## Matched 125 185
## Unmatched 304 0
```

First we see balance tables for each cluster. These are the same output we would see if we use `bal.tab()`

for each cluster separately. All the commands that work for `bal.tab()`

also work here with the same results, except that balance tallies and the variable with the greatest imbalance will not be displayed as they usually are when a threshold is specified.

Second, we see a balance summary across all the clusters. This table presents the minimum, mean, and maximum balance statistics for each variable across clusters. Setting `un = TRUE`

will also display the same values for the adjusted data set. With binary treatments, setting `disp.v.ratio = TRUE`

or `v.threshold`

to some number will display the same values for variance ratios. Setting `abs = TRUE`

requests summaries of absolute balance statistics which displays the extremeness of balance statistics for each variable; thus, if, for example, in some groups there are large negative mean differences and in other groups there are large positive mean differences, this table will display large mean differences, even though the average mean difference is close to 0. While it’s important to know the average balance statistic overall, assessing the absolute balance statistics provides more information about balance within each cluster rather than in aggregate.

To examine balance for just a few clusters at a time, users can enter values for `which.cluster`

. This can be a vector of clusters indices (i.e., 1, 2, 3, etc.) or names (e.g., “black”, “hispan”, “white”). Users can also specify `which.cluster = NA`

to omit cluster balance for all clusters and just see the summary across clusters, which might be helpful if there are many clusters. Users can also specify whether they want to see the summary across clusters by specifying `TRUE`

or `FALSE`

for `cluster.summary`

. When `which.cluster = NA`

, `cluster.summary`

will automatically be set to `TRUE`

(or else there wouldn’t be any output!). When examining balance within a few groups, it can be more helpful to examine balance within each group and ignore the summary. Below are examples of the use of `which.cluster`

and `cluster.summary`

to change `bal.tab()`

output.

```
#Just for black and hispan
bal.tab(m.out, cluster = "race", which.cluster = c("black", "hispan"),
cluster.summary = FALSE)
```

```
## Call
## matchit(formula = treat ~ race * (age + educ + married + nodegree +
## re74 + re75), data = lalonde, method = "nearest", exact = "race",
## replace = TRUE, ratio = 2)
##
## Balance by cluster
##
## - - - Cluster: black - - -
## Balance measures
## Type Diff.Adj
## distance Distance 0.0150
## age Contin. -0.1001
## educ Contin. 0.0794
## married Binary 0.0288
## nodegree Binary -0.0032
## re74 Contin. -0.1501
## re75 Contin. -0.1406
##
## Sample sizes
## Control Treated
## All 87.0000 156
## Matched 105.4054 156
## Matched (Unweighted) 76.0000 156
## Unmatched 11.0000 0
##
## - - - Cluster: hispan - - -
## Balance measures
## Type Diff.Adj
## distance Distance 0.0947
## age Contin. 0.1914
## educ Contin. -0.4159
## married Binary 0.1364
## nodegree Binary 0.2273
## re74 Contin. 0.1161
## re75 Contin. 0.0683
##
## Sample sizes
## Control Treated
## All 61.0000 11
## Matched 7.4324 11
## Matched (Unweighted) 18.0000 11
## Unmatched 43.0000 0
## - - - - - - - - - - - - - - -
```

```
#Just the balance summary across clusters with only the mean
bal.tab(m.out, cluster = "race", which.cluster = NA, cluster.fun = "mean")
```

```
## Call
## matchit(formula = treat ~ race * (age + educ + married + nodegree +
## re74 + re75), data = lalonde, method = "nearest", exact = "race",
## replace = TRUE, ratio = 2)
##
## Balance summary across all clusters
## Type Mean.Diff.Adj
## distance Distance 0.0438
## age Contin. -0.1096
## educ Contin. -0.1590
## married Binary 0.0366
## nodegree Binary 0.1117
## re74 Contin. -0.0252
## re75 Contin. -0.0142
##
## Total sample sizes across clusters
## Control Treated
## All 429 185
## Matched 125 185
## Unmatched 304 0
```

It should be noted that `which.cluster`

, `cluster.summary`

, and `cluster.fun`

affect display only, and so all output is generated even when some is not requested to be seen^{1} ^{2}. These can also be set as global options by using, for example, `set.cobalt.options(cluster.fun = "mean")`

, which allows user not to type a non-default option every time they call `bal.tab`

.

`bal.plot()`

`bal.plot()`

functions as it does with non-clustered data, except that multiple plots can be produced at the same time displaying balance for each cluster. The arguments to `bal.plot()`

are the same as those for `bal.tab()`

, except that `cluster.summary`

is absent. Below is an example of the use of `bal.plot()`

with clustered data:

Balance plots for each cluster are displayed next to each other. You can specify `which.cluster`

as with `bal.tab()`

to restrict plotting to a subset of clusters.

`love.plot()`

`love.plot()`

shines with clustered data because there are several options that are unique to `cobalt`

and help with the visual display of balance. One way to display cluster balance with `love.plot()`

is to produce different plots for each cluster, as `bal.plot()`

does. This method should not be used with many clusters, or the plots will be unreadable. In our present example, this is not an issue. To do so, the `which.cluster`

argument in `bal.tab()`

or `love.plot()`

must be set to the names or indices of the clusters for which balance is to be plotted. If `which.cluster`

is set to `NULL`

(the default), all clusters will be plotted. Below is an example:

These plots function like those from using `love.plot()`

with non-clustered data, except that they cannot be sorted based on the values of the balance statistics (they can still be sorted alphabetically, though). This is to ensure that the covariates line up across the plots. The same axis limits will apply to all plots.

Second, balance can be displayed summarizing across clusters by plotting an aggregate function (i.e., the mean or maximum) of the balance statistic for each covariate across clusters. To do this, `which.cluster`

in the `love.plot`

command must be set to `NA`

. To change which aggregate function is displayed, use the argument to `agg.fun`

, which may be “mean” or “max” (“mean” is the default). Below is an example:

A third option is to set `agg.fun = "range"`

, which produces a similar plot as above except that the minimum and maximum values of the balance statistics for each covariate are displayed as well. See below for an example:

Each point represents the mean balance statistic, and the bars represent intervals bounded by the minimum and maximum of each balance statistic. This display can be especially helpful with many clusters given that the mean alone may not tell the whole story. In some cases, it might be useful to set limits on the x-axis by using the `limits`

argument in `love.plot()`

; doing so may cut off some of the ranges, but whatever is left will be displayed. All `love.plot()`

arguments work with these methods as they do in the case of non-clustered data. When `var.order`

is specified and `unadjusted`

or `adjusted`

are given, the ordering will occur on the mean balance statistic when using `agg.fun = "range"`

.

Multiply imputed data works in a very similar way to clustered data, except the “grouping” variable refers to imputations rather than clusters. Thus, each row belongs to one imputation (i.e., the data set should be in “long” format). The data set used should only include the imputed data sets and not the original data set with missing values (unlike Stata’s `mi`

commands, which do require the original data set as well). The imputed data sets can be of different sizes (i.e., because matching reduced the size of each differently), but it is preferred that they are the same size and weights are used to indicate which units belong to the sample and which do not.

In the example below, we will use a version of the Lalonde data set with some values missing. We will use the `mice`

package (van Buuren & Groothuis-Oudshoorn, 2011) to implement multiple imputation with chained equations. We will demonstrate the “within”" approach first, where analyses take place within each data set, and the “across” approach second, where the estimated propensity scores are averaged across imputations. In both cases, we want to ensure that our estimated propensity scores yield balance on all imputations.

```
library("MatchIt"); library("cobalt"); library("mice")
data("lalonde_mis", package = "cobalt")
#Generate imputed data sets
m <- 10 #number of imputed data sets
imp <- mice(lalonde_mis, m = m, print = FALSE)
imp.data <- complete(imp, "long", include = FALSE)
imp.data <- imp.data[with(imp.data, order(.imp, .id)),]
#Estimate propensity scores and perform matching within each one
imp.data$ps <- imp.data$match.weight <- rep(0, nrow(imp.data))
for (i in unique(imp.data$.imp)) {
in.imp <- imp.data$.imp == i
imp.data$ps[in.imp] <- glm(treat ~ age + educ + race +
married + nodegree +
re74 + re75,
data = imp.data[in.imp,],
family = "binomial")$fitted.values
m.out <- matchit(treat ~ ps, data = imp.data[in.imp,],
distance = imp.data$ps[in.imp])
imp.data$match.weight[in.imp] <- m.out$weights
}
```

The object `imp.data`

contains all the imputed data sets, stacked vertically, with an imputation number for each, as well as propensity scores and matching weights for each unit.

`bal.tab()`

To use `bal.tab()`

with our imputed data sets, we need to use the `data.frame`

or `formula`

methods. We provide the treatment variable, the covariates, the weights, and the imputation number. In many respects, this looks very similar to the use of `bal.tab()`

with clustered data. There are four arguments that are relevant to imputed data:

`imp`

is a vector of imputation numbers for each unit or the name of a variable in an available data set containing the imputation numbers.`which.imp`

determines for which imputation balance assessment is desired. Often it can be useful to examine balance in just a few imputations for a detailed examination of what is going on. (Default: no imputations are displayed.)`imp.summary`

determines whether to display a summary of balance across imputations. (Default: the summary of balance across imputations is displayed.)`imp.fun`

determines which function(s) are used to combine balance statistics across imputations for the summary of balance across imputations. (Default: when`abs = FALSE`

, minimum, mean, and maximum; when`abs = TRUE`

, mean and maximum)

`which.imp`

, `imp.summary`

, and `imp.fun`

affect display only and can also be set as global options by using `set.cobalt.options()`

like the corresponding cluster options.

In many cases, not all variables are imputed, and often the treatment variable is not imputed. If each imputation as the same number of units, you can specify other arguments (e.g., treatment, distance) by specifying a vector of the length of one imputation, and this vector will be applied to all imputations. This will come in handy when each individual receives one average propensity score in the “across” method. To do this, the imputed data set must be sorted by imputation and unit ID.

```
bal.tab(treat ~ age + educ + race + married + nodegree + re74 + re75,
data = imp.data, weights = "match.weight", method = "matching",
imp = ".imp")
```

```
## Balance summary across all imputations
## Type Min.Diff.Adj Mean.Diff.Adj Max.Diff.Adj
## age Contin. -0.0091 0.0444 0.0687
## educ Contin. -0.1344 -0.1180 -0.0833
## race_black Binary 0.3730 0.3730 0.3730
## race_hispan Binary -0.1730 -0.1659 -0.1568
## race_white Binary -0.2162 -0.2070 -0.2000
## married Binary -0.0216 -0.0086 0.0054
## nodegree Binary 0.0541 0.0681 0.0811
## re74 Contin. -0.1136 -0.0703 -0.0352
## re75 Contin. -0.0876 -0.0371 -0.0121
##
## Average sample sizes across imputations
## Control Treated
## All 429 185
## Matched 185 185
## Unmatched 244 0
```

First, we see a balance summary across all the imputations This table presents the minimum, mean, and maximum balance statistics for each variable across imputations. Setting `un = TRUE`

will also display the same values for the adjusted data set. With binary treatments, setting `disp.v.ratio = TRUE`

or `v.threshold`

to some number will display the same values for variance ratios. Setting `abs = TRUE`

will make `bal.tab`

report summaries of the absolute values of the balance statistics. This table functions in the same way as the table for balance across clusters. Below is the average sample size across clusters; in some matching and weighting schemes, the sample size (or effective sample size) may differ across imputations.

To view balance on individual imputations, you can specify an imputation number to `which.imp`

. To hide the balance summary across imputations, you can specify `imp.summary = FALSE`

.

```
bal.tab(treat ~ age + educ + race + married + nodegree + re74 + re75,
data = imp.data, weights = "match.weight", method = "matching",
imp = ".imp", which.imp = 1, imp.summary = FALSE)
```

```
## Balance by imputation
##
## - - - Imputation 1 - - -
## Balance Measures
## Type Diff.Adj
## age Contin. 0.0476
## educ Contin. -0.1317
## race_black Binary 0.3730
## race_hispan Binary -0.1622
## race_white Binary -0.2108
## married Binary 0.0000
## nodegree Binary 0.0757
## re74 Contin. -0.0493
## re75 Contin. -0.0121
##
## Sample sizes
## Control Treated
## All 429 185
## Matched 185 185
## Unmatched 244 0
## - - - - - - - - - - - - - -
```

As with clustered data, all `bal.tab()`

options work as with non-imputed data. Indeed, the functions for clustered and imputed data are nearly identical except that for imputed data, `bal.tab()`

computes the average sample size across imputations, whereas for clustered data, `bal.tab()`

computes the total sample size across groups.

Now, lets see how we might assess balance on the imputed data sets after generating an average propensity score for each unit across imputations:

```
#Compute the average propensity for each ID
imp.agg <- aggregate(ps ~ treat + .id, data = imp.data, FUN = mean)
names(imp.agg)[names(imp.agg) == "ps"] <- "ps.ave"
#Perform matching on the aggregated data
m.out.ave <- matchit(treat ~ ps.ave, data = imp.agg,
distance = imp.agg$ps.ave)
imp.agg$match.weight.ave <- m.out.ave$weights
```

To use `bal.tab()`

, we can merge the original long multiply imputed data set with the aggregated data set so that the averaged propensity scores and estimated matching weights are supplied to each unit. Then, we use `bal.tab()`

as usual, specifying an input to `imp`

.

```
#Merge the data sets; ps.ave and match.weight.ave will remain
imp.data <- merge(imp.data, imp.agg, all.x = TRUE)
bal.tab(treat ~ age + educ + race + married + nodegree + re74 + re75,
data = imp.data, weights = "match.weight.ave",
method = "matching", imp = ".imp")
```

```
## Balance summary across all imputations
## Type Min.Diff.Adj Mean.Diff.Adj Max.Diff.Adj
## age Contin. 0.0559 0.0559 0.0559
## educ Contin. -0.0887 -0.0887 -0.0887
## race_black Binary 0.3730 0.3730 0.3730
## race_hispan Binary -0.1676 -0.1676 -0.1676
## race_white Binary -0.2054 -0.2054 -0.2054
## married Binary -0.0162 -0.0130 0.0000
## nodegree Binary 0.0541 0.0541 0.0541
## re74 Contin. -0.1084 -0.0758 -0.0386
## re75 Contin. -0.0633 -0.0426 -0.0216
##
## Average sample sizes across imputations
## Control Treated
## All 429 185
## Matched 185 185
## Unmatched 244 0
```

For some variables, the minimum, mean, and maximum balance statistic will be the same. This will be the case for any variables that are not imputed, i.e., are the same across imputed data sets.

`WeightIt`

is well suited for estimating weights within multiple imputed data sets. Below is an example of estimating propensity score weights after multiple imputation using the same scenario as those above.

```
#Estimating the weights; by = ".imp" separates by imputation
library("WeightIt")
w.out <- weightit(treat ~ age + educ + race + married +
nodegree + re74 + re75, data = imp.data,
by = ".imp", estimand = "ATT")
#Checking balance on the output object
bal.tab(w.out, imp = ".imp")
```

```
## Call
## weightit(formula = treat ~ age + educ + race + married + nodegree +
## re74 + re75, data = imp.data, estimand = "ATT", by = ".imp")
##
## Balance summary across all imputations
## Type Min.Diff.Adj Mean.Diff.Adj Max.Diff.Adj
## prop.score Distance -0.0361 -0.0273 -0.0181
## age Contin. 0.0857 0.1008 0.1182
## educ Contin. -0.0375 -0.0304 -0.0234
## race_black Binary -0.0035 -0.0028 -0.0017
## race_hispan Binary -0.0002 0.0003 0.0007
## race_white Binary 0.0016 0.0024 0.0030
## married Binary 0.0141 0.0183 0.0219
## nodegree Binary 0.0125 0.0171 0.0202
## re74 Contin. -0.0045 0.0079 0.0227
## re75 Contin. -0.0254 0.0017 0.0132
##
## Average effective sample sizes across imputations
## Control Treated
## Unadjusted 429.000 185
## Adjusted 100.477 185
```

Setting `by = ".imp"`

requests that weights be estimated within each imputation separately. Because this variable exists within the `w.out`

object, we can simply refer to it by its name in the call to `bal.tab`

.

`bal.plot()`

`bal.plot()`

works with imputed data as it does with non-imputed data, except that multiple plots can be produced displaying balance for multiple imputations at a time. The arguments to `bal.plot()`

are the same as those for `bal.tab()`

, except that `imp.summary`

is absent. Below is an example of the use of `bal.plot()`

with imputed data, examining balance in the first imputation:

```
bal.plot(treat ~ age + educ + race + married + nodegree + re74 + re75,
data = imp.data, weights = "match.weight", method = "matching",
imp = ".imp", which.imp = 1, var.name = "age")
```

When many imputations are generated, it is recommended not to plot all at the same time by specifying an argument to `which.imp`

, as done above.

`love.plot()`

`love.plot()`

functions with imputed data as it does with clustered data. It is not recommended to display balance for multiple imputations at a time, and rather to display balance for just one representative imputation or to summarize balance across imputations. Below is an example of displaying balance for one imputation:

```
love.plot(bal.tab(treat ~ age + educ + race + married + nodegree + re74 + re75,
data = imp.data, weights = "match.weight", method = "matching",
imp = ".imp"), which.imp = 1,
var.order = "unadjusted", threshold = .2)
```

As with clustered data, it is also possible to examine the range of balance statistics for each covariate across imputations by setting `agg.fun = "range"`

:

```
love.plot(bal.tab(treat ~ age + educ + race + married + nodegree + re74 + re75,
data = imp.data, weights = "match.weight", method = "matching",
imp = ".imp"),
agg.fun = "range", threshold = .2)
```

Often these ranges will be small if the imputed data sets are very similar to each other, but the more imputations are generated, the wider the ranges tend to be.

There are cases when data is both clustered and has been multiply imputed. See Enders, Mistler, & Keller (2016) for a discussion of multiply imputed, multilevel data. You can assess balance on this type of data with `cobalt`

as well, though there may a lot to sift through (e.g., balance tables for each cluster for each imputation). Essentially, you can combine the syntax for clustered data and for imputed data to assess balance within and across clusters and imputations. For example, you may want to assess balance for each cluster but aggregating across imputations, or you may want to assess balance for each cluster within one imputation. Because there are many possible combinations, we will not describe them all here, but you should experiment with them as you see fit. In general, we recommend assessing balance aggregating across imputations, either within each cluster (e.g., for grouped data) or also aggregating across clusters (e.g., for multilevel data with many clusters), when possible.

To demonstrate, we will combine the approaches above by using the data sets with missingness that we imputed above but examining balance within each race group.

```
library("MatchIt"); library("cobalt"); library("mice")
data("lalonde_mis", package = "cobalt")
#Generate imputed data sets
m <- 5 #number of imputed data sets
imp <- mice(lalonde_mis, m = m, print = FALSE)
imp.data <- complete(imp, "long", include = FALSE)
#Estimate propensity scores and perform matching within each one
imp.data$match.weight <- rep(0, nrow(imp.data))
for (i in unique(imp.data$.imp)) {
in.imp <- imp.data$.imp == i
m.out <- matchit(treat ~ race*(age + educ + married + nodegree + re74 + re75),
data = imp.data[in.imp,], method = "nearest", exact = "race",
replace = TRUE, ratio = 2)
imp.data$match.weight[in.imp] <- m.out$weights
}
```

The object `imp.data`

contains our data set, which is grouped by both imputation and race. Note above that within each imputation, we performed the same analysis as we did with non-imputed clustered data. There are other ways to do this, such as by creating groupings for each imputation and each cluster.

`bal.tab()`

As mentioned above, there are many ways to examine balance in combinations of clusters and imputations. You will need to specify arguments to both `imp`

and `cluster`

, as well as to `which.imp`

, `which.cluster`

, `imp.summary`

, and `cluster.summary`

, if desired. The default is to display balance within each cluster aggregating over imputations and aggregating over imputations and clusters.

```
bal.tab(treat ~ age + educ + married + nodegree + re74 + re75,
data = imp.data, weights = "match.weight", method = "matching",
imp = ".imp", cluster = "race")
```

```
## Cluster balance summary across all imputations
##
## - - - Cluster: black - - -
## Balance summary across imputations
## Type Min.Diff.Adj Mean.Diff.Adj Max.Diff.Adj
## age Contin. -0.0496 0.0681 0.2845
## educ Contin. -0.1914 0.0140 0.1758
## married Binary 0.0032 0.0256 0.0705
## nodegree Binary -0.0545 0.0071 0.0833
## re74 Contin. -0.2705 -0.0565 0.0259
## re75 Contin. -0.0805 -0.0371 0.0114
##
## Average sample sizes across imputations
## Control Treated
## All 87.0000 156
## Matched 105.5741 156
## Matched (Unweighted) 78.6000 156
## Unmatched 8.4000 0
##
## - - - Cluster: hispan - - -
## Balance summary across imputations
## Type Min.Diff.Adj Mean.Diff.Adj Max.Diff.Adj
## age Contin. -0.0504 0.1894 0.4533
## educ Contin. -0.4799 -0.1728 0.0960
## married Binary -0.0455 0.0455 0.1364
## nodegree Binary 0.0000 0.0818 0.1818
## re74 Contin. -0.0689 0.1004 0.2327
## re75 Contin. -0.0115 0.1702 0.2612
##
## Average sample sizes across imputations
## Control Treated
## All 61.0000 11
## Matched 7.4443 11
## Matched (Unweighted) 17.4000 11
## Unmatched 43.6000 0
##
## - - - Cluster: white - - -
## Balance summary across imputations
## Type Min.Diff.Adj Mean.Diff.Adj Max.Diff.Adj
## age Contin. -0.4986 -0.2984 -0.0667
## educ Contin. -0.4832 -0.2618 0.0000
## married Binary -0.0833 -0.0444 0.0000
## nodegree Binary -0.0278 0.0722 0.1667
## re74 Contin. -0.0516 -0.0251 0.0148
## re75 Contin. -0.1214 -0.0725 -0.0073
##
## Average sample sizes across imputations
## Control Treated
## All 281.0000 18
## Matched 12.1816 18
## Matched (Unweighted) 29.2000 18
## Unmatched 251.8000 0
## - - - - - - - - - - - - - -
##
## Balance summary across all imputations and clusters
## Type Min.Diff.Adj Mean.Diff.Adj Max.Diff.Adj
## age Contin. -0.4986 -0.0136 0.4533
## educ Contin. -0.4832 -0.1402 0.1758
## married Binary -0.0833 0.0089 0.1364
## nodegree Binary -0.0545 0.0537 0.1818
## re74 Contin. -0.2705 0.0063 0.2327
## re75 Contin. -0.1214 0.0202 0.2612
##
## Total average sample sizes across imputations across clusters
## Control Treated
## All 429.0 185
## Matched 125.2 185
## Unmatched 303.8 0
```

There are essentially 9 combinations of `which.imp`

and `which.cluster`

to use to display balance. `i`

represents a vector (which may be of length 1) of imputation numbers, and `c`

represents a vector (which may be of length 1) of cluster names or indices. In general, `NA`

means to suppress display of individual clusters or imputations, and `NULL`

means to display all clusters or imputations.

`which.imp = NULL, which.cluster = NULL`

: For each imputation, balance within each cluster and across clusters**. For each cluster, balance across all imputations*. Balance across all imputations and clusters*.`which.imp = NULL, which.cluster = NA`

: For each imputation, balance across clusters. Balance across all imputations and clusters*.`which.imp = NULL, which.cluster = c`

: For each imputation, balance within the specified cluster(s) and across all clusters**. For each specified cluster, balance across all imputations*. Balance across all imputations and clusters*.`which.imp = NA, which.cluster = NULL`

: For each cluster, balance across all imputations. Balance across all imputations and clusters. (Default)`which.imp = NA, which.cluster = NA`

: Balance across all imputations and clusters.`which.imp = NA, which.cluster = c`

: For the specified cluster(s), balance across all imputations. Balance across all imputations and clusters.`which.imp = i, which.cluster = NULL`

: For the specified imputation(s), balance within each cluster and across clusters**. For each cluster, balance across all imputations*. Balance across all imputations and clusters*.`which.imp = i, which.cluster = NA`

: For the specified imputation(s), balance across clusters. Balance across all imputations and clusters*.`which.imp = i, which.cluster = c`

: For the specified imputation(s), balance within the specified cluster(s) and across all clusters**. For the specified cluster(s), balance across all imputations*. Balance across all imputations and clusters*.

* can be suppressed with `imp.summary = FALSE`

. ** can be suppressed with `cluster.summary = FALSE`

.

As mentioned above, in general, you may be most interested in (4) if individual imputations are not of interest or (7) if one imputation is to be used as an example. The default is to produce (4).

`bal.plot()`

As in other cases, `bal.plot()`

can display distributional balance for one variable at a time in either multiple clusters or multiple imputations (bot not both). The syntax for `bal.plot()`

is similar to that of `bal.tab()`

, except that `imp.summary`

and `cluster.summary`

are ignored. Below is an example of examining balance for all clusters in one imputation:

```
bal.plot(treat ~ age + educ + married + nodegree + re74 + re75,
data = imp.data, weights = "match.weight", method = "matching",
imp = ".imp", cluster = "race", which.imp = 1,
which.cluster = NULL, var.name = "age")
```

In `bal.plot()`

, `NULL`

and `NA`

mean the same thing when used as arguments to `which.imp`

and `which.cluster`

.

`love.plot()`

As in other cases, `love.plot()`

can display balance for individual groupings or aggregated across groupings, but with clustered and multiply imputed data, it is possible to display balance for each cluster aggregated across imputations or for each imputation aggregated across clusters. There are 9 ways of producing a plot based on the arguments to `bal.tab()`

or `love.plot()`

:

`which.imp = NULL, which.cluster = NULL`

: Not allowed.`which.imp = NULL, which.cluster = NA`

: Balance aggregated across clusters for each imputation.`which.imp = NULL, which.cluster = c`

: Balance in cluster`c`

for each imputation.`c`

must be of length 1.`which.imp = NA, which.cluster = NULL`

: Balance aggregated across imputations for each cluster.`which.imp = NA, which.cluster = NA`

: Balance aggregated across imputations and clusters.`which.imp = NA, which.cluster = c`

: Balance aggregated across imputations for cluster`c`

.`c`

must be of length 1.`which.imp = i, which.cluster = NULL`

: Balance for imputation`i`

for each cluster.`i`

must be of length 1.`which.imp = i, which.cluster = NA`

: Balance for imputation`i`

aggregated across clusters.`which.imp = i, which.cluster = c`

: Balance for imputation`i`

for cluster`c`

. Either`c`

or`i`

must be of length 1.

Below are examples of (4) and (7):

```
#4)
love.plot(bal.tab(treat ~ age + educ + married + nodegree + re74 + re75,
data = imp.data, weights = "match.weight",
method = "matching", imp = ".imp", cluster = "race"),
which.imp = NA, which.cluster = NULL,
agg.fun = "range")
```

```
#7)
love.plot(bal.tab(treat ~ age + educ + married + nodegree + re74 + re75,
data = imp.data, weights = "match.weight",
method = "matching", imp = ".imp", cluster = "race"),
which.imp = 1, which.cluster = NULL)
```

All configurations other than (3), (6), (7), and (9) involve aggregating, and so an argument to `agg.fun`

should be specified (unless its default,`"mean"`

, is desired); we recommend `"range"`

in most cases.

We have demonstrated the use of `cobalt`

with clustered data, multiply imputed, and multiply imputed clustered data. Though there are no published recommendations for the display of balance in these cases, we believe these tools may encourage development in this area. In general, we believe in displaying the most relevant information as compactly as possible, and thus recommend using `love.plot()`

with some degree of aggregation for inclusion in published work.

Arpino, B., & Mealli, F. (2011). The specification of the propensity score in multilevel observational studies. Computational Statistics & Data Analysis, 55(4), 1770–1780. https://doi.org/10.1016/j.csda.2010.11.008

Cham, H., & West, S. G. (2016). Propensity Score Analysis With Missing Data. Psychological Methods. https://doi.org/10.1037/met0000076

Enders, C. K., Mistler, S. A., & Keller, B. T. (2016). Multilevel multiple imputation: A review and evaluation of joint modeling and chained equations imputation. Psychological Methods, 21(2), 222–240. https://doi.org/10.1037/met0000063

Green, K. M., & Stuart, E. A. (2014). Examining moderation analyses in propensity score methods: Application to depression and substance use. Journal of Consulting and Clinical Psychology, 82(5), 773–783. https://doi.org/10.1037/a0036515

Hughes, J. N., Chen, Q., Thoemmes, F., & Kwok, O. (2010). An Investigation of the Relationship Between Retention in First Grade and Performance on High Stakes Tests in Third Grade. Educational Evaluation and Policy Analysis, 32(2), 166–182. https://doi.org/10.3102/0162373710367682

Leite, W. L., Jimenez, F., Kaya, Y., Stapleton, L. M., MacInnes, J. W., & Sandbach, R. (2015). An Evaluation of Weighting Methods Based on Propensity Scores to Reduce Selection Bias in Multilevel Observational Studies. Multivariate Behavioral Research, 50(3), 265–284. https://doi.org/10.1080/00273171.2014.991018

McNeish, D., Stapleton, L. M., & Silverman, R. D. (2017). On the unnecessary ubiquity of hierarchical linear modeling. Psychological Methods, 22(1), 114–140. https://doi.org/10.1037/met0000078

Mitra, R., & Reiter, J. P. (2016). A comparison of two methods of estimating propensity scores after multiple imputation. Statistical Methods in Medical Research, 25(1), 188–204. https://doi.org/10.1177/0962280212445945

Thoemmes, F. J., & West, S. G. (2011). The Use of Propensity Scores for Nonrandomized Designs With Clustered Data. Multivariate Behavioral Research, 46(3), 514–543. https://doi.org/10.1080/00273171.2011.569395

van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3). Retrieved from http://doc.utwente.nl/78938/

Except when

`quick = TRUE`

. In this case, if`cluster.summary = FALSE`

, it will not be computed, though balance for all individual groups will be computed even if not requested.↩If some outputs are suppressed, they can still be displayed later with the

`print()`

command, which allows you to respecify display options for printing saved`bal.tab`

objects. Thus, if balance is to be checked on the same data set multiple times in different ways, it is advisable to save the output of`bal.tab()`

to a variable and then use`print()`

on that variable with the desired display options. Note that if`quick = TRUE`

, not all display options will be available with this method.↩