Another one of the most common tables in medical literature includes summary statistics for a set of variables paired across two time points. Locally at Mayo, the SAS macro `%paired` was written to create summary tables with a single call. With the increasing interest in R, we have developed the function `paired()` to create similar tables within the R environment.

This vignette is light on purpose; `paired()` piggybacks off of `tableby()`, so most documentation there applies here, too.

The first step when using the `paired()` function is to load the `arsenal` package. We can’t use `mockstudy` here because we need a dataset with paired observations, so we’ll create our own dataset.

```
library(arsenal)
dat <- data.frame(
  tp = paste0("Time Point ", c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2)),
  id = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6),
  Cat = c("A", "A", "A", "B", "B", "B", "B", "A", NA, "B"),
  Fac = factor(c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A")),
  Num = c(1, 2, 3, 4, 4, 3, 3, 4, 0, NA),
  Ord = ordered(c("I", "II", "II", "III", "III", "III", "I", "III", "II", "I")),
  Lgl = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE),
  Dat = as.Date("2018-05-01") + c(1, 1, 2, 2, 3, 4, 5, 6, 3, 4),
  stringsAsFactors = FALSE
)
```

To create a simple table stratified by time point, use a `formula=` statement to specify the variables that you want summarized and the `id=` argument to specify the paired observations.

```
p <- paired(tp ~ Cat + Fac + Num + Ord + Lgl + Dat, data = dat, id = id, signed.rank.exact = FALSE)
summary(p)
```

| | Time Point 1 (N=4) | Time Point 2 (N=4) | Difference (N=4) | p value |
|---|---|---|---|---|
| Cat | | | | 1.000 |
| A | 2 (50.0%) | 2 (50.0%) | 1 (50.0%) | |
| B | 2 (50.0%) | 2 (50.0%) | 1 (50.0%) | |
| Fac | | | | 0.261 |
| A | 2 (50.0%) | 1 (25.0%) | 2 (100.0%) | |
| B | 1 (25.0%) | 2 (50.0%) | 1 (100.0%) | |
| C | 1 (25.0%) | 1 (25.0%) | 1 (100.0%) | |
| Num | | | | 0.391 |
| Mean (SD) | 2.750 (1.258) | 3.250 (0.957) | 0.500 (1.000) | |
| Range | 1.000 - 4.000 | 2.000 - 4.000 | -1.000 - 1.000 | |
| Ord | | | | 0.174 |
| I | 2 (50.0%) | 0 (0.0%) | 2 (100.0%) | |
| II | 1 (25.0%) | 1 (25.0%) | 1 (100.0%) | |
| III | 1 (25.0%) | 3 (75.0%) | 0 (0.0%) | |
| Lgl | | | | 1.000 |
| FALSE | 2 (50.0%) | 1 (25.0%) | 2 (100.0%) | |
| TRUE | 2 (50.0%) | 3 (75.0%) | 1 (50.0%) | |
| Dat | | | | 0.182 |
| median | 2018-05-03 | 2018-05-04 | 0.500 | |
| Range | 2018-05-02 - 2018-05-06 | 2018-05-02 - 2018-05-07 | 0.000 - 1.000 | |

The third column shows the difference between time point 1 and time point 2. For categorical variables, it reports the percent of observations from time point 1 which changed in time point 2.
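The "Difference" entries for `Cat` and `Num` can be reproduced by hand. The following is a minimal base R sketch, not the code `paired()` runs internally; it uses only the four complete pairs (ids 1 to 4) from the example data, and the object names (`pairs`, `num_diff`, `pct_changed`) are made up for illustration.

```r
# The four complete pairs (ids 1 to 4) from the example data above.
pairs <- data.frame(
  id  = rep(1:4, each = 2),
  tp  = rep(c("Time Point 1", "Time Point 2"), times = 4),
  Cat = c("A", "A", "A", "B", "B", "B", "B", "A"),
  Num = c(1, 2, 3, 4, 4, 3, 3, 4),
  stringsAsFactors = FALSE
)

# Split by time point and line the rows up by id.
tp1 <- pairs[pairs$tp == "Time Point 1", ]
tp2 <- pairs[pairs$tp == "Time Point 2", ]
tp2 <- tp2[match(tp1$id, tp2$id), ]

# Numeric variable: summarize the within-pair differences (time 2 minus time 1).
num_diff <- tp2$Num - tp1$Num
c(mean = mean(num_diff), sd = sd(num_diff))  # mean 0.5, sd 1
range(num_diff)                              # -1 1

# Categorical variable: percent of each time point 1 level that changed.
changed <- tp1$Cat != tp2$Cat
pct_changed <- 100 * tapply(changed, tp1$Cat, mean)
pct_changed                                  # A 50, B 50
```

These values match the `Num` row (0.500 (1.000), range -1.000 - 1.000) and the `Cat` rows (50.0% changed for both A and B) of the table.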

Note that by default, observations which do not have both time points are removed. This is easily changed using the `na.action = na.paired("<arg>")` argument. For example:

```
p <- paired(tp ~ Cat + Fac + Num + Ord + Lgl + Dat, data = dat, id = id,
            signed.rank.exact = FALSE, na.action = na.paired("fill"))
summary(p)
```

| | Time Point 1 (N=6) | Time Point 2 (N=6) | Difference (N=6) | p value |
|---|---|---|---|---|
| Cat | | | | 1.000 |
| N-Miss | 2 | 1 | 2 | |
| A | 2 (50.0%) | 2 (40.0%) | 1 (50.0%) | |
| B | 2 (50.0%) | 3 (60.0%) | 1 (50.0%) | |
| Fac | | | | 0.261 |
| N-Miss | 1 | 1 | 2 | |
| A | 2 (40.0%) | 2 (40.0%) | 2 (100.0%) | |
| B | 1 (20.0%) | 2 (40.0%) | 1 (100.0%) | |
| C | 2 (40.0%) | 1 (20.0%) | 1 (100.0%) | |
| Num | | | | 0.391 |
| N-Miss | 1 | 2 | 2 | |
| Mean (SD) | 2.200 (1.643) | 3.250 (0.957) | 0.500 (1.000) | |
| Range | 0.000 - 4.000 | 2.000 - 4.000 | -1.000 - 1.000 | |
| Ord | | | | 0.174 |
| N-Miss | 1 | 1 | 2 | |
| I | 2 (40.0%) | 1 (20.0%) | 2 (100.0%) | |
| II | 2 (40.0%) | 1 (20.0%) | 1 (100.0%) | |
| III | 1 (20.0%) | 3 (60.0%) | 0 (0.0%) | |
| Lgl | | | | 1.000 |
| N-Miss | 1 | 1 | 2 | |
| FALSE | 3 (60.0%) | 2 (40.0%) | 2 (100.0%) | |
| TRUE | 2 (40.0%) | 3 (60.0%) | 1 (50.0%) | |
| Dat | | | | 0.182 |
| N-Miss | 1 | 1 | 2 | |
| median | 2018-05-04 | 2018-05-05 | 0.500 | |
| Range | 2018-05-02 - 2018-05-06 | 2018-05-02 - 2018-05-07 | 0.000 - 1.000 | |

For more details, see the help page for `na.paired()`.
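To see where the column counts N=4 and N=6 come from, it can help to count the ids by hand. This is a base R sketch of the bookkeeping only, not of how `na.paired()` is implemented; the variable names are illustrative.

```r
# Time point and id columns from the example data above.
tp <- paste0("Time Point ", c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2))
id <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6)

ids_tp1 <- id[tp == "Time Point 1"]
ids_tp2 <- id[tp == "Time Point 2"]

# ids observed at both time points: the pairs kept by default.
complete_ids <- intersect(ids_tp1, ids_tp2)
length(complete_ids)            # 4

# ids observed at either time point: the pairs present after filling.
all_ids <- union(ids_tp1, ids_tp2)
length(all_ids)                 # 6

# ids that are missing one of their two rows.
setdiff(all_ids, complete_ids)  # 5 6
```

Ids 5 and 6 each appear at only one time point, which is why they drop out of the default table and show up as missing values once the data are filled.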

The tests used to calculate p-values differ by the variable type, but can be specified explicitly in the formula statement or in the control function.

The following tests are accepted:

- `paired.t`: A paired t-test.
- `mcnemar`: McNemar’s test.
- `signed.rank`: The signed-rank test.
- `sign.test`: The sign test.
- `notest`: Don’t perform a test.
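As a hedged illustration of the control-function route, the sketch below overrides the default tests by passing `numeric.test` and `cat.test` through `paired()`, the same way `signed.rank.exact` is passed in the calls above. It assumes these arguments are forwarded to `paired.control()`; the dataset `dat2` and the object name `p_tests` are made up for this example.

```r
library(arsenal)

# Small paired dataset (the complete pairs from the example data above).
dat2 <- data.frame(
  tp  = rep(c("Time Point 1", "Time Point 2"), times = 4),
  id  = rep(1:4, each = 2),
  Cat = c("A", "A", "A", "B", "B", "B", "B", "A"),
  Num = c(1, 2, 3, 4, 4, 3, 3, 4),
  stringsAsFactors = FALSE
)

# Use the sign test for numeric variables and skip the test for
# categorical ones, instead of the defaults ("paired.t" and "mcnemar").
p_tests <- paired(tp ~ Cat + Num, data = dat2, id = id,
                  numeric.test = "sign.test", cat.test = "notest")
summary(p_tests, text = TRUE)
```

Setting the test by variable type keeps the formula unchanged, which is convenient when the same choice should apply to every numeric or categorical variable in the table.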

`paired.control` settings

A quick way to see what arguments are possible to utilize in a function is to use the `args()` command. Settings involving the number of digits can be set in `paired.control` or in `summary.tableby`.

`args(paired.control)`

```
## function (test = TRUE, diff = TRUE, test.pname = NULL, numeric.test = "paired.t",
## cat.test = "mcnemar", ordered.test = "signed.rank", date.test = "paired.t",
## numeric.stats = c("Nmiss", "meansd", "range"), cat.stats = c("Nmiss",
## "countpct"), ordered.stats = c("Nmiss", "countpct"),
## date.stats = c("Nmiss", "median", "range"), stats.labels = list(Nmiss = "N-Miss",
## Nmiss2 = "N-Miss", meansd = "Mean (SD)", medianq1q3 = "Median (Q1, Q3)",
## q1q3 = "Q1, Q3", range = "Range", countpct = "Count (Pct)"),
## digits = 3L, digits.count = 0L, digits.p = 3L, format.p = TRUE,
## conf.level = 0.95, mcnemar.correct = TRUE, signed.rank.exact = NULL,
## signed.rank.correct = TRUE, ...)
## NULL
```

`summary.tableby` settings

Since the “paired” object inherits from “tableby”, the `summary.tableby` function is what’s actually used to format and print the table.

`args(arsenal:::summary.tableby)`

```
## function (object, ..., labelTranslations = NULL, text = FALSE,
## title = NULL, pfootnote = FALSE, term.name = "")
## NULL
```
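A short sketch of how these summary arguments might be combined follows. The arguments `text`, `title`, and `pfootnote` appear in the signature above; passing `digits` and `digits.p` through `...` relies on the earlier statement that digit settings can be given in `summary.tableby`. The dataset `dat3`, the object name `p_fmt`, and the title text are illustrative.

```r
library(arsenal)

# Small paired dataset with one numeric variable.
dat3 <- data.frame(
  tp  = rep(c("Time Point 1", "Time Point 2"), times = 4),
  id  = rep(1:4, each = 2),
  Num = c(1, 2, 3, 4, 4, 3, 3, 4)
)

p_fmt <- paired(tp ~ Num, data = dat3, id = id)

# Plain-text output with a title, a footnote naming the test used,
# and fewer digits than the defaults.
summary(p_fmt, text = TRUE, title = "Paired summary of Num",
        pfootnote = TRUE, digits = 1, digits.p = 2)
```

Setting the digits at the `summary()` step changes only the formatting, so the same fitted object can be printed several ways without calling `paired()` again.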