How the strata and lodes at each axis are ordered, and how to control their order, is a complicated but essential part of **ggalluvial**’s functionality. This vignette explains the motivations behind the implementation and explores the functionality in greater detail than the examples.

All of the functionality discussed in this vignette is exported by **ggalluvial**. We’ll also need a toy data set to play with. I conjured the data frame `toy` to be nearly as small as possible while complex enough to illustrate the positional controls:

```
# toy data set
set.seed(0)
toy <- data.frame(
  subject = rep(LETTERS[1:5], times = 4),
  collection = rep(1:4, each = 5),
  category = rep(
    sample(c("X", "Y"), 16, replace = TRUE),
    rep(c(1, 2, 1, 1), times = 4)
  ),
  class = c("one", "one", "one", "two", "two")
)
print(toy)
```

```
## subject collection category class
## 1 A 1 Y one
## 2 B 1 X one
## 3 C 1 X one
## 4 D 1 X two
## 5 E 1 Y two
## 6 A 2 Y one
## 7 B 2 X one
## 8 C 2 X one
## 9 D 2 Y two
## 10 E 2 Y two
## 11 A 3 Y one
## 12 B 3 Y one
## 13 C 3 Y one
## 14 D 3 X two
## 15 E 3 X two
## 16 A 4 X one
## 17 B 4 Y one
## 18 C 4 Y one
## 19 D 4 X two
## 20 E 4 Y two
```

The subjects are classified into categories at each collection point but are also members of fixed classes. Here’s how **ggalluvial** visualizes these data under default settings:
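A minimal sketch of such a default-settings plot, assuming the `geom_alluvium()` and `geom_stratum()` layers with no ordering parameters set. The `category` values are copied from the printed data frame rather than resampled, since `sample()` can return different draws under other R versions.

```r
library(ggplot2)
library(ggalluvial)

# toy data, with the category values copied from the printed table above
toy <- data.frame(
  subject = rep(LETTERS[1:5], times = 4),
  collection = rep(1:4, each = 5),
  category = c("Y", "X", "X", "X", "Y",  "Y", "X", "X", "Y", "Y",
               "Y", "Y", "Y", "X", "X",  "X", "Y", "Y", "X", "Y"),
  class = c("one", "one", "one", "two", "two")
)

# alluvial diagram with all ordering parameters left at their defaults
# (the choice of layers here is an assumption)
p <- ggplot(toy, aes(x = collection, stratum = category, alluvium = subject)) +
  geom_alluvium(aes(fill = class)) +
  geom_stratum()
print(p)
```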

The amount of control the stat layers `stat_alluvium()` and `stat_flow()` exert over the positional aesthetics of graphical objects (grobs) is unusual by the standards of **ggplot2** and many of its extensions. In the layered grammar of graphics framework, the role of a statistical transformation is usually to summarize the original data, for example by binning (`stat_bin()`) or by calculating quantiles (`stat_qq()`). These transformed data are *then* sent to geom layers for positioning. The positions of grobs may be adjusted after the statistical transformation, for example when points are jittered (`geom_jitter()`), but the numerical data communicated by the plot are still the product of the stat.

**ggalluvial** works differently. The stat layers convert repeated measures data into the coordinates for a sequence of stacked bar plots; the geom layers then render rectangles and splines, using these coordinates as guides. Thus, the results of the statistical transformation are not so much intrinsically meaningful as they are the underpinning of an interpretable plot annotation. In this way, the layers of **ggalluvial** behave like `stat_ellipse()` and its default `geom_path()`: `StatEllipse` transforms point cloud data into a set of coordinates on a confidence (or other) ellipse in sequential order, which are then connected by line segments to mimic a smooth ellipse using `GeomPath`.

There are two key reasons for this behavior:

- The coordinates returned by some stat layers can be coupled with multiple geom layers. For example, all four geoms can couple with the `alluvium` stat. Moreover, as showcased in the examples, the stats can also meaningfully couple with exogenous geoms like `text`, `pointrange`, and `errorbar`. (In principle, the geoms could also couple with exogenous stats, but I haven’t done this or seen it in the wild.)
- Different parameters control the calculations of the coordinates (e.g. `aes.bind` and `aggregate.y`) and the rendering of the graphical elements (`width`, `knot.pos`, and `aes.flow`), and it makes intuitive sense to handle these separately. For example, the heights of the strata and lodes convey information about the underlying data, whereas their widths are arbitrary.

(If the data are provided in alluvia format, then `Stat*$setup_data()` converts them to lodes format in preparation for the main transformation. This can be done manually using the exported conversion functions, and this vignette will assume the data are already in lodes format.)
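As a rough illustration of the manual conversion, the sketch below reshapes a small, hypothetical alluvia-format data frame (`wide`, invented for this example) with `to_lodes_form()`. It assumes the default output column names `alluvium`, `x`, and `stratum`, which may differ across package versions.

```r
library(ggalluvial)

# hypothetical alluvia-format data: one row per subject, one column per axis
wide <- data.frame(
  subject = c("A", "B", "C"),
  t1 = c("X", "Y", "Y"),
  t2 = c("Y", "Y", "X")
)

# reshape to lodes format: one row per subject per axis
# (columns 2 and 3 are the axis variables)
lodes <- to_lodes_form(wide, axes = 2:3)
print(lodes)
```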

Each stat layer demarcates one stack for each data collection point and one rectangle within each stack for each (non-empty) category. In **ggalluvial** terms, the collection points are axes and the rectangles are strata or lodes.

To generate a sequence of stacked bar plots with no connecting flows, only the aesthetics `x` and `stratum` are required:

```
# collection point and category variables only
data <- setNames(toy[, 2:3], c("x", "stratum"))
# required fields for stat transformations
data$y <- 1
data$PANEL <- 1
# stratum transformation
StatStratum$compute_panel(data)
```

```
## x stratum y PANEL ymin ymax
## 2 1 Y 1.0 1 0 2
## 1 1 X 3.5 1 2 5
## 4 2 Y 1.5 1 0 3
## 3 2 X 4.0 1 3 5
## 6 3 Y 1.5 1 0 3
## 5 3 X 4.0 1 3 5
## 8 4 Y 1.5 1 0 3
## 7 4 X 4.0 1 3 5
```

Comparing this output to `toy`, notice first that the data have been aggregated: Each distinct combination of `x` and `stratum` occupies only one row. `x` encodes the axes and is subject to layers specific to this positional aesthetic, e.g. `scale_x_*()` transformations. `ymin` and `ymax` are the lower and upper bounds of the rectangles, and `y` gives their vertical centers. Each stacked rectangle begins where the one below it ends, and their heights are the numbers of subjects (or the totals of their `y` values, if `y` is passed a numerical variable) that take the corresponding category value at the corresponding collection point.
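The stacking rule just described can be checked by hand. The following base R sketch is not the package's implementation: `stack_strata()` is a hypothetical helper that counts subjects per axis and category, reverses the category order, and accumulates the counts from the bottom of each stack. Unlike the stat, it would keep empty categories as zero-height rows.

```r
# category values copied from the printed toy data frame
collection <- rep(1:4, each = 5)
category <- c("Y", "X", "X", "X", "Y",  "Y", "X", "X", "Y", "Y",
              "Y", "Y", "Y", "X", "X",  "X", "Y", "Y", "X", "Y")

# hypothetical helper, for illustration only
stack_strata <- function(x, stratum, reverse = TRUE) {
  # heights: number of subjects per stratum (rows) and axis (columns)
  heights <- unclass(table(stratum, x))
  if (reverse) heights <- heights[rev(rownames(heights)), , drop = FALSE]
  # upper bounds accumulate from the bottom of each stack
  ymax <- apply(heights, 2, cumsum)
  ymin <- ymax - heights
  data.frame(
    x = rep(colnames(heights), each = nrow(heights)),
    stratum = rep(rownames(heights), times = ncol(heights)),
    y = as.vector((ymin + ymax) / 2),
    ymin = as.vector(ymin),
    ymax = as.vector(ymax)
  )
}

strata <- stack_strata(collection, category)
print(strata)
```

For the toy data, the `y`, `ymin`, and `ymax` values agree with the `StatStratum$compute_panel(data)` output shown above.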

Here’s the plot this strata-only transformation yields:

```
ggplot(toy, aes(x = collection, stratum = category)) +
  stat_stratum() +
  stat_stratum(geom = "text", aes(label = category))
```

In this vignette, I’ll use the `stat_*()` functions to add layers, so that the parameters that control their behavior are accessible via tab-completion.

Within each axis, `stratum` defaults to reverse order so that the bars proceed in the original order from top to bottom. This can be overridden by setting `reverse = FALSE` in `stat_stratum()`:

```
# stratum transformation with strata in original order
StatStratum$compute_panel(data, reverse = FALSE)
```

```
## x stratum y PANEL ymin ymax
## 1 1 X 1.5 1 0 3
## 2 1 Y 4.0 1 3 5
## 3 2 X 1.0 1 0 2
## 4 2 Y 3.5 1 2 5
## 5 3 X 1.0 1 0 2
## 6 3 Y 3.5 1 2 5
## 7 4 X 1.0 1 0 2
## 8 4 Y 3.5 1 2 5
```

```
ggplot(toy, aes(x = collection, stratum = category)) +
  stat_stratum(reverse = FALSE) +
  stat_stratum(geom = "text", aes(label = category), reverse = FALSE)
```

The caveat to this is that, *if `reverse` is declared in any layer, then it must be declared in every layer*, so that the layers will not be misaligned. This includes any `alluvium`, `flow`, and `lode` layers, since their graphical elements are organized within the bounds of the strata.

When the strata are defined by a factor variable, they default to the order of the factor. This can be overridden by the `decreasing` parameter, which defaults to `NA` but can be set to `TRUE` or `FALSE` to arrange the strata in decreasing or increasing order in the `y` direction:

```
# stratum transformation with strata in decreasing order
StatStratum$compute_panel(data, decreasing = TRUE)
```

```
ggplot(toy, aes(x = collection, stratum = category)) +
  stat_stratum(decreasing = TRUE) +
  stat_stratum(geom = "text", aes(label = category), decreasing = TRUE)
```

The same caveat applies to `decreasing` as to `reverse`: Make sure that all layers using alluvial stats are passed the same values! Henceforth, we’ll use the default (reverse and categorical) ordering of the strata themselves.

In the strata-only plot, each subject is represented once at each axis. *Alluvia* are x-splines that connect these multiple representations of the same subjects across the axes. In order to avoid having these splines overlap at the axes, the `alluvium` stat must stack the alluvial cohorts (subsets of subjects who have a common profile across all axes) within each stratum. These smaller cohort-specific rectangles are the *lodes*. This calculation requires the additional `alluvium` aesthetic, which identifies common subjects across the axes:

```
# collection point, category, and subject variables
data <- setNames(toy[, 1:3], c("alluvium", "x", "stratum"))
# required fields for stat transformations
data$y <- 1
data$PANEL <- 1
# alluvium transformation
StatAlluvium$compute_panel(data)
```

```
## alluvium x stratum y PANEL ymax ymin group
## 1 1 1 Y 0.5 1 1 0 1
## 2 2 1 X 3.5 1 4 3 2
## 3 3 1 X 4.5 1 5 4 3
## 4 4 1 X 2.5 1 3 2 4
## 5 5 1 Y 1.5 1 2 1 5
## 6 1 2 Y 0.5 1 1 0 1
## 7 2 2 X 3.5 1 4 3 2
## 8 3 2 X 4.5 1 5 4 3
## 9 4 2 Y 2.5 1 3 2 4
## 10 5 2 Y 1.5 1 2 1 5
## 11 1 3 Y 2.5 1 3 2 1
## 12 2 3 Y 0.5 1 1 0 2
## 13 3 3 Y 1.5 1 2 1 3
## 14 4 3 X 4.5 1 5 4 4
## 15 5 3 X 3.5 1 4 3 5
## 16 1 4 X 3.5 1 4 3 1
## 17 2 4 Y 0.5 1 1 0 2
## 18 3 4 Y 1.5 1 2 1 3
## 19 4 4 X 4.5 1 5 4 4
## 20 5 4 Y 2.5 1 3 2 5
```

The transformed data now contain *one row per cohort* (instead of per category) *per collection point*. The vertical positional aesthetics describe the lodes rather than the strata, and the `group` variable encodes the alluvia (a convenience for the geom layer).

Here’s how this transformation translates into the alluvial diagram that began the vignette:

```
ggplot(toy, aes(x = collection, stratum = category, alluvium = subject)) +
  stat_alluvium(aes(fill = class)) +
  stat_stratum() +
  stat_stratum(geom = "text", aes(label = category))
```

The `flow` stat differs from the `alluvium` stat by allowing the orders of the lodes within strata to differ from one side of an axis to the other. Put differently, the `flow` stat allows *mixing* at the axes, rather than requiring that each case or cohort follow a continuous trajectory from one end of the diagram to the other. As a result, flow diagrams are often much clearer, with the trade-off that cases and cohorts cannot be tracked through them.
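The flow transformation can be applied to the same prepared data. This sketch assumes that `StatFlow$compute_panel()` accepts the prepared data frame in the same way as `StatAlluvium$compute_panel()` above; the `category` values are copied from the printed toy data, and the row order and columns of the result may differ across package versions.

```r
library(ggalluvial)

# toy data in lodes format, with the fields required by the stat
data <- data.frame(
  alluvium = rep(LETTERS[1:5], times = 4),
  x = rep(1:4, each = 5),
  stratum = c("Y", "X", "X", "X", "Y",  "Y", "X", "X", "Y", "Y",
              "Y", "Y", "Y", "X", "X",  "X", "Y", "Y", "X", "Y"),
  y = 1,
  PANEL = 1
)

# flow transformation
flow_data <- StatFlow$compute_panel(data)
print(flow_data)
```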

```
## alluvium x stratum PANEL side y group ymin ymax
## 15 8 1 Y 1 start 1.0 8 0 2
## 17 9 1 X 1 start 2.5 9 2 3
## 19 10 1 X 1 start 4.0 10 3 5
## 16 8 2 Y 1 end 1.0 8 0 2
## 18 9 2 Y 1 end 2.5 9 2 3
## 20 10 2 X 1 end 4.0 10 3 5
## 9 5 2 Y 1 start 0.5 5 0 1
## 13 7 2 Y 1 start 2.0 7 1 3
## 11 6 2 X 1 start 4.0 6 3 5
## 10 5 3 Y 1 end 0.5 5 0 1
## 12 6 3 Y 1 end 2.0 6 1 3
## 14 7 3 X 1 end 4.0 7 3 5
## 1 1 3 Y 1 start 1.0 1 0 2
## 5 3 3 Y 1 start 2.5 3 2 3
## 3 2 3 X 1 start 3.5 2 3 4
## 7 4 3 X 1 start 4.5 4 4 5
## 2 1 4 Y 1 end 1.0 1 0 2
## 4 2 4 Y 1 end 2.5 2 2 3
## 6 3 4 X 1 end 3.5 3 3 4
## 8 4 4 X 1 end 4.5 4 4 5
```

The `flow` stat transformation yields *one row per cohort per side per flow*. Each interior axis appears twice in the data, once for the incoming flow and once for the outgoing flow. (The starting and ending axes only have rows for outgoing and incoming flows, respectively.) Here is the flow version of the preceding alluvial diagram:

```
ggplot(toy, aes(x = collection, stratum = category, alluvium = subject)) +
  stat_stratum() +
  stat_flow(aes(fill = class)) +
  stat_stratum(geom = "text", aes(label = category))
```

Note: The `aes.flow` parameter tells `geom_flow()` how flows should inherit differentiation aesthetics from adjacent axes: `"forward"` or `"backward"`. It does *not* influence their positions.

As the number of strata at each axis grows, heterogeneous cases or cohorts can produce highly complex alluvia and very messy diagrams. **ggalluvial** mitigates this by strategically arranging the lodes (the intersections of the alluvia with the strata) so as to minimize their crossings between adjacent axes. This strategy is executed locally: Within each axis \(i\), the order of the lodes is guided by the orders of the strata at *all* axes, starting with \(i\) (so that the lodes are actually positioned within the correct strata). The order in which the remaining axes are factored into this calculation is determined by the *lode guidance function*. (Because flows do not extend beyond two adjacent axes, the `flow` stat cannot make use of lode guidance functions.)

A lode guidance function can be passed to the `lode.guidance` parameter, which defaults to `"zigzag"`. This function puts the nearest (adjacent) axes first, then zigzags outward from there:

```
## [1] 1 2 3 4
## [1] 2 1 3 4
## [1] 3 4 2 1
## [1] 4 3 2 1
```
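The four orderings printed above (one per starting axis \(i\) of four axes) are consistent with the following base R sketch. `zigzag_order()` is a hypothetical re-implementation for illustration, not the package's own function; in particular, the rule that the zigzag steps first toward the nearer end of the diagram is an assumption inferred from the printed output, and it has only been checked against the four-axis case.

```r
# hypothetical re-implementation of zigzag guidance, for illustration only
zigzag_order <- function(n, i) {
  # step first toward the nearer end: left if i is in the left half
  first <- if (i <= n / 2) -1 else 1
  # interleave offsets on alternating sides: first, -first, 2*first, ...
  offsets <- as.vector(rbind(seq_len(n) * first, seq_len(n) * -first))
  candidates <- i + offsets
  # keep only the positions that are actual axes
  c(i, candidates[candidates >= 1 & candidates <= n])
}

for (i in 1:4) print(zigzag_order(4, i))
```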

Four alternative `lode_*()` functions are available: `"frontback"` and `"backfront"`, which behave like `"zigzag"` but extend completely in one outward direction from axis \(i\) before the other; and `"forward"` and `"backward"`, which put the remaining axes in increasing and decreasing order. Two are illustrated below:

```
## [1] 1 2 3 4
## [1] 2 1 3 4
## [1] 3 2 1 4
## [1] 4 3 2 1
```

```
ggplot(toy, aes(x = collection, stratum = category, alluvium = subject)) +
  stat_alluvium(aes(fill = class), lode.guidance = "backfront") +
  stat_stratum() +
  stat_stratum(geom = "text", aes(label = category))
```

The difference between `"backfront"` guidance and `"zigzag"` guidance can be seen in the order of the lodes of the `"Y"` stratum at axis `3`: Whereas `"zigzag"` minimized the crossings between axes `3` and `4`, locating the distinctive class-`"one"` case above the others, `"backfront"` minimized the crossings between axes `2` and `3` (axis `2` being immediately before axis `3`), locating this case below the others.

```
## [1] 1 4 3 2
## [1] 2 4 3 1
## [1] 3 4 2 1
## [1] 4 3 2 1
```

```
ggplot(toy, aes(x = collection, stratum = category, alluvium = subject)) +
  stat_alluvium(aes(fill = class), lode.guidance = "backward") +
  stat_stratum() +
  stat_stratum(geom = "text", aes(label = category))
```

The effect of `"backward"` guidance is to keep the right part of the diagram as tidy as possible while allowing the left part to become as messy as necessary. (`"forward"` has the opposite effect.)
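The four alternative orderings can be written out compactly in base R. The helpers below are hypothetical re-implementations based on the verbal descriptions above, not the package's own `lode_*()` functions. The `"backfront"` and `"backward"` versions reproduce the two printed outputs for four axes; the `"frontback"` and `"forward"` versions are their mirror images by assumption.

```r
# hypothetical re-implementations, for illustration only
# axes after i in increasing order, then axes before i in decreasing order
frontback_order <- function(n, i) c(i, i + seq_len(n - i), rev(seq_len(i - 1)))
# axes before i in decreasing order, then axes after i in increasing order
backfront_order <- function(n, i) c(i, rev(seq_len(i - 1)), i + seq_len(n - i))
# remaining axes in increasing order
forward_order <- function(n, i) c(i, setdiff(seq_len(n), i))
# remaining axes in decreasing order
backward_order <- function(n, i) c(i, rev(setdiff(seq_len(n), i)))

for (i in 1:4) print(backfront_order(4, i))
for (i in 1:4) print(backward_order(4, i))
```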

It often makes sense to bundle together the cases and cohorts that fall into common groups used to assign differentiation aesthetics: most commonly `fill`, but also `alpha`, which controls the opacity of the `fill` colors, and `colour`, `linetype`, and `size`, which control the borders of the alluvia, flows, and lodes.

The `aes.bind` parameter defaults to `FALSE`; setting it to `TRUE` prioritizes any such aesthetics *after* the strata of the current axis and *before* those of the remaining axes. In the toy example, this results in the lodes within each stratum being sorted first by class:

```
ggplot(toy, aes(x = collection, stratum = category, alluvium = subject)) +
  stat_alluvium(aes(fill = class), aes.bind = TRUE) +
  stat_stratum() +
  stat_stratum(geom = "text", aes(label = category))
```

Rather than ordering lodes *within* each stratum, the `flow` stat separately orders the flows *into* and *out from* it. By default, the flows are ordered with respect first to the orders of the strata at the present axis and second to those at the adjacent axis. In this case, `aes.bind = TRUE` tells `stat_flow()` to prioritize flow aesthetics after the present axis and before the adjacent:
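A minimal sketch of the corresponding flow diagram, assuming `stat_flow()` accepts `aes.bind = TRUE` in the same way as `stat_alluvium()` above (this matches the parameter as described for ggalluvial 0.10.0; other versions may expect a different value). The `category` values are copied from the printed toy data.

```r
library(ggplot2)
library(ggalluvial)

# toy data, with the category values copied from the printed table
toy <- data.frame(
  subject = rep(LETTERS[1:5], times = 4),
  collection = rep(1:4, each = 5),
  category = c("Y", "X", "X", "X", "Y",  "Y", "X", "X", "Y", "Y",
               "Y", "Y", "Y", "X", "X",  "X", "Y", "Y", "X", "Y"),
  class = c("one", "one", "one", "two", "two")
)

# flow diagram with flows bound by their fill aesthetic within each stratum
p_flow <- ggplot(toy,
                 aes(x = collection, stratum = category, alluvium = subject)) +
  stat_flow(aes(fill = class), aes.bind = TRUE) +
  stat_stratum() +
  stat_stratum(geom = "text", aes(label = category))
print(p_flow)
```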

Finally, one may wish to put the lodes at each axis in a predefined order (subject to their being located in the correct strata). This can be done by passing an integer matrix or a list of integer vectors to `lode.ordering`, whose columns or elements prescribe the order of the cases at the corresponding axes. For the toy example, we can take a shortcut and recycle a single vector, `1:5`, into every column of the matrix, which puts the cases in the order of their IDs in the data at every axis:

```
lode_ord <- matrix(1:5, nrow = 5, ncol = 4)
ggplot(toy, aes(x = collection, stratum = category, alluvium = subject)) +
  stat_alluvium(aes(fill = class), lode.ordering = lode_ord) +
  stat_stratum() +
  stat_stratum(geom = "text", aes(label = category))
```

Within each stratum at each axis, the cases are now in order from bottom to top.

## More examples

More examples of all of the functionality showcased here can be found in the documentation for the `stat_*()` functions, browsable on the package website.

```
## ─ Session info ──────────────────────────────────────────────────────────
## setting value
## version R version 3.3.3 (2017-03-06)
## os macOS 10.13.6
## system x86_64, darwin13.4.0
## ui X11
## language (EN)
## collate C
## ctype en_US.UTF-8
## tz <NA>
## date 2019-09-02
##
## ─ Packages ──────────────────────────────────────────────────────────────
##  package      * version     date       lib source
##  assertthat     0.2.1       2019-03-21 [3] CRAN (R 3.3.2)
##  backports      1.1.4       2019-04-10 [3] CRAN (R 3.3.2)
##  cli            1.1.0       2019-03-19 [3] CRAN (R 3.3.2)
##  colorspace     1.4-1       2019-03-18 [3] CRAN (R 3.3.2)
##  crayon         1.3.4       2017-09-16 [3] CRAN (R 3.3.2)
##  digest         0.6.20      2019-07-04 [3] CRAN (R 3.3.3)
##  dplyr          0.8.3       2019-07-04 [3] CRAN (R 3.3.3)
##  ellipsis       0.2.0.1     2019-07-02 [3] CRAN (R 3.3.3)
##  evaluate       0.13        2019-02-12 [3] CRAN (R 3.3.2)
##  ggalluvial   * 0.10.0      2019-09-02 [1] local
##  ggfittext      0.6.0       2018-07-06 [3] CRAN (R 3.3.2)
##  ggplot2      * 3.2.1       2019-08-10 [3] CRAN (R 3.3.3)
##  ggrepel        0.8.1       2019-05-07 [3] CRAN (R 3.3.2)
##  glue           1.3.1       2019-03-12 [3] CRAN (R 3.3.2)
##  gtable         0.3.0       2019-03-25 [3] CRAN (R 3.3.2)
##  htmltools      0.3.6       2017-04-28 [3] CRAN (R 3.3.2)
##  knitr          1.22        2019-03-08 [3] CRAN (R 3.3.2)
##  labeling       0.3         2014-08-23 [3] CRAN (R 3.3.0)
##  lazyeval       0.2.2       2019-03-15 [3] CRAN (R 3.3.2)
##  lifecycle      0.1.0       2019-08-01 [3] CRAN (R 3.3.3)
##  magrittr       1.5         2014-11-22 [3] CRAN (R 3.3.0)
##  munsell        0.5.0       2018-06-12 [3] CRAN (R 3.3.2)
##  pillar         1.4.2       2019-06-29 [3] CRAN (R 3.3.3)
##  pkgconfig      2.0.2       2018-08-16 [3] CRAN (R 3.3.2)
##  plyr           1.8.4       2016-06-08 [3] CRAN (R 3.3.0)
##  purrr          0.3.2       2019-03-15 [3] CRAN (R 3.3.2)
##  R6             2.4.0       2019-02-14 [3] CRAN (R 3.3.2)
##  RColorBrewer   1.1-2       2014-12-07 [3] CRAN (R 3.3.0)
##  Rcpp           1.0.2       2019-07-25 [3] CRAN (R 3.3.3)
##  rlang          0.4.0       2019-06-25 [3] CRAN (R 3.3.3)
##  rmarkdown      1.12        2019-03-14 [3] CRAN (R 3.3.2)
##  scales         1.0.0       2018-08-09 [3] CRAN (R 3.3.2)
##  sessioninfo    1.1.1       2018-11-05 [3] CRAN (R 3.3.2)
##  stringi        1.4.3       2019-03-12 [3] CRAN (R 3.3.2)
##  stringr        1.4.0       2019-02-10 [3] CRAN (R 3.3.2)
##  tibble         2.1.3       2019-06-06 [3] CRAN (R 3.3.3)
##  tidyr          0.8.99.9000 2019-08-31 [3] Github (tidyverse/tidyr@8b89cef)
##  tidyselect     0.2.5       2018-10-11 [3] CRAN (R 3.3.2)
##  vctrs          0.2.0       2019-07-05 [3] CRAN (R 3.3.3)
##  withr          2.1.2       2018-06-23 [3] Github (jimhester/withr@dbcd7cd)
##  xfun           0.5         2019-02-20 [3] CRAN (R 3.3.2)
##  yaml           2.2.0       2018-07-25 [3] CRAN (R 3.3.2)
##  zeallot        0.1.0       2018-01-28 [3] CRAN (R 3.3.2)
##
## [1] /private/var/folders/pg/fjg8r4fj5v33zqmwptf9mfg80000gn/T/RtmpN5bqP2/Rinst1827052183864
## [2] /private/var/folders/pg/fjg8r4fj5v33zqmwptf9mfg80000gn/T/RtmpDEbrPm/temp_libpath181891c8d96f0
## [3] /Library/Frameworks/R.framework/Versions/3.3/Resources/library
```