This package implements the concept of a grammar of tables. Given a simple formula expression and a data frame, it creates a rich summary table in a variety of formats. It is designed for extensibility at each step of the process, so that one is not limited by the author's choice of table statistics or output format. The grammar, however, is an integral part of the package and as such is not modifiable.
Here’s an example similar to summaryM from Hmisc to get us started:
tangram("drug ~ bili + albumin + stage::Categorical + protime + sex + age + spiders", pbc)
=====================================================================================================================
N D-penicillamine placebo not randomized Test Statistic
154 158 106
---------------------------------------------------------------------------------------------------------------------
Serum Bilirubin (mg/dl) 418 0.70 *1.30* 3.60 0.80 *1.40* 3.22 0.70 *1.40* 3.12 F_{2,415}=0.03, P=0.972
Albumin (gm/dl) 418 3.34 *3.54* 3.78 3.21 *3.56* 3.83 3.12 *3.47* 3.73 F_{2,415}=2.13, P=0.120
Histologic Stage, Ludwig Criteria 412 X^2_6=5.33, P=0.502
1 0.026 4/154 0.076 12/158 0.050 5/100
2 0.208 32/154 0.222 35/158 0.250 25/100
3 0.416 64/154 0.354 56/158 0.350 35/100
4 0.351 54/154 0.348 55/158 0.350 35/100
Prothrombin Time (sec.) 416 10.0 *10.6* 11.4 10.0 *10.6* 11.0 10.1 *10.6* 11.0 F_{2,413}=0.23, P=0.795
sex : female 418 0.903 139/154 0.867 137/158 0.925 98/106 X^2_2=2.38, P=0.304
Age 418 41.4 *48.1* 55.8 42.9 *51.9* 59.0 46.0 *53.0* 61.1 F_{2,415}=6.10, P=0.002
spiders : present 312 0.292 45/154 0.285 45/158 X^2_1=0.02, P=0.885
=====================================================================================================================
Or the same directly into an Rmarkdown pipe_table:
#rmd(tangram("drug ~ bili[2] + albumin + stage::Categorical + protime + sex + age + spiders", pbc))
Notice that stage is not stored as a factor (i.e. a Categorical variable) in the data frame; adding a type specifier in the formula causes it to be treated as Categorical. No preconversion is applied to the data frame, nor is the type guessed from the number of unique values. The formula specification gives full, direct control of typing.
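To make the typing point concrete, here is a minimal base R sketch. It makes no tangram calls, and the stage vector is made-up illustrative data rather than the pbc values. It only shows why a numerically coded variable summarizes differently depending on whether it is treated as numeric or categorical, which is the choice the type specifier makes explicit.

```r
# Hypothetical stand-in for a numerically coded stage column (not the pbc data)
stage <- c(1, 2, 3, 4, 3, 4, NA, 2)

is.factor(stage)   # FALSE: stored as plain numeric
is.numeric(stage)  # TRUE

# Treated as numeric, a natural summary is a set of quantiles
stage_quantiles <- quantile(stage, c(0.25, 0.5, 0.75), na.rm = TRUE)

# Treated as categorical, a natural summary is a count per level
stage_counts <- table(factor(stage))

print(stage_quantiles)
print(stage_counts)
```

The data themselves are never converted here; only the interpretation changes, which mirrors what the formula-level specifier is described as doing.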
It also supports HTML5, with styling fragments:
html5(tangram("drug ~ bili[2] + albumin + stage::Categorical + protime + sex + age + spiders", pbc, msd=TRUE, quant=seq(0, 1, 0.25)),
fragment=TRUE, inline="hmisc.css", caption = "HTML5 Table Hmisc Style", id="tbl2")
| | N | D-penicillamine | placebo | not randomized | Test Statistic |
|---|---|---|---|---|---|
| | | 154 | 158 | 106 | |
| Serum Bilirubin (mg/dl) | 418 | 0.30 0.70 1.30 3.60 28.00 3.65±0.43 | 0.30 0.80 1.40 3.22 20.00 2.87±0.29 | 0.40 0.70 1.40 3.12 18.00 3.12±0.39 | F_{2,415} = 0.03, P = 0.972¹ |
| Albumin (gm/dl) | 418 | 1.96 3.34 3.54 3.78 4.38 3.52±0.03 | 2.10 3.21 3.56 3.83 4.64 3.52±0.04 | 2.31 3.12 3.47 3.73 4.52 3.43±0.04 | F_{2,415} = 2.13, P = 0.120¹ |
| Histologic Stage, Ludwig Criteria | 412 | | | | χ^2_6 = 5.33, P = 0.502 |
| 1 | | 0.026 (2.597%) 4/154 | 0.076 (7.595%) 12/158 | 0.050 (5.000%) 5/100 | |
| 2 | | 0.208 (20.779%) 32/154 | 0.222 (22.152%) 35/158 | 0.250 (25.000%) 25/100 | |
| 3 | | 0.416 (41.558%) 64/154 | 0.354 (35.443%) 56/158 | 0.350 (35.000%) 35/100 | |
| 4 | | 0.351 (35.065%) 54/154 | 0.348 (34.810%) 55/158 | 0.350 (35.000%) 35/100 | |
| Prothrombin Time (sec.) | 416 | 9.2 10.0 10.6 11.4 17.1 10.8±0.1 | 9.0 10.0 10.6 11.0 14.1 10.7±0.1 | 9.0 10.1 10.6 11.0 18.0 10.8±0.1 | F_{2,413} = 0.23, P = 0.795¹ |
| sex : female | 418 | 0.903 (90.260%) 139/154 | 0.867 (86.709%) 137/158 | 0.925 (92.453%) 98/106 | χ^2_2 = 2.38, P = 0.304 |
| Age | 418 | 30.6 41.4 48.1 55.8 74.5 48.6±0.8 | 26.3 42.9 51.9 59.0 78.4 51.4±0.9 | 33.0 46.0 53.0 61.1 75.0 52.9±1.0 | F_{2,415} = 6.10, P = 0.002¹ |
| spiders : present | 312 | 0.292 (29.221%) 45/154 | 0.285 (28.481%) 45/158 | | χ^2_1 = 0.02, P = 0.885 |
Fragments can carry their own localized style sheet, scoped by the given id:
html5(tangram("drug ~ bili[2] + albumin + stage::Categorical + protime + sex + age + spiders", pbc),
fragment=TRUE, inline="nejm.css", caption = "HTML5 Table NEJM Style", id="tbl3")
| | N | D-penicillamine | placebo | not randomized | Test Statistic |
|---|---|---|---|---|---|
| | | 154 | 158 | 106 | |
| Serum Bilirubin (mg/dl) | 418 | 0.70 1.30 3.60 | 0.80 1.40 3.22 | 0.70 1.40 3.12 | F_{2,415} = 0.03, P = 0.972¹ |
| Albumin (gm/dl) | 418 | 3.34 3.54 3.78 | 3.21 3.56 3.83 | 3.12 3.47 3.73 | F_{2,415} = 2.13, P = 0.120¹ |
| Histologic Stage, Ludwig Criteria | 412 | | | | χ^2_6 = 5.33, P = 0.502 |
| 1 | | 0.026 (2.597%) 4/154 | 0.076 (7.595%) 12/158 | 0.050 (5.000%) 5/100 | |
| 2 | | 0.208 (20.779%) 32/154 | 0.222 (22.152%) 35/158 | 0.250 (25.000%) 25/100 | |
| 3 | | 0.416 (41.558%) 64/154 | 0.354 (35.443%) 56/158 | 0.350 (35.000%) 35/100 | |
| 4 | | 0.351 (35.065%) 54/154 | 0.348 (34.810%) 55/158 | 0.350 (35.000%) 35/100 | |
| Prothrombin Time (sec.) | 416 | 10.0 10.6 11.4 | 10.0 10.6 11.0 | 10.1 10.6 11.0 | F_{2,413} = 0.23, P = 0.795¹ |
| sex : female | 418 | 0.903 (90.260%) 139/154 | 0.867 (86.709%) 137/158 | 0.925 (92.453%) 98/106 | χ^2_2 = 2.38, P = 0.304 |
| Age | 418 | 41.4 48.1 55.8 | 42.9 51.9 59.0 | 46.0 53.0 61.1 | F_{2,415} = 6.10, P = 0.002¹ |
| spiders : present | 312 | 0.292 (29.221%) 45/154 | 0.285 (28.481%) 45/158 | | χ^2_1 = 0.02, P = 0.885 |
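The same approach works with the Lancet style. Here pformat = 5 prints P-values to five decimal places, and the [1] suffix on the categorical terms formats their proportions with one decimal place.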
tbl <- tangram("drug ~ bili[2] + albumin + stage::Categorical[1] + protime + sex[1] + age + spiders[1]",
data=pbc,
pformat = 5)
html5(tbl,
fragment=TRUE,
inline="lancet.css",
caption = "HTML5 Table Lancet Style", id="tbl4"
)
| | N | D-penicillamine | placebo | not randomized | Test Statistic |
|---|---|---|---|---|---|
| | | 154 | 158 | 106 | |
| Serum Bilirubin (mg/dl) | 418 | 0.70 1.30 3.60 | 0.80 1.40 3.22 | 0.70 1.40 3.12 | F_{2,415} = 0.03, P = 0.97248¹ |
| Albumin (gm/dl) | 418 | 3.34 3.54 3.78 | 3.21 3.56 3.83 | 3.12 3.47 3.73 | F_{2,415} = 2.13, P = 0.11996¹ |
| Histologic Stage, Ludwig Criteria | 412 | | | | χ^2_6 = 5.33, P = 0.50235 |
| 1 | | 0.0 (2.6%) 4/154 | 0.1 (7.6%) 12/158 | 0.1 (5.0%) 5/100 | |
| 2 | | 0.2 (20.8%) 32/154 | 0.2 (22.2%) 35/158 | 0.2 (25.0%) 25/100 | |
| 3 | | 0.4 (41.6%) 64/154 | 0.4 (35.4%) 56/158 | 0.3 (35.0%) 35/100 | |
| 4 | | 0.4 (35.1%) 54/154 | 0.3 (34.8%) 55/158 | 0.3 (35.0%) 35/100 | |
| Prothrombin Time (sec.) | 416 | 10.0 10.6 11.4 | 10.0 10.6 11.0 | 10.1 10.6 11.0 | F_{2,413} = 0.23, P = 0.79472¹ |
| sex : female | 418 | 0.9 (90.3%) 139/154 | 0.9 (86.7%) 137/158 | 0.9 (92.5%) 98/106 | χ^2_2 = 2.38, P = 0.30387 |
| Age | 418 | 41.4 48.1 55.8 | 42.9 51.9 59.0 | 46.0 53.0 61.1 | F_{2,415} = 6.10, P = 0.00245¹ |
| spiders : present | 312 | 0.3 (29.2%) 45/154 | 0.3 (28.5%) 45/158 | | χ^2_1 = 0.02, P = 0.88534 |
It can also produce an index of a table's contents, which supports traceability:
index(tangram("drug ~ bili + albumin + stage::Categorical + protime + sex + age + spiders", pbc))[1:20,]
key src value
[1,] "NTM3" "tangram:bili:drug[D-penicillamine]:N" "154"
[2,] "OTRl" "tangram:bili:drug[placebo]:N" "158"
[3,] "ZjNi" "tangram:bili:drug[not randomized]:N" "106"
[4,] "MGNk" "tangram:bili:drug:cell_n1" "418"
[5,] "MzAx" "tangram:bili:drug[D-penicillamine]:cell_iqr1" "0.70"
[6,] "NzM5" "tangram:bili:drug[D-penicillamine]:cell_iqr2" "1.30"
[7,] "YWE4" "tangram:bili:drug[D-penicillamine]:cell_iqr3" "3.60"
[8,] "M2Yw" "tangram:bili:drug[placebo]:cell_iqr1" "0.80"
[9,] "OGQ4" "tangram:bili:drug[placebo]:cell_iqr2" "1.40"
[10,] "Mjg1" "tangram:bili:drug[placebo]:cell_iqr3" "3.22"
[11,] "MTAw" "tangram:bili:drug[not randomized]:cell_iqr1" "0.70"
[12,] "NTdl" "tangram:bili:drug[not randomized]:cell_iqr2" "1.40"
[13,] "OGZi" "tangram:bili:drug[not randomized]:cell_iqr3" "3.12"
[14,] "OTU5" "tangram:bili:drug:F" "0.03"
[15,] "NzFm" "tangram:bili:drug:df1" "2"
[16,] "ZjRl" "tangram:bili:drug:df2" "415"
[17,] "MjIz" "tangram:bili:drug:P" "0.972"
[18,] "MTY2" "tangram:albumin:drug:cell_n1" "418"
[19,] "Yzlm" "tangram:albumin:drug[D-penicillamine]:cell_iqr1" "3.34"
[20,] "OGFj" "tangram:albumin:drug[D-penicillamine]:cell_iqr2" "3.54"
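As a rough illustration of how such an index might be used for traceability, the sketch below rebuilds a few of the rows shown above as a plain character matrix and looks values up by key and by source path. It assumes the index is a character matrix with the columns key, src and value, as the printed output suggests; the variable names are hypothetical and no tangram functions are called.

```r
# A few rows copied from the index output above, as a character matrix
idx <- matrix(c(
  "NTM3", "tangram:bili:drug[D-penicillamine]:N", "154",
  "OTU5", "tangram:bili:drug:F",                  "0.03",
  "NzFm", "tangram:bili:drug:df1",                "2",
  "ZjRl", "tangram:bili:drug:df2",                "415",
  "MjIz", "tangram:bili:drug:P",                  "0.972"
), ncol = 3, byrow = TRUE,
dimnames = list(NULL, c("key", "src", "value")))

# Look up a single reported value by its key
p_value <- idx[idx[, "key"] == "MjIz", "value"]

# Collect every entry that belongs to the bili-by-drug test statistic
test_rows <- idx[grepl("^tangram:bili:drug:", idx[, "src"]), , drop = FALSE]

print(p_value)
print(test_rows)
```

A key quoted in a report can thus be traced back to the formula terms and the statistic that produced the number.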
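# Simulate two score vectors; their labels become the row headers
x <- round(rnorm(375, 79, 10))
y <- round(rnorm(375, 80, 9))
# Note: rbinom() returns 0/1, so used as an index this only ever sets y[1] to NA
# (hence N = 374 for y in the output); wrap it in as.logical() to blank about 5% of values
y[rbinom(375, 1, prob=0.05)] <- NA
attr(x, "label") <- "Global score, 3m"
attr(y, "label") <- "Global score, 12m"
# Intercept-only column ("All"), with a cleanup step passed via after
html5(tangram(1 ~ x+y,
              data.frame(x=x, y=y),
              after=hmisc_intercept_cleanup),
      fragment=TRUE, inline="lancet.css", caption="", id="tbl5")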
| | N | All |
|---|---|---|
| Global score, 3m | 375 | 72 80 87 |
| Global score, 12m | 374 | 75 80 86 |
The Hmisc default style recognizes 3 types: Categorical, Binomial, and Numerical. For each pairing of a row type with a column type, a function is provided to generate the corresponding rows and columns. As mentioned before, the user can declare any type in a formula and is not limited to the Hmisc defaults. This is completely customizable, which will be covered later.
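The idea of one function per pairing of types can be sketched as a nested lookup in base R. Everything below is hypothetical and for illustration only: the names toy_transforms, toy_quantiles, toy_crosstab and toy_summarize are not part of the package, and the package's actual dispatch mechanism may differ.

```r
# Toy summary functions (hypothetical, for illustration only)
toy_quantiles <- function(row, col) tapply(row, col, median)
toy_crosstab  <- function(row, col) table(row, col)

# Lookup keyed first by row type, then by column type
toy_transforms <- list(
  Numerical   = list(Categorical = toy_quantiles),
  Categorical = list(Categorical = toy_crosstab)
)

# Pick the function registered for a (row type, column type) pair and apply it
toy_summarize <- function(row, row_type, col, col_type) {
  fun <- toy_transforms[[row_type]][[col_type]]
  if (is.null(fun)) stop("no summary defined for ", row_type, " x ", col_type)
  fun(row, col)
}

group   <- factor(c("a", "a", "b", "b"))
medians <- toy_summarize(c(1, 3, 10, 20), "Numerical", group, "Categorical")
xtab    <- toy_summarize(c("x", "y", "x", "x"), "Categorical", group, "Categorical")

print(medians)
print(xtab)
```

Adding a new type under this scheme amounts to registering more entries in the lookup, which is the kind of extensibility the text describes.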
Let’s cover the phases of table generation.
drug ~ stage::Categorical is a Categorical\(\times\)Categorical pairing, which references summarize_chisq for compiling. One can easily specify a different compiler for a formula and get very different results from the same formula. Note: multiplication (*) cannot be applied in the previous phase, because doing so depends on what multiplication means semantically. In one context it might be an interaction, in another simple multiplication. Handling multiplicative terms can be tricky. Once compiling is finished, the result is a table object composed of cells (a list of lists), each of which is one of a variety of S3 types.

A simple example of using an intercept in a formula, with some post-processing to remove undesired columns:
d1 <- iris
d1$A <- d1$Sepal.Length > 5.1
attr(d1$A,"label") <- "Sepal Length > 5.1"
tbl1 <- tangram(
Species + 1 ~ A + Sepal.Width,
data = d1,
after = list(drop_statistics, function(tbl) del_col(tbl, 6))
)
html5(tbl1,
fragment=TRUE, inline="nejm.css", caption = "Example All Summary", id="tbl1")
| | N | setosa | versicolor | virginica | All |
|---|---|---|---|---|---|
| | | 50 | 50 | 50 | 150 |
| Sepal Length > 5.1 : TRUE | 150 | 0.280 (28.000%) 14/50 | 0.920 (92.000%) 46/50 | 0.980 (98.000%) 49/50 | 0.727 (72.667%) 109/150 |
| Sepal.Width | 150 | 3.19 3.40 3.70 | 2.50 2.80 3.00 | 2.80 3.00 3.20 | 2.80 3.00 3.31 |
The library is designed to be extensible, in the hope that more useful summary functions can be written and their results rendered in a wide variety of formats. This is done through translator functions, which, given a row and column from a formula, process the data into a table.
This example shows how to create a function that, given a row and column, constructs summary entries for a table.
### Make up some data, which has events nested within an id
n <- 1000
df <- data.frame(id = sample(1:250, n*3, replace=TRUE), event = as.factor(rep(c("A", "B","C"), n)))
attr(df$id, "label") <- "ID"
### Now create custom function for counting events with a category
summarize_count <- function(table, row, column)
{
  ### Getting data for row and column AST nodes, assuming no factors
  datar <- row$data
  datac <- column$data
  ### Grabbing categories
  col_categories <- levels(datac)
  n_labels <- lapply(col_categories, FUN=function(cat_name){
    x <- datar[datac == cat_name]
    cell_n(length(unique(x)), subcol=cat_name)
  })
  # Test a poisson model
  test <- aov(glm(x ~ treatment,
                  aggregate(datar, by=list(id=datar, treatment=datac), FUN=length),
                  family=poisson))
  # Build the table
  table %>%
    # Create Headers
    row_header(derive_label(row)) %>%
    col_header("N", col_categories, "Test Statistic") %>%
    col_header("", n_labels, "" ) %>%
    # Add the First column of summary data as an N value
    add_col(cell_n(length(unique(datar)))) %>%
    # Now add quantiles for the counts
    table_builder_apply(col_categories, FUN=
      function(tbl, cat_name) {
        # Compute each data set
        x <- datar[datac == cat_name]
        xx <- aggregate(x, by=list(x), FUN=length)$x
        # Add a column that is a quantile
        add_col(tbl, cell_iqr(xx, row$format, na.rm=TRUE))
      }) %>%
    # Now add a statistical test for the final column
    add_col(test)
}
tangram(event ~ id["%1.0f"], df, summarize_count)
=============================================================
N A B C Test Statistic
247 247 245
-------------------------------------------------------------
ID N=250 3 *4* 5 3 *4* 5 3 *4* 5 F_{2,736}=0.02, P=0.976
=============================================================
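To see what the cells of this table contain, the following base R sketch recomputes the two ingredients of summarize_count outside of tangram: the number of distinct ids per event category (the header counts) and the quartiles of events per id (the cell contents). The fixed seed and the variable names are additions for illustration, and quantile() with its default settings may not match the package's own quantile method exactly.

```r
set.seed(1)  # fixed seed, added so the sketch is reproducible
n  <- 1000
df <- data.frame(id    = sample(1:250, n * 3, replace = TRUE),
                 event = as.factor(rep(c("A", "B", "C"), n)))

# Header counts: number of distinct ids seen within each event category
ids_per_event <- sapply(levels(df$event),
                        function(ev) length(unique(df$id[df$event == ev])))

# Cell contents: how many times each id occurs within each category
events_per_id <- lapply(levels(df$event), function(ev) {
  x <- df$id[df$event == ev]
  as.vector(table(x))  # same counts as aggregate(x, by=list(x), FUN=length)$x
})
names(events_per_id) <- levels(df$event)

# Quartiles of those per-id counts, one column per category
count_quartiles <- sapply(events_per_id, quantile, probs = c(0.25, 0.5, 0.75))

print(ids_per_event)
print(count_quartiles)
```

Separating the arithmetic from the table construction in this way can make a custom summary function easier to check before it is wired into a formula.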