Usage of the sdcHierarchies-Package

Bernhard Meindl

2019-03-07

Introduction

The sdcHierarchies package allows users to create, modify and export nested hierarchies. Such hierarchies are used, for example, to define tables in statistical disclosure control software such as sdcTable.

Usage

Before use, the package needs to be loaded:

library(sdcHierarchies)

Create and modify a hierarchy from scratch

hier_create() creates a hierarchy. Argument root specifies the name of the root node. Optionally, nodes can be added to the top level by listing their names in argument nodes. hier_display() shows the hierarchical structure of the current tree:

h <- hier_create(root = "Total", nodes = LETTERS[1:5])
hier_display(h)
## Total
## ├─A
## ├─B
## ├─C
## ├─D
## └─E

Once such an object is created, it can be modified using hier_add() (add nodes below an existing node), hier_delete() (remove nodes) and hier_rename() (rename nodes). These functions can be applied as shown below:

# add nodes below the node specified in argument `root`
h <- hier_add(h, root = "A", nodes = c("a1", "a2"))
h <- hier_add(h, root = "B", nodes = c("b1", "b2"))
h <- hier_add(h, root = "b1", nodes = c("b1_a", "b1_b"))

# deleting one or more nodes from the hierarchy
h <- hier_delete(h, nodes = c("a1", "b2"))
h <- hier_delete(h, nodes = c("a2"))

# rename nodes
h <- hier_rename(h, nodes = c("C" = "X", "D" = "Y"))
hier_display(h)
## Total
## ├─A
## ├─B
## │ └─b1
## │   ├─b1_a
## │   └─b1_b
## ├─E
## ├─X
## └─Y

We note that the underlying data.tree package modifies objects by reference, so the results of these functions do not need to be assigned explicitly.
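The reference semantics mentioned above can be illustrated in base R. The following minimal sketch uses neither data.tree nor sdcHierarchies; it only contrasts an environment, which is modified in place (data.tree nodes are R6 objects, which behave this way), with a list, which is copied on modification. The helpers add_child_env() and add_child_list() are hypothetical.

```r
# Base-R illustration of reference semantics (not sdcHierarchies code).
# Environments are modified in place: no reassignment is needed.
add_child_env <- function(node, child) {
  node$children <- c(node$children, child)
  invisible(node)
}
tree_env <- new.env()
tree_env$children <- character(0)
add_child_env(tree_env, "A")     # result is not assigned
print(tree_env$children)         # the change is visible anyway

# Lists have copy-on-modify semantics: the change is lost unless the
# result is assigned back.
add_child_list <- function(node, child) {
  node$children <- c(node$children, child)
  node
}
tree_list <- list(children = character(0))
add_child_list(tree_list, "A")   # returns a modified copy
print(tree_list$children)        # tree_list itself is unchanged
```

With the list-based variant, `tree_list <- add_child_list(tree_list, "A")` would be required, which mirrors the `h <- hier_add(h, ...)` style used in the examples.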

Information about nodes

Function hier_info() returns information about the nodes specified in argument nodes.

# information about specific nodes
info <- hier_info(h, nodes = c("b1", "E"))

info is a named list in which each element refers to one queried node. The results for node b1 can be extracted as shown below:

info$b1
## $name
## [1] "b1"
## 
## $is_rootnode
## [1] FALSE
## 
## $level
## [1] 3
## 
## $is_leaf
## [1] FALSE
## 
## $siblings
## character(0)
## 
## $contributing_codes
## [1] "b1_a" "b1_b"
## 
## $children
## [1] "b1_a" "b1_b"
## 
## $parent
## [1] "B"
## 
## $is_bogus
## [1] FALSE

Information about all nodes can be extracted by not specifying argument nodes.
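To make the fields returned by hier_info() more concrete, the following sketch derives children, siblings and leaves from a hypothetical parent/child table describing the hierarchy h. This is not how sdcHierarchies stores hierarchies. The assumption that contributing codes are the leaves below a node is only inferred from the output for b1 above.

```r
# Hypothetical parent/child table for hierarchy h (not an
# sdcHierarchies data structure); the root "Total" has no row.
tree <- data.frame(
  node   = c("A", "B", "b1", "b1_a", "b1_b", "E", "X", "Y"),
  parent = c("Total", "Total", "B", "b1", "b1", "Total", "Total", "Total"),
  stringsAsFactors = FALSE
)

# direct children of node x
children_of <- function(tree, x) tree$node[tree$parent == x]

# siblings: other nodes sharing the same parent as x
siblings_of <- function(tree, x) {
  p <- tree$parent[tree$node == x]
  setdiff(children_of(tree, p), x)
}

# leaves below x, found recursively (a leaf returns itself)
leaves_below <- function(tree, x) {
  kids <- children_of(tree, x)
  if (length(kids) == 0) return(x)
  unlist(lapply(kids, function(k) leaves_below(tree, k)))
}

children_of(tree, "b1")
siblings_of(tree, "b1")
leaves_below(tree, "B")
```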

Convert to other formats

Function hier_convert() takes a hierarchy and converts the network-based structure to different formats, while hier_export() performs the conversion and writes the results to a file on disk. The formats used in this vignette are "df" (a "@;label"-based data.frame), "code" (the R code required to re-create the hierarchy) and "argus" (hrc-files for τ-argus):

# conversion to a "@;label"-based format
res_df <- hier_convert(h, as = "df")
print(res_df)
##   level  name
## 1     @ Total
## 2    @@     A
## 3    @@     B
## 4   @@@    b1
## 5  @@@@  b1_a
## 6  @@@@  b1_b
## 7    @@     E
## 8    @@     X
## 9    @@     Y
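In the output above, the level column encodes the depth of each node as a number of "@" characters, with the root at depth 1. A minimal base-R sketch of this encoding, with the depths typed in by hand (it is not the code used by hier_convert()), could look like this:

```r
# Hypothetical sketch of the "@;label" encoding: the number of "@"
# characters equals the depth of a node, the root having depth 1.
name  <- c("Total", "A", "B", "b1", "b1_a", "b1_b", "E", "X", "Y")
depth <- c(1, 2, 2, 3, 4, 4, 2, 2, 2)

labs_df <- data.frame(level = strrep("@", depth), name = name,
                      stringsAsFactors = FALSE)

# decoding is the inverse operation: count the "@" characters
stopifnot(identical(nchar(labs_df$level), as.integer(depth)))
labs_df
```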

The code required to create this hierarchy can be computed using:

code <- hier_convert(h, as = "code"); cat(code, sep = "\n")
## library(sdcHierarchies)
## tree <- hier_create(root = 'Total', nodes = c('A', 'B', 'E', 'X', 'Y'))
## tree <- hier_add(tree = tree, root = 'B', nodes = 'b1')
## tree <- hier_add(tree = tree, root = 'b1', nodes = c('b1_a', 'b1_b'))
## print(tree)

hier_export() writes the converted results to a file. This is useful, for example, for creating hrc-files that can be used as input for τ-argus:

hier_export(h, as = "argus", path = file.path(tempdir(), "hierarchy.hrc"))

Create a hierarchy from data.frames, code or json

hier_import() returns a network-based hierarchy given either a data.frame (in "@;label"-format), json, code or a τ-argus compatible hrc-file. For example, to create a hierarchy from res_df, which was previously created using hier_convert(), the code is as simple as:

n_df <- hier_import(inp = res_df, from = "df")
hier_display(n_df)
## Total
## ├─A
## ├─B
## │ └─b1
## │   ├─b1_a
## │   └─b1_b
## ├─E
## ├─X
## └─Y
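The parent/child structure can be recovered from a "@;label" data.frame because its rows are listed in depth-first order (compare res_df above): the parent of a row is the closest preceding row that is exactly one level higher. The sketch below uses a hypothetical helper parents_from_levels() to illustrate this idea; it is not the implementation used by hier_import().

```r
# Hypothetical helper: for rows in depth-first order, the parent of a
# row is the closest preceding row with exactly one "@" less.
parents_from_levels <- function(level, name) {
  depth <- nchar(level)
  parent <- rep(NA_character_, length(depth))
  for (i in seq_along(depth)) {
    # candidate rows above row i that are one level higher
    prev <- which(depth[seq_len(i - 1)] == depth[i] - 1)
    if (length(prev) > 0) parent[i] <- name[max(prev)]
  }
  parent  # NA for the root node
}

level <- c("@", "@@", "@@", "@@@", "@@@@", "@@@@", "@@", "@@", "@@")
name  <- c("Total", "A", "B", "b1", "b1_a", "b1_b", "E", "X", "Y")
data.frame(name, parent = parents_from_levels(level, name))
```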

Using hier_import(inp = "hierarchy.hrc", from = "argus"), one can create an sdc hierarchy object directly from an hrc-file.

Create/Compute hierarchies from a string

Often, the nested hierarchy information is encoded in a string. Function hier_compute() transforms such strings into hierarchy objects. Two cases can be distinguished: in the first case, all input codes have the same length, while in the second case the length of the codes differs. Let's assume we have a geographic code given in geo_m, where digits 1-2 refer to the first level, digit 3 to the second and digits 4-5 to the third level of the hierarchy.

geo_m <- c(
  "01051", "01053", "01054", "01055", "01056", "01057", "01058", "01059", "01060", "01061", "01062",
  "02000",
  "03151", "03152", "03153", "03154", "03155", "03156", "03157", "03158", "03251", "03252", "03254", "03255",
  "03256", "03257", "03351", "03352", "03353", "03354", "03355", "03356", "03357", "03358", "03359", "03360",
  "03361", "03451", "03452", "03453", "03454", "03455", "03456",
  "10155")

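Function hier_compute() takes a character vector and creates a hierarchy from it. Argument method selects one of two ways of specifying the encoded levels in argument dim_spec: with method = "endpos", dim_spec contains the end position of each level within the code; with method = "len", it contains the number of characters used by each level.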
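The relationship between the two values of method can be sketched in base R: end positions are the cumulative sums of the level lengths, and the prefix of a code up to each end position identifies its node on that level (for example, 03251 belongs to 03 and 032). The helpers below are hypothetical and not part of sdcHierarchies.

```r
# Hypothetical helpers (not part of sdcHierarchies) showing that both
# values of `method` describe the same split of a code.
len_to_endpos <- function(len) cumsum(len)            # "len" -> "endpos"
endpos_to_len <- function(endpos) diff(c(0, endpos))  # "endpos" -> "len"

# the prefix of a code up to each end position identifies the node
# the code belongs to on the corresponding level
level_prefixes <- function(code, endpos) substring(code, 1, endpos)

len_to_endpos(c(2, 1, 2))
endpos_to_len(c(2, 3, 5))
level_prefixes("03251", c(2, 3, 5))
```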

If the overall total is not encoded in the input, argument root specifies a name for it. Additionally, the desired output format can be set in argument as; in the example below, setting as = "df" returns the result as a data.frame in "@;label"-format. The two methods of defining the positions of the levels are interchangeable and lead to the same hierarchy, as shown below:

v1 <- hier_compute(
  inp = geo_m, 
  dim_spec = c(2, 3, 5), 
  root = "Tot", 
  method = "endpos", 
  as = "df"
)

v2 <- hier_compute(
  inp = geo_m, 
  dim_spec = c(2, 1, 2), 
  root = "Tot", 
  method = "len",
  as = "df"
)

identical(v1, v2)
## [1] TRUE
hier_display(v1)
## Tot
## ├─01
## │ └─010
## │   ├─01051
## │   ├─01053
## │   ├─01054
## │   ├─01055
## │   ├─01056
## │   ├─01057
## │   ├─01058
## │   ├─01059
## │   ├─01060
## │   ├─01061
## │   └─01062
## ├─02
## │ └─020
## │   └─02000
## ├─03
## │ ├─031
## │ │ ├─03151
## │ │ ├─03152
## │ │ ├─03153
## │ │ ├─03154
## │ │ ├─03155
## │ │ ├─03156
## │ │ ├─03157
## │ │ └─03158
## │ ├─032
## │ │ ├─03251
## │ │ ├─03252
## │ │ ├─03254
## │ │ ├─03255
## │ │ ├─03256
## │ │ └─03257
## │ ├─033
## │ │ ├─03351
## │ │ ├─03352
## │ │ ├─03353
## │ │ ├─03354
## │ │ ├─03355
## │ │ ├─03356
## │ │ ├─03357
## │ │ ├─03358
## │ │ ├─03359
## │ │ ├─03360
## │ │ └─03361
## │ └─034
## │   ├─03451
## │   ├─03452
## │   ├─03453
## │   ├─03454
## │   ├─03455
## │   └─03456
## └─10
##   └─101
##     └─10155

If the total is contained in the string, let’s say in the first 3 positions of the input values, the hierarchy can be computed as follows:

geo_m_with_tot <- paste0("Tot", geo_m)
head(geo_m_with_tot)
## [1] "Tot01051" "Tot01053" "Tot01054" "Tot01055" "Tot01056" "Tot01057"
v3 <- hier_compute(
  inp = geo_m_with_tot, 
  dim_spec = c(3, 2, 1, 2), 
  method = "len"
); hier_display(v3)
## Tot
## ├─01
## │ └─010
## │   ├─01051
## │   ├─01053
## │   ├─01054
## │   ├─01055
## │   ├─01056
## │   ├─01057
## │   ├─01058
## │   ├─01059
## │   ├─01060
## │   ├─01061
## │   └─01062
## ├─02
## │ └─020
## │   └─02000
## ├─03
## │ ├─031
## │ │ ├─03151
## │ │ ├─03152
## │ │ ├─03153
## │ │ ├─03154
## │ │ ├─03155
## │ │ ├─03156
## │ │ ├─03157
## │ │ └─03158
## │ ├─032
## │ │ ├─03251
## │ │ ├─03252
## │ │ ├─03254
## │ │ ├─03255
## │ │ ├─03256
## │ │ └─03257
## │ ├─033
## │ │ ├─03351
## │ │ ├─03352
## │ │ ├─03353
## │ │ ├─03354
## │ │ ├─03355
## │ │ ├─03356
## │ │ ├─03357
## │ │ ├─03358
## │ │ ├─03359
## │ │ ├─03360
## │ │ └─03361
## │ └─034
## │   ├─03451
## │   ├─03452
## │   ├─03453
## │   ├─03454
## │   ├─03455
## │   └─03456
## └─10
##   └─101
##     └─10155

The resulting hierarchy is the same as the one represented by v1 and v2 above.

hier_compute() can also deal with inputs of different lengths, as shown in the next example.

# second example: unequal strings; overall total not included in input
yae_h <- c(
  "1.1.1.", "1.1.2.",
  "1.2.1.", "1.2.2.", "1.2.3.", "1.2.4.", "1.2.5.", "1.3.1.",
  "1.3.2.", "1.3.3.", "1.3.4.", "1.3.5.",
  "1.4.1.", "1.4.2.", "1.4.3.", "1.4.4.", "1.4.5.",
  "1.5.", "1.6.", "1.7.", "1.8.", "1.9.", "2.", "3.")
v1 <- hier_compute(
  inp = yae_h, 
  dim_spec = c(2,2,2), 
  root = "Tot", 
  method = "len"
); hier_display(v1)
## Tot
## ├─1.
## │ ├─1.1.
## │ │ ├─1.1.1.
## │ │ └─1.1.2.
## │ ├─1.2.
## │ │ ├─1.2.1.
## │ │ ├─1.2.2.
## │ │ ├─1.2.3.
## │ │ ├─1.2.4.
## │ │ └─1.2.5.
## │ ├─1.3.
## │ │ ├─1.3.1.
## │ │ ├─1.3.2.
## │ │ ├─1.3.3.
## │ │ ├─1.3.4.
## │ │ └─1.3.5.
## │ ├─1.4.
## │ │ ├─1.4.1.
## │ │ ├─1.4.2.
## │ │ ├─1.4.3.
## │ │ ├─1.4.4.
## │ │ └─1.4.5.
## │ ├─1.5.
## │ ├─1.6.
## │ ├─1.7.
## │ ├─1.8.
## │ └─1.9.
## ├─2.
## └─3.
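One way to read the result above is that only the levels fitting into a code are used, so shorter codes such as 1.5. or 2. end up as leaves on a higher level. The hypothetical base-R helper below illustrates this reading for the level lengths c(2, 2, 2); it is a sketch, not the logic implemented in hier_compute().

```r
# Hypothetical helper (not sdcHierarchies code): level prefixes of a
# code, keeping only the end positions that fit into the code.
level_prefixes_var <- function(code, len) {
  endpos <- cumsum(len)
  endpos <- endpos[endpos <= nchar(code)]
  substring(code, 1, endpos)
}

level_prefixes_var("1.2.3.", c(2, 2, 2))  # three levels
level_prefixes_var("1.5.", c(2, 2, 2))    # two levels: leaf below "1."
level_prefixes_var("2.", c(2, 2, 2))      # one level: leaf below the total
```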

We also note that there is another way to specify the inputs in hier_compute(). Setting argument method = "list" allows to create a hierarchy from a given named list. In such a list, the name of a list element is interpreted as the name of the parent node of all codes of the specific list element. An example is shown below:

yae_ll <- list()
yae_ll[["Total"]] <- c("1.", "2.", "3.")
yae_ll[["1."]] <- paste0("1.", 1:9, ".")
yae_ll[["1.1."]] <- paste0("1.1.", 1:2, ".")
yae_ll[["1.2."]] <- paste0("1.2.", 1:5, ".")
yae_ll[["1.3."]] <- paste0("1.3.", 1:5, ".")
yae_ll[["1.4."]] <- paste0("1.4.", 1:6, ".")
d <- hier_compute(inp = yae_ll, root = "Total", method = "list") 
## Argument 'dim_spec' is ignored when constructing a hierarchy from a nested list.
hier_display(d)
## Total
## ├─1.
## │ ├─1.1.
## │ │ ├─1.1.1.
## │ │ └─1.1.2.
## │ ├─1.2.
## │ │ ├─1.2.1.
## │ │ ├─1.2.2.
## │ │ ├─1.2.3.
## │ │ ├─1.2.4.
## │ │ └─1.2.5.
## │ ├─1.3.
## │ │ ├─1.3.1.
## │ │ ├─1.3.2.
## │ │ ├─1.3.3.
## │ │ ├─1.3.4.
## │ │ └─1.3.5.
## │ ├─1.4.
## │ │ ├─1.4.1.
## │ │ ├─1.4.2.
## │ │ ├─1.4.3.
## │ │ ├─1.4.4.
## │ │ ├─1.4.5.
## │ │ └─1.4.6.
## │ ├─1.5.
## │ ├─1.6.
## │ ├─1.7.
## │ ├─1.8.
## │ └─1.9.
## ├─2.
## └─3.
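The named list above corresponds to a simple parent/child table: element names are parents, element values are their children. In base R, utils::stack() produces such a table. The following sketch uses a shortened version of yae_ll and is only an illustration, not what hier_compute() does internally.

```r
# Shortened version of yae_ll: names are parents, values are children.
yae_ll_short <- list(
  "Total" = c("1.", "2.", "3."),
  "1."    = paste0("1.", 1:9, "."),
  "1.1."  = paste0("1.1.", 1:2, ".")
)

# utils::stack() returns one row per code: `values` holds the code,
# `ind` (a factor) the name of the list element it came from.
st <- utils::stack(yae_ll_short)
edges <- data.frame(parent = as.character(st$ind),
                    node   = as.character(st$values),
                    stringsAsFactors = FALSE)
head(edges, 4)
```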

Interactively create or modify hierarchies

The package also contains a shiny-based interactive app that can be started using hier_app(). The app accepts as input either a character vector (to be converted into a hierarchy) or an existing hierarchy. Given the hierarchy d previously generated using hier_compute(), it can be started as follows:

d <- hier_app(d)

If a character vector is passed to hier_app(), the interface allows specifying the arguments for hier_compute(). Once a hierarchy is created, the interface changes and the tree can be modified dynamically by dragging nodes around. Furthermore, nodes can be added, removed or renamed. The code required to construct the current hierarchy is displayed and can be saved to disk. There is also functionality to undo the last step and to export results either to the R session or to a file. This is especially helpful for creating, for example, an hrc-file as input for τ-argus. Please note that hier_app() can return the modified hierarchy and not only save results to disk. In order to continue working with it, assign the result to an object as shown in the code above.

Summary

If you have any suggestions or improvements, please feel free to file an issue at our issue tracker or to contribute to the package by filing a pull request against the master branch.