In this vignette, you can see what a codebook generated from a dataset with rich metadata looks like. This dataset includes mock data for a short German Big Five personality inventory and an age variable. The dataset follows the format created when importing data from However, data imported using the haven package uses similar metadata. You can also add such metadata yourself, or use the codebook package for unannotated datasets.

As you can see below, the codebook package automatically computes reliabilities for multi-item inventories, generates nicely labelled plots and outputs summary statistics. The same information is also stored in a table, which you can export to various formats. Additionally, codebook can show you different kinds of (labelled) missing values, and show you common missingness patterns. As you cannot see, but search engines will, the codebook package also generates JSON-LD metadata for the dataset. If you share your codebook as an HTML file online, this metadata should make it easier for others to find your data. See what Google sees here.

knit_by_pkgdown <- !is.null(knitr::opts_chunk$get("fig.retina"))
knitr::opts_chunk$set(warning = TRUE, message = TRUE, error = FALSE)
pander::panderOptions("table.split.table", Inf)

data("bfi", package = 'codebook')
if (!knit_by_pkgdown) {
    bfi <- bfi %>% select(-starts_with("BFIK_extra"),
bfi$age <- rpois(nrow(bfi), 30)
var_label(bfi$age) <- "Alter"

By default, we only set the required metadata attributes name and description to sensible values. However, there is a number of attributes you can set to describe the data better. Find out more.

metadata(bfi)$name <- "MOCK Big Five Inventory dataset (German metadata demo)"
metadata(bfi)$description <- "a small mock Big Five Inventory dataset"
metadata(bfi)$identifier <- "doi:10.5281/zenodo.1326520"
metadata(bfi)$datePublished <- "2016-06-01"
metadata(bfi)$creator <- list(
      "@type" = "Person",
      givenName = "Ruben", familyName = "Arslan",
      email = "", 
      affiliation = list("@type" = "Organization",
        name = "MPI Human Development, Berlin"))
metadata(bfi)$citation <- "Arslan (2016). Mock BFI data."
metadata(bfi)$url <- ""
metadata(bfi)$temporalCoverage <- "2016" 
metadata(bfi)$spatialCoverage <- "Goettingen, Germany" 
# We don't want to look at the code in the codebook.
knitr::opts_chunk$set(warning = TRUE, message = TRUE, echo = FALSE)



Dataset name: MOCK Big Five Inventory dataset (German metadata demo)

a small mock Big Five Inventory dataset

Metadata for search engines

  • keywords: session, created, modified, ended, expired, BFIK_agree_4R, BFIK_agree_1R, BFIK_neuro_2R, BFIK_agree_3R, BFIK_neuro_3, BFIK_neuro_4, BFIK_agree_2, BFIK_agree, BFIK_neuro and age

Survey overview

28 completed rows, 28 who entered any information, 0 only viewed the first page. There are 0 expired rows (people who did not finish filling out in the requested time frame). In total, there are 28 rows including unfinished and expired rows.

There were 28 unique participants, of which 28 finished filling out at least one survey.

This survey was not repeated.

The first session started on 2016-07-08 09:54:16, the last session on 2016-11-02 21:19:50.

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

People took on average 127.36 minutes (median 1.48) to answer the survey.

## Warning: Durations below 0 detected.
## Warning: Removed 4 rows containing non-finite values (stat_bin).
## Warning: Removed 2 rows containing missing values (geom_bar).


Scale: BFIK_agree


Reliability: ωordinal [95% CI] = 0.61 [0.37;0.84].

Missing: 0.