License: MIT

checkr

checkr is an R package to check the dimensions, classes, values, and names of scalars, vectors, lists and data frames. The various functions provide informative errors (or warnings) that allow users to quickly identify and fix any problems.

The following code demonstrates its use.

library(checkr)

# the starwars data frame in the dplyr package fails many of these checks
check_data(dplyr::starwars, values = list(
  height = c(66L, 264L),
  name = "",
  mass = c(20, 1358, NA),
  hair_color = c("blond", "brown", "black", NA),
  gender = c("male", "female", "hermaphrodite", "none", NA)),
  order = TRUE, nrow = c(81, 84), key = "hair_color", error = FALSE)
#> Warning: dplyr::starwars column names must include 'height', 'name',
#> 'mass', 'hair_color' and 'gender' in that order
#> Warning: column height of dplyr::starwars must not include missing values
#> Warning: the values in column mass of dplyr::starwars must lie between 20
#> and 1358
#> Warning: column hair_color of dplyr::starwars can only include values
#> 'black', 'blond' or 'brown'
#> Warning: dplyr::starwars must not have more than 84 rows
#> Warning: column 'hair_color' in dplyr::starwars must be a unique key

The two other main functions are check_vector() and check_list().

y <- c(2, 1, 0, 1, NA)
check_vector(y, values = 1:10, length = 2, unique = TRUE, sorted = TRUE,
             named = TRUE, error = FALSE)
#> Warning: y must be class integer
#> Warning: y must not include missing values
#> Warning: y has unpermitted values 0
#> Warning: y must have 2 elements
#> Warning: y must be unique
#> Warning: y must be sorted
#> Warning: y must be named

Values

The values argument can be used to check the values of a vector, an element of a list or a column of a data frame.

Class

To check the class, simply pass an object of the desired class.

check_vector(y, values = numeric(0))
check_vector(y, values = integer(0))
#> Error: y must be class integer

Missing Values

To check that a vector does not include missing values, pass a single non-missing value (of the correct class).

check_vector(y, 1)
#> Error: y must not include missing values

To allow missing values, include a missing value in the vector that is passed.

check_vector(y, c(1, NA))

To check that it includes only missing values, pass only a missing value (of the correct class).

check_vector(y, NA_real_)
#> Error: y must only include missing values

Range

To check the range of a vector, pass two non-missing values (as well as the missing value if required). As the second example shows, the two values need not be given in increasing order.

check_vector(y, c(0, 2, NA))
check_vector(y, c(-1, -10, NA))
#> Error: the values in y must lie between -10 and -1

Specific Values

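To check that the vector includes only specific values, pass three or more non-missing values. A value can be repeated, as in the second example, to permit just two distinct values without the pair being read as a range.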

check_vector(y, c(0, 1, 2, NA))
check_vector(y, c(1, 1, 2, NA))
#> Error: y can only include values 1 or 2
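
The conventions in this section can be summarised with a small base R sketch. The function describe_values_spec() is hypothetical and is not part of checkr; it only restates the rules described above (class from the object, missing values allowed when an NA is present, two non-missing values read as a range, three or more read as a permitted set) and makes no claim about how checkr implements them.

```r
# Hypothetical helper, NOT part of checkr: reports which checks a
# `values` vector would request under the conventions described above.
describe_values_spec <- function(values) {
  stopifnot(is.atomic(values))
  non_missing <- values[!is.na(values)]
  n <- length(non_missing)

  if (length(values) > 0L && n == 0L) {
    # Only missing values were passed
    value_check <- "only missing values"
  } else if (n == 2L) {
    # Two non-missing values are read as a range, in either order
    value_check <- paste("range", min(non_missing), "to", max(non_missing))
  } else if (n >= 3L) {
    # Three or more non-missing values are read as a permitted set
    value_check <- paste("one of",
                         paste(sort(unique(non_missing)), collapse = ", "))
  } else {
    # Zero or one non-missing value: no check on the values themselves
    value_check <- "none"
  }

  list(
    class = class(values)[1],
    missing_allowed = anyNA(values),
    value_check = value_check
  )
}

describe_values_spec(integer(0))$class
describe_values_spec(c(1, NA))$missing_allowed
describe_values_spec(c(-1, -10, NA))$value_check
describe_values_spec(c(1, 1, 2, NA))$value_check
```

The last two calls correspond to the range and specific-values examples above: the first is read as a range from -10 to -1, the second as the set 1, 2.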

Naming Objects

By default, the name of an object is determined from the function call.

check_vector(list(x = 1))
#> Error: list(x = 1) must be an atomic vector

This simplifies things but results in less informative error messages when used in a pipe.

library(magrittr)
y %>% check_list()
#> Error: . must be a list

The argument x_name can be used to override the name.

y %>% check_list(x_name = "y")
#> Error: y must be a list
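
For readers curious how a name can be taken from the function call, the following is a minimal base R sketch using deparse(substitute(x)), a common idiom for this purpose. The function check_is_list() is hypothetical and is not checkr's code; it is only meant to illustrate why the default name reflects whatever expression was written in the call, and how an explicit x_name overrides it.

```r
# Hypothetical checker, NOT part of checkr: the default name is the
# expression the caller wrote for `x`, captured before `x` is used.
check_is_list <- function(x, x_name = deparse(substitute(x))) {
  force(x_name)
  if (!is.list(x)) {
    stop(x_name, " must be a list", call. = FALSE)
  }
  invisible(x)
}

y <- c(2, 1, 0, 1, NA)

# Name taken from the call
msg_default <- tryCatch(check_is_list(y), error = conditionMessage)

# Name supplied explicitly
msg_named <- tryCatch(check_is_list(y, x_name = "my vector"),
                      error = conditionMessage)

msg_default
msg_named
```

Because the default is derived from the expression in the call, a pipe that passes its left-hand side under a placeholder name would hand that placeholder to the checker, which is consistent with the "." seen in the piped example above.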

Inspiration

datacheckr

Contribution

Please report any issues.

Pull requests are always welcome.

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.