The brinton library is designed to facilitate exploratory data analysis following the visual information-seeking mantra: “Overview first, zoom and filter, then details on demand.” It assists the user during these three phases through three functions: wideplot(), longplot() and plotup(). While each function has its own arguments and purpose, all three support exploratory data analysis and the selection of a suitable graphic.
The library can be installed from the Comprehensive R Archive Network (CRAN) using the R console. When the library is loaded into memory, it prints a startup message that pays homage to Henry D. Hubbard’s enthusiastic introduction to Willard Cope Brinton’s 1939 book Graphic Presentation:
## Loading required package: ggplot2
## Loading required package: gridExtra
## Loading required package: rmarkdown
## M a G i C i N G R a P H S
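For reference, a minimal sketch of the installation and loading step described above. It assumes a working internet connection; the CRAN mirror URL is a choice made here, not one named in the text.

```r
# Install brinton from CRAN only if it is not already available.
# The mirror URL is an assumption; any CRAN mirror works.
if (!requireNamespace("brinton", quietly = TRUE)) {
  install.packages("brinton", repos = "https://cloud.r-project.org")
}

# Attaching the package prints the startup message shown above.
library(brinton)
```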
The wideplot() function allows the user to explore a dataset as a whole using a grid of graphics in which each variable is represented through multiple graphics. Once we have explored the dataset as a whole, the longplot() function allows us to explore other graphics for a given variable. This function also presents a grid of graphics, but instead of showing a selection of graphics for each variable, it presents the full array of graphics available in the library to represent a single variable. Once we have narrowed in on a certain graphic, we can use the plotup() function, which presents the values of a variable on a single graphic. We can access the code of the resulting graph and adapt it as needed. These three functions expand the graphic types that are presented automatically by the autoGEDA (automated graphical exploratory data analysis) libraries in the R environment.
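The three-step workflow can be sketched as follows. This is an illustrative sequence, not taken from the text: the choice of the built-in esoph dataset and its ncases variable is arbitrary, and the argument names for longplot() and plotup() (vars, diagram) and the diagram name 'line graph' are written from memory of the package documentation, so they should be checked against ?longplot and ?plotup. The calls are guarded so that they only run in an interactive session with brinton installed.

```r
# Variable to drill into after the overview; an arbitrary choice for illustration.
target_var <- "ncases"
stopifnot(target_var %in% names(esoph))

if (interactive() && requireNamespace("brinton", quietly = TRUE)) {
  library(brinton)

  # 1. Overview first: a grid with one row per variable.
  wideplot(data = esoph)

  # 2. Zoom and filter: the available graphics for one variable.
  longplot(data = esoph, vars = target_var)

  # 3. Details on demand: a single graphic for that variable.
  #    The diagram name is an assumption; see ?plotup for the valid names.
  plotup(data = esoph, vars = target_var, diagram = "line graph")
}
```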
The wideplot() function returns a graphical summary of the variables in the dataset to which it is applied. First, it groups the variables by type in the following sequence: logical, ordered, factor, character, datetime and numeric. Next, it creates a multipanel graphic in HTML format, in which each row of the grid represents one variable of the dataset and each column displays a different graphic for that variable. We call the resulting graphic type a wideplot because it shows an array of graphics for all of the columns of the dataset. The structure of the function, the arguments it accepts and their default values are as follows:
wideplot(data, dataclass = NULL, logical = NULL, ordered = NULL, factor = NULL, character = NULL, datetime = NULL, numeric = NULL, group = NULL, ncol = 7, label = 'FALSE')
The only argument necessary to obtain a result is data, which expects a data-frame class object. The ncol argument sets how many columns of the grid are shown, counting from the first, and accepts values between 3 and 7. The fewer columns displayed, the larger the resulting graphics, a feature that is especially useful if the scale labels dwarf the graphics area. The label argument adds to the grid a vector below each group of rows, according to the variable type, with the names and order of the graphics. The arguments logical, ordered, factor, character, datetime and numeric make it possible to choose which graphics appear in the grid, and in what order, for each variable type. Finally, group changes the selection of graphics that are shown by default according to the criteria of Table 1.
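As a hedged illustration of the arguments just described, the calls below vary only ncol and label, using the built-in esoph dataset (an arbitrary choice here). Passing label = TRUE as a logical is an assumption, given that the signature shows the default as the string 'FALSE'. The graphic-selection arguments and group are left at their defaults because their valid values are not listed in this section.

```r
# Number of grid columns to keep; the text gives the valid range as 3 to 7.
ncol_choice <- 3
stopifnot(ncol_choice >= 3, ncol_choice <= 7)

# Guarded so the calls only run interactively with brinton installed.
if (interactive() && requireNamespace("brinton", quietly = TRUE)) {
  library(brinton)

  # Default grid: ncol = 7 and no labels, as in the signature above.
  wideplot(data = esoph)

  # Narrower grid with larger panels, plus the names of the graphics
  # below each group of rows.
  wideplot(data = esoph, ncol = ncol_choice, label = TRUE)
}
```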
The wideplot() function takes inspiration from the str() function, but instead of describing the dataset in textual or tabular form, it does so graphically. We can easily compare the results of these two functions using, for example, the esoph dataset from a case-control study of esophageal cancer in Ille-et-Vilaine, France. The dataset has three ordered factor-type variables and two numerical variables:
## 'data.frame': 88 obs. of 5 variables:
## $ agegp : Ord.factor w/ 6 levels "25-34"<"35-44"<..: 1 1 1 1 1 1 1 1 1 1 ...
## $ alcgp : Ord.factor w/ 4 levels "0-39g/day"<"40-79"<..: 1 1 1 1 2 2 2 2 3 3 ...
## $ tobgp : Ord.factor w/ 4 levels "0-9g/day"<"10-19"<..: 1 2 3 4 1 2 3 4 1 2 ...
## $ ncases : num 0 0 0 0 0 0 0 0 0 0 ...
## $ ncontrols: num 40 10 6 5 27 7 4 7 2 1 ...
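The output above has the form produced by str(esoph) in base R. The sketch below also shows one way the column classes that str() reports could be mapped to the six variable types named in the wideplot() signature. The helper column_group() is hypothetical and written for illustration; it is not brinton's internal code, and treating Date and POSIXt columns as "datetime" is an assumption.

```r
# Textual description of the dataset, for comparison with the graphical one.
str(esoph)

# The six type groups, in the order of the wideplot() arguments.
type_order <- c("logical", "ordered", "factor", "character", "datetime", "numeric")

# Hypothetical helper (not part of brinton): classify one column.
# is.ordered() is tested before is.factor() because ordered factors are factors too.
column_group <- function(x) {
  if (is.logical(x)) {
    "logical"
  } else if (is.ordered(x)) {
    "ordered"
  } else if (is.factor(x)) {
    "factor"
  } else if (is.character(x)) {
    "character"
  } else if (inherits(x, c("Date", "POSIXt"))) {
    "datetime"
  } else if (is.numeric(x)) {
    "numeric"
  } else {
    NA_character_
  }
}

# Classify every column, then sort the column names by type group.
groups <- vapply(esoph, column_group, character(1))
ordered_names <- names(groups)[order(match(groups, type_order))]

print(groups)
print(ordered_names)
```

For esoph the columns are already in group order (three ordered factors followed by two numeric variables), so the sorted names match the original column order.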