Main
Importing the data
Spreadsheet cells are imported with the xlsx_cells()
function, which returns a data frame of all the cells in all the requested sheets. By default, every sheet is imported, but we don’t have to worry about that in this case because there is only one sheet in the file.
Cell formatting isn’t required for this vignette, but if it were, it would be imported via xlsx_formats(path)
.
Importing one of the multiples
The small multiples each have exactly one ‘Fixed Price’ header, so begin by selecting one of those.
From that single cell, selct the three rows of the column headers of the first small multiple. The split()
function below separates each row from one another, wrapping them together in a list. They are separated so that they can individually be joined to the data cells later. Another way to do this is to select each row one by one, assigning them to different variables.
col_headers <-
fixed_price %>%
offset_N(cells, n = 1) %>% # Offset up one row to "IF NWPL Rocky Mountains"
extend_E(cells, 3) %>% # Extend to the right edge of the table
extend_S(cells, 2) %>% # Extend down to the third row of the headers
filter(!is_blank) %>% # Remove blanks
select(row, col, value = character) %>% # Prepare for joining to data cells
split(.$row) # Separate the row of headers into list elements
col_headers
## $`14`
## # A tibble: 1 x 3
## row col value
## <int> <int> <chr>
## 1 14 17 IF NWPL Rocky Mountains
##
## $`15`
## # A tibble: 2 x 3
## row col value
## <int> <int> <chr>
## 1 15 17 Fixed Price
## 2 15 19 Basis
##
## $`16`
## # A tibble: 4 x 3
## row col value
## <int> <int> <chr>
## 1 16 17 BID
## 2 16 18 OFFER
## 3 16 19 BID
## 4 16 20 OFFER
Now select the data cells, starting from the ‘Fixed Price’ header again.
datacells <-
fixed_price %>%
offset_S(cells, n = 2) %>%
extend_E(cells, 4) %>%
extend_S(cells, # Extend down to a blank row
boundary = ~ is_blank, # The formula detects blank cells
edge = TRUE) %>% # Require the whole row to be blank
filter(!is_blank) %>% # Remove remaining blanks
mutate(value = as.double(numeric)) %>%# Convert the values to double
select(row, col, value) # Prepare for joining to headers
print(datacells, n = Inf)
## # A tibble: 24 x 3
## row col value
## <int> <int> <dbl>
## 1 18 17 2.06
## 2 18 18 2.08
## 3 19 17 2.40
## 4 19 18 2.42
## 5 19 19 -0.565
## 6 19 20 -0.545
## 7 20 17 2.59
## 8 20 18 2.61
## 9 20 19 -0.494
## 10 20 20 -0.474
## 11 21 17 2.58
## 12 21 18 2.60
## 13 21 19 -0.585
## 14 21 20 -0.565
## 15 22 17 3.36
## 16 22 18 3.38
## 17 22 19 -0.295
## 18 22 20 -0.275
## 19 23 17 2.63
## 20 23 18 2.65
## 21 23 19 -0.530
## 22 23 20 -0.510
## 23 17 18 1.91
## 24 17 17 1.89
Finally, bind the data cells to the column headers (this is the real magic). For more examples of how the compass directions work (the NNW()
and N()
functions below), see the vignette called Compass Directions
.
datacells %>%
NNW(col_headers[[1]]) %>% # This header isn't in every column
NNW(col_headers[[2]]) %>% # Nor is this header
N(col_headers[[3]]) # But this one is
## # A tibble: 24 x 6
## row col value.data i.value i.value.1 value.header
## <int> <int> <chr> <chr> <dbl> <chr>
## 1 18 17 Fixed Price IF NWPL Rocky Mountains 2.06 BID
## 2 18 18 Fixed Price IF NWPL Rocky Mountains 2.08 OFFER
## 3 19 17 Fixed Price IF NWPL Rocky Mountains 2.40 BID
## 4 19 18 Fixed Price IF NWPL Rocky Mountains 2.42 OFFER
## 5 19 19 Basis IF NWPL Rocky Mountains -0.565 BID
## 6 19 20 Basis IF NWPL Rocky Mountains -0.545 OFFER
## 7 20 17 Fixed Price IF NWPL Rocky Mountains 2.59 BID
## 8 20 18 Fixed Price IF NWPL Rocky Mountains 2.61 OFFER
## 9 20 19 Basis IF NWPL Rocky Mountains -0.494 BID
## 10 20 20 Basis IF NWPL Rocky Mountains -0.474 OFFER
## # ... with 14 more rows
Importing every small multiple at once
The code above, for a single multiple, can easily be adapted to import every one of the small multiples. Here this is done using the purrr()
package to apply the code to each element of a list of ‘Fixed Price’ header cells.
Get all ten ‘Fixed Price’ headers, and separate each into its own list element. Since each cell has a unique combination of row
and col
, that combination can be used to separate the cells into list elements.
Adapt the code for a single multiple a ‘tidy’ function to tidy a general small multiple, starting from the ‘Fixed Price’ header. Here this is done by substituting x
for fixed_price
, and wrapping the three sections of code in a function. Everything else is the same.
tidy <- function(x) {
col_headers <-
x %>%
offset_N(cells, n = 1) %>%
extend_E(cells, 3) %>%
extend_S(cells, 2) %>%
filter(!is_blank) %>%
select(row, col, value = character) %>%
split(.$row)
datacells <-
x %>%
offset_S(cells, n = 2) %>%
extend_E(cells, 4) %>%
extend_S(cells,
boundary = ~ is.na(numeric),
edge = TRUE) %>%
filter(!is.na(numeric)) %>%
select(row, col, value = numeric)
datacells %>%
NNW(col_headers[[1]]) %>%
NNW(col_headers[[2]]) %>%
N(col_headers[[3]])
}
Finally, map the ‘tidy’ function to each ‘Fixed Price’ header, and bind the results. into one data frame. The purrr()
package is used here, but this could also be done with the apply()
family of functions.
map_df(fixed_price, tidy) %>%
arrange(col, row) # See that, from row 39, the region changes, as it ought.
## # A tibble: 240 x 6
## row col value.data i.value i.value.1 value.header
## <int> <int> <chr> <chr> <dbl> <chr>
## 1 28 7 Fixed Price IF CIG Rocky Mountains 1.94 BID
## 2 29 7 Fixed Price IF CIG Rocky Mountains 1.96 BID
## 3 30 7 Fixed Price IF CIG Rocky Mountains 2.35 BID
## 4 31 7 Fixed Price IF CIG Rocky Mountains 2.55 BID
## 5 32 7 Fixed Price IF CIG Rocky Mountains 2.47 BID
## 6 33 7 Fixed Price IF CIG Rocky Mountains 3.31 BID
## 7 34 7 Fixed Price IF CIG Rocky Mountains 2.55 BID
## 8 39 7 Fixed Price AECO / NIT 2.38 BID
## 9 40 7 Fixed Price AECO / NIT 2.40 BID
## 10 41 7 Fixed Price AECO / NIT 2.55 BID
## # ... with 230 more rows
Joining the row headers
So far, only the column headers have been joined, but there are also row headers on the left-hand side of the spreadsheet. The following code incorporates these into the final dataset.
row_headers <-
cells %>%
filter(between(row, 17, 56), between(col, 2, 4), !is_blank) %>%
# Concatenate rows like "Dec-01", "to", "Mar-02"
mutate(character = ifelse(!is.na(character),
character,
format(date, origin="1899-12-30", "%b-%y"))) %>%
select(row, col, character) %>%
nest(-row) %>%
mutate(data = map(data, ~ paste(.x$character, collapse = " "))) %>%
unnest() %>%
mutate(col = 2L) %>%
select(row, col, value = data)
Since the single column of row headers applies to every row of every small multiple (unlike the column headers), a global row_headers
variable can be joined to each small multiple by using a simple W() join. This is incorporated into the definition of tidy()
below (see the bottom line).
tidy <- function(x) {
col_headers <-
x %>%
offset_N(cells, n = 1) %>%
extend_E(cells, 3) %>%
extend_S(cells, 2) %>%
filter(!is_blank) %>%
select(row, col, value = character) %>%
split(.$row)
datacells <-
x %>%
offset_S(cells, n = 2) %>%
extend_E(cells, 4) %>%
extend_S(cells,
boundary = ~ is.na(numeric),
edge = TRUE) %>%
filter(!is_blank) %>%
mutate(value = numeric) %>%
select(row, col, value)
datacells %>%
NNW(col_headers[[1]]) %>%
NNW(col_headers[[2]]) %>%
N(col_headers[[3]]) %>%
W(row_headers) # This is the only new line
}
map_df(fixed_price, tidy) %>%
arrange(col, row) # See that, from row 39, the context loops, as it ought.
## # A tibble: 240 x 7
## row col value.data i.value i.value.1 value.header value
## <int> <int> <chr> <chr> <dbl> <chr> <chr>
## 1 28 7 Fixed Price IF CIG Rocky … 1.94 BID Cash
## 2 29 7 Fixed Price IF CIG Rocky … 1.96 BID ROM
## 3 30 7 Fixed Price IF CIG Rocky … 2.35 BID Dec-01
## 4 31 7 Fixed Price IF CIG Rocky … 2.55 BID Dec-01 t…
## 5 32 7 Fixed Price IF CIG Rocky … 2.47 BID Apr-02 t…
## 6 33 7 Fixed Price IF CIG Rocky … 3.31 BID Nov-02 t…
## 7 34 7 Fixed Price IF CIG Rocky … 2.55 BID One Year…
## 8 39 7 Fixed Price AECO / NIT 2.38 BID Cash
## 9 40 7 Fixed Price AECO / NIT 2.40 BID ROM
## 10 41 7 Fixed Price AECO / NIT 2.55 BID Dec-01
## # ... with 230 more rows