Spbabel provides simple tools to flip between Spatial and tidy forms of data. This package aims to assist in the ongoing development of tools for spatial data in R.
NOTE: this is largely deprecated by new code in https://github.com/hypertidy/silicate.
Spatial data in the sp
package have a formal definition (extending class Spatial
) that is modelled on shapefiles, and close at least in spirit to the Simple Features definition. See What is Spatial in R? for more details.
Turning these data into long-form tables has been likened to making fish soup, which has a nice nod to the universal translator babelfish.
The spbabel
package tries to help by providing a more systematic encoding into the long-form with consistent naming and lossless ways to re-compose the original (or somewhat modified) objects.
There are two main forms.
The two-table version is similar to that implemented in:
as()
coercion for SpatialLinesDataFrame
to SpatialPointsDataFrame
geom()
fortify()
The four-table form is in development across a number of projects. It is straightforward to work with but there aren’t any high-level tools in package form yet. (Though rbgm is probably the most advanced, it uses an early form that was absolutely necessary to read everything in those files before the more general approach here).
The long-form single table of all coordinates, with object, branch, island-status, and order provides a reasonable middle-ground for transferring between different representations of Spatial data. Tables are always based on the “tibble” since it’s a much better data frame.
The sptable
function creates the table of coordinates with identifiers for object and branch, which is understood by sptable<-
to “fortify” and sp
for the reverse.
The long-form table may seem like soup, but it’s not meant to be seen for normal use. It’s very easy to dump this to databases, or to ask spatial databases for this form. There are other more normalized multi-table approaches as well - this is just a powerful lowest common denominator.
We can tidy this up by encoding the geometry data into a geometry-column, into nested data frames, or by normalizing to tables that store only one kind of data, or with recursive data structures such as lists of matrices. Each of these has strengths and weaknesses. Ultimately I want this to evolve into a fully-fledged set of tools for representing spatial/topological data in R, but still by leveraging existing code whereever possible.
The sptable
function decomposes a Spatial object to a single table structured as a row for every coordinate in all the sub-geometries, including duplicated coordinates that close polygonal rings, close lines and shared vertices between objects.
The sp
function re-composes a Spatial object from a table, it auto-detects the topology by the matching column names:
The sp
function could include overrides to avoid these tests but it’s so easy to modify a table to have the matches for the required topology it hardly seems worth it.
Use the sptable<-
replacement method to modify the underlying geometric attributes (here x
and y
is assumed no matter what coordinate system).
## class : SpatialPolygonsDataFrame
## features : 1
## extent : 112.9511, 159.1019, -54.74973, -10.05167 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0
## variables : 11
## # A tibble: 1 x 11
## FIPS ISO2 ISO3 UN NAME AREA POP2005 REGION SUBREGION LON LAT
## * <fct> <fct> <fct> <int> <fct> <int> <dbl> <int> <int> <dbl> <dbl>
## 1 AS AU AUS 36 Aust… 768230 2.03e7 9 53 136. -25.0
## long-form encoding of objects
oztab <- sptable(oz)
## make a copy to modify
woz <- oz
library(dplyr)
## modify the geometry on this object without separating the vertices from the objects
sptable(woz) <- sptable(woz) %>% mutate(x_ = x_ - 5)
# plot to compare
plot(oz, col = "grey")
plot(woz, add = TRUE)
We can also restructure objects, by mutating the value of object to be the same as “branch” we get individual objects from each.
pp <- sptable(wrld_simpl %>% subset(NAME == "Japan" | grepl("Korea", NAME)))
## explode (or "disaggregate"") objects to individual polygons
## here each branch becomes an object, and each object only has one branch
## (ignoring hole-vs-island status for now)
wone <- sp(pp %>% mutate(object_ = branch_), crs = attr(pp, "crs"))
op <- par(mfrow = c(2, 1), mar = rep(0, 4))
plot(sp(pp), col = grey(seq(0.3, 0.7, length = length(unique(pp$object)))))
## Warning: Unknown or uninitialised column: 'object'.
The long-form table is also ready for ggplot2:
library(ggplot2)
ggplot(pp) + aes(x = x_, y = y_, fill = factor(object_), group = branch_) + geom_polygon()
## resample the branch_ IDs to mix up the colours so that
## object disaggregation is more apparent
set.seed(10)
ggplot(pp) + aes(x = x_, y = y_, fill = factor(sample(branch_)), group = branch_, col = object_) + geom_polygon()
Please note that that geom_polygon
cannot handle islands with multiple holes, and it only can do one hole by pretending it is a closed path and hides the boundary so you don’t see the sleight of hand. See the package ‘ggpolypath’ for a ‘geom_polypath’ geom.
For the four-table for de-duplication is performed in “x”/“y” by default, but the technique is general and could be applied to a geometric space of any dimension. Duplicates are identified by coercing numeric values to character (after ‘base::duplicated’) and so is not robust to numeric issues. We haven’t found any problems yet.
An example of working with the four table form.
codes <- c("AUS", "SLB", "FJI", "FSM", "KIR",
"NCL", "NFK", "CCK", "CXR", "VUT", "NRU", "PNG",
"SGP", "TUV", "IDN", "TLS", "PLW", "MHL", "NZL")
mm <- map_table(wrld_simpl %>% subset(ISO3 %in% codes))
## cross the dateline properly
mm$v <- mm$v %>% mutate(x_ = ifelse(x_ < 0, x_ + 360, x_))
plot(mm$v[, c("x_", "y_")])
## get an sp object
## cascaded inner join and arrange within branch
ctable <- with(mm,
inner_join(v, bXv) %>% inner_join(b) %>%
arrange(branch_, order_) %>%
dplyr::select(-vertex_))
## Joining, by = "vertex_"
## Joining, by = "branch_"
## class : SpatialPolygonsDataFrame
## features : 19
## extent : 95.00803, 209.7808, -54.74973, 14.59805 (xmin, xmax, ymin, ymax)
## coord. ref. : NA
## variables : 12
## # A tibble: 19 x 12
## FIPS ISO2 ISO3 UN NAME AREA POP2005 REGION SUBREGION LON
## <fct> <fct> <fct> <int> <fct> <int> <dbl> <int> <int> <dbl>
## 1 AS AU AUS 36 Aust… 768230 2.03e7 9 53 136.
## 2 BP SB SLB 90 Solo… 2799 4.72e5 9 54 160.
## 3 FJ FJ FJI 242 Fiji 1827 8.28e5 9 54 178.
## 4 FM FM FSM 583 Micr… 70 1.10e5 9 57 158.
## 5 KR KI KIR 296 Kiri… 73 9.20e4 9 57 175.
## 6 NC NC NCL 540 New … 1828 2.34e5 9 54 165.
## 7 NF NF NFK 574 Norf… 0 0. 9 53 168.
## 8 CK CC CCK 166 Coco… 1 0. 0 0 96.8
## 9 KT CX CXR 162 Chri… 0 0. 0 0 106.
## 10 NH VU VUT 548 Vanu… 1219 2.15e5 9 54 167.
## 11 NR NR NRU 520 Nauru 2 1.01e4 9 57 167.
## 12 NZ NZ NZL 554 New … 26799 4.10e6 9 53 172.
## 13 PP PG PNG 598 Papu… 45286 6.07e6 9 54 143.
## 14 SN SG SGP 702 Sing… 67 4.33e6 142 35 104.
## 15 TV TV TUV 798 Tuva… 3 1.04e4 9 61 179.
## 16 ID ID IDN 360 Indo… 181157 2.26e8 142 35 114.
## 17 TT TL TLS 626 Timo… 1487 1.07e6 142 35 126.
## 18 PS PW PLW 585 Palau 0 2.01e4 9 57 135.
## 19 RM MH MHL 584 Mars… 0 5.67e3 9 57 169.
## # ... with 2 more variables: LAT <dbl>, object_1 <chr>
## get a gg plot
ggplot(ctable %>% inner_join(dplyr::select(mm$o, NAME, object_))) +
aes(x = x_, y = y_, group = branch_, fill = NAME) +
geom_polygon() + ggplot2::coord_equal()
## Joining, by = "object_"