rgeospatialquality together with rgbif
The Geospatial Quality API (GQ API) is a REST API created to provide access to a set of basic geospatial assessment functions over sets of primary biodiversity records. This package, rgeospatialquality, is built as a wrapper for the GQ API. It provides native access to the methods of the API and lets you use its functions from within an R environment.
In this document, I will show how this package can be used together with rOpenSci’s rgbif to easily apply quality assessment functions to data downloaded through its methods.
rgbif package
Since version 0.9.2, the rgbif package offers a new function called occ_data. According to the changelog:
(…) its primary purpose to perform faster data requests. Whereas occ_search() gives you lots of data, including taxonomic hierarchies and media records, occ_data() only gives occurrence data.
This is a perfect function to show how to build synergies between both packages. We will use the occ_data function to download a set of records using any of the available filters and will pass the data to the add_flags function to directly assess the quality of the records.
First, we need to download some records from GBIF with occ_data:
if (requireNamespace("rgbif", quietly = TRUE)) {
  library(rgbif)
  # Download 50 occurrence records with the full set of fields
  d <- occ_data(
    scientificName = "Apis mellifera",
    limit = 50,
    minimal = FALSE
  )
}
We will extract just 50 records for the bee species Apis mellifera. The default value for limit is 500, but for the purpose of this example, we will stick to a smaller number of records. minimal=FALSE allows us to get the full set of fields for each record and not only the three “basic” ones (see the occ_data documentation for more info).
This function returns a list with 2 elements, meta and data. We will operate with the records themselves, which can be found in the data element:
if (requireNamespace("rgbif", quietly = TRUE)) {
  # Keep only the records and inspect their structure
  d <- d$data
  str(d)
}
Both GBIF and the GQ API use Darwin Core (DwC) as the standard for biodiversity data exchange. This standard suggests certain specific names and formats for data values. In particular, the DwC suggests the field names decimalLatitude and decimalLongitude for the coordinates, countryCode for the country, and scientificName for the name of the species (genus + specificEpithet).
The data frame we obtained in the previous step is already formatted according to the DwC standard:
if (requireNamespace("rgbif", quietly = TRUE)) {
  # Check all four DwC field names in one call; inside an if block
  # only the value of the last expression is printed
  c("decimalLatitude", "decimalLongitude",
    "countryCode", "scientificName") %in% names(d)
}
Therefore, we don’t need any further transformation of the data frame, and we can proceed to assess the geospatial quality of the records.
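When the records come from a source other than GBIF, the column names may not match. The following is a minimal sketch of how such a check could look. The helper missing_dwc_fields is hypothetical (it is not part of rgbif or rgeospatialquality), and the sample data frame is made up for illustration.

```r
# Hypothetical helper, not part of rgbif or rgeospatialquality:
# returns the DwC field names that are absent from a data frame.
missing_dwc_fields <- function(df,
                               required = c("decimalLatitude",
                                            "decimalLongitude",
                                            "countryCode",
                                            "scientificName")) {
  setdiff(required, names(df))
}

# Made-up record with column names that do not follow the DwC standard
raw <- data.frame(
  lat = 42.1,
  lon = -1.6,
  country = "ES",
  name = "Apis mellifera",
  stringsAsFactors = FALSE
)
before <- missing_dwc_fields(raw)  # all four names are missing here

# Rename the columns to their DwC equivalents and check again
names(raw) <- c("decimalLatitude", "decimalLongitude",
                "countryCode", "scientificName")
after <- missing_dwc_fields(raw)   # nothing is missing any more
```

An empty result means the data frame already carries the four field names checked above and can be used without renaming.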
We will use the add_flags function to assess the quality of a set of several records at once. This function is a wrapper for the POST method of the GQ API.
Internally, the function transforms the content of the supplied data.frame to JSON and performs the POST request. It then translates the results back from JSON to a new data.frame. The resulting object has the same structure as the provided one, with the addition of a new list-type element called flags. Inside that element, there are several sub-fields, each one with the result of a particular check. Please see the GQ API documentation for more information on the functioning of the API.
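To make the round trip more concrete, here is a rough sketch of the data.frame to JSON to data.frame conversion, assuming the jsonlite package is available. It is not the actual implementation of add_flags: no request is sent, the response is simulated by hand, and the flag name exampleCheck is invented for illustration.

```r
if (requireNamespace("jsonlite", quietly = TRUE)) {
  # Two made-up records using DwC field names
  records <- data.frame(
    decimalLatitude = c(42.1, -3.5),
    decimalLongitude = c(-1.6, 12.0),
    countryCode = c("ES", "ES"),
    scientificName = c("Apis mellifera", "Apis mellifera"),
    stringsAsFactors = FALSE
  )

  # Step 1: data.frame -> JSON, one JSON object per record
  body <- jsonlite::toJSON(records, dataframe = "rows")

  # Step 2 (simulated): instead of a POST request, attach an invented
  # flags object to each record and serialize it as a fake response
  parsed <- jsonlite::fromJSON(body, simplifyVector = FALSE)
  parsed <- lapply(parsed, function(rec) {
    rec$flags <- list(exampleCheck = TRUE)  # hypothetical flag name
    rec
  })
  response <- jsonlite::toJSON(parsed, auto_unbox = TRUE)

  # Step 3: JSON -> data.frame; the nested flags objects become a
  # nested element with one sub-field per check
  flagged <- jsonlite::fromJSON(response)
  str(flagged)
}
```

The original columns survive the round trip unchanged, and the simulated checks end up grouped under a single flags element.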
if (requireNamespace("rgbif", quietly = TRUE)) {
  library(rgeospatialquality)  # provides add_flags
  dd <- add_flags(d)
  str(dd)
  dd[1, ]$flags  # Flags for the first record
}
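Once the flags are in place, they can be used to filter the records. The sketch below works on a made-up data frame rather than on real API output. It assumes that flags is stored as a nested data frame, which may differ from what add_flags actually returns, and the flag name exampleCheck is hypothetical.

```r
# Made-up records standing in for the output of a flagging step
mock <- data.frame(
  scientificName = rep("Apis mellifera", 3),
  countryCode = c("ES", "FR", "ES"),
  stringsAsFactors = FALSE
)

# Assumption: one nested data frame holding one column per check
mock$flags <- data.frame(exampleCheck = c(TRUE, FALSE, TRUE))

# Keep only the records that pass the (hypothetical) check
passed <- mock[mock$flags$exampleCheck, ]
nrow(passed)
```

With real data, the same pattern would use one of the sub-fields shown by str(dd) in place of exampleCheck.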