Using rgeospatialquality together with rgbif

Javier Otegui



The Geospatial Quality API (GQ API) is a REST API that provides access to a set of basic geospatial assessment functions for sets of primary biodiversity records. This package, rgeospatialquality, is a wrapper for the GQ API. It provides native access to the methods of the API and lets users call its functions from within an R environment.

In this document, I will show how this package can be used together with rOpenSci’s rgbif to easily apply quality assessment functions to data downloaded through rgbif.

Getting occurrence data with rgbif package

Since version 0.9.2, the rgbif package has offered a new function called occ_data. According to the changelog:

(…) its primary purpose to perform faster data requests. Whereas occ_search() gives you lots of data, including taxonomic hierarchies and media records, occ_data() only gives occurrence data. (via)

This function is well suited to show how the two packages can work together. We will use occ_data to download a set of records, applying any of the available filters, and pass the data to the add_flags function to assess the quality of the records directly.

First, we need to download some records from GBIF with occ_data:

if(requireNamespace("rgbif", quietly = TRUE)){
    d <- rgbif::occ_data(
        scientificName="Apis mellifera",
        limit=50,
        minimal=FALSE)
}

We will retrieve just 50 records for the bee species Apis mellifera. The default value for limit is 500, but for this example we will stick to a smaller number of records. minimal=FALSE returns the full set of fields for each record, not only the three “basic” ones (see the occ_data documentation for more information).

This function returns a list with two elements, meta and data. We will work with the records themselves, which are stored in the data element:
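Before overwriting d, it can be worth checking that the data element actually exists and is not empty, so that a failed or empty query stops early instead of producing confusing errors later on. The sketch below assumes only that the result is a list with a data element, as described above. The helper name get_records and the hand-built res object are hypothetical and exist purely for illustration; res is not real GBIF output.

```r
# Hypothetical stand-in for the list returned by occ_data(); built by hand
# for illustration only, it is NOT real GBIF output. meta is left empty.
res <- list(
  meta = list(),
  data = data.frame(
    scientificName = c("Apis mellifera", "Apis mellifera"),
    decimalLatitude = c(40.4, 41.4),
    decimalLongitude = c(-3.7, 2.2),
    countryCode = c("ES", "ES"),
    stringsAsFactors = FALSE
  )
)

# Hypothetical helper: return the records, failing early if the 'data'
# element is missing, NULL, or has no rows
get_records <- function(res) {
  if (!is.list(res) || !("data" %in% names(res))) {
    stop("expected a list with a 'data' element")
  }
  records <- res$data
  if (is.null(records) || nrow(records) == 0) {
    stop("no records were returned")
  }
  records
}

d <- get_records(res)
nrow(d)  # number of records extracted
```

With real data, the same helper would simply be applied to the object returned by occ_data.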

if(requireNamespace("rgbif", quietly = TRUE)){
    d <- d$data
}

Data structure

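Both GBIF and the GQ API use Darwin Core (DwC) as the standard for biodiversity data exchange. This standard defines specific names and formats for data fields. In particular, DwC uses decimalLatitude and decimalLongitude for the coordinates, countryCode for the ISO 3166-1 alpha-2 country code, and scientificName for the name of the taxon.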

The data frame we obtained in the previous step is already formatted according to the DwC standard:

if(requireNamespace("rgbif", quietly = TRUE)){
    # Each element of the result should be TRUE
    c("decimalLatitude", "decimalLongitude",
      "countryCode", "scientificName") %in% names(d)
}

Therefore, we don’t need any further transformation of the data frame, and we can proceed to assess the geospatial quality of the records.
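Data from other sources will not always follow DwC naming. The sketch below shows one way to detect missing DwC fields and rename columns before sending records to the API. The helpers missing_dwc and to_dwc, and the column names species, lat, lon and iso2 in the example data frame, are hypothetical and only illustrate the idea; they are not part of rgeospatialquality or rgbif.

```r
# DwC terms used in this vignette
dwc_fields <- c("decimalLatitude", "decimalLongitude",
                "countryCode", "scientificName")

# Hypothetical helper: return the DwC terms absent from a data frame
missing_dwc <- function(df, fields = dwc_fields) {
  setdiff(fields, names(df))
}

# Hypothetical helper: rename columns using a named character vector whose
# names are the current column names and whose values are the DwC terms
to_dwc <- function(df, mapping) {
  hit <- names(df) %in% names(mapping)
  names(df)[hit] <- mapping[names(df)[hit]]
  df
}

# Hypothetical data set with non-DwC column names
raw <- data.frame(
  species = "Apis mellifera",
  lat = 40.4,
  lon = -3.7,
  iso2 = "ES",
  stringsAsFactors = FALSE
)

missing_dwc(raw)    # all four DwC terms are missing
fixed <- to_dwc(raw, c(species = "scientificName", lat = "decimalLatitude",
                       lon = "decimalLongitude", iso2 = "countryCode"))
missing_dwc(fixed)  # character(0): nothing is missing any more
```

Keeping the mapping as a plain named vector makes it easy to reuse the same renaming for every file coming from the same source.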

Sending the records to the GQ API

We will use the add_flags function to assess the quality of several records at once. This function is a wrapper for the POST method of the GQ API.

Internally, the function converts the content of the supplied data.frame to JSON and performs the POST request. It then translates the JSON response back into a new data.frame. The resulting object has the same structure as the input, with the addition of a new list-type element called flags. This element contains several sub-fields, each holding the result of a particular check. See the GQ API documentation for more information on how the API works.
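To make the data.frame to JSON round trip described above more concrete, the sketch below serializes a small, hand-built set of records with the jsonlite package and parses the result back. It is only an illustration of the general idea: it assumes jsonlite is installed, it does not send any HTTP request, and it is not the actual internal code of add_flags. The exact JSON structure the GQ API expects is described in the GQ API documentation.

```r
library(jsonlite)

# Hand-built example records (hypothetical, not real GBIF data)
records <- data.frame(
  decimalLatitude = c(40.4, 41.4),
  decimalLongitude = c(-3.7, 2.2),
  countryCode = c("ES", "ES"),
  scientificName = c("Apis mellifera", "Apis mellifera"),
  stringsAsFactors = FALSE
)

# Serialize row by row: one JSON object per record
payload <- toJSON(records, dataframe = "rows")
payload

# Parse the JSON back into a data frame, as a response would be
back <- fromJSON(payload)
str(back)
```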

if(requireNamespace("rgbif", quietly = TRUE)){
    library(rgeospatialquality)
    dd <- add_flags(d)
    dd[1,]$flags  # Flags for the first record
}