The Spatiotemporal Observation Annotation Tool (STOAT) is a tool for the automated and flexible environmental annotation of biodiversity datasets with gridded environmental data. rstoat provides a portal to initiate, check the progress of, and retrieve data from annotation requests. This introduction vignette will demonstrate the basics of data annotation, retrieval, and analysis, all in the R environment.
The rstoat package provides a wrapper to STOAT. To use rstoat, you first need to authenticate using a personal Map of Life (MOL) account. If you are using RStudio, you can leave the password argument blank to fill via pop-up.
NOTE: This vignette demonstrates code for an account preloaded with appropriate data (the sample species occurrence datasets below, which are additionally included for download using download_sample_data()). Your Map of Life account starts out empty and thus certain lines of code will not work right out of the box.
Occurrence data originate from GBIF, for the two species the powerful owl and the budgerigar, filtered to records from January 1, 2010 - May 21, 2020, limited to continental Australia. DOIs: https://doi.org/10.15468/dl.rx9t53 https://doi.org/10.15468/dl.48e2yv
Powerful owl data consisted of 8,821 records. Budgerigar data consisted of 18,947 records from which 9000 were sampled to match the powerful owl data. Data were reformatted for the Map of Life uploader, and provided here. Successfully annotated data are also provided, in the STOAT result directory format. Access sample data using the following function:
#download_sample_data("destination/path")
samples_directory <- download_sample_data()
#> Created directory: sample_data
#> Downloading powerful owl occurrence data.
#> Downloading budgerigar occurrence data.
#> Downloading powerful owl annotated results data.
#> Unzipping powerful owl
#> Downloading budgerigar annotated results data.
#> Unzipping budgerigar
#> Annotation available in directory: sample_data
list.files(samples_directory)
#> [1] "budgerigar_results" "budgerigar_vignette.csv" "powerful_owl_results" "powerful_owl_vignette.csv"
rstoat provides two different functions for annotation. start_annotation_batch() is the primary annotation function and requires the uploading of a biodiversity dataset to the user’s MOL account via: https://mol.org/upload/.
Alternatively, users can use start_annotation_simple() for small jobs or pilot jobs before a full annotation. start_annotation_simple() accepts occurrence records directly through the console as a dataframe, though it annotates fewer records than a full annotation job.
For both types of annotations, users must provide an environmental layer configuration code that represents the product, layer, spatial and temporal buffers for annotation. See https://mol.org/stoat/sources for a full list of available layers and an explanation of buffers. Multiple layers can be requested for annotation in a single job.
The environmental layer code takes the form: “product-variable-spatialbuffer-temporalbuffer”. Use get_products() to view the full list of product and variable codes, and to check that the requested spatial and temporal buffers are within the ranges allowed (for computational reasons). The units of the spatial and temporal buffers are meters, and days, respectively.
Please see ?start_annotation_simple() for definitions of the output columns
head(get_products())
#> variable product min_spatial max_spatial min_temporal max_temporal spatial_resolution temporal_resolution start_date end_date
#> 1 landcover esacci 300 5000 365 365 300 365 1992-01-01 2018-12-31
#> 2 elevation srtm 30 100 0 0 30 0 <NA> <NA>
#> 3 ndvi modis 250 5000 1 30 250 1 2000-02-24 <NA>
#> 4 evi modis 250 5000 1 30 250 1 2000-02-24 <NA>
#> 5 evi landsat7_rt 30 250 1 64 30 16 2017-05-02 <NA>
#> 6 evi landsat7_pre 30 250 1 64 30 16 1999-05-01 2017-04-21
powerful_owl_sample <- read.csv("sample_data/powerful_owl_vignette.csv")
head(powerful_owl_sample)
#> scientificName decimalLatitude decimalLongitude eventDate
#> 1 Ninox strenua -27.52915 152.9573 2020-03-03
#> 2 Ninox strenua -26.67722 153.1176 2020-04-28
#> 3 Ninox strenua -27.53015 153.2494 2020-05-03
#> 4 Ninox strenua -27.74489 152.9122 2020-05-05
#> 5 Ninox strenua -27.50000 153.0000 2020-03-03
#> 6 Ninox strenua -33.71084 151.0459 2020-04-25
simple_results <- start_annotation_simple(powerful_owl_sample[1:50,], "modis-lst_day-1000-1",
coords = c("decimalLongitude", "decimalLatitude"),
date = "eventDate")
head(simple_results)
#> event_id scientificName decimalLatitude decimalLongitude eventDate product variable s_buff t_buff value stdev valid_pixel_count
#> 1 1 Ninox strenua -27.52915 152.9573 2020-03-03 modis lst_day 1000 1 NA NA 0.000000
#> 2 2 Ninox strenua -26.67722 153.1176 2020-04-28 modis lst_day 1000 1 296.2425 0.15084945 2.164706
#> 3 3 Ninox strenua -27.53015 153.2494 2020-05-03 modis lst_day 1000 1 297.1021 0.08640988 3.564706
#> 4 4 Ninox strenua -27.74489 152.9122 2020-05-05 modis lst_day 1000 1 296.6948 0.51000000 3.545098
#> 5 5 Ninox strenua -27.50000 153.0000 2020-03-03 modis lst_day 1000 1 NA NA 0.000000
#> 6 6 Ninox strenua -33.71084 151.0459 2020-04-25 modis lst_day 1000 1 NA NA 0.000000
Proceeding with full annotation, once a dataset has been uploaded, the user can view their uploaded datasets using my_datasets(). On this account, we have loaded point occurrence datasets from two species, the Powerful Owl (Ninox strenua) and the Budgerigar (Melopsittacus undulatus). Both datasets are small subsets of the species’ GBIF data, over the range of the two species in Australia. my_datasets() will importantly return the dataset_id.
Now, a user can submit a custom annotation job using start_annotation_batch(). The user must submit the dataset_id to be annotated, a name for the job, and a environmental layer code as with the simple annotation.
In this example, we annotate the Powerful Owl dataset with MODIS Land Surface Temperature data with a 1000-meter spatial buffer radius, and two different temporal buffers: 1-day (i.e. only day-of data) and 30-day.
We also annotate the Budgerigar dataset with MODIS Land Surface Temperature data (1000-meter spatial buffer, 1-day temporal buffer) for a species comparison. STOAT supports the combination of multiple species into a single dataset and thus the annotation of multiple species concurrently, but here the species are separated into two jobs for clarity.
NOTE This code will not run successfully unless you have a dataset uploaded to your Map of Life account
get_products()
#start_annotation_batch("dataset_id", "Annotation Title", "Layer Code(s)")
# Here we retrive the UUID of the most recent datasets (the powerful owl and budgerigar), and start annotation on them
start_annotation_batch(dataset_list$dataset_id[1], "powerful_owl_vignette", c("modis-lst_day-1000-1", "modis-lst_day-1000-30"))
start_annotation_batch(dataset_list$dataset_id[2], "budgerigar_vignette", "modis-lst_day-1000-1")
A user can get a list of all of their past and present jobs and their statuses using my_jobs().
Then, they can check in on the status of a submitted annotation job using job_details(). Additionally, they can view the species annotated for a particular successfully completed job using job_species().
#job_details("annotation_id")
job_details(job_list$annotation_id[1])
#job_species("annotation_id")
job_species(job_list$annotation_id[1])
After a short while, the annotation will complete, and an email notification will be sent to the user’s login address. The user can then pull the results into R using download_annotation(). The function will return the directory name to which the file has been downloaded, which can be saved for easy reference.
#download_annotation("annotation_id", "optional/destination/directory")
ninox_result_dir <- download_annotation(job_list$annotation_id[1])
melopsittacus_result_dir <- download_annotation(job_list$annotation_id[2])
For those interested in following the analysis on their own machine, see Sample Data section above for annotated datasets
Begin data visualization by opening the respective output folder and loading in the appropriate csvs. Records are split across multiple files (one for each layer annotated), which must be first merged. STOAT provides a convenience function, read_output(), that automates this procedure and loads annotated data into a single dataframe.
#dataframe_name <- read_output("path/to/your/downloaded/data/directory")
# This it the code you would use if you directly downloaded a successful annotation
#ninox <- read_output(ninox_result_dir)
# Instead, here we read the equivalent output from the sample data directory
ninox <- read_output(paste0(samples_directory, "/powerful_owl_results"))
head(ninox)
#> molid scientificname date.x lng.x lat.x uncertainty lat.y lng.y date.y modis_lst_day_1000_1 modis_lst_day_1000_30
#> 1 25096d2d-3dc0-48ac-9ef9-48d259a5a75a Ninox strenua 2010-10-03T00:00:00Z 152.8816 -27.38746 25000 -27.38746 152.8816 2010-10-03 NA 299.4025
#> 2 4775b07b-8249-478e-86be-aedada4b0569 Ninox strenua 2010-06-04T00:00:00Z 150.6471 -34.74158 25000 -34.74158 150.6471 2010-06-04 NA 289.5026
#> 3 19d95d25-d86e-4c76-8667-361e5f7cd321 Ninox strenua 2010-08-07T00:00:00Z 150.6471 -34.74158 25000 -34.74158 150.6471 2010-08-07 NA 286.1000
#> 4 af37ae9f-95c1-4611-9841-be21b3bee68f Ninox strenua 2010-08-07T00:00:00Z 150.6471 -34.74158 25000 -34.74158 150.6471 2010-08-07 NA 286.1000
#> 5 80285a52-2e54-425c-813c-1e3213296036 Ninox strenua 2010-08-07T00:00:00Z 150.6471 -34.74158 25000 -34.74158 150.6471 2010-08-07 NA 286.1000
#> 6 a3a68f1a-c038-421d-81ee-f365e55fbbb2 Ninox strenua 2010-09-20T00:00:00Z 152.3598 -32.48022 25000 -32.48022 152.3598 2010-09-20 NA 292.8160
Now we can visualize the environmental data for the Powerful Owl with a basic histogram and scatterplot.
hist(ninox$modis_lst_day_1000_1-273.15, breaks = 50, xlab = "MODIS LST Day (degrees C)", ylab = "Count",
main = "Ninox strenua: MODIS LST Day")
plot(as.Date(ninox$date.y), (ninox$modis_lst_day_1000_1-273.15),
xlim = as.Date(c("2010-01-01", "2019-01-01")),
xlab = "Year", ylab = "MODIS LST Day (degrees C)",
main = "Ninox strenua: MODIS LST Day")
axis.Date(1, ninox$date, at=seq(as.Date("2009-01-01"), as.Date("2019-01-01"), by="years"))
We can compare the environmental characterizations of different species to gain insight into their environmental niches. Here, we compare the Powerful Owl with the Budgerigar, annotated with the same spatial and temporal buffers.
melopsittacus <- read_output(paste0(samples_directory, "/budgerigar_results"))
two_species <- rbind(melopsittacus[,1:10], ninox[,1:10])
plot((modis_lst_day_1000_1-273.15) ~ as.Date(date.y), data = two_species,
col = as.numeric(factor(two_species$scientificname)),
xlim = as.Date(c("2010-01-01", "2019-01-01")),
pch = as.numeric(factor(two_species$scientificname)), cex = 0.4,
xlab = "Year", ylab = "MODIS LST Day (degrees C)",
main = "Powerful Owl vs Budgerigar MODIS LST Day")
axis.Date(1, two_species$date, at=seq(as.Date("2009-01-01"), as.Date("2019-01-01"), by="years"))
legend(x="topright", legend = as.character(levels(factor(two_species$scientificname))),
pch = 1:length(levels(factor(two_species$scientificname))),
col=1:length(levels(factor(two_species$scientificname))))
We can also start to explore the impacts of spatial and temporal buffers on the resultant environmental characterizations of species.
plot(density(na.omit(ninox$modis_lst_day_1000_30-273.15)),col = "red",
xlab = "MODIS LST Day (degrees C)",
main = "Powerful Owl MODIS LST Day 1 vs 30-day t_buff")
lines(density(na.omit(ninox$modis_lst_day_1000_1-273.15)), col = "black")
legend(x="topright", legend = c("1-day t_buff", "30-day t_buff"),
pch = 20,
col=c("black", "red"))