Non-raster Data: Using Predictor Importance, Predictor Directionality, and Predictive Performance Metrics

Tom Auer, Daniel Fink



  1. Introduction
  2. Loading Centroids
  3. Selecting Region and Season
  4. Plotting Centroids and Extent of Analysis
  5. Plot Predictor Importance
  6. Plot Partial Dependencies
  7. Predictive Performance Metrics


Beyond estimates of occurrence and relative abundance, the eBird Status and Trends products contain information about predictor importance as well as predictive performance metrics (PPMs). The PPMs can be used to evaluate the statistical performance of the models, either over the entire spatiotemporal extent of the model results or for a specific region and season. Predictor importances (PIs) can be used to understand relationships between occurrence and abundance and the predictors, most notably the land cover variables used in the model. The functions described in this section load these data from the results packages and provide tools for assessing predictive performance and predictor importance. More details about the predictive performance metrics, and how they were calculated and should be interpreted, can be found in Fink et al. (2019).

Data Structure

IMPORTANT. AFTER DOWNLOADING THE RESULTS, DO NOT CHANGE THE FILE STRUCTURE. All functionality in this package relies on the structure inherent in the delivered results. Changing the folder and file structure will cause errors with this package.

Data are stored in three files, found under /<run_name>/results/, as listed below.

/<run_name>/results/preds/test_pred_ave.txt
/<run_name>/results/stixels/pi.txt
/<run_name>/results/stixels/summary.txt

The ebirdst package provides functions for accessing these files, so you should never have to handle them manually, provided that the original file structure of the results is maintained. These data are stored at "stixel centroids," which are the centers of the independent, partitioned, regional model extents (see Fink et al. 2010, Fink et al. 2013).
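As a quick sanity check on the file structure described above, the following is a minimal sketch in base R. The helper check_results_files() is hypothetical and not part of ebirdst; it only builds the three relative paths listed above and reports whether each exists. The demonstration uses a temporary directory that imitates the layout, not real results.

```r
# Hypothetical helper (not part of ebirdst): list the three non-raster files
# expected under a run directory and report whether each one exists.
check_results_files <- function(run_path) {
  rel_paths <- c(
    file.path("results", "preds", "test_pred_ave.txt"),
    file.path("results", "stixels", "pi.txt"),
    file.path("results", "stixels", "summary.txt")
  )
  data.frame(
    file = rel_paths,
    exists = file.exists(file.path(run_path, rel_paths)),
    stringsAsFactors = FALSE
  )
}

# Demonstration on a temporary directory that mimics the delivered layout,
# with summary.txt deliberately left out.
demo_run <- tempfile("demo_run")
dir.create(file.path(demo_run, "results", "preds"), recursive = TRUE)
dir.create(file.path(demo_run, "results", "stixels"), recursive = TRUE)
file.create(file.path(demo_run, "results", "preds", "test_pred_ave.txt"))
file.create(file.path(demo_run, "results", "stixels", "pi.txt"))

file_status <- check_results_files(demo_run)
print(file_status)
```

A FALSE in the exists column suggests that the folder was moved or renamed after download, which is the situation the warning above is meant to prevent.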

Loading Centroids

The first step when working with stixel centroid data is to load the Predictor Importances (PIs). This file will be used for all of the functions in this vignette and is the input to many of the functions in ebirdst.


library(ebirdst)

# Currently, example data is available on a public s3 bucket. The following 
# ebirdst_download() function copies the species results to a selected path and 
# returns the full path of the results.

# Because the non-raster data is large, there is a parameter on the
# ebirdst_download function that defaults to downloading only the raster data.
# To access the non-raster data, set tifs_only = FALSE.
sp_path <- ebirdst_download(species = "example_data", tifs_only = FALSE)

pis <- load_pis(sp_path)

Selecting Region and Season

When working with Predictive Performance Metrics (PPMs) and/or PIs, it is very common to select a subset of space and time for analysis. In ebirdst this is done by creating a spatiotemporal extent object with ebirdst_extent(). These objects define the region and season for analysis and are passed to many functions in ebirdst. To review the available stixel centroids associated with both PIs and partial dependencies (PDs), and to see which have been selected by a spatiotemporal subset, use the map_centroids() function, which maps and summarizes this information.

lp_extent <- ebirdst_extent(c(xmin = -86, xmax = -83, ymin = 42, ymax = 45),
                            t = c(0.425, 0.475))
map_centroids(sp_path, ext = lp_extent)
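To make the idea of a spatiotemporal subset concrete, here is a minimal base R sketch of the selection logic on toy data. It is not how ebirdst implements the subset. The column names lon, lat, and date are assumptions for this illustration, and the t values are assumed to be fractions of the year between 0 and 1.

```r
# Toy stixel centroids; the column names lon, lat, and date are assumptions
# made for this sketch, with date expressed as a fraction of the year (0-1).
centroids <- data.frame(
  lon  = c(-85.0, -84.2, -90.0, -84.5, -83.5),
  lat  = c( 43.0,  44.1,  43.5,  41.0,  44.9),
  date = c( 0.43,  0.46,  0.45,  0.44,  0.60)
)

# Same bounding box and time window as the extent defined above.
bbox <- c(xmin = -86, xmax = -83, ymin = 42, ymax = 45)
t_range <- c(0.425, 0.475)

# A centroid is selected only if it falls inside both the box and the window.
in_space <- centroids$lon >= bbox[["xmin"]] & centroids$lon <= bbox[["xmax"]] &
  centroids$lat >= bbox[["ymin"]] & centroids$lat <= bbox[["ymax"]]
in_time <- centroids$date >= t_range[1] & centroids$date <= t_range[2]

selected <- centroids[in_space & in_time, ]
print(selected)
```

Only the first two toy centroids pass both filters; the others fall outside the box or the time window.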

Plotting Centroids and Extent of Analysis

Similarly, calc_effective_extent() analyzes a spatiotemporal subset of PIs or PDs and plots the selected stixel centroids, along with a RasterLayer depicting where the majority of the information comes from. The raster values range from 0 to 1, with a value of 1 meaning that 100% of the selected stixels contribute information at that pixel. The function returns the RasterLayer in addition to plotting it.

par(mfrow = c(1, 1), mar = c(0, 0, 0, 6))
calc_effective_extent(sp_path, ext = lp_extent)
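The 0 to 1 scale can be illustrated with a small base R sketch: for each pixel, compute the fraction of selected stixels whose extent covers that pixel. The rectangular stixel boxes and the grid below are made-up values, and this is only an illustration of the idea, not the algorithm used by calc_effective_extent().

```r
# Toy stixels described by their bounding boxes (hypothetical values).
stixels <- data.frame(
  xmin = c(0, 1, 2),
  xmax = c(3, 4, 4),
  ymin = c(0, 0, 1),
  ymax = c(2, 3, 3)
)

# Pixel centres on a regular grid: 4 columns (x) by 3 rows (y).
grid <- expand.grid(x = c(0.5, 1.5, 2.5, 3.5), y = c(0.5, 1.5, 2.5))

# For each pixel, the fraction of stixels whose box contains the pixel centre.
grid$coverage <- vapply(seq_len(nrow(grid)), function(i) {
  mean(grid$x[i] >= stixels$xmin & grid$x[i] <= stixels$xmax &
         grid$y[i] >= stixels$ymin & grid$y[i] <= stixels$ymax)
}, numeric(1))

# Arrange as a matrix with rows indexed by y and columns by x.
coverage_matrix <- matrix(
  grid$coverage, nrow = 3, byrow = TRUE,
  dimnames = list(y = c("0.5", "1.5", "2.5"),
                  x = c("0.5", "1.5", "2.5", "3.5"))
)
print(round(coverage_matrix, 2))
```

In this toy grid the pixel at x = 2.5, y = 1.5 is covered by all three stixels and takes the value 1, while the pixel at x = 0.5, y = 2.5 is covered by none and takes the value 0.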

Plot Predictor Importance

Once predictive performance has been evaluated, information about habitat association and/or avoidance can be explored using the PIs and PDs. The plot_pis() function generates a bar plot ranking the most important predictors within a spatiotemporal subset. There is an option to show all predictors or to aggregate the FRAGSTATS metrics by land cover type.

# with all classes
plot_pis(pis, ext = lp_extent, by_cover_class = FALSE, n_top_pred = 15)

# aggregating fragstats for cover classes
plot_pis(pis, ext = lp_extent, by_cover_class = TRUE, n_top_pred = 15)
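The ranking and aggregation steps can be sketched in base R on toy data. The predictor names, the importance values, and the choice to aggregate by averaging are all assumptions for illustration; plot_pis() may summarize the PIs differently.

```r
# Toy predictor importances: one row per stixel, one column per predictor.
# Predictor names and values are made up for illustration.
pi_toy <- data.frame(
  forest_pland   = c(0.30, 0.40, 0.35),
  forest_ed      = c(0.10, 0.20, 0.15),
  cropland_pland = c(0.25, 0.15, 0.20),
  elevation      = c(0.05, 0.10, 0.03)
)

# Rank predictors by mean importance across the selected stixels
# and keep the top three.
mean_pi <- sort(colMeans(pi_toy), decreasing = TRUE)
top_pred <- head(mean_pi, 3)
print(top_pred)

# Aggregate by cover class: strip the metric suffix from each predictor name,
# then average the importances within each class (averaging is an assumption).
cover_class <- sub("_(pland|ed)$", "", names(mean_pi))
class_pi <- sort(
  vapply(split(mean_pi, cover_class), mean, numeric(1)),
  decreasing = TRUE
)
print(class_pi)
```

With these toy values, the two forest metrics collapse into a single forest class, which then ranks ahead of cropland and elevation.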

Predictive Performance Metrics

Beyond the confidence intervals provided for the abundance estimates, the centroid data can also be used to calculate predictive performance metrics. These indicate whether model performance is strong enough to support interpreting the information provided by the PIs and PDs (as well as the abundance and occurrence estimates).

Binary Metrics by Time

The plot_binary_by_time() function analyzes a species’ entire range of data and plots predictive performance metrics by a custom time interval (typically either 52 for weeks or 12 for months).

plot_binary_by_time(path = sp_path, metric = "kappa", n_time_periods = 12)
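For readers unfamiliar with the kappa metric, the following base R sketch computes Cohen's kappa for binary observations and predictions within each of 12 equal-width time periods. The toy data, the column names date, obs, and pred, and the helper cohen_kappa() are assumptions for illustration. This is the textbook definition of kappa and does not reproduce any sampling or thresholding that plot_binary_by_time() may perform.

```r
# Cohen's kappa for binary observations and predictions (both coded 0/1).
cohen_kappa <- function(obs, pred) {
  p_observed <- mean(obs == pred)
  # Agreement expected by chance, from the marginal frequencies.
  p_expected <- mean(obs == 1) * mean(pred == 1) +
    mean(obs == 0) * mean(pred == 0)
  (p_observed - p_expected) / (1 - p_expected)
}

# Toy test data: date is a fraction of the year, obs and pred are 0/1.
test_toy <- data.frame(
  date = c(0.02, 0.03, 0.05, 0.07, 0.10, 0.12, 0.13, 0.15),
  obs  = c(1, 1, 0, 0, 1, 0, 1, 0),
  pred = c(1, 1, 0, 0, 1, 1, 0, 0)
)

# Assign each record to one of 12 equal-width time periods.
n_time_periods <- 12
test_toy$period <- cut(
  test_toy$date,
  breaks = seq(0, 1, length.out = n_time_periods + 1),
  labels = FALSE, include.lowest = TRUE
)

# Kappa within each period that contains data.
kappa_by_period <- vapply(
  split(test_toy, test_toy$period),
  function(d) cohen_kappa(d$obs, d$pred),
  numeric(1)
)
print(kappa_by_period)
```

A kappa of 1 indicates perfect agreement between observations and predictions, while a value of 0 indicates agreement no better than chance; in the toy data the first period shows the former and the second the latter.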

All Metrics for Spatiotemporal Extent

The plot_all_ppms() function plots all available predictive performance metrics and is important for assessing predictive performance within a spatiotemporal subset, that is, a chosen region and season. It is also useful for comparing performance between subsets.

plot_all_ppms(sp_path, lp_extent)


Fink, D., T. Damoulas, & J. Dave. 2013. Adaptive Spatio-Temporal Exploratory Models: Hemisphere-wide species distributions from massively crowdsourced eBird data. In Twenty-Seventh AAAI Conference on Artificial Intelligence.

Fink, D., W.M. Hochachka, B. Zuckerberg, D.W. Winkler, B. Shaby, M.A. Munson, G. Hooker, M. Riedewald, D. Sheldon, & S. Kelling. 2010. Spatiotemporal exploratory models for broad‐scale survey data. Ecological Applications, 20(8), 2131-2147.

Fink, D., T. Auer, A. Johnston, V. Ruiz‐Gutierrez, W.M. Hochachka, & S. Kelling. 2019. Modeling avian full annual cycle distribution and population trends with citizen science data. Ecological Applications, 00(00):e02056.