# Alternative Thresholds

## Overview

The heatwaveR package was designed to include many different methods for the creation of climatologies and thresholds for detecting extremes events (e.g. heatwaves and cold-spells) in time series data. To this end we made a very large change in the event detection pipeline when we moved from the RmarineHeatWaves package to heatwaveR. This change may primarily be seen with the inclusion of the ts2clm() function and the removal of climatology generating found in RmarineHeatWaves::detect() in favour of detect_event(), which does not calculate climatologies. In this way we have allowed for the introduction of a multitude of more complex climatology calculation and event detection/filtering methods. It is our overarching goal to provide one package that allows climate scientists to calculate these events in both the atmosphere and oceans. But rather than talking about it, let’s walk through some case studies on how this package can be used for diverse applications.

## Double thresholds

Brought to our attention by Mr. Haouari from the IHFR institute of meteorology in Algeria was the concept of using a flat (e.g. 25$$^\circ$$C) tMin bottom boundary when calculating events from tMax with the standard 90th percentile upper threshold. As the authors of the heatwaveR package are admittedly marine oriented, we tend to work with daily time series that have only one mean value per day (tMean; temp; sst). This is why there are not arguments in the heatwaveR suite of functions that call on tMin and tMax explicitly, but that does not mean that one cannot do so. Below we will work through the steps one would take to calculate (atmospheric) heatwaves, as per their definition in Perkins and Alexander (2013) (but excluding the calculation of EHF), and with the additional step proposed by Mr. Haouari.

The following sub-sections will show the step-by-step process one may use to calculate atmospheric heatwaves using a 90th percentile threshold created from the tMax time series for a location, but will only use the days when the corresponding tMin also exceeds a pre-determined flat bottom boundary on the same days to quantify the heatwave metric. We will finish by visualising the results with the built-in heatwaveR graphing functions event_line() and lolli_plot. In the following section we will demonstrate how to detect events with only one threshold (the default method) and filter those results with another threshold.

### Data prep

The first step with any analysis in R should be the loading of the packages to be used.

library(tidyverse)
library(ggpubr)
library(heatwaveR)

# Era-interim would be ideal but as of this writing it is still not
library(weathercan)

With our libraries loaded, we will now go about downloading some weather station data. I am using the weathercan package for this because I currently live in Halifax, Nova Scotia, Canada. But anyone following along should feel free to use whatever data they would like. So long as the data have a date (t), tMin, and tMax column this code will function as designed. We will download and prepare the Halifax Airport weather data in the following code chunk.

# If you'd like to see what stations aare available, run the following line of code:
# station_ID <- weathercan::stations_dl()

halifax_raw <- weather_dl(station_ids = c(6358, 50620), interval = "day", quiet = T)

# Prepare for analysis
halifax <- halifax_raw %>%
dplyr::select(date, min_temp, max_temp) %>%
dplyr::rename(t = date, tMin = min_temp, tMax = max_temp) %>%
dplyr::filter(t >= "1960-01-01") %>%
na.omit()

### Calculating thresholds

With our data downloaded and prepared, we will now calculate the two thresholds we need to correctly detect the heatwaves and accurately quantify their metrics. The first is the 90th percentile threshold based on the tMax time series. The second is the flat exceedance of 15$$^\circ$$C based on the tMin data. I’ve chosen 15$$^\circ$$C here arbitrarily just from looking at the time series data and seeing that this is a moderately warm summer temperature. Theoretically one would have chosen this flat bottom threshold based on some biologically relevant threshold known a priori. For (fake) example, the night-time thermal envelope of the nesting red-crested maple leaf snatcher.

# The tMax threshold
# The current WMO standard climatology period is 1981-01-01 to 2010-12-31 and should be used when possible
tMax_clim <- ts2clm(data = halifax, y = tMax, climatologyPeriod = c("1981-01-01", "2010-12-31"), pctile = 90)

# The tMin exceedance
# Note the use here of 'minDuration = 3' and 'maxGap = 1' as the default atmospheric arguments
# The default marine arguemnts are 'minDuration = 5' and 'maxGap = 2'
tMin_exc <- exceedance(data = halifax, y = tMin, threshold = 15, minDuration = 3, maxGap = 1)$threshold ### Calculating events Now that the two thresholds have been calculated we use the detect_event() function as usual, but provide the second threshold to the argument threshClim2 that normally would lay dormant. # Note that because we calculated our 90th percentile threshold on a column named 'tMax' # and not the default column name 'temp', we must specify this below with 'y = tMax' events <- detect_event(data = tMax_clim, y = tMax, # The 90th percentile threshold threshClim2 = tMin_exc$exceedance) # The flat exceedance threshold

Please note that even though the use of the second threshold does allow for the resultant event metrics to differ, the values themselves are still being calculated against the seasonal climatology and daily temperatures for the time series given to the 90th percentile threshold calculation (in this case tMax) and so using a second threshold (in this case tMin) won’t generally have much of an effect on the event metrics. Rather it mostly screens out smaller or larger events depending on how one chooses to set the threshold. In the case when an exceedance threshold is chosen for a temperature that would typically only occur in summer (e.g. 15$$^\circ$$C, as used here), one is also effectively screening events by season. There are many use cases where this would be desirable. For example, if one is only interested in events that would occur during a season in which a migrating species may be found in an area, such as our eponymous maple leaf snatcher.

### Creating visuals

Even though we have used two thresholds to calculate our events, the results are output the same as though only one threshold (default) were used. This means that we may use the visualisation functions that come with heatwaveR without any extra fuss.

# don't forget to set 'event_line(y = tMax)'
ggarrange(event_line(events, y = tMax, metric = "intensity_max"),
event_line(events, y = tMax, metric = "intensity_max", category = T),
lolli_plot(events),
ncol = 1, nrow = 3, align = "v")

### Alternative second thresholds

Using a percentile based second threshold is not much different than using a static second threshold. Rather than using exceedance() to get our second threshold we can use ts2clm nested within detect_event(). It must also be pointed out that in addition to using multiple thresholds, we can adjust the minimum duration (minDuration) and maximum gap (maxGap) arguments for our multiple thresholds, too. This allows us to provide different ‘flavours’ of criteria for our events. For example, let’s say we are interested in night-time events (tMin) when the temperatures remain above the 80th percentile threshold (pctile = 80) for 10 or more days (minDuration = 10) without dipping below that threshold for more than 2 consecutive days (maxGap = 2). But on top of that, we are also only interested in those parts of the event when the daytime temperatures exceed the 90th percentile threshold (pctile = 90) for 3 or more days (minDuration = 3) non-stop (maxGap = 0).

Below we will look at how to detect/calculate events that meet these rather specific criteria. We will also calculate events with just the first threshold and compare the results visually. It must be noted here that whichever criteria is the most strict, in this case minDuration = 3 and maxGap = 0, will be the predominant filter through which the event metrics are quantified.

# Note that because we are not using the standard column name 'temp' we must
# specify the chosen column name twice, once for ts2clm() and again for detect_event()

# First threshold based on tMin
thresh_tMin <- ts2clm(data = halifax, y = tMin, pctile = 80,
climatologyPeriod = c("1981-01-01", "2010-12-31"))

# Second threshold based on tMax
# Be careful here that you put the arguments within the correct brackets
thresh_tMax <- detect_event(ts2clm(data = halifax, y = tMax, pctile = 90,
climatologyPeriod = c("1981-01-01", "2010-12-31")),
# These arguments are passed to detect_event()
minDuration = 3, maxGap = 0, y = tMax, protoEvents = T)

# Detect/calculate events using the two precalculated thresholds
# Because detect_event() is not able to deduce which arguments we used above,
# we must again tell it explicitly here
events_two_thresh <- detect_event(data = thresh_tMin, y = tMin, minDuration = 10, maxGap = 2,
threshClim2 = thresh_tMax$event, minDuration2 = 3, maxGap2 = 0) # Or to simply use one threshold events_one_thresh <- detect_event(data = thresh_tMin, y = tMin, minDuration = 10, maxGap = 2) Here are the differences in lolliplot format: ggarrange(lolli_plot(events_one_thresh), lolli_plot(events_two_thresh), labels = c("One threshold", "Two thresholds")) Here is a brief display of the top few events from each method: head(events_one_thresh$event)
## # A tibble: 6 x 22
##   event_no index_start index_peak index_end duration date_start date_peak
##      <dbl>       <dbl>      <dbl>     <dbl>    <dbl> <date>     <date>
## 1        1        4743       4753      4759       17 1973-06-25 1973-07-05
## 2        2        6783       6785      6793       11 1979-01-25 1979-01-27
## 3        3        7537       7537      7550       14 1981-02-17 1981-02-17
## 4        4        8334       8341      8344       11 1983-04-25 1983-05-02
## 5        5       10538      10544     10547       10 1989-05-07 1989-05-13
## 6        6       12437      12441     12454       18 1994-07-19 1994-07-23
## # ... with 15 more variables: date_end <date>, intensity_mean <dbl>,
## #   intensity_max <dbl>, intensity_var <dbl>, intensity_cumulative <dbl>,
## #   intensity_mean_relThresh <dbl>, intensity_max_relThresh <dbl>,
## #   intensity_var_relThresh <dbl>, intensity_cumulative_relThresh <dbl>,
## #   intensity_mean_abs <dbl>, intensity_max_abs <dbl>,
## #   intensity_var_abs <dbl>, intensity_cumulative_abs <dbl>,
## #   rate_onset <dbl>, rate_decline <dbl>
head(events_two_thresh$event) ## # A tibble: 6 x 22 ## event_no index_start index_peak index_end duration date_start date_peak ## <dbl> <dbl> <dbl> <dbl> <dbl> <date> <date> ## 1 1 4756 4757 4758 3 1973-07-08 1973-07-09 ## 2 2 7537 7537 7539 3 1981-02-17 1981-02-17 ## 3 3 12438 12440 12440 3 1994-07-20 1994-07-22 ## 4 4 12617 12618 12619 3 1995-01-15 1995-01-16 ## 5 5 17943 17950 17950 8 2009-08-15 2009-08-22 ## 6 6 18172 18176 18177 6 2010-04-01 2010-04-05 ## # ... with 15 more variables: date_end <date>, intensity_mean <dbl>, ## # intensity_max <dbl>, intensity_var <dbl>, intensity_cumulative <dbl>, ## # intensity_mean_relThresh <dbl>, intensity_max_relThresh <dbl>, ## # intensity_var_relThresh <dbl>, intensity_cumulative_relThresh <dbl>, ## # intensity_mean_abs <dbl>, intensity_max_abs <dbl>, ## # intensity_var_abs <dbl>, intensity_cumulative_abs <dbl>, ## # rate_onset <dbl>, rate_decline <dbl> If we look at these results we see that the use of two thresholds created far fewer events than the use of one threshold. This is because the second threshold was much more ‘difficult’ for the time series to surpass than the first. This method allows for a lot of flexibility, but users should also be cautious that they understand what exactly they are asking their machines to do. In the case above, it is likely that we would actually prefer to calculate our event metrics based entirely on the first threshold, but filter out the events that didn’t meet our second threshold criteria. We will see how to do this in the following section. ## Filtering with a second threshold The methodology outlined below for the detection and filtering of events with two thresholds is somewhat cumbersome. A potential issue with this technique is that the multiple filters do not affect the calculation of the event metrics (e.g. intensity_cumulative), as only the primary threshold given to detect_event() is used when calculating event metrics. This may however be the desired case if one is still interested in knowing the cumulative intensity above the given percentile threshold, but only wants to filter the full event based on some other threshold criteria. I can imagine real-world use cases for both scenarios, which is why this seemingly less sophisticated approach is detailed below. ### Filtering events Because we have already calculated our single threshold events (events_one_thresh) and our second threshold (thresh_tMax) we may directly begin filtering the results. Before we do so, let’s pull out the list components of our results into dataframes for easier use down the line. # Pull out each data.frame as there own object for easier use events_one_event <- events_one_thresh$event
events_one_climatology <- events_one_thresh$climatology This is where things may get tricky for some users, and where the default use of the functions in the heatwaveR package ends. We are now going ‘off-road’ so to speak. But do not despair! The tidyverse suite of packages makes data wrangling like this much more user friendly than it was in the dark days of Base R coding. In order to make the filtering of events easier, we will combine the two different dataframes that we are using as threshold/filtering guides to chose the events that meet all of our selection criteria. # Join the two threshold dataframes two_thresh <- left_join(events_one_climatology, thresh_tMax, by = c("t")) # Remove all days that did not qualify as events in both thresholds two_thresh_filtered <- two_thresh %>% filter(event.x == TRUE, event.y == TRUE) With our filtering guide created, we may now apply it to events_one_thresh to get our filtered results. # Copy data with a new name events_one_thresh_filtered <- events_one_thresh # Then filter events_one_thresh_filtered$event <- events_one_thresh_filtered$event %>% filter(event_no %in% two_thresh_filtered$event_no.x)

# Compare results
head(events_one_thresh_filtered$event) ## # A tibble: 6 x 22 ## event_no index_start index_peak index_end duration date_start date_peak ## <dbl> <dbl> <dbl> <dbl> <dbl> <date> <date> ## 1 1 4743 4753 4759 17 1973-06-25 1973-07-05 ## 2 3 7537 7537 7550 14 1981-02-17 1981-02-17 ## 3 5 10538 10544 10547 10 1989-05-07 1989-05-13 ## 4 6 12437 12441 12454 18 1994-07-19 1994-07-23 ## 5 7 12615 12618 12626 12 1995-01-13 1995-01-16 ## 6 8 13818 13824 13827 10 1998-04-30 1998-05-06 ## # ... with 15 more variables: date_end <date>, intensity_mean <dbl>, ## # intensity_max <dbl>, intensity_var <dbl>, intensity_cumulative <dbl>, ## # intensity_mean_relThresh <dbl>, intensity_max_relThresh <dbl>, ## # intensity_var_relThresh <dbl>, intensity_cumulative_relThresh <dbl>, ## # intensity_mean_abs <dbl>, intensity_max_abs <dbl>, ## # intensity_var_abs <dbl>, intensity_cumulative_abs <dbl>, ## # rate_onset <dbl>, rate_decline <dbl> head(events_two_thresh$event)
## # A tibble: 6 x 22
##   event_no index_start index_peak index_end duration date_start date_peak
##      <dbl>       <dbl>      <dbl>     <dbl>    <dbl> <date>     <date>
## 1        1        4756       4757      4758        3 1973-07-08 1973-07-09
## 2        2        7537       7537      7539        3 1981-02-17 1981-02-17
## 3        3       12438      12440     12440        3 1994-07-20 1994-07-22
## 4        4       12617      12618     12619        3 1995-01-15 1995-01-16
## 5        5       17943      17950     17950        8 2009-08-15 2009-08-22
## 6        6       18172      18176     18177        6 2010-04-01 2010-04-05
## # ... with 15 more variables: date_end <date>, intensity_mean <dbl>,
## #   intensity_max <dbl>, intensity_var <dbl>, intensity_cumulative <dbl>,
## #   intensity_mean_relThresh <dbl>, intensity_max_relThresh <dbl>,
## #   intensity_var_relThresh <dbl>, intensity_cumulative_relThresh <dbl>,
## #   intensity_mean_abs <dbl>, intensity_max_abs <dbl>,
## #   intensity_var_abs <dbl>, intensity_cumulative_abs <dbl>,
## #   rate_onset <dbl>, rate_decline <dbl>

The event numbers found in events_one_thresh_filtered are similar to the event numbers found in events_two_thresh with the important difference that the event metrics in events_two_thresh were calculated only on the days that exceeded both thresholds, while the events in events_one_thresh_filtered have had their metrics calculated from all of the days that exceeded only the first threshold.

### Visualising filtered events

To better understand how different the results from these two different techniques may be we will use lolliplots to visualise them.

ggarrange(lolli_plot(events_two_thresh, metric = "duration"),
lolli_plot(events_one_thresh_filtered, metric = "duration"),
labels = c("Double threshold", "Filter threshold"))
ggarrange(lolli_plot(events_two_thresh, metric = "intensity_cumulative"),
lolli_plot(events_one_thresh_filtered, metric = "intensity_cumulative"),
labels = c("Double threshold", "Filter threshold"))

One may of course visualise the outputs from the events calculated here with geom_flame() and geom_lolli() as well, but this will not differ from the default method of using these functions as outlined in their help files so we will not go into that here.

## Summary

This vignette serves as a guideline for how to implement multiple methodologies for using two thresholds (tMin and tMax) with atmospheric. We also showed in this vignette a more straight forward approach to using a second threshold through the built-in arguments in detect_event(). The use of a second threshold in this way, whether it be based on a static threshold or one derived from a percentile, is useful for the consideration of events that may be more specifically relevant to a given organism.

I hope the techniques shown in this vignette will be useful both technically and theoretically. The authors of heatwaveR are very happy to receive any further input on the development of the package as well as other potential methods for calculating heatwaves and cold-spells.

## References

Perkins, Sarah E., and Lisa V. Alexander. 2013. “On the measurement of heat waves.” Journal of Climate 26 (13): 4500–4517. https://doi.org/10.1175/JCLI-D-12-00383.1.