This vignette is an extended set of examples to highlight the foieGras package’s functionality. Please do NOT interpret these examples as instructions for conducting analysis of animal movement data. Numerous essential steps in a proper analysis have been left out of this document. It is the user’s job to understand their data, ensure they are asking the right questions of their data, and ensure that the analyses they undertake appropriately reflect those questions. We cannot do this for you!
This vignette provides a (very) brief overview of how to use foieGras to filter animal track locations obtained via the Argos satellite system. foieGras provides two state-space models (SSMs) for filtering (i.e., estimating “true” locations and associated movement model parameters, while accounting for error-prone observations):
rw: a Random Walk model
crw: a Correlated Random Walk model
Both models are continuous-time models; that is, they account for the time intervals between successive observations, thereby naturally accommodating the irregular timing of most Argos data. We won’t dwell on the details of the models here (those will come in a future paper), except to say that there may be advantages to choosing one over the other in certain circumstances. The Random Walk model tends not to deal well with small to moderate gaps (relative to the specified time step) in observed locations and can over-fit to particularly noisy data. The Correlated Random Walk model can often deal better with these small to moderate data gaps and smooth through noisy data, but tends to estimate nonsensical movement through larger data gaps.
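To build some intuition for the difference, the base R sketch below simulates two generic discrete-time movement processes over irregular time intervals. Both are our own simplified illustrations, not the continuous-time models fitted by foieGras: in the random walk, each displacement is independent Gaussian noise with variance proportional to the time interval; in the correlated random walk, one simple formulation is used in which the velocity performs a random walk and positions integrate it, so successive displacements tend to point in similar directions. The sigma value and the interval range are arbitrary.

```r
# Random walk: independent 2-D Gaussian displacements, variance proportional to dt
simulate_rw <- function(dt, sigma = 1) {
  steps <- cbind(rnorm(length(dt), 0, sigma * sqrt(dt)),
                 rnorm(length(dt), 0, sigma * sqrt(dt)))
  apply(steps, 2, cumsum)  # positions are the running sum of displacements
}

# Correlated random walk (one simple formulation, assumed for illustration):
# velocity follows a random walk, and each displacement is velocity times dt
simulate_crw <- function(dt, sigma = 1) {
  vel <- apply(cbind(rnorm(length(dt), 0, sigma * sqrt(dt)),
                     rnorm(length(dt), 0, sigma * sqrt(dt))), 2, cumsum)
  apply(vel * dt, 2, cumsum)  # row i of vel is scaled by dt[i]
}

set.seed(1)
dt <- runif(50, min = 0.5, max = 12)  # irregular intervals (h), as with Argos data
rw_track  <- simulate_rw(dt)
crw_track <- simulate_crw(dt)

op <- par(mfrow = c(1, 2))
plot(rw_track, type = "l", asp = 1, main = "rw", xlab = "x", ylab = "y")
plot(crw_track, type = "l", asp = 1, main = "crw", xlab = "x", ylab = "y")
par(op)
```

The crw track typically looks much smoother and more directed than the rw track, which is the same qualitative contrast described above.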
foieGras expects data to be provided in one of four possible formats.
a data.frame or tibble that looks like this:
#> # A tibble: 6 x 8
#> id date lc lon lat smaj smin eor
#> <int> <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 54591 2012-03-05 05:09:33 1 111. -66.4 2442 416 42
#> 2 54591 2012-03-05 13:28:10 A 110. -66.4 2758 569 98
#> 3 54591 2012-03-05 18:46:22 Z 109. -66.8 9350 1415 88
#> 4 54591 2012-03-06 04:55:14 0 110. -66.4 49660 391 90
#> 5 54591 2012-03-06 11:43:57 B 110. -66.4 3264 358 79
#> 6 54591 2012-03-06 18:29:49 B 111. -66.4 4305 478 85
where the Argos data are provided via CLS Argos’ Kalman filter model (KF) and include error ellipse information for each observed location.
a data.frame or tibble that looks like this:
#> # A tibble: 6 x 5
#> id date lc lon lat
#> <chr> <dttm> <chr> <dbl> <dbl>
#> 1 r11 1997-10-27 04:51:17 0 159. -54.6
#> 2 r11 1997-10-27 16:26:39 0 160. -54.6
#> 3 r11 1997-10-28 08:08:46 0 160. -54.7
#> 4 r11 1997-10-28 17:57:13 B 161. -54.5
#> 5 r11 1997-10-29 11:05:20 B 162. -55.1
#> 6 r11 1997-10-30 02:35:14 A 163. -55.6
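where the Argos data are provided via CLS Argos’ Least-Squares model (LS) and do not include error ellipse information.
a data.frame or tibble that looks like this, where error ellipse information is missing for some observations:
#> # A tibble: 6 x 8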
#> id date lc lon lat smaj smin eor
#> <int> <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 54591 2012-03-05 05:09:33 1 111. -66.4 2442 416 42
#> 2 54591 2012-03-05 13:28:10 A 110. -66.4 2758 569 98
#> 3 54591 2012-03-05 18:46:22 Z 109. -66.8 NA NA NA
#> 4 54591 2012-03-06 04:55:14 0 110. -66.4 NA NA NA
#> 5 54591 2012-03-06 11:43:57 B 110. -66.4 NA NA NA
#> 6 54591 2012-03-06 18:29:49 B 111. -66.4 4305 478 85
In this situation, foieGras treats observations with missing error ellipse information as though they are LS-based observations.
an sf object where observations have any of the previous 3 structures and also include CRS information:
#> Simple feature collection with 6 features and 6 fields
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: 2416.102 ymin: -913.6784 xmax: 2437.96 ymax: -828.1397
#> epsg (SRID): 3031
#> proj4string: +proj=stere +lat_0=-90 +lat_ts=-71 +lon_0=0 +k=1 +x_0=0 +y_0=0 +datum=WGS84 +units=km +no_defs
#> # A tibble: 6 x 7
#> id date lc smaj smin eor geometry
#> <int> <dttm> <chr> <dbl> <dbl> <dbl> <POINT [km]>
#> 1 54591 2012-03-05 05:09:33 1 2442 416 42 (2430.943 -912.3109)
#> 2 54591 2012-03-05 13:28:10 A 2758 569 98 (2437.855 -906.0344)
#> 3 54591 2012-03-05 18:46:22 Z 9350 1415 88 (2416.102 -828.1397)
#> 4 54591 2012-03-06 04:55:14 0 49660 391 90 (2437.96 -903.7763)
#> 5 54591 2012-03-06 11:43:57 B 3264 358 79 (2436.429 -908.365)
#> 6 54591 2012-03-06 18:29:49 B 4305 478 85 (2431.585 -913.6784)
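Data in these shapes can be assembled directly. The sketch below builds the mixed KF/LS format in base R from the rounded values printed above (so the coordinates lose precision), flags the rows that lack a complete error ellipse with a small hypothetical helper, and then converts the result to an sf object with the sf package, matching the last format. The helper `is_ls_like()` is ours, not a foieGras function, and the column types (integer id, UTC POSIXct date, character lc) are assumptions read off the printouts.

```r
library(sf)

# Mixed-format input: the ellie rows printed above (rounded values),
# with the error ellipse missing for some locations
obs <- data.frame(
  id   = 54591L,
  date = as.POSIXct(c("2012-03-05 05:09:33", "2012-03-05 13:28:10",
                      "2012-03-05 18:46:22", "2012-03-06 04:55:14",
                      "2012-03-06 11:43:57", "2012-03-06 18:29:49"), tz = "UTC"),
  lc   = c("1", "A", "Z", "0", "B", "B"),  # Argos location class as character
  lon  = c(111, 110, 109, 110, 110, 111),
  lat  = c(-66.4, -66.4, -66.8, -66.4, -66.4, -66.4),
  smaj = c(2442, 2758, NA, NA, NA, 4305),
  smin = c(416, 569, NA, NA, NA, 478),
  eor  = c(42, 98, NA, NA, NA, 85),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of foieGras): TRUE for rows whose error
# ellipse is incomplete, i.e. rows that would be treated like LS observations
is_ls_like <- function(d) is.na(d$smaj) | is.na(d$smin) | is.na(d$eor)
is_ls_like(obs)  # FALSE FALSE TRUE TRUE TRUE FALSE

# Optional: turn lon/lat into POINT geometry (WGS84, EPSG 4326), then project
# to the polar stereographic definition shown above, with units of km
obs_sf <- st_as_sf(obs, coords = c("lon", "lat"), crs = 4326)
stere_km <- paste("+proj=stere +lat_0=-90 +lat_ts=-71 +lon_0=0 +k=1",
                  "+x_0=0 +y_0=0 +datum=WGS84 +units=km +no_defs")
obs_sf <- st_transform(obs_sf, crs = stere_km)
obs_sf
```

`st_as_sf()` drops the lon and lat columns by default, so the projected object carries the ellipse columns plus a geometry column, as in the sf printout.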
Model fitting comprises 2 steps. First, a prefilter step makes a number of checks on the input data (see ?foieGras::prefilter for details), including applying argosfilter::sdafilter to identify extreme outlier observations. Additionally, if the input data are not supplied as an sf object, prefilter guesses at an appropriate projection (typically World Mercator, EPSG 3395) to apply to the data. Second, the SSM is fit to this projected version of the data. Users invoke this process via the fit_ssm function:
## load foieGras
library(foieGras)
## load foieGras example data
data(ellie)
## prefilter and fit Random Walk SSM, using a 24 h time step
fit <- fit_ssm(ellie, model = "rw", time.step = 24)
#>
#> prefiltering data...
#>
#> fitting SSM...
These are the minimum arguments required: the input data, the model (“rw” or “crw”), and the time.step (in h) to which locations are predicted. Additional control can be exerted over the prefiltering step via the vmax, ang, distlim, spdf and min.dt arguments; see ?foieGras::fit_ssm for details. The defaults for these arguments are quite conservative, usually leading to relatively few observations being flagged to be ignored by the SSM. Additional control over the SSM fitting step can also be exerted, but these options should rarely need to be accessed by users and will not be dealt with here.
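The vmax argument is a maximum travel rate used when screening for outliers. Assuming, as in argosfilter::sdafilter, that it is expressed in m/s, the quantity compared against it is the travel rate between successive locations. The base R sketch below computes that rate using a haversine great-circle distance on a hypothetical three-point track. It is not the filter itself: sdafilter also uses turning angles and distances (cf. ang and distlim) and judges each location from the speeds to and from it, whereas this only shows step speeds against an illustrative threshold.

```r
# Great-circle (haversine) distance in metres; r is the mean Earth radius (m)
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371000) {
  rad  <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlon <- (lon2 - lon1) * rad
  a <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Travel rate (m/s) from each location to the next one
step_speed <- function(lon, lat, date) {
  n  <- length(lon)
  d  <- haversine_m(lon[-n], lat[-n], lon[-1], lat[-1])
  dt <- as.numeric(difftime(date[-1], date[-n], units = "secs"))
  d / dt
}

# Hypothetical track: the third location implies an implausibly fast move
obs <- data.frame(
  lon  = c(110.0, 110.1, 112.0),
  lat  = c(-66.4, -66.4, -66.4),
  date = as.POSIXct(c("2012-03-05 00:00:00",
                      "2012-03-05 06:00:00",
                      "2012-03-05 07:00:00"), tz = "UTC")
)

max_speed <- 2  # illustrative threshold (m/s), similar in spirit to vmax
spd <- step_speed(obs$lon, obs$lat, obs$date)
spd > max_speed  # FALSE for the first step (~0.2 m/s), TRUE for the second (~23 m/s)
```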
Simple summary information about a foieGras fit can be obtained by printing the fit object for an individual track:
fit$ssm[[1]]
#> Process model: rw
#> Time interval: 24 hours
#> number of observations: 190
#> number of regularised state estimates: 114
#>
#> parameter estimates
#> -------------------
#> Estimate Std. Error
#> rho_p -0.737 0.036
#> sigma 89.829 4.783
#> sigma 81.653 4.219
#> rho_o 0.000 0.000
#> tau 1.000 0.000
#> tau 1.000 0.000
#> psi 0.008 0.659
#> -------------------
#> negative log-likelihood: 1860.276
#> convergence: relative convergence (4)
A plot method allows a quick visual check of the SSM fit to the data.
The predicted values are the state estimates predicted at regular time intervals, specified by time.step (here every 24 h). Fitted values (not shown) are the state estimates corresponding to the time of each observation; their time series are plotted by default with plot(fit$ssm[[1]]).
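The regular prediction times can be pictured as a grid spaced time.step hours apart that spans the observations. The predicted times printed further down (05:00:00 each day, for a first observation at 05:09:33) are consistent with a grid that starts at the first observation time truncated to the hour. The base R sketch below builds such a grid under that assumption, using the first six ellie observation times shown earlier; it is an illustration, not foieGras’s internal code.

```r
# First six observation times of the ellie data, as printed earlier
obs_time <- as.POSIXct(c("2012-03-05 05:09:33", "2012-03-05 13:28:10",
                         "2012-03-05 18:46:22", "2012-03-06 04:55:14",
                         "2012-03-06 11:43:57", "2012-03-06 18:29:49"), tz = "UTC")

time_step_h <- 24  # plays the same role as time.step in fit_ssm (hours)

# Assumed grid start: first observation time truncated to the whole hour
start <- as.POSIXct(trunc(min(obs_time), units = "hours"))

# Regular prediction times, time_step_h hours apart (a numeric `by` is in seconds)
pred_time <- seq(start, max(obs_time), by = time_step_h * 3600)
format(pred_time, "%Y-%m-%d %H:%M:%S", tz = "UTC")
# "2012-03-05 05:00:00" "2012-03-06 05:00:00"
```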
Estimated tracks can be mapped using the projection applied by foieGras (here EPSG 3395). We use the foieGras::grab() function to access the SSM-predicted values. Low-resolution land polygons are added using the rnaturalearth package, and the ggspatial package’s annotation_spatial and layer_spatial functions ease plotting of sf class data.
library(sf)
library(ggplot2)
library(rnaturalearth)
library(ggspatial)
## grab SSM-predicted locations and reproject from km to m units (World Mercator, EPSG 3395)
ploc_sf <- st_transform(grab(fit, what = "predicted"), crs = 3395)
## get coastline data
coast <- ne_countries(scale=110, returnclass = "sf")
ggplot() +
annotation_spatial(data = coast, fill = grey(0.8), lwd = 0) +
layer_spatial(data = ploc_sf, colour = "firebrick", size = 1.25) +
scale_x_continuous(breaks = seq(-180, 180, by = 5)) +
scale_y_continuous(breaks = seq(-85, -30, by = 5)) +
theme_bw()
The tracks can also be transformed to other projections, and locations coloured by date:
## use Antarctic Polar Stereographic projection approximately centred on the track midpoint
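coast <- st_transform(coast, crs = "+proj=stere +lat_0=-90 +lat_ts=-71 +lon_0=85 +k=1 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs")
## 5 evenly spaced dates spanning the track, used as colour-bar labels
lab_dates <- as.Date(seq(min(ploc_sf$date), max(ploc_sf$date), length.out = 5))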
ggplot() +
annotation_spatial(data = coast, fill = grey(0.8), lwd = 0) +
layer_spatial(data = ploc_sf, aes(colour = as.numeric(as.Date(date))), size = 1.25) +
theme(legend.position = "bottom",
legend.text = element_text(size = 8, vjust = 0),
legend.key.width = unit(1.5, "cm"),
legend.key.height = unit(0.5, "cm"),
legend.title = element_blank()
) + scale_colour_viridis_c(breaks = as.numeric(lab_dates),
option = "viridis",
labels = lab_dates,
end = 0.95)
The estimated locations can be accessed for further analysis, custom mapping, etc. by using the grab function. They can be returned as a projected sf object or as a simple unprojected tibble. Note that for all foieGras outputs, the x, y, x.se and y.se units are km.
## grab predicted locations from fit object as a projected sf object
plocs_sf <- grab(fit, what = "p")
## grab predicted locations in unprojected form
plocs <- grab(fit, what = "p", as_sf = FALSE)
## unprojected form looks like this
plocs
#> # A tibble: 114 x 8
#> id date lon lat x y x.se y.se
#> <chr> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 54591 2012-03-05 05:00:00 111. -66.4 12309. -9956. 1.29 1.39
#> 2 54591 2012-03-06 05:00:00 110. -66.4 12293. -9947. 14.4 0.270
#> 3 54591 2012-03-07 05:00:00 110. -66.5 12299. -9963. 4.13 2.09
#> 4 54591 2012-03-08 05:00:00 110. -66.4 12290. -9957. 14.0 10.0
#> 5 54591 2012-03-09 05:00:00 110. -66.5 12298. -9972. 7.80 2.11
#> 6 54591 2012-03-10 05:00:00 111. -66.4 12301. -9952. 14.5 13.0
#> 7 54591 2012-03-11 05:00:00 111. -66.5 12320. -9970. 14.7 12.8
#> 8 54591 2012-03-12 05:00:00 110. -66.5 12290. -9963. 16.0 7.08
#> 9 54591 2012-03-13 05:00:00 110. -66.4 12299. -9943. 20.4 18.5
#> 10 54591 2012-03-14 05:00:00 110. -66.4 12293. -9949. 9.62 4.00
#> # … with 104 more rows
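Because x.se and y.se are standard errors in km, one simple way to summarise uncertainty is an approximate 95% interval of ±1.96 standard errors on each axis. This treats the x and y errors as approximately normal and independent, which is our simplification rather than something stated here. The base R sketch below uses the rounded values from the first three rows printed above and also converts the projected coordinates to metres, for tools that expect metre units.

```r
# First three predicted locations (rounded values from the printout above; km)
p <- data.frame(
  x    = c(12309, 12293, 12299),
  y    = c(-9956, -9947, -9963),
  x.se = c(1.29, 14.4, 4.13),
  y.se = c(1.39, 0.270, 2.09)
)

z <- qnorm(0.975)  # about 1.96, for a two-sided 95% normal interval

# Approximate 95% intervals, still in km
p$x.lwr <- p$x - z * p$x.se
p$x.upr <- p$x + z * p$x.se
p$y.lwr <- p$y - z * p$y.se
p$y.upr <- p$y + z * p$y.se

# Projected coordinates converted from km to m
p$x_m <- p$x * 1000
p$y_m <- p$y * 1000
p
```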
Although fit_ssm can be applied to single tracks as shown, it can also fit multiple individual tracks supplied in a single input tibble or data.frame. The SSM is fit to each individual separately, and the resulting output is a compound tibble with one row per individual foieGras fit object.
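Because each individual is fitted separately, it can help to look at each track on its own before fitting, for example to spot individuals with very few observations or very short deployments. The base R sketch below splits a stand-in input by id and summarises each track; the r11 times come from the printout earlier, while the r18 rows are hypothetical values added only so there is a second individual.

```r
# Stand-in input: r11 times from the printout earlier plus a hypothetical r18 track
obs <- data.frame(
  id   = c(rep("r11", 6), rep("r18", 3)),
  date = as.POSIXct(c("1997-10-27 04:51:17", "1997-10-27 16:26:39",
                      "1997-10-28 08:08:46", "1997-10-28 17:57:13",
                      "1997-10-29 11:05:20", "1997-10-30 02:35:14",
                      "1997-11-01 00:00:00", "1997-11-01 12:00:00",
                      "1997-11-02 00:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Split by individual, summarise each track, then bind the summaries together
per_id <- lapply(split(obs, obs$id), function(d) {
  data.frame(
    id         = d$id[1],
    n_obs      = nrow(d),
    duration_d = as.numeric(difftime(max(d$date), min(d$date), units = "days"))
  )
})
summary_tbl <- do.call(rbind, per_id)
summary_tbl
```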
# load royal penguin example data
data(rope)
fit <- fit_ssm(rope, vmax = 20, model = "crw", time.step = 6)
#>
#> prefiltering data...
#>
#> fitting SSM...
# list fit outcomes for all penguins
fit
#> # A tibble: 3 x 3
#> id ssm converged
#> <chr> <list> <lgl>
#> 1 r11 <S3: foieGras> TRUE
#> 2 r18 <S3: foieGras> TRUE
#> 3 r19 <S3: foieGras> TRUE
The individual id is displayed in the 1st column, all fit output (ssm) in the 2nd column, and the convergence status of each model fit in the 3rd column.
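Before combining or mapping results, it can be worth setting aside any individual whose fit did not converge. The sketch below uses a hypothetical stand-in data.frame with the same id and converged columns as the output above; r19 is set to FALSE purely for illustration (all three fits converged in the real output). With the actual fit tibble, the same logical row subsetting, fit[fit$converged, ], keeps only the converged fits.

```r
# Hypothetical stand-in for the id and converged columns of a multi-individual
# fit; r19 is set to FALSE purely for illustration
fit_status <- data.frame(
  id        = c("r11", "r18", "r19"),
  converged = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# ids of individuals whose SSM fit did not converge
failed <- fit_status$id[!fit_status$converged]

# keep only the rows for converged fits
ok <- fit_status[fit_status$converged, ]

failed  # "r19"
ok$id   # "r11" "r18"
```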
The individual fits can easily be combined and plotted together using the grab function. Fitted values can be extracted with what = "fitted" (or just "f"), and predicted values with what = "predicted" (or just "p").
plocs <- grab(fit, what = "p")
ggplot(plocs, aes(colour = id)) +
geom_sf() +
scale_colour_viridis_d(option="cividis")