A user's guide to improved analysis of marine animal movement data using HMMoce

Camrin Braun, Benjamin Galuardi, Simon Thorrold

2017-10-27

Summary

While the number of marine animals being tagged and tracked continues to grow, current satellite tracking techniques largely constrain meaningful inference to large-scale movements of surface-dwelling species and are inherently prone to significant error. Hidden Markov models (HMMs) have become increasingly common in the analysis of animal movement data because they incorporate underlying behavioral states into the movement process. This discretized approach also provides efficient handling of grid-based oceanographic data and the likelihood surfaces generated within the package. We present an open-source R package, HMMoce, that uses a state-space HMM approach to improve position estimates derived from electronic tags using three-dimensional oceanographic data. We demonstrate HMMoce with example blue shark (Prionace glauca) data included in the package. Our findings illustrate how our software leverages all available tag data, along with oceanographic information, to improve position estimates of tagged marine species. For more details on these methods, including thorough references to the literature HMMoce is based on, please refer to C. D. Braun, Galuardi, and Thorrold (2017).

Introduction

There are many approaches to estimating animal movements from various types of tag data. The paradigm in fish tracking has been to use light levels to estimate position, but many species spend considerable time away from the photic zone. Diving behavior, like the typical diel vertical migration exhibited by deep-diving swordfish, can render light geolocation useless. Yet deep diving also yields depth-temperature profile data, recorded by the archival tag as it samples throughout a tagged individual's vertical movements. This sampling provides a unique signature of the oceanographic environment that can be leveraged to help constrain position. When combined with other tag-measured data streams like sea surface temperature (SST), light levels and maximum diving depth, we expect a unique combination of oceanographic characteristics to be diagnostic of an animal's location. Thus, HMMoce seeks to provide a framework for improving estimates of animal movements based on these oceanographic characteristics and strives to automate much of the data formatting and calculation in a transparent and flexible way.

Getting prepared

The basic premise for getting yourself ready to implement HMMoce for your data is simple. You need to get deployment information and tag data together, set up some spatial limits, and download the gridded environmental data.

Installation

Get started by installing the latest stable release of HMMoce from CRAN or get the most recent development version from GitHub:

# CRAN download
install.packages('HMMoce')

# development version is on GitHub
devtools::install_git('https://github.com/camrinbraun/HMMoce', dependencies = TRUE)

# then load the package
library(HMMoce)

Reading and Formatting Tag Data

Once you have the package installed and loaded, it's time to get your data loaded and formatted properly. The first requirement is a data frame containing start and end dates and locations for your deployment(s). If you plan to run HMMoce for multiple individuals, you probably already have a simple spreadsheet with this metadata; read that in here and format it according to the examples. For now, we do it by individual for simplicity.

# PTT or Unique Individual ID
ptt <- 141259

# TAG/POPUP DATES AND LOCATIONS (dd, mm, YYYY, lat, lon)
iniloc <- data.frame(matrix(c(13, 10, 2015, 41.575, -69.423, 
                              24, 2, 2016, 26.6798, -69.0147), nrow = 2, ncol = 5, byrow = T))
colnames(iniloc) <- c('day','month','year','lat','lon')
tag <- as.POSIXct(paste(iniloc[1,1], '/', iniloc[1,2], '/', iniloc[1,3], sep=''), format = '%d/%m/%Y', tz='UTC')
pop <- as.POSIXct(paste(iniloc[2,1], '/', iniloc[2,2], '/', iniloc[2,3], sep=''), format = '%d/%m/%Y', tz='UTC')

# VECTOR OF DATES FROM DATA. THIS WILL BE THE TIME STEPS, T, IN THE LIKELIHOODS
dateVec <- as.Date(seq(tag, pop, by = 'day')) 

Next up is reading in the tag data. HMMoce includes a read function (read.wc) that reads and formats Wildlife Computers tag data automatically, and we plan to add functionality for other manufacturers based on user input. For now, we assume the tag data source .csv files (e.g. “141259-PDTs.csv”) have been downloaded from the Wildlife Computers data portal, which standardizes their format. For more on the portal, visit Wildlife Computers. Set the directory where your data lives and load the necessary files. Here we choose to load all the available data for this tag to demonstrate how the different data types are leveraged later on.

#------------
# LOAD THE TAG DATA
#------------
# setwd()

# SET INITIAL LOCATIONS (TAG AND POP-UP)
iniloc <- data.frame(matrix(c(13, 10, 2015, 41.3, -69.27, 
                              10, 4, 2016, 40.251, -36.061), nrow = 2, ncol = 5, byrow = T))
names(iniloc) <- c('day','month','year','lat','lon')
tag <- as.POSIXct(paste(iniloc[1,1], '/', iniloc[1,2], '/', iniloc[1,3], sep=''), format = '%d/%m/%Y', tz='UTC')
pop <- as.POSIXct(paste(iniloc[2,1], '/', iniloc[2,2], '/', iniloc[2,3], sep=''), format = '%d/%m/%Y', tz='UTC')

# VECTOR OF DATES FROM DATA. THIS WILL BE THE TIME STEPS, T, IN THE LIKELIHOODS
dateVec <- as.Date(seq(tag, pop, by = 'day')) 

# READ IN DATA AS OUTPUT FROM WC PORTAL
# SST DATA
sstFile <- system.file("extdata", "141259-SST.csv", package = "HMMoce")
tag.sst <- read.wc(ptt, sstFile, type = 'sst', tag=tag, pop=pop, verbose=T) 
sst.udates <- tag.sst$udates; tag.sst <- tag.sst$data

# DEPTH-TEMPERATURE PROFILE DATA
pdtFile <- system.file("extdata", "141259-PDTs.csv", package = "HMMoce")
pdt <- read.wc(ptt, pdtFile, type = 'pdt', tag=tag, pop=pop, verbose=T) 
pdt.udates <- pdt$udates; pdt <- pdt$data

# RAW LIGHT DATA
#lightFile <- system.file("extdata", "141259-LightLoc.csv", package = "HMMoce")
#light <- read.wc(ptt, lightFile, type = 'light', tag=tag, pop=pop); 
#light.udates <- light$udates; light <- light$data

# LIGHT BASED POSITIONS FROM GPE2 (INSTEAD OF RAW LIGHTLOCS FROM PREVIOUS)
locsFile <- system.file("extdata", "141259-Locations-GPE2.csv", package = "HMMoce")
locs <- read.table(locsFile, sep = ',', header = T, blank.lines.skip = F)
locDates <- as.Date(as.POSIXct(locs$Date, format=findDateFormat(locs$Date)))

The read.wc function reads the type of data requested and automatically formats it for use in the likelihood calculations. See ?read.wc for more information and a list of available data input types.
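Before moving on, it can be worth a quick look at what read.wc returned. These checks use only base R and make no assumptions about the exact column layout of the formatted data:

# quick sanity checks on the formatted tag data
str(tag.sst)     # formatted SST records
head(pdt)        # formatted depth-temperature profile records
range(pdt.udates); length(dateVec)   # do the profile dates span the deployment?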

Setting spatial bounds

The next preparation step involves setting spatial boundaries to work within. These can either be set manually (as a list) or by passing a -Locations.csv file to setup.locs.grid. This step is critical in the model setup because the boundaries must incorporate the complete geographic extent of your animals' movements! If they don't, the movement likelihoods will likely run into the edges of your spatial limits (specified here), and you will have to start over. This step is therefore typically best accomplished by a combination of expert opinion (that's you!) and the spatial bounds of the tagging and pop-up locations and longitude estimates from GPE2. If you aren't using GPE2, do some preliminary plotting of tag data like SST to constrain where you think the animal went. There is no harm in having to come back to this step when your model runs into the boundaries, except that you will have to download all the environmental data again, which, you will soon find out, takes time. The trade-off is that larger model boundaries yield larger grids, which means longer computation time. Finally, be smart about setting spatial limits when you plan to use HMMoce for a group of tag datasets: find the largest common grid and use that for all analyses. That way you only have to download the oceanographic data once!

# SET SPATIAL LIMITS, IF DESIRED, OR PASS GPE FILE
# these are the lat/lon bounds of your study area (e.g. where you think the animal went)
sp.lim <- list(lonmin = -82, lonmax = -25, latmin = 15, latmax = 50)

if (exists('sp.lim')){
  locs.grid <- setup.locs.grid(sp.lim)
} else{
  # no sp.lim set: build the grid from GPE2 locations instead ('gpe2' here stands in for your GPE2 locations data, e.g. 'locs' above)
  locs.grid <- setup.locs.grid(gpe2)
  sp.lim <- list(lonmin = min(locs.grid$lon[1,]), lonmax = max(locs.grid$lon[1,]),
                 latmin = min(locs.grid$lat[,1]), latmax = max(locs.grid$lat[,1]))
}
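As a quick check that the box is reasonable, the minimal base R sketch below (not part of the package) plots the study-area limits along with the tagging and pop-up locations; the limits should comfortably contain both endpoints and any preliminary position information you have.

# visual check: do the spatial limits comfortably contain the deployment endpoints?
plot(c(sp.lim$lonmin, sp.lim$lonmax), c(sp.lim$latmin, sp.lim$latmax), type = 'n',
     xlab = 'Longitude', ylab = 'Latitude')
rect(sp.lim$lonmin, sp.lim$latmin, sp.lim$lonmax, sp.lim$latmax, border = 'grey50')
points(iniloc$lon, iniloc$lat, pch = 16, col = c('blue', 'red'))
legend('topleft', legend = c('tagging', 'pop-up'), pch = 16, col = c('blue', 'red'))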

Getting environmental data

With our spatial limits set and our initial tag data read in, we need the environmental data to compare our tag measurements to. To do this, HMMoce has a get.env function for which you specify your dates of interest, spatial limits, and the type of data you need. The get.env function then downloads the requested data. The default SST data is the Optimum Interpolation (OI) 1/4° product, with an option to use GHRSST data instead; additional datasets can easily be added based on user input. Depth-temperature profiles are compared to HYCOM predictions from the 1/12° global analysis or to the WOA climatology. The WOA data used here is hosted in a more accessible form for our purposes on Dropbox; see get.env(type = 'woa') for more info. Bathymetry data comes from the Scripps SRTM30 product. The HYCOM and GHRSST datasets for the duration of a tag deployment can be large, so be patient. If you plan to analyze multiple tag datasets over a similar time period and spatial domain, consider that before downloading the environmental datasets to save yourself from having to change spatial/temporal bounds later and download everything again! For a group of tags, combine the unique dates of SST measurements across all tags and find a common spatial grid before downloading the SST data (see the sketch after the code below). The last thing you want to do is download new oceanographic data for each tagged individual when there is spatial and temporal overlap!

#------------
# GET ENVIRONMENTAL DATA
#------------ 

# DOWNLOAD SST DATA
sst.dir <- paste(tempdir(), '/sst/', sep='')
dir.create(sst.dir, recursive = TRUE)
get.env(sst.udates, filename='oisst', type = 'sst', sst.type='oi', spatLim = sp.lim, save.dir = sst.dir)

# YOU NEED SOME REPRESENTATION OF ENVIRONMENTAL DEPTH-TEMPERATURE
# HYCOM DATA
hycom.dir <- paste(tempdir(), '/hycom/', sep='')
dir.create(hycom.dir, recursive = TRUE)
get.env(pdt.udates, filename='hycom', type = 'hycom', spatLim = sp.lim, save.dir = hycom.dir)

# OR WORLD OCEAN ATLAS DATA
#woa.dir <- paste(tempdir(), '/woa/', sep='')
#dir.create(woa.dir, recursive = TRUE)
#get.env(type = 'woa', resol = 'quarter', save.dir = woa.dir)
# THEN LOAD AND CHECK THE DOWNLOADED RDA FILE FOR WOA
#load(paste(woa.dir,'woa.quarter.rda',sep=''))
#str(woa.quarter)
#List of 4
#$ watertemp: num [1:44, 1:46, 1:57, 1:12] 26.5 26.5 26.4 26.3 26.2 ...
#$ lon      : num [1:44(1d)] -95.5 -94.5 -93.5 -92.5 -91.5 -90.5 -89.5 -88.5 -87.5 -86.5 ...
#$ lat      : num [1:46(1d)] 9.5 10.5 11.5 12.5 13.5 14.5 15.5 16.5 17.5 18.5 ...
#$ depth    : num [1:57(1d)] 0 5 10 15 20 25 30 35 40 45 ...

# BATHYMETRY
bathy.dir <- paste(tempdir(), '/bathy/', sep='')
dir.create(bathy.dir, recursive = TRUE)
bathy <- get.bath.data(sp.lim$lonmin, sp.lim$lonmax, sp.lim$latmin, sp.lim$latmax, folder = bathy.dir)
#library(raster); plot(bathy)
# OR READ IT FROM NETCDF
#bathy.nc <- RNetCDF::open.nc(paste(bathy.dir, 'bathy.nc', sep=''))
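If you are working with a group of tags, a minimal sketch of pooling dates and finding a common grid before downloading follows. The list objects here are hypothetical stand-ins for however you store metadata for several individuals:

# hypothetical stand-ins: lists of unique SST dates and spatial limits, one element per tag
sst.udates.list <- list(tag1 = sst.udates, tag2 = sst.udates)
sp.lim.list <- list(tag1 = sp.lim, tag2 = sp.lim)

# pool the dates and find one bounding box that covers all individuals
sst.udates.all <- sort(unique(do.call('c', sst.udates.list)))
common.lim <- list(lonmin = min(sapply(sp.lim.list, '[[', 'lonmin')),
                   lonmax = max(sapply(sp.lim.list, '[[', 'lonmax')),
                   latmin = min(sapply(sp.lim.list, '[[', 'latmin')),
                   latmax = max(sapply(sp.lim.list, '[[', 'latmax')))

# then a single download serves the whole group
#get.env(sst.udates.all, filename='oisst', type = 'sst', sst.type='oi', spatLim = common.lim, save.dir = sst.dir)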

The observation model

Once the environmental data is downloaded to its respective directory, you're ready for likelihood calculations. There are three main data streams currently collected by archival tags on marine animals: light, SST, and depth-temperature profiles. Each of these data streams contains information about the location of the animal in the global ocean; thus, each can be leveraged to inform our estimation of animal movements. Light levels and SST are the current paradigm for fish tracking and are the most straightforward to use for positioning. The depth-temperature profiles are more complex but provide rich information about the oceanographic characteristics animals experience as they move. Each data type has its own likelihood function(s) that does the grunt work for you.

Light likelihood

In HMMoce, there are currently two methods for light-based likelihood calculations. The simplest uses tag-based light levels to estimate sunrise and sunset times and thus position. This approach is performed by calc.srss but is an overly simplistic treatment of the data, so we often 1) throw out latitude estimates and keep only longitude and 2) still get a lot of bad location information, particularly from species that don't frequent the photic zone. For example, a surface-oriented species that dives below the photic zone 30 minutes before sunset would generate an artificial sunset time 30 minutes early, causing a ~8° difference in the longitude estimate (see the arithmetic below)! A somewhat improved approach is to use the more complex light-based geolocation algorithm previously employed by the tag manufacturer, Wildlife Computers, called GPE2. This uses a threshold approach developed by Hill and Braun (2001), which results in 1) fewer spurious position estimates and 2) a user-controlled vetting process of estimates via a GUI. This functionality requires that the user process their light data using the GPE2 environment, which is no longer supported by the manufacturer but is still available. GPE2 output data is used in HMMoce with calc.gpe2. In either case, the resulting L.light raster contains daily (when available) likelihood surfaces representing the likelihood of an animal's location based on light data.
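The back-of-the-envelope arithmetic behind that figure: the Earth rotates 15° of longitude per hour, so a half-hour error in apparent sunset maps to roughly 8° of longitude.

# longitude error from a 30-minute error in apparent sunset time
0.5 * 15   # hours of error * degrees of longitude per hour = 7.5 degrees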

# LIGHT-BASED LIKELIHOODS
# RAW LIGHT LEVELS
#L.1 <- calc.srss(light, locs.grid = locs.grid, dateVec = dateVec, res=0.25) # if trying to use raw light levels, not currently recommended (v0.2)

# GPE2 METHOD
L.1 <- calc.gpe2(locs, locDates, locs.grid = locs.grid, dateVec = dateVec, errEll = FALSE, gpeOnly = TRUE)

Sea surface temperature likelihood

For the SST likelihood in HMMoce, tag-based SST data is represented as a daily range in SST ± error (currently defaults to 1%), and that SST envelope is compared to a remotely sensed SST product. We currently use the Optimum Interpolation product due to its comprehensive coverage and 1/4° resolution; however, any SST product could be used here with only minor changes to the download function, get.env. GHRSST was recently added based on the need from some users for higher resolution (0.01°) but is currently downsampled automatically to 0.1° to ease computation requirements. In addition, parallel computing was recently added to leverage core availability when running HMMoce on powerful servers or cloud solutions. The output from calc.sst (or the parallelized version calc.sst.par) is a raster of daily likelihood surfaces for the animal's movements based on SST measurements. For complete details on this calculation, including the density function and integration equation, see the supplemental methods in C. D. Braun, Galuardi, and Thorrold (2017).

# GENERATE DAILY SST LIKELIHOODS
L.2 <- calc.sst.par(tag.sst, filename='oisst', sst.dir = sst.dir, dateVec = dateVec, sens.err = 1)
# calc.sst() is non-parallel version of the same thing
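For intuition on what the 1% sensitivity error means, the illustration below (not part of the package workflow) builds the daily SST envelope from a hypothetical day's tag-measured SST range; grid cells whose remotely sensed SST falls within or near this envelope receive higher likelihood.

# illustration only: a hypothetical day's tag SST range of 18.2-19.6 deg C with 1% sensor error
sst.min <- 18.2; sst.max <- 19.6
sens.err <- 0.01
envelope <- c(sst.min * (1 - sens.err), sst.max * (1 + sens.err))
envelope   # the SST window compared against each cell of the gridded SST field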

Depth-temperature profile likelihood

The depth-temperature profile (PDT) data from the tag is the main contribution of HMMoce to the marine animal tracking community. This functionality allows depth-temperature profiles measured by tagged animals to be used to improve position estimates, which is particularly useful for study species that rarely visit the photic zone during the day (e.g. swordfish, Neilson et al. (2009)) or spend considerable periods of time in the mesopelagic (e.g. basking sharks, Skomal et al. (2009)). In HMMoce, there are currently two approaches to using the PDT data for geolocation. The first follows Luo et al. (2015) by integrating profile data to calculate ocean heat content (OHC). We integrate tag-based PDT data from a certain isotherm to the surface to calculate the “heat content” of that layer as measured by the tagged animal. Similarly, we perform the same integration on the model ocean as represented in the HYbrid Coordinate Ocean Model (HYCOM) and compare the two integrated metrics to generate a likelihood surface representing the animal's estimated daily position (a simplified illustration of the integration follows the code below). The second approach is to use the profile in 3D space and compare it to oceanography at measured depth levels. This uses the same tag-based PDT data (not integrated) and compares it to modeled HYCOM data or climatological mean data contained in the World Ocean Atlas (WOA). In either case, we use a linear regression to predict the tag-based temperature at the standard depth levels contained in the oceanographic datasets. A likelihood is then calculated in the same fashion by comparing temperature from the tag to ocean temperature at each depth level, and the resulting likelihood layers are multiplied across depth levels to yield a single daily likelihood layer based on the tagged animal's dive data (L.prof). Parallelized calculation functions are available for all three depth-temperature-based likelihood functions (calc.ohc.par, calc.woa.par, calc.hycom.par). For complete details on the various depth-temperature profile methods available in HMMoce, please refer to C. D. Braun, Galuardi, and Thorrold (2017).

# GENERATE DAILY OHC LIKELIHOODS
L.3 <- calc.ohc.par(pdt, filename='hycom', ohc.dir = hycom.dir, dateVec = dateVec, isotherm = '', use.se = F)

# LIKELIHOODS BASED ON WOA PROFILES (IN SITU CLIMATOLOGICAL MEAN)
# requires woa.quarter, loaded via the get.env(type = 'woa') step above
L.4 <- calc.woa.par(pdt, ptt=ptt, woa.data = woa.quarter, sp.lim=sp.lim, focalDim = 9, dateVec = dateVec, use.se = T)

# AND HYCOM PROFILES (MODEL OCEAN)
L.5 <- calc.hycom.par(pdt, filename='hycom', hycom.dir, focalDim = 9, dateVec = dateVec, use.se = T)
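As a rough illustration of the OHC idea, the sketch below performs a simplified trapezoidal integration (not the package's internal calculation) of a hypothetical tag profile above a chosen isotherm; the same quantity computed from HYCOM fields provides the surface the tag value is compared against.

# illustration only: simplified ocean heat content of the layer warmer than a chosen isotherm
cp <- 3.993      # approximate specific heat of seawater (kJ kg-1 degC-1)
rho <- 1025      # approximate seawater density (kg m-3)
depth <- c(0, 25, 50, 100, 150)            # hypothetical PDT depth levels (m)
temp  <- c(24.1, 23.8, 22.5, 19.0, 16.2)   # hypothetical PDT temperatures (deg C)
isotherm <- 16                             # integrate from the 16 deg C isotherm to the surface
excess <- pmax(temp - isotherm, 0)         # temperature above the isotherm at each level
ohc <- cp * rho * sum(diff(depth) * (head(excess, -1) + tail(excess, -1)) / 2)  # trapezoidal sum
ohc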

Resampling and combining likelihoods

After the desired likelihood calculations are complete, the likelihood rasters are resampled with resample.grid to ensure comparable extent and resolution among layers. This function also returns a variable called L.mle, which is a coarser (lower-resolution) representation of the overall likelihoods used to speed up parameter estimation in the next step. After resampling, we use make.L to combine the different likelihoods and construct a single, overall likelihood. Known locations can also be incorporated here if there are any sightings, acoustic data or other sources of additional position information for this individual. The resulting L array from make.L is carried forward into the convolution of our observations with theoretical animal movements in the next steps.

#----------------------------------------------------------------------------------#
# LIST AND RESAMPLE
#----------------------------------------------------------------------------------#

# create a list of the likelihood rasters just created
L.rasters <- mget(ls(pattern = 'L\\.')) # use with caution as all workspace items containing 'L.' will be listed. We only want the likelihood outputs calculated above

# resample them all to match the most coarse layer (typically light at 1/4 deg)
# this can be changed to use whatever resolution you choose
resamp.idx <- which.max(sapply(L.rasters, FUN=function(x) raster::res(x)[1]))
L.res <- resample.grid(L.rasters, L.rasters[[resamp.idx]])
  
# Figure out appropriate L combinations based on which likelihoods you calculated,
# indicated by a vector likVec; here, for example, light (1), sst (2) and hycom (5)
likVec <- c(1, 2, 5)
if (length(likVec) > 2){
  L.idx <- c(utils::combn(likVec, 2, simplify=F), utils::combn(likVec, 3, simplify=F))
} else{
  L.idx <- utils::combn(likVec, 2, simplify=F)
}

# which of L.idx combinations do you want to run?
run.idx <- c(1,2,4)

# vector of appropriate bounding in filter. see ?hmm.filter for more info
bndVec <- c(NA, 5, 10)

# vector of appropriate migr kernel speed. see ?makePar for more info.
parVec <- c(2, 4)

#----------------------------------------------------------------------------------#
# COMBINE LIKELIHOODS
#----------------------------------------------------------------------------------#
tt <- run.idx[1] # index into L.idx; looped over run.idx in the full example below
L <- make.L(L1 = L.res[[1]][L.idx[[tt]]],
            L.mle.res = L.res$L.mle.res, dateVec = dateVec,
            locs.grid = locs.grid, iniloc = iniloc, bathy = bathy, pdt = pdt)
      
L.mle <- L$L.mle
L <- L$L
g <- L.res$g
g.mle <- L.res$g.mle
lon <- g$lon[1,]
lat <- g$lat[,1]

Theoretical movements and convolution

Parameter estimation

Now that the observation data have been used to generate daily likelihoods, we need a way to represent the most likely way the animal moved through likelihood space. To do this, we generate a theoretical movement model assuming diffusive (Brownian) motion. That is, the animal is represented virtually as a particle that is allowed to diffuse (no advection) according to the diffusion characteristics of two different behavior states, for example, migratory and resident behaviors. Each behavior state is represented by a different diffusion rate; we expect a migratory animal to diffuse much more widely than a resident one. Currently, diffusion speeds are fixed based on expert knowledge of the tagged animal. Migratory speed is required and is set using migr.spd in makePar. The accompanying resid.frac argument determines the fraction of the migratory speed at which the resident speed is set; if not specified, it defaults to 10%.

The other parameters governing the theoretical movements are the probabilities of switching between the two behavior states. These are represented as a 2x2 matrix where [1,1] indicates the probability of an animal staying in state 1 given it is currently in state 1 and [1,2] is the probability of the animal moving to state 2 given it is in state 1; the same logic (in reverse) applies to the second row (see the illustration below). These parameters are calculated using an expectation-maximization algorithm similar to Woillez et al. (2016). In HMMoce, this is done for you in the expmax function and is best performed on the coarser outputs from make.L (g.mle and L.mle). The coarse grids tend to provide estimates that are similar to those calculated from the finer grids in much less time, or they can be used to make better guesses (p.guess) for a second run on the full-resolution grid. The parameter estimation works best in two steps: 1) make a guess (or use the default) and iterate on a coarse grid until convergence is reached; 2) store the coarse-grid switch estimates and move to the full-resolution grid, running makePar again without estimating switch probabilities (calcP=FALSE) to get movement kernels based on the finer grid.
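For example, with the default guess of 0.9 used below, the switch-probability matrix looks like this (rows are the current state, columns the next state, and each row sums to one):

# illustration of a 2x2 behavior-switching matrix with p.guess = c(0.9, 0.9)
P <- matrix(c(0.9, 0.1,
              0.1, 0.9), nrow = 2, byrow = TRUE,
            dimnames = list(c('from state 1', 'from state 2'),
                            c('to state 1', 'to state 2')))
P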

# GET SWITCH PROB BASED ON COARSE GRID (MUCH FASTER)
i <- parVec[1] # migratory speed setting; looped over parVec in the full example below
par0 <- makePar(migr.spd=i, grid=g.mle, L.arr=L.mle, p.guess=c(.9,.9), calcP=T)
P.final <- par0$P.final
              
# GET MOVEMENT KERNELS FROM FULL-RES GRID, IGNORE SWITCH PROB
par0 <- makePar(migr.spd=i, grid=g, L.arr=L, p.guess=c(.9,.9), calcP=F)
K1 <- par0$K1; K2 <- par0$K2

Convolution and the HMM

Once all the parameters are in order, the theoretical movement model and the observations are convolved in an HMM filter that provides the probability distribution of the states (location and behavior) forward in time, conditional on the data. These state estimates are calculated successively by alternating between so-called time and data updates of the current state. Following filtering, the recursions of the HMM smoothing step work backwards in time, using the filtered state estimates and all available data to determine the smoothed state estimates. The smoothed state estimates are more accurate and generally appear ‘smoother’ than the filtering estimates because they exploit the full dataset. The probability distributions of the states at each time step are the state estimates returned from the HMM smoothing algorithm. For more information on the HMM, see M. W. Pedersen et al. (2008), M. Pedersen et al. (2011) and the supplemental materials for C. D. Braun, Galuardi, and Thorrold (2017).

# RUN THE FILTER STEP
bnd <- bndVec[2] # mask bound to use here; looped over bndVec in the full example below
if(!is.na(bnd)){
  f <- hmm.filter(g, L, K1, K2, maskL=T, P.final, minBounds = bnd)
  maskL.logical <- TRUE
} else{
  f <- hmm.filter(g, L, K1, K2, P.final, maskL=F)
  maskL.logical <- FALSE
}
nllf <- -sum(log(f$psi[f$psi>0])) # negative log-likelihood
      
# RUN THE SMOOTHING STEP
s <- hmm.smoother(f, K1, K2, L, P.final)

The Results

Finally, it's time to look at some results. The state estimates returned from the smoother need some summarizing to yield a “most probable track”. There are several ways to do this (see C. D. Braun, Galuardi, and Thorrold (2017)), but HMMoce currently relies on calculating the mean of each posterior distribution to estimate a track using calc.track.

# GET THE MOST PROBABLE TRACK
tr <- calc.track(s, g, dateVec, iniloc)

A simple plot of track results and behavior estimates can be generated using plotHMM; to visualize the residency distributions (see M. Pedersen et al. (2011)), use plotRD.

# A SIMPLE MOVEMENT/BEHAVIOR PLOT
plotHMM(s, tr, dateVec, ptt=ptt, save.plot = T)

# PLOT RESIDENCY DISTRIBUTION
xlims <- c(sp.lim$lonmin, sp.lim$lonmax); ylims <- c(sp.lim$latmin, sp.lim$latmax)
plotRD(s, tr, xlims, ylims, save.plot=F)
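If you prefer to work with the track directly, a quick base R look at the calc.track output is sketched below; the lon and lat column names are assumed here and may need adjusting to match your version of the package.

# a quick look at the most probable track (column names assumed)
head(tr)
plot(tr$lon, tr$lat, type = 'l', xlab = 'Longitude', ylab = 'Latitude')
points(iniloc$lon, iniloc$lat, pch = 16, col = c('blue', 'red'))   # tagging and pop-up locations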

An example script

Here we put all of the above logic into a full working example script using the example blue shark data included with the package. Please file any issues or suggestions for improvement of HMMoce on the GitHub site.

#========================
## HMMoce run w/example data
#========================
# might be a good idea to install latest version of HMMoce
# install.packages('HMMoce')
library(HMMoce)

#------------
# LOAD THE TAG DATA
#------------
# setwd()

# PTT OR UNIQUE INDIVIDUAL ID
ptt <- 141259

# SET INITIAL LOCATIONS (TAG AND POP-UP)
iniloc <- data.frame(matrix(c(13, 10, 2015, 41.3, -69.27, 
                              10, 4, 2016, 40.251, -36.061), nrow = 2, ncol = 5, byrow = T))
names(iniloc) <- c('day','month','year','lat','lon')
tag <- as.POSIXct(paste(iniloc[1,1], '/', iniloc[1,2], '/', iniloc[1,3], sep=''), format = '%d/%m/%Y', tz='UTC')
pop <- as.POSIXct(paste(iniloc[2,1], '/', iniloc[2,2], '/', iniloc[2,3], sep=''), format = '%d/%m/%Y', tz='UTC')

# VECTOR OF DATES FROM DATA. THIS WILL BE THE TIME STEPS, T, IN THE LIKELIHOODS
dateVec <- as.Date(seq(tag, pop, by = 'day')) 

# READ IN DATA AS OUTPUT FROM WC PORTAL
# SST DATA
sstFile <- system.file("extdata", "141259-SST.csv", package = "HMMoce")
tag.sst <- read.wc(ptt, sstFile, type = 'sst', tag=tag, pop=pop, verbose=T) 
sst.udates <- tag.sst$udates; tag.sst <- tag.sst$data

# DEPTH-TEMPERATURE PROFILE DATA
pdtFile <- system.file("extdata", "141259-PDTs.csv", package = "HMMoce")
pdt <- read.wc(ptt, pdtFile, type = 'pdt', tag=tag, pop=pop, verbose=T) 
pdt.udates <- pdt$udates; pdt <- pdt$data

# RAW LIGHT DATA
#lightFile <- system.file("extdata", "141259-LightLoc.csv", package = "HMMoce")
#light <- read.wc(ptt, lightFile, type = 'light', tag=tag, pop=pop); 
#light.udates <- light$udates; light <- light$data

# LIGHT BASED POSITIONS FROM GPE2 (INSTEAD OF RAW LIGHTLOCS FROM PREVIOUS)
locsFile <- system.file("extdata", "141259-Locations-GPE2.csv", package = "HMMoce")
locs <- read.table(locsFile, sep = ',', header = T, blank.lines.skip = F)
locDates <- as.Date(as.POSIXct(locs$Date, format=findDateFormat(locs$Date)))

# SET SPATIAL LIMITS
# these are the lat/lon bounds of your study area (e.g. where you think the animal went)
sp.lim <- list(lonmin = -82,
               lonmax = -25,
               latmin = 15,
               latmax = 50)

#------------ 
##  GET ENVIRONMENTAL DATA 
#------------ 
# env data downloads can be large, depending on the application. For 180 days of data
# spanning the NW Atlantic (the example application), the downloads take ~10 mins on
# Amazon EC2. Personal computers will likely be slower.

# DOWNLOAD SST DATA
sst.dir <- paste(tempdir(), '/sst/', sep='')
dir.create(sst.dir, recursive = TRUE)
get.env(sst.udates, filename='oisst', type = 'sst', sst.type='oi', spatLim = sp.lim, save.dir = sst.dir)

# YOU NEED SOME REPRESENTATION OF ENVIRONMENTAL DEPTH-TEMPERATURE
# HYCOM DATA
hycom.dir <- paste(tempdir(), '/hycom/', sep='')
dir.create(hycom.dir, recursive = TRUE)
get.env(pdt.udates, filename='hycom', type = 'hycom', spatLim = sp.lim, save.dir = hycom.dir)

# OR WORLD OCEAN ATLAS DATA
#woa.dir <- paste(tempdir(), '/woa/', sep='')
#dir.create(woa.dir, recursive = TRUE)
#get.env(type = 'woa', resol = 'quarter', save.dir = woa.dir)
# THEN LOAD AND CHECK THE DOWNLOADED RDA FILE FOR WOA
#load(paste(woa.dir,'woa.quarter.rda',sep=''))
#str(woa.quarter)
#List of 4
#$ watertemp: num [1:44, 1:46, 1:57, 1:12] 26.5 26.5 26.4 26.3 26.2 ...
#$ lon      : num [1:44(1d)] -95.5 -94.5 -93.5 -92.5 -91.5 -90.5 -89.5 -88.5 -87.5 -86.5 ...
#$ lat      : num [1:46(1d)] 9.5 10.5 11.5 12.5 13.5 14.5 15.5 16.5 17.5 18.5 ...
#$ depth    : num [1:57(1d)] 0 5 10 15 20 25 30 35 40 45 ...

# BATHYMETRY
bathy.dir <- paste(tempdir(), '/bathy/', sep='')
dir.create(bathy.dir, recursive = TRUE)
bathy <- get.bath.data(sp.lim$lonmin, sp.lim$lonmax, sp.lim$latmin, sp.lim$latmax, folder = bathy.dir)
#library(raster); plot(bathy)
# OR READ IT FROM NETCDF
#bathy.nc <- RNetCDF::open.nc(paste(bathy.dir, 'bathy.nc', sep=''))

#------------
# CALCULATE LIKELIHOODS
#------------
# .par functions are the same calculations as those lacking .par, except they have been parallelized to leverage multiple CPUs
locs.grid <- setup.locs.grid(sp.lim)

# vector indicating which likelihoods to run (e.g. 1=light, 2=sst, 5=hycom)
# can be combined with if() statements around calc functions: if (any(likVec == 5) & !exists('L.5')){calc.hycom(...)}
likVec <- c(1,2,5) 

# LIGHT-BASED LIKELIHOODS
#L.1 <- calc.srss(light, locs.grid = locs.grid, dateVec = dateVec, res=0.25) # if trying to use raw light levels, not currently recommended (v0.2)
L.1 <- calc.gpe2(locs, locDates, locs.grid = locs.grid, dateVec = dateVec, errEll = FALSE, gpeOnly = TRUE)
#library(fields);library(raster)
#plot(L.1[[12]]); world(add=T)

# SST LIKELIHOODS
#L.2 <- calc.sst(tag.sst, filename='oisst', sst.dir = sst.dir, dateVec = dateVec, sens.err = 1)
L.2 <- calc.sst.par(tag.sst, filename='oisst', sst.dir = sst.dir, dateVec = dateVec, sens.err = 1)
# save.image() # good idea to save after these larger calculations in case the next one causes problems
 gc(); closeAllConnections() # also good to do garbage collection and kill any straggling processes that are running

# PDT LIKELIHOODS
# OCEAN HEAT CONTENT (INTEGRATED PDTS)
L.3 <- calc.ohc.par(pdt, filename='hycom', ohc.dir = hycom.dir, dateVec = dateVec, isotherm = '', use.se = F)
# save.image() # good idea to save after these larger calculations in case the next one causes problems
 gc(); closeAllConnections() # also good to do garbage collection and kill any straggling processes that are running

# WORLD OCEAN ATLAS-BASED LIKELIHOODS (requires woa.quarter from the get.env(type = 'woa') step above)
L.4 <- calc.woa.par(pdt, ptt=ptt, woa.data = woa.quarter, sp.lim=sp.lim, focalDim = 9, dateVec = dateVec, use.se = T)
# save.image() # good idea to save after these larger calculations in case the next one causes problems
 gc(); closeAllConnections() # also good to do garbage collection and kill any straggling processes that are running

# HYCOM PROFILE BASED LIKELIHOODS
L.5 <- calc.hycom.par(pdt, filename='hycom', hycom.dir, focalDim = 9, dateVec = dateVec, use.se = T)
# save.image() # good idea to save after these larger calculations in case the next one causes problems
 gc(); closeAllConnections() # also good to do garbage collection and kill any straggling processes that are running
#save.image('~/ebs/example.rda')

#------------
# PREPARE TO RUN THE MODEL
#------------
L.rasters <- mget(ls(pattern = 'L\\.')) # use with caution as all workspace items containing 'L.' will be listed. We only want the likelihood outputs calculated above
resamp.idx <- which.max(sapply(L.rasters, FUN=function(x) raster::res(x)[1]))
L.res <- resample.grid(L.rasters, L.rasters[[resamp.idx]])

# Figure out appropriate L combinations
# use this if you have a vector (likVec) indicating which likelihoods you are calculating
# for example, likVec <- c(1,2,5) for light, sst, and hycom likelihoods
if (length(likVec) > 2){
  L.idx <- c(utils::combn(likVec, 2, simplify=F), utils::combn(likVec, 3, simplify=F))
} else{
  L.idx <- utils::combn(likVec, 2, simplify=F)
}

# which of L.idx combinations do you want to run?
run.idx <- c(1,2,4)

# vector of appropriate bounding in filter. see ?hmm.filter for more info
bndVec <- c(NA, 5, 10)

# vector of appropriate migr kernel speed. see ?makePar for more info.
parVec <- c(2, 4)

# GOOD IDEA TO CLEAN THINGS UP AND SAVE
#rm(list=c('L.1','L.2','L.3','L.4','L.5', 'woa.quarter'))
# setwd(); base::save.image('.rda')

#------------
# RUN THE MODEL
#------------
# CAN BE PARALLELIZED...
#require(foreach)
#print('Processing in parallel... ')
#ncores <- ceiling(parallel::detectCores() * .25)
#cl = parallel::makeCluster(ncores)
#doParallel::registerDoParallel(cl, cores = ncores)
#ans = foreach::foreach(tt = run.idx) %dopar%{

for (tt in run.idx){
  for (bnd in bndVec){
    for (i in parVec){
      
      ptt=141259
      runName <- paste(ptt,'_idx',tt,'_bnd',bnd,'_par',i,sep='')
      
      # COMBINE LIKELIHOOD MATRICES
      # L.idx combination indicates likelihood surfaces to consider
      L <- make.L(L1 = L.res[[1]][L.idx[[tt]]],
                  L.mle.res = L.res$L.mle.res, dateVec = dateVec,
                  locs.grid = locs.grid, iniloc = iniloc, bathy = bathy,
                  pdt = pdt)
      L.mle <- L$L.mle
      L <- L$L
      g <- L.res$g
      g.mle <- L.res$g.mle
      lon <- g$lon[1,]
      lat <- g$lat[,1]
      
      # GET MOVEMENT KERNELS AND SWITCH PROB FOR COARSE GRID
      par0 <- makePar(migr.spd=i, grid=g.mle, L.arr=L.mle, p.guess=c(.9,.9), calcP=T)
      P.final <- par0$P.final
      
      # GET MOVEMENT KERNELS AND SWITCH PROB FOR FINER GRID
      par0 <- makePar(migr.spd=i, grid=g, L.arr=L, p.guess=c(.9,.9), calcP=F)
      K1 <- par0$K1; K2 <- par0$K2
      
      # RUN THE FILTER STEP
      if(!is.na(bnd)){
        f <- hmm.filter(g, L, K1, K2, maskL=T, P.final, minBounds = bnd)
        maskL.logical <- TRUE
      } else{
        f <- hmm.filter(g, L, K1, K2, P.final, maskL=F)
        maskL.logical <- FALSE
      }
      nllf <- -sum(log(f$psi[f$psi>0])) # negative log-likelihood
      
      # RUN THE SMOOTHING STEP
      s <- hmm.smoother(f, K1, K2, L, P.final)
      
      # GET THE MOST PROBABLE TRACK
      tr <- calc.track(s, g, dateVec, iniloc)
      #setwd(myDir); 
      plotHMM(s, tr, dateVec, ptt=runName, save.plot = T)
      
      # WRITE OUT RESULTS
      outVec <- matrix(c(ptt=ptt, minBounds = bnd, migr.spd = i,
                         Lidx = paste(L.idx[[tt]],collapse=''), P1 = P.final[1,1], P2 = P.final[2,2],
                         spLims = sp.lim[1:4], resol = raster::res(L.rasters[[resamp.idx]]),
                         maskL = maskL.logical, NLL = nllf, name = runName), ncol=15)
      #write.table(outVec,paste(dataDir, 'outVec_results.csv', sep=''), sep=',', col.names=F, append=T)
      #names(outVec) <- c('ptt','bnd','migr.spd','Lidx','P1','P2','spLims','resol','maskL','nll','name')
      res <- list(outVec = outVec, s = s, g = g, tr = tr, dateVec = dateVec, iniloc = iniloc, grid = raster::res(L.res[[1]]$L.5)[1])
      #setwd()
      save(res, file=paste(runName, '-HMMoce_res.rda', sep=''))
      #save.image(file=paste(ptt, '-HMMoce.RData', sep=''))
      #source('~/HMMoce/R/hmm.diagnose.r') # not yet functional
      #hmm.diagnose(res, L.idx, L.res, dateVec, locs.grid, iniloc, bathy, pdt, plot=T)
      
      write.table(outVec, file='HMMoce_results_outVec.csv', sep=',', append=T)
      
    } # parVec loop
  } # bndVec loop
} # L.idx loop


#parallel::stopCluster(cl)
#closeAllConnections()
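Once the loop has finished, the saved run outputs can be compared to pick a preferred model, for example using the negative log-likelihood computed in the filter step. A minimal sketch, assuming the '-HMMoce_res.rda' files written above are in the working directory and that the NLL occupies the 14th position of outVec as constructed in the loop:

# a minimal sketch: rank finished runs by negative log-likelihood (lower is better)
res.files <- list.files(pattern = '-HMMoce_res.rda$')
nll <- sapply(res.files, function(f){
  load(f)                                  # loads the 'res' list saved in the loop above
  as.numeric(unlist(res$outVec))[14]       # NLL is the 14th entry of outVec as built above
})
data.frame(run = res.files, nll = nll)[order(nll), ]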

References

Braun, C D, B Galuardi, and S R Thorrold. 2017. “An improved hidden Markov method for geolocating archival-tagged fishes.” Methods in Ecology and Evolution x (x).

Hill, Roger D, and Melinda J Braun. 2001. “Geolocation by light level.” In Electronic Tagging and Tracking in Marine Fisheries: Proceedings of the Symposium on Tagging and Tracking Marine Fish with Electronic Devices, 315–30. University of Hawaii: Springer.

Luo, Jiangang, Jerald S. Ault, Lynn K. Shay, John P. Hoolihan, Eric D. Prince, Craig A. Brown, and Jay R. Rooker. 2015. “Ocean Heat Content Reveals Secrets of Fish Migrations.” PLoS ONE 10 (10): e0141101. doi:10.1371/journal.pone.0141101.

Neilson, John D, Sean Smith, François Royer, Stacey D Paul, Julie M Porter, and Molly Lutcavage. 2009. “Investigations of horizontal movements of Atlantic swordfish using pop-up satellite archival tags.” In Tagging and Tracking of Marine Animals with Electronic Devices, 145–59. Springer.

Pedersen, M W, D Righton, U H Thygesen, K H Andersen, and H Madsen. 2008. “Geolocation of North Sea cod (Gadus morhua) using hidden Markov models and behavioural switching.” Canadian Journal of Fisheries and Aquatic Sciences 65 (11): 2367–77. doi:10.1139/f08-144.

Pedersen, M. W., C. W. Berg, U. H. Thygesen, A. Nielsen, and H. Madsen. 2011. “Estimation methods for nonlinear state-space models in ecology.” Ecological Modelling 222 (8). Elsevier B.V.: 1394–1400. doi:10.1016/j.ecolmodel.2011.01.007.

Skomal, G B, S I Zeeman, J H Chisholm, E L Summers, H J Walsh, K W McMahon, and S R Thorrold. 2009. “Transequatorial migrations by basking sharks in the western Atlantic Ocean.” Current Biology 19 (12): 1019–22. doi:10.1016/j.cub.2009.04.019.

Woillez, Mathieu, Ronan Fablet, Tran Thanh Ngo, Maxime Lalire, Pascal Lazure, and Hélène de Pontual. 2016. “A HMM-based model to geolocate pelagic fish from high-resolution individual temperature and depth histories: European sea bass as a case study.” Ecological Modelling 321. Elsevier B.V.: 10–22. doi:10.1016/j.ecolmodel.2015.10.024.