Simplified and extensible time series coercion tools
The time series landscape in R is vast, deep, and complex. The many data structures carry inconsistent attributes and formats, which makes it difficult to coerce between them. The zoo and xts packages solved a number of the issues in dealing with the various classes (ts, zoo, xts, irts, msts, and the list goes on…). However, because these packages deal in classes other than data frame, the issues with coercion between tbl and other time series object classes are still present.
The timetk
package provides tools that solve the issues with coercion, maximizing attribute extensibility (the required data attributes are retained during the coercion to each of the primary time series classes). The following tools are available to coerce and retrieve key information:
Coercion functions: tk_tbl
, tk_ts
, tk_xts
, tk_zoo
, and tk_zooreg
. These functions coerce time-based tibbles tbl
to and from each of the main time-series data types xts
, zoo
, zooreg
, ts
, maintaining the time-based index.
Index function: tk_index returns the index. When the argument timetk_idx = TRUE is set, a time-based (non-regularized) index of forecast objects, models, and ts objects is returned if present. Refer to tk_ts() to learn about non-regularized index persistence during the coercion process.
This vignette includes a brief case study on coercion issues and then a detailed explanation of timetk
function coercion between time-based tbl
objects and several primary time series classes (xts
, zoo
, zooreg
and ts
).
Before we get started, load the following packages.
library(tidyquant)
library(timetk)
We’ll use the ten-year treasury rate available from the FRED database with the code, “DGS10”. We’ll retrieve the data set using tq_get(get = "economic.data")
. The return structure is a tibble (or “tidy” data frame), which is not conducive to many of the popular time series analysis packages including quantmod
, TTR
, forecast
and many others.
ten_year_treasury_rate_tbl <- tq_get("DGS10",
get = "economic.data",
from = "1997-01-01",
to = "2016-12-31") %>%
rename(pct = price) %>%
mutate(pct = pct / 100)
ten_year_treasury_rate_tbl
## # A tibble: 5,218 x 2
## date pct
## <date> <dbl>
## 1 1997-01-01 NA
## 2 1997-01-02 0.0654
## 3 1997-01-03 0.0652
## 4 1997-01-06 0.0654
## 5 1997-01-07 0.0657
## 6 1997-01-08 0.0660
## 7 1997-01-09 0.0652
## 8 1997-01-10 0.0663
## 9 1997-01-13 0.0663
## 10 1997-01-14 0.0653
## # ... with 5,208 more rows
For purposes of the Case Study, we’ll change to a quarterly periodicity using tq_transmute()
from the tidyquant
package. Note that NA
values are automatically removed from the data (message not shown).
ten_year_treasury_rate_tbl <- ten_year_treasury_rate_tbl %>%
tq_transmute(pct, mutate_fun = to.period, period = "quarters")
ten_year_treasury_rate_tbl
## # A tibble: 80 x 2
## date pct
## <date> <dbl>
## 1 1997-03-31 0.0692
## 2 1997-06-30 0.0651
## 3 1997-09-30 0.0612
## 4 1997-12-31 0.0575
## 5 1998-03-31 0.0567
## 6 1998-06-30 0.0544
## 7 1998-09-30 0.0444
## 8 1998-12-31 0.0465
## 9 1999-03-31 0.0525
## 10 1999-06-30 0.0581
## # ... with 70 more rows
The ts object class has roots in the stats package, and many popular packages use this time series data structure, including the forecast package. With that said, the ts data structure is the most difficult to coerce back and forth because by default it does not contain a time-based index. Rather, it uses a regularized index computed using the start and frequency arguments. Coercion to ts is done using the ts() function from the stats package, which results in various problems.
First, every column is coerced to numeric, including the date column. If the user forgets to add [,"pct"] to drop the “date” column, ts() returns the dates in numeric format, which is not what the user wants.
# date column gets coerced to numeric
ts(ten_year_treasury_rate_tbl, start = 1997, freq = 4) %>%
head()
## date pct
## 1997 Q1 9951 0.0692
## 1997 Q2 10042 0.0651
## 1997 Q3 10134 0.0612
## 1997 Q4 10226 0.0575
## 1998 Q1 10316 0.0567
## 1998 Q2 10407 0.0544
The correct method is to call the specific column desired. However, this presents a new issue. The date index is lost, and a different “regularized” index is built using the start
and frequency
attributes.
ten_year_treasury_rate_ts_stats <- ts(ten_year_treasury_rate_tbl[,"pct"],
start = 1997,
freq = 4)
ten_year_treasury_rate_ts_stats
## Qtr1 Qtr2 Qtr3 Qtr4
## 1997 0.0692 0.0651 0.0612 0.0575
## 1998 0.0567 0.0544 0.0444 0.0465
## 1999 0.0525 0.0581 0.0590 0.0645
## 2000 0.0603 0.0603 0.0580 0.0512
## 2001 0.0493 0.0542 0.0460 0.0507
## 2002 0.0542 0.0486 0.0363 0.0383
## 2003 0.0383 0.0354 0.0396 0.0427
## 2004 0.0386 0.0462 0.0414 0.0424
## 2005 0.0450 0.0394 0.0434 0.0439
## 2006 0.0486 0.0515 0.0464 0.0471
## 2007 0.0465 0.0503 0.0459 0.0404
## 2008 0.0345 0.0399 0.0385 0.0225
## 2009 0.0271 0.0353 0.0331 0.0385
## 2010 0.0384 0.0297 0.0253 0.0330
## 2011 0.0347 0.0318 0.0192 0.0189
## 2012 0.0223 0.0167 0.0165 0.0178
## 2013 0.0187 0.0252 0.0264 0.0304
## 2014 0.0273 0.0253 0.0252 0.0217
## 2015 0.0194 0.0235 0.0206 0.0227
## 2016 0.0178 0.0149 0.0160 0.0245
We can see from the structure (using the str()
function) that the regularized time series is present, but there is no date index retained.
# No date index attribute
str(ten_year_treasury_rate_ts_stats)
## Time-Series [1:80, 1] from 1997 to 2017: 0.0692 0.0651 0.0612 0.0575 0.0567 0.0544 0.0444 0.0465 0.0525 0.0581 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr "pct"
We can get the index using the index()
function from the zoo
package. The index retained is a regular sequence of numeric values. In many cases, the regularized values cannot be coerced back to the original time-base because the date and date time data contains significantly more information (i.e. year-month-day, hour-minute-second, and timezone attributes) and the data may not be on a regularized interval (frequency).
# Regularized numeric sequence
index(ten_year_treasury_rate_ts_stats)
## [1] 1997.00 1997.25 1997.50 1997.75 1998.00 1998.25 1998.50 1998.75
## [9] 1999.00 1999.25 1999.50 1999.75 2000.00 2000.25 2000.50 2000.75
## [17] 2001.00 2001.25 2001.50 2001.75 2002.00 2002.25 2002.50 2002.75
## [25] 2003.00 2003.25 2003.50 2003.75 2004.00 2004.25 2004.50 2004.75
## [33] 2005.00 2005.25 2005.50 2005.75 2006.00 2006.25 2006.50 2006.75
## [41] 2007.00 2007.25 2007.50 2007.75 2008.00 2008.25 2008.50 2008.75
## [49] 2009.00 2009.25 2009.50 2009.75 2010.00 2010.25 2010.50 2010.75
## [57] 2011.00 2011.25 2011.50 2011.75 2012.00 2012.25 2012.50 2012.75
## [65] 2013.00 2013.25 2013.50 2013.75 2014.00 2014.25 2014.50 2014.75
## [73] 2015.00 2015.25 2015.50 2015.75 2016.00 2016.25 2016.50 2016.75
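To make the information loss concrete, here is a minimal base R sketch. It reuses the four 2016 quarter-end dates and values printed elsewhere in this vignette; the object names (dates, values, x) are illustrative only.

```r
# Four 2016 quarter-end observations (dates and values as printed in this vignette)
dates  <- as.Date(c("2016-03-31", "2016-06-30", "2016-09-30", "2016-12-30"))
values <- c(0.0178, 0.0149, 0.0160, 0.0245)

# stats::ts() receives only the values plus start/frequency;
# the dates are never passed in
x <- ts(values, start = c(2016, 1), frequency = 4)

# The only time information stored is start, end, and frequency
tsp(x)

# The regularized index is rebuilt from those three numbers
as.numeric(time(x))

# No attribute holds the original dates
names(attributes(x))
```

Once only a value such as 2016.75 is stored, the last trading day of the quarter (2016-12-30) can no longer be told apart from any other day in that quarter.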
The timetk
package contains a new function, tk_ts()
, that enables maintaining the original date index as an attribute. When we repeat the tbl
to ts
coercion process using the new function, tk_ts()
, we can see a few differences.
First, only numeric columns get coerced, which prevents unintended consequences due to R coercion rules (e.g. dates being unintentionally converted to numbers, or a character column forcing the homogeneous data structure to convert all numeric values to character). If a column is dropped, the user gets a warning.
# date automatically dropped and user is warned
ten_year_treasury_rate_ts_timetk <- tk_ts(ten_year_treasury_rate_tbl,
start = 1997,
freq = 4)
## Warning in tk_xts_.data.frame(ret, select = select, silent = silent): Non-
## numeric columns being dropped: date
ten_year_treasury_rate_ts_timetk
## Qtr1 Qtr2 Qtr3 Qtr4
## 1997 0.0692 0.0651 0.0612 0.0575
## 1998 0.0567 0.0544 0.0444 0.0465
## 1999 0.0525 0.0581 0.0590 0.0645
## 2000 0.0603 0.0603 0.0580 0.0512
## 2001 0.0493 0.0542 0.0460 0.0507
## 2002 0.0542 0.0486 0.0363 0.0383
## 2003 0.0383 0.0354 0.0396 0.0427
## 2004 0.0386 0.0462 0.0414 0.0424
## 2005 0.0450 0.0394 0.0434 0.0439
## 2006 0.0486 0.0515 0.0464 0.0471
## 2007 0.0465 0.0503 0.0459 0.0404
## 2008 0.0345 0.0399 0.0385 0.0225
## 2009 0.0271 0.0353 0.0331 0.0385
## 2010 0.0384 0.0297 0.0253 0.0330
## 2011 0.0347 0.0318 0.0192 0.0189
## 2012 0.0223 0.0167 0.0165 0.0178
## 2013 0.0187 0.0252 0.0264 0.0304
## 2014 0.0273 0.0253 0.0252 0.0217
## 2015 0.0194 0.0235 0.0206 0.0227
## 2016 0.0178 0.0149 0.0160 0.0245
Second, the data returned has a few additional attributes. The most important is a numeric attribute, “index”, which contains the original date information as a number. The ts() function will not preserve this index, while tk_ts() will preserve the index in numeric form along with the time zone and class.
# More attributes including time index, time class, time zone
str(ten_year_treasury_rate_ts_timetk)
## Time-Series [1:80, 1] from 1997 to 2017: 0.0692 0.0651 0.0612 0.0575 0.0567 0.0544 0.0444 0.0465 0.0525 0.0581 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr "pct"
## - attr(*, "index")= atomic [1:80] 8.60e+08 8.68e+08 8.76e+08 8.84e+08 8.91e+08 ...
## ..- attr(*, "tzone")= chr "UTC"
## ..- attr(*, "tclass")= chr "Date"
## - attr(*, ".indexCLASS")= chr "Date"
## - attr(*, "tclass")= chr "Date"
## - attr(*, ".indexTZ")= chr "UTC"
## - attr(*, "tzone")= chr "UTC"
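The numeric values in the “index” attribute look like seconds since 1970-01-01 UTC. That reading is an assumption based on the printed values, not something the output states. Under that assumption, base R alone can turn such a number back into a date:

```r
# Assumption: the "index" attribute stores seconds since 1970-01-01 00:00:00 UTC.
# 9951 is the numeric (days since epoch) form of the first date, as printed when
# ts() coerced the date column; 9951 days * 86400 seconds is about 8.60e+08.
idx_seconds <- 9951 * 86400

# Convert seconds -> POSIXct (UTC) -> Date
idx_date <- as.Date(as.POSIXct(idx_seconds, origin = "1970-01-01", tz = "UTC"))
idx_date
```

In practice tk_index(timetk_idx = TRUE) performs this retrieval, so the manual conversion is only for illustration.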
Since we used tk_ts() during coercion, we can extract the original index in date format using tk_index(timetk_idx = TRUE) (the default is timetk_idx = FALSE, which returns the default regularized index).
# Can now retrieve the original date index
timetk_index <- tk_index(ten_year_treasury_rate_ts_timetk, timetk_idx = TRUE)
head(timetk_index)
## [1] "1997-03-31" "1997-06-30" "1997-09-30" "1997-12-31" "1998-03-31"
## [6] "1998-06-30"
class(timetk_index)
## [1] "Date"
Next, the tk_tbl() function also has a timetk_idx argument, which can be used to select which index to return. First, we show coercion using the default index. Notice that the index returned is “regularized”, meaning it’s actually a numeric index (displayed here as yearqtr) rather than a time-based index.
# Coercion back to tibble using the default index (regularized)
ten_year_treasury_rate_ts_timetk %>%
tk_tbl(index_rename = "date", timetk_idx = FALSE)
## # A tibble: 80 x 2
## index pct
## <S3: yearqtr> <dbl>
## 1 1997 Q1 0.0692
## 2 1997 Q2 0.0651
## 3 1997 Q3 0.0612
## 4 1997 Q4 0.0575
## 5 1998 Q1 0.0567
## 6 1998 Q2 0.0544
## 7 1998 Q3 0.0444
## 8 1998 Q4 0.0465
## 9 1999 Q1 0.0525
## 10 1999 Q2 0.0581
## # ... with 70 more rows
We can now get the original date index using the tk_tbl()
argument timetk_idx = TRUE
.
# Coercion back to tibble now using the timetk index (date / date-time)
ten_year_treasury_rate_tbl_timetk <- ten_year_treasury_rate_ts_timetk %>%
tk_tbl(index_rename = "date", timetk_idx = TRUE)
ten_year_treasury_rate_tbl_timetk
## # A tibble: 80 x 2
## index pct
## <date> <dbl>
## 1 1997-03-31 0.0692
## 2 1997-06-30 0.0651
## 3 1997-09-30 0.0612
## 4 1997-12-31 0.0575
## 5 1998-03-31 0.0567
## 6 1998-06-30 0.0544
## 7 1998-09-30 0.0444
## 8 1998-12-31 0.0465
## 9 1999-03-31 0.0525
## 10 1999-06-30 0.0581
## # ... with 70 more rows
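We can see that in this case (and in most cases) you can recover the same data you began with. Note that identical() still returns FALSE below: the index column of the coerced tibble is named “index” while the original column is named “date”, so the two objects differ in their column names even though the displayed dates and values match. This appears to be because index_rename is not the name tk_tbl() uses for this argument; the tk_tbl() warning shown later in this vignette lists it as rename_index.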
# Comparing the coerced tibble with the original tibble
identical(ten_year_treasury_rate_tbl_timetk, ten_year_treasury_rate_tbl)
## [1] FALSE
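One likely reason for the FALSE result is the column name: the coerced tibble's index column is called “index”, while the original is called “date”. The sketch below uses a small stand-in tibble (not the FRED data) and the rename_index argument of tk_tbl(), then compares the columns individually rather than relying on identical(). The object names are illustrative.

```r
library(timetk)
library(tibble)

# Small stand-in for the quarterly tibble (values copied from the 2016 row)
small_tbl <- tibble(
  date = as.Date(c("2016-03-31", "2016-06-30", "2016-09-30", "2016-12-30")),
  pct  = c(0.0178, 0.0149, 0.0160, 0.0245)
)

# tbl -> ts, keeping the timetk index
small_ts <- tk_ts(small_tbl, start = 2016, frequency = 4, silent = TRUE)

# ts -> tbl, asking for the original dates and naming the index column "date"
roundtrip_tbl <- tk_tbl(small_ts, rename_index = "date", timetk_idx = TRUE)

# Compare piece by piece instead of using identical()
names(roundtrip_tbl)
all.equal(roundtrip_tbl$date, small_tbl$date)
all.equal(as.numeric(roundtrip_tbl$pct), small_tbl$pct)
```

Comparing names and columns separately makes it easier to see which part of the round trip differs, if any.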
Using the ten_year_treasury_rate_tbl
, we’ll go through the various coercion methods using tk_tbl
, tk_xts
, tk_zoo
, tk_zooreg
, and tk_ts
.
The starting point is the ten_year_treasury_rate_tbl
. We will coerce this into xts
, zoo
, zooreg
and ts
classes.
# Start:
ten_year_treasury_rate_tbl
## # A tibble: 80 x 2
## date pct
## <date> <dbl>
## 1 1997-03-31 0.0692
## 2 1997-06-30 0.0651
## 3 1997-09-30 0.0612
## 4 1997-12-31 0.0575
## 5 1998-03-31 0.0567
## 6 1998-06-30 0.0544
## 7 1998-09-30 0.0444
## 8 1998-12-31 0.0465
## 9 1999-03-31 0.0525
## 10 1999-06-30 0.0581
## # ... with 70 more rows
Use tk_xts()
. By default “date” is used as the date index and the “date” column is dropped from the output. Only numeric columns are coerced to avoid unintentional coercion issues.
# End
ten_year_treasury_rate_xts <- tk_xts(ten_year_treasury_rate_tbl)
## Warning in tk_xts_.data.frame(data = data, select = select, date_var =
## date_var, : Non-numeric columns being dropped: date
## Using column `date` for date_var.
head(ten_year_treasury_rate_xts)
## pct
## 1997-03-31 0.0692
## 1997-06-30 0.0651
## 1997-09-30 0.0612
## 1997-12-31 0.0575
## 1998-03-31 0.0567
## 1998-06-30 0.0544
Use the select
argument to specify which columns to drop. Use the date_var
argument to specify which column to use as the date index. Notice the message and warning are no longer present.
# End - Using `select` and `date_var` args
tk_xts(ten_year_treasury_rate_tbl, select = -date, date_var = date) %>%
head()
## pct
## 1997-03-31 0.0692
## 1997-06-30 0.0651
## 1997-09-30 0.0612
## 1997-12-31 0.0575
## 1998-03-31 0.0567
## 1998-06-30 0.0544
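With more than one data column, select can also be used to keep a single column rather than to drop the date. The following is a minimal sketch on a hypothetical tibble; the column names rate, volume, and source are made up for illustration.

```r
library(timetk)
library(tibble)

# Hypothetical tibble with two numeric columns and one character column
multi_tbl <- tibble(
  date   = as.Date("2016-01-01") + 0:2,
  rate   = c(0.021, 0.022, 0.023),
  volume = c(100, 120, 90),
  source = c("a", "b", "c")
)

# Keep only `rate`; `date` supplies the time index
rate_xts <- tk_xts(multi_tbl, select = rate, date_var = date)

colnames(rate_xts)
zoo::index(rate_xts)
```

Because the selection is explicit, the character column never reaches the coercion step.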
Also, as an alternative, we can set silent = TRUE
to bypass the warnings since the default dropping of the “date” column is what is desired. Notice no warnings or messages.
# End - Using `silent` to silence warnings
tk_xts(ten_year_treasury_rate_tbl, silent = TRUE) %>%
head()
## pct
## 1997-03-31 0.0692
## 1997-06-30 0.0651
## 1997-09-30 0.0612
## 1997-12-31 0.0575
## 1998-03-31 0.0567
## 1998-06-30 0.0544
Use tk_zoo()
. Same as when coercing to xts, the non-numeric “date” column is automatically dropped and the index is automatically selected as the date column.
# End
ten_year_treasury_rate_zoo <- tk_zoo(ten_year_treasury_rate_tbl, silent = TRUE)
head(ten_year_treasury_rate_zoo)
## pct
## 1997-03-31 0.0692
## 1997-06-30 0.0651
## 1997-09-30 0.0612
## 1997-12-31 0.0575
## 1998-03-31 0.0567
## 1998-06-30 0.0544
Use tk_zooreg()
. Same as when coercing to xts, the non-numeric “date” column is automatically dropped. The regularized index is built from the function arguments start
and freq
.
# End
ten_year_treasury_rate_zooreg <- tk_zooreg(ten_year_treasury_rate_tbl,
start = 1997,
freq = 4,
silent = TRUE)
head(ten_year_treasury_rate_zooreg)
## pct
## 1997 Q1 0.0692
## 1997 Q2 0.0651
## 1997 Q3 0.0612
## 1997 Q4 0.0575
## 1998 Q1 0.0567
## 1998 Q2 0.0544
The original time-based index is retained and can be accessed using tk_index(timetk_idx = TRUE)
.
# Retrieve original time-based index
tk_index(ten_year_treasury_rate_zooreg, timetk_idx = TRUE) %>%
str()
## Date[1:80], format: "1997-03-31" "1997-06-30" "1997-09-30" "1997-12-31" "1998-03-31" ...
Use tk_ts()
. The non-numeric “date” column is automatically dropped. The regularized index is built from the function arguments.
# End
ten_year_treasury_rate_ts <- tk_ts(ten_year_treasury_rate_tbl,
start = 1997,
freq = 4,
silent = TRUE)
ten_year_treasury_rate_ts
## Qtr1 Qtr2 Qtr3 Qtr4
## 1997 0.0692 0.0651 0.0612 0.0575
## 1998 0.0567 0.0544 0.0444 0.0465
## 1999 0.0525 0.0581 0.0590 0.0645
## 2000 0.0603 0.0603 0.0580 0.0512
## 2001 0.0493 0.0542 0.0460 0.0507
## 2002 0.0542 0.0486 0.0363 0.0383
## 2003 0.0383 0.0354 0.0396 0.0427
## 2004 0.0386 0.0462 0.0414 0.0424
## 2005 0.0450 0.0394 0.0434 0.0439
## 2006 0.0486 0.0515 0.0464 0.0471
## 2007 0.0465 0.0503 0.0459 0.0404
## 2008 0.0345 0.0399 0.0385 0.0225
## 2009 0.0271 0.0353 0.0331 0.0385
## 2010 0.0384 0.0297 0.0253 0.0330
## 2011 0.0347 0.0318 0.0192 0.0189
## 2012 0.0223 0.0167 0.0165 0.0178
## 2013 0.0187 0.0252 0.0264 0.0304
## 2014 0.0273 0.0253 0.0252 0.0217
## 2015 0.0194 0.0235 0.0206 0.0227
## 2016 0.0178 0.0149 0.0160 0.0245
The original time-based index is retained and can be accessed using tk_index(timetk_idx = TRUE)
.
# Retrieve original time-based index
tk_index(ten_year_treasury_rate_ts, timetk_idx = TRUE) %>%
str()
## Date[1:80], format: "1997-03-31" "1997-06-30" "1997-09-30" "1997-12-31" "1998-03-31" ...
Going back to tibble is just as easy using tk_tbl()
.
# Start
head(ten_year_treasury_rate_xts)
## pct
## 1997-03-31 0.0692
## 1997-06-30 0.0651
## 1997-09-30 0.0612
## 1997-12-31 0.0575
## 1998-03-31 0.0567
## 1998-06-30 0.0544
Notice no loss of data going back to tbl
.
# End
tk_tbl(ten_year_treasury_rate_xts)
## # A tibble: 80 x 2
## index pct
## <date> <dbl>
## 1 1997-03-31 0.0692
## 2 1997-06-30 0.0651
## 3 1997-09-30 0.0612
## 4 1997-12-31 0.0575
## 5 1998-03-31 0.0567
## 6 1998-06-30 0.0544
## 7 1998-09-30 0.0444
## 8 1998-12-31 0.0465
## 9 1999-03-31 0.0525
## 10 1999-06-30 0.0581
## # ... with 70 more rows
# Start
head(ten_year_treasury_rate_zoo)
## pct
## 1997-03-31 0.0692
## 1997-06-30 0.0651
## 1997-09-30 0.0612
## 1997-12-31 0.0575
## 1998-03-31 0.0567
## 1998-06-30 0.0544
Notice no loss of data going back to tbl
.
# End
tk_tbl(ten_year_treasury_rate_zoo)
## # A tibble: 80 x 2
## index pct
## <date> <dbl>
## 1 1997-03-31 0.0692
## 2 1997-06-30 0.0651
## 3 1997-09-30 0.0612
## 4 1997-12-31 0.0575
## 5 1998-03-31 0.0567
## 6 1998-06-30 0.0544
## 7 1998-09-30 0.0444
## 8 1998-12-31 0.0465
## 9 1999-03-31 0.0525
## 10 1999-06-30 0.0581
## # ... with 70 more rows
# Start
head(ten_year_treasury_rate_zooreg)
## pct
## 1997 Q1 0.0692
## 1997 Q2 0.0651
## 1997 Q3 0.0612
## 1997 Q4 0.0575
## 1998 Q1 0.0567
## 1998 Q2 0.0544
Notice that the index is a regularized numeric sequence by default.
# End - with default regularized index
tk_tbl(ten_year_treasury_rate_zooreg)
## # A tibble: 80 x 2
## index pct
## <S3: yearqtr> <dbl>
## 1 1997 Q1 0.0692
## 2 1997 Q2 0.0651
## 3 1997 Q3 0.0612
## 4 1997 Q4 0.0575
## 5 1998 Q1 0.0567
## 6 1998 Q2 0.0544
## 7 1998 Q3 0.0444
## 8 1998 Q4 0.0465
## 9 1999 Q1 0.0525
## 10 1999 Q2 0.0581
## # ... with 70 more rows
With timetk_idx = TRUE
the index is the original date sequence. The result is the original tbl
that we started with!
# End - with timetk index that is the same as original time-based index
tk_tbl(ten_year_treasury_rate_zooreg, timetk_idx = TRUE)
## # A tibble: 80 x 2
## index pct
## <date> <dbl>
## 1 1997-03-31 0.0692
## 2 1997-06-30 0.0651
## 3 1997-09-30 0.0612
## 4 1997-12-31 0.0575
## 5 1998-03-31 0.0567
## 6 1998-06-30 0.0544
## 7 1998-09-30 0.0444
## 8 1998-12-31 0.0465
## 9 1999-03-31 0.0525
## 10 1999-06-30 0.0581
## # ... with 70 more rows
# Start
ten_year_treasury_rate_ts
## Qtr1 Qtr2 Qtr3 Qtr4
## 1997 0.0692 0.0651 0.0612 0.0575
## 1998 0.0567 0.0544 0.0444 0.0465
## 1999 0.0525 0.0581 0.0590 0.0645
## 2000 0.0603 0.0603 0.0580 0.0512
## 2001 0.0493 0.0542 0.0460 0.0507
## 2002 0.0542 0.0486 0.0363 0.0383
## 2003 0.0383 0.0354 0.0396 0.0427
## 2004 0.0386 0.0462 0.0414 0.0424
## 2005 0.0450 0.0394 0.0434 0.0439
## 2006 0.0486 0.0515 0.0464 0.0471
## 2007 0.0465 0.0503 0.0459 0.0404
## 2008 0.0345 0.0399 0.0385 0.0225
## 2009 0.0271 0.0353 0.0331 0.0385
## 2010 0.0384 0.0297 0.0253 0.0330
## 2011 0.0347 0.0318 0.0192 0.0189
## 2012 0.0223 0.0167 0.0165 0.0178
## 2013 0.0187 0.0252 0.0264 0.0304
## 2014 0.0273 0.0253 0.0252 0.0217
## 2015 0.0194 0.0235 0.0206 0.0227
## 2016 0.0178 0.0149 0.0160 0.0245
Notice that the index is a regularized numeric sequence by default.
# End - with default regularized index
tk_tbl(ten_year_treasury_rate_ts)
## # A tibble: 80 x 2
## index pct
## <S3: yearqtr> <dbl>
## 1 1997 Q1 0.0692
## 2 1997 Q2 0.0651
## 3 1997 Q3 0.0612
## 4 1997 Q4 0.0575
## 5 1998 Q1 0.0567
## 6 1998 Q2 0.0544
## 7 1998 Q3 0.0444
## 8 1998 Q4 0.0465
## 9 1999 Q1 0.0525
## 10 1999 Q2 0.0581
## # ... with 70 more rows
With timetk_idx = TRUE
the index is the original date sequence. The result is the original tbl
that we started with!
# End - with timetk index
tk_tbl(ten_year_treasury_rate_ts, timetk_idx = TRUE)
## # A tibble: 80 x 2
## index pct
## <date> <dbl>
## 1 1997-03-31 0.0692
## 2 1997-06-30 0.0651
## 3 1997-09-30 0.0612
## 4 1997-12-31 0.0575
## 5 1998-03-31 0.0567
## 6 1998-06-30 0.0544
## 7 1998-09-30 0.0444
## 8 1998-12-31 0.0465
## 9 1999-03-31 0.0525
## 10 1999-06-30 0.0581
## # ... with 70 more rows
This section covers additional concepts that the user may find useful when working with time series.
The function has_timetk_idx()
can be used to test whether toggling the timetk_idx
argument in the tk_index()
and tk_tbl()
functions will have an effect on the output. Here are several examples using the ten year treasury data used in the case study:
There’s no “timetk index” if the ts() function is used. The solution is to use tk_ts() to coerce to ts.
# Data coerced with stats::ts() has no timetk index
has_timetk_idx(ten_year_treasury_rate_ts_stats)
## [1] FALSE
If we try to toggle timetk_idx = TRUE
when retrieving the index with tk_index()
, we get a warning and the default regularized time series is returned.
tk_index(ten_year_treasury_rate_ts_stats, timetk_idx = TRUE)
## Warning in tk_index.ts(ten_year_treasury_rate_ts_stats, timetk_idx = TRUE):
## timetk attribute `index` not found. Returning default instead.
## [1] 1997.00 1997.25 1997.50 1997.75 1998.00 1998.25 1998.50 1998.75
## [9] 1999.00 1999.25 1999.50 1999.75 2000.00 2000.25 2000.50 2000.75
## [17] 2001.00 2001.25 2001.50 2001.75 2002.00 2002.25 2002.50 2002.75
## [25] 2003.00 2003.25 2003.50 2003.75 2004.00 2004.25 2004.50 2004.75
## [33] 2005.00 2005.25 2005.50 2005.75 2006.00 2006.25 2006.50 2006.75
## [41] 2007.00 2007.25 2007.50 2007.75 2008.00 2008.25 2008.50 2008.75
## [49] 2009.00 2009.25 2009.50 2009.75 2010.00 2010.25 2010.50 2010.75
## [57] 2011.00 2011.25 2011.50 2011.75 2012.00 2012.25 2012.50 2012.75
## [65] 2013.00 2013.25 2013.50 2013.75 2014.00 2014.25 2014.50 2014.75
## [73] 2015.00 2015.25 2015.50 2015.75 2016.00 2016.25 2016.50 2016.75
If we try to toggle timetk_idx = TRUE
during coercion to tbl
using tk_tbl()
, we get a warning and the default regularized time series is returned.
tk_tbl(ten_year_treasury_rate_ts_stats, timetk_idx = TRUE)
## Warning in tk_tbl.zooreg(zoo::as.zoo(data), preserve_index, rename_index, :
## No `timetk index` attribute found. Using regularized index.
## # A tibble: 80 x 2
## index pct
## <S3: yearqtr> <dbl>
## 1 1997 Q1 0.0692
## 2 1997 Q2 0.0651
## 3 1997 Q3 0.0612
## 4 1997 Q4 0.0575
## 5 1998 Q1 0.0567
## 6 1998 Q2 0.0544
## 7 1998 Q3 0.0444
## 8 1998 Q4 0.0465
## 9 1999 Q1 0.0525
## 10 1999 Q2 0.0581
## # ... with 70 more rows
The tk_ts()
function returns an object with the “timetk index” attribute.
# Data coerced with tk_ts() has timetk index
has_timetk_idx(ten_year_treasury_rate_ts_timetk)
## [1] TRUE
If we toggle timetk_idx = TRUE
when retrieving the index with tk_index()
, we get the index of dates rather than the regularized time series.
tk_index(ten_year_treasury_rate_ts_timetk, timetk_idx = TRUE)
## [1] "1997-03-31" "1997-06-30" "1997-09-30" "1997-12-31" "1998-03-31"
## [6] "1998-06-30" "1998-09-30" "1998-12-31" "1999-03-31" "1999-06-30"
## [11] "1999-09-30" "1999-12-31" "2000-03-31" "2000-06-30" "2000-09-29"
## [16] "2000-12-29" "2001-03-30" "2001-06-29" "2001-09-28" "2001-12-31"
## [21] "2002-03-28" "2002-06-28" "2002-09-30" "2002-12-31" "2003-03-31"
## [26] "2003-06-30" "2003-09-30" "2003-12-31" "2004-03-31" "2004-06-30"
## [31] "2004-09-30" "2004-12-31" "2005-03-31" "2005-06-30" "2005-09-30"
## [36] "2005-12-30" "2006-03-31" "2006-06-30" "2006-09-29" "2006-12-29"
## [41] "2007-03-30" "2007-06-29" "2007-09-28" "2007-12-31" "2008-03-31"
## [46] "2008-06-30" "2008-09-30" "2008-12-31" "2009-03-31" "2009-06-30"
## [51] "2009-09-30" "2009-12-31" "2010-03-31" "2010-06-30" "2010-09-30"
## [56] "2010-12-31" "2011-03-31" "2011-06-30" "2011-09-30" "2011-12-30"
## [61] "2012-03-30" "2012-06-29" "2012-09-28" "2012-12-31" "2013-03-28"
## [66] "2013-06-28" "2013-09-30" "2013-12-31" "2014-03-31" "2014-06-30"
## [71] "2014-09-30" "2014-12-31" "2015-03-31" "2015-06-30" "2015-09-30"
## [76] "2015-12-31" "2016-03-31" "2016-06-30" "2016-09-30" "2016-12-30"
If we toggle timetk_idx = TRUE
during coercion to tbl
using tk_tbl()
, we get the index of dates rather than the regularized index in the returned tbl
.
tk_tbl(ten_year_treasury_rate_ts_timetk, timetk_idx = TRUE)
## # A tibble: 80 x 2
## index pct
## <date> <dbl>
## 1 1997-03-31 0.0692
## 2 1997-06-30 0.0651
## 3 1997-09-30 0.0612
## 4 1997-12-31 0.0575
## 5 1998-03-31 0.0567
## 6 1998-06-30 0.0544
## 7 1998-09-30 0.0444
## 8 1998-12-31 0.0465
## 9 1999-03-31 0.0525
## 10 1999-06-30 0.0581
## # ... with 70 more rows
The timetk_idx
argument will only have an effect on objects that use regularized time series. Therefore, has_timetk_idx()
returns FALSE
for other object types (e.g. tbl
, xts
, zoo
) since toggling the argument has no effect on these classes.
has_timetk_idx(ten_year_treasury_rate_xts)
## [1] FALSE
Toggling the timetk_idx
argument has no effect on the output. Output with timetk_idx = TRUE
is the same as with timetk_idx = FALSE
.
tk_index(ten_year_treasury_rate_xts, timetk_idx = TRUE)
## [1] "1997-03-31" "1997-06-30" "1997-09-30" "1997-12-31" "1998-03-31"
## [6] "1998-06-30" "1998-09-30" "1998-12-31" "1999-03-31" "1999-06-30"
## [11] "1999-09-30" "1999-12-31" "2000-03-31" "2000-06-30" "2000-09-29"
## [16] "2000-12-29" "2001-03-30" "2001-06-29" "2001-09-28" "2001-12-31"
## [21] "2002-03-28" "2002-06-28" "2002-09-30" "2002-12-31" "2003-03-31"
## [26] "2003-06-30" "2003-09-30" "2003-12-31" "2004-03-31" "2004-06-30"
## [31] "2004-09-30" "2004-12-31" "2005-03-31" "2005-06-30" "2005-09-30"
## [36] "2005-12-30" "2006-03-31" "2006-06-30" "2006-09-29" "2006-12-29"
## [41] "2007-03-30" "2007-06-29" "2007-09-28" "2007-12-31" "2008-03-31"
## [46] "2008-06-30" "2008-09-30" "2008-12-31" "2009-03-31" "2009-06-30"
## [51] "2009-09-30" "2009-12-31" "2010-03-31" "2010-06-30" "2010-09-30"
## [56] "2010-12-31" "2011-03-31" "2011-06-30" "2011-09-30" "2011-12-30"
## [61] "2012-03-30" "2012-06-29" "2012-09-28" "2012-12-31" "2013-03-28"
## [66] "2013-06-28" "2013-09-30" "2013-12-31" "2014-03-31" "2014-06-30"
## [71] "2014-09-30" "2014-12-31" "2015-03-31" "2015-06-30" "2015-09-30"
## [76] "2015-12-31" "2016-03-31" "2016-06-30" "2016-09-30" "2016-12-30"
tk_index(ten_year_treasury_rate_xts, timetk_idx = FALSE)
## [1] "1997-03-31" "1997-06-30" "1997-09-30" "1997-12-31" "1998-03-31"
## [6] "1998-06-30" "1998-09-30" "1998-12-31" "1999-03-31" "1999-06-30"
## [11] "1999-09-30" "1999-12-31" "2000-03-31" "2000-06-30" "2000-09-29"
## [16] "2000-12-29" "2001-03-30" "2001-06-29" "2001-09-28" "2001-12-31"
## [21] "2002-03-28" "2002-06-28" "2002-09-30" "2002-12-31" "2003-03-31"
## [26] "2003-06-30" "2003-09-30" "2003-12-31" "2004-03-31" "2004-06-30"
## [31] "2004-09-30" "2004-12-31" "2005-03-31" "2005-06-30" "2005-09-30"
## [36] "2005-12-30" "2006-03-31" "2006-06-30" "2006-09-29" "2006-12-29"
## [41] "2007-03-30" "2007-06-29" "2007-09-28" "2007-12-31" "2008-03-31"
## [46] "2008-06-30" "2008-09-30" "2008-12-31" "2009-03-31" "2009-06-30"
## [51] "2009-09-30" "2009-12-31" "2010-03-31" "2010-06-30" "2010-09-30"
## [56] "2010-12-31" "2011-03-31" "2011-06-30" "2011-09-30" "2011-12-30"
## [61] "2012-03-30" "2012-06-29" "2012-09-28" "2012-12-31" "2013-03-28"
## [66] "2013-06-28" "2013-09-30" "2013-12-31" "2014-03-31" "2014-06-30"
## [71] "2014-09-30" "2014-12-31" "2015-03-31" "2015-06-30" "2015-09-30"
## [76] "2015-12-31" "2016-03-31" "2016-06-30" "2016-09-30" "2016-12-30"
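Since has_timetk_idx() reports whether toggling has any effect, it can drive the timetk_idx argument directly. The helper below, best_index(), is a hypothetical convenience wrapper and not part of timetk; the data is a small stand-in for the treasury series.

```r
library(timetk)
library(tibble)

# Hypothetical helper: return the original time-based index when one is
# stored, otherwise fall back to the object's default index.
best_index <- function(x) {
  tk_index(x, timetk_idx = has_timetk_idx(x))
}

small_tbl <- tibble(
  date = as.Date(c("2016-03-31", "2016-06-30", "2016-09-30", "2016-12-30")),
  pct  = c(0.0178, 0.0149, 0.0160, 0.0245)
)

ts_timetk <- tk_ts(small_tbl, start = 2016, frequency = 4, silent = TRUE)
ts_stats  <- ts(small_tbl$pct, start = 2016, frequency = 4)

class(best_index(ts_timetk))  # dates, because a timetk index is present
class(best_index(ts_stats))   # regularized index; timetk_idx is never set to TRUE here
```

Passing the test result straight into timetk_idx avoids requesting a timetk index from objects that do not carry one.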
It’s common to need to coerce data stored as data frame or another structure with a time-base to ts
to perform some analysis. It’s also common to need to coerce it from the regularized structure to a time-based structure such as xts
or zoo
to perform further analysis within your workflow. Traditionally coercing a ts
class object to an xts
or zoo
class object was difficult or impossible since the ts
object does not maintain a time-based index and the xts
and zoo
objects require the order.by
argument to specify a time-based index. The zoo
package contains some regularizing functions (yearmon
and yearqtr
) that can be converted to dates, but there is no easy method to coerce ts
objects on frequencies such as daily until now. The general process is as follows:
1. Start with a time-based tibble (tbl) or xts object.
2. Coerce to ts using the tk_ts() function, setting the start and frequency parameters for regularization. This generates a regularized ts object as normal, but using the tk_ts() function also maintains the time-based “timetk index”.
3. Coerce to xts or zoo using tk_xts() or tk_zoo() respectively.
Here’s a quick example. Our starting point is a tibble (tbl) but it could be another time-based object such as xts or zoo.
# Start with a date or date-time indexed data frame
data_tbl <- tibble::tibble(
date = seq.Date(as.Date("2016-01-01"), by = 1, length.out = 5),
x = cumsum(11:15) * rnorm(1))
data_tbl
## # A tibble: 5 x 2
## date x
## <date> <dbl>
## 1 2016-01-01 -17.23228
## 2 2016-01-02 -36.03113
## 3 2016-01-03 -56.39656
## 4 2016-01-04 -78.32855
## 5 2016-01-05 -101.82712
Coerce to ts
class using the tk_ts()
function. Note that the non-numeric column “date” is being dropped, and the silent = TRUE
hides the message.
# Coerce to ts
data_ts <- tk_ts(data_tbl, start = 2016, freq = 365, silent = TRUE)
data_ts
## Time Series:
## Start = c(2016, 1)
## End = c(2016, 5)
## Frequency = 365
## x
## [1,] -17.23228
## [2,] -36.03113
## [3,] -56.39656
## [4,] -78.32855
## [5,] -101.82712
## attr(,"index")
## [1] 1451606400 1451692800 1451779200 1451865600 1451952000
## attr(,"index")attr(,"tzone")
## [1] UTC
## attr(,"index")attr(,"tclass")
## [1] Date
## attr(,".indexCLASS")
## [1] Date
## attr(,"tclass")
## [1] Date
## attr(,".indexTZ")
## [1] UTC
## attr(,"tzone")
## [1] UTC
Coercion to xts
normally requires a date or datetime index to be passed to the order.by
argument. However, when coercing ts
objects created with tk_ts()
, the tk_xts
function automatically uses the “timetk index” if present.
# Inspect timetk index
has_timetk_idx(data_ts)
## [1] TRUE
If the “timetk index” is present, the user can simply pass the ts
object to the coercion function (tk_xts()
), which will automatically use the “timetk index” to order by.
# No need to specify order.by arg
data_xts <- tk_xts(data_ts)
data_xts
## x
## 2016-01-01 -17.23228
## 2016-01-02 -36.03113
## 2016-01-03 -56.39656
## 2016-01-04 -78.32855
## 2016-01-05 -101.82712
We can see that the xts
structure is maintained.
str(data_xts)
## An 'xts' object on 2016-01-01/2016-01-05 containing:
## Data: num [1:5, 1] -17.2 -36 -56.4 -78.3 -101.8
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr "x"
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## NULL
The same process can be used to coerce from ts
to zoo
class using tk_zoo
.
# No need to specify order.by arg
data_zoo <- tk_zoo(data_ts)
data_zoo
## x
## 2016-01-01 -17.23228
## 2016-01-02 -36.03113
## 2016-01-03 -56.39656
## 2016-01-04 -78.32855
## 2016-01-05 -101.82712
We can see that the zoo
structure is maintained.
str(data_zoo)
## 'zoo' series from 2016-01-01 to 2016-01-05
## Data: num [1:5, 1] -17.2 -36 -56.4 -78.3 -101.8
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:5] "2016-01-01" "2016-01-02" "2016-01-03" "2016-01-04" ...
## ..$ : chr "x"
## Index: Date[1:5], format: "2016-01-01" "2016-01-02" "2016-01-03" "2016-01-04" "2016-01-05"
Note that coercion to tbl with tk_tbl() requires the timetk_idx = TRUE argument to use the non-regularized index.
tk_tbl(data_ts, timetk_idx = TRUE)
## # A tibble: 5 x 2
## index x
## <date> <dbl>
## 1 2016-01-01 -17.23228
## 2 2016-01-02 -36.03113
## 3 2016-01-03 -56.39656
## 4 2016-01-04 -78.32855
## 5 2016-01-05 -101.82712
The zoo package has the yearmon and yearqtr classes for working with regularized monthly and quarterly data, respectively. The “timetk index” tracks the format during coercion. Here’s an example with yearqtr.
yearqtr_tbl <- ten_year_treasury_rate_tbl %>%
mutate(date = as.yearqtr(date))
yearqtr_tbl
## # A tibble: 80 x 2
## date pct
## <S3: yearqtr> <dbl>
## 1 1997 Q1 0.0692
## 2 1997 Q2 0.0651
## 3 1997 Q3 0.0612
## 4 1997 Q4 0.0575
## 5 1998 Q1 0.0567
## 6 1998 Q2 0.0544
## 7 1998 Q3 0.0444
## 8 1998 Q4 0.0465
## 9 1999 Q1 0.0525
## 10 1999 Q2 0.0581
## # ... with 70 more rows
We can coerce to xts
and the yearqtr
class is intact.
yearqtr_xts <- tk_xts(yearqtr_tbl)
## Warning in tk_xts_.data.frame(data = data, select = select, date_var =
## date_var, : Non-numeric columns being dropped: date
## Using column `date` for date_var.
yearqtr_xts %>%
head()
## pct
## 1997 Q1 0.0692
## 1997 Q2 0.0651
## 1997 Q3 0.0612
## 1997 Q4 0.0575
## 1998 Q1 0.0567
## 1998 Q2 0.0544
We can coerce to ts
and, although the “timetk index” is hidden, the yearqtr
class is intact.
yearqtr_ts <- tk_ts(yearqtr_xts, start = 1997, freq = 4)
yearqtr_ts %>%
head()
## Qtr1 Qtr2 Qtr3 Qtr4
## 1997 0.0692 0.0651 0.0612 0.0575
## 1998 0.0567 0.0544
Coercing from ts
to tbl
using timetk_idx = TRUE
shows that the original index was maintained through each of the coercion steps.
yearqtr_ts %>%
tk_tbl(timetk_idx = TRUE)
## # A tibble: 80 x 2
## index pct
## <S3: yearqtr> <dbl>
## 1 1997 Q1 0.0692
## 2 1997 Q2 0.0651
## 3 1997 Q3 0.0612
## 4 1997 Q4 0.0575
## 5 1998 Q1 0.0567
## 6 1998 Q2 0.0544
## 7 1998 Q3 0.0444
## 8 1998 Q4 0.0465
## 9 1999 Q1 0.0525
## 10 1999 Q2 0.0581
## # ... with 70 more rows
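For comparison, this is what the zoo conversion from yearqtr to Date looks like on its own. It is a small sketch using only zoo; the variable q is illustrative.

```r
library(zoo)

q <- as.yearqtr("2016 Q4")

# Default conversion gives the first day of the quarter
as.Date(q)

# frac = 1 gives the last calendar day of the quarter
as.Date(q, frac = 1)
```

In the treasury data the last 2016 observation is dated 2016-12-30, which neither conversion reproduces. Recovering the exact observation dates from a ts object is what the “timetk index” is for.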
It can be important to retrieve the index from models and other objects that use an underlying time series data set. We’ll go through an example retrieving the time index from an ARIMA model using tk_index()
.
library(forecast)
fit_arima <- ten_year_treasury_rate_ts %>%
auto.arima()
We can get the time index from the ARIMA model.
tk_index(fit_arima)
## [1] 1997.00 1997.25 1997.50 1997.75 1998.00 1998.25 1998.50 1998.75
## [9] 1999.00 1999.25 1999.50 1999.75 2000.00 2000.25 2000.50 2000.75
## [17] 2001.00 2001.25 2001.50 2001.75 2002.00 2002.25 2002.50 2002.75
## [25] 2003.00 2003.25 2003.50 2003.75 2004.00 2004.25 2004.50 2004.75
## [33] 2005.00 2005.25 2005.50 2005.75 2006.00 2006.25 2006.50 2006.75
## [41] 2007.00 2007.25 2007.50 2007.75 2008.00 2008.25 2008.50 2008.75
## [49] 2009.00 2009.25 2009.50 2009.75 2010.00 2010.25 2010.50 2010.75
## [57] 2011.00 2011.25 2011.50 2011.75 2012.00 2012.25 2012.50 2012.75
## [65] 2013.00 2013.25 2013.50 2013.75 2014.00 2014.25 2014.50 2014.75
## [73] 2015.00 2015.25 2015.50 2015.75 2016.00 2016.25 2016.50 2016.75
We can also get the original index from the ARIMA model by setting timetk_idx = TRUE.
tk_index(fit_arima, timetk_idx = TRUE)
## [1] "1997-03-31" "1997-06-30" "1997-09-30" "1997-12-31" "1998-03-31"
## [6] "1998-06-30" "1998-09-30" "1998-12-31" "1999-03-31" "1999-06-30"
## [11] "1999-09-30" "1999-12-31" "2000-03-31" "2000-06-30" "2000-09-29"
## [16] "2000-12-29" "2001-03-30" "2001-06-29" "2001-09-28" "2001-12-31"
## [21] "2002-03-28" "2002-06-28" "2002-09-30" "2002-12-31" "2003-03-31"
## [26] "2003-06-30" "2003-09-30" "2003-12-31" "2004-03-31" "2004-06-30"
## [31] "2004-09-30" "2004-12-31" "2005-03-31" "2005-06-30" "2005-09-30"
## [36] "2005-12-30" "2006-03-31" "2006-06-30" "2006-09-29" "2006-12-29"
## [41] "2007-03-30" "2007-06-29" "2007-09-28" "2007-12-31" "2008-03-31"
## [46] "2008-06-30" "2008-09-30" "2008-12-31" "2009-03-31" "2009-06-30"
## [51] "2009-09-30" "2009-12-31" "2010-03-31" "2010-06-30" "2010-09-30"
## [56] "2010-12-31" "2011-03-31" "2011-06-30" "2011-09-30" "2011-12-30"
## [61] "2012-03-30" "2012-06-29" "2012-09-28" "2012-12-31" "2013-03-28"
## [66] "2013-06-28" "2013-09-30" "2013-12-31" "2014-03-31" "2014-06-30"
## [71] "2014-09-30" "2014-12-31" "2015-03-31" "2015-06-30" "2015-09-30"
## [76] "2015-12-31" "2016-03-31" "2016-06-30" "2016-09-30" "2016-12-30"