Incremental Dynamic Time Warping

Leodolter Maximilian


Dynamic Time Warping is a well known and often applied technic to measure the distance between two time series. (Google) finds about 750,000 matches for the term Dynamic Time Warping. Searching the scientific search engines for research articles for this subject results in:

To learn more about DTW interested readers may also read (Wikipedia DTW) or have look at this (Dummy Example). This (tutorial) also explains the main idea of DTW, described in the original paper by (Sakoe and Chiba: Dynamic programming algorithm optimization for spoken word recognition)



Introduction to Dynamic Time Warping: dtw()

The distance measure DTW calcuates the minimal costs of the shortest non-linear alignment of two time series \(Q\) (the query time series) and \(C\) (the candidate time series), conditional to

Remark: The DTW distance measure is not a distance metric in the conventional sense, since it does not fulfill the triangle inequality.

In the following we show by aid of toy example how to use the functions of IncDTW and what they are doing. We start with the basic functions. Suppose we have two time series and we need to calculate the DTW distance of these:

Q <- sin(1:10)#+rnorm(20)
C <- sin(-2:10)#+rnorm(15)
tmp <- IncDTW::dtw(Q = Q, C = C, return_diffM = TRUE, return_wp = TRUE,
                   return_QC = TRUE, return_cm = TRUE, return_diffp = TRUE)
##  [1] "gcm"                 "dm"                  "diffp"               "ii"                  "jj"                  "wp"                  "distance"            "normalized_distance" "cm"                  "diffM"               "Q"                   "C"

Another toy example:

Q <- c(1, 1, 2, 3, 2, 0)
C <- c(0, 1, 1, 2, 3, 2, 1)

tmp <- IncDTW::dtw(Q = Q, C = C, return_diffM = TRUE, return_wp = TRUE,
                   return_QC = TRUE, return_cm = TRUE, return_diffp = TRUE)
##  [1] "gcm"                 "dm"                  "diffp"               "ii"                  "jj"                  "wp"                  "distance"            "normalized_distance" "cm"                  "diffM"               "Q"                   "C"

The matrix of differences diffM simply stores the differences of the two time series \(Q\) and \(C\).

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,]    1    0    0   -1   -2   -1    0
## [2,]    1    0    0   -1   -2   -1    0
## [3,]    2    1    1    0   -1    0    1
## [4,]    3    2    2    1    0    1    2
## [5,]    2    1    1    0   -1    0    1
## [6,]    0   -1   -1   -2   -3   -2   -1

The global cost matrix is calculated by walking through the matrix of absolute differences abs(diffM) from top left to bottom right applying the following rule:

cm <- abs(diffM) # cost matrix
gcm <- cm # initialize the global cost matrix
for(j in 1:m){
   for(i in 1:n){
      gcm[ , 1] <- cm[ ,1]
      gcm[1,  ] <- cm[1, ]
      gcm[i, j] <- cm[i, j] + min(c(gcm[i-1, j-1], 
                                    gcm[i  , j-1],
                                    gcm[i-1, j  ]))}}

The direction matrix dm is calculated simultaneously to the global cost matrix and stores for each position where to go next to make the cheapest step.

dm <- matrix(NA, ncol=length(C), nrow = length(Q))
dm[1,  ] <- 3
dm[ , 1] <- 2
dm[1, 1] <- NA

for(j in 1:m){
   for(i in 1:n){
      min_index <- which.min(c(gcm[i-1, j-1], 
                               gcm[i  , j-1], 
                               gcm[i-1, j  ]))
      if( min_index == 1){
         dm[i, j] <- 1
      } else if( min_index == 2){
         dm[i, j] <- 2
      } else if( min_index == 3){
         dm[i, j] <- 3
      } } }

In case the cost matrix, or the matrix of differences is already in storage, and you want to save calculation time, you can use these matrices also as input:

IncDTW::dtw(Q = tmp$diffM, C = "diffM")
IncDTW::dtw(Q = abs(tmp$diffM), C = "cm")

Incremental Calculation of DTW: idtw()

The main intention of idtw() is to compare the DTW distance for different subsets of two time series. That is to compare the DTW(\(Q\), \(C_0\)) with DTW(\(Q\), \(C_1\)) where \(C_1\) could be the extension of \(C_0\) for couple of observations.

Another possible application is a live data stream and you want to update the DTW calculation each time a set of values of \(Q\) are observed. Of course this depends on the application and certainly does not make sense in any case, since a time warp of two time series where the lengths differ too much is questionable. A reasonable range for the ratio n/m depends on the application and purposes of the data analysis.

Assume we already have calculated the DTW of \(C\) and \(Q\), when an update of \(C\) provides new observations:

C0 <- cumsum(rnorm(1000))
Q <- cumsum(rnorm(800))
tmp0 <- IncDTW::dtw(Q = Q, C = C0)
gcm0 <- tmp0$gcm
dm0 <- tmp0$dm

To update the DTW distance for new observations of \(C\) we recycle as much from previous results as possible to keep calculation time low:

C_new <- cumsum(rnorm(10))
C_update <- c(C0, C_new) 

# result from incremental calculation
res_inc <- IncDTW::idtw(Q = Q, C = C_update, newObs = C_new, 
                    gcm = tmp0$gcm, dm = tmp0$dm)

Finally we compare the results from the incremental calculation with the one from scratch:

# result from scratch
res_scratch <- IncDTW::dtw(Q = Q, C = C_update) 
sapply(names(res_inc), function(x){identical(res_inc[[x]], res_scratch[[x]])})
## distance      gcm       dm 
##     TRUE     TRUE     TRUE

Decremental Dynamic Time Warping: dec_dm()

Now that we have understood the incremental approach and tested the calculation time and correctness of results, we want to introduce another concept, which is just the reverse: Instead of adding new observations to the time series x, in this section we introduce an easy and fast way of reducing most current observations, conditional to already existing calculation results including all observations of \(C\).

Of course this is only interesting if you are interested not only in the DTW distance, but also want to know about the warping path. Meanwhile the DTW distance could be directly taken from the global cost matrix from previous calculations (by selecting the value of the last row and last but k column, k is the number of reduced observations of \(C\)), we cannot update or modify the previous warping path but need to calculate it again from scratch. This is required, since adding or neglecting observations causes an updated alignment of the two updated time series, and in general the new alignment could be completely different from the original.

Q <- cos(1:100)
C <- cumsum(rnorm(80))
Ndec <- 4
# the ordinary calculation
result_base <- IncDTW::dtw(Q=Q, C=C) 
gcm0 <- result_base$gcm

To update the DTW distance by decreasing x for some observations (say 4), again we recycle as much from previous results as possible to keep calculation time low:

# the ordinary calculation without the last 4 observations
result_decr1 <- IncDTW::dtw(Q=Q, C=C[1:(length(C) - Ndec)], 
                            return_wp = TRUE)
gcm1 <- result_decr1$gcm

# the decremental step: reduce C for 4 observation
result_decr2 <- IncDTW::dec_dm(result_base$dm, Ndec = Ndec) 

# compare the results: ii, jj, wp and the gcm of 
# result_decr1 (conventional) and result_decr2 (recycling previous results)
comparison_0 <- c(
identical(result_decr1$ii, result_decr2$ii),
identical(result_decr1$jj, result_decr2$jj),
identical(result_decr1$wp, result_decr2$wp))

Vector-based Implementation

The matrix based implementation is perfect for a detailed analysis of the alignment of two time series. However for calculating the distances of many time series it has one major drawback, the computation time. If the data analysis task is to cluster or classify a database of time series a distance matrix of pairwise distances is required. Obtaining such a distance matrix is computationally expensive, and so the classic DTW implementation maybe is considered as too slow. The vector-based implementation offers a solution which is much faster than the matrix based implementation since no matrices are allocated. The computation principle is the same as for the matrix-based method, however instead of allocating the two cost matrix, global cost matrix and direction matrix only two vectors are allocated. To achieve the DTW distance between the time series Q and C the calculation of the j-th column of the global cost matrix solely depends on the values of the former column and the respective distances of the time series C and the j-th entry of Q. Since there is no dependence on the second last column the algorithm overwrites this column. This principle is implemented in the functions dtw2vec() and the incremental equivalent idtw2vec().

Q <- rnorm(100)
C <- rnorm(120)
IncDTW::dtw2vec(Q, C, ws = 30)$distance
## [1] 90.13429
IncDTW::dtw(Q, C, ws = 30)$distance
## [1] 90.13429

We can also perform the incremental calculation with the vector based implementation with idtw2vec(), which can distinguish between an initial calculation and the incremental. For this example we simulate time series as univariate random walks.

Q <- cumsum(rnorm(800))
C_initial <- Q[1:750] + rnorm(750)
resV_init <- IncDTW::idtw2vec(Q = Q, newObs = C_initial, gcm_lc = NULL)

C_newObs <- Q[751:800] + rnorm(50)
resV_inc <- IncDTW::idtw2vec(Q = Q, newObs = C_newObs, 
                             gcm_lc = resV_init$gcm_lc_new)

Also we compare the DTW distance of the incremental calculation idtw2vec() with the one from scratch dtw2vec() and see that the distance measures are equal.

C_update <- c(C_initial, C_newObs)
resV_scr <- IncDTW::dtw2vec(Q = Q, C = C_update)
resV_scr$distance - resV_inc$distance
## [1] 0

New Observations for Both Time Series

If new observations for both time series are available the update of the DTW calculation works in a consecutive fashion similar to the simple case where only one time series is updated. First we update the calculations for the first time series as usual:

Next we switch places of Q and C as input for idtw2vec() and proceed analogously. Also we need to switch the last column with the last row of the global cost matrix.

Finally we compare the results with the results from scratch and see that the calculated distance measures are equal:

## [1] 0
## [1] 0

Open Alignments

The original DTW algorithm has fixed alignments at the beginning and the end of the time series, so that when calculating dtw(Q, C) the first element of Q must be aligned to the first element of C, and the last element of Q to the last element of C. % Here we relax this restriction and allow one time series to be aligned only partially, so dtw(Q, C[1:k]), where the lengths of Q and C are \(m\) and \(n\) respectively, and \(n > k\). The counterpart is dtw(Q[1:l], C) for any \(l<m\). Once the global cost matrix for the full alignment is determined dtw(Q, C) the DTW distances for possible partial open-end alignments are given by the last column and last row of the global cost matrix. The function dtw_partial() returns the indices of the open-end alignment with the minimum normalized DTW distance.

In the following example we first get the full alignment and use this as input to find the open-end alignment for a parital match of C.

## $rangeQ
## [1]  1 50
## $rangeC
## [1]  1 61
## $normalized_distance
## [1] 0.08396951

Since DTW is reversible (for symmetric step patterns), so dtw(Q[1:m], C[1:n]) = dtw(Q[m:1], C[n:1]), possible open-start alignments can be compared similarly. So for the following time series, where obviously the initial 50 observations of C have no reasonable connection with Q we try to find the best open-start alignment. Considering that we first get the full alignment of the reverse time series and use this as input to find the open-start alignment for a parital match of C.

## $rangeQ
## [1]   1 100
## $rangeC
## [1]  51 150
## $normalized_distance
## [1] 0

Combining the incremental calculation and the partial alignment is also possible since the incremental algorithm returns either the updated global cost matrix (for the matrix based algorithm idtw() or the last column and last row (vetor based idet2vec()) for possible future iterations. As a consequence we can update the partial alignment after each incremental step, where Q is a query pattern and C is recorded at the time of analysis, just like a data stream:

## $rangeQ
## [1]  1 24
## $rangeC
## [1]  1 20
## $normalized_distance
## [1] 0.08470878

With new observations of C we can easily update the partial matching:

## $rangeQ
## [1]  1 44
## $rangeC
## [1]  1 40
## $normalized_distance
## [1] 0.0643759

Analysing a Database of Time Series

There are three more functions in IncDTW that are highly useful when comparing a set of time series: