Fast Constrained Sparse Dynamic Time Warping

Research Article
Open access


Youngha Hwang 1*
  • 1 LG Display    
  • *corresponding author h.youngha@gmail.com
Published on 26 December 2024 | https://doi.org/10.54254/2755-2721/2025.18742
ACE Vol.120
ISSN (Print): 2755-2721
ISSN (Online): 2755-273X
ISBN (Print): 978-1-83558-809-3
ISBN (Online): 978-1-83558-810-9

Abstract

Dynamic Time Warping (DTW) has been widely used to solve machine learning problems that involve comparing time series. Such time series can be sparse, taking zero values at many epochs. Since traditional DTW does not exploit the sparsity of time series data, several fast alternatives to DTW were developed: (1) Sparse Dynamic Time Warping (SDTW), an exact equivalent of DTW; (2) Constrained Sparse Dynamic Time Warping (CSDTW), an exact equivalent of DTW with a global constraint on the warping path; and (3) Fast Sparse Dynamic Time Warping (FSDTW), a fast approximation of SDTW. This paper develops and analyzes a fast algorithm that approximates CSDTW, called Fast Constrained Sparse Dynamic Time Warping (FCSDTW). FCSDTW significantly decreases the computational complexity compared to constrained DTW (CDTW) and also improves on the speed of CSDTW with negligible error. This study adds to a framework for exploiting sparsity to reduce complexity.

Keywords:

dynamic time warping, time series, sparsity ratio, global constraint, sparse dynamic time warping

Hwang, Y. (2024). Fast Constrained Sparse Dynamic Time Warping. Applied and Computational Engineering, 120, 50-58.

1. Introduction

After the first introduction of Dynamic Time Warping (DTW) in speech recognition[1], it has been applied to a wide range of machine learning problems[2-4]. Several review articles[5,6] highlighted the potential of DTW for time series comparison. However, computational complexity has been the main barrier to its use: the time complexity is in general \(O(NM)\), where \(N\) and \(M\) are the lengths of the two time series. Numerous algorithmic strategies for reducing the complexity of DTW have been published, but they generally do not solve the DTW problem exactly[7,8].

An important practical case of sparse time series is one that takes zero values at many epochs, such as periods of silence in speech, inactivity or lack of food intake[15], or periods when household appliances are not running[16]. These zeros are not missing data; rather, they characterize the source.

To mitigate the cost of DTW for such sources, Sparse Dynamic Time Warping (SDTW)[14], Binary Sparse Dynamic Time Warping (BSDTW)[18], Constrained Sparse Dynamic Time Warping (CSDTW)[7], and Constrained Binary Sparse Dynamic Time Warping (CBSDTW)[19] were developed and analyzed. These algorithms are equivalent to DTW, but their computational complexities are reduced by a factor on the order of the sparsity ratio \(s\) (or \(s^2\) for BSDTW and CBSDTW), where \(s\) is defined as the arithmetic mean of the fractions of non-zero values in the two time series. In addition, Fast Sparse Dynamic Time Warping (FSDTW)[8] and AWarp[17] were developed and analyzed for a further speedup on the order of \(s^2\), at the expense of accuracy.

The current study focuses on a fast approximation of CSDTW, Fast Constrained Sparse Dynamic Time Warping (FCSDTW). The time complexity of FCSDTW is upper bounded by about \(6s^2\) times the complexity of constrained DTW (CDTW). Although FCSDTW is about 1.5 times slower than constrained AWarp (CAWarp) in the worst case, for a wide range of examples FCSDTW provides similar time complexity and a much more accurate approximation to CDTW. The relative error of FCSDTW (relative to CDTW) is typically \(\sim10^{-4}\), several orders of magnitude smaller than that of CAWarp. By providing a rigorous analysis, this study should serve as core background for further reduction of time complexity.

A review of DTW from the dynamic programming perspective is not included; the interested reader should refer to the previous publications[7,8]. Section 2 presents the principles of FCSDTW and its reduction of complexity compared to CDTW. Section 3 describes the FCSDTW algorithm. Finally, the last section demonstrates the efficiency of FCSDTW with some numerical analysis.
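For concreteness, the sparsity ratio \(s\) used throughout can be computed directly from its definition, the arithmetic mean of the fractions of non-zero values in the two series. The following minimal Python sketch is illustrative only; the function name is ours, not part of the cited algorithms:

```python
def sparsity_ratio(x, y):
    """Sparsity ratio s: arithmetic mean of the fractions of
    non-zero values in the two time series (illustrative helper)."""
    frac_x = sum(1 for v in x if v != 0) / len(x)
    frac_y = sum(1 for v in y if v != 0) / len(y)
    return (frac_x + frac_y) / 2
```

For example, two series with 25% and 50% non-zero values give \(s = 0.375\).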

2. Principles Underlying FCSDTW

This section starts with some notation and a review of CSDTW[7]. The FCSDTW algorithm is then evident from the properties of FSDTW[8], and its complexity can be derived. However, an explicit description of the algorithm requires some effort, so this is done in Section 3.

Consider two time series \(X=(x_{1},x_{2},\cdots,x_{M})\) and \(Y=(y_{1},y_{2},\cdots,y_{N})\), and let their subsequences of non-zero values be \(X_{s}=(x_{m_{1}},x_{m_{2}},\cdots,x_{m_{M_{s}}})\) and \(Y_{s}=(y_{n_{1}},y_{n_{2}},\cdots,y_{n_{N_{s}}})\), respectively. We develop a fast algorithm, FCSDTW, that works on \(X_{s}\) and \(Y_{s}\) and approximates CDTW in terms of the distance between \(X\) and \(Y\). The same notation as in [7,8] is used for the geometric objects in the \(m\)-\(n\) plane determined by the time indices of the non-zero data. A rectangle \(R_{i,j}:=\{(m,n)|m_{i-1}\leq m\leq m_{i},n_{j-1}\leq n\leq n_{j}\}\) is built from the consecutive non-zero indices \(m_{i-1}, m_i, n_{j-1}, n_j\). Its top, bottom, left, and right edges (excluding vertices) are denoted \(E_{t},E_{b},E_{l}\), and \(E_{r}\), e.g., \(E_{t}=((m_{i-1}+1,n_{j}),\cdots,(m_{i}-1,n_{j}))\), and its interior is \(R_{i,j}^{\circ}:=\{(m,n)|m_{i-1}+1\leq m\leq m_{i}-1,n_{j-1}+1\leq n\leq n_{j}-1\}\). The region of permissible warping paths, the feasible region, is \(F= \{ (m,n) | m-L \leq n \leq m+L \}\), where \(1 \leq m \leq M\) and \(1 \leq n \leq N\). We have the following accumulated distance definition from [7]: \(g(m,n)=\min[g(m,n-1)+d(m,n)w_{v}, g(m-1,n-1)+d(m,n)w_{d}, g(m-1,n)+d(m,n)w_{h}]\), where \(d(m, n)\) is the local distance and \(w_{v}\), \(w_{d}\), and \(w_{h}\) are the weights of the vertical, diagonal, and horizontal steps.

Figure 1. Plot of \(R_{i,j}\), and its vertices and edges. Dotted rectangle represents \(R_{i,j}^{\circ}\).

As a brief review, SDTW computes the accumulated distances along the top and right edges of the rectangle \(R_{i,j}\). For SDTW, a single computation at the lower left end of the interior, \((m_{i-1}+1, n_{j-1}+1)\), is enough, because the accumulated distances in the interior of \(R_{i,j}\) are all the same. For CSDTW, on the other hand, the feasible set \(F\) enters the calculation. (Note that CDTW computes the accumulated distances on all of \(F\).) FCSDTW computes the distances only at the endpoints of \(E_t \cap F\) and \(E_r \cap F\), because computing the accumulated distance at those endpoints is sufficient. To state this property of FCSDTW, we need the intersections between the edges and the feasible set.
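For reference, the constrained accumulated-distance recursion above can be sketched as a direct \(O(NM)\) dynamic program over the band \(|m-n| \leq L\); FCSDTW's point is precisely to avoid this full sweep. This is a minimal sketch, assuming a \(p\)-norm local distance and step weights \(w_v = w_h = 1\), \(w_d = 2\); the function name and defaults are ours, not the paper's:

```python
import math

def cdtw(x, y, L, p=1, wv=1.0, wd=2.0, wh=1.0):
    """Constrained DTW: fill g(m, n) = min(g(m, n-1) + d*wv,
    g(m-1, n-1) + d*wd, g(m-1, n) + d*wh) over the band |m - n| <= L.
    Returns the unnormalized accumulated distance g(M, N)."""
    M, N = len(x), len(y)
    g = [[math.inf] * (N + 1) for _ in range(M + 1)]
    g[0][0] = 0.0  # so that g(1, 1) = d(1, 1) * wd, the initial condition
    for m in range(1, M + 1):
        for n in range(max(1, m - L), min(N, m + L) + 1):
            d = abs(x[m - 1] - y[n - 1]) ** p  # local distance d(m, n)
            g[m][n] = min(g[m][n - 1] + d * wv,
                          g[m - 1][n - 1] + d * wd,
                          g[m - 1][n] + d * wh)
    return g[M][N]
```

Dividing the result by \(\sum_{k} w(k)\) along the optimal path would give the time-normalized distance.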

Remark: The intersections of the edges \(E_t\) and \(E_r\) with the feasible region \(F\), and their boundaries (endpoints), are as follows:
\(E_t \cap F = \{ (m, n_j) | \max (m_{i-1}+1,n_j-L) \leq m \leq \min ( m_i-1, n_j+L ) \}\),
\(E_r \cap F = \{ (m_i, n) | \max (n_{j-1}+1,m_i-L) \leq n \leq \min ( n_j-1, m_i+L) \}\),
\(\partial \{ E_t \cap F \} = \{ ( \max (m_{i-1}+1,n_j-L), n_j), ( \min ( m_i-1, n_j+L ), n_j) \}\),
\(\partial \{ E_r \cap F \} = \{ (m_i, \max (n_{j-1}+1,m_i-L)), (m_i, \min ( n_j-1, m_i+L)) \}\).
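The boundary formulas in the remark translate directly into code. The sketch below (function name ours) returns the endpoint pairs of \(E_t \cap F\) and \(E_r \cap F\) for one rectangle, or an empty list when an intersection is empty:

```python
def edge_endpoints(m_prev, m_i, n_prev, n_j, L):
    """Endpoints of E_t ∩ F and E_r ∩ F for the rectangle built on
    consecutive non-zero indices (m_prev, m_i) x (n_prev, n_j),
    under the band constraint |m - n| <= L."""
    lo_t = max(m_prev + 1, n_j - L)
    hi_t = min(m_i - 1, n_j + L)
    top = [(lo_t, n_j), (hi_t, n_j)] if lo_t <= hi_t else []
    lo_r = max(n_prev + 1, m_i - L)
    hi_r = min(n_j - 1, m_i + L)
    right = [(m_i, lo_r), (m_i, hi_r)] if lo_r <= hi_r else []
    return top, right
```

FCSDTW updates accumulated distances only at these (at most four) points per rectangle, rather than along the entire edges.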

Proof: The proof is given in [7]; the boundaries follow by taking the minimum and maximum above. ∎

Figure 2. \(R_{i,j} \cap F\) when \(m_{i}+L\lt n_{j}\) and \(m_i-L \leq n_{j-1}\).
Figure 3. \(R_{i,j} \cap F\) when \(m_{i}+L\lt n_{j}\) and \(m_i-L \gt n_{j-1}\).
Figure 4. \(R_{i,j} \cap F\) when \(m_{i}+L \geq n_{j}\) and \(m_i-L\gt n_j\).
Figure 5. \(R_{i,j} \cap F\) when \(m_{i}+L \geq n_{j}\) and \(n_{j-1} \lt m_i-L \leq n_j\).
Figure 6. \(R_{i,j} \cap F\) when \(m_{i}+L \geq n_{j}\) and \(m_i-L \leq n_{j-1}\).

Lemma: The accumulated distances are the same in \(R_{i,j}^{\circ} \cap F\) if they are nondecreasing along \(E_b \cap F\) and \(E_l \cap F\).

Proof: There are three cases for the accumulated distances in \(R_{i,j}^{\circ} \cap F\), depending on whether or not the vertex \(V_{l,b} = (m_{i-1},n_{j-1})\) is included in the feasible set \(F\). The first two cases arise when \((m_{i-1},n_{j-1})\) is not in \(F\). Figures 2-6, which show \(R_{i,j} \cap F\) in the various cases, help in understanding this proof. Firstly, suppose that \(m_{i-1}+L \lt n_{j-1}\). Note that the accumulated distance at the left end of the first row above \(E_b\) is obtained as \(g(n_{j-1}-L+1,n_{j-1}+1) = \min ( g(n_{j-1}-L, n_{j-1}), g(n_{j-1}-L+1, n_{j-1})) = g(n_{j-1}-L, n_{j-1})\) from the nondecreasing assumption along \(E_b \cap F\). The proof differs slightly depending on whether the vertex \(V_{r,b} = (m_i, n_{j-1})\) is in \(R_{i,j} \cap F\) or not. If \((m_i, n_{j-1})\) satisfies the constraint, i.e., \(m_i -L \leq n_{j-1}\) as in Figs. 2 and 6, then it is enough to show that the accumulated distances in the first row above \(E_b\) are the same from \((n_{j-1}-L+1,n_{j-1}+1)\) to \((m_i-1, n_{j-1}+1)\), which can be proved in the same way as Lemma 1 of [14]. The remaining rows follow similarly. Otherwise (\(V_{r,b} \not\in R_{i,j}\cap F\)), the proof is the same except at the right end, \((n_{j-1}+1+L, n_{j-1}+1)\). Secondly, suppose that \(m_{i-1}-L \gt n_{j-1}\). The proof is similar to the case of \(m_{i-1}+L \lt n_{j-1}\): the accumulated distances are computed along the columns to the right of the edge \(E_l \cap F\), and \(R_{i,j}^{\circ} \cap F\) has the same accumulated distances as \(g(m_i, m_i-L)\). Finally, suppose that \(m_{i-1}-L \leq n_{j-1} \leq m_{i-1}+L\); then \((m_{i-1}+1,n_{j-1}+1)\) is in \(R_{i,j}^{\circ} \cap F\). The accumulated distances in the first row above \(E_b \cap F\) are the same, as in the case of \(m_{i-1}+L \lt n_{j-1}\); similarly, the accumulated distances in the column to the right of \(E_l \cap F\) are the same. Zero local distances in \(R_{i,j}^{\circ} \cap F\), together with equal accumulated distances along the first row above \(E_b\) and the column to the right of \(E_l\), lead to accumulated distances all equal to \(g(m_{i-1}+1,n_{j-1}+1)\).

Lemma: The accumulated distances are nondecreasing along  \(E_r \cap F\)  or  \(E_t \cap F\)  if they are the same in  \(R_{i,j}^{\circ} \cap F\) .

Proof: If the left end of \(E_t\), \((m_{i-1}+1,n_j)\), or the lower end of \(E_r\), \((m_i, n_{j-1}+1)\), is in the feasible region, Lemma 2 of [14] proves the claim. Otherwise, \(E_r\) has as its lower end \((m_i, m_i-L)\). Then \(g(m_i, m_i-L) = d(x_i,0)\min(w_h,w_d) + g(m_i-1,m_i-L)\) from the assumption. It is enough to show that \(g(m_i, m_i-L+1) = g(m_i, m_i-L)\); the rest follows similarly. From the definition and the assumption, \(g(m_i, m_i-L+1) = \min \{d(x_i,0)\min(w_h,w_d) + g(m_i-1,m_i-L+1), d(x_i,0)w_v + g(m_i, m_i-L) \}\). Plugging the expression for \(g(m_i, m_i-L)\) into the above concludes the proof. The same can be proved for \(E_t \cap F\), whose left end is \((n_j-L,n_j)\).

The above two lemmas show that nondecreasing accumulated distances along the left and bottom edges of  \(R_{i,j} \cap F\)  imply equal accumulated distances in  \(R_{i,j}^{\circ} \cap F\) , which in turn imply nondecreasing accumulated distances along the right and top edges of  \(R_{i,j} \cap F\) . This is the basis for an induction proof of Theorem [???] below. First, we need one additional lemma.

Lemma: Suppose  \(m_{i-1}=m_{i}-1\)  or  \(n_{j-1}=n_{j}-1\) . Then the accumulated distances are nondecreasing along  \(E_{r} \cap F \)  or  \(E_{t} \cap F\)  of  \(R_{i,j}\)  if the same holds for  \(R_{i-1,j}\)  or  \(R_{i,j-1}\) , respectively.

Proof: When either  \(m_{i-1}=m_{i}-1\)  or  \(n_{j-1}=n_{j}-1\) ,  \(R_{i,j}^{\circ}\)  is empty. If  \(n_{j-1}=n_{j}-1\) , there is no need to compute  \(E_{r}\) . The distances along  \(E_{t} \cap F\)  are nondecreasing because they depend only on the nondecreasing distances along  \(E_{b} \cap F\)  of  \(R_{i,j}\) , which is identical to  \(E_{t} \cap F\)  of  \(R_{i,j-1}\) , and the local distances are constant as  \(d(0,n_{j})\) . The case of  \(m_{i-1}=m_{i}-1\)  can be proved similarly.

The above interdependent lemmas actually hold for all  \(1\leq i\leq M_{s}\)  and  \(1\leq j\leq N_{s}\) , which can be stated as follows.

Theorem: The accumulated distances are nondecreasing along the intersection of the feasible region and the edges connecting the vertices in  \(R_{i,j}\) , for all  \(1\leq i\leq M_{s}\)  and  \(1\leq j\leq N_{s}\) .

Proof: The accumulated distances are called distances here. For proof by induction, we show that the distances are nondecreasing along  \(E_{t} \cap F\)  and  \(E_{r} \cap F\)  of  \(R_{i,j}\)  whenever they are nondecreasing along  \(E_{l} \cap F\)  and  \(E_{b} \cap F\) , which coincide with  \(E_{r} \cap F\)  of the  \(R_{i-1,j}\)  iteration and  \(E_{t} \cap F\)  of the  \(R_{i,j-1}\)  iteration, respectively. For  \(i=1\)  and  \(j=1\) , if  \(m_{1}\gt 1\)  and  \(n_{1}\gt 1\) , then all the distances in  \(R_{i,j}^{\circ} \cap F \)  are zero because the local distances  \(d(\cdot)\)  are zero there; thus, the distances along the edges  \(E_{r} \cap F\)  and  \(E_{t} \cap F\)  are nondecreasing by Lemma [???]. If instead  \(m_{1}=1\)  and  \(n_{1}\gt 1\) , or  \(m_{1}\gt 1\)  and  \(n_{1}=1\) , the distances are nondecreasing because of positive local distances. For the cases  \(i=1, j\gt 1\) ;  \(i\gt 1, j=1\) ; and  \(i\gt 1,j\gt 1\) , the distances are nondecreasing on  \(E_{t} \cap F\)  and  \(E_{r} \cap F\)  by combining Lemma [???] and Lemma [???], under the assumption that the distances are nondecreasing along  \(E_{b} \cap F\)  ( \(E_{t} \cap F\)  of  \(R_{i,j-1}\) ) and  \(E_{l} \cap F\)  ( \(E_{r} \cap F\)  of  \(R_{i-1,j}\) );  \(E_b \cap F\)  or  \(E_l \cap F\)  does not exist when  \(j\)  or  \(i\)  is one. Note that the same holds for consecutive data by Lemma [???]. By induction, the distances are nondecreasing along the intersection of the feasible set and the edges connecting the vertices for all  \(R_{i,j}\) . Combining all the above cases concludes the proof for all  \(i \geq 1\)  and  \(j \geq 1\) .

From the above theorem, the equal distance property in  \(R_{i,j}^{\circ} \cap F\)  follows directly.

Corollary: The accumulated distances are equal in  \(R_{i,j}^{\circ} \cap F\) , for all  \(1\leq i\leq M_{s}+1\)  and  \(1\leq j\leq N_{s}+1\) .

For FCSDTW, we only have to compute  \(g(\partial \{ E_t \cap F \})\)  and  \(g(\partial \{ E_r \cap F \})\)  instead of  \(g(E_t \cap F)\)  and  \(g(E_r \cap F)\) , respectively, because no accumulated distances are needed at the points between the end points. Note that we assume the local distances are infinite outside the feasible set  \(F\) , i.e.,  \(d(F^{c})=\infty\) . Let us first look at the case when  \(R_{i,j}^{\circ} \cap F = \emptyset\) .
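Before treating the two cases, the band-membership test and the  \(d(F^{c})=\infty\)  convention can be sketched in code. This is a minimal illustration, assuming a Sakoe-Chiba band  \(F=\{(m,n):|m-n|\leq L\}\)  and squared-difference local distances; the function names are ours, not from the paper.

```python
import math

def in_band(m, n, L):
    """Sakoe-Chiba feasibility test: (m, n) is in F iff |m - n| <= L."""
    return abs(m - n) <= L

def local_distance(x_m, y_n, m, n, L):
    """Squared local distance, set to infinity outside the feasible set F,
    mirroring the convention d(F^c) = infinity used in the text."""
    if not in_band(m, n, L):
        return math.inf
    return (x_m - y_n) ** 2
```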

Proposition: Let us denote  \(\partial \{ E_t \cap F \} = \{ (t_1, n_j), (t_2, n_j) \} \) , where  \(t_1 = \max (m_{i-1}+1,n_j-L)\)  and  \(t_2 = \min ( m_i-1, n_j+L )\) . Similarly,  \(\partial \{ E_r \cap F \} = \{ (m_i, r_1), (m_i, r_2) \} \) , where  \(r_1 = \max (n_{j-1}+1,m_i-L)\)  and  \(r_2 = \min ( n_j-1, m_i+L)\) . Then,  \( g(\partial \{ E_t \cap F \}) = \{ g(t_1, n_j), g(t_2, n_j) \} \)  is computed as follows when  \(n_j = n_{j-1}+1\)  holds.  \(g(t_1, n_j)\)  can be computed from DTW.

\begin{equation} g(t_1, n_j)=\min \{ g(t_1, n_{j-1}) +d(0,n_j)w_v , g(t_1-1, n_j) +d(0,n_j)w_h , g(t_1-1, n_{j-1})+d(0,n_j) w_d \} \end{equation}
if  \(t_1 \gt m_{i-1}+1\) ,
\begin{eqnarray} g(t_2, n_j) &=& \min \begin{cases} g(t_1, n_j)+d(0, n_j)(t_2 - t_1)w_h \\
g(t_2, n_{j-1})+d(0, n_j)w_v \\
g(t_2-1, n_{j-1})+d(0, n_j)w_d \\
g(t_1-1, n_{j-1})+d(0, n_{j-1})(t_2 - t_1)w_h+d(0, n_j)w_d
\end{cases} \end{eqnarray}
otherwise, i.e.,  \(t_1 = m_{i-1}+1\) ,
\begin{eqnarray} g(t_2, n_j) &=& \min \begin{cases} g(t_1, n_j)+d(0, n_j)(t_2 - t_1)w_h \\
g(t_2, n_{j-1})+d(0, n_j)w_v \\
g(t_2-1, n_{j-1})+d(0, n_j)w_d \\
g( t_1, n_{j-1})+d(0, n_{j-1})(t_2 - t_1-1)w_h+d(0, n_j)w_d \\
g(t_1, n_{j-1})+d(0, n_j)(t_2 - t_1-1)w_h+d(0, n_j)w_d
\end{cases} \end{eqnarray}
\( g(\partial \{ E_r \cap F \}) = \{ g(m_i, r_1), g(m_i, r_2) \} \)  is computed as follows when  \(m_i = m_{i-1}+1\)  holds.  \(g(m_i, r_1)\)  can be computed from DTW.
\begin{equation} g(m_i, r_1) = \min \{ g(m_i, r_1-1)+d(m_i,0)w_v , g(m_{i-1}, r_1)+d(m_i,0)w_h , g(m_{i-1}, r_1-1)+d(m_i,0) w_d \} \end{equation}
if  \(r_1 \gt n_{j-1}+1\) ,
\begin{eqnarray} g(m_i, r_2) &=& \min \begin{cases} g(m_i, r_1)+d(m_i,0)(r_2 - r_1)w_v \\
g(m_{i-1}, r_2)+d(m_i,0)w_h \\
g(m_{i-1}, r_2-1)+d(m_i,0)w_d \\
g(m_{i-1}, r_1-1)+d(m_{i-1},0)(r_2 - r_1)w_v+d(m_i,0)w_d
\end{cases} \end{eqnarray}
otherwise, i.e.,  \(r_1 = n_{j-1}+1\) ,
\begin{eqnarray} g(m_i, r_2) &=& \min \begin{cases} g(m_i, r_1)+d(m_i,0)(r_2 - r_1)w_v \\
g(m_{i-1}, r_2)+d(m_i,0)w_h \\
g(m_{i-1}, r_2-1)+d(m_i,0)w_d \\
g(m_{i-1}, r_1)+d(m_{i-1},0)(r_2 - r_1-1)w_v+d(m_i,0)w_d \\
g(m_{i-1}, r_1)+d(m_{i},0)(r_2 - r_1-1)w_v+d(m_i,0)w_d
\end{cases} \end{eqnarray}

Proof: Let us first look into  \(g(m_i, r_2)\)  when  \(m_i = m_{i-1}+1\)  holds. We know that  \(g(m_i, r_2)\)  always depends on  \(g(m_i, r_1)\) . Its computation depends on whether  \(r_1= n_{j-1}+1\)  holds and whether  \(r_2 = n_j -1\)  holds. If  \(r_1 = n_{j-1}+1\)  holds, then  \(g(m_i,r_2)\)  depends on  \(g(m_{i-1}, r_1)\)  as in Eq. 8 of the unconstrained FSDTW[8]. Otherwise, it depends on  \(g(m_{i-1}, r_1-1)\)  and does not include the term  \(g(m_{i-1}, r_1-1) + d(m_i, 0)w_d + \cdots\)  of Eq. [???], because  \(g(m_i, r_1) \leq g(m_{i-1}, r_1-1) + d(m_i, 0)w_d\)  by its definition. Similarly,  \(g(m_i,r_2)\)  depends on  \(g(m_{i-1}, r_2)\)  if  \(r_2=n_{j}-1\)  holds; otherwise, it depends on  \(g(m_{i-1}, r_2-1)\) . In the former case we have  \(g(m_{i-1}, r_2-1)=\infty\) , and vice versa in the latter. Thus we can include both terms simultaneously and compute  \(g(m_i, r_2)\)  without checking whether  \(r_2 = n_j-1\)  holds.  \(g(t_2, n_j)\)  can be derived similarly when  \(n_j = n_{j-1}+1\)  holds. \qedsymbol
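The endpoint indices  \(t_1, t_2, r_1, r_2\)  defined in the Proposition are simple max/min clips of the rectangle edges against the band. A small sketch in the same notation (the helper names are ours):

```python
def top_edge_endpoints(m_prev, m_i, n_j, L):
    """End points (t1, n_j), (t2, n_j) of E_t intersected with F:
    t1 = max(m_{i-1}+1, n_j-L), t2 = min(m_i-1, n_j+L)."""
    return max(m_prev + 1, n_j - L), min(m_i - 1, n_j + L)

def right_edge_endpoints(n_prev, n_j, m_i, L):
    """End points (m_i, r1), (m_i, r2) of E_r intersected with F:
    r1 = max(n_{j-1}+1, m_i-L), r2 = min(n_j-1, m_i+L)."""
    return max(n_prev + 1, m_i - L), min(n_j - 1, m_i + L)
```

For instance, with  \(m_{i-1}=3\) ,  \(m_i=10\) ,  \(n_j=8\) , and  \(L=2\) , the band lower bound  \(n_j-L=6\)  dominates, giving  \(t_1=6\)  and  \(t_2=9\) .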

Remark: Geometry of the feasible set  \(F\)  leads to the following result on  \(g(R_{i,j}^{\circ} \cap F)\) .

\begin{eqnarray} g(R_{i,j}^{\circ} \cap F)&=&\begin{cases} g(n_{j-1}-L, n_{j-1}) & \textrm{if } m_{i-1}+L \lt n_{j-1},\\
g(m_{i-1},m_{i-1}-L) & \textrm{if } m_{i-1}-L \gt n_{j-1},\\
g(m_{i-1}+1, n_{j-1}+1) & \textrm{else}.
\end{cases} \end{eqnarray}
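The three-way case split of the Remark can be sketched as a selector that returns the grid point supplying  \(g(R_{i,j}^{\circ} \cap F)\) ; this is a hypothetical helper for illustration only (names are ours):

```python
def interior_source(m_prev, n_prev, L):
    """Grid point whose accumulated distance equals g(R°_{i,j} ∩ F),
    following the three cases of the Remark."""
    if m_prev + L < n_prev:          # interior lies above the band
        return (n_prev - L, n_prev)
    if m_prev - L > n_prev:          # interior lies below the band
        return (m_prev, m_prev - L)
    return (m_prev + 1, n_prev + 1)  # lower-left interior point
```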

Given the above remark, let us look at the case where  \(R_{i,j}^{\circ} \cap F \ne \emptyset\) .

Proposition:  \(g(\partial \{ E_t \cap F \})\)  is equal to  \(g(R_{i,j}^{\circ} \cap F)+d(0,n_j)\min \{ w_v, w_d\}\)  when  \((m_{i-1}+1, n_j) \not\in F\) . Otherwise, i.e., when  \(t_1 = m_{i-1}+1\)  holds, it is computed as follows:
\begin{eqnarray} \lefteqn{g(t_1, n_j)} && \nonumber\\
&=& \min \{g(R_{i,j}^{\circ} \cap F)+d(0, n_j)w_v, g(m_{i-1}, n_{j})+d(0, n_j)w_h, g(m_{i-1}, n_{j}-1)+d(0, n_j)w_d \} \nonumber\\
g(t_2, n_j)&=& \min \{g(R_{i,j}^{\circ} \cap F)+d(0, n_j)\min\{w_d, w_v\}, g(t_1, n_j) +d(0, n_j) (t_2 - t_1)w_h \} \end{eqnarray}
\(g(\partial \{ E_r \cap F \})\)  is equal to  \(g(R_{i,j}^{\circ} \cap F)+d(m_i,0) \min \{ w_h, w_d\} \)  when  \((m_i, n_{j-1}+1) \not\in F\) . Otherwise, i.e., when  \(r_1 = n_{j-1}+1\)  holds, it is computed as follows:
\begin{eqnarray} \lefteqn{g(m_i, r_1)} && \nonumber\\
&=& \min \{g(R_{i,j}^{\circ} \cap F)+d(m_i, 0)w_h, g(m_{i}, n_{j-1})+d(m_i,0)w_v, g(m_{i}-1, n_{j-1}) + d(m_i,0)w_d \} \nonumber\\
g(m_i, r_2) &=& \min \{g(R_{i,j}^{\circ} \cap F)+d(m_i, 0)\min\{w_d, w_h\}, g(m_i, r_1) + d(m_i, 0)(r_2 - r_1)w_v \} \end{eqnarray}

Proof: Let us look at  \(g(\partial \{ E_t \cap F \})\)  first. When  \((m_{i-1}+1, n_j) \not\in F\) ,  \(t_1 = n_j-L \gt m_{i-1}+1\) , so  \(g(t_1, n_j) = g(t_2, n_j) = g(R_{i,j}^{\circ} \cap F)+ d(0, n_j) \min \{ w_v, w_d\}\)  holds. Otherwise,  \(t_1 = m_{i-1}+1\)  holds and we compute  \(g(t_1, n_j)\)  from DTW.  \(g(t_2, n_j)\)  can be computed from either  \(g(R_{i,j}^{\circ} \cap F)\)  or  \(g(t_1, n_j)\) , as in Eq. 5 of the unconstrained FSDTW[8], so Eq. [???] follows.  \(g(\partial \{ E_r \cap F \})\)  can be computed similarly. \qedsymbol
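The displayed pair of equations for  \(g(t_1,n_j)\)  and  \(g(t_2,n_j)\)  (nonempty interior,  \(t_1=m_{i-1}+1\) ) amounts to two chained minimizations. A minimal sketch, with the neighboring accumulated distances passed in explicitly (the argument names are ours):

```python
def g_top_endpoints(gR, g_left, g_left_down, d0nj, t1, t2, wv, wh, wd):
    """Endpoint values on E_t ∩ F when R° ∩ F is nonempty and t1 = m_{i-1}+1.
    gR = g(R° ∩ F), g_left = g(m_{i-1}, n_j), g_left_down = g(m_{i-1}, n_j - 1),
    d0nj = d(0, n_j); mirrors the displayed pair of equations."""
    g_t1 = min(gR + d0nj * wv, g_left + d0nj * wh, g_left_down + d0nj * wd)
    g_t2 = min(gR + d0nj * min(wd, wv), g_t1 + d0nj * (t2 - t1) * wh)
    return g_t1, g_t2
```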

4. FCSDTW Algorithm and Complexity

The results in Section [???] show that the DTW distance can be computed by scanning the intersections of the feasible set with the rectangles  \(R_{i,j}\)  row- or column-wise, computing the accumulated distances at only six points: the end points of the top edge, the end points of the right edge, the upper right vertex, and a single interior point (if the interior is not empty). Indeed, the Propositions and the subsequent Remarks show that these accumulated distances can be expressed in terms of the corresponding accumulated distances at previously scanned rectangles. The boundary condition of the algorithm was discussed in [7,8]. Formally, the FCSDTW algorithm is given as Algorithm 1. \caption{FCSDTW Algorithm} \begin{algorithmic}[1] \STATE Append a dummy one if necessary to get time series  \(X\)  and  \(Y\)  which end in a one. Build  \(X_{s}\)  and  \(Y_{s}\)  of lengths  \(M_{s}\)  and  \(N_{s}\) , respectively, from  \(X\)  and  \(Y\)  of lengths  \(M\)  and  \(N\) , respectively, by deleting the zeros.
\FOR{ \(j=1:N_{s}\) } \STATE{ \FOR{ \(i=1:M_{s}\) } \STATE{ \IF{ \(R_{ij}^{\circ}\neq\emptyset\) } \STATE{Compute  \(g(R_{i,j}^{\circ} \cap F)\)  from Eq. [???]. \IF { \((m_{i-1}+1, n_j) \not\in F\) } \STATE{Compute  \(g(\partial \{ E_t \cap F \})\)  as  \(g(R_{i,j}^{\circ} \cap F)+ d(0, n_j) \min \{ w_v, w_d\}\) . }\ELSE \STATE{Compute  \(g(\partial \{ E_t \cap F \})\)  from Eqs. [???], [???]. } \ENDIF \IF { \((m_i, n_{j-1}+1) \not\in F\) } \STATE{Compute  \(g(\partial \{ E_r \cap F \})\)  as  \(g(R_{i,j}^{\circ} \cap F)+ d(m_i, 0) \min \{ w_h, w_d\}\) . }\ELSE \STATE{Compute  \(g(\partial \{ E_r \cap F \})\)  from Eqs. [???], [???]. } \ENDIF Compute  \(g((m_{i},n_{j}) \cap F)\)  in the same way as DTW} \ELSE \STATE{ \IF{ \(m_{i}=m_{i-1}+1\) } \STATE{Compute  \(g(\partial \{ E_r \cap F \})\)  from Eqs. [???], [???], [???]. Compute  \(g((m_{i},n_{j}) \cap F)\)  in the same way as DTW} \ELSIF{ \(n_{j}=n_{j-1}+1\) } \STATE{Compute  \(g(\partial \{ E_t \cap F \})\)  from Eqs. [???], [???], [???]. Compute  \(g((m_{i},n_{j}) \cap F)\)  in the same way as DTW } \ENDIF } \ENDIF } \ENDFOR } \ENDFOR \STATE Compute the DTW distance as  \(D(X,Y)=\tfrac{1}{K^{*}}h(M^{*},N^{*})\) , where  \(M^{*}=M\)  if a one is appended to get  \(X\)  and  \(M^{*}=M-1\)  otherwise, and  \(N^{*}=N\)  if a one is appended to get  \(Y\)  and  \(N^{*}=N-1\)  otherwise.\end{algorithmic}

A bound on the time complexity  \(C_{FCSDTW}\)  of FCSDTW, measured by the number of accumulated-distance computations, is given below. It is proportional to that of FSDTW[8]; its proof is omitted.

Proposition:  \(C_{FCSDTW} \le \frac{2L}{N} \{ 6(N_{s}+1)(M_{s}+1)-2N-1 \} \)  on average.

Remark: The bound can be written as  \(C_{FCSDTW}\le \frac{2L}{N} \{ 6s_{1}s_{2}N^{2}+12sN-2N+5 \}\)  and is further upper bounded by  \(C_{FCSDTW}\le \frac{2L}{N} \{6s^{2}N^{2}+12sN-2N+5\}\) , where as above  \(s_{1}=\tfrac{N_{s}}{N}\) ,  \(s_{2}=\tfrac{M_{s}}{N}\) , and  \(s=\tfrac{1}{2}(s_{1}+s_{2})\)  is the sparsity ratio. Note that  \(C_{FCSDTW}=O(s^{2}N^{2})\)  as  \(N \rightarrow \infty\) . Since the complexity of CDTW is  \(2LN\) , the relative complexity of FCSDTW over CDTW is  \(6s^{2}+12s/N-2/N+5/N^{2}\) .
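The relative-complexity expression in the Remark is easy to evaluate numerically. A minimal sketch (the function name and the sample lengths are illustrative, not from the paper):

```python
def relative_complexity_bound(s: float, n: int) -> float:
    """Upper bound on C_FCSDTW / C_CDTW from the Remark:
    6 s^2 + 12 s / N - 2 / N + 5 / N^2."""
    return 6.0 * s * s + 12.0 * s / n - 2.0 / n + 5.0 / (n * n)

# The s^2 term dominates as N grows: with sparsity ratio s = 0.1 the
# bound tends to 6 * 0.1**2 = 0.06.
for n in (96, 1024, 10**6):
    print(n, relative_complexity_bound(0.1, n))
```

For long series the \(s^{2}\) term dominates, so a sparsity ratio of 0.1 suggests roughly a sixteen-fold speedup over CDTW.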


5. Numerical Analysis

This section compares the efficiency of FCSDTW against CDTW, and of FCSDTW against constrained AWarp (CAWarp) and CSDTW, using both synthetic and experimental data. Note that other approaches have achieved large reductions in time complexity but at the cost of large errors[14,7]. FCSDTW was run with the same hyperparameters and on the same computer as CSDTW[7], but with different software versions (MATLAB R2018b on OS X 10.11.6). Following the method of [7], sparse time series were generated from the UCR collection[24], specifically the ECG, Trace, Car, MALLAT, and CinC ECG torso data sets, abbreviated ECG, TR, Car, ML, and CET, respectively.
figure 7
Figure 7. The relative time complexities of CSDTW, FCSDTW, and CAWarp against CDTW are shown by diamonds, crosshairs, and circles, respectively. The colors red, green, blue, black, and magenta denote the data sets ECG, TR, Car, ML, and CET, respectively.
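The sparse test series are obtained by zeroing out most epochs of a dense UCR series. A minimal sketch of one way to do this, assuming the geometric-gap scheme described in [7] (gaps between retained time indices drawn geometrically with mean \(1/s\)); the function name and toy data are illustrative:

```python
import random

def sparsify(x, s, seed=0):
    """Zero out a dense series x, keeping time indices whose gaps follow
    a geometric distribution with mean 1/s (s = target sparsity ratio)."""
    rng = random.Random(seed)
    y = [0.0] * len(x)
    i = -1
    while True:
        gap = 1                       # geometric gap, success probability s
        while rng.random() >= s:
            gap += 1
        i += gap
        if i >= len(x):
            return y
        y[i] = x[i]

dense = [float(k % 7 + 1) for k in range(1000)]   # toy stand-in for a UCR series
sparse = sparsify(dense, s=0.1)
print(sum(v != 0 for v in sparse) / len(sparse))  # close to 0.1
```

The fraction of retained non-zeros concentrates around the target sparsity ratio \(s\) as the series grows longer.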
\captionof{table}{Regression coefficients for the relative time complexity of FCSDTW over CDTW versus the sparsity ratio.  \(b_2\)  and  \(b_0\)  are the coefficients of the  \(s^2\)  term and the constant term, respectively.}

data set  length   \(b_{2}\)    \(b_{0}\) 
ECG  96  4.290  0.496
TR  275  4.332  0.205
Car  577  3.463  0.205
ML  1024  3.036  0.205
CET  1639  2.729  0.140
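Coefficients such as \(b_{2}\) and \(b_{0}\) above can be obtained by least squares on the model \(b_{2}s^{2}+b_{0}\), i.e., ordinary linear regression on the single regressor \(x=s^{2}\). A minimal pure-Python sketch (the sample data are illustrative, not the paper's measurements):

```python
def fit_b2_b0(s_values, y_values):
    """Least-squares fit of y = b2 * s**2 + b0, i.e. linear regression
    of y on the single regressor x = s**2."""
    xs = [s * s for s in s_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(y_values) / n
    b2 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, y_values))
          / sum((x - mean_x) ** 2 for x in xs))
    b0 = mean_y - b2 * mean_x
    return b2, b0

# Illustrative data lying exactly on y = 3 s^2 + 0.2; the fit recovers
# those coefficients up to floating-point rounding.
s_grid = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
y = [3 * s * s + 0.2 for s in s_grid]
b2, b0 = fit_b2_b0(s_grid, y)
print(round(b2, 6), round(b0, 6))
```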

Table 5: Relative errors of the cost of FCSDTW against the cost of CDTW with varying sparsity ratios, in  \(10^{-4}\) . The symbol `eps' stands for errors smaller than double-precision accuracy in MATLAB.

\begin{minipage}[c]{1\linewidth}
data set sparsity ratio
0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40
ECG eps eps eps 0.0017 0.1039 0.0172 0.1733 1.2845
TR eps eps eps 0.0769 0.0940 0.0789 0.0345 0.0474
Car eps eps 0.0673 0.0586 0.2367 0.2328 0.7437 0.3674
ML eps 0.0250 0.0464 0.0364 0.1336 0.5085 0.1859 0.3958
CET eps eps 0.0023 0.0293 0.0384 0.0710 0.1065 0.2563
\end{minipage}
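The exact baseline for these error tables is CDTW, i.e., DTW restricted to a Sakoe-Chiba band. A minimal reference sketch, assuming half-width \(L\), unit step weights, and absolute-difference local distance (the paper's exact weight and distance settings may differ):

```python
import math

def cdtw(x, y, L):
    """DTW cost restricted to the Sakoe-Chiba band |i - j| <= L.
    Assumes |len(x) - len(y)| <= L so the endpoint is reachable."""
    n, m = len(x), len(y)
    INF = math.inf
    g = [[INF] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - L), min(m, i + L) + 1):
            d = abs(x[i - 1] - y[j - 1])
            g[i][j] = d + min(g[i - 1][j], g[i][j - 1], g[i - 1][j - 1])
    return g[n][m]

# The shifted patterns align perfectly inside the band.
print(cdtw([0, 0, 1, 2, 0], [0, 1, 2, 0, 0], L=2))  # -> 0.0
```

Each row of the table only touches cells inside the band, which is the source of the \(2LN\) complexity of CDTW quoted earlier.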

Table 6: Relative errors of the cost of CAWarp against the cost of CDTW with varying sparsity ratios.

\begin{minipage}[c]{1\linewidth}
data set sparsity ratio
0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40
ECG 0.3053 0.2553 0.2363 0.1898 0.1681 0.1608 0.1634 0.1494
TR 0.3998 0.3534 0.3273 0.2938 0.2878 0.2673 0.2470 0.2467
Car 0.2476 0.1664 0.1671 0.1342 0.1476 0.1407 0.1497 0.1312
ML 0.2004 0.1402 0.1092 0.1018 0.0910 0.0810 0.0689 0.0714
CET 0.3627 0.3101 0.2927 0.2851 0.2089 0.2300 0.2126 0.2065
\end{minipage}

Figure 7 shows the relative time complexity, i.e., the ratio of computing times, of CSDTW, FCSDTW, and CAWarp over CDTW. The speed advantage of CSDTW, FCSDTW, and CAWarp grows for longer data records (see Table 1), presumably because the initialization overhead of each algorithm is amortized. A similar trend was observed for SDTW, BSDTW, and FSDTW[8,14,18]. In Figure 7 the estimated time complexity of CAWarp lies between 0.3 and 1.3 times that of FCSDTW; their theoretical ratio is 2/3.

Table [???] presents a second-order polynomial regression of the relative time complexity of FCSDTW over CDTW against the sparsity ratio for the sparse time series data sets. The fitted model is  \(b_{2}s^{2}+b_{0}\) . The values of  \(b_{2}\)  lie between 2.7 and 4.4, consistent with the upper bound of six in the Remark following Proposition [???]. However, the values of  \(b_{0}\)  are much higher than those of FSDTW, probably because of the computational load of finding the region  \(R_{i,j} \cap F\) ; examples of such regions are displayed in [7]. Note that the constant term is not negligible for the ECG time series, whose length  \(N\)  is only 96, so its  \(b_{0}\)  is estimated at around 0.5. In contrast,  \(b_{0}\)  is around 0.14 for CET, whose length exceeds 1600.

Note that the values of  \(b_{0}\)  for CAWarp are between 0.02 and 0.2, roughly two to six times lower than those of FCSDTW. This difference in  \(b_{0}\)  accounts for the time complexity gap between FCSDTW and CAWarp in Figure 7. Table [???] shows the relative errors of the FCSDTW cost against the CDTW cost on a  \(10^{-4}\)  scale. The relative errors of the CAWarp cost are between  \(10^{4}\)  and  \(10^{5}\)  times those of FCSDTW (Table [???]). Therefore, given its small errors, the speed of FCSDTW is more valuable than that of CAWarp. In addition, like the unconstrained FSDTW[8], FCSDTW has essentially zero error at the low sparsity ratios 0.05 and 0.1, because such sparse series rarely contain consecutive non-zeros, the source of the errors.

Table [???] compares the time complexities of CAWarp, FCSDTW, and CSDTW against that of CDTW for washing machine power usage data[16], to demonstrate the performance of FCSDTW on a real data set. The time series were downsampled by a factor of 40 for fast processing, reducing their length from 10800 to 270. Although CAWarp is faster, its cost error is much larger than that of FCSDTW: the relative errors (in  \(10^{-4}\) ) for class 3 are 717.64 for CAWarp and 2.2052 for FCSDTW, and for class 7 they are 482.85 for CAWarp and 0.4922 for FCSDTW. Thus applying CAWarp to real data can be impractical, since its relative error amounts to more than 7% of the CDTW cost.
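The downsampling step above can be sketched as plain decimation (an assumption: the text does not specify the downsampling method; keeping every 40th sample is the simplest choice):

```python
def downsample(x, factor=40):
    """Keep every `factor`-th sample, reducing a length-10800 day of
    readings to length 270."""
    return x[::factor]

day = [0.0] * 10800   # toy stand-in for one day of power readings
short = downsample(day)
print(len(short))     # 10800 / 40 = 270
```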

Table 9: Comparison of the relative time complexities of CAWarp, CSDTW, and FCSDTW against CDTW for the washing machine power usage data

class 3 7
sparsity ratio 0.0578 0.02697
CSDTW 0.2204 0.1672
FCSDTW 0.1426 0.1362
CAWarp 0.0350 0.0194
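The relative cost errors quoted in the comparison above follow the standard definition. A minimal sketch (the function name is illustrative):

```python
def relative_cost_error(approx_cost, exact_cost):
    """Relative error of an approximate warping cost against the exact
    CDTW cost: |approx - exact| / exact."""
    return abs(approx_cost - exact_cost) / exact_cost

# e.g. an approximate cost 7% above the exact cost:
print(round(relative_cost_error(10.7, 10.0), 4))  # 0.07, i.e. 700 on the 1e-4 scale
```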

6. Conclusion

This study developed and analyzed FCSDTW for comparing the sparse time series, containing many zero values, that arise in a range of applications. FCSDTW closely approximates CDTW at a much lower time complexity for time series that are sufficiently sparse and long. FCSDTW is distinct from other approaches, which either achieve large complexity reductions at the cost of significant errors or introduce no error but also no complexity reduction. Numerical experiments further support the efficiency of FCSDTW compared with CSDTW and CAWarp.


References

[1]. Hiroaki Sakoe and Seibi Chiba. Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech and Signal Processing, 26(1):43–49, 1978.

[2]. Claus Bahlmann and Hans Burkhardt. The writer independent online handwriting recognition system frog on hand and cluster generative statistical dynamic time warping. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(3):299–310, 2004.

[3]. Zsolt Miklos Kovacs-Vajna. A fingerprint verification system based on triangular matching and dynamic time warping. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11):1266–1276, 2000.

[4]. Samsu Sempena, Nur Ulfa Maulidevi, and Peb Ruswono Aryan. Human action recognition using dynamic time warping. In Proceedings of the 2011 International Conference on Electrical Engineering and Informatics, pages 1–5. IEEE, 2011.

[5]. Philippe Esling and Carlos Agon. Time-series data mining. ACM Computing Surveys (CSUR), 45(1):12, 2012.

[6]. Tak-chung Fu. A review on time series data mining. Engineering Applications of Artificial Intelligence, 24(1):164–181, 2011.

[7]. Youngha Hwang and Saul B Gelfand. Constrained sparse dynamic time warping. In International Conference on Machine Learning and Applications, pages 216–222. IEEE, 2018.

[8]. Youngha Hwang and Saul B Gelfand. Fast sparse dynamic time warping. In 2022 26th International Conference on Pattern Recognition (ICPR), pages 3872–3877. IEEE, 2022.

[9]. Heather A Eicher-Miller, Saul Gelfand, Youngha Hwang, Edward Delp, Anindya Bhadra, and Jiaqi Guo. Distance metrics optimized for clustering temporal dietary patterning among us adults. Appetite, 144:104451, 2020.

[10]. David Murray and L. Stankovic. REFIT: electrical load measurements. http://www.refitsmarthomes.org/, 2017.

[11]. Youngha Hwang and Saul B Gelfand. Sparse dynamic time warping. In International Conference on Machine Learning and Data Mining in Pattern Recognition, pages 163–175. Springer, 2017.

[12]. Youngha Hwang and Saul B Gelfand. Binary sparse dynamic time warping. In International Conference on Machine Learning and Data Mining in Pattern Recognition, pages 748–759. Springer, 2019.

[13]. Youngha Hwang. Constrained binary sparse dynamic time warping. Manuscript in preparation.

[14]. Abdullah Mueen, Nikan Chavoshi, Noor Abu-El-Rub, Hossein Hamooni, Amanda Minnich, and Jonathan MacCarthy. Speeding up dynamic time warping distance for sparse time series data. Knowledge and Information Systems, 54(1):237–263, 2018.

[15]. Eamonn Keogh, Xiaopeng Xi, Li Wei, and Chotirat Ann Ratanamahatana. The UCR time series classification/clustering homepage. http://www.cs.ucr.edu/~eamonn/time_series_data, 2006.



Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Volume title: Proceedings of the 5th International Conference on Signal Processing and Machine Learning

ISBN: 978-1-83558-809-3 (Print) / 978-1-83558-810-9 (Online)
Editor: Stavros Shiaeles
Conference website: https://2025.confspml.org/
Conference date: 12 January 2025
Series: Applied and Computational Engineering
Volume number: Vol. 120
ISSN: 2755-2721 (Print) / 2755-273X (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).
