The problem of sequence alignment arises in many fields of science as a consequence of dealing with data that does not live in fixed dimensional Euclidean spaces. In computer vision, sequence alignment is an important first step used to solve problems such as the human activity analysis and recognition. The alignment can be used to establish a measure of similarity between two sequences of video frames or motion capture data, which can be subsequently employed for sequence classification or clustering.

A traditional way to address the alignment problem between two sequences relies on Dynamic Time Warping (DTW). DTW is typically solved using a combinatorial dynamic programming algorithm that searches for a globally optimal warping path, mapping the domain of one sequence onto the other. DTW and its derivates have shown great success in many practical alignment applications [3]. In practice the unconstrained warping of a generic DTW often fails to yield reasonable and robust results. Imposing constraints on the feasible warping paths has empirically shown to improve the classification performance. For instance, the Sakoe-Chiba band constraint, which restricts the maximum deviation of matching slices from the diagonal by \( p \)% of the sequence length, can result in substantially improved alignments depending on the choice of \( p \). Recently, [6] proposed an adaptive band approach that estimates function spaces of time warping paths, removing the need for a fixed \(p\). In this setting motion class-specific warping-path constraints are learned for each class that reflect the warping variations of samples within it. Nevertheless, imposing a proper set of constraints on DTW is still a challenging problem.

One additional property of DTW-based alignment is that it assumes pairing of individual data points: a sample at time \(t_{x}\) in sequence \(x\) is typically aligned with only one other sample at time \(t_{y}\) in sequence \(y\). In many practical applications it may be more desirable to establish pairing between groups of points: associating a temporal segment \( \mathbf{t}_{x} = [t_{x,1}, … , t_{x,2}] \) to another segment of the contrasting sequence, \( \mathbf{t}_{y} = [t_{y,1},…,t_{y,2}] \). The segment pairing can reflect alignment of e.g., segment sufficient statistics instead of the raw sample values, making the comparison more robust to individual sample differences. For example, in mocap data segment-level alignment can be less susceptible to individual subject differences while performing a certain motion or activity. Alignment on the segment level can also be justified as a way of comparing the local segment densities, in view of the Hilbert space embeddings of probability distributions. Additionally, point-to-point pairing deems DTW to be very sensitive to noise. While pairing convex combinations of the segments of the two contrasting sequences can impose additional filtering on data leading to a more robust alignment.

A natural way of solving the alignment problem is to find the aligned subspace (manifold in general) where the correlation of two sequences is maximized. Canonical correlation analysis (CCA) provides the necessary means for finding such a subspace between a pair of random variables. However, CCA in its original formulation does not respect the critical *monotonicity* property of any temporal alignment, which prevents arbitrary permutations of time indexes (e.g., self intersection of time, mapping \( t_{x,1} \rightarrow t_{y,2} \) and \( t_{x,2} \rightarrow t_{y,1} \) when \( t_{x,1} < t_{x,2}, t_{y,1} < t_{y,2} \)). To address this problem [8] proposed to formulate CCA of two vectors as a regression problem with a proper hyperbolic basis for the coefficient vector, guaranteeing monotonicity and the isotonic character of this solution. Their approach, however, does not immediately extend to multivariate time series and also does not explicitly model the segment alignment.

Given the above discussion, our approach was based on an isotonic extension to CCA to tackle the problem of sequence alignment. Unlike the traditional CCA, we introduced alternative constraints that guarantee monotonicity in the projected alignment space. We showed that these constraints are of quadratic nature, similar to the traditional CCA normalization constraints. This set of constraints is supplemented with non-negativity and norm-1 normalization constraints, which together allow alignment of subsegments and the Hilbert density embedding interpretation. We then presented an efficient solution to the isotonic CCA optimization based on iterative coordinate descent and non-negative least squares. The details and the results of applying IsoCCA is presented in our paper published in proceeding of International Conference on Computer Vision, 2011, Barcelona, Spain.

The code for IsoCCA is available at the software section.

**Related Publications**

- [1] S. Shariat and V. Pavlovic. “Isotonic CCA for Sequence Alignment and Activity Recognition”.
*Internation Conference on Computer Vision*. 2011.