Learned Pattern Similarity (LPS)

This is a supporting page for our paper - Time series similarity based on a pattern-based representation (LPS)

by Mustafa Gokce Baydogan and George Runger
* LPS is compared to 11 similarity measures on 75 time series classification problems. It ranks first overall, although the margin over the closest competitors is not statistically significant. The detailed results are available below and are provided as an Excel file at the following link.
* This study has been presented at INFORMS 2013 (Minneapolis), IFORS 2014 (Barcelona), the FEAST Workshop at ICPR 2014 (Stockholm), and INFORMS 2014 (San Francisco). The INFORMS 2013 presentation is available here.
* The paper was submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence on September 23, 2013 and rejected on June 1, 2014, with one resubmit-as-new, one major-revision, and one reject recommendation.
* During the review process, we developed a more robust version of LPS that does not require tuning of any parameters. This version was submitted to Data Mining and Knowledge Discovery on November 17, 2014.
 


DATASETS AND RESULTS
LPS is compared to the nearest-neighbor (NN) classifiers discussed by Lines and Bagnall (2014) on 75 datasets from different sources. See Lines and Bagnall (2014) for details of the classifiers and the datasets. Our detailed results are provided as an Excel file in the related files section. Here is the direct link to the folder.
 
Eleven similarity measures are considered in our evaluation:
* DTW and DDTW: dynamic time warping and derivative dynamic time warping with the full warping window,
* DTWBest and DDTWBest: DTW and DDTW with the window size determined through cross-validation,
* DTWWeight and DDTWWeight: weighted versions of DTW and DDTW,
* LCSS: longest common subsequence,
* MSM: move-split-merge,
* TWE: time warp edit distance,
* ERP: edit distance with real penalty,
* ED: Euclidean distance.
 
Comparison of LPS to multiple classifiers over all datasets follows the procedure suggested by Demšar (2006): a Friedman test (Friedman, 1940) followed by the Nemenyi test (Nemenyi, 1963) if the Friedman test identifies a significant difference. The procedure is essentially a non-parametric analysis of variance based on the ranks of the approaches on each dataset (Lines and Bagnall, 2014). Based on the Friedman test, we find a significant difference between the 13 classifiers at the 0.05 level. Proceeding with the Nemenyi test, we compute the critical difference (CD) at the 0.05 level to determine whether the performance of two classifiers differs: two classifiers are significantly different if their average ranks differ by at least the critical difference. The figure below shows the average ranks of all classifiers on the 75 datasets. LPSBest has the best average rank and LPS is second best. The critical differences at significance levels 0.05 and 0.10 are 2.107 and 1.957, respectively.
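For reference, the Nemenyi critical difference is $CD = q_{\alpha}\sqrt{k(k+1)/(6N)}$ for $k$ classifiers and $N$ datasets. The short R snippet below reproduces the values quoted above; the critical values $q_{\alpha}$ used here are approximate entries from extended Nemenyi tables and are an assumption of this sketch.

```r
# Nemenyi critical difference: CD = q_alpha * sqrt(k * (k + 1) / (6 * N))
k <- 13                                    # classifiers compared
N <- 75                                    # datasets
q <- c("0.05" = 3.313, "0.10" = 3.077)     # approximate critical values (assumed)
cd <- q * sqrt(k * (k + 1) / (6 * N))
round(cd, 3)                               # about 2.107 and 1.957, as quoted above
```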
 

As mentioned by Lines and Bagnall (2014), some of these measures require certain hyper-parameters to be set. The parameter optimization is done on the training set through cross-validation, allowing at most 100 model evaluations for each approach.
 
The most important parameter in LPS is the segment length, as discussed in the paper. We introduce two strategies for setting the segment length. The first strategy sets the segment length $L$ as a proportion of the full time series length, $\gamma \in \{0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95\}$, and additionally considers the depth $D \in \{2, 4, 6\}$, both selected by leave-one-out (LOO) cross-validation (CV) on the training data. For the cross-validation, we train 25 trees ($J = 25$) for each fold. This version of LPS is named LPSBest, analogous to DTWBest. Hence, we allow $|\gamma| \times |D| = 7 \times 3 = 21$ model evaluations in our study. After the parameters providing the best cross-validation accuracy are obtained, we train $J = 200$ trees in the ensemble with the selected parameters to obtain the final representation. The detailed results of the LOO-CV are provided in the files section.
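A minimal sketch of this search is shown below. The helpers loocv_accuracy() and fit_lps() are hypothetical placeholders for the training and LOO-CV routines in our R package; only the grid and the tree counts follow the description above.

```r
# LPSBest parameter search (sketch; loocv_accuracy() and fit_lps() are hypothetical
# placeholders, not the actual package API)
gamma.grid <- c(0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95)  # segment length as proportion of T
depth.grid <- c(2, 4, 6)
grid <- expand.grid(gamma = gamma.grid, depth = depth.grid)  # 7 x 3 = 21 settings

# For each setting, a J = 25 tree ensemble is grown per fold and the
# leave-one-out accuracy on the training series is recorded.
cv.acc <- apply(grid, 1, function(p)
  loocv_accuracy(train.series, train.labels,
                 segment.factor = p["gamma"], maxdepth = p["depth"], ntree = 25))

best <- grid[which.max(cv.acc), ]

# Final representation: retrain with J = 200 trees at the selected setting.
final.model <- fit_lps(train.series, segment.factor = best$gamma,
                       maxdepth = best$depth, ntree = 200)
```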
 
In the second strategy, the segment length $L$ is chosen randomly for each tree as a proportion of the full time series length between $0.05 \times T$ and $0.95 \times T$. Also, the number of trees $J$ and the depth $D$ are fixed to $200$ and $6$, respectively, for all datasets. This version is referred to simply as LPS. The parameter values are kept the same for all datasets to illustrate the robustness of LPS; in other words, no parameter tuning is conducted for this strategy.
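For illustration only, the per-tree segment length under this strategy can be drawn as below (T.len is a hypothetical series length; in the actual ensemble each of the $J = 200$ trees makes this draw independently):

```r
# Random segment length strategy (sketch): each tree draws its own segment length
# uniformly between 5% and 95% of the series length T, with depth fixed at D = 6.
T.len <- 286                                            # hypothetical series length
J <- 200                                                # trees in the ensemble
seg.len <- pmax(1, round(runif(J, min = 0.05, max = 0.95) * T.len))
summary(seg.len)                                        # one segment length per tree
```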

CODES

We implemented LPS as an R package. The source files are available on CRAN.
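A typical install-and-load looks like the snippet below; the package name shown here, LPStimeSeries, is assumed and should be checked against the CRAN listing.

```r
# Install the LPS package from CRAN and load it.
# The package name "LPStimeSeries" is assumed here; check the CRAN listing.
install.packages("LPStimeSeries")
library(LPStimeSeries)
```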

Recently, we also made a MATLAB implementation of LPS available. The source file is provided here in the files section. This version differs slightly from the R implementation, but the overall idea is the same.

HOW TO RUN LPS

The details of the R implementation are described in the blog entry here.
The details of the MATLAB implementation are provided in the blog entry here.

REFERENCES

J. Demšar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7:1–30, 2006.
M. Friedman. A comparison of alternative tests of significance for the problem of m rankings. Annals of Mathematical Statistics, 11(1):86–92, 1940.
J. Lines and A. Bagnall. Time series classification with ensembles of elastic distance measures. Data Mining and Knowledge Discovery, pages 1–28, 2014.
P. Nemenyi. Distribution-free Multiple Comparisons. PhD thesis, Princeton University, 1963.
 

Copyright © 2014 mustafa gokce baydogan
