Error rates on UCR time series datasets

Error rates of TSBF (with random and uniform subsequence extraction) averaged over 10 replicates, nearest-neighbor classifiers with dynamic time warping distance (NNDTWBest and NNDTWNoWin), a nearest-neighbor classifier with sparse spatial sample kernels (NNSSSK), a bag-of-words (BoW) approach with RF and SVM classifiers, RF (Raw) applied directly to the time series values, and RF (Feature) applied directly to interval features. The datasets are from the UCR time series database (as of 2/13/2013); please visit the UCR time series database for more information about each dataset.
 
| Dataset | TSBF (Random) | TSBF (Uniform) | NNDTW (Best) | NNDTW (NoWin) | NN (SSSK) | BoW (RF) | BoW (SVM) | RF (Raw) | RF (Feature) |
|---|---|---|---|---|---|---|---|---|---|
| 50Words | 0.209 | 0.211 | 0.242 | 0.310 | 0.488 | 0.347 | 0.316 | 0.348 | 0.333 |
| Adiac | 0.245 | 0.295 | 0.391 | 0.396 | 0.575 | 0.322 | 0.325 | 0.361 | 0.249 |
| Beef | 0.287 | 0.460 | 0.467 | 0.500 | 0.633 | 0.267 | 0.267 | 0.300 | 0.257 |
| CBF | 0.009 | 0.004 | 0.004 | 0.003 | 0.090 | 0.030 | 0.048 | 0.112 | 0.076 |
| Coffee | 0.004 | 0.007 | 0.179 | 0.179 | 0.071 | 0.000 | 0.036 | 0.007 | 0.004 |
| ECG | 0.145 | 0.207 | 0.120 | 0.230 | 0.220 | 0.150 | 0.110 | 0.184 | 0.158 |
| Face (all) | 0.234 | 0.196 | 0.192 | 0.192 | 0.369 | 0.278 | 0.238 | 0.190 | 0.231 |
| Face (four) | 0.051 | 0.048 | 0.114 | 0.170 | 0.102 | 0.125 | 0.102 | 0.211 | 0.172 |
| Fish | 0.080 | 0.056 | 0.160 | 0.167 | 0.177 | 0.034 | 0.029 | 0.221 | 0.175 |
| Gun-Point | 0.011 | 0.015 | 0.087 | 0.093 | 0.133 | 0.013 | 0.407 | 0.073 | 0.010 |
| Lighting-2 | 0.257 | 0.334 | 0.131 | 0.131 | 0.393 | 0.230 | 0.328 | 0.244 | 0.252 |
| Lighting-7 | 0.262 | 0.370 | 0.288 | 0.274 | 0.438 | 0.301 | 0.370 | 0.263 | 0.295 |
| OliveOil | 0.090 | 0.167 | 0.167 | 0.133 | 0.300 | 0.267 | 0.233 | 0.107 | 0.093 |
| OSU Leaf | 0.329 | 0.155 | 0.384 | 0.409 | 0.326 | 0.240 | 0.153 | 0.518 | 0.443 |
| Swedish Leaf | 0.075 | 0.088 | 0.157 | 0.210 | 0.339 | 0.149 | 0.125 | 0.126 | 0.088 |
| Synt. Control | 0.008 | 0.009 | 0.017 | 0.007 | 0.067 | 0.017 | 0.017 | 0.046 | 0.017 |
| Trace | 0.020 | 0.020 | 0.010 | 0.000 | 0.300 | 0.010 | 0.000 | 0.165 | 0.071 |
| Two Patterns | 0.001 | 0.004 | 0.002 | 0.000 | 0.087 | 0.034 | 0.010 | 0.158 | 0.190 |
| Wafer | 0.004 | 0.003 | 0.005 | 0.020 | 0.029 | 0.011 | 0.010 | 0.012 | 0.002 |
| Yoga | 0.149 | 0.156 | 0.155 | 0.164 | 0.172 | 0.159 | 0.145 | 0.191 | 0.188 |
| ChlorineConc. | 0.336 | 0.346 | 0.350 | 0.352 | 0.428 | 0.384 | 0.405 | 0.291 | 0.272 |
| CinC_ECG_torso | 0.262 | 0.221 | 0.070 | 0.349 | 0.438 | 0.167 | 0.164 | 0.250 | 0.088 |
| Cricket_X | 0.278 | 0.256 | 0.236 | 0.223 | 0.585 | 0.346 | 0.305 | 0.427 | 0.362 |
| Cricket_Y | 0.259 | 0.260 | 0.197 | 0.208 | 0.654 | 0.300 | 0.313 | 0.396 | 0.330 |
| Cricket_Z | 0.263 | 0.244 | 0.180 | 0.208 | 0.574 | 0.297 | 0.295 | 0.406 | 0.380 |
| DiatomSize | 0.126 | 0.098 | 0.065 | 0.033 | 0.173 | 0.114 | 0.111 | 0.093 | 0.123 |
| ECGFiveDays | 0.183 | 0.239 | 0.203 | 0.232 | 0.360 | 0.334 | 0.164 | 0.210 | 0.062 |
| FacesUCR | 0.090 | 0.107 | 0.088 | 0.095 | 0.356 | 0.158 | 0.137 | 0.215 | 0.192 |
| Haptics | 0.488 | 0.478 | 0.588 | 0.623 | 0.591 | 0.562 | 0.630 | 0.551 | 0.548 |
| InlineSkate | 0.603 | 0.604 | 0.613 | 0.616 | 0.729 | 0.638 | 0.629 | 0.665 | 0.716 |
| ItalyPowerDemand | 0.096 | 0.107 | 0.045 | 0.050 | 0.101 | 0.058 | 0.044 | 0.033 | 0.040 |
| MALLAT | 0.037 | 0.036 | 0.086 | 0.066 | 0.153 | 0.042 | 0.098 | 0.082 | 0.094 |
| MedicalImages | 0.269 | 0.279 | 0.253 | 0.263 | 0.463 | 0.379 | 0.401 | 0.277 | 0.304 |
| MoteStrain | 0.135 | 0.102 | 0.134 | 0.165 | 0.166 | 0.158 | 0.177 | 0.119 | 0.103 |
| SonyRobot | 0.175 | 0.225 | 0.305 | 0.275 | 0.376 | 0.398 | 0.409 | 0.321 | 0.280 |
| SonyRobotII | 0.196 | 0.222 | 0.141 | 0.169 | 0.339 | 0.205 | 0.154 | 0.197 | 0.201 |
| StarLightCurves | 0.022 | 0.025 | 0.095 | 0.093 | 0.135 | 0.023 | 0.021 | 0.052 | 0.036 |
| Symbols | 0.034 | 0.025 | 0.062 | 0.050 | 0.184 | 0.077 | 0.088 | 0.148 | 0.138 |
| TwoLeadECG | 0.046 | 0.030 | 0.132 | 0.096 | 0.257 | 0.112 | 0.248 | 0.268 | 0.119 |
| uWaveGesture_X | 0.164 | 0.160 | 0.227 | 0.273 | 0.358 | 0.260 | 0.242 | 0.245 | 0.210 |
| uWaveGesture_Y | 0.249 | 0.239 | 0.301 | 0.366 | 0.493 | 0.354 | 0.352 | 0.314 | 0.290 |
| uWaveGesture_Z | 0.217 | 0.213 | 0.322 | 0.342 | 0.439 | 0.343 | 0.325 | 0.290 | 0.282 |
| Thorax1 | 0.138 | 0.158 | 0.185 | 0.209 | 0.362 | 0.488 | 0.489 | 0.123 | 0.112 |
| Thorax2 | 0.130 | 0.116 | 0.129 | 0.135 | 0.315 | 0.184 | 0.220 | 0.090 | 0.079 |

(1) TSBF results [a] are also provided in the Results section. Detailed information about the parameter settings is available in the files.

(2) NNDTW results are from the UCR time series database.

(3) The SSSK code was provided by Pavel Kuksa. The series are first discretized to generate a symbolic representation, and the similarity between time series is then computed over subsequences. We consider the double and triple kernels proposed by [b] (denoted "d" and "t", respectively, in the table). A number of hyperparameters have to be chosen carefully, such as the kernel parameter d, the alphabet size, the discretization scheme (uniform binning, VQ, k-means, etc.), and related parameters (e.g., the number of bins b). We discretize the time series using SAX [c]; a sketch of this step is given below. We consider five levels for the alphabet size and interval lengths of {4, 8, 12, 16, 20}. The kernel parameter d is selected from {5, 10, ..., min(50, wordlength/2)} (the grid differs for the ItalyPowerDemand dataset since the series length is only 24 time units). To set the parameters, we perform leave-one-out cross-validation (CV) on the training data, and the parameter combination providing the best CV error rate is used for testing (also given in the table). The MATLAB code for parameter selection and classification is available at http://www.mustafabaydogan.com/files/viewcategory/6-time-series-classification-based-on-bag-of-features-tsbf.html (you still need to obtain the SSSK and SAX code).
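
For concreteness, here is a minimal sketch of the SAX discretization step in Python (NumPy/SciPy). The word length and alphabet size shown are illustrative placeholders, not the values selected by CV:

```python
import numpy as np
from scipy.stats import norm

def sax_word(series, word_length=8, alphabet_size=5):
    """Convert one series to a SAX word (Lin et al. [c]): z-normalize,
    PAA-average over word_length equal segments, then map each segment
    mean to a symbol via Gaussian breakpoints."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / (x.std() + 1e-8)            # z-normalize
    paa = np.array([seg.mean() for seg in np.array_split(x, word_length)])
    # alphabet_size - 1 breakpoints giving equiprobable regions under N(0, 1)
    breakpoints = norm.ppf(np.linspace(0, 1, alphabet_size + 1)[1:-1])
    symbols = np.searchsorted(breakpoints, paa)      # values in 0..alphabet_size-1
    return "".join(chr(ord("a") + s) for s in symbols)
```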

(4) Our supervised bag-of-features (BoF) approach is compared to an unsupervised BoW approach with a codebook derived from k-means clustering. In the unsupervised approach, the Euclidean distance between subsequences is used. Subsequences are generated for each level of z as in TSBF with uniform subsequence extraction, and k-means clustering with k selected from {25, 50, 100, 250, 500, 1000} is used to label the subsequences for each z setting. The histogram of the cluster assignments forms the codebook representation (a sketch is given below). Two classifiers, Random Forest (RF) and Support Vector Machine (SVM), are trained on the codebook for classification. For SVM, the z and k settings, as well as the SVM parameters (kernel type, cost parameter), are determined by 10-fold cross-validation (CV) on the training data. The details are provided in our paper submitted to PAMI.
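
A minimal sketch of this unsupervised BoW baseline in Python, assuming fixed-length subsequences and scikit-learn's KMeans and RandomForestClassifier; the subsequence length, step, and k shown are illustrative, with the actual values chosen by cross-validation as described above:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

def subsequences(series, length, step):
    """Extract uniformly spaced, fixed-length subsequences from one series."""
    return np.array([series[i:i + length]
                     for i in range(0, len(series) - length + 1, step)])

def bow_histograms(X, length=16, step=4, k=100, seed=0):
    """Cluster all training subsequences with k-means, then represent each
    series by the histogram of its subsequences' cluster assignments."""
    subs = [subsequences(x, length, step) for x in X]
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(np.vstack(subs))
    hists = np.array([np.bincount(km.predict(s), minlength=k) for s in subs])
    # Normalizing by subsequence count is an assumption made here; it keeps
    # series of different lengths comparable.
    return hists / hists.sum(axis=1, keepdims=True), km

# Usage: the histograms feed a standard classifier, e.g.
#   H, km = bow_histograms(X_train)
#   clf = RandomForestClassifier(n_estimators=500).fit(H, y_train)
```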

(5) Random Forest is trained on the raw observations with default settings (no feature extraction; the observed values are used as features). We also extract interval features over segments of 5 time units: the mean and variance of the values over the segment, and the slope of the fitted regression line (a sketch follows). The number of trees is set based on the progress of the out-of-bag (OOB) error rate on the training data. This blog entry further discusses supervised learning approaches to time series classification problems.
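
A minimal sketch of the interval feature extraction in Python (NumPy); non-overlapping segments are an assumption here, and np.polyfit supplies the slope of the fitted regression line:

```python
import numpy as np

def interval_features(series, seg_len=5):
    """Mean, variance, and regression-line slope over consecutive
    segments of seg_len time units (non-overlapping assumed)."""
    x = np.asarray(series, dtype=float)
    t = np.arange(seg_len)
    feats = []
    for i in range(len(x) // seg_len):
        seg = x[i * seg_len:(i + 1) * seg_len]
        slope = np.polyfit(t, seg, 1)[0]    # slope term of the degree-1 fit
        feats.extend([seg.mean(), seg.var(), slope])
    return np.array(feats)

# Each series of length T yields 3 * (T // 5) features for the RF (Feature) column.
```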

 
[a] M. G. Baydogan, G. C. Runger, and E. Tuv, “A bag-of-features framework to classify time series,” Technical Report, 2012.
[b] P. Kuksa and V. Pavlovic, “Spatial representation for efficient sequence classification,” in Proc. 20th International Conference on Pattern Recognition (ICPR), Aug. 2010, pp. 3320–3323.
[c] J. Lin, E. Keogh, L. Wei, and S. Lonardi, “Experiencing SAX: a novel symbolic representation of time series,” Data Mining and Knowledge Discovery, vol. 15, pp. 107–144, 2007.

