Multivariate Time Series Classification with Learned Discretization

This is a supporting page for our paper "Multivariate Time Series Classification with Learned Discretization" (SMTS)
by Mustafa Gokce Baydogan and George Runger
* This study was presented at INFORMS 2012 in Phoenix. The presentation is available here. Note that the approach (as well as the results) may differ from our submission.
* The paper submitted to Data Mining and Knowledge Discovery will be available in the Papers category of the Files section.
* The name of the study was changed to Learning a Symbolic Representation for Multivariate Time Series Classification (SMTS) following suggestions from an anonymous reviewer during the review process.
We test our proposed approach on datasets from different applications such as speech recognition, activity recognition, and medicine. The dataset characteristics are given below. The table lists the characteristics of each MTS dataset: number of classes, number of variables, length of the time series, number of training instances, and number of testing instances. Test performance is also reported for all datasets. The column "CV" indicates whether comparisons are also based on cross-validation. The source of each dataset is given in the last column. The datasets are available here.
IMPORTANT UPDATE (April 7th, 2015): Our recent submission (LPS paper) to Data Mining and Knowledge Discovery added more multivariate time series classification datasets from various sources. We have added the new datasets in MATLAB format to the files section. The details are provided in the data sets section (the file size is around 313 MB). The recently added time series datasets are also shown in red font toward the end of the table below.
Dataset                 Classes  Variables  Length    Train  Test   CV       Source
AUSLAN                  95       22         45-136    1140   1425   10-fold  UCI
Pendigits               10       2          8         300    10692
Japanese Vowels         9        12         7-29      270    370
Robot Failure
  LP1                   4        6          15        38     50     5-fold
  LP2                   5        6          15        17     30
  LP3                   4        6          15        17     30
  LP4                   3        6          15        42     75
  LP5                   5        6          15        64     100
ECG                     2        2          39-152    100    100    10-fold  Olszewski
Wafer                   2        6          104-198   298    896
CMU_MOCAP_S16           2        62         127-580   29     29     10-fold  CMU MOCAP
ArabicDigits            10       13         4-93      6600   2200   x        UCI
CharacterTrajectories   20       3          109-205   300    2558   x
LIBRAS                  15       2          45        180    180    x
uWaveGestureLibrary     8        3          315       200    4278   x        UCR
PEMS                    7        963        144       267    173    x        UCI
KickvsPunch             2        62         274-841   16     10     x        CMU MOCAP
WalkvsRun               2        62         128-1918  28     16     x
Network Flow            2        4          50-997    803    534    x        Sübakan et al.
DigitsShape             4        2          30-98     24     16     x
Shapes                  3        2          52-98     18     12     x
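
To illustrate the characteristics listed in the table, here is a minimal R sketch that computes them for a toy dataset. It assumes (this is not taken from the SMTS scripts) that each multivariate series is stored as a numeric matrix with one row per time point and one column per variable. The function name mts_summary and the toy data are hypothetical.

```r
# Hypothetical helper, not part of the SMTS archive.
# Assumption: each series is a matrix with rows = time points, columns = variables.
mts_summary <- function(series, labels) {
  lengths <- vapply(series, nrow, integer(1))  # length of every series
  c(classes    = length(unique(labels)),
    variables  = ncol(series[[1]]),
    min_length = min(lengths),
    max_length = max(lengths),
    instances  = length(series))
}

# Toy data: three series with two variables and lengths 3, 5 and 4.
set.seed(1)
series <- list(matrix(rnorm(6),  nrow = 3, ncol = 2),
               matrix(rnorm(10), nrow = 5, ncol = 2),
               matrix(rnorm(8),  nrow = 4, ncol = 2))
labels <- c("walk", "run", "walk")

s <- mts_summary(series, labels)
print(s)
```

Series of different lengths (for example 45-136 for AUSLAN) fit naturally in a list, which is why the sketch reports a minimum and a maximum length rather than a single value.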


The code is provided in the files section. Here is the direct link to the folder. You will find a zip file containing:
  1. R implementation: We use the "randomForest" package in R, so you need to install R and then install the required library using the command install.packages("randomForest"). We also provide a parallel implementation of SMTS for multicore computers. For the parallel implementation, install the "doMC" package and set the number of cores to be used accordingly. Depending on the dataset, you can observe significant reductions in computation time with the parallel implementation.
  2. C code and compiled libraries: The codebook generation is implemented in C and called directly from R. The code was compiled on a 64-bit Ubuntu 12.04 system (the *.so file is in the archive). For Windows (32-bit or 64-bit) or 32-bit Linux systems, you need to compile the C code yourself by running "R CMD SHLIB yourpath/mts_functions.c" in your command window (on Windows) or terminal (on Linux). I recently compiled the code for 64-bit Windows and made the dll file available in the folder. You need to modify the line that points to the dll file in the data preparation script, as described below.
  3. Scripts for data preparation and parameter selection: Two R scripts read and prepare the data and select the parameters. The data preparation script also contains a wrapper for the functions implemented in C. If you are on Windows, the R commands that load the compiled library must be changed (switch from *.so to *.dll after compiling the library on Windows).
  4. Example datasets: The GunPoint dataset from the UCR time series database and the Libras dataset from the UCI machine learning repository are provided.
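
The setup steps above can be sketched in R as follows. This is a minimal sketch, not the code shipped in the archive: the helper names shared_lib_file and missing_packages are hypothetical, and the sketch assumes the compiled library keeps the base name of the C source (mts_functions) and sits in the working directory. It relies on .Platform$dynlib.ext, which R sets to ".so" on Linux and ".dll" on Windows, so the same line works on both systems.

```r
# Hypothetical helper: file name of the compiled library on this platform.
# Assumption: the library is named after the C source, mts_functions.c.
shared_lib_file <- function(base = "mts_functions", dir = ".",
                            ext = .Platform$dynlib.ext) {
  file.path(dir, paste0(base, ext))
}

# Hypothetical helper: which of the required packages are not installed yet.
missing_packages <- function(pkgs = c("randomForest", "doMC")) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages()
if (length(to_install) > 0) {
  message("Install with: install.packages(c(",
          paste0('"', to_install, '"', collapse = ", "), "))")
}

lib_file <- shared_lib_file()
if (file.exists(lib_file)) {
  dyn.load(lib_file)  # makes the compiled C routines callable from R
} else {
  message("Library not found; compile with: R CMD SHLIB mts_functions.c")
}
```

For the parallel version, the doMC package provides registerDoMC(cores = n), which is where the number of cores would be set.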

We also provide our cross-validation code: two R scripts modified to run cross-validation for evaluation purposes. An archive (zip file) is available in the same folder.
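
For readers unfamiliar with k-fold cross-validation, the following base R sketch shows one common way to assign instances to folds while keeping the class proportions similar in every fold (stratified folds). It is an illustration only and is not taken from the cross-validation scripts in the archive. The function name stratified_folds, the seed, and the toy labels are assumptions.

```r
# Hypothetical helper: assign each instance to one of k folds,
# spreading the instances of every class evenly over the folds.
stratified_folds <- function(labels, k = 10, seed = 1) {
  set.seed(seed)
  folds <- integer(length(labels))
  for (cls in unique(labels)) {
    idx <- which(labels == cls)
    # Shuffle the instances of this class, then deal them out round-robin.
    idx <- idx[sample.int(length(idx))]
    folds[idx] <- rep_len(seq_len(k), length(idx))
  }
  folds
}

# Toy labels: 20 instances of class "a" and 10 of class "b".
labels <- rep(c("a", "b"), times = c(20, 10))
folds <- stratified_folds(labels, k = 5)

# Rows are classes, columns are folds.
print(table(labels, folds))
```

In each round, the instances with folds == i form the test set and the remaining instances form the training set.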


Please place the datasets and the code in the same folder. The algorithm runs in R (use the R script "smts_Main.r" to run the algorithm). If you have any problems running the code, please contact the authors.


Copyright © 2014 mustafa gokce baydogan