Symbolic Representations for Multivariate Time Series Classification (SMTS)

This website is still under construction, and some important links are missing (April 19th, 2022).

This is the supporting page for our paper "Multivariate Time Series Classification with Learned Discretization" (SMTS).

by Mustafa Gokce Baydogan and George Runger

*This study was presented at INFORMS 2012 in Phoenix. The presentation is available here. Note that the approach (as well as the results) may have changed compared to our submission.

*The paper submitted to Data Mining and Knowledge Discovery will be available in the Papers category of the Files section.

*The title of the study was changed to "Learning a Symbolic Representation for Multivariate Time Series Classification" (SMTS) following suggestions from an anonymous reviewer during the review process.

DATASETS

We test our proposed approach on datasets from different applications, such as speech recognition, activity recognition and medicine. The table below gives the characteristics of each multivariate time series (MTS) dataset: the number of classes, the number of variables, the length of the time series, and the number of training and test instances. Test performance is also reported for all datasets. Column "CV" indicates whether comparisons are also based on cross-validation. The source of each dataset is given in the last column. The datasets are available here.

IMPORTANT UPDATE (April 7th, 2015): Our recent submission to Data Mining and Knowledge Discovery (LPS paper) added more multivariate time series classification datasets from various sources. We have added the new datasets in MATLAB format to the Files section (the file size is around 313 MB). The details are provided in the data sets section. The recently added datasets are listed at the end of the table below, starting with ArabicDigits.

Dataset	# of classes	# of variables	Series length	# train	# test	CV	Source
AUSLAN	95	22	45-136	1140	1425	10-fold	UCI
Pendigits	10	2	8	300	10692
Japanese Vowels	9	12	7-29	270	370
Robot Failure
LP1	4	6	15	38	50	5-fold
LP2	5	6	15	17	30
LP3	4	6	15	17	30
LP4	3	6	15	42	75
LP5	5	6	15	64	100
ECG	2	2	39-152	100	100	10-fold	Olszewski
Wafer	2	6	104-198	298	896
CMU_MOCAP_S16	2	62	127-580	29	29	10-fold	CMU MOCAP
ArabicDigits	10	13	4-93	6600	2200	x	UCI
CharacterTrajectories	20	3	109-205	300	2558	x
LIBRAS	15	2	45	180	180	x
uWaveGestureLibrary	8	3	315	200	4278	x	UCR
PEMS	7	963	144	267	173	x	UCI
KickvsPunch	2	62	274-841	16	10	x	CMU MOCAP
WalkvsRun	2	62	128-1918	28	16	x
Network Flow	2	4	50-997	803	534	x	Sübakan et al.
DigitsShape	4	2	30-98	24	16	x
Shapes	3	2	52-98	18	12	x
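To make the table columns concrete, the R sketch below computes the number of classes, the number of variables and the series length for a dataset held in memory; an entry such as "109-205" means the series have varying lengths. It assumes a hypothetical layout (a list of time-by-variable matrices plus a factor of class labels), which is not necessarily how the files in the archive are organized, and `describe_mts` and the toy data are illustrative only. The train and test columns are simply the number of instances in each split.

```r
# Hypothetical layout (assumption, not the archive's file format):
# each MTS instance is a (time points x variables) numeric matrix.
describe_mts <- function(instances, labels) {
  lens <- vapply(instances, nrow, integer(1))   # series length of each instance
  vars <- unique(vapply(instances, ncol, integer(1)))
  stopifnot(length(vars) == 1, length(instances) == length(labels))
  # a single number for fixed-length data, "min-max" for varying lengths
  length_str <- if (min(lens) == max(lens)) {
    as.character(min(lens))
  } else {
    paste0(min(lens), "-", max(lens))
  }
  list(classes = nlevels(droplevels(labels)), variables = vars, length = length_str)
}

# Toy data: three 3-variable series of different lengths
set.seed(42)
toy_lengths <- c(109, 150, 205)
toy <- lapply(toy_lengths, function(n) matrix(rnorm(n * 3), nrow = n, ncol = 3))
toy_labels <- factor(c("a", "b", "a"))

summary_toy <- describe_mts(toy, toy_labels)
print(summary_toy)  # classes = 2, variables = 3, length = "109-205"
```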

CODES

The code is provided in the Files section. Here is the direct link to the folder. The zip file contains:

R implementation: We use the randomForest package in R, so you need to install R (http://www.r-project.org/) and then install the required package with the command install.packages("randomForest"). We also provide a parallel implementation of SMTS for multicore computers. For the parallel implementation, install the doMC package and set the number of cores to match your machine. Depending on the dataset, the parallel implementation can reduce computation time significantly.
C code and compiled libraries: The codebook generation is implemented in C and called directly from R. The code was compiled on a 64-bit Ubuntu 12.04 system (the *.so file is in the archive). On Windows (32-bit or 64-bit) or 32-bit Linux, you need to compile the C code yourself by running R CMD SHLIB yourpath/mts_functions.c in a command window (Windows) or terminal (Linux). I have recently compiled the code for 64-bit Windows and made the dll file available in the folder. To use it, modify the line of the data preparation script that points to the compiled library, as described below.
Scripts for data preparation and parameter selection: Two R scripts read and prepare the data and select the parameters. The data preparation script also contains a wrapper for the functions implemented in C. On Windows, the R commands that load the compiled library must be changed to point to the *.dll file instead of the *.so file (after compiling the library on Windows).
Example datasets: The GunPoint dataset from the UCR Time Series database and the Libras dataset from the UCI Machine Learning Repository are provided.
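The setup steps above can be sketched in R as follows. This is a hedged illustration rather than code from the SMTS archive: `smts_lib_file`, `load_smts_lib` and `register_backend` are hypothetical helpers, and the library name assumes the R CMD SHLIB default of naming the shared object after the source file (mts_functions.so, or mts_functions.dll on Windows). Building the name from `.Platform$dynlib.ext` is one way to avoid switching between *.so and *.dll by hand.

```r
# Sketch of the setup described above; NOT code from the SMTS archive.
# smts_lib_file(), load_smts_lib() and register_backend() are hypothetical helpers.
# One-time package installation (needs internet access):
# install.packages(c("randomForest", "doMC"))

# Platform-specific file name of the compiled C library:
# .Platform$dynlib.ext is ".so" on Linux/macOS and ".dll" on Windows.
smts_lib_file <- function(dir = ".", base = "mts_functions") {
  file.path(dir, paste0(base, .Platform$dynlib.ext))
}

# Load the compiled library, first building it with
# "R CMD SHLIB mts_functions.c" if no compiled file is present in `dir`.
load_smts_lib <- function(dir = ".") {
  dir <- normalizePath(dir, mustWork = TRUE)  # absolute path survives setwd()
  lib <- smts_lib_file(dir)
  if (!file.exists(lib)) {
    if (!file.exists(file.path(dir, "mts_functions.c"))) {
      stop("neither ", basename(lib), " nor mts_functions.c found in ", dir)
    }
    old <- setwd(dir)                          # R CMD SHLIB writes the library here
    on.exit(setwd(old), add = TRUE)
    status <- system2(file.path(R.home("bin"), "R"),
                      c("CMD", "SHLIB", "mts_functions.c"))
    if (status != 0) stop("R CMD SHLIB failed with exit status ", status)
  }
  dyn.load(lib)
  invisible(lib)
}

# Register a foreach backend for parallel runs. doMC forks worker processes,
# so it is only used on Unix-like systems; otherwise fall back to sequential.
register_backend <- function(cores = NULL) {
  if (is.null(cores)) {
    detected <- parallel::detectCores()
    cores <- if (is.na(detected)) 1L else detected - 1L
  }
  cores <- max(1L, as.integer(cores))
  if (.Platform$OS.type == "unix" && requireNamespace("doMC", quietly = TRUE)) {
    doMC::registerDoMC(cores = cores)
    return(invisible(cores))
  }
  if (requireNamespace("foreach", quietly = TRUE)) foreach::registerDoSEQ()
  invisible(1L)
}

# Example, run from the folder that holds the SMTS code:
# register_backend(4)
# load_smts_lib(".")
```

With a backend registered, one common way to use several cores with randomForest (not necessarily how the SMTS scripts do it) is to grow sub-forests in parallel and merge them, for example foreach(nt = rep(125, 4), .combine = randomForest::combine, .packages = "randomForest") %dopar% randomForest::randomForest(x, y, ntree = nt).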

We also provide our cross-validation code: two R scripts modified to run cross-validation for evaluation purposes. They are available as a zip archive in the same folder.
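The cross-validation scripts themselves are in the archive. As a hedged, generic illustration of k-fold cross-validation (the "10-fold" and "5-fold" entries in the CV column), the base-R sketch below splits the data into k folds, trains on k-1 of them and measures the error on the held-out fold. The function names are hypothetical, stratifying the folds by class is an assumption of this sketch, and the nearest-centroid classifier on R's built-in iris data is only a stand-in for SMTS so that the snippet runs on its own.

```r
# Minimal stratified k-fold cross-validation sketch (NOT the authors' CV scripts).
# make_folds(), cv_error() and nearest_centroid() are hypothetical names.

# Assign each instance to one of k folds, spreading every class evenly.
make_folds <- function(y, k, seed = 1) {
  set.seed(seed)
  folds <- integer(length(y))
  for (cls in levels(y)) {
    idx <- which(y == cls)
    folds[idx] <- sample(rep(seq_len(k), length.out = length(idx)))
  }
  folds
}

# Average of the per-fold error rates; fit_predict(x_train, y_train, x_test)
# must return predicted labels for the rows of x_test.
cv_error <- function(x, y, k, fit_predict, seed = 1) {
  folds <- make_folds(y, k, seed)
  fold_errors <- vapply(seq_len(k), function(f) {
    test <- folds == f
    pred <- fit_predict(x[!test, , drop = FALSE], y[!test], x[test, , drop = FALSE])
    mean(as.character(pred) != as.character(y[test]))
  }, numeric(1))
  mean(fold_errors)
}

# Stand-in classifier so the sketch runs without SMTS: assign each test row
# to the class whose training mean is closest in squared Euclidean distance.
nearest_centroid <- function(x_train, y_train, x_test) {
  x_train <- as.matrix(x_train)
  x_test <- as.matrix(x_test)
  classes <- levels(droplevels(y_train))
  # one row of feature means per class
  centroids <- t(vapply(classes,
                        function(cl) colMeans(x_train[y_train == cl, , drop = FALSE]),
                        numeric(ncol(x_train))))
  # index of the closest centroid for each test row
  nearest <- apply(x_test, 1, function(row) which.min(colSums((t(centroids) - row)^2)))
  classes[nearest]
}

err <- cv_error(iris[, 1:4], iris$Species, k = 10, fit_predict = nearest_centroid)
print(round(err, 3))
```

To evaluate a random forest instead, pass fit_predict = function(xtr, ytr, xte) predict(randomForest::randomForest(xtr, ytr), xte).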

HOW TO RUN SMTS

Please put the datasets and the code in the same folder. The algorithm runs in R (use the R script smts_Main.r to run it). If you have any problems running the code, please contact