Performance improvements for TSBF

Note that this is an old post from 2012-07-19, moved to the new site.

While running TSBF on the new data from the UCR database for our revisions to the paper, I realized that the current R implementation is not efficient. The overall approach is still not implemented well: feature extraction is done separately in C code, and the connection with R is through text files. This significantly increases the time to run TSBF, since reading the files into matrices in R takes substantial time (especially for large datasets). To shorten the time for reading the feature matrices and to use memory more efficiently, I made the following revisions:

1)  Removing matrices that are not used (memory management) illustrated below for some of the matrices.

2)  Reading subsequence features into a matrix using scan (improves memory usage and computation time).
Before (read.table reads into a data.frame, which is not memory-efficient when the data is all numeric; using a matrix instead improves both memory usage and read time):
#read generated features
subtr <- read.table("RFsub_train")
subtst <- read.table("RFsub_test")
After (added two lines of code to the C implementation so that we know the number of subsequences per time series and the number of columns of the feature matrix):
#read subsequence data information and generated features
stats <- scan("stats", n=2, quiet=TRUE) #[1] number of subsequences [2] number of features
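The scan-based read of the feature file itself can be sketched as follows. This is a minimal, self-contained illustration and not the actual TSBF code: the temporary files, the toy 6 x 4 matrix, and the assumption that the feature file holds whitespace-separated numbers with one subsequence per row are all hypothetical stand-ins for the output of the C feature extractor.

```r
# Hypothetical stand-ins for the files written by the C feature extractor.
feature_file <- tempfile("RFsub_train")
stats_file <- tempfile("stats")

# Toy data: 6 rows (subsequences) and 4 columns (features).
toy <- matrix(as.numeric(1:24), nrow = 6, ncol = 4, byrow = TRUE)
write.table(toy, feature_file, row.names = FALSE, col.names = FALSE)

# Assumed layout of the stats file, following the comment in the post:
# [1] number of subsequences, [2] number of features.
cat(6, 4, file = stats_file)

# Read the two summary numbers, then read the features as one numeric
# vector and reshape it row by row into a matrix.
stats <- scan(stats_file, n = 2, quiet = TRUE)
subtr <- matrix(scan(feature_file, quiet = TRUE),
                ncol = stats[2], byrow = TRUE)

dim(subtr)
```

Only the column count is needed for the reshape; byrow = TRUE matters because scan returns the values in file order, that is, one row after another.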
Performance with and without scan on a Windows 7 system with an i5 2.13 GHz processor (feature matrix for the subsequence features of the CinC_ECG_torso dataset, matrix size: 407100 x 102):
system.time(subtst <- read.table("RFsub_test"))
   user  system elapsed
1141.50    6.89 1169.18
#with scan
   user  system elapsed
 116.93    2.48  121.59
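
For step 1, dropping matrices that are no longer needed might look like the sketch below. The object names and the column-means summary are hypothetical and used only for illustration; which matrices can safely be removed depends on the actual TSBF script.

```r
# Hypothetical large intermediate object standing in for a feature matrix.
subtr <- matrix(runif(1e5), ncol = 100)

# Keep only what later steps need (column means here, purely as an example).
col_means <- colMeans(subtr)

# Remove the binding to the large matrix, then trigger a garbage collection
# so the memory can be reclaimed before the next large object is read.
rm(subtr)
invisible(gc())

exists("subtr", inherits = FALSE)  # FALSE once the matrix has been removed
```

R collects garbage on its own, so the explicit gc() call is optional; it mainly helps when the next step immediately allocates another large matrix.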

Please let me know if you have any questions! The direct link to the folder for the updated files is

