## Updated TSPD codes are online

Last updated on Sunday, 29 December 2013 01:02 (first posted on Sunday, 22 December 2013 09:50)

After submitting our paper "Supervised Time Series Pattern Discovery through Local Importance" (TSPD; see the supporting page), we made the code available online. The functions used in TSPD are implemented in my recent R package, LPS, whose source code is available here.

We illustrated the performance of TSPD on classification problems. R code for classification is provided in the Files section. The example uses the GunPoint dataset from the UCR Time Series Database. Below, I go over the steps and explain how to run TSPD for classification.

**1. Loading the packages and setting the parameters:** Assuming that the LPS and randomForest packages are installed, we load them with the require function. The parameters are set as described in the paper, and the same setting is used for all datasets. The comments (after #) give the correspondence between the variables and the TSPD parameters.

```r
require(LPS)
require(randomForest)

# Parameters: the corresponding notation in the paper is given in the comments
nrep=10                         # number of replications in TSPD
treePerIter=50                  # J=J_I=J_P, number of trees
intmaxfrac=0.05                 # I(max), maximum interval length as a fraction of TS length
maxshapefrac=0.25               # used to set L based on a fraction of TS length
kfrac=c(2,1,0.5,0.25)           # K (number of patterns) is set based on levels of the number of training series (N)
ksteps=c(0,100,500,1000,10000)  # N levels for setting K (e.g. if N<100, K=kfrac[1]*N, i.e. K=2N)
```
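The rule behind kfrac and ksteps can be checked in isolation. The helper below is hypothetical (it is not part of LPS); it restates how step 2 derives K (nofpattern) and I(max) (intmaxL) from the number of training series N and the series length, so the values for a given dataset can be inspected before running TSPD.

```r
# Hypothetical helper (not part of LPS): restate the rules used in the script
# to derive K (number of patterns) and I(max) from the training set.
tspd_sizes <- function(N, lenseries,
                       kfrac = c(2, 1, 0.5, 0.25),
                       ksteps = c(0, 100, 500, 1000, 10000),
                       intmaxfrac = 0.05) {
  # findInterval() counts how many break points in ksteps are <= N,
  # i.e. which N level the training set size falls into
  level <- findInterval(N, ksteps)
  if (level > length(kfrac)) {
    stop("N >= ", ksteps[length(ksteps)], " has no matching entry in kfrac")
  }
  list(K = floor(kfrac[level] * N),             # number of patterns
       intmaxL = floor(lenseries * intmaxfrac)) # I(max): maximum interval length
}

tspd_sizes(50, 150)   # GunPoint-sized input: K = 100, intmaxL = 7
```

GunPoint has 50 training series of length 150, which gives K = 100 and I(max) = 7. With these vectors, a training set of 10,000 or more series falls past the last entry of kfrac, so `kfrac[findInterval(noftrain,ksteps)]` in the script would be NA; the helper stops with an error instead.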

**2. Reading the training and test files and setting data-dependent parameters:** This step has three tasks. First, the files are read and the class labels (first column) are extracted. Second, each time series is standardized to zero mean and unit standard deviation so that the results are comparable to the DTW results provided by the UCR Time Series Database. Third, the number of patterns (K) and the maximum interval length (I(max)) are set from the number of training series and the time series length.

```r
# read training data and characteristics
traindata=as.matrix(read.table("GunPoint_TRAIN"))
trainclass=traindata[,1]
noftrain=nrow(traindata)
traindata=t(apply(traindata[,2:ncol(traindata)], 1, function(x) (x-mean(x))/sd(x)))
nofclass=length(unique(trainclass))
lenseries=ncol(traindata)

# read test data and characteristics
testdata=as.matrix(read.table("GunPoint_TEST"))
noftest=nrow(testdata)
testclass=testdata[,1]
testdata=t(apply(testdata[,2:ncol(testdata)], 1, function(x) (x-mean(x))/sd(x)))

nofpattern=floor(kfrac[findInterval(noftrain,ksteps)]*noftrain) # setting K based on N
intmaxL=floor(lenseries*intmaxfrac) # maximum interval length for feature generation
```
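The reading and standardization above can be tried on a toy matrix without the UCR files. The base R helper below is hypothetical (not part of LPS); it assumes the UCR layout used in the script, with the class label in the first column and one series per row, and applies the same per-series z-normalization.

```r
# Hypothetical helper (not part of LPS): split a UCR-style matrix, where the
# first column holds the class label, into labels and z-normalized series.
split_ucr <- function(m) {
  m <- as.matrix(m)
  labels <- m[, 1]
  series <- m[, -1, drop = FALSE]
  # apply() over rows returns one column per series, so transpose back
  z <- t(apply(series, 1, function(x) (x - mean(x)) / sd(x)))
  list(class = labels, series = z)
}

# toy data: two labelled series of length 4
toy <- rbind(c(1, 1, 2, 3, 4),
             c(2, 10, 10, 20, 20))
d <- split_ucr(toy)
rowMeans(d$series)        # both (numerically) zero
apply(d$series, 1, sd)    # both one
```

With the real files this would be called as `split_ucr(read.table("GunPoint_TRAIN"))`. A series with constant values has zero standard deviation and becomes NaN under this normalization (in the script as well), so such series would need special handling.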

**3. Training:** Each replication trains RFint and RFpattern, and this is repeated nrep times. The vote accumulators are initialized once, before the replications; the code for one replication follows. We select a random interval length between 5 and I(max) time units and train RFint on the interval representation. We then sample patterns based on the local importance from RFint, compute the best matching distances of the time series to these patterns, and train RFpattern on this representation.

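```r
# accumulators, created once before the replications
allvotes=matrix(0,noftest, nofclass)      # test votes (noftest x nofclass)
allvotesOOB=matrix(0,noftrain, nofclass)  # out-of-bag votes on the training data
shapeletInfo=list(select=matrix(0,nofpattern,nrep),level=matrix(0,nofpattern,nrep))  # pattern info per replication
```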

```r
# one replication; n is the replication index (n = 1, ..., nrep)
intlen=max(5,floor(runif(1)*intmaxL)+1)  # random interval length (w) between 5 and I(max) (intmaxL)
slidelen=floor(intlen/2)                 # slide step: half the interval length, as described in the paper
maxInt=floor((lenseries*maxshapefrac)/(intlen))+1  # maximum pattern length (from L), in number of intervals

# train RFint on the interval representation
train=intervalFeatures(traindata,intlen,slidelen)
RFint <- randomForest(train$features,factor(trainclass),ntree=treePerIter,localImp=TRUE)
localimp=RFint$localImp

# train RFpattern on the best matching distances to the sampled patterns
shapelet=shapeletSimilarity(traindata,localimp,train,maxInt,nshapelet=nofpattern)
RFpattern=randomForest(shapelet$similarity,factor(trainclass),ntree=treePerIter)
allvotesOOB=allvotesOOB+predict(RFpattern,type='vote')
shapeletInfo$select[,n]=shapelet$sel
shapeletInfo$level[,n]=shapelet$lev
```
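The two central training steps, importance-driven pattern sampling and the best matching distance representation, can be illustrated in base R. This is a sketch under stated assumptions, not the LPS implementation: the function names are hypothetical, sampling draws (feature, case) pairs with probability proportional to the positive part of the local importance matrix (randomForest stores localImp as a p x n matrix, features by cases), and the distance is the generic minimum sliding-window Euclidean distance. shapeletSimilarity in LPS may construct patterns and distances differently (for example, from the interval representation with up to maxInt intervals).

```r
# Hypothetical sketch (not the LPS implementation) of two training steps:
# 1) sample (feature, case) pairs in proportion to positive local importance
# 2) build the N x K matrix of best matching distances to a set of patterns

sample_by_local_importance <- function(localimp, K) {
  # localimp: p x n matrix (features x cases), as in
  # randomForest(..., localImp = TRUE)$localImp
  w <- localimp
  w[w < 0] <- 0                       # negative importance gets zero weight
  if (sum(w) == 0) stop("no positive local importance")
  cells <- sample(length(w), K, replace = TRUE, prob = as.vector(w))
  idx <- arrayInd(cells, dim(w))      # linear index -> (row, column)
  data.frame(feature = idx[, 1], case = idx[, 2])
}

best_match_distance <- function(series, pattern) {
  # smallest Euclidean distance between the pattern and any
  # subsequence of the series with the same length
  L <- length(pattern)
  stopifnot(L <= length(series))
  starts <- seq_len(length(series) - L + 1)
  min(vapply(starts, function(s) {
    sqrt(sum((series[s:(s + L - 1)] - pattern)^2))
  }, numeric(1)))
}

pattern_representation <- function(X, patterns) {
  # X: one series per row; patterns: list of numeric vectors
  # result: one row per series, one column per pattern
  sapply(patterns, function(p) apply(X, 1, best_match_distance, pattern = p))
}

# toy usage
set.seed(1)
imp <- matrix(c(0.0, 0.2, -0.1,
                0.0, 0.0,  0.5), nrow = 3)   # 3 features x 2 cases
sample_by_local_importance(imp, 4)

X <- rbind(c(0, 0, 1, 2, 1, 0, 0),
           c(2, 2, 2, 2, 2, 2, 2))
pattern_representation(X, list(c(1, 2, 1), c(0, 0, 0)))
# row 1: 0 and 1; row 2: sqrt(2) and sqrt(12)
```

In this picture, the matrix returned by pattern_representation plays the role of `shapelet$similarity`, the input on which RFpattern is trained.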

**4. Testing:** Testing requires computing the best matching distances of the test time series to the patterns and classifying them with RFpattern. The votes are accumulated over the replications in the allvotes matrix (of dimensions noftest x nofclass), and each time series is assigned the class with the largest vote. The code for one replication of testing is given below.

```r
test=shapeletSimilarityTest(testdata, traindata, shapelet$importanceOrder, shapeletInfo, train, n)
prediction=predictShapelet(RFpattern, test$similarity, whichTrees=c(1,treePerIter))
allvotes=allvotes+prediction$vote
```
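The final step, taking the largest vote per test series, is not shown in the code above. A minimal base R sketch follows; the helper name is hypothetical, and it assumes that column j of allvotes corresponds to the j-th sorted class label, which is the level order of `factor(trainclass)`.

```r
# Hypothetical helper (not part of LPS): convert accumulated votes into
# predicted labels. Assumes column j of the vote matrix corresponds to the
# j-th sorted class label, i.e. the level order of factor(trainclass).
votes_to_class <- function(votes, classes) {
  # ties are broken in favour of the first (lowest) class
  classes[max.col(votes, ties.method = "first")]
}

# toy votes for three test series over two classes (after all replications)
votes <- rbind(c(7, 3),
               c(2, 8),
               c(5, 5))
pred <- votes_to_class(votes, classes = c(1, 2))   # 1 2 1
truth <- c(1, 1, 1)
mean(pred != truth)                                # error rate 1/3
```

In the script, this would be called after the last replication as `votes_to_class(allvotes, sort(unique(trainclass)))`, and comparing the result with testclass gives the test error rate. `ties.method = "first"` keeps the result deterministic; the default of max.col breaks ties at random.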

**A SAMPLE RUN RESULT**

Below is a screenshot of a sample run of TSPD with 10 replications on the GunPoint dataset (Ubuntu 12.10, 8 GB RAM, dual-core Intel Core i7-3620M CPU at 2.7 GHz):

**Sample patterns found to be important by RFpattern**
