LPS package is updated
Last Updated on Thursday, 03 April 2014 23:24
Monday, 23 December 2013 14:26

I recently updated the Learned Pattern Similarity (LPS) package. The implementation for computing LPS is now faster: training takes roughly half the time of the earlier implementation, and testing time has decreased significantly. The new version will be uploaded to the Files section soon. I am working on a comparison of the current implementation with the new one, and this page will be updated once the experiments are done. Stay tuned!

Updated TSPD codes are online
Last Updated on Sunday, 29 December 2013 01:02
Sunday, 22 December 2013 09:50

After submitting our paper "Supervised Time Series Pattern Discovery through Local Importance" (TSPD) (supporting page), we made the code available online. The functions used in TSPD are implemented as part of my recent R package, LPS. Source code for LPS is available here. We illustrated the performance of TSPD on classification problems, and R code for classification is provided in the Files section. This example uses the GunPoint dataset from the UCR Time Series Database. Here, I go over the steps and explain how to run TSPD for classification.

1. Loading the packages and setting the parameters: Assuming that you have installed the LPS and randomForest packages, we load them with the require function. We set the parameters as described in the paper, using the same parameter setting for all datasets. The comments (after #) describe the correspondence between the variables and the parameters of TSPD.
require(LPS)
require(randomForest)

#Parameters: Corresponding parameter notation in the paper is provided
nrep=10 # number of replications in TSPD
treePerIter=50 # J=J_I=J_P number of trees
intmaxfrac=0.05 # I(max) maximum interval length as a fraction of TS length
maxshapefrac=0.25 # used to set L based on a fraction of TS length
kfrac=c(2,1,0.5,0.25) # K (number of patterns) is set based on certain levels of the number of training series (N)
ksteps=c(0,100,500,1000,10000) # N levels for setting K (i.e. if N<100, K=kfrac[1]*N, which is K=2N)

2. Organization of the training and test files, and setting the parameters that depend on the training dataset: This consists of three main tasks. First, the files are read and the class information is extracted. Second, the time series are standardized to zero mean and unit standard deviation, which makes the results comparable to the DTW results provided by the UCR Time Series Database. Finally, we set the number of patterns and the maximum interval length based on the number of training instances and the time series length.

#read training data and characteristics
traindata=as.matrix(read.table("GunPoint_TRAIN"))
trainclass=traindata[,1]
noftrain=nrow(traindata)
traindata=t(apply(traindata[,2:ncol(traindata)], 1, function(x) (x-mean(x))/sd(x)))
nofclass=length(unique(trainclass))
lenseries=ncol(traindata)

#read test data and characteristics
testdata=as.matrix(read.table("GunPoint_TEST"))
noftest=nrow(testdata)
testclass=testdata[,1]
testdata=t(apply(testdata[,2:ncol(testdata)], 1, function(x) (x-mean(x))/sd(x)))

nofpattern=floor(kfrac[findInterval(noftrain,ksteps)]*noftrain) # setting K based on N
intmaxL=floor(lenseries*intmaxfrac) # maximum interval length for feature generation

3. Training: This consists of training RFint and RFpattern for nrep replications. The code for one replication is given below. We select a random interval length between 5 and I(max) time units and train RFint on the interval representation. We then sample patterns based on the local importance from RFint, compute the best-matching distances of the time series to the patterns, and train RFpattern on this representation.
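To make the dataset-dependent settings of step 2 and the random interval-length draw of step 3 concrete, the small self-contained R sketch below repeats the same arithmetic without the LPS package. It assumes N = 50 training series of length 150 (the size of the GunPoint training set); znorm and toy are hypothetical names introduced only for illustration.

```r
# Sketch of the step 2 and step 3 settings for a GunPoint-sized training set
# (assumed: N = 50 series of length 150). Parameter values are the ones above.
kfrac        <- c(2, 1, 0.5, 0.25)
ksteps       <- c(0, 100, 500, 1000, 10000)
intmaxfrac   <- 0.05
maxshapefrac <- 0.25

noftrain  <- 50
lenseries <- 150

# z-normalization of each row on hypothetical toy data (note the minus sign)
znorm <- function(x) (x - mean(x)) / sd(x)
toy   <- matrix(c(1, 2, 3, 4,
                  10, 20, 30, 40), nrow = 2, byrow = TRUE)
toyz  <- t(apply(toy, 1, znorm))  # apply() returns results as columns, so transpose back

# K: findInterval(50, ksteps) is 1 because 0 <= 50 < 100, hence K = 2 * N
nofpattern <- floor(kfrac[findInterval(noftrain, ksteps)] * noftrain)

# I(max): 5% of the series length, rounded down
intmaxL <- floor(lenseries * intmaxfrac)

# One random draw of the interval length (never below 5) and derived values
set.seed(1)
intlen   <- max(5, floor(runif(1) * intmaxL) + 1)
slidelen <- floor(intlen / 2)
maxInt   <- floor((lenseries * maxshapefrac) / intlen) + 1

cat("K =", nofpattern, "| I(max) =", intmaxL, "| interval length =", intlen,
    "| slide =", slidelen, "| maxInt =", maxInt, "\n")
```

With these values K = 100 and I(max) = 7, so the interval length can only be 5, 6 or 7; the max(5, ...) lower bound dominates whenever I(max) is small.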
Initialize the matrices that store the predictions and the data structure that stores pattern information over replications:

allvotes=matrix(0,noftest, nofclass)
allvotesOOB=matrix(0,noftrain, nofclass)
shapeletInfo=list(select=matrix(0,nofpattern,nrep),level=matrix(0,nofpattern,nrep))
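The allvotes matrix initialized above collects class votes over the nrep replications, and the final label of each test series is the class with the largest accumulated vote. The sketch below illustrates that aggregation with made-up vote fractions for 3 test series, 2 classes and 2 replications. repvotes and classlevels are hypothetical names, and it is an assumption here that column j corresponds to the j-th level of factor(trainclass).

```r
# Sketch of vote aggregation over replications with made-up vote fractions
# (3 test series, 2 classes, 2 replications); not actual TSPD output.
noftest  <- 3
nofclass <- 2
allvotes <- matrix(0, noftest, nofclass)

# Hypothetical vote fractions per replication (rows: series, columns: classes)
repvotes <- list(
  matrix(c(0.8, 0.2,
           0.4, 0.6,
           0.5, 0.5), nrow = 3, byrow = TRUE),
  matrix(c(0.7, 0.3,
           0.3, 0.7,
           0.6, 0.4), nrow = 3, byrow = TRUE)
)
for (n in seq_along(repvotes)) {
  allvotes <- allvotes + repvotes[[n]]
}

# Assumed: column j corresponds to the j-th level of factor(trainclass)
classlevels <- c("1", "2")
predicted <- classlevels[max.col(allvotes, ties.method = "first")]

testclass <- c(1, 2, 2)  # hypothetical true labels
errorrate <- mean(predicted != as.character(testclass))
print(predicted)
print(errorrate)
```

Because allvotes only accumulates, dividing it by nrep would not change the winning class; ties.method = "first" simply makes ties deterministic in this sketch.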
Single replication of TSPD training:

# n is the index of the current replication (1, ..., nrep)
intlen=max(5,floor(runif(1)*intmaxL)+1) # select random interval length (w) between 5 and I(max)
slidelen=floor(intlen/2) # set w=d/2 as described in the paper
maxInt=floor((lenseries*maxshapefrac)/(intlen))+1 # set K level
#train RFint
train=intervalFeatures(traindata,intlen,slidelen)
RFint <- randomForest(train$features,factor(trainclass),ntree=treePerIter,localImp=TRUE)
localimp=RFint$localImp
#train RFpattern
shapelet=shapeletSimilarity(traindata,localimp,train,maxInt,nshapelet=nofpattern)
RFpattern=randomForest(shapelet$similarity,factor(trainclass),ntree=treePerIter)
allvotesOOB=allvotesOOB+predict(RFpattern,type='vote')
shapeletInfo$select[,n]=shapelet$sel
shapeletInfo$level[,n]=shapelet$lev

4. Testing: Testing requires computing the best-matching distances of the test time series to the patterns and classifying them with RFpattern. The votes are aggregated in the allvotes matrix (of dimensions noftest x nofclass), and the class with the largest vote is assigned to each time series. The code for one replication of TSPD testing is given below.
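The best-matching distance used in training and testing can be sketched as the minimum distance between a pattern and every subsequence of a series with the same length. The sketch below assumes raw-value patterns and Euclidean distance, which may differ from what shapeletSimilarity and shapeletSimilarityTest compute internally; best_match_distance is a hypothetical helper, not part of the LPS package.

```r
# Sketch of a best-matching distance: the smallest Euclidean distance between
# a pattern and any subsequence of the series with the same length.
# Assumption: raw values and Euclidean distance (the LPS functions may differ).
best_match_distance <- function(series, pattern) {
  m <- length(pattern)
  n <- length(series)
  stopifnot(m <= n)
  dists <- vapply(seq_len(n - m + 1), function(s) {
    window <- series[s:(s + m - 1)]
    sqrt(sum((window - pattern)^2))
  }, numeric(1))
  min(dists)
}

series  <- c(0, 1, 3, 2, 0, -1)
pattern <- c(3, 2, 0)
d <- best_match_distance(series, pattern)
print(d)  # 0, because the pattern occurs exactly at positions 3 to 5

# One feature column: distances of each series (row) to the same pattern
X <- rbind(series, c(1, 1, 1, 1, 1, 1))
feature <- unname(apply(X, 1, best_match_distance, pattern = pattern))
print(feature)
```

Repeating this for K patterns gives an N x K matrix of distances, which is the kind of representation a forest such as RFpattern is trained on.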
Single replication of TSPD testing:

test=shapeletSimilarityTest(testdata, traindata, shapelet$importanceOrder, shapeletInfo, train, n)
prediction=predictShapelet(RFpattern, test$similarity, whichTrees=c(1,treePerIter))
allvotes=allvotes+prediction$vote

A SAMPLE RUN RESULT

A screenshot of a sample run of TSPD on the GunPoint dataset for 10 replications is provided below (Ubuntu 12.10 system with 8 GB RAM and a dual-core i7-3620M CPU at 2.7 GHz):
Sample patterns found to be important by RFpattern
This output can be compared to the results of other shapelet studies; a Google search for 'GunPoint shapelet' should return some relevant links. The jmotif Google Code homepage has a good summary of the datasets and their descriptions. The discovered patterns match the class descriptions.
Extending the Time Series Bag-of-Features (TSBF) for multivariate time series classification
Last Updated on Tuesday, 29 October 2013 10:24
Tuesday, 29 October 2013 01:00

During the revision of our SMTS paper, we extended TSBF to multivariate time series classification (MTSBF) for comparison purposes. Code is available here. MTSBF performs better than SMTS on some datasets, whereas SMTS significantly outperforms MTSBF on others. This is due to the problem characteristics: when the relationships between the attributes are important in the definition of a class, SMTS generally performs better, with a representation that is quite simple both conceptually and operationally.

This archive (*.zip file) contains the required R files, compiled files (*.so files for Linux-based systems) and source code (in C). If you want to run it on Windows, you need to compile the C files with the command "R CMD SHLIB /pathname" to generate a *.dll (dynamic library). You will also need to modify the file named 'multivariateTSBF_functions.r' accordingly, because in its current form it does not check the operating system and looks for the "*.so" file. We also provide a sample dataset (ECG dataset from http://www.cs.cmu.edu/~bobski/). MTSBF uses the same file structure as SMTS. Let me know if you have any questions.

My seminar on Learned Pattern Similarity at Arizona State University
Last Updated on Saturday, 12 October 2013 19:49
Saturday, 12 October 2013 19:44

On October 11th, 2013, I gave a talk on our recent work "Learned Pattern Similarity (LPS)" at the School of Computing, Informatics and Decision Systems Engineering (CIDSE) at Arizona State University (ASU). The announcement is here. The presentation is available in the Files section. This version is slightly different from (and probably better than) the version presented at the INFORMS conference in Minneapolis (2013). The INFORMS presentation had to be very short because of time limitations, whereas I had enough time for this seminar.

The slides are best viewed in slideshow mode, since they contain animations. Let me know if you have any questions.