Saturday, March 16, 2013

caretEnsemble Classification example

Here's a quick demo of how to fit a binary classification model with caretEnsemble. Please note that I haven't spent as much time debugging caretEnsemble for classification models, so there are probably more bugs than in my last post. Also note that multi-class models are not yet supported.

Wednesday, March 13, 2013

New package for ensembling R models


I've written a new R package called caretEnsemble for creating ensembles of caret models in R.  It currently works well for regression models, and I've written some preliminary support for binary classification models.

Thursday, January 24, 2013

Time series cross-validation 5

The caret package for R now supports time series cross-validation!  (Look for version 5.15-052 in the news file).  You can use the createTimeSlices function to do time-series cross-validation with a fixed window, as well as a growing window.  This function generates a list of indexes for the training set, as well as a list of indexes for the test set, which you can then pass to the trainControl object.
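To make the fixed-window versus growing-window distinction concrete, here is a minimal base-R sketch of the slicing idea. The helper make_time_slices and its argument names are hypothetical and are not caret's implementation; createTimeSlices in caret may differ in its arguments and defaults.

```r
# Hypothetical helper (not caret's code): build lists of training and test
# indexes for time-series cross-validation over a series of length n.
make_time_slices <- function(n, initial_window, horizon = 1, fixed_window = TRUE) {
  stopifnot(n - horizon >= initial_window)
  # Last training index of each slice
  stops <- seq(initial_window, n - horizon)
  # A fixed window slides forward; a growing window always starts at 1
  starts <- if (fixed_window) stops - initial_window + 1 else rep(1, length(stops))
  list(
    train = Map(seq, starts, stops),
    test  = Map(seq, stops + 1, stops + horizon)
  )
}

fixed   <- make_time_slices(n = 10, initial_window = 5, horizon = 2, fixed_window = TRUE)
growing <- make_time_slices(n = 10, initial_window = 5, horizon = 2, fixed_window = FALSE)

fixed$train[[2]]    # second slice trains on observations 2 to 6
growing$train[[2]]  # second slice trains on observations 1 to 6
fixed$test[[2]]     # both variants test on observations 7 and 8
```

The two lists play the role of the training and test index lists described above, one element per resample.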

Friday, July 6, 2012

Error metrics for multi-class problems in R: beyond Accuracy and Kappa

The caret package for R provides a variety of error metrics for regression models and 2-class classification models, but only calculates Accuracy and Kappa for multi-class models. Therefore, I wrote the following function to allow caret::train to calculate a wide variety of error metrics for multi-class problems:
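As a rough illustration of the idea (not the function from the original post), a summary function in this style might look like the sketch below. It assumes caret's convention that a summary function takes (data, lev, model), where data has obs and pred columns, and returns a named numeric vector. The name multiclass_summary and the choice of metrics are assumptions made for the example.

```r
# Hypothetical summary function: macro-averaged metrics from a confusion matrix.
# Assumes data$obs and data$pred are factors with identical levels.
multiclass_summary <- function(data, lev = NULL, model = NULL) {
  stopifnot(identical(levels(data$obs), levels(data$pred)))
  cm <- table(data$pred, data$obs)   # rows: predicted, columns: observed
  tp <- diag(cm)                     # correct predictions per class
  sens <- tp / colSums(cm)           # per-class sensitivity (recall)
  ppv  <- tp / rowSums(cm)           # per-class precision; NaN if a class is never predicted
  c(Accuracy            = sum(tp) / sum(cm),
    Mean_Sensitivity    = mean(sens),
    Mean_Pos_Pred_Value = mean(ppv))
}

# Toy three-class example
lv   <- c("a", "b", "c")
obs  <- factor(c("a", "a", "b", "b", "c", "c"), levels = lv)
pred <- factor(c("a", "b", "b", "b", "c", "a"), levels = lv)
m <- multiclass_summary(data.frame(obs = obs, pred = pred))
print(m)
```

A function with this signature could then be supplied to trainControl through its summaryFunction argument.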

Monday, June 11, 2012

Time series cross-validation 4: forecasting the S&P 500

I finally got around to publishing my time series cross-validation package to GitHub, and I plan to push it out to CRAN shortly.

You can clone the repo using GitHub for Mac or Windows, or from Linux, and then run the following script to check it out:

Monday, January 23, 2012

My first R package: parallel differential evolution

UPDATE: a better parallel algorithm will be included in a future version of DEoptim, so I've removed my package from CRAN. You can still use the code from this post, but keep Josh's comments in mind.


Last night I was working on a difficult optimization problem, using the wonderful DEoptim package for R. Unfortunately, the optimization was taking a long time, so I thought I'd speed it up using a foreach loop, which resulted in the following function:


Here's what's going on: I divide the bounds for each parameter into n segments, and use a foreach loop to run DEoptim on each segment, collect the results of the loop, and then return the optimization results for the segment with the lowest value of the objective function.  Additionally, I defined a "parDEoptim" class to make it easier to combine the results during the foreach loop.  All of the work is still being done by the DEoptim algorithm.  All I've done is split up the problem into several chunks.
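The split-and-collect pattern can be sketched in base R as follows. This is not the package code: lapply stands in for the foreach loop, optim with method "L-BFGS-B" stands in for DEoptim, and make_segments is a hypothetical helper. It also assumes that the i-th segment of every parameter is paired into a single box, which is one reading of the description above.

```r
# Rosenbrock "banana" function; its global minimum is 0 at c(1, 1).
rosenbrock <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2

# Hypothetical helper: cut [lower, upper] into n equal segments per parameter
# and pair the i-th segment of every parameter into one box.
make_segments <- function(lower, upper, n) {
  width <- (upper - lower) / n
  lapply(seq_len(n), function(i) {
    list(lower = lower + (i - 1) * width, upper = lower + i * width)
  })
}

segments <- make_segments(c(-10, -10), c(10, 10), n = 20)

# Stand-ins: lapply replaces foreach, and a box-constrained optim() call
# replaces DEoptim. Each segment is optimized independently.
fits <- lapply(segments, function(s) {
  optim(par = (s$lower + s$upper) / 2, fn = rosenbrock,
        method = "L-BFGS-B", lower = s$lower, upper = s$upper)
})

# Keep the result from the segment with the lowest objective value.
values <- vapply(fits, function(f) f$value, numeric(1))
best <- fits[[which.min(values)]]
best$par
best$value
```

Because each segment is solved independently, the lapply call is the only part that would need to change to run the segments in parallel.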

Here is an example, straight out of the DEoptim documentation:


In theory, on a 20-core machine, this should run a bit faster than the serial example.  Note that you may need to set itermax for the parallel run at a higher value than (itermax for the serial run)/(number of segments), as you want to make sure the algorithm can find the minimum of each segment.  Also note that, in this example, there are 20 segments on the interval c(-10,-10) to c(10,10), which means that 2 of the segments have boundaries at c(1,1), which is the global minimum of the function.  The DEoptim algorithm has no trouble finding a solution at the boundary of the parameter space, which is why it's so easy to parallelize.

Rumor has it that the next version of DEoptim will include foreach parallelization, but if you can't wait until then, I rolled up the above function into an R package and posted it to CRAN.  Let me know what you think!

Thursday, December 29, 2011

Benchmarking time series models

This is a quick post on the importance of benchmarking time-series forecasts. First we need to reload the functions from my last few posts on time-series cross-validation. (I copied the relevant code at the bottom of this post so you don't have to find it).

Next, we need to load data for the S&P 500.  To simplify things, and allow us to explore seasonality effects, I'm going to load monthly data, back to 1980.




Monday, December 12, 2011

Time series cross-validation 3

I've updated my time-series cross-validation algorithm to fix some bugs and allow for a possible xreg term. This allows for cross-validation of multivariate models, so long as they are specified as a function with the following parameters: x (the series to model), xreg (independent variables, optional), newxreg (xregs for the forecast), and h (the number of periods to forecast). Note that h should equal the number of rows in the xreg matrix. Also note that you need to forecast the xreg object BEFORE forecasting your x object. For example, if you wish to forecast 12 months into the future, your xreg object should have 12 extra rows.
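A forecasting function with this four-parameter interface might look like the sketch below. It is an illustration only, built on arima and predict from base R's stats package. The name forecast_with_xreg, the AR(1) order, and the simulated data are all assumptions made for the example, and the check on newxreg reflects one reading of the row-count note above.

```r
# Hypothetical model wrapper with the (x, xreg, newxreg, h) interface.
forecast_with_xreg <- function(x, xreg = NULL, newxreg = NULL, h) {
  # One future row of regressors is needed per forecast period
  if (!is.null(newxreg)) stopifnot(nrow(newxreg) == h)
  fit <- arima(x, order = c(1, 0, 0), xreg = xreg)
  predict(fit, n.ahead = h, newxreg = newxreg)$pred
}

# Simulated example: 120 months of history plus 12 future months of a regressor.
set.seed(1)
n <- 120
h <- 12
xreg_all <- matrix(rnorm(n + h), ncol = 1, dimnames = list(NULL, "driver"))
x <- ts(2 * xreg_all[1:n, 1] + arima.sim(list(ar = 0.5), n = n))

xreg    <- xreg_all[1:n, , drop = FALSE]              # rows aligned with x
newxreg <- xreg_all[(n + 1):(n + h), , drop = FALSE]  # the 12 extra rows

fc <- forecast_with_xreg(x, xreg, newxreg, h)
length(fc)
```

In practice the extra rows of the regressor would themselves have to be forecast or otherwise known in advance; here they are simply simulated.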

Monday, December 5, 2011

A pure R poker hand evaluator

There are already a lot of great posts out there about poker hand evaluators, so I'll keep this short. Kenneth J. Shackleton recently released a very slick 5-card and 7-card poker hand evaluator called SpecialK. This evaluator is licensed under GPL 3, and is described in detail in 2 blog posts: part 1 and part 2. Since the provided code is open source, I felt free to hack around with it a bit, and ported the Python source to R.

Tuesday, November 22, 2011

Time series cross-validation 2

In my previous post, I shared a function for parallel time-series cross-validation, based on Rob Hyndman's code.  I thought I'd expand on that example a little bit, and share some additional wrapper functions I wrote to test other forecasting algorithms.  Before you try this at home, be sure to load the cv.ts and tsSummary functions from my last post.
