caretEnsemble is a package for making ensembles of caret models. You should already be somewhat familiar with the caret package before trying out caretEnsemble.

caretEnsemble has 3 primary functions: caretList, caretEnsemble and caretStack. caretList is used to build lists of caret models on the same training data, with the same re-sampling parameters. caretEnsemble and caretStack are used to create ensemble models from such lists of caret models. caretEnsemble uses greedy optimization to create a simple linear blend of models and caretStack uses a caret model to combine the outputs from several component caret models.

caretList

caretList is a flexible function for fitting many different caret models, with the same resampling parameters, to the same dataset. It returns a convenient list of caret objects which can later be passed to caretEnsemble and caretStack. caretList has almost exactly the same arguments as train (from the caret package), with the exception that the trControl argument comes last. It can handle both the formula interface and the explicit x, y interface to train. As in caret, the formula interface introduces some overhead and the x, y interface is preferred.

caretList has 2 arguments that can be used to specify which models to fit: methodList and tuneList. methodList is a simple character vector of methods that will be fit with the default train parameters, while tuneList can be used to customize the call to each component model and will be discussed in more detail later. First, let's build an example dataset (adapted from the caret vignette):

#Adapted from the caret vignette
library('caret')
library('mlbench')
library('pROC')
data(Sonar)
set.seed(107)
inTrain <- createDataPartition(y = Sonar$Class, p = .75, list = FALSE)
training <- Sonar[ inTrain,]
testing <- Sonar[-inTrain,]
my_control <- trainControl(
  method='boot',
  number=25,
  savePredictions=TRUE,
  classProbs=TRUE,
  index=createResample(training$Class, 25),
  summaryFunction=twoClassSummary
  )

Notice that we are explicitly setting the resampling index to use in trainControl. If you do not set this index manually, caretList will attempt to set it for you automatically, but it's generally a good idea to set it yourself.

Now we can use caretList to fit a series of models (each with the same trControl):

library('rpart')
library('caretEnsemble')
model_list <- caretList(
  Class~., data=training,
  trControl=my_control,
  methodList=c('glm', 'rpart')
  )

(As with train, the formula interface is convenient but introduces more overhead. For large datasets, explicitly passing x and y is preferred; see the sketch after the output below.) We can use the predict function to extract predictions from this object for new data:

p <- as.data.frame(predict(model_list, newdata=head(testing)))
print(p)
glm rpart
0.0000000 0.7794118
0.0000000 0.0882353
0.0000000 0.0882353
0.0000675 0.0882353
0.0000000 0.6666667
0.7654240 0.7794118
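
As noted above, for larger datasets you may prefer the explicit x/y interface. A minimal sketch of the equivalent call (the object names here are just illustrative):

#Same models, fit via the x/y interface instead of the formula interface
x_train <- training[, names(training) != 'Class']
y_train <- training$Class
model_list_xy <- caretList(
  x=x_train, y=y_train,
  trControl=my_control,
  methodList=c('glm', 'rpart')
  )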

If you desire more control over the model fit, use caretModelSpec to construct a list of model specifications for the tuneList argument. This argument can be used to fit several different variants of the same model, and can also be used to pass arguments through train down to the component functions (e.g. trace=FALSE for nnet):

library('mlbench')
library('randomForest')
library('nnet')
model_list_big <- caretList(
  Class~., data=training,
  trControl=my_control,
  metric='ROC',
  methodList=c('glm', 'rpart'),
  tuneList=list(
    rf1=caretModelSpec(method='rf', tuneGrid=data.frame(.mtry=2)),
    rf2=caretModelSpec(method='rf', tuneGrid=data.frame(.mtry=10), preProcess='pca'),
    nn=caretModelSpec(method='nnet', tuneLength=2, trace=FALSE)
  )
)

Finally, you should note that caretList does not support custom caret models. Fitting those models is beyond the scope of this vignette, but if you do fit one, you can manually add it to the model list (e.g. model_list_big[['my_custom_model']] <- my_custom_model). Just be sure to use the same re-sampling indexes in trControl as you use in the caretList models!
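
As a minimal sketch of that append pattern, here using a standard caret method (knn) to stand in for a true custom model; note that it re-uses my_control, so the resampling indexes match:

#Fit one more model outside caretList, using the SAME trainControl
knn_model <- train(
  Class~., data=training,
  method='knn',
  metric='ROC',
  trControl=my_control
  )
#Append it to the list under a name of your choice (illustrative)
model_list_big[['my_knn']] <- knn_model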

caretEnsemble

caretList is the preferred way to construct lists of caret models in this package, as it ensures the resampling indexes are identical across all models. Let's take a closer look at our list of models:

xyplot(resamples(model_list))

As you can see from this plot, these 2 models are fairly un-correlated, and the rpart model is occasionally anti-predictive, with a few re-samples showing AUCs around 0.3 to 0.4.

We can confirm the 2 models' correlation with the modelCor function from caret (caret has a lot of convenient functions for analyzing lists of models):

modelCor(resamples(model_list))
##             glm     rpart
## glm   1.0000000 0.5493246
## rpart 0.5493246 1.0000000

These 2 models make a good candidate for an ensemble: their predictions are fairly un-correlated, but their overall accuracy is similar. We can do a simple, linear greedy optimization on AUC using caretEnsemble:

greedy_ensemble <- caretEnsemble(model_list)
summary(greedy_ensemble)
## The following models were ensembled: glm, rpart 
## They were weighted: 
## 0.38 0.62
## The resulting AUC is: 0.7632
## The fit for each individual model on the AUC is: 
##  method    metric   metricSD
##     glm 0.6972100 0.07499503
##   rpart 0.7156949 0.06109738

The ensemble’s AUC on the training set resamples is 0.76, which is about 7% better than the best individual model. We can confirm this finding on the test set:

library('caTools')
model_preds <- lapply(model_list, predict, newdata=testing, type='prob')
model_preds <- lapply(model_preds, function(x) x[,'M'])
model_preds <- data.frame(model_preds)
ens_preds <- predict(greedy_ensemble, newdata=testing)
model_preds$ensemble <- ens_preds
colAUC(model_preds, testing$Class)
##               glm     rpart  ensemble
## M vs. R 0.6496914 0.6566358 0.6983025

The ensemble’s AUC on the test set is about 6% higher than the best individual model.

We can also use varImp to extract the variable importances from each member of the ensemble, as well as the final ensemble model:

varImp(greedy_ensemble)
variable glm rpart ensemble
V1 1.1333169 0.000000 0.4306604
V2 3.0996403 0.000000 1.1778633
V3 2.8104353 0.000000 1.0679654
V4 2.6724900 0.000000 1.0155462
V5 0.0612158 0.000000 0.0232620
V6 0.9468313 0.000000 0.3597959
V7 1.1944628 0.000000 0.4538959
V8 2.9547546 0.000000 1.1228068
V9 3.2537912 15.416273 10.7945297
V10 0.3012508 11.979705 7.5418927
V11 0.5166883 17.132903 10.8187413
V12 1.3496604 15.256820 9.9720990
V13 0.9620234 10.300580 6.7519283
V14 1.1891755 0.000000 0.4518867
V15 0.3819398 5.776464 3.7265448
V16 0.2577508 7.754766 4.9059005
V17 0.8271323 6.173673 4.1419874
V18 0.3369883 6.022617 3.8620780
V19 3.0106838 0.000000 1.1440598
V20 4.2131978 0.000000 1.6010151
V21 3.5798764 0.000000 1.3603530
V22 2.7322766 0.000000 1.0382651
V23 1.0892770 0.000000 0.4139253
V24 2.2190151 0.000000 0.8432257
V25 1.4311691 0.000000 0.5438443
V26 0.4463702 0.000000 0.1696207
V27 0.5780564 4.186200 2.8151054
V28 0.4616641 0.000000 0.1754324
V29 0.6984145 0.000000 0.2653975
V30 3.3782158 0.000000 1.2837220
V31 4.7123197 0.000000 1.7906815
V32 1.5762180 0.000000 0.5989628
V33 1.0562030 0.000000 0.4013571
V34 3.8445765 0.000000 1.4609391
V35 4.7097589 0.000000 1.7897084
V36 2.8446737 0.000000 1.0809760
V37 0.9291063 0.000000 0.3530604
V38 0.7438946 0.000000 0.2826799
V39 0.6709937 0.000000 0.2549776
V40 0.6935602 0.000000 0.2635529
V41 2.8604649 0.000000 1.0869767
V42 4.5521294 0.000000 1.7298092
V43 0.1451172 0.000000 0.0551445
V44 1.4977606 0.000000 0.5691490
V45 0.2963071 0.000000 0.1125967
V46 0.4392124 0.000000 0.1669007
V47 0.9161909 0.000000 0.3481525
V48 3.1502310 0.000000 1.1970878
V49 2.0082597 0.000000 0.7631387
V50 3.6268111 0.000000 1.3781882
V51 1.6816694 0.000000 0.6390344
V52 0.9801068 0.000000 0.3724406
V53 0.0955147 0.000000 0.0362956
V54 1.7373624 0.000000 0.6601977
V55 0.4035841 0.000000 0.1533620
V56 1.8294611 0.000000 0.6951952
V57 0.3157968 0.000000 0.1200028
V58 1.7584441 0.000000 0.6682088
V59 1.8365071 0.000000 0.6978727
V60 0.0000000 0.000000 0.0000000

(The columns each sum up to 100.)
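
You can check this directly; assuming varImp(greedy_ensemble) returns a data frame laid out like the table above, summing its numeric columns should give roughly 100 for each model:

#Each numeric column (glm, rpart, ensemble) should sum to about 100
vi <- varImp(greedy_ensemble)
colSums(vi[sapply(vi, is.numeric)])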

caretStack

caretStack allows us to move beyond simple blends of models to using “meta-models” to ensemble collections of predictive models. DO NOT use the trainControl object you used to fit the training models to fit the ensemble. The re-sampling indexes will be wrong. Fortunately, you don’t need to be fastidious with re-sampling indexes for caretStack, as it only fits one model, and the defaults train uses will usually work fine:

glm_ensemble <- caretStack(
  model_list, 
  method='glm',
  metric='ROC',
  trControl=trainControl(
    method='boot',
    number=10,
    savePredictions=TRUE,
    classProbs=TRUE,
    summaryFunction=twoClassSummary
  )
)
model_preds2 <- model_preds
model_preds2$ensemble <- predict(glm_ensemble, newdata=testing, type='prob')$M
CF <- coef(glm_ensemble$ens_model$finalModel)[-1]
colAUC(model_preds2, testing$Class)
##               glm     rpart ensemble
## M vs. R 0.6496914 0.6566358 0.695216
CF/sum(CF)
##       glm     rpart 
## 0.3535492 0.6464508

Note that glm_ensemble$ens_model is a regular caret object of class train. The relative model weights from the glm meta-model (glm vs rpart) and the test-set AUC are extremely similar to those from the caretEnsemble greedy optimization.
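
For example, since ens_model is an ordinary train object, the usual inspection tools apply (a quick sanity check, output not shown):

class(glm_ensemble$ens_model)
summary(glm_ensemble$ens_model$finalModel)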

We can also use more sophisticated ensembles than simple linear weights, but these models are much more susceptible to over-fitting, and generally require large sets of resamples to train on (n=50 or higher for bootstrap samples). Let's try one anyway:

library('gbm')
gbm_ensemble <- caretStack(
  model_list, 
  method='gbm',
  verbose=FALSE,
  tuneLength=10,
  metric='ROC',
  trControl=trainControl(
    method='boot',
    number=10,
    savePredictions=TRUE,
    classProbs=TRUE,
    summaryFunction=twoClassSummary
  )
)
model_preds3 <- model_preds
model_preds3$ensemble <- predict(gbm_ensemble, newdata=testing, type='prob')$M
colAUC(model_preds3, testing$Class)
##               glm     rpart  ensemble
## M vs. R 0.6496914 0.6566358 0.6851852

In this case, the sophisticated ensemble is no better than a simple weighted linear combination. Non-linear ensembles seem to work best when you have:

  1. Lots of data.
  2. Lots of models with similar accuracies.
  3. Your models are un-correlated: each one seems to capture a different aspect of the data, and different models perform best on different subsets of the data.