What is boostr? In brief, boostr was designed to be a software “laboratory” of sorts. This package is primarily meant to help you tinker with and evaluate your (boosting) algorithms. In a sense, boostr is here to let you explore and refine.

What is boostr not? boostr is not here to design algorithms / boosting procedures for you. As far as I know, no software can do that (yet). If you don’t have an algorithm to play with but are still interested in this package, don’t worry! In addition to letting you bag your favorite estimators, boostr implements three classical boosting algorithms, with the freedom to mix and match aggregators and reweighters, provided the pair is compatible. For a more thorough look at the various user inputs in the boostr framework, check out this vignette.

Since this is meant to be a “dive right in” kind of vignette, I’m going to assume you are cursorily familiar with the principle behind boosting. In particular, I’m assuming you’ve seen one of the classic boosting algorithms like “AdaBoost”, and have a feel for how boosting might be generalized. If you don’t, check out the paper behind boostr. The paper may feel a bit math-y, but I promise it’s a pretty easy read.
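If that generalization is hazy, the basic shape of the loop boostr abstracts can be sketched in a few lines of R. This is my own toy illustration, not boostr code: `train_fn`, `reweight_fn`, and `aggregate_fn` are hypothetical placeholders for a user-supplied estimation procedure, reweighter, and aggregator.

```r
# A generic boosting loop: B rounds of train -> reweight, then aggregate.
# train_fn, reweight_fn, and aggregate_fn are hypothetical stand-ins for
# an estimation procedure, a reweighter, and an aggregator.
genericBoost <- function(data, B, train_fn, reweight_fn, aggregate_fn) {
  weights <- rep(1 / nrow(data), nrow(data))  # start with uniform weights
  estimators <- vector("list", B)
  for (b in seq_len(B)) {
    estimators[[b]] <- train_fn(data, weights)               # fit to weighted data
    weights <- reweight_fn(estimators[[b]], data, weights)   # emphasize hard cases
  }
  aggregate_fn(estimators)  # combine the B estimators into one
}
```

Every concrete boosting algorithm is then a particular choice of reweighter and aggregator plugged into this skeleton, which is why boostr lets you swap those pieces independently.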

# Diving right in

Let’s say you wanted to boost an svm according to the arc-x4 boosting algorithm. Well, good news: boostr implements this algorithm for you with the boostWithArcX4 function.

library(mlbench)
data(Glass)
set.seed(1234)
boostedSVM1 <- boostr::boostWithArcX4(
  x = list(train = e1071::svm),
  B = 3,
  data = Glass,
  .procArgs = list(
    .trainArgs = list(
      formula = formula(Type ~ .),
      cost = 100)))
## Warning: Walker's alias method used: results are different from R < 2.2.0
boostedSVM1
## A boostr object composed of 3 estimators.
##
## Available performance metrics: oobErr, oobConfMat, errVec
##
## Structure of reweighter output:
## List of 2
##  $ weights: num [1:3, 1:214] 0.00806 0.00491 0.00319 0.00403 0.00491 ...
##  $ m      : num [1:3, 1:214] 1 1 1 0 1 1 1 2 2 1 ...
##
## Performance of Boostr object on Learning set:
## $oobErr
## [1] 0.08879
##
## $oobConfMat
##         oobResponse
## oobPreds  1  2  3  5  6  7
##        1 63  5  1  0  0  1
##        2  6 70  2  0  1  0
##        3  1  1 14  0  0  0
##        5  0  0  0 13  1  0
##        6  0  0  0  0  7  0
##        7  0  0  0  0  0 28
##
## $errVec
##   [1] 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
##  [36] 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [71] 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
## [106] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0
## [141] 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [176] 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [211] 0 0 0 0

In boostr, lists are the de facto data handlers. So, to make sure the boostr interface, boostr::boost, passes the right information to other functions, encapsulate things in named lists. In the example above, we wanted our svm to receive the arguments formula=formula(Type~.) and cost=100, so we put them in a named list called .trainArgs, and put that inside a named list called .procArgs. The naming convention in boostr may seem a bit odd, but the rationale is that a list named .xyzArgs passes its named arguments to the xyz variable in the encapsulating list or function. Hence, since our procedure x is a list with named entry train, we use .trainArgs inside .procArgs to pass arguments to the train component of our procedure (x). Since this may seem a bit weird, let’s look at this exact same situation, but without the convenience function:

set.seed(1234)
boostedSVM2 <- boostr::boost(
  x = list(train = e1071::svm),
  B = 3,
  reweighter = boostr::arcx4Reweighter,
  aggregator = boostr::arcx4Aggregator,
  data = Glass,
  .procArgs = list(
    .trainArgs = list(
      formula = formula(Type ~ .),
      cost = 100)),
  .boostBackendArgs = list(
    .reweighterArgs = list(m = 0)))

boostedSVM2
## A boostr object composed of 3 estimators.
##
## Available performance metrics: oobErr, oobConfMat, errVec
##
## Structure of reweighter output:
## List of 2
##  $ weights: num [1:3, 1:214] 0.00806 0.00491 0.00319 0.00403 0.00491 ...
##  $ m      : num [1:3, 1:214] 1 1 1 0 1 1 1 2 2 1 ...
##
## Performance of Boostr object on Learning set:
## $oobErr
## [1] 0.08879
##
## $oobConfMat
##         oobResponse
## oobPreds  1  2  3  5  6  7
##        1 63  5  1  0  0  1
##        2  6 70  2  0  1  0
##        3  1  1 14  0  0  0
##        5  0  0  0 13  1  0
##        6  0  0  0  0  7  0
##        7  0  0  0  0  0 28
##
## $errVec
##   [1] 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
##  [36] 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [71] 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
## [106] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0
## [141] 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [176] 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [211] 0 0 0 0
identical(boostr::reweighterOutput(boostedSVM1),
          boostr::reweighterOutput(boostedSVM2))
## [1] TRUE
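As an aside, the `m` in the reweighter output above tracks how many of the estimators built so far have misclassified each of the 214 Glass observations; Breiman’s arc-x4 rule makes each observation’s weight proportional to 1 + m^4. Here’s a minimal sketch of that update, written independently of the actual internals of boostr::arcx4Reweighter:

```r
# Sketch of the arc-x4 weight update: m[i] counts how many estimators so
# far have misclassified observation i; the new weight of observation i
# is proportional to 1 + m[i]^4.
arcx4Weights <- function(m) {
  w <- 1 + m^4
  w / sum(w)  # normalize so the weights sum to 1
}

arcx4Weights(c(0, 2, 1, 0))  # observation 2, misclassified twice, dominates
```

The fourth power is what makes arc-x4 aggressive: a single stubbornly misclassified point quickly grabs most of the resampling weight.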

But this was Mickey Mouse-type stuff: boostr already implemented this algorithm for you. What’s really cool about boostr isn’t the implemented algorithms, it’s the total modularity. Check out the documentation for boostr::boost (the package interface) and the extended vignette for more information.