Introduction to the MachineShop Package

Package Version 1.2.0

Brian J Smith

University of Iowa


1 Overview

1.1 Description

MachineShop is a meta-package for statistical and machine learning with a unified interface for model fitting, prediction, performance assessment, and presentation of results. Support is provided for predictive modeling of numerical, categorical, and censored time-to-event outcomes and for resample (bootstrap, cross-validation, and split training-test sets) estimation of model performance. This vignette introduces the package interface with a survival data analysis example, followed by supported methods of variable specification; applications to other response variable types; available performance metrics, resampling techniques, and graphical and tabular summaries; and modeling strategies.

1.2 Features

2 Melanoma Example

The package is illustrated in the following sections with an overall survival analysis example in which the response variable is a time to event outcome. Since survival outcomes are a combination of numerical (time to event) and categorical (event) variables, package features for both variable types will be utilized in the example. Outcomes other than survival, including nominal and ordinal factors as well as numeric vectors and matrices, are supported by MachineShop and will be discussed.

Survival analysis is performed with the Melanoma dataset from the MASS package (Andersen et al. 1993). This dataset provides survival time, in days, from disease treatment to (1) death from disease, (2) alive at end of study, or (3) death from other causes for 205 Denmark patients with malignant melanomas. Also provided are potential predictors of the survival outcomes. The analysis begins by loading required packages MachineShop, survival, and MASS as well as magrittr (Bache and Wickham 2014) for its pipe (%>%) operator to simplify some of the code syntax. For the analysis, a binary overall survival outcome is created by combining the two death categories (1 and 3) into one. The dataset is then split into a training set on which a survival model will be fit and a test set on which predictions will be made. A global formula surv_fo is defined to relate the predictors on the right hand side to the overall survival outcome on the left and will be used in all of the survival models in this vignette.

## Analysis libraries

## Malignant melanoma analysis dataset
surv_df <- within(Melanoma, status <- as.numeric(status != 2))
Table 1. Variable summaries for the Melanoma survival analysis example.
Characteristic Value
Number of subjects 205
Median (Range) 2005 (10, 5565)
1 = Dead 71 (34.63%)
0 = Alive 134 (65.37%)
1 = Male 79 (38.54%)
0 = Female 126 (61.46%)
Median (Range) 54 (4, 95)
Median (Range) 1970 (1962, 1977)
Median (Range) 1.94 (0.1, 17.42)
1 = Presence 90 (43.9%)
0 = Absence 115 (56.1%)

## Training and test sets
train_indices <- sample(nrow(surv_df), nrow(surv_df) * 2 / 3)
surv_train <- surv_df[train_indices, ]
surv_test <- surv_df[-train_indices, ]

## Global formula for the analysis
surv_fo <- Surv(time, status) ~ sex + age + year + thickness + ulcer

3 Model Fitting and Prediction

Model fitting requires user specification of an available model. A named list of MachineShop models can be obtained interactively with the modelinfo function, and includes a descriptive "label", the source "packages" on which the models depend, supported response variable "types", and "arguments" that can be specified in calls to the model functions. Note that in order to use a model the source packages must be installed with the install.packages or equivalent function, but need not be loaded with the library function. Function modelinfo can be called with one or more model functions, function names, function calls, or observed response variables; and will return information on all models matching the calling arguments.

## All available models
modelinfo() %>% names
#>  [1] "AdaBagModel"         "AdaBoostModel"       "BARTModel"          
#>  [4] "BARTMachineModel"    "BlackBoostModel"     "C50Model"           
#>  [7] "CForestModel"        "CoxModel"            "CoxStepAICModel"    
#> [10] "EarthModel"          "FDAModel"            "GAMBoostModel"      
#> [13] "GBMModel"            "GLMBoostModel"       "GLMModel"           
#> [16] "GLMStepAICModel"     "GLMNetModel"         "KNNModel"           
#> [19] "LARSModel"           "LDAModel"            "LMModel"            
#> [22] "MDAModel"            "NaiveBayesModel"     "NNetModel"          
#> [25] "PDAModel"            "PLSModel"            "POLRModel"          
#> [28] "QDAModel"            "RandomForestModel"   "RangerModel"        
#> [31] "RPartModel"          "StackedModel"        "SuperModel"         
#> [34] "SurvRegModel"        "SurvRegStepAICModel" "SVMModel"           
#> [37] "SVMANOVAModel"       "SVMBesselModel"      "SVMLaplaceModel"    
#> [40] "SVMLinearModel"      "SVMPolyModel"        "SVMRadialModel"     
#> [43] "SVMSplineModel"      "SVMTanhModel"        "TreeModel"          
#> [46] "XGBModel"            "XGBDARTModel"        "XGBLinearModel"     
#> [49] "XGBTreeModel"

## Survival-specific models
modelinfo(Surv(0)) %>% names
#>  [1] "BARTModel"           "BlackBoostModel"     "CForestModel"       
#>  [4] "CoxModel"            "CoxStepAICModel"     "GAMBoostModel"      
#>  [7] "GBMModel"            "GLMBoostModel"       "GLMNetModel"        
#> [10] "RangerModel"         "RPartModel"          "StackedModel"       
#> [13] "SuperModel"          "SurvRegModel"        "SurvRegStepAICModel"

## Model-specific information
#> $GBMModel
#> $GBMModel$label
#> [1] "Generalized Boosted Regression"
#> $GBMModel$packages
#> [1] "gbm"
#> $GBMModel$types
#> [1] "factor"  "numeric" "Surv"   
#> $GBMModel$arguments
#> function (distribution = NULL, n.trees = 100, interaction.depth = 1, 
#>     n.minobsinnode = 10, shrinkage = 0.1, bag.fraction = 0.5) 
#> $GBMModel$grid
#> [1] TRUE
#> $GBMModel$varimp
#> [1] TRUE

Information is displayed above for the GBMModel function which is a generalized boosted regression model — a tree-based ensemble method that can be applied to survival outcomes. Package models, like GBMModel can be specified in the model argument of the fit function to estimate a relationship (surv_fo) between predictors and an outcome based on a set of data (surv_train). Argument specifications may be in terms of the model function, function name, or a function call.

## Generalized boosted regression fit

## Model function
surv_fit <- fit(surv_fo, data = surv_train, model = GBMModel)

## Model function name
fit(surv_fo, data = surv_train, model = "GBMModel")

## Model function call
fit(surv_fo, data = surv_train, model = GBMModel(n.trees = 100, interaction.depth = 1))

A predict function is supplied and can be applied to model fit results to obtain values predicted on a dataset specified with its newdata argument or on the original dataset if not specified. Survival means are predicted for survival outcomes by default. Alternatively, a times argument allows for specification of follow-up times at which to obtain predicted survival probabilities (type = "prob") or 0-1 survival events (default: type = "response"). In addition, the cutoff probability for classification of survival events or other binary responses can be set optionally in predict (default: cutoff = 0.5).

## Predicted survival means
predict(surv_fit, newdata = surv_test) %>% head
#> [1] 4264.785 4411.585 3621.750 2069.857 2594.537 3745.281

## Predict survival probabilities and events at specified follow-up times
surv_times <- 365 * c(5, 10)

predict(surv_fit, newdata = surv_test, times = surv_times, type = "prob") %>% head
#> An object of class "SurvProbs"
#>           [,1]      [,2]
#> [1,] 0.9242729 0.8539451
#> [2,] 0.9509326 0.9040459
#> [3,] 0.8019459 0.6424098
#> [4,] 0.4552399 0.2064317
#> [5,] 0.5827357 0.3386677
#> [6,] 0.8262041 0.6819637
#> Slot "times":
#> [1] 1825 3650

predict(surv_fit, newdata = surv_test, times = surv_times, cutoff = 0.5) %>% head
#> An object of class "SurvEvents"
#>      [,1] [,2]
#> [1,]    0    0
#> [2,]    0    0
#> [3,]    0    0
#> [4,]    1    1
#> [5,]    0    1
#> [6,]    0    0
#> Slot "times":
#> [1] 1825 3650

4 Variable Specifications

Variable specification defines the relationship between response and predictor variables as well as the data used to estimate the relationship. Three main types of specifications are supported by the fit, resample, and tune functions: traditional formulas, model frames, and recipes.

4.1 Traditional Formula

Models may be specified with a traditional formula and data frame pair, as was done at the start of the survival example. With this specification, in-line functions, interactions, and . substitution of variables not already appearing in the formula may be included.

4.2 Model Frame

Model frame specification is similar to the traditional formula, except with the formula and data frame pair defined within the ModelFrame class constructor provided by MachineShop.

The model frame approach has a few advantages over the traditional formula. One is that cases with missing values on any of the response or predictor variables are excluded from the model frame by default. This is often desirable for models that do not handle missing values. Conversely, missing values can be retained in the model frame by setting its argument na.action = na.pass for models, like GBMModel, that do handle them. A second advantage is that case weights can be included in the model frame to be passed on to the model fitting functions.

A third, which will be illustrated later, is user-specification of a variable for stratified resampling via the constructor’s strata argument.

4.3 Preprocessing Recipe

The recipes package (Kuhn and Wickham 2018) provides a flexible framework for defining predictor and response variables as well as preprocessing steps to be applied to them prior to model fitting. Using recipes helps ensure that estimation of predictive performance accounts for all modeling step. They are also a convenient way of consistently applying preprocessing to new data. A basic recipe is given below in terms of the formula and data frame ingredients needed for the analysis.

Case weights and stratified resampling are also supported for recipes via the designations of "case_weight" and "case_strata" roles, respectively.

5 Response Variable Types

5.1 Factor Response

Categorical responses with two or more levels should be coded as factor variables for analysis.

5.2 Ordered Factor Response

Ordinal categorical responses should be coded as ordered factor variables. For categorical vectors, this can be accomplished with the factor function and its argument ordered = TRUE or more simply with the ordered function. Numeric vectors can be converted to ordered factors with the cut function.

5.3 Numeric Vector Response

Univariate numerical responses should be coded as numeric variables.

5.4 Numeric Matrix Response

Multivariate numerical responses should be given as numeric matrix variables for model fitting with traditional formulas or model frames.

For recipes, the multiple response variables should be given on the left hand side of the formula specification.

5.5 Survival Response

Survival responses should be coded as Surv variables for model fitting with traditional formulas or model frames.

For recipes, survival outcomes should be specified with the individual survival time and event variables given on the left hand side of the formula and with their roles designated as "surv_time" and "surv_event".

6 Model Performance Metrics

6.1 Performance Function

Performance metrics quantify associations between observed and predicted responses and provide a means of assessing and comparing the predictive performances of models. Metrics can be computed with the performance function applied to observed responses and responses predicted with the predict function. Metrics of observed versus predicted survival probabilities or events will be calculated at each survival time and returned along with their time-integrated mean.

Function performance computes a default set of metrics according to the observed and predicted response types, as indicated in the table below.

Table 2. Default performance metrics by response types.

Response Default Metrics
Factor Brier Score, Accuracy, Cohen’s Kappa
Binary Factor Brier Score, Accuracy, Cohen’s Kappa, Area Under ROC Curve, Sensitivity, Specificity
Numeric Vector or Matrix Root Mean Squared Error, R2, Mean Absolute Error
Survival Means Concordance Index
Survival Probabilities Area Under ROC Curve, Brier Score, Accuracy
Survival Events Accuracy

The defaults may be changed by specifying one or more package-supplied metric functions to the metrics argument of performance. A named list of supplied metric functions can be obtained interactively with the metricinfo function, and includes a descriptive "label", whether to "maximize" the metric for better performance, the function "arguments", and supported response variable "types" for each. Function metricinfo may be called with one or more metric functions, function names, an observed response variable, or an observed and predicted response variable pair; and will return information on all matching metrics.

Specification of the metrics argument can be in terms of a single metric function, function name, or list of metric functions. List names, if specified, will be displayed as metric labels in graphical and tabular summaries; otherwise, the function names will be used as labels for unnamed lists.

Metrics based on classification of two-level class probabilities, like sensitivity and specificity, optionally allow for specification of the classification cutoff probability (default: cutoff = 0.5).

6.2 Factor Response Metrics

Metrics applicable to multi-level factor response variables are summarized below.

Proportion of correctly classified responses.
Brier score.
Cross entropy loss averaged over the number of cases.
Cohen’s kappa statistic measuring relative agreement between observed and predicted classifications.
Weighted Cohen’s kappa. This metric is only available for ordered factor responses.

Brier score and cross entropy loss are computed directly on predicted class probabilities. The other metrics are computed on predicted class membership, defined as the factor level with the highest predicted probability.

6.3 Binary Factor Response Metrics

Metrics for binary factors include those given for multi-level factors as well as the following.

Area under a performance curve.
Concordance index computed as rank order agreement between predicted probabilities for paired event and non-event cases. This metric can be interpreted as the probability that a randomly selected event case will have a higher predicted value than a randomly selected non-event case, and is the same as area under the ROC curve.
F score, \(F_\beta = (1 + \beta^2) \frac{\text{precision} \times \text{recall}}{\beta^2 \times \text{precision} + \text{recall}}\). F1 score \((\beta = 1)\) is the package default.
False negative rate, \(FNR = \frac{FN}{TP + FN} = 1 - TPR\).
False positive rate, \(FPR = \frac{FP}{TN + FP} = 1 - TNR\).
Negative predictive value, \(NPV = \frac{TN}{TN + FN}\).
Table 3. Confusion matrix of observed and predicted response classifications.
Predicted Response
Observed Response
Negative Positive
Negative True Negative (TN) False Negative (FN)
Positive False Positive (FP) True Positive (TP)
ppv, precision
Positive predictive value, \(PPV = \frac{TP}{TP + FP}\).
pr_auc, auc
Area under a precision recall curve.
roc_auc, auc
Area under an ROC curve.
A tradeoff function of sensitivity and specificity as defined by the f argument in this function (default: sensitivity + specificity). The function allows for specification of tradeoffs (Perkins and Schisterman 2006) other than the default of Youden’s J statistic (Youden 1950).
Rate of positive prediction, \(RPP = \frac{TP + FP}{TP + FP + TN + FN}\).
sensitivity, recall, tpr
True positive rate, \(TPR =\frac{TP}{TP + FN} = 1 - FNR\).
specificity, tnr
True negative rate, \(TNR = \frac{TN}{TN + FP} = 1 - FPR\).

Area under the ROC and precision-recall curves are computed directly on predicted class probabilities. The other metrics are computed on predicted class membership. Memberships are defined to be in the second factor level if predicted probabilities are greater than the cutoff value set in the performance function.

6.4 Numeric Response Metrics

Performance metrics are defined below for numeric vector responses. If applied to a numeric matrix response, the metrics are computed separately for each column of the matrix and then averaged to produce a single value.

Gini coefficient.
Mean absolute error, \(MAE = \frac{1}{n}\sum_{i=1}^n|y_i - \hat{y}_i|\), where \(y_i\) and \(\hat{y}_i\) are the \(n\) observed and predicted responses.
Mean squared error, \(MSE = \frac{1}{n}\sum_{i=1}^n(y_i - \hat{y}_i)^2\).
Mean squared log error, \(MSLE = \frac{1}{n}\sum_{i=1}^n(log(1 + y_i) - log(1 + \hat{y}_i))^2\).
One minus residual divided by total sums of squares, \(R^2 = 1 - \sum_{i=1}^n(y_i - \hat{y}_i)^2 / \sum_{i=1}^n(y_i - \bar{y})^2\).
Square root of mean squared error.
Square root of mean squared log error.

6.5 Survival Reseponse Metrics

All previously described metrics for binary factor responses—plus accuracy, Brier score and Cohen’s kappa—are applicable to survival probabilities predicted at specified follow-up times. Metrics are evaluated separately at each follow-up time and reported along with a time-integrated mean. The survival concordance index is computed with the method of Harrell (1982) and Brier score according to Graf et al. (1999); whereas, the others are computed according to the confusion matrix probabilities below, in which term \(\hat{S}(t)\) is the predicted survival probability at follow-up time \(t\) and \(T\) is the survival time (Heagerty, Lumley, and Pepe 2004).

Table 4. Confusion matrix of observed and predicted survival response classifications.
Predicted Response
Observed Response
Non-Event Event
Non-Event \(TN = \Pr(\hat{S}(t) \gt \text{cutoff} \cap T \ge t)\) \(FN = \Pr(\hat{S}(t) \gt \text{cutoff} \cap T \lt t)\)
Event \(FP = \Pr(\hat{S}(t) \le \text{cutoff} \cap T \ge t)\) \(TP = \Pr(\hat{S}(t) \le \text{cutoff} \cap T \lt t)\)

In addition, all of the metrics described for numeric vector responses are applicable to predicted survival means and are computed using only those cases with observed (non-censored) events.

7 Resample Estimation of Model Performance

7.1 Resampling Algorithms

Model performance can be estimated with resampling methods that simulate repeated training and test set fits and predictions. With these methods, performance metrics are computed on each resample to produce an empirical distribution for inference. Resampling is controlled in the MachineShop with the functions:

Simple bootstrap resampling. Models are repeatedly fit with bootstrap resampled training sets and used to predict the full data set.
Repeated K-fold cross-validation. The full data set is repeatedly partitioned into K-folds. For a given partitioning, prediction is performed on each of the K folds with models fit on all remaining folds.
Out-of-bootstrap resampling. Models are fit with bootstrap resampled training sets and used to predict the unsampled cases.
Split training and test sets. The data are randomly partitioned into a training and test set.
Training resubstitution. A model is fit on and used to predict the full training set in order to estimate training, or apparent, error.

For the survival example, repeated cross-validation control structures are defined to estimate model performance in predicting survival means and 5 and 10-year survival probabilities. In addition to arguments controlling the resampling algorithms, a seed can be set to ensure reproducibility of resampling results obtained with the structures.

7.2 Parallel Resampling

Resampling is implemented with the foreach package (Microsoft and Weston 2017) and will run in parallel if a compatible backend is loaded, such as that provided by the doParallel package (Corporation and Weston 2018).

7.3 Resample Function

Resampling is performed by calling the resample function with a variable specification, model, and control structure. Like the fit function, variables may be specified in terms of a traditional formula, model frame, or recipe. Summary statistics and plots of resample output can be obtained with the summary and plot functions.

The summary function when applied directly to output from resample computes default performance metrics as described in the Performance Function section. Likewise, the metricinfo and performance functions can be applied to the output in order to list and compute applicable metrics.

7.4 Stratified Resampling

Stratification of cases for the construction of resampled training and test sets can be employed to help achieve balance across the sets. Stratified resampling is automatically performed if variable specification is in terms of a traditional formula and will be done according to the response variable if a numeric vector or factor, the event variable if survival, and the first variable if a numeric matrix. For model frames and recipes, stratification variables must be defined explicitly with the strata argument to the ModelFrame constructor or with the "case_strata" role designation in a recipe step.

7.5 Resampling for Model Comparisons

Resampled metrics from different models can be combined for comparison with the Resamples function. Optional names given on the left hand side of equal operators within calls to Resamples will be used as labels in output from the summary and plot functions. For comparisons of resampled output, the same control structure must be used in all associated calls to resample to ensure that resulting model metrics are computed on the same resampled training and test sets.

Pairwise model differences for each metric can be calculated with the diff function applied to results from a call to Resamples. The differences can be summarized descriptively with the summary and plot functions and assessed for statistical significance with the t.test function.

8 Graphical and Tabular Performance Summaries

8.1 Variable Importance

The importance of variables in a model fit is estimated with the varimp function and plotted with plot. Variable importance is a measure of the relative importance of predictors in a model and has a default range of 0 to 100, where 0 denotes the least important variables and 100 the most.

8.2 Calibration Curves

Agreement between model-predicted and observed values can be visualized with calibration curves. In the construction of these curves, cases are partitioned into bins according to their (resampled) predicted responses. Mean observed responses are then calculated within each of the bins and plotted on the vertical axis against the bin midpoints on the horizontal axis. An option to produce curves smoothed over the individual predicted values is also provided. Calibration curves that are close to the 45-degree line indicate close agreement between observed and predicted responses and a model that is said to be well calibrated.

8.3 Confusion Matrices

Confusion matrices of cross-classified observed and predicted factor responses are available with the confusion function. They can be constructed with predicted class membership or with predicted class probabilities. In the latter case, predicted class membership is derived from predicted probabilities according to a probability cutoff value for binary factors and according to the class with highest probability for factors with more than two levels. Performance metrics, such as those described earlier for binary factors, can be computed with the performance function and summarized with summary and plot.

8.4 Partial Dependence Plots

Partial dependence plots display the marginal effects of predictors on the response variable. The response scale displayed in the plots will depend on the response type: probability for factors and predicted survival probabilities, original scale for numerics, and survival time for predicted survival means.

8.5 Performance Curve Analysis

Tradeoffs between correct and incorrect classifications of binary outcomes, across the range of possible cutoff probabilities, can be studied with performance curves.

8.5.1 ROC Curves

Receiver operating characteristic (ROC) curves are one example in which true positive rates (sensitivity) are plotted against false positive rates (1 - specificity). Area under resulting ROC curves can be computed as an overall measure of model predictive performance and interpreted as the probability that a randomly selected event case will have a higher predicted value than a randomly selected non-event case.

8.5.2 Precision Recall Curves

In general, any two binary response metrics may be specified for the construction of a performance curve. Precision recall curves are another example.

8.5.3 Lift Curves

Lift curves depict the rate at which observed binary responses are identifiable from (resampled) predicted response probabilities. In particular, they plot the true positive findings (sensitivity) against the positive test rates for all possible classification probability cutoffs. Accordingly, a lift curve can be interpreted as the rate at which positive responses are found as a function of the positive test rate among cases.

9 Modeling Strategies

9.1 Model Tuning

Many of the modeling functions have arguments, or parameters, that control aspects of their model fitting algorithms. For example, GBMModel parameters n.trees and interaction.depth control the number of decision trees to fit and the maximum depth of variable interactions. The tune function performs model fitting over a grid of parameter values and returns the model with the most optimal values. Optimality is determined based on the first performance metric supplied to the metrics argument of tune. Furthermore, argument grid controls the construction of grid values and can be a single numeric value giving the grid length in each parameter dimension, a call to Grid with the grid length and number of grid points to sample at random, or a user-specified data frame of grid points. Summary statistics and plots of resulting performances across all metrics and tuning parameters can be obtained with the summary and plot functions.

## Tune over automatic grid of model parameters
(surv_tune <- tune(surv_fo, data = surv_df, model = GBMModel,
                   grid = 3,
                   control = surv_means_control,
                   metrics = c("CIndex" = cindex, "RMSE" = rmse)))
#> An object of class "MLModelTune"
#> Model name: GBMModel
#> Label: Generalized Boosted Regression
#> Packages: gbm
#> Response types: factor, numeric, Surv
#> Parameters:
#> $n.trees
#> [1] 50
#> $interaction.depth
#> [1] 1
#> $n.minobsinnode
#> [1] 10
#> $shrinkage
#> [1] 0.1
#> $bag.fraction
#> [1] 0.5
#> Grid:
#>   n.trees interaction.depth
#> 1      50                 1
#> 2     100                 1
#> 3     150                 1
#> 4      50                 2
#> 5     100                 2
#> 6     150                 2
#> 7      50                 3
#> 8     100                 3
#> 9     150                 3
#> An object of class "Performance"
#> Metrics: CIndex, RMSE 
#> Models: GBMModel.1, GBMModel.2, GBMModel.3, GBMModel.4, GBMModel.5, GBMModel.6, GBMModel.7, GBMModel.8, GBMModel.9 
#> Selected (CIndex): GBMModel.1

#> , , CIndex
#>                 Mean    Median         SD       Min       Max NA
#> GBMModel.1 0.7058983 0.6883562 0.06256998 0.6300000 0.8359788  0
#> GBMModel.2 0.6941198 0.6762295 0.05933365 0.6244541 0.8227513  0
#> GBMModel.3 0.6850918 0.6745843 0.06864062 0.5805085 0.8174603  0
#> GBMModel.4 0.6961474 0.6886792 0.05489630 0.6207627 0.8028504  0
#> GBMModel.5 0.6828937 0.6792453 0.06191098 0.5677966 0.8015873  0
#> GBMModel.6 0.6785267 0.6933962 0.06486174 0.5868644 0.7885986  0
#> GBMModel.7 0.6925677 0.6721311 0.05600214 0.6165254 0.8004751  0
#> GBMModel.8 0.6778869 0.6650943 0.05688398 0.5889831 0.7838480  0
#> GBMModel.9 0.6709213 0.6745283 0.06440546 0.5805085 0.8028504  0
#> , , RMSE
#>                Mean   Median       SD      Min      Max NA
#> GBMModel.1 2641.262 2647.588 329.4775 2015.310 3229.041  0
#> GBMModel.2 2729.431 2716.819 351.6683 2099.813 3260.165  0
#> GBMModel.3 2762.201 2721.302 388.5256 2059.193 3346.956  0
#> GBMModel.4 2739.729 2737.601 356.4921 2010.709 3245.199  0
#> GBMModel.5 2862.102 2903.612 391.1429 2132.708 3533.432  0
#> GBMModel.6 2933.915 2954.276 392.4251 2096.266 3482.183  0
#> GBMModel.7 2803.999 2830.390 350.3109 2026.888 3290.263  0
#> GBMModel.8 2976.085 3023.242 382.5607 2184.120 3517.882  0
#> GBMModel.9 3075.965 3143.521 413.9979 2161.720 3668.848  0

plot(surv_tune, type = "line")

The return value of tune is a model object with the optimal tuning parameters and not a model fit object. The returned model can be fit subsequently to a set of data with the fit function.

9.2 Model Selection

Model selection can be performed with the tune function to select from any combination of models and model parameters. It has as a special case the just-discussed tuning of a single model over a grid of parameter values. In general, a list containing any combination of model functions, function names, and function calls can be supplied to the models argument of tune to perform model selection. An expand.model helper function is additionally provided to expand a model over a grid of tuning parameters for inclusion in the list if so desired. In this general form of model selection, the grid argument discussed previously for grid tuning is not used.

9.3 Ensemble Models

Ensemble methods combine multiple base learning algorithms as a strategy to improve predictive performance. Two ensemble methods implemented in Machineshop are stacked regression (Breiman 1996) and super learners (Lann and Hubbard 2007). Stacked regression fits a linear combination of resampled predictions from specified base learners; whereas, super learners fit a specified model, such as GBMModel, to the base learner predictions and optionally also to the original predictor variables. Illustrated below is a performance evaluation of stacked regression and a super learner fit to gradient boosted, random forest, and Cox regression base learners. In the second case, a separate gradient boosted model is used as the super learner.

10 User-Defined Models and Metrics

Custom models and metrics can be defined with the MLModel and MLMetric constructors for use with the model fitting, prediction, and performance assessment tools provided by the package.

## Logistic regression model
LogisticModel <- MLModel(
  name = "LogisticModel",
  types = "binary",
  fit = function(formula, data, weights, ...) {
    glm(formula, data = data, weights = weights, family = binomial, ...)
  predict = function(object, newdata, ...) {
    predict(object, newdata = newdata, type = "response")
  varimp = function(object, ...) {
    pchisq(coef(object)^2 / diag(vcov(object)), 1)

## F2 score metric
f2_score <- MLMetric(
  function(observed, predicted, ...) {
    f_score(observed, predicted, beta = 2, ...)
  name = "f2_score",
  label = "F2 Score",
  maximize = TRUE

res <- resample(type ~ ., data =, model = LogisticModel)
summary(performance(res, metric = f2_score))
#>               Mean    Median        SD       Min       Max NA
#> f2_score 0.5697769 0.6060606 0.1924873 0.1666667 0.8571429  0

11 Appendix

11.1 Model Constructor Functions

Table A1. Package-supplied model constructor functions and supported response variable types.
Response Variable Types
Function Label Categorical1 Continuous2 Survival3
AdaBagModel Bagging with Classification Trees f
AdaBoostModel Boosting with Classification Trees f
BARTModel Bayesian Additive Regression Trees f n S
BARTMachineModel Bayesian Additive Regression Trees b n
BlackBoostModel Gradient Boosting with Regression Trees b n S
C50Model C5.0 Classification f
CForestModel Conditional Random Forests f n S
CoxModel Cox Regression S
CoxStepAICModel Cox Regression (Stepwise) S
EarthModel Multivariate Adaptive Regression Splines f n
FDAModel Flexible Discriminant Analysis f
GAMBoostModel Gradient Boosting with Additive Models b n S
GBMModel Generalized Boosted Regression f n S
GLMBoostModel Gradient Boosting with Linear Models b n S
GLMModel Generalized Linear Models b n
GLMStepAICModel Generalized Linear Models (Stepwise) b n
GLMNetModel Lasso and Elastic-Net f m, n S
KNNModel K-Nearest Neighbors Model f, o n
LARSModel Least Angle Regression n
LDAModel Linear Discriminant Analysis f
LMModel Linear Model f m, n
MDAModel Mixture Discriminant Analysis f
NaiveBayesModel Naive Bayes Classifier f
NNetModel Feed-Forward Neural Networks f n
PDAModel Penalized Discriminant Analysis f
PLSModel Partial Least Squares f n
POLRModel Ordered Logistic Regression o
QDAModel Quadratic Discriminant Analysis f
RandomForestModel Random Forests f n
RangerModel Fast Random Forests f n S
RPartModel Recursive Partitioning and Regression Trees f n S
StackedModel Stacked Regression f, o m, n S
SuperModel Super Learner f, o m, n S
SurvRegModel Parametric Survival S
SurvRegStepAICModel Parametric Survival (Stepwise) S
SVMModel Support Vector Machines f n
SVMANOVAModel Support Vector Machines (ANOVA) f n
SVMBesselModel Support Vector Machines (Bessel) f n
SVMLaplaceModel Support Vector Machines (Laplace) f n
SVMLinearModel Support Vector Machines (Linear) f n
SVMPolyModel Support Vector Machines (Poly) f n
SVMRadialModel Support Vector Machines (Radial) f n
SVMSplineModel Support Vector Machines (Spline) f n
SVMTanhModel Support Vector Machines (Tanh) f n
TreeModel Regression and Classification Trees f n
XGBModel Extreme Gradient Boosting f n
XGBDARTModel Extreme Gradient Boosting (DART) f n
XGBLinearModel Extreme Gradient Boosting (Linear) f n
XGBTreeModel Extreme Gradient Boosting (Tree) f n
1 b = binary factor, f = factor, o = ordered factor
2 m = matrix, n = numeric
3 S = Surv

11.2 Metric Functions

Table A2. Package-supplied performance metric functions and supported response variable types.
Response Variable Types
Function Label Categorical1 Continuous2 Survival3
accuracy Accuracy f S
auc Area Under Performance Curve b S
brier Brier Score f S
cindex Concordance Index b S
cross_entropy Cross Entropy f
f_score F Score b S
fnr False Negative Rate b S
fpr False Positive Rate b S
gini Gini Coefficient m, n S
kappa2 Cohen’s Kappa f S
mae Mean Absolute Error m, n S
mse Mean Squared Error m, n S
msle Mean Squared Log Error m, n S
npv Negative Predictive Value b S
ppv Positive Predictive Value b S
pr_auc Area Under Precision-Recall Curve b S
precision Precision b S
r2 Coefficient of Determination m, n S
recall Recall b S
rmse Root Mean Squared Error m, n S
rmsle Root Mean Squared Log Error m, n S
roc_auc Area Under ROC Curve b S
roc_index ROC Index b S
rpp Rate of Positive Prediction b S
sensitivity Sensitivity b S
specificity Specificity b S
tnr True Negative Rate b S
tpr True Positive Rate b S
weighted_kappa2 Weighted Cohen’s Kappa o
1 b = binary factor, f = factor, o = ordered factor
2 m = matrix, n = numeric
3 S = Surv


Andersen, PK, O Borgan, RD Gill, and N Keiding. 1993. Statistical Models Based on Counting Processes. New York: Springer.

Bache, Stefan Milton, and Hadley Wickham. 2014. Magrittr: A Forward-Pipe Operator for R.

Breiman, L. 1996. “Stacked Regression.” Machine Learning 24: 49–64.

Corporation, Microsoft, and Steve Weston. 2018. DoParallel: Foreach Parallel Adaptor for the ’Parallel’ Package.

Graf, E, C Schmoor, W Sauerbrei, and M Schumacher. 1999. “Assessment and Comparison of Prognostic Classification Schemes for Survival Data.” Statistics in Medicine 18 (17–18): 2529–45.

Harrell, FE, RM Califf, DB Pryor, KL Lee, and RA Rosati. 1982. “Evaluating the Yield of Medical Tests.” JAMA 247 (18): 2543–6.

Heagerty, PJ, T Lumley, and MS Pepe. 2004. “Time-Dependent Roc Curves for Censored Survival Data and a Diagnostic Marker.” Biometrics 56 (2): 337–44.

Kuhn, Max, and Hadley Wickham. 2018. Recipes: Preprocessing Tools to Create Design Matrices.

Lann, MJ van der, and AE Hubbard. 2007. “Super Learner.” Statistical Applications in Genetics and Molecular Biology 6 (1).

Microsoft, and Steve Weston. 2017. Foreach: Provides Foreach Looping Construct for R.

Perkins, Neil J., and Enrique F. Schisterman. 2006. “The Inconsistency of "Optimal" Cutpoints Obtained Using Two Criteria Based on the Receiver Operating Characteristic Curve.” American Journal of Epidemiology 163 (7): 670–75.

Youden, WJ. 1950. “Index for Rating Diagnostic Tests.” Cancer 3 (1): 32–35.