modelStudio - perks and features

Hubert Baniecki

2020-03-13

modelStudio::modelStudio computes various (instance and dataset level) model explanations and produces an interactive, customisable dashboard made with D3.js. It consists of multiple panels for plots with their short descriptions. Easily save and share the dashboard with others. Tools for model exploration unite with tools for EDA (Exploratory Data Analysis) to give a broad overview of the model behavior.

Let’s use DALEX::HR dataset to explore modelStudio parameters:

train <- DALEX::HR[1:100,]
train$fired <- ifelse(train$status == "fired", 1, 0)
train <- train[,-6]

head(train)
DALEX::HR dataset
gender age hours evaluation salary fired
male 32.58 41.89 3 1 1
female 41.21 36.34 2 5 1
male 37.71 36.82 3 0 1
female 30.06 38.96 3 2 1
male 21.10 62.15 5 3 0
male 40.12 69.54 2 0 1

Prepare data and model for the explainer:

# create a random forest model
library("randomForest")
model <- randomForest(fired ~., data = train)

# prepare validation dataset
test <- DALEX::HR_test[1:100,]
test$fired <- ifelse(test$status == "fired", 1, 0)
test <- test[,-6]

# create an explainer
explainer <- DALEX::explain(model = model,
                            data = test[,-6],
                            y = test[,6],
                            verbose = FALSE)

# start modelStudio
library("modelStudio")

modelStudio parameters

instance explanations

Pass data points to new_observation parameter for instance explanations such as Break Down, Shapley Values and Ceteris Paribus Profiles. Use new_observation_y to show their true labels.

grid size

Achieve bigger or smaller modelStudio grid with facet_dim parameter.

animations

Manipulate time parameter to set animation length. Value 0 will make them invisible.

more calculations means more time

N is a number of observations used for calculation of partial and accumulated dependence profiles. B is a number of random paths used for calculation of shapley values profiles. Decrease N and B parameters to lower computation time or increase them to get more accurate empirical results.

no EDA mode

Don’t compute the EDA plots if they are not needed. Set the eda parameter to FALSE.

progress bar

Hide computation progress bar messages with show_info parameter.

viewer or browser?

Change viewer parameter to set where to display modelStudio. Best described here: r2d3 viewer argument.


parallel computation

Speed up modelStudio computation by setting parallel parameter to TRUE. It uses parallelMap package to calculate local explainers faster. It is really useful when using modelStudio with complicated models, vast datasets or simply many observations are being processed.

All options can be set outside of function call. More on that here.

#set up the cluster 
options(
  parallelMap.default.mode        = "socket",
  parallelMap.default.cpus        = 4,
  parallelMap.default.show.info   = FALSE
)

# calculations will be distributed into 4 cores
modelStudio(explainer, new_observation = test[1:16,], parallel = TRUE)

plot options

Customize some of modelStudio looks by overwriting default options returned by modelStudioOptions().

# set additional graphical parameters
new_options <- modelStudioOptions(
  show_subtitle = TRUE,
  bd_subtitle = "Hello World",
  line_size = 5,
  point_size = 9,
  line_color = "pink",
  point_color = "purple",
  bd_positive_color = "yellow",
  bd_negative_color = "orange" 
)

modelStudio(explainer, new_observation, options = new_options)

R mlr model

Use DALEXtra::explain_*() functions to explain various models. Bellow basic example of making modelStudio for mlr model using DALEXtra::explain_mlr()

library(DALEXtra)
library(mlr)

# prepare mlr model
task <- mlr::makeRegrTask(id = "task",
                          data = train,
                          target = "fired")

learner <- mlr::makeLearner("regr.randomForest",
                            par.vals = list(ntree = 300),
                            predict.type = "response")

model <- mlr::train(learner, task)

# create an explainer for mlr model
explainer_mlr <- explain_mlr(model, data = test[,-6], y = test[,6], label = "mlr")

# call model studio for mlr model
modelStudio(explainer_mlr,
            new_observation,
            N = 100, B = 10)

Python scikit-learn model

Bellow basic example of making modelStudio for scikit-learn model using DALEXtra::explain_scikitlearn() and Python Anaconda. Save your Python model as a .pkl file and explain it in R with few lines of code. Here we use a demo model saved in DALEXtra package.

library(DALEXtra)
library(modelStudio)

titanic_test <- read.csv(system.file("extdata", "titanic_test.csv", package = "DALEXtra")) 
titanic_train <- read.csv(system.file("extdata", "titanic_train.csv", package = "DALEXtra"))

# read scikitlearn model
yml <- system.file("extdata", "scikitlearn.yml", package = "DALEXtra")
pkl_gbm <- system.file("extdata", "scikitlearn.pkl", package = "DALEXtra")
pkl_SGDC <- "SGDC.pkl"

# prepare an explainer for scikitlearn model
explainer_scikit <- explain_scikitlearn(pkl_gbm,
                                        yml = yml,
                                        data = titanic_test[,1:17],
                                        y = titanic_test[,18],
                                        label = "python-gbm")

# start model studio
ms <- modelStudio(explainer_scikit,
                  new_observation = titanic_test[1:4,1:17],
                  options = modelStudioOptions(margin_left = 160))

ms

r2d3::save_d3_html(ms, "modelStudio-python-gbm.html")

modelStudio should work with any explainer class object. Find more about making those here.

References