modelStudio::modelStudio() computes instance-level and dataset-level model explanations and produces an interactive, customisable dashboard made with D3.js. The dashboard consists of multiple panels for plots, each with a short description, and can easily be saved and shared with others. Tools for model exploration are combined with tools for EDA (Exploratory Data Analysis) to give a broad overview of the model behavior.
Let’s use the DALEX::HR dataset to explore modelStudio parameters:
train <- DALEX::HR[1:100,]
train$fired <- ifelse(train$status == "fired", 1, 0)
train <- train[,-6]
head(train)
gender | age | hours | evaluation | salary | fired |
---|---|---|---|---|---|
male | 32.58 | 41.89 | 3 | 1 | 1 |
female | 41.21 | 36.34 | 2 | 5 | 1 |
male | 37.71 | 36.82 | 3 | 0 | 1 |
female | 30.06 | 38.96 | 3 | 2 | 1 |
male | 21.10 | 62.15 | 5 | 3 | 0 |
male | 40.12 | 69.54 | 2 | 0 | 1 |
Prepare data and model for the explainer:
# create a random forest model
library("randomForest")
model <- randomForest(fired ~., data = train)
# prepare validation dataset
test <- DALEX::HR_test[1:100,]
test$fired <- ifelse(test$status == "fired", 1, 0)
test <- test[,-6]
# create an explainer
explainer <- DALEX::explain(model = model,
                            data = test[,-6],
                            y = test[,6],
                            verbose = FALSE)
# start modelStudio
library("modelStudio")
Pass data points to the new_observation parameter for instance explanations such as Break Down, Shapley Values and Ceteris Paribus Profiles. Use new_observation_y to show their true labels.
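As a minimal sketch of the two parameters above, the snippet below rebuilds a small explainer and passes three observations together with their true labels. The helper prep_hr(), the glm (used instead of the random forest only to keep the example light) and the row names are illustrative assumptions, not part of the original example.

```r
library("DALEX")
library("modelStudio")

# prep_hr() is a hypothetical helper: adds a binary target and drops the status column
prep_hr <- function(df) {
  df$fired <- ifelse(df$status == "fired", 1, 0)
  df[, -6]
}
train <- prep_hr(DALEX::HR[1:100, ])
test  <- prep_hr(DALEX::HR_test[1:100, ])

# assumption: a glm stands in for the random forest used elsewhere in this text
model <- glm(fired ~ ., data = train, family = "binomial")
explainer <- DALEX::explain(model, data = test[, -6], y = test[, 6],
                            verbose = FALSE)

# three observations (features only) and their true labels
new_observation <- test[1:3, -6]
rownames(new_observation) <- c("employee_1", "employee_2", "employee_3")
true_labels <- test[1:3, 6]

ms <- modelStudio(explainer,
                  new_observation = new_observation,
                  new_observation_y = true_labels,
                  show_info = FALSE)
# printing `ms` opens the dashboard
```

The row names of new_observation are assumed here to be what the dashboard shows when choosing between observations, so short, readable names are worth setting.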
Make the modelStudio grid bigger or smaller with the facet_dim parameter.
Manipulate the time parameter to set the animation length. A value of 0 makes the animations invisible.
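The two layout parameters above can be sketched together as follows. The grid size c(1, 3) and the value 1000 (to my understanding, milliseconds) are arbitrary choices for illustration; prep_hr() is a hypothetical helper and the glm is an assumption that keeps the sketch light.

```r
library("DALEX")
library("modelStudio")

# prep_hr() is a hypothetical helper: adds a binary target and drops the status column
prep_hr <- function(df) {
  df$fired <- ifelse(df$status == "fired", 1, 0)
  df[, -6]
}
train <- prep_hr(DALEX::HR[1:100, ])
test  <- prep_hr(DALEX::HR_test[1:100, ])

# assumption: a glm stands in for the random forest used elsewhere in this text
model <- glm(fired ~ ., data = train, family = "binomial")
explainer <- DALEX::explain(model, data = test[, -6], y = test[, 6],
                            verbose = FALSE)

# one row of three panels, with slower animations
ms <- modelStudio(explainer,
                  new_observation = test[1:2, -6],
                  facet_dim = c(1, 3),
                  time = 1000,
                  show_info = FALSE)
```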
N is the number of observations used to calculate partial and accumulated dependence profiles. B is the number of random paths used to calculate Shapley values. Decrease N and B to lower the computation time, or increase them to get more accurate empirical results.
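To make the trade-off concrete, the sketch below builds the same dashboard twice with different N and B and records the elapsed time of each call. The specific values are arbitrary, prep_hr() is a hypothetical helper, and the glm is an assumption that keeps the example light; actual timings depend on the model and the machine.

```r
library("DALEX")
library("modelStudio")

# prep_hr() is a hypothetical helper: adds a binary target and drops the status column
prep_hr <- function(df) {
  df$fired <- ifelse(df$status == "fired", 1, 0)
  df[, -6]
}
train <- prep_hr(DALEX::HR[1:100, ])
test  <- prep_hr(DALEX::HR_test[1:100, ])

# assumption: a glm stands in for the random forest used elsewhere in this text
model <- glm(fired ~ ., data = train, family = "binomial")
explainer <- DALEX::explain(model, data = test[, -6], y = test[, 6],
                            verbose = FALSE)
obs <- test[1:2, -6]

# fewer observations and random paths: faster, rougher estimates
t_fast <- system.time(
  ms_fast <- modelStudio(explainer, new_observation = obs,
                         N = 50, B = 5, show_info = FALSE)
)

# more observations and random paths: slower, more stable estimates
t_precise <- system.time(
  ms_precise <- modelStudio(explainer, new_observation = obs,
                            N = 100, B = 10, show_info = FALSE)
)

# compare elapsed seconds of the two runs
print(c(fast = t_fast[["elapsed"]], precise = t_precise[["elapsed"]]))
```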
Don’t compute the EDA plots if they are not needed: set the eda parameter to FALSE.
Hide the computation progress bar messages by setting the show_info parameter to FALSE.
Change the viewer parameter to set where modelStudio is displayed. The available values are best described in the documentation of the r2d3 viewer argument.
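The three switches above (eda, show_info and viewer) can be combined in one call, as in this minimal sketch. The value "browser" is one of the viewer values accepted by r2d3; prep_hr() is a hypothetical helper and the glm is an assumption that keeps the example light.

```r
library("DALEX")
library("modelStudio")

# prep_hr() is a hypothetical helper: adds a binary target and drops the status column
prep_hr <- function(df) {
  df$fired <- ifelse(df$status == "fired", 1, 0)
  df[, -6]
}
train <- prep_hr(DALEX::HR[1:100, ])
test  <- prep_hr(DALEX::HR_test[1:100, ])

# assumption: a glm stands in for the random forest used elsewhere in this text
model <- glm(fired ~ ., data = train, family = "binomial")
explainer <- DALEX::explain(model, data = test[, -6], y = test[, 6],
                            verbose = FALSE)

ms <- modelStudio(explainer,
                  new_observation = test[1:2, -6],
                  eda = FALSE,          # skip the EDA plots
                  show_info = FALSE,    # no progress bar messages
                  viewer = "browser")   # display in the web browser when printed
```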
Speed up modelStudio computation by setting the parallel parameter to TRUE. It uses the parallelMap package to calculate local explainers faster. This is especially useful when using modelStudio with complicated models, vast datasets, or many observations.
All parallelMap options can be set outside of the function call with options(); see the parallelMap documentation for details.
# set up the cluster
options(
  parallelMap.default.mode = "socket",
  parallelMap.default.cpus = 4,
  parallelMap.default.show.info = FALSE
)
# calculations will be distributed across 4 cores
modelStudio(explainer, new_observation = test[1:16,], parallel = TRUE)
Customize some of the modelStudio looks by overwriting the default options returned by modelStudioOptions().
# set additional graphical parameters
new_options <- modelStudioOptions(
  show_subtitle = TRUE,
  bd_subtitle = "Hello World",
  line_size = 5,
  point_size = 9,
  line_color = "pink",
  point_color = "purple",
  bd_positive_color = "yellow",
  bd_negative_color = "orange"
)
# new_observation is given explicitly so that the call runs on its own
modelStudio(explainer, new_observation = test[1:2,], options = new_options)
Use the DALEXtra::explain_*() functions to explain various models. Below is a basic example of making a modelStudio for an mlr model using DALEXtra::explain_mlr():
library(DALEXtra)
library(mlr)
# prepare mlr model
task <- mlr::makeRegrTask(id = "task",
                          data = train,
                          target = "fired")
learner <- mlr::makeLearner("regr.randomForest",
                            par.vals = list(ntree = 300),
                            predict.type = "response")
model <- mlr::train(learner, task)
# create an explainer for mlr model
explainer_mlr <- explain_mlr(model, data = test[,-6], y = test[,6], label = "mlr")
# call model studio for mlr model
# new_observation is given explicitly so that the call runs on its own
modelStudio(explainer_mlr,
            new_observation = test[1:2,],
            N = 100, B = 10)
Below is a basic example of making a modelStudio for a scikit-learn model using DALEXtra::explain_scikitlearn() and Python Anaconda. Save your Python model as a .pkl file and explain it in R with a few lines of code. Here we use a demo model saved in the DALEXtra package.
library(DALEXtra)
library(modelStudio)
titanic_test <- read.csv(system.file("extdata", "titanic_test.csv", package = "DALEXtra"))
titanic_train <- read.csv(system.file("extdata", "titanic_train.csv", package = "DALEXtra"))
# read scikitlearn model
yml <- system.file("extdata", "scikitlearn.yml", package = "DALEXtra")
pkl_gbm <- system.file("extdata", "scikitlearn.pkl", package = "DALEXtra")
pkl_SGDC <- "SGDC.pkl"
# prepare an explainer for scikitlearn model
explainer_scikit <- explain_scikitlearn(pkl_gbm,
                                        yml = yml,
                                        data = titanic_test[,1:17],
                                        y = titanic_test[,18],
                                        label = "python-gbm")
# start model studio
ms <- modelStudio(explainer_scikit,
                  new_observation = titanic_test[1:4,1:17],
                  options = modelStudioOptions(margin_left = 160))
ms
r2d3::save_d3_html(ms, "modelStudio-python-gbm.html")
modelStudio should work with any object of class explainer. Find out more about creating explainers in the DALEX documentation.