Deploying a prediction service with Microsoft Machine Learning Server

This document shows how you can deploy a fitted model as a web service using Azure Container Registry (ACR) and Azure Kubernetes Service (AKS). The framework used is Microsoft Machine Learning Server. The process is broadly similar to that for deploying a Plumber service, as described in the “Plumber model deployment” vignette. If you haven’t already, read that vignette first as an introduction to AzureContainers.

Model operationalisation with ML Server

ML Server ships with a sophisticated framework for model management and deployment. The more relevant features for this vignette are:

In addition to the above features, ML Server includes comprehensive facilities to manage a server pool and do load balancing. For the purposes of this vignette, we’ll let Kubernetes handle these issues. More information can be obtained from the relevant pages on

Unlike Plumber, ML Server is proprietary software. However, if you have a Microsoft SQL Server license, you will generally also have access to ML Server. There is also a development license that can be used for free.

Deployment artifacts

For illustrative purposes, we’ll reuse the random forest model from the Plumber deployment vignette. The artifacts for deploying this model using ML Server are listed here.

Model building script

This is unchanged from the Plumber vignette, and is run offline.

library(randomForest)
data(Boston, package="MASS")

# train a model for median house price as a function of the other variables
bos_rf <- randomForest(medv ~ ., data=Boston, ntree=100)

# save the model
saveRDS(bos_rf, "bos_rf.rds")

Scoring and deployment script

This script is run at container startup. The script initialises the prediction service using the publishService function, passing the model object and scoring function as arguments. A version number is also provided; it’s possible to expose multiple models in the same service distinguished by this parameter.

Note that unlike Plumber, the R process that runs this script is not persistent. Rather, it calls the ML Server operationalisation service which in turn manages a number of separate, background R processes. It is these processes that handle incoming requests, using the information supplied in the publishService call.

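# save as bos_rf_mls_deploy.R
library(mrsdeploy)

bos_rf <- readRDS("bos_rf.rds")

# scoring function: takes the input data and returns the predicted values
bos_rf_score <- function(inputData)
{
    require(randomForest)
    inputData <- as.data.frame(inputData)
    predict(bos_rf, inputData)
}

# make sure you use a strong password or Azure Active Directory authentication in production
remoteLogin("http://localhost:12800", username="admin", password="Microsoft@2018", session=FALSE)
api <- publishService("bos-rf", v="1.0.0",
    code=bos_rf_score,
    model=bos_rf,
    inputs=list(inputData="data.frame"),
    outputs=list(pred="vector"))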
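
Before publishing, it can be useful to check in an ordinary R session that a scoring function of this shape takes a data frame and returns a vector. The following is a minimal sketch and not part of the deployment itself. It uses `lm` as a stand-in model so that it runs without randomForest, and the names `fit`, `score` and `payload` are hypothetical.

```r
# stand-in model: lm is used only so the sketch runs with base R and MASS
data(Boston, package="MASS")
fit <- lm(medv ~ ., data=Boston)

# same shape as the published scoring function: data frame in, vector out
score <- function(inputData)
{
    inputData <- as.data.frame(inputData)
    unname(predict(fit, inputData))
}

# simulate a decoded request body: a named list of columns
payload <- as.list(Boston[1:10, ])
pred <- score(payload)

# one numeric prediction per input row
stopifnot(is.numeric(pred), length(pred) == 10)
```

Swapping `fit` for the random forest object should leave the check unchanged, since `predict` is called in the same way.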


Dockerfile

This Dockerfile installs the Azure CLI (which is needed to initialise the operationalisation feature) and a cut-down version of ML Server that includes only the core Microsoft R packages. It omits the Python portion, as well as the pre-built machine learning models. This reduces the size of the image to about 2 GB, as opposed to 9.8 GB for a full install.

Some other differences of note from the Plumber Dockerfile:

# Dockerfile for one-box deployment
FROM ubuntu:16.04
RUN apt-get -y update \
    && apt-get install -y apt-transport-https wget \
    && echo "deb [arch=amd64] xenial main" | tee /etc/apt/sources.list.d/azure-cli.list \
    && wget -O /tmp/prod.deb \
    && dpkg -i /tmp/prod.deb \
    && rm -f /tmp/prod.deb \
    && apt-key adv --keyserver --recv-keys 52E16F86FEE04B979B07E28DB02C46DF417A0893 \
    && apt-get -y update \
    && apt-get install -y microsoft-r-open-foreachiterators-3.4.3 \
    && apt-get install -y microsoft-r-open-mkl-3.4.3 \
    && apt-get install -y microsoft-r-open-mro-3.4.3 \
    && apt-get install -y microsoft-mlserver-packages-r-9.3.0 \
    && apt-get install -y azure-cli=2.0.26-1~xenial \
    && apt-get install -y dotnet-runtime-2.0.0 \
    && apt-get install -y microsoft-mlserver-adminutil-9.3.0 \
    && apt-get install -y microsoft-mlserver-config-rserve-9.3.0 \
    && apt-get install -y microsoft-mlserver-computenode-9.3.0 \
    && apt-get install -y microsoft-mlserver-webnode-9.3.0 \
    && apt-get clean \
    && /opt/microsoft/mlserver/9.3.0/bin/R/

# install C and Fortran compilers, needed for randomForest
RUN apt-get install -y make gcc gfortran

RUN Rscript -e "install.packages('randomForest')"

# copy model and one-box deployment script
RUN mkdir /data
COPY bos_rf_mls_deploy.R /data
COPY bos_rf.rds /data
# run the remaining commands, and the deployment script, from /data
WORKDIR /data

RUN echo $'#!/bin/bash \n\
set -e \n\
/opt/microsoft/mlserver/9.3.0/o16n/ \n\
/opt/microsoft/mlserver/9.3.0/o16n/Microsoft.MLServer.ComputeNode/autoStartScriptsLinux/ start \n\
az ml admin node setup --webnode --admin-password "Microsoft@2018" --confirm-password "Microsoft@2018" --uri http://localhost:12805 \n\
/usr/bin/Rscript --no-save --verbose bos_rf_mls_deploy.R \n\
sleep infinity' >

RUN chmod +x

#### Modifications to config files to run onebox in Kubernetes

RUN echo $'library(jsonlite) \n\
settings_file <- "/opt/microsoft/mlserver/9.3.0/o16n/Microsoft.MLServer.WebNode/appsettings.json" \n\
settings <- fromJSON(settings_file) \n\
settings$Authentication$JWTSigningCertificate$Enabled <- TRUE \n\
settings$Authentication$JWTSigningCertificate$StoreName <- "Root" \n\
settings$Authentication$JWTSigningCertificate$StoreLocation <- "CurrentUser" \n\
settings$Authentication$JWTSigningCertificate$SubjectName <- "CN=LOCALHOST" \n\
writeLines(toJSON(settings, auto_unbox=TRUE, pretty=TRUE), settings_file) \n\
' > configure_jwt_cert.R

RUN chmod +x configure_jwt_cert.R

# insert your own cert here
RUN sed -i 's/grep docker/grep "kubepods\\|docker"/g' /opt/microsoft/mlserver/9.3.0/o16n/Microsoft.MLServer.*Node/autoStartScriptsLinux/*.sh \
    && mkdir -p /home/webnode_usr/.dotnet/corefx/cryptography/x509stores/root \
    && wget -O /home/webnode_usr/.dotnet/corefx/cryptography/x509stores/root/25706AA4612FC42476B8E6C72A97F58D4BB5721B.pfx \
    && chmod 666 /home/webnode_usr/.dotnet/corefx/cryptography/x509stores/root/*.pfx \
    && /usr/bin/Rscript configure_jwt_cert.R


EXPOSE 12800
ENTRYPOINT ["/data/"]

Kubernetes deployment file

The YAML file for the ML Server deployment is essentially identical to that for Plumber; only the names and port number are changed.

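# save as bos-rf-mls.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: bos-rf-mls
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: bos-rf-mls
    spec:
      containers:
      - name: bos-rf-mls
        ports:
        - containerPort: 12800
        resources:
          requests:
            cpu: 250m
          limits:
            cpu: 500m
      imagePullSecrets:
      - name:
---
apiVersion: v1
kind: Service
metadata:
  name: bos-rf-mls-svc
spec:
  selector:
    app: bos-rf-mls
  type: LoadBalancer
  ports:
  - protocol: TCP
    port: 12800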

Deploying the service

The script for deploying to Kubernetes, given the above artifacts, is very simple. This reuses the ACR and AKS resources created in the Plumber vignette.


library(AzureContainers)

az <- AzureRMR::az_rm$new(

deployresgrp <- az$

# get container registry
deployreg <- deployresgrp$get_acr("deployreg")$get_docker_registry()

# build and upload image
call_docker("build -t bos-rf-mls .")
deployreg$push("bos-rf-mls")

# get the Kubernetes cluster endpoint
deployclus <- deployresgrp$get_aks("deployclus")$get_cluster()

# create and start the service
deployclus$create("bos-rf-mls.yaml")

Calling the service

It’s possible to call an ML Server prediction service in either synchronous or asynchronous mode. First, we’ll show the synchronous case. We log in to the server to get an authentication token, and then call the service URI itself. The path in the URI includes the service name and version we supplied in the publishService function call previously.

Note also that ML Server returns a comprehensive response object that includes the actual predicted values as a component. For more information, see

# get status of the service, including the IP address
deployclus$get("service bos-rf-mls-svc")
#> Kubernetes operation: get service bos-rf-svc  --kubeconfig=".../kubeconfigxxxx"
#> NAME         TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)           AGE
#> bos-rf-svc   LoadBalancer   12800:30365/TCP   5m 

# obtain an authentication token from the server
library(httr)
response <- POST("",
    body=list(username="admin", password="Microsoft@2018"),
    encode="json")
token <- content(response)$access_token

# do the prediction, passing input values in the request body
bos_json <- jsonlite::toJSON(list(inputData=MASS::Boston[1:10,]),
response <- POST("",
    add_headers(Authorization=paste0("Bearer ", token),
content(response, simplifyVector=TRUE)$outputParameters$pred
#> [1] 25.9269 22.0636 34.1876 33.7737 34.8081 27.6394 21.8007 22.3577 16.7812 18.9785
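
The two-step synchronous flow (log in, then call the service) can be sketched as a pair of URI helpers and a wrapper function. This is an illustration under assumptions, not the vignette's own code. The helper names are hypothetical, the host is a documentation-range placeholder standing in for the EXTERNAL-IP reported by Kubernetes, and the `/login` and `/api/<name>/<version>` paths are assumed from the ML Server REST conventions.

```r
# hypothetical helpers: build the login and service URIs for a given host
mls_login_uri <- function(host, port=12800)
    sprintf("http://%s:%d/login", host, port)

mls_service_uri <- function(host, name, version, port=12800)
    sprintf("http://%s:%d/api/%s/%s", host, port, name, version)

host <- "203.0.113.10"   # placeholder address, not a real endpoint
login_uri <- mls_login_uri(host)
service_uri <- mls_service_uri(host, "bos-rf", "1.0.0")

# hypothetical wrapper: log in, then post a pre-built JSON body to the service
# (defined only; calling it requires httr and a running server)
score_sync <- function(host, body_json, username, password)
{
    login <- httr::POST(mls_login_uri(host),
        body=list(username=username, password=password),
        encode="json")
    httr::stop_for_status(login)
    token <- httr::content(login)$access_token

    response <- httr::POST(mls_service_uri(host, "bos-rf", "1.0.0"),
        httr::add_headers(Authorization=paste0("Bearer ", token)),
        httr::content_type_json(),
        body=body_json)
    httr::stop_for_status(response)
    httr::content(response, simplifyVector=TRUE)$outputParameters$pred
}
```

Keeping the URI construction separate from the HTTP calls makes it easy to check the service name and version in the path without contacting the server.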

To make an asynchronous (batch) request, we simply change the URI and pass a list of model inputs. We pass a list because, in batch mode, ML Server can process multiple inputs in parallel from a single request. Here, we pass the first 20 rows of the Boston dataset as two sets of 10 rows each. We also set the number of threads that ML Server will use to two, via the parallelCount query parameter in the URI.

bos_json_list <- jsonlite::toJSON(list(
response <- POST("",
    add_headers(Authorization=paste0("Bearer ", token),
#> $name
#> [1] "bos-rf"
#> $version
#> [1] "1.0.0"
#> $batchExecutionId
#> [1] "9c6be3d2-f4a0-477b-830d-b07a43403c6e"

Once the request has been sent, we can obtain the predicted values by querying the server again, passing the batch execution ID returned above as a parameter:

response <- GET("",
    add_headers(Authorization=paste0("Bearer ", token),
content(response, simplifyVector=TRUE)$batchExecutionResults$outputParameters
#>                                                                                                 pred
#> 1 25.92692, 22.06357, 34.18765, 33.77370, 34.80810, 27.63945, 21.80073, 22.35773, 16.78120, 18.97845
#> 2 17.22610, 20.05682, 21.63635, 20.13023, 18.69370, 20.14845, 22.33917, 17.92152, 19.33282, 18.75947
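
Because the batch runs asynchronously, the results may not all be available on the first query. One way to handle this is to repeat the query until the expected number of results has arrived. The sketch below is an assumption-laden illustration: `poll_batch` and `fake_fetch` are hypothetical names, the fetch function is assumed to return the parsed response as a plain list, and an incomplete `batchExecutionResults` component is assumed to mean the batch is still running. The fake fetch function stands in for the GET request so that the example runs offline.

```r
# hypothetical helper: call fetch() until it returns the expected number of results
poll_batch <- function(fetch, expected, max_tries=10, interval=1)
{
    for(i in seq_len(max_tries))
    {
        res <- fetch()
        if(length(res$batchExecutionResults) >= expected)
            return(res)
        Sys.sleep(interval)
    }
    stop("batch results not ready after ", max_tries, " tries")
}

# offline stand-in for the GET request: results appear on the third call
calls <- 0
fake_fetch <- function()
{
    calls <<- calls + 1
    n <- if(calls >= 3) 2 else 0
    list(batchExecutionResults=vector("list", n))
}

res <- poll_batch(fake_fetch, expected=2, interval=0)
```

In real use, `fetch` would wrap the GET call shown above and return `content(response)`.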