deeplearning

Create and train deep neural networks of ReLU type with SGD and batch normalization

About

The deeplearning package implements deep neural networks in R. It uses rectified linear units (ReLU) as its building blocks and trains networks by stochastic gradient descent (SGD) with batch normalization, which speeds up training and promotes regularization. Networks with this architecture and training method have achieved state-of-the-art results, including surpassing human-level performance on ImageNet classification (He et al., 2015). The package is inspired by the R package darch, which implements layer-wise Restricted Boltzmann Machine pretraining and dropout, and it uses darch's DArch class as its default class.
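To make the two building blocks concrete, here is a minimal base-R sketch of a rectified linear unit and of the batch normalization transform described by Ioffe and Szegedy. The function names `relu` and `batch_norm` are illustrative only and are not part of the deeplearning API; the package's internal implementation may differ.

```r
# Rectified linear unit: element-wise max(0, x). Keeps the matrix shape.
relu <- function(x) pmax(x, 0)

# Batch normalization of one mini-batch (rows = observations, columns = units).
# gamma and beta are the learnable per-unit scale and shift;
# eps guards against division by zero. All names here are hypothetical.
batch_norm <- function(x, gamma = rep(1, ncol(x)), beta = rep(0, ncol(x)),
                       eps = 1e-8) {
  mu <- colMeans(x)
  centered <- sweep(x, 2, mu, "-")
  sigma2 <- colMeans(centered^2)                  # per-unit batch variance
  x_hat <- sweep(centered, 2, sqrt(sigma2 + eps), "/")
  sweep(sweep(x_hat, 2, gamma, "*"), 2, beta, "+")
}

set.seed(1)
batch <- matrix(rnorm(20, mean = 5, sd = 3), nrow = 10, ncol = 2)
normalized <- batch_norm(batch)   # each column now has mean ~0 and variance ~1
activated <- relu(normalized)     # negative values are clipped to zero
```

Normalizing before the nonlinearity keeps roughly half of the units active regardless of the scale of the raw inputs, which is one intuition for why the combination trains quickly.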

Installation

Install deeplearning from CRAN

install.packages("deeplearning")

Or install it from GitHub

devtools::install_github("rz1988/deeplearning")

Use deeplearning

The deeplearning package is designed to be easy and fun to use. It only takes two steps to run your first neural network.

In step one, you create a new neural network. You need to specify its structure, that is, the number of layers and the number of neurons in each layer, and the type of activation function. The default activation for the hidden layers is the rectified linear unit, but you can also use other activations, such as the sigmoid function, or write your own activation function.

In step two, you train the neural network with a training input and a training target. There are a number of other training parameters. For guidance on choosing them, please refer to https://github.com/rz1988/deeplearning.
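As an illustration of what a layer structure means, the following base-R sketch builds random weights for a network with layers `c(2, 50, 50, 20, 1)` and runs a forward pass with ReLU hidden layers and a linear output. It does not use the deeplearning package; `forward`, `weights`, and `biases` are hypothetical names chosen for this example, and the initialization is arbitrary.

```r
set.seed(42)
layers <- c(2, 50, 50, 20, 1)   # 2 inputs, three hidden layers, 1 output

# One weight matrix and one bias vector per pair of adjacent layers.
weights <- lapply(seq_len(length(layers) - 1), function(i) {
  matrix(rnorm(layers[i] * layers[i + 1], sd = 0.1), layers[i], layers[i + 1])
})
biases <- lapply(layers[-1], function(n) rep(0, n))

relu <- function(x) pmax(x, 0)

# Forward pass: affine map followed by ReLU in hidden layers,
# affine map only (linear unit) in the output layer.
forward <- function(x, weights, biases) {
  n_layers <- length(weights)
  for (i in seq_len(n_layers)) {
    x <- sweep(x %*% weights[[i]], 2, biases[[i]], "+")
    if (i < n_layers) x <- relu(x)
  }
  x
}

input <- matrix(runif(10), 5, 2)          # 5 observations, 2 variables
output <- forward(input, weights, biases) # 5 x 1 matrix of predictions

# Total number of trainable weights and biases implied by the structure.
n_params <- sum(sapply(weights, length)) + sum(sapply(biases, length))
```

The first element of the structure must match the number of input columns and the last element the number of target columns; everything in between is a modelling choice.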
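Several of the training parameters control the momentum used by stochastic gradient descent. The sketch below shows one common form of the momentum update on a toy quadratic problem, with the momentum switching from an initial to a final value after a given epoch. It is an assumption that the package applies a hard switch of this kind; `momentum_at` and `sgd_momentum_step` are hypothetical helpers written for this example.

```r
# Momentum for a given epoch: initial value up to the switch epoch,
# final value afterwards (assumed hard switch, for illustration only).
momentum_at <- function(epoch, initial = 0.6, final = 0.9, switch_epoch = 100) {
  if (epoch <= switch_epoch) initial else final
}

# One SGD-with-momentum step: the velocity accumulates past gradients,
# and the parameter moves along the velocity.
sgd_momentum_step <- function(param, grad, velocity, learn_rate, momentum) {
  velocity <- momentum * velocity - learn_rate * grad
  list(param = param + velocity, velocity = velocity)
}

# Toy problem: minimize f(w) = sum((w - 3)^2), whose gradient is 2 * (w - 3).
w <- c(0, 0)
v <- c(0, 0)
for (epoch in 1:300) {
  state <- sgd_momentum_step(w, 2 * (w - 3), v,
                             learn_rate = 0.01,
                             momentum = momentum_at(epoch))
  w <- state$param
  v <- state$velocity
}
w   # close to c(3, 3)
```

A lower momentum early on keeps the first, noisy updates from building up too much velocity; the higher final momentum smooths the later steps.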

Examples

Train a neural network for regression

input <- matrix(runif(1000), 500, 2)
input_valid <- matrix(runif(100), 50, 2)
target <- rowSums(input + input^2)
target_valid <- rowSums(input_valid + input_valid^2)


# create a new deep neural network for regression
dnn_regression <- new_dnn(
                          c(2, 50, 50, 20, 1),  # The layer structure of the deep neural network.
                                                # The first element is the number of input variables.
                                                # The last element is the number of output variables.
                          hidden_layer_default = rectified_linear_unit_function, 
                          # for hidden layers, use rectified_linear_unit_function
                          output_layer_default = linearUnitDerivative # for regression, use linearUnitDerivative function
                          )

dnn_regression <- train_dnn(
                     dnn_regression,

                     # training data
                     input, # input variable for training
                     target, # target variable for training
                     input_valid, # input variable for validation
                     target_valid, # target variable for validation

                     # training parameters
                     learn_rate_weight = exp(-8) * 10, # learning rate for weights, higher if using dropout
                     learn_rate_bias = exp(-8) * 10, # learning rate for biases, higher if using dropout
                     learn_rate_gamma = exp(-8) * 10, # learning rate for the gamma factor used in batch normalization
                     batch_size = 10, # number of observations in a batch during training. Higher for faster training. Lower for faster convergence
                     batch_normalization = TRUE, # logical value, TRUE to use batch normalization
                     dropout_input = 0.2, # dropout ratio in input.
                     dropout_hidden = 0.5, # dropout ratio in hidden layers
                     momentum_initial = 0.6, # initial momentum in Stochastic Gradient Descent training
                     momentum_final = 0.9, # final momentum in Stochastic Gradient Descent training
                     momentum_switch = 100, # after which the momentum is switched from initial to final momentum
                     num_epochs = 300, # number of iterations in training

                     # Error function
                     error_function = meanSquareErr, # error function to minimize during training. For regression, use meanSquareErr
                     report_classification_error = FALSE # whether to print classification error during training
)

# the prediction by dnn_regression
pred <- predict(dnn_regression)

# calculate the r-squared of the prediction
rsq(dnn_regression)

# calculate the r-squared of the prediction in validation
rsq(dnn_regression, input = input_valid, target = target_valid)
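For reference, R-squared can also be computed by hand from any vector of predictions. The sketch below uses the usual definition (one minus the ratio of residual to total sum of squares) and applies it to a plain linear model on the same kind of data, which gives a baseline to compare the network against. Whether `rsq` in the package uses exactly this definition is an assumption; `r_squared` is a hypothetical helper.

```r
# R-squared: share of the target's variance explained by the predictions.
r_squared <- function(pred, target) {
  ss_res <- sum((target - pred)^2)
  ss_tot <- sum((target - mean(target))^2)
  1 - ss_res / ss_tot
}

set.seed(1)
input <- matrix(runif(1000), 500, 2)
target <- rowSums(input + input^2)

# Linear baseline: fits the linear part of the target but not the squares.
baseline <- lm(target ~ input)
baseline_rsq <- r_squared(fitted(baseline), target)
baseline_rsq
```

A well-trained network should reach an R-squared at least as high as this baseline, since the target contains a quadratic term that the linear model cannot represent.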

Train a neural network for classification

input <- matrix(runif(1000), 500, 2)
input_valid <- matrix(runif(100), 50, 2)
target <- (cos(rowSums(input + input^2)) > 0.5) * 1
target_valid <- (cos(rowSums(input_valid + input_valid^2)) > 0.5) * 1

# create a new deep neural network for classification
dnn_classification <- new_dnn(
  c(2, 50, 50, 20, 1),  # The layer structure of the deep neural network.
                        # The first element is the number of input variables.
                        # The last element is the number of output variables.
  hidden_layer_default = rectified_linear_unit_function, # for hidden layers, use rectified_linear_unit_function
  output_layer_default = sigmoidUnitDerivative # for classification, use sigmoidUnitDerivative function
)

dnn_classification <- train_dnn(
  dnn_classification,

  # training data
  input, # input variable for training
  target, # target variable for training
  input_valid, # input variable for validation
  target_valid, # target variable for validation

  # training parameters
  learn_rate_weight = exp(-8) * 10, # learning rate for weights, higher if using dropout
  learn_rate_bias = exp(-8) * 10, # learning rate for biases, higher if using dropout
  learn_rate_gamma = exp(-8) * 10, # learning rate for the gamma factor used in batch normalization
  batch_size = 10, # number of observations in a batch during training. Higher for faster training. Lower for faster convergence
  batch_normalization = TRUE, # logical value, TRUE to use batch normalization
  dropout_input = 0.2, # dropout ratio in input.
  dropout_hidden = 0.5, # dropout ratio in hidden layers
  momentum_initial = 0.6, # initial momentum in Stochastic Gradient Descent training
  momentum_final = 0.9, # final momentum in Stochastic Gradient Descent training
  momentum_switch = 100, # after which the momentum is switched from initial to final momentum
  num_epochs = 100, # number of iterations in training

  # Error function
  error_function = crossEntropyErr, # error function to minimize during training. For classification, use crossEntropyErr
  report_classification_error = TRUE # whether to print classification error during training
)

# the prediction by dnn_classification
pred <- predict(dnn_classification)

hist(pred)

# calculate the accuracy ratio (AR) of the prediction
AR(dnn_classification)

# calculate the accuracy ratio (AR) of the prediction in validation
AR(dnn_classification, input = input_valid, target = target_valid)

# print the layer weights
# this function can print heatmap, histogram, or a surface
print_weight(dnn_regression, 1, type = "heatmap")

print_weight(dnn_regression, 2, type = "surface")

print_weight(dnn_regression, 3, type = "histogram")
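The `AR` calls in the classification example score the predicted probabilities against the binary target. Assuming `AR` reports the accuracy ratio in its common definition (the Gini coefficient, equal to 2 * AUC - 1), a hand-rolled version in base R could look like the sketch below. The helpers `auc` and `accuracy_ratio` are hypothetical and not part of the package, and the package's own computation may differ.

```r
# Area under the ROC curve via the rank-sum (Mann-Whitney) formula.
# score: predicted probabilities; label: 0/1 targets.
auc <- function(score, label) {
  r <- rank(score)                 # average ranks handle tied scores
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Accuracy ratio: 1 for a perfect ranking, 0 for a random one,
# -1 for a perfectly inverted one.
accuracy_ratio <- function(score, label) 2 * auc(score, label) - 1

label <- c(0, 0, 1, 1)
accuracy_ratio(c(0.1, 0.2, 0.8, 0.9), label)   # perfect ranking: 1
accuracy_ratio(c(0.9, 0.8, 0.2, 0.1), label)   # inverted ranking: -1
```

Because the measure depends only on the ranking of the scores, it is unaffected by any monotone rescaling of the predicted probabilities.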

References

Srivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov (2014). "Dropout: A Simple Way to Prevent Neural Networks from Overfitting". Journal of Machine Learning Research 15, pages 1929-1958.

Ioffe, Sergey and Christian Szegedy (2015). "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". Proceedings of the 32nd International Conference on Machine Learning, Lille, France.

He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun (2015). "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification". arXiv.

Glorot, Xavier, Antoine Bordes, and Yoshua Bengio (2011). "Deep Sparse Rectifier Neural Networks". Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, pages 315–323.

Drees, Martin (2013). "Implementierung und Analyse von tiefen Architekturen in R". German. Master's thesis. Fachhochschule Dortmund.

Rueckert, Johannes (2015). "Extending the Darch library for deep architectures". Project thesis. Fachhochschule Dortmund. URL: saviola.de