20 Introduction to Neural Networks

  .:rtemis 0.78.9003: Welcome, egenn
  [x86_64-apple-darwin15.6.0 (64-bit): Defaulting to 4/4 available cores]

  Need help? Online documentation & vignettes: https://rtemis.netlify.com

This is an introduction to neural networks in R using the MXNet and Keras + TensorFlow frameworks. rtemis includes the functions s.MXN and s.TFN to easily train networks using the two libraries, respectively. Here, we start by training simple networks with each package directly, to get an understanding of the basics.

20.1 Data

Let’s create some synthetic data. First, we create a simple dataset of 500 cases with 5 features, each drawn from a random normal distribution. Then we draw 5 random normal weights. Finally, we matrix-multiply our features and weights and add Gaussian noise to get our outcome, y:

x <- rnormmat(500, 5, seed = 2019)
w <- rnorm(5)
y <- c(x %*% w + rnorm(500))

20.2 GLM

Let’s start by getting the ordinary least squares solution using s.GLM:

mod.glm <- s.GLM(x, y)
[2019-06-29 00:47:22 s.GLM] Hello, egenn 

[[ Regression Input Summary ]]
   Training features: 500 x 5 
    Training outcome: 500 x 1 
    Testing features: Not available
     Testing outcome: Not available

[2019-06-29 00:47:23 s.GLM] Training GLM... 

[[ GLM Regression Training Summary ]]
    MSE = 1.04 (86.23%)
   RMSE = 1.02 (62.89%)
    MAE = 0.81 (62.24%)
      r = 0.93 (p = 1.6e-216)
    rho = 0.92 (p = 0)
   R sq = 0.86


[2019-06-29 00:47:23 s.GLM] Run completed in 0.02 minutes (Real: 1.33; User: 1.05; System: 0.09) 
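
For reference, the same solution is available in closed form from the normal equations, beta = (X'X)^(-1) X'y. A minimal base-R sketch using the x and y created above (the cross-check against mod.glm$mod$coefficients should be near zero):

X1 <- cbind(1, x)                                      # add intercept column
beta.hat <- c(solve(crossprod(X1), crossprod(X1, y)))  # closed-form OLS estimate
round(beta.hat - mod.glm$mod$coefficients, 10)         # differences should be ~0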

20.3 MXNet OLS

Now, let’s use MXNet to get a least squares fit by building a very simple network:

library(mxnet)
net <- mx.symbol.Variable("data")
net <- mx.symbol.FullyConnected(net, num.hidden = 1, name = "FC1")
net <- mx.symbol.LinearRegressionOutput(net, name = "Output")
mod <- mx.model.FeedForward.create(net,
                                   x, y,
                                   initializer = mx.init.Xavier(),
                                   num.round = 10,
                                   optimizer = "sgd",
                                   learning.rate = 1,
                                   array.batch.size = length(y),
                                   array.layout = "rowmajor")
Start training with 1 devices
mxnet::graph.viz(net)

Let’s compare the learned parameters to the GLM coefficients:

rbind(mod.glm$mod$coefficients, c(as.array(mod$arg.params$FC1_bias), as.array(mod$arg.params$FC1_weight)))
     (Intercept)       V1        V2        V3        V4         V5
[1,] -0.08339276 2.151795 -1.152021 0.4328304 0.4941808 -0.4252314
[2,] -0.08339272 2.151795 -1.152021 0.4328305 0.4941808 -0.4252315

Great - we recovered the same coefficients, to within numerical precision.
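
As an extra check, we can also compare fitted values. A brief sketch, assuming the usual predict() method for MXNet FeedForward models and the $fitted element of the rtemis GLM model:

mxnet.fitted <- c(predict(mod, x, array.layout = "rowmajor"))  # MXNet fitted values
cor(mxnet.fitted, mod.glm$fitted)                              # should be essentially 1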

20.4 Keras + TensorFlow OLS

Let’s build a similar network in Keras. The process is much the same as above:

library(keras)
net <- keras_model_sequential()
net <- layer_dense(net,
                   units = 1,
                   input_shape = 5,
                   kernel_initializer = initializer_glorot_uniform(),
                   activation = "linear",
                   name = "FC1")
net <- compile(net,
               loss = "mean_squared_error",
               optimizer = optimizer_sgd(lr = .1),
               metrics = "mse")
mod <- fit(net,
           x, y,
           epochs = 30,
           batch_size = NROW(y))
keras.weights <- get_weights(net)
rbind(c(keras.weights[[2]], keras.weights[[1]]), mod.glm$mod$coefficients)
     (Intercept)       V1        V2        V3        V4         V5
[1,] -0.08499461 2.146910 -1.149301 0.4327785 0.4943358 -0.4214662
[2,] -0.08339276 2.151795 -1.152021 0.4328304 0.4941808 -0.4252314
keras.fitted <- c(predict(net, x))
mplot3.fit(mod.glm$fitted, keras.fitted)

The coefficients are not identical, but they are very close. Note that we had to reduce the learning rate to 0.1 (compared to 1 for MXNet above) to achieve these results in Keras.
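
If we keep training, the Keras weights should move closer still to the OLS solution. A quick sketch: calling fit() again continues training the same compiled model, so we can simply run more epochs and re-inspect the weights:

mod <- fit(net, x, y, epochs = 300, batch_size = NROW(y))  # continue training the same model
rbind(c(get_weights(net)[[2]], get_weights(net)[[1]]), mod.glm$mod$coefficients)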

20.5 s.MXN

Above, we manually defined networks using MXNet and Keras to get a feel for the basics of building a neural network. Now let’s see how to achieve the same in rtemis:

x <- rnormmat(400, 50, seed = 2019)
w <- rnorm(50)
y <- x %*% w + rnorm(400)
res <- resample(y)
dat <- data.frame(x, y)
dat.train <- dat[res$Subsample_1, ]
dat.test <- dat[-res$Subsample_1, ]
mod.mxn <- s.MXN(dat.train, dat.test, n.hidden.nodes = 50)
[2019-06-29 00:47:37 s.MXN] Hello, egenn 

[[ Regression Input Summary ]]
   Training features: 301 x 50 
    Training outcome: 301 x 1 
    Testing features: 99 x 50 
     Testing outcome: 99 x 1 

[2019-06-29 00:47:37 s.MXN] Training Neural Network Regression with 1 hidden layer...
 
[2019-06-29 00:47:39 epoch.end.callback] Early stopping threshold reached.
           absolute.threshold: NA 
                     minimize: TRUE 
                   last.value: 0.20918270945549 
                 check.thresh: FALSE 
              relative.change: NA 
           relative.threshold: NA 
                check.rthresh: FALSE 
   relativeVariance.threshold: 1e-05 
             relativeVariance: 9.31230891104807e-06 
                   check.rvar: TRUE 
                         stop: TRUE 
                      restart: FALSE 


[[ MXN Regression Training Summary ]]
    MSE = 0.23 (99.47%)
   RMSE = 0.48 (92.72%)
    MAE = 0.40 (92.26%)
      r = 1.00 (p = 0)
    rho = 1.00 (p = 0)
   R sq = 0.99

[[ MXN Regression Testing Summary ]]
    MSE = 1.91 (94.01%)
   RMSE = 1.38 (75.53%)
    MAE = 1.18 (73.71%)
      r = 0.97 (p = 2.4e-62)
    rho = 0.97 (p = 0)
   R sq = 0.94


[2019-06-29 00:47:39 s.MXN] Run completed in 0.03 minutes (Real: 2.07; User: 2.37; System: 0.35) 
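
To inspect the test-set fit visually, we can plot true against predicted values with mplot3.fit(), as we did for the Keras fit above. A sketch, assuming the rtemis model stores its test-set predictions in $predicted (analogous to $fitted for the training set):

mplot3.fit(dat.test$y, mod.mxn$predicted)  # true vs predicted test-set values (assumes $predicted)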

20.6 s.TFN

Let’s now train the equivalent model with Keras + TensorFlow using s.TFN:

mod <- s.TFN(dat.train, dat.test, n.hidden.nodes = 50, epochs = 200)
[2019-06-29 00:47:42 s.TFN] Hello, egenn 

[[ Regression Input Summary ]]
   Training features: 301 x 50 
    Training outcome: 301 x 1 
    Testing features: 99 x 50 
     Testing outcome: 99 x 1 

[2019-06-29 00:47:42 s.TFN] Training Neural Network Regression with 1 hidden layer...
 

[[ TFN Regression Training Summary ]]
    MSE = 0.42 (99.01%)
   RMSE = 0.65 (90.07%)
    MAE = 0.33 (93.57%)
      r = 1.00 (p = 2.7e-304)
    rho = 0.99 (p = 0)
   R sq = 0.99

[[ TFN Regression Testing Summary ]]
    MSE = 2.48 (92.21%)
   RMSE = 1.58 (72.10%)
    MAE = 1.24 (72.21%)
      r = 0.96 (p = 4.1e-56)
    rho = 0.95 (p = 0)
   R sq = 0.92


[2019-06-29 00:47:48 s.TFN] Run completed in 0.11 minutes (Real: 6.72; User: 5.90; System: 0.41)
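
Finally, to compare the two rtemis models directly, one could plot their test-set predictions against each other; another sketch under the same $predicted assumption as above:

mplot3.fit(mod.mxn$predicted, mod$predicted)  # s.MXN vs s.TFN test-set predictions (assumes $predicted)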