15 Boosting

library(rtemis)
  .:rtemis 0.8.0: Welcome, egenn
  [x86_64-apple-darwin17.0 (64-bit): Defaulting to 4/4 available cores]
  Documentation & vignettes: https://rtemis.netlify.com

Boosting is one of the most powerful techniques in supervised learning. rtemis lets you easily apply boosting to any learner for regression (but, as with bagging, don’t try boosting an ordinary least squares model: each OLS step just refits the same linear projection, so a boosted series of OLS fits only converges back to the single OLS solution and never improves on it).
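Under squared-error loss, boosting amounts to repeatedly fitting a weak learner to the current residuals and adding a shrunken copy of its predictions to the ensemble. Here is a minimal sketch of that loop, for intuition only; this is not rtemis’s implementation, and fit_fn and predict_fn are hypothetical stand-ins for any learner:

# A minimal gradient-boosting loop for squared-error loss (illustration
# only; `fit_fn` and `predict_fn` are hypothetical stand-ins).
boost_sketch <- function(x, y, fit_fn, predict_fn,
                         max.iter = 50, learning.rate = 0.1) {
  Fhat <- rep(mean(y), length(y))      # init with a constant model
  mods <- vector("list", max.iter)
  for (i in seq_len(max.iter)) {
    r <- y - Fhat                      # residuals = negative MSE gradient
    mods[[i]] <- fit_fn(x, r)          # fit a weak learner to the residuals
    Fhat <- Fhat + learning.rate * predict_fn(mods[[i]], x)
  }
  list(init = mean(y), mods = mods, learning.rate = learning.rate)
}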

Let’s create some synthetic data:

set.seed(2018)
x <- rnormmat(500, 50)
colnames(x) <- paste0("Feature", 1:50)
w <- rnorm(50)
y <- x %*% w + rnorm(500)
dat <- data.frame(x, y)
res <- resample(dat, seed = 2018)
[2020-06-23 08:40:22 resample] Input contains more than one columns; will stratify on last 
[[ Resampling Parameters ]]
    n.resamples: 10 
      resampler: strat.sub 
   stratify.var: y 
        train.p: 0.75 
   strat.n.bins: 4 

[2020-06-23 08:40:22 resample] Created 10 stratified subsamples 
dat.train <- dat[res$Subsample_1, ]
dat.valid <- dat[-res$Subsample_1, ]

15.1 Boost CART stumps

Boosting works best when you combine a long series of weak learners. Let’s start by boosting the simplest possible trees: trees of depth 1, a.k.a. stumps.
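rtemis’s CART learner wraps rpart, so the weak learner fit at each iteration here is a depth-1 rpart tree. For intuition, this is what a single stump looks like when fit directly (assuming rpart is installed):

# A single stump, fit directly with rpart; each boosting iteration below
# fits a tree like this to the current residuals.
library(rpart)
stump <- rpart(y ~ ., data = dat.train,
               control = rpart.control(maxdepth = 1))
print(stump)   # one split on a single feature

A stump can only split once, on a single feature, which is exactly what makes it weak.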

boost.cart <- boost(dat.train, x.valid = dat.valid,
                    mod = 'cart',
                    maxdepth = 1,
                    max.iter = 50)
[2020-06-23 08:40:22 boost] Hello, egenn 

[[ Regression Input Summary ]]
   Training features: 374 x 50 
    Training outcome: 374 x 1 
    Testing features: Not available
     Testing outcome: Not available
[[ Parameters ]]
               mod: CART 
        mod.params:  
                    maxdepth: 1 
              init: -0.182762669446564 
          max.iter: 50 
     learning.rate: 0.1 
         tolerance: 0 
   tolerance.valid: 1e-05 
[2020-06-23 08:40:25 boost] [ Boosting Classification and Regression Trees... ] 
[2020-06-23 08:40:25 boost] Iteration #5: Training MSE = 49.08; Validation MSE = 52.02 
[2020-06-23 08:40:26 boost] Iteration #10: Training MSE = 45.91; Validation MSE = 49.65 
[2020-06-23 08:40:26 boost] Iteration #15: Training MSE = 43.30; Validation MSE = 47.54 
[2020-06-23 08:40:27 boost] Iteration #20: Training MSE = 40.92; Validation MSE = 45.75 
[2020-06-23 08:40:28 boost] Iteration #25: Training MSE = 38.78; Validation MSE = 44.10 
[2020-06-23 08:40:28 boost] Iteration #30: Training MSE = 36.85; Validation MSE = 42.97 
[2020-06-23 08:40:29 boost] Iteration #35: Training MSE = 35.08; Validation MSE = 41.76 
[2020-06-23 08:40:29 boost] Iteration #40: Training MSE = 33.45; Validation MSE = 40.69 
[2020-06-23 08:40:30 boost] Iteration #45: Training MSE = 31.93; Validation MSE = 39.59 
[2020-06-23 08:40:30 boost] Iteration #50: Training MSE = 30.53; Validation MSE = 38.68 
[2020-06-23 08:40:30 boost] Reached max iterations 

[[ Regression Training Summary ]]
    MSE = 30.53 (42.94%)
   RMSE = 5.53 (24.46%)
    MAE = 4.36 (23.22%)
      r = 0.82 (p = 1.3e-91)
    rho = 0.77 (p = 0.00)
   R sq = 0.43

[2020-06-23 08:40:30 boost] Run completed in 0.14 minutes (Real: 8.21; User: 3.58; System: 0.44) 

We notice the validation error is considerably higher than the training error, and the validation curve is also less smooth.
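To see the gap, we can re-plot the errors from the iteration log above with base R (the values are copied verbatim from the log):

# Training vs. validation MSE at each logged iteration; the persistent
# gap between the two curves is the point of interest.
iter <- seq(5, 50, by = 5)
mse.train <- c(49.08, 45.91, 43.30, 40.92, 38.78, 36.85, 35.08, 33.45, 31.93, 30.53)
mse.valid <- c(52.02, 49.65, 47.54, 45.75, 44.10, 42.97, 41.76, 40.69, 39.59, 38.68)
plot(iter, mse.valid, type = "b", col = "red",
     ylim = range(mse.train, mse.valid), xlab = "Iteration", ylab = "MSE")
lines(iter, mse.train, type = "b", col = "blue")
legend("topright", c("Validation", "Training"), col = c("red", "blue"), lty = 1)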

15.2 Boost CART stumps: step slower

To get better results out of boosting, it usually helps to decrease the learning rate and increase the number of iterations. From an optimization point of view, a lower learning rate does not simply mean taking more, smaller steps instead of fewer, bigger ones: each new weak learner is fit to the residuals left by all previous steps, so shrinking the step size changes those residuals and sends the ensemble down a different, more precise optimization path.

boost.cart <- boost(dat.train, x.valid = dat.valid, mod = 'cart',
                    maxdepth = 1,
                    max.iter = 500, learning.rate = .05,
                    print.progress.every = 100)
[2020-06-23 08:40:38 boost] Hello, egenn 

[[ Regression Input Summary ]]
   Training features: 374 x 50 
    Training outcome: 374 x 1 
    Testing features: Not available
     Testing outcome: Not available
[[ Parameters ]]
               mod: CART 
        mod.params:  
                    maxdepth: 1 
              init: -0.182762669446564 
          max.iter: 500 
     learning.rate: 0.05 
         tolerance: 0 
   tolerance.valid: 1e-05 
[2020-06-23 08:40:38 boost] [ Boosting Classification and Regression Trees... ] 
[2020-06-23 08:40:45 boost] Iteration #100: Training MSE = 30.84; Validation MSE = 38.78 
[2020-06-23 08:40:53 boost] Iteration #200: Training MSE = 20.96; Validation MSE = 31.99 
[2020-06-23 08:41:01 boost] Iteration #300: Training MSE = 15.16; Validation MSE = 27.27 
[2020-06-23 08:41:07 boost] Iteration #400: Training MSE = 11.38; Validation MSE = 23.75 
[2020-06-23 08:41:13 boost] Iteration #500: Training MSE = 8.78; Validation MSE = 21.26 
[2020-06-23 08:41:13 boost] Reached max iterations 

[[ Regression Training Summary ]]
    MSE = 8.78 (83.58%)
   RMSE = 2.96 (59.48%)
    MAE = 2.36 (58.35%)
      r = 0.95 (p = 1.1e-195)
    rho = 0.94 (p = 0.00)
   R sq = 0.84

[2020-06-23 08:41:13 boost] Run completed in 0.59 minutes (Real: 35.59; User: 20.21; System: 2.85) 

15.3 Boost deep CARTs

Let’s see what can go wrong if your base learners are too strong:

boost.cart <- boost(dat.train, x.valid = dat.valid, mod = 'cart',
                    maxdepth = 20,
                    max.iter = 50)
[2020-06-23 08:43:13 boost] Hello, egenn 

[[ Regression Input Summary ]]
   Training features: 374 x 50 
    Training outcome: 374 x 1 
    Testing features: Not available
     Testing outcome: Not available
[[ Parameters ]]
               mod: CART 
        mod.params:  
                    maxdepth: 20 
              init: -0.182762669446564 
          max.iter: 50 
     learning.rate: 0.1 
         tolerance: 0 
   tolerance.valid: 1e-05 
[2020-06-23 08:43:13 boost] [ Boosting Classification and Regression Trees... ] 
[2020-06-23 08:43:14 boost] Iteration #5: Training MSE = 25.97; Validation MSE = 44.30 
[2020-06-23 08:43:14 boost] Iteration #10: Training MSE = 12.59; Validation MSE = 36.50 
[2020-06-23 08:43:15 boost] Iteration #15: Training MSE = 6.06; Validation MSE = 33.27 
[2020-06-23 08:43:15 boost] Iteration #20: Training MSE = 2.96; Validation MSE = 31.19 
[2020-06-23 08:43:15 boost] Iteration #25: Training MSE = 1.46; Validation MSE = 30.16 
[2020-06-23 08:43:15 boost] Iteration #30: Training MSE = 0.72; Validation MSE = 29.47 
[2020-06-23 08:43:16 boost] Iteration #35: Training MSE = 0.35; Validation MSE = 28.75 
[2020-06-23 08:43:18 boost] Iteration #40: Training MSE = 0.17; Validation MSE = 28.29 
[2020-06-23 08:43:18 boost] Iteration #45: Training MSE = 0.09; Validation MSE = 28.09 
[2020-06-23 08:43:19 boost] Iteration #50: Training MSE = 0.04; Validation MSE = 27.86 
[2020-06-23 08:43:19 boost] Reached max iterations 

[[ Regression Training Summary ]]
    MSE = 0.04 (99.92%)
   RMSE = 0.20 (97.21%)
    MAE = 0.16 (97.10%)
      r = 1.00 (p = 0.00)
    rho = 1.00 (p = 0.00)
   R sq = 1.00

[2020-06-23 08:43:19 boost] Run completed in 0.09 minutes (Real: 5.61; User: 2.28; System: 0.29) 

We notice that the training error quickly approaches zero while the validation error remains high: the strong base learners have overfit the training data.
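One guard against this is to stop as soon as validation error stops improving; boost()’s tolerance.valid argument (1e-05 in the printouts above) appears to serve this role. A generic sketch of the idea, independent of rtemis internals:

# Generic early-stopping check on a vector of per-iteration validation
# MSEs (a sketch only, not rtemis's internal logic).
best_iter <- function(mse.valid, tolerance = 1e-5) {
  improvement <- -diff(mse.valid)              # positive when error drops
  stall <- which(improvement < tolerance)[1]   # first stalled step, if any
  if (is.na(stall)) length(mse.valid) else stall
}
# Logged validation MSEs from the run above: improvement never falls
# below 1e-5, consistent with "Reached max iterations".
best_iter(c(44.30, 36.50, 33.27, 31.19, 30.16,
            29.47, 28.75, 28.29, 28.09, 27.86))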

15.4 Boost any learner

While decision trees are the most common base learners used in boosting, you can boost any algorithm by passing its name to boost()’s mod argument. Below we boost projection pursuit regression and MARS; the sketch that follows shows the same pattern with yet another learner.
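For example, a call like the following would boost generalized additive models (not run; the 'gam' learner name and its dependencies are assumptions about your rtemis installation):

# Not run -- same pattern as above with a different base learner; the
# 'gam' learner name is an assumption about your rtemis installation.
boost.gam <- boost(dat.train, x.valid = dat.valid, mod = 'gam',
                   max.iter = 10)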

15.4.1 Projection Pursuit Regression

boost.ppr <- boost(dat.train, x.valid = dat.valid, mod = 'ppr',
                   max.iter = 10)
[2020-06-23 08:43:32 boost] Hello, egenn 

[[ Regression Input Summary ]]
   Training features: 374 x 50 
    Training outcome: 374 x 1 
    Testing features: Not available
     Testing outcome: Not available
[[ Parameters ]]
               mod: PPR 
        mod.params: (empty list) 
              init: -0.182762669446564 
          max.iter: 10 
     learning.rate: 0.1 
         tolerance: 0 
   tolerance.valid: 1e-05 
[2020-06-23 08:43:32 boost] [ Boosting Projection Pursuit Regression... ] 
[2020-06-23 08:43:34 boost] Iteration #5: Training MSE = 18.78; Validation MSE = 20.45 
[2020-06-23 08:43:35 boost] Iteration #10: Training MSE = 6.62; Validation MSE = 7.99 
[2020-06-23 08:43:35 boost] Reached max iterations 

[[ Regression Training Summary ]]
    MSE = 6.62 (87.62%)
   RMSE = 2.57 (64.82%)
    MAE = 2.00 (64.78%)
      r = 1.00 (p = 0.00)
    rho = 1.00 (p = 0.00)
   R sq = 0.88

[2020-06-23 08:43:35 boost] Run completed in 0.05 minutes (Real: 2.86; User: 1.99; System: 0.08) 

15.4.2 Multivariate Adaptive Regression Splines (MARS)

boost.mars <- boost(dat.train, x.valid = dat.valid, mod = 'mars',
                    max.iter = 30)
[2020-06-23 08:43:35 boost] Hello, egenn 

[[ Regression Input Summary ]]
   Training features: 374 x 50 
    Training outcome: 374 x 1 
    Testing features: Not available
     Testing outcome: Not available
[[ Parameters ]]
               mod: MARS 
        mod.params: (empty list) 
              init: -0.182762669446564 
          max.iter: 30 
     learning.rate: 0.1 
         tolerance: 0 
   tolerance.valid: 1e-05 
[2020-06-23 08:43:35 boost] [ Boosting Multivariate Adaptive Regression Splines... ] 

[[ Parameters ]]
   pmethod: forward 
    degree: 2 
    nprune: NULL 
    ncross: 1 
     nfold: 4 
   penalty: 3 
        nk: 101 
Warning in if (class(y) == "character") {: the condition has length > 1 and only
the first element will be used
[2020-06-23 08:43:38 boost] Iteration #5: Training MSE = 25.85; Validation MSE = 31.00 
[2020-06-23 08:43:40 boost] Iteration #10: Training MSE = 18.45; Validation MSE = 24.42 
[2020-06-23 08:43:41 boost] Iteration #15: Training MSE = 13.70; Validation MSE = 19.16 
[2020-06-23 08:43:43 boost] Iteration #20: Training MSE = 11.65; Validation MSE = 17.89 
[2020-06-23 08:43:45 boost] Iteration #25: Training MSE = 9.82; Validation MSE = 16.40 
[2020-06-23 08:43:46 boost] Iteration #30: Training MSE = 8.89; Validation MSE = 16.03 
[2020-06-23 08:43:46 boost] Reached max iterations 

[[ Regression Training Summary ]]
    MSE = 8.89 (83.39%)
   RMSE = 2.98 (59.25%)
    MAE = 2.35 (58.65%)
      r = 0.95 (p = 1e-192)
    rho = 0.94 (p = 0.00)
   R sq = 0.83

[2020-06-23 08:43:46 boost] Run completed in 0.18 minutes (Real: 10.69; User: 7.12; System: 0.41)