19 Building more accurate decision trees with the Additive Tree

  .:rtemis 0.8.0: Welcome, egenn
  [x86_64-apple-darwin17.0 (64-bit): Defaulting to 4/4 available cores]
  Documentation & vignettes: https://rtemis.netlify.com

The Additive Tree walks like CART, but learns like Gradient Boosting. In other words, it builds a single decision tree, like CART, but trains it in a way similar to boosting stumps (a stump is a tree of depth 1). This results in increased accuracy without sacrificing interpretability (Luna et al. 2019). As with all supervised learning functions in rtemis, you can either provide a feature matrix / data frame, x, and an outcome vector, y, separately, or provide a combined dataset x alone, in which case the last column should be the outcome.
For classification, the outcome should be a factor whose first level is the ‘positive’ class.
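
For example, both of the following calls specify the same learning problem. This is only a sketch: "dat" here stands for a hypothetical data.frame whose last column is the outcome.

# 'dat' is a placeholder for any data.frame with the outcome in its last column
mod <- s.ADDTREE(dat)                                           # combined dataset
mod <- s.ADDTREE(x = dat[, -ncol(dat)], y = dat[, ncol(dat)])   # x and y separately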

19.1 Train AddTree

Let’s load a dataset from the UCI ML repository:

  • We convert the outcome variable “status” to a factor with 1 as the first (‘positive’) level,
  • move it to the last column,
  • and drop the non-predictive name column.
  • We then use the checkData function to examine the dataset.
parkinsons <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data")
parkinsons$Status <- factor(parkinsons$status, levels = c(1, 0))
parkinsons$status <- NULL
parkinsons$name <- NULL
checkData(parkinsons)
  Dataset: parkinsons 

  [[ Summary ]]
  195 cases with 23 features: 
  * 22 continuous features 
  * 0 integer features 
  * 1 categorical feature, which is not ordered
  * 0 constant features 
  * 0 duplicated cases 
  * 0 features include 'NA' values

  [[ Recommendations ]]
  * Everything looks good

Let’s train an Additive Tree model on the full sample:

parkinsons.addtree <- s.ADDTREE(parkinsons, gamma = .8, learning.rate = .1)
[2020-06-23 08:44:56 s.ADDTREE] Hello, egenn 

[2020-06-23 08:44:56 dataPrepare] Imbalanced classes: using Inverse Probability Weighting 

[[ Classification Input Summary ]]
   Training features: 195 x 22 
    Training outcome: 195 x 1 
    Testing features: Not available
     Testing outcome: Not available

[2020-06-23 08:44:58 s.ADDTREE] Training ADDTREE... 

[[ ADDTREE Classification Training Summary ]]
                   Reference 
        Estimated  1    0   
                1  145   1
                0    2  47

                   Overall  
      Sensitivity  0.9864 
      Specificity  0.9792 
Balanced Accuracy  0.9828 
              PPV  0.9932 
              NPV  0.9592 
               F1  0.9898 
         Accuracy  0.9846 

  Positive Class:  1 
[2020-06-23 08:44:59 s.ADDTREE] Traversing tree by preorder... 
[2020-06-23 08:45:00 s.ADDTREE] Converting paths to rules... 
[2020-06-23 08:45:00 s.ADDTREE] Converting to data.tree object... 
[2020-06-23 08:45:00 s.ADDTREE] Pruning tree... 

[2020-06-23 08:45:00 s.ADDTREE] Run completed in 0.07 minutes (Real: 4.42; User: 2.36; System: 0.17) 

19.1.1 Plot AddTree

AddTree models are saved as data.tree objects. We can plot them with dplot3.addtree, which creates HTML output using graphviz.
Within each node, the first line shows the rule, followed by the number of samples that match the rule, and lastly by the percentage of those samples that were positive for the outcome.
By default, leaf nodes with an estimate of 1 (positive class) are orange, and those with estimate 0 are teal.
You can mouse over nodes, edges, and the plot background for popup information. (The font size in this HTML render may appear slightly large for the given box size; it renders correctly in RStudio.)

dplot3.addtree(parkinsons.addtree)

19.1.2 Predict

To get predicted values, use the predict S3 generic with the familiar syntax
predict(mod, newdata). If newdata is not supplied, it returns the training set predictions (which we call the ‘fitted’ values):

predict(parkinsons.addtree)
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 0 0 0 0 1
 [38] 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1
 [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[112] 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[149] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0
[186] 0 0 0 0 0 0 1 0 0 0
Levels: 1 0
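
As a quick check, we can cross-tabulate the fitted values against the true labels; this should reproduce the training confusion matrix shown earlier.

# Cross-tabulate fitted (training-set) predictions against the true outcome
table(Estimated = predict(parkinsons.addtree), Reference = parkinsons$Status)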

19.1.3 Training and testing

  • Create resamples of our data
  • Visualize them (white is testing, teal is training)
  • Split data to train and test sets
res <- resample(parkinsons, n.resamples = 10, resampler = "kfold", verbose = TRUE)
[2020-06-23 08:50:35 resample] Input contains more than one columns; will stratify on last 
[[ Resampling Parameters ]]
    n.resamples: 10 
      resampler: kfold 
   stratify.var: y 
   strat.n.bins: 4 
[2020-06-23 08:50:35 kfold] Using max n bins possible = 2 

[2020-06-23 08:50:35 resample] Created 10 independent folds 
mplot3.res(res)

parkinsons.train <- parkinsons[res$Fold_1, ]
parkinsons.test <- parkinsons[-res$Fold_1, ]
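
As a quick sanity check (not part of the original output), the split sizes should match the Classification Input Summary printed by s.ADDTREE below.

# Expect 175 training and 20 testing cases when 10-fold resampling 195 cases
c(train = NROW(parkinsons.train), test = NROW(parkinsons.test))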
parkinsons.addtree <- s.ADDTREE(parkinsons.train, x.test = parkinsons.test,
                                gamma = .8, learning.rate = .1)
[2020-06-23 08:50:35 s.ADDTREE] Hello, egenn 

[2020-06-23 08:50:35 dataPrepare] Imbalanced classes: using Inverse Probability Weighting 

[[ Classification Input Summary ]]
   Training features: 175 x 22 
    Training outcome: 175 x 1 
    Testing features: 20 x 22 
     Testing outcome: 20 x 1 

[2020-06-23 08:50:37 s.ADDTREE] Training ADDTREE... 

[[ ADDTREE Classification Training Summary ]]
                   Reference 
        Estimated  1    0   
                1  130   1
                0    2  42

                   Overall  
      Sensitivity  0.9848 
      Specificity  0.9767 
Balanced Accuracy  0.9808 
              PPV  0.9924 
              NPV  0.9545 
               F1  0.9886 
         Accuracy  0.9829 

  Positive Class:  1 

[[ ADDTREE Classification Testing Summary ]]
                   Reference 
        Estimated  1   0  
                1  14  0
                0   1  5

                   Overall  
      Sensitivity  0.9333 
      Specificity  1.0000 
Balanced Accuracy  0.9667 
              PPV  1.0000 
              NPV  0.8333 
               F1  0.9655 
         Accuracy  0.9500 

  Positive Class:  1 
[2020-06-23 08:50:38 s.ADDTREE] Traversing tree by preorder... 
[2020-06-23 08:50:38 s.ADDTREE] Converting paths to rules... 
[2020-06-23 08:50:38 s.ADDTREE] Converting to data.tree object... 
[2020-06-23 08:50:38 s.ADDTREE] Pruning tree... 


[2020-06-23 08:50:38 s.ADDTREE] Run completed in 0.06 minutes (Real: 3.51; User: 2.24; System: 0.13) 
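
With a held-out set now available, we can also demonstrate the predict(mod, newdata) syntax described above. This is a sketch: we drop the outcome column from newdata, and the cross-tabulation should reproduce the testing confusion matrix printed above.

# Predict on the held-out set (the outcome column is excluded from newdata)
predicted.test <- predict(parkinsons.addtree,
                          newdata = parkinsons.test[, colnames(parkinsons.test) != "Status"])
table(Estimated = predicted.test, Reference = parkinsons.test$Status)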

19.1.4 Hyperparameter tuning

rtemis supervised learners, like s.ADDTREE, support automatic hyperparameter tuning. When more than one value is passed to a tunable argument, grid search with internal resampling is performed using all available cores (threads).
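
Conceptually, the exhaustive grid searched by the call below is simply the cross-product of the values supplied. This base-R illustration is not how s.ADDTREE builds the grid internally.

# The 4 gamma values x 1 learning rate below give 4 hyperparameter combinations;
# with 5 inner resamples, 20 models are trained in total
expand.grid(gamma = seq(.6, .9, .1), learning.rate = .1)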

parkinsons.addtree.tune <- s.ADDTREE(parkinsons.train, x.test = parkinsons.test,
                                     gamma = seq(.6, .9, .1), learning.rate = .1)
[2020-06-23 08:50:39 s.ADDTREE] Hello, egenn 

[2020-06-23 08:50:39 dataPrepare] Imbalanced classes: using Inverse Probability Weighting 

[[ Classification Input Summary ]]
   Training features: 175 x 22 
    Training outcome: 175 x 1 
    Testing features: 20 x 22 
     Testing outcome: 20 x 1 

[2020-06-23 08:50:39 gridSearchLearn] Running grid search... 
[[ Resampling Parameters ]]
    n.resamples: 5 
      resampler: kfold 
   stratify.var: y 
   strat.n.bins: 4 
[2020-06-23 08:50:39 kfold] Using max n bins possible = 2 

[2020-06-23 08:50:39 resample] Created 5 independent folds 
[[ Search parameters ]]
    grid.params:  
                         gamma: 0.6, 0.7, 0.8, 0.9 
                     max.depth: 30 
                 learning.rate: 0.1 
                   min.hessian: 0.001 
   fixed.params:  
                 catPredictors: NULL 
                           ipw: TRUE 
                      ipw.type: 2 
                      upsample: FALSE 
                 resample.seed: NULL 
[2020-06-23 08:50:39 gridSearchLearn] Tuning Additive Tree by exhaustive grid search: 
[2020-06-23 08:50:39 gridSearchLearn] 5 resamples; 20 models total; running on 4 cores (x86_64-apple-darwin17.0)
 
[[ Best parameters to maximize Balanced Accuracy ]]
   best.tune:  
                      gamma: 0.7 
                  max.depth: 30 
              learning.rate: 0.1 
                min.hessian: 0.001 

[2020-06-23 08:50:51 gridSearchLearn] Run completed in 0.19 minutes (Real: 11.17; User: 0.03; System: 0.04) 

[2020-06-23 08:50:51 s.ADDTREE] Training ADDTREE... 

[[ ADDTREE Classification Training Summary ]]
                   Reference 
        Estimated  1    0   
                1  127   1
                0    5  42

                   Overall  
      Sensitivity  0.9621 
      Specificity  0.9767 
Balanced Accuracy  0.9694 
              PPV  0.9922 
              NPV  0.8936 
               F1  0.9769 
         Accuracy  0.9657 

  Positive Class:  1 

[[ ADDTREE Classification Testing Summary ]]
                   Reference 
        Estimated  1   0  
                1  14  0
                0   1  5

                   Overall  
      Sensitivity  0.9333 
      Specificity  1.0000 
Balanced Accuracy  0.9667 
              PPV  1.0000 
              NPV  0.8333 
               F1  0.9655 
         Accuracy  0.9500 

  Positive Class:  1 
[2020-06-23 08:50:51 s.ADDTREE] Traversing tree by preorder... 
[2020-06-23 08:50:51 s.ADDTREE] Converting paths to rules... 
[2020-06-23 08:50:51 s.ADDTREE] Converting to data.tree object... 
[2020-06-23 08:50:52 s.ADDTREE] Pruning tree... 


[2020-06-23 08:50:52 s.ADDTREE] Run completed in 0.21 minutes (Real: 12.86; User: 1.27; System: 0.11) 

We can define the tuning resampling parameters with the grid.resampler.rtset argument. The rtset.resample convenience function helps build the list needed by grid.resampler.rtset and provides auto-completion.
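
To see exactly what the helper builds, you can call it on its own and inspect the resulting list (a quick check; output not shown here):

# rtset.resample returns the list of resampling parameters expected by grid.resampler.rtset
str(rtset.resample(resampler = "strat.boot", n.resamples = 5))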

parkinsons.addtree.tune <- s.ADDTREE(parkinsons.train, x.test = parkinsons.test,
                                     gamma = seq(.6, .9, .1), learning.rate = .1,
                                     grid.resampler.rtset = rtset.resample(resampler = 'strat.boot',
                                                                           n.resamples = 5))
[2020-06-23 08:50:54 s.ADDTREE] Hello, egenn 

[2020-06-23 08:50:54 dataPrepare] Imbalanced classes: using Inverse Probability Weighting 

[[ Classification Input Summary ]]
   Training features: 175 x 22 
    Training outcome: 175 x 1 
    Testing features: 20 x 22 
     Testing outcome: 20 x 1 

[2020-06-23 08:50:54 gridSearchLearn] Running grid search... 
[[ Resampling Parameters ]]
    n.resamples: 5 
      resampler: kfold 
   stratify.var: y 
   strat.n.bins: 4 
[2020-06-23 08:50:54 kfold] Using max n bins possible = 2 

[2020-06-23 08:50:54 resample] Created 5 independent folds 
[[ Search parameters ]]
    grid.params:  
                         gamma: 0.6, 0.7, 0.8, 0.9 
                     max.depth: 30 
                 learning.rate: 0.1 
                   min.hessian: 0.001 
   fixed.params:  
                 catPredictors: NULL 
                           ipw: TRUE 
                      ipw.type: 2 
                      upsample: FALSE 
                 resample.seed: NULL 
[2020-06-23 08:50:54 gridSearchLearn] Tuning Additive Tree by exhaustive grid search: 
[2020-06-23 08:50:54 gridSearchLearn] 5 resamples; 20 models total; running on 4 cores (x86_64-apple-darwin17.0)
 
[[ Best parameters to maximize Balanced Accuracy ]]
   best.tune:  
                      gamma: 0.7 
                  max.depth: 30 
              learning.rate: 0.1 
                min.hessian: 0.001 

[2020-06-23 08:51:12 gridSearchLearn] Run completed in 0.30 minutes (Real: 18.06; User: 0.03; System: 0.04) 

[2020-06-23 08:51:12 s.ADDTREE] Training ADDTREE... 

[[ ADDTREE Classification Training Summary ]]
                   Reference 
        Estimated  1    0   
                1  127   1
                0    5  42

                   Overall  
      Sensitivity  0.9621 
      Specificity  0.9767 
Balanced Accuracy  0.9694 
              PPV  0.9922 
              NPV  0.8936 
               F1  0.9769 
         Accuracy  0.9657 

  Positive Class:  1 

[[ ADDTREE Classification Testing Summary ]]
                   Reference 
        Estimated  1   0  
                1  14  0
                0   1  5

                   Overall  
      Sensitivity  0.9333 
      Specificity  1.0000 
Balanced Accuracy  0.9667 
              PPV  1.0000 
              NPV  0.8333 
               F1  0.9655 
         Accuracy  0.9500 

  Positive Class:  1 
[2020-06-23 08:51:12 s.ADDTREE] Traversing tree by preorder... 
[2020-06-23 08:51:13 s.ADDTREE] Converting paths to rules... 
[2020-06-23 08:51:13 s.ADDTREE] Converting to data.tree object... 
[2020-06-23 08:51:13 s.ADDTREE] Pruning tree... 


[2020-06-23 08:51:13 s.ADDTREE] Run completed in 0.33 minutes (Real: 19.93; User: 1.26; System: 0.11) 

Let’s look at the tuning results (this is a small dataset and tuning may not be very accurate):

parkinsons.addtree.tune$extra$gridSearch$tune.results
  gamma max.depth learning.rate min.hessian Sensitivity Specificity
1   0.6        30           0.1       0.001   0.8330484   0.7277778
2   0.7        30           0.1       0.001   0.8709402   0.7027778
3   0.8        30           0.1       0.001   0.8780627   0.6750000
4   0.9        30           0.1       0.001   0.8854701   0.6527778
  Balanced Accuracy       PPV       NPV        F1  Accuracy param.id
1         0.7804131 0.9056018 0.5893590 0.8659675 0.8057516        1
2         0.7868590 0.9010317 0.6371140 0.8847323 0.8284687        2
3         0.7765313 0.8940023 0.6500000 0.8846844 0.8276471        3
4         0.7691239 0.8883920 0.6630952 0.8853481 0.8276471        4
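
For example, we could programmatically pick out the best-performing row; this assumes the column name is literally "Balanced Accuracy", as printed above.

# Assumes the tuning results keep the printed column name "Balanced Accuracy"
tune.res <- parkinsons.addtree.tune$extra$gridSearch$tune.results
tune.res[which.max(tune.res[["Balanced Accuracy"]]), ]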

19.1.5 Nested resampling: Cross-validation and hyperparameter tuning

We now use the core rtemis supervised learning function, elevate, to perform nested resampling for cross-validation and hyperparameter tuning:

parkinsons.addtree.10fold <- elevate(parkinsons, mod = 'addtree', 
                                 gamma = c(.8, .9), 
                                 learning.rate = c(.01, .05),
                                 seed = 2018)
[2020-06-23 08:51:16 elevate] Hello, egenn 

[[ Classification Input Summary ]]
   Training features: 195 x 22 
    Training outcome: 195 x 1 

[2020-06-23 08:51:16 resLearn] Training Additive Tree on 10 stratified subsamples... 

[[ elevate ADDTREE ]]
   N repeats = 1 
   N resamples = 10 
   Resampler = strat.sub 
   Mean Balanced Accuracy of 10 test sets in each repeat = 0.80


[2020-06-23 08:56:12 elevate] Run completed in 4.93 minutes (Real: 295.74; User: 24.19; System: 1.38) 

We can get a summary of the cross-validation by printing the elevate object:

parkinsons.addtree.10fold
.:rtemis Cross-Validated Model 
ADDTREE (Additive Tree)
                 Algorithm: ADDTREE (Additive Tree)
                Resampling: n = 10, type = strat.sub
              N of repeats: 1 
 Mean Balanced Accuracy across repeats = 0.8 

19.2 Bagging the Additive Tree (AddTree Random Forest)

You can use the rtemis bag function to build a random forest with AddTree base learners.

data(Sonar, package = 'mlbench')
res <- resample(Sonar, seed = 2020)
[2020-06-23 08:57:51 resample] Input contains more than one columns; will stratify on last 
[[ Resampling Parameters ]]
    n.resamples: 10 
      resampler: strat.sub 
   stratify.var: y 
        train.p: 0.75 
   strat.n.bins: 4 
[2020-06-23 08:57:51 strat.sub] Using max n bins possible = 2 

[2020-06-23 08:57:51 resample] Created 10 stratified subsamples 
sonar.train <- Sonar[res$Subsample_1, ]
sonar.test <- Sonar[-res$Subsample_1, ]
mod.addtree <- s.ADDTREE(sonar.train, sonar.test)
[2020-06-23 08:57:51 s.ADDTREE] Hello, egenn 

[2020-06-23 08:57:51 dataPrepare] Imbalanced classes: using Inverse Probability Weighting 

[[ Classification Input Summary ]]
   Training features: 155 x 60 
    Training outcome: 155 x 1 
    Testing features: 53 x 60 
     Testing outcome: 53 x 1 

[2020-06-23 08:57:51 s.ADDTREE] Training ADDTREE... 

[[ ADDTREE Classification Training Summary ]]
                   Reference 
        Estimated  M   R   
                M  82   2
                R   1  70

                   Overall  
      Sensitivity  0.9880 
      Specificity  0.9722 
Balanced Accuracy  0.9801 
              PPV  0.9762 
              NPV  0.9859 
               F1  0.9820 
         Accuracy  0.9806 

  Positive Class:  M 

[[ ADDTREE Classification Testing Summary ]]
                   Reference 
        Estimated  M   R   
                M  19   4
                R   9  21

                   Overall  
      Sensitivity  0.6786 
      Specificity  0.8400 
Balanced Accuracy  0.7593 
              PPV  0.8261 
              NPV  0.7000 
               F1  0.7451 
         Accuracy  0.7547 

  Positive Class:  M 
[2020-06-23 08:57:53 s.ADDTREE] Traversing tree by preorder... 
[2020-06-23 08:57:54 s.ADDTREE] Converting paths to rules... 
[2020-06-23 08:57:54 s.ADDTREE] Converting to data.tree object... 
[2020-06-23 08:57:54 s.ADDTREE] Pruning tree... 


[2020-06-23 08:57:55 s.ADDTREE] Run completed in 0.07 minutes (Real: 4.35; User: 2.64; System: 0.09) 

Let’s train a random forest using the AddTree base learner (with just 20 trees for this example):

bag.addtree <- bag(sonar.train, sonar.test, mod = "ADDTREE", k = 20, mtry = 5)
[2020-06-23 08:57:59 bag] Hello, egenn 

[2020-06-23 08:57:59 dataPrepare] Imbalanced classes: using Inverse Probability Weighting 

[[ Classification Input Summary ]]
   Training features: 155 x 60 
    Training outcome: 155 x 1 
    Testing features: 53 x 60 
     Testing outcome: 53 x 1 
[[ Parameters ]]
          mod: ADDTREE 
   mod.params: (empty list) 
[2020-06-23 08:57:59 bag] Bagging 20 Additive Tree... 
[2020-06-23 08:57:59 strat.sub] Using max n bins possible = 2 

[2020-06-23 08:57:59 resLearn] Training Additive Tree on 20 stratified bootstraps... 
[2020-06-23 08:57:59 resLearn] Parallelizing by forking on 4 cores... 

[[ Classification Training Summary ]]
                   Reference 
        Estimated  M   R   
                M  82   1
                R   1  71

                   Overall  
      Sensitivity  0.9880 
      Specificity  0.9861 
Balanced Accuracy  0.9870 
              PPV  0.9880 
              NPV  0.9861 
               F1  0.9880 
         Accuracy  0.9871 

  Positive Class:  M 

[[ Classification Testing Summary ]]
                   Reference 
        Estimated  M   R   
                M  21   4
                R   7  21

                   Overall  
      Sensitivity  0.7500 
      Specificity  0.8400 
Balanced Accuracy  0.7950 
              PPV  0.8400 
              NPV  0.7500 
               F1  0.7925 
         Accuracy  0.7925 

  Positive Class:  M 


[2020-06-23 08:58:46 bag] Run completed in 0.79 minutes (Real: 47.60; User: 3.30; System: 0.83) 

19.3 More example datasets

19.3.1 OpenML: sleep

Let’s grab a dataset from the massive OpenML repository.
(OpenML’s get_csv endpoint serves the data as plain CSV, so we can use read.csv despite the .arff extension in the URL.)

sleep <- read.csv("https://www.openml.org/data/get_csv/53273/sleep.arff",
                  header = TRUE, na.strings = "?")
checkData(sleep)
  Dataset: sleep 

  [[ Summary ]]
  62 cases with 8 features: 
  * 4 continuous features 
  * 3 integer features 
  * 0 categorical features
  * 0 constant features 
  * 0 duplicated cases 
  * 2 features include 'NA' values; 8 'NA' values total
    ** Max percent missing in a feature is 6.45% (max_life_span)
    ** Max percent missing in a case is 25% (case #13)

  [[ Recommendations ]]
  * Consider imputing missing values or use complete cases only
  * Check the 3 integer features and consider if they should be converted to factors

We can impute missing data with preprocess:

sleep <- preprocess(sleep, impute = TRUE)
[2020-06-23 09:00:12 preprocess] Imputing missing values using missRanger... 

Missing value imputation by random forests

  Variables to impute:      max_life_span, gestation_time
  Variables used to impute: body_weight, brain_weight, max_life_span, gestation_time, predation_index, sleep_exposure_index, danger_index, binaryClass
iter 1: ..
iter 2: ..
iter 3: ..
iter 4: ..
[2020-06-23 09:00:12 preprocess] Done 

Train and plot AddTree:

sleep.addtree <- s.ADDTREE(sleep, gamma = .8, learning.rate = .1)
[2020-06-23 09:00:12 s.ADDTREE] Hello, egenn 

[2020-06-23 09:00:12 dataPrepare] Imbalanced classes: using Inverse Probability Weighting 

[[ Classification Input Summary ]]
   Training features: 62 x 7 
    Training outcome: 62 x 1 
    Testing features: Not available
     Testing outcome: Not available

[2020-06-23 09:00:12 s.ADDTREE] Training ADDTREE... 

[[ ADDTREE Classification Training Summary ]]
                   Reference 
        Estimated  N   P   
                N  31   1
                P   2  28

                   Overall  
      Sensitivity  0.9394 
      Specificity  0.9655 
Balanced Accuracy  0.9525 
              PPV  0.9688 
              NPV  0.9333 
               F1  0.9538 
         Accuracy  0.9516 

  Positive Class:  N 
[2020-06-23 09:00:13 s.ADDTREE] Traversing tree by preorder... 
[2020-06-23 09:00:13 s.ADDTREE] Converting paths to rules... 
[2020-06-23 09:00:13 s.ADDTREE] Converting to data.tree object... 
[2020-06-23 09:00:14 s.ADDTREE] Pruning tree... 


[2020-06-23 09:00:14 s.ADDTREE] Run completed in 0.03 minutes (Real: 1.79; User: 1.29; System: 0.05) 
dplot3.addtree(sleep.addtree)

19.3.2 PMLB: chess

Let’s load a dataset from the Penn ML Benchmarks GitHub repository.
R allows us to read a gzipped file and decompress it on the fly:

  • We open a remote connection to a gzipped tab-separated file,
  • read it in R with read.table,
  • set the outcome levels (with 1, the ‘positive’ class, first),
  • and check the data
rzd <- gzcon(url("https://github.com/EpistasisLab/penn-ml-benchmarks/raw/master/datasets/classification/chess/chess.tsv.gz"),
             text = TRUE)
chess <- read.table(rzd, header = TRUE)
chess$target <- factor(chess$target, levels = c(1, 0))
checkData(chess)
  Dataset: chess 

  [[ Summary ]]
  3196 cases with 37 features: 
  * 0 continuous features 
  * 36 integer features 
  * 1 categorical feature, which is not ordered
  * 0 constant features 
  * 0 duplicated cases 
  * 0 features include 'NA' values

  [[ Recommendations ]]
  * Everything looks good
chess.addtree <- s.ADDTREE(chess, gamma = .8, learning.rate = .1)
[2020-06-23 09:00:18 s.ADDTREE] Hello, egenn 

[2020-06-23 09:00:18 dataPrepare] Imbalanced classes: using Inverse Probability Weighting 

[[ Classification Input Summary ]]
   Training features: 3196 x 36 
    Training outcome: 3196 x 1 
    Testing features: Not available
     Testing outcome: Not available

[2020-06-23 09:00:18 s.ADDTREE] Training ADDTREE... 

[[ ADDTREE Classification Training Summary ]]
                   Reference 
        Estimated  1     0     
                1  1623    28
                0    46  1499

                   Overall  
      Sensitivity  0.9724 
      Specificity  0.9817 
Balanced Accuracy  0.9771 
              PPV  0.9830 
              NPV  0.9702 
               F1  0.9777 
         Accuracy  0.9768 

  Positive Class:  1 
[2020-06-23 09:00:27 s.ADDTREE] Traversing tree by preorder... 
[2020-06-23 09:00:27 s.ADDTREE] Converting paths to rules... 
[2020-06-23 09:00:27 s.ADDTREE] Converting to data.tree object... 
[2020-06-23 09:00:28 s.ADDTREE] Pruning tree... 


[2020-06-23 09:00:29 s.ADDTREE] Run completed in 0.19 minutes (Real: 11.12; User: 8.26; System: 0.34) 
dplot3.addtree(chess.addtree)

References

Luna, José Marcio, Efstathios D Gennatas, Lyle H Ungar, Eric Eaton, Eric S Diffenderfer, Shane T Jensen, Charles B Simone, Jerome H Friedman, Timothy D Solberg, and Gilmer Valdes. 2019. “Building More Accurate Decision Trees with the Additive Tree.” Proceedings of the National Academy of Sciences 116 (40): 19887–93.