16 RuleFit

RuleFit is an algorithm for regression and classification that combines gradient boosting and the LASSO to train a model that is both highly accurate and interpretable.

Given a dataset X and an outcome y:

  1. Train a Gradient Boosting model on the raw inputs X to predict y
  2. Take all decision tree base learners from (1) and convert them to a list of rules R (by following all paths from root node to leaf node). The rules represent a transformation of the raw input features.
  3. Train a LASSO model on the ruleset R to predict y.

Thanks to the LASSO’s variable selection, step 3 usually reduces the large number of rules in R to a small subset with no loss of accuracy; in fact, RuleFit may even outperform gradient boosting alone.
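To make the two-stage procedure concrete, here is a minimal sketch of the same idea using the gbm, inTrees, and glmnet packages on a small simulated dataset. This illustrates the RuleFit recipe only; it is not the rtemis implementation, and the simulated data and all parameter values below are arbitrary choices for demonstration.

library(gbm)      # gradient boosting
library(inTrees)  # convert boosted trees to R-executable rules
library(glmnet)   # LASSO

# Simulated toy data: 5 continuous features, binary outcome
set.seed(2019)
x <- data.frame(matrix(rnorm(200 * 5), 200, 5))
y <- as.integer(x[[1]] + x[[2]] * x[[3]] + rnorm(200) > 0)

# 1. Train gradient boosting on the raw inputs
mod.gbm <- gbm(y ~ ., data = data.frame(x, y = y),
               distribution = "bernoulli", n.trees = 100,
               interaction.depth = 3, shrinkage = 0.01, n.minobsinnode = 5)

# 2. Convert the boosted trees to rules (conditions refer to columns of X)
X <- as.matrix(x)
treelist <- GBM2List(mod.gbm, X)
conds <- unique(as.character(extractRules(treelist, X, ntree = 100, maxdepth = 5)))

# Build the case-by-rule indicator matrix: 1 if a case satisfies a rule
rule.mat <- sapply(conds, function(cond) {
    as.integer(eval(parse(text = cond), envir = list(X = X)))
})

# 3. LASSO on the rule indicators; keep rules with nonzero coefficients
mod.lasso <- cv.glmnet(rule.mat, y, family = "binomial", alpha = 1)
rules.selected <- conds[which(coef(mod.lasso, s = "lambda.1se")[-1] != 0)]

The selected rules are plain R expressions, so the final model remains a sparse, human-readable linear combination of rule indicators.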

Figure 16.1: RuleFit Summary

16.1 Data

Let’s grab the Parkinsons dataset from the UCI repository:

# Read the dataset directly from the UCI Machine Learning Repository
parkinsons <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data")
# Recode the outcome as a factor with the positive class (1 = Parkinson's) first
parkinsons$Status <- factor(parkinsons$status, levels = c(1, 0))
parkinsons$status <- NULL
# Drop the subject identifier; it is not a predictor
parkinsons$name <- NULL
checkData(parkinsons)
  Dataset: parkinsons 

  [[ Summary ]]
  195 cases with 23 features: 
  * 22 continuous features 
  * 0 integer features 
  * 1 categorical feature, which is not ordered
  * 0 constant features 
  * 0 duplicated cases 
  * 0 features include 'NA' values

  [[ Recommendations ]]
  * Everything looks good

16.1.1 Resample

res <- resample(parkinsons, seed = 2019)
[2020-06-23 08:43:52 resample] Input contains more than one columns; will stratify on last 
[[ Resampling Parameters ]]
    n.resamples: 10 
      resampler: strat.sub 
   stratify.var: y 
        train.p: 0.75 
   strat.n.bins: 4 
[2020-06-23 08:43:52 strat.sub] Using max n bins possible = 2 

[2020-06-23 08:43:52 resample] Created 10 stratified subsamples 
# Use the first of the 10 stratified subsamples to define training and test sets
park.train <- parkinsons[res$Subsample_1, ]
park.test <- parkinsons[-res$Subsample_1, ]
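Stratified subsampling aims to preserve the outcome distribution across the training and test sets. A quick check with base R, using the objects created above, lets us verify that the class proportions are similar:

# Class proportions in the full data, training set, and test set
prop.table(table(parkinsons$Status))
prop.table(table(park.train$Status))
prop.table(table(park.test$Status))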

16.2 RuleFeat

Since the name RuleFit is trademarked, the rtemis implementation is called s.RULEFEAT. We train on park.train and test on park.test:

park.rf <- s.RULEFEAT(park.train, park.test)
[2020-06-23 08:43:52 s.RULEFEAT] Hello, egenn 
[2020-06-23 08:43:56 s.RULEFEAT] Running Gradient Boosting... 
[2020-06-23 08:43:56 s.GBM] Hello, egenn 

[2020-06-23 08:43:56 dataPrepare] Imbalanced classes: using Inverse Probability Weighting 

[[ Classification Input Summary ]]
   Training features: 146 x 22 
    Training outcome: 146 x 1 
    Testing features: Not available
     Testing outcome: Not available
[2020-06-23 08:43:56 s.GBM] Distribution set to bernoulli 

[2020-06-23 08:43:56 s.GBM] Running Gradient Boosting Classification with a bernoulli loss function 

[[ Parameters ]]
             n.trees: 100 
   interaction.depth: 5 
           shrinkage: 0.001 
        bag.fraction: 0.5 
      n.minobsinnode: 5 
             weights: NULL 
[2020-06-23 08:43:56 s.GBM] Training GBM on full training set... 

[[ GBM Classification Training Summary ]]
                   Reference 
        Estimated  1    0   
                1  108   1
                0    2  35

                   Overall  
      Sensitivity  0.9818 
      Specificity  0.9722 
Balanced Accuracy  0.9770 
              PPV  0.9908 
              NPV  0.9459 
               F1  0.9863 
         Accuracy  0.9795 
              AUC  0.9967 

  Positive Class:  1 
[2020-06-23 08:43:56 s.GBM] Calculating relative influence of variables... 

[2020-06-23 08:43:57 s.GBM] Run completed in 2.5e-03 minutes (Real: 0.15; User: 0.10; System: 0.01) 
[2020-06-23 08:43:57 s.RULEFEAT] Collecting Gradient Boosting Rules (Trees)... 
600 rules (length<=5) were extracted from the first 100 trees.
[2020-06-23 08:43:57 s.RULEFEAT] Extracted 600 rules... 
[2020-06-23 08:43:57 s.RULEFEAT] ...and kept 584 unique rules 
[2020-06-23 08:43:57 matchCasesByRules] Matching 584 rules to 146 cases... 
[2020-06-23 08:43:58 s.RULEFEAT] Running LASSO on GBM rules... 
[2020-06-23 08:43:58] Hello, egenn 

[2020-06-23 08:43:58 dataPrepare] Imbalanced classes: using Inverse Probability Weighting 

[[ Classification Input Summary ]]
   Training features: 146 x 584 
    Training outcome: 146 x 1 
    Testing features: Not available
     Testing outcome: Not available

[2020-06-23 08:43:58 gridSearchLearn] Running grid search... 
[[ Resampling Parameters ]]
    n.resamples: 5 
      resampler: kfold 
   stratify.var: y 
   strat.n.bins: 4 
[2020-06-23 08:43:58 kfold] Using max n bins possible = 2 

[2020-06-23 08:43:58 resample] Created 5 independent folds 
[[ Search parameters ]]
    grid.params:  
                 alpha: 1 
   fixed.params:  
                             .gs: TRUE 
                 which.cv.lambda: lambda.1se 
[2020-06-23 08:43:58 gridSearchLearn] Tuning Elastic Net by exhaustive grid search: 
[2020-06-23 08:43:58 gridSearchLearn] 5 resamples; 5 models total; running on 4 cores (x86_64-apple-darwin17.0)
 
[[ Best parameters to maximize Balanced Accuracy ]]
   best.tune:  
              lambda: 0.100061371489869 
               alpha: 1 

[2020-06-23 08:44:01 gridSearchLearn] Run completed in 0.06 minutes (Real: 3.36; User: 0.07; System: 0.05) 

[[ Parameters ]]
    alpha: 1 
   lambda: 0.100061371489869 

[2020-06-23 08:44:01] Training elastic net model... 

[[ GLMNET Classification Training Summary ]]
                   Reference 
        Estimated  1    0   
                1  108   0
                0    2  36

                   Overall  
      Sensitivity  0.9818 
      Specificity  1.0000 
Balanced Accuracy  0.9909 
              PPV  1.0000 
              NPV  0.9474 
               F1  0.9908 
         Accuracy  0.9863 
              AUC  0.9992 

  Positive Class:  1 

[2020-06-23 08:44:01] Run completed in 0.06 minutes (Real: 3.64; User: 0.25; System: 0.07) 

[[ RULEFEAT Classification Training Summary ]]
                   Reference 
        Estimated  1    0   
                1  108   0
                0    2  36

                   Overall  
      Sensitivity  0.9818 
      Specificity  1.0000 
Balanced Accuracy  0.9909 
              PPV  1.0000 
              NPV  0.9474 
               F1  0.9908 
         Accuracy  0.9863 
              AUC  0.9992 

  Positive Class:  1 
[2020-06-23 08:44:04 predict.ruleFeat] Matching newdata to rules... 
[2020-06-23 08:44:04 matchCasesByRules] Matching 584 rules to 49 cases... 

[[ RULEFEAT Classification Testing Summary ]]
                   Reference 
        Estimated  1   0  
                1  34  3
                0   3  9

                   Overall  
      Sensitivity  0.9189 
      Specificity  0.7500 
Balanced Accuracy  0.8345 
              PPV  0.9189 
              NPV  0.7500 
               F1  0.9189 
         Accuracy  0.8776 
              AUC  0.8818 

  Positive Class:  1 

[2020-06-23 08:44:04 s.RULEFEAT] Run completed in 0.20 minutes (Real: 12.07; User: 3.06; System: 0.36) 

16.2.1 RuleFeat Output

Let’s explore the algorithm output:

library(DT)  # for datatable() and formatRound()
datatable(park.rf$mod$rules.selected.coef.er) %>%
    formatRound(columns = c("Coefficient", "Empirical_Risk"), digits = 3)
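If you are working in a plain console session rather than an HTML report, the same data frame of selected rules, their coefficients, and empirical risk can be inspected directly:

head(park.rf$mod$rules.selected.coef.er)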

16.2.2 R-readable rules

We can also access the R-readable rules directly:

park.rf$mod$rules.selected
 [1] "MDVP.Fo.Hz.>133.131 & MDVP.Fhi.Hz.<=204.673 & PPE<=0.182331"                                            
 [2] "MDVP.Fhi.Hz.>124.569 & PPE>0.1694975"                                                                   
 [3] "MDVP.Fo.Hz.>133.131 & MDVP.Fhi.Hz.<=205.981 & spread1<=-5.7140545"                                      
 [4] "spread1>-5.921087 & spread2>0.193776"                                                                   
 [5] "MDVP.Fo.Hz.>118.1285 & MDVP.Fhi.Hz.<=205.209 & spread2<=0.2007955"                                      
 [6] "MDVP.Fo.Hz.>129.945 & MDVP.Fhi.Hz.<=204.673 & PPE<=0.185278"                                            
 [7] "MDVP.Fhi.Hz.<=229.1795 & NHR>0.00488 & D2>1.9404505"                                                    
 [8] "MDVP.Fo.Hz.<=139.063 & MDVP.Fhi.Hz.<=225.83 & Shimmer.APQ3>0.00676 & MDVP.APQ<=0.019765 & PPE<=0.185278"
 [9] "MDVP.Fhi.Hz.>125.3035 & PPE>0.1808325"                                                                  
[10] "RPDE>0.4160695 & PPE>0.150139"                                                                          
[11] "Shimmer.APQ3>0.00885 & spread1>-6.4767615"                                                              
[12] "MDVP.Fhi.Hz.>198.55 & MDVP.Shimmer.dB.<=0.2815 & spread2<=0.1998995"                                    
[13] "MDVP.Fhi.Hz.<=204.975 & spread1<=-6.1894575 & spread2<=0.21515"                                         
[14] "spread1>-6.1894575 & D2>2.0459605"                                                                      
[15] "MDVP.Fhi.Hz.>124.159 & MDVP.Fhi.Hz.<=247.1835 & spread1>-6.3355025"                                     
[16] "MDVP.Fhi.Hz.>229.1795 & PPE<=0.1695005"                                                                 

16.2.3 Format rules

We can convert the rules to a more human-readable format. Instead of the raw decision-tree thresholds, each condition is displayed as the median (for continuous features) or mode (for categorical features), followed by the range of values among the cases matched by the rule:

rules2medmod(park.rf$mod$rules.selected, park.train)
[2020-06-23 08:44:06 matchCasesByRules] Matching 16 rules to 146 cases... 
[2020-06-23 08:44:06 rules2medmod] Converting rules... 
[2020-06-23 08:44:06 rules2medmod] Done 
 [1] "MDVP.Fo.Hz. = 153.42 (136.93-187.73) & MDVP.Fhi.Hz. = 163.43 (154.61-202.45) & PPE = 0.14 (0.09-0.18)"                                                                
 [2] "MDVP.Fhi.Hz. = 165.74 (125.21-586.57) & PPE = 0.24 (0.17-0.53)"                                                                                                       
 [3] "MDVP.Fo.Hz. = 152.49 (136.93-187.73) & MDVP.Fhi.Hz. = 163.43 (154.61-202.45) & spread1 = -6.29 (-7.11--5.87)"                                                         
 [4] "spread1 = -5.09 (-5.90--2.43) & spread2 = 0.26 (0.20-0.45)"                                                                                                           
 [5] "MDVP.Fo.Hz. = 149.30 (118.75-184.06) & MDVP.Fhi.Hz. = 163.34 (123.72-203.52) & spread2 = 0.16 (0.06-0.20)"                                                            
 [6] "MDVP.Fo.Hz. = 153.42 (136.93-187.73) & MDVP.Fhi.Hz. = 163.43 (154.61-202.45) & PPE = 0.14 (0.09-0.18)"                                                                
 [7] "MDVP.Fhi.Hz. = 159.87 (102.14-227.38) & NHR = 0.02 (5e-03-0.31) & D2 = 2.45 (1.96-3.67)"                                                                              
 [8] "MDVP.Fo.Hz. = 117.00 (107.33-129.34) & MDVP.Fhi.Hz. = 129.04 (113.60-144.47) & Shimmer.APQ3 = 0.01 (0.01-0.01) & MDVP.APQ = 0.01 (0.01-0.02) & PPE = 0.16 (0.10-0.18)"
 [9] "MDVP.Fhi.Hz. = 166.17 (125.39-586.57) & PPE = 0.25 (0.18-0.53)"                                                                                                       
[10] "RPDE = 0.56 (0.42-0.69) & PPE = 0.24 (0.16-0.53)"                                                                                                                     
[11] "Shimmer.APQ3 = 0.02 (0.01-0.06) & spread1 = -5.25 (-6.47--2.43)"                                                                                                      
[12] "MDVP.Fhi.Hz. = 245.14 (206.90-581.29) & MDVP.Shimmer.dB. = 0.15 (0.09-0.26) & spread2 = 0.13 (0.01-0.19)"                                                             
[13] "MDVP.Fhi.Hz. = 163.05 (123.72-198.35) & spread1 = -6.52 (-7.11--6.25) & spread2 = 0.18 (0.06-0.21)"                                                                   
[14] "spread1 = -5.25 (-6.18--2.43) & D2 = 2.55 (2.06-3.67)"                                                                                                                
[15] "MDVP.Fhi.Hz. = 162.52 (124.39-241.35) & spread1 = -5.40 (-6.31--2.43)"                                                                                                
[16] "MDVP.Fhi.Hz. = 248.08 (230.98-581.29) & PPE = 0.10 (0.07-0.17)"