2 rtemis in 60 seconds

2.1 Load rtemis

library(rtemis)
  .:rtemis 0.8.0: Welcome, egenn
  [x86_64-apple-darwin17.0 (64-bit): Defaulting to 4/4 available cores]
  Documentation & vignettes: https://rtemis.netlify.com

2.2 Regression

2.2.1 Check Data

x <- rnormmat(500, 50, seed = 2019)  # 500 x 50 matrix of random normal values
w <- rnorm(50)                       # true coefficient vector
y <- x %*% w + rnorm(500)            # linear outcome plus Gaussian noise
dat <- data.frame(x, y)              # features X1..X50 and outcome y
res <- resample(dat)
[2020-06-23 09:15:43 resample] Input contains more than one columns; will stratify on last 
[[ Resampling Parameters ]]
    n.resamples: 10 
      resampler: strat.sub 
   stratify.var: y 
        train.p: 0.75 
   strat.n.bins: 4 

[2020-06-23 09:15:43 resample] Created 10 stratified subsamples 
dat.train <- dat[res$Subsample_1, ]  # training cases: rows in the first resample
dat.test <- dat[-res$Subsample_1, ]  # test cases: all remaining rows
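
resample() returns a list of training-row index sets (Subsample_1 through Subsample_10 here), so indexing with the first set and its negation yields one train/test split. For comparison only, a plain 75/25 random split in base R, without the stratification on y that resample() provides, would look like the sketch below (dat.train2 / dat.test2 are illustrative names, not part of the example above):

set.seed(2019)                                            # for reproducibility of the sketch only
idx <- sample(nrow(dat), size = round(0.75 * nrow(dat)))  # 375 random training rows
dat.train2 <- dat[idx, ]
dat.test2  <- dat[-idx, ]                                 # remaining 125 rows
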
checkData(x)
  Dataset: x 

  [[ Summary ]]
  500 cases with 50 features: 
  * 50 continuous features 
  * 0 integer features 
  * 0 categorical features
  * 0 constant features 
  * 0 duplicated cases 
  * 0 features include 'NA' values

  [[ Recommendations ]]
  * Everything looks good

2.2.2 Single Model

mod <- s.GLM(dat.train, dat.test)  # the second data frame is used as the test set
[2020-06-23 09:15:43 s.GLM] Hello, egenn 

[[ Regression Input Summary ]]
   Training features: 374 x 50 
    Training outcome: 374 x 1 
    Testing features: 126 x 50 
     Testing outcome: 126 x 1 

[2020-06-23 09:15:44 s.GLM] Training GLM... 

[[ GLM Regression Training Summary ]]
    MSE = 1.02 (97.81%)
   RMSE = 1.01 (85.18%)
    MAE = 0.81 (84.62%)
      r = 0.99 (p = 1.3e-310)
    rho = 0.99 (p = 0.00)
   R sq = 0.98

[[ GLM Regression Testing Summary ]]
    MSE = 0.98 (97.85%)
   RMSE = 0.99 (85.35%)
    MAE = 0.76 (85.57%)
      r = 0.99 (p = 2.7e-105)
    rho = 0.98 (p = 0.00)
   R sq = 0.98

[2020-06-23 09:15:44 s.GLM] Run completed in 0.03 minutes (Real: 1.72; User: 1.27; System: 0.09) 
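
Because the outcome is continuous, this GLM is (assuming the default Gaussian family) equivalent to ordinary least squares, so a base-R lm() fit on the same split should land very close to the test RMSE of 0.99 reported above; a quick check:

fit.lm <- lm(y ~ ., data = dat.train)           # OLS on the training split
pred.lm <- predict(fit.lm, newdata = dat.test)  # predictions for the test split
sqrt(mean((dat.test$y - pred.lm)^2))            # test RMSE; compare with s.GLM above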

2.2.3 Cross-validated Model

mod <- elevate(dat, mod = "glm")
[2020-06-23 09:15:45 elevate] Hello, egenn 

[[ Regression Input Summary ]]
   Training features: 500 x 50 
    Training outcome: 500 x 1 

[2020-06-23 09:15:45 resLearn] Training Generalized Linear Model on 10 stratified subsamples... 

[[ elevate GLM ]]
   N repeats = 1 
   N resamples = 10 
   Resampler = strat.sub 
   Mean MSE of 10 resamples in each repeat = 1.22
   Mean MSE reduction in each repeat =  97.50%


[2020-06-23 09:15:45 elevate] Run completed in 0.01 minutes (Real: 0.62; User: 0.52; System: 0.07) 

Use the describe function to get a summary in (plain) English:

mod$describe()
Regression was performed using Generalized Linear Model. Model generalizability was assessed using 10 stratified subsamples. The mean R-squared across all resamples was 0.97.
mod$plot()
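
Conceptually, elevate() trains one model per resample and aggregates test-set performance. A minimal sketch of that loop, reusing the res object created above and substituting a plain lm() fit for s.GLM (illustration only, not rtemis internals; the numbers will differ slightly because elevate() draws its own resamples):

test.mse <- sapply(1:10, function(i) {
  idx <- res[[i]]                              # training rows for resample i
  fit <- lm(y ~ ., data = dat[idx, ])          # train on this resample
  pred <- predict(fit, newdata = dat[-idx, ])  # predict the held-out rows
  mean((dat$y[-idx] - pred)^2)                 # test MSE for this resample
})
mean(test.mse)                                 # compare with the mean MSE from elevate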

2.3 Classification

2.3.1 Check Data

data(Sonar, package = 'mlbench')  # 208 sonar returns: 60 numeric features, binary outcome (M/R)
checkData(Sonar)
  Dataset: Sonar 

  [[ Summary ]]
  208 cases with 61 features: 
  * 60 continuous features 
  * 0 integer features 
  * 1 categorical feature, which is not ordered
  * 0 constant features 
  * 0 duplicated cases 
  * 0 features include 'NA' values

  [[ Recommendations ]]
  * Everything looks good
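
The one categorical feature is the outcome, Class, and it is mildly imbalanced: 111 M vs 97 R (the counts can also be recovered from the confusion matrices further down). This imbalance is why s.RANGER reports switching on Inverse Probability Weighting below. To check it directly:

table(Sonar$Class)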

res <- resample(Sonar)
[2020-06-23 09:15:46 resample] Input contains more than one columns; will stratify on last 
[[ Resampling Parameters ]]
    n.resamples: 10 
      resampler: strat.sub 
   stratify.var: y 
        train.p: 0.75 
   strat.n.bins: 4 
[2020-06-23 09:15:46 strat.sub] Using max n bins possible = 2 

[2020-06-23 09:15:46 resample] Created 10 stratified subsamples 
sonar.train <- Sonar[res$Subsample_1, ]
sonar.test <- Sonar[-res$Subsample_1, ]

2.3.2 Single Model

mod <- s.RANGER(sonar.train, sonar.test)
[2020-06-23 09:15:46 s.RANGER] Hello, egenn 

[2020-06-23 09:15:46 dataPrepare] Imbalanced classes: using Inverse Probability Weighting 

[[ Classification Input Summary ]]
   Training features: 155 x 60 
    Training outcome: 155 x 1 
    Testing features: 53 x 60 
     Testing outcome: 53 x 1 

[[ Parameters ]]
   n.trees: 1000 
      mtry: NULL 

[2020-06-23 09:15:46 s.RANGER] Training Random Forest (ranger) Classification with 1000 trees... 

[[ RANGER Classification Training Summary ]]
                   Reference 
        Estimated  M   R   
                M  83   0
                R   0  72

                   Overall  
      Sensitivity  1      
      Specificity  1      
Balanced Accuracy  1      
              PPV  1      
              NPV  1      
               F1  1      
         Accuracy  1      
              AUC  1      

  Positive Class:  M 

[[ RANGER Classification Testing Summary ]]
                   Reference 
        Estimated  M   R   
                M  25  12
                R   3  13

                   Overall  
      Sensitivity  0.8929 
      Specificity  0.5200 
Balanced Accuracy  0.7064 
              PPV  0.6757 
              NPV  0.8125 
               F1  0.7692 
         Accuracy  0.7170 
              AUC  0.8479 

  Positive Class:  M 

[2020-06-23 09:15:46 s.RANGER] Run completed in 0.01 minutes (Real: 0.36; User: 0.47; System: 0.04) 
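
Each test metric above follows directly from the 2 x 2 confusion matrix, with M as the positive class (25 true positives, 3 false negatives, 12 false positives, 13 true negatives):

tp <- 25; fn <- 3; fp <- 12; tn <- 13
tp / (tp + fn)                          # Sensitivity = 0.8929
tn / (tn + fp)                          # Specificity = 0.5200
(tp / (tp + fn) + tn / (tn + fp)) / 2   # Balanced Accuracy = 0.7064
tp / (tp + fp)                          # PPV = 0.6757
tn / (tn + fn)                          # NPV = 0.8125
2 * tp / (2 * tp + fp + fn)             # F1 = 0.7692
(tp + tn) / (tp + tn + fp + fn)         # Accuracy = 0.7170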

2.3.3 Cross-validated Model

mod <- elevate(Sonar)  # no learner specified: elevate defaults to ranger, as the output below shows
[2020-06-23 09:15:46 elevate] Hello, egenn 

[[ Classification Input Summary ]]
   Training features: 208 x 60 
    Training outcome: 208 x 1 

[2020-06-23 09:15:46 resLearn] Training Random Forest (ranger) on 10 stratified subsamples... 

[[ elevate RANGER ]]
   N repeats = 1 
   N resamples = 10 
   Resampler = strat.sub 
   Mean Balanced Accuracy of 10 test sets in each repeat = 0.83

[2020-06-23 09:15:49 elevate] Run completed in 0.04 minutes (Real: 2.48; User: 4.25; System: 0.17) 
mod$describe()
Classification was performed using Random Forest (ranger). Model generalizability was assessed using 10 stratified subsamples. The mean Balanced Accuracy across all resamples was 0.83.
mod$plot()

mod$plotROC()

mod$plotPR()
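
As with regression, the cross-validated estimate can be reproduced conceptually by looping over the resamples and calling ranger directly. This is a sketch only: it uses the res resamples created earlier rather than the ones elevate() drew, and it skips the inverse probability class weighting rtemis applies, so expect a result in the neighborhood of, not identical to, the 0.83 above.

library(ranger)
ba <- sapply(1:10, function(i) {
  idx <- res[[i]]                                         # training rows for resample i
  fit <- ranger(Class ~ ., data = Sonar[idx, ], num.trees = 1000)
  pred <- predict(fit, data = Sonar[-idx, ])$predictions  # predicted classes
  cm <- table(Estimated = pred, Reference = Sonar$Class[-idx])
  mean(c(cm["M", "M"] / sum(cm[, "M"]),                   # Sensitivity (recall for M)
         cm["R", "R"] / sum(cm[, "R"])))                  # Specificity (recall for R)
})
mean(ba)                                                  # mean Balanced Accuracy across resamples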