## Chapter VI - Linear Model Selection and Regularization

All the questions are as per the ISL seventh printing of the First edition1.

### Common

Instead of using the standard functions, we will leverage the mlr3 package2.

1#install.packages("mlr3","data.table","mlr3viz","mlr3learners")


Actually for R version 3.6.2, the steps to get it working were a bit more involved.

Load ISLR and other libraries.

1libsUsed<-c("dplyr","ggplot2","tidyverse",
2            "ISLR","caret","MASS", "gridExtra",
3            "pls","latex2exp","data.table")
4invisible(lapply(libsUsed, library, character.only = TRUE))


## Question 6.8 - Page 262

In this exercise, we will generate simulated data, and will then use this data to perform best subset selection.

(a) Use the =rnorm()=function to generate a predictor $$X$$ of length $$n = 100$$, as well as a noise vector $$\eta$$ of length $$n = 100$$.

(b) Generate a response vector $$Y$$ of length $$n = 100$$ according to the model $Y = \beta_0 + \beta_1X + \beta2X^2 + \beta_3X^3 + \eta$, where $$\beta_{0}$$ , $$\beta_{1}$$, $$\beta_{2}$$, and $$\beta_{3}$$ are constants of your choice.

(c) Use the regsubsets() function to perform best subset selection in order to choose the best model containing the predictors $$X$$, $$X^{2}$$, …, $$X^{10}$$. What is the best model obtained according to $$C_p$$ , BIC, and adjusted $$R^2$$ ? Show some plots to provide evidence for your answer, and report the coefficients of the best model obtained. Note you will need to use the data.frame() function to create a single data set containing both $$X$$ and $$Y$$.

(d) Repeat (c), using forward stepwise selection and also using backwards stepwise selection. How does your answer compare to the results in (c)?

(e) Now fit a lasso model to the simulated data, again using $$X$$, $$X^{2}$$, …, $$X^{10}$$ as predictors. Use cross-validation to select the optimal value of $$\lambda$$. Create plots of the cross-validation error as a function of $$\lambda$$. Report the resulting coefficient estimates, and discuss the results obtained.

(f) Now generate a response vector Y according to the model $Y = \beta_{0} + \beta_{7}X^{7} + \eta,$ and perform best subset selection and the lasso. Discuss the results obtained.

#### a) Generate model

1set.seed(1984)
2x<-rnorm(100)
3noise<-rnorm(100)


#### b) Response vector

1beta=c(43,5,3,6)
2y<-beta[1] + beta[2]*x + beta[3]*x^2 + beta[4]*x^3 + noise
3qplot(x,y)


#### c) Subset selection

Since the question requires it, we will be using the leaps libraries.

1library(leaps)
2df<-data.frame(y=y,x=x)
3sets=regsubsets(y~poly(x,10,raw=T),data=df,nvmax=10)
4sets %>% summary

 1## Subset selection object
2## Call: regsubsets.formula(y ~ poly(x, 10, raw = T), data = df, nvmax = 10)
3## 10 Variables  (and intercept)
4##                        Forced in Forced out
5## poly(x, 10, raw = T)1      FALSE      FALSE
6## poly(x, 10, raw = T)2      FALSE      FALSE
7## poly(x, 10, raw = T)3      FALSE      FALSE
8## poly(x, 10, raw = T)4      FALSE      FALSE
9## poly(x, 10, raw = T)5      FALSE      FALSE
10## poly(x, 10, raw = T)6      FALSE      FALSE
11## poly(x, 10, raw = T)7      FALSE      FALSE
12## poly(x, 10, raw = T)8      FALSE      FALSE
13## poly(x, 10, raw = T)9      FALSE      FALSE
14## poly(x, 10, raw = T)10     FALSE      FALSE
15## 1 subsets of each size up to 10
16## Selection Algorithm: exhaustive
17##           poly(x, 10, raw = T)1 poly(x, 10, raw = T)2 poly(x, 10, raw = T)3
18## 1  ( 1 )  " "                   " "                   "*"
19## 2  ( 1 )  " "                   "*"                   "*"
20## 3  ( 1 )  "*"                   "*"                   "*"
21## 4  ( 1 )  "*"                   "*"                   "*"
22## 5  ( 1 )  "*"                   "*"                   "*"
23## 6  ( 1 )  "*"                   " "                   "*"
24## 7  ( 1 )  "*"                   "*"                   "*"
25## 8  ( 1 )  "*"                   "*"                   "*"
26## 9  ( 1 )  "*"                   "*"                   "*"
27## 10  ( 1 ) "*"                   "*"                   "*"
28##           poly(x, 10, raw = T)4 poly(x, 10, raw = T)5 poly(x, 10, raw = T)6
29## 1  ( 1 )  " "                   " "                   " "
30## 2  ( 1 )  " "                   " "                   " "
31## 3  ( 1 )  " "                   " "                   " "
32## 4  ( 1 )  "*"                   " "                   " "
33## 5  ( 1 )  " "                   " "                   " "
34## 6  ( 1 )  "*"                   " "                   "*"
35## 7  ( 1 )  "*"                   " "                   "*"
36## 8  ( 1 )  "*"                   "*"                   "*"
37## 9  ( 1 )  "*"                   " "                   "*"
38## 10  ( 1 ) "*"                   "*"                   "*"
39##           poly(x, 10, raw = T)7 poly(x, 10, raw = T)8 poly(x, 10, raw = T)9
40## 1  ( 1 )  " "                   " "                   " "
41## 2  ( 1 )  " "                   " "                   " "
42## 3  ( 1 )  " "                   " "                   " "
43## 4  ( 1 )  " "                   " "                   " "
44## 5  ( 1 )  " "                   " "                   "*"
45## 6  ( 1 )  " "                   "*"                   " "
46## 7  ( 1 )  " "                   "*"                   " "
47## 8  ( 1 )  " "                   "*"                   " "
48## 9  ( 1 )  "*"                   "*"                   "*"
49## 10  ( 1 ) "*"                   "*"                   "*"
50##           poly(x, 10, raw = T)10
51## 1  ( 1 )  " "
52## 2  ( 1 )  " "
53## 3  ( 1 )  " "
54## 4  ( 1 )  " "
55## 5  ( 1 )  "*"
56## 6  ( 1 )  "*"
57## 7  ( 1 )  "*"
58## 8  ( 1 )  "*"
59## 9  ( 1 )  "*"
60## 10  ( 1 ) "*"


We also want the best parameters.

1summarySet<-summary(sets)
2which.min(summarySet$cp) %>% print  1## [1] 3  1which.min(summarySet$bic) %>% print

1## [1] 3

1which.max(summarySet$adjr2) %>% print  1## [1] 7  We might want to see this as a plot. 1plot(summarySet$cp, xlab = "Subset Size", ylab = "Cp", pch = 20, type = "l")
2points(3,summarySet$cp[3],pch=4,col='red',lwd=7)  1plot(summarySet$bic, xlab = "Subset Size", ylab = "BIC", pch = 20, type = "l")
2points(3,summarySet$bic[3],pch=4,col='red',lwd=7)  1plot(summarySet$adjr2, xlab = "Subset Size", ylab = "Adjusted R2", pch = 20, type = "l")
2points(3,summarySet$adjr2[3],pch=4,col='red',lwd=7)  Lets check the coefficients. 1coefficients(sets,id=3) %>% print  1## (Intercept) poly(x, 10, raw = T)1 poly(x, 10, raw = T)2 2## 42.895657 5.108094 3.034408 3## poly(x, 10, raw = T)3 4## 5.989367  1beta %>% print  1## [1] 43 5 3 6  We see that we actually have a pretty good set of coefficients. #### d) Forward and backward stepwise models 1modelX<-poly(x,10,raw=T) 2forwardFit<-regsubsets(y~modelX,data=df,nvmax=10,method="forward") 3forwardFit %>% summary %>% print   1## Subset selection object 2## Call: regsubsets.formula(y ~ modelX, data = df, nvmax = 10, method = "forward") 3## 10 Variables (and intercept) 4## Forced in Forced out 5## modelX1 FALSE FALSE 6## modelX2 FALSE FALSE 7## modelX3 FALSE FALSE 8## modelX4 FALSE FALSE 9## modelX5 FALSE FALSE 10## modelX6 FALSE FALSE 11## modelX7 FALSE FALSE 12## modelX8 FALSE FALSE 13## modelX9 FALSE FALSE 14## modelX10 FALSE FALSE 15## 1 subsets of each size up to 10 16## Selection Algorithm: forward 17## modelX1 modelX2 modelX3 modelX4 modelX5 modelX6 modelX7 modelX8 18## 1 ( 1 ) " " " " "*" " " " " " " " " " " 19## 2 ( 1 ) " " "*" "*" " " " " " " " " " " 20## 3 ( 1 ) "*" "*" "*" " " " " " " " " " " 21## 4 ( 1 ) "*" "*" "*" "*" " " " " " " " " 22## 5 ( 1 ) "*" "*" "*" "*" "*" " " " " " " 23## 6 ( 1 ) "*" "*" "*" "*" "*" " " " " " " 24## 7 ( 1 ) "*" "*" "*" "*" "*" " " "*" " " 25## 8 ( 1 ) "*" "*" "*" "*" "*" " " "*" "*" 26## 9 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*" 27## 10 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*" 28## modelX9 modelX10 29## 1 ( 1 ) " " " " 30## 2 ( 1 ) " " " " 31## 3 ( 1 ) " " " " 32## 4 ( 1 ) " " " " 33## 5 ( 1 ) " " " " 34## 6 ( 1 ) " " "*" 35## 7 ( 1 ) " " "*" 36## 8 ( 1 ) " " "*" 37## 9 ( 1 ) " " "*" 38## 10 ( 1 ) "*" "*"  We might want to take a look at these. 1par(mfrow=c(2,2)) 2plot(forwardFit) 3plot(forwardFit,scale='Cp') 4plot(forwardFit,scale='r2') 5plot(forwardFit,scale='adjr2')  I find these not as fun to look at, so we will do better.  1plotLEAP=function(leapObj){ 2 par(mfrow = c(2,2)) 3 bar2=which.max(leapObj$adjr2)
4  bbic=which.min(leapObj$bic) 5 bcp=which.min(leapObj$cp)
6  plot(leapObj$rss,xlab="Number of variables",ylab="RSS",type="b") 7 plot(leapObj$adjr2,xlab="Number of variables",ylab=TeX("Adjusted R^2"),type="b")
8  points(bar2,leapObj$adjr2[bar2],col="green",cex=2,pch=20) 9 plot(leapObj$bic,xlab="Number of variables",ylab=TeX("BIC"),type="b")
10  points(bbic,leapObj$bic[bbic],col="blue",cex=2,pch=20) 11 plot(leapObj$cp,xlab="Number of variables",ylab=TeX("C_p"),type="b")
12  points(bcp,leapObj$cp[bcp],col="red",cex=2,pch=20) 13}  1plotLEAP(forwardFit %>% summary)  Lets check the backward selection as well. 1modelX<-poly(x,10,raw=T) 2backwardFit<-regsubsets(y~modelX,data=df,nvmax=10,method="backward") 3backwardFit %>% summary %>% print   1## Subset selection object 2## Call: regsubsets.formula(y ~ modelX, data = df, nvmax = 10, method = "backward") 3## 10 Variables (and intercept) 4## Forced in Forced out 5## modelX1 FALSE FALSE 6## modelX2 FALSE FALSE 7## modelX3 FALSE FALSE 8## modelX4 FALSE FALSE 9## modelX5 FALSE FALSE 10## modelX6 FALSE FALSE 11## modelX7 FALSE FALSE 12## modelX8 FALSE FALSE 13## modelX9 FALSE FALSE 14## modelX10 FALSE FALSE 15## 1 subsets of each size up to 10 16## Selection Algorithm: backward 17## modelX1 modelX2 modelX3 modelX4 modelX5 modelX6 modelX7 modelX8 18## 1 ( 1 ) " " " " "*" " " " " " " " " " " 19## 2 ( 1 ) "*" " " "*" " " " " " " " " " " 20## 3 ( 1 ) "*" " " "*" "*" " " " " " " " " 21## 4 ( 1 ) "*" " " "*" "*" " " "*" " " " " 22## 5 ( 1 ) "*" " " "*" "*" " " "*" " " "*" 23## 6 ( 1 ) "*" " " "*" "*" " " "*" " " "*" 24## 7 ( 1 ) "*" "*" "*" "*" " " "*" " " "*" 25## 8 ( 1 ) "*" "*" "*" "*" " " "*" "*" "*" 26## 9 ( 1 ) "*" "*" "*" "*" " " "*" "*" "*" 27## 10 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*" 28## modelX9 modelX10 29## 1 ( 1 ) " " " " 30## 2 ( 1 ) " " " " 31## 3 ( 1 ) " " " " 32## 4 ( 1 ) " " " " 33## 5 ( 1 ) " " " " 34## 6 ( 1 ) " " "*" 35## 7 ( 1 ) " " "*" 36## 8 ( 1 ) " " "*" 37## 9 ( 1 ) "*" "*" 38## 10 ( 1 ) "*" "*"  We might want to take a look at these. 1par(mfrow=c(2,2)) 2plot(backwardFit) 3plot(backwardFit,scale='Cp') 4plot(backwardFit,scale='r2') 5plot(backwardFit,scale='adjr2')  1plotLEAP(backwardFit %>% summary)  In spite of some slight variations, overall all methods converge to the same best set of parameters, that of the third model. #### e) LASSO and Cross Validation For this, instead of using glmnet directly, we will use caret. 1df<-df %>% mutate(x2=x^2,x3=x^3, 2 x4=x^4,x5=x^5, 3 x6=x^6,x7=x^7, 4 x8=x^8,x9=x^9, 5 x10=x^10)  1lambda<-10^seq(-3, 3, length = 100) 2lassoCaret= train(y~.,data=df,method="glmnet",tuneGrid=expand.grid(alpha=1,lambda=lambda))  1## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, : 2## There were missing values in resampled performance measures.  1lassoCaret %>% print   1## glmnet 2## 3## 100 samples 4## 10 predictor 5## 6## No pre-processing 7## Resampling: Bootstrapped (25 reps) 8## Summary of sample sizes: 100, 100, 100, 100, 100, 100, ... 9## Resampling results across tuning parameters: 10## 11## lambda RMSE Rsquared MAE 12## 1.000000e-03 1.009696 0.9965632 0.8051425 13## 1.149757e-03 1.009696 0.9965632 0.8051425 14## 1.321941e-03 1.009696 0.9965632 0.8051425 15## 1.519911e-03 1.009696 0.9965632 0.8051425 16## 1.747528e-03 1.009696 0.9965632 0.8051425 17## 2.009233e-03 1.009696 0.9965632 0.8051425 18## 2.310130e-03 1.009696 0.9965632 0.8051425 19## 2.656088e-03 1.009696 0.9965632 0.8051425 20## 3.053856e-03 1.009696 0.9965632 0.8051425 21## 3.511192e-03 1.009696 0.9965632 0.8051425 22## 4.037017e-03 1.009696 0.9965632 0.8051425 23## 4.641589e-03 1.009696 0.9965632 0.8051425 24## 5.336699e-03 1.009696 0.9965632 0.8051425 25## 6.135907e-03 1.009696 0.9965632 0.8051425 26## 7.054802e-03 1.009696 0.9965632 0.8051425 27## 8.111308e-03 1.009696 0.9965632 0.8051425 28## 9.326033e-03 1.009696 0.9965632 0.8051425 29## 1.072267e-02 1.009696 0.9965632 0.8051425 30## 1.232847e-02 1.009696 0.9965632 0.8051425 31## 1.417474e-02 1.009696 0.9965632 0.8051425 32## 1.629751e-02 1.009696 0.9965632 0.8051425 33## 1.873817e-02 1.009696 0.9965632 0.8051425 34## 2.154435e-02 1.009696 0.9965632 0.8051425 35## 2.477076e-02 1.009696 0.9965632 0.8051425 36## 2.848036e-02 1.009696 0.9965632 0.8051425 37## 3.274549e-02 1.009696 0.9965632 0.8051425 38## 3.764936e-02 1.009696 0.9965632 0.8051425 39## 4.328761e-02 1.009696 0.9965632 0.8051425 40## 4.977024e-02 1.009696 0.9965632 0.8051425 41## 5.722368e-02 1.009696 0.9965632 0.8051425 42## 6.579332e-02 1.009696 0.9965632 0.8051425 43## 7.564633e-02 1.009637 0.9965632 0.8050666 44## 8.697490e-02 1.009216 0.9965637 0.8047862 45## 1.000000e-01 1.008901 0.9965636 0.8046468 46## 1.149757e-01 1.009470 0.9965616 0.8054790 47## 1.321941e-01 1.011206 0.9965561 0.8074253 48## 1.519911e-01 1.014475 0.9965476 0.8104930 49## 1.747528e-01 1.019202 0.9965383 0.8147296 50## 2.009233e-01 1.025943 0.9965259 0.8203974 51## 2.310130e-01 1.035374 0.9965094 0.8284187 52## 2.656088e-01 1.048294 0.9964878 0.8393282 53## 3.053856e-01 1.065717 0.9964592 0.8530952 54## 3.511192e-01 1.088903 0.9964215 0.8701072 55## 4.037017e-01 1.119433 0.9963715 0.8918217 56## 4.641589e-01 1.158919 0.9963053 0.9193677 57## 5.336699e-01 1.209841 0.9962136 0.9532842 58## 6.135907e-01 1.275467 0.9960778 0.9957151 59## 7.054802e-01 1.357247 0.9958966 1.0471169 60## 8.111308e-01 1.457886 0.9956561 1.1087362 61## 9.326033e-01 1.580743 0.9953362 1.1818188 62## 1.072267e+00 1.729330 0.9949070 1.2696235 63## 1.232847e+00 1.907599 0.9943306 1.3758463 64## 1.417474e+00 2.120178 0.9935518 1.5059031 65## 1.629751e+00 2.369642 0.9924954 1.6673393 66## 1.873817e+00 2.662906 0.9910539 1.8621728 67## 2.154435e+00 3.007271 0.9890638 2.0978907 68## 2.477076e+00 3.409377 0.9863097 2.3788439 69## 2.848036e+00 3.864727 0.9825900 2.7053428 70## 3.274549e+00 4.350785 0.9778541 3.0659309 71## 3.764936e+00 4.847045 0.9724311 3.4403210 72## 4.328761e+00 5.369017 0.9668240 3.8351441 73## 4.977024e+00 5.919492 0.9626812 4.2512694 74## 5.722368e+00 6.562134 0.9580843 4.7389049 75## 6.579332e+00 7.307112 0.9534537 5.2945905 76## 7.564633e+00 8.132296 0.9500300 5.8774541 77## 8.697490e+00 9.067321 0.9486589 6.4760997 78## 1.000000e+01 10.167822 0.9483195 7.1226569 79## 1.149757e+01 11.473284 0.9482975 7.8556639 80## 1.321941e+01 13.002703 0.9482975 8.6990451 81## 1.519911e+01 14.727852 0.9454119 9.6414650 82## 1.747528e+01 16.325210 0.9426796 10.5303097 83## 2.009233e+01 17.740599 0.9357286 11.3560865 84## 2.310130e+01 18.585795 0.9227167 11.8799668 85## 2.656088e+01 18.939596 0.9080584 12.1336575 86## 3.053856e+01 19.123568 0.9109065 12.2733471 87## 3.511192e+01 19.197966 NaN 12.3308613 88## 4.037017e+01 19.197966 NaN 12.3308613 89## 4.641589e+01 19.197966 NaN 12.3308613 90## 5.336699e+01 19.197966 NaN 12.3308613 91## 6.135907e+01 19.197966 NaN 12.3308613 92## 7.054802e+01 19.197966 NaN 12.3308613 93## 8.111308e+01 19.197966 NaN 12.3308613 94## 9.326033e+01 19.197966 NaN 12.3308613 95## 1.072267e+02 19.197966 NaN 12.3308613 96## 1.232847e+02 19.197966 NaN 12.3308613 97## 1.417474e+02 19.197966 NaN 12.3308613 98## 1.629751e+02 19.197966 NaN 12.3308613 99## 1.873817e+02 19.197966 NaN 12.3308613 100## 2.154435e+02 19.197966 NaN 12.3308613 101## 2.477076e+02 19.197966 NaN 12.3308613 102## 2.848036e+02 19.197966 NaN 12.3308613 103## 3.274549e+02 19.197966 NaN 12.3308613 104## 3.764936e+02 19.197966 NaN 12.3308613 105## 4.328761e+02 19.197966 NaN 12.3308613 106## 4.977024e+02 19.197966 NaN 12.3308613 107## 5.722368e+02 19.197966 NaN 12.3308613 108## 6.579332e+02 19.197966 NaN 12.3308613 109## 7.564633e+02 19.197966 NaN 12.3308613 110## 8.697490e+02 19.197966 NaN 12.3308613 111## 1.000000e+03 19.197966 NaN 12.3308613 112## 113## Tuning parameter 'alpha' was held constant at a value of 1 114## RMSE was used to select the optimal model using the smallest value. 115## The final values used for the model were alpha = 1 and lambda = 0.1.  1lassoCaret %>% ggplot  1lassoCaret %>% varImp %>% ggplot  1library(glmnet)  1## Loading required package: Matrix  1## 2## Attaching package: 'Matrix'  1## The following objects are masked from 'package:tidyr': 2## 3## expand, pack, unpack  1## Loaded glmnet 3.0-2  1library(boot)  1## 2## Attaching package: 'boot'  1## The following object is masked from 'package:lattice': 2## 3## melanoma  1lasso.mod <- cv.glmnet(as.matrix(df[-1]), y, alpha=1) 2lambda <- lasso.mod$lambda.min
3plot(lasso.mod)

1predict(lasso.mod, s=lambda, type="coefficients")

 1## 11 x 1 sparse Matrix of class "dgCMatrix"
2##                     1
3## (Intercept) 42.975240
4## x            5.005023
5## x2           2.947540
6## x3           5.989105
7## x4           .
8## x5           .
9## x6           .
10## x7           .
11## x8           .
12## x9           .
13## x10          .


Clearly, the only important variables are $$x$$, $$x^2$$ and $$x^3$$.

#### f) New model

Our new model requires a newly expanded set of betas as well.

1y2<-beta[1]+23*x^7+noise

1modelX<-poly(x,10,raw=T)
2newDF<-data.frame(x=as.matrix(modelX),y=y2)
3newSub<-regsubsets(y2~.,data=newDF,nvmax=10)
4newSub %>% summary

 1## Subset selection object
2## Call: regsubsets.formula(y2 ~ ., data = newDF, nvmax = 10)
3## 11 Variables  (and intercept)
4##      Forced in Forced out
5## x.1      FALSE      FALSE
6## x.2      FALSE      FALSE
7## x.3      FALSE      FALSE
8## x.4      FALSE      FALSE
9## x.5      FALSE      FALSE
10## x.6      FALSE      FALSE
11## x.7      FALSE      FALSE
12## x.8      FALSE      FALSE
13## x.9      FALSE      FALSE
14## x.10     FALSE      FALSE
15## y        FALSE      FALSE
16## 1 subsets of each size up to 10
17## Selection Algorithm: exhaustive
18##           x.1 x.2 x.3 x.4 x.5 x.6 x.7 x.8 x.9 x.10 y
19## 1  ( 1 )  " " " " " " " " " " " " " " " " " " " "  "*"
20## 2  ( 1 )  " " " " " " " " "*" " " " " " " " " " "  "*"
21## 3  ( 1 )  "*" "*" " " " " " " " " " " " " " " " "  "*"
22## 4  ( 1 )  "*" "*" "*" " " " " " " " " " " " " " "  "*"
23## 5  ( 1 )  "*" "*" "*" " " " " " " " " " " " " "*"  "*"
24## 6  ( 1 )  "*" "*" "*" "*" "*" " " " " " " " " " "  "*"
25## 7  ( 1 )  "*" "*" "*" "*" "*" " " "*" " " " " " "  "*"
26## 8  ( 1 )  "*" "*" "*" "*" "*" "*" "*" " " " " " "  "*"
27## 9  ( 1 )  "*" "*" "*" "*" "*" "*" "*" " " "*" " "  "*"
28## 10  ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*" " " "*"  "*"

1plotLEAP(newSub %>% summary)


Or in its more native look,

1par(mfrow=c(2,2))
2plot(newSub)
3plot(newSub,scale='Cp')
4plot(newSub,scale='r2')

1library(glmnet)
2library(boot)
3lasso.mod2 <- cv.glmnet(as.matrix(newDF[-1]), y, alpha=1)
4lambda2 <- lasso.mod2$lambda.min 5plot(lasso.mod2)  1predict(lasso.mod2, s=lambda, type="coefficients")   1## 11 x 1 sparse Matrix of class "dgCMatrix" 2## 1 3## (Intercept) 42.67982691 4## x.2 3.22521396 5## x.3 8.56699146 6## x.4 . 7## x.5 -0.10229572 8## x.6 . 9## x.7 -0.03184905 10## x.8 . 11## x.9 . 12## x.10 . 13## y .  1lambda<-10^seq(-3, 3, length = 100) 2lassocaret2= train(y~.,data=newDF,method="glmnet",tuneGrid=expand.grid(alpha=1,lambda=lambda))  1## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, : 2## There were missing values in resampled performance measures.  1lassocaret2 %>% print   1## glmnet 2## 3## 100 samples 4## 10 predictor 5## 6## No pre-processing 7## Resampling: Bootstrapped (25 reps) 8## Summary of sample sizes: 100, 100, 100, 100, 100, 100, ... 9## Resampling results across tuning parameters: 10## 11## lambda RMSE Rsquared MAE 12## 1.000000e-03 40.03231 0.9999955 14.48774 13## 1.149757e-03 40.03231 0.9999955 14.48774 14## 1.321941e-03 40.03231 0.9999955 14.48774 15## 1.519911e-03 40.03231 0.9999955 14.48774 16## 1.747528e-03 40.03231 0.9999955 14.48774 17## 2.009233e-03 40.03231 0.9999955 14.48774 18## 2.310130e-03 40.03231 0.9999955 14.48774 19## 2.656088e-03 40.03231 0.9999955 14.48774 20## 3.053856e-03 40.03231 0.9999955 14.48774 21## 3.511192e-03 40.03231 0.9999955 14.48774 22## 4.037017e-03 40.03231 0.9999955 14.48774 23## 4.641589e-03 40.03231 0.9999955 14.48774 24## 5.336699e-03 40.03231 0.9999955 14.48774 25## 6.135907e-03 40.03231 0.9999955 14.48774 26## 7.054802e-03 40.03231 0.9999955 14.48774 27## 8.111308e-03 40.03231 0.9999955 14.48774 28## 9.326033e-03 40.03231 0.9999955 14.48774 29## 1.072267e-02 40.03231 0.9999955 14.48774 30## 1.232847e-02 40.03231 0.9999955 14.48774 31## 1.417474e-02 40.03231 0.9999955 14.48774 32## 1.629751e-02 40.03231 0.9999955 14.48774 33## 1.873817e-02 40.03231 0.9999955 14.48774 34## 2.154435e-02 40.03231 0.9999955 14.48774 35## 2.477076e-02 40.03231 0.9999955 14.48774 36## 2.848036e-02 40.03231 0.9999955 14.48774 37## 3.274549e-02 40.03231 0.9999955 14.48774 38## 3.764936e-02 40.03231 0.9999955 14.48774 39## 4.328761e-02 40.03231 0.9999955 14.48774 40## 4.977024e-02 40.03231 0.9999955 14.48774 41## 5.722368e-02 40.03231 0.9999955 14.48774 42## 6.579332e-02 40.03231 0.9999955 14.48774 43## 7.564633e-02 40.03231 0.9999955 14.48774 44## 8.697490e-02 40.03231 0.9999955 14.48774 45## 1.000000e-01 40.03231 0.9999955 14.48774 46## 1.149757e-01 40.03231 0.9999955 14.48774 47## 1.321941e-01 40.03231 0.9999955 14.48774 48## 1.519911e-01 40.03231 0.9999955 14.48774 49## 1.747528e-01 40.03231 0.9999955 14.48774 50## 2.009233e-01 40.03231 0.9999955 14.48774 51## 2.310130e-01 40.03231 0.9999955 14.48774 52## 2.656088e-01 40.03231 0.9999955 14.48774 53## 3.053856e-01 40.03231 0.9999955 14.48774 54## 3.511192e-01 40.03231 0.9999955 14.48774 55## 4.037017e-01 40.03231 0.9999955 14.48774 56## 4.641589e-01 40.03231 0.9999955 14.48774 57## 5.336699e-01 40.03231 0.9999955 14.48774 58## 6.135907e-01 40.03231 0.9999955 14.48774 59## 7.054802e-01 40.03231 0.9999955 14.48774 60## 8.111308e-01 40.03231 0.9999955 14.48774 61## 9.326033e-01 40.03231 0.9999955 14.48774 62## 1.072267e+00 40.03231 0.9999955 14.48774 63## 1.232847e+00 40.03231 0.9999955 14.48774 64## 1.417474e+00 40.03231 0.9999955 14.48774 65## 1.629751e+00 40.03231 0.9999955 14.48774 66## 1.873817e+00 40.03231 0.9999955 14.48774 67## 2.154435e+00 40.03231 0.9999955 14.48774 68## 2.477076e+00 40.03231 0.9999955 14.48774 69## 2.848036e+00 40.03231 0.9999955 14.48774 70## 3.274549e+00 40.03231 0.9999955 14.48774 71## 3.764936e+00 40.03231 0.9999955 14.48774 72## 4.328761e+00 40.03231 0.9999955 14.48774 73## 4.977024e+00 40.03231 0.9999955 14.48774 74## 5.722368e+00 40.03231 0.9999955 14.48774 75## 6.579332e+00 40.03231 0.9999955 14.48774 76## 7.564633e+00 40.43005 0.9999955 14.59881 77## 8.697490e+00 41.25214 0.9999955 14.81913 78## 1.000000e+01 42.30446 0.9999955 15.09937 79## 1.149757e+01 43.59429 0.9999955 15.44307 80## 1.321941e+01 45.43633 0.9999955 15.93255 81## 1.519911e+01 47.55425 0.9999955 16.49605 82## 1.747528e+01 49.98935 0.9999955 17.14447 83## 2.009233e+01 52.90533 0.9999955 17.91650 84## 2.310130e+01 57.57589 0.9999955 19.10125 85## 2.656088e+01 63.25484 0.9999955 20.53147 86## 3.053856e+01 70.51580 0.9999955 22.36400 87## 3.511192e+01 78.93391 0.9999955 24.49105 88## 4.037017e+01 88.61274 0.9999955 26.93830 89## 4.641589e+01 99.97831 0.9999955 29.83601 90## 5.336699e+01 113.48225 0.9999955 33.39320 91## 6.135907e+01 129.17536 0.9999955 37.58303 92## 7.054802e+01 147.76452 0.9999957 42.74333 93## 8.111308e+01 169.60027 0.9999961 48.98043 94## 9.326033e+01 194.94266 0.9999965 56.29001 95## 1.072267e+02 224.07631 0.9999969 64.70026 96## 1.232847e+02 257.56092 0.9999971 74.36989 97## 1.417474e+02 296.13382 0.9999971 85.51504 98## 1.629751e+02 340.49129 0.9999971 98.33212 99## 1.873817e+02 391.49185 0.9999971 113.06864 100## 2.154435e+02 450.13031 0.9999971 130.01206 101## 2.477076e+02 509.28329 0.9999970 147.15405 102## 2.848036e+02 564.17558 0.9999969 163.34475 103## 3.274549e+02 618.84080 0.9999969 179.85589 104## 3.764936e+02 681.69265 0.9999969 198.83969 105## 4.328761e+02 741.14452 0.9999967 217.28049 106## 4.977024e+02 807.25385 0.9999967 237.88938 107## 5.722368e+02 883.26360 0.9999967 261.58461 108## 6.579332e+02 970.65640 0.9999967 288.82836 109## 7.564633e+02 1037.84801 0.9999960 312.54099 110## 8.697490e+02 1088.92551 0.9999960 334.04769 111## 1.000000e+03 1131.46176 0.9999955 354.62317 112## 113## Tuning parameter 'alpha' was held constant at a value of 1 114## RMSE was used to select the optimal model using the smallest value. 115## The final values used for the model were alpha = 1 and lambda = 6.579332.  1lassocaret2 %>% ggplot  1lassocaret2 %>% varImp %>% ggplot  Clearly, the LASSO model has correctly reduced the model down to the correct single variable form, though best subset seems to suggest using more predictors, their coefficients are low enough to recognize that they are noise. ## Question 6.9 - Page 263 In this exercise, we will predict the number of applications received using the other variables in the College data set. (a) Split the data set into a training set and a test set. (b) Fit a linear model using least squares on the training set, and report the test error obtained. (c) Fit a ridge regression model on the training set, with $$\lambda$$ chosen by cross-validation. Report the test error obtained. (d) Fit a lasso model on the training set, with $$\lambda$$ chosen by crossvalidation. Report the test error obtained, along with the number of non-zero coefficient estimates. (e) Fit a PCR model on the training set, with $$M$$ chosen by crossvalidation. Report the test error obtained, along with the value of $$M$$ selected by cross-validation. (f) Fit a PLS model on the training set, with M chosen by crossvalidation. Report the test error obtained, along with the value of M selected by cross-validation. (g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches? ### Answer We will use the caret package, since at the moment, mlr3 does not have learners for PCR and PLS. 1colDat<-ISLR::College 2colDat %>% summary %>% print   1## Private Apps Accept Enroll Top10perc 2## No :212 Min. : 81 Min. : 72 Min. : 35 Min. : 1.00 3## Yes:565 1st Qu.: 776 1st Qu.: 604 1st Qu.: 242 1st Qu.:15.00 4## Median : 1558 Median : 1110 Median : 434 Median :23.00 5## Mean : 3002 Mean : 2019 Mean : 780 Mean :27.56 6## 3rd Qu.: 3624 3rd Qu.: 2424 3rd Qu.: 902 3rd Qu.:35.00 7## Max. :48094 Max. :26330 Max. :6392 Max. :96.00 8## Top25perc F.Undergrad P.Undergrad Outstate 9## Min. : 9.0 Min. : 139 Min. : 1.0 Min. : 2340 10## 1st Qu.: 41.0 1st Qu.: 992 1st Qu.: 95.0 1st Qu.: 7320 11## Median : 54.0 Median : 1707 Median : 353.0 Median : 9990 12## Mean : 55.8 Mean : 3700 Mean : 855.3 Mean :10441 13## 3rd Qu.: 69.0 3rd Qu.: 4005 3rd Qu.: 967.0 3rd Qu.:12925 14## Max. :100.0 Max. :31643 Max. :21836.0 Max. :21700 15## Room.Board Books Personal PhD 16## Min. :1780 Min. : 96.0 Min. : 250 Min. : 8.00 17## 1st Qu.:3597 1st Qu.: 470.0 1st Qu.: 850 1st Qu.: 62.00 18## Median :4200 Median : 500.0 Median :1200 Median : 75.00 19## Mean :4358 Mean : 549.4 Mean :1341 Mean : 72.66 20## 3rd Qu.:5050 3rd Qu.: 600.0 3rd Qu.:1700 3rd Qu.: 85.00 21## Max. :8124 Max. :2340.0 Max. :6800 Max. :103.00 22## Terminal S.F.Ratio perc.alumni Expend 23## Min. : 24.0 Min. : 2.50 Min. : 0.00 Min. : 3186 24## 1st Qu.: 71.0 1st Qu.:11.50 1st Qu.:13.00 1st Qu.: 6751 25## Median : 82.0 Median :13.60 Median :21.00 Median : 8377 26## Mean : 79.7 Mean :14.09 Mean :22.74 Mean : 9660 27## 3rd Qu.: 92.0 3rd Qu.:16.50 3rd Qu.:31.00 3rd Qu.:10830 28## Max. :100.0 Max. :39.80 Max. :64.00 Max. :56233 29## Grad.Rate 30## Min. : 10.00 31## 1st Qu.: 53.00 32## Median : 65.00 33## Mean : 65.46 34## 3rd Qu.: 78.00 35## Max. :118.00  1colDat %>% str %>% print   1## 'data.frame': 777 obs. of 18 variables: 2##$ Private    : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
3##  $Apps : num 1660 2186 1428 417 193 ... 4##$ Accept     : num  1232 1924 1097 349 146 ...
5##  $Enroll : num 721 512 336 137 55 158 103 489 227 172 ... 6##$ Top10perc  : num  23 16 22 60 16 38 17 37 30 21 ...
7##  $Top25perc : num 52 29 50 89 44 62 45 68 63 44 ... 8##$ F.Undergrad: num  2885 2683 1036 510 249 ...
9##  $P.Undergrad: num 537 1227 99 63 869 ... 10##$ Outstate   : num  7440 12280 11250 12960 7560 ...
11##  $Room.Board : num 3300 6450 3750 5450 4120 ... 12##$ Books      : num  450 750 400 450 800 500 500 450 300 660 ...
13##  $Personal : num 2200 1500 1165 875 1500 ... 14##$ PhD        : num  70 29 53 92 76 67 90 89 79 40 ...
15##  $Terminal : num 78 30 66 97 72 73 93 100 84 41 ... 16##$ S.F.Ratio  : num  18.1 12.2 12.9 7.7 11.9 9.4 11.5 13.7 11.3 11.5 ...
17##  $perc.alumni: num 12 16 30 37 2 11 26 37 23 15 ... 18##$ Expend     : num  7041 10527 8735 19016 10922 ...
19##  $Grad.Rate : num 60 56 54 59 15 55 63 73 80 52 ... 20## NULL  1colDat %>% sapply(unique) %>% sapply(length) %>% print  1## Private Apps Accept Enroll Top10perc Top25perc 2## 2 711 693 581 82 89 3## F.Undergrad P.Undergrad Outstate Room.Board Books Personal 4## 714 566 640 553 122 294 5## PhD Terminal S.F.Ratio perc.alumni Expend Grad.Rate 6## 78 65 173 61 744 81  Clearly, there are no psuedo-factors which might have been converted at this stage. #### a) Train-Test split 1train_ind<-createDataPartition(colDat$Apps,p=0.8,times=1,list=FALSE)
2train_set<-colDat[train_ind,]
3test_set<-colDat[-train_ind,]


#### b) Linear least squares

1linCol<-train(Apps~.,data=train_set,method="lm")
2linCol %>% summary

 1##
2## Call:
3## lm(formula = .outcome ~ ., data = dat)
4##
5## Residuals:
6##     Min      1Q  Median      3Q     Max
7## -5145.6  -414.8   -20.3   340.5  7526.8
8##
9## Coefficients:
10##               Estimate Std. Error t value Pr(>|t|)
11## (Intercept) -2.918e+02  4.506e+02  -0.648 0.517486
12## PrivateYes  -5.351e+02  1.532e+02  -3.494 0.000511 ***
13## Accept       1.617e+00  4.258e-02  37.983  < 2e-16 ***
14## Enroll      -1.012e+00  1.959e-01  -5.165 3.26e-07 ***
15## Top10perc    5.379e+01  6.221e+00   8.647  < 2e-16 ***
16## Top25perc   -1.632e+01  5.046e+00  -3.235 0.001282 **
17## F.Undergrad  6.836e-02  3.457e-02   1.978 0.048410 *
18## P.Undergrad  7.929e-02  3.367e-02   2.355 0.018854 *
19## Outstate    -7.303e-02  2.098e-02  -3.481 0.000536 ***
20## Room.Board   1.695e-01  5.367e-02   3.159 0.001663 **
21## Books        9.998e-02  2.578e-01   0.388 0.698328
22## Personal    -3.145e-03  6.880e-02  -0.046 0.963553
23## PhD         -8.926e+00  5.041e+00  -1.771 0.077112 .
24## Terminal    -2.298e+00  5.608e+00  -0.410 0.682152
25## S.F.Ratio    6.038e+00  1.420e+01   0.425 0.670757
26## perc.alumni -5.085e-01  4.560e+00  -0.112 0.911249
27## Expend       4.668e-02  1.332e-02   3.505 0.000490 ***
28## Grad.Rate    9.042e+00  3.379e+00   2.676 0.007653 **
29## ---
30## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
31##
32## Residual standard error: 1042 on 606 degrees of freedom
33## Multiple R-squared:  0.9332, Adjusted R-squared:  0.9313
34## F-statistic: 497.7 on 17 and 606 DF,  p-value: < 2.2e-16

1linPred<-predict(linCol,test_set)
2linPred %>% postResample(obs = test_set$Apps)  1## RMSE Rsquared MAE 2## 1071.6360025 0.9017032 625.7827996  Do note that the metrics are calculated in a manner to ensure no negative values are obtained. #### c) Ridge regression with CV for λ 1L2Grid <- expand.grid(alpha=0, 2 lambda=10^seq(from=-3,to=30,length=100))  1ridgCol<-train(Apps~.,data=train_set,method="glmnet",tuneGrid = L2Grid)  1## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, : 2## There were missing values in resampled performance measures.  1ridgCol %>% summary %>% print   1## Length Class Mode 2## a0 100 -none- numeric 3## beta 1700 dgCMatrix S4 4## df 100 -none- numeric 5## dim 2 -none- numeric 6## lambda 100 -none- numeric 7## dev.ratio 100 -none- numeric 8## nulldev 1 -none- numeric 9## npasses 1 -none- numeric 10## jerr 1 -none- numeric 11## offset 1 -none- logical 12## call 5 -none- call 13## nobs 1 -none- numeric 14## lambdaOpt 1 -none- numeric 15## xNames 17 -none- character 16## problemType 1 -none- character 17## tuneValue 2 data.frame list 18## obsLevels 1 -none- logical 19## param 0 -none- list  1coef(ridgCol$finalModel, ridgCol$bestTune$lambda) %>% print

 1## 18 x 1 sparse Matrix of class "dgCMatrix"
2##                         1
3## (Intercept) -1.407775e+03
4## PrivateYes  -5.854245e+02
5## Accept       1.042778e+00
6## Enroll       3.511219e-01
7## Top10perc    2.780211e+01
8## Top25perc    2.883536e-02
11## Outstate    -2.011504e-02
12## Room.Board   2.155224e-01
13## Books        1.517585e-01
14## Personal    -3.711406e-02
15## PhD         -4.453155e+00
16## Terminal    -3.783231e+00
17## S.F.Ratio    6.897360e+00
18## perc.alumni -9.301831e+00
19## Expend       5.601144e-02

1ggplot(ridgCol)

1ridgPred<-predict(ridgCol,test_set)
2ridgPred %>% postResample(obs = test_set$Apps)  1## RMSE Rsquared MAE 2## 1047.7545250 0.9051726 644.4535063  #### d) LASSO with CV for λ 1L1Grid <- expand.grid(alpha=1, # for lasso 2 lambda=10^seq(from=-3,to=30,length=100))  1lassoCol<-train(Apps~.,data=train_set,method="glmnet",tuneGrid = L1Grid)  1## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, : 2## There were missing values in resampled performance measures.  1lassoCol %>% summary %>% print   1## Length Class Mode 2## a0 81 -none- numeric 3## beta 1377 dgCMatrix S4 4## df 81 -none- numeric 5## dim 2 -none- numeric 6## lambda 81 -none- numeric 7## dev.ratio 81 -none- numeric 8## nulldev 1 -none- numeric 9## npasses 1 -none- numeric 10## jerr 1 -none- numeric 11## offset 1 -none- logical 12## call 5 -none- call 13## nobs 1 -none- numeric 14## lambdaOpt 1 -none- numeric 15## xNames 17 -none- character 16## problemType 1 -none- character 17## tuneValue 2 data.frame list 18## obsLevels 1 -none- logical 19## param 0 -none- list  1coef(lassoCol$finalModel, lassoCol$bestTune$lambda) %>% print

 1## 18 x 1 sparse Matrix of class "dgCMatrix"
2##                         1
3## (Intercept) -325.51554340
4## PrivateYes  -532.28956305
5## Accept         1.60370798
6## Enroll        -0.90158328
7## Top10perc     51.96610325
8## Top25perc    -14.87886847
11## Outstate      -0.07047302
12## Room.Board     0.16783269
13## Books          0.08836704
14## Personal       .
15## PhD           -8.67634519
16## Terminal      -2.18494018
17## S.F.Ratio      5.25050018
18## perc.alumni   -0.67848535
19## Expend         0.04597728

1ggplot(lassoCol)

1lassoPred<-predict(lassoCol,test_set)
2lassoPred %>% postResample(obs = test_set$Apps)  1## RMSE Rsquared MAE 2## 1068.9834769 0.9021268 622.7029418  #### e) PCR with CV for M 1mGrid <- expand.grid(ncomp=seq(from=1,to=20,length=10))  1pcrCol<-train(Apps~.,data=train_set,method="pcr",tuneGrid = mGrid) 2pcrCol %>% summary %>% print   1## Data: X dimension: 624 17 2## Y dimension: 624 1 3## Fit method: svdpc 4## Number of components considered: 17 5## TRAINING: % variance explained 6## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 7## X 48.2314 87.24 95.02 97.26 98.63 99.43 99.91 8## .outcome 0.2419 76.54 77.88 80.19 91.27 91.34 91.34 9## 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 10## X 99.96 100.00 100.00 100.00 100.00 100.00 100.00 11## .outcome 91.65 91.66 92.26 92.65 92.66 92.67 92.76 12## 15 comps 16 comps 17 comps 13## X 100.00 100.00 100.00 14## .outcome 93.17 93.18 93.32 15## NULL  1ggplot(pcrCol)  1pcrPred<-predict(pcrCol,test_set) 2pcrPred %>% postResample(obs = test_set$Apps)

1##         RMSE     Rsquared          MAE
2## 1071.6360025    0.9017032  625.7827996


#### f) PLS with CV for M

1plsCol<-train(Apps~.,data=train_set,method="pls",tuneGrid = mGrid)
2plsCol %>% summary %>% print

 1## Data:    X dimension: 624 17
2##  Y dimension: 624 1
3## Fit method: oscorespls
4## Number of components considered: 17
5## TRAINING: % variance explained
6##           1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps
7## X           39.02     56.4    91.83    96.61    98.62    99.22    99.49
8## .outcome    78.04     84.1    86.88    91.09    91.38    91.49    91.66
9##           8 comps  9 comps  10 comps  11 comps  12 comps  13 comps  14 comps
10## X           99.96    99.99    100.00    100.00    100.00    100.00    100.00
11## .outcome    91.68    91.85     92.64     92.87     93.16     93.18     93.18
12##           15 comps  16 comps  17 comps
13## X           100.00    100.00    100.00
14## .outcome     93.18     93.19     93.32
15## NULL

1ggplot(plsCol)

1plsPred<-predict(plsCol,test_set)
2plsPred %>% postResample(obs = test_set$Apps)  1## RMSE Rsquared MAE 2## 1071.6360039 0.9017032 625.7827987  #### g) Comments and Comparison 1models <- list(ridge = ridgCol, lasso = lassoCol, pcr = pcrCol, pls=plsCol,linear=linCol) 2resamples(models) %>% summary   1## 2## Call: 3## summary.resamples(object = .) 4## 5## Models: ridge, lasso, pcr, pls, linear 6## Number of resamples: 25 7## 8## MAE 9## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's 10## ridge 536.9612 600.5398 623.2005 649.6713 707.4014 793.4972 0 11## lasso 573.8563 616.3883 671.9453 655.8858 691.7620 732.2155 0 12## pcr 576.1427 618.8694 650.0360 662.9040 714.8491 767.5535 0 13## pls 553.3999 607.9757 637.1985 638.6619 668.5120 735.4479 0 14## linear 556.5553 619.2395 654.1478 659.4635 686.7747 792.4912 0 15## 16## RMSE 17## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's 18## ridge 882.2646 920.5934 1000.519 1168.603 1163.377 1939.541 0 19## lasso 801.9415 990.0724 1168.234 1184.329 1302.221 1584.712 0 20## pcr 828.1370 942.2678 1131.207 1144.071 1284.178 1544.078 0 21## pls 786.7989 1038.3265 1167.764 1157.026 1274.041 1461.434 0 22## linear 798.3771 1063.3690 1134.291 1135.977 1215.115 1403.576 0 23## 24## Rsquared 25## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's 26## ridge 0.8735756 0.8962010 0.9185736 0.9136429 0.9306819 0.9474913 0 27## lasso 0.8851991 0.9132766 0.9217660 0.9191638 0.9284838 0.9398772 0 28## pcr 0.8658504 0.9080179 0.9235117 0.9146884 0.9281892 0.9471991 0 29## pls 0.8881249 0.9080786 0.9183968 0.9173632 0.9258994 0.9420894 0 30## linear 0.8840049 0.8986452 0.9222319 0.9160913 0.9296275 0.9492198 0  1resamples(models) %>% bwplot(scales="free")  • Given the tighter spread of PLS, it seems more reliable than PCR • Ridge is just poor in every way • OLS does well, but it also has a worrying outlier • LASSO appears to be doing alright as well We also have kept track of the performance on the test_set We might want to see the variable significance values as well. 1lgp<-linCol %>% varImp %>% ggplot + ggtitle("OLS Variable Importance") 2rgp<-ridgCol %>% varImp %>% ggplot + ggtitle("Ridge Variable Importance") 3lsgp<-lassoCol %>% varImp %>% ggplot + ggtitle("Lasso Variable Importance") 4pcgp<-pcrCol %>% varImp %>% ggplot + ggtitle("PCR Variable Importance") 5plgp<-plsCol %>% varImp %>% ggplot + ggtitle("PLS Variable Importance") 6grid.arrange(lgp,rgp,lsgp,pcgp,plgp,ncol=3)  ## Question 6.10 - Pages 263-264 We have seen that as the number of features used in a model increases, the training error will necessarily decrease, but the test error may not. We will now explore this in a simulated data set. (a) Generate a data set with $$p = 20$$ features, $$n = 1,000$$ observations, and an associated quantitative response vector generated according to the model $Y = X\beta + \eta,$ where $$\beta$$ has some elements that are exactly equal to zero. (b) Split your data set into a training set containing $$100$$ observations and a test set containing $$900$$ observations. (c) Perform best subset selection on the training set, and plot the training set MSE associated with the best model of each size. (d) Plot the test set MSE associated with the best model of each size. (e) For which model size does the test set MSE take on its minimum value? Comment on your results. If it takes on its minimum value for a model containing only an intercept or a model containing all of the features, then play around with the way that you are generating the data in (a) until you come up with a scenario in which the test set MSE is minimized for an intermediate model size. (f) How does the model at which the test set MSE is minimized compare to the true model used to generate the data? Comment on the coefficient values. (g) Create a plot displaying $$\sqrt{\Sum_{j=1}^{p}(\beta_{j}-\hat{\beta}_{j}^{r})^{2}}$$ for a range of values of $$r$$, where $$\hat{\beta}_{j}^{r}$$ is the$j$th coefficient estimate for the best model containing $$r$$ coefficients. Comment on what you observe. How does this compare to the test MSE plot from (d)? ### Answer #### Model creation 1p=20 2n=1000 3noise<-rnorm(n) 4xmat<-matrix(rnorm(n*p),nrow=n,ncol=p) 5beta<-sample(-10:34,20) 6beta[sample(1:20,4)]=0 7myY<-xmat %*% beta + noise 8modelDat<-data.frame(x=as.matrix(xmat),y=myY)  • As always we will want to take a peak 1modelDat %>% str %>% print   1## 'data.frame': 1000 obs. of 21 variables: 2##$ x.1 : num  -0.406 -1.375 0.858 -0.231 -0.601 ...
3##  $x.2 : num -0.129 -0.218 -0.17 0.573 -0.513 ... 4##$ x.3 : num  0.127 -0.224 1.014 0.896 0.159 ...
5##  $x.4 : num 0.499 -0.151 -0.488 -0.959 2.187 ... 6##$ x.5 : num  -0.235 -0.345 -0.773 -0.346 0.773 ...
7##  $x.6 : num 0.26 -0.429 -1.183 -1.159 0.959 ... 8##$ x.7 : num  0.567 1.647 0.149 -0.593 -0.902 ...
9##  $x.8 : num -0.092 0.8391 -1.4835 0.0229 -0.1353 ... 10##$ x.9 : num  -0.998 -1.043 -0.563 -0.377 0.324 ...
11##  $x.10: num -0.4401 -0.195 -0.5139 -0.0156 -0.9543 ... 12##$ x.11: num  -0.147 0.829 0.165 0.101 -0.105 ...
13##  $x.12: num -0.0118 1.02 1.0794 1.3184 -2.2844 ... 14##$ x.13: num  -1.683 0.487 -1.142 -0.744 -0.175 ...
15##  $x.14: num 0.228 -1.031 -2.798 -0.646 0.56 ... 16##$ x.15: num  -0.718 0.508 0.637 -0.556 0.585 ...
17##  $x.16: num -1.6378 0.581 -0.9939 0.0537 -0.5854 ... 18##$ x.17: num  1.758 -0.616 1.377 -0.876 -1.174 ...
19##  $x.18: num -1.438 0.373 1.364 0.399 0.949 ... 20##$ x.19: num  -0.715 -0.731 1.142 0.149 0.916 ...
21##  $x.20: num 2.774 -2.024 1.316 0.138 0.187 ... 22##$ y   : num  77.5 -82.8 -38.9 -79.7 64.9 ...
23## NULL

1modelDat %>% summary %>% print

 1##       x.1                x.2                x.3                x.4
2##  Min.   :-2.79766   Min.   :-3.13281   Min.   :-2.71232   Min.   :-4.29604
3##  1st Qu.:-0.60516   1st Qu.:-0.66759   1st Qu.:-0.60561   1st Qu.:-0.66598
4##  Median : 0.04323   Median : 0.03681   Median : 0.06556   Median : 0.06589
5##  Mean   : 0.06879   Mean   : 0.01004   Mean   : 0.06443   Mean   : 0.02244
6##  3rd Qu.: 0.74049   3rd Qu.: 0.68234   3rd Qu.: 0.70521   3rd Qu.: 0.71174
7##  Max.   : 3.50354   Max.   : 3.47268   Max.   : 3.02817   Max.   : 3.27326
8##       x.5                 x.6                x.7                x.8
9##  Min.   :-3.228376   Min.   :-4.24014   Min.   :-2.98577   Min.   :-3.27770
10##  1st Qu.:-0.698220   1st Qu.:-0.69448   1st Qu.:-0.59092   1st Qu.:-0.52939
11##  Median :-0.058778   Median :-0.01141   Median : 0.01732   Median : 0.05703
12##  Mean   : 0.000126   Mean   :-0.05158   Mean   : 0.04767   Mean   : 0.08231
13##  3rd Qu.: 0.663570   3rd Qu.: 0.64217   3rd Qu.: 0.67438   3rd Qu.: 0.72849
14##  Max.   : 3.036307   Max.   : 3.27572   Max.   : 2.72163   Max.   : 3.33409
15##       x.9                x.10               x.11               x.12
16##  Min.   :-3.08957   Min.   :-3.21268   Min.   :-3.00572   Min.   :-3.72016
17##  1st Qu.:-0.65456   1st Qu.:-0.69401   1st Qu.:-0.68226   1st Qu.:-0.63043
18##  Median :-0.04242   Median :-0.03069   Median :-0.04777   Median : 0.07079
19##  Mean   : 0.02049   Mean   :-0.02400   Mean   :-0.03729   Mean   : 0.03769
20##  3rd Qu.: 0.71209   3rd Qu.: 0.61540   3rd Qu.: 0.64873   3rd Qu.: 0.67155
21##  Max.   : 3.23110   Max.   : 2.76059   Max.   : 2.87306   Max.   : 3.48569
22##       x.13               x.14               x.15                x.16
23##  Min.   :-3.20126   Min.   :-3.55432   Min.   :-2.857575   Min.   :-3.5383
24##  1st Qu.:-0.68535   1st Qu.:-0.66752   1st Qu.:-0.658708   1st Qu.:-0.7813
25##  Median :-0.01329   Median :-0.03302   Median : 0.020581   Median :-0.0740
26##  Mean   : 0.01094   Mean   : 0.02113   Mean   : 0.007976   Mean   :-0.0883
27##  3rd Qu.: 0.64877   3rd Qu.: 0.74919   3rd Qu.: 0.670464   3rd Qu.: 0.5568
28##  Max.   : 2.78973   Max.   : 3.47923   Max.   : 2.891527   Max.   : 3.0938
29##       x.17               x.18               x.19              x.20
30##  Min.   :-3.28570   Min.   :-4.06416   Min.   :-3.0443   Min.   :-4.06307
31##  1st Qu.:-0.72302   1st Qu.:-0.72507   1st Qu.:-0.6684   1st Qu.:-0.70518
32##  Median :-0.02439   Median :-0.04941   Median :-0.0610   Median :-0.07697
33##  Mean   :-0.01459   Mean   :-0.03164   Mean   :-0.0414   Mean   :-0.05302
34##  3rd Qu.: 0.62692   3rd Qu.: 0.68115   3rd Qu.: 0.6381   3rd Qu.: 0.58597
35##  Max.   : 2.86446   Max.   : 3.32958   Max.   : 3.1722   Max.   : 3.01358
36##        y
37##  Min.   :-199.268
38##  1st Qu.: -54.758
39##  Median :  -1.607
40##  Mean   :  -1.710
41##  3rd Qu.:  49.367
42##  Max.   : 278.244


#### b) Train Test Split

1train_ind = sample(modelDat %>% nrow,100)
2test_ind = setdiff(seq_len(modelDat %>% nrow), train_set)


#### Best subset selection

1train_set<-modelDat[train_ind,]
2test_set<-modelDat[-train_ind,]

1linCol<-train(y~.,data=train_set,method="lm")
2linCol %>% summary

 1##
2## Call:
3## lm(formula = .outcome ~ ., data = dat)
4##
5## Residuals:
6##      Min       1Q   Median       3Q      Max
7## -2.12474 -0.53970 -0.00944  0.42398  2.21086
8##
9## Coefficients:
10##             Estimate Std. Error  t value Pr(>|t|)
11## (Intercept) -0.06052    0.09604   -0.630    0.530
12## x.1         -0.02265    0.09198   -0.246    0.806
13## x.2         28.91650    0.09879  292.719   <2e-16 ***
14## x.3         14.16532    0.09343  151.610   <2e-16 ***
15## x.4         28.16256    0.09828  286.564   <2e-16 ***
16## x.5          0.13742    0.09658    1.423    0.159
17## x.6         27.01497    0.08540  316.341   <2e-16 ***
18## x.7         31.15917    0.09003  346.092   <2e-16 ***
19## x.8         -9.66308    0.11095  -87.094   <2e-16 ***
20## x.9          0.11641    0.10768    1.081    0.283
21## x.10        19.06687    0.09662  197.344   <2e-16 ***
22## x.11        -9.09956    0.08627 -105.472   <2e-16 ***
23## x.12        -8.01933    0.10198  -78.633   <2e-16 ***
24## x.13         4.26852    0.09888   43.170   <2e-16 ***
25## x.14        20.22366    0.09853  205.247   <2e-16 ***
26## x.15        -0.16607    0.10466   -1.587    0.117
27## x.16         7.95594    0.11250   70.721   <2e-16 ***
28## x.17        10.89851    0.11157   97.684   <2e-16 ***
29## x.18        -1.09760    0.09391  -11.688   <2e-16 ***
30## x.19        22.05197    0.08697  253.553   <2e-16 ***
31## x.20        20.88796    0.09274  225.221   <2e-16 ***
32## ---
33## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
34##
35## Residual standard error: 0.8583 on 79 degrees of freedom
36## Multiple R-squared:  0.9999, Adjusted R-squared:  0.9999
37## F-statistic: 4.71e+04 on 20 and 79 DF,  p-value: < 2.2e-16

1linPred<-predict(linCol,test_set)
2linPred %>% postResample(obs = test_set$y)  1## RMSE Rsquared MAE 2## 1.2151815 0.9997265 0.9638378  1L2Grid <- expand.grid(alpha=0, 2 lambda=10^seq(from=-3,to=30,length=100))  1ridgCol<-train(y~.,data=train_set,method="glmnet",tuneGrid = L2Grid)  1## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, : 2## There were missing values in resampled performance measures.  1ridgCol %>% summary %>% print   1## Length Class Mode 2## a0 100 -none- numeric 3## beta 2000 dgCMatrix S4 4## df 100 -none- numeric 5## dim 2 -none- numeric 6## lambda 100 -none- numeric 7## dev.ratio 100 -none- numeric 8## nulldev 1 -none- numeric 9## npasses 1 -none- numeric 10## jerr 1 -none- numeric 11## offset 1 -none- logical 12## call 5 -none- call 13## nobs 1 -none- numeric 14## lambdaOpt 1 -none- numeric 15## xNames 20 -none- character 16## problemType 1 -none- character 17## tuneValue 2 data.frame list 18## obsLevels 1 -none- logical 19## param 0 -none- list  1coef(ridgCol$finalModel, ridgCol$bestTune$lambda) %>% print

 1## 21 x 1 sparse Matrix of class "dgCMatrix"
2##                       1
3## (Intercept)  0.03898376
4## x.1         -0.12140945
5## x.2         27.63771674
6## x.3         13.46853844
7## x.4         26.54402352
8## x.5         -0.13838118
9## x.6         25.87706885
10## x.7         29.90687677
11## x.8         -9.42088971
12## x.9         -0.08983349
13## x.10        17.45444598
14## x.11        -8.33991071
15## x.12        -7.23653865
16## x.13         3.35145521
17## x.14        19.42178898
18## x.15        -0.02794731
19## x.16         7.63951382
20## x.17        11.08083907
21## x.18        -1.36872894
22## x.19        20.90257005
23## x.20        20.07494414

1ggplot(ridgCol)

1ridgPred<-predict(ridgCol,test_set)
2ridgPred %>% postResample(obs = test_set$y)  1## RMSE Rsquared MAE 2## 3.7554417 0.9994231 3.0184859  1L1Grid <- expand.grid(alpha=1, # for lasso 2 lambda=10^seq(from=-3,to=30,length=100))  1lassoCol<-train(y~.,data=train_set,method="glmnet",tuneGrid = L1Grid)  1## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, : 2## There were missing values in resampled performance measures.  1lassoCol %>% summary %>% print   1## Length Class Mode 2## a0 47 -none- numeric 3## beta 940 dgCMatrix S4 4## df 47 -none- numeric 5## dim 2 -none- numeric 6## lambda 47 -none- numeric 7## dev.ratio 47 -none- numeric 8## nulldev 1 -none- numeric 9## npasses 1 -none- numeric 10## jerr 1 -none- numeric 11## offset 1 -none- logical 12## call 5 -none- call 13## nobs 1 -none- numeric 14## lambdaOpt 1 -none- numeric 15## xNames 20 -none- character 16## problemType 1 -none- character 17## tuneValue 2 data.frame list 18## obsLevels 1 -none- logical 19## param 0 -none- list  1coef(lassoCol$finalModel, lassoCol$bestTune$lambda) %>% print

 1## 21 x 1 sparse Matrix of class "dgCMatrix"
2##                      1
3## (Intercept)  0.1158884
4## x.1          .
5## x.2         28.5897869
6## x.3         13.3637110
7## x.4         27.2558797
8## x.5          .
9## x.6         26.6625588
10## x.7         30.6841774
11## x.8         -9.1388677
12## x.9          .
13## x.10        17.9220939
14## x.11        -8.2461257
15## x.12        -7.0603651
16## x.13         3.2052101
17## x.14        19.7219890
18## x.15         .
19## x.16         7.2082509
20## x.17        10.4137411
21## x.18        -0.6693664
22## x.19        21.5357460
23## x.20        20.5226071

1ggplot(lassoCol)

1lassoPred<-predict(lassoCol,test_set)
2lassoPred %>% postResample(obs = test_set$y)  1## RMSE Rsquared MAE 2## 2.7289452 0.9992454 2.2029482  1mGrid <- expand.grid(ncomp=seq(from=1,to=20,length=10))  1pcrCol<-train(y~.,data=train_set,method="pcr",tuneGrid = mGrid) 2pcrCol %>% summary %>% print   1## Data: X dimension: 100 20 2## Y dimension: 100 1 3## Fit method: svdpc 4## Number of components considered: 20 5## TRAINING: % variance explained 6## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 7## X 10.040 18.46 26.62 34.56 41.87 48.54 54.56 8## .outcome 8.425 34.90 41.09 43.12 45.06 48.09 66.44 9## 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 10## X 60.33 65.37 70.01 74.46 78.72 82.32 85.67 11## .outcome 85.76 88.81 89.93 91.66 91.91 92.04 92.08 12## 15 comps 16 comps 17 comps 18 comps 19 comps 20 comps 13## X 88.85 91.94 94.59 96.73 98.51 100.00 14## .outcome 92.15 94.96 99.51 99.57 99.76 99.99 15## NULL  1ggplot(pcrCol)  1pcrPred<-predict(pcrCol,test_set) 2pcrPred %>% postResample(obs = test_set$y)

1##      RMSE  Rsquared       MAE
2## 1.2151815 0.9997265 0.9638378

1plsCol<-train(y~.,data=train_set,method="pls",tuneGrid = mGrid)
2plsCol %>% summary %>% print

 1## Data:    X dimension: 100 20
2##  Y dimension: 100 1
3## Fit method: oscorespls
4## Number of components considered: 20
5## TRAINING: % variance explained
6##           1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps
7## X           7.762    14.79    21.01    26.89    31.55    36.13    41.12
8## .outcome   92.765    98.81    99.75    99.96    99.98    99.99    99.99
9##           8 comps  9 comps  10 comps  11 comps  12 comps  13 comps  14 comps
10## X           46.35    51.21     56.12     60.34     65.63     71.57     76.16
11## .outcome    99.99    99.99     99.99     99.99     99.99     99.99     99.99
12##           15 comps  16 comps  17 comps  18 comps  19 comps  20 comps
13## X            80.72     84.69     88.98     92.69     96.71    100.00
14## .outcome     99.99     99.99     99.99     99.99     99.99     99.99
15## NULL

1ggplot(plsCol)

1plsPred<-predict(plsCol,test_set)
2plsPred %>% postResample(obs = test_set$y)  1## RMSE Rsquared MAE 2## 1.2151815 0.9997265 0.9638378  #### d) Test MSE for best models • All the models have the same R² but Ridge does the worst followed by LASSO For the rest of the question, we will consider the OLS model. 1modelFit<-regsubsets(y~.,data=modelDat,nvmax=20) 2modelFit %>% summary %>% print   1## Subset selection object 2## Call: regsubsets.formula(y ~ ., data = modelDat, nvmax = 20) 3## 20 Variables (and intercept) 4## Forced in Forced out 5## x.1 FALSE FALSE 6## x.2 FALSE FALSE 7## x.3 FALSE FALSE 8## x.4 FALSE FALSE 9## x.5 FALSE FALSE 10## x.6 FALSE FALSE 11## x.7 FALSE FALSE 12## x.8 FALSE FALSE 13## x.9 FALSE FALSE 14## x.10 FALSE FALSE 15## x.11 FALSE FALSE 16## x.12 FALSE FALSE 17## x.13 FALSE FALSE 18## x.14 FALSE FALSE 19## x.15 FALSE FALSE 20## x.16 FALSE FALSE 21## x.17 FALSE FALSE 22## x.18 FALSE FALSE 23## x.19 FALSE FALSE 24## x.20 FALSE FALSE 25## 1 subsets of each size up to 20 26## Selection Algorithm: exhaustive 27## x.1 x.2 x.3 x.4 x.5 x.6 x.7 x.8 x.9 x.10 x.11 x.12 x.13 x.14 x.15 28## 1 ( 1 ) " " " " " " " " " " " " "*" " " " " " " " " " " " " " " " " 29## 2 ( 1 ) " " " " " " " " " " "*" "*" " " " " " " " " " " " " " " " " 30## 3 ( 1 ) " " "*" " " " " " " "*" "*" " " " " " " " " " " " " " " " " 31## 4 ( 1 ) " " "*" " " "*" " " "*" "*" " " " " " " " " " " " " " " " " 32## 5 ( 1 ) " " "*" " " "*" " " "*" "*" " " " " " " " " " " " " "*" " " 33## 6 ( 1 ) " " "*" " " "*" " " "*" "*" " " " " " " " " " " " " " " " " 34## 7 ( 1 ) " " "*" " " "*" " " "*" "*" " " " " " " " " " " " " "*" " " 35## 8 ( 1 ) " " "*" " " "*" " " "*" "*" " " " " "*" " " " " " " "*" " " 36## 9 ( 1 ) " " "*" "*" "*" " " "*" "*" " " " " "*" " " " " " " "*" " " 37## 10 ( 1 ) " " "*" "*" "*" " " "*" "*" " " " " "*" " " " " " " "*" " " 38## 11 ( 1 ) " " "*" "*" "*" " " "*" "*" " " " " "*" "*" " " " " "*" " " 39## 12 ( 1 ) " " "*" "*" "*" " " "*" "*" "*" " " "*" "*" " " " " "*" " " 40## 13 ( 1 ) " " "*" "*" "*" " " "*" "*" "*" " " "*" "*" "*" " " "*" " " 41## 14 ( 1 ) " " "*" "*" "*" " " "*" "*" "*" " " "*" "*" "*" " " "*" " " 42## 15 ( 1 ) " " "*" "*" "*" " " "*" "*" "*" " " "*" "*" "*" "*" "*" " " 43## 16 ( 1 ) " " "*" "*" "*" " " "*" "*" "*" " " "*" "*" "*" "*" "*" " " 44## 17 ( 1 ) "*" "*" "*" "*" " " "*" "*" "*" " " "*" "*" "*" "*" "*" " " 45## 18 ( 1 ) "*" "*" "*" "*" " " "*" "*" "*" " " "*" "*" "*" "*" "*" "*" 46## 19 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*" " " "*" "*" "*" "*" "*" "*" 47## 20 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" 48## x.16 x.17 x.18 x.19 x.20 49## 1 ( 1 ) " " " " " " " " " " 50## 2 ( 1 ) " " " " " " " " " " 51## 3 ( 1 ) " " " " " " " " " " 52## 4 ( 1 ) " " " " " " " " " " 53## 5 ( 1 ) " " " " " " " " " " 54## 6 ( 1 ) " " " " " " "*" "*" 55## 7 ( 1 ) " " " " " " "*" "*" 56## 8 ( 1 ) " " " " " " "*" "*" 57## 9 ( 1 ) " " " " " " "*" "*" 58## 10 ( 1 ) " " "*" " " "*" "*" 59## 11 ( 1 ) " " "*" " " "*" "*" 60## 12 ( 1 ) " " "*" " " "*" "*" 61## 13 ( 1 ) " " "*" " " "*" "*" 62## 14 ( 1 ) "*" "*" " " "*" "*" 63## 15 ( 1 ) "*" "*" " " "*" "*" 64## 16 ( 1 ) "*" "*" "*" "*" "*" 65## 17 ( 1 ) "*" "*" "*" "*" "*" 66## 18 ( 1 ) "*" "*" "*" "*" "*" 67## 19 ( 1 ) "*" "*" "*" "*" "*" 68## 20 ( 1 ) "*" "*" "*" "*" "*"  We might want to take a look at these. 1par(mfrow=c(2,2)) 2plot(modelFit) 3plot(modelFit,scale='Cp') 4plot(modelFit,scale='r2') 5plot(modelFit,scale='adjr2')  1plotLEAP(modelFit %>% summary)  It would appear that 16 variables would be a good bet. We note that the lasso model did void out 4 parameters, namely x₁,x₃,x₁₃ and x₁₇. Lets take a quick look at the various model variable significance values. 1lgp<-linCol %>% varImp %>% ggplot + ggtitle("OLS Variable Importance") 2rgp<-ridgCol %>% varImp %>% ggplot + ggtitle("Ridge Variable Importance") 3lsgp<-lassoCol %>% varImp %>% ggplot + ggtitle("Lasso Variable Importance") 4pcgp<-pcrCol %>% varImp %>% ggplot + ggtitle("PCR Variable Importance") 5plgp<-plsCol %>% varImp %>% ggplot + ggtitle("PLS Variable Importance") 6grid.arrange(lgp,rgp,lsgp,pcgp,plgp,ncol=3,bottom="Effective Importance, scaled")  #### e) Model size The test set numeric minimum RMSE is a tie between OLS and PCR, and this was achieved for the (effective) 16 variable OLS model, as well as the 18 variable PCR model. #### f) Best model We will consider the OLS and PCR models and its parameters. 1linCol$finalModel %>% print

 1##
2## Call:
3## lm(formula = .outcome ~ ., data = dat)
4##
5## Coefficients:
6## (Intercept)          x.1          x.2          x.3          x.4          x.5
7##    -0.06052     -0.02265     28.91650     14.16532     28.16256      0.13742
8##         x.6          x.7          x.8          x.9         x.10         x.11
9##    27.01497     31.15917     -9.66308      0.11641     19.06687     -9.09956
10##        x.12         x.13         x.14         x.15         x.16         x.17
11##    -8.01933      4.26852     20.22366     -0.16607      7.95594     10.89851
12##        x.18         x.19         x.20
13##    -1.09760     22.05197     20.88796

1pcrCol$bestTune %>% print  1## ncomp 2## 10 20  Now to compare this to the original. 1beta %>% print  1## [1] 0 29 14 28 0 27 31 -10 0 19 -9 -8 4 20 0 8 11 -1 22 2## [20] 21  1t=data.frame(linCol$finalModel$coefficients[-1]) %>% rename("Model_Coeffs"=1) %>% add_column(beta) %>% rename("Original_Coeffs"=2) 2print(t)   1## Model_Coeffs Original_Coeffs 2## x.1 -0.02265289 0 3## x.2 28.91649699 29 4## x.3 14.16532050 14 5## x.4 28.16255937 28 6## x.5 0.13741621 0 7## x.6 27.01497459 27 8## x.7 31.15917172 31 9## x.8 -9.66308362 -10 10## x.9 0.11641282 0 11## x.10 19.06687041 19 12## x.11 -9.09955826 -9 13## x.12 -8.01932598 -8 14## x.13 4.26852334 4 15## x.14 20.22366153 20 16## x.15 -0.16606531 0 17## x.16 7.95593559 8 18## x.17 10.89851353 11 19## x.18 -1.09759687 -1 20## x.19 22.05196537 22 21## x.20 20.88795623 21  We see that the coefficients are pretty similar. #### g) Plotting differences  1val.errors = rep(NaN, p) 2a = rep(NaN, p) 3b = rep(NaN, p) 4x_cols = colnames(xmat, do.NULL = FALSE, prefix = "x.") 5for (i in 1:p) { 6 coefi = coef(modelFit, id = i) 7 a[i] = length(coefi) - 1 ## Not counting the intercept 8 b[i] = sqrt(sum((beta[x_cols %in% names(coefi)] - coefi[names(coefi) %in% x_cols])^2) + 9 sum(beta[!(x_cols %in% names(coefi))])^2) ## Handling the intercept 10} 11plot(x = a, y = b, xlab = "Number of Coefficients", ylab = "Relative Error")  ## Question 6.11 - Page 264 We will now try to predict per capita crime rate in the Boston data set. (a) Try out some of the regression methods explored in this chapter, such as best subset selection, the lasso, ridge regression, and PCR. Present and discuss results for the approaches that you consider. (b) Propose a model (or set of models) that seem to perform well on this data set, and justify your answer. Make sure that you are evaluating model performance using validation set error, crossvalidation, or some other reasonable alternative, as opposed to using training error. (c) Does your chosen model involve all of the features in the data set? Why or why not? ### Answer 1boston<-MASS::Boston  • Summarize 1boston %>% str %>% print   1## 'data.frame': 506 obs. of 14 variables: 2##$ crim   : num  0.00632 0.02731 0.02729 0.03237 0.06905 ...
3##  $zn : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ... 4##$ indus  : num  2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
5##  $chas : int 0 0 0 0 0 0 0 0 0 0 ... 6##$ nox    : num  0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
7##  $rm : num 6.58 6.42 7.18 7 7.15 ... 8##$ age    : num  65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
9##  $dis : num 4.09 4.97 4.97 6.06 6.06 ... 10##$ rad    : int  1 2 2 3 3 3 5 5 5 5 ...
11##  $tax : num 296 242 242 222 222 222 311 311 311 311 ... 12##$ ptratio: num  15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
13##  $black : num 397 397 393 395 397 ... 14##$ lstat  : num  4.98 9.14 4.03 2.94 5.33 ...
15##  $medv : num 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ... 16## NULL  1boston %>% summary %>% print   1## crim zn indus chas 2## Min. : 0.00632 Min. : 0.00 Min. : 0.46 Min. :0.00000 3## 1st Qu.: 0.08204 1st Qu.: 0.00 1st Qu.: 5.19 1st Qu.:0.00000 4## Median : 0.25651 Median : 0.00 Median : 9.69 Median :0.00000 5## Mean : 3.61352 Mean : 11.36 Mean :11.14 Mean :0.06917 6## 3rd Qu.: 3.67708 3rd Qu.: 12.50 3rd Qu.:18.10 3rd Qu.:0.00000 7## Max. :88.97620 Max. :100.00 Max. :27.74 Max. :1.00000 8## nox rm age dis 9## Min. :0.3850 Min. :3.561 Min. : 2.90 Min. : 1.130 10## 1st Qu.:0.4490 1st Qu.:5.886 1st Qu.: 45.02 1st Qu.: 2.100 11## Median :0.5380 Median :6.208 Median : 77.50 Median : 3.207 12## Mean :0.5547 Mean :6.285 Mean : 68.57 Mean : 3.795 13## 3rd Qu.:0.6240 3rd Qu.:6.623 3rd Qu.: 94.08 3rd Qu.: 5.188 14## Max. :0.8710 Max. :8.780 Max. :100.00 Max. :12.127 15## rad tax ptratio black 16## Min. : 1.000 Min. :187.0 Min. :12.60 Min. : 0.32 17## 1st Qu.: 4.000 1st Qu.:279.0 1st Qu.:17.40 1st Qu.:375.38 18## Median : 5.000 Median :330.0 Median :19.05 Median :391.44 19## Mean : 9.549 Mean :408.2 Mean :18.46 Mean :356.67 20## 3rd Qu.:24.000 3rd Qu.:666.0 3rd Qu.:20.20 3rd Qu.:396.23 21## Max. :24.000 Max. :711.0 Max. :22.00 Max. :396.90 22## lstat medv 23## Min. : 1.73 Min. : 5.00 24## 1st Qu.: 6.95 1st Qu.:17.02 25## Median :11.36 Median :21.20 26## Mean :12.65 Mean :22.53 27## 3rd Qu.:16.95 3rd Qu.:25.00 28## Max. :37.97 Max. :50.00  1boston %>% sapply(unique) %>% sapply(length) %>% print  1## crim zn indus chas nox rm age dis rad tax 2## 504 26 76 2 81 446 356 412 9 66 3## ptratio black lstat medv 4## 46 357 455 229  #### a) Test regression models 1train_ind = sample(boston %>% nrow,100) 2test_ind = setdiff(seq_len(boston %>% nrow), train_set)  1train_set<-boston[train_ind,] 2test_set<-boston[-train_ind,]  1linCol<-train(crim~.,data=train_set,method="lm") 2linCol %>% summary   1## 2## Call: 3## lm(formula = .outcome ~ ., data = dat) 4## 5## Residuals: 6## Min 1Q Median 3Q Max 7## -6.2431 -1.0344 -0.0563 0.8187 8.1318 8## 9## Coefficients: 10## Estimate Std. Error t value Pr(>|t|) 11## (Intercept) 0.9339246 7.6508393 0.122 0.9031 12## zn 0.0046819 0.0157375 0.297 0.7668 13## indus 0.0276209 0.0875254 0.316 0.7531 14## chas -1.1602278 1.2386869 -0.937 0.3516 15## nox -7.5024503 5.0207818 -1.494 0.1388 16## rm 1.1240874 0.7462340 1.506 0.1356 17## age 0.0020182 0.0137404 0.147 0.8836 18## dis -0.3934753 0.2454365 -1.603 0.1126 19## rad 0.4540613 0.0791580 5.736 1.41e-07 *** 20## tax 0.0008469 0.0052593 0.161 0.8724 21## ptratio -0.2978204 0.1637629 -1.819 0.0725 . 22## black 0.0030642 0.0045281 0.677 0.5004 23## lstat 0.1322779 0.0626485 2.111 0.0376 * 24## medv -0.0841382 0.0590700 -1.424 0.1580 25## --- 26## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 27## 28## Residual standard error: 2.362 on 86 degrees of freedom 29## Multiple R-squared: 0.775, Adjusted R-squared: 0.741 30## F-statistic: 22.78 on 13 and 86 DF, p-value: < 2.2e-16  1linPred<-predict(linCol,test_set) 2linPred %>% postResample(obs = test_set$crim)

1##      RMSE  Rsquared       MAE
2## 7.3794735 0.4056002 2.6774969

1L2Grid <- expand.grid(alpha=0,
2                          lambda=10^seq(from=-3,to=30,length=100))

1ridgCol<-train(crim~.,data=train_set,method="glmnet",tuneGrid = L2Grid)

1## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
2## There were missing values in resampled performance measures.

1ridgCol %>% summary %>% print

 1##             Length Class      Mode
2## a0           100   -none-     numeric
3## beta        1300   dgCMatrix  S4
4## df           100   -none-     numeric
5## dim            2   -none-     numeric
6## lambda       100   -none-     numeric
7## dev.ratio    100   -none-     numeric
8## nulldev        1   -none-     numeric
9## npasses        1   -none-     numeric
10## jerr           1   -none-     numeric
11## offset         1   -none-     logical
12## call           5   -none-     call
13## nobs           1   -none-     numeric
14## lambdaOpt      1   -none-     numeric
15## xNames        13   -none-     character
16## problemType    1   -none-     character
17## tuneValue      2   data.frame list
18## obsLevels      1   -none-     logical
19## param          0   -none-     list

1coef(ridgCol$finalModel, ridgCol$bestTune$lambda) %>% print   1## 14 x 1 sparse Matrix of class "dgCMatrix" 2## 1 3## (Intercept) -3.881166065 4## zn 0.002597790 5## indus -0.005103517 6## chas -0.674764337 7## nox -0.053645732 8## rm 0.600064844 9## age 0.001153570 10## dis -0.179295384 11## rad 0.267082956 12## tax 0.006447932 13## ptratio -0.075885753 14## black -0.001650403 15## lstat 0.086462700 16## medv -0.027519270  1ggplot(ridgCol)  1ridgPred<-predict(ridgCol,test_set) 2ridgPred %>% postResample(obs = test_set$crim)

1##      RMSE  Rsquared       MAE
2## 7.5065916 0.4017056 2.4777547

1L1Grid <- expand.grid(alpha=1, # for lasso
2                          lambda=10^seq(from=-3,to=30,length=100))

1lassoCol<-train(crim~.,data=train_set,method="glmnet",tuneGrid = L1Grid)

1## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
2## There were missing values in resampled performance measures.

1lassoCol %>% summary %>% print

 1##             Length Class      Mode
2## a0            78   -none-     numeric
3## beta        1014   dgCMatrix  S4
4## df            78   -none-     numeric
5## dim            2   -none-     numeric
6## lambda        78   -none-     numeric
7## dev.ratio     78   -none-     numeric
8## nulldev        1   -none-     numeric
9## npasses        1   -none-     numeric
10## jerr           1   -none-     numeric
11## offset         1   -none-     logical
12## call           5   -none-     call
13## nobs           1   -none-     numeric
14## lambdaOpt      1   -none-     numeric
15## xNames        13   -none-     character
16## problemType    1   -none-     character
17## tuneValue      2   data.frame list
18## obsLevels      1   -none-     logical
19## param          0   -none-     list

1coef(lassoCol$finalModel, lassoCol$bestTune$lambda) %>% print   1## 14 x 1 sparse Matrix of class "dgCMatrix" 2## 1 3## (Intercept) -2.024006430 4## zn . 5## indus . 6## chas . 7## nox . 8## rm . 9## age . 10## dis -0.008506188 11## rad 0.386379255 12## tax 0.001779579 13## ptratio . 14## black . 15## lstat 0.040788606 16## medv .  1ggplot(lassoCol)  1lassoPred<-predict(lassoCol,test_set) 2lassoPred %>% postResample(obs = test_set$crim)

1##      RMSE  Rsquared       MAE
2## 7.5868293 0.3859121 2.4892258

1mGrid <- expand.grid(ncomp=seq(from=1,to=20,length=10))

• All the models have the same R² but Ridge does the worst followed by LASSO

For the rest of the question, we will consider the OLS model.

1modelFit<-regsubsets(crim~.,data=boston,nvmax=20)
2modelFit %>% summary %>% print

 1## Subset selection object
2## Call: regsubsets.formula(crim ~ ., data = boston, nvmax = 20)
3## 13 Variables  (and intercept)
4##         Forced in Forced out
5## zn          FALSE      FALSE
6## indus       FALSE      FALSE
7## chas        FALSE      FALSE
8## nox         FALSE      FALSE
9## rm          FALSE      FALSE
10## age         FALSE      FALSE
11## dis         FALSE      FALSE
13## tax         FALSE      FALSE
14## ptratio     FALSE      FALSE
15## black       FALSE      FALSE
16## lstat       FALSE      FALSE
17## medv        FALSE      FALSE
18## 1 subsets of each size up to 13
19## Selection Algorithm: exhaustive
20##           zn  indus chas nox rm  age dis rad tax ptratio black lstat medv
21## 1  ( 1 )  " " " "   " "  " " " " " " " " "*" " " " "     " "   " "   " "
22## 2  ( 1 )  " " " "   " "  " " " " " " " " "*" " " " "     " "   "*"   " "
23## 3  ( 1 )  " " " "   " "  " " " " " " " " "*" " " " "     "*"   "*"   " "
24## 4  ( 1 )  "*" " "   " "  " " " " " " "*" "*" " " " "     " "   " "   "*"
25## 5  ( 1 )  "*" " "   " "  " " " " " " "*" "*" " " " "     "*"   " "   "*"
26## 6  ( 1 )  "*" " "   " "  "*" " " " " "*" "*" " " " "     "*"   " "   "*"
27## 7  ( 1 )  "*" " "   " "  "*" " " " " "*" "*" " " "*"     "*"   " "   "*"
28## 8  ( 1 )  "*" " "   " "  "*" " " " " "*" "*" " " "*"     "*"   "*"   "*"
29## 9  ( 1 )  "*" "*"   " "  "*" " " " " "*" "*" " " "*"     "*"   "*"   "*"
30## 10  ( 1 ) "*" "*"   " "  "*" "*" " " "*" "*" " " "*"     "*"   "*"   "*"
31## 11  ( 1 ) "*" "*"   " "  "*" "*" " " "*" "*" "*" "*"     "*"   "*"   "*"
32## 12  ( 1 ) "*" "*"   "*"  "*" "*" " " "*" "*" "*" "*"     "*"   "*"   "*"
33## 13  ( 1 ) "*" "*"   "*"  "*" "*" "*" "*" "*" "*" "*"     "*"   "*"   "*"


We might want to take a look at these.

1par(mfrow=c(2,2))
2plot(modelFit)
3plot(modelFit,scale='Cp')
4plot(modelFit,scale='r2')

1plotLEAP(modelFit %>% summary)


It would appear that 16 variables would be a good bet. We note that the lasso model did void out 4 parameters, namely x₁,x₃,x₁₃ and x₁₇.

Lets take a quick look at the various model variable significance values.

1lgp<-linCol %>% varImp %>% ggplot + ggtitle("OLS Variable Importance")
2rgp<-ridgCol %>% varImp %>% ggplot + ggtitle("Ridge Variable Importance")
3lsgp<-lassoCol %>% varImp %>% ggplot + ggtitle("Lasso Variable Importance")
4grid.arrange(lgp,rgp,lsgp,ncol=2,bottom="Effective Importance, scaled")


#### b) Propose a model

• Given the data and plots, I would probably end up using the Ridge regression model

• Clearly, LASSO is not working very well since it seems to have taken mainly 3 variables, one of which is largely categorical (9 levels)

#### c) Model properties

1boston %>% str %>% print

 1## 'data.frame':    506 obs. of  14 variables:
2##  $crim : num 0.00632 0.02731 0.02729 0.03237 0.06905 ... 3##$ zn     : num  18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
4##  $indus : num 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ... 5##$ chas   : int  0 0 0 0 0 0 0 0 0 0 ...
6##  $nox : num 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ... 7##$ rm     : num  6.58 6.42 7.18 7 7.15 ...
8##  $age : num 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ... 9##$ dis    : num  4.09 4.97 4.97 6.06 6.06 ...
10##  $rad : int 1 2 2 3 3 3 5 5 5 5 ... 11##$ tax    : num  296 242 242 222 222 222 311 311 311 311 ...
12##  $ptratio: num 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ... 13##$ black  : num  397 397 393 395 397 ...
14##  $lstat : num 4.98 9.14 4.03 2.94 5.33 ... 15##$ medv   : num  24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
16## NULL

1boston %>% summary %>% print

 1##       crim                zn             indus            chas
2##  Min.   : 0.00632   Min.   :  0.00   Min.   : 0.46   Min.   :0.00000
3##  1st Qu.: 0.08204   1st Qu.:  0.00   1st Qu.: 5.19   1st Qu.:0.00000
4##  Median : 0.25651   Median :  0.00   Median : 9.69   Median :0.00000
5##  Mean   : 3.61352   Mean   : 11.36   Mean   :11.14   Mean   :0.06917
6##  3rd Qu.: 3.67708   3rd Qu.: 12.50   3rd Qu.:18.10   3rd Qu.:0.00000
7##  Max.   :88.97620   Max.   :100.00   Max.   :27.74   Max.   :1.00000
8##       nox               rm             age              dis
9##  Min.   :0.3850   Min.   :3.561   Min.   :  2.90   Min.   : 1.130
10##  1st Qu.:0.4490   1st Qu.:5.886   1st Qu.: 45.02   1st Qu.: 2.100
11##  Median :0.5380   Median :6.208   Median : 77.50   Median : 3.207
12##  Mean   :0.5547   Mean   :6.285   Mean   : 68.57   Mean   : 3.795
13##  3rd Qu.:0.6240   3rd Qu.:6.623   3rd Qu.: 94.08   3rd Qu.: 5.188
14##  Max.   :0.8710   Max.   :8.780   Max.   :100.00   Max.   :12.127
16##  Min.   : 1.000   Min.   :187.0   Min.   :12.60   Min.   :  0.32
17##  1st Qu.: 4.000   1st Qu.:279.0   1st Qu.:17.40   1st Qu.:375.38
18##  Median : 5.000   Median :330.0   Median :19.05   Median :391.44
19##  Mean   : 9.549   Mean   :408.2   Mean   :18.46   Mean   :356.67
20##  3rd Qu.:24.000   3rd Qu.:666.0   3rd Qu.:20.20   3rd Qu.:396.23
21##  Max.   :24.000   Max.   :711.0   Max.   :22.00   Max.   :396.90
22##      lstat            medv
23##  Min.   : 1.73   Min.   : 5.00
24##  1st Qu.: 6.95   1st Qu.:17.02
25##  Median :11.36   Median :21.20
26##  Mean   :12.65   Mean   :22.53
27##  3rd Qu.:16.95   3rd Qu.:25.00
28##  Max.   :37.97   Max.   :50.00

1boston %>% sapply(unique) %>% sapply(length) %>% print

1##    crim      zn   indus    chas     nox      rm     age     dis     rad     tax
2##     504      26      76       2      81     446     356     412       9      66
3## ptratio   black   lstat    medv
4##      46     357     455     229

• A good idea would be removing rad and chas from the regression
1boston<-boston %>% subset(select=-c(rad,chas))

1train_ind = sample(boston %>% nrow,100)
2test_ind = setdiff(seq_len(boston %>% nrow), train_set)

1train_set<-boston[train_ind,]
2test_set<-boston[-train_ind,]

1L2Grid <- expand.grid(alpha=0,
2                          lambda=10^seq(from=-3,to=30,length=100))

1ridgCol<-train(crim~.,data=train_set,method="glmnet",tuneGrid = L2Grid)

1## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
2## There were missing values in resampled performance measures.

1ridgCol %>% summary %>% print

 1##             Length Class      Mode
2## a0           100   -none-     numeric
3## beta        1100   dgCMatrix  S4
4## df           100   -none-     numeric
5## dim            2   -none-     numeric
6## lambda       100   -none-     numeric
7## dev.ratio    100   -none-     numeric
8## nulldev        1   -none-     numeric
9## npasses        1   -none-     numeric
10## jerr           1   -none-     numeric
11## offset         1   -none-     logical
12## call           5   -none-     call
13## nobs           1   -none-     numeric
14## lambdaOpt      1   -none-     numeric
15## xNames        11   -none-     character
16## problemType    1   -none-     character
17## tuneValue      2   data.frame list
18## obsLevels      1   -none-     logical
19## param          0   -none-     list

1coef(ridgCol$finalModel, ridgCol$bestTune$lambda) %>% print   1## 12 x 1 sparse Matrix of class "dgCMatrix" 2## 1 3## (Intercept) 0.23015398 4## zn 0.01595378 5## indus -0.01006683 6## nox 3.42104450 7## rm -0.05904911 8## age 0.01419585 9## dis -0.16724715 10## tax 0.01021790 11## ptratio 0.05741508 12## black -0.01414707 13## lstat 0.15519923 14## medv -0.04483670  1ggplot(ridgCol)  1ridgPred<-predict(ridgCol,test_set) 2ridgPred %>% postResample(obs = test_set$crim)

1##      RMSE  Rsquared       MAE
2## 6.9365371 0.3619974 2.8570012

1L1Grid <- expand.grid(alpha=1, # for lasso
2                          lambda=10^seq(from=-3,to=30,length=100))

1lassoCol<-train(crim~.,data=train_set,method="glmnet",tuneGrid = L1Grid)

1## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
2## There were missing values in resampled performance measures.

1lassoCol %>% summary %>% print

 1##             Length Class      Mode
2## a0           74    -none-     numeric
3## beta        814    dgCMatrix  S4
4## df           74    -none-     numeric
5## dim           2    -none-     numeric
6## lambda       74    -none-     numeric
7## dev.ratio    74    -none-     numeric
8## nulldev       1    -none-     numeric
9## npasses       1    -none-     numeric
10## jerr          1    -none-     numeric
11## offset        1    -none-     logical
12## call          5    -none-     call
13## nobs          1    -none-     numeric
14## lambdaOpt     1    -none-     numeric
15## xNames       11    -none-     character
16## problemType   1    -none-     character
17## tuneValue     2    data.frame list
18## obsLevels     1    -none-     logical
19## param         0    -none-     list

1coef(lassoCol$finalModel, lassoCol$bestTune$lambda) %>% print   1## 12 x 1 sparse Matrix of class "dgCMatrix" 2## 1 3## (Intercept) 0.86635621 4## zn . 5## indus . 6## nox . 7## rm . 8## age . 9## dis . 10## tax 0.01402174 11## ptratio . 12## black -0.01454496 13## lstat 0.15290845 14## medv .  1ggplot(lassoCol)  1lassoPred<-predict(lassoCol,test_set) 2lassoPred %>% postResample(obs = test_set$crim)

1##      RMSE  Rsquared       MAE
2## 6.9630616 0.3693364 2.6767742

1rgp<-ridgCol %>% varImp %>% ggplot + ggtitle("Ridge Variable Importance")
2lsgp<-lassoCol %>% varImp %>% ggplot + ggtitle("Lasso Variable Importance")
3grid.arrange(rgp,lsgp,ncol=2,bottom="Effective Importance, scaled")


None of these models are actually any good apparently, given that we have an R² of 0.3619974 for the L2 regularization and 0.3693364 for the L1.

1. James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning: with Applications in R. Berlin, Germany: Springer Science & Business Media. ↩︎

2. Lang et al., (2019). mlr3: A modern object-oriented machine learning framework in R. Journal of Open Source Software, 4(44), 1903, https://doi.org/10.21105/joss.01903 ↩︎