Purpose

The elastic net penalty uses a tuning parameter alpha to interpolate between the ridge regression (RR) penalty (alpha=0) and the LASSO penalty (alpha=1). When we believe all inputs are useful for prediction, RR is appropriate; if we suspect some inputs are not needed, LASSO is recommended. As a compromise, alpha=0.5 is sometimes useful.
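In glmnet's parameterization, the penalty added to the least-squares objective is, for a given \(\lambda > 0\),

\[
\lambda \sum_{j=1}^{p} \left[ \tfrac{1}{2}(1-\alpha)\,\beta_j^2 + \alpha\,|\beta_j| \right],
\]

so alpha=0 recovers the quadratic ridge penalty and alpha=1 the absolute-value LASSO penalty.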

The package glmnet uses a state-of-the-art algorithm, coordinate descent, to compute the regularization path for regressions with the elastic-net penalty. Our package bestglm provides the functions glmnetGridTable() and glmnetPredict(), which improve prediction accuracy by CV-averaging.

We begin by attaching the packages glmnet and bestglm in the initialization. Make sure bestglm is version 0.35.2 or greater:

packageVersion("bestglm")
## [1] '0.35.2'

This package is available from my homepage, http://www.stats.uwo.ca/faculty/aim/2017/sdm/
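An initialization chunk along these lines is assumed throughout (a sketch; adjust to your setup):

```r
library(glmnet)   # elastic-net regularization paths via coordinate descent
library(bestglm)  # provides glmnetGridTable() and glmnetPredict()
stopifnot(packageVersion("bestglm") >= "0.35.2")
```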

Prostate data

The next step is to form XyList, the list containing the 6 components of the training/test data. We use the training/test split suggested by HTF (Hastie, Tibshirani & Friedman).

data(prostate, package="ElemStatLearn")
indTr <- prostate[,10] #indicator Train/Test
Xy <- prostate[,1:9]
n <- nrow(Xy)
p <- ncol(Xy)-1
XyTr <- Xy[indTr, ] #select training data
XTr <- as.matrix.data.frame(XyTr[,1:p]) #need matrix for glmnet
yTr <- XyTr[,p+1] #as vector
XyTe <- Xy[!indTr, ] #select test data
XTe <- as.matrix.data.frame(XyTe[,1:p]) #need matrix for glmnet
yTe <- XyTe[,p+1] #as vector
XyList <- list(XyTr=XyTr, XTr=XTr, yTr=yTr, XyTe=XyTe, XTe=XTe, yTe=yTe)
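A quick sanity check on the split can catch indexing mistakes; the HTF split of the prostate data has 67 training and 30 test observations. This sketch assumes the objects from the chunk above are in the workspace:

```r
# Verify the HTF split sizes and that the pieces are conformable
stopifnot(nrow(XTr) == 67, nrow(XTe) == 30, nrow(XTr) + nrow(XTe) == n)
stopifnot(ncol(XTr) == p, length(yTr) == nrow(XTr), length(yTe) == nrow(XTe))
```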

LASSO Model Fitting

The function bestglm::glmnetGridTable produces four plots summarizing the K-fold cross-validation of the regularized fit. The number of folds K is specified by the argument nfolds, which defaults to 10.

Figure 1. LASSO 10-fold CV.
set.seed(7755113) #for reproducibility
tb <- glmnetGridTable(XyList, alpha=1)

Table 1. LASSO 10-fold CV Compared with OLS and Subset
round(tb,4)
##             OLS StepAIC StepBIC LASSO1 LASSO2 LASSO3 LASSO4
## lcavol   0.7164  0.7132  0.7799 0.4426 0.4565 0.4579 0.4550
## lweight  0.2926  0.2951  0.3519 0.3494 0.4312 0.4430 0.4183
## age     -0.1425 -0.1461  0.0000 0.0000 0.0000 0.0000 0.0000
## lbph     0.2120  0.2114  0.0000 0.0000 0.0303 0.0398 0.0199
## svi      0.3096  0.3115  0.0000 0.1801 0.3016 0.3261 0.2746
## lcp     -0.2890 -0.2877  0.0000 0.0000 0.0000 0.0000 0.0000
## gleason -0.0209  0.0000  0.0000 0.0000 0.0000 0.0000 0.0000
## pgg45    0.2773  0.2621  0.0000 0.0000 0.0009 0.0012 0.0006
## NORM     0.9596  0.9541  0.8556 0.5919 0.6973 0.7169 0.6766
## RMSE     0.7411  0.7389  0.7405 0.7047 0.6783 0.6756 0.6823

The function bestglm::glmnetPredict finds the best predictions by averaging the predictions produced by the models selected with regularized K-fold CV. Usually NREP=15 replications are sufficient, and this is easily checked.

 yh <- glmnetPredict(XyList, NREP=15, alpha=1)
 RMSE <- rmse(yTe, yh)

So the best prediction produced by CV averaging has RMSE 0.6985.
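The idea behind CV-averaging can be sketched with glmnet alone: repeat cv.glmnet() several times (each replication draws a new random fold assignment), predict at the selected lambda each time, and average the NREP prediction vectors. The helper cvAveragePredict below is hypothetical and only illustrates the principle; bestglm::glmnetPredict's exact selection rule (e.g. lambda.min vs. lambda.1se) may differ.

```r
# Sketch of CV-averaged prediction using glmnet directly.
# Assumes XTr, yTr, XTe from the data-preparation chunk.
cvAveragePredict <- function(XTr, yTr, XTe, NREP = 15, alpha = 1) {
  yhMat <- replicate(NREP, {
    cvfit <- cv.glmnet(XTr, yTr, alpha = alpha)  # new random folds each time
    as.numeric(predict(cvfit, newx = XTe, s = "lambda.min"))
  })
  rowMeans(yhMat)  # average the NREP prediction vectors
}
```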

As discussed in the previous notebook it is helpful to plot the predictions vs. the observed test data along with the \(45^\circ\)-line as well as the Bland-Altman/Tukey mean-difference plot.
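Those two diagnostic plots can be produced along these lines (a sketch, assuming yTe and the CV-averaged predictions yh are in the workspace):

```r
# Left: predictions vs. observed test data with the 45-degree line.
# Right: Bland-Altman/Tukey mean-difference plot.
oldpar <- par(mfrow = c(1, 2))
plot(yTe, yh, xlab = "observed lpsa (test)", ylab = "predicted lpsa")
abline(0, 1, lty = 2)                     # 45-degree line
avg <- (yTe + yh) / 2                     # mean of observed and predicted
dif <- yh - yTe                           # prediction error
plot(avg, dif, xlab = "mean", ylab = "difference")
abline(h = mean(dif), lty = 2)            # average bias
par(oldpar)
```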

In the next code block, we check the assumption that NREP=15 replications of the CV algorithm are sufficient. The procedure is repeated 5 times, and the 5 estimates of the RMSE agree closely, so we conclude that 15 replications are sufficient.

NCheck <- 5
RMSE <- numeric(NCheck)
for (i in 1:NCheck) {
 yh <- glmnetPredict(XyList, NREP=15, alpha=1)
 RMSE[i] <- rmse(yTe, yh)
}
RMSE
## [1] 0.6984374 0.6985975 0.6985674 0.6984320 0.6985218

More General Elastic Net Fitting

Sometimes the elastic-net penalty with alpha=0.5 is able to combine the strengths of both RR and LASSO to produce better forecasts.
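Assuming the same XyList as above, the analysis can be repeated with the compromise penalty simply by changing the alpha argument (output not shown):

```r
set.seed(7755113)                              # for reproducibility
tb05 <- glmnetGridTable(XyList, alpha = 0.5)   # elastic net, alpha = 0.5
yh05 <- glmnetPredict(XyList, NREP = 15, alpha = 0.5)
rmse(yTe, yh05)                                # compare with the LASSO RMSE
```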

Bonus Problem for Classroom Discussion

Show how the tuning parameter alpha could be estimated from the data.
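One standard approach, suggested in the glmnet documentation, is to fix the fold assignment via the foldid argument so that CV errors are comparable across a grid of alpha values, then pick the alpha with the smallest CV error. A sketch, assuming XTr and yTr from the data-preparation chunk:

```r
# Select alpha by CV: use the same folds for every alpha
set.seed(7755113)
foldid <- sample(rep(1:10, length.out = nrow(XTr)))
alphas <- seq(0, 1, by = 0.1)
cvErr <- sapply(alphas, function(a) {
  cvfit <- cv.glmnet(XTr, yTr, alpha = a, foldid = foldid)
  min(cvfit$cvm)            # CV error at the best lambda for this alpha
})
alphas[which.min(cvErr)]    # data-driven choice of alpha
```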