Purpose

We show how to use grpreg::grpreg() to fit a penalized least squares regression with the MCP penalty function to the prostate dataset. We assume there are no factors present.

The grpreg() function provides other penalty functions as well. Setting penalty="grLasso" or penalty="grSCAD" (keeping the default setting for the group argument, under which each predictor forms its own group) fits the corresponding penalized regression, and BIC can again be used to select the tuning parameter lambda.
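For example, a sketch of the same workflow with the group lasso penalty, using the training matrix XTr and response yTr constructed in the next section:

```r
## Sketch: group lasso instead of MCP; with the default `group`
## (each column its own group) this reduces to the ordinary lasso.
fitLasso <- grpreg(XTr, yTr, penalty="grLasso")
bestLasso <- select(fitLasso, "BIC") #lambda chosen by BIC
bestLasso$beta #coefficients at the BIC-selected lambda
```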

In our experience, predictions from MCP-penalized regression are often competitive with, and sometimes better than, those from other penalized regression methods.

Non-convex MCP Penalty

The package grpreg is loaded in the initialization.

Prostate data

The first step is to form XyList, the list containing the 6 components with the training/test data. We use the training/test split suggested by Hastie, Tibshirani and Friedman (HTF).

data(prostate, package="ElemStatLearn")
indTr <- prostate[,10] #indicator Train/Test
Xy <- prostate[,1:9]
n <- nrow(Xy)
p <- ncol(Xy)-1
XyTr <- Xy[indTr, ] #select training data
XTr <- as.matrix(XyTr[,1:p]) #grpreg requires a matrix, not a data frame
yTr <- XyTr[,p+1] #response as vector
XyTe <- Xy[!indTr, ] #select test data
XTe <- as.matrix(XyTe[,1:p]) #grpreg requires a matrix, not a data frame
yTe <- XyTe[,p+1] #as vector
XyList <- list(XyTr=XyTr, XTr=XTr, yTr=yTr, XyTe=XyTe, XTe=XTe, yTe=yTe)

MCP Model Fitting

The first code snippet fits the MCP-penalized regression and plots the regularization path.

ans <- grpreg(XTr, yTr, penalty="grMCP")
plot(ans)

The grpreg package also offers simple K-fold cross-validation via cv.grpreg(). It has seed as an argument, so if you set it you always get the same answer. That is convenient for reproducibility, but it does not really avoid the problem that the selected lambda still depends on the random assignment of observations to folds. The vertical line in each panel indicates the lambda with minimum CV error, and examining Figure 1 we see that the choice of lambda indeed varies with the seed. For this reason, I prefer using BIC or averaging the CV error over several random fold assignments.

Figure 1. 10-fold CV
someSeeds <- c(77665533, 881155, 1231312, 44455511)
layout(matrix(1:4, ncol=2))
for (i in 1:4) {
  plot(cv.grpreg(XTr, yTr, seed=someSeeds[i]))
}
layout(1)
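The CV-averaging alternative mentioned above can be sketched by averaging the cross-validation error curves over several seeds and picking the lambda that minimizes the average. This sketch assumes cv.grpreg() reports the error curve in its cve component and that supplying lambda fixes a common grid:

```r
## Average the CV error over the four seeds used in Figure 1
lam <- ans$lambda #lambda grid from the grpreg fit above
cveMat <- sapply(someSeeds, function(s)
  cv.grpreg(XTr, yTr, penalty="grMCP", lambda=lam, seed=s)$cve)
lamAvg <- lam[which.min(rowMeans(cveMat))] #lambda minimizing the averaged CV error
```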

Rather than using cross-validation, the grpreg package offers AIC/BIC selection of the tuning parameter lambda via select(). We choose the lambda with the smallest BIC. The resulting MCP regression coefficients are shown below.

best <- select(ans, "BIC")
best$beta
## (Intercept)      lcavol     lweight         age        lbph         svi 
##  -0.7500728   0.5690885   0.6561632   0.0000000   0.0417446   0.3243882 
##         lcp     gleason       pgg45 
##   0.0000000   0.0000000   0.0000000

Next the predictions are computed and a Bland-Altman plot is produced.

yh <- predict(ans, XTe, type="response", lambda=best$lambda)
x <- (yh+yTe)/2
y <- yh-yTe
plot(x, y, xlab="Mean of predictions and test values",
     ylab="Difference: prediction-test", pch=18, col="blue")
abline(h=0, col="red") #zero-difference reference line
ind <- order(x)
x <- x[ind]
y <- y[ind]
trend <- loess(y~x, span=0.9, degree=1) #heavy smoothing to detect a trend
lines(x, fitted(trend), col=rgb(1,0,1,0.7), lwd=2.5)
title(main="Figure 2. Bland-Altman/Tukey Plot\nForecasts vs Observed")

The RMSE on the test data is 0.6613.
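The quoted RMSE can be reproduced directly from the predictions:

```r
rmse <- sqrt(mean((yh - yTe)^2)) #root mean squared prediction error
round(rmse, 4) #about 0.6613
```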

You can use the function bestglm::grpregPredict() to compare the test-set RMSEs of the LASSO, SCAD and MCP fits, each with lambda selected by BIC.

bestglm::grpregPredict(XyList=XyList)
##     LASSO      SCAD       MCP 
## 0.6730472 0.6843042 0.6613436