Iowa Schools Dataset

At initialization we loaded the grpreg package and the Iowa dataset, which is available in the bestglm package.

There are n=133 average test scores for schools in the K=6 largest cities. The test score offers a standardized measure of academic achievement. The purpose of the study is to investigate whether there is a relationship between academic achievement, as measured by the test, and poverty; it is expected that students from economically disadvantaged backgrounds will perform less well. Because data on the average income in each school district were not available, a proxy variable for poverty was used: the percentage of students who received subsidized meals, referred to below as the “Poverty” variable.

df <- as.data.frame.table(table(Iowa$City))
names(df) <- c("Factor Level", "Sample Size")
row.names(df) <- NULL
x <- xtable(df)
caption(x) <- "Table 1. Sample Size in Each Category"
print(x, type="html", caption.placement="top", caption.width=90,
      include.rownames=FALSE)
Table 1. Sample Size in Each Category

Factor Level   Sample Size
CedarRapids             21
Davenport               22
DesMoines               38
IowaCity                17
SiouxCity               21
Waterloo                14

Split into Training/Test Using Stratified Random Sampling

From Table 1 we see the sample sizes are adequate in each category so we can use stratified random sampling to assign about 2/3 of the data in each factor level to the training dataset and the remainder to the test data.

Figure 1 presents a dotchart of the training and test sample sizes in each factor level. The visualization makes it clear that Des Moines has the largest number of schools while Waterloo has the smallest.

#split the data randomly into training/test using stratified random sampling
# so each city contributes the same proportion to the training set
trainFrac <- 2/3            # fraction of each city assigned to training
#re-order rows by city name so the strata are contiguous
ind <- order(as.character(Iowa$City))
Iowa2 <- Iowa[ind,]
n <- table(Iowa2$City)      # per-city sample sizes
J <- length(n)              # number of cities
nTr <- round(trainFrac*n)   # per-city training sizes
nTe <- n - nTr
m <- matrix(c(nTr,nTe), ncol=2)
row.names(m) <- names(n)
dimnames(m)[[2]] <- c("Train", "Test")
dotchart(m, col=c("red", "blue"), pch=18, xlim=c(0,26),
         xlab="Sample size in each category",
         main="Figure 1. Train/Test Sample Sizes")

#within each city, draw a random permutation of ranks; rows whose
# rank is at most nTr[i] go to the training set
r <- Map(sample, n)
indTr <- unlist(Map(function(ranks, k) ranks <= k, r, nTr))
Iowa2 <- cbind(Iowa2, train=indTr)
XyTr <- Iowa2[Iowa2$train,][,-4]        # drop the train indicator (column 4)
XyTe <- Iowa2[!Iowa2$train,][,-4]
XTr <- as.matrix.data.frame(XyTr[,-3])  # predictors; column 3 is Test
XTe <- as.matrix.data.frame(XyTe[,-3])
yTr <- XyTr[,3]                         # response: Test score
yTe <- XyTe[,3]
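The stratified split above can be sanity-checked: every stratum should contribute exactly round(2/3 × n) rows to the training set. Here is a self-contained sketch of that check on simulated data (the city labels and counts are illustrative, not the Iowa values):

```r
# Self-contained check of the stratified 2/3 split on toy data
set.seed(1)
city <- factor(rep(c("A", "B", "C"), times = c(21, 38, 14)))
toy <- data.frame(City = city, Test = rnorm(length(city)))
toy <- toy[order(as.character(toy$City)), ]

trainFrac <- 2/3
n   <- table(toy$City)        # per-city sample sizes
nTr <- round(trainFrac * n)   # per-city training sizes

# random permutation of ranks within each city; the first nTr[i]
# ranks in city i are assigned to the training set
r <- Map(sample, n)
toy$train <- unlist(Map(function(ranks, k) ranks <= k, r, nTr))

# each stratum contributes exactly round(2/3 * n) rows to training
stopifnot(all(table(toy$City[toy$train]) == nTr))
```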

The dotchart in Figure 2 compares the average test score and the average value of the poverty variable in each city. The six cities are ordered by average test score: reading from the bottom, Iowa City has the highest average test score and Waterloo the lowest. It is strikingly clear that there is indeed a relationship between poverty and test score.

# For the dotchart, let's order in terms of Test score
tb <- aggregate(XyTr[,-1], by=list(City = XyTr$City), FUN=mean)
ind <- order(tb$Test)
tb <- tb[ind,]
m <- as.matrix.data.frame(tb[,-1])
row.names(m) <- tb[,1]
dotchart(t(m), pch=18, col="blue", lcolor="red",
         main="Figure 2. Dot Chart. Poverty/Test by City, Training Data.",
         xlab="Average Percentage Poverty or Test Score")

Data Visualization using Lattice Plots

Figure 3 is a lattice plot showing the scatterplot of test score vs poverty for each city.

The dashed line shows the fitted simple linear regression in each panel. The per-panel slopes and intercepts are identical to those implied by the full multiple linear regression fit in the next section, in which not only Poverty and City but all of their interactions are included. Multiple linear regression theory shows how to test whether the slopes in each panel are the same and/or whether the intercepts are the same.
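In R such comparisons are nested-model F-tests. A minimal self-contained sketch on simulated data (the city labels and coefficient values are made up, not the Iowa estimates):

```r
# Simulated data: two cities with a common slope but different intercepts
set.seed(42)
n <- 60
City    <- factor(rep(c("A", "B"), each = n/2))
Poverty <- runif(n, 20, 80)
Test    <- 75 - 0.4*Poverty + ifelse(City == "B", -5, 0) + rnorm(n, sd = 3)
d <- data.frame(City, Poverty, Test)

fitEqualSlopes   <- lm(Test ~ Poverty + City, data = d)  # parallel lines
fitSeparateLines <- lm(Test ~ Poverty * City, data = d)  # separate slopes

# F-test of the hypothesis that the slopes are equal across cities
anova(fitEqualSlopes, fitSeparateLines)
```

The same comparison with `Test ~ Poverty` as the smaller model tests whether the intercepts are equal as well.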

The solid red curve shows the linear loess smooth with \(\alpha=0.8\) in each panel. A straight-line regression seems adequate except perhaps for Cedar Rapids. Generalized additive models (GAMs) make it easy to fit a separate loess or smoothing-spline curve in each panel. There are two R packages I recommend for GAMs: gam and mgcv.

#lattice style plot
xyplot(Test ~ Poverty | City, data=XyTr, panel=function(x, y){
 panel.xyplot(x, y, pch=16, col="blue", cex=1.5)
 panel.loess(x, y, span=0.8, degree=1, col=rgb(1,0,0,0.7), lwd=2)
 panel.lmline(x, y, lwd=2, lty=2)
}, main="Figure 3. Test Scores vs Poverty, Schools, Six Iowa Cities")
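As noted above, a GAM can relax the straight-line assumption panel by panel. Below is a minimal sketch with mgcv (a recommended package that ships with R) using a by-factor smooth; the variable names mirror the study but the data are simulated:

```r
library(mgcv)

# Simulated data with a linear Poverty effect and two cities
set.seed(7)
n <- 120
City    <- factor(rep(c("A", "B"), each = n/2))
Poverty <- runif(n, 20, 80)
Test    <- 75 - 0.4*Poverty + rnorm(n, sd = 3)
d <- data.frame(City, Poverty, Test)

# City enters as a fixed effect; s(Poverty, by = City) fits a
# separate smooth curve of Poverty within each city
fitGam <- gam(Test ~ City + s(Poverty, by = City), data = d)
summary(fitGam)
```

The gam package offers an analogous formula interface with lo() and s() terms.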

Fit Models and Predict on Test Data

RMSE <- NULL
#Full model - separate slope and intercept for each city
ans <- lm(Test ~ Poverty*City, data=XyTr)
bOLS <- coef(ans)
yhF <- predict(ans, newdata=XyTe)
RMSE[1] <- rmse(yTe, yhF)
#backward stepwise regression with the BIC penalty (k = log n)
ansStep <- step(ans, k=log(nrow(XyTr)), trace=0)
bStep <- coef(ansStep)
bStepBIC <- numeric(length(bOLS))
names(bStepBIC) <- names(bOLS)
bStepBIC[names(bStep)] <- bStep
yhS <- predict(ansStep, newdata=XyTe)
RMSE[2] <- rmse(yTe, yhS)
#get model matrix for use with grpreg
#separate slope and intercept - remove 1st column
XFTr <- model.matrix(ans)[,-1] 
XFTe <- model.matrix(Test ~ Poverty*City, data=XyTe)[,-1]
#MCP: with no group argument, each coefficient forms its own group
ansMCP <- grpreg(XFTr, yTr, penalty="grMCP")
lambdaBIC <- select(ansMCP, "BIC")$lambda
bMCP <- select(ansMCP, "BIC")$beta
yhM <- predict(ansMCP, X=XFTe, lambda=lambdaBIC, type="response")
RMSE[3] <- rmse(yTe, yhM)
#Group MCP: one group each for Poverty, the City dummies, and the interactions
ansMCPGroup <- grpreg(XFTr, yTr, penalty="grMCP", 
                 group=c(1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3))
lambdaBIC <- select(ansMCPGroup, "BIC")$lambda
bMCPGroup <- select(ansMCPGroup, "BIC")$beta 
yhMPen <- predict(ansMCPGroup, X=XFTe, lambda=lambdaBIC, type="response")
RMSE[4] <- rmse(yTe, yhMPen)
names(RMSE) <- c("OLS", "Step", "MCPX", "GroupMCP") 
#table
tb <- matrix(c(bOLS, bStepBIC, bMCP, bMCPGroup), ncol=4)
tb <- rbind(tb, RMSE)
rownames(tb) <- c(names(bOLS), "RMSE")
colnames(tb) <- c("OLS (incl. interactions)", "Step/BIC", "MCP", "GroupMCP")
#round(tb,2)
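The rmse() helper used above is assumed to have been defined at initialization; a minimal version consistent with how it is called here:

```r
# root mean squared prediction error (assumed definition of the helper)
rmse <- function(y, yhat) sqrt(mean((y - yhat)^2))

rmse(c(1, 2, 3), c(1, 2, 5))  # sqrt(4/3), about 1.155
```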

In fitting the final group MCP model, the argument group is specified as

group=c(1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3)

This specification is obtained by grouping the terms in the regression:

Poverty, CityDavenport, CityDesMoines, CityIowaCity, CitySiouxCity, CityWaterloo, Poverty:CityDavenport, Poverty:CityDesMoines, Poverty:CityIowaCity, Poverty:CitySiouxCity, Poverty:CityWaterloo

Note that the intercept is not included because it does not enter the penalty function.
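Rather than hard-coding the group vector, it can be derived from the model matrix's "assign" attribute, which maps each column to the term it came from (0 denotes the intercept). A sketch on a toy data frame (the factor levels are illustrative):

```r
# toy frame with a numeric predictor and a 3-level factor
d <- data.frame(Poverty = c(10, 20, 30, 40, 50, 60),
                City = factor(rep(c("A", "B", "C"), 2)))
mm <- model.matrix(~ Poverty * City, data = d)

# "assign" maps each column to its term; drop the intercept column,
# as grpreg expects an unpenalized intercept
grp <- attr(mm, "assign")[-1]
grp  # 1 2 2 3 3
```

Here group 1 is Poverty, group 2 the City dummies, and group 3 the interactions, exactly the pattern used above.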

Table 2 compares the regression coefficients for the four fitted models and their corresponding RMSEs on the test data.

x <- xtable(tb)
caption(x) <- "Table 2. Comparison of Models."
print(x, type="html", caption.placement="top")
Table 2. Comparison of Models.

                       OLS (incl. interactions)  Step/BIC     MCP  GroupMCP
(Intercept)                               71.55     76.88   76.00     75.97
Poverty                                   -0.33     -0.51   -0.46     -0.53
CityDavenport                             -9.90    -10.53  -11.91     -5.86
CityDesMoines                              2.90     -5.83    0.00     -3.14
CityIowaCity                               9.73     -1.34    0.00     -0.86
CitySiouxCity                              9.90      2.64    0.00      1.73
CityWaterloo                              -2.14     -7.05   -8.76     -3.70
Poverty:CityDavenport                     -0.08      0.00    0.00      0.00
Poverty:CityDesMoines                     -0.25      0.00   -0.14      0.00
Poverty:CityIowaCity                      -0.44      0.00   -0.06      0.00
Poverty:CitySiouxCity                     -0.22      0.00    0.00      0.00
Poverty:CityWaterloo                      -0.17      0.00    0.00      0.00
RMSE                                      10.26     10.15   10.00      9.33