---
title: "Fitting MLP and MARS to rmix Dataset"
author: "A. I. McLeod"
date: "March 25, 2018"
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
# Remark: assume the xtable package has been installed
library("caret")
library("NeuralNetTools")
#library("nnet")
XyTe <- read.csv(
  "http://www.stats.uwo.ca/faculty/aim/2018/4850G/data/rmixTe.csv",
  header=TRUE)
XyTr <- read.csv(
  "http://www.stats.uwo.ca/faculty/aim/2018/4850G/data/rmixTr.csv",
  header=TRUE)
#
XyTr[,3] <- factor(XyTr[,3])
XyTe[,3] <- factor(XyTe[,3])
```

In our previous lecture note, *15_2_kNNApplied_Mixture*, we compared a logistic classifier and a kNN classifier on a training sample of size $n=200$ generated by **gencve::rmix()**. We evaluated the predictors on a test sample of size 20,000 and found prediction errors of 27.10\% and 26.15\%, where $k=7$ was selected using the pseudo-MLE method. Since the test sample is so large, the 95\% MOE is *less than* 0.0045, so the observed difference of about 0.95\% exceeds the MOE and is not due to randomness. By comparison, the theoretical optimum misclassification error rate (the Bayes error rate) was shown to be 20.76\%.

In this lecture we fit a multilayer perceptron (MLP) to these data using the functions **nnet::nnet()** and **RSNNS::mlp()**. Both of these functions are available through the **caret** package via the methods **nnet** and **mlpWeightDecay**. When using caret, you should also examine the arguments of the underlying functions that caret calls, since these may be helpful in fine-tuning the model. On perusal of the documentation provided by **caret::train()** for the "nnet" method, we set the tuning parameters **size** and **decay**. I experimented with several settings to find ones that worked well, since we do not want a selected tuning parameter that lies on the boundary of the grid!
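As an aside, the 95\% margin of error for a single misclassification rate estimated from $n$ independent test cases is $1.96\sqrt{\hat{\eta}(1-\hat{\eta})/n}$; this is the same formula used for `MOE1` and `MOE2` in the chunks that follow. A minimal sketch (not evaluated here; the rates are the ones quoted above):

```{r MOEsketch, eval=FALSE}
# 95% MOE for an error rate estimated from n independent test cases
n <- 20000
eta <- c(logistic=0.2710, kNN=0.2615)
MOE <- 1.96*sqrt(eta*(1-eta)/n)
round(MOE, 4)
```

With $n$ = 20,000 the MOE is well under one percentage point, which is why differences of this size between classifiers can be taken seriously.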
\newpage

## Fitting using caret and nnet::nnet()

Setting **metric="Accuracy"** and making the output variable a factor ensures that caret/nnet treats this as a classification problem. Additional settings passed through to nnet are **trace=FALSE**, **skip=TRUE**, and **maxit=2000**. The skip-layer setting adds direct connections from the inputs to the output. The fitted model is summarized below.

```{r NNET, echo=FALSE, cache=TRUE}
#ctrl <- trainControl(method="repeatedcv", number=10, repeats=7,
#                     summaryFunction=twoClassSummary) #produces error
ctrl <- trainControl(method="repeatedcv", number=10, repeats=7)
set.seed(3334)
tuneGrid <- expand.grid(.size=1:4, .decay=c(0, 0.1, 0.25, 0.5, 0.75))
ans1 <- train(x=XyTr[,1:2], y=XyTr[,3],
              #preProc=c("center", "scale"), #doesn't hurt, but not used here!
              method="nnet", metric="Accuracy",
              tuneGrid=tuneGrid, trControl=ctrl,
              #these args are passed through to nnet
              trace=FALSE,
              skip=TRUE, #usually set to TRUE
              maxit=2000)
summary(ans1)
```

Figure 1 below shows a schematic diagram of the fitted MLP drawn with the **NeuralNetTools** package.

```{r PLOT-nnet, echo=FALSE, fig.height=4, fig.pos="H", fig.cap="Fitted MLP using nnet", cache=TRUE}
yH1 <- predict(ans1, newdata=XyTe[,1:2])
accuracy1 <- mean(yH1==factor(XyTe[,3]))
eta1 <- 1-accuracy1
cftb1 <- table(XyTe[,3], yH1, dnn=c("Truth", "Predicted"))
MOE1 <- 1.96*sqrt(eta1*(1-eta1)/nrow(XyTe))
ci1 <- eta1+c(-1,1)*MOE1
plotnet(ans1$finalModel, x_names=c("x1", "x2")) #black=+ve weight, gray=-ve
```

\newpage

The fitted model was used to predict on the 20,000 test instances and the observed misclassification rate was $\hat{\eta}$ = `r round(100*eta1,2)`\%. The confusion matrix is shown in Table 1 below.

```{r ConfusionMatrix1, echo=FALSE, results="asis"}
out <- xtable::xtable(cftb1, caption="Model fitted using nnet")
print(out, comment=FALSE, type="latex")
```

\newpage

## Fitting using caret and RSNNS::mlp()

From Figure 2 we see that a different model was selected.
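To verify that the selected tuning parameters are interior points of the grid, as discussed above, the resampling results stored in the train object can be inspected. A minimal sketch (not run here):

```{r InspectTuning, eval=FALSE}
ans1$bestTune       # size and decay selected by repeated cross-validation
head(ans1$results)  # accuracy and kappa for each point on the tuning grid
plot(ans1)          # accuracy profile across the tuning grid
```

If the selected **size** or **decay** equals the largest or smallest value in `tuneGrid`, the grid should be extended and the model refit.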
This is not surprising, since a completely different optimization algorithm was used; indeed, different local minima may be found even by the same algorithm, because the initial weights are chosen randomly. Most local minima yield fits that are nearly as good.

```{r MLP, echo=FALSE, cache=TRUE}
ctrl <- trainControl(method="repeatedcv", number=10, repeats=7)
set.seed(7357351)
tuneGrid <- expand.grid(.size=1:4, .decay=c(0, 0.1, 0.25, 0.5, 0.75))
ans2 <- train(x=XyTr[,1:2], y=XyTr[,3],
              #preProc=c("center", "scale"), #doesn't hurt
              method="mlpWeightDecay", metric="Accuracy",
              tuneGrid=tuneGrid, trControl=ctrl)
#summary(ans2)
```

```{r PLOT-mlp, echo=FALSE, fig.height=4, fig.width=4, fig.pos="H", fig.cap="Fitted MLP using RSNNS::mlp(). O1 and O2 are class probabilities.", cache=TRUE}
#predict(ans2) #show both outputs
yH2 <- predict(ans2, newdata=XyTe[,1:2])
cftb2 <- table(XyTe[,3], yH2, dnn=c("Truth", "Predicted"))
accuracy2 <- mean(yH2==factor(XyTe[,3]))
eta2 <- 1-accuracy2
MOE2 <- 1.96*sqrt(eta2*(1-eta2)/nrow(XyTe))
ci2 <- eta2+c(-1,1)*MOE2
plotnet(ans2$finalModel, y_names=c("", ""), x_names=c("x1", "x2")) #black=+ve weight, gray=-ve
```

The fitted model was used to predict on the 20,000 test instances and the observed misclassification rate was $\hat{\eta}$ = `r round(100*eta2,2)`\%. The confusion matrix is shown in Table 2 below.

```{r ConfusionMatrix2, echo=FALSE, results="asis"}
out <- xtable::xtable(cftb2, caption="Model fitted using RSNNS::mlp()")
print(out, comment=FALSE, type="latex")
```

## MARS classifier

```{r MARS, echo=FALSE}
XyTrain <- XyTr
XyTrain[,3] <- factor(XyTrain[,3])
ans <- earth::earth(y ~ ., data=XyTrain,
                    glm=list(family=binomial(link="logit")))
XyTest <- XyTe
XyTest[,3] <- factor(XyTest[,3])
pHTe <- predict(ans, newdata=XyTest, type="response")
yHTest <- predict(ans, newdata=XyTest, type="class")
rTest <- mean(yHTest!=XyTest$y)
cftbMARS <- table(XyTest[,3], yHTest, dnn=c("Truth", "Predicted"))
```

The observed misclassification rate was $\hat{\eta}$ = `r round(100*rTest,2)`\%.
The confusion matrix is shown in Table 3 below.

```{r ConfusionMatrixMARS, echo=FALSE, results="asis"}
out <- xtable::xtable(cftbMARS, caption="MARS Model")
print(out, comment=FALSE, type="latex")
```
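The three test-set error rates can be summarized side by side with their margins of error. A sketch, assuming the objects `eta1`, `MOE1`, `eta2`, `MOE2`, and `rTest` computed in the chunks above are available in the workspace:

```{r SummaryTable, eval=FALSE}
# collect the three estimated error rates and their 95% MOEs (in percent)
MOEmars <- 1.96*sqrt(rTest*(1-rTest)/nrow(XyTe))
cbind(eta = round(100*c(nnet=eta1, mlp=eta2, MARS=rTest), 2),
      MOE = round(100*c(MOE1, MOE2, MOEmars), 2))
```

Since all three rates are estimated on the same 20,000 test cases, differences larger than the MOEs cannot be attributed to sampling variation.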