Consider a logistic curve,
\[ {\cal L}(x) = \frac{1}{1+\exp(-\lambda x)}\]
Setting \(\lambda = 10\), the resulting curve is shown in Figure 1 below. We also add random noise, distributed as a Student-t on 5 degrees of freedom and scaled to a standard deviation of 0.02. This simple regression is well handled by loess, kernel regression, and many other non-parametric regression methods. But what happens if we embed this problem in higher dimensions?
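The original simulation code is not shown; here is a minimal Python sketch of the data-generating step. The x-range, sample size, and random seed are assumptions, and the t-noise is rescaled so its standard deviation comes out to 0.02 (a \(t_5\) variate has standard deviation \(\sqrt{5/3}\)).

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption

def logistic(x, lam=10.0):
    """The curve L(x) = 1 / (1 + exp(-lambda * x))."""
    return 1.0 / (1.0 + np.exp(-lam * x))

# x-grid is an assumption; the text does not give the range
x = np.linspace(-1, 1, 60)

# Student-t noise on 5 df, rescaled to standard deviation 0.02
noise = 0.02 * rng.standard_t(df=5, size=x.size) / np.sqrt(5 / 3)
y = logistic(x) + noise
```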
We simply add confusion variables, that is, variables that have no relation to the output variable, and see what happens to our predictions.
For the training data we take \(n=60\) observations on \(p=4\) inputs, but only the first input is used to generate the output; the other three variables are ignored. We then fit a multiple linear regression and a loess regression to this training data.
Next we generate test data with \(n=100, p=4\), the output being generated in the same way as before. Using the models fit to the training data, we compute predictions on the test set and their RMSE. The results below clearly show that linear regression works much better in higher dimensions, even though the truth is far from linear!
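The full experiment can be sketched in Python as follows. The original fits were presumably done in R; since loess itself is not reproduced here, a Nadaraya-Watson kernel smoother on all four inputs stands in for it, and the uniform input design, seed, and bandwidth are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)  # seed is an assumption

def logistic(x, lam=10.0):
    return 1.0 / (1.0 + np.exp(-lam * x))

def make_data(n, p=4, sd=0.02):
    # inputs uniform on [-1, 1] is an assumption; only column 0 matters
    X = rng.uniform(-1, 1, size=(n, p))
    noise = sd * rng.standard_t(df=5, size=n) / np.sqrt(5 / 3)
    return X, logistic(X[:, 0]) + noise

Xtr, ytr = make_data(60)    # training set, n = 60
Xte, yte = make_data(100)   # test set, n = 100

# multiple linear regression on all p inputs via least squares
A = np.column_stack([np.ones(len(Xtr)), Xtr])
beta, *_ = np.linalg.lstsq(A, ytr, rcond=None)
pred_lm = np.column_stack([np.ones(len(Xte)), Xte]) @ beta

# Nadaraya-Watson kernel smoother (stand-in for loess), Gaussian
# kernel on full p-dimensional distances; bandwidth is an assumption
h = 0.5
d2 = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(axis=-1)
w = np.exp(-d2 / (2 * h**2))
pred_ks = (w @ ytr) / w.sum(axis=1)

def rmse(pred):
    return np.sqrt(np.mean((yte - pred) ** 2))

print(rmse(pred_lm), rmse(pred_ks))
```

The key point the sketch illustrates is that the smoother must estimate local averages in four dimensions, where the training points are sparse, while linear regression pools all 60 observations into a global fit.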
```
## Regression Loess
## 0.08063187 0.46512418
```