Best subset regression on the prostate data, with BIC as the selection criterion, selects ‘lcavol’, ‘lweight’ and ‘svi’ as the best predictors.

Remark: The t-ratio statistic is not as useful a measure of variable importance as the RF-importance statistic. The t-ratio indicates how significantly the regression coefficient differs from zero, whereas the RF importance indicates how much the input variable contributes to prediction.

## BIC
## BICq equivalent for q in (0.056139873352981, 0.759703853311213)
## Best Model:
##               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) -0.7771566 0.62299945 -1.247444 2.153670e-01
## lcavol       0.5258519 0.07486323  7.024168 3.493565e-10
## lweight      0.6617699 0.17563516  3.767867 2.887126e-04
## svi          0.6656666 0.20708985  3.214385 1.797619e-03
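
The output above is printed by R. As a language-neutral illustration of what BIC subset selection does, here is a minimal Python sketch using only the standard library. It fits every subset of predictors by least squares and keeps the subset with the smallest BIC. The data are synthetic (not the prostate data), and the helper names `ols_rss`, `bic` and `best_subset_bic` are hypothetical, introduced only for this example. It is not the routine that produced the table above.

```python
import itertools
import math
import random


def ols_rss(X, y):
    """Residual sum of squares of a least-squares fit of y on the columns of X."""
    n, p = len(X), len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [v - f * w for v, w in zip(A[r], A[col])]
    beta = [A[j][p] / A[j][j] for j in range(p)]
    return sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2
               for i in range(n))


def bic(rss, n, k):
    """Gaussian BIC up to an additive constant; k counts fitted coefficients."""
    return n * math.log(rss / n) + k * math.log(n)


def best_subset_bic(X, y):
    """Score every subset of columns; the intercept is always included."""
    n, p = len(X), len(X[0])
    scores = {}
    for size in range(p + 1):
        for subset in itertools.combinations(range(p), size):
            design = [[1.0] + [row[j] for j in subset] for row in X]
            scores[subset] = bic(ols_rss(design, y), n, size + 1)
    return min(scores, key=scores.get), scores


# Synthetic data (an assumption for illustration): only columns 0 and 1 affect y.
rng = random.Random(1)
n = 100
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(n)]
y = [1.0 + 2.0 * row[0] - 1.5 * row[1] + rng.gauss(0, 1) for row in X]

best, scores = best_subset_bic(X, y)
print("selected columns:", best)
```

Exhaustive enumeration scores 2^p models, which is fine for the eight prostate predictors (256 models) but grows quickly with p. Dedicated best-subset software typically uses a more efficient search.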

The Lasso also performs variable selection. Fitting a Lasso regression using glmnet produces a result in agreement with best subset: the only nonzero coefficients are those of ‘lcavol’, ‘lweight’ and ‘svi’.

##    lcavol   lweight       age      lbph       svi       lcp   gleason 
## 0.4409790 0.2432206 0.0000000 0.0000000 0.3064360 0.0000000 0.0000000 
##     pgg45 
## 0.0000000
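
The exact zeros in the coefficient vector come from the L1 penalty. The following Python sketch (standard library only, synthetic data rather than the prostate data) shows the mechanism with cyclic coordinate descent and soft-thresholding. The function names `soft_threshold`, `lasso_cd` and `centre` and the penalty value 0.3 are assumptions made for the example. It is a simplified stand-in, not glmnet's implementation, and it assumes centred data so that no intercept is needed.

```python
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - Xb||^2 + lam * ||b||_1 by coordinate descent.

    Assumes the columns of X and the response y are already centred.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals when all coefficients are zero
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of column j with the partial residual
            # (the residual with column j's current contribution added back).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_ss[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def centre(values):
    """Subtract the mean from a list of numbers."""
    m = sum(values) / len(values)
    return [v - m for v in values]


# Synthetic data (an assumption for illustration): only columns 0 and 1 affect y.
rng = random.Random(2)
n = 100
cols = [centre([rng.gauss(0, 1) for _ in range(n)]) for _ in range(5)]
X = [[cols[j][i] for j in range(5)] for i in range(n)]
y = centre([2.0 * X[i][0] - 1.5 * X[i][1] + rng.gauss(0, 1) for i in range(n)])

beta = lasso_cd(X, y, lam=0.3)
print([round(b, 3) for b in beta])
```

A coefficient is set exactly to zero whenever the covariance between its column and the partial residual is no larger than the penalty in absolute value, which is why the Lasso selects variables rather than merely shrinking them. Which variables survive depends on the penalty; the source does not state how the penalty was chosen for the fit shown above.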

The Random Forest Importance plot shows which variables are most important in predicting ‘lpsa’. We see that ‘lcavol’ is the most important, followed by ‘svi’ and then ‘lweight’. These are the same three variables chosen by best subset and the Lasso.