Explicit Coefficient Penalization
Of course, the value of the regularization parameter λ needs to be optimized. The function cv.glmnet is available for that, by default using ten-fold crossvalidation. Two common choices of λ are available as predefined values. Obviously, the penalty giving the lowest crossvalidation error is one of them. The other gives the sparsest model whose crossvalidation error is within one standard error of the global optimum (Hastie et al. 2001). This is the same criterion used in the pls package for determining the optimal number of latent variables, mentioned in Sect. 8.2.2.

```r
> gas.lasso.cv <- cv.glmnet(gasoline$NIR[gas.odd, ],
+                           gasoline$octane[gas.odd])
> svals <- gas.lasso.cv[c("lambda.1se", "lambda.min")]
```
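The logic behind the two predefined choices can be sketched in base R. The vectors below are hypothetical toy values (not gasoline results) standing in for the lambda, cvm (mean CV error) and cvsd (its standard error) components of a cv.glmnet object; the helper select_lambda is likewise hypothetical and only mirrors the one-standard-error criterion described above.

```r
# Hypothetical toy CV results: a decreasing sequence of penalty values,
# mean CV errors and their standard errors (cf. lambda, cvm, cvsd in cv.glmnet)
lambda <- c(1.00, 0.50, 0.25, 0.10, 0.05, 0.01)
cvm    <- c(0.90, 0.40, 0.22, 0.20, 0.21, 0.25)
cvsd   <- c(0.05, 0.04, 0.03, 0.03, 0.03, 0.04)

select_lambda <- function(lambda, cvm, cvsd) {
  imin <- which.min(cvm)
  # One-standard-error threshold: minimum CV error plus its standard error
  threshold <- cvm[imin] + cvsd[imin]
  # Largest penalty (i.e. sparsest model) whose CV error stays below the threshold
  lambda.1se <- max(lambda[cvm <= threshold])
  c(lambda.min = lambda[imin], lambda.1se = lambda.1se)
}

select_lambda(lambda, cvm, cvsd)
```

Because larger penalties give sparser models, the one-se choice is never smaller than the global-minimum choice.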
The plot command for the cv.glmnet object leads to the validation plot in the right panel of Fig. 10.2. The global minimum in the CV curve lies at a value of −4.215, and the one-se criterion at −3.424 (both in log units, as in the figure). The associated test-set errors can be obtained by passing these penalty values to the predict function of the lasso model:

```r
> gas.lasso.preds <-
+   lapply(svals,
+          function(x)
+            predict(gas.lasso,
+                    newx = gasoline$NIR[gas.even, ],
+                    s = x))
> sapply(gas.lasso.preds,
+        function(x) rms(x, gasoline$octane[gas.even]))
lambda.1se lambda.min
   0.18881    0.19463
```
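The rms function used above is not part of glmnet. If it is not at hand, a minimal stand-in can be written in one line; this sketch assumes rms(x, y) means the root-mean-square difference between predictions x and observed values y. The prediction and octane values are hypothetical.

```r
# Hypothetical stand-in for rms(): root-mean-square difference between
# predictions x and observed values y (assumed definition)
rms <- function(x, y) sqrt(mean((x - y)^2))

# predict() on a glmnet model returns a one-column matrix; subtracting a plain
# vector of the same length works element-wise, so rms() accepts it directly
pred <- matrix(c(88.1, 87.6, 89.0), ncol = 1)   # hypothetical predictions
obs  <- c(88.0, 87.9, 89.4)                     # hypothetical octane values
rms(pred, obs)
```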
For both penalty values the test-set prediction error is better than the best values seen with PCR and PLS; in this case the more conservative one-se choice even gives a marginally lower error than the global minimum. In both cases, only a very small subset of the original variables is included in the model (note that these counts include the intercept):

```r
> gas.lasso.coefs <- lapply(svals,
+                           function(x) coef(gas.lasso, s = x))
> sapply(gas.lasso.coefs,
+        function(x) sum(x != 0))
lambda.1se lambda.min
         9         14
```
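Why so few variables survive is easiest to see in the simplest case, an orthonormal design (X'X = I). There the lasso estimate for the objective (1/2)||y − Xβ||² + λ||β||₁ is obtained by soft-thresholding the least-squares coefficients, so every coefficient smaller than λ in absolute value becomes exactly zero. The sketch below uses hypothetical least-squares coefficients; the NIR data are far from orthonormal, so it only illustrates the mechanism.

```r
# Soft-thresholding operator: shrinks z towards zero by g and sets it to
# exactly zero when |z| <= g
soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Hypothetical least-squares coefficients for an orthonormal design
beta_ls <- c(2.5, -0.3, 0.8, 0.05, -1.6, 0.2)

# Lasso solution for a few penalty values: count the nonzero coefficients
for (lambda in c(0.1, 0.5, 1.0)) {
  beta_lasso <- soft_threshold(beta_ls, lambda)
  cat("lambda =", lambda, " nonzero:", sum(beta_lasso != 0), "\n")
}
```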
A further development is mixing the L1-norm of the lasso and related methods with the L2-norm used in ridge regression. This is known as the elastic net (Zou and Hastie 2005). The penalty term is given by

$$\sum_i \left[ \alpha |\beta_i| + (1 - \alpha) \beta_i^2 \right] \qquad (10.4)$$

where the sum is over all variables. The result is that large coefficients are penalized heavily (because of the quadratic term) and that many of the coefficients are exactly zero (because of the absolute-value term), leading to a sparse solution.
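Equation (10.4) can be evaluated directly. The function enet_penalty and the coefficient vector below are hypothetical, chosen only to show how α moves the penalty between the pure L1 (lasso) and pure squared-L2 (ridge) cases.

```r
# Elastic net penalty as in Eq. (10.4): sum over all variables of
# alpha * |beta_i| + (1 - alpha) * beta_i^2
enet_penalty <- function(beta, alpha) {
  sum(alpha * abs(beta) + (1 - alpha) * beta^2)
}

beta <- c(3, -0.5, 0, 0.2)       # hypothetical coefficient vector

enet_penalty(beta, alpha = 1)    # lasso: sum(abs(beta))
enet_penalty(beta, alpha = 0)    # ridge: sum(beta^2)
enet_penalty(beta, alpha = 0.5)  # elastic net mixture
```

The large coefficient (3) contributes 9 to the quadratic part but only 3 to the L1 part, which is why the quadratic term penalizes large coefficients so heavily.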
The glmnet function provides ridge regression through specifying alpha = 0 and the lasso with alpha = 1. It will be no surprise that values of alpha between zero and one lead to the elastic net:

```r
> gas.elnet <- glmnet(gasoline$NIR, gasoline$octane, alpha = .5)
> plot(gas.elnet, "norm")
```

[Fig. 10.3: Elastic net results for the gasoline data using α = 0.5. The left plot shows the development of the regression coefficients upon relaxation of the penalty parameter. The right plot shows the ten-fold crossvalidation curve, optimizing λ.]
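glmnet fits these models by coordinate descent. The base-R sketch below illustrates the idea and is not glmnet's actual implementation. It assumes the objective (1/(2n))||y − Xb||² + λ[α||b||₁ + (1 − α)/2 ||b||²], the form used in the glmnet documentation, which halves the quadratic term relative to (10.4). It also assumes a centred response and standardized columns, so no intercept is fitted. The function names and the toy data are hypothetical; the only change between ridge, lasso and elastic net is the alpha argument.

```r
# Soft-thresholding operator: sign(z) * max(|z| - g, 0)
soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Coordinate descent for the glmnet-style elastic net objective.
# Assumes y is centred and every column of X has mean 0 and mean(X[, j]^2) == 1.
enet_cd <- function(X, y, lambda, alpha, n_iter = 500) {
  n <- nrow(X)
  b <- numeric(ncol(X))
  r <- y                                  # residual y - X b for b = 0
  for (it in seq_len(n_iter)) {
    for (j in seq_along(b)) {
      z <- sum(X[, j] * r) / n + b[j]     # fit of column j to its partial residual
      b_new <- soft_threshold(z, lambda * alpha) / (1 + lambda * (1 - alpha))
      r <- r - X[, j] * (b_new - b[j])    # keep the residual up to date
      b[j] <- b_new
    }
  }
  b
}

# Hypothetical toy data: ten predictors, only the first two carry signal
set.seed(1)
n <- 100
X <- scale(matrix(rnorm(n * 10), n, 10)) * sqrt(n / (n - 1))  # mean 0, mean square 1
y <- 2 * X[, 1] - 1.5 * X[, 2] + rnorm(n, sd = 0.5)
y <- y - mean(y)

fits <- list(ridge = enet_cd(X, y, lambda = 0.3, alpha = 0),
             lasso = enet_cd(X, y, lambda = 0.3, alpha = 1),
             enet  = enet_cd(X, y, lambda = 0.3, alpha = 0.5))
sapply(fits, function(b) sum(b != 0))   # number of nonzero coefficients per model
```

With alpha = 0 the soft-threshold step does nothing and every coefficient stays nonzero, as in ridge regression; with alpha > 0 small coefficients are set exactly to zero.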
The coefficient paths produced by the plot command are shown in the left plot in Fig. 10.3. Further inspection of the elastic net model, including the crossvalidation plot on the right side of Fig. 10.3, is completely analogous to the code shown earlier for the lasso. The elastic net predicts the test set slightly better than the lasso, at the expense of including more variables:

```r
> sapply(gas.elnet.preds,
+        function(x) rms(x, gasoline$octane[gas.even]))
lambda.1se lambda.min
   0.15881    0.15285
> sapply(gas.elnet.coefs,
+        function(x) sum(x != 0))
[1] 28 30
```
The coefficients selected by the global-minimum lasso and elastic-net models are shown in Fig. 10.4. There is good agreement between the two sets: the elastic net in general selects variables in the same regions as the lasso, with the exception of the area around 1000 nm, which is not covered by the lasso at all. Note that the elastic-net coefficients are much smaller in absolute value than those of the lasso, a result of the L2 penalization.
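The tendency of the elastic net to spread weight over more, correlated variables with smaller coefficients is often called the grouping effect (Zou and Hastie 2005). A hypothetical extreme case makes it visible: two identical copies of one predictor. The sketch assumes the glmnet-style objective (1/(2n))||y − Xb||² + λ[α||b||₁ + (1 − α)/2 ||b||²], a centred response and standardized columns; the function names and data are illustrative only.

```r
soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Compact coordinate-descent sketch for the glmnet-style elastic net objective
# (centred y, columns of X with mean 0 and mean square 1)
enet_cd <- function(X, y, lambda, alpha, n_iter = 500) {
  n <- nrow(X); b <- numeric(ncol(X)); r <- y
  for (it in seq_len(n_iter)) {
    for (j in seq_along(b)) {
      z <- sum(X[, j] * r) / n + b[j]
      b_new <- soft_threshold(z, lambda * alpha) / (1 + lambda * (1 - alpha))
      r <- r - X[, j] * (b_new - b[j])
      b[j] <- b_new
    }
  }
  b
}

# Hypothetical toy data: two identical copies of one standardized predictor
set.seed(2)
n <- 50
x <- rnorm(n)
x <- x - mean(x)
x <- x / sqrt(mean(x^2))          # mean 0, mean square 1
X <- cbind(x, x)
y <- 2 * x + rnorm(n, sd = 0.3)
y <- y - mean(y)

b_lasso <- enet_cd(X, y, lambda = 0.2, alpha = 1)
b_enet  <- enet_cd(X, y, lambda = 0.2, alpha = 0.5)
rbind(lasso = b_lasso, enet = b_enet)
```

The lasso puts essentially all weight on the first copy (its solution is not unique here, and the update order decides which copy wins), whereas the strictly convex elastic-net objective shares the weight equally, giving two nonzero but smaller coefficients.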