The branch-and-bound algorithm was first proposed in 1960 in the area of linear programming (Land and Doig 1960), and was introduced in statistics by Furnival and Wilson (1974). The title of the latter paper has led to the name of the R package leaps. The algorithm avoids many regions of the search space that can be shown to be worse than the current solution, and can therefore tackle larger problems than would be feasible with an exhaustive search. Application of the regsubsets function leads to the same set of selected variables (now we can provide a factor as the dependent variable):
> twowines.leaps <- regsubsets(vintage ~ ., data = twowines.df)
> twowines.leaps.sum <- summary(twowines.leaps)
> names(which(twowines.leaps.sum$which[8, ]))
[1] "(Intercept)"       "malic.acid"        "ash"
[4] "tot..phenols"      "flavonoids"        "non.flav..phenols"
[7] "col..int."         "col..hue"          "OD.ratio"
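To make concrete what the branch-and-bound strategy avoids, the following sketch runs a plain exhaustive search over all subsets of a fixed size. It is not the leaps algorithm itself: the simulated data, the variable names x1 to x6 and the helper functions rss_of and best_subset are all hypothetical, and the criterion used here is simply the residual sum of squares of an ordinary least-squares fit.

```r
# Exhaustive best-subset search on simulated data (illustration only;
# all names and data below are hypothetical, not taken from the wine example)
set.seed(42)
n <- 60
X <- matrix(rnorm(n * 6), n, 6, dimnames = list(NULL, paste0("x", 1:6)))
y <- 1.5 * X[, "x2"] - 2 * X[, "x5"] + rnorm(n, sd = 0.3)

# Residual sum of squares of a least-squares fit using the given variables
rss_of <- function(vars) {
  fit <- lm.fit(cbind(1, X[, vars, drop = FALSE]), y)
  sum(fit$residuals^2)
}

# Try every subset of size k and keep the one with the smallest RSS
best_subset <- function(k) {
  candidates <- combn(colnames(X), k, simplify = FALSE)
  rss <- vapply(candidates, rss_of, numeric(1))
  candidates[[which.min(rss)]]
}

n_candidates <- choose(ncol(X), 2)  # number of fits needed for k = 2
best2 <- best_subset(2)
```

With six variables this is trivial, but the number of candidate subsets grows combinatorially with the number of variables, which is why pruning parts of the search space matters for larger problems.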
In some special cases, approximate distributions of model coefficients can be derived. For two-class linear discriminant analysis, a convenient test statistic is given by Mardia et al. (1979):
$$F = \frac{a_i^2 \, (m - p + 1) \, c^2}{t_i \, m \, (m + c^2 D^2)} \qquad (10.1)$$
with $m = n_1 + n_2 - 2$, $n_1$ and $n_2$ signifying group sizes, $p$ the number of variables, $c^2 = n_1 n_2 / (n_1 + n_2)$, and $D^2$ the Mahalanobis distance between the class centers, based on all variables. The estimated coefficient in the discriminant function is $a_i$, and $t_i$ is the i-th diagonal element in the inverse of the total variance matrix $T$, given by
$$T = W + B \qquad (10.2)$$
This statistic has an F-distribution with 1 and m − p + 1 degrees of freedom.
Let us see what that gives for the wine data without the Barolo samples. We can re-use the code in Sect. 7.1.3, now using all thirteen variables to calculate the elements for the test statistic:
> Tii <- solve(BSS + WSS)
> Ddist <- mahalanobis(colMeans(wines.groups[[1]]),
+                      colMeans(wines.groups[[2]]),
+                      wines.pcov12)
> m <- sum(sapply(wines.groups, nrow)) - 2
> p <- ncol(wines)
> c <- prod(sapply(wines.groups, nrow)) /
+   sum(sapply(wines.groups, nrow))
> Fcal <- (MLLDA^2 / diag(Tii)) *
+   (m - p + 1) * c^2 / (m * (m + c^2 * Ddist))
> which(Fcal > qf(.95, 1, m-p+1))
       malic.acid               ash        flavonoids
                2                 3                 7
non.flav..phenols         col..int.          col..hue
                8                10                11
         OD.ratio
               12
Using this method, seven variables are shown to contribute to the separation between Grignolino and Barbera wines at the α = 0.05 level. The only variable missing, when compared to the earlier selected set of eight, is tot..phenols, which has a p-value of 0.08.
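The comparison against the critical value and the corresponding p-values are two views of the same F-distribution with 1 and m − p + 1 degrees of freedom. A minimal sketch of that last step follows; the values of m and p and the three F statistics (named varA, varB and varC) are hypothetical and chosen for illustration only, they are not the wine results.

```r
# Hypothetical values for illustration; not the wine data
m <- 117
p <- 13
df2 <- m - p + 1

# Hypothetical F statistics for three variables
Fcal <- c(varA = 12.4, varB = 3.1, varC = 0.7)

# Critical value at alpha = 0.05 and upper-tail p-values
Fcrit <- qf(0.95, 1, df2)
pvals <- pf(Fcal, 1, df2, lower.tail = FALSE)

# Both criteria select exactly the same variables
selected_by_F <- names(which(Fcal > Fcrit))
selected_by_p <- names(which(pvals < 0.05))
```

A variable whose statistic falls just below the critical value, like the hypothetical varB here, ends up with a p-value slightly above 0.05.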
10.2 Explicit Coefficient Penalization
In the chapter on multivariate regression we already saw that several methods use the concept of shrinkage to reduce the variance of the regression coefficients, at the cost of bias. Ridge regression achieves this by explicit coefficient penalization, as shown in Eq. 8.22. Although it forces the coefficients to be closer to zero, the values will almost never be exactly zero. If they were, the method would be performing variable selection: those variables with zero values for the regression coefficients can safely be removed from the data.
Interestingly enough, one can obtain the desired behavior by replacing the quadratic penalty in Eq. 8.22 by an absolute-value penalty:
$$\arg\min_B \; (Y - XB)^2 + \lambda |B| \qquad (10.3)$$
The penalty, consisting of the sum of the absolute values of the regression coefficients, is an L1-norm. As stated before, ridge regression, focusing on squared coefficients, employs an L2-norm, and measures like AIC or BIC use the L0-norm, taking into account only the number of non-zero regression coefficients. In Eq. 10.3, with increasing values of the parameter λ more and more regression coefficients will be exactly zero. This method has become known under the name lasso (Tibshirani 1996; Hastie et al. 2001); an efficient method to solve this equation, and related approaches, has become known under the name of least-angle regression, or LARS (Efron et al. 2004). Several R implementations of the lasso are available. Package glmnet is written by the inventors of the method, and will be used here as an example. Other packages implementing similar techniques include lars, where slightly different defaults have been chosen for solving the lasso problem, lpc for “lassoed principal components” and relaxo, a generalization of the lasso using possibly different penalization coefficients for the variable selection and parameter estimation steps.
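Why the absolute-value penalty yields exact zeros where the quadratic one does not is easiest to see in the special case of an orthonormal design (X'X = I), an assumption made here purely for illustration. In that case minimizing (Y − XB)^2 + λ|B| reduces to soft-thresholding each least-squares coefficient at λ/2, whereas the quadratic penalty merely rescales every coefficient by 1/(1 + λ). A minimal sketch in base R, with hypothetical coefficient values and hypothetical helper names:

```r
# Closed-form solutions under the ASSUMPTION of an orthonormal design (X'X = I)

# L1 penalty: shrink towards zero by lambda / 2, and set to zero what crosses it
soft_threshold <- function(b, lambda) sign(b) * pmax(abs(b) - lambda / 2, 0)

# L2 penalty: proportional shrinkage, never exactly zero for non-zero input
ridge_shrink <- function(b, lambda) b / (1 + lambda)

b_ols <- c(3, -1.5, 0.4, -0.2)  # hypothetical least-squares coefficients
lambda <- 1

lasso_b <- soft_threshold(b_ols, lambda)
ridge_b <- ridge_shrink(b_ols, lambda)

n_zero_lasso <- sum(lasso_b == 0)  # coefficients removed by the L1 penalty
n_zero_ridge <- sum(ridge_b == 0)  # coefficients removed by the L2 penalty
```

In this setting each lasso coefficient is a piecewise linear function of λ: it decreases linearly in magnitude until it reaches zero and stays there.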
Rather than one set of coefficients for one given value of λ, the function glmnet returns an entire sequence of fits, with corresponding regression coefficients. For the odd rows of the gasoline data, the model is simply obtained as follows:
> gas.lasso <- glmnet(x = gasoline$NIR[gas.odd, ],
+                     y = gasoline$octane[gas.odd])
> plot(gas.lasso, xvar = "lambda", label = TRUE)
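The sequence of fits can be inspected directly on the returned object. The sketch below uses simulated data rather than the gasoline set, so the matrix x, the response y and the chosen penalty value 0.1 are all hypothetical; it assumes that package glmnet is installed and is skipped otherwise.

```r
# Sketch on simulated data (hypothetical; not the gasoline data from the text)
if (requireNamespace("glmnet", quietly = TRUE)) {
  set.seed(1)
  x <- matrix(rnorm(50 * 10), nrow = 50, ncol = 10)
  y <- 2 * x[, 1] - x[, 3] + rnorm(50, sd = 0.5)

  fit <- glmnet::glmnet(x = x, y = y)  # default alpha = 1 gives the lasso

  lambdas <- fit$lambda                # penalty values, from large to small
  n_nonzero <- fit$df                  # non-zero coefficients per penalty value
  coef_at_s <- coef(fit, s = 0.1)      # intercept and 10 coefficients at lambda = 0.1
}
```

Here coef_at_s is a sparse one-column matrix; coefficients that have been set to zero at the requested penalty are shown as dots when it is printed.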
The result of the corresponding plot method is shown in the left panel of Fig. 10.2. It shows the (standardized) regression coefficients against the size of the L1 norm of the coefficient vector. For an infinitely large value of λ, the weight of the penalty, no variables are selected. Gradually decreasing the penalty leads to a fit using only one non-zero coefficient. Its size varies linearly with the penalty, until the next variable enters the fray. The right of the plot shows the positions at which new non-zero coefficients enter. This piecewise linear behavior is the key to the lasso algorithm, and makes it possible to calculate the whole trace in approximately the same amount of time as needed for a normal linear regression. Around the left axis (and somewhat hard to read in this default set-up), the variable numbers of some of the coefficients are shown at their “final” values, i.e., at the last value for λ, by default one percent of the value at which the first variable enters the model.