The goal of sgdnet is to be a one-stop solution for fitting elastic net-penalized generalized linear models in the \(n \gg p\) regime. This is of course not novel in and of itself. Several packages, such as the popular glmnet package (Friedman, Hastie, and Tibshirani 2010), already exist to serve this purpose. With sgdnet, however, we set out to improve upon the existing solutions in a number of ways:
sgdnet has deliberately been designed to mimic the interface of glmnet, so transitioning between the two should be straightforward.
In this vignette, we will look at the basics of fitting a model and reviewing its results.
We will look at Edgar Anderson’s well-known iris data set, which gives the measurements of petal and sepal length and width for three species of iris flowers. Our objective will be to predict the species of the flower. First, we’ll split the data into a training set and a test set.
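A split of this kind can be sketched in base R as follows. The seed, the 80/20 ratio, and the object names (x_train, y_train, x_test, y_test) are assumptions made for illustration and are not taken from the original vignette.

```r
# Reproducible 80/20 split of iris (seed and ratio are illustrative assumptions)
set.seed(1)

n <- nrow(iris)
train_ind <- sample(n, size = floor(0.8 * n))

# Features as a numeric matrix, response as a factor
x_train <- as.matrix(iris[train_ind, 1:4])
y_train <- iris$Species[train_ind]

# Everything not sampled for training is held out for testing
x_test <- as.matrix(iris[-train_ind, 1:4])
y_test <- iris$Species[-train_ind]
```

Holding out the test rows before any fitting keeps the later accuracy estimates free of information from the training step.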
We fit the model by passing our feature matrix to the argument x and our response to y. We also set family = "multinomial" to specify the type of model we would like to fit.
The elastic net mixing parameter in sgdnet is specified via the alpha argument, where a value of 1 imposes the lasso (\(\ell_1\)) penalty, and 0 the ridge (\(\ell_2\)) penalty. For this example, we’ll stick with the default (alpha = 1, the lasso).
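To make the role of the mixing parameter concrete, here is a small sketch of the elastic net penalty. It assumes the parameterization used by glmnet (Friedman, Hastie, and Tibshirani 2010), in which the ridge term carries a factor of one half; whether sgdnet scales the terms in exactly the same way is an assumption here. The function name elastic_net_penalty is hypothetical and not part of either package.

```r
# Elastic net penalty, assuming glmnet's parameterization:
#   lambda * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2)
elastic_net_penalty <- function(beta, lambda, alpha = 1) {
  l1 <- sum(abs(beta))  # lasso part
  l2 <- sum(beta^2)     # ridge part (squared l2-norm)
  lambda * (alpha * l1 + 0.5 * (1 - alpha) * l2)
}

beta <- c(0.5, -2, 0)

pen_lasso <- elastic_net_penalty(beta, lambda = 0.1, alpha = 1)    # pure lasso
pen_ridge <- elastic_net_penalty(beta, lambda = 0.1, alpha = 0)    # pure ridge
pen_mixed <- elastic_net_penalty(beta, lambda = 0.1, alpha = 0.5)  # equal mix
```

Values of alpha strictly between 0 and 1 blend the two penalties, which tends to keep some sparsity while handling correlated features more gracefully than the lasso alone.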
The regularization strength is specified via the lambda argument. A higher value imposes a stronger penalty. We did not specify it here, nor do we need to, as sgdnet takes care of fitting our model along a regularization path of different \(\lambda\) values, starting at the value at which the solution is expected to be completely sparse, that is, the point at which all coefficients (save for the intercept, if it is included) are zero.
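The starting point of such a path can be illustrated with a short base R sketch. This is not sgdnet's implementation: it assumes the standard lasso optimality condition for a multinomial model with an unpenalized intercept, standardizes with scale() (which uses the \(n - 1\) denominator), and borrows a path length of 100 and a minimum ratio of \(10^{-4}\) as illustrative settings.

```r
# Sketch: smallest lambda at which all lasso coefficients are zero for a
# multinomial model with an intercept, plus a log-spaced path below it.
x <- scale(as.matrix(iris[, 1:4]))  # standardized features
y <- iris$Species
n <- nrow(x)

Y <- model.matrix(~ y - 1)          # n x 3 class indicator matrix
# Fitted probabilities of the intercept-only model: the class proportions
P <- matrix(colMeans(Y), nrow = n, ncol = ncol(Y), byrow = TRUE)

# Gradient of the average log-likelihood at zero coefficients
grad <- crossprod(x, Y - P) / n

alpha <- 1                          # lasso
lambda_max <- max(abs(grad)) / alpha

# Decreasing, log-spaced sequence (length and ratio are assumptions)
lambda <- exp(seq(log(lambda_max), log(lambda_max * 1e-4), length.out = 100))
```

Starting at lambda_max and moving downward lets each fit be warm-started from the previous, slightly sparser solution.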
The result can be printed, which will show a summary of the deviance ratio of the model along the regularization path. Usually, however, it is more effective to study the fit by visualizing it.
What we see here are the coefficients of the linear predictors for the multinomial model, with the \(\ell_1\)-norm of the coefficients along the x-axis. The coefficients are always returned on the original scale of the variables, even if standardize = TRUE is passed to sgdnet(), which happens to be the default.
Now that we have fit our model, we would like to see how well it performs. This is why we left out a testing subset of the data at the start. sgdnet contains a method for predict(), which takes a new set of data and computes predictions for the response based on the fitted model.
To predict the class (response) of each observation, in this case the species of iris, we’ll use the type = "class" argument.
This gives us the class predictions along the entire regularization path. If we had wanted to predict at a specific \(\lambda\), we could have specified it using the s argument in our call to predict().
We’ll now consider the accuracy along the entire path.
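The accuracy computation can be sketched in base R. The example assumes that the class predictions come back as a character matrix with one row per observation and one column per value of \(\lambda\), as they do in glmnet. The helper path_accuracy and the toy predictions below are hypothetical stand-ins for real model output.

```r
# Proportion of correct class predictions for each column (lambda) of `pred`
path_accuracy <- function(pred, truth) {
  colMeans(pred == as.character(truth))
}

# Toy stand-in for predictions along a three-step path (not real model output)
truth <- factor(c("setosa", "versicolor", "virginica", "virginica"))
pred <- cbind(
  s0 = c("setosa", "setosa",     "setosa",     "setosa"),
  s1 = c("setosa", "versicolor", "versicolor", "virginica"),
  s2 = c("setosa", "versicolor", "virginica",  "virginica")
)

acc <- path_accuracy(pred, truth)
```

In this toy example the accuracy rises as the penalty is relaxed, from the single-class prediction in the first column to a perfect match in the last.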
Of course, the choice of \(\lambda\) cannot be based only on our training data. In any real application we would do best to rely on cross-validation to pick a suitable value.
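As a rough illustration of the idea, the following base R sketch runs k-fold cross-validation over a set of candidate \(\lambda\) values. Everything in it is an assumption made for the example: cv_accuracy and fit_predict are hypothetical names, and the majority-class placeholder merely stands in for a call that fits a real model on the training folds and predicts on the held-out fold.

```r
# Generic k-fold cross-validation. `fit_predict` is a hypothetical,
# user-supplied function: given training data and validation features, it
# returns a matrix of class predictions with one column per candidate lambda.
cv_accuracy <- function(x, y, fit_predict, k = 5) {
  fold_id <- sample(rep(seq_len(k), length.out = nrow(x)))
  acc <- lapply(seq_len(k), function(fold) {
    val <- fold_id == fold
    pred <- fit_predict(x[!val, , drop = FALSE], y[!val], x[val, , drop = FALSE])
    colMeans(pred == as.character(y[val]))
  })
  # Average the per-fold accuracies for each candidate lambda
  colMeans(do.call(rbind, acc))
}

# Placeholder "model": predicts the majority class of the training folds for
# two dummy lambda values. Swap in a real fit-and-predict step in practice.
majority_class <- function(x_train, y_train, x_val) {
  top <- names(which.max(table(y_train)))
  matrix(top, nrow = nrow(x_val), ncol = 2)
}

set.seed(1)
cv_acc <- cv_accuracy(as.matrix(iris[, 1:4]), iris$Species, majority_class)
best <- which.max(cv_acc)  # index of the lambda with the highest CV accuracy
```

Because every observation is held out exactly once, the averaged accuracy gives a less optimistic basis for choosing \(\lambda\) than a single split would.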
Development on sgdnet began as a Google Summer of Code project in 2018 with the R Project for Statistical Computing as the mentor organization. Michael Weylandt and Toby Dylan Hocking mentored the project.
Defazio, Aaron, Francis Bach, and Simon Lacoste-Julien. 2014. “SAGA: A Fast Incremental Gradient Method with Support for Non-Strongly Convex Composite Objectives.” In Advances in Neural Information Processing Systems 27, 2:1646–54. Montreal, Canada: Curran Associates, Inc.
Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. 2010. “Regularization Paths for Generalized Linear Models via Coordinate Descent.” Journal of Statistical Software 33 (1): 1–22. https://doi.org/10.18637/jss.v033.i01.