Fit a Generalized Linear Model with Elastic Net Regularization
    sgdnet(x, ...)

    # S3 method for default
    sgdnet(x, y,
      family = c("gaussian", "binomial", "multinomial", "mgaussian"),
      alpha = 1, nlambda = 100,
      lambda.min.ratio = if (NROW(x) < NCOL(x)) 0.01 else 1e-04,
      lambda = NULL, maxit = 1000, standardize = TRUE, intercept = TRUE,
      thresh = 0.001, standardize.response = FALSE, ...)
| Argument | Description |
|---|---|
| x | input matrix |
| ... | ignored |
| y | response variable |
| family | response type, one of "gaussian", "binomial", "multinomial", or "mgaussian" |
| alpha | elastic net mixing parameter (1 gives the lasso penalty, 0 the ridge penalty) |
| nlambda | number of penalties in the regularization path |
| lambda.min.ratio | the ratio between the smallest \(\lambda\) on the regularization path and \(\lambda_{\mathrm{max}}\) |
| lambda | regularization strength |
| maxit | maximum number of effective passes (epochs) |
| standardize | whether to standardize the features in x |
| intercept | whether to fit an intercept or not |
| thresh | tolerance level for termination of the algorithm. The algorithm terminates when $$ \frac{\lVert\beta^{(t)} - \beta^{(t-1)}\rVert_{\infty}}{\lVert\beta^{(t)}\rVert_{\infty}} < \mathrm{thresh} $$ |
| standardize.response | whether to standardize the response y |
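The termination rule for thresh can be sketched in base R as follows. This is only an illustration of the formula, not the package's internal code; `has_converged`, `beta_new`, and `beta_old` are hypothetical names, and the handling of an all-zero iterate is an assumption made here to avoid dividing by zero.

```r
# Relative change in the coefficients, measured in the infinity norm.
# beta_new and beta_old stand for the coefficients at epochs t and t - 1.
has_converged <- function(beta_new, beta_old, thresh = 0.001) {
  denom <- max(abs(beta_new))
  # Assumption: if the new iterate is all zeros, report convergence only
  # when the previous iterate was all zeros too (avoids 0 / 0).
  if (denom == 0) {
    return(all(beta_old == 0))
  }
  max(abs(beta_new - beta_old)) / denom < thresh
}

small_step <- has_converged(c(1, -2.0005), c(1, -2))  # ratio is about 0.00025
large_step <- has_converged(c(1, -2.5), c(1, -2))     # ratio is 0.2
```

Under the default thresh = 0.001, the first call reports convergence and the second does not.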
An object of class 'sgdnet' with the following items:

| Item | Description |
|---|---|
| a0 | the intercept |
| beta | the coefficients, stored in the sparse matrix format "dgCMatrix". For the multivariate families, this is a list with one matrix of coefficients for each response or class. |
| nulldev | the deviance of the null (intercept-only) model |
| dev.ratio | the fraction of deviance explained, where the deviance is two times the difference in log-likelihood between the saturated model and the fitted model |
| df | the number of nonzero coefficients along the regularization path. For family = "multinomial", this is the number of variables with a nonzero coefficient for any class. |
| dfmat | a matrix of the number of nonzero coefficients for each class (only available for multivariate models) |
| alpha | elastic net mixing parameter. See the description of the arguments. |
| lambda | the sequence of lambda values scaled to the original scale of the input data |
| nobs | number of observations |
| npasses | accumulated number of outer iterations (epochs) for the entire regularization path |
| offset | a logical indicating whether an offset was used |
| grouped | a logical indicating if a group lasso penalty was used |
| call | the call that generated this fit |
Four model families are currently supported: gaussian univariate
regression, binomial logistic regression, multinomial logistic
regression, and multivariate gaussian regression. The family is chosen
using the family argument. The objectives of the model families are
as follows:
Gaussian univariate regression:
$$ \frac{1}{2n} \sum_{i=1}^n (y_i - \beta_0 - x_i^\mathsf{T} \beta)^2 + \lambda \left( \frac{1 - \alpha}{2} ||\beta||_2^2 + \alpha||\beta||_1 \right). $$
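As a concrete reading of the gaussian objective, the following base R sketch evaluates it for a given set of coefficients. The helper names `elastic_net_penalty` and `gaussian_objective` and the toy data are made up for illustration; they are not part of sgdnet.

```r
# Elastic net penalty: a mix of the squared l2 norm and the l1 norm.
elastic_net_penalty <- function(beta, lambda, alpha) {
  lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
}

# Gaussian objective: half the mean squared residual plus the penalty.
gaussian_objective <- function(x, y, beta0, beta, lambda, alpha = 1) {
  residuals <- y - beta0 - drop(x %*% beta)
  sum(residuals^2) / (2 * length(y)) + elastic_net_penalty(beta, lambda, alpha)
}

# Toy data (hypothetical) that beta = c(1, 2) fits exactly.
x <- matrix(c(1, 0,
              0, 1,
              1, 1), ncol = 2, byrow = TRUE)
y <- c(1, 2, 3)

# The residuals are zero, so only the penalty remains:
# 0.1 * (0.25 * 5 + 0.5 * 3) = 0.275
value <- gaussian_objective(x, y, beta0 = 0, beta = c(1, 2),
                            lambda = 0.1, alpha = 0.5)
```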
Binomial logistic regression:
$$ -\frac1n \sum_{i=1}^n \bigg[y_i (\beta_0 + \beta^\mathsf{T} x_i) - \log\Big(1 + e^{\beta_0 + \beta^\mathsf{T} x_i}\Big)\bigg] + \lambda \left( \frac{1 - \alpha}{2} ||\beta||_2^2 + \alpha||\beta||_1 \right), $$ where \(y_i \in \{0, 1\}\).
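The binomial objective can be evaluated in the same spirit. The sketch below is a direct base R transcription of the formula, assuming responses coded as 0 and 1; `binomial_objective` and the toy data are hypothetical and not part of sgdnet.

```r
binomial_objective <- function(x, y, beta0, beta, lambda, alpha = 1) {
  eta <- beta0 + drop(x %*% beta)  # linear predictor
  # log1p(exp(eta)) is log(1 + e^eta); exp() overflows for very large eta,
  # which this sketch does not guard against.
  loglik <- sum(y * eta - log1p(exp(eta)))
  penalty <- lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
  -loglik / length(y) + penalty
}

# Toy data (hypothetical): one feature, four observations.
x <- matrix(c(-1, 0, 1, 2), ncol = 1)
y <- c(0, 0, 1, 1)

# With every coefficient at zero, the loss is log(2) and the penalty vanishes.
null_value <- binomial_objective(x, y, beta0 = 0, beta = 0, lambda = 0.1)
```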
Multinomial logistic regression:
$$ -\bigg\{\frac1n \sum_{i=1}^n \Big[\sum_{k=1}^m y_{i_k} (\beta_{0_k} + x_i^\mathsf{T} \beta_k) - \log \sum_{k=1}^m e^{\beta_{0_k}+x_i^\mathsf{T} \beta_k}\Big]\bigg\} + \lambda \left( \frac{1 - \alpha}{2}||\beta||_F^2 + \alpha \sum_{j=1}^p ||\beta_j||_q \right), $$ where \(q \in \{1, 2\}\) selects the standard lasso penalty (\(q = 1\)) or the group lasso penalty (\(q = 2\)), \(F\) indicates the Frobenius norm, \(m\) is the number of classes, and \(p\) is the number of features. In the penalty, \(\beta_j\) denotes the coefficients of feature \(j\) across all classes.
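The effect of \(q\) on the multinomial penalty can be seen in a small base R sketch. It assumes the coefficients are held in a matrix with one row per feature and one column per class; that layout, the name `multinomial_penalty`, and the toy values are assumptions for illustration only.

```r
multinomial_penalty <- function(beta, lambda, alpha = 1, q = 1) {
  stopifnot(is.matrix(beta), q %in% c(1, 2))
  # One norm per feature (row), taken across the classes (columns).
  row_norms <- if (q == 1) rowSums(abs(beta)) else sqrt(rowSums(beta^2))
  lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(row_norms))
}

# Two features, two classes (hypothetical values).
beta <- matrix(c(3, 4,
                 0, 0), nrow = 2, byrow = TRUE)

lasso_value <- multinomial_penalty(beta, lambda = 1, q = 1)  # 3 + 4 = 7
group_value <- multinomial_penalty(beta, lambda = 1, q = 2)  # sqrt(9 + 16) = 5
```

With \(q = 2\) the penalty acts on each feature's whole row at once, which is why the group lasso tends to include or exclude a feature for all classes together.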
Multivariate gaussian regression:
$$ \frac{1}{2n} ||\mathbf{Y} -\mathbf{B}_0\mathbf{1} - \mathbf{B} \mathbf{X}||^2_F + \lambda \left(\frac{1 - \alpha}{2}||\mathbf{B}||_F^2 + \alpha ||\mathbf{B}||_{12}\right), $$ where \(\mathbf{1}\) is a vector of all ones, \(\mathbf{B}\) is a matrix of coefficients, and \(||\cdot||_{12}\) is the mixed \(\ell_1/\ell_2\) norm. Note also that \(\mathbf{Y}\) is a matrix of responses in this form.
The default regularization path is a sequence of nlambda
log-spaced elements
from \(\lambda_{\mathrm{max}}\) to
\(\lambda_{\mathrm{max}} \times \mathtt{lambda.min.ratio}\).
For the gaussian family, for instance,
\(\lambda_{\mathrm{max}}\) is
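the largest absolute inner product of the feature vectors and the response
vector, scaled by \(1/n\),
$$\max_j \frac{1}{n}|\langle\mathbf{x}_j, y\rangle|,$$
where \(\mathbf{x}_j\) is the \(j\)th feature vector (a column of the input matrix).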
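The construction of the default path can be sketched in base R. This is a simplified illustration for the gaussian family that assumes the features and the response are already centered and that the stated formula for \(\lambda_{\mathrm{max}}\) applies as is; `lambda_path` is a hypothetical helper, not a function exported by sgdnet.

```r
lambda_path <- function(x, y, nlambda = 100,
                        lambda.min.ratio = if (NROW(x) < NCOL(x)) 0.01 else 1e-04) {
  n <- NROW(x)
  # Largest absolute inner product between a feature column and the response.
  lambda_max <- max(abs(crossprod(x, y))) / n
  # Equally spaced on the log scale, from lambda_max down to
  # lambda_max * lambda.min.ratio.
  exp(seq(log(lambda_max), log(lambda_max * lambda.min.ratio),
          length.out = nlambda))
}

# Toy data (hypothetical), already centered.
x <- cbind(c(1, -1, 1, -1),
           c(1, 1, -1, -1))
y <- c(2, 0, 1, -3)

# Inner products are 6 and 4, so lambda_max = 6 / 4 = 1.5.
path <- lambda_path(x, y, nlambda = 5)
```

Here the path runs from 1.5 down to 1.5e-04 in steps that each divide the previous value by 10.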
sgdnet is modeled to resemble glmnet closely so that users
can expect to receive more or less equivalent output regardless of whether
sgdnet() or glmnet::glmnet() is called. Nevertheless, there are a
few instances where we have decided to diverge from the behavior of
glmnet:
When the ridge penalty is used (alpha = 0), and a regularization
path (a sequence of \(\lambda\) values) is automatically generated,
glmnet::glmnet() fits the null model as the start of the path
(as if \(\lambda = \infty\))
even though the first \(\lambda\) reported actually
doesn't yield this fit. In sgdnet, we have opted
to fit the model so that it is true to the path that is returned.
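The reason a finite \(\lambda\) cannot reproduce the null model under the ridge penalty is that the ridge solution is nonzero at every finite \(\lambda\) whenever some feature correlates with the response. A minimal base R sketch, assuming centered data, no intercept, and the gaussian objective with alpha = 0; `ridge_solution` is a hypothetical helper and belongs to neither sgdnet nor glmnet:

```r
# Closed-form minimizer of
#   1 / (2n) * ||y - x beta||^2 + lambda / 2 * ||beta||^2,
# obtained by setting the gradient to zero:
#   (x'x / n + lambda * I) beta = x'y / n
ridge_solution <- function(x, y, lambda) {
  n <- NROW(x)
  p <- NCOL(x)
  drop(solve(crossprod(x) / n + lambda * diag(p), crossprod(x, y) / n))
}

# Toy data (hypothetical), centered, with orthogonal columns.
x <- cbind(c(1, -1, 1, -1),
           c(1, 1, -1, -1))
y <- c(2, 0, 1, -3)

# Even a very large penalty only shrinks the coefficients toward zero.
beta_large_lambda <- ridge_solution(x, y, lambda = 1500)
```

For this toy data the coefficients at lambda = 1500 are c(1.5, 1) / 1501: small, but not the all-zero null fit.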
    # Gaussian regression with sparse features with ridge penalty
    fit <- sgdnet(abalone$x, abalone$y, alpha = 0)

    # Binomial logistic regression with elastic net penalty, no intercept
    binom_fit <- sgdnet(heart$x,
                        heart$y,
                        family = "binomial",
                        alpha = 0.5,
                        intercept = FALSE)

    # Multinomial logistic regression with lasso
    multinom_fit <- sgdnet(wine$x, wine$y, family = "multinomial")

    # Multivariate gaussian regression
    mgaussian_fit <- sgdnet(student$x, student$y, family = "mgaussian")