Fit a Generalized Linear Model with Elastic Net Regularization

sgdnet(x, ...)

# S3 method for default
sgdnet(x, y, family = c("gaussian", "binomial",
  "multinomial", "mgaussian"), alpha = 1, nlambda = 100,
  lambda.min.ratio = if (NROW(x) < NCOL(x)) 0.01 else 1e-04,
  lambda = NULL, maxit = 1000, standardize = TRUE,
  intercept = TRUE, thresh = 0.001, standardize.response = FALSE,
  ...)

Arguments

x	input matrix
...	ignored
y	response variable
family	response type, one of `'gaussian'`, `'binomial'`, `'multinomial'`, or `'mgaussian'`. See Model families for details.
alpha	elastic net mixing parameter; `alpha = 1` gives the lasso penalty and `alpha = 0` the ridge penalty
nlambda	number of penalties in the regularization path
lambda.min.ratio	the ratio of the smallest lambda value on the path to `lambda_max` (the smallest penalty at which the solution is completely sparse). See Regularization Path for details.
lambda	regularization strength
maxit	maximum number of effective passes (epochs)
standardize	whether to standardize `x` or not
intercept	whether to fit an intercept or not
thresh	tolerance level for termination of the algorithm. The algorithm terminates when $$ \frac{\|\beta^{(t)} - \beta^{(t-1)}\|_{\infty}}{\|\beta^{(t)}\|_{\infty}} < \mathrm{thresh}. $$
standardize.response	whether `y` should be standardized for `family = "mgaussian"`
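
The stopping rule described for `thresh` can be illustrated with a small helper. This is only a sketch of the criterion as stated above, not sgdnet's internal code; the function name `has_converged` and the handling of an all-zero iterate are assumptions made for illustration.

```r
# Hypothetical helper illustrating the documented stopping rule; not part of sgdnet.
has_converged <- function(beta_new, beta_old, thresh = 0.001) {
  max_change <- max(abs(beta_new - beta_old))  # infinity norm of the update
  max_size <- max(abs(beta_new))               # infinity norm of the current iterate
  # Assumption: an all-zero iterate counts as converged, to avoid dividing by zero.
  if (max_size == 0) return(TRUE)
  max_change / max_size < thresh
}

# A tiny relative change is below the default tolerance ...
small_step <- has_converged(c(1.0000, -2.0001), c(1.0000, -2.0000))
# ... while a relative change of 0.5 / 2.5 = 0.2 is not.
large_step <- has_converged(c(1.0, -2.5), c(1.0, -2.0))
```

A smaller `thresh` demands a smaller relative change before stopping, which typically means more epochs (bounded by `maxit`).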

Value

An object of class 'sgdnet' with the following items:

a0

the intercept

beta

the coefficients stored in sparse matrix format "dgCMatrix". For the multivariate families, this is a list with one matrix of coefficients for each response or class.

nulldev

the deviance of the null (intercept-only) model

dev.ratio

the fraction of (null) deviance explained. The deviance is two times the difference in log-likelihood between the saturated model and the fitted model; the null deviance is the same quantity for the null model.

df

the number of nonzero coefficients along the regularization path. For family = "multinomial", this is the number of variables with a nonzero coefficient for any class.

dfmat

a matrix of the number of nonzero coefficients per class (only available for multivariate models)

alpha

elastic net mixing parameter. See the description of the arguments.

lambda

the sequence of lambda values scaled to the original scale of the input data.

nobs

number of observations

npasses

accumulated number of outer iterations (epochs) for the entire regularization path

offset

a logical indicating whether an offset was used

grouped

a logical indicating if a group lasso penalty was used

call

the call that generated this fit
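
To make the `nulldev` and `dev.ratio` items above concrete, here is a minimal sketch for the gaussian case. It assumes the glmnet convention `dev.ratio = 1 - dev/nulldev` and uses `lm()` as a stand-in for a fitted model, so none of the names below are sgdnet internals.

```r
# Sketch for the gaussian case only; names are illustrative, not sgdnet internals.
set.seed(1)
n <- 50
x <- matrix(rnorm(n * 3), n, 3)
y <- drop(x %*% c(2, 0, -1)) + rnorm(n)

fit <- lm(y ~ x)                  # stand-in for a fitted model with an intercept
dev <- sum(residuals(fit)^2)      # deviance of the fit (residual sum of squares)
nulldev <- sum((y - mean(y))^2)   # deviance of the intercept-only model
dev_ratio <- 1 - dev / nulldev    # fraction of deviance explained

# For a gaussian model with an intercept this coincides with R-squared.
matches_r_squared <- isTRUE(all.equal(dev_ratio, summary(fit)$r.squared))
```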

Model families

Four model families are currently supported: gaussian univariate regression, binomial logistic regression, multinomial logistic regression, and multivariate gaussian regression. The family is selected with the family argument. The objective for each family is given below.

Gaussian univariate regression:

$$ \frac{1}{2n} \sum_{i=1}^n (y_i - \beta_0 - x_i^\mathsf{T} \beta)^2 + \lambda \left( \frac{1 - \alpha}{2} ||\beta||_2^2 + \alpha||\beta||_1 \right). $$
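
The gaussian objective above can be evaluated directly, which is sometimes useful for checking a fit by hand. The function `enet_objective` below is a hypothetical helper written for illustration; it is not exported by sgdnet and says nothing about how the solver works internally.

```r
# Illustrative evaluation of the gaussian elastic net objective; not sgdnet code.
enet_objective <- function(x, y, beta0, beta, lambda, alpha) {
  n <- NROW(x)
  residuals <- y - beta0 - drop(x %*% beta)
  loss <- sum(residuals^2) / (2 * n)
  # Ridge part weighted by (1 - alpha) / 2, lasso part weighted by alpha.
  penalty <- lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
  loss + penalty
}

x <- diag(2)   # 2 x 2 identity design, so predictions equal beta0 + beta
y <- c(1, 2)
lasso_value <- enet_objective(x, y, beta0 = 0, beta = c(1, 1), lambda = 0.5, alpha = 1)
ridge_value <- enet_objective(x, y, beta0 = 0, beta = c(1, 1), lambda = 0.5, alpha = 0)
```

In this toy case the loss is 0.25 in both calls, and only the penalty differs: 1 for the lasso (`alpha = 1`) and 0.5 for ridge (`alpha = 0`).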

Binomial logistic regression:

$$ -\frac1n \sum_{i=1}^n \bigg[y_i (\beta_0 + \beta^\mathsf{T} x_i) - \log\Big(1 + e^{\beta_0 + \beta^\mathsf{T} x_i}\Big)\bigg] + \lambda \left( \frac{1 - \alpha}{2} ||\beta||_2^2 + \alpha||\beta||_1 \right), $$ where $y_i \in \{0, 1\}$.
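
The binomial objective can be written out in the same way. As before, `binomial_objective` is a hypothetical helper for illustration only, and it assumes `y` is coded as 0/1 as stated above.

```r
# Illustrative evaluation of the binomial elastic net objective; not sgdnet code.
binomial_objective <- function(x, y, beta0, beta, lambda, alpha) {
  eta <- beta0 + drop(x %*% beta)          # linear predictor
  # Negative mean log-likelihood; log1p(exp(eta)) may overflow for very large eta.
  loss <- -mean(y * eta - log1p(exp(eta)))
  penalty <- lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
  loss + penalty
}

x <- matrix(c(1, -1, 0.5, 2), 2, 2)
y <- c(0, 1)
# With all coefficients at zero every observation has probability 1/2,
# so the objective reduces to log(2).
null_value <- binomial_objective(x, y, beta0 = 0, beta = c(0, 0), lambda = 0.1, alpha = 0.5)
```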

Multinomial logistic regression:

$$ -\bigg\{\frac1n \sum_{i=1}^n \Big[\sum_{k=1}^m y_{i_k} (\beta_{0_k} + x_i^\mathsf{T} \beta_k) - \log \sum_{k=1}^m e^{\beta_{0_k}+x_i^\mathsf{T} \beta_k}\Big]\bigg\} + \lambda \left( \frac{1 - \alpha}{2}||\beta||_F^2 + \alpha \sum_{j=1}^p ||\beta_j||_q \right), $$ where $q \in \{1, 2\}$ selects the penalty ($q = 1$ gives the standard lasso and $q = 2$ the group lasso), $F$ indicates the Frobenius norm, $m$ is the number of classes, and $p$ is the number of features.

Multivariate gaussian regression:

$$ \frac{1}{2n} ||\mathbf{Y} -\mathbf{B}_0\mathbf{1} - \mathbf{B} \mathbf{X}||^2_F + \lambda \left(\frac{1 - \alpha}{2}||\mathbf{B}||_F^2 + \alpha ||\mathbf{B}||_{12}\right), $$ where $\mathbf{1}$ is a vector of all ones, $\mathbf{B}$ is a matrix of coefficients, and $||\cdot||_{12}$ is the mixed $\ell_1/\ell_2$ norm. Note also that $\mathbf{Y}$ is a matrix of responses in this form.

Regularization Path

The default regularization path is a sequence of nlambda log-spaced elements from $\lambda_{\mathrm{max}}$ to $\lambda_{\mathrm{max}} \times \mathtt{lambda.min.ratio}$. For the gaussian family, for instance, $\lambda_{\mathrm{max}}$ is the largest absolute inner product of the feature vectors and the response vector, scaled by the number of observations, $$\max_i \frac{1}{n}|\langle\mathbf{x}_i, y\rangle|.$$
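
The construction of the default path can be sketched in a few lines of base R. This sketch covers only the gaussian lasso case (`alpha = 1`) and assumes that `x` has already been centered and scaled and that `y` has been centered; the helper name `lambda_path` is hypothetical and the code is not taken from sgdnet.

```r
# Sketch of the default path for the gaussian lasso case (alpha = 1).
# Assumes x is already standardized and y is centered; not sgdnet's own code.
lambda_path <- function(x, y, nlambda = 100,
                        lambda.min.ratio = if (NROW(x) < NCOL(x)) 0.01 else 1e-04) {
  n <- NROW(x)
  # Largest absolute inner product between a feature column and the response.
  lambda_max <- max(abs(crossprod(x, y))) / n
  # nlambda values, equally spaced on the log scale, from largest to smallest.
  exp(seq(log(lambda_max), log(lambda_max * lambda.min.ratio),
          length.out = nlambda))
}

set.seed(1)
x <- scale(matrix(rnorm(200), 50, 4))
y <- rnorm(50)
y <- y - mean(y)
lambda <- lambda_path(x, y, nlambda = 5)
```

Because there are more observations than features here, the default `lambda.min.ratio` is `1e-04`, so the last value on the path is `1e-04` times the first.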

Relationship with glmnet

sgdnet is modeled to resemble glmnet closely so that users can expect to receive more or less equivalent output regardless of whether sgdnet() or glmnet::glmnet() is called. Nevertheless, there are a few instances where we have decided to diverge from the behavior of glmnet:

When the ridge penalty is used (alpha = 0) and a regularization path (a sequence of $\lambda$ values) is automatically generated, glmnet::glmnet() fits the null model as the start of the path (as if $\lambda = \infty$), even though the first $\lambda$ reported does not actually yield this fit. In sgdnet, we have opted to fit the model so that it is true to the path that is returned.

Examples

# Gaussian regression on sparse features with a ridge penalty
fit <- sgdnet(abalone$x, abalone$y, alpha = 0)

# Binomial logistic regression with elastic net penalty, no intercept
binom_fit <- sgdnet(heart$x,
                    heart$y,
                    family = "binomial",
                    alpha = 0.5,
                    intercept = FALSE)

# Multinomial logistic regression with lasso
multinom_fit <- sgdnet(wine$x, wine$y, family = "multinomial")

# Multivariate gaussian regression
mgaussian_fit <- sgdnet(student$x, student$y, family = "mgaussian")