Generalized linear models regularized with the SLOPE (OWL) norm

Fit a generalized linear model regularized with the SLOPE (Sorted L-One Penalized Estimation) norm, which applies a decreasing $\lambda$ (penalty sequence) to the coefficient vector ($\beta$) after having sorted it in decreasing order according to its absolute values.

owl(
  x,
  y,
  family = c("gaussian", "binomial", "multinomial", "poisson"),
  intercept = TRUE,
  standardize_features = TRUE,
  sigma = NULL,
  lambda = c("gaussian", "bh", "oscar"),
  lambda_min_ratio = if (n < p) 0.01 else 1e-04,
  n_sigma = 100,
  q = 0.1 * min(1, n/p),
  screening = TRUE,
  tol_dev_change = 1e-05,
  tol_dev_ratio = 0.995,
  max_variables = n * m,
  max_passes = 1e+06,
  tol_rel_gap = 1e-05,
  tol_infeas = 1e-04,
  diagnostics = FALSE,
  verbosity = 0
)

Arguments

x	the feature matrix, which can be either a dense matrix of the standard matrix class, or a sparse matrix inheriting from Matrix::sparseMatrix Data frames will be converted to matrices internally.
y	the response. For Gaussian models this must be numeric; for binomial models, it can be a factor.
family	response type. See Families for details.
intercept	whether to fit an intercept
standardize_features	whether to standardize features (predictors)
sigma	scale of lambda sequence
lambda	either a character vector indicating the method used to construct the lambda path or a vector with length equal to the number of coefficients in the model
lambda_min_ratio	smallest value for `lambda` as a fraction of `lambda_max`
n_sigma	length of regularization path
q	shape of lambda sequence
screening	whether the strong rule for SLOPE be used to screen variables for inclusion
tol_dev_change	the regularization path is stopped if the fractional change in deviance falls below this value. Note that this is automatically set to 0 if a sigma is manually entered
tol_dev_ratio	the regularization path is stopped if the deviance ratio $1 - \mathrm{deviance}/\mathrm{(null-deviance)}$ is above this threshold
max_variables	criterion for stopping the path in terms of the maximum number of unique, nonzero coefficients in absolute value in model
max_passes	maximum number of passes for optimizer
tol_rel_gap	stopping criterion for the duality gap
tol_infeas	stopping criterion for the level of infeasibility
diagnostics	should diagnostics be saved for the model fit (timings, primal and dual objectives, and infeasibility)
verbosity	level of verbosity for displaying output from the program. Setting this to 1 displays basic information on the path level, 2 a little bit more information on the path level, and 3 displays information from the solver.

Value

An object of class "Owl" with the following slots:

coefficients

a three-dimensional array of the coefficients from the model fit, including the intercept if it was fit. There is one row for each coefficient, one column for each target (dependent variable), and one slice for each penalty.

nonzeros

a three-dimensional boolean array indicating whether a coefficient was zero or not

lambda

the lambda vector that when multiplied by a value in sigma gives the penalty vector at that point along the regularization path

sigma

the vector of sigma, indicating the scale of the lambda vector

class_names

a character vector giving the names of the classes for binomial and multinomial families

passes

the number of passes the solver took at each path

violations

the number of violations of the screening rule

active_sets

a list where each element indicates the indices of the coefficients that were active at that point in the regularization path

unique

the number of unique predictors (in absolute value)

deviance_ratio

the deviance ratio (as a fraction of 1)

null_deviance

the deviance of the null (intercept-only) model

family

the name of the family used in the model fit

diagnostics

a data.frame of objective values for the primal and dual problems, as well as a measure of the infeasibility, time, and iteration. Only available if diagnostics = TRUE in the call to owl().

call

the call used for fitting the model

Details

owl() tries to minimize the following composite objective, given in Lagrangian form. $$ f(\beta) + \sigma \sum_{i=j}^p \lambda_j |\beta|_{(j)}, $$ where $f(\beta)$ is a smooth, convex function of $\beta$, whereas the second part is the SLOPE norm, which is convex but non-smooth. In ordinary least-squares regression, for instance, $f(\beta)$ is simply the squared norm of the least-squares residuals. See section Families for specifics regarding the various types of $f(\beta)$ (model families) that are allowed in owl().

By default, owl() fits a path of lambda sequences, starting from the null (intercept-only) model to an almost completely unregularized model. The path will end prematurely, however, if the criteria related to any of the arguments tol_dev_change, tol_dev_ratio, or max_variables are reached before the path is complete. This means that unless these arguments are modified, the path is not guaranteed to be of length n_sigma.

Families

Gaussian

The Gaussian model (Ordinary Least Squares) minimizes the following objective. $$ ||y - X\beta||_2^2 $$

Binomial

The binomial model (logistic regression) has the following objective. $$ \sum_{i=1}^n \log\left(1+ \exp\left(- y_i \left(x_i^T\beta + \alpha \right) \right) \right) $$ with $y \in \{-1, 1\}$.

Poisson

In poisson regression, we use the following objective.

$$ -\sum_{i=1}^n \left(y_i\left(x_i^T\beta + \alpha\right) - \exp\left(x_i^T\beta + \alpha\right)\right) $$

Multinomial

In multinomial regression, we minimize the full-rank objective $$ -\sum_{i=1}^n\left( \sum_{k=1}^{m-1} y_{ik}(x_i^T\beta_k + \alpha_k) - \log\sum_{k=1}^{m-1} \exp(x_i^T\beta_k + \alpha_k) \right) $$ with $y_{ik}$ being the element in a $n$ by $(m-1)$ matrix, where $m$ is the number of classes in the response.

Regularization sequences

There are multiple ways of specifying the lambda sequence in owl(). It is, first of all, possible to select the sequence manually by using a non-increasing numeric vector as argument instead of a character. If all lambda are the same value, this will lead to the ordinary lasso penalty. The greater the differences are between consecutive values along the sequence, the more clustering behavior will the model exhibit. Note, also, that the scale of the $\lambda$ vector makes no difference if sigma = NULL, since sigma will be selected automatically to ensure that the model is completely sparse at the beginning and almost unregularized at the end. If, however, both sigma and lambda are manually specified, both of the scales will matter.

Instead of choosing the sequence manually, one of the following automatically generated sequences may be chosen.

BH (Benjamini--Hochberg)

If lambda = "bh", the sequence used is that referred to as $\lambda^{(\mathrm{BH})}$ by Bogdan et al, which sets $\lambda$ according to $$ \lambda_i = \Phi^{-1}(1 - iq/(2p)), $$ where $\Phi^{-1}$ is the quantile function for the standard normal distribution and $q$ is a parameter that can be set by the user in the call to owl().

Gaussian

This penalty sequence is related to BH, such that $$ \lambda_i = \lambda^{(\mathrm{BH})}_i \sqrt{1 + w(i-1)\cdot \mathrm{cumsum}(\lambda^2)_i}, $$ where $w(k) = 1/(n-k-1)$. We let $\lambda_1 = \lambda^{(\mathrm{BH})}_1$ and adjust the sequence to make sure that it's non-increasing. Note that if $p$ is large relative to $n$, this option will result in a constant sequence, which is usually not what you would want.

OSCAR

This sequence comes from Bondell and Reich and is a linearly non-increasing sequence such that $$ \lambda_i = q(p - i) + 1. $$

References

Bogdan, M., van den Berg, E., Sabatti, C., Su, W., & Candès, E. J. (2015). SLOPE -- adaptive variable selection via convex optimization. The Annals of Applied Statistics, 9(3), 1103–1140. https://doi.org/10/gfgwzt

Bondell, H. D., & Reich, B. J. (2008). Simultaneous Regression Shrinkage, Variable Selection, and Supervised Clustering of Predictors with OSCAR. Biometrics, 64(1), 115–123. JSTOR. https://doi.org/10.1111/j.1541-0420.2007.00843.x

Examples


# Gaussian response, default lambda sequence

fit <- owl(bodyfat$x, bodyfat$y)

# Binomial response, BH-type lambda sequence

fit <- owl(heart$x, heart$y, family = "binomial", lambda = "bh")

# Poisson response, OSCAR-type lambda sequence

fit <- owl(abalone$x,
           abalone$y,
           family = "poisson",
           lambda = "oscar",
           q = 0.4)

# Multinomial response, custom sigma and lambda

m <- length(unique(wine$y)) - 1
p <- ncol(wine$x)

sigma <- 0.005
lambda <- exp(seq(log(2), log(1.8), length.out = p*m))

fit <- owl(wine$x,
           wine$y,
           family = "multinomial",
           lambda = lambda,
           sigma = sigma)