Getting Started

Estimators in sortedl1 are compatible with the scikit-learn interface. Here is a simple example of fitting a model to some random data.

We start by generating the data.

import numpy as np
from numpy.random import default_rng

from sortedl1 import Slope

# Generate some random data
n = 100
p = 10

seed = 31
rng = default_rng(seed)

x = rng.standard_normal((n, p))
beta = rng.standard_normal(p)
y = x @ beta + rng.standard_normal(n)

Next, we create the estimator by calling Slope() with all the desired parameters.

model = Slope(alpha=0.1)

Now we can fit the model to the data using the fit method, which fits the model at the value of alpha given above.

model.fit(x, y)
model.coef_
array([ 0.57298856, -1.03709127,  2.55814536,  0.56922823, -1.22227224,
       -1.54242725,  0.        , -1.30545222,  0.        ,  0.49147967])

Path Fitting

The package also supports fitting the full SLOPE path via the path method of Slope. In this case, the estimator's alpha value is ignored. Unless path() is called with a specific sequence of alpha values, a sequence is generated automatically, starting from the point where the first coefficient enters the model.

res = model.path(x, y)
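To make the idea of an alpha sequence concrete, here is a minimal sketch of a decreasing, log-spaced grid. This is an illustration only and not necessarily how sortedl1 builds its own sequence; the helper name log_spaced_alphas and its n_alphas and min_ratio parameters are hypothetical.

```python
import numpy as np


def log_spaced_alphas(alpha_max, n_alphas=100, min_ratio=1e-4):
    """Return a decreasing, log-spaced grid from alpha_max to alpha_max * min_ratio.

    Hypothetical helper: the defaults are illustrative assumptions,
    not values taken from sortedl1.
    """
    return np.geomspace(alpha_max, alpha_max * min_ratio, num=n_alphas)


# Five values spanning four orders of magnitude
alphas = log_spaced_alphas(alpha_max=1.0, n_alphas=5, min_ratio=1e-4)
```

A grid like this could be supplied to path() in place of the automatic sequence; the exact argument name is not shown on this page, so check the API reference before relying on it.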

Unlike the fit method, calling path() does not modify the model object and instead returns a named tuple of class PathResults, with the full set of coefficients and intercepts for each value of alpha.

PathResults also includes concise helpers for quick inspection:

res
res.summary()
{'n_alphas': 77,
 'n_features': 10,
 'n_targets': 1,
 'alpha_min': 0.0010078421223305043,
 'alpha_max': 1.1860406557259944,
 'coef_shape': (10, 1, 77),
 'intercepts_shape': (1, 77),
 'lambda_shape': (10,),
 'nnz_first_alpha': 0,
 'nnz_last_alpha': 10}

It also comes with a plot() method to visualize the path of coefficients:

fig, ax = res.plot()
[Figure: plot of the coefficient path from res.plot()]
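The coef_shape entry in the summary, (10, 1, 77), suggests a layout of (n_features, n_targets, n_alphas). The sketch below shows how an array with that layout can be sliced to get the coefficients at the alpha closest to a chosen value. The attribute names on PathResults are not shown on this page, so coefs and alphas here are stand-in arrays filled with placeholder values, not real path output.

```python
import numpy as np

# Stand-in arrays with the same layout as in the summary above
n_features, n_targets, n_alphas = 10, 1, 77
rng = np.random.default_rng(0)
coefs = rng.standard_normal((n_features, n_targets, n_alphas))  # placeholder values
alphas = np.geomspace(1.0, 1e-3, n_alphas)  # placeholder decreasing grid

# Index of the grid point closest to the alpha of interest
target_alpha = 0.1
k = int(np.argmin(np.abs(alphas - target_alpha)))

# Coefficient vector for the first (only) target at that step
beta_k = coefs[:, 0, k]

# Number of nonzero coefficients at every step of the path
nnz = np.count_nonzero(coefs[:, 0, :], axis=0)
```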

Cross-Validation

Cross-validation is also straightforward in sortedl1. Since the estimator is scikit-learn compatible, we could use the functionality from scikit-learn directly, but sortedl1 also includes native cross-validation routines that are optimized for SLOPE.

In the following example, we cross-validate across different levels of the gamma parameter, which controls the relaxed SLOPE model (a linear combination of SLOPE and an ordinary least squares fit to the cluster structure from SLOPE).

cv_res = model.cv(x, y, q=[0.1], gamma=[0.0, 0.5, 1.0])
fig, ax = cv_res.plot()
cv_res
cv_res.summary()
{'metric': 'mse',
 'n_param_sets': 3,
 'best_ind': 1,
 'best_alpha_ind': 76,
 'best_score': 1.0,
 'best_alpha': 0.0010078421223305043,
 'n_alphas_per_param': [77, 77, 77],
 'param_keys': ['gamma', 'q']}
[Figure: plot of the cross-validation results from cv_res.plot()]
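The relaxed model described above can be illustrated with a small NumPy sketch. It is not the sortedl1 implementation. It assumes that clusters are groups of coefficients sharing the same nonzero magnitude, that there is no intercept, and that gamma = 0 returns the penalized coefficients while gamma = 1 returns the full least squares refit; the library's convention for gamma may differ. The function name relaxed_coefs and the example coefficients are hypothetical.

```python
import numpy as np


def relaxed_coefs(x, y, beta, gamma):
    """Blend penalized coefficients with an OLS refit on their cluster pattern.

    Assumed convention (may differ from sortedl1): gamma = 0 gives `beta`,
    gamma = 1 gives the OLS refit. No intercept is fitted.
    """
    beta = np.asarray(beta, dtype=float)
    magnitudes = np.abs(beta)
    signs = np.sign(beta)
    # One cluster per distinct nonzero magnitude (exact equality, for simplicity)
    levels = np.unique(magnitudes[magnitudes > 0])
    if levels.size == 0:
        return beta.copy()
    # Collapse each cluster into a single signed column of the design matrix
    z = np.column_stack(
        [x[:, magnitudes == lv] @ signs[magnitudes == lv] for lv in levels]
    )
    theta, *_ = np.linalg.lstsq(z, y, rcond=None)
    # Expand the per-cluster estimates back to one coefficient per feature
    beta_ols = np.zeros_like(beta)
    for lv, t in zip(levels, theta):
        mask = magnitudes == lv
        beta_ols[mask] = t * signs[mask]
    return (1.0 - gamma) * beta + gamma * beta_ols


rng = np.random.default_rng(0)
x_demo = rng.standard_normal((50, 4))
y_demo = x_demo @ np.array([2.0, -2.0, 0.0, 1.0]) + 0.1 * rng.standard_normal(50)

# Hypothetical penalized fit: shrunken, with a two-feature cluster and one zero
beta_slope = np.array([1.5, -1.5, 0.0, 0.5])
beta_relaxed = relaxed_coefs(x_demo, y_demo, beta_slope, gamma=1.0)
```

The refit keeps the zero pattern and the cluster structure of the penalized solution but removes the shrinkage within that structure, which is why intermediate values of gamma interpolate between the two fits.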

If you want both cross-validation results and a best model fitted on the full data used in cross-validation, use refit=True:

cv_res, best_model = model.cv(x, y, q=[0.1], gamma=[0.0, 0.5, 1.0], refit=True)
best_model.coef_
array([ 0.67405566, -1.32637578,  2.77084901,  0.75603046, -1.37102155,
       -1.63196265,  0.15177179, -1.53242048,  0.08127653,  0.76057772])

In this low-dimensional example, we see that there is, unsurprisingly, little benefit to regularization.

Using scikit-learn model selection

The Slope estimator can also be used with scikit-learn model-selection tools such as GridSearchCV. In the sortedl1 version used here, Slope does not define a score method, so GridSearchCV needs an explicit scoring argument. Without one, search.fit(x, y) fails with:

TypeError: If no scoring is specified, the estimator passed should have a 'score' method. The estimator Slope() does not.

The example below passes scoring="neg_mean_squared_error", which scores each candidate through the estimator's predict method.

from sklearn.model_selection import GridSearchCV

param_grid = {"alpha": [0.01, 0.1, 1.0], "q": [0.05, 0.1, 0.2]}
# Slope has no score method here, so name the metric explicitly
search = GridSearchCV(
    Slope(), param_grid=param_grid, cv=5, scoring="neg_mean_squared_error"
)
search.fit(x, y)

search.best_params_
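GridSearchCV also accepts a callable scorer with the signature scorer(estimator, X, y), which is one way to supply a metric when an estimator has no score method. The sketch below defines such a scorer for negative mean squared error. The names neg_mse_scorer and MeanPredictor are hypothetical, and MeanPredictor is only a tiny stand-in estimator used to exercise the scorer without depending on sortedl1. The scorer assumes the estimator provides a predict method.

```python
import numpy as np


def neg_mse_scorer(estimator, X, y):
    """Negative mean squared error; larger is better, as scikit-learn expects."""
    residuals = np.asarray(y) - np.asarray(estimator.predict(X)).ravel()
    return -float(np.mean(residuals**2))


class MeanPredictor:
    """Stand-in estimator that always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


X_demo = np.zeros((4, 1))
y_demo = np.array([1.0, 2.0, 3.0, 4.0])
score = neg_mse_scorer(MeanPredictor().fit(X_demo, y_demo), X_demo, y_demo)
```

With scikit-learn installed, the same function could be passed as GridSearchCV(Slope(), param_grid=param_grid, cv=5, scoring=neg_mse_scorer).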