ds385 API reference

ds385.cd(X, y, initbeta=None, tol=1e-08, maxiters=5000, l=0, l1=False, l2=False)

Coordinate descent for regularized linear regression models

\[l_{\lambda}(\beta) = \sum_{n=1}^N \Big(y_n - \beta_0 - \sum_{j=1}^J x_{nj} \beta_j\Big)^2 + \lambda \sum_{j=1}^J |\beta_j|^q\]

where \(q \in \{1, 2\}\).

The loss function \(l_{\lambda}(\beta)\) is minimized, for a fixed, user-supplied penalty parameter \(\lambda \geq 0\), using coordinate descent starting from the initial values given in initbeta. If no initial values are supplied, random numbers uniformly distributed in \((-2, 2)\) are used.

The routine terminates when either the parameters \(\beta_j\), \(j = 1, \ldots, J\), stop changing to within the specified tolerance tol (default \(10^{-8}\)), or the maximum number of iterations maxiters (default 5000) is reached.

The arguments l, l1, and l2 control the penalty. The argument l specifies the value of the penalty parameter \(\lambda\) and defaults to \(0\). When l is equal to \(0\), the model fit by coordinate descent is an unregularized linear regression model. If l is positive, either the lasso (\(q = 1\)) or the ridge (\(q = 2\)) linear regression model is fit, depending on which of l1 (lasso) or l2 (ridge) is true.
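The coordinate updates implied by the loss above can be sketched as follows. This is a minimal illustration, not the ds385 source: the name cd_sketch is hypothetical, it starts from zeros rather than random values so that results are reproducible, it returns the intercept separately, and it treats any call with l1=False as ridge instead of taking a separate l2 flag. Because the squared error term carries no factor of \(1/2\), the lasso update thresholds at \(\lambda / 2\).

```python
import numpy as np


def soft_threshold(b, l):
    """sign(b) * max(|b| - l, 0), applied elementwise."""
    return np.sign(b) * np.maximum(np.abs(b) - l, 0.0)


def cd_sketch(X, y, l=0.0, l1=False, tol=1e-8, maxiters=5000):
    """Cyclic coordinate descent for squared error + l * sum(|beta_j|^q).

    Hypothetical helper for illustration only. Returns (beta0, beta).
    q = 1 (lasso) when l1 is True, otherwise q = 2 (ridge); l = 0 gives
    ordinary least squares in both cases.
    """
    N, J = X.shape
    beta0, beta = 0.0, np.zeros(J)      # assumption: zero start, not random
    col_ss = (X ** 2).sum(axis=0)       # x_j^T x_j for every column
    for _ in range(maxiters):
        old = np.concatenate(([beta0], beta))
        beta0 = np.mean(y - X @ beta)   # the intercept is not penalized
        for j in range(J):
            # partial residual: remove every term except predictor j
            r = y - beta0 - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r
            if l1:
                beta[j] = soft_threshold(rho, l / 2.0) / col_ss[j]
            else:
                beta[j] = rho / (col_ss[j] + l)
        new = np.concatenate(([beta0], beta))
        if np.max(np.abs(new - old)) < tol:   # parameters stopped changing
            break
    return beta0, beta
```

With l=0 both branches reduce to the least squares coordinate update, so the sketch can be checked against np.linalg.lstsq on a model matrix with a leading column of ones.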

Parameters
  • X (2d np.array) – A two-dimensional numpy array with \(N\) observations in the rows and \(J\) predictors in the columns. This form is known in the statistics community as a model matrix.

  • y (1d np.array) – A one-dimensional numpy array of \(N\) observations. This is known as a response vector in the statistics community.

  • initbeta (1d np.array) – A one-dimensional numpy array of initial values for the coefficients \(\beta\). If None, random numbers uniformly distributed in \((-2, 2)\) are used.

Returns

Values of \(\beta\) that minimize the loss for the supplied \(X, y, \lambda\).

Return type

np.array

ds385.penalty(X, y, num=200)

Generate a numpy array of penalty parameters \(\lambda\) whose largest value is likely to shrink all coefficients to zero in a lasso regression of y on X.

Parameters
  • X (2d np.array) – A two-dimensional numpy array with \(N\) observations in the rows and \(J\) predictors in the columns. This form is known in the statistics community as a model matrix.

  • y (1d np.array) – A one-dimensional numpy array of \(N\) observations. This is known as a response vector in the statistics community.

Returns

Values of the penalty parameter \(\lambda\) for use with a regularized linear regression method.

Return type

np.array
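How ds385.penalty builds its grid is not stated here. One common construction, sketched below under that caveat, uses the smallest penalty at which the lasso solution is exactly zero. For a loss of the form \(\sum_n (y_n - \beta_0 - \sum_j x_{nj}\beta_j)^2 + \lambda \sum_j |\beta_j|\), that value is \(\lambda_{\max} = 2 \max_j |x_j^\top (y - \bar{y})|\). The function name penalty_grid, the ratio argument, and the log spacing are assumptions made for illustration.

```python
import numpy as np


def penalty_grid(X, y, num=200, ratio=1e-4):
    """Log-spaced penalties ending at lambda_max (hypothetical helper).

    lambda_max is the smallest penalty for which every lasso coefficient
    is zero when the loss is the unscaled sum of squared errors plus
    lambda * sum(|beta_j|) with an unpenalized intercept.
    """
    r = y - y.mean()                          # residual of the intercept-only fit
    lam_max = 2.0 * np.max(np.abs(X.T @ r))   # factor 2: no 1/2 in the loss
    # assumption: smallest penalty is a fixed fraction of the largest
    return np.geomspace(ratio * lam_max, lam_max, num)
```

A different scaling of the squared error term (for example a leading \(1/(2N)\)) changes the constant in front of the maximum, so the factor of 2 applies only to the unscaled loss.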

ds385.soft_threshold(b, l)

Soft threshold operator used within coordinate descent for lasso.

The soft threshold operator is defined as

\[f(b, \lambda) = \text{sign}(b) \max{(|b| - \lambda, 0)}\]
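The definition translates directly into numpy. The sketch below mirrors the formula and is not taken from the ds385 source. Accepting arrays as well as scalars is an assumption made for convenience.

```python
import numpy as np


def soft_threshold(b, l):
    """Return sign(b) * max(|b| - l, 0), elementwise for array input."""
    return np.sign(b) * np.maximum(np.abs(b) - l, 0.0)


# Values farther than l from zero move toward zero by l; the rest become zero.
shrunk = soft_threshold(np.array([3.0, -3.0, 0.5]), 1.0)   # -> [2., -2., 0.]
```

Inputs with \(|b| \le \lambda\) map to exactly zero, which is how the lasso sets coefficients to zero rather than merely shrinking them.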