ds385
API reference
- ds385.cd(X, y, initbeta=None, tol=1e-08, maxiters=5000, l=0, l1=False, l2=False)
Coordinate descent for regularized linear regression models
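\[l_{\lambda}(\beta) = \sum_{n=1}^N \Big(y_n - \beta_0 - \sum_{j=1}^J x_{nj} \beta_j\Big)^2 + \lambda \sum_{j=1}^J |\beta_j|^q\]
where \(q \in \{1, 2\}\): \(q = 1\) gives the lasso penalty and \(q = 2\) the ridge penalty. The intercept \(\beta_0\) is not penalized.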
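As a concrete check of the formula, the loss can be evaluated directly with NumPy. The helper `penalized_loss` below is hypothetical (it is not part of ds385) and assumes the intercept \(\beta_0\) is passed separately and left unpenalized, as in the sum above.

```python
import numpy as np

def penalized_loss(X, y, beta0, beta, lam, q):
    """Evaluate sum_n (y_n - beta0 - x_n . beta)^2 + lam * sum_j |beta_j|^q.

    Hypothetical helper for illustration; beta0 is not penalized.
    """
    residuals = y - beta0 - X @ beta            # one residual per observation n
    rss = np.sum(residuals ** 2)                # residual sum of squares
    penalty = lam * np.sum(np.abs(beta) ** q)   # q = 1 (lasso) or q = 2 (ridge)
    return rss + penalty
```

With \(\lambda = 0\) the penalty vanishes and the function reduces to the ordinary residual sum of squares.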
The loss function \(l_{\lambda}(\beta)\) is minimized for a fixed, user-supplied penalty parameter \(\lambda \geq 0\) using coordinate descent, starting from the initial values given in initbeta. If no initial values are supplied, random numbers uniformly distributed on \((-2, 2)\) are used.
The routine terminates when either the parameters \(\beta_j\), \(j = 1, \ldots, J\), change by less than the tolerance tol (default \(10^{-8}\)) between iterations, or when the maximum number of iterations, maxiters (default 5000), is reached.
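A minimal sketch of the unregularized case (l = 0) illustrates the initialization and stopping rule described above. `cd_ols_sketch` and its `seed` argument are hypothetical and not ds385's implementation; the sketch assumes initbeta holds only \(\beta_1, \ldots, \beta_J\) and that the intercept is refit exactly, as the mean of the remaining residuals, on every sweep.

```python
import numpy as np

def cd_ols_sketch(X, y, initbeta=None, tol=1e-8, maxiters=5000, seed=None):
    """Coordinate descent for the unregularized (l = 0) model; hypothetical sketch.

    Assumes initbeta holds only beta_1..beta_J; the intercept beta0 is
    refit exactly on every sweep instead of being part of initbeta.
    """
    _, J = X.shape
    rng = np.random.default_rng(seed)  # hypothetical seed argument for reproducibility
    if initbeta is None:
        beta = rng.uniform(-2, 2, size=J)  # NumPy samples the half-open interval [-2, 2)
    else:
        beta = np.array(initbeta, dtype=float)
    col_sq = np.sum(X ** 2, axis=0)  # x_j . x_j for every column j
    for _ in range(maxiters):
        beta_old = beta.copy()
        beta0 = np.mean(y - X @ beta)  # exact minimizer over the intercept
        for j in range(J):
            # partial residual: remove every fitted term except predictor j
            r_j = y - beta0 - X @ beta + X[:, j] * beta[j]
            beta[j] = X[:, j] @ r_j / col_sq[j]  # least-squares update for beta_j
        # stop once no beta_j (j = 1..J) moved by more than tol during the sweep
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    beta0 = np.mean(y - X @ beta)  # refit the intercept for the final beta
    return beta0, beta
```

For example, with X = [[1, 0], [0, 1], [1, 1], [2, 1]] and y = [3, 0, 2, 4], which satisfy \(y = 1 + 2 x_1 - x_2\) exactly, the sketch should return an intercept close to 1 and coefficients close to (2, -1).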
The arguments l, l1, and l2 control the penalty. The argument l specifies the value of the penalty parameter \(\lambda\) and defaults to \(0\). When l is \(0\), coordinate descent fits an unregularized linear regression model. If l is positive, either the lasso or the ridge linear regression model is fit, depending on whether l1 (lasso) or l2 (ridge) is true.
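For a single coefficient, each penalty choice has a closed-form update. The helper `coordinate_update` below is hypothetical; it assumes `r_j` is the partial residual with predictor j's contribution removed, and the factors of 2 come from the loss having no 1/2 in front of the sum of squares. The documentation does not say what happens when l1 and l2 are both true, so giving the lasso precedence here is an assumption.

```python
import numpy as np

def coordinate_update(x_j, r_j, l=0.0, l1=False, l2=False):
    """Minimize sum((r_j - x_j * b)**2) + penalty(b) over one coefficient b.

    Hypothetical helper; r_j is the partial residual without predictor j.
    If l1 and l2 are both true, the lasso branch wins (an assumption).
    """
    z = x_j @ r_j  # x_j . r_j
    a = x_j @ x_j  # x_j . x_j
    if l > 0 and l1:
        # lasso (penalty l * |b|): soft-threshold z at l / 2
        return np.sign(z) * max(abs(z) - l / 2, 0.0) / a
    if l > 0 and l2:
        # ridge (penalty l * b**2): minimizer of a quadratic in b
        return z / (a + l)
    return z / a  # l == 0: ordinary least-squares update
```

Calling this update for \(j = 1, \ldots, J\) on every sweep, with the intercept refit in between, gives lasso or ridge coordinate descent; the lasso branch returns exactly zero whenever \(|x_j^\top r_j| \leq \lambda / 2\).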
- Parameters
X (2d np.array) – A 2-dimensional NumPy array with \(N\) observations in the rows and \(J\) predictors in the columns. This form is known in the statistics community as a model matrix.
y (1d np.array) – A 1-dimensional NumPy array of \(N\) observations. This is known in the statistics community as a response vector.
initbeta (1d np.array) – A 1-dimensional NumPy array of initial values for the coefficients \(\beta\). Defaults to None, in which case random initial values are used.
- Returns
The values of \(\beta\) that minimize \(l_{\lambda}(\beta)\) for the supplied \(X\), \(y\), and \(\lambda\).
- Return type
np.array
- ds385.penalty(X, y, num=200)
Generate a NumPy array of penalty parameters \(\lambda\), chosen so that the largest value in the array is likely to shrink all coefficients of a lasso regression fit to X and y to zero.
- Parameters
X (2d np.array) – A 2-dimensional NumPy array with \(N\) observations in the rows and \(J\) predictors in the columns. This form is known in the statistics community as a model matrix.
y (1d np.array) – A 1-dimensional NumPy array of \(N\) observations. This is known in the statistics community as a response vector.
- Returns
Values of the penalty parameter \(\lambda\) for use in a regularized linear regression method such as the lasso.
- Return type
np.array
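The documentation only promises that the largest value has a good chance of zeroing every lasso coefficient. One common exact alternative, sketched here as an assumption rather than as ds385's method, uses the optimality (KKT) condition for the loss \(l_{\lambda}(\beta)\) with \(q = 1\): all \(\beta_j\) are zero at the optimum exactly when \(\lambda \geq 2 \max_j |x_j^\top (y - \bar{y})|\), where \(x_j\) is column \(j\) of X. The helper `lambda_grid` and its `eps` ratio are hypothetical, and `num` is assumed to be the number of values, which the documentation does not state; the log spacing from \(\lambda_{\max}\) down to a small fraction of it follows the convention used by glmnet.

```python
import numpy as np

def lambda_grid(X, y, num=200, eps=1e-3):
    """Log-spaced lasso penalties from lam_max down to eps * lam_max.

    Hypothetical helper. lam_max comes from the KKT condition for
    sum((y - beta0 - X @ beta)**2) + lam * sum(|beta_j|), so at lam_max
    the lasso solution has every beta_j equal to zero.
    Assumes y is not constant (otherwise lam_max would be 0).
    """
    r = y - np.mean(y)                     # residuals of the intercept-only fit
    lam_max = 2 * np.max(np.abs(X.T @ r))  # smallest lambda with all beta_j = 0
    return np.logspace(np.log10(lam_max), np.log10(eps * lam_max), num)
```

Log spacing is a common choice because useful penalties often span several orders of magnitude; eps = 1e-3 is an illustrative value only.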