anchorboosting package

Submodules

anchorboosting.models module

class anchorboosting.models.AnchorBooster(gamma, dataset_params=None, num_boost_round=100, objective='regression', learning_rate=0.1, **kwargs)

Bases: object

Boost the anchor loss.

For regression, the anchor loss [Rothenhäusler et al., 2021] with causal regularization parameter \(\gamma\) is

\[\ell(f, y) = \frac{1}{2} \| y - f \|_2^2 + \frac{1}{2} (\gamma - 1) \|P_A (y - f) \|_2^2,\]

where \(P_A = A (A^T A)^{-1} A^T\) is the linear projection onto the column space of the anchor \(A\).
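The regression loss above can be evaluated directly with NumPy. The following is a minimal sketch, not the package's implementation; the function name `anchor_regression_loss` and the simulated data are assumptions made for illustration.

```python
import numpy as np


def anchor_regression_loss(f, y, A, gamma):
    """Anchor loss for regression, following the formula above."""
    residuals = y - f
    # P_A @ residuals, obtained by least squares instead of forming P_A explicitly.
    coef, *_ = np.linalg.lstsq(A, residuals, rcond=None)
    projected = A @ coef
    return 0.5 * residuals @ residuals + 0.5 * (gamma - 1) * projected @ projected


rng = np.random.default_rng(0)
A = rng.normal(size=(50, 2))  # hypothetical anchor matrix
y = rng.normal(size=50)       # hypothetical outcomes
f = np.zeros(50)              # hypothetical predictions

loss_ols = anchor_regression_loss(f, y, A, gamma=1.0)     # penalty term vanishes
loss_anchor = anchor_regression_loss(f, y, A, gamma=5.0)  # penalty term is active
```

With `gamma=1.0` the second term drops out and the value equals half the residual sum of squares.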

Let \(\Phi\) and \(\varphi\) be the cumulative distribution function and the probability density function of the standard Gaussian distribution. For binary classification with \(y \in \{-1, 1\}\) and a probit link function, the anchor loss [Kook et al., 2022] is

\[\ell(f, y) = - \sum_{i=1}^n \log( \Phi(y_i f_i) ) + \frac{1}{2} (\gamma - 1) \|P_A r \|_2^2,\]

where \(r = - y \varphi(f) / \Phi(y f)\) is the gradient of the probit loss \(- \sum_{i=1}^n \log( \Phi(y_i f_i) )\) with respect to the scores \(f\). We use a probit link instead of a logistic link because the resulting anchor loss is convex.
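As a rough illustration of the classification loss, the sketch below computes the residuals \(r\) and the loss for labels in \(\{-1, 1\}\). The helper names `probit_residuals` and `probit_anchor_loss` and the simulated data are assumptions for this example, not part of the package.

```python
import numpy as np
from statistics import NormalDist

_std_normal = NormalDist()           # standard Gaussian
Phi = np.vectorize(_std_normal.cdf)  # cumulative distribution function
phi = np.vectorize(_std_normal.pdf)  # probability density function


def probit_residuals(f, y):
    """r = -y * phi(f) / Phi(y * f), for labels y in {-1, 1}."""
    return -y * phi(f) / Phi(y * f)


def probit_anchor_loss(f, y, A, gamma):
    """Probit negative log-likelihood plus the penalty on P_A r."""
    r = probit_residuals(f, y)
    coef, *_ = np.linalg.lstsq(A, r, rcond=None)
    projected = A @ coef  # P_A @ r
    return -np.sum(np.log(Phi(y * f))) + 0.5 * (gamma - 1) * projected @ projected


rng = np.random.default_rng(0)
A = rng.normal(size=(6, 2))          # hypothetical anchors
y = np.array([1, -1, 1, 1, -1, -1])  # labels in {-1, 1}
f = rng.normal(size=6)               # hypothetical scores
r = probit_residuals(f, y)
loss = probit_anchor_loss(f, y, A, gamma=2.0)
```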

We boost the anchor loss with LightGBM. Let \(\hat f^j\) be the boosted learner after \(j\) steps of boosting, with \(\hat f^0 = \frac{1}{n} \sum_{i=1}^n y_i\) (regression) or \(\hat f^0 = \Phi^{-1}(\frac{1}{n} \sum_{i=1}^n y_i)\) (binary classification, with \(y_i\) coded as 0 or 1 so that the mean is the prevalence). We fit a decision tree \(\hat t^{j+1} := - \left. \frac{\mathrm{d}}{\mathrm{d} f} \ell(f, y) \right|_{f = \hat f^j(X)} \sim X\) to the anchor loss's negative gradient. Let \(M \in \mathbb{R}^{n \times \mathrm{num. \ leaves}}\) be the one-hot encoding of the leaf node indices \(\hat t^{j+1}(X)\). Then \(M^T \left. \frac{\mathrm{d}}{\mathrm{d} f} \ell(f, y) \right|_{f = \hat f^j(X)}\) and \(M^T \left.\frac{\mathrm{d}^2}{\mathrm{d} f^2}\ell(f, y)\right|_{f = \hat f^j(X)} M\) are the gradient and Hessian of the loss function \(\ell(\hat f^j(X) + \hat t^{j+1}(X), y) = \ell(\hat f^j(X) + M \hat\beta^{j+1}, y)\) with respect to the leaf node values \(\hat\beta^{j+1} \in \mathbb{R}^{\mathrm{num. \ leaves}}\) of \(\hat t^{j+1}\). We set the leaf node values using a second-order optimization step

\[\hat \beta^{j+1} = - \mathrm{lr} \, \cdot \, \left( M^T \left.\frac{\mathrm{d}^2}{\mathrm{d} f^2}\ell(f, y)\right|_{f = \hat f^j(X)} M \right)^{-1} M^T \left.\frac{\mathrm{d}}{\mathrm{d} f}\ell(f, y)\right|_{f = \hat f^j(X)},\]

where \(\mathrm{lr}\) is the learning rate, 0.1 by default. Finally, we set \(\hat f^{j+1} = \hat f^j + \hat t^{j+1}\).
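For the regression loss, the gradient with respect to \(f\) is \(-(y - f) - (\gamma - 1) P_A (y - f)\) and the Hessian is \(I + (\gamma - 1) P_A\), so the leaf value update can be written out with dense matrices. The sketch below is illustrative only: the random leaf assignment stands in for a fitted tree, and all data are simulated assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, gamma, lr = 40, 3.0, 0.1
A = rng.normal(size=(n, 2))          # hypothetical anchors
y = rng.normal(size=n)               # hypothetical outcomes
f = np.full(n, y.mean())             # current fit, here the initial score

leaves = rng.integers(0, 3, size=n)  # stand-in for a fitted tree's leaf indices
leaves[:3] = [0, 1, 2]               # make sure every leaf is non-empty
M = np.eye(3)[leaves]                # one-hot encoding, shape (n, num_leaves)

P_A = A @ np.linalg.pinv(A)          # projection onto the column space of A

# Gradient and Hessian of the regression anchor loss with respect to f.
grad = -(y - f) - (gamma - 1) * P_A @ (y - f)
hess = np.eye(n) + (gamma - 1) * P_A

# Second-order step for the leaf values, scaled by the learning rate.
beta = -lr * np.linalg.solve(M.T @ hess @ M, M.T @ grad)
f_next = f + M @ beta
```

Because the regression loss is quadratic in \(f\), a full step (`lr = 1`) would minimize it exactly over the leaf values; the learning rate shrinks that step.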

For optimal speed, set the environment variable OMP_NUM_THREADS to the number of physical CPU cores (not hardware threads) before training. For better predictive performance, we recommend reducing the trees' variance by restricting their maximum depth or number of leaves, e.g., by setting max_depth=3. Also consider setting min_gain_to_split=0.1 (or some other small, non-zero value) to keep LightGBM from splitting leaves with zero variance.
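These recommendations might be applied as follows. The core count of 4 is a placeholder, and setting the variable before LightGBM is imported is a conservative assumption rather than a documented requirement.

```python
import os

# Placeholder: replace with the number of physical cores on your machine.
physical_cores = 4
os.environ["OMP_NUM_THREADS"] = str(physical_cores)

# Tree-complexity settings suggested above, to be forwarded as LightGBM parameters.
lgbm_kwargs = {"max_depth": 3, "min_gain_to_split": 0.1}
```

The dictionary can then be unpacked into the constructor's `**kwargs`.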

Parameters:

gamma (float) – The \(\gamma\) parameter for the anchor objective function. Must be non-negative. If 1, the objective is equivalent to a standard regression or probit classification objective. Larger values correspond to more causal regularization.
dataset_params (dict or None) – The parameters for the LightGBM dataset. See LightGBM documentation for details. If None, LightGBM defaults are used.
num_boost_round (int) – The number of boosting iterations. Default is 100.
objective (str, optional, default="regression") – The objective function to use. Can be "regression" for regression or "binary" for classification with a probit link function. If "binary", the outcome values must be 0 or 1.
learning_rate (float, optional, default=0.1) – The learning rate \(\mathrm{lr}\) in the second-order optimization step. It scales the step size of each leaf value update.
**kwargs (dict) – Additional parameters for the LightGBM model. See LightGBM documentation for details. We suggest reducing the trees' complexity by reducing max_depth or num_leaves and setting min_gain_to_split to a non-zero value.

booster_

The LightGBM booster containing the trained model.

Type: lightgbm.Booster

init_score_

The initial score used for the boosting. For regression, this is the mean of the outcome values. For binary classification, this is \(\Phi^{-1}\) (the probit link, i.e., the inverse Gaussian CDF) applied to the prevalence.

Type: float
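The two initial scores can be reproduced with the standard library. This is a sketch on made-up outcome values, not a call into the package.

```python
from statistics import NormalDist, fmean

y_regression = [1.5, 0.5, 2.0, 4.0]  # hypothetical continuous outcomes
init_score_regression = fmean(y_regression)  # mean of the outcomes

y_binary = [0, 1, 1, 1]              # hypothetical 0/1 outcomes
prevalence = fmean(y_binary)         # share of ones
init_score_binary = NormalDist().inv_cdf(prevalence)  # Phi^{-1}(prevalence)
```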

References

[KSBuhlmann22]

Lucas Kook, Beate Sick, and Peter Bühlmann. Distributional anchor regression. Statistics and Computing, 2022.

[RMBP21]

Dominik Rothenhäusler, Nicolai Meinshausen, Peter Bühlmann, and Jonas Peters. Anchor regression: heterogeneous data meet causality. Journal of the Royal Statistical Society Series B: Statistical Methodology, 83(2):215–246, 2021.

fit(X, y, Z=None, categorical_feature=None)

Fit the AnchorBooster.

Parameters:

X (pl.DataFrame or np.ndarray or pyarrow.Table or pd.DataFrame) – The input data.
y (np.ndarray) – The outcome.
Z (np.ndarray) – The anchors, i.e., the matrix \(A\) in the anchor loss. Categorical anchors should be one-hot encoded.
categorical_feature (list of str or int or None, optional) – List of categorical feature names or indices. If None, all features are assumed to be numerical.

Returns:

self

Return type:

AnchorBooster
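A categorical anchor can be one-hot encoded with NumPy before it is passed as Z. The environment labels below are hypothetical.

```python
import numpy as np

environment = np.array(["lab_a", "lab_b", "lab_a", "lab_c"])  # hypothetical labels
categories, codes = np.unique(environment, return_inverse=True)
Z = np.eye(len(categories))[codes]  # shape (n, num_categories), a single 1 per row
```

Each column of `Z` then indicates membership in one category, in the sorted order given by `categories`.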

predict(X, raw_score=False, **kwargs)

Predict the outcome.

Parameters:

X (numpy.ndarray, polars.DataFrame, or pyarrow.Table) – The input data.
raw_score (bool) – If True, return the raw scores. If False and objective is "binary", return predicted probabilities.
**kwargs (dict) – Passed to lightgbm.Booster.predict.
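Under a probit link, a raw score \(f\) corresponds to the probability \(\Phi(f)\). Assuming that this is how predict maps scores to probabilities for objective "binary", the relation can be checked on made-up scores:

```python
from statistics import NormalDist

raw_scores = [-1.0, 0.0, 2.0]  # hypothetical raw scores
# Assumption: probability = Phi(raw score) under the probit link.
probabilities = [NormalDist().cdf(score) for score in raw_scores]
```

A raw score of 0 maps to a probability of 0.5, and larger scores map to larger probabilities.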

refit(X, y, decay_rate=0)

Refit the model using new data.

Set \(\hat f^0_\mathrm{refit} =\) self.init_score_. Starting from \(\hat f^j_\mathrm{refit}\), we drop the new data \((X, y)\) down the \((j + 1)\)-th tree \(\hat t^{j+1}\). Let \(\hat \beta_\mathrm{new}^{j+1}\) be the result of the second-order optimization step for the loss \(\ell(\hat f^j_\mathrm{refit} + \hat t^{j+1}(X), y)\) with respect to the leaf node values \(\beta^{j+1}\) of \(\hat t^{j+1}\). We set \(\hat \beta^{j+1}_\mathrm{refit} = \mathrm{decay \ rate} \cdot \hat \beta^{j+1}_\mathrm{old} + (1 - \mathrm{decay \ rate}) \cdot \hat \beta^{j+1}_\mathrm{new}\). Refitting updates the trees' leaf values, but not their structure.

AnchorBooster.refit differs from lightgbm.Booster.refit in three ways: it does not re-estimate \(\hat f^0_\mathrm{refit}\) from the new \(y\), it supports probit regression, and it leaves the values of leaf nodes that receive no samples from the new data unchanged instead of shrinking them towards zero.

Parameters:

X (numpy.ndarray, polars.DataFrame, or pyarrow.Table) – The new data.
y (np.ndarray) – The new outcomes.
decay_rate (float) – The decay rate for the leaf values. Must be in [0, 1]. Default is 0. If 0, the leaf values are set to the new values. If 1, the leaf values are not updated. This weighting convention matches that of decay_rate in LightGBM's refit method.

Returns:

self

Return type:

AnchorBooster
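The blending of old and new leaf values for a single tree can be sketched as below. Marking a leaf without new samples by NaN is a convention of this example only, not necessarily how the package tracks empty leaves.

```python
import numpy as np

decay_rate = 0.25
beta_old = np.array([1.0, -2.0, 0.5])    # leaf values before refitting
beta_new = np.array([3.0, 2.0, np.nan])  # NaN: leaf received no new samples (sketch convention)

blended = decay_rate * beta_old + (1 - decay_rate) * beta_new
# Leaves without new samples keep their old value instead of being shrunk.
beta_refit = np.where(np.isnan(beta_new), beta_old, blended)
```

Here the first two leaves move most of the way towards their new values, while the third leaf stays at 0.5.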

class anchorboosting.models.Proj(Z)

Bases: object

Cache the projection onto the subspace spanned by Z.

Parameters: Z (np.ndarray of dimension (n, d_Z) or (n,), optional, default=None) – The Z matrix or 1d array of integers.

sandwich(leaves, num_leaves, weights)

For M = weights * one_hot(leaves), return proj(Z, M).T @ proj(Z, M).

Parameters:

leaves (np.ndarray of shape (n,)) – The leaf index of each sample. Integers in [0, num_leaves).
num_leaves (int) – The number of leaves in the decision tree.
weights (np.ndarray of shape (n,)) – The per-sample weights that scale the rows of the one-hot leaf encoding before projection.

Returns:

The sandwich product.

Return type:

np.ndarray of shape (num_leaves, num_leaves)
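A dense reference computation of the sandwich product might look like this. It assumes that proj(Z, M) is the least-squares projection of the columns of M onto the column space of a 2d Z; the function name `sandwich_dense` and the simulated inputs are hypothetical.

```python
import numpy as np


def sandwich_dense(Z, leaves, num_leaves, weights):
    """Dense reference for proj(Z, M).T @ proj(Z, M), M = weights * one_hot(leaves)."""
    M = weights[:, None] * np.eye(num_leaves)[leaves]  # shape (n, num_leaves)
    coef, *_ = np.linalg.lstsq(Z, M, rcond=None)
    projected = Z @ coef                               # projection of M onto span(Z)
    return projected.T @ projected                     # shape (num_leaves, num_leaves)


rng = np.random.default_rng(0)
Z = rng.normal(size=(30, 2))                # hypothetical anchors
leaves = rng.integers(0, 4, size=30)        # hypothetical leaf indices
weights = rng.uniform(0.5, 1.5, size=30)    # hypothetical per-sample weights
S = sandwich_dense(Z, leaves, 4, weights)
```

Since the projection is symmetric and idempotent, the result equals `M.T @ P_Z @ M`, which is the kind of matrix that enters the Hessian term of the leaf value update.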

anchorboosting.simulate module

anchorboosting.simulate.f1(x2, x3)

anchorboosting.simulate.f2(x2, x3)

anchorboosting.simulate.simulate(f, n=100, shift=0, seed=0, return_dtype='polars')

Module contents

class anchorboosting.AnchorBooster(gamma, dataset_params=None, num_boost_round=100, objective='regression', learning_rate=0.1, **kwargs)