API Reference

BinaryClassifier

The main entry point. Construct from (y_true, y_score) or from bare confusion matrix counts via from_cm.

`classifier_uncertainty._classifier.BinaryClassifier`

Uncertainty-aware binary classifier evaluator.

Implements Bayesian uncertainty quantification for classifier metrics following Tötsch & Hoffmann (2020). Metrics are derived by sampling the confusion matrix entry probabilities (θ) from three independent Beta posteriors for prevalence (φ), TPR, and TNR. Each draw of (φ, TPR, TNR) determines θ: for example, the TP entry is φ · TPR and the FP entry is (1 − φ) · (1 − TNR).

Parameters:

Name	Type	Description	Default
`y_true`	`ndarray`	Ground-truth binary labels (`bool` or `0`/`1`).	required
`y_score`	`ndarray`	Classifier scores; higher values indicate a more positive prediction.	required
`n_samples`	`int`	Number of posterior CM samples. Default is `20_000`.	`20000`
`prior`	`tuple[float, float]`	Beta(α, β) prior applied uniformly to φ, TPR, and TNR. Default is the Laplace prior `(1.0, 1.0)`. Per-distribution priors are not currently supported; open a GitHub issue if you need them.	`(1.0, 1.0)`
`seed`	`int`	Random seed for reproducibility.	`None`
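
The sampling scheme described for this class can be sketched with NumPy alone. This is a minimal illustration, not the library's implementation: the counts, the variable names, and the exact posterior parameterisation (prior pseudo-counts added to observed counts) are assumptions.

```python
import numpy as np

# Hypothetical observed confusion matrix counts (illustration only).
tp, fn, tn, fp = 40, 10, 930, 20
alpha, beta = 1.0, 1.0           # Beta prior shared by phi, TPR and TNR
n_samples = 20_000

rng = np.random.default_rng(0)

# Three independent Beta posteriors: prior pseudo-counts plus observed counts.
phi = rng.beta(alpha + tp + fn, beta + tn + fp, size=n_samples)  # prevalence
tpr = rng.beta(alpha + tp, beta + fn, size=n_samples)            # sensitivity
tnr = rng.beta(alpha + tn, beta + fp, size=n_samples)            # specificity

# Confusion matrix entry proportions (theta); the four entries sum to 1 per sample.
theta_tp = phi * tpr
theta_fn = phi * (1.0 - tpr)
theta_tn = (1.0 - phi) * tnr
theta_fp = (1.0 - phi) * (1.0 - tnr)

# Any metric computed from the same theta samples inherits their correlations.
precision = theta_tp / (theta_tp + theta_fp)
print(f"posterior mean precision: {precision.mean():.3f}")
```

Because every metric is computed from the same draws, joint statements about two metrics (for example precision and TPR) stay consistent.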

`at_threshold(threshold=0.5)`

Return metric distributions at a fixed score threshold.

Parameters:

Name	Type	Description	Default
`threshold`	`float`	Decision boundary applied to `y_score`. Default is `0.5`.	`0.5`

Returns:

Type	Description
`ThresholdResult`	Posterior metric distributions at this threshold.
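
As a rough sketch of what applying a decision boundary to `y_score` involves, the snippet below turns labels and scores into confusion matrix counts. The data are hypothetical, and treating a score exactly equal to the threshold as positive (`>=`) is an assumption; this reference does not say how the library handles ties.

```python
import numpy as np

# Hypothetical labels and scores (illustration only).
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=bool)
y_score = np.array([0.9, 0.4, 0.6, 0.3, 0.1, 0.7, 0.8, 0.2])
threshold = 0.5

# Assumption: a score at or above the threshold counts as a positive prediction.
y_pred = y_score >= threshold

tp = int(np.sum(y_pred & y_true))     # predicted positive, actually positive
fn = int(np.sum(~y_pred & y_true))    # predicted negative, actually positive
tn = int(np.sum(~y_pred & ~y_true))   # predicted negative, actually negative
fp = int(np.sum(y_pred & ~y_true))    # predicted positive, actually negative
print(tp, fn, tn, fp)
```

Counts obtained this way are in the same order that `from_cm(tp, fn, tn, fp)` takes them.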

`from_cm(tp, fn, tn, fp, n_samples=20000, prior=(1.0, 1.0), seed=None)` `classmethod`

Construct from observed confusion matrix counts.

Parameters:

Name	Type	Description	Default
`tp`	`int`	True positive count.	required
`fn`	`int`	False negative count.	required
`tn`	`int`	True negative count.	required
`fp`	`int`	False positive count.	required
`n_samples`	`int`	Number of posterior CM samples. Default is `20_000`.	`20000`
`prior`	`tuple[float, float]`	Beta(α, β) prior applied uniformly to φ, TPR, and TNR. Default is Laplace `(1.0, 1.0)`. Per-distribution priors are not currently supported; open a GitHub issue if you need them.	`(1.0, 1.0)`
`seed`	`int`	Random seed for reproducibility.	`None`

Returns:

Type	Description
`BinaryClassifier`	Instance with a fixed CM; `roc_curve` and `pr_curve` are not available.

`pr_curve(n_thresholds=50)`

Return an uncertainty-aware PR curve over a quantile-spaced threshold grid.

Parameters:

Name	Type	Description	Default
`n_thresholds`	`int`	Number of thresholds in the grid. Default is `50`.	`50`

Returns:

Type	Description
`PRResult`	PR curve with per-threshold posterior uncertainty ellipses.

`roc_curve(n_thresholds=50)`

Return an uncertainty-aware ROC curve over a quantile-spaced threshold grid.

Parameters:

Name	Type	Description	Default
`n_thresholds`	`int`	Number of thresholds in the grid. Default is `50`.	`50`

Returns:

Type	Description
`ROCResult`	ROC curve with per-threshold posterior uncertainty ellipses.
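
One plausible reading of the "quantile-spaced threshold grid" used by `roc_curve` and `pr_curve` is a set of thresholds placed at evenly spaced quantiles of the scores. The sketch below illustrates that idea only; the scores are synthetic and the treatment of the grid endpoints is an assumption, not the library's documented behaviour.

```python
import numpy as np

rng = np.random.default_rng(0)
y_score = rng.beta(2.0, 5.0, size=1_000)   # hypothetical, skewed scores
n_thresholds = 50

# Evenly spaced quantile levels give thresholds that are dense where scores are dense.
levels = np.linspace(0.0, 1.0, n_thresholds)
thresholds = np.quantile(y_score, levels)

# Roughly the same number of scores falls between neighbouring thresholds.
counts, _ = np.histogram(y_score, bins=thresholds)
print(counts.min(), counts.max())
```

Compared with a uniform grid on the score range, this spacing avoids wasting thresholds on regions that contain few or no scores.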

ThresholdResult

Returned by BinaryClassifier.at_threshold(). All metrics share the same posterior CM samples, preserving correlations.

`classifier_uncertainty._results.ThresholdResult`

Metric distributions at a fixed classification threshold.

All metrics share the same CM samples, preserving their correlations. Custom metrics receive CM entry proportions (θ values) as numpy arrays summing to ~1 per sample, so standard ratio metrics work unchanged.
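
The claim that ratio metrics work unchanged on proportions follows from their scale invariance: dividing all four entries by N cancels out. A small check with hypothetical counts, using hand-written metric functions rather than the library's own:

```python
import numpy as np

def f1(tp, fn, tn, fp):
    """F1 score: 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)

def mcc(tp, fn, tn, fp):
    """Matthews correlation coefficient."""
    num = tp * tn - fp * fn
    den = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den

counts = np.array([40.0, 10.0, 930.0, 20.0])   # hypothetical TP, FN, TN, FP
props = counts / counts.sum()                  # entry proportions summing to 1

# The same function gives the same value on counts and on proportions.
print(f1(*counts), f1(*props))
print(mcc(*counts), mcc(*props))
```

Metrics that are not ratios of entries, such as an expense measured per observation, do depend on whether counts or proportions are passed in.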

`accuracy()`

Return the posterior distribution of accuracy: (TP + TN) / N.

`at_prevalence(phi, seed=None)`

Return a new ThresholdResult with prevalence replaced by phi.

Re-uses the TPR and TNR posterior samples from this result unchanged, replacing only the prevalence (φ). This implements the prevalence-exchange technique from Tötsch & Hoffmann (2020): because TPR and TNR are sampled independently of φ, swapping φ is exact.

Parameters:

Name	Type	Description	Default
`phi`	`float or tuple[float, float]`	New prevalence. A `float` fixes φ exactly (e.g. the known population rate); an `(α, β)` tuple draws φ from `Beta(α, β)` to encode uncertainty over the production prevalence (e.g. `(2, 398)` gives a mean of 0.005 with a standard deviation of about 0.0035).	required
`seed`	`int`	Random seed used when `phi` is a tuple. Ignored for float.	`None`

Returns:

Type	Description
`ThresholdResult`	New result sharing the same TPR/TNR posterior but with the specified φ.

Raises:

Type	Description
`ValueError`	If `phi` is a float outside the open interval `(0, 1)`.
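
The prevalence-exchange idea can be sketched without the library: keep the TPR and TNR samples, substitute a new φ, and rebuild the quantities of interest. The counts, the helper `precision_at`, and the Beta parameterisation below are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 20_000

# Stand-ins for TPR/TNR posterior samples from a test set (hypothetical counts).
tpr = rng.beta(1 + 40, 1 + 10, size=n_samples)
tnr = rng.beta(1 + 930, 1 + 20, size=n_samples)

def precision_at(phi):
    """Precision implied by the TPR/TNR samples at prevalence phi (scalar or array)."""
    tp = phi * tpr
    fp = (1.0 - phi) * (1.0 - tnr)
    return tp / (tp + fp)

# Fixed prevalence: a known population rate.
ppv_fixed = precision_at(0.005)

# Uncertain prevalence: phi ~ Beta(2, 398), which has mean 0.005.
phi_draws = rng.beta(2.0, 398.0, size=n_samples)
ppv_uncertain = precision_at(phi_draws)

print(f"fixed phi:     mean PPV = {ppv_fixed.mean():.3f}, sd = {ppv_fixed.std():.3f}")
print(f"uncertain phi: mean PPV = {ppv_uncertain.mean():.3f}, sd = {ppv_uncertain.std():.3f}")
```

Prevalence-independent metrics such as TPR and TNR are unaffected by the swap, while precision and NPV shift with φ and widen when φ itself is uncertain.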

`balanced_accuracy()`

Return the posterior distribution of balanced accuracy: (TPR + TNR) / 2.

`bookmaker_informedness()`

Return the posterior distribution of bookmaker informedness: TPR + TNR − 1.

`f1()`

Return the posterior distribution of F1: 2TP / (2TP + FP + FN).

`mcc()`

Return the posterior distribution of the Matthews correlation coefficient (MCC): (TP · TN − FP · FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)).

`mean_expense(cost, loss)`

Return the posterior distribution of mean expense per observation.

Protective actions (TP and FP) each incur cost; missed events (FN) incur loss; correct negatives (TN) have no cost.

The formula is (TP + FP) * cost + FN * loss, evaluated on CM entry proportions. In terms of counts this equals ((hits + false_alarms) * cost + misses * loss) / N.

Parameters:

Name	Type	Description	Default
`cost`	`float`	Cost of a protective action (incurred for both hits and false alarms).	required
`loss`	`float`	Loss incurred for a missed event (false negative).	required

Returns:

Type	Description
`MetricResult`	Posterior distribution of mean expense per observation.
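
A short arithmetic check of the mean expense formula, with hypothetical counts and costs, shows that the count-based and proportion-based forms agree:

```python
import numpy as np

# Hypothetical confusion matrix counts and costs (illustration only).
tp, fn, tn, fp = 40, 10, 930, 20
n = tp + fn + tn + fp
cost, loss = 1.0, 10.0

# Total expense on counts, divided by the number of observations.
expense_counts = ((tp + fp) * cost + fn * loss) / n

# The same formula applied directly to entry proportions.
p_tp, p_fn, p_fp = tp / n, fn / n, fp / n
expense_props = (p_tp + p_fp) * cost + p_fn * loss

print(expense_counts, expense_props)
```

Here 60 protective actions at cost 1 plus 10 misses at loss 10 give a total of 160 over 1000 observations, i.e. 0.16 per observation.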

`metric(func)`

Compute a custom metric from CM entry proportions.

Parameters:

Name	Type	Description	Default
`func`	`callable`	A function `f(tp, fn, tn, fp) -> array` where each argument is a numpy array of CM entry proportions (θ values summing to ~1 per sample). Standard ratio metrics require no rescaling.	required

Returns:

Type	Description
`MetricResult`	Posterior distribution of the custom metric.
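
A function with the `f(tp, fn, tn, fp) -> array` signature might look like the sketch below. The threat score is chosen only as an example of a metric not listed on this page, and the Dirichlet draws are a stand-in for real posterior proportions, not the library's sampling scheme.

```python
import numpy as np

def threat_score(tp, fn, tn, fp):
    """Threat score (critical success index): TP / (TP + FN + FP)."""
    return tp / (tp + fn + fp)

# Stand-in entry proportions: one row of (tp, fn, tn, fp) per posterior sample.
rng = np.random.default_rng(0)
theta = rng.dirichlet([41.0, 11.0, 931.0, 21.0], size=5)
tp, fn, tn, fp = theta.T

# Vectorised arithmetic returns one metric value per sample.
values = threat_score(tp, fn, tn, fp)
print(values)
```

The function should rely on elementwise NumPy operations so that it returns one value per posterior sample.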

`npv()`

Return the posterior distribution of NPV: TN / (TN + FN).

`precision()`

Return the posterior distribution of precision (PPV): TP / (TP + FP).

`relative_value(cost_loss_ratio)`

Return the Value Score distribution at a given cost/loss ratio (Wilks 2001).

Parameters:

Name	Type	Description	Default
`cost_loss_ratio`	`float`	C/L in the open interval `(0, 1)`. Cost of protective action divided by loss suffered when the event occurs without protection.	required

Returns:

Type	Description
`MetricResult`	Posterior distribution of the Value Score at the given C/L.

Raises:

Type	Description
`ValueError`	If `cost_loss_ratio` is not in `(0, 1)`.
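
For orientation, the sketch below computes relative economic value in the standard static cost/loss model, with expenses expressed in units of the loss L. It is an assumption that the library follows exactly this parameterisation; the function name `value_score` and the input proportions are hypothetical.

```python
import numpy as np

def value_score(tp, fn, tn, fp, cost_loss_ratio):
    """Relative value (E_clim - E_forecast) / (E_clim - E_perfect) on entry proportions."""
    r = cost_loss_ratio
    if not 0.0 < r < 1.0:
        raise ValueError("cost_loss_ratio must lie in the open interval (0, 1)")
    base_rate = tp + fn                      # how often the event occurs
    e_forecast = (tp + fp) * r + fn          # protect on every alarm, lose on every miss
    e_climate = np.minimum(r, base_rate)     # cheaper of always or never protecting
    e_perfect = base_rate * r                # protect only when the event occurs
    return (e_climate - e_forecast) / (e_climate - e_perfect)

# Hypothetical entry proportions (illustration only).
tp, fn, tn, fp = 0.04, 0.01, 0.93, 0.02
vs = value_score(tp, fn, tn, fp, cost_loss_ratio=0.05)
print(f"{vs:.3f}")
```

In this model the score is 1 for a perfect classifier, 0 for one no better than the best fixed strategy, and it equals TPR − FPR when the cost/loss ratio matches the base rate.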

`tnr()`

Return the posterior distribution of TNR: TN / (TN + FP).

`tpr()`

Return the posterior distribution of TPR: TP / (TP + FN).

`value_score_curve(n_cl=100)`

Return the Value Score curve across all cost/loss ratios (Wilks 2001).

Parameters:

Name	Type	Description	Default
`n_cl`	`int`	Number of C/L grid points in the open interval `(0, 1)`. Default is `100`.	`100`

Returns:

Type	Description
`ValueScoreCurve`	VS posterior distributions over the C/L grid.

MetricResult

Returned by every metric method. Wraps posterior samples and provides credible intervals and plotting.

`classifier_uncertainty._results.MetricResult`

Posterior distribution of a scalar classifier metric.

Attributes:

Name	Type	Description
`samples`	`ndarray`	Raw posterior samples of shape `(n_samples,)`.
`point_estimate`	`float`	Posterior mean.
`metric_uncertainty`	`float`	Length of the 95 % HPDI — the metric uncertainty (MU) of Tötsch & Hoffmann (2020).

`metric_uncertainty` `property`

Length of the 95 % HPDI — metric uncertainty (MU) of Tötsch & Hoffmann.

`point_estimate` `property`

Posterior mean.

`samples` `property`

Raw posterior samples of shape (n_samples,).

`credible_interval(level=0.95)`

Return the highest posterior density interval (HPDI).

Parameters:

Name	Type	Description	Default
`level`	`float`	Probability mass to enclose. Default is `0.95`.	`0.95`

Returns:

Type	Description
`tuple[float, float]`	`(lower, upper)` bounds of the HPDI.
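
An HPDI can be estimated from samples as the shortest interval that contains the requested fraction of them. The helper `hpdi` below is a common sample-based sketch of that idea, not necessarily how `credible_interval` is implemented; the Beta samples are a hypothetical posterior.

```python
import numpy as np

def hpdi(samples, level=0.95):
    """Shortest interval containing a fraction `level` of the samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(level * n))            # number of samples inside the interval
    widths = x[k - 1:] - x[: n - k + 1]    # width of every window of k sorted samples
    i = int(np.argmin(widths))             # start of the narrowest window
    return float(x[i]), float(x[i + k - 1])

rng = np.random.default_rng(0)
samples = rng.beta(41, 11, size=20_000)    # hypothetical TPR posterior
lower, upper = hpdi(samples)
print(f"95% HPDI: ({lower:.3f}, {upper:.3f}), length {upper - lower:.3f}")
```

For a skewed posterior the HPDI is shorter than the equal-tailed interval at the same level, which is why its length is a natural measure of metric uncertainty.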

`plot(ax=None, level=0.95, **kwargs)`

Plot a histogram of posterior samples with HPDI shading.

Parameters:

Name	Type	Description	Default
`ax`	`Axes`	Axes to draw on. Uses `plt.gca()` if `None`.	`None`
`level`	`float`	HPDI level to shade. Default is `0.95`.	`0.95`
`**kwargs`		Forwarded to `ax.hist`.	`{}`

Returns:

Type	Description
`Axes`	The axes with the plot.

ValueScoreCurve

Returned by ThresholdResult.value_score_curve().

`classifier_uncertainty._results.ValueScoreCurve`

Value Score as a function of cost/loss ratio, with posterior uncertainty.

Produced by `ThresholdResult.value_score_curve()`. The VS curve (Wilks 2001) shows the relative economic value of a classifier as a function of the decision-maker's cost/loss ratio.

`plot(ax=None, level=0.95, color='C0', alpha=0.25)`

Plot the VS curve with a posterior credible band.

Parameters:

Name	Type	Description	Default
`ax`	`Axes`	Axes to draw on. Uses `plt.gca()` if `None`.	`None`
`level`	`float`	HPDI level for the shaded band. Default is `0.95`.	`0.95`
`color`	`str`	Line and fill colour. Default is `"C0"`.	`'C0'`
`alpha`	`float`	Fill opacity. Default is `0.25`.	`0.25`

Returns:

Type	Description
`Axes`	The axes with the plot.

ROCResult

Returned by BinaryClassifier.roc_curve().

`classifier_uncertainty._curves.ROCResult`

Uncertainty-aware ROC curve.

Produced by `BinaryClassifier.roc_curve()`. Uncertainty is shown as a 95 % HPDI band computed by interpolating TPR samples onto a fixed FPR grid.

Attributes:

Name	Type	Description
`auc`	`MetricResult`	Posterior distribution of AUC-ROC, computed via per-sample trapezoid integration.

`auc` `property`

Posterior distribution of AUC-ROC via per-sample trapezoid integration.
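
Per-sample trapezoid integration means computing one area for each posterior draw of the whole curve, which yields a distribution of AUC values. The sketch below assumes independent Beta draws per threshold and hypothetical counts, a simplification that ignores correlation between neighbouring thresholds; `auc_per_sample` is an illustrative helper, not a library function.

```python
import numpy as np

def auc_per_sample(fpr, tpr):
    """Trapezoid-rule area under each sampled curve (one curve per row)."""
    n = fpr.shape[0]
    # Anchor every curve at (0, 0) and (1, 1).
    x = np.hstack([np.zeros((n, 1)), fpr, np.ones((n, 1))])
    y = np.hstack([np.zeros((n, 1)), tpr, np.ones((n, 1))])
    # Sort each curve by FPR so the integration runs left to right.
    order = np.argsort(x, axis=1)
    x = np.take_along_axis(x, order, axis=1)
    y = np.take_along_axis(y, order, axis=1)
    # Trapezoid rule written out: interval width times average height.
    return np.sum(np.diff(x, axis=1) * (y[:, 1:] + y[:, :-1]) / 2.0, axis=1)

rng = np.random.default_rng(0)
n_samples = 2_000
tp = np.array([10, 25, 40, 46])     # hypothetical true positives per threshold
fp = np.array([1, 5, 20, 60])       # hypothetical false positives per threshold
n_pos, n_neg = 50, 150

# Posterior draws of TPR and FPR, shape (n_samples, n_thresholds).
tpr = rng.beta(1 + tp, 1 + n_pos - tp, size=(n_samples, tp.size))
fpr = rng.beta(1 + fp, 1 + n_neg - fp, size=(n_samples, fp.size))

auc = auc_per_sample(fpr, tpr)
print(f"AUC posterior mean: {auc.mean():.3f}")
```

The resulting array plays the role of the `samples` behind a `MetricResult`, so an interval for AUC can be read off in the same way as for any other metric.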

`plot(ax=None, level=0.95, color='C0', alpha=0.3)`

Plot the ROC curve with a posterior HPDI band.

Parameters:

Name	Type	Description	Default
`ax`	`Axes`	Axes to draw on. Uses `plt.gca()` if `None`.	`None`
`level`	`float`	HPDI level for the shaded band. Default is `0.95`.	`0.95`
`color`	`str`	Curve and band colour. Default is `"C0"`.	`'C0'`
`alpha`	`float`	Band opacity. Default is `0.3`.	`0.3`

Returns:

Type	Description
`Axes`	The axes with the plot.

PRResult

Returned by BinaryClassifier.pr_curve().

`classifier_uncertainty._curves.PRResult`

Uncertainty-aware Precision-Recall curve.

Produced by `BinaryClassifier.pr_curve()`. Uncertainty is shown as a 95 % HPDI band computed by interpolating Precision samples onto a fixed Recall grid.

Attributes:

Name	Type	Description
`auc`	`MetricResult`	Posterior distribution of AUC-PR, computed via per-sample trapezoid integration. The trapezoid area approximates, but is not identical to, step-wise average precision.

`auc` `property`

Posterior distribution of AUC-PR via per-sample trapezoid integration.

`plot(ax=None, level=0.95, color='C0', alpha=0.3)`

Plot the PR curve with a posterior HPDI band.

Parameters:

Name	Type	Description	Default
`ax`	`Axes`	Axes to draw on. Uses `plt.gca()` if `None`.	`None`
`level`	`float`	HPDI level for the shaded band. Default is `0.95`.	`0.95`
`color`	`str`	Curve and band colour. Default is `"C0"`.	`'C0'`
`alpha`	`float`	Band opacity. Default is `0.3`.	`0.3`

Returns:

Type	Description
`Axes`	The axes with the plot.
