From d5e576592f686394ca32858e24e907cfac6eed11 Mon Sep 17 00:00:00 2001
From: Christian Lorentzen
Date: Sun, 7 Dec 2025 11:56:21 +0100
Subject: [PATCH 1/2] SLEP 25 Killing Accuracy

---
 index.rst            |   1 +
 slep025/proposal.rst | 105 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 106 insertions(+)
 create mode 100644 slep025/proposal.rst

diff --git a/index.rst b/index.rst
index 9848922..75481f7 100644
--- a/index.rst
+++ b/index.rst
@@ -24,6 +24,7 @@
    slep017/proposal
    slep019/proposal
+   slep025/proposal
 
 .. toctree::
    :maxdepth: 1

diff --git a/slep025/proposal.rst b/slep025/proposal.rst
new file mode 100644
index 0000000..668384e
--- /dev/null
+++ b/slep025/proposal.rst
@@ -0,0 +1,105 @@
+.. _slep_025:
+
+=========================================
+SLEP025: Killing Accuracy in Scikit-Learn
+=========================================
+
+:Author: Christian Lorentzen
+:Status: Draft
+:Type: Standards Track
+:Created: 2025-12-07
+:Resolution: TODO (required for Accepted | Rejected | Withdrawn)
+
+Abstract
+--------
+
+This SLEP proposes to rectify the default ``score`` method. Currently, the ease
+of ``classifier.score(X, y)`` favors the use of *accuracy*, which has many
+well-known deficiencies. This SLEP changes the default scoring method.
+
+Motivation
+----------
+
+As it stands, *accuracy* is the most used metric for classifiers in
+scikit-learn. This is manifest in ``classifier.score(...)``, which applies
+accuracy. While the original goal may have been to provide a score method that
+works for all classifiers, the actual consequence has been blind use of the
+accuracy score, without critical reflection. This has misled many researchers
+and users, because accuracy is well known for its severe deficiencies: in
+particular, it is not a *strictly proper scoring rule*, and scikit-learn's
+implementation hard-codes a probability threshold of 50% into it.
+
+This situation calls for a correction.
+Ideally, scikit-learn provides good defaults or fosters a conscious decision
+by users, e.g. by forcing engagement with the subject; see [2]_, subsection
+"Which scoring function should I use?".
+
+Solution
+--------
+
+The solution is a multi-step approach:
+
+1. Introduce the new keyword ``scoring`` to the ``score`` method. The default
+   for classifiers is ``scoring="accuracy"``, for regressors ``scoring="r2"``.
+2. Deprecate the default ``"accuracy"``.
+3. Set a new default.
+
+There are two open questions with this approach:
+
+a. The time frame of the deprecation period. Should it be longer than the
+   usual 2 minor releases? Should steps 1 and 2 happen in the same minor
+   release?
+b. What is the new default scoring parameter in step 3? Possibilities are
+
+   - the D2 Brier score, which plays the same role for classifiers as R2 does
+     for regressors;
+   - the objective function of the estimator, i.e. the penalized log loss for
+     ``LogisticRegression``.
+
+   The fact that different scoring metrics focus on different things, e.g.
+   ``predict`` vs. ``predict_proba``, and that not all classifiers provide
+   ``predict_proba``, complicates a unified choice.
+
+Backward compatibility
+----------------------
+
+The outlined solution is feasible within the usual deprecation strategy of
+scikit-learn releases.
+
+Alternatives
+------------
+
+An alternative is to remove the ``score`` method altogether. Scoring metrics
+are readily available in scikit-learn; see the ``sklearn.metrics`` module and
+[2]_. The advantages of removing ``score`` are:
+
+- An active choice by the user is triggered, as there is no longer a default.
+- Defaults for ``score`` are tricky anyway. Different estimators estimate
+  different things, and the outputs of their ``score`` methods are most likely
+  not comparable, e.g. a hinge-loss-based SVM vs. a log-loss-based logistic
+  regression.
+
+Disadvantages:
+
+- Disruption of the API.
+- More imports required and slightly longer code compared to just
+  ``my_estimator.score(X, y)``.
+
+Discussion
+----------
+
+The following issues contain discussions on this subject:
+
+- https://github.com/scikit-learn/scikit-learn/issues/28995
+
+
+References and Footnotes
+------------------------
+
+.. [1] Each SLEP must either be explicitly labeled as placed in the public
+   domain (see this SLEP as an example) or licensed under the `Open
+   Publication License`_.
+
+.. _Open Publication License: https://www.opencontent.org/openpub/
+
+.. [2] Scikit-Learn User Guide on "Metrics and Scoring",
+   https://scikit-learn.org/stable/modules/model_evaluation.html
+
+Copyright
+---------
+
+This document has been placed in the public domain. [1]_

From 30e5d52df998cd69823a5ade7682211328e720da Mon Sep 17 00:00:00 2001
From: Christian Lorentzen
Date: Sun, 7 Dec 2025 18:26:04 +0100
Subject: [PATCH 2/2] Change title

---
 slep025/proposal.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/slep025/proposal.rst b/slep025/proposal.rst
index 668384e..5b84179 100644
--- a/slep025/proposal.rst
+++ b/slep025/proposal.rst
@@ -1,7 +1,7 @@
 .. _slep_025:
 
 =========================================
-SLEP025: Killing Accuracy in Scikit-Learn
+SLEP025: Losing Accuracy in Scikit-Learn
 =========================================
 
 :Author: Christian Lorentzen
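The multi-step solution in the proposal (a ``scoring`` keyword plus a deprecated
default) can be sketched in plain Python. This is an illustrative stand-in, not
scikit-learn API: the class, the metric helpers, and the sentinel pattern are all
hypothetical, and only the deprecation mechanics mirror the proposal.

```python
import warnings

def _accuracy(y_true, y_pred):
    # Fraction of exact label matches (the current hard-coded default).
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def _brier(y_true, y_proba):
    # Brier score for binary labels and predicted probabilities of class 1;
    # a strictly proper scoring rule, unlike accuracy.
    return sum((p - t) ** 2 for t, p in zip(y_true, y_proba)) / len(y_true)

_DEFAULT = object()  # sentinel to detect that ``scoring`` was not passed

class ToyClassifier:
    """Minimal stand-in for a fitted classifier with fixed predictions."""

    def predict(self, X):
        return [1 for _ in X]

    def predict_proba(self, X):
        return [0.8 for _ in X]

    def score(self, X, y, scoring=_DEFAULT):
        # Step 2 of the proposal: warn while the old default is still active.
        if scoring is _DEFAULT:
            warnings.warn(
                "The default scoring='accuracy' is deprecated; "
                "pass scoring explicitly.",
                FutureWarning,
            )
            scoring = "accuracy"
        if scoring == "accuracy":
            return _accuracy(y, self.predict(X))
        if scoring == "brier":
            # Only available because this toy estimator has predict_proba.
            return _brier(y, self.predict_proba(X))
        raise ValueError(f"unknown scoring: {scoring!r}")

clf = ToyClassifier()
X, y = [0, 1, 2, 3], [1, 1, 0, 1]
print(clf.score(X, y, scoring="accuracy"))       # 0.75
print(round(clf.score(X, y, scoring="brier"), 4))  # 0.19
```

The sentinel (rather than a plain ``scoring="accuracy"`` default) is what lets
the deprecation warning distinguish an explicit choice from silent reliance on
the default, which is exactly the conscious decision the proposal aims for.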