Commit 19f8bfe

Copilot, humitos, and ericholscher authored
Build: disable builds after 25 consecutive failed builds on default version (#12624)
- [x] Explore and understand the codebase
- [x] Add a new notification message for consecutive failed builds (`MESSAGE_PROJECT_BUILDS_DISABLED_DUE_TO_CONSECUTIVE_FAILURES`) in `readthedocs/projects/notifications.py`
- [x] Create a signal receiver to check for consecutive failed builds after build completion in `readthedocs/builds/signals_receivers.py`
- [x] Add a helper method to count consecutive failed builds for the default version
- [x] When 50+ consecutive failures occur: attach notification to Project and disable builds (`skip=True`)
- [x] Add documentation explaining this behavior in `docs/user/builds.rst`
- [x] Add tests for the new functionality in `readthedocs/builds/tests/test_signals_receivers.py`
- [x] Run code review and fix any issues
- [x] Run CodeQL security check - no vulnerabilities found
- [x] Address review feedback:
  - [x] Move "Automatic disabling of builds" section before "Build resources"
  - [x] Simplify documentation text
  - [x] Move RTD_BUILDS_MAX_CONSECUTIVE_FAILURES setting to base.py
  - [x] Update docstring to reference the setting name
  - [x] Keep only the main test for disabling at threshold
  - [x] Convert signal handler to trigger a Celery task (builders don't have DB access)
  - [x] Add comment explaining why project_slug/version_slug check is needed
  - [x] Revert task logic to use groupby solution (the simplified version didn't work correctly)

<!-- START COPILOT CODING AGENT SUFFIX -->

<details>

<summary>Original prompt</summary>

----

*This section details on the original issue you should resolve*

<issue_title>Build: disable webhooks after N failed builds in a row</issue_title>
<issue_description>Analyzing the data we have, I've found that we have a bunch of projects that have been triggering builds where _all of them_ have failed. During months. This consumes resources in our side, but also degrade the UX for valid projects since they have to wait for a build that _we already know_ it's gonna fail, before their project's build is taken.

So, to reduce projects in this scenario, we are thinking about "auto-disable webhooks on projects that have N failed builds in a row" and communicate their maintainers. To re-enable the webhook, they would have to go to the project's admin page and re-activate/re-configure/re-add the webhook.

This is the metabase question for future reference: https://ethicalads.metabaseapp.com/question/309-project-with-all-failed-builds-in-the-last-3-months. It shows the amount of builds and minutes consumed per project.</issue_description>

<agent_instructions>We want to attach a notification to the `Project` when the active default version fails 50+ times to build. Besides, we want to disable the builds for this project and tell the user to contact support if it was a mistake. Also, adds documentation about this behavior.</agent_instructions>

## Comments on the Issue (you are @copilot in this section)

<comments>
<comment_new><author>@agjohnson</author><body>
Also, it would be great to have a maintainers site notification specific to this that goes out, so we can re-use this pattern when soft-disabling a project. A hard disable (setting `Project.skip = True`) can be reserved for abusive/spam projects, but that "notification" pattern is quite broken for normal users.

Or perhaps I'm describing two potential notifications -- a specific notification and a generic notification?

- "Your project hasn't built successfully for 50 consecutive builds, we've disabled your project from automatically building"
- "Your project has a problem, we've disabled your project from automatically building"</body></comment_new>
<comment_new><author>@humitos</author><body>
Yeah, I'm fine with better notifications. Maybe we want to make that part of the work on #9279 and #3399

We need to probably set two different thresholds here:

1. N consecutive builds on branch/tags (lower, maybe 25) and disable the webhook completely
2. N consecutive builds on PR builds (upper, maybe 50) and only disable PR building

I'm trying to say that "builds failing on PRs are _more acceptable_ than failing on branch/tags and should be handled differently"</body></comment_new>
<comment_new><author>@agjohnson</author><body>
Yeah, great point on PR builds. Maybe we are strictly only concerned with failed active versions?

I agree we can rethink notifications in a larger PR. If we add something here it will be using the new notification pattern anyways. Some of the old patterns can eventually be culled, and just adding a new mechanism for disabling projects here would be enough to start that process.</body></comment_new>
<comment_new><author>@humitos</author><body>
@agjohnson
> Maybe we are strictly only concerned with failed active versions?

It makes sense to me, yeah.</body></comment_new>
<comment_new><author>@humitos</author><body>
I'd like to think about prioritizing this feature if possible because this will help us a lot to know "how many active/valid non-spam projects do we have in our platform" in an easier way. This data will be useful when migrating/deprecating things as we are doing with "building without a config file" since we will know "how many of the active/valid projects have already migrated".

Many non-spam projects will not migrate to the new configuration file just because they don't need it or have to. Mainly projects that are archived or where created just for testing/development/educational purposes, for example. I assume we have _a lot of them_.

The work required that I see here is:

- [ ] check for N failed builds on the default version and delete/disable the webhook integrated with that project
- [ ] send an onsite and email notification to all the maintainers explaining the situation
- [ ] write some documentation explaining this behavior (probably in https://docs.readthedocs.io/en/stable/guides/connecting-git-account.html)</body></comment_new>
<comment_new><author>@agjohnson</author><body>
Because taking automated actions on projects has been hard to get right -- spam banning, build retry -- maybe we should start fairly conservatively here. I think I'm comfortable saying that any project who has a *default* version that has 25 consecutive failed builds is probably not monitoring their project.

We also don't have to take any automated action yet, we could start with just a notification or drip campaign. At least disabling the webho...

</details>

- Fixes #9690

<!-- START COPILOT CODING AGENT TIPS -->
---

💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more [Copilot coding agent tips](https://gh.io/copilot-coding-agent-tips) in the docs.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: humitos <244656+humitos@users.noreply.github.com>
Co-authored-by: Manuel Kaufmann <humitos@gmail.com>
Co-authored-by: ericholscher <25510+ericholscher@users.noreply.github.com>
1 parent 82b22b0 commit 19f8bfe
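
The checklist above mentions a groupby-based way of counting consecutive failures. That version is not shown on this page, so the following is only a minimal sketch of how such a count could work on a plain list. It assumes the results are ordered newest first, with `True` meaning a successful build. The name `leading_failure_streak` is hypothetical and is not part of Read the Docs.

```python
from itertools import groupby


def leading_failure_streak(results):
    """Count the failed builds that come before the first success.

    `results` is assumed to be ordered newest first (True = success).
    """
    # groupby yields runs of equal adjacent values; only the first run matters.
    for success, run in groupby(results):
        if success:
            return 0
        return sum(1 for _ in run)
    # No builds at all.
    return 0
```

With `[False, False, True, False]` the function returns 2, because the older failure behind the success is not part of the current streak.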

12 files changed, with 249 additions and 0 deletions

docs/user/builds.rst

Lines changed: 19 additions & 0 deletions
@@ -112,6 +112,25 @@ Read the Docs supports three different mechanisms to cancel a running build:
 
 Take a look at :ref:`build-customization:cancel build based on a condition` section for some examples.
 
+Automatic disabling of builds
+-----------------------------
+
+To reduce resource consumption and improve build queue times for all users,
+Read the Docs will automatically disable builds for projects that have too many consecutive failed builds on their default version.
+
+When a project has **25 consecutive failed builds** on its default version,
+we will disable builds for the project.
+
+This helps ensure that projects with persistent build issues don't consume resources that could be used by active projects.
+
+.. note::
+
+   This only applies to the default version of a project.
+   Builds on other versions (branches, tags, pull requests) are not counted towards this limit.
+
+If your project has been disabled due to consecutive build failures, you'll need to re-enable builds from your project settings.
+Make sure to fix the underlying issue to avoid being disabled again.
+
 Build resources
 ---------------
 

readthedocs/builds/signals_receivers.py

Lines changed: 4 additions & 0 deletions
@@ -4,13 +4,17 @@
 NOTE: Done in a separate file to avoid circular imports.
 """
 
+import structlog
 from django.db.models.signals import post_save
 from django.dispatch import receiver
 
 from readthedocs.builds.models import Build
 from readthedocs.projects.models import Project
 
 
+log = structlog.get_logger(__name__)
+
+
 @receiver(post_save, sender=Build)
 def update_latest_build_for_project(sender, instance, created, **kwargs):
     """When a build is created, update the latest build for the project."""

readthedocs/builds/tasks.py

Lines changed: 61 additions & 0 deletions
@@ -667,3 +667,64 @@ def send_webhook(self, webhook):
                 webhook_id=webhook.id,
                 webhook_url=webhook.url,
             )
+
+
+@app.task(queue="web")
+def check_and_disable_project_for_consecutive_failed_builds(project_slug, version_slug):
+    """
+    Check if a project has too many consecutive failed builds and disable it.
+
+    When a project has RTD_BUILDS_MAX_CONSECUTIVE_FAILURES or more consecutive failed builds
+    on the default version, we attach a notification to the project and disable builds (n_consecutive_failed_builds=True).
+    This helps reduce resource consumption from projects that are not being monitored.
+    """
+    from readthedocs.builds.constants import BUILD_STATE_FINISHED
+    from readthedocs.projects.notifications import (
+        MESSAGE_PROJECT_BUILDS_DISABLED_DUE_TO_CONSECUTIVE_FAILURES,
+    )
+
+    try:
+        project = Project.objects.get(slug=project_slug)
+    except Project.DoesNotExist:
+        return
+
+    # Only check for the default version
+    if version_slug != project.get_default_version():
+        return
+
+    # Skip if the project is already disabled
+    if project.skip or project.n_consecutive_failed_builds:
+        return
+
+    # Count consecutive failed builds on the default version
+    builds = list(
+        Build.objects.filter(
+            project=project,
+            version_slug=version_slug,
+            state=BUILD_STATE_FINISHED,
+        )
+        .order_by("-date")
+        .values_list("success", flat=True)[: settings.RTD_BUILDS_MAX_CONSECUTIVE_FAILURES]
+    )
+    if not any(builds) and len(builds) >= settings.RTD_BUILDS_MAX_CONSECUTIVE_FAILURES:
+        consecutive_failed_builds = builds.count(False)
+        log.info(
+            "Disabling project due to consecutive failed builds.",
+            project_slug=project.slug,
+            version_slug=version_slug,
+            consecutive_failed_builds=consecutive_failed_builds,
+        )
+
+        # Disable the project
+        project.n_consecutive_failed_builds = True
+        project.save()
+
+        # Attach notification to the project
+        Notification.objects.add(
+            message_id=MESSAGE_PROJECT_BUILDS_DISABLED_DUE_TO_CONSECUTIVE_FAILURES,
+            attached_to=project,
+            dismissable=False,
+            format_values={
+                "consecutive_failed_builds": consecutive_failed_builds,
+            },
+        )
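
The check in the task above reduces to a small rule over the most recent build results: look at the newest N finished builds and disable only if there are at least N of them and none succeeded. A minimal, Django-free sketch of that rule follows. It assumes `recent_results` is a newest-first list of booleans; `should_disable_builds` is a hypothetical name and is not part of Read the Docs.

```python
def should_disable_builds(recent_results, max_failures):
    """Return True when the newest `max_failures` builds all failed.

    `recent_results` is assumed to be ordered newest first (True = success).
    """
    # Mirror the queryset slice: only the newest `max_failures` results count.
    window = recent_results[:max_failures]
    # A project with fewer finished builds than the threshold is never disabled.
    return len(window) >= max_failures and not any(window)


# Newest first: one failure, one success, then 49 older failures.
print(should_disable_builds([False, True] + [False] * 49, 50))
```

The printed value is `False`, which matches the second test in this commit: a single success inside the window resets the streak, however many failures came before it.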

readthedocs/builds/tests/test_tasks.py

Lines changed: 76 additions & 0 deletions
@@ -2,6 +2,7 @@
 from textwrap import dedent
 from unittest import mock
 
+from django.conf import settings
 from django.contrib.auth.models import User
 from django.test import TestCase, override_settings
 from django.utils import timezone
@@ -19,10 +20,12 @@
 from readthedocs.builds.models import Build, BuildCommandResult, Version
 from readthedocs.builds.tasks import (
     archive_builds_task,
+    check_and_disable_project_for_consecutive_failed_builds,
     delete_closed_external_versions,
     post_build_overview,
 )
 from readthedocs.filetreediff.dataclasses import FileTreeDiff, FileTreeDiffFileStatus
+from readthedocs.notifications.models import Notification
 from readthedocs.oauth.constants import GITHUB_APP
 from readthedocs.oauth.models import (
     GitHubAccountType,
@@ -31,6 +34,9 @@
 )
 from readthedocs.oauth.services import GitHubAppService
 from readthedocs.projects.models import Project
+from readthedocs.projects.notifications import (
+    MESSAGE_PROJECT_BUILDS_DISABLED_DUE_TO_CONSECUTIVE_FAILURES,
+)
 
 
 class TestTasks(TestCase):
@@ -124,6 +130,76 @@ def test_archive_builds(self, build_commands_storage):
         self.assertEqual(Build.objects.filter(cold_storage=True).count(), 5)
         self.assertEqual(BuildCommandResult.objects.count(), 50)
 
+    def _create_builds(self, project, version, count, success=False):
+        """Helper to create a series of builds."""
+        builds = []
+        for _ in range(count):
+            build = get(
+                Build,
+                project=project,
+                version=version,
+                success=success,
+                state=BUILD_STATE_FINISHED,
+            )
+            builds.append(build)
+        return builds
+
+    @override_settings(RTD_BUILDS_MAX_CONSECUTIVE_FAILURES=50)
+    def test_task_disables_project_at_max_consecutive_failed_builds(self):
+        """Test that the project is disabled at the failure threshold."""
+        project = get(Project, slug="test-project", n_consecutive_failed_builds=False)
+        version = project.versions.get(slug=LATEST)
+        version.active = True
+        version.save()
+
+        # Create one more failure than the threshold
+        self._create_builds(project, version, settings.RTD_BUILDS_MAX_CONSECUTIVE_FAILURES + 1, success=False)
+
+        # Call the Celery task directly
+        check_and_disable_project_for_consecutive_failed_builds(
+            project_slug=project.slug,
+            version_slug=version.slug,
+        )
+
+        project.refresh_from_db()
+        self.assertTrue(project.n_consecutive_failed_builds)
+
+        # Verify notification was added
+        notification = Notification.objects.filter(
+            message_id=MESSAGE_PROJECT_BUILDS_DISABLED_DUE_TO_CONSECUTIVE_FAILURES
+        ).first()
+        self.assertIsNotNone(notification)
+        self.assertEqual(notification.attached_to, project)
+
+    @override_settings(RTD_BUILDS_MAX_CONSECUTIVE_FAILURES=50)
+    def test_task_does_not_disable_project_with_successful_build(self):
+        """Test that the project is NOT disabled when there's at least one successful build."""
+        project = get(Project, slug="test-project-success", n_consecutive_failed_builds=False)
+        version = project.versions.get(slug=LATEST)
+        version.active = True
+        version.save()
+
+        # Create failures below the threshold with one successful build
+        self._create_builds(project, version, settings.RTD_BUILDS_MAX_CONSECUTIVE_FAILURES - 1, success=False)
+        self._create_builds(project, version, 1, success=True)  # One successful build
+        self._create_builds(project, version, 1, success=False)  # One more failure
+
+        # Call the Celery task directly
+        check_and_disable_project_for_consecutive_failed_builds(
+            project_slug=project.slug,
+            version_slug=version.slug,
+        )
+
+        project.refresh_from_db()
+        self.assertFalse(project.n_consecutive_failed_builds)
+
+        # Verify notification was NOT added
+        self.assertFalse(
+            Notification.objects.filter(
+                message_id=MESSAGE_PROJECT_BUILDS_DISABLED_DUE_TO_CONSECUTIVE_FAILURES,
+            ).exists()
+        )
+
 
 @override_settings(
     PRODUCTION_DOMAIN="readthedocs.org",

readthedocs/notifications/signals.py

Lines changed: 13 additions & 0 deletions
@@ -9,6 +9,9 @@
 from readthedocs.notifications.models import Notification
 from readthedocs.organizations.models import Organization
 from readthedocs.projects.models import Project
+from readthedocs.projects.notifications import (
+    MESSAGE_PROJECT_BUILDS_DISABLED_DUE_TO_CONSECUTIVE_FAILURES,
+)
 from readthedocs.projects.notifications import MESSAGE_PROJECT_SKIP_BUILDS
 from readthedocs.subscriptions.notifications import MESSAGE_ORGANIZATION_DISABLED
 
@@ -32,6 +35,16 @@ def project_skip_builds(instance, *args, **kwargs):
         )
 
 
+@receiver(post_save, sender=Project)
+def project_n_consecutive_failed_builds(instance, *args, **kwargs):
+    """Cancel the notification once the project no longer has N+ consecutive failed builds."""
+    if not instance.n_consecutive_failed_builds:
+        Notification.objects.cancel(
+            message_id=MESSAGE_PROJECT_BUILDS_DISABLED_DUE_TO_CONSECUTIVE_FAILURES,
+            attached_to=instance,
+        )
+
+
 @receiver(post_save, sender=Organization)
 def organization_disabled(instance, *args, **kwargs):
     """Check if the organization is ``disabled`` and add/cancel the notification."""

readthedocs/projects/forms.py

Lines changed: 6 additions & 0 deletions
@@ -435,6 +435,7 @@ class Meta:
             "default_branch",
             "readthedocs_yaml_path",
             "search_indexing_enabled",
+            "n_consecutive_failed_builds",
             # Meta data
             "programming_language",
             "project_url",
@@ -478,6 +479,11 @@ def __init__(self, *args, **kwargs):
         if self.instance.search_indexing_enabled:
             self.fields.pop("search_indexing_enabled")
 
+        # Only show this field if building for this project is disabled due to N+ consecutive builds failing
+        # We allow disabling it from the form, but not enabling it.
+        if not self.instance.n_consecutive_failed_builds:
+            self.fields.pop("n_consecutive_failed_builds")
+
         # NOTE: we are deprecating this feature.
         # However, we will keep it available for projects that already using it.
         # Old projects not using it already or new projects won't be able to enable.
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+# Generated by Django 5.2.7 on 2025-12-02 09:19
+
+from django.db import migrations
+from django.db import models
+from django_safemigrate import Safe
+
+
+class Migration(migrations.Migration):
+    safe = Safe.before_deploy()
+
+    dependencies = [
+        ("projects", "0156_project_search_indexing_enabled"),
+    ]
+
+    operations = [
+        migrations.AddField(
+            model_name="historicalproject",
+            name="n_consecutive_failed_builds",
+            field=models.BooleanField(
+                db_default=False,
+                default=False,
+                help_text="Builds on this project were automatically disabled due to many consecutive failures. Uncheck this field to re-enable building.",
+                verbose_name="Disable builds for this project",
+            ),
+        ),
+        migrations.AddField(
+            model_name="project",
+            name="n_consecutive_failed_builds",
+            field=models.BooleanField(
+                db_default=False,
+                default=False,
+                help_text="Builds on this project were automatically disabled due to many consecutive failures. Uncheck this field to re-enable building.",
+                verbose_name="Disable builds for this project",
+            ),
+        ),
+    ]

readthedocs/projects/models.py

Lines changed: 8 additions & 0 deletions
@@ -524,6 +524,14 @@ class Project(models.Model):
     featured = models.BooleanField(_("Featured"), default=False)
 
     skip = models.BooleanField(_("Skip (disable) building this project"), default=False)
+    n_consecutive_failed_builds = models.BooleanField(
+        _("Disable builds for this project"),
+        default=False,
+        db_default=False,
+        help_text=_(
+            "Builds on this project were automatically disabled due to many consecutive failures. Uncheck this field to re-enable building."
+        ),
+    )
 
     # null=True can be removed in a later migration
     # be careful if adding new queries on this, .filter(delisted=False) does not work

readthedocs/projects/notifications.py

Lines changed: 16 additions & 0 deletions
@@ -20,6 +20,9 @@
 MESSAGE_PROJECT_SSH_KEY_WITH_WRITE_ACCESS = "project:ssh-key-with-write-access"
 MESSAGE_PROJECT_DEPRECATED_WEBHOOK = "project:webhooks:deprecated"
 MESSAGE_PROJECT_SEARCH_INDEXING_DISABLED = "project:search:indexing-disabled"
+MESSAGE_PROJECT_BUILDS_DISABLED_DUE_TO_CONSECUTIVE_FAILURES = (
+    "project:builds:disabled-due-to-consecutive-failures"
+)
 
 messages = [
     Message(
@@ -223,5 +226,18 @@
         ),
         type=INFO,
     ),
+    Message(
+        id=MESSAGE_PROJECT_BUILDS_DISABLED_DUE_TO_CONSECUTIVE_FAILURES,
+        header=_("Builds disabled due to consecutive failures"),
+        body=_(
+            textwrap.dedent(
+                """
+            Your project has been automatically disabled because the default version has failed to build {{consecutive_failed_builds}} times in a row.
+            Please fix the build issues and re-enable builds by unchecking "Disable builds for this project" option from <a href="{% url 'projects_edit' instance.slug %}">the project settings</a>.
+            """
+            ).strip(),
+        ),
+        type=WARNING,
+    ),
 ]
 registry.add(messages)

readthedocs/projects/querysets.py

Lines changed: 1 addition & 0 deletions
@@ -95,6 +95,7 @@ def is_active(self, project):
 
         if (
             project.skip
+            or project.n_consecutive_failed_builds
             or any_owner_banned
             or (organization and organization.disabled)
             or spam_project
