SOLR-18025 Attempt to fix graceful shutdown of LeaderTragicEventTest #3965

janhoy · 2025-12-18T08:56:52Z

Analysis:

Root Cause

LeaderTragicEventTest fails during class-level shutdown when Jetty's Server.doStop() exceeds its internal timeout and throws ExecutionException(TimeoutException). After tragic events corrupt cores, shutdown naturally takes longer and can time out. This is expected behavior, not a test failure. See develocity logs here.

Fix

Added shutdownTimeoutIsError configuration to MiniSolrCloudCluster:

Default: true - normal tests fail on unexpected timeouts
LeaderTragicEventTest: false - accepts timeouts as expected outcome

Implementation:

Added a 60s timeout to the shutdown process (2x Jetty's internal timeout)
checkForExceptions() treats ExecutionException(TimeoutException) as a warning when shutdownTimeoutIsError=false
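
The classification rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class name ShutdownCheckSketch, the method signatures, the error message, and the returned count are all assumptions made for the example; only the ExecutionException(TimeoutException) shape and the shutdownTimeoutIsError flag come from the description.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;

/** Illustrative sketch only; names and signatures are hypothetical, not MiniSolrCloudCluster's. */
public class ShutdownCheckSketch {

  /** True when the failure is an ExecutionException whose direct cause is a TimeoutException. */
  public static boolean isShutdownTimeout(Throwable t) {
    return t instanceof ExecutionException && t.getCause() instanceof TimeoutException;
  }

  /**
   * Hypothetical stand-in for checkForExceptions(). Returns the number of tolerated
   * shutdown timeouts, or throws if any failure must be treated as an error.
   */
  public static int checkForExceptions(List<Throwable> failures, boolean shutdownTimeoutIsError)
      throws Exception {
    int tolerated = 0;
    Exception first = null;
    for (Throwable t : failures) {
      if (!shutdownTimeoutIsError && isShutdownTimeout(t)) {
        // Expected after a tragic event: log a warning and keep going.
        System.err.println("WARN: shutdown timed out (tolerated): " + t.getCause());
        tolerated++;
      } else if (first == null) {
        first = new Exception("Error shutting down cluster", t);
      } else {
        // Keep every additional failure visible on the first one.
        first.addSuppressed(t);
      }
    }
    if (first != null) {
      throw first;
    }
    return tolerated;
  }
}
```

With shutdownTimeoutIsError=true (the default described above), the same timeout falls through to the error branch, so ordinary tests still fail on an unexpected shutdown timeout.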

https://issues.apache.org/jira/browse/SOLR-18025

dsmiley

Was an analysis done as to the state of the various threads when the test timed out (I'm assuming test timeout was the ultimate symptom)? Hopefully it would show a clue as to a thread busy or waiting that is preventing the node it lives on from shutting down.

A few weeks ago, I noticed another test (ugh, I forget which) reliably taking a long time to shut down (I forget if it led to a failure or not) and partially root caused it in this way. I have a shelved change to ZkContainer.close() to call shutdownNowAndAwaitTermination (with the "Now" in there, which wasn't there before). I noticed a test trying to shut down had cores that were stuck registering in ZK for some reason. I suppose that's unrelated to the failure here but without seeing the threads -- who knows.
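
For readers unfamiliar with the distinction, the "Now" variant can be sketched with plain java.util.concurrent as below. This is not Solr's shutdownNowAndAwaitTermination and not the shelved ZkContainer.close() change; the class ShutdownNowSketch and its methods are hypothetical and only illustrate why interrupting in-flight tasks can unblock a shutdown that would otherwise wait on a stuck task.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Illustrative sketch only; not Solr's executor utility. */
public class ShutdownNowSketch {

  /**
   * Interrupts running tasks and drops queued ones, then waits up to the timeout.
   * Returns true if the pool terminated within the timeout.
   */
  public static boolean shutdownNowAndAwait(ExecutorService pool, long timeout, TimeUnit unit) {
    pool.shutdownNow(); // unlike shutdown(), this interrupts tasks that are already running
    try {
      return pool.awaitTermination(timeout, unit);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the caller's interrupt status
      return false;
    }
  }

  /** Submits a task that would block for a minute, then shuts the pool down with interruption. */
  public static boolean demo() {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    pool.submit(() -> {
      try {
        Thread.sleep(60_000); // stands in for a task stuck waiting on something
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // exit promptly once interrupted
      }
    });
    return shutdownNowAndAwait(pool, 5, TimeUnit.SECONDS);
  }
}
```

A task that swallows interrupts or blocks on non-interruptible I/O would still hold the pool open, which is why a thread dump is needed to tell the cases apart.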

janhoy · 2025-12-18T21:39:50Z

> Was an analysis done as to the state of the various threads when the test timed out

No, I have not dived into the cause of hung nodes. I appreciate that all these failures may be a symptom of a real bug that prevents Solr from gracefully shutting down and giving up control / releasing zk.

I'll mark this as draft, and give some more time to fix the root instead of the symptom then...

Attempt to fix shutdown of LeaderTragicEventTest

87b238a

github-actions bot added test-framework cat:cloud labels Dec 18, 2025

janhoy requested review from Copilot and risdenk December 18, 2025 08:57

Copilot started reviewing on behalf of janhoy December 18, 2025 09:02

This comment was marked as outdated.

Review feedback

51c8849

github-actions bot added dependencies Dependency upgrades tool:build labels Dec 18, 2025

Make MiniSolrCloudCluster shutdown behavior configurable

0b179a1

github-actions bot added the tests label Dec 18, 2025

janhoy added 2 commits December 18, 2025 12:53

Cleanup non-stopped jettys

5f2906c

Do not touch libs.versions.toml in this PR

7c22b04

github-actions bot removed dependencies Dependency upgrades tool:build labels Dec 18, 2025

janhoy requested a review from dsmiley December 18, 2025 12:08

janhoy added 3 commits December 18, 2025 13:20

Use same error msg

c6b27df

Add check for timeoutException

b7d6f13

Remove configurability of timeout as it was not needed

6248c7a

dsmiley reviewed Dec 18, 2025

janhoy marked this pull request as draft December 18, 2025 21:39
