fix: remove unnecessary select! macro #6374

hanabi1224 · 2025-12-19T13:56:19Z

Summary of changes

Changes introduced in this pull request:

Reference issue to close (if applicable)

Closes

Change checklist

I have performed a self-review of my own code,
I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
I have added tests that prove my fix is effective or that my feature works (if possible),
I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Summary by CodeRabbit

Bug Fixes
- Standardized file download retry timeout to 30 seconds and added a 1s delay between retries.
- Increased retry allowance for state migration tests to reduce flaky failures.
- Improved consistency of error handling for network-dependent downloads.
Refactor
- Simplified asynchronous retry and timeout control flow for clearer, more robust behavior.
Chores
- Updated test runner profile to apply consistent slow-timeout and retry settings for networked tests.


coderabbitai · 2025-12-19T13:56:26Z

Walkthrough

Refactors async control flow and standardizes retry logic across multiple modules: removes use of the select! macro in daemon code, converts select-based timeout racing to tokio::time::timeout, and wraps downloads with a consistent retry configuration (30s timeout, retries, 1s delay), with tests adjusted.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Retry mechanism standardization: `src/state_manager/utils.rs`, `src/tool/subcommands/api_cmd/test_snapshot.rs`, `src/state_migration/tests/mod.rs` | Adds retry wrapper for downloading state/snapshot files using `RetryArgs` (timeout 30s, delay 1s, max_retries 5 in code; test increases max_retries to 15). Changes retry closures to return the download future directly and await the retry result. Imports `Duration` where needed. |
| Async control flow refactor (daemon): `src/daemon/mod.rs` | Removes `select!` import and replaces a select-based `propagate_error` loop with an explicit `join_next().await` loop that awaits service results and returns the first observed `Err`. |
| Timeout / futures simplification: `src/utils/mod.rs` | Removes heavy futures utilities (`FutureExt`, `FusedFuture`, `select`, etc.), replaces fused select-based timeout racing with conditional use of `tokio::time::timeout` around a single retry task (or direct await if no timeout). Updates tests/imports impacted by the change. |
| Config for test timeouts: `.config/nextest.toml` | Adds a `profile.default.overrides` entry to extend slow-timeout/retries to tests matching `state_compute_` (mirrors existing rpc_snapshot_ override). |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Review src/utils/mod.rs carefully to ensure the new tokio::time::timeout usage preserves prior behavior and edge-case semantics.
Verify retry parameter propagation and that tests intentionally increase retries to 15 in src/state_migration/tests/mod.rs.
Confirm src/daemon/mod.rs error-propagation behavior remains equivalent after replacing select! with join_next().await.

Possibly related PRs

test: unit test for state compute #6355 — Modifies the same get_state_compute_snapshot logic in src/state_manager/utils.rs (retry-related changes).
test: refactor rpc_regression_tests into individual tests #5930 — Touches src/tool/subcommands/api_cmd/test_snapshot.rs and related snapshot download/retry logic.

Suggested reviewers

LesnyRumcajs
sudo-shashank
akaladarshi

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'fix: remove unnecessary select! macro' directly aligns with the core changes across multiple files (daemon/mod.rs, utils/mod.rs) where the select! macro is removed and replaced with explicit async loops and simpler control flow. |

📜 Recent review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9e1d214 and bf4e33e.

📒 Files selected for processing (6)

.config/nextest.toml (1 hunks)
src/daemon/mod.rs (2 hunks)
src/state_manager/utils.rs (2 hunks)
src/state_migration/tests/mod.rs (1 hunks)
src/tool/subcommands/api_cmd/test_snapshot.rs (1 hunks)
src/utils/mod.rs (3 hunks)

🚧 Files skipped from review as they are similar to previous changes (3)

src/state_migration/tests/mod.rs
src/tool/subcommands/api_cmd/test_snapshot.rs
src/state_manager/utils.rs

🧰 Additional context used

🧠 Learnings (4)

📓 Common learnings

Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.

Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 6057
File: src/cli/subcommands/f3_cmd.rs:0-0
Timestamp: 2025-09-09T10:37:17.947Z
Learning: hanabi1224 prefers having default timeouts (like 10m for --no-progress-timeout) to prevent commands from hanging indefinitely, even when the timeout flag isn't explicitly provided by users. This fail-fast approach is preferred over requiring explicit flag usage.

📚 Learning: 2025-08-08T12:10:45.218Z

Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5867
File: src/ipld/util.rs:553-558
Timestamp: 2025-08-08T12:10:45.218Z
Learning: Forest project targets Rust stable >=1.89; features stabilized in 1.88 like let-chains are acceptable in this codebase.

Applied to files:

src/daemon/mod.rs

📚 Learning: 2025-10-17T09:36:15.757Z

Learnt from: elmattic
Repo: ChainSafe/forest PR: 6128
File: src/ipld/util.rs:23-30
Timestamp: 2025-10-17T09:36:15.757Z
Learning: Always run `cargo check` or `cargo build` to verify actual compilation errors in the Forest codebase before flagging them as issues. Do not rely solely on documentation or assumptions about trait implementations.

Applied to files:

src/daemon/mod.rs

📚 Learning: 2025-08-28T12:52:46.927Z

Learnt from: LesnyRumcajs
Repo: ChainSafe/forest PR: 6011
File: src/cli/main.rs:18-25
Timestamp: 2025-08-28T12:52:46.927Z
Learning: In Forest CLI (src/cli/main.rs), the early RPC network check before Cli::parse_from() does not block help/version commands because clap processes these internally before reaching the RPC call. LesnyRumcajs confirmed this implementation works correctly and that RPC call failures are acceptable in this context.

Applied to files:

src/daemon/mod.rs

🧬 Code graph analysis (1)

src/daemon/mod.rs (2)

src/state_manager/cache.rs (4)

futures (373-376)

futures (417-420)

futures (462-465)

futures (510-513)

src/rpc/error.rs (1)

error_message (20-22)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)

GitHub Check: Build MacOS
GitHub Check: Build Ubuntu
GitHub Check: cargo-publish-dry-run
GitHub Check: Coverage
GitHub Check: Build forest binaries on Linux AMD64
GitHub Check: All lint checks
GitHub Check: tests-release

🔇 Additional comments (6)

src/utils/mod.rs (3)

23-23: LGTM - Cleaner imports.

The import simplification reflects the removal of manual future manipulation with select!.

125-145: LGTM - Excellent refactoring.

Replacing the select!-based timeout racing with tokio::time::timeout significantly simplifies the code while preserving the exact same behavior. The conditional timeout application when args.timeout is Some is clean and idiomatic.

The cancel-safety requirement documented on line 114 remains valid with tokio::time::timeout.
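To make the "conditional timeout" shape concrete, here is a minimal synchronous sketch using only the standard library. It is not the code from `src/utils/mod.rs`: the `RetryArgs` fields and `RetryError` variants are hypothetical stand-ins modelled on the names in this review, `max_retries` is assumed to mean the total number of attempts, and the deadline is only checked between attempts, whereas `tokio::time::timeout` can also interrupt an attempt that is still in flight.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the PR's `RetryArgs` (field names are assumptions).
#[derive(Debug, Clone, Copy)]
pub struct RetryArgs {
    /// Overall budget covering all attempts and delays; `None` means no limit.
    pub timeout: Option<Duration>,
    /// Total number of attempts allowed; `None` means unlimited.
    pub max_retries: Option<usize>,
    /// Pause between a failed attempt and the next one.
    pub delay: Option<Duration>,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum RetryError {
    TimeoutExceeded,
    RetriesExceeded,
}

/// Blocking analogue of an async retry helper: calls `attempt` until it
/// succeeds, the attempt budget is used up, or the overall deadline passes.
pub fn retry_blocking<T, E>(
    args: RetryArgs,
    mut attempt: impl FnMut() -> Result<T, E>,
) -> Result<T, RetryError> {
    // The deadline exists only when a timeout was configured.
    let deadline = args.timeout.map(|t| Instant::now() + t);
    let mut attempts = 0usize;
    loop {
        if let Ok(value) = attempt() {
            return Ok(value);
        }
        attempts += 1;
        if args.max_retries.is_some_and(|max| attempts >= max) {
            return Err(RetryError::RetriesExceeded);
        }
        if let Some(delay) = args.delay {
            sleep(delay);
        }
        // Checked between attempts only; an async timeout wrapper would
        // also cancel an attempt that is still running.
        if deadline.is_some_and(|d| Instant::now() >= d) {
            return Err(RetryError::TimeoutExceeded);
        }
    }
}
```

With `timeout: None` the loop is bounded only by `max_retries`, which mirrors the "direct await if no timeout" branch described in the walkthrough.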

183-183: LGTM - Test import adjustment.

The explicit import of futures::future::pending is necessary since the broader futures imports were removed.

.config/nextest.toml (1)

44-51: LGTM - Consistent test retry configuration.

The new override for state_compute_ tests mirrors the rpc_snapshot_test_ configuration, which is appropriate since both download snapshots from the network and can experience transient failures. The exponential backoff with jitter aligns with best practices to avoid thundering herd issues.

src/daemon/mod.rs (2)

39-39: LGTM - Updated imports for refactored control flow.

The FutureExt import is retained for the .then() call in the asyncify function (line 757), while select is no longer needed after the refactoring.

741-747: LGTM - Idiomatic refactoring.

Replacing select! with JoinSet::join_next() is the standard, idiomatic way to wait for any task in a set to complete. The logic correctly returns the first error encountered and falls back to pending() if all services complete without errors (which should never happen in practice since services are expected to run indefinitely).
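The "return the first error from whichever task finishes" pattern can be illustrated without an async runtime. The sketch below is a thread-based analogy under stated assumptions, not the daemon's `propagate_error`: the `Service` alias and `first_error` function are hypothetical names, a channel stands in for `JoinSet`, and the function returns `None` when every service succeeds, where the real code is described as awaiting `pending()` instead.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical service type for this sketch: a job that may fail with a message.
pub type Service = Box<dyn FnOnce() -> Result<(), String> + Send + 'static>;

/// Runs each service on its own thread and returns the first error that is
/// reported, or `None` if every service finishes without one.
pub fn first_error(services: Vec<Service>) -> Option<String> {
    let (tx, rx) = mpsc::channel();
    for service in services {
        let tx = tx.clone();
        thread::spawn(move || {
            // A failed send only means the receiver has already returned.
            let _ = tx.send(service());
        });
    }
    // Drop the original sender so the channel closes once all workers are done.
    drop(tx);
    // Roughly analogous to `while let Some(result) = set.join_next().await`.
    while let Ok(result) = rx.recv() {
        if let Err(message) = result {
            return Some(message);
        }
    }
    None
}
```

A worker that panics drops its sender without reporting anything, so in this sketch a panic is skipped silently rather than surfaced as an error.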


coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/state_migration/tests/mod.rs (1)

84-89: Potential timeout/retry mismatch.

The timeout of 5 seconds is the overall timeout for the entire retry operation (as per the refactored retry function), not a per-attempt timeout. With 15 retries, most won't execute before the 5-second overall timeout expires. The per-request timeout on line 94 (global_http_client().get(...).timeout(timeout)) also uses the same 5-second value.

Consider whether the intent is:

Overall 5s timeout with up to 15 quick retries (current behavior), or

A longer overall timeout (e.g., 30s or more) to allow multiple retries to complete

Based on learnings, hanabi1224 prefers fail-fast with reasonable timeouts. If long downloads are expected, consider increasing the overall timeout proportionally to the retry count.
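The arithmetic behind this concern can be sketched with a small model. The function below is hypothetical (`attempts_that_fit` is not part of the codebase) and assumes the worst case in which every attempt runs for its full per-request timeout before failing, with a fixed delay between attempts.

```rust
use std::time::Duration;

/// Counts how many attempts can run to completion inside `overall`, assuming
/// each attempt takes the full `per_attempt` time and `delay` separates
/// consecutive attempts. Purely illustrative worst-case model.
pub fn attempts_that_fit(
    overall: Duration,
    per_attempt: Duration,
    delay: Duration,
    max_retries: u32,
) -> u32 {
    let mut elapsed = Duration::ZERO;
    let mut completed = 0;
    while completed < max_retries {
        let finish = elapsed + per_attempt;
        if finish > overall {
            // The overall timeout fires before this attempt could finish.
            break;
        }
        completed += 1;
        elapsed = finish + delay;
    }
    completed
}
```

Under this model, a 5s overall timeout with a 5s per-request timeout leaves room for a single attempt no matter how many retries are allowed, while a 30s overall timeout with 5s attempts and a 1s delay leaves room for five. If attempts fail quickly (say 100ms each), all 15 retries fit comfortably inside 5s.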

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4ea21a4 and 9e1d214.

📒 Files selected for processing (5)

src/daemon/mod.rs (2 hunks)
src/state_manager/utils.rs (2 hunks)
src/state_migration/tests/mod.rs (1 hunks)
src/tool/subcommands/api_cmd/test_snapshot.rs (1 hunks)
src/utils/mod.rs (3 hunks)

🧰 Additional context used

🧠 Learnings (7)

📓 Common learnings

Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.

Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 6057
File: src/cli/subcommands/f3_cmd.rs:0-0
Timestamp: 2025-09-09T10:37:17.947Z
Learning: hanabi1224 prefers having default timeouts (like 10m for --no-progress-timeout) to prevent commands from hanging indefinitely, even when the timeout flag isn't explicitly provided by users. This fail-fast approach is preferred over requiring explicit flag usage.

📚 Learning: 2025-08-08T12:10:45.218Z

Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5867
File: src/ipld/util.rs:553-558
Timestamp: 2025-08-08T12:10:45.218Z
Learning: Forest project targets Rust stable >=1.89; features stabilized in 1.88 like let-chains are acceptable in this codebase.

Applied to files:

src/daemon/mod.rs

📚 Learning: 2025-10-17T09:36:15.757Z

Learnt from: elmattic
Repo: ChainSafe/forest PR: 6128
File: src/ipld/util.rs:23-30
Timestamp: 2025-10-17T09:36:15.757Z
Learning: Always run `cargo check` or `cargo build` to verify actual compilation errors in the Forest codebase before flagging them as issues. Do not rely solely on documentation or assumptions about trait implementations.

Applied to files:

src/daemon/mod.rs

📚 Learning: 2025-08-28T12:52:46.927Z

Learnt from: LesnyRumcajs
Repo: ChainSafe/forest PR: 6011
File: src/cli/main.rs:18-25
Timestamp: 2025-08-28T12:52:46.927Z
Learning: In Forest CLI (src/cli/main.rs), the early RPC network check before Cli::parse_from() does not block help/version commands because clap processes these internally before reaching the RPC call. LesnyRumcajs confirmed this implementation works correctly and that RPC call failures are acceptable in this context.

Applied to files:

src/daemon/mod.rs

📚 Learning: 2025-08-25T13:35:24.230Z

Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5969
File: src/tool/subcommands/snapshot_cmd.rs:412-412
Timestamp: 2025-08-25T13:35:24.230Z
Learning: In src/tool/subcommands/snapshot_cmd.rs, the +1 in `last_epoch = ts.epoch() - epochs as i64 + 1` fixes an off-by-1 bug where specifying --check-stateroots=N would validate N+1 epochs instead of N epochs, causing out-of-bounds errors when the snapshot contains only N recent state roots.

Applied to files:

src/tool/subcommands/api_cmd/test_snapshot.rs

📚 Learning: 2025-08-25T14:17:09.129Z

Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5978
File: .github/workflows/unit-tests.yml:0-0
Timestamp: 2025-08-25T14:17:09.129Z
Learning: hanabi1224's download_file_with_cache function in src/utils/net/download_file.rs preserves URL path structure in local cache directories by using cache_dir.join(url.path().strip_prefix('/').unwrap_or_else(|| url.path())), so snapshots from https://forest-snapshots.fra1.cdn.digitaloceanspaces.com/rpc_test/ are cached locally at ~/.cache/forest/test/rpc-snapshots/rpc_test/ (including the rpc_test subdirectory from the URL path).

Applied to files:

src/tool/subcommands/api_cmd/test_snapshot.rs
src/state_manager/utils.rs

📚 Learning: 2025-08-25T14:17:09.129Z

Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5978
File: .github/workflows/unit-tests.yml:0-0
Timestamp: 2025-08-25T14:17:09.129Z
Learning: hanabi1224's download_file_with_cache function preserves URL path structure in local cache directories, so snapshots from https://forest-snapshots.fra1.cdn.digitaloceanspaces.com/rpc_test/ are cached locally at ~/.cache/forest/test/rpc-snapshots/rpc_test (including the rpc_test subdirectory from the URL path).

Applied to files:

src/tool/subcommands/api_cmd/test_snapshot.rs
src/state_manager/utils.rs

🧬 Code graph analysis (2)

src/tool/subcommands/api_cmd/test_snapshot.rs (1)

src/utils/net/download_file.rs (1)

download_file_with_cache (30-89)

src/state_manager/utils.rs (2)

src/utils/mod.rs (1)

retry (116-145)

src/utils/net/download_file.rs (1)

download_file_with_cache (30-89)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)

GitHub Check: Coverage
GitHub Check: tests-release
GitHub Check: Build forest binaries on Linux AMD64
GitHub Check: cargo-publish-dry-run
GitHub Check: All lint checks
GitHub Check: Build MacOS
GitHub Check: Build Ubuntu

🔇 Additional comments (4)

src/utils/mod.rs (1)

125-144: Clean refactoring of the retry logic.

The new implementation is simpler and more readable. The timeout now correctly wraps the entire retry operation (including all retries and delays), which is the expected behavior. The removal of select! macro and associated futures machinery is a good simplification.

src/state_manager/utils.rs (1)

219-234: Good addition of retry logic with consistent parameters.

The retry configuration (30s timeout, 5 retries, 1s delay) aligns well with the pattern used elsewhere in the codebase. The closure correctly returns the future directly without awaiting, which is the expected pattern for the refactored retry function.

src/daemon/mod.rs (1)

741-747: Clean replacement of select! with explicit join_next loop.

The refactored propagate_error correctly iterates through completed services and returns the first error. The pending().await ensures the function never returns if all services complete successfully, matching the documented behavior.

Note: Task panics (Err from join_next) are silently ignored. This appears to be pre-existing behavior, but you may want to consider logging or handling panics in a follow-up.

src/tool/subcommands/api_cmd/test_snapshot.rs (1)

218-228: Consistent retry configuration with other download operations.

The updated parameters (30s timeout, 5 retries, 1s delay) align with the pattern established in src/state_manager/utils.rs. The closure correctly returns the future directly, matching the expected signature for the refactored retry function.

codecov · 2025-12-19T14:28:24Z

Codecov Report

❌ Patch coverage is 75.86207% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 51.16%. Comparing base (4ea21a4) to head (bf4e33e).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/daemon/mod.rs | 0.00% | 3 Missing ⚠️ |
| src/utils/mod.rs | 70.00% | 1 Missing and 2 partials ⚠️ |
| src/state_manager/utils.rs | 92.30% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files

| Files with missing lines | Coverage Δ | |
| --- | --- | --- |
| src/tool/subcommands/api_cmd/test_snapshot.rs | `85.53% <100.00%> (+0.44%)` | ⬆️ |
| src/state_manager/utils.rs | `77.03% <92.30%> (+0.75%)` | ⬆️ |
| src/daemon/mod.rs | `28.46% <0.00%> (+0.10%)` | ⬆️ |
| src/utils/mod.rs | `80.53% <70.00%> (-2.66%)` | ⬇️ |

... and 9 files with indirect coverage changes

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data


hanabi1224 marked this pull request as ready for review December 19, 2025 13:56

hanabi1224 requested a review from a team as a code owner December 19, 2025 13:56

hanabi1224 requested review from LesnyRumcajs and sudo-shashank and removed request for a team December 19, 2025 13:56

coderabbitai bot reviewed Dec 19, 2025


fix: remove unnecessary select! macro

bf4e33e

hanabi1224 force-pushed the hm/remove-unnecessary-select branch from 9e1d214 to bf4e33e Compare December 19, 2025 14:00
