@hanabi1224 hanabi1224 commented Dec 19, 2025

Summary of changes

Changes introduced in this pull request:

Reference issue to close (if applicable)

Closes

Other information and links

Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Summary by CodeRabbit

  • Bug Fixes

    • Standardized file download retry timeout to 30 seconds and added a 1s delay between retries.
    • Increased retry allowance for state migration tests to reduce flaky failures.
    • Improved consistency of error handling for network-dependent downloads.
  • Refactor

    • Simplified asynchronous retry and timeout control flow for clearer, more robust behavior.
  • Chores

    • Updated test runner profile to apply consistent slow-timeout and retry settings for networked tests.

coderabbitai bot commented Dec 19, 2025

Walkthrough

Refactors async control flow and standardizes retry logic across multiple modules: removes the select! macro from daemon code, replaces select-based timeout racing with tokio::time::timeout, and wraps downloads in a consistent retry configuration (30s timeout, up to 5 retries, 1s delay). The state migration tests raise max_retries to 15.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Retry mechanism standardization: src/state_manager/utils.rs, src/tool/subcommands/api_cmd/test_snapshot.rs, src/state_migration/tests/mod.rs | Adds retry wrapper for downloading state/snapshot files using RetryArgs (timeout 30s, delay 1s, max_retries 5 in code; test increases max_retries to 15). Changes retry closures to return the download future directly and await the retry result. Imports Duration where needed. |
| Async control flow refactor (daemon): src/daemon/mod.rs | Removes select! import and replaces a select-based propagate_error loop with an explicit join_next().await loop that awaits service results and returns the first observed Err. |
| Timeout / futures simplification: src/utils/mod.rs | Removes heavy futures utilities (FutureExt, FusedFuture, select, etc.), replaces fused select-based timeout racing with conditional use of tokio::time::timeout around a single retry task (or direct await if no timeout). Updates tests/imports impacted by the change. |
| Config for test timeouts: .config/nextest.toml | Adds a profile.default.overrides entry to extend slow-timeout/retries to tests matching state_compute_ (mirrors existing rpc_snapshot_ override). |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Review src/utils/mod.rs carefully to ensure the new tokio::time::timeout usage preserves prior behavior and edge-case semantics.
  • Verify retry parameter propagation and that tests intentionally increase retries to 15 in src/state_migration/tests/mod.rs.
  • Confirm src/daemon/mod.rs error-propagation behavior remains equivalent after replacing select! with join_next().await.

Possibly related PRs

Suggested reviewers

  • LesnyRumcajs
  • sudo-shashank
  • akaladarshi

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'fix: remove unnecessary select! macro' directly aligns with the core changes across multiple files (daemon/mod.rs, utils/mod.rs) where the select! macro is removed and replaced with explicit async loops and simpler control flow. |


📜 Recent review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9e1d214 and bf4e33e.

📒 Files selected for processing (6)
  • .config/nextest.toml (1 hunks)
  • src/daemon/mod.rs (2 hunks)
  • src/state_manager/utils.rs (2 hunks)
  • src/state_migration/tests/mod.rs (1 hunks)
  • src/tool/subcommands/api_cmd/test_snapshot.rs (1 hunks)
  • src/utils/mod.rs (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/state_migration/tests/mod.rs
  • src/tool/subcommands/api_cmd/test_snapshot.rs
  • src/state_manager/utils.rs
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 6057
File: src/cli/subcommands/f3_cmd.rs:0-0
Timestamp: 2025-09-09T10:37:17.947Z
Learning: hanabi1224 prefers having default timeouts (like 10m for --no-progress-timeout) to prevent commands from hanging indefinitely, even when the timeout flag isn't explicitly provided by users. This fail-fast approach is preferred over requiring explicit flag usage.
📚 Learning: 2025-08-08T12:10:45.218Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5867
File: src/ipld/util.rs:553-558
Timestamp: 2025-08-08T12:10:45.218Z
Learning: Forest project targets Rust stable >=1.89; features stabilized in 1.88 like let-chains are acceptable in this codebase.

Applied to files:

  • src/daemon/mod.rs
📚 Learning: 2025-10-17T09:36:15.757Z
Learnt from: elmattic
Repo: ChainSafe/forest PR: 6128
File: src/ipld/util.rs:23-30
Timestamp: 2025-10-17T09:36:15.757Z
Learning: Always run `cargo check` or `cargo build` to verify actual compilation errors in the Forest codebase before flagging them as issues. Do not rely solely on documentation or assumptions about trait implementations.

Applied to files:

  • src/daemon/mod.rs
📚 Learning: 2025-08-28T12:52:46.927Z
Learnt from: LesnyRumcajs
Repo: ChainSafe/forest PR: 6011
File: src/cli/main.rs:18-25
Timestamp: 2025-08-28T12:52:46.927Z
Learning: In Forest CLI (src/cli/main.rs), the early RPC network check before Cli::parse_from() does not block help/version commands because clap processes these internally before reaching the RPC call. LesnyRumcajs confirmed this implementation works correctly and that RPC call failures are acceptable in this context.

Applied to files:

  • src/daemon/mod.rs
🧬 Code graph analysis (1)
src/daemon/mod.rs (2)
src/state_manager/cache.rs (4)
  • futures (373-376)
  • futures (417-420)
  • futures (462-465)
  • futures (510-513)
src/rpc/error.rs (1)
  • error_message (20-22)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Build MacOS
  • GitHub Check: Build Ubuntu
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: Coverage
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: All lint checks
  • GitHub Check: tests-release
🔇 Additional comments (6)
src/utils/mod.rs (3)

23-23: LGTM - Cleaner imports.

The import simplification reflects the removal of manual future manipulation with select!.


125-145: LGTM - Excellent refactoring.

Replacing the select!-based timeout racing with tokio::time::timeout significantly simplifies the code while preserving the exact same behavior. The conditional timeout application when args.timeout is Some is clean and idiomatic.

The cancel-safety requirement documented on line 114 remains valid with tokio::time::timeout.


183-183: LGTM - Test import adjustment.

The explicit import of futures::future::pending is necessary since the broader futures imports were removed.

.config/nextest.toml (1)

44-51: LGTM - Consistent test retry configuration.

The new override for state_compute_ tests mirrors the rpc_snapshot_test_ configuration, which is appropriate since both download snapshots from the network and can experience transient failures. The exponential backoff with jitter aligns with best practices to avoid thundering herd issues.

src/daemon/mod.rs (2)

39-39: LGTM - Updated imports for refactored control flow.

The FutureExt import is retained for the .then() call in the asyncify function (line 757), while select is no longer needed after the refactoring.


741-747: LGTM - Idiomatic refactoring.

Replacing select! with JoinSet::join_next() is the standard, idiomatic way to wait for any task in a set to complete. The logic correctly returns the first error encountered and falls back to pending() if all services complete without errors (which should never happen in practice since services are expected to run indefinitely).


@hanabi1224 hanabi1224 marked this pull request as ready for review December 19, 2025 13:56
@hanabi1224 hanabi1224 requested a review from a team as a code owner December 19, 2025 13:56
@hanabi1224 hanabi1224 requested review from LesnyRumcajs and sudo-shashank and removed request for a team December 19, 2025 13:56

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/state_migration/tests/mod.rs (1)

84-89: Potential timeout/retry mismatch.

The timeout of 5 seconds is the overall timeout for the entire retry operation (as per the refactored retry function), not a per-attempt timeout. With 15 retries, most won't execute before the 5-second overall timeout expires. The per-request timeout on line 94 (global_http_client().get(...).timeout(timeout)) also uses the same 5-second value.

Consider whether the intent is:

  1. Overall 5s timeout with up to 15 quick retries (current behavior), or
  2. A longer overall timeout (e.g., 30s or more) to allow multiple retries to complete

Based on learnings, hanabi1224 prefers fail-fast with reasonable timeouts. If long downloads are expected, consider increasing the overall timeout proportionally to the retry count.
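
To make the mismatch concrete, here is a rough model of how many attempts can even start before an overall deadline. It assumes every attempt takes a fixed time and is followed by a fixed delay; the function name, the 1s delay used in the note below, and treating max_retries as a cap on attempts are illustrative assumptions, not values read from the test.

```rust
use std::time::Duration;

/// Counts the attempts that can start before `overall` elapses, assuming each
/// attempt takes `per_attempt` and is followed by `delay` (a simplification).
fn attempts_started(
    overall: Duration,
    per_attempt: Duration,
    delay: Duration,
    max_attempts: u32,
) -> u32 {
    let step = per_attempt + delay;
    let mut started = 0;
    let mut elapsed = Duration::ZERO;
    // An attempt starts only while the overall deadline has not passed.
    while started < max_attempts && elapsed < overall {
        started += 1;
        elapsed += step;
    }
    started
}
```

Under this model, `attempts_started(Duration::from_secs(5), Duration::from_secs(5), Duration::from_secs(1), 15)` is 1: if each request runs into its own 5s timeout, the 5s overall deadline leaves no room for a second attempt. Even with attempts that fail instantly and a 1s delay, only 5 of the 15 allowed attempts start.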

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4ea21a4 and 9e1d214.

📒 Files selected for processing (5)
  • src/daemon/mod.rs (2 hunks)
  • src/state_manager/utils.rs (2 hunks)
  • src/state_migration/tests/mod.rs (1 hunks)
  • src/tool/subcommands/api_cmd/test_snapshot.rs (1 hunks)
  • src/utils/mod.rs (3 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 6057
File: src/cli/subcommands/f3_cmd.rs:0-0
Timestamp: 2025-09-09T10:37:17.947Z
Learning: hanabi1224 prefers having default timeouts (like 10m for --no-progress-timeout) to prevent commands from hanging indefinitely, even when the timeout flag isn't explicitly provided by users. This fail-fast approach is preferred over requiring explicit flag usage.
📚 Learning: 2025-08-08T12:10:45.218Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5867
File: src/ipld/util.rs:553-558
Timestamp: 2025-08-08T12:10:45.218Z
Learning: Forest project targets Rust stable >=1.89; features stabilized in 1.88 like let-chains are acceptable in this codebase.

Applied to files:

  • src/daemon/mod.rs
📚 Learning: 2025-10-17T09:36:15.757Z
Learnt from: elmattic
Repo: ChainSafe/forest PR: 6128
File: src/ipld/util.rs:23-30
Timestamp: 2025-10-17T09:36:15.757Z
Learning: Always run `cargo check` or `cargo build` to verify actual compilation errors in the Forest codebase before flagging them as issues. Do not rely solely on documentation or assumptions about trait implementations.

Applied to files:

  • src/daemon/mod.rs
📚 Learning: 2025-08-28T12:52:46.927Z
Learnt from: LesnyRumcajs
Repo: ChainSafe/forest PR: 6011
File: src/cli/main.rs:18-25
Timestamp: 2025-08-28T12:52:46.927Z
Learning: In Forest CLI (src/cli/main.rs), the early RPC network check before Cli::parse_from() does not block help/version commands because clap processes these internally before reaching the RPC call. LesnyRumcajs confirmed this implementation works correctly and that RPC call failures are acceptable in this context.

Applied to files:

  • src/daemon/mod.rs
📚 Learning: 2025-08-25T13:35:24.230Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5969
File: src/tool/subcommands/snapshot_cmd.rs:412-412
Timestamp: 2025-08-25T13:35:24.230Z
Learning: In src/tool/subcommands/snapshot_cmd.rs, the +1 in `last_epoch = ts.epoch() - epochs as i64 + 1` fixes an off-by-1 bug where specifying --check-stateroots=N would validate N+1 epochs instead of N epochs, causing out-of-bounds errors when the snapshot contains only N recent state roots.

Applied to files:

  • src/tool/subcommands/api_cmd/test_snapshot.rs
📚 Learning: 2025-08-25T14:17:09.129Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5978
File: .github/workflows/unit-tests.yml:0-0
Timestamp: 2025-08-25T14:17:09.129Z
Learning: hanabi1224's download_file_with_cache function in src/utils/net/download_file.rs preserves URL path structure in local cache directories by using cache_dir.join(url.path().strip_prefix('/').unwrap_or_else(|| url.path())), so snapshots from https://forest-snapshots.fra1.cdn.digitaloceanspaces.com/rpc_test/ are cached locally at ~/.cache/forest/test/rpc-snapshots/rpc_test/ (including the rpc_test subdirectory from the URL path).

Applied to files:

  • src/tool/subcommands/api_cmd/test_snapshot.rs
  • src/state_manager/utils.rs
📚 Learning: 2025-08-25T14:17:09.129Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5978
File: .github/workflows/unit-tests.yml:0-0
Timestamp: 2025-08-25T14:17:09.129Z
Learning: hanabi1224's download_file_with_cache function preserves URL path structure in local cache directories, so snapshots from https://forest-snapshots.fra1.cdn.digitaloceanspaces.com/rpc_test/ are cached locally at ~/.cache/forest/test/rpc-snapshots/rpc_test (including the rpc_test subdirectory from the URL path).

Applied to files:

  • src/tool/subcommands/api_cmd/test_snapshot.rs
  • src/state_manager/utils.rs
🧬 Code graph analysis (2)
src/tool/subcommands/api_cmd/test_snapshot.rs (1)
src/utils/net/download_file.rs (1)
  • download_file_with_cache (30-89)
src/state_manager/utils.rs (2)
src/utils/mod.rs (1)
  • retry (116-145)
src/utils/net/download_file.rs (1)
  • download_file_with_cache (30-89)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Coverage
  • GitHub Check: tests-release
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: All lint checks
  • GitHub Check: Build MacOS
  • GitHub Check: Build Ubuntu
🔇 Additional comments (4)
src/utils/mod.rs (1)

125-144: Clean refactoring of the retry logic.

The new implementation is simpler and more readable. The timeout now correctly wraps the entire retry operation (including all retries and delays), which is the expected behavior. The removal of select! macro and associated futures machinery is a good simplification.

src/state_manager/utils.rs (1)

219-234: Good addition of retry logic with consistent parameters.

The retry configuration (30s timeout, 5 retries, 1s delay) aligns well with the pattern used elsewhere in the codebase. The closure correctly returns the future directly without awaiting, which is the expected pattern for the refactored retry function.

src/daemon/mod.rs (1)

741-747: Clean replacement of select! with explicit join_next loop.

The refactored propagate_error correctly iterates through completed services and returns the first error. The pending().await ensures the function never returns if all services complete successfully, matching the documented behavior.

Note: A JoinError returned by join_next (a task panic or cancellation) is silently ignored. This appears to be pre-existing behavior, but you may want to log or handle it in a follow-up.

src/tool/subcommands/api_cmd/test_snapshot.rs (1)

218-228: Consistent retry configuration with other download operations.

The updated parameters (30s timeout, 5 retries, 1s delay) align with the pattern established in src/state_manager/utils.rs. The closure correctly returns the future directly, matching the expected signature for the refactored retry function.

@hanabi1224 hanabi1224 force-pushed the hm/remove-unnecessary-select branch from 9e1d214 to bf4e33e Compare December 19, 2025 14:00
codecov bot commented Dec 19, 2025

Codecov Report

❌ Patch coverage is 75.86207% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 51.16%. Comparing base (4ea21a4) to head (bf4e33e).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/daemon/mod.rs | 0.00% | 3 Missing ⚠️ |
| src/utils/mod.rs | 70.00% | 1 Missing and 2 partials ⚠️ |
| src/state_manager/utils.rs | 92.30% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files

| Files with missing lines | Coverage Δ |
| --- | --- |
| src/tool/subcommands/api_cmd/test_snapshot.rs | 85.53% <100.00%> (+0.44%) ⬆️ |
| src/state_manager/utils.rs | 77.03% <92.30%> (+0.75%) ⬆️ |
| src/daemon/mod.rs | 28.46% <0.00%> (+0.10%) ⬆️ |
| src/utils/mod.rs | 80.53% <70.00%> (-2.66%) ⬇️ |

... and 9 files with indirect coverage changes


Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4ea21a4...bf4e33e. Read the comment docs.
