Skip to content

Conversation

@curryxbo
Copy link
Contributor

@curryxbo curryxbo commented Dec 11, 2025

Summary by CodeRabbit

  • New Features

    • Optional L2Next Geth endpoints and CLI flags to enable runtime upgrade switching.
  • Enhancements

    • Startup fetch of node upgrade config to drive transition behavior.
    • Improved block delivery and validation flow during upgrade windows with clearer, actionable error messages and logs.
  • Chores

    • Updated go-ethereum dependency across modules.

✏️ Tip: You can customize this high-level summary in your review settings.

@curryxbo curryxbo requested a review from a team as a code owner December 11, 2025 08:14
@curryxbo curryxbo requested review from Web3Jumb0 and removed request for a team December 11, 2025 08:14
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 11, 2025

📝 Walkthrough

Walkthrough

Adds optional L2Next geth endpoints and a runtime-controlled client switch driven by fetched geth upgrade configuration; wiring includes startup config fetch, dual clients (current/next), timestamp-aware switching, and upgrade-aware validation/logging changes across derivation and executor flows.

Changes

Cohort / File(s) Summary
Configuration & CLI
node/core/config.go, node/derivation/config.go, node/flags/flags.go
Added optional L2Next config and CLI flags L2NextEthAddr / L2NextEngineAddr; CLI populates L2Next when both addresses provided.
Retryable Client & Geth Config
node/types/retryable_client.go
Introduced GethConfig and FetchGethConfig; extended RetryableClient with nextAuth/nextEth clients, switchTime, atomic switched, per-call routing (aClient()/eClient()), EnsureSwitched() and internal switchClient(); signature of NewRetryableClient changed to accept next clients and switchTime.
Derivation Logic
node/derivation/derivation.go
Fetches geth config at startup; added switchTime and useZktrie fields; reworked mismatch handling with skipValidation logic during upgrade transitions and adjusted logging/validator behavior.
Executor Block Handling
node/core/executor.go
DeliverBlock() now calls l2Client.EnsureSwitched(), re-reads height, conditionally updates height or returns params/validators in devSequencer mode, and produces improved error/critical logs and MPT-compatibility hints.
Module & Build Updates
node/go.mod, bindings/go.mod, contracts/go.mod, oracle/go.mod, token-price-oracle/go.mod, tx-submitter/go.mod, Makefile
Bumped github.com/morph-l2/go-ethereum pseudo-version across modules; updated ETHEREUM_TARGET_VERSION in Makefile.
Repo housekeeping
.gitignore
Restored *.log ignore and added ops/mpt-switch-test ignore entry for local testing.

Sequence Diagram(s)

sequenceDiagram
    participant Executor
    participant RetryableClient
    participant CurrentGeth
    participant NextGeth

    Note over Executor,RetryableClient: Startup
    Executor->>RetryableClient: NewRetryableClient(currentAuth,currentEth,nextAuth?,nextEth?,switchTime)
    RetryableClient->>CurrentGeth: FetchGethConfig() (via RPC)
    CurrentGeth-->>RetryableClient: GethConfig{SwitchTime,UseZktrie}

    Note over Executor,CurrentGeth: Normal operation (before switch)
    Executor->>RetryableClient: RPC call (e.g., BlockNumber)
    RetryableClient->>RetryableClient: aClient()/eClient() -> current
    RetryableClient->>CurrentGeth: RPC request
    CurrentGeth-->>RetryableClient: response
    RetryableClient-->>Executor: response

    Note over Executor,NextGeth: At or after switchTime
    Executor->>RetryableClient: EnsureSwitched(timestamp, blockNumber)
    RetryableClient->>NextGeth: Poll for sync (getBlockNumber) until synced
    NextGeth-->>RetryableClient: currentBlockNumber
    RetryableClient->>RetryableClient: Set switched=true

    Note over Executor,NextGeth: Post-switch
    Executor->>RetryableClient: RPC call
    RetryableClient->>RetryableClient: aClient()/eClient() -> next
    RetryableClient->>NextGeth: RPC request
    NextGeth-->>RetryableClient: response
    RetryableClient-->>Executor: response
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested reviewers

  • r3aker86
  • Web3Jumb0
  • panos-xyz

Poem

🐰 I fetched the config at dawn’s gentle twitch,
Built two little clients — current and niche,
When switch time arrives I poll, then I hop,
Swap hearts with a wink, and the RPCs stop,
Hooray for upgrades — a carrot-sweet switch! 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 67.65% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'WIP:ZK to MPT' is vague and uses non-descriptive abbreviations that don't clearly convey the technical scope or main objective of the changes. Expand the title to clearly describe the main change, such as 'Add L2Next client support for ZK-to-MPT upgrade switching' or 'Implement MPT fork upgrade switching mechanism'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

🧹 Recent nitpick comments
node/derivation/derivation.go (2)

293-299: Log level should be Info or Warn, not Error.

Skipping validation during the upgrade transition is expected behavior when the geth type doesn't match the block era. Using Error level here may trigger alerts and confuse operators into thinking something is wrong.

Suggested fix
 				skipValidation = true
-				d.logger.Error("Root validation skipped during upgrade transition",
+				d.logger.Info("Root validation skipped during upgrade transition",
 					"originStateRootHash", batchInfo.root,
 					"deriveStateRootHash", lastHeader.Root.Hex(),
 					"blockTimestamp", lastHeader.Time,
 					"switchTime", d.switchTime,
 					"useZktrie", d.useZktrie,
 				)

312-318: Log level should be Error, not Info.

A root hash or withdrawal hash mismatch outside of the upgrade transition window is a serious validation failure that may indicate chain divergence or data corruption. Using Info level undersells the severity.

Suggested fix
-			d.logger.Info("root hash or withdrawal hash is not equal",
+			d.logger.Error("root hash or withdrawal hash is not equal",
 				"originStateRootHash", batchInfo.root,
 				"deriveStateRootHash", lastHeader.Root.Hex(),
 				"batchWithdrawalRoot", batchInfo.withdrawalRoot.Hex(),
 				"deriveWithdrawalRoot", common.BytesToHash(withdrawalRoot[:]).Hex(),
 			)

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5dea362 and 772df41.

📒 Files selected for processing (3)
  • node/core/executor.go
  • node/derivation/derivation.go
  • node/types/retryable_client.go
🧰 Additional context used
🧬 Code graph analysis (1)
node/derivation/derivation.go (1)
node/types/retryable_client.go (2)
  • FetchGethConfig (64-112)
  • NewRetryableClient (132-177)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: check
  • GitHub Check: test
  • GitHub Check: check
  • GitHub Check: test
  • GitHub Check: Analyze (rust)
  • GitHub Check: Analyze (go)
🔇 Additional comments (8)
node/core/executor.go (4)

74-108: L2Next client wiring and geth config fetch look correct.

The optional L2Next client setup with proper nil checks and the startup geth config fetch provide the foundation for the upgrade switch mechanism. The logging is informative for operators.

One minor improvement: consider closing aNextClient if eNextClient.Dial fails (lines 92-95) to avoid resource leak, though this is during startup where process exit would reclaim resources anyway.


310-334: Block delivery logic correctly handles switch during P2P sync scenarios.

The EnsureSwitched call ensures nodes that receive blocks via P2P sync still trigger the client switch. The height re-check after switch correctly handles the case where a block exists in legacy geth but not in the target geth.

Note: The EnsureSwitched call at line 314 uses context.Background(). If the underlying switchClient implementation blocks indefinitely (which it currently can), this could halt consensus. This concern is tracked in the retryable_client.go review.


339-344: Improved error messaging for behind-block scenario.

The CRITICAL banner and actionable guidance ("Switch to MPT-compatible geth IMMEDIATELY") will help operators quickly identify and resolve upgrade-related issues.


375-384: Good error banners and MPT compatibility hint.

The visual banners and contextual hint about MPT compatibility will aid debugging during upgrade transitions.

node/derivation/derivation.go (2)

66-70: New fields for upgrade switch configuration.

The switchTime and useZktrie fields are appropriately scoped and will be populated from the geth config fetch at startup.


141-174: Geth config fetch and struct initialization look correct.

The FetchGethConfig call provides the switch parameters, and the NewRetryableClient receives the proper arguments including the fetched switchTime. The new fields are properly initialized in the returned struct.

node/types/retryable_client.go (2)

34-112: FetchGethConfig implementation is well-structured.

The fallback logic (current → next → last config) provides resilience for different geth configurations. The logging at each step aids debugging. Proper error handling and resource cleanup (defer client.Close()).


132-177: Constructor correctly handles both upgrade-enabled and disabled scenarios.

The fallback to current client when next is nil (lines 136-147) is a clean pattern that avoids nil checks throughout the codebase. The immediate switch detection at startup (lines 149-174) ensures nodes starting after the switch time use the correct client from the beginning.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
node/derivation/derivation.go (1)

81-97: Legacy clients use the same endpoints as current clients.

Both laClient/leClient and aClient/eClient are dialed using identical configuration values (cfg.L2.EngineAddr, cfg.L2.EthAddr). This means the timestamp-based routing in RetryableClient will always connect to the same backend, negating the dual-client purpose.

If this is WIP, consider adding separate configuration fields (e.g., cfg.L2.LegacyEngineAddr, cfg.L2.LegacyEthAddr) for the legacy endpoints. Otherwise, clarify the TODO on line 81.

node/core/executor.go (1)

74-92: Legacy clients use identical endpoints as current clients.

Same issue as in derivation.go: laClient and leClient dial the same config.L2.EngineAddr and config.L2.EthAddr as aClient and eClient. The dual-client routing will effectively route to the same backend.

Consider adding separate configuration for legacy endpoints, or clarify the TODO comment regarding the intended behavior.

🧹 Nitpick comments (2)
node/derivation/derivation.go (1)

141-141: Use the existing logger instead of creating a new one.

A new TMLogger is created here, but the logger variable is already available and configured with the "derivation" module context (line 115). Use the existing logger for consistency.

-		l2Client:              types.NewRetryableClient(laClient, leClient, aClient, eClient, tmlog.NewTMLogger(tmlog.NewSyncWriter(os.Stdout))),
+		l2Client:              types.NewRetryableClient(laClient, leClient, aClient, eClient, logger),
node/types/retryable_client.go (1)

3-16: Import ordering: standard library imports should come first.

Go convention places standard library imports before third-party imports. Consider reordering:

 import (
 	"context"
+	"math/big"
+	"strings"
+	"time"
+
 	"github.com/cenkalti/backoff/v4"
 	"github.com/morph-l2/go-ethereum"
 	"github.com/morph-l2/go-ethereum/common"
 	eth "github.com/morph-l2/go-ethereum/core/types"
 	"github.com/morph-l2/go-ethereum/eth/catalyst"
 	"github.com/morph-l2/go-ethereum/ethclient"
 	"github.com/morph-l2/go-ethereum/ethclient/authclient"
 	tmlog "github.com/tendermint/tendermint/libs/log"
-	"math/big"
-	"strings"
-	"time"
 )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 24abaa7 and 2cdfb07.

⛔ Files ignored due to path filters (1)
  • node/go.sum is excluded by !**/*.sum
📒 Files selected for processing (4)
  • node/core/executor.go (3 hunks)
  • node/derivation/derivation.go (2 hunks)
  • node/go.mod (1 hunks)
  • node/types/retryable_client.go (11 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
node/derivation/derivation.go (1)
node/types/retryable_client.go (1)
  • NewRetryableClient (41-51)
node/core/executor.go (1)
node/types/retryable_client.go (2)
  • RetryableClient (29-37)
  • NewRetryableClient (41-51)
node/types/retryable_client.go (1)
tx-submitter/iface/client.go (1)
  • Client (14-23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: check
  • GitHub Check: test
  • GitHub Check: check
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: check
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (rust)
🔇 Additional comments (2)
node/types/retryable_client.go (1)

67-70: AssembleL2Block uses wall-clock time for routing.

The timestamp is derived from time.Now().Unix() rather than from block data. This is appropriate for assembling new blocks, but be aware that:

  1. Clock drift could cause unexpected routing behavior
  2. This differs from other methods that extract timestamps from block/transaction data

Ensure this is the intended behavior for block assembly operations.

node/go.mod (1)

14-14: Go-ethereum dependency update is correctly applied.

The new pseudo-version v1.10.14-0.20251211075654-796834acba86 is present in node/go.mod line 14 and the codebase already calls AssembleL2Block with the timestamp parameter as the second argument after context, confirming the dependency provides the necessary API changes for the RetryableClient functionality.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (4)
node/types/retryable_client.go (3)

117-135: AssembleL2Block routing is fine, but timestamp is captured only once.

The overall flow (compute a timestamp, call switchClient, then go through aClient) looks consistent with the MPT‑based routing.

Note that timestamp is computed once before backoff.Retry; if retries happen after the intended MPT switch time, this call will still behave as “pre‑switch” because switchClient always sees the original timestamp. If you want retries to respect the current time, recomputing inside the retry closure would be safer.


193-225: Commit/Append paths rely on existing MPT state only.

CommitBatch and AppendBlsSignature always use aClient() without calling switchClient. That means:

  • They will hit the legacy client until some earlier call (e.g., New/Validate/Assemble) has performed the switch and set mpt=true.
  • After that, they’ll always go to the new client.

If that ordering is intentional (i.e., commits never lead the cutover), this is fine; otherwise, consider explicitly invoking switchClient here as well, using an appropriate timestamp/height.


299-309: Retryable error policy is simple and consistent.

Treating everything except DiscontinuousBlockError as retryable is straightforward and matches the comment above. Just be aware this interacts with the “infinite” backoff behavior discussed earlier.

node/core/executor.go (1)

70-81: Clarify legacy vs new client endpoints; avoid duplicate dials if not needed.

laClient/leClient and aClient/eClient are all dialed against the same EngineAddr/EthAddr. That’s OK functionally, but it:

  • Doubles the TCP/JWT handshakes and keeps four separate connections open.
  • Makes it unclear which pair is “legacy” vs “MPT” until configs diverge.

If the intent is to eventually point laClient/leClient at different legacy endpoints, consider:

  • Adding explicit config fields for legacy addresses now, or
  • Reusing the same client instances until those endpoints exist, to avoid unnecessary duplication.

Also, you may want to replace the placeholder // TODO with a more descriptive comment about the planned split.

Also applies to: 91-92

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2cdfb07 and 105c335.

📒 Files selected for processing (2)
  • node/core/executor.go (2 hunks)
  • node/types/retryable_client.go (11 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
node/types/retryable_client.go (2)
tx-submitter/iface/client.go (1)
  • Client (14-23)
node/db/store.go (1)
  • Store (16-18)
node/core/executor.go (1)
node/types/retryable_client.go (1)
  • NewRetryableClient (44-55)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: check
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (rust)
🔇 Additional comments (4)
node/types/retryable_client.go (3)

57-69: Client selection helpers look correct.

aClient/eClient cleanly encapsulate legacy vs new routing via the atomic mpt flag, which keeps the call‑sites simple and centralizes switching behavior.


137-191: Validate/New/NewSafe L2 block flows look consistent with MPT switching.

These methods all:

  • Call switchClient with the block’s own Timestamp/Number.
  • Then use aClient so they follow the MPT/legacy split consistently.

Given the constructor wiring, this cleanly moves traffic over once mpt is set; no issues from what’s visible here.


227-297: Read‑only ETH RPC wrappers correctly use eClient.

BlockNumber, HeaderByNumber, CallContract, and CodeAt all delegate through eClient(), so they respect the current MPT/legacy state without introducing fresh switching logic. The retry pattern matches the other methods.

node/core/executor.go (1)

91-135: Executor wiring to the new RetryableClient looks correct.

NewExecutor now constructs l2Client via types.NewRetryableClient(laClient, leClient, aClient, eClient, config.Logger) and threads it through:

  • getNextL1MsgIndex
  • contract callers (SequencerCaller, GovCaller, L2StakingCaller)
  • all existing executor methods (RequestBlockData, CheckBlockData, DeliverBlock, RequestHeight, etc.)

Struct initialization (Executor{ l2Client: l2Client, ... }) matches this and there are no obvious miswirings here.

Comment on lines 31 to 39
type RetryableClient struct {
authClient *authclient.Client
ethClient *ethclient.Client
b backoff.BackOff
logger tmlog.Logger
legacyAuthClient *authclient.Client
legacyEthClient *ethclient.Client
authClient *authclient.Client
ethClient *ethclient.Client
mptTime uint64
mpt atomic.Bool
b backoff.BackOff
logger tmlog.Logger
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Make mptTime configurable instead of hard‑coding now+1h.

Right now mptTime is derived from time.Now().Add(time.Hour) at construction, which means:

  • Each node instance may switch at a different wall‑clock time.
  • Restarts will delay switching by another hour, regardless of the canonical MPT cutover time.

Before shipping, this should be driven by config (or passed explicitly into NewRetryableClient) so that routing behavior is deterministic and restart‑safe, not tied to process start time. The existing TODO hints at this, but it’s important for correctness of the migration window.

Also applies to: 43-55


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find the retryable_client.go file
find . -name "retryable_client.go" -type f

Repository: morph-l2/morph

Length of output: 90


🏁 Script executed:

# Read the entire file to understand the structure and all usages of rc.b
cat -n node/types/retryable_client.go

Repository: morph-l2/morph

Length of output: 11289


🏁 Script executed:

# Check the backoff library being used to understand its concurrency characteristics
rg "backoff.Retry|backoff.BackOff|backoff.NewExponential" node/types/retryable_client.go -A 3 -B 1

Repository: morph-l2/morph

Length of output: 3325


🌐 Web query:

cenkalti/backoff v4 ExponentialBackOff thread-safe concurrent

💡 Result:

Short answer: No — ExponentialBackOff in cenkalti/backoff v4 is explicitly documented as not thread‑safe for concurrent use. See the package docs and source which state "Note: Implementation is not thread‑safe." [1][2]

If you need concurrent use, either:

  • create one ExponentialBackOff per goroutine/operation, or
  • guard a single instance with your own synchronization (sync.Mutex), or
  • upgrade to a different policy/implementation that documents thread safety. [1][2]

Sources:
[1] pkg.go.dev github.com/cenkalti/backoff/v4 (ExponentialBackOff docs).
[2] ExponentialBackOff implementation (v4) in repository/source.


🏁 Script executed:

# Check if there's any documentation or comments about thread safety in the codebase
rg -i "thread.*safe|concurrent|mutex|lock" node/types/retryable_client.go

Repository: morph-l2/morph

Length of output: 2510


🏁 Script executed:

# Look for how this client is used - are methods called concurrently from goroutines?
rg -r "RetryableClient" --type go -A 3 | head -100

Repository: morph-l2/morph

Length of output: 454


🏁 Script executed:

# Search for goroutine usage with RetryableClient to verify concurrent calls
rg "go.*rc\.|go.*client\.|RetryableClient" --type go -B 2 -A 2 | head -150

Repository: morph-l2/morph

Length of output: 8154


🏁 Script executed:

# Check if RetryableClient methods might be called from multiple goroutines
fd "\.go$" --type f -x rg -l "RetryableClient" {} \; | head -10

Repository: morph-l2/morph

Length of output: 174


Shared backoff.BackOff instance is not thread-safe and unsafe for concurrent use.

The single rc.b instance is shared across all retry calls in AssembleL2Block, ValidateL2Block, NewL2Block, NewSafeL2Block, CommitBatch, AppendBlsSignature, BlockNumber, HeaderByNumber, CallContract, and CodeAt methods. Since cenkalti/backoff/v4's ExponentialBackOff is explicitly documented as not thread-safe, and RetryableClient is used concurrently by sequencer, derivation, and executor subsystems, concurrent calls will cause data race conditions on the backoff state.

Create a new ExponentialBackOff per request instead:

 type RetryableClient struct {
-    b                backoff.BackOff
+    // remove shared BackOff; create per-call instances instead
 }

 func (rc *RetryableClient) AssembleL2Block(...) (..., error) {
-    if retryErr := backoff.Retry(func() error { ... }, rc.b); retryErr != nil {
+    b := backoff.NewExponentialBackOff()
+    if retryErr := backoff.Retry(func() error { ... }, b); retryErr != nil {
         ...
     }
 }

Apply the same fix to all other methods using backoff.Retry.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In node/types/retryable_client.go around lines 31-39, the struct holds a shared
backoff.BackOff (rc.b) which is not thread-safe; replace uses of the shared rc.b
with a fresh ExponentialBackOff instance per request: inside each method that
currently calls backoff.Retry(rc.b, ...) create a new
backoff.NewExponentialBackOff(), configure any required parameters (or copy them
from rc.b if rc.b is only used as a template), and pass that new instance to
backoff.Retry (or backoff.WithContext if you need context cancellation). Apply
this same change to all other methods that call backoff.Retry so no goroutine
reuses a shared backoff state.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (2)
node/types/retryable_client.go (2)

87-114: switchClient can block indefinitely if context is canceled.

The loop at lines 87-114 never checks ctx.Done(), so it will continue polling forever if the context is canceled or the MPT node never catches up. This is a reliability concern that can cause goroutine leaks or hung operations.

Apply this diff to respect context cancellation:

 	for {
+		select {
+		case <-ctx.Done():
+			rc.logger.Error("Aborting MPT switch due to context cancellation", "error", ctx.Err())
+			return
+		default:
+		}
+
 		remote, err := rc.ethClient.BlockNumber(ctx)
 		if err != nil {
 			rc.logger.Error("Failed to get MPT block number, retrying...", "error", err)
-			<-ticker.C
+			select {
+			case <-ctx.Done():
+				rc.logger.Error("Aborting MPT switch due to context cancellation", "error", ctx.Err())
+				return
+			case <-ticker.C:
+			}
 			continue
 		}

38-38: Shared backoff.BackOff instance is not thread-safe.

The cenkalti/backoff/v4 ExponentialBackOff is explicitly documented as not thread-safe. Since RetryableClient methods are called concurrently from sequencer, derivation, and executor subsystems, the shared rc.b will cause data races.

Create a new backoff instance per call instead of sharing one:

 type RetryableClient struct {
 	legacyAuthClient *authclient.Client
 	legacyEthClient  *ethclient.Client
 	authClient       *authclient.Client
 	ethClient        *ethclient.Client
 	mptTime          uint64
 	mpt              atomic.Bool
-	b                backoff.BackOff
 	logger           tmlog.Logger
 }

Then in each method:

 func (rc *RetryableClient) AssembleL2Block(...) (..., error) {
+	b := backoff.NewExponentialBackOff()
 	timestamp := uint64(time.Now().Unix())
-	if retryErr := backoff.Retry(func() error {
+	if retryErr := backoff.Retry(func() error {
 		...
-	}, rc.b); retryErr != nil {
+	}, b); retryErr != nil {
🧹 Nitpick comments (2)
node/core/config.go (1)

31-31: Inconsistent JSON tag: l_2_legacy vs l2_legacy.

The JSON tag "l_2_legacy" uses underscores inconsistently compared to the derivation config which uses "l2_legacy". This could cause issues with configuration file parsing or serialization.

-	L2Legacy                      *types.L2Config `json:"l_2_legacy"`
+	L2Legacy                      *types.L2Config `json:"l2_legacy"`
node/core/executor.go (1)

73-81: Consider adding validation for empty legacy endpoint addresses.

If the legacy endpoints are not configured (empty strings), the dial calls will likely fail with confusing errors. Adding explicit validation would provide clearer error messages.

 	// legacy zk endpoint
+	if config.L2Legacy.EngineAddr == "" || config.L2Legacy.EthAddr == "" {
+		return nil, errors.New("L2Legacy endpoints (EngineAddr and EthAddr) must be configured")
+	}
 	laClient, err := authclient.DialContext(context.Background(), config.L2Legacy.EngineAddr, config.L2Legacy.JwtSecret)
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 105c335 and a3d76f2.

📒 Files selected for processing (6)
  • node/core/config.go (2 hunks)
  • node/core/executor.go (2 hunks)
  • node/derivation/config.go (2 hunks)
  • node/derivation/derivation.go (2 hunks)
  • node/flags/flags.go (2 hunks)
  • node/types/retryable_client.go (11 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
node/derivation/config.go (2)
node/types/config.go (1)
  • L2Config (14-18)
node/flags/flags.go (3)
  • MptTime (43-47)
  • L2LegacyEthAddr (31-35)
  • L2LegacyEngineAddr (37-41)
node/core/config.go (2)
node/types/config.go (1)
  • L2Config (14-18)
node/flags/flags.go (3)
  • MptTime (43-47)
  • L2LegacyEthAddr (31-35)
  • L2LegacyEngineAddr (37-41)
node/core/executor.go (2)
node/types/retryable_client.go (1)
  • NewRetryableClient (44-55)
node/flags/flags.go (1)
  • MptTime (43-47)
node/types/retryable_client.go (2)
tx-submitter/iface/client.go (1)
  • Client (14-23)
node/db/store.go (1)
  • Store (16-18)
node/derivation/derivation.go (2)
node/types/retryable_client.go (1)
  • NewRetryableClient (44-55)
node/flags/flags.go (1)
  • MptTime (43-47)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: test
  • GitHub Check: check
  • GitHub Check: check
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: check
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (rust)
🔇 Additional comments (5)
node/types/retryable_client.go (2)

193-225: CommitBatch and AppendBlsSignature don't invoke switchClient.

These methods use rc.aClient() but never call switchClient(), unlike ValidateL2Block, NewL2Block, and NewSafeL2Block. If these operations occur during the MPT transition window, they may route to the wrong client.

Is this intentional? If these operations should always use the current MPT state without waiting for sync, consider adding a comment explaining why. Otherwise, add switchClient calls with appropriate timestamp/block parameters.


227-297: BlockNumber, HeaderByNumber, CallContract, and CodeAt always use current MPT state.

These methods call rc.eClient() without switchClient(). They will use the legacy client until mpt is set to true by another operation. This could cause inconsistent behavior if these are called before any switchClient-invoking method.

Verify this is the intended behavior. If these should reflect the MPT state based on a block number, consider adding routing logic similar to the auth client methods.

node/core/executor.go (1)

73-91: Legacy client initialization depends on L2Legacy being non-nil.

The code at lines 74 and 78 accesses config.L2Legacy.EngineAddr and config.L2Legacy.EthAddr. If the config initialization issue in node/core/config.go is not fixed, this will panic with a nil pointer dereference.

Ensure that config.L2Legacy is properly initialized before this code runs. Consider adding a nil check or ensuring the config struct always initializes L2Legacy:

if config.L2Legacy == nil {
    return nil, errors.New("L2Legacy config is required")
}
node/derivation/derivation.go (1)

81-89: Legacy client initialization depends on L2Legacy being non-nil.

Similar to executor.go, lines 82 and 86 access cfg.L2Legacy.EngineAddr and cfg.L2Legacy.EthAddr. If L2Legacy is not initialized in derivation/config.go, this will panic.

Ensure the config initialization fix in node/derivation/config.go is applied. Consider adding defensive validation:

if cfg.L2Legacy == nil {
    return nil, errors.New("L2Legacy config is required for derivation")
}
node/flags/flags.go (1)

325-327: LGTM! Flags properly integrated into the CLI.

The new flags are correctly added to the Flags slice and positioned logically with other L2-related configuration options.

Comment on lines 43 to 47
MptTime = cli.Uint64Flag{
Name: "upgrade.mptTime",
Usage: "Batch index at which the sequencers start to upgrade the batch format",
EnvVar: prefixEnvVar("MPT_TIME"),
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the flags.go file and examine the MptTime flag and UpgradeBatchTime flag
find . -name "flags.go" -type f | grep -E "node.*flags" | head -5

Repository: morph-l2/morph

Length of output: 79


🏁 Script executed:

# Check if node/flags/flags.go exists and read it
if [ -f "node/flags/flags.go" ]; then
  echo "=== Lines 43-47 (MptTime) ===" 
  sed -n '43,47p' node/flags/flags.go
  echo ""
  echo "=== Lines 240-250 (around UpgradeBatchTime) ===" 
  sed -n '240,250p' node/flags/flags.go
else
  echo "File not found. Searching for flags files:"
  find . -name "flags.go" -type f 2>/dev/null | head -10
fi

Repository: morph-l2/morph

Length of output: 608


Rename MptTime to clarify it represents a batch index, and distinguish its usage description from UpgradeBatchTime.

The flag name MptTime is misleading since it uses a "Time" suffix despite representing a batch index. Additionally, both MptTime and UpgradeBatchTime have identical usage text ("Batch index at which the sequencers start to upgrade the batch format"), which obscures their different purposes.

Consider:

  1. Renaming to reflect the batch index nature (e.g., MptUpgradeBatchIndex or MptSwitchBatchIndex)
  2. Updating the usage text to clarify the MPT-specific distinction (e.g., "Batch index at which sequencers switch from legacy to MPT-based batch format")
🤖 Prompt for AI Agents
In node/flags/flags.go around lines 43-47, the MptTime identifier and its Usage
text are misleading because the suffix "Time" suggests a timestamp while the
flag actually holds a batch index and duplicates the Usage of UpgradeBatchTime;
rename the flag to something clear (e.g., MptUpgradeBatchIndex or
MptSwitchBatchIndex), update the Usage string to explicitly state it is an
MPT-specific batch index (for example: "Batch index at which sequencers switch
from legacy to MPT-based batch format"), change the EnvVar key passed to
prefixEnvVar to a matching name, and update every reference in the codebase to
the new identifier and env var name to keep behavior consistent.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
node/types/retryable_client.go (2)

38-38: Shared backoff.BackOff instance is not thread-safe.

The single rc.b instance is shared across all retry calls. As noted in a previous review, cenkalti/backoff/v4's ExponentialBackOff is explicitly documented as not thread-safe. With RetryableClient used concurrently by sequencer, derivation, and executor subsystems, concurrent calls can cause data races on the backoff state.

Create a new ExponentialBackOff per request instead:

🔎 Proposed fix pattern
 type RetryableClient struct {
-    b                backoff.BackOff
+    // Removed shared BackOff; create per-call instances instead
 }

 func (rc *RetryableClient) AssembleL2Block(...) (..., error) {
+    b := backoff.NewExponentialBackOff()
-    if retryErr := backoff.Retry(func() error { ... }, rc.b); retryErr != nil {
+    if retryErr := backoff.Retry(func() error { ... }, b); retryErr != nil {
         ...
     }
 }

Apply this pattern to all methods using backoff.Retry.

Also applies to: 150-150


101-133: switchClient ignores context cancellation and can block indefinitely.

The loop polls rc.ethClient.BlockNumber(ctx) until MPT geth catches up, but never checks ctx.Done(). If the context is canceled (e.g., node shutdown) or if MPT geth never catches up, this loop runs forever, blocking the calling goroutine.

🔎 Proposed fix
 for {
+    select {
+    case <-ctx.Done():
+        rc.logger.Error("Aborting MPT switch due to context cancellation", "error", ctx.Err())
+        return
+    default:
+    }
+
     remote, err := rc.ethClient.BlockNumber(ctx)
     if err != nil {
         rc.logger.Error("Failed to get MPT geth block number",
             "error", err,
             "hint", "Please ensure MPT geth is running and accessible")
-        <-ticker.C
+        select {
+        case <-ctx.Done():
+            rc.logger.Error("Aborting MPT switch due to context cancellation", "error", ctx.Err())
+            return
+        case <-ticker.C:
+        }
         continue
     }
     // ... rest of loop ...
-    <-ticker.C
+    select {
+    case <-ctx.Done():
+        rc.logger.Error("Aborting MPT switch due to context cancellation", "error", ctx.Err())
+        return
+    case <-ticker.C:
+    }
 }

Consider also adding a maximum wait duration to prevent indefinite blocking if MPT geth never syncs.

🧹 Nitpick comments (7)
ops/mpt-switch-test/README.md (1)

7-18: Add language specifiers to fenced code blocks.

The documentation would benefit from explicit language identifiers for better rendering and syntax highlighting.

🔎 Recommended fixes

For the ASCII diagram (lines 7-18):

-```
+```text
 升级前:                              升级后 (交换):

For the log output example (lines 89-98):

-```
+```log
 MPT switch time reached, MUST wait for MPT node to sync

For the file structure (lines 102-118):

-```
+```text
 mpt-switch-test/

Also applies to: 89-98, 102-118

ops/mpt-switch-test/send-txs.sh (1)

7-7: Remove unused variable.

GETH_BIN is defined but never referenced in the script. The script relies on the cast command instead.

🔎 Proposed fix
 SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
-GETH_BIN="${SCRIPT_DIR}/bin/geth"
 ZK_GETH_HTTP="http://127.0.0.1:8545"
node/core/executor.go (2)

300-305: Consider log verbosity for production.

While the enhanced logging with "CRITICAL" prefix and actionable hints is useful during the MPT upgrade window, this verbose style may be noisy in steady-state production. Consider gating this enhanced messaging on a debug flag or removing the dramatic formatting after the upgrade stabilizes.


336-344: Consider consolidating error logging.

The multiple separate logger.Error calls with decorative separators could be consolidated into a single structured log entry. This would be cleaner and easier to parse programmatically.

🔎 Suggested consolidation
-		e.logger.Error("========================================")
-		e.logger.Error("CRITICAL: Failed to deliver block to geth!")
-		e.logger.Error("========================================")
-		e.logger.Error("failed to NewL2Block",
-			"error", err,
-			"block_number", l2Block.Number,
-			"block_timestamp", l2Block.Timestamp)
-		e.logger.Error("HINT: If this occurs after MPT upgrade, your geth node may not support MPT blocks. " +
-			"Please ensure you are running an MPT-compatible geth node.")
+		e.logger.Error("CRITICAL: Failed to deliver block to geth",
+			"error", err,
+			"block_number", l2Block.Number,
+			"block_timestamp", l2Block.Timestamp,
+			"hint", "If this occurs after MPT upgrade, ensure you are running an MPT-compatible geth node")
ops/mpt-switch-test/test-mpt-switch-local.sh (3)

27-29: Remove unused variable MORPH_ROOT.

Static analysis (SC2034) correctly identifies that MORPH_ROOT is declared but never used in this script.

🔎 Proposed fix
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 cd "$SCRIPT_DIR"
 
-# 路径配置
-MORPH_ROOT="${SCRIPT_DIR}/../.."
 BIN_DIR="${SCRIPT_DIR}/bin"

250-278: Test environment uses permissive security settings.

The Geth instances are configured with --http.corsdomain="*", --http.vhosts="*", --ws.origins="*", and --authrpc.vhosts="*". This is acceptable for a local test environment but should never be used in production. Consider adding a comment to make this explicit.


196-196: Declare and assign separately to avoid masking return values (SC2155).

Shell best practice is to separate local declaration from command assignment to ensure the exit status of the command is not masked.

🔎 Proposed fix
-        local seq_node_id=$("$TENDERMINT_BIN" show-node-id --home "${SEQUENCER_NODE_DIR}")
+        local seq_node_id
+        seq_node_id=$("$TENDERMINT_BIN" show-node-id --home "${SEQUENCER_NODE_DIR}")

This pattern should also be applied to similar occurrences at lines 376-377, 457, 469, 475, and 533.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a3d76f2 and 8ae80bd.

📒 Files selected for processing (15)
  • .gitignore
  • node/core/config.go
  • node/core/executor.go
  • node/types/retryable_client.go
  • ops/mpt-switch-test/README.md
  • ops/mpt-switch-test/check-nodes.sh
  • ops/mpt-switch-test/genesis-mpt.json
  • ops/mpt-switch-test/genesis-zk.json
  • ops/mpt-switch-test/local-test/engine-make-block-loop.sh
  • ops/mpt-switch-test/local-test/engine-make-block.sh
  • ops/mpt-switch-test/local-test/genesis.json
  • ops/mpt-switch-test/local-test/jwt-secret.txt
  • ops/mpt-switch-test/local-test/run-l2-exec-only.sh
  • ops/mpt-switch-test/send-txs.sh
  • ops/mpt-switch-test/test-mpt-switch-local.sh
✅ Files skipped from review due to trivial changes (1)
  • ops/mpt-switch-test/local-test/jwt-secret.txt
🧰 Additional context used
🧬 Code graph analysis (3)
node/core/config.go (2)
node/types/config.go (1)
  • L2Config (14-18)
node/flags/flags.go (3)
  • MptTime (43-47)
  • L2LegacyEthAddr (31-35)
  • L2LegacyEngineAddr (37-41)
node/core/executor.go (2)
node/types/retryable_client.go (1)
  • NewRetryableClient (44-55)
node/flags/flags.go (1)
  • MptTime (43-47)
node/types/retryable_client.go (2)
tx-submitter/iface/client.go (1)
  • Client (14-23)
node/db/store.go (1)
  • Store (16-18)
🪛 Gitleaks (8.30.0)
ops/mpt-switch-test/send-txs.sh

[high] 20-20: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

🪛 markdownlint-cli2 (0.18.1)
ops/mpt-switch-test/README.md

7-7: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


89-89: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


102-102: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Shellcheck (0.11.0)
ops/mpt-switch-test/check-nodes.sh

[warning] 21-21: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 36-36: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 40-40: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 41-41: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 42-42: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 47-47: Declare and assign separately to avoid masking return values.

(SC2155)

ops/mpt-switch-test/local-test/engine-make-block.sh

[warning] 45-45: In POSIX sh, arithmetic base conversion is undefined.

(SC3052)

ops/mpt-switch-test/local-test/run-l2-exec-only.sh

[warning] 10-10: Remove space after = if trying to assign a value (for empty string, use var='' ... ).

(SC1007)

ops/mpt-switch-test/local-test/engine-make-block-loop.sh

[warning] 8-8: Remove space after = if trying to assign a value (for empty string, use var='' ... ).

(SC1007)

ops/mpt-switch-test/send-txs.sh

[warning] 7-7: GETH_BIN appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 30-30: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 38-38: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 46-46: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 59-59: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 100-100: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 118-118: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 119-119: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 127-127: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 182-182: Quote this to prevent word splitting.

(SC2046)


[warning] 189-189: Quote this to prevent word splitting.

(SC2046)

ops/mpt-switch-test/test-mpt-switch-local.sh

[warning] 28-28: MORPH_ROOT appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 196-196: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 376-376: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 377-377: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 457-457: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 469-469: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 475-475: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 533-533: Declare and assign separately to avoid masking return values.

(SC2155)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (rust)
🔇 Additional comments (19)
.gitignore (1)

34-39: LGTM!

The ignore patterns appropriately exclude test artifacts, logs, and generated data from the MPT switch test infrastructure.

ops/mpt-switch-test/check-nodes.sh (2)

14-54: LGTM!

The node health check logic correctly handles error cases (lines 25-29) and provides clear status output. The shellcheck SC2155 warnings about combined declaration and assignment are acceptable here since error handling is present.


73-106: Well-designed stateRoot comparison logic.

The script correctly selects the smaller block number for comparison (lines 78-82), avoiding false negatives when nodes are at different heights. The result messaging appropriately notes that differences are expected between ZK and MPT implementations.

ops/mpt-switch-test/local-test/engine-make-block.sh (1)

28-35: LGTM: JWT token generation follows Engine API spec.

The HS256 JWT implementation correctly encodes the header and payload, computes the HMAC-SHA256 signature using the hex secret, and applies base64url encoding per the Engine API authentication specification.

ops/mpt-switch-test/local-test/run-l2-exec-only.sh (2)

44-67: Verify that this configuration is restricted to isolated test environments.

The geth configuration includes security-sensitive settings appropriate only for local testing:

  • Line 54: Exposes personal API enabling account/key management
  • Line 64: --allow-insecure-unlock permits account unlocking over HTTP
  • Lines 50-53: Binds to 0.0.0.0 with unrestricted CORS

Ensure this script is never used in production or network-accessible environments.


10-10: Shellcheck SC1007 is a false positive.

The CDPATH= cd pattern (line 10) is a well-known idiom to prevent CDPATH environment variable interference when resolving script directories. The space after = is intentional and correct.

ops/mpt-switch-test/send-txs.sh (1)

18-22: LGTM: Standard test account key.

Lines 19-22 correctly document that this is the well-known Hardhat/Foundry test account private key. The Gitleaks alert is expected and acceptable for test infrastructure. The key corresponds to the first account in the standard test mnemonic and should never hold real funds.

ops/mpt-switch-test/local-test/engine-make-block-loop.sh (2)

13-22: LGTM: Resilient loop design with proper error handling.

The script correctly disables set -e (line 15) to prevent loop termination on individual failures, captures the exit code (line 17), and provides observability via timestamped logging (line 20). This ensures continuous operation for the test harness.


8-8: Shellcheck SC1007 is a false positive.

The CDPATH= cd pattern is a standard idiom to neutralize the CDPATH environment variable when resolving script directories. The syntax is correct.

node/core/config.go (2)

31-33: LGTM: Configuration fields properly initialized.

The new L2Legacy and MptTime fields are correctly declared and initialized. Line 47 ensures L2Legacy is initialized in DefaultConfig(), addressing the previously reported nil pointer dereference issue. The initialization pattern is consistent with the existing L2 field.

Also applies to: 47-47


128-133: LGTM: Legacy L2 configuration wiring is correct.

The legacy endpoint configuration properly:

  • Retrieves values from the new CLI flags (lines 128-129)
  • Assigns them to the initialized L2Legacy struct (lines 130-131)
  • Reuses the same JWT secret as the primary L2 endpoint (line 132), which is appropriate for local testing
  • Populates MptTime from the corresponding flag (line 133)
node/core/executor.go (2)

73-91: LGTM: Legacy L2 client initialization and updated RetryableClient construction.

The legacy ZK endpoint clients (laClient, leClient) are properly initialized with context and error handling. The updated NewRetryableClient call correctly passes all six parameters including config.MptTime and config.Logger, aligning with the constructor signature in node/types/retryable_client.go.


287-296: LGTM: EnsureSwitched call for P2P-synced blocks.

Good addition. The EnsureSwitched call ensures that sentry nodes (which receive blocks via P2P sync rather than NewL2Block) properly trigger the MPT client switch when the upgrade timestamp is reached. The inline comment clearly documents this edge case.

ops/mpt-switch-test/test-mpt-switch-local.sh (2)

1-26: Well-structured test harness for MPT switch testing.

The script provides a comprehensive local test environment with clear architecture documentation, proper lifecycle management (start/stop/clean), and helpful monitoring utilities. The inline architecture diagram (lines 4-22) is particularly useful for understanding the test setup.


340-342: Sentry node configuration with --dev-sequencer flag is intentional for this test setup.

The --dev-sequencer flag, defined in node/flags/flags.go, is designed "explicitly specify that running as a sequencer. If it enables, the tendermint validator/batch params won't be changed no matter what happens to staking/gov contacts. Only use in dev/test mode." Both the sequencer and sentry nodes use this flag in this test to avoid relying on L1 staking/governance contract interactions, which is appropriate for an isolated dev/test environment (test-mpt-switch-local.sh). The flag disables contract-based parameter updates rather than forcing sequencer behavior. The sentry node still functions as a follower node that syncs blocks, as documented at line 317.

Likely an incorrect or invalid review comment.

node/types/retryable_client.go (4)

31-55: LGTM: Updated struct and constructor with MPT switch support.

The RetryableClient struct now properly includes legacy and MPT clients, with mptTime passed as a constructor parameter (addressing the previous review concern). The atomic.Bool for mpt state is appropriate for concurrent access.


57-69: LGTM: Client accessor methods.

The aClient() and eClient() methods provide clean routing logic based on the atomic mpt flag. The use of atomic.Bool ensures thread-safe reads.


136-154: Verify timestamp source for AssembleL2Block.

AssembleL2Block uses time.Now().Unix() (line 137) for the timestamp passed to switchClient and the RPC call. However, the actual block timestamp will be determined by the sequencer/geth. If there's clock drift or the assembled block gets a different timestamp, the switch decision may be made prematurely or delayed.

Consider whether this timestamp should come from a more authoritative source, or if using current wall-clock time is acceptable for the switch check.


212-244: CommitBatch and AppendBlsSignature don't trigger MPT switch check.

Unlike block-related methods, CommitBatch and AppendBlsSignature don't call switchClient. This may be intentional since batch operations might not have a timestamp context, but verify this is the expected behavior—especially if batches could span the MPT upgrade boundary.

@@ -0,0 +1,69 @@
#!/bin/sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fix shebang or use POSIX-compatible arithmetic.

Line 1 declares #!/bin/sh but line 45 uses bash-specific arithmetic base conversion (16#$CUR_NOX), which is not POSIX-compliant. This will fail on systems where /bin/sh is dash, ash, or another POSIX shell.

🔎 Recommended fix (change shebang to bash)
-#!/bin/sh
+#!/bin/bash
 # Assemble and insert a block via Engine API (authrpc :8551, JWT-based)

Alternatively, use POSIX-compatible arithmetic:

 CUR_NOX=${CUR_HEX#0x}
-NEXT_DEC=$((16#$CUR_NOX + 1))
+NEXT_DEC=$(printf '%d' "0x$CUR_NOX")
+NEXT_DEC=$((NEXT_DEC + 1))
 NEXT_HEX=$(printf '0x%X' "$NEXT_DEC")

Also applies to: 44-46

🤖 Prompt for AI Agents
In ops/mpt-switch-test/local-test/engine-make-block.sh around lines 1 and 44-46,
the script uses bash-specific arithmetic (16#$CUR_NOX) while declaring
#!/bin/sh; either update the shebang to a bash interpreter (e.g., #!/usr/bin/env
bash) so the existing base-16 arithmetic works, or rewrite the conversion to a
POSIX-compatible form (for example use printf/awk to convert hex to decimal or
use a portable printf lookup) and keep the POSIX #!/bin/sh; ensure the chosen
approach is used consistently and test on a system where /bin/sh is dash.

;;
stop)
if [ -f "$PID_FILE" ]; then
kill $(cat "$PID_FILE") 2>/dev/null && rm "$PID_FILE"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Quote command substitutions to prevent word splitting.

Lines 182 and 189 use unquoted $(cat "$PID_FILE") which can cause unexpected behavior if the PID file contains whitespace or special characters.

🔎 Proposed fix
     stop)
         if [ -f "$PID_FILE" ]; then
-            kill $(cat "$PID_FILE") 2>/dev/null && rm "$PID_FILE"
+            kill "$(cat "$PID_FILE")" 2>/dev/null && rm "$PID_FILE"
             log_success "Stopped"
         else
             log_info "Not running"
     status)
-        if [ -f "$PID_FILE" ] && kill -0 $(cat "$PID_FILE") 2>/dev/null; then
-            log_success "Running (PID: $(cat $PID_FILE))"
+        if [ -f "$PID_FILE" ] && kill -0 "$(cat "$PID_FILE")" 2>/dev/null; then
+            log_success "Running (PID: $(cat "$PID_FILE"))"
         else

Also applies to: 189-189

🧰 Tools
🪛 Shellcheck (0.11.0)

[warning] 182-182: Quote this to prevent word splitting.

(SC2046)

🤖 Prompt for AI Agents
In ops/mpt-switch-test/send-txs.sh around lines 182 and 189, the command
substitutions used to read the PID file are unquoted which can cause
word-splitting if the PID file contains whitespace or special characters; update
those lines to quote the command substitution (or better, read the PID into a
variable first and then quote the variable) so the PID is treated as a single
token when passed to kill and file operations.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
ops/mpt-switch-test/send-txs.sh (1)

182-182: Quote command substitutions to prevent word splitting.

This was previously flagged. The unquoted $(cat "$PID_FILE") can cause unexpected behavior.

Also applies to: 189-190

🧹 Nitpick comments (7)
ops/mpt-switch-test/README.md (1)

7-18: Consider adding language identifiers to fenced code blocks.

Per markdownlint MD040, fenced code blocks should specify a language for proper syntax highlighting. Use text or plaintext for ASCII diagrams and log output.

🔎 Proposed fix for line 7
-```
+```text
 Before upgrade:                      After upgrade (swap):
🔎 Proposed fix for line 89
-```
+```text
 MPT switch time reached, MUST wait for MPT node to sync
🔎 Proposed fix for line 102
-```
+```text
 mpt-switch-test/

Also applies to: 89-98, 102-117

ops/mpt-switch-test/send-txs.sh (4)

7-8: Remove unused GETH_BIN variable and verify port assignment.

GETH_BIN is declared but never used. Additionally, ZK_GETH_HTTP points to port 9545, but according to test-mpt-switch-local.sh, port 9545 is the MPT Geth port (ZK Geth uses 8545). This naming inconsistency could cause confusion.

🔎 Proposed fix
 SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
-GETH_BIN="${SCRIPT_DIR}/bin/geth"
-ZK_GETH_HTTP="http://127.0.0.1:9545"
+# Default to MPT Geth port (9545) - can be overridden
+GETH_HTTP="${GETH_HTTP:-http://127.0.0.1:9545}"

Then update references from $ZK_GETH_HTTP to $GETH_HTTP throughout the script.


77-89: Remove unused send_tx_with_cast function.

This function is defined but never called. The main loop (lines 124-145) contains its own inline cast send implementation. Either remove this dead code or refactor the main loop to use this function.

🔎 Proposed fix - Option 1: Remove dead code
-# Send transaction using cast (if foundry is installed)
-send_tx_with_cast() {
-    if command -v cast &> /dev/null; then
-        cast send --private-key "$PRIVATE_KEY" \
-            --rpc-url "$ZK_GETH_HTTP" \
-            "$TO_ADDRESS" \
-            --value 1wei \
-            --gas-price 0 \
-            2>/dev/null
-        return $?
-    fi
-    return 1
-}
-

52-75: Remove unused send_tx function.

Similar to send_tx_with_cast, this function using eth_sendTransaction is defined but never called in the main loop. The main loop exclusively uses cast send.


103-111: Balance check may incorrectly reject valid accounts.

The condition [ "$balance" == "0x0" ] only catches exact zero balance, but check_balance returns the raw hex string which might be empty or malformed on RPC errors. The -z "$balance" check helps, but consider also validating that balance starts with 0x to distinguish between zero balance and RPC failure.

🔎 Proposed improvement
     if [ "$balance" == "0x0" ] || [ -z "$balance" ]; then
+        if [ -z "$balance" ]; then
+            echo "Warning: Could not retrieve account balance (RPC error?)"
+        else
+            echo "Warning: Account balance is 0!"
+        fi
         echo ""
-        echo "Warning: Account balance is 0 or cannot be retrieved!"
ops/mpt-switch-test/test-mpt-switch-local.sh (2)

28-28: Remove unused MORPH_ROOT variable.

This variable is declared but never referenced in the script.

🔎 Proposed fix
 # Path configuration
-MORPH_ROOT="${SCRIPT_DIR}/../.."
 BIN_DIR="${SCRIPT_DIR}/bin"

376-377: Consider separating variable declaration and assignment.

Multiple lines combine local declaration with command substitution, which can mask return values. While this is low-risk for a test script, separating them improves debuggability.

🔎 Example fix for lines 376-377
     for pid_file in "$SENTRY_NODE_PID" "$SEQUENCER_NODE_PID" "$MPT_GETH_PID" "$ZK_GETH_PID"; do
         if [ -f "$pid_file" ]; then
-            local pid=$(cat "$pid_file")
-            local name=$(basename "$pid_file" .pid)
+            local pid
+            local name
+            pid=$(cat "$pid_file")
+            name=$(basename "$pid_file" .pid)

Also applies to: 457-457, 469-469, 475-475, 533-533

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8ae80bd and d5f2d8d.

📒 Files selected for processing (4)
  • node/core/sequencers.go
  • ops/mpt-switch-test/README.md
  • ops/mpt-switch-test/send-txs.sh
  • ops/mpt-switch-test/test-mpt-switch-local.sh
✅ Files skipped from review due to trivial changes (1)
  • node/core/sequencers.go
🧰 Additional context used
🪛 Gitleaks (8.30.0)
ops/mpt-switch-test/send-txs.sh

[high] 20-20: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

🪛 markdownlint-cli2 (0.18.1)
ops/mpt-switch-test/README.md

7-7: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


89-89: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


102-102: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Shellcheck (0.11.0)
ops/mpt-switch-test/send-txs.sh

[warning] 7-7: GETH_BIN appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 30-30: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 38-38: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 46-46: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 59-59: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 100-100: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 118-118: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 119-119: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 127-127: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 182-182: Quote this to prevent word splitting.

(SC2046)


[warning] 189-189: Quote this to prevent word splitting.

(SC2046)

ops/mpt-switch-test/test-mpt-switch-local.sh

[warning] 28-28: MORPH_ROOT appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 196-196: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 376-376: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 377-377: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 457-457: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 469-469: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 475-475: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 533-533: Declare and assign separately to avoid masking return values.

(SC2155)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (rust)
🔇 Additional comments (5)
ops/mpt-switch-test/send-txs.sh (1)

18-22: Test private key is acceptable, but clarify the comment.

This is Foundry's default test account #0 private key, which is fine for local testing. The comment already notes this, which is good practice.

ops/mpt-switch-test/test-mpt-switch-local.sh (4)

253-267: Permissive CORS and vhosts settings are acceptable for local testing.

The --http.corsdomain="*", --http.vhosts="*", --ws.origins="*", and --authrpc.vhosts="*" settings are intentionally permissive for a local test environment. This is appropriate given the test-only context.


93-121: Well-structured dependency checking.

The check_dependencies function clearly validates required binaries and provides actionable error messages. Good implementation.


525-606: Main function and command dispatch are well-organized.

The command-line interface with start/stop/clean/status/monitor/logs/help commands is cleanly implemented with good user feedback.


340-342: Clarify the node mode flag for the sentry node.

The sentry node is explicitly described as a "non-validator (follower node)" but is started with --dev-sequencer. The flag's documentation states it "explicitly specify[ies] that running as a sequencer," which semantically contradicts the sentry node's role. Either add a comment explaining why sentry nodes use this flag in dev/test mode, or consider if a different flag or approach is needed.

export MORPH_NODE_L2_ENGINE_RPC="http://127.0.0.1:${MPT_GETH_AUTH_PORT}"
export MORPH_NODE_L2_ENGINE_AUTH="$jwt_file"
export MORPH_NODE_MPT_TIME="$mpt_time"
export MORPH_NODE_L1_ETH_RPC="${L1_ETH_RPC:-http://127.0.0.1:9545}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

L1_ETH_RPC defaults to port 9545, which conflicts with MPT Geth.

Both sequencer and sentry nodes default L1_ETH_RPC to http://127.0.0.1:9545, but port 9545 is already assigned to MPT Geth's HTTP endpoint. This will likely cause issues unless L1_ETH_RPC is explicitly set by the caller.

🔎 Proposed fix

Consider using a different default port or requiring explicit configuration:

-    export MORPH_NODE_L1_ETH_RPC="${L1_ETH_RPC:-http://127.0.0.1:9545}"
+    export MORPH_NODE_L1_ETH_RPC="${L1_ETH_RPC:-http://127.0.0.1:8545}"

Or document that L1_ETH_RPC must be set before running the test.

Also applies to: 335-335

🤖 Prompt for AI Agents
In ops/mpt-switch-test/test-mpt-switch-local.sh around lines 300 and 335, the
script defaults L1_ETH_RPC to http://127.0.0.1:9545 which conflicts with MPT
Geth’s HTTP endpoint; update both occurrences to avoid the port collision by
either choosing a non-conflicting default port (e.g., 8545 or another free port)
or require the caller to set L1_ETH_RPC and fail fast with a clear error if it
is unset, and add a short comment or doc note in the script explaining that
L1_ETH_RPC must be explicitly configured to avoid conflicts with MPT Geth.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
node/types/retryable_client.go (1)

225-243: Use the actual block timestamp from the RPC response for MPT switching instead of system time.

In AssembleL2Block, the timestamp passed to switchClient at line 228 is generated from time.Now().Unix() at the moment of the call. However, this approach is inconsistent with other RPC methods (ValidateL2Block, NewL2Block, NewSafeL2Block) which all use the actual block timestamp for the switch decision. If the system time differs significantly from the assembled block's timestamp, the MPT upgrade switch will trigger at the wrong time. The timestamp should be obtained from the RPC response after the block is assembled rather than generated before the call.

🤖 Fix all issues with AI agents
In @node/core/config.go:
- Line 31: The JSON tag for the L2Legacy struct field is inconsistent with the
existing L2 field; update the struct tag on L2Legacy (symbol: L2Legacy) from
`json:"l_2_legacy"` to `json:"l2_legacy"` so it follows the existing `l2` style
and maintains consistent JSON naming conventions with the L2 field (symbol: L2).
- Around line 127-131: The L2Legacy CLI flags are assigned unconditionally
(l2LegacyEthAddr, l2LegacyEngineAddr) and may be empty; add validation before
assigning to c.L2Legacy.EthAddr and c.L2Legacy.EngineAddr: check that the flag
values are non-empty (or that the flags were set) and return/configure an error
if required, or skip assignment when unset; also ensure JwtSecret (secret)
handling remains consistent with this validation so legacy client initialization
cannot receive empty addresses.

In @ops/mpt-switch-test/README.md:
- Around line 39-40: The README mentions different genesis filenames; unify
references so operators aren’t misled: pick the actual file name present (e.g.,
genesis-l2.json) and update all occurrences in the “File Structure” section and
elsewhere (including the noted block around lines 102–123) to that chosen name,
or alternatively rename the files in the repo to match the README
(genesis-zk.json / genesis-mpt.json) and update any examples, scripts, and docs
(search for genesis-l2.json, genesis-zk.json, genesis-mpt.json) so all
references are consistent.
- Around line 7-18: The markdown has fenced code blocks without language tags
causing markdownlint MD040 failures; update each ASCII diagram block (e.g., the
block containing "Before upgrade:                      After upgrade (swap):"
and the box art) to use ```text and any shell/snippet blocks to use ```bash, and
apply the same change to the other affected blocks noted (the blocks around the
referenced diagrams and shell snippets) so all fenced blocks include the
appropriate language tag.

In @ops/mpt-switch-test/send-txs.sh:
- Around line 24-29: The script currently hard-codes PRIVATE_KEY (and related
test addresses) which trips secret scanners; change the script to read
PRIVATE_KEY from an environment variable (e.g., ${PRIVATE_KEY}) or an untracked
local file (like .env or .secret) and exit with a clear error if it is not set;
keep FROM_ADDRESS and TO_ADDRESS configurable via env vars too (e.g.,
${FROM_ADDRESS:-...} / ${TO_ADDRESS:-...}) so defaults can be provided without
committing secrets, and ensure no default plaintext private key remains in the
code or repo.
- Around line 178-204: The bg/start/stop/status PID handling should avoid
word-splitting and ensure the .testdata dir exists: create the directory before
launching (e.g., mkdir -p "${SCRIPT_DIR}/.testdata"), when starting use nohup
... & and capture the PID into a variable (pid=$!; echo "$pid" > "$PID_FILE")
and log that variable, and everywhere you read the PID use a quoted expansion or
a read into a variable (pid="$(cat "$PID_FILE")") and call kill with the quoted
pid (kill "$pid" 2>/dev/null) and kill -0 "$pid" for status checks; also check
the PID file is non-empty before attempting kill and remove the PID file only
after a successful kill.

In @ops/mpt-switch-test/test-mpt-switch-local.sh:
- Around line 390-396: The clean_data() function currently removes
"${SCRIPT_DIR}/genesis-l2.json" which is incorrect for this script that uses
genesis-zk.json and genesis-mpt.json; update the cleanup to remove the actual
files used (e.g., genesis-zk.json and genesis-mpt.json) or remove the
genesis-l2.json deletion entirely and instead delete the correct filenames to
align with the README/scripts naming; modify the rm -f target in clean_data() to
reference the correct genesis filenames (or make it a noop for genesis-l2.json)
so cleanup matches the files created/used by this script.
- Around line 233-279: The start_geth function currently exposes admin/debug
JSON-RPC on 0.0.0.0 with wildcard CORS/VHosts; change the safe defaults to bind
HTTP/WS to loopback (replace --http.addr=0.0.0.0 and --ws.addr=0.0.0.0 with
127.0.0.1), restrict --http.api and --ws.api to non-privileged sets (remove
debug, admin, morph, engine, etc.), and set --http.corsdomain and --http.vhosts
to be restrictive (avoid "*"); keep the ability to override these defaults via
the extra_args parameter so tests can opt into broader exposure when explicitly
required.
🧹 Nitpick comments (5)
ops/mpt-switch-test/README.md (1)

43-45: Avoid user-specific absolute paths in usage examples.

The cd /Users/... path is not portable; use a repo-relative example (or $MORPH_ROOT).

ops/mpt-switch-test/send-txs.sh (1)

7-8: Trim dead code or make the fallback path real (GETH_BIN + eth_sendTransaction helpers).

Right now the script effectively requires cast (Line 131+) and exits otherwise, while GETH_BIN, send_tx(), and send_tx_with_cast() aren’t used in the main loop. Either remove them or wire a real fallback so the script works without Foundry.

Also applies to: 58-96, 130-154

ops/mpt-switch-test/test-mpt-switch-local.sh (1)

233-279: Quoting/word-splitting hardening for bash args + readiness checks.

extra_args as a string (Line 243-275) can behave unexpectedly when flags contain spaces; also wait_for_geth checks only TCP/HTTP, not JSON-RPC readiness (e.g., eth_blockNumber).

Also applies to: 350-369

node/derivation/derivation.go (1)

141-141: Consider using the existing logger instance for consistency.

The code creates a new logger with tmlog.NewTMLogger(tmlog.NewSyncWriter(os.Stdout)) instead of using the existing logger configured at line 115. This may bypass any custom logging configuration.

♻️ Proposed fix to use existing logger
-		l2Client:              types.NewRetryableClient(laClient, leClient, aClient, eClient, cfg.L2Legacy.EthAddr, tmlog.NewTMLogger(tmlog.NewSyncWriter(os.Stdout))),
+		l2Client:              types.NewRetryableClient(laClient, leClient, aClient, eClient, cfg.L2Legacy.EthAddr, logger),
node/types/retryable_client.go (1)

58-98: Consider adding a timeout for the RPC call to prevent initialization hangs.

The fetchMPTForkTime function makes an RPC call without a timeout context, which could cause initialization to hang if the legacy geth is unresponsive.

♻️ Proposed fix to add timeout
 func fetchMPTForkTime(rpcURL string, logger tmlog.Logger) (uint64, error) {
-	client, err := rpc.Dial(rpcURL)
+	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
+	defer cancel()
+	
+	client, err := rpc.DialContext(ctx, rpcURL)
 	if err != nil {
 		return 0, fmt.Errorf("failed to connect to geth: %w", err)
 	}
 	defer client.Close()
 
 	var result json.RawMessage
-	if err := client.Call(&result, "eth_config"); err != nil {
+	if err := client.CallContext(ctx, &result, "eth_config"); err != nil {
 		return 0, fmt.Errorf("eth_config call failed: %w", err)
 	}
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d5f2d8d and fa2307f.

⛔ Files ignored due to path filters (7)
  • bindings/go.sum is excluded by !**/*.sum
  • contracts/go.sum is excluded by !**/*.sum
  • ops/l2-genesis/go.sum is excluded by !**/*.sum
  • ops/tools/go.sum is excluded by !**/*.sum
  • oracle/go.sum is excluded by !**/*.sum
  • token-price-oracle/go.sum is excluded by !**/*.sum
  • tx-submitter/go.sum is excluded by !**/*.sum
📒 Files selected for processing (17)
  • bindings/go.mod
  • contracts/go.mod
  • go-ethereum
  • node/core/config.go
  • node/core/executor.go
  • node/derivation/config.go
  • node/derivation/derivation.go
  • node/flags/flags.go
  • node/types/retryable_client.go
  • ops/l2-genesis/go.mod
  • ops/mpt-switch-test/README.md
  • ops/mpt-switch-test/send-txs.sh
  • ops/mpt-switch-test/test-mpt-switch-local.sh
  • ops/tools/go.mod
  • oracle/go.mod
  • token-price-oracle/go.mod
  • tx-submitter/go.mod
✅ Files skipped from review due to trivial changes (2)
  • go-ethereum
  • oracle/go.mod
🚧 Files skipped from review as they are similar to previous changes (2)
  • node/flags/flags.go
  • node/derivation/config.go
🧰 Additional context used
🧬 Code graph analysis (4)
node/derivation/derivation.go (1)
node/types/retryable_client.go (1)
  • NewRetryableClient (114-133)
node/core/executor.go (1)
node/types/retryable_client.go (1)
  • NewRetryableClient (114-133)
node/types/retryable_client.go (1)
tx-submitter/iface/client.go (1)
  • Client (14-23)
node/core/config.go (2)
node/types/config.go (1)
  • L2Config (14-18)
node/flags/flags.go (2)
  • L2LegacyEthAddr (31-35)
  • L2LegacyEngineAddr (37-41)
🪛 Gitleaks (8.30.0)
ops/mpt-switch-test/send-txs.sh

[high] 26-26: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

🪛 markdownlint-cli2 (0.18.1)
ops/mpt-switch-test/README.md

7-7: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


89-89: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


102-102: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


171-171: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Shellcheck (0.11.0)
ops/mpt-switch-test/test-mpt-switch-local.sh

[warning] 28-28: MORPH_ROOT appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 196-196: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 377-377: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 378-378: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 458-458: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 470-470: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 476-476: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 534-534: Declare and assign separately to avoid masking return values.

(SC2155)

ops/mpt-switch-test/send-txs.sh

[warning] 7-7: GETH_BIN appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 36-36: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 44-44: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 52-52: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 65-65: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 107-107: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 125-125: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 126-126: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 134-134: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 192-192: Quote this to prevent word splitting.

(SC2046)


[warning] 199-199: Quote this to prevent word splitting.

(SC2046)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Analyze (rust)
  • GitHub Check: Analyze (go)
🔇 Additional comments (14)
ops/mpt-switch-test/test-mpt-switch-local.sh (2)

315-348: Verify sentry node mode/flags (currently started with --dev-sequencer).

The comments describe sentry as a non-validator follower, but it’s started with --dev-sequencer (Line 342), which sounds like it enables block production behavior. Please confirm this is intended, or switch to the correct follower/sentry mode if one exists.


559-567: Confirm geth supports --override.mptforktime for your shipped binary.

The whole test hinges on --override.mptforktime=$mpt_time being a valid flag in the bin/geth you expect operators to use.

node/core/config.go (1)

46-46: LGTM!

The L2Legacy initialization follows the same pattern as the L2 configuration at line 45.

tx-submitter/go.mod (1)

12-12: Verify the commit 796834acba86 from morph-l2/go-ethereum and confirm stability.

The go-ethereum dependency v1.10.14-0.20251211075654-796834acba86 is consistently pinned across 6+ modules in the repository (tx-submitter, ops/tools, token-price-oracle, bindings, ops/l2-genesis, node, oracle, contracts). However, the underlying commit cannot be verified via public repository access. Confirm that commit 796834acba86 exists in your internal morph-l2/go-ethereum fork, is stable, and free from security issues.

node/derivation/derivation.go (1)

81-89: LGTM! Legacy client initialization is correct.

The legacy L2 client initialization properly handles errors and uses appropriate contexts for setup.

node/core/executor.go (4)

73-81: LGTM! Legacy client initialization is correct.

The initialization follows the same pattern as in derivation.go with appropriate error handling.


91-91: LGTM! RetryableClient construction is correct.

The construction properly passes all legacy and current clients, along with the configured logger.


317-321: LGTM! Enhanced CRITICAL logging improves observability.

The enhanced logging clearly indicates when geth is behind and provides actionable guidance.


352-360: LGTM! Enhanced error logging improves debuggability.

The multi-line CRITICAL logging with context and hints significantly improves the ability to diagnose NewL2Block failures, especially MPT-related issues.

node/types/retryable_client.go (3)

100-133: LGTM! RetryableClient refactor is well-designed.

The expanded struct properly handles legacy and current clients with atomic state management for MPT switching. The constructor gracefully handles errors in MPT fork time fetching.


135-147: LGTM! Client routing helpers are correct and thread-safe.

The aClient and eClient methods properly route to legacy or current clients based on atomic MPT state.


320-424: LGTM! RPC methods correctly route through client selectors.

The remaining RPC methods properly use aClient() or eClient() which respect the global MPT switch state. The methods that don't have timestamp information (CommitBatch, AppendBlsSignature, etc.) correctly rely on the global switch state set by previous calls.

ops/l2-genesis/go.mod (1)

9-9: All go-ethereum dependencies are already consistent across modules. All 8 go.mod files (bindings, contracts, node, ops/l2-genesis, ops/tools, oracle, token-price-oracle, tx-submitter) use the same version: v1.10.14-0.20251211075654-796834acba86. No action needed.

contracts/go.mod (1)

9-9: Verify the updated dependency version against the upstream repository and security advisories.

The pseudo-version format is syntactically valid, but the actual commit hash (796834acba86) and any security advisories require verification via GitHub or local testing (e.g., go mod tidy, go mod verify). Ensure the upstream commit exists in morph-l2/go-ethereum and check for any known vulnerabilities before merging.

)

type Config struct {
L2Legacy *types.L2Config `json:"l_2_legacy"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

JSON tag naming inconsistency.

The JSON tag l_2_legacy uses an underscore between "2" and "legacy", while the existing L2 field at line 32 uses l2 without separating underscores. For consistency, consider using l2_legacy instead of l_2_legacy.

📝 Suggested fix
-	L2Legacy                      *types.L2Config `json:"l_2_legacy"`
+	L2Legacy                      *types.L2Config `json:"l2_legacy"`
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
L2Legacy *types.L2Config `json:"l_2_legacy"`
L2Legacy *types.L2Config `json:"l2_legacy"`
🤖 Prompt for AI Agents
In @node/core/config.go at line 31, The JSON tag for the L2Legacy struct field
is inconsistent with the existing L2 field; update the struct tag on L2Legacy
(symbol: L2Legacy) from `json:"l_2_legacy"` to `json:"l2_legacy"` so it follows
the existing `l2` style and maintains consistent JSON naming conventions with
the L2 field (symbol: L2).

Comment on lines 7 to 18
```
Before upgrade: After upgrade (swap):
┌─────────────┐ ┌─────────────┐
│ Sequencer │──► ZK Geth │ Sequencer │──► MPT Geth
│ Node │ (:8545) │ Node │ (:9545)
└─────────────┘ └─────────────┘
──────►
┌─────────────┐ ┌─────────────┐
│ Sentry │──► MPT Geth │ Sentry │──► ZK Geth
│ Node │ (:9545) │ Node │ (:8545)
└─────────────┘ └─────────────┘
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add fenced code block languages to satisfy markdownlint (MD040).

Add text for the ASCII diagrams/output blocks and bash for shell snippets.

Also applies to: 89-98, 102-123, 171-187

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

7-7: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In @ops/mpt-switch-test/README.md around lines 7 - 18, The markdown has fenced
code blocks without language tags causing markdownlint MD040 failures; update
each ASCII diagram block (e.g., the block containing "Before upgrade:           
After upgrade (swap):" and the box art) to use ```text and any shell/snippet
blocks to use ```bash, and apply the same change to the other affected blocks
noted (the blocks around the referenced diagrams and shell snippets) so all
fenced blocks include the appropriate language tag.

Comment on lines 39 to 40
Genesis file (`genesis-l2.json`) is already included in the directory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix genesis filename inconsistency (genesis-l2.json vs genesis-zk.json/genesis-mpt.json).

Line 39 mentions genesis-l2.json, but the “File Structure” section lists genesis-zk.json and genesis-mpt.json. This will mislead operators when setting up the test harness.

Also applies to: 102-123

🤖 Prompt for AI Agents
In @ops/mpt-switch-test/README.md around lines 39 - 40, The README mentions
different genesis filenames; unify references so operators aren’t misled: pick
the actual file name present (e.g., genesis-l2.json) and update all occurrences
in the “File Structure” section and elsewhere (including the noted block around
lines 102–123) to that chosen name, or alternatively rename the files in the
repo to match the README (genesis-zk.json / genesis-mpt.json) and update any
examples, scripts, and docs (search for genesis-l2.json, genesis-zk.json,
genesis-mpt.json) so all references are consistent.

Comment on lines 24 to 29
# Test account (needs balance in genesis)
# This is a commonly used test private key, corresponding address: 0xf39Fd6e51aad88F6F4ce6aB8827279cffFb92266
PRIVATE_KEY="0xac0974bec39a17e36ba4a6b4d238ff944bacb478cbed5efcae784d7bf4f2ff80"
FROM_ADDRESS="0xf39Fd6e51aad88F6F4ce6aB8827279cffFb92266"
TO_ADDRESS="0x70997970C51812dc3A010C7d01b50e0d17dc79C8"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Remove hard-coded private key (will trip secret scanning + unsafe by default).

Even if it’s a “commonly used test key”, committing a private key is typically a CI blocker (gitleaks) and a bad precedent. Load it from an env var or a local untracked file instead.

Proposed fix
-# Test account (needs balance in genesis)
-# This is a commonly used test private key, corresponding address: 0xf39Fd6e51aad88F6F4ce6aB8827279cffFb92266
-PRIVATE_KEY="0xac0974bec39a17e36ba4a6b4d238ff944bacb478cbed5efcae784d7bf4f2ff80"
+## Test account (needs balance in genesis)
+## Do NOT hardcode private keys in-repo; provide via env var.
+## Example:
+##   export MORPH_TEST_PRIVATE_KEY=0x...
+PRIVATE_KEY="${MORPH_TEST_PRIVATE_KEY:-}"
 FROM_ADDRESS="0xf39Fd6e51aad88F6F4ce6aB8827279cffFb92266"
 TO_ADDRESS="0x70997970C51812dc3A010C7d01b50e0d17dc79C8"
     log_info "Account balance: $balance"
+
+    if [ -z "$PRIVATE_KEY" ]; then
+        echo ""
+        echo "Error: MORPH_TEST_PRIVATE_KEY is not set"
+        echo "Set it to the private key for FROM_ADDRESS (must be funded in genesis)."
+        exit 1
+    fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Test account (needs balance in genesis)
# This is a commonly used test private key, corresponding address: 0xf39Fd6e51aad88F6F4ce6aB8827279cffFb92266
PRIVATE_KEY="0xac0974bec39a17e36ba4a6b4d238ff944bacb478cbed5efcae784d7bf4f2ff80"
FROM_ADDRESS="0xf39Fd6e51aad88F6F4ce6aB8827279cffFb92266"
TO_ADDRESS="0x70997970C51812dc3A010C7d01b50e0d17dc79C8"
## Test account (needs balance in genesis)
## Do NOT hardcode private keys in-repo; provide via env var.
## Example:
## export MORPH_TEST_PRIVATE_KEY=0x...
PRIVATE_KEY="${MORPH_TEST_PRIVATE_KEY:-}"
FROM_ADDRESS="0xf39Fd6e51aad88F6F4ce6aB8827279cffFb92266"
TO_ADDRESS="0x70997970C51812dc3A010C7d01b50e0d17dc79C8"
🧰 Tools
🪛 Gitleaks (8.30.0)

[high] 26-26: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

🤖 Prompt for AI Agents
In @ops/mpt-switch-test/send-txs.sh around lines 24 - 29, The script currently
hard-codes PRIVATE_KEY (and related test addresses) which trips secret scanners;
change the script to read PRIVATE_KEY from an environment variable (e.g.,
${PRIVATE_KEY}) or an untracked local file (like .env or .secret) and exit with
a clear error if it is not set; keep FROM_ADDRESS and TO_ADDRESS configurable
via env vars too (e.g., ${FROM_ADDRESS:-...} / ${TO_ADDRESS:-...}) so defaults
can be provided without committing secrets, and ensure no default plaintext
private key remains in the code or repo.

Comment on lines 178 to 204
PID_FILE="${SCRIPT_DIR}/.testdata/send-txs.pid"

case "${1:-start}" in
start)
main
;;
bg)
log_info "Starting in background..."
nohup "$0" start > "${SCRIPT_DIR}/.testdata/send-txs.log" 2>&1 &
echo $! > "$PID_FILE"
log_success "Started (PID: $(cat $PID_FILE))"
;;
stop)
if [ -f "$PID_FILE" ]; then
kill $(cat "$PID_FILE") 2>/dev/null && rm "$PID_FILE"
log_success "Stopped"
else
log_info "Not running"
fi
;;
status)
if [ -f "$PID_FILE" ] && kill -0 $(cat "$PID_FILE") 2>/dev/null; then
log_success "Running (PID: $(cat $PID_FILE))"
else
log_info "Not running"
fi
;;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fix PID handling and quoting (ShellCheck SC2046) + ensure .testdata exists for bg mode.

kill $(cat ...) risks word-splitting, and bg may fail if .testdata/ doesn’t exist yet.

Proposed fix
 PID_FILE="${SCRIPT_DIR}/.testdata/send-txs.pid"

 case "${1:-start}" in
@@
     bg)
         log_info "Starting in background..."
+        mkdir -p "$(dirname "$PID_FILE")"
         nohup "$0" start > "${SCRIPT_DIR}/.testdata/send-txs.log" 2>&1 &
         echo $! > "$PID_FILE"
         log_success "Started (PID: $(cat $PID_FILE))"
         ;;
     stop)
         if [ -f "$PID_FILE" ]; then
-            kill $(cat "$PID_FILE") 2>/dev/null && rm "$PID_FILE"
+            pid="$(cat "$PID_FILE")"
+            kill "$pid" 2>/dev/null && rm "$PID_FILE"
             log_success "Stopped"
         else
             log_info "Not running"
         fi
         ;;
     status)
-        if [ -f "$PID_FILE" ] && kill -0 $(cat "$PID_FILE") 2>/dev/null; then
+        if [ -f "$PID_FILE" ] && pid="$(cat "$PID_FILE")" && kill -0 "$pid" 2>/dev/null; then
             log_success "Running (PID: $(cat $PID_FILE))"
         else
             log_info "Not running"
         fi
         ;;
🧰 Tools
🪛 Shellcheck (0.11.0)

[warning] 192-192: Quote this to prevent word splitting.

(SC2046)


[warning] 199-199: Quote this to prevent word splitting.

(SC2046)

🤖 Prompt for AI Agents
In @ops/mpt-switch-test/send-txs.sh around lines 178 - 204, The
bg/start/stop/status PID handling should avoid word-splitting and ensure the
.testdata dir exists: create the directory before launching (e.g., mkdir -p
"${SCRIPT_DIR}/.testdata"), when starting use nohup ... & and capture the PID
into a variable (pid=$!; echo "$pid" > "$PID_FILE") and log that variable, and
everywhere you read the PID use a quoted expansion or a read into a variable
(pid="$(cat "$PID_FILE")") and call kill with the quoted pid (kill "$pid"
2>/dev/null) and kill -0 "$pid" for status checks; also check the PID file is
non-empty before attempting kill and remove the PID file only after a successful
kill.

Comment on lines 233 to 279
# Start Geth generic function
start_geth() {
local name=$1
local geth_bin=$2
local datadir=$3
local http_port=$4
local ws_port=$5
local auth_port=$6
local p2p_port=$7
local pid_file=$8
local extra_args=${9:-""}

log_info "Starting ${name} on ports HTTP:${http_port} AUTH:${auth_port} P2P:${p2p_port}..."
log_info " Binary: $(basename $geth_bin)"

local jwt_file="${TEST_DATA_DIR}/jwt-secret.txt"

"$geth_bin" \
--datadir="$datadir" \
--networkid=53077 \
--http \
--http.addr=0.0.0.0 \
--http.port=$http_port \
--http.api=web3,debug,eth,txpool,net,morph,engine,admin \
--http.corsdomain="*" \
--http.vhosts="*" \
--ws \
--ws.addr=0.0.0.0 \
--ws.port=$ws_port \
--ws.api=web3,debug,eth,txpool,net,morph,engine,admin \
--ws.origins="*" \
--authrpc.addr=0.0.0.0 \
--authrpc.port=$auth_port \
--authrpc.vhosts="*" \
--authrpc.jwtsecret="$jwt_file" \
--port=$p2p_port \
--nodiscover \
--gcmode=archive \
--cache.preimages \
--miner.gasprice=0 \
--verbosity=3 \
$extra_args \
> "${TEST_DATA_DIR}/${name}.log" 2>&1 &

echo $! > "$pid_file"
log_success "${name} started (PID: $(cat $pid_file))"
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don’t expose admin/debug JSON-RPC on 0.0.0.0 with CORS/VHosts wildcard.

With --http.addr=0.0.0.0, --http.api=...debug...admin..., --http.corsdomain="*", and --http.vhosts="*", anyone on the network can hit powerful APIs. Default this to loopback and restrict APIs.

Proposed fix (safer defaults)
     "$geth_bin" \
         --datadir="$datadir" \
         --networkid=53077 \
         --http \
-        --http.addr=0.0.0.0 \
+        --http.addr=127.0.0.1 \
         --http.port=$http_port \
-        --http.api=web3,debug,eth,txpool,net,morph,engine,admin \
-        --http.corsdomain="*" \
-        --http.vhosts="*" \
+        --http.api=web3,eth,txpool,net,morph,engine \
+        --http.corsdomain="http://localhost" \
+        --http.vhosts="localhost" \
         --ws \
-        --ws.addr=0.0.0.0 \
+        --ws.addr=127.0.0.1 \
         --ws.port=$ws_port \
-        --ws.api=web3,debug,eth,txpool,net,morph,engine,admin \
-        --ws.origins="*" \
-        --authrpc.addr=0.0.0.0 \
+        --ws.api=web3,eth,txpool,net,morph,engine \
+        --ws.origins="http://localhost" \
+        --authrpc.addr=127.0.0.1 \
         --authrpc.port=$auth_port \
-        --authrpc.vhosts="*" \
+        --authrpc.vhosts="localhost" \
         --authrpc.jwtsecret="$jwt_file" \
🤖 Prompt for AI Agents
In @ops/mpt-switch-test/test-mpt-switch-local.sh around lines 233 - 279, The
start_geth function currently exposes admin/debug JSON-RPC on 0.0.0.0 with
wildcard CORS/VHosts; change the safe defaults to bind HTTP/WS to loopback
(replace --http.addr=0.0.0.0 and --ws.addr=0.0.0.0 with 127.0.0.1), restrict
--http.api and --ws.api to non-privileged sets (remove debug, admin, morph,
engine, etc.), and set --http.corsdomain and --http.vhosts to be restrictive
(avoid "*"); keep the ability to override these defaults via the extra_args
parameter so tests can opt into broader exposure when explicitly required.

Comment on lines 390 to 396
# Clean data
clean_data() {
log_info "Cleaning test data..."
rm -rf "$TEST_DATA_DIR"
rm -f "${SCRIPT_DIR}/genesis-l2.json"
log_success "Test data cleaned"
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Clean step deletes genesis-l2.json, but this script uses genesis-zk.json/genesis-mpt.json.

Either remove the genesis-l2.json cleanup or align naming across the README/scripts to avoid surprises.

🤖 Prompt for AI Agents
In @ops/mpt-switch-test/test-mpt-switch-local.sh around lines 390 - 396, The
clean_data() function currently removes "${SCRIPT_DIR}/genesis-l2.json" which is
incorrect for this script that uses genesis-zk.json and genesis-mpt.json; update
the cleanup to remove the actual files used (e.g., genesis-zk.json and
genesis-mpt.json) or remove the genesis-l2.json deletion entirely and instead
delete the correct filenames to align with the README/scripts naming; modify the
rm -f target in clean_data() to reference the correct genesis filenames (or make
it a noop for genesis-l2.json) so cleanup matches the files created/used by this
script.

@curryxbo curryxbo changed the title WIP:Update l2 RetryableClient WIP:ZK to MPT Jan 12, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @node/derivation/derivation.go:
- Line 163: The l2Client construction is creating a new TM logger with os.Stdout
instead of using the passed-in logger; update the NewRetryableClient call (where
l2Client is assigned) to pass the configured logger variable (e.g., logger or
cfg.Logger) instead of tmlog.NewTMLogger(tmlog.NewSyncWriter(os.Stdout)) so it
matches how executor.go passes config.Logger to clients.

In @node/types/retryable_client.go:
- Around line 239-282: The loop in switchClient that repeatedly calls
rc.nextEthClient.BlockNumber can block forever if next geth is unreachable; make
it bounded and abortable by adding a configurable max wait and/or max retries
and by honoring context cancellation: introduce a config value (e.g.,
maxWaitDuration or maxRetries) read by switchClient, replace the for loop with a
select over ticker.C and ctx.Done() (and a time.After for the overall timeout),
increment an error counter or retry count on failures from
rc.nextEthClient.BlockNumber/HeaderByNumber and break/return an error after
exceeding maxRetries or when the overall timeout elapses, and ensure you return
a clear error state instead of looping indefinitely (update rc.switched only on
success as currently done).
🧹 Nitpick comments (6)
ops/mpt-switch-test/test-mpt-switch-local.sh (3)

28-28: Unused variable MORPH_ROOT.

This variable is declared but never used in the script.

🔧 Suggested fix
-MORPH_ROOT="${SCRIPT_DIR}/../.."
+# MORPH_ROOT="${SCRIPT_DIR}/../.." # Uncomment if needed

Or remove entirely if not planned for future use.


352-370: Consider a more robust health check.

The current check only verifies TCP connectivity. For production-grade testing, consider calling a JSON-RPC method like eth_blockNumber to confirm geth is fully operational.

🔧 Suggested enhancement
-    while ! curl -s "http://127.0.0.1:${port}" > /dev/null 2>&1; do
+    while ! curl -s -X POST -H "Content-Type: application/json" \
+        --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
+        "http://127.0.0.1:${port}" 2>/dev/null | grep -q '"result"'; do

569-574: Hardcoded sleep delays could be fragile.

The sleep 2 statements between service starts could be replaced with readiness checks for more reliable startup sequencing. However, this is acceptable for a local test script.

node/core/config.go (1)

127-136: Consider warning when only one L2Next address is provided.

If a user provides l2next.eth but not l2next.engine (or vice versa), the configuration is silently ignored. Consider logging a warning to help users identify misconfiguration.

🔧 Suggested enhancement
 	// L2Next is optional - only for upgrade switch (e.g., ZK to MPT)
 	l2NextEthAddr := ctx.GlobalString(flags.L2NextEthAddr.Name)
 	l2NextEngineAddr := ctx.GlobalString(flags.L2NextEngineAddr.Name)
 	if l2NextEthAddr != "" && l2NextEngineAddr != "" {
 		c.L2Next = &types.L2Config{
 			EthAddr:    l2NextEthAddr,
 			EngineAddr: l2NextEngineAddr,
 			JwtSecret:  secret, // same secret
 		}
+	} else if l2NextEthAddr != "" || l2NextEngineAddr != "" {
+		logger.Warn("L2Next partially configured - both l2next.eth and l2next.engine are required for upgrade switch")
 	}
node/types/retryable_client.go (2)

150-162: Fallback design prevents nil checks but could mask issues.

When next clients are not configured, they fallback to current clients (lines 156-157). This simplifies the code but means switched.Load() could return true while actually using the same client. Consider adding a field to track if actual switching is possible.


324-359: Verbose logging helpful for debugging but noisy in production.

The detailed logging at each step of NewL2Block is helpful for debugging the upgrade process. Consider making this configurable or reducing to debug level after the feature is stable.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fa2307f and c38ea30.

📒 Files selected for processing (7)
  • node/core/config.go
  • node/core/executor.go
  • node/derivation/config.go
  • node/derivation/derivation.go
  • node/flags/flags.go
  • node/types/retryable_client.go
  • ops/mpt-switch-test/test-mpt-switch-local.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • node/derivation/config.go
🧰 Additional context used
🧬 Code graph analysis (4)
node/derivation/derivation.go (2)
node/types/retryable_client.go (2)
  • FetchGethConfig (64-112)
  • NewRetryableClient (140-192)
node/flags/flags.go (1)
  • ChallengeEnable (185-189)
node/core/config.go (2)
node/types/config.go (1)
  • L2Config (14-18)
node/flags/flags.go (2)
  • L2NextEthAddr (31-35)
  • L2NextEngineAddr (37-41)
node/types/retryable_client.go (2)
tx-submitter/iface/client.go (1)
  • Client (14-23)
node/db/store.go (1)
  • Store (16-18)
node/core/executor.go (2)
tx-submitter/iface/client.go (1)
  • Client (14-23)
node/types/retryable_client.go (1)
  • NewRetryableClient (140-192)
🪛 Shellcheck (0.11.0)
ops/mpt-switch-test/test-mpt-switch-local.sh

[warning] 28-28: MORPH_ROOT appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 196-196: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 378-378: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 379-379: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 459-459: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 471-471: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 477-477: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 535-535: Declare and assign separately to avoid masking return values.

(SC2155)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (rust)
🔇 Additional comments (14)
ops/mpt-switch-test/test-mpt-switch-local.sh (3)

342-344: Sentry node using --dev-sequencer flag seems contradictory.

The comments indicate this is a non-validator follower node (Lines 315-318), but it's started with --dev-sequencer. Is this intentional for the test environment, or should the sentry use different flags?


253-268: Security settings are appropriate for local testing only.

The wildcard CORS (--http.corsdomain="*", --ws.origins="*") and vhost settings (--http.vhosts="*", --authrpc.vhosts="*") are acceptable for a local test script but should never be used in production. Consider adding a comment to clarify this is test-only configuration.


372-389: LGTM - Stop function handles cleanup appropriately.

The stop function properly iterates through PID files, checks if processes are running, and cleans up. Using default SIGTERM is appropriate for graceful shutdown.

node/core/config.go (1)

32-32: LGTM - L2Next field properly declared as optional.

The omitempty JSON tag and nil default ensure backward compatibility.

node/flags/flags.go (2)

31-41: LGTM - New CLI flags follow existing conventions.

The new flags are well-documented with appropriate usage text indicating they are optional and for upgrades. Naming convention is consistent with existing L2 flags.


319-320: LGTM - Flags properly registered.

The new flags are correctly added to the Flags slice alongside the existing L2 endpoint flags.

node/core/executor.go (3)

83-98: LGTM - L2Next client initialization properly guarded.

The nil checks ensure backward compatibility when L2Next is not configured. Logging clearly indicates whether upgrade switch is enabled.


296-320: Good handling of post-switch block delivery edge case.

The logic correctly handles the scenario where a block exists in the legacy geth (received via P2P) but not yet in the target geth after switching. The re-check of height after EnsureSwitched ensures blocks are properly delivered to the new client.


361-369: Helpful error context for debugging upgrade issues.

The banner-style logging and MPT compatibility hint will help operators diagnose issues during upgrades. Consider whether this logging level should be FATAL to halt the node more visibly.

node/derivation/derivation.go (2)

286-322: Validation skip logic during upgrade is critical - verify thoroughly.

The skip condition (beforeSwitch && !useZktrie) || (!beforeSwitch && useZktrie) handles the root mismatch during upgrade transition. This is security-sensitive as it bypasses state root validation.

Please verify:

  1. The condition correctly identifies only legitimate mismatch scenarios during upgrade
  2. After the upgrade window, validation resumes normally
  3. Consider adding metrics to track how often validation is skipped

142-148: LGTM - Geth config fetch failure is correctly fatal.

Failing to fetch the geth configuration would lead to incorrect validation behavior during upgrades. Making this fatal at startup is the right choice.

node/types/retryable_client.go (3)

249-249: Verify the sync condition allows switching when 1 block behind.

The condition remote+1 >= number allows switching when the next geth is at block number-1. Is this intentional to allow the current block to be delivered to the next geth, or should it be remote >= number to ensure the target block already exists?


63-112: LGTM - Robust config parsing with fallback sources.

The cascading fallback (current → next → last) for MPT fork time handles various geth configurations gracefully. Proper resource cleanup with defer client.Close().


195-207: LGTM - Thread-safe client routing.

The atomic.Bool ensures thread-safe reads of the switched state.

corey added 3 commits January 12, 2026 17:10
- Add ops/mpt-switch-test to .gitignore
- This directory is for local testing only
The go-ethereum changes should be managed in go-ethereum repo, not morph repo.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @node/derivation/derivation.go:
- Around line 93-109: The aNextClient created by authclient.DialContext can leak
if ethclient.Dial for eNextClient fails; after authclient.DialContext returns
aNextClient successfully (in the L2Next block), either defer aNextClient.Close()
immediately or explicitly call aNextClient.Close() before returning the error
from ethclient.Dial so the client is always closed on the error path; update the
L2Next initialization in derivation.go to ensure aNextClient is cleaned up when
ethclient.Dial fails (referencing aNextClient, eNextClient,
authclient.DialContext and ethclient.Dial).
- Around line 289-291: The fork activation check treats blocks with timestamp
equal to d.switchTime as before the switch; update the condition in the block
that computes beforeSwitch (using lastHeader.Time and d.switchTime) from a <=
comparison to a < comparison so that blocks at exactly d.switchTime are
considered post-switch; ensure the surrounding conditional that compares
beforeSwitch with d.useZktrie (the if (beforeSwitch && !d.useZktrie) ||
(!beforeSwitch && d.useZktrie) branch) remains intact.
🧹 Nitpick comments (1)
node/derivation/derivation.go (1)

290-300: Incorrect log level for expected upgrade transition behavior.

Using logger.Error for an expected condition during upgrade transition is misleading. This will pollute error logs during planned upgrades. Consider using Info or Warn level instead.

Additionally, the log message doesn't include withdrawal root information when withdrawalMismatch is the cause.

Suggested fix
 			if (beforeSwitch && !d.useZktrie) || (!beforeSwitch && d.useZktrie) {
 				skipValidation = true
-				d.logger.Error("Root validation skipped during upgrade transition",
+				d.logger.Info("Root validation skipped during upgrade transition",
 					"originStateRootHash", batchInfo.root,
 					"deriveStateRootHash", lastHeader.Root.Hex(),
+					"batchWithdrawalRoot", batchInfo.withdrawalRoot.Hex(),
+					"deriveWithdrawalRoot", common.BytesToHash(withdrawalRoot[:]).Hex(),
 					"blockTimestamp", lastHeader.Time,
 					"switchTime", d.switchTime,
 					"useZktrie", d.useZktrie,
 				)
 			}
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8b2a833 and 0968304.

📒 Files selected for processing (1)
  • node/derivation/derivation.go
🧰 Additional context used
🧬 Code graph analysis (1)
node/derivation/derivation.go (3)
tx-submitter/iface/client.go (1)
  • Client (14-23)
node/types/retryable_client.go (2)
  • FetchGethConfig (64-112)
  • NewRetryableClient (140-192)
node/flags/flags.go (1)
  • ChallengeEnable (185-189)
🔇 Additional comments (4)
node/derivation/derivation.go (4)

66-69: LGTM!

The new fields for tracking geth upgrade configuration are well-placed and appropriately typed. The naming is consistent with usage in retryable_client.go.


141-147: LGTM!

Fetching geth config at startup with appropriate error handling. The fatal behavior on fetch failure is acceptable since it indicates geth connectivity issues.


162-162: LGTM!

The NewRetryableClient invocation correctly passes the L2Next clients for upgrade switch support.


285-320: Security-sensitive: Root validation bypass during upgrade.

This logic bypasses critical root hash validation during upgrade transitions. While the conditions seem reasonable, verify that:

  1. An attacker cannot manipulate switchTime or useZktrie to bypass validation outside of legitimate upgrade scenarios
  2. The geth config fetch (via eth_config RPC) is trusted and cannot be spoofed

The bypass conditions are:

  • switchTime > 0 AND
  • (beforeSwitch && !useZktrie) OR (!beforeSwitch && useZktrie)

Comment on lines +93 to +109

// L2Next endpoint (optional - for upgrade switch)
var aNextClient *authclient.Client
var eNextClient *ethclient.Client
if cfg.L2Next != nil && cfg.L2Next.EngineAddr != "" && cfg.L2Next.EthAddr != "" {
aNextClient, err = authclient.DialContext(context.Background(), cfg.L2Next.EngineAddr, cfg.L2Next.JwtSecret)
if err != nil {
return nil, err
}
eNextClient, err = ethclient.Dial(cfg.L2Next.EthAddr)
if err != nil {
return nil, err
}
logger.Info("L2Next geth configured (upgrade switch enabled)", "engineAddr", cfg.L2Next.EngineAddr, "ethAddr", cfg.L2Next.EthAddr)
} else {
logger.Info("L2Next geth not configured (no upgrade switch)")
}
Copy link
Contributor

@coderabbitai coderabbitai bot Jan 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Resource leak on error path.

If eNextClient dial fails at line 102, the already-created aNextClient is never closed. Consider using defer-based cleanup or explicit cleanup on error paths.

Suggested fix pattern
 	// L2Next endpoint (optional - for upgrade switch)
 	var aNextClient *authclient.Client
 	var eNextClient *ethclient.Client
 	if cfg.L2Next != nil && cfg.L2Next.EngineAddr != "" && cfg.L2Next.EthAddr != "" {
 		aNextClient, err = authclient.DialContext(context.Background(), cfg.L2Next.EngineAddr, cfg.L2Next.JwtSecret)
 		if err != nil {
 			return nil, err
 		}
 		eNextClient, err = ethclient.Dial(cfg.L2Next.EthAddr)
 		if err != nil {
+			aNextClient.Close()
 			return nil, err
 		}
 		logger.Info("L2Next geth configured (upgrade switch enabled)", "engineAddr", cfg.L2Next.EngineAddr, "ethAddr", cfg.L2Next.EthAddr)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// L2Next endpoint (optional - for upgrade switch)
var aNextClient *authclient.Client
var eNextClient *ethclient.Client
if cfg.L2Next != nil && cfg.L2Next.EngineAddr != "" && cfg.L2Next.EthAddr != "" {
aNextClient, err = authclient.DialContext(context.Background(), cfg.L2Next.EngineAddr, cfg.L2Next.JwtSecret)
if err != nil {
return nil, err
}
eNextClient, err = ethclient.Dial(cfg.L2Next.EthAddr)
if err != nil {
return nil, err
}
logger.Info("L2Next geth configured (upgrade switch enabled)", "engineAddr", cfg.L2Next.EngineAddr, "ethAddr", cfg.L2Next.EthAddr)
} else {
logger.Info("L2Next geth not configured (no upgrade switch)")
}
// L2Next endpoint (optional - for upgrade switch)
var aNextClient *authclient.Client
var eNextClient *ethclient.Client
if cfg.L2Next != nil && cfg.L2Next.EngineAddr != "" && cfg.L2Next.EthAddr != "" {
aNextClient, err = authclient.DialContext(context.Background(), cfg.L2Next.EngineAddr, cfg.L2Next.JwtSecret)
if err != nil {
return nil, err
}
eNextClient, err = ethclient.Dial(cfg.L2Next.EthAddr)
if err != nil {
aNextClient.Close()
return nil, err
}
logger.Info("L2Next geth configured (upgrade switch enabled)", "engineAddr", cfg.L2Next.EngineAddr, "ethAddr", cfg.L2Next.EthAddr)
} else {
logger.Info("L2Next geth not configured (no upgrade switch)")
}
🤖 Prompt for AI Agents
In @node/derivation/derivation.go around lines 93 - 109, The aNextClient created
by authclient.DialContext can leak if ethclient.Dial for eNextClient fails;
after authclient.DialContext returns aNextClient successfully (in the L2Next
block), either defer aNextClient.Close() immediately or explicitly call
aNextClient.Close() before returning the error from ethclient.Dial so the client
is always closed on the error path; update the L2Next initialization in
derivation.go to ensure aNextClient is cleaned up when ethclient.Dial fails
(referencing aNextClient, eNextClient, authclient.DialContext and
ethclient.Dial).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is process startup initialization. If dial fails, the process exits immediately and OS reclaims all resources. No actual leak occurs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@curryxbo, you're right—in a startup initialization path where the entire process would exit on error, the OS will reclaim the connection resources. The practical impact is minimal in this specific context. Thanks for the clarification!

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @node/types/retryable_client.go:
- Around line 216-283: The switchClient method can block forever while polling
rc.nextEthClient, so add a bounded retry/timeout and single-writer guarantee:
introduce a context-aware timeout or maxRetries and bail out with a clear
error/log if the next node doesn't reach target within that limit (use ctx with
deadline or a local retry counter around the loop that calls
rc.nextEthClient.BlockNumber and HeaderByNumber), ensure only one goroutine runs
the switch by protecting the critical section with a sync.Mutex or sync.Once
(guard the polling loop and the rc.switched.Store(true) path using the chosen
sync primitive) and change the periodic status log call (currently using
rc.logger.Error for waiting state) to rc.logger.Warn or Info; keep all other
logging and the existing remote/block checks in switchClient.
🧹 Nitpick comments (1)
node/types/retryable_client.go (1)

324-359: Consider reducing log verbosity for production.

The extensive logging in NewL2Block (5 separate log statements per call) is useful for debugging the upgrade process but may be too noisy for production. Consider:

  • Using Debug level instead of Info for some messages
  • Consolidating into fewer log statements
  • Adding a verbose/debug flag to control this
♻️ Example consolidation
 func (rc *RetryableClient) NewL2Block(ctx context.Context, executableL2Data *catalyst.ExecutableL2Data, batchHash *common.Hash) (err error) {
-	rc.logger.Info("NewL2Block called",
+	rc.logger.Debug("NewL2Block called",
 		"block_number", executableL2Data.Number,
 		"parent_hash", executableL2Data.ParentHash.Hex(),
-		"state_root", executableL2Data.StateRoot.Hex(),
-		"switched", rc.switched.Load())
+		"switched", rc.switched.Load())

 	rc.switchClient(ctx, executableL2Data.Timestamp, executableL2Data.Number)

-	rc.logger.Info("After switchClient",
-		"block_number", executableL2Data.Number,
-		"switched", rc.switched.Load())
-
 	if retryErr := backoff.Retry(func() error {
-		rc.logger.Info("Sending block to geth",
-			"block_number", executableL2Data.Number,
-			"using_next_client", rc.switched.Load())
-
 		respErr := rc.aClient().NewL2Block(ctx, executableL2Data, batchHash)
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f4d7b0c and 32512d0.

📒 Files selected for processing (2)
  • node/derivation/derivation.go
  • node/types/retryable_client.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • node/derivation/derivation.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (rust)
🔇 Additional comments (8)
node/types/retryable_client.go (8)

34-61: LGTM!

The config types are well-structured for parsing the eth_config RPC response. The use of pointer types for optional fields (*morphExtension, *uint64) is appropriate.


63-112: LGTM with a note on convention.

The function correctly fetches and parses the eth_config response with proper cleanup (defer client.Close()). The fallback logic through current → next → last is clear.

Note: SwitchTime = 0 is used to indicate "switch disabled." This implicit convention works since Line 166 checks switchTime > 0, but consider documenting this in the GethConfig struct comments for clarity.


114-121: LGTM!

Simple internal wrapper that extracts only the SwitchTime from FetchGethConfig.


123-132: LGTM!

The struct extension is clean. Using atomic.Bool for switched is the correct choice for thread-safe access across concurrent RPC calls.


195-207: LGTM!

The client routing helpers are clean and thread-safe. The fallback initialization in NewRetryableClient ensures these never return nil.


285-303: LGTM with note on switch placement.

The switch check is performed inside the retry loop. This is fine since switchClient is idempotent after the switch completes (early return when switched.Load() is true). Using wall-clock time for the switch decision during block assembly is appropriate.


380-484: Note: Read operations don't trigger switch proactively.

BlockNumber, HeaderByNumber, CallContract, and CodeAt use eClient() without calling switchClient. This means if these are called after the switch time but before any block-writing operation triggers the switch, they'll read from the old geth.

This may be intentional (switch only happens during block processing), but verify this is the desired behavior. If reads should also use the new geth after switch time, consider adding:

func (rc *RetryableClient) BlockNumber(ctx context.Context) (ret uint64, err error) {
    // Optionally check switch time for read operations
    if rc.switchTime > 0 && uint64(time.Now().Unix()) >= rc.switchTime {
        rc.EnsureSwitched(ctx, uint64(time.Now().Unix()), 0)
    }
    // ... existing code
}

Is it intentional that read-only RPC methods (BlockNumber, HeaderByNumber, etc.) don't proactively trigger the client switch?


134-192: Call Reset() on the exponential backoff before first use.

The backoff.NewExponentialBackOff() must have Reset() called before it is used. According to the cenkalti/backoff package documentation, Reset initializes the internal currentInterval and startTime that the backoff relies on. Without calling Reset(), the first retry will behave unexpectedly.

Fix
 	rc := &RetryableClient{
 		authClient:     authClient,
 		ethClient:      ethClient,
 		nextAuthClient: nextAuthClient,
 		nextEthClient:  nextEthClient,
 		switchTime:     switchTime,
 		b:              backoff.NewExponentialBackOff(),
 		logger:         logger,
 	}
+	rc.b.Reset()

Likely an incorrect or invalid review comment.

Comment on lines +216 to 283
func (rc *RetryableClient) switchClient(ctx context.Context, timeStamp uint64, number uint64) {
if rc.switched.Load() {
return
}
if timeStamp < rc.switchTime {
return
}

rc.logger.Info("========================================")
rc.logger.Info("GETH UPGRADE: Switch time reached!")
rc.logger.Info("========================================")
rc.logger.Info("Switch time reached, switching from current client to next client",
"switch_time", rc.switchTime,
"current_time", timeStamp,
"target_block", number)
rc.logger.Info("Current status: connected to current geth, waiting for next geth to sync...")

ticker := time.NewTicker(500 * time.Millisecond)
defer ticker.Stop()

startTime := time.Now()
lastLogTime := startTime

for {
remote, err := rc.nextEthClient.BlockNumber(ctx)
if err != nil {
rc.logger.Error("Failed to get next geth block number",
"error", err,
"hint", "Please ensure next geth is running and accessible")
<-ticker.C
continue
}

if remote+1 >= number {
// Get next geth's latest block hash for debugging
targetHeader, headerErr := rc.nextEthClient.HeaderByNumber(ctx, big.NewInt(int64(remote)))
targetBlockHash := "unknown"
targetStateRoot := "unknown"
if headerErr == nil && targetHeader != nil {
targetBlockHash = targetHeader.Hash().Hex()
targetStateRoot = targetHeader.Root.Hex()
}

rc.switched.Store(true)
rc.logger.Info("========================================")
rc.logger.Info("GETH UPGRADE: Successfully switched!")
rc.logger.Info("========================================")
rc.logger.Info("Successfully switched to next client",
"remote_block", remote,
"target_block", number,
"target_block_hash", targetBlockHash,
"target_state_root", targetStateRoot,
"wait_duration", time.Since(startTime))
return
}

if time.Since(lastLogTime) >= 5*time.Second {
rc.logger.Error("!!! WAITING: Node BLOCKED waiting for next geth !!!",
"next_geth_block", remote,
"target_block", number,
"blocks_behind", number-remote-1,
"wait_duration", time.Since(startTime))
lastLogTime = time.Now()
}

<-ticker.C
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Infinite blocking loop can halt the node permanently.

The switchClient method enters an unbounded loop (lines 239-282) waiting for the next geth to sync. If the next geth is misconfigured, unreachable, or permanently behind, this will block the caller forever with no way to recover.

Additionally:

  • Multiple concurrent callers can enter the waiting loop simultaneously before switched.Store(true) at line 259.
  • Line 273 uses logger.Error for a status message that's more of a warning.

Consider adding:

  1. A maximum timeout or retry limit with graceful degradation
  2. A mutex or sync.Once to ensure only one goroutine performs the switch logic
🛠️ Suggested approach with timeout
 func (rc *RetryableClient) switchClient(ctx context.Context, timeStamp uint64, number uint64) {
 	if rc.switched.Load() {
 		return
 	}
 	if timeStamp < rc.switchTime {
 		return
 	}

+	// Add a reasonable timeout for the switch operation
+	switchCtx, cancel := context.WithTimeout(ctx, 30*time.Minute)
+	defer cancel()

 	// ... existing logging ...

 	ticker := time.NewTicker(500 * time.Millisecond)
 	defer ticker.Stop()

 	for {
+		select {
+		case <-switchCtx.Done():
+			rc.logger.Error("Switch timeout exceeded, failing switch operation")
+			return // or panic, depending on desired behavior
+		default:
+		}
+
 		remote, err := rc.nextEthClient.BlockNumber(ctx)
 		// ... rest of loop ...
 	}
 }
🤖 Prompt for AI Agents
In @node/types/retryable_client.go around lines 216 - 283, The switchClient
method can block forever while polling rc.nextEthClient, so add a bounded
retry/timeout and single-writer guarantee: introduce a context-aware timeout or
maxRetries and bail out with a clear error/log if the next node doesn't reach
target within that limit (use ctx with deadline or a local retry counter around
the loop that calls rc.nextEthClient.BlockNumber and HeaderByNumber), ensure
only one goroutine runs the switch by protecting the critical section with a
sync.Mutex or sync.Once (guard the polling loop and the rc.switched.Store(true)
path using the chosen sync primitive) and change the periodic status log call
(currently using rc.logger.Error for waiting state) to rc.logger.Warn or Info;
keep all other logging and the existing remote/block checks in switchClient.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @node/types/retryable_client.go:
- Around line 419-420: The log message in the HeaderByNumber retry path is
incorrect; update the rc.logger.Info call used when respErr != nil inside
HeaderByNumber to reflect the proper operation (e.g., change the message from
"failed to call BlockNumber" to "failed to call HeaderByNumber"), keeping the
same structured fields ("error", respErr) and using the existing rc.logger.Info
invocation so locating the change is done in the HeaderByNumber method where
respErr is checked.
- Around line 239-282: The polling loop that waits for next geth sync (using
rc.nextEthClient.BlockNumber and HeaderByNumber) lacks context cancellation and
a maximum timeout, risking indefinite hang; update the loop to select on three
cases: ctx.Done() to abort and return an error, a timeout channel (configurable
default e.g., 30 minutes) to abort with a clear log/error if exceeded, and
ticker.C for the existing polling path; ensure you propagate the cancel reason
(ctx.Err() or timeout) in the log and return, and keep the existing success path
that sets rc.switched.Store(true) and logs timing (use startTime) while removing
the unconditional blocking read from ticker.C.
🧹 Nitpick comments (1)
node/types/retryable_client.go (1)

140-162: Potential issue: Falling back next clients to current clients may mask configuration errors.

When nextAuthClient or nextEthClient is nil, both nextAuthClient and nextEthClient are set to the current clients. This means aClient() and eClient() will silently return the current client even after switched becomes true. Consider logging a warning or returning an error if switch is expected but next clients aren't configured.

Also, the condition checks if either is nil but sets both to fallback. If only one is nil (misconfiguration), the other valid next client is discarded.

Suggested improvement
 	// If next client is not configured, disable switch
 	if nextAuthClient == nil || nextEthClient == nil {
+		if nextAuthClient != nil || nextEthClient != nil {
+			logger.Error("Partial L2Next configuration: both nextAuthClient and nextEthClient must be provided or neither")
+		}
 		logger.Info("L2Next client not configured, switch disabled")
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 32512d0 and b5e977c.

📒 Files selected for processing (1)
  • node/types/retryable_client.go
🧰 Additional context used
🧬 Code graph analysis (1)
node/types/retryable_client.go (1)
tx-submitter/iface/client.go (1)
  • Client (14-23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Analyze (rust)
  • GitHub Check: Analyze (go)
🔇 Additional comments (7)
node/types/retryable_client.go (7)

34-61: LGTM!

The configuration types are well-structured for parsing the eth_config RPC response. Good use of pointer types for optional fields like MPTForkTime.


83-87: Consider fallback for UseZktrie similar to MPTForkTime.

UseZktrie is only fetched from resp.Current.Morph, but MPTForkTime has fallback logic to Next and Last configs. If Current is nil but Next/Last exists, UseZktrie will default to false. Verify this is the intended behavior.


114-121: LGTM!

Clean internal helper that delegates to FetchGethConfig.


195-207: LGTM!

Clean routing helpers with proper atomic read for the switch state.


249-270: Verify the sync condition remote+1 >= number.

The condition allows switching when the next geth is one block behind the target (remote == number - 1). This seems intentional but could cause issues if subsequent operations immediately require the target block from the next geth.

If the intent is to wait for next geth to have the target block, the condition should be remote >= number. Please verify this is the intended behavior.


285-303: LGTM!

Using current wall clock time for block assembly is appropriate since this creates a new block. The switch check and client routing are correctly implemented.


364-396: Verify: CommitBatch and AppendBlsSignature don't trigger client switch.

These methods use aClient() but don't call switchClient(). If called after the switch time but before any block operation triggers the switch, they'll use the old client.

This might be intentional (e.g., batches should complete on the current client before switching), but please verify this is the expected behavior during the upgrade window.

Comment on lines +239 to 282
for {
remote, err := rc.nextEthClient.BlockNumber(ctx)
if err != nil {
rc.logger.Error("Failed to get next geth block number",
"error", err,
"hint", "Please ensure next geth is running and accessible")
<-ticker.C
continue
}

if remote+1 >= number {
// Get next geth's latest block hash for debugging
targetHeader, headerErr := rc.nextEthClient.HeaderByNumber(ctx, big.NewInt(int64(remote)))
targetBlockHash := "unknown"
targetStateRoot := "unknown"
if headerErr == nil && targetHeader != nil {
targetBlockHash = targetHeader.Hash().Hex()
targetStateRoot = targetHeader.Root.Hex()
}

rc.switched.Store(true)
rc.logger.Info("========================================")
rc.logger.Info("GETH UPGRADE: Successfully switched!")
rc.logger.Info("========================================")
rc.logger.Info("Successfully switched to next client",
"remote_block", remote,
"target_block", number,
"target_block_hash", targetBlockHash,
"target_state_root", targetStateRoot,
"wait_duration", time.Since(startTime))
return
}

if time.Since(lastLogTime) >= 5*time.Second {
rc.logger.Error("!!! WAITING: Node BLOCKED waiting for next geth !!!",
"next_geth_block", remote,
"target_block", number,
"blocks_behind", number-remote-1,
"wait_duration", time.Since(startTime))
lastLogTime = time.Now()
}

<-ticker.C
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Blocking loop lacks timeout and context cancellation, risking indefinite hang.

The polling loop that waits for the next geth to sync has no exit conditions other than successful synchronization:

  1. No timeout: If the next geth is down or never catches up, this blocks forever.
  2. No context cancellation: The ctx parameter is passed but ctx.Done() is never checked. A cancelled context won't stop the loop.

This could cause the entire node to hang during an upgrade if the next geth is unavailable.

Proposed fix
 	for {
+		select {
+		case <-ctx.Done():
+			rc.logger.Error("Context cancelled while waiting for next geth to sync",
+				"error", ctx.Err(),
+				"wait_duration", time.Since(startTime))
+			return
+		default:
+		}
+
 		remote, err := rc.nextEthClient.BlockNumber(ctx)
 		if err != nil {
 			rc.logger.Error("Failed to get next geth block number",
 				"error", err,
 				"hint", "Please ensure next geth is running and accessible")
-			<-ticker.C
+			select {
+			case <-ctx.Done():
+				return
+			case <-ticker.C:
+			}
 			continue
 		}

Consider also adding a maximum wait timeout (e.g., configurable or a sensible default like 30 minutes) to prevent indefinite blocking in production.

🤖 Prompt for AI Agents
In @node/types/retryable_client.go around lines 239 - 282, The polling loop that
waits for next geth sync (using rc.nextEthClient.BlockNumber and HeaderByNumber)
lacks context cancellation and a maximum timeout, risking indefinite hang;
update the loop to select on three cases: ctx.Done() to abort and return an
error, a timeout channel (configurable default e.g., 30 minutes) to abort with a
clear log/error if exceeded, and ticker.C for the existing polling path; ensure
you propagate the cancel reason (ctx.Err() or timeout) in the log and return,
and keep the existing success path that sets rc.switched.Store(true) and logs
timing (use startTime) while removing the unconditional blocking read from
ticker.C.

Comment on lines 419 to 420
if respErr != nil {
rc.logger.Info("failed to call BlockNumber", "error", respErr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix incorrect log message.

The log message says "failed to call BlockNumber" but this is the HeaderByNumber method.

Proposed fix
-			rc.logger.Info("failed to call BlockNumber", "error", respErr)
+			rc.logger.Info("failed to call HeaderByNumber", "error", respErr)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if respErr != nil {
rc.logger.Info("failed to call BlockNumber", "error", respErr)
if respErr != nil {
rc.logger.Info("failed to call HeaderByNumber", "error", respErr)
🤖 Prompt for AI Agents
In @node/types/retryable_client.go around lines 419 - 420, The log message in
the HeaderByNumber retry path is incorrect; update the rc.logger.Info call used
when respErr != nil inside HeaderByNumber to reflect the proper operation (e.g.,
change the message from "failed to call BlockNumber" to "failed to call
HeaderByNumber"), keeping the same structured fields ("error", respErr) and
using the existing rc.logger.Info invocation so locating the change is done in
the HeaderByNumber method where respErr is checked.

corey added 3 commits January 13, 2026 11:18
# Conflicts:
#	bindings/go.mod
#	bindings/go.sum
#	contracts/go.mod
#	contracts/go.sum
#	node/go.mod
#	node/go.sum
#	ops/l2-genesis/go.mod
#	ops/l2-genesis/go.sum
#	ops/tools/go.mod
#	ops/tools/go.sum
#	oracle/go.mod
#	oracle/go.sum
#	token-price-oracle/go.mod
#	token-price-oracle/go.sum
#	tx-submitter/go.mod
#	tx-submitter/go.sum
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
node/core/executor.go (1)

87-99: Consider logging a warning for partial L2Next configuration.

If L2Next is configured but only one of EngineAddr or EthAddr is provided (non-empty), the partial configuration is silently ignored. This could mask a deployment misconfiguration.

💡 Suggested improvement
 	if config.L2Next != nil && config.L2Next.EngineAddr != "" && config.L2Next.EthAddr != "" {
 		aNextClient, err = authclient.DialContext(context.Background(), config.L2Next.EngineAddr, config.L2Next.JwtSecret)
 		if err != nil {
 			return nil, err
 		}
 		eNextClient, err = ethclient.Dial(config.L2Next.EthAddr)
 		if err != nil {
 			return nil, err
 		}
 		logger.Info("L2Next geth configured (upgrade switch enabled)", "engineAddr", config.L2Next.EngineAddr, "ethAddr", config.L2Next.EthAddr)
+	} else if config.L2Next != nil && (config.L2Next.EngineAddr != "" || config.L2Next.EthAddr != "") {
+		logger.Warn("L2Next partially configured - both EngineAddr and EthAddr are required for upgrade switch",
+			"engineAddr", config.L2Next.EngineAddr,
+			"ethAddr", config.L2Next.EthAddr)
 	} else {
 		logger.Info("L2Next geth not configured (no upgrade switch)")
 	}
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b5e977c and 5dea362.

⛔ Files ignored due to path filters (6)
  • bindings/go.sum is excluded by !**/*.sum
  • contracts/go.sum is excluded by !**/*.sum
  • node/go.sum is excluded by !**/*.sum
  • oracle/go.sum is excluded by !**/*.sum
  • token-price-oracle/go.sum is excluded by !**/*.sum
  • tx-submitter/go.sum is excluded by !**/*.sum
📒 Files selected for processing (10)
  • Makefile
  • bindings/go.mod
  • contracts/go.mod
  • node/core/config.go
  • node/core/executor.go
  • node/flags/flags.go
  • node/go.mod
  • oracle/go.mod
  • token-price-oracle/go.mod
  • tx-submitter/go.mod
🚧 Files skipped from review as they are similar to previous changes (5)
  • node/core/config.go
  • node/go.mod
  • contracts/go.mod
  • node/flags/flags.go
  • tx-submitter/go.mod
🧰 Additional context used
🧬 Code graph analysis (1)
node/core/executor.go (2)
tx-submitter/iface/client.go (1)
  • Client (14-23)
node/types/retryable_client.go (1)
  • NewRetryableClient (140-192)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: test
  • GitHub Check: check
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: check
  • GitHub Check: test
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (rust)
🔇 Additional comments (7)
Makefile (1)

3-3: LGTM - Version bump aligns with go.mod updates across modules.

The ETHEREUM_TARGET_VERSION update is consistent with the corresponding dependency updates in all module go.mod files within this PR.

token-price-oracle/go.mod (1)

12-12: LGTM - Dependency version aligned with project-wide update.

oracle/go.mod (1)

10-10: LGTM - Dependency version aligned with project-wide update.

bindings/go.mod (1)

7-7: LGTM - Dependency version aligned with project-wide update.

node/core/executor.go (3)

303-327: LGTM - Switch handling logic is well-designed.

The flow correctly:

  1. Checks if the switch should occur based on block timestamp
  2. Re-validates block height against the potentially new geth client
  3. Continues delivery if the block doesn't exist in the target geth after switch

This handles the edge case where a block exists in legacy geth (via P2P sync) but needs to be delivered to the new MPT geth after switching.


332-337: LGTM - Clear critical error messaging with actionable guidance.

The enhanced logging provides operators with clear context (consensus_block, geth_height) and an actionable directive when the node is blocked due to geth being behind.


367-377: LGTM - Enhanced error logging aids debugging during MPT transition.

The banner-style critical error with block details and the MPT compatibility hint will help operators quickly identify and resolve issues during the upgrade process.

continue
}

if remote+1 >= number {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove the debug code?

startTime := time.Now()
lastLogTime := startTime

for {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add maxWaitTime here? because this function will block the normal new block process

beforeSwitch := lastHeader.Time < d.switchTime
if (beforeSwitch && !d.useZktrie) || (!beforeSwitch && d.useZktrie) {
skipValidation = true
d.logger.Error("Root validation skipped during upgrade transition",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d.logger.Info("xxx")

return
}
}
d.logger.Info("root hash or withdrawal hash is not equal",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

root hash not equal should d.logger.Error("xxx")

Co-authored-by: corey <corey.zhang@bitget.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants