-
Notifications
You must be signed in to change notification settings - Fork 70
WIP:ZK to MPT #827
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
WIP:ZK to MPT #827
Conversation
📝 WalkthroughWalkthroughAdds optional L2Next geth endpoints and a runtime-controlled client switch driven by fetched geth upgrade configuration; wiring includes startup config fetch, dual clients (current/next), timestamp-aware switching, and upgrade-aware validation/logging changes across derivation and executor flows. Changes
Sequence Diagram(s)sequenceDiagram
participant Executor
participant RetryableClient
participant CurrentGeth
participant NextGeth
Note over Executor,RetryableClient: Startup
Executor->>RetryableClient: NewRetryableClient(currentAuth,currentEth,nextAuth?,nextEth?,switchTime)
RetryableClient->>CurrentGeth: FetchGethConfig() (via RPC)
CurrentGeth-->>RetryableClient: GethConfig{SwitchTime,UseZktrie}
Note over Executor,CurrentGeth: Normal operation (before switch)
Executor->>RetryableClient: RPC call (e.g., BlockNumber)
RetryableClient->>RetryableClient: aClient()/eClient() -> current
RetryableClient->>CurrentGeth: RPC request
CurrentGeth-->>RetryableClient: response
RetryableClient-->>Executor: response
Note over Executor,NextGeth: At or after switchTime
Executor->>RetryableClient: EnsureSwitched(timestamp, blockNumber)
RetryableClient->>NextGeth: Poll for sync (getBlockNumber) until synced
NextGeth-->>RetryableClient: currentBlockNumber
RetryableClient->>RetryableClient: Set switched=true
Note over Executor,NextGeth: Post-switch
Executor->>RetryableClient: RPC call
RetryableClient->>RetryableClient: aClient()/eClient() -> next
RetryableClient->>NextGeth: RPC request
NextGeth-->>RetryableClient: response
RetryableClient-->>Executor: response
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧹 Recent nitpick comments
📜 Recent review detailsConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (3)
🧰 Additional context used🧬 Code graph analysis (1)node/derivation/derivation.go (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
🔇 Additional comments (8)
✏️ Tip: You can disable this entire section by setting Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
node/derivation/derivation.go (1)
81-97: Legacy clients use the same endpoints as current clients.Both
laClient/leClientandaClient/eClientare dialed using identical configuration values (cfg.L2.EngineAddr,cfg.L2.EthAddr). This means the timestamp-based routing inRetryableClientwill always connect to the same backend, negating the dual-client purpose.If this is WIP, consider adding separate configuration fields (e.g.,
cfg.L2.LegacyEngineAddr,cfg.L2.LegacyEthAddr) for the legacy endpoints. Otherwise, clarify the TODO on line 81.node/core/executor.go (1)
74-92: Legacy clients use identical endpoints as current clients.Same issue as in
derivation.go:laClientandleClientdial the sameconfig.L2.EngineAddrandconfig.L2.EthAddrasaClientandeClient. The dual-client routing will effectively route to the same backend.Consider adding separate configuration for legacy endpoints, or clarify the TODO comment regarding the intended behavior.
🧹 Nitpick comments (2)
node/derivation/derivation.go (1)
141-141: Use the existing logger instead of creating a new one.A new
TMLoggeris created here, but theloggervariable is already available and configured with the "derivation" module context (line 115). Use the existing logger for consistency.- l2Client: types.NewRetryableClient(laClient, leClient, aClient, eClient, tmlog.NewTMLogger(tmlog.NewSyncWriter(os.Stdout))), + l2Client: types.NewRetryableClient(laClient, leClient, aClient, eClient, logger),node/types/retryable_client.go (1)
3-16: Import ordering: standard library imports should come first.Go convention places standard library imports before third-party imports. Consider reordering:
import ( "context" + "math/big" + "strings" + "time" + "github.com/cenkalti/backoff/v4" "github.com/morph-l2/go-ethereum" "github.com/morph-l2/go-ethereum/common" eth "github.com/morph-l2/go-ethereum/core/types" "github.com/morph-l2/go-ethereum/eth/catalyst" "github.com/morph-l2/go-ethereum/ethclient" "github.com/morph-l2/go-ethereum/ethclient/authclient" tmlog "github.com/tendermint/tendermint/libs/log" - "math/big" - "strings" - "time" )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
node/go.sumis excluded by!**/*.sum
📒 Files selected for processing (4)
node/core/executor.go(3 hunks)node/derivation/derivation.go(2 hunks)node/go.mod(1 hunks)node/types/retryable_client.go(11 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
node/derivation/derivation.go (1)
node/types/retryable_client.go (1)
NewRetryableClient(41-51)
node/core/executor.go (1)
node/types/retryable_client.go (2)
RetryableClient(29-37)NewRetryableClient(41-51)
node/types/retryable_client.go (1)
tx-submitter/iface/client.go (1)
Client(14-23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: check
- GitHub Check: test
- GitHub Check: check
- GitHub Check: test
- GitHub Check: test
- GitHub Check: check
- GitHub Check: Analyze (go)
- GitHub Check: Analyze (rust)
🔇 Additional comments (2)
node/types/retryable_client.go (1)
67-70:AssembleL2Blockuses wall-clock time for routing.The timestamp is derived from
time.Now().Unix()rather than from block data. This is appropriate for assembling new blocks, but be aware that:
- Clock drift could cause unexpected routing behavior
- This differs from other methods that extract timestamps from block/transaction data
Ensure this is the intended behavior for block assembly operations.
node/go.mod (1)
14-14: Go-ethereum dependency update is correctly applied.The new pseudo-version
v1.10.14-0.20251211075654-796834acba86is present in node/go.mod line 14 and the codebase already callsAssembleL2Blockwith the timestamp parameter as the second argument after context, confirming the dependency provides the necessary API changes for the RetryableClient functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (4)
node/types/retryable_client.go (3)
117-135: AssembleL2Block routing is fine, but timestamp is captured only once.The overall flow (compute a timestamp, call
switchClient, then go throughaClient) looks consistent with the MPT‑based routing.Note that
timestampis computed once beforebackoff.Retry; if retries happen after the intended MPT switch time, this call will still behave as “pre‑switch” becauseswitchClientalways sees the originaltimestamp. If you want retries to respect the current time, recomputing inside the retry closure would be safer.
193-225: Commit/Append paths rely on existing MPT state only.
CommitBatchandAppendBlsSignaturealways useaClient()without callingswitchClient. That means:
- They will hit the legacy client until some earlier call (e.g., New/Validate/Assemble) has performed the switch and set
mpt=true.- After that, they’ll always go to the new client.
If that ordering is intentional (i.e., commits never lead the cutover), this is fine; otherwise, consider explicitly invoking
switchClienthere as well, using an appropriate timestamp/height.
299-309: Retryable error policy is simple and consistent.Treating everything except
DiscontinuousBlockErroras retryable is straightforward and matches the comment above. Just be aware this interacts with the “infinite” backoff behavior discussed earlier.node/core/executor.go (1)
70-81: Clarify legacy vs new client endpoints; avoid duplicate dials if not needed.
laClient/leClientandaClient/eClientare all dialed against the sameEngineAddr/EthAddr. That’s OK functionally, but it:
- Doubles the TCP/JWT handshakes and keeps four separate connections open.
- Makes it unclear which pair is “legacy” vs “MPT” until configs diverge.
If the intent is to eventually point
laClient/leClientat different legacy endpoints, consider:
- Adding explicit config fields for legacy addresses now, or
- Reusing the same client instances until those endpoints exist, to avoid unnecessary duplication.
Also, you may want to replace the placeholder
// TODOwith a more descriptive comment about the planned split.Also applies to: 91-92
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
node/core/executor.go(2 hunks)node/types/retryable_client.go(11 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
node/types/retryable_client.go (2)
tx-submitter/iface/client.go (1)
Client(14-23)node/db/store.go (1)
Store(16-18)
node/core/executor.go (1)
node/types/retryable_client.go (1)
NewRetryableClient(44-55)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: check
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: Analyze (go)
- GitHub Check: Analyze (rust)
🔇 Additional comments (4)
node/types/retryable_client.go (3)
57-69: Client selection helpers look correct.
aClient/eClientcleanly encapsulate legacy vs new routing via the atomicmptflag, which keeps the call‑sites simple and centralizes switching behavior.
137-191: Validate/New/NewSafe L2 block flows look consistent with MPT switching.These methods all:
- Call
switchClientwith the block’s ownTimestamp/Number.- Then use
aClientso they follow the MPT/legacy split consistently.Given the constructor wiring, this cleanly moves traffic over once
mptis set; no issues from what’s visible here.
227-297: Read‑only ETH RPC wrappers correctly useeClient.
BlockNumber,HeaderByNumber,CallContract, andCodeAtall delegate througheClient(), so they respect the current MPT/legacy state without introducing fresh switching logic. The retry pattern matches the other methods.node/core/executor.go (1)
91-135: Executor wiring to the newRetryableClientlooks correct.
NewExecutornow constructsl2Clientviatypes.NewRetryableClient(laClient, leClient, aClient, eClient, config.Logger)and threads it through:
getNextL1MsgIndex- contract callers (
SequencerCaller,GovCaller,L2StakingCaller)- all existing executor methods (RequestBlockData, CheckBlockData, DeliverBlock, RequestHeight, etc.)
Struct initialization (
Executor{ l2Client: l2Client, ... }) matches this and there are no obvious miswirings here.
node/types/retryable_client.go
Outdated
| type RetryableClient struct { | ||
| authClient *authclient.Client | ||
| ethClient *ethclient.Client | ||
| b backoff.BackOff | ||
| logger tmlog.Logger | ||
| legacyAuthClient *authclient.Client | ||
| legacyEthClient *ethclient.Client | ||
| authClient *authclient.Client | ||
| ethClient *ethclient.Client | ||
| mptTime uint64 | ||
| mpt atomic.Bool | ||
| b backoff.BackOff | ||
| logger tmlog.Logger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make mptTime configurable instead of hard‑coding now+1h.
Right now mptTime is derived from time.Now().Add(time.Hour) at construction, which means:
- Each node instance may switch at a different wall‑clock time.
- Restarts will delay switching by another hour, regardless of the canonical MPT cutover time.
Before shipping, this should be driven by config (or passed explicitly into NewRetryableClient) so that routing behavior is deterministic and restart‑safe, not tied to process start time. The existing TODO hints at this, but it’s important for correctness of the migration window.
Also applies to: 43-55
🧩 Analysis chain
🏁 Script executed:
# First, find the retryable_client.go file
find . -name "retryable_client.go" -type fRepository: morph-l2/morph
Length of output: 90
🏁 Script executed:
# Read the entire file to understand the structure and all usages of rc.b
cat -n node/types/retryable_client.goRepository: morph-l2/morph
Length of output: 11289
🏁 Script executed:
# Check the backoff library being used to understand its concurrency characteristics
rg "backoff.Retry|backoff.BackOff|backoff.NewExponential" node/types/retryable_client.go -A 3 -B 1Repository: morph-l2/morph
Length of output: 3325
🌐 Web query:
cenkalti/backoff v4 ExponentialBackOff thread-safe concurrent
💡 Result:
Short answer: No — ExponentialBackOff in cenkalti/backoff v4 is explicitly documented as not thread‑safe for concurrent use. See the package docs and source which state "Note: Implementation is not thread‑safe." [1][2]
If you need concurrent use, either:
- create one ExponentialBackOff per goroutine/operation, or
- guard a single instance with your own synchronization (sync.Mutex), or
- upgrade to a different policy/implementation that documents thread safety. [1][2]
Sources:
[1] pkg.go.dev github.com/cenkalti/backoff/v4 (ExponentialBackOff docs).
[2] ExponentialBackOff implementation (v4) in repository/source.
🏁 Script executed:
# Check if there's any documentation or comments about thread safety in the codebase
rg -i "thread.*safe|concurrent|mutex|lock" node/types/retryable_client.goRepository: morph-l2/morph
Length of output: 2510
🏁 Script executed:
# Look for how this client is used - are methods called concurrently from goroutines?
rg -r "RetryableClient" --type go -A 3 | head -100Repository: morph-l2/morph
Length of output: 454
🏁 Script executed:
# Search for goroutine usage with RetryableClient to verify concurrent calls
rg "go.*rc\.|go.*client\.|RetryableClient" --type go -B 2 -A 2 | head -150Repository: morph-l2/morph
Length of output: 8154
🏁 Script executed:
# Check if RetryableClient methods might be called from multiple goroutines
fd "\.go$" --type f -x rg -l "RetryableClient" {} \; | head -10Repository: morph-l2/morph
Length of output: 174
Shared backoff.BackOff instance is not thread-safe and unsafe for concurrent use.
The single rc.b instance is shared across all retry calls in AssembleL2Block, ValidateL2Block, NewL2Block, NewSafeL2Block, CommitBatch, AppendBlsSignature, BlockNumber, HeaderByNumber, CallContract, and CodeAt methods. Since cenkalti/backoff/v4's ExponentialBackOff is explicitly documented as not thread-safe, and RetryableClient is used concurrently by sequencer, derivation, and executor subsystems, concurrent calls will cause data race conditions on the backoff state.
Create a new ExponentialBackOff per request instead:
type RetryableClient struct {
- b backoff.BackOff
+ // remove shared BackOff; create per-call instances instead
}
func (rc *RetryableClient) AssembleL2Block(...) (..., error) {
- if retryErr := backoff.Retry(func() error { ... }, rc.b); retryErr != nil {
+ b := backoff.NewExponentialBackOff()
+ if retryErr := backoff.Retry(func() error { ... }, b); retryErr != nil {
...
}
}Apply the same fix to all other methods using backoff.Retry.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In node/types/retryable_client.go around lines 31-39, the struct holds a shared
backoff.BackOff (rc.b) which is not thread-safe; replace uses of the shared rc.b
with a fresh ExponentialBackOff instance per request: inside each method that
currently calls backoff.Retry(rc.b, ...) create a new
backoff.NewExponentialBackOff(), configure any required parameters (or copy them
from rc.b if rc.b is only used as a template), and pass that new instance to
backoff.Retry (or backoff.WithContext if you need context cancellation). Apply
this same change to all other methods that call backoff.Retry so no goroutine
reuses a shared backoff state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
♻️ Duplicate comments (2)
node/types/retryable_client.go (2)
87-114:switchClientcan block indefinitely if context is canceled.The loop at lines 87-114 never checks
ctx.Done(), so it will continue polling forever if the context is canceled or the MPT node never catches up. This is a reliability concern that can cause goroutine leaks or hung operations.Apply this diff to respect context cancellation:
for { + select { + case <-ctx.Done(): + rc.logger.Error("Aborting MPT switch due to context cancellation", "error", ctx.Err()) + return + default: + } + remote, err := rc.ethClient.BlockNumber(ctx) if err != nil { rc.logger.Error("Failed to get MPT block number, retrying...", "error", err) - <-ticker.C + select { + case <-ctx.Done(): + rc.logger.Error("Aborting MPT switch due to context cancellation", "error", ctx.Err()) + return + case <-ticker.C: + } continue }
38-38: Sharedbackoff.BackOffinstance is not thread-safe.The
cenkalti/backoff/v4ExponentialBackOffis explicitly documented as not thread-safe. SinceRetryableClientmethods are called concurrently from sequencer, derivation, and executor subsystems, the sharedrc.bwill cause data races.Create a new backoff instance per call instead of sharing one:
type RetryableClient struct { legacyAuthClient *authclient.Client legacyEthClient *ethclient.Client authClient *authclient.Client ethClient *ethclient.Client mptTime uint64 mpt atomic.Bool - b backoff.BackOff logger tmlog.Logger }Then in each method:
func (rc *RetryableClient) AssembleL2Block(...) (..., error) { + b := backoff.NewExponentialBackOff() timestamp := uint64(time.Now().Unix()) - if retryErr := backoff.Retry(func() error { + if retryErr := backoff.Retry(func() error { ... - }, rc.b); retryErr != nil { + }, b); retryErr != nil {
🧹 Nitpick comments (2)
node/core/config.go (1)
31-31: Inconsistent JSON tag:l_2_legacyvsl2_legacy.The JSON tag
"l_2_legacy"uses underscores inconsistently compared to the derivation config which uses"l2_legacy". This could cause issues with configuration file parsing or serialization.- L2Legacy *types.L2Config `json:"l_2_legacy"` + L2Legacy *types.L2Config `json:"l2_legacy"`node/core/executor.go (1)
73-81: Consider adding validation for empty legacy endpoint addresses.If the legacy endpoints are not configured (empty strings), the dial calls will likely fail with confusing errors. Adding explicit validation would provide clearer error messages.
// legacy zk endpoint + if config.L2Legacy.EngineAddr == "" || config.L2Legacy.EthAddr == "" { + return nil, errors.New("L2Legacy endpoints (EngineAddr and EthAddr) must be configured") + } laClient, err := authclient.DialContext(context.Background(), config.L2Legacy.EngineAddr, config.L2Legacy.JwtSecret)
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
node/core/config.go(2 hunks)node/core/executor.go(2 hunks)node/derivation/config.go(2 hunks)node/derivation/derivation.go(2 hunks)node/flags/flags.go(2 hunks)node/types/retryable_client.go(11 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
node/derivation/config.go (2)
node/types/config.go (1)
L2Config(14-18)node/flags/flags.go (3)
MptTime(43-47)L2LegacyEthAddr(31-35)L2LegacyEngineAddr(37-41)
node/core/config.go (2)
node/types/config.go (1)
L2Config(14-18)node/flags/flags.go (3)
MptTime(43-47)L2LegacyEthAddr(31-35)L2LegacyEngineAddr(37-41)
node/core/executor.go (2)
node/types/retryable_client.go (1)
NewRetryableClient(44-55)node/flags/flags.go (1)
MptTime(43-47)
node/types/retryable_client.go (2)
tx-submitter/iface/client.go (1)
Client(14-23)node/db/store.go (1)
Store(16-18)
node/derivation/derivation.go (2)
node/types/retryable_client.go (1)
NewRetryableClient(44-55)node/flags/flags.go (1)
MptTime(43-47)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: test
- GitHub Check: check
- GitHub Check: check
- GitHub Check: test
- GitHub Check: test
- GitHub Check: check
- GitHub Check: Analyze (go)
- GitHub Check: Analyze (rust)
🔇 Additional comments (5)
node/types/retryable_client.go (2)
193-225:CommitBatchandAppendBlsSignaturedon't invokeswitchClient.These methods use
rc.aClient()but never callswitchClient(), unlikeValidateL2Block,NewL2Block, andNewSafeL2Block. If these operations occur during the MPT transition window, they may route to the wrong client.Is this intentional? If these operations should always use the current MPT state without waiting for sync, consider adding a comment explaining why. Otherwise, add
switchClientcalls with appropriate timestamp/block parameters.
227-297:BlockNumber,HeaderByNumber,CallContract, andCodeAtalways use current MPT state.These methods call
rc.eClient()withoutswitchClient(). They will use the legacy client untilmptis set totrueby another operation. This could cause inconsistent behavior if these are called before anyswitchClient-invoking method.Verify this is the intended behavior. If these should reflect the MPT state based on a block number, consider adding routing logic similar to the auth client methods.
node/core/executor.go (1)
73-91: Legacy client initialization depends onL2Legacybeing non-nil.The code at lines 74 and 78 accesses
config.L2Legacy.EngineAddrandconfig.L2Legacy.EthAddr. If the config initialization issue innode/core/config.gois not fixed, this will panic with a nil pointer dereference.Ensure that
config.L2Legacyis properly initialized before this code runs. Consider adding a nil check or ensuring the config struct always initializesL2Legacy:if config.L2Legacy == nil { return nil, errors.New("L2Legacy config is required") }node/derivation/derivation.go (1)
81-89: Legacy client initialization depends onL2Legacybeing non-nil.Similar to
executor.go, lines 82 and 86 accesscfg.L2Legacy.EngineAddrandcfg.L2Legacy.EthAddr. IfL2Legacyis not initialized inderivation/config.go, this will panic.Ensure the config initialization fix in
node/derivation/config.gois applied. Consider adding defensive validation:if cfg.L2Legacy == nil { return nil, errors.New("L2Legacy config is required for derivation") }node/flags/flags.go (1)
325-327: LGTM! Flags properly integrated into the CLI.The new flags are correctly added to the
Flagsslice and positioned logically with other L2-related configuration options.
node/flags/flags.go
Outdated
| MptTime = cli.Uint64Flag{ | ||
| Name: "upgrade.mptTime", | ||
| Usage: "Batch index at which the sequencers start to upgrade the batch format", | ||
| EnvVar: prefixEnvVar("MPT_TIME"), | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find the flags.go file and examine the MptTime flag and UpgradeBatchTime flag
find . -name "flags.go" -type f | grep -E "node.*flags" | head -5Repository: morph-l2/morph
Length of output: 79
🏁 Script executed:
# Check if node/flags/flags.go exists and read it
if [ -f "node/flags/flags.go" ]; then
echo "=== Lines 43-47 (MptTime) ==="
sed -n '43,47p' node/flags/flags.go
echo ""
echo "=== Lines 240-250 (around UpgradeBatchTime) ==="
sed -n '240,250p' node/flags/flags.go
else
echo "File not found. Searching for flags files:"
find . -name "flags.go" -type f 2>/dev/null | head -10
fiRepository: morph-l2/morph
Length of output: 608
Rename MptTime to clarify it represents a batch index, and distinguish its usage description from UpgradeBatchTime.
The flag name MptTime is misleading since it uses a "Time" suffix despite representing a batch index. Additionally, both MptTime and UpgradeBatchTime have identical usage text ("Batch index at which the sequencers start to upgrade the batch format"), which obscures their different purposes.
Consider:
- Renaming to reflect the batch index nature (e.g.,
MptUpgradeBatchIndexorMptSwitchBatchIndex) - Updating the usage text to clarify the MPT-specific distinction (e.g., "Batch index at which sequencers switch from legacy to MPT-based batch format")
🤖 Prompt for AI Agents
In node/flags/flags.go around lines 43-47, the MptTime identifier and its Usage
text are misleading because the suffix "Time" suggests a timestamp while the
flag actually holds a batch index and duplicates the Usage of UpgradeBatchTime;
rename the flag to something clear (e.g., MptUpgradeBatchIndex or
MptSwitchBatchIndex), update the Usage string to explicitly state it is an
MPT-specific batch index (for example: "Batch index at which sequencers switch
from legacy to MPT-based batch format"), change the EnvVar key passed to
prefixEnvVar to a matching name, and update every reference in the codebase to
the new identifier and env var name to keep behavior consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (2)
node/types/retryable_client.go (2)
38-38: Sharedbackoff.BackOffinstance is not thread-safe.The single
rc.binstance is shared across all retry calls. As noted in a previous review,cenkalti/backoff/v4'sExponentialBackOffis explicitly documented as not thread-safe. WithRetryableClientused concurrently by sequencer, derivation, and executor subsystems, concurrent calls can cause data races on the backoff state.Create a new
ExponentialBackOffper request instead:🔎 Proposed fix pattern
type RetryableClient struct { - b backoff.BackOff + // Removed shared BackOff; create per-call instances instead } func (rc *RetryableClient) AssembleL2Block(...) (..., error) { + b := backoff.NewExponentialBackOff() - if retryErr := backoff.Retry(func() error { ... }, rc.b); retryErr != nil { + if retryErr := backoff.Retry(func() error { ... }, b); retryErr != nil { ... } }Apply this pattern to all methods using
backoff.Retry.Also applies to: 150-150
101-133:switchClientignores context cancellation and can block indefinitely.The loop polls
rc.ethClient.BlockNumber(ctx)until MPT geth catches up, but never checksctx.Done(). If the context is canceled (e.g., node shutdown) or if MPT geth never catches up, this loop runs forever, blocking the calling goroutine.🔎 Proposed fix
for { + select { + case <-ctx.Done(): + rc.logger.Error("Aborting MPT switch due to context cancellation", "error", ctx.Err()) + return + default: + } + remote, err := rc.ethClient.BlockNumber(ctx) if err != nil { rc.logger.Error("Failed to get MPT geth block number", "error", err, "hint", "Please ensure MPT geth is running and accessible") - <-ticker.C + select { + case <-ctx.Done(): + rc.logger.Error("Aborting MPT switch due to context cancellation", "error", ctx.Err()) + return + case <-ticker.C: + } continue } // ... rest of loop ... - <-ticker.C + select { + case <-ctx.Done(): + rc.logger.Error("Aborting MPT switch due to context cancellation", "error", ctx.Err()) + return + case <-ticker.C: + } }Consider also adding a maximum wait duration to prevent indefinite blocking if MPT geth never syncs.
🧹 Nitpick comments (7)
ops/mpt-switch-test/README.md (1)
7-18: Add language specifiers to fenced code blocks.The documentation would benefit from explicit language identifiers for better rendering and syntax highlighting.
🔎 Recommended fixes
For the ASCII diagram (lines 7-18):
-``` +```text 升级前: 升级后 (交换):For the log output example (lines 89-98):
-``` +```log MPT switch time reached, MUST wait for MPT node to syncFor the file structure (lines 102-118):
-``` +```text mpt-switch-test/Also applies to: 89-98, 102-118
ops/mpt-switch-test/send-txs.sh (1)
7-7: Remove unused variable.
GETH_BINis defined but never referenced in the script. The script relies on thecastcommand instead.🔎 Proposed fix
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" -GETH_BIN="${SCRIPT_DIR}/bin/geth" ZK_GETH_HTTP="http://127.0.0.1:8545"node/core/executor.go (2)
300-305: Consider log verbosity for production.While the enhanced logging with "CRITICAL" prefix and actionable hints is useful during the MPT upgrade window, this verbose style may be noisy in steady-state production. Consider gating this enhanced messaging on a debug flag or removing the dramatic formatting after the upgrade stabilizes.
336-344: Consider consolidating error logging.The multiple separate
logger.Errorcalls with decorative separators could be consolidated into a single structured log entry. This would be cleaner and easier to parse programmatically.🔎 Suggested consolidation
- e.logger.Error("========================================") - e.logger.Error("CRITICAL: Failed to deliver block to geth!") - e.logger.Error("========================================") - e.logger.Error("failed to NewL2Block", - "error", err, - "block_number", l2Block.Number, - "block_timestamp", l2Block.Timestamp) - e.logger.Error("HINT: If this occurs after MPT upgrade, your geth node may not support MPT blocks. " + - "Please ensure you are running an MPT-compatible geth node.") + e.logger.Error("CRITICAL: Failed to deliver block to geth", + "error", err, + "block_number", l2Block.Number, + "block_timestamp", l2Block.Timestamp, + "hint", "If this occurs after MPT upgrade, ensure you are running an MPT-compatible geth node")ops/mpt-switch-test/test-mpt-switch-local.sh (3)
27-29: Remove unused variableMORPH_ROOT.Static analysis (SC2034) correctly identifies that
MORPH_ROOTis declared but never used in this script.🔎 Proposed fix
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" cd "$SCRIPT_DIR" -# 路径配置 -MORPH_ROOT="${SCRIPT_DIR}/../.." BIN_DIR="${SCRIPT_DIR}/bin"
250-278: Test environment uses permissive security settings.The Geth instances are configured with
--http.corsdomain="*",--http.vhosts="*",--ws.origins="*", and--authrpc.vhosts="*". This is acceptable for a local test environment but should never be used in production. Consider adding a comment to make this explicit.
196-196: Declare and assign separately to avoid masking return values (SC2155).Shell best practice is to separate
localdeclaration from command assignment to ensure the exit status of the command is not masked.🔎 Proposed fix
- local seq_node_id=$("$TENDERMINT_BIN" show-node-id --home "${SEQUENCER_NODE_DIR}") + local seq_node_id + seq_node_id=$("$TENDERMINT_BIN" show-node-id --home "${SEQUENCER_NODE_DIR}")This pattern should also be applied to similar occurrences at lines 376-377, 457, 469, 475, and 533.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
.gitignorenode/core/config.gonode/core/executor.gonode/types/retryable_client.goops/mpt-switch-test/README.mdops/mpt-switch-test/check-nodes.shops/mpt-switch-test/genesis-mpt.jsonops/mpt-switch-test/genesis-zk.jsonops/mpt-switch-test/local-test/engine-make-block-loop.shops/mpt-switch-test/local-test/engine-make-block.shops/mpt-switch-test/local-test/genesis.jsonops/mpt-switch-test/local-test/jwt-secret.txtops/mpt-switch-test/local-test/run-l2-exec-only.shops/mpt-switch-test/send-txs.shops/mpt-switch-test/test-mpt-switch-local.sh
✅ Files skipped from review due to trivial changes (1)
- ops/mpt-switch-test/local-test/jwt-secret.txt
🧰 Additional context used
🧬 Code graph analysis (3)
node/core/config.go (2)
node/types/config.go (1)
L2Config(14-18)node/flags/flags.go (3)
MptTime(43-47)L2LegacyEthAddr(31-35)L2LegacyEngineAddr(37-41)
node/core/executor.go (2)
node/types/retryable_client.go (1)
NewRetryableClient(44-55)node/flags/flags.go (1)
MptTime(43-47)
node/types/retryable_client.go (2)
tx-submitter/iface/client.go (1)
Client(14-23)node/db/store.go (1)
Store(16-18)
🪛 Gitleaks (8.30.0)
ops/mpt-switch-test/send-txs.sh
[high] 20-20: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
🪛 markdownlint-cli2 (0.18.1)
ops/mpt-switch-test/README.md
7-7: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
89-89: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
102-102: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Shellcheck (0.11.0)
ops/mpt-switch-test/check-nodes.sh
[warning] 21-21: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 36-36: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 40-40: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 41-41: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 42-42: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 47-47: Declare and assign separately to avoid masking return values.
(SC2155)
ops/mpt-switch-test/local-test/engine-make-block.sh
[warning] 45-45: In POSIX sh, arithmetic base conversion is undefined.
(SC3052)
ops/mpt-switch-test/local-test/run-l2-exec-only.sh
[warning] 10-10: Remove space after = if trying to assign a value (for empty string, use var='' ... ).
(SC1007)
ops/mpt-switch-test/local-test/engine-make-block-loop.sh
[warning] 8-8: Remove space after = if trying to assign a value (for empty string, use var='' ... ).
(SC1007)
ops/mpt-switch-test/send-txs.sh
[warning] 7-7: GETH_BIN appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 30-30: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 38-38: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 46-46: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 59-59: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 100-100: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 118-118: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 119-119: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 127-127: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 182-182: Quote this to prevent word splitting.
(SC2046)
[warning] 189-189: Quote this to prevent word splitting.
(SC2046)
ops/mpt-switch-test/test-mpt-switch-local.sh
[warning] 28-28: MORPH_ROOT appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 196-196: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 376-376: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 377-377: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 457-457: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 469-469: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 475-475: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 533-533: Declare and assign separately to avoid masking return values.
(SC2155)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Analyze (go)
- GitHub Check: Analyze (rust)
🔇 Additional comments (19)
.gitignore (1)
34-39: LGTM!The ignore patterns appropriately exclude test artifacts, logs, and generated data from the MPT switch test infrastructure.
ops/mpt-switch-test/check-nodes.sh (2)
14-54: LGTM!The node health check logic correctly handles error cases (lines 25-29) and provides clear status output. The shellcheck SC2155 warnings about combined declaration and assignment are acceptable here since error handling is present.
73-106: Well-designed stateRoot comparison logic.The script correctly selects the smaller block number for comparison (lines 78-82), avoiding false negatives when nodes are at different heights. The result messaging appropriately notes that differences are expected between ZK and MPT implementations.
ops/mpt-switch-test/local-test/engine-make-block.sh (1)
28-35: LGTM: JWT token generation follows Engine API spec.The HS256 JWT implementation correctly encodes the header and payload, computes the HMAC-SHA256 signature using the hex secret, and applies base64url encoding per the Engine API authentication specification.
ops/mpt-switch-test/local-test/run-l2-exec-only.sh (2)
44-67: Verify that this configuration is restricted to isolated test environments.The geth configuration includes security-sensitive settings appropriate only for local testing:
- Line 54: Exposes
personalAPI enabling account/key management- Line 64:
--allow-insecure-unlockpermits account unlocking over HTTP- Lines 50-53: Binds to
0.0.0.0with unrestricted CORSEnsure this script is never used in production or network-accessible environments.
10-10: Shellcheck SC1007 is a false positive.The
CDPATH= cdpattern (line 10) is a well-known idiom to prevent CDPATH environment variable interference when resolving script directories. The space after=is intentional and correct.ops/mpt-switch-test/send-txs.sh (1)
18-22: LGTM: Standard test account key.Lines 19-22 correctly document that this is the well-known Hardhat/Foundry test account private key. The Gitleaks alert is expected and acceptable for test infrastructure. The key corresponds to the first account in the standard test mnemonic and should never hold real funds.
ops/mpt-switch-test/local-test/engine-make-block-loop.sh (2)
13-22: LGTM: Resilient loop design with proper error handling.The script correctly disables
set -e(line 15) to prevent loop termination on individual failures, captures the exit code (line 17), and provides observability via timestamped logging (line 20). This ensures continuous operation for the test harness.
8-8: Shellcheck SC1007 is a false positive.The
CDPATH= cdpattern is a standard idiom to neutralize the CDPATH environment variable when resolving script directories. The syntax is correct.node/core/config.go (2)
31-33: LGTM: Configuration fields properly initialized.The new
L2LegacyandMptTimefields are correctly declared and initialized. Line 47 ensuresL2Legacyis initialized inDefaultConfig(), addressing the previously reported nil pointer dereference issue. The initialization pattern is consistent with the existingL2field.Also applies to: 47-47
128-133: LGTM: Legacy L2 configuration wiring is correct.The legacy endpoint configuration properly:
- Retrieves values from the new CLI flags (lines 128-129)
- Assigns them to the initialized
L2Legacystruct (lines 130-131)- Reuses the same JWT secret as the primary L2 endpoint (line 132), which is appropriate for local testing
- Populates
MptTimefrom the corresponding flag (line 133)node/core/executor.go (2)
73-91: LGTM: Legacy L2 client initialization and updated RetryableClient construction.The legacy ZK endpoint clients (
laClient,leClient) are properly initialized with context and error handling. The updatedNewRetryableClientcall correctly passes all six parameters includingconfig.MptTimeandconfig.Logger, aligning with the constructor signature innode/types/retryable_client.go.
287-296: LGTM: EnsureSwitched call for P2P-synced blocks.Good addition. The
EnsureSwitchedcall ensures that sentry nodes (which receive blocks via P2P sync rather thanNewL2Block) properly trigger the MPT client switch when the upgrade timestamp is reached. The inline comment clearly documents this edge case.ops/mpt-switch-test/test-mpt-switch-local.sh (2)
1-26: Well-structured test harness for MPT switch testing.The script provides a comprehensive local test environment with clear architecture documentation, proper lifecycle management (start/stop/clean), and helpful monitoring utilities. The inline architecture diagram (lines 4-22) is particularly useful for understanding the test setup.
340-342: Sentry node configuration with--dev-sequencerflag is intentional for this test setup.The
--dev-sequencerflag, defined innode/flags/flags.go, is designed "explicitly specify that running as a sequencer. If it enables, the tendermint validator/batch params won't be changed no matter what happens to staking/gov contacts. Only use in dev/test mode." Both the sequencer and sentry nodes use this flag in this test to avoid relying on L1 staking/governance contract interactions, which is appropriate for an isolated dev/test environment (test-mpt-switch-local.sh). The flag disables contract-based parameter updates rather than forcing sequencer behavior. The sentry node still functions as a follower node that syncs blocks, as documented at line 317.Likely an incorrect or invalid review comment.
node/types/retryable_client.go (4)
31-55: LGTM: Updated struct and constructor with MPT switch support.The
RetryableClientstruct now properly includes legacy and MPT clients, withmptTimepassed as a constructor parameter (addressing the previous review concern). Theatomic.Boolformptstate is appropriate for concurrent access.
57-69: LGTM: Client accessor methods.The
aClient()andeClient()methods provide clean routing logic based on the atomicmptflag. The use ofatomic.Boolensures thread-safe reads.
136-154: Verify timestamp source forAssembleL2Block.
AssembleL2Blockusestime.Now().Unix()(line 137) for the timestamp passed toswitchClientand the RPC call. However, the actual block timestamp will be determined by the sequencer/geth. If there's clock drift or the assembled block gets a different timestamp, the switch decision may be made prematurely or delayed.Consider whether this timestamp should come from a more authoritative source, or if using current wall-clock time is acceptable for the switch check.
212-244:CommitBatchandAppendBlsSignaturedon't trigger MPT switch check.Unlike block-related methods,
CommitBatchandAppendBlsSignaturedon't callswitchClient. This may be intentional since batch operations might not have a timestamp context, but verify this is the expected behavior—especially if batches could span the MPT upgrade boundary.
| @@ -0,0 +1,69 @@ | |||
| #!/bin/sh | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix shebang or use POSIX-compatible arithmetic.
Line 1 declares #!/bin/sh but line 45 uses bash-specific arithmetic base conversion (16#$CUR_NOX), which is not POSIX-compliant. This will fail on systems where /bin/sh is dash, ash, or another POSIX shell.
🔎 Recommended fix (change shebang to bash)
-#!/bin/sh
+#!/bin/bash
# Assemble and insert a block via Engine API (authrpc :8551, JWT-based)Alternatively, use POSIX-compatible arithmetic:
CUR_NOX=${CUR_HEX#0x}
-NEXT_DEC=$((16#$CUR_NOX + 1))
+NEXT_DEC=$(printf '%d' "0x$CUR_NOX")
+NEXT_DEC=$((NEXT_DEC + 1))
NEXT_HEX=$(printf '0x%X' "$NEXT_DEC")Also applies to: 44-46
🤖 Prompt for AI Agents
In ops/mpt-switch-test/local-test/engine-make-block.sh around lines 1 and 44-46,
the script uses bash-specific arithmetic (16#$CUR_NOX) while declaring
#!/bin/sh; either update the shebang to a bash interpreter (e.g., #!/usr/bin/env
bash) so the existing base-16 arithmetic works, or rewrite the conversion to a
POSIX-compatible form (for example use printf/awk to convert hex to decimal or
use a portable printf lookup) and keep the POSIX #!/bin/sh; ensure the chosen
approach is used consistently and test on a system where /bin/sh is dash.
ops/mpt-switch-test/send-txs.sh
Outdated
| ;; | ||
| stop) | ||
| if [ -f "$PID_FILE" ]; then | ||
| kill $(cat "$PID_FILE") 2>/dev/null && rm "$PID_FILE" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Quote command substitutions to prevent word splitting.
Lines 182 and 189 use unquoted $(cat "$PID_FILE") which can cause unexpected behavior if the PID file contains whitespace or special characters.
🔎 Proposed fix
stop)
if [ -f "$PID_FILE" ]; then
- kill $(cat "$PID_FILE") 2>/dev/null && rm "$PID_FILE"
+ kill "$(cat "$PID_FILE")" 2>/dev/null && rm "$PID_FILE"
log_success "Stopped"
else
log_info "Not running" status)
- if [ -f "$PID_FILE" ] && kill -0 $(cat "$PID_FILE") 2>/dev/null; then
- log_success "Running (PID: $(cat $PID_FILE))"
+ if [ -f "$PID_FILE" ] && kill -0 "$(cat "$PID_FILE")" 2>/dev/null; then
+ log_success "Running (PID: $(cat "$PID_FILE"))"
elseAlso applies to: 189-189
🧰 Tools
🪛 Shellcheck (0.11.0)
[warning] 182-182: Quote this to prevent word splitting.
(SC2046)
🤖 Prompt for AI Agents
In ops/mpt-switch-test/send-txs.sh around lines 182 and 189, the command
substitutions used to read the PID file are unquoted which can cause
word-splitting if the PID file contains whitespace or special characters; update
those lines to quote the command substitution (or better, read the PID into a
variable first and then quote the variable) so the PID is treated as a single
token when passed to kill and file operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (1)
ops/mpt-switch-test/send-txs.sh (1)
182-182: Quote command substitutions to prevent word splitting.This was previously flagged. The unquoted
$(cat "$PID_FILE")can cause unexpected behavior.Also applies to: 189-190
🧹 Nitpick comments (7)
ops/mpt-switch-test/README.md (1)
7-18: Consider adding language identifiers to fenced code blocks.Per markdownlint MD040, fenced code blocks should specify a language for proper syntax highlighting. Use
textorplaintextfor ASCII diagrams and log output.🔎 Proposed fix for line 7
-``` +```text Before upgrade: After upgrade (swap):🔎 Proposed fix for line 89
-``` +```text MPT switch time reached, MUST wait for MPT node to sync🔎 Proposed fix for line 102
-``` +```text mpt-switch-test/Also applies to: 89-98, 102-117
ops/mpt-switch-test/send-txs.sh (4)
7-8: Remove unusedGETH_BINvariable and verify port assignment.
GETH_BINis declared but never used. Additionally,ZK_GETH_HTTPpoints to port 9545, but according totest-mpt-switch-local.sh, port 9545 is the MPT Geth port (ZK Geth uses 8545). This naming inconsistency could cause confusion.🔎 Proposed fix
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" -GETH_BIN="${SCRIPT_DIR}/bin/geth" -ZK_GETH_HTTP="http://127.0.0.1:9545" +# Default to MPT Geth port (9545) - can be overridden +GETH_HTTP="${GETH_HTTP:-http://127.0.0.1:9545}"Then update references from
$ZK_GETH_HTTPto$GETH_HTTPthroughout the script.
77-89: Remove unusedsend_tx_with_castfunction.This function is defined but never called. The
mainloop (lines 124-145) contains its own inlinecast sendimplementation. Either remove this dead code or refactor the main loop to use this function.🔎 Proposed fix - Option 1: Remove dead code
-# Send transaction using cast (if foundry is installed) -send_tx_with_cast() { - if command -v cast &> /dev/null; then - cast send --private-key "$PRIVATE_KEY" \ - --rpc-url "$ZK_GETH_HTTP" \ - "$TO_ADDRESS" \ - --value 1wei \ - --gas-price 0 \ - 2>/dev/null - return $? - fi - return 1 -} -
52-75: Remove unusedsend_txfunction.Similar to
send_tx_with_cast, this function usingeth_sendTransactionis defined but never called in the main loop. The main loop exclusively usescast send.
103-111: Balance check may incorrectly reject valid accounts.The condition
[ "$balance" == "0x0" ]only catches exact zero balance, butcheck_balancereturns the raw hex string which might be empty or malformed on RPC errors. The-z "$balance"check helps, but consider also validating thatbalancestarts with0xto distinguish between zero balance and RPC failure.🔎 Proposed improvement
if [ "$balance" == "0x0" ] || [ -z "$balance" ]; then + if [ -z "$balance" ]; then + echo "Warning: Could not retrieve account balance (RPC error?)" + else + echo "Warning: Account balance is 0!" + fi echo "" - echo "Warning: Account balance is 0 or cannot be retrieved!"ops/mpt-switch-test/test-mpt-switch-local.sh (2)
28-28: Remove unusedMORPH_ROOTvariable.This variable is declared but never referenced in the script.
🔎 Proposed fix
# Path configuration -MORPH_ROOT="${SCRIPT_DIR}/../.." BIN_DIR="${SCRIPT_DIR}/bin"
376-377: Consider separating variable declaration and assignment.Multiple lines combine
localdeclaration with command substitution, which can mask return values. While this is low-risk for a test script, separating them improves debuggability.🔎 Example fix for lines 376-377
for pid_file in "$SENTRY_NODE_PID" "$SEQUENCER_NODE_PID" "$MPT_GETH_PID" "$ZK_GETH_PID"; do if [ -f "$pid_file" ]; then - local pid=$(cat "$pid_file") - local name=$(basename "$pid_file" .pid) + local pid + local name + pid=$(cat "$pid_file") + name=$(basename "$pid_file" .pid)Also applies to: 457-457, 469-469, 475-475, 533-533
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
node/core/sequencers.goops/mpt-switch-test/README.mdops/mpt-switch-test/send-txs.shops/mpt-switch-test/test-mpt-switch-local.sh
✅ Files skipped from review due to trivial changes (1)
- node/core/sequencers.go
🧰 Additional context used
🪛 Gitleaks (8.30.0)
ops/mpt-switch-test/send-txs.sh
[high] 20-20: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
🪛 markdownlint-cli2 (0.18.1)
ops/mpt-switch-test/README.md
7-7: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
89-89: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
102-102: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Shellcheck (0.11.0)
ops/mpt-switch-test/send-txs.sh
[warning] 7-7: GETH_BIN appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 30-30: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 38-38: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 46-46: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 59-59: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 100-100: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 118-118: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 119-119: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 127-127: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 182-182: Quote this to prevent word splitting.
(SC2046)
[warning] 189-189: Quote this to prevent word splitting.
(SC2046)
ops/mpt-switch-test/test-mpt-switch-local.sh
[warning] 28-28: MORPH_ROOT appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 196-196: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 376-376: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 377-377: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 457-457: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 469-469: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 475-475: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 533-533: Declare and assign separately to avoid masking return values.
(SC2155)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Analyze (go)
- GitHub Check: Analyze (rust)
🔇 Additional comments (5)
ops/mpt-switch-test/send-txs.sh (1)
18-22: Test private key is acceptable, but clarify the comment.This is Foundry's default test account #0 private key, which is fine for local testing. The comment already notes this, which is good practice.
ops/mpt-switch-test/test-mpt-switch-local.sh (4)
253-267: Permissive CORS and vhosts settings are acceptable for local testing.The
--http.corsdomain="*",--http.vhosts="*",--ws.origins="*", and--authrpc.vhosts="*"settings are intentionally permissive for a local test environment. This is appropriate given the test-only context.
93-121: Well-structured dependency checking.The
check_dependenciesfunction clearly validates required binaries and provides actionable error messages. Good implementation.
525-606: Main function and command dispatch are well-organized.The command-line interface with start/stop/clean/status/monitor/logs/help commands is cleanly implemented with good user feedback.
340-342: Clarify the node mode flag for the sentry node.The sentry node is explicitly described as a "non-validator (follower node)" but is started with
--dev-sequencer. The flag's documentation states it "explicitly specify[ies] that running as a sequencer," which semantically contradicts the sentry node's role. Either add a comment explaining why sentry nodes use this flag in dev/test mode, or consider if a different flag or approach is needed.
| export MORPH_NODE_L2_ENGINE_RPC="http://127.0.0.1:${MPT_GETH_AUTH_PORT}" | ||
| export MORPH_NODE_L2_ENGINE_AUTH="$jwt_file" | ||
| export MORPH_NODE_MPT_TIME="$mpt_time" | ||
| export MORPH_NODE_L1_ETH_RPC="${L1_ETH_RPC:-http://127.0.0.1:9545}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
L1_ETH_RPC defaults to port 9545, which conflicts with MPT Geth.
Both sequencer and sentry nodes default L1_ETH_RPC to http://127.0.0.1:9545, but port 9545 is already assigned to MPT Geth's HTTP endpoint. This will likely cause issues unless L1_ETH_RPC is explicitly set by the caller.
🔎 Proposed fix
Consider using a different default port or requiring explicit configuration:
- export MORPH_NODE_L1_ETH_RPC="${L1_ETH_RPC:-http://127.0.0.1:9545}"
+ export MORPH_NODE_L1_ETH_RPC="${L1_ETH_RPC:-http://127.0.0.1:8545}"Or document that L1_ETH_RPC must be set before running the test.
Also applies to: 335-335
🤖 Prompt for AI Agents
In ops/mpt-switch-test/test-mpt-switch-local.sh around lines 300 and 335, the
script defaults L1_ETH_RPC to http://127.0.0.1:9545 which conflicts with MPT
Geth’s HTTP endpoint; update both occurrences to avoid the port collision by
either choosing a non-conflicting default port (e.g., 8545 or another free port)
or require the caller to set L1_ETH_RPC and fail fast with a clear error if it
is unset, and add a short comment or doc note in the script explaining that
L1_ETH_RPC must be explicitly configured to avoid conflicts with MPT Geth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
node/types/retryable_client.go (1)
225-243: Use the actual block timestamp from the RPC response for MPT switching instead of system time.In
AssembleL2Block, the timestamp passed toswitchClientat line 228 is generated fromtime.Now().Unix()at the moment of the call. However, this approach is inconsistent with other RPC methods (ValidateL2Block,NewL2Block,NewSafeL2Block) which all use the actual block timestamp for the switch decision. If the system time differs significantly from the assembled block's timestamp, the MPT upgrade switch will trigger at the wrong time. The timestamp should be obtained from the RPC response after the block is assembled rather than generated before the call.
🤖 Fix all issues with AI agents
In @node/core/config.go:
- Line 31: The JSON tag for the L2Legacy struct field is inconsistent with the
existing L2 field; update the struct tag on L2Legacy (symbol: L2Legacy) from
`json:"l_2_legacy"` to `json:"l2_legacy"` so it follows the existing `l2` style
and maintains consistent JSON naming conventions with the L2 field (symbol: L2).
- Around line 127-131: The L2Legacy CLI flags are assigned unconditionally
(l2LegacyEthAddr, l2LegacyEngineAddr) and may be empty; add validation before
assigning to c.L2Legacy.EthAddr and c.L2Legacy.EngineAddr: check that the flag
values are non-empty (or that the flags were set) and return/configure an error
if required, or skip assignment when unset; also ensure JwtSecret (secret)
handling remains consistent with this validation so legacy client initialization
cannot receive empty addresses.
In @ops/mpt-switch-test/README.md:
- Around line 39-40: The README mentions different genesis filenames; unify
references so operators aren’t misled: pick the actual file name present (e.g.,
genesis-l2.json) and update all occurrences in the “File Structure” section and
elsewhere (including the noted block around lines 102–123) to that chosen name,
or alternatively rename the files in the repo to match the README
(genesis-zk.json / genesis-mpt.json) and update any examples, scripts, and docs
(search for genesis-l2.json, genesis-zk.json, genesis-mpt.json) so all
references are consistent.
- Around line 7-18: The markdown has fenced code blocks without language tags
causing markdownlint MD040 failures; update each ASCII diagram block (e.g., the
block containing "Before upgrade: After upgrade (swap):"
and the box art) to use ```text and any shell/snippet blocks to use ```bash, and
apply the same change to the other affected blocks noted (the blocks around the
referenced diagrams and shell snippets) so all fenced blocks include the
appropriate language tag.
In @ops/mpt-switch-test/send-txs.sh:
- Around line 24-29: The script currently hard-codes PRIVATE_KEY (and related
test addresses) which trips secret scanners; change the script to read
PRIVATE_KEY from an environment variable (e.g., ${PRIVATE_KEY}) or an untracked
local file (like .env or .secret) and exit with a clear error if it is not set;
keep FROM_ADDRESS and TO_ADDRESS configurable via env vars too (e.g.,
${FROM_ADDRESS:-...} / ${TO_ADDRESS:-...}) so defaults can be provided without
committing secrets, and ensure no default plaintext private key remains in the
code or repo.
- Around line 178-204: The bg/start/stop/status PID handling should avoid
word-splitting and ensure the .testdata dir exists: create the directory before
launching (e.g., mkdir -p "${SCRIPT_DIR}/.testdata"), when starting use nohup
... & and capture the PID into a variable (pid=$!; echo "$pid" > "$PID_FILE")
and log that variable, and everywhere you read the PID use a quoted expansion or
a read into a variable (pid="$(cat "$PID_FILE")") and call kill with the quoted
pid (kill "$pid" 2>/dev/null) and kill -0 "$pid" for status checks; also check
the PID file is non-empty before attempting kill and remove the PID file only
after a successful kill.
In @ops/mpt-switch-test/test-mpt-switch-local.sh:
- Around line 390-396: The clean_data() function currently removes
"${SCRIPT_DIR}/genesis-l2.json" which is incorrect for this script that uses
genesis-zk.json and genesis-mpt.json; update the cleanup to remove the actual
files used (e.g., genesis-zk.json and genesis-mpt.json) or remove the
genesis-l2.json deletion entirely and instead delete the correct filenames to
align with the README/scripts naming; modify the rm -f target in clean_data() to
reference the correct genesis filenames (or make it a noop for genesis-l2.json)
so cleanup matches the files created/used by this script.
- Around line 233-279: The start_geth function currently exposes admin/debug
JSON-RPC on 0.0.0.0 with wildcard CORS/VHosts; change the safe defaults to bind
HTTP/WS to loopback (replace --http.addr=0.0.0.0 and --ws.addr=0.0.0.0 with
127.0.0.1), restrict --http.api and --ws.api to non-privileged sets (remove
debug, admin, morph, engine, etc.), and set --http.corsdomain and --http.vhosts
to be restrictive (avoid "*"); keep the ability to override these defaults via
the extra_args parameter so tests can opt into broader exposure when explicitly
required.
🧹 Nitpick comments (5)
ops/mpt-switch-test/README.md (1)
43-45: Avoid user-specific absolute paths in usage examples.The
cd /Users/...path is not portable; use a repo-relative example (or$MORPH_ROOT).ops/mpt-switch-test/send-txs.sh (1)
7-8: Trim dead code or make the fallback path real (GETH_BIN + eth_sendTransaction helpers).Right now the script effectively requires
cast(Line 131+) and exits otherwise, whileGETH_BIN,send_tx(), andsend_tx_with_cast()aren’t used in the main loop. Either remove them or wire a real fallback so the script works without Foundry.Also applies to: 58-96, 130-154
ops/mpt-switch-test/test-mpt-switch-local.sh (1)
233-279: Quoting/word-splitting hardening for bash args + readiness checks.
extra_argsas a string (Line 243-275) can behave unexpectedly when flags contain spaces; alsowait_for_gethchecks only TCP/HTTP, not JSON-RPC readiness (e.g.,eth_blockNumber).Also applies to: 350-369
node/derivation/derivation.go (1)
141-141: Consider using the existing logger instance for consistency.The code creates a new logger with
tmlog.NewTMLogger(tmlog.NewSyncWriter(os.Stdout))instead of using the existing logger configured at line 115. This may bypass any custom logging configuration.♻️ Proposed fix to use existing logger
- l2Client: types.NewRetryableClient(laClient, leClient, aClient, eClient, cfg.L2Legacy.EthAddr, tmlog.NewTMLogger(tmlog.NewSyncWriter(os.Stdout))), + l2Client: types.NewRetryableClient(laClient, leClient, aClient, eClient, cfg.L2Legacy.EthAddr, logger),node/types/retryable_client.go (1)
58-98: Consider adding a timeout for the RPC call to prevent initialization hangs.The
fetchMPTForkTimefunction makes an RPC call without a timeout context, which could cause initialization to hang if the legacy geth is unresponsive.♻️ Proposed fix to add timeout
func fetchMPTForkTime(rpcURL string, logger tmlog.Logger) (uint64, error) { - client, err := rpc.Dial(rpcURL) + ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) + defer cancel() + + client, err := rpc.DialContext(ctx, rpcURL) if err != nil { return 0, fmt.Errorf("failed to connect to geth: %w", err) } defer client.Close() var result json.RawMessage - if err := client.Call(&result, "eth_config"); err != nil { + if err := client.CallContext(ctx, &result, "eth_config"); err != nil { return 0, fmt.Errorf("eth_config call failed: %w", err) }
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (7)
bindings/go.sumis excluded by!**/*.sumcontracts/go.sumis excluded by!**/*.sumops/l2-genesis/go.sumis excluded by!**/*.sumops/tools/go.sumis excluded by!**/*.sumoracle/go.sumis excluded by!**/*.sumtoken-price-oracle/go.sumis excluded by!**/*.sumtx-submitter/go.sumis excluded by!**/*.sum
📒 Files selected for processing (17)
bindings/go.modcontracts/go.modgo-ethereumnode/core/config.gonode/core/executor.gonode/derivation/config.gonode/derivation/derivation.gonode/flags/flags.gonode/types/retryable_client.goops/l2-genesis/go.modops/mpt-switch-test/README.mdops/mpt-switch-test/send-txs.shops/mpt-switch-test/test-mpt-switch-local.shops/tools/go.modoracle/go.modtoken-price-oracle/go.modtx-submitter/go.mod
✅ Files skipped from review due to trivial changes (2)
- go-ethereum
- oracle/go.mod
🚧 Files skipped from review as they are similar to previous changes (2)
- node/flags/flags.go
- node/derivation/config.go
🧰 Additional context used
🧬 Code graph analysis (4)
node/derivation/derivation.go (1)
node/types/retryable_client.go (1)
NewRetryableClient(114-133)
node/core/executor.go (1)
node/types/retryable_client.go (1)
NewRetryableClient(114-133)
node/types/retryable_client.go (1)
tx-submitter/iface/client.go (1)
Client(14-23)
node/core/config.go (2)
node/types/config.go (1)
L2Config(14-18)node/flags/flags.go (2)
L2LegacyEthAddr(31-35)L2LegacyEngineAddr(37-41)
🪛 Gitleaks (8.30.0)
ops/mpt-switch-test/send-txs.sh
[high] 26-26: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
🪛 markdownlint-cli2 (0.18.1)
ops/mpt-switch-test/README.md
7-7: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
89-89: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
102-102: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
171-171: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Shellcheck (0.11.0)
ops/mpt-switch-test/test-mpt-switch-local.sh
[warning] 28-28: MORPH_ROOT appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 196-196: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 377-377: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 378-378: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 458-458: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 470-470: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 476-476: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 534-534: Declare and assign separately to avoid masking return values.
(SC2155)
ops/mpt-switch-test/send-txs.sh
[warning] 7-7: GETH_BIN appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 36-36: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 44-44: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 52-52: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 65-65: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 107-107: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 125-125: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 126-126: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 134-134: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 192-192: Quote this to prevent word splitting.
(SC2046)
[warning] 199-199: Quote this to prevent word splitting.
(SC2046)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Analyze (rust)
- GitHub Check: Analyze (go)
🔇 Additional comments (14)
ops/mpt-switch-test/test-mpt-switch-local.sh (2)
315-348: Verify sentry node mode/flags (currently started with--dev-sequencer).The comments describe sentry as a non-validator follower, but it’s started with
--dev-sequencer(Line 342), which sounds like it enables block production behavior. Please confirm this is intended, or switch to the correct follower/sentry mode if one exists.
559-567: Confirm geth supports--override.mptforktimefor your shipped binary.The whole test hinges on
--override.mptforktime=$mpt_timebeing a valid flag in thebin/gethyou expect operators to use.node/core/config.go (1)
46-46: LGTM!The L2Legacy initialization follows the same pattern as the L2 configuration at line 45.
tx-submitter/go.mod (1)
12-12: Verify the commit796834acba86from morph-l2/go-ethereum and confirm stability.The go-ethereum dependency v1.10.14-0.20251211075654-796834acba86 is consistently pinned across 6+ modules in the repository (tx-submitter, ops/tools, token-price-oracle, bindings, ops/l2-genesis, node, oracle, contracts). However, the underlying commit cannot be verified via public repository access. Confirm that commit 796834acba86 exists in your internal morph-l2/go-ethereum fork, is stable, and free from security issues.
node/derivation/derivation.go (1)
81-89: LGTM! Legacy client initialization is correct.The legacy L2 client initialization properly handles errors and uses appropriate contexts for setup.
node/core/executor.go (4)
73-81: LGTM! Legacy client initialization is correct.The initialization follows the same pattern as in derivation.go with appropriate error handling.
91-91: LGTM! RetryableClient construction is correct.The construction properly passes all legacy and current clients, along with the configured logger.
317-321: LGTM! Enhanced CRITICAL logging improves observability.The enhanced logging clearly indicates when geth is behind and provides actionable guidance.
352-360: LGTM! Enhanced error logging improves debuggability.The multi-line CRITICAL logging with context and hints significantly improves the ability to diagnose NewL2Block failures, especially MPT-related issues.
node/types/retryable_client.go (3)
100-133: LGTM! RetryableClient refactor is well-designed.The expanded struct properly handles legacy and current clients with atomic state management for MPT switching. The constructor gracefully handles errors in MPT fork time fetching.
135-147: LGTM! Client routing helpers are correct and thread-safe.The aClient and eClient methods properly route to legacy or current clients based on atomic MPT state.
320-424: LGTM! RPC methods correctly route through client selectors.The remaining RPC methods properly use
aClient()oreClient()which respect the global MPT switch state. The methods that don't have timestamp information (CommitBatch, AppendBlsSignature, etc.) correctly rely on the global switch state set by previous calls.ops/l2-genesis/go.mod (1)
9-9: All go-ethereum dependencies are already consistent across modules. All 8 go.mod files (bindings, contracts, node, ops/l2-genesis, ops/tools, oracle, token-price-oracle, tx-submitter) use the same version: v1.10.14-0.20251211075654-796834acba86. No action needed.contracts/go.mod (1)
9-9: Verify the updated dependency version against the upstream repository and security advisories.The pseudo-version format is syntactically valid, but the actual commit hash (
796834acba86) and any security advisories require verification via GitHub or local testing (e.g.,go mod tidy,go mod verify). Ensure the upstream commit exists in morph-l2/go-ethereum and check for any known vulnerabilities before merging.
node/core/config.go
Outdated
| ) | ||
|
|
||
| type Config struct { | ||
| L2Legacy *types.L2Config `json:"l_2_legacy"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JSON tag naming inconsistency.
The JSON tag l_2_legacy uses an underscore between "2" and "legacy", while the existing L2 field at line 32 uses l2 without separating underscores. For consistency, consider using l2_legacy instead of l_2_legacy.
📝 Suggested fix
- L2Legacy *types.L2Config `json:"l_2_legacy"`
+ L2Legacy *types.L2Config `json:"l2_legacy"`📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| L2Legacy *types.L2Config `json:"l_2_legacy"` | |
| L2Legacy *types.L2Config `json:"l2_legacy"` |
🤖 Prompt for AI Agents
In @node/core/config.go at line 31, The JSON tag for the L2Legacy struct field
is inconsistent with the existing L2 field; update the struct tag on L2Legacy
(symbol: L2Legacy) from `json:"l_2_legacy"` to `json:"l2_legacy"` so it follows
the existing `l2` style and maintains consistent JSON naming conventions with
the L2 field (symbol: L2).
ops/mpt-switch-test/README.md
Outdated
| ``` | ||
| Before upgrade: After upgrade (swap): | ||
| ┌─────────────┐ ┌─────────────┐ | ||
| │ Sequencer │──► ZK Geth │ Sequencer │──► MPT Geth | ||
| │ Node │ (:8545) │ Node │ (:9545) | ||
| └─────────────┘ └─────────────┘ | ||
| ──────► | ||
| ┌─────────────┐ ┌─────────────┐ | ||
| │ Sentry │──► MPT Geth │ Sentry │──► ZK Geth | ||
| │ Node │ (:9545) │ Node │ (:8545) | ||
| └─────────────┘ └─────────────┘ | ||
| ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add fenced code block languages to satisfy markdownlint (MD040).
Add text for the ASCII diagrams/output blocks and bash for shell snippets.
Also applies to: 89-98, 102-123, 171-187
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
7-7: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In @ops/mpt-switch-test/README.md around lines 7 - 18, The markdown has fenced
code blocks without language tags causing markdownlint MD040 failures; update
each ASCII diagram block (e.g., the block containing "Before upgrade:
After upgrade (swap):" and the box art) to use ```text and any shell/snippet
blocks to use ```bash, and apply the same change to the other affected blocks
noted (the blocks around the referenced diagrams and shell snippets) so all
fenced blocks include the appropriate language tag.
ops/mpt-switch-test/README.md
Outdated
| Genesis file (`genesis-l2.json`) is already included in the directory. | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix genesis filename inconsistency (genesis-l2.json vs genesis-zk.json/genesis-mpt.json).
Line 39 mentions genesis-l2.json, but the “File Structure” section lists genesis-zk.json and genesis-mpt.json. This will mislead operators when setting up the test harness.
Also applies to: 102-123
🤖 Prompt for AI Agents
In @ops/mpt-switch-test/README.md around lines 39 - 40, The README mentions
different genesis filenames; unify references so operators aren’t misled: pick
the actual file name present (e.g., genesis-l2.json) and update all occurrences
in the “File Structure” section and elsewhere (including the noted block around
lines 102–123) to that chosen name, or alternatively rename the files in the
repo to match the README (genesis-zk.json / genesis-mpt.json) and update any
examples, scripts, and docs (search for genesis-l2.json, genesis-zk.json,
genesis-mpt.json) so all references are consistent.
ops/mpt-switch-test/send-txs.sh
Outdated
| # Test account (needs balance in genesis) | ||
| # This is a commonly used test private key, corresponding address: 0xf39Fd6e51aad88F6F4ce6aB8827279cffFb92266 | ||
| PRIVATE_KEY="0xac0974bec39a17e36ba4a6b4d238ff944bacb478cbed5efcae784d7bf4f2ff80" | ||
| FROM_ADDRESS="0xf39Fd6e51aad88F6F4ce6aB8827279cffFb92266" | ||
| TO_ADDRESS="0x70997970C51812dc3A010C7d01b50e0d17dc79C8" | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove hard-coded private key (will trip secret scanning + unsafe by default).
Even if it’s a “commonly used test key”, committing a private key is typically a CI blocker (gitleaks) and a bad precedent. Load it from an env var or a local untracked file instead.
Proposed fix
-# Test account (needs balance in genesis)
-# This is a commonly used test private key, corresponding address: 0xf39Fd6e51aad88F6F4ce6aB8827279cffFb92266
-PRIVATE_KEY="0xac0974bec39a17e36ba4a6b4d238ff944bacb478cbed5efcae784d7bf4f2ff80"
+## Test account (needs balance in genesis)
+## Do NOT hardcode private keys in-repo; provide via env var.
+## Example:
+## export MORPH_TEST_PRIVATE_KEY=0x...
+PRIVATE_KEY="${MORPH_TEST_PRIVATE_KEY:-}"
FROM_ADDRESS="0xf39Fd6e51aad88F6F4ce6aB8827279cffFb92266"
TO_ADDRESS="0x70997970C51812dc3A010C7d01b50e0d17dc79C8" log_info "Account balance: $balance"
+
+ if [ -z "$PRIVATE_KEY" ]; then
+ echo ""
+ echo "Error: MORPH_TEST_PRIVATE_KEY is not set"
+ echo "Set it to the private key for FROM_ADDRESS (must be funded in genesis)."
+ exit 1
+ fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Test account (needs balance in genesis) | |
| # This is a commonly used test private key, corresponding address: 0xf39Fd6e51aad88F6F4ce6aB8827279cffFb92266 | |
| PRIVATE_KEY="0xac0974bec39a17e36ba4a6b4d238ff944bacb478cbed5efcae784d7bf4f2ff80" | |
| FROM_ADDRESS="0xf39Fd6e51aad88F6F4ce6aB8827279cffFb92266" | |
| TO_ADDRESS="0x70997970C51812dc3A010C7d01b50e0d17dc79C8" | |
| ## Test account (needs balance in genesis) | |
| ## Do NOT hardcode private keys in-repo; provide via env var. | |
| ## Example: | |
| ## export MORPH_TEST_PRIVATE_KEY=0x... | |
| PRIVATE_KEY="${MORPH_TEST_PRIVATE_KEY:-}" | |
| FROM_ADDRESS="0xf39Fd6e51aad88F6F4ce6aB8827279cffFb92266" | |
| TO_ADDRESS="0x70997970C51812dc3A010C7d01b50e0d17dc79C8" |
🧰 Tools
🪛 Gitleaks (8.30.0)
[high] 26-26: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
🤖 Prompt for AI Agents
In @ops/mpt-switch-test/send-txs.sh around lines 24 - 29, The script currently
hard-codes PRIVATE_KEY (and related test addresses) which trips secret scanners;
change the script to read PRIVATE_KEY from an environment variable (e.g.,
${PRIVATE_KEY}) or an untracked local file (like .env or .secret) and exit with
a clear error if it is not set; keep FROM_ADDRESS and TO_ADDRESS configurable
via env vars too (e.g., ${FROM_ADDRESS:-...} / ${TO_ADDRESS:-...}) so defaults
can be provided without committing secrets, and ensure no default plaintext
private key remains in the code or repo.
ops/mpt-switch-test/send-txs.sh
Outdated
| PID_FILE="${SCRIPT_DIR}/.testdata/send-txs.pid" | ||
|
|
||
| case "${1:-start}" in | ||
| start) | ||
| main | ||
| ;; | ||
| bg) | ||
| log_info "Starting in background..." | ||
| nohup "$0" start > "${SCRIPT_DIR}/.testdata/send-txs.log" 2>&1 & | ||
| echo $! > "$PID_FILE" | ||
| log_success "Started (PID: $(cat $PID_FILE))" | ||
| ;; | ||
| stop) | ||
| if [ -f "$PID_FILE" ]; then | ||
| kill $(cat "$PID_FILE") 2>/dev/null && rm "$PID_FILE" | ||
| log_success "Stopped" | ||
| else | ||
| log_info "Not running" | ||
| fi | ||
| ;; | ||
| status) | ||
| if [ -f "$PID_FILE" ] && kill -0 $(cat "$PID_FILE") 2>/dev/null; then | ||
| log_success "Running (PID: $(cat $PID_FILE))" | ||
| else | ||
| log_info "Not running" | ||
| fi | ||
| ;; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix PID handling and quoting (ShellCheck SC2046) + ensure .testdata exists for bg mode.
kill $(cat ...) risks word-splitting, and bg may fail if .testdata/ doesn’t exist yet.
Proposed fix
PID_FILE="${SCRIPT_DIR}/.testdata/send-txs.pid"
case "${1:-start}" in
@@
bg)
log_info "Starting in background..."
+ mkdir -p "$(dirname "$PID_FILE")"
nohup "$0" start > "${SCRIPT_DIR}/.testdata/send-txs.log" 2>&1 &
echo $! > "$PID_FILE"
log_success "Started (PID: $(cat $PID_FILE))"
;;
stop)
if [ -f "$PID_FILE" ]; then
- kill $(cat "$PID_FILE") 2>/dev/null && rm "$PID_FILE"
+ pid="$(cat "$PID_FILE")"
+ kill "$pid" 2>/dev/null && rm "$PID_FILE"
log_success "Stopped"
else
log_info "Not running"
fi
;;
status)
- if [ -f "$PID_FILE" ] && kill -0 $(cat "$PID_FILE") 2>/dev/null; then
+ if [ -f "$PID_FILE" ] && pid="$(cat "$PID_FILE")" && kill -0 "$pid" 2>/dev/null; then
log_success "Running (PID: $(cat $PID_FILE))"
else
log_info "Not running"
fi
;;🧰 Tools
🪛 Shellcheck (0.11.0)
[warning] 192-192: Quote this to prevent word splitting.
(SC2046)
[warning] 199-199: Quote this to prevent word splitting.
(SC2046)
🤖 Prompt for AI Agents
In @ops/mpt-switch-test/send-txs.sh around lines 178 - 204, The
bg/start/stop/status PID handling should avoid word-splitting and ensure the
.testdata dir exists: create the directory before launching (e.g., mkdir -p
"${SCRIPT_DIR}/.testdata"), when starting use nohup ... & and capture the PID
into a variable (pid=$!; echo "$pid" > "$PID_FILE") and log that variable, and
everywhere you read the PID use a quoted expansion or a read into a variable
(pid="$(cat "$PID_FILE")") and call kill with the quoted pid (kill "$pid"
2>/dev/null) and kill -0 "$pid" for status checks; also check the PID file is
non-empty before attempting kill and remove the PID file only after a successful
kill.
| # Start Geth generic function | ||
| start_geth() { | ||
| local name=$1 | ||
| local geth_bin=$2 | ||
| local datadir=$3 | ||
| local http_port=$4 | ||
| local ws_port=$5 | ||
| local auth_port=$6 | ||
| local p2p_port=$7 | ||
| local pid_file=$8 | ||
| local extra_args=${9:-""} | ||
|
|
||
| log_info "Starting ${name} on ports HTTP:${http_port} AUTH:${auth_port} P2P:${p2p_port}..." | ||
| log_info " Binary: $(basename $geth_bin)" | ||
|
|
||
| local jwt_file="${TEST_DATA_DIR}/jwt-secret.txt" | ||
|
|
||
| "$geth_bin" \ | ||
| --datadir="$datadir" \ | ||
| --networkid=53077 \ | ||
| --http \ | ||
| --http.addr=0.0.0.0 \ | ||
| --http.port=$http_port \ | ||
| --http.api=web3,debug,eth,txpool,net,morph,engine,admin \ | ||
| --http.corsdomain="*" \ | ||
| --http.vhosts="*" \ | ||
| --ws \ | ||
| --ws.addr=0.0.0.0 \ | ||
| --ws.port=$ws_port \ | ||
| --ws.api=web3,debug,eth,txpool,net,morph,engine,admin \ | ||
| --ws.origins="*" \ | ||
| --authrpc.addr=0.0.0.0 \ | ||
| --authrpc.port=$auth_port \ | ||
| --authrpc.vhosts="*" \ | ||
| --authrpc.jwtsecret="$jwt_file" \ | ||
| --port=$p2p_port \ | ||
| --nodiscover \ | ||
| --gcmode=archive \ | ||
| --cache.preimages \ | ||
| --miner.gasprice=0 \ | ||
| --verbosity=3 \ | ||
| $extra_args \ | ||
| > "${TEST_DATA_DIR}/${name}.log" 2>&1 & | ||
|
|
||
| echo $! > "$pid_file" | ||
| log_success "${name} started (PID: $(cat $pid_file))" | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don’t expose admin/debug JSON-RPC on 0.0.0.0 with CORS/VHosts wildcard.
With --http.addr=0.0.0.0, --http.api=...debug...admin..., --http.corsdomain="*", and --http.vhosts="*", anyone on the network can hit powerful APIs. Default this to loopback and restrict APIs.
Proposed fix (safer defaults)
"$geth_bin" \
--datadir="$datadir" \
--networkid=53077 \
--http \
- --http.addr=0.0.0.0 \
+ --http.addr=127.0.0.1 \
--http.port=$http_port \
- --http.api=web3,debug,eth,txpool,net,morph,engine,admin \
- --http.corsdomain="*" \
- --http.vhosts="*" \
+ --http.api=web3,eth,txpool,net,morph,engine \
+ --http.corsdomain="http://localhost" \
+ --http.vhosts="localhost" \
--ws \
- --ws.addr=0.0.0.0 \
+ --ws.addr=127.0.0.1 \
--ws.port=$ws_port \
- --ws.api=web3,debug,eth,txpool,net,morph,engine,admin \
- --ws.origins="*" \
- --authrpc.addr=0.0.0.0 \
+ --ws.api=web3,eth,txpool,net,morph,engine \
+ --ws.origins="http://localhost" \
+ --authrpc.addr=127.0.0.1 \
--authrpc.port=$auth_port \
- --authrpc.vhosts="*" \
+ --authrpc.vhosts="localhost" \
--authrpc.jwtsecret="$jwt_file" \🤖 Prompt for AI Agents
In @ops/mpt-switch-test/test-mpt-switch-local.sh around lines 233 - 279, The
start_geth function currently exposes admin/debug JSON-RPC on 0.0.0.0 with
wildcard CORS/VHosts; change the safe defaults to bind HTTP/WS to loopback
(replace --http.addr=0.0.0.0 and --ws.addr=0.0.0.0 with 127.0.0.1), restrict
--http.api and --ws.api to non-privileged sets (remove debug, admin, morph,
engine, etc.), and set --http.corsdomain and --http.vhosts to be restrictive
(avoid "*"); keep the ability to override these defaults via the extra_args
parameter so tests can opt into broader exposure when explicitly required.
| # Clean data | ||
| clean_data() { | ||
| log_info "Cleaning test data..." | ||
| rm -rf "$TEST_DATA_DIR" | ||
| rm -f "${SCRIPT_DIR}/genesis-l2.json" | ||
| log_success "Test data cleaned" | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clean step deletes genesis-l2.json, but this script uses genesis-zk.json/genesis-mpt.json.
Either remove the genesis-l2.json cleanup or align naming across the README/scripts to avoid surprises.
🤖 Prompt for AI Agents
In @ops/mpt-switch-test/test-mpt-switch-local.sh around lines 390 - 396, The
clean_data() function currently removes "${SCRIPT_DIR}/genesis-l2.json" which is
incorrect for this script that uses genesis-zk.json and genesis-mpt.json; update
the cleanup to remove the actual files used (e.g., genesis-zk.json and
genesis-mpt.json) or remove the genesis-l2.json deletion entirely and instead
delete the correct filenames to align with the README/scripts naming; modify the
rm -f target in clean_data() to reference the correct genesis filenames (or make
it a noop for genesis-l2.json) so cleanup matches the files created/used by this
script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @node/derivation/derivation.go:
- Line 163: The l2Client construction is creating a new TM logger with os.Stdout
instead of using the passed-in logger; update the NewRetryableClient call (where
l2Client is assigned) to pass the configured logger variable (e.g., logger or
cfg.Logger) instead of tmlog.NewTMLogger(tmlog.NewSyncWriter(os.Stdout)) so it
matches how executor.go passes config.Logger to clients.
In @node/types/retryable_client.go:
- Around line 239-282: The loop in switchClient that repeatedly calls
rc.nextEthClient.BlockNumber can block forever if next geth is unreachable; make
it bounded and abortable by adding a configurable max wait and/or max retries
and by honoring context cancellation: introduce a config value (e.g.,
maxWaitDuration or maxRetries) read by switchClient, replace the for loop with a
select over ticker.C and ctx.Done() (and a time.After for the overall timeout),
increment an error counter or retry count on failures from
rc.nextEthClient.BlockNumber/HeaderByNumber and break/return an error after
exceeding maxRetries or when the overall timeout elapses, and ensure you return
a clear error state instead of looping indefinitely (update rc.switched only on
success as currently done).
🧹 Nitpick comments (6)
ops/mpt-switch-test/test-mpt-switch-local.sh (3)
28-28: Unused variableMORPH_ROOT.This variable is declared but never used in the script.
🔧 Suggested fix
-MORPH_ROOT="${SCRIPT_DIR}/../.." +# MORPH_ROOT="${SCRIPT_DIR}/../.." # Uncomment if neededOr remove entirely if not planned for future use.
352-370: Consider a more robust health check.The current check only verifies TCP connectivity. For production-grade testing, consider calling a JSON-RPC method like
eth_blockNumberto confirm geth is fully operational.🔧 Suggested enhancement
- while ! curl -s "http://127.0.0.1:${port}" > /dev/null 2>&1; do + while ! curl -s -X POST -H "Content-Type: application/json" \ + --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \ + "http://127.0.0.1:${port}" 2>/dev/null | grep -q '"result"'; do
569-574: Hardcoded sleep delays could be fragile.The
sleep 2statements between service starts could be replaced with readiness checks for more reliable startup sequencing. However, this is acceptable for a local test script.node/core/config.go (1)
127-136: Consider warning when only one L2Next address is provided.If a user provides
l2next.ethbut notl2next.engine(or vice versa), the configuration is silently ignored. Consider logging a warning to help users identify misconfiguration.🔧 Suggested enhancement
// L2Next is optional - only for upgrade switch (e.g., ZK to MPT) l2NextEthAddr := ctx.GlobalString(flags.L2NextEthAddr.Name) l2NextEngineAddr := ctx.GlobalString(flags.L2NextEngineAddr.Name) if l2NextEthAddr != "" && l2NextEngineAddr != "" { c.L2Next = &types.L2Config{ EthAddr: l2NextEthAddr, EngineAddr: l2NextEngineAddr, JwtSecret: secret, // same secret } + } else if l2NextEthAddr != "" || l2NextEngineAddr != "" { + logger.Warn("L2Next partially configured - both l2next.eth and l2next.engine are required for upgrade switch") }node/types/retryable_client.go (2)
150-162: Fallback design prevents nil checks but could mask issues.When next clients are not configured, they fallback to current clients (lines 156-157). This simplifies the code but means
switched.Load()could return true while actually using the same client. Consider adding a field to track if actual switching is possible.
324-359: Verbose logging helpful for debugging but noisy in production.The detailed logging at each step of
NewL2Blockis helpful for debugging the upgrade process. Consider making this configurable or reducing to debug level after the feature is stable.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
node/core/config.gonode/core/executor.gonode/derivation/config.gonode/derivation/derivation.gonode/flags/flags.gonode/types/retryable_client.goops/mpt-switch-test/test-mpt-switch-local.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- node/derivation/config.go
🧰 Additional context used
🧬 Code graph analysis (4)
node/derivation/derivation.go (2)
node/types/retryable_client.go (2)
FetchGethConfig(64-112)NewRetryableClient(140-192)node/flags/flags.go (1)
ChallengeEnable(185-189)
node/core/config.go (2)
node/types/config.go (1)
L2Config(14-18)node/flags/flags.go (2)
L2NextEthAddr(31-35)L2NextEngineAddr(37-41)
node/types/retryable_client.go (2)
tx-submitter/iface/client.go (1)
Client(14-23)node/db/store.go (1)
Store(16-18)
node/core/executor.go (2)
tx-submitter/iface/client.go (1)
Client(14-23)node/types/retryable_client.go (1)
NewRetryableClient(140-192)
🪛 Shellcheck (0.11.0)
ops/mpt-switch-test/test-mpt-switch-local.sh
[warning] 28-28: MORPH_ROOT appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 196-196: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 378-378: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 379-379: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 459-459: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 471-471: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 477-477: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 535-535: Declare and assign separately to avoid masking return values.
(SC2155)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Analyze (go)
- GitHub Check: Analyze (rust)
🔇 Additional comments (14)
ops/mpt-switch-test/test-mpt-switch-local.sh (3)
342-344: Sentry node using--dev-sequencerflag seems contradictory.The comments indicate this is a non-validator follower node (Lines 315-318), but it's started with
--dev-sequencer. Is this intentional for the test environment, or should the sentry use different flags?
253-268: Security settings are appropriate for local testing only.The wildcard CORS (
--http.corsdomain="*",--ws.origins="*") and vhost settings (--http.vhosts="*",--authrpc.vhosts="*") are acceptable for a local test script but should never be used in production. Consider adding a comment to clarify this is test-only configuration.
372-389: LGTM - Stop function handles cleanup appropriately.The stop function properly iterates through PID files, checks if processes are running, and cleans up. Using default SIGTERM is appropriate for graceful shutdown.
node/core/config.go (1)
32-32: LGTM - L2Next field properly declared as optional.The
omitemptyJSON tag and nil default ensure backward compatibility.node/flags/flags.go (2)
31-41: LGTM - New CLI flags follow existing conventions.The new flags are well-documented with appropriate usage text indicating they are optional and for upgrades. Naming convention is consistent with existing L2 flags.
319-320: LGTM - Flags properly registered.The new flags are correctly added to the Flags slice alongside the existing L2 endpoint flags.
node/core/executor.go (3)
83-98: LGTM - L2Next client initialization properly guarded.The nil checks ensure backward compatibility when L2Next is not configured. Logging clearly indicates whether upgrade switch is enabled.
296-320: Good handling of post-switch block delivery edge case.The logic correctly handles the scenario where a block exists in the legacy geth (received via P2P) but not yet in the target geth after switching. The re-check of height after
EnsureSwitchedensures blocks are properly delivered to the new client.
361-369: Helpful error context for debugging upgrade issues.The banner-style logging and MPT compatibility hint will help operators diagnose issues during upgrades. Consider whether this logging level should be FATAL to halt the node more visibly.
node/derivation/derivation.go (2)
286-322: Validation skip logic during upgrade is critical - verify thoroughly.The skip condition
(beforeSwitch && !useZktrie) || (!beforeSwitch && useZktrie)handles the root mismatch during upgrade transition. This is security-sensitive as it bypasses state root validation.Please verify:
- The condition correctly identifies only legitimate mismatch scenarios during upgrade
- After the upgrade window, validation resumes normally
- Consider adding metrics to track how often validation is skipped
142-148: LGTM - Geth config fetch failure is correctly fatal.Failing to fetch the geth configuration would lead to incorrect validation behavior during upgrades. Making this fatal at startup is the right choice.
node/types/retryable_client.go (3)
249-249: Verify the sync condition allows switching when 1 block behind.The condition
remote+1 >= numberallows switching when the next geth is at blocknumber-1. Is this intentional to allow the current block to be delivered to the next geth, or should it beremote >= numberto ensure the target block already exists?
63-112: LGTM - Robust config parsing with fallback sources.The cascading fallback (current → next → last) for MPT fork time handles various geth configurations gracefully. Proper resource cleanup with
defer client.Close().
195-207: LGTM - Thread-safe client routing.The
atomic.Boolensures thread-safe reads of the switched state.
- Add ops/mpt-switch-test to .gitignore - This directory is for local testing only
The go-ethereum changes should be managed in go-ethereum repo, not morph repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @node/derivation/derivation.go:
- Around line 93-109: The aNextClient created by authclient.DialContext can leak
if ethclient.Dial for eNextClient fails; after authclient.DialContext returns
aNextClient successfully (in the L2Next block), either defer aNextClient.Close()
immediately or explicitly call aNextClient.Close() before returning the error
from ethclient.Dial so the client is always closed on the error path; update the
L2Next initialization in derivation.go to ensure aNextClient is cleaned up when
ethclient.Dial fails (referencing aNextClient, eNextClient,
authclient.DialContext and ethclient.Dial).
- Around line 289-291: The fork activation check treats blocks with timestamp
equal to d.switchTime as before the switch; update the condition in the block
that computes beforeSwitch (using lastHeader.Time and d.switchTime) from a <=
comparison to a < comparison so that blocks at exactly d.switchTime are
considered post-switch; ensure the surrounding conditional that compares
beforeSwitch with d.useZktrie (the if (beforeSwitch && !d.useZktrie) ||
(!beforeSwitch && d.useZktrie) branch) remains intact.
🧹 Nitpick comments (1)
node/derivation/derivation.go (1)
290-300: Incorrect log level for expected upgrade transition behavior.Using
logger.Errorfor an expected condition during upgrade transition is misleading. This will pollute error logs during planned upgrades. Consider usingInfoorWarnlevel instead.Additionally, the log message doesn't include withdrawal root information when
withdrawalMismatchis the cause.Suggested fix
if (beforeSwitch && !d.useZktrie) || (!beforeSwitch && d.useZktrie) { skipValidation = true - d.logger.Error("Root validation skipped during upgrade transition", + d.logger.Info("Root validation skipped during upgrade transition", "originStateRootHash", batchInfo.root, "deriveStateRootHash", lastHeader.Root.Hex(), + "batchWithdrawalRoot", batchInfo.withdrawalRoot.Hex(), + "deriveWithdrawalRoot", common.BytesToHash(withdrawalRoot[:]).Hex(), "blockTimestamp", lastHeader.Time, "switchTime", d.switchTime, "useZktrie", d.useZktrie, ) }
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
node/derivation/derivation.go
🧰 Additional context used
🧬 Code graph analysis (1)
node/derivation/derivation.go (3)
tx-submitter/iface/client.go (1)
Client(14-23)node/types/retryable_client.go (2)
FetchGethConfig(64-112)NewRetryableClient(140-192)node/flags/flags.go (1)
ChallengeEnable(185-189)
🔇 Additional comments (4)
node/derivation/derivation.go (4)
66-69: LGTM!The new fields for tracking geth upgrade configuration are well-placed and appropriately typed. The naming is consistent with usage in
retryable_client.go.
141-147: LGTM!Fetching geth config at startup with appropriate error handling. The fatal behavior on fetch failure is acceptable since it indicates geth connectivity issues.
162-162: LGTM!The
NewRetryableClientinvocation correctly passes the L2Next clients for upgrade switch support.
285-320: Security-sensitive: Root validation bypass during upgrade.This logic bypasses critical root hash validation during upgrade transitions. While the conditions seem reasonable, verify that:
- An attacker cannot manipulate
switchTimeoruseZktrieto bypass validation outside of legitimate upgrade scenarios- The geth config fetch (via
eth_configRPC) is trusted and cannot be spoofedThe bypass conditions are:
switchTime > 0AND- (
beforeSwitch && !useZktrie) OR (!beforeSwitch && useZktrie)
|
|
||
| // L2Next endpoint (optional - for upgrade switch) | ||
| var aNextClient *authclient.Client | ||
| var eNextClient *ethclient.Client | ||
| if cfg.L2Next != nil && cfg.L2Next.EngineAddr != "" && cfg.L2Next.EthAddr != "" { | ||
| aNextClient, err = authclient.DialContext(context.Background(), cfg.L2Next.EngineAddr, cfg.L2Next.JwtSecret) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| eNextClient, err = ethclient.Dial(cfg.L2Next.EthAddr) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| logger.Info("L2Next geth configured (upgrade switch enabled)", "engineAddr", cfg.L2Next.EngineAddr, "ethAddr", cfg.L2Next.EthAddr) | ||
| } else { | ||
| logger.Info("L2Next geth not configured (no upgrade switch)") | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resource leak on error path.
If eNextClient dial fails at line 102, the already-created aNextClient is never closed. Consider using defer-based cleanup or explicit cleanup on error paths.
Suggested fix pattern
// L2Next endpoint (optional - for upgrade switch)
var aNextClient *authclient.Client
var eNextClient *ethclient.Client
if cfg.L2Next != nil && cfg.L2Next.EngineAddr != "" && cfg.L2Next.EthAddr != "" {
aNextClient, err = authclient.DialContext(context.Background(), cfg.L2Next.EngineAddr, cfg.L2Next.JwtSecret)
if err != nil {
return nil, err
}
eNextClient, err = ethclient.Dial(cfg.L2Next.EthAddr)
if err != nil {
+ aNextClient.Close()
return nil, err
}
logger.Info("L2Next geth configured (upgrade switch enabled)", "engineAddr", cfg.L2Next.EngineAddr, "ethAddr", cfg.L2Next.EthAddr)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // L2Next endpoint (optional - for upgrade switch) | |
| var aNextClient *authclient.Client | |
| var eNextClient *ethclient.Client | |
| if cfg.L2Next != nil && cfg.L2Next.EngineAddr != "" && cfg.L2Next.EthAddr != "" { | |
| aNextClient, err = authclient.DialContext(context.Background(), cfg.L2Next.EngineAddr, cfg.L2Next.JwtSecret) | |
| if err != nil { | |
| return nil, err | |
| } | |
| eNextClient, err = ethclient.Dial(cfg.L2Next.EthAddr) | |
| if err != nil { | |
| return nil, err | |
| } | |
| logger.Info("L2Next geth configured (upgrade switch enabled)", "engineAddr", cfg.L2Next.EngineAddr, "ethAddr", cfg.L2Next.EthAddr) | |
| } else { | |
| logger.Info("L2Next geth not configured (no upgrade switch)") | |
| } | |
| // L2Next endpoint (optional - for upgrade switch) | |
| var aNextClient *authclient.Client | |
| var eNextClient *ethclient.Client | |
| if cfg.L2Next != nil && cfg.L2Next.EngineAddr != "" && cfg.L2Next.EthAddr != "" { | |
| aNextClient, err = authclient.DialContext(context.Background(), cfg.L2Next.EngineAddr, cfg.L2Next.JwtSecret) | |
| if err != nil { | |
| return nil, err | |
| } | |
| eNextClient, err = ethclient.Dial(cfg.L2Next.EthAddr) | |
| if err != nil { | |
| aNextClient.Close() | |
| return nil, err | |
| } | |
| logger.Info("L2Next geth configured (upgrade switch enabled)", "engineAddr", cfg.L2Next.EngineAddr, "ethAddr", cfg.L2Next.EthAddr) | |
| } else { | |
| logger.Info("L2Next geth not configured (no upgrade switch)") | |
| } |
🤖 Prompt for AI Agents
In @node/derivation/derivation.go around lines 93 - 109, The aNextClient created
by authclient.DialContext can leak if ethclient.Dial for eNextClient fails;
after authclient.DialContext returns aNextClient successfully (in the L2Next
block), either defer aNextClient.Close() immediately or explicitly call
aNextClient.Close() before returning the error from ethclient.Dial so the client
is always closed on the error path; update the L2Next initialization in
derivation.go to ensure aNextClient is cleaned up when ethclient.Dial fails
(referencing aNextClient, eNextClient, authclient.DialContext and
ethclient.Dial).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is process startup initialization. If dial fails, the process exits immediately and OS reclaims all resources. No actual leak occurs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@curryxbo, you're right—in a startup initialization path where the entire process would exit on error, the OS will reclaim the connection resources. The practical impact is minimal in this specific context. Thanks for the clarification!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @node/types/retryable_client.go:
- Around line 216-283: The switchClient method can block forever while polling
rc.nextEthClient, so add a bounded retry/timeout and single-writer guarantee:
introduce a context-aware timeout or maxRetries and bail out with a clear
error/log if the next node doesn't reach target within that limit (use ctx with
deadline or a local retry counter around the loop that calls
rc.nextEthClient.BlockNumber and HeaderByNumber), ensure only one goroutine runs
the switch by protecting the critical section with a sync.Mutex or sync.Once
(guard the polling loop and the rc.switched.Store(true) path using the chosen
sync primitive) and change the periodic status log call (currently using
rc.logger.Error for waiting state) to rc.logger.Warn or Info; keep all other
logging and the existing remote/block checks in switchClient.
🧹 Nitpick comments (1)
node/types/retryable_client.go (1)
324-359: Consider reducing log verbosity for production.The extensive logging in
NewL2Block(5 separate log statements per call) is useful for debugging the upgrade process but may be too noisy for production. Consider:
- Using
Debuglevel instead ofInfofor some messages- Consolidating into fewer log statements
- Adding a verbose/debug flag to control this
♻️ Example consolidation
func (rc *RetryableClient) NewL2Block(ctx context.Context, executableL2Data *catalyst.ExecutableL2Data, batchHash *common.Hash) (err error) { - rc.logger.Info("NewL2Block called", + rc.logger.Debug("NewL2Block called", "block_number", executableL2Data.Number, "parent_hash", executableL2Data.ParentHash.Hex(), - "state_root", executableL2Data.StateRoot.Hex(), - "switched", rc.switched.Load()) + "switched", rc.switched.Load()) rc.switchClient(ctx, executableL2Data.Timestamp, executableL2Data.Number) - rc.logger.Info("After switchClient", - "block_number", executableL2Data.Number, - "switched", rc.switched.Load()) - if retryErr := backoff.Retry(func() error { - rc.logger.Info("Sending block to geth", - "block_number", executableL2Data.Number, - "using_next_client", rc.switched.Load()) - respErr := rc.aClient().NewL2Block(ctx, executableL2Data, batchHash)
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
node/derivation/derivation.gonode/types/retryable_client.go
🚧 Files skipped from review as they are similar to previous changes (1)
- node/derivation/derivation.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Analyze (go)
- GitHub Check: Analyze (rust)
🔇 Additional comments (8)
node/types/retryable_client.go (8)
34-61: LGTM!The config types are well-structured for parsing the
eth_configRPC response. The use of pointer types for optional fields (*morphExtension,*uint64) is appropriate.
63-112: LGTM with a note on convention.The function correctly fetches and parses the
eth_configresponse with proper cleanup (defer client.Close()). The fallback logic through current → next → last is clear.Note:
SwitchTime = 0is used to indicate "switch disabled." This implicit convention works since Line 166 checksswitchTime > 0, but consider documenting this in theGethConfigstruct comments for clarity.
114-121: LGTM!Simple internal wrapper that extracts only the
SwitchTimefromFetchGethConfig.
123-132: LGTM!The struct extension is clean. Using
atomic.Boolforswitchedis the correct choice for thread-safe access across concurrent RPC calls.
195-207: LGTM!The client routing helpers are clean and thread-safe. The fallback initialization in
NewRetryableClientensures these never return nil.
285-303: LGTM with note on switch placement.The switch check is performed inside the retry loop. This is fine since
switchClientis idempotent after the switch completes (early return whenswitched.Load()is true). Using wall-clock time for the switch decision during block assembly is appropriate.
380-484: Note: Read operations don't trigger switch proactively.
BlockNumber,HeaderByNumber,CallContract, andCodeAtuseeClient()without callingswitchClient. This means if these are called after the switch time but before any block-writing operation triggers the switch, they'll read from the old geth.This may be intentional (switch only happens during block processing), but verify this is the desired behavior. If reads should also use the new geth after switch time, consider adding:
func (rc *RetryableClient) BlockNumber(ctx context.Context) (ret uint64, err error) { // Optionally check switch time for read operations if rc.switchTime > 0 && uint64(time.Now().Unix()) >= rc.switchTime { rc.EnsureSwitched(ctx, uint64(time.Now().Unix()), 0) } // ... existing code }Is it intentional that read-only RPC methods (
BlockNumber,HeaderByNumber, etc.) don't proactively trigger the client switch?
134-192: CallReset()on the exponential backoff before first use.The
backoff.NewExponentialBackOff()must haveReset()called before it is used. According to the cenkalti/backoff package documentation, Reset initializes the internal currentInterval and startTime that the backoff relies on. Without calling Reset(), the first retry will behave unexpectedly.Fix
rc := &RetryableClient{ authClient: authClient, ethClient: ethClient, nextAuthClient: nextAuthClient, nextEthClient: nextEthClient, switchTime: switchTime, b: backoff.NewExponentialBackOff(), logger: logger, } + rc.b.Reset()Likely an incorrect or invalid review comment.
| func (rc *RetryableClient) switchClient(ctx context.Context, timeStamp uint64, number uint64) { | ||
| if rc.switched.Load() { | ||
| return | ||
| } | ||
| if timeStamp < rc.switchTime { | ||
| return | ||
| } | ||
|
|
||
| rc.logger.Info("========================================") | ||
| rc.logger.Info("GETH UPGRADE: Switch time reached!") | ||
| rc.logger.Info("========================================") | ||
| rc.logger.Info("Switch time reached, switching from current client to next client", | ||
| "switch_time", rc.switchTime, | ||
| "current_time", timeStamp, | ||
| "target_block", number) | ||
| rc.logger.Info("Current status: connected to current geth, waiting for next geth to sync...") | ||
|
|
||
| ticker := time.NewTicker(500 * time.Millisecond) | ||
| defer ticker.Stop() | ||
|
|
||
| startTime := time.Now() | ||
| lastLogTime := startTime | ||
|
|
||
| for { | ||
| remote, err := rc.nextEthClient.BlockNumber(ctx) | ||
| if err != nil { | ||
| rc.logger.Error("Failed to get next geth block number", | ||
| "error", err, | ||
| "hint", "Please ensure next geth is running and accessible") | ||
| <-ticker.C | ||
| continue | ||
| } | ||
|
|
||
| if remote+1 >= number { | ||
| // Get next geth's latest block hash for debugging | ||
| targetHeader, headerErr := rc.nextEthClient.HeaderByNumber(ctx, big.NewInt(int64(remote))) | ||
| targetBlockHash := "unknown" | ||
| targetStateRoot := "unknown" | ||
| if headerErr == nil && targetHeader != nil { | ||
| targetBlockHash = targetHeader.Hash().Hex() | ||
| targetStateRoot = targetHeader.Root.Hex() | ||
| } | ||
|
|
||
| rc.switched.Store(true) | ||
| rc.logger.Info("========================================") | ||
| rc.logger.Info("GETH UPGRADE: Successfully switched!") | ||
| rc.logger.Info("========================================") | ||
| rc.logger.Info("Successfully switched to next client", | ||
| "remote_block", remote, | ||
| "target_block", number, | ||
| "target_block_hash", targetBlockHash, | ||
| "target_state_root", targetStateRoot, | ||
| "wait_duration", time.Since(startTime)) | ||
| return | ||
| } | ||
|
|
||
| if time.Since(lastLogTime) >= 5*time.Second { | ||
| rc.logger.Error("!!! WAITING: Node BLOCKED waiting for next geth !!!", | ||
| "next_geth_block", remote, | ||
| "target_block", number, | ||
| "blocks_behind", number-remote-1, | ||
| "wait_duration", time.Since(startTime)) | ||
| lastLogTime = time.Now() | ||
| } | ||
|
|
||
| <-ticker.C | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Infinite blocking loop can halt the node permanently.
The switchClient method enters an unbounded loop (lines 239-282) waiting for the next geth to sync. If the next geth is misconfigured, unreachable, or permanently behind, this will block the caller forever with no way to recover.
Additionally:
- Multiple concurrent callers can enter the waiting loop simultaneously before
switched.Store(true)at line 259. - Line 273 uses
logger.Errorfor a status message that's more of a warning.
Consider adding:
- A maximum timeout or retry limit with graceful degradation
- A mutex or
sync.Onceto ensure only one goroutine performs the switch logic
🛠️ Suggested approach with timeout
func (rc *RetryableClient) switchClient(ctx context.Context, timeStamp uint64, number uint64) {
if rc.switched.Load() {
return
}
if timeStamp < rc.switchTime {
return
}
+ // Add a reasonable timeout for the switch operation
+ switchCtx, cancel := context.WithTimeout(ctx, 30*time.Minute)
+ defer cancel()
// ... existing logging ...
ticker := time.NewTicker(500 * time.Millisecond)
defer ticker.Stop()
for {
+ select {
+ case <-switchCtx.Done():
+ rc.logger.Error("Switch timeout exceeded, failing switch operation")
+ return // or panic, depending on desired behavior
+ default:
+ }
+
remote, err := rc.nextEthClient.BlockNumber(ctx)
// ... rest of loop ...
}
}🤖 Prompt for AI Agents
In @node/types/retryable_client.go around lines 216 - 283, The switchClient
method can block forever while polling rc.nextEthClient, so add a bounded
retry/timeout and single-writer guarantee: introduce a context-aware timeout or
maxRetries and bail out with a clear error/log if the next node doesn't reach
target within that limit (use ctx with deadline or a local retry counter around
the loop that calls rc.nextEthClient.BlockNumber and HeaderByNumber), ensure
only one goroutine runs the switch by protecting the critical section with a
sync.Mutex or sync.Once (guard the polling loop and the rc.switched.Store(true)
path using the chosen sync primitive) and change the periodic status log call
(currently using rc.logger.Error for waiting state) to rc.logger.Warn or Info;
keep all other logging and the existing remote/block checks in switchClient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @node/types/retryable_client.go:
- Around line 419-420: The log message in the HeaderByNumber retry path is
incorrect; update the rc.logger.Info call used when respErr != nil inside
HeaderByNumber to reflect the proper operation (e.g., change the message from
"failed to call BlockNumber" to "failed to call HeaderByNumber"), keeping the
same structured fields ("error", respErr) and using the existing rc.logger.Info
invocation so locating the change is done in the HeaderByNumber method where
respErr is checked.
- Around line 239-282: The polling loop that waits for next geth sync (using
rc.nextEthClient.BlockNumber and HeaderByNumber) lacks context cancellation and
a maximum timeout, risking indefinite hang; update the loop to select on three
cases: ctx.Done() to abort and return an error, a timeout channel (configurable
default e.g., 30 minutes) to abort with a clear log/error if exceeded, and
ticker.C for the existing polling path; ensure you propagate the cancel reason
(ctx.Err() or timeout) in the log and return, and keep the existing success path
that sets rc.switched.Store(true) and logs timing (use startTime) while removing
the unconditional blocking read from ticker.C.
🧹 Nitpick comments (1)
node/types/retryable_client.go (1)
140-162: Potential issue: Falling back next clients to current clients may mask configuration errors.When
nextAuthClientornextEthClientis nil, bothnextAuthClientandnextEthClientare set to the current clients. This meansaClient()andeClient()will silently return the current client even afterswitchedbecomes true. Consider logging a warning or returning an error if switch is expected but next clients aren't configured.Also, the condition checks if either is nil but sets both to fallback. If only one is nil (misconfiguration), the other valid next client is discarded.
Suggested improvement
// If next client is not configured, disable switch if nextAuthClient == nil || nextEthClient == nil { + if nextAuthClient != nil || nextEthClient != nil { + logger.Error("Partial L2Next configuration: both nextAuthClient and nextEthClient must be provided or neither") + } logger.Info("L2Next client not configured, switch disabled")
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
node/types/retryable_client.go
🧰 Additional context used
🧬 Code graph analysis (1)
node/types/retryable_client.go (1)
tx-submitter/iface/client.go (1)
Client(14-23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Analyze (rust)
- GitHub Check: Analyze (go)
🔇 Additional comments (7)
node/types/retryable_client.go (7)
34-61: LGTM!The configuration types are well-structured for parsing the eth_config RPC response. Good use of pointer types for optional fields like
MPTForkTime.
83-87: Consider fallback forUseZktriesimilar toMPTForkTime.
UseZktrieis only fetched fromresp.Current.Morph, butMPTForkTimehas fallback logic toNextandLastconfigs. IfCurrentis nil butNext/Lastexists,UseZktriewill default tofalse. Verify this is the intended behavior.
114-121: LGTM!Clean internal helper that delegates to
FetchGethConfig.
195-207: LGTM!Clean routing helpers with proper atomic read for the switch state.
249-270: Verify the sync conditionremote+1 >= number.The condition allows switching when the next geth is one block behind the target (
remote == number - 1). This seems intentional but could cause issues if subsequent operations immediately require the target block from the next geth.If the intent is to wait for next geth to have the target block, the condition should be
remote >= number. Please verify this is the intended behavior.
285-303: LGTM!Using current wall clock time for block assembly is appropriate since this creates a new block. The switch check and client routing are correctly implemented.
364-396: Verify:CommitBatchandAppendBlsSignaturedon't trigger client switch.These methods use
aClient()but don't callswitchClient(). If called after the switch time but before any block operation triggers the switch, they'll use the old client.This might be intentional (e.g., batches should complete on the current client before switching), but please verify this is the expected behavior during the upgrade window.
| for { | ||
| remote, err := rc.nextEthClient.BlockNumber(ctx) | ||
| if err != nil { | ||
| rc.logger.Error("Failed to get next geth block number", | ||
| "error", err, | ||
| "hint", "Please ensure next geth is running and accessible") | ||
| <-ticker.C | ||
| continue | ||
| } | ||
|
|
||
| if remote+1 >= number { | ||
| // Get next geth's latest block hash for debugging | ||
| targetHeader, headerErr := rc.nextEthClient.HeaderByNumber(ctx, big.NewInt(int64(remote))) | ||
| targetBlockHash := "unknown" | ||
| targetStateRoot := "unknown" | ||
| if headerErr == nil && targetHeader != nil { | ||
| targetBlockHash = targetHeader.Hash().Hex() | ||
| targetStateRoot = targetHeader.Root.Hex() | ||
| } | ||
|
|
||
| rc.switched.Store(true) | ||
| rc.logger.Info("========================================") | ||
| rc.logger.Info("GETH UPGRADE: Successfully switched!") | ||
| rc.logger.Info("========================================") | ||
| rc.logger.Info("Successfully switched to next client", | ||
| "remote_block", remote, | ||
| "target_block", number, | ||
| "target_block_hash", targetBlockHash, | ||
| "target_state_root", targetStateRoot, | ||
| "wait_duration", time.Since(startTime)) | ||
| return | ||
| } | ||
|
|
||
| if time.Since(lastLogTime) >= 5*time.Second { | ||
| rc.logger.Error("!!! WAITING: Node BLOCKED waiting for next geth !!!", | ||
| "next_geth_block", remote, | ||
| "target_block", number, | ||
| "blocks_behind", number-remote-1, | ||
| "wait_duration", time.Since(startTime)) | ||
| lastLogTime = time.Now() | ||
| } | ||
|
|
||
| <-ticker.C | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: Blocking loop lacks timeout and context cancellation, risking indefinite hang.
The polling loop that waits for the next geth to sync has no exit conditions other than successful synchronization:
- No timeout: If the next geth is down or never catches up, this blocks forever.
- No context cancellation: The
ctxparameter is passed butctx.Done()is never checked. A cancelled context won't stop the loop.
This could cause the entire node to hang during an upgrade if the next geth is unavailable.
Proposed fix
for {
+ select {
+ case <-ctx.Done():
+ rc.logger.Error("Context cancelled while waiting for next geth to sync",
+ "error", ctx.Err(),
+ "wait_duration", time.Since(startTime))
+ return
+ default:
+ }
+
remote, err := rc.nextEthClient.BlockNumber(ctx)
if err != nil {
rc.logger.Error("Failed to get next geth block number",
"error", err,
"hint", "Please ensure next geth is running and accessible")
- <-ticker.C
+ select {
+ case <-ctx.Done():
+ return
+ case <-ticker.C:
+ }
continue
}Consider also adding a maximum wait timeout (e.g., configurable or a sensible default like 30 minutes) to prevent indefinite blocking in production.
🤖 Prompt for AI Agents
In @node/types/retryable_client.go around lines 239 - 282, The polling loop that
waits for next geth sync (using rc.nextEthClient.BlockNumber and HeaderByNumber)
lacks context cancellation and a maximum timeout, risking indefinite hang;
update the loop to select on three cases: ctx.Done() to abort and return an
error, a timeout channel (configurable default e.g., 30 minutes) to abort with a
clear log/error if exceeded, and ticker.C for the existing polling path; ensure
you propagate the cancel reason (ctx.Err() or timeout) in the log and return,
and keep the existing success path that sets rc.switched.Store(true) and logs
timing (use startTime) while removing the unconditional blocking read from
ticker.C.
| if respErr != nil { | ||
| rc.logger.Info("failed to call BlockNumber", "error", respErr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix incorrect log message.
The log message says "failed to call BlockNumber" but this is the HeaderByNumber method.
Proposed fix
- rc.logger.Info("failed to call BlockNumber", "error", respErr)
+ rc.logger.Info("failed to call HeaderByNumber", "error", respErr)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if respErr != nil { | |
| rc.logger.Info("failed to call BlockNumber", "error", respErr) | |
| if respErr != nil { | |
| rc.logger.Info("failed to call HeaderByNumber", "error", respErr) |
🤖 Prompt for AI Agents
In @node/types/retryable_client.go around lines 419 - 420, The log message in
the HeaderByNumber retry path is incorrect; update the rc.logger.Info call used
when respErr != nil inside HeaderByNumber to reflect the proper operation (e.g.,
change the message from "failed to call BlockNumber" to "failed to call
HeaderByNumber"), keeping the same structured fields ("error", respErr) and
using the existing rc.logger.Info invocation so locating the change is done in
the HeaderByNumber method where respErr is checked.
# Conflicts: # bindings/go.mod # bindings/go.sum # contracts/go.mod # contracts/go.sum # node/go.mod # node/go.sum # ops/l2-genesis/go.mod # ops/l2-genesis/go.sum # ops/tools/go.mod # ops/tools/go.sum # oracle/go.mod # oracle/go.sum # token-price-oracle/go.mod # token-price-oracle/go.sum # tx-submitter/go.mod # tx-submitter/go.sum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
node/core/executor.go (1)
87-99: Consider logging a warning for partial L2Next configuration.If
L2Nextis configured but only one ofEngineAddrorEthAddris provided (non-empty), the partial configuration is silently ignored. This could mask a deployment misconfiguration.💡 Suggested improvement
if config.L2Next != nil && config.L2Next.EngineAddr != "" && config.L2Next.EthAddr != "" { aNextClient, err = authclient.DialContext(context.Background(), config.L2Next.EngineAddr, config.L2Next.JwtSecret) if err != nil { return nil, err } eNextClient, err = ethclient.Dial(config.L2Next.EthAddr) if err != nil { return nil, err } logger.Info("L2Next geth configured (upgrade switch enabled)", "engineAddr", config.L2Next.EngineAddr, "ethAddr", config.L2Next.EthAddr) + } else if config.L2Next != nil && (config.L2Next.EngineAddr != "" || config.L2Next.EthAddr != "") { + logger.Warn("L2Next partially configured - both EngineAddr and EthAddr are required for upgrade switch", + "engineAddr", config.L2Next.EngineAddr, + "ethAddr", config.L2Next.EthAddr) } else { logger.Info("L2Next geth not configured (no upgrade switch)") }
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (6)
bindings/go.sumis excluded by!**/*.sumcontracts/go.sumis excluded by!**/*.sumnode/go.sumis excluded by!**/*.sumoracle/go.sumis excluded by!**/*.sumtoken-price-oracle/go.sumis excluded by!**/*.sumtx-submitter/go.sumis excluded by!**/*.sum
📒 Files selected for processing (10)
Makefilebindings/go.modcontracts/go.modnode/core/config.gonode/core/executor.gonode/flags/flags.gonode/go.modoracle/go.modtoken-price-oracle/go.modtx-submitter/go.mod
🚧 Files skipped from review as they are similar to previous changes (5)
- node/core/config.go
- node/go.mod
- contracts/go.mod
- node/flags/flags.go
- tx-submitter/go.mod
🧰 Additional context used
🧬 Code graph analysis (1)
node/core/executor.go (2)
tx-submitter/iface/client.go (1)
Client(14-23)node/types/retryable_client.go (1)
NewRetryableClient(140-192)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: test
- GitHub Check: check
- GitHub Check: test
- GitHub Check: test
- GitHub Check: check
- GitHub Check: test
- GitHub Check: Analyze (go)
- GitHub Check: Analyze (rust)
🔇 Additional comments (7)
Makefile (1)
3-3: LGTM - Version bump aligns with go.mod updates across modules.The
ETHEREUM_TARGET_VERSIONupdate is consistent with the corresponding dependency updates in all modulego.modfiles within this PR.token-price-oracle/go.mod (1)
12-12: LGTM - Dependency version aligned with project-wide update.oracle/go.mod (1)
10-10: LGTM - Dependency version aligned with project-wide update.bindings/go.mod (1)
7-7: LGTM - Dependency version aligned with project-wide update.node/core/executor.go (3)
303-327: LGTM - Switch handling logic is well-designed.The flow correctly:
- Checks if the switch should occur based on block timestamp
- Re-validates block height against the potentially new geth client
- Continues delivery if the block doesn't exist in the target geth after switch
This handles the edge case where a block exists in legacy geth (via P2P sync) but needs to be delivered to the new MPT geth after switching.
332-337: LGTM - Clear critical error messaging with actionable guidance.The enhanced logging provides operators with clear context (
consensus_block,geth_height) and an actionable directive when the node is blocked due to geth being behind.
367-377: LGTM - Enhanced error logging aids debugging during MPT transition.The banner-style critical error with block details and the MPT compatibility hint will help operators quickly identify and resolve issues during the upgrade process.
| continue | ||
| } | ||
|
|
||
| if remote+1 >= number { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we remove the debug code?
| startTime := time.Now() | ||
| lastLogTime := startTime | ||
|
|
||
| for { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add maxWaitTime here? because this function will block the normal new block process
| beforeSwitch := lastHeader.Time < d.switchTime | ||
| if (beforeSwitch && !d.useZktrie) || (!beforeSwitch && d.useZktrie) { | ||
| skipValidation = true | ||
| d.logger.Error("Root validation skipped during upgrade transition", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
d.logger.Info("xxx")
| return | ||
| } | ||
| } | ||
| d.logger.Info("root hash or withdrawal hash is not equal", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
root hash not equal should d.logger.Error("xxx")
Co-authored-by: corey <corey.zhang@bitget.com>
Summary by CodeRabbit
New Features
Enhancements
Chores
✏️ Tip: You can customize this high-level summary in your review settings.