phutchins commented Oct 17, 2025

Note

Introduces a comprehensive IPC subnet manager with local/remote deployment, systemd-managed services, relayer support, config/genesis automation, and live monitoring/tuning tools, plus networking fixes.

  • ipc-subnet-manager (new):
    • Adds scripts/ipc-subnet-manager/ with main driver ipc-subnet-manager.sh, wrapper ipc-manager, and extensive docs.
    • Supports both local (Anvil) and remote (SSH) deployment modes with execution abstraction.
  • Systemd Integration:
    • Generates and installs node/relayer services (templates/ipc-*.service.template); switches logging to journal and proper targets; enables restart policies.
    • Start/stop/status flows auto-detect systemd and fall back to nohup.
  • Relayer Management:
    • Adds start/stop/status commands and systemd unit; passes required --fendermint-rpc-url and submitter from keystore.
  • Config & Genesis Automation:
    • Generates node-init.yml, fixes resolver listen_addr to 0.0.0.0, builds peer meshes (CometBFT/libp2p), and sets federated power.
    • Automates IPC CLI (~/.ipc/config.toml) generation and optional subnet deployment/genesis creation.
  • Monitoring & Ops:
    • New live dashboard, watch-blocks, watch-finality, health checks, consensus/voting status, and parent-finality monitor scripts.
    • Adds performance tuning script and summaries; mempool and consensus timeout tuning.
  • Local/Anvil Tooling:
    • Anvil lifecycle management, local validator homes/ports, SSH tunnel helpers, and connectivity tests.
  • Utilities & Fixes:
    • Binary update workflow; improved SSH/process handling; error-proof arithmetic; keystore/address helpers.
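The systemd auto-detection with a nohup fallback described under "Systemd Integration" could be sketched roughly as follows. This is a minimal sketch, not the PR's implementation: the function names are hypothetical, the probe relies on the convention that `/run/systemd/system` exists only when systemd is the running init system, and the nohup command line is a placeholder.

```shell
# Hypothetical helper: report which launcher a start/stop flow would use.
# $1 is the directory to probe; it defaults to /run/systemd/system, which
# exists only when systemd is the running init system.
detect_service_manager() {
    probe_dir="${1:-/run/systemd/system}"
    if [ -d "$probe_dir" ]; then
        echo "systemd"
    else
        echo "nohup"
    fi
}

# Hypothetical dispatcher: print the command that would start the node.
# The unit name ipc-node is taken from the journalctl calls quoted later in
# this PR; the nohup command line is a placeholder, not the real invocation.
node_start_command() {
    case "$(detect_service_manager "${1:-}")" in
        systemd) echo "sudo systemctl start ipc-node" ;;
        *)       echo "nohup <node start command> &" ;;
    esac
}
```

Passing the probe directory as a parameter keeps the detection testable on machines that do not run systemd.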

Written by Cursor Bugbot for commit 760cb2b.

This commit addresses a critical bug in `ipc-cli node init` that prevented libp2p from binding to network interfaces on cloud VMs (GCP, AWS, Azure). The fix ensures that `listen_addr` is set to `0.0.0.0` for proper binding, while `external_addresses` correctly advertises the public IP. This change restores functionality for parent finality voting and top-down message execution.

Changes include:
- Updated `ConnectionOverrideConfig` to include `external_addresses`.
- Modified port configuration logic to use `0.0.0.0` for `listen_addr`.
- Enhanced documentation in `CHANGELOG.md` and `node-init.md` to reflect these changes.
- Added tests to verify the correct configuration behavior.

Existing deployments may need to reinitialize or manually update their configurations to apply this fix.

This commit introduces a new `listen-ip` field in the P2P configuration, allowing advanced users to specify a custom IP address for binding services, while maintaining the default of `0.0.0.0` for maximum compatibility. This enhancement addresses previous limitations in binding on cloud VMs and improves flexibility for complex network setups.

Changes include:
- Updated `P2pConfig` structure to include the `listen-ip` field.
- Adjusted port configuration logic to utilize the `listen-ip` for binding.
- Enhanced documentation in `CHANGELOG.md` and `node-init.md` to reflect the new configuration options and usage examples.
- Added tests to ensure correct behavior of the new `listen-ip` functionality.

This update is fully backward compatible and does not require changes to existing configurations.

…ality issue

This commit updates the subnet configuration by changing the validator power from 1 to 3 and modifying the subnet ID to ensure compatibility with the latest deployment requirements. Additionally, a new markdown file is introduced to document the 16-hour lookback issue affecting parent finality on the Glif Calibration testnet, outlining the problem, root cause, and proposed solutions.

Changes include:
- Updated `ipc-subnet-config.yml` with new subnet ID and validator power.
- Added `PARENT-FINALITY-16H-LOOKBACK-ISSUE.md` to provide detailed insights into the parent finality issue and potential workarounds.

These updates aim to enhance the reliability and documentation of the IPC subnet management process.

…inality progress

This commit introduces a new `watch-finality` command to the IPC subnet manager, enabling users to monitor parent finality progress in real-time. The command supports continuous monitoring, target epoch tracking, and customizable refresh intervals.

Changes include:
- Added `cmd_watch_finality()` function in `ipc-subnet-manager.sh`.
- Updated usage documentation to include examples for the new command.
- Implemented `watch_parent_finality()` function in `lib/health.sh` for monitoring logic.
- Created `WATCH-FINALITY-FEATURE.md` to document usage, output, and potential use cases.

These enhancements improve the monitoring capabilities of the IPC subnet manager, facilitating better tracking of parent finality and subnet health.

…onitoring

This commit adds a new `watch-blocks` command to the IPC subnet manager, enabling users to monitor block production in real-time. The command supports continuous monitoring, target height tracking, and customizable refresh intervals.

Changes include:
- Implemented `cmd_watch_blocks()` function in `ipc-subnet-manager.sh`.
- Added `watch_block_production()` function in `lib/health.sh` for monitoring logic.
- Updated usage documentation with examples for the new command.
- Created `WATCH-BLOCKS-FEATURE.md` to document usage, output, and potential use cases.
- Adjusted `ipc-subnet-config.yml` to optimize block production settings.

These enhancements improve the monitoring capabilities of the IPC subnet manager, facilitating better tracking of block production and overall subnet health.

This commit introduces an extensive "Advanced Performance Tuning Guide" to optimize IPC subnet performance, detailing configuration changes and expected impacts on consensus timeouts, block production, and network performance. Additionally, a new script, `apply-advanced-tuning.sh`, is added to automate the application of these optimizations to existing nodes without reinitialization.

Changes include:
- Created `ADVANCED-TUNING-GUIDE.md` with detailed tuning parameters and expected performance improvements.
- Added `apply-advanced-tuning.sh` script for seamless configuration updates across validators.
- Updated `ipc-subnet-config.yml` with optimized settings for faster block production and parent finality.
- Introduced `OPTIMIZATION-SUMMARY.md` and `PERFORMANCE-OPTIMIZATION-RESULTS.md` to document performance improvements and configurations.
- Enhanced `TUNING-QUICK-REF.md` for quick access to tuning actions and parameters.

These changes target faster block production and parent finality on the IPC subnet; the measured results are recorded in `PERFORMANCE-OPTIMIZATION-RESULTS.md`.

This commit introduces a comprehensive solution to address the broadcasting error encountered by validators due to incorrect address configuration. The changes include:

- Added `BOTTOMUP-CHECKPOINT-FIX.md` to document the problem, root cause, and the necessary fix for validator configurations.
- Created `fix-bottomup-checkpoint.sh` script to automate the process of disabling bottom-up checkpointing for federated subnets and updating validator configurations.
- Updated `lib/config.sh` to set the default validator key kind to "ethereum" for EVM-based subnets, preventing future issues.

These changes stop the broadcast errors on federated subnets, where bottom-up checkpointing is disabled, and ensure that validators are correctly configured for EVM compatibility, improving overall subnet reliability.

This commit adds a comprehensive live monitoring dashboard to the IPC subnet manager, enabling real-time tracking of various metrics and error categorization. Key changes include:

- Created `lib/dashboard.sh` for core dashboard functionality, including metrics collection and UI rendering.
- Added `cmd_dashboard()` function to `ipc-subnet-manager.sh` for command integration.
- Developed multiple documentation files detailing dashboard features, implementation, and quick reference guides.
- Enhanced error handling and formatting in the dashboard display for improved user experience.

These enhancements significantly improve the monitoring capabilities of the IPC subnet manager, providing users with a unified view of subnet health and activity.

This commit introduces a new `BottomUpSettings` struct to manage bottom-up checkpointing configurations, including an option to enable or disable the feature. Key changes include:

- Added `BottomUpSettings` struct with a default enabled state.
- Updated `IpcSettings` to include a configuration for bottom-up checkpointing.
- Enhanced `BottomUpManager` to accept a flag indicating whether bottom-up checkpointing is enabled.
- Implemented logic to conditionally execute bottom-up checkpointing based on the new settings.

These enhancements provide greater flexibility in managing checkpointing behavior within the IPC subnet, improving overall system reliability.

…t management

This commit introduces a comprehensive "Consensus Recovery Guide" and a "Diagnostic Tools Summary" to assist users in diagnosing and recovering from consensus issues within IPC subnets. Key changes include:

- Added `CONSENSUS-RECOVERY-GUIDE.md` detailing steps for diagnosing and resolving consensus problems, including commands for checking consensus and voting status.
- Introduced `DIAGNOSTIC-TOOLS-SUMMARY.md` outlining new commands like `consensus-status` and `voting-status`, enhancing the ability to monitor validator health and participation.
- Updated `ipc-subnet-manager.sh` to integrate new diagnostic commands.
- Enhanced `lib/health.sh` with functions to display consensus and voting statuses, improving operational visibility.

These enhancements significantly improve the operational capabilities of the IPC subnet manager, enabling targeted recovery actions without data loss and fostering better understanding of consensus dynamics.

…sting

This commit introduces several new scripts to enhance the IPC subnet manager's functionality. Key changes include:

- Added `enable-gateway-ports.sh` to enable GatewayPorts on remote VMs for SSH reverse tunneling.
- Introduced `setup-anvil-tunnels.sh` to establish SSH tunnels from local Anvil to remote validator nodes, allowing access to Anvil running on localhost.
- Created `test-anvil-connection.sh` to verify Anvil connectivity from remote VMs through the established SSH tunnels.
- Updated `ipc-subnet-config.yml` with new configuration settings for improved local and remote RPC endpoints.

These enhancements significantly improve the operational capabilities of the IPC subnet manager, facilitating better connectivity and management of validator nodes.

This commit introduces a new script, `debug-relayer-error.sh`, designed to assist in diagnosing issues related to checkpoint submission failures in the IPC subnet manager. Key features include:

- A series of connectivity checks to ensure the Anvil RPC is accessible.
- Validation of the existence of the Gateway and Subnet Actor contracts.
- Checks for the last bottom-up checkpoint height and subnet activity status.
- Recommendations for common issues encountered during relayer operations.

Additionally, new documentation files, including `FIXES-SUMMARY.md`, `IPC-CONFIG-ORDER-FIX.md`, and `RELAYER-UPDATE-SUMMARY.md`, have been added to summarize recent fixes and updates related to relayer connectivity and configuration management.

These enhancements significantly improve the operational capabilities of the IPC subnet manager, providing users with tools to effectively troubleshoot and resolve relayer-related issues.

This commit introduces a new documentation file, `INSTALL-SYSTEMD-FIX.md`, detailing fixes for common issues encountered during the installation of systemd services in the IPC subnet manager. Key changes include:

- Resolved installation issues where services were only installed on the first validator due to arithmetic expansion errors.
- Ensured the relayer service is installed correctly when requested.
- Added initialization for the `SCRIPT_DIR` variable in service generation functions to prevent template file access issues.
- Included steps to unmask services on affected validators before installation.
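One common way an arithmetic expansion stops a loop after its first iteration is a bash `((count++))` statement under `set -e`. Whether that is the exact cause fixed here is an assumption; the commit message only says "arithmetic expansion errors". The sketch below shows the safe form, with a hypothetical `count_hosts` helper standing in for the per-validator install loop.

```shell
# Count hosts without tripping `set -e`.
# In bash, `((count++))` evaluates to the OLD value of count; when that value
# is 0 the command's exit status is 1, so `set -e` aborts the script right
# after the first iteration. A plain assignment with $(( )) always exits 0.
count_hosts() {
    count=0
    for _host in "$@"; do
        # Safe increment: the assignment itself is the command, not the
        # arithmetic result, so its exit status does not depend on the value.
        count=$((count + 1))
    done
    echo "$count"
}
```

With three validators, `count_hosts v1 v2 v3` prints `3` even when the calling script runs under `set -e`.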

Additionally, improvements were made to the `ipc-subnet-manager.sh` and `lib/health.sh` scripts to enhance error handling and logging during the installation process.

These enhancements significantly improve the reliability and usability of the IPC subnet manager's systemd service installation process.

This commit updates the `ipc-subnet-config.yml` with new subnet IDs and contract addresses for improved configuration accuracy. Additionally, it introduces a `--debug` option in the `ipc-subnet-manager.sh` script to enable verbose logging during initialization and error handling, enhancing the debugging process. A new `RELAYER-AND-RESOLVER-FIX.md` documentation file is added, detailing fixes for relayer configuration issues and invalid resolver paths, ensuring better operational reliability.

… configuration improvements

This commit introduces a new command, `update-binaries`, to the `ipc-subnet-manager.sh` script, allowing users to pull the latest code, build, and install binaries on all validators. The command supports specifying a git branch for updates. Additionally, the `ipc-subnet-config.yml` file has been updated with new paths for the IPC repository, and several contract addresses have been modified for improved configuration accuracy. These enhancements streamline the process of maintaining validator binaries and ensure better operational reliability.

This commit adds functionality to convert the validator key to an Ethereum address using fendermint within the `show_subnet_info` function of `lib/health.sh`. It logs the converted address if successful, or warns if the conversion fails. This enhancement improves the visibility of validator information and aids in debugging by providing relevant Ethereum addresses alongside public keys.

This commit introduces a new script, `estimate-gas.sh`, designed to estimate gas usage for transactions between Ethereum addresses. The script utilizes JSON RPC to fetch gas estimates and provides a breakdown of costs at various gas prices. It also includes a recommendation for gas with a 20% buffer, enhancing the operational capabilities of the IPC subnet manager by aiding users in transaction cost planning.
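The arithmetic behind a buffered gas recommendation can be sketched as follows. This is an illustration only, not the contents of `estimate-gas.sh`: the helper names are hypothetical, and the sketch assumes the estimate arrives as a hex quantity, which is how Ethereum JSON-RPC encodes the result of `eth_estimateGas`.

```shell
# Convert a 0x-prefixed hex quantity (as returned by JSON-RPC) to decimal.
hex_to_dec() {
    printf '%d\n' "$1"
}

# Add a percentage buffer to a gas estimate using integer arithmetic.
# $1: gas estimate (decimal), $2: buffer in percent (e.g. 20).
gas_with_buffer() {
    echo $(( $1 * (100 + $2) / 100 ))
}

# Cost in wei for a given amount of gas at a gas price expressed in gwei.
# $1: gas (decimal), $2: gas price in gwei (1 gwei = 10^9 wei).
cost_wei() {
    echo $(( $1 * $2 * 1000000000 ))
}
```

For a plain transfer estimate of `0x5208` (21000 gas), `gas_with_buffer 21000 20` yields 25200, and `cost_wei 25200 10` gives the cost in wei at 10 gwei.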
This commit adds a newline at the end of the `estimate-gas.sh` script to ensure consistency with coding standards and improve readability. This minor adjustment helps maintain a clean file structure in the project.

This commit introduces a complete ELK (Elasticsearch, Logstash, Kibana) stack for aggregating logs from IPC validator nodes. Key components include:

- Docker Compose configuration for orchestrating the ELK stack.
- Elasticsearch for log storage and search capabilities.
- Logstash for processing and parsing logs from validators.
- Kibana for visualizing logs and creating dashboards.
- Grafana for alternative visualization options.

Additionally, comprehensive documentation is provided, including setup guides, troubleshooting tips, and monitoring instructions, ensuring a robust logging infrastructure for IPC validators.

phutchins marked this pull request as ready for review November 13, 2025 13:16
phutchins requested a review from a team as a code owner November 13, 2025 13:16

set -e

cd /Users/philip/github/ipc/scripts/ipc-subnet-manager

Bug: Hardcoded Path Breaks Portability

The script contains a hardcoded personal file path, `cd /Users/philip/github/ipc/scripts/ipc-subnet-manager`, which is specific to a developer's local machine and will fail for other users.
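One way to avoid the hardcoded path is to derive the directory from the script's own location. The sketch below is a suggestion rather than the fix adopted in the PR, and `resolve_script_dir` is a hypothetical helper name.

```shell
# Print the absolute directory that contains the given script path.
# The body runs in a subshell (note the parentheses), so the `cd` does not
# change the caller's working directory.
resolve_script_dir() (
    cd "$(dirname -- "$1")" && pwd
)

# Typical use at the top of a script (bash scripts that may be sourced can
# pass "${BASH_SOURCE[0]}" instead of "$0"):
#   SCRIPT_DIR="$(resolve_script_dir "$0")"
#   cd "$SCRIPT_DIR"
```

Because the directory is computed at run time, the script works from any checkout location.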


# If we couldn't get it from logs, assume it's stuck at the known value
if [ -z "$SUBNET_FINALITY" ] || [ "$SUBNET_FINALITY" = "0" ]; then
    SUBNET_FINALITY="3135524" # Known stuck value
fi

Bug: Hardcoded Value Skews Cross-Environment Monitoring

When `SUBNET_FINALITY` cannot be retrieved from logs, the script falls back to a hardcoded value, `3135524`, labeled as "Known stuck value". This fallback produces incorrect lag calculations for any deployment other than the specific one this was developed on, making the monitoring script unreliable across different environments.
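A possible alternative is to fail loudly unless the operator explicitly supplies a fallback. The following is a minimal sketch under that assumption; `resolve_subnet_finality` and its second argument are hypothetical and do not appear in the PR.

```shell
# Decide which subnet finality value to use.
# $1: value parsed from logs (may be empty or "0" when parsing failed)
# $2: optional operator-supplied fallback (hypothetical; empty means none)
# Prints the chosen value, or returns 1 with an error when neither is usable.
resolve_subnet_finality() {
    if [ -n "${1:-}" ] && [ "$1" != "0" ]; then
        echo "$1"
        return 0
    fi
    if [ -n "${2:-}" ]; then
        echo "$2"
        return 0
    fi
    echo "error: could not determine subnet finality from logs" >&2
    return 1
}
```

Reporting an error instead of substituting a deployment-specific number keeps the lag calculation from silently using stale data.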


hosts => ["http://elasticsearch:9200"]
user => "elastic"
password => "${ELASTIC_PASSWORD}"
index => "ipc-logs-%{[agent][hostname]}-%{+YYYY.MM.dd}"

Bug: Pipeline Logic Corrupts Index Names

The Logstash pipeline removes the `agent` field in the cleanup section (line 133) but then references `[agent][hostname]` in the Elasticsearch index name (line 143). Because the field no longer exists after removal, the index name will be malformed, likely resulting in indexing failures or indices with a literal `%{[agent][hostname]}` in their names.


ssh_user: "philip"
ipc_user: "ipc"
role: "secondary"
private_key: "0xc1099a062e296366a2ac3b26ac80a409833e6a74edbf677a0bd14580d2c68ea2"

Bug: Private Keys: Repository Exposure Risk

The configuration file contains three plaintext private keys committed to the repository. These appear to be actual validator private keys rather than example placeholders, given the presence of real IP addresses, subnet IDs, and personal usernames throughout the file. Committing private keys exposes validator control and any associated funds to compromise.
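One way to keep keys out of the repository is to store each key in a file outside version control and have the config reference only the path. The sketch below assumes such a layout; the `read_private_key` helper and the idea of a key-file path in the config are hypothetical and not part of this PR.

```shell
# Read a private key from a file kept outside the repository.
# $1: path to the key file (hypothetical, e.g. referenced from the config
#     instead of an inline private_key value).
# Prints the key with surrounding whitespace and newlines stripped.
read_private_key() {
    key_file="$1"
    if [ ! -r "$key_file" ]; then
        echo "error: cannot read key file: $key_file" >&2
        return 1
    fi
    tr -d '[:space:]' < "$key_file"
}
```

Keys that have already been pushed should be treated as compromised and rotated, since removing them from the file does not remove them from git history.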



# Network Health
local peers=${METRICS[peers]:-0}
local expected_peers=2

Bug: Dashboard misreports validator health.

The dashboard hardcodes `expected_peers=2`, which assumes a 3-validator setup. Deployments with a different number of validators will show an incorrect peer status, making the health indicator misleading. The value should be calculated as `validator_count - 1` from the configuration.
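The suggested calculation could look like the sketch below. `expected_peer_count` is a hypothetical helper; how the validator count is read from the configuration is left to the caller.

```shell
# Expected number of peers for a fully connected validator set:
# every validator peers with all the others, so validator_count - 1.
# $1: number of validators from the configuration.
expected_peer_count() {
    if [ "$1" -gt 1 ]; then
        echo $(( $1 - 1 ))
    else
        # A single validator (or an empty set) has no peers to expect.
        echo 0
    fi
}
```

In the dashboard, `local expected_peers=$(expected_peer_count "$validator_count")` would replace the literal `2`, assuming a `validator_count` variable is available there.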


# Mempool Status
local mempool_size=${METRICS[mempool_size]:-0}
local mempool_bytes=${METRICS[mempool_bytes]:-0}
local mempool_max=10000

Bug: Dashboard Misinterprets Mempool Health

The dashboard hardcodes `mempool_max=10000`, which assumes a specific mempool configuration. If the actual CometBFT mempool size differs (the default is 5000), the percentage calculation and status indicators will be incorrect, potentially showing a healthy status when the mempool is actually full, or vice versa.
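Instead of a literal, the limit could be read from the node's CometBFT `config.toml`, where the `[mempool]` section carries a `size` setting. The sketch below assumes that layout; `mempool_max_from_config` is a hypothetical helper, and the config path is whatever the deployment uses.

```shell
# Read the mempool size limit from a CometBFT config.toml.
# $1: path to config.toml
# $2: fallback used when the file or the setting is missing
mempool_max_from_config() {
    value=$(awk '
        # Track which TOML section we are in.
        /^\[/ { in_mempool = ($0 ~ /^\[mempool\]/) }
        # Inside [mempool], take the first "size = N" line and keep the digits.
        in_mempool && /^size[ \t]*=/ { gsub(/[^0-9]/, "", $0); print; exit }
    ' "$1" 2>/dev/null) || value=""
    echo "${value:-$2}"
}
```

Passing `5000` as the fallback matches the CometBFT default mentioned in the review comment.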


# Get finality from recent logs (grep for last known finality)
SUBNET_FINALITY=$(ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=no -o BatchMode=yes \
    philip@${VALIDATOR_IP} \
    "sudo journalctl -u ipc-node --since '10 minutes ago' --no-pager 2>/dev/null | grep -oP 'parent at height \K[0-9]+' | tail -1" 2>/dev/null || echo "0")

Bug: Grep Dependency Breaks Cross-Platform Compatibility

The script uses `grep -oP` (Perl-compatible regex), which is not available on all systems, particularly macOS, where BSD grep is the default. This will cause the script to fail on macOS or systems without GNU grep, making it non-portable despite being a monitoring utility that should work across platforms.
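A portable replacement for the `\K` lookbehind is a `sed` capture group, which works with both GNU and BSD tools. The sketch below is one possible rewrite; `extract_parent_height` is a hypothetical helper that reads log lines on stdin.

```shell
# Extract the most recent "parent at height N" value from log lines on stdin.
# Uses only basic regular expressions, so it does not depend on GNU grep -P.
extract_parent_height() {
    sed -n 's/.*parent at height \([0-9][0-9]*\).*/\1/p' | tail -n 1
}
```

In the quoted command, `grep -oP 'parent at height \K[0-9]+' | tail -1` could then be replaced by the `sed -n ... | tail -n 1` pipeline shown in the function body.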


This commit introduces a new local deployment mode for the IPC subnet manager, allowing multiple validators to run on a single machine. Key features include:

- A new configuration file, `ipc-subnet-config-local.yml`, for local mode settings.
- Automatic management of Anvil, including starting and stopping it as needed.
- Systematic port allocation for validators to avoid conflicts.
- CLI enhancements to support local mode operations, including a `--mode` flag.
- Comprehensive documentation detailing the local mode implementation and usage instructions.

These changes enhance the flexibility and usability of the IPC subnet manager for local development and testing environments.

log_info "Power per validator: $validator_power"

# Run set-federated-power from primary node
local cmd="$ipc_binary subnet set-federated-power --subnet $subnet_id --validator-pubkeys $pubkeys --validator-power $validator_power --from t1d4gxuxytb6vg7cxzvxqk3cvbx4hv7vrtd6oa2mi"

Bug: Static Address: A Universal Deployment Blocker

The `set_federated_power` function uses a hardcoded address, `t1d4gxuxytb6vg7cxzvxqk3cvbx4hv7vrtd6oa2mi`, for the `--from` parameter instead of using a configurable value or deriving it from the validator's actual address. This hardcoded address may not have the necessary permissions or balance to execute the transaction, causing the federated power setup to fail on different deployments.
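A configurable sender could be threaded through as in the sketch below. The command line and its flags are copied from the quoted snippet; the `build_set_federated_power_cmd` helper and the idea of passing the sender as a fifth argument are assumptions for illustration.

```shell
# Build the set-federated-power command with a caller-supplied sender.
# $1: ipc binary, $2: subnet id, $3: validator pubkeys,
# $4: validator power, $5: from address (required, e.g. read from config)
build_set_federated_power_cmd() {
    if [ -z "${5:-}" ]; then
        echo "error: from address is not configured" >&2
        return 1
    fi
    printf '%s subnet set-federated-power --subnet %s --validator-pubkeys %s --validator-power %s --from %s\n' \
        "$1" "$2" "$3" "$4" "$5"
}
```

Refusing to build the command when no sender is configured surfaces the misconfiguration before a transaction is attempted.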


This commit introduces a new feature in the IPC subnet manager that automates the deployment of subnets before initializing validator nodes. Key changes include:

- A new `deploy_subnet()` function in `lib/health.sh` that handles the creation of subnets and deployment of gateway contracts.
- Updates to the `ipc-subnet-manager.sh` script to incorporate subnet deployment as a prerequisite for node initialization.
- Modifications to the `ipc-subnet-config-local.yml` to include a `deploy_subnet` flag for enabling automatic deployment.
- Enhanced error handling and logging to ensure successful subnet creation and configuration updates.

These improvements streamline the setup process for local development environments, reducing the likelihood of initialization errors related to missing subnets.

This commit updates the `ipc-subnet-config-local.yml` to change the subnet ID and adjust the Ethereum API port to avoid conflicts with Anvil. It also modifies the `ipc-subnet-manager.sh` script to streamline the genesis creation process, ensuring it works for both activated and non-activated subnets. Additionally, the `create_bootstrap_genesis` function in `lib/health.sh` is enhanced to utilize the `ipc-cli subnet create-genesis` command, improving error handling and logging for better visibility during subnet initialization. These changes enhance the reliability and usability of the IPC subnet manager for local development environments.

/// Check if bottom-up checkpointing is enabled.
pub fn bottomup_enabled(&self) -> bool {
    self.bottomup.as_ref().map_or(false, |config| config.enabled)
}

Bug: Default configuration incorrectly disables feature.

The `bottomup_enabled()` method returns `false` when `bottomup` is `None`, but the `Default` implementation for `IpcSettings` sets `bottomup: None`. This means bottom-up checkpointing will be disabled by default for existing configurations that don't explicitly set the `bottomup` field, even though the intended default behavior (based on `default_bottomup_enabled()` returning `true`) is to have it enabled. The logic should be `self.bottomup.as_ref().map_or(true, |config| config.enabled)` to match the intended default-enabled behavior, or the `Default` implementation should set `bottomup: Some(BottomUpSettings::default())`.


RPC_URL=http://node-1.test.ipc.space:8545
FAUCET_AMOUNT=10
RATE_LIMIT_WINDOW=86400000
RATE_LIMIT_MAX=3

Bug: Sensitive Credentials Exposed in Repository

A `.env` file containing actual private keys has been committed to the repository. The file includes `PRIVATE_KEY=0x5eda872ee2da7bc9d7e0af4507f7d5060aed54d43fd1a72e1283622400c7cb85` and a commented alternative key. Environment files with credentials should never be committed to version control: they should be listed in `.gitignore`, and users should create them from a template (like `.env.example`). This exposes private keys that could control real accounts with funds.
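A template can be produced from an existing `.env` by blanking the secret values. The sketch below is one way to do that; `make_env_template` is a hypothetical helper, and treating any variable whose name contains `KEY`, `SECRET`, or `PASSWORD` as secret is an assumption that should be adapted to the real file.

```shell
# Print a copy of an env file with secret values blanked out.
# Variables whose names contain KEY, SECRET or PASSWORD are treated as
# secrets (an assumption), including ones that are commented out.
# $1: path to the env file
make_env_template() {
    sed -E 's/^(#?[[:space:]]*[A-Za-z0-9_]*(KEY|SECRET|PASSWORD)[A-Za-z0-9_]*)=.*/\1=/' "$1"
}

# Typical use (not executed here):
#   make_env_template .env > .env.example
#   echo ".env" >> .gitignore
```

As with the validator keys, any key that was committed should be rotated, because it remains retrievable from git history.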


This commit refactors the `fetch_metrics` function in `dashboard.sh` to improve the process of gathering metrics from validator nodes. Key changes include:

- Replaced SSH commands with a new `exec_on_host` function for executing remote commands, enhancing consistency and reducing timeout complexity.
- Updated the method for fetching block height, network info, mempool status, and error logs to utilize local node paths for better compatibility with local deployments.
- Improved the extraction of parent height from logs to ensure accurate reporting.
- Added a note in the dashboard output to indicate when F3 is disabled for local development.

These enhancements improve the reliability and clarity of metrics reporting in the IPC subnet manager.

This commit refactors the `get_chain_id` function in `lib/health.sh` to replace SSH commands with the `exec_on_host` function for executing remote commands. This change enhances consistency and simplifies the process of querying the Ethereum chain ID via JSON-RPC, improving the overall reliability of the health check functionality in the IPC subnet manager.

set -e

VALIDATOR_IP="34.73.187.192"

Bug: Parameterize Environment-Specific Values in Scripts

The script contains hardcoded personal configuration values: `VALIDATOR_IP="34.73.187.192"` and `SSH_USER="philip"`. These are environment-specific values that should be parameterized or read from configuration files rather than hardcoded in scripts committed to the repository. The script also contains hardcoded personal paths like `/Users/philip/github/ipc/scripts/ipc-subnet-manager` on line 90.
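Parameterizing these values through the environment might look like the sketch below. The variable names `VALIDATOR_IP` and `SSH_USER` come from the script under review; the `ssh_target` helper and the choice to default the user to the current login name are assumptions.

```shell
# Build the SSH target from environment variables instead of literals.
# VALIDATOR_IP is required; SSH_USER defaults to the current login name.
ssh_target() {
    if [ -z "${VALIDATOR_IP:-}" ]; then
        echo "error: VALIDATOR_IP is not set" >&2
        return 1
    fi
    echo "${SSH_USER:-$(id -un)}@${VALIDATOR_IP}"
}
```

A caller would then run something like `VALIDATOR_IP=192.0.2.10 SSH_USER=ops ./script.sh`, where the script name and values are placeholders.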



karlem commented Nov 19, 2025

The changes to the current files look good!
