Subnet management script and node init config fixes #1464
base: main
This commit addresses a critical bug in `ipc-cli node init` that prevented libp2p from binding to network interfaces on cloud VMs (GCP, AWS, Azure). The fix ensures that `listen_addr` is set to `0.0.0.0` for proper binding, while `external_addresses` correctly advertises the public IP. This change restores functionality for parent finality voting and top-down message execution.

Changes include:
- Updated `ConnectionOverrideConfig` to include `external_addresses`.
- Modified port configuration logic to use `0.0.0.0` for `listen_addr`.
- Enhanced documentation in `CHANGELOG.md` and `node-init.md` to reflect these changes.
- Added tests to verify the correct configuration behavior.

Existing deployments may need to reinitialize or manually update their configurations to apply this fix.
This commit introduces a new `listen-ip` field in the P2P configuration, allowing advanced users to specify a custom IP address for binding services, while maintaining the default of `0.0.0.0` for maximum compatibility. This enhancement addresses previous limitations in binding on cloud VMs and improves flexibility for complex network setups.

Changes include:
- Updated `P2pConfig` structure to include the `listen-ip` field.
- Adjusted port configuration logic to utilize the `listen-ip` for binding.
- Enhanced documentation in `CHANGELOG.md` and `node-init.md` to reflect the new configuration options and usage examples.
- Added tests to ensure correct behavior of the new `listen-ip` functionality.

This update is fully backward compatible and does not require changes to existing configurations.
…ality issue

This commit updates the subnet configuration by changing the validator power from 1 to 3 and modifying the subnet ID to ensure compatibility with the latest deployment requirements. Additionally, a new markdown file is introduced to document the 16-hour lookback issue affecting parent finality on the Glif Calibration testnet, outlining the problem, root cause, and proposed solutions.

Changes include:
- Updated `ipc-subnet-config.yml` with new subnet ID and validator power.
- Added `PARENT-FINALITY-16H-LOOKBACK-ISSUE.md` to provide detailed insights into the parent finality issue and potential workarounds.

These updates aim to enhance the reliability and documentation of the IPC subnet management process.
…inality progress

This commit introduces a new `watch-finality` command to the IPC subnet manager, enabling users to monitor parent finality progress in real-time. The command supports continuous monitoring, target epoch tracking, and customizable refresh intervals.

Changes include:
- Added `cmd_watch_finality()` function in `ipc-subnet-manager.sh`.
- Updated usage documentation to include examples for the new command.
- Implemented `watch_parent_finality()` function in `lib/health.sh` for monitoring logic.
- Created `WATCH-FINALITY-FEATURE.md` to document usage, output, and potential use cases.

These enhancements improve the monitoring capabilities of the IPC subnet manager, facilitating better tracking of parent finality and subnet health.
…onitoring

This commit adds a new `watch-blocks` command to the IPC subnet manager, enabling users to monitor block production in real-time. The command supports continuous monitoring, target height tracking, and customizable refresh intervals.

Changes include:
- Implemented `cmd_watch_blocks()` function in `ipc-subnet-manager.sh`.
- Added `watch_block_production()` function in `lib/health.sh` for monitoring logic.
- Updated usage documentation with examples for the new command.
- Created `WATCH-BLOCKS-FEATURE.md` to document usage, output, and potential use cases.
- Adjusted `ipc-subnet-config.yml` to optimize block production settings.

These enhancements improve the monitoring capabilities of the IPC subnet manager, facilitating better tracking of block production and overall subnet health.
This commit introduces an extensive "Advanced Performance Tuning Guide" to optimize IPC subnet performance, detailing configuration changes and expected impacts on consensus timeouts, block production, and network performance. Additionally, a new script, `apply-advanced-tuning.sh`, is added to automate the application of these optimizations to existing nodes without reinitialization.

Changes include:
- Created `ADVANCED-TUNING-GUIDE.md` with detailed tuning parameters and expected performance improvements.
- Added `apply-advanced-tuning.sh` script for seamless configuration updates across validators.
- Updated `ipc-subnet-config.yml` with optimized settings for faster block production and parent finality.
- Introduced `OPTIMIZATION-SUMMARY.md` and `PERFORMANCE-OPTIMIZATION-RESULTS.md` to document performance improvements and configurations.
- Enhanced `TUNING-QUICK-REF.md` for quick access to tuning actions and parameters.

These enhancements significantly improve the performance and reliability of the IPC subnet, making it competitive with leading blockchain networks.
This commit introduces a comprehensive solution to address the broadcasting error encountered by validators due to incorrect address configuration.

The changes include:
- Added `BOTTOMUP-CHECKPOINT-FIX.md` to document the problem, root cause, and the necessary fix for validator configurations.
- Created `fix-bottomup-checkpoint.sh` script to automate the process of disabling bottom-up checkpointing for federated subnets and updating validator configurations.
- Updated `lib/config.sh` to set the default validator key kind to "ethereum" for EVM-based subnets, preventing future issues.

These enhancements ensure that bottom-up checkpointing is operational and that validators are correctly configured for EVM compatibility, improving overall subnet reliability.
This commit adds a comprehensive live monitoring dashboard to the IPC subnet manager, enabling real-time tracking of various metrics and error categorization.

Key changes include:
- Created `lib/dashboard.sh` for core dashboard functionality, including metrics collection and UI rendering.
- Added `cmd_dashboard()` function to `ipc-subnet-manager.sh` for command integration.
- Developed multiple documentation files detailing dashboard features, implementation, and quick reference guides.
- Enhanced error handling and formatting in the dashboard display for improved user experience.

These enhancements significantly improve the monitoring capabilities of the IPC subnet manager, providing users with a unified view of subnet health and activity.
This commit introduces a new `BottomUpSettings` struct to manage bottom-up checkpointing configurations, including an option to enable or disable the feature.

Key changes include:
- Added `BottomUpSettings` struct with a default enabled state.
- Updated `IpcSettings` to include a configuration for bottom-up checkpointing.
- Enhanced `BottomUpManager` to accept a flag indicating whether bottom-up checkpointing is enabled.
- Implemented logic to conditionally execute bottom-up checkpointing based on the new settings.

These enhancements provide greater flexibility in managing checkpointing behavior within the IPC subnet, improving overall system reliability.
…t management

This commit introduces a comprehensive "Consensus Recovery Guide" and a "Diagnostic Tools Summary" to assist users in diagnosing and recovering from consensus issues within IPC subnets.

Key changes include:
- Added `CONSENSUS-RECOVERY-GUIDE.md` detailing steps for diagnosing and resolving consensus problems, including commands for checking consensus and voting status.
- Introduced `DIAGNOSTIC-TOOLS-SUMMARY.md` outlining new commands like `consensus-status` and `voting-status`, enhancing the ability to monitor validator health and participation.
- Updated `ipc-subnet-manager.sh` to integrate new diagnostic commands.
- Enhanced `lib/health.sh` with functions to display consensus and voting statuses, improving operational visibility.

These enhancements significantly improve the operational capabilities of the IPC subnet manager, enabling targeted recovery actions without data loss and fostering better understanding of consensus dynamics.
…sting

This commit introduces several new scripts to enhance the IPC subnet manager's functionality.

Key changes include:
- Added `enable-gateway-ports.sh` to enable GatewayPorts on remote VMs for SSH reverse tunneling.
- Introduced `setup-anvil-tunnels.sh` to establish SSH tunnels from local Anvil to remote validator nodes, allowing access to Anvil running on localhost.
- Created `test-anvil-connection.sh` to verify Anvil connectivity from remote VMs through the established SSH tunnels.
- Updated `ipc-subnet-config.yml` with new configuration settings for improved local and remote RPC endpoints.

These enhancements significantly improve the operational capabilities of the IPC subnet manager, facilitating better connectivity and management of validator nodes.
This commit introduces a new script, `debug-relayer-error.sh`, designed to assist in diagnosing issues related to checkpoint submission failures in the IPC subnet manager.

Key features include:
- A series of connectivity checks to ensure the Anvil RPC is accessible.
- Validation of the existence of the Gateway and Subnet Actor contracts.
- Checks for the last bottom-up checkpoint height and subnet activity status.
- Recommendations for common issues encountered during relayer operations.

Additionally, new documentation files, including `FIXES-SUMMARY.md`, `IPC-CONFIG-ORDER-FIX.md`, and `RELAYER-UPDATE-SUMMARY.md`, have been added to summarize recent fixes and updates related to relayer connectivity and configuration management. These enhancements significantly improve the operational capabilities of the IPC subnet manager, providing users with tools to effectively troubleshoot and resolve relayer-related issues.
This commit introduces a new documentation file, `INSTALL-SYSTEMD-FIX.md`, detailing fixes for common issues encountered during the installation of systemd services in the IPC subnet manager.

Key changes include:
- Resolved installation issues where services were only installed on the first validator due to arithmetic expansion errors.
- Ensured the relayer service is installed correctly when requested.
- Added initialization for the `SCRIPT_DIR` variable in service generation functions to prevent template file access issues.
- Included steps to unmask services on affected validators before installation.

Additionally, improvements were made to the `ipc-subnet-manager.sh` and `lib/health.sh` scripts to enhance error handling and logging during the installation process. These enhancements significantly improve the reliability and usability of the IPC subnet manager's systemd service installation process.
This commit updates the `ipc-subnet-config.yml` with new subnet IDs and contract addresses for improved configuration accuracy. Additionally, it introduces a `--debug` option in the `ipc-subnet-manager.sh` script to enable verbose logging during initialization and error handling, enhancing the debugging process. A new `RELAYER-AND-RESOLVER-FIX.md` documentation file is added, detailing fixes for relayer configuration issues and invalid resolver paths, ensuring better operational reliability.
… configuration improvements This commit introduces a new command, `update-binaries`, to the `ipc-subnet-manager.sh` script, allowing users to pull the latest code, build, and install binaries on all validators. The command supports specifying a git branch for updates. Additionally, the `ipc-subnet-config.yml` file has been updated with new paths for the IPC repository, and several contract addresses have been modified for improved configuration accuracy. These enhancements streamline the process of maintaining validator binaries and ensure better operational reliability.
This commit adds functionality to convert the validator key to an Ethereum address using fendermint within the `show_subnet_info` function of `lib/health.sh`. It logs the converted address if successful, or warns if the conversion fails. This enhancement improves the visibility of validator information and aids in debugging by providing relevant Ethereum addresses alongside public keys.
This commit introduces a new script, `estimate-gas.sh`, designed to estimate gas usage for transactions between Ethereum addresses. The script utilizes JSON RPC to fetch gas estimates and provides a breakdown of costs at various gas prices. It also includes a recommendation for gas with a 20% buffer, enhancing the operational capabilities of the IPC subnet manager by aiding users in transaction cost planning.
This commit adds a newline at the end of the `estimate-gas.sh` script to ensure consistency with coding standards and improve readability. This minor adjustment helps maintain a clean file structure in the project.
This commit introduces a complete ELK (Elasticsearch, Logstash, Kibana) stack for aggregating logs from IPC validator nodes.

Key components include:
- Docker Compose configuration for orchestrating the ELK stack.
- Elasticsearch for log storage and search capabilities.
- Logstash for processing and parsing logs from validators.
- Kibana for visualizing logs and creating dashboards.
- Grafana for alternative visualization options.

Additionally, comprehensive documentation is provided, including setup guides, troubleshooting tips, and monitoring instructions, ensuring a robust logging infrastructure for IPC validators.
    set -e

    cd /Users/philip/github/ipc/scripts/ipc-subnet-manager
    # If we couldn't get it from logs, assume it's stuck at the known value
    if [ -z "$SUBNET_FINALITY" ] || [ "$SUBNET_FINALITY" = "0" ]; then
        SUBNET_FINALITY="3135524" # Known stuck value
    fi
Bug: Hardcoded Value Skews Cross-Environment Monitoring
When SUBNET_FINALITY cannot be retrieved from logs, the script falls back to a hardcoded value 3135524 labeled as "known stuck value". This fallback produces incorrect lag calculations for any deployment other than the specific one this was developed on, making the monitoring script unreliable across different environments.
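One possible way to avoid the deployment-specific fallback is to treat a missing reading as "unknown" and skip the lag calculation. The sketch below is only an illustration: the helper name `compute_finality_lag` and the `unknown` sentinel are hypothetical and do not come from the PR.

```shell
# Hypothetical helper (not part of the PR): compute parent-finality lag,
# or report "unknown" when either height could not be determined.
compute_finality_lag() {
    local parent_height="$1"
    local subnet_finality="$2"
    local value

    # Treat empty, zero, or non-numeric input as "no data" instead of
    # substituting a constant that is only valid for one deployment.
    for value in "$parent_height" "$subnet_finality"; do
        case "$value" in
            ''|0|*[!0-9]*) echo "unknown"; return 0 ;;
        esac
    done

    echo $(( parent_height - subnet_finality ))
}

compute_finality_lag 3135600 3135524   # prints 76
compute_finality_lag 3135600 ""        # prints unknown
```

A caller can then print a warning when the result is `unknown` rather than showing a lag figure that only makes sense on the original deployment.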
    hosts => ["http://elasticsearch:9200"]
    user => "elastic"
    password => "${ELASTIC_PASSWORD}"
    index => "ipc-logs-%{[agent][hostname]}-%{+YYYY.MM.dd}"
Bug: Pipeline Logic Corrupts Index Names
The Logstash pipeline removes the agent field in the cleanup section (line 133) but then references [agent][hostname] in the Elasticsearch index name (line 143). This will cause the index name to be malformed since the field no longer exists after removal, likely resulting in indexing failures or indices with literal %{[agent][hostname]} in their names.
    ssh_user: "philip"
    ipc_user: "ipc"
    role: "secondary"
    private_key: "0xc1099a062e296366a2ac3b26ac80a409833e6a74edbf677a0bd14580d2c68ea2"
Bug: Private Keys: Repository Exposure Risk
The configuration file contains three plaintext private keys committed to the repository. These appear to be actual validator private keys rather than example placeholders, given the presence of real IP addresses, subnet IDs, and personal usernames throughout the file. Committing private keys exposes validator control and any associated funds to compromise.
    # Network Health
    local peers=${METRICS[peers]:-0}
    local expected_peers=2
Bug: Dashboard misreports validator health.
The dashboard hardcodes expected_peers=2 which assumes a 3-validator setup. This will show incorrect peer status for deployments with different numbers of validators, making the health indicator misleading. The value should be calculated as validator_count - 1 from the configuration.
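The suggested `validator_count - 1` calculation could look roughly like the following. This is a sketch under assumptions: `expected_peer_count` is a hypothetical helper, and how the validator count is read from the configuration is left to the caller.

```shell
# Hypothetical helper (not part of the PR): derive the expected peer count
# from the number of validators instead of hardcoding 2.
expected_peer_count() {
    local validator_count="$1"

    # Each validator peers with every other validator; a single-node or
    # invalid count yields 0 expected peers.
    if [ "$validator_count" -gt 1 ] 2>/dev/null; then
        echo $(( validator_count - 1 ))
    else
        echo 0
    fi
}

expected_peer_count 3   # prints 2
expected_peer_count 5   # prints 4
```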
    # Mempool Status
    local mempool_size=${METRICS[mempool_size]:-0}
    local mempool_bytes=${METRICS[mempool_bytes]:-0}
    local mempool_max=10000
Bug: Dashboard Misinterprets Mempool Health
The dashboard hardcodes mempool_max=10000 which assumes a specific mempool configuration. If the actual CometBFT mempool size differs (default is 5000), the percentage calculation and status indicators will be incorrect, potentially showing healthy status when the mempool is actually full or vice versa.
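A minimal sketch of reading the limit from the node's CometBFT `config.toml` instead of hardcoding it is shown below. It assumes the limit is the `size` key under the `[mempool]` section; the helper name `read_mempool_max` is hypothetical, and the fallback of 5000 is the default mentioned in the comment above.

```shell
# Hypothetical helper (not part of the PR): read the mempool size limit
# from a CometBFT-style config.toml, falling back to 5000 when absent.
read_mempool_max() {
    local config_file="$1"
    local size=""

    if [ -r "$config_file" ]; then
        size=$(awk '
            # Track whether the current TOML section is [mempool].
            /^[[:space:]]*\[/ {
                in_mempool = ($0 ~ /^[[:space:]]*\[mempool\][[:space:]]*$/)
                next
            }
            # Only the exact "size" key counts (not cache_size etc.).
            in_mempool && /^[[:space:]]*size[[:space:]]*=/ {
                split($0, kv, "=")
                value = kv[2]
                sub(/#.*/, "", value)          # drop trailing comment
                gsub(/[[:space:]]/, "", value) # drop whitespace
                print value
                exit
            }
        ' "$config_file")
    fi

    echo "${size:-5000}"
}
```

The dashboard could then compute its percentage against `$(read_mempool_max "$node_home/config/config.toml")`, where `node_home` is whatever path the deployment actually uses.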
    # Get finality from recent logs (grep for last known finality)
    SUBNET_FINALITY=$(ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=no -o BatchMode=yes \
        philip@${VALIDATOR_IP} \
        "sudo journalctl -u ipc-node --since '10 minutes ago' --no-pager 2>/dev/null | grep -oP 'parent at height \K[0-9]+' | tail -1" 2>/dev/null || echo "0")
Bug: Grep Dependency Breaks Cross-Platform Compatibility
The script uses grep -oP (Perl-compatible regex) which is not available on all systems, particularly macOS where BSD grep is the default. This will cause the script to fail on macOS or systems without GNU grep, making it non-portable despite being a monitoring utility that should work across platforms.
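A portable alternative is to do the extraction with basic `sed`, which behaves the same on GNU and BSD systems. The sketch below assumes the same log phrase, `parent at height <number>`, as the original command; the function name `extract_parent_height` is hypothetical.

```shell
# Hypothetical helper (not part of the PR): extract the last
# "parent at height N" value from log text on stdin without grep -P.
extract_parent_height() {
    # Basic regular expressions only, so this works with BSD and GNU sed.
    sed -n 's/.*parent at height \([0-9][0-9]*\).*/\1/p' | tail -1
}

printf '%s\n' \
    'INFO voting: parent at height 3135524 accepted' \
    'INFO unrelated line' \
    'INFO voting: parent at height 3135530 accepted' \
    | extract_parent_height   # prints 3135530
```

In the original pipeline this would replace the `grep -oP '...' | tail -1` stage inside the quoted remote command.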
This commit introduces a new local deployment mode for the IPC subnet manager, allowing multiple validators to run on a single machine.

Key features include:
- A new configuration file, `ipc-subnet-config-local.yml`, for local mode settings.
- Automatic management of Anvil, including starting and stopping it as needed.
- Systematic port allocation for validators to avoid conflicts.
- CLI enhancements to support local mode operations, including a `--mode` flag.
- Comprehensive documentation detailing the local mode implementation and usage instructions.

These changes enhance the flexibility and usability of the IPC subnet manager for local development and testing environments.
    log_info "Power per validator: $validator_power"

    # Run set-federated-power from primary node
    local cmd="$ipc_binary subnet set-federated-power --subnet $subnet_id --validator-pubkeys $pubkeys --validator-power $validator_power --from t1d4gxuxytb6vg7cxzvxqk3cvbx4hv7vrtd6oa2mi"
Bug: Static Address: A Universal Deployment Blocker
The set_federated_power function uses a hardcoded address t1d4gxuxytb6vg7cxzvxqk3cvbx4hv7vrtd6oa2mi for the --from parameter instead of using a configurable value or deriving it from the validator's actual address. This hardcoded address may not have the necessary permissions or balance to execute the transaction, causing the federated power setup to fail on different deployments.
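One way to make the sender configurable is to pass it in, or read it from the environment, and fail early when it is missing. In the sketch below the function name `build_set_federated_power_cmd` and the `IPC_FROM_ADDRESS` variable are hypothetical; the command-line flags are copied from the snippet above.

```shell
# Hypothetical helper (not part of the PR): build the set-federated-power
# command with a configurable --from address instead of a hardcoded one.
build_set_federated_power_cmd() {
    local ipc_binary="$1"
    local subnet_id="$2"
    local pubkeys="$3"
    local validator_power="$4"
    # Fifth argument wins; otherwise fall back to the environment.
    local from_address="${5:-${IPC_FROM_ADDRESS:-}}"

    if [ -z "$from_address" ]; then
        echo "error: no --from address configured (set IPC_FROM_ADDRESS)" >&2
        return 1
    fi

    echo "$ipc_binary subnet set-federated-power --subnet $subnet_id --validator-pubkeys $pubkeys --validator-power $validator_power --from $from_address"
}
```

Failing with a clear message when no address is configured seems preferable to submitting the transaction from an account that may lack permissions or balance.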
This commit introduces a new feature in the IPC subnet manager that automates the deployment of subnets before initializing validator nodes.

Key changes include:
- A new `deploy_subnet()` function in `lib/health.sh` that handles the creation of subnets and deployment of gateway contracts.
- Updates to the `ipc-subnet-manager.sh` script to incorporate subnet deployment as a prerequisite for node initialization.
- Modifications to the `ipc-subnet-config-local.yml` to include a `deploy_subnet` flag for enabling automatic deployment.
- Enhanced error handling and logging to ensure successful subnet creation and configuration updates.

These improvements streamline the setup process for local development environments, reducing the likelihood of initialization errors related to missing subnets.
This commit updates the `ipc-subnet-config-local.yml` to change the subnet ID and adjust the Ethereum API port to avoid conflicts with Anvil. It also modifies the `ipc-subnet-manager.sh` script to streamline the genesis creation process, ensuring it works for both activated and non-activated subnets. Additionally, the `create_bootstrap_genesis` function in `lib/health.sh` is enhanced to utilize the `ipc-cli subnet create-genesis` command, improving error handling and logging for better visibility during subnet initialization. These changes enhance the reliability and usability of the IPC subnet manager for local development environments.
    /// Check if bottom-up checkpointing is enabled.
    pub fn bottomup_enabled(&self) -> bool {
        self.bottomup.as_ref().map_or(false, |config| config.enabled)
    }
Bug: Default configuration incorrectly disables feature.
The bottomup_enabled() method returns false when bottomup is None, but the Default implementation for IpcSettings sets bottomup: None. This means bottom-up checkpointing will be disabled by default for existing configurations that don't explicitly set the bottomup field, even though the intended default behavior (based on default_bottomup_enabled() returning true) is to have it enabled. The logic should be self.bottomup.as_ref().map_or(true, |config| config.enabled) to match the intended default-enabled behavior, or the Default implementation should set bottomup: Some(BottomUpSettings::default()).
    RPC_URL=http://node-1.test.ipc.space:8545
    FAUCET_AMOUNT=10
    RATE_LIMIT_WINDOW=86400000
    RATE_LIMIT_MAX=3
Bug: Sensitive Credentials Exposed in Repository
A .env file containing actual private keys has been committed to the repository. The file includes PRIVATE_KEY=0x5eda872ee2da7bc9d7e0af4507f7d5060aed54d43fd1a72e1283622400c7cb85 and a commented alternative key. Environment files with credentials should never be committed to version control - they should be in .gitignore and users should create them from a template (like .env.example). This exposes private keys that could control real accounts with funds.
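As an illustration of the template approach, the sketch below derives a `.env.example` from an existing `.env` by blanking any value whose variable name looks secret-bearing. The helper name `make_env_example` and the name patterns (`KEY`, `SECRET`, `PASSWORD`, `TOKEN`) are assumptions for this example, not something defined in the PR.

```shell
# Hypothetical helper (not part of the PR): print a sanitized copy of a
# .env file with secret-looking values blanked, suitable for .env.example.
make_env_example() {
    local env_file="$1"

    # Blank the value of any variable whose name contains KEY, SECRET,
    # PASSWORD, or TOKEN, including commented-out alternatives.
    sed -E 's/^([[:space:]]*#?[[:space:]]*[A-Za-z0-9_]*(KEY|SECRET|PASSWORD|TOKEN)[A-Za-z0-9_]*)=.*/\1=/' "$env_file"
}
```

Typical usage would be `make_env_example .env > .env.example`, with `.env` added to `.gitignore`. Keys that have already been committed should be treated as compromised and rotated, since removing the file does not remove them from history.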
This commit refactors the `fetch_metrics` function in `dashboard.sh` to improve the process of gathering metrics from validator nodes.

Key changes include:
- Replaced SSH commands with a new `exec_on_host` function for executing remote commands, enhancing consistency and reducing timeout complexity.
- Updated the method for fetching block height, network info, mempool status, and error logs to utilize local node paths for better compatibility with local deployments.
- Improved the extraction of parent height from logs to ensure accurate reporting.
- Added a note in the dashboard output to indicate when F3 is disabled for local development.

These enhancements improve the reliability and clarity of metrics reporting in the IPC subnet manager.
This commit refactors the `get_chain_id` function in `lib/health.sh` to replace SSH commands with the `exec_on_host` function for executing remote commands. This change enhances consistency and simplifies the process of querying the Ethereum chain ID via JSON-RPC, improving the overall reliability of the health check functionality in the IPC subnet manager.
    set -e

    VALIDATOR_IP="34.73.187.192"
Bug: Parameterize Environment-Specific Values in Scripts
The script contains hardcoded personal configuration values: VALIDATOR_IP="34.73.187.192" and SSH_USER="philip". These are environment-specific values that should be parameterized or read from configuration files rather than hardcoded in scripts committed to the repository. The script also contains hardcoded personal paths like /Users/philip/github/ipc/scripts/ipc-subnet-manager on line 90.
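A rough sketch of the parameterization is shown below. It assumes the values are supplied through environment variables of the same names and that the working directory should be resolved relative to the script itself; the function name `load_monitor_settings` is hypothetical.

```shell
# Hypothetical helper (not part of the PR): take environment-specific
# settings from the environment instead of hardcoding them in the script.
load_monitor_settings() {
    VALIDATOR_IP="${VALIDATOR_IP:-}"
    # Default the SSH user to the current local user when not provided.
    SSH_USER="${SSH_USER:-$(id -un)}"

    if [ -z "$VALIDATOR_IP" ]; then
        echo "error: VALIDATOR_IP is not set" >&2
        return 1
    fi

    # Resolve the directory containing this script rather than relying on
    # an absolute path from one developer's machine.
    SCRIPT_DIR="$(cd -- "$(dirname -- "$0")" && pwd)"
}
```

A script using this could then be invoked as `VALIDATOR_IP=192.0.2.10 SSH_USER=ops ./monitor.sh`, where the address, user, and script name are placeholders.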
The changes to the current files look good!
Note
Introduces a comprehensive IPC subnet manager with local/remote deployment, systemd-managed services, relayer support, config/genesis automation, and live monitoring/tuning tools, plus networking fixes.
- … `scripts/ipc-subnet-manager/` with main driver `ipc-subnet-manager.sh`, wrapper `ipc-manager`, and extensive docs.
- … (`templates/ipc-*.service.template`); switches logging to journal and proper targets; enables restart policies.
- … `--fendermint-rpc-url` and submitter from keystore.
- … `node-init.yml`, fixes resolver `listen_addr` to `0.0.0.0`, builds peer meshes (CometBFT/libp2p), and sets federated power.
- … (`~/.ipc/config.toml`) generation and optional subnet deployment/genesis creation.
- … `watch-blocks`, `watch-finality`, health checks, consensus/voting status, and parent-finality monitor scripts.

Written by Cursor Bugbot for commit 760cb2b. This will update automatically on new commits.