diff --git a/VERSIONING.md b/VERSIONING.md
deleted file mode 100644
index 93db42fcf..000000000
--- a/VERSIONING.md
+++ /dev/null
@@ -1,560 +0,0 @@
# Amp Dataset Versioning System

## Overview

The Amp platform uses a **content-addressable storage model** with **semantic versioning** for dataset manifests. This document describes the versioning architecture, storage model, and operational semantics.

## Core Concepts

### Content-Addressable Storage

Manifests are stored by their **SHA-256 content hash**, ensuring:
- **Deduplication**: Identical manifests are stored only once
- **Immutability**: Any change to the content produces a new hash
- **Integrity**: Content can be verified against its hash
- **Cacheability**: Hash-based lookups are deterministic

### Three-Tier Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                    Object Store (S3/GCS/etc)                     │
│        Manifests stored by hash: manifests/<hash>.json           │
└─────────────────────────────────────────────────────────────────┘
                                ▲
                                │
┌─────────────────────────────────────────────────────────────────┐
│                     PostgreSQL Metadata DB                       │
│                                                                  │
│  ┌────────────────┐  ┌──────────────────┐  ┌────────────────┐   │
│  │ manifest_files │  │ dataset_manifests│  │      tags      │   │
│  │                │  │                  │  │                │   │
│  │  hash -> path  │  │ (ns, name, hash) │  │ (ns,name,ver)  │   │
│  └────────────────┘  └──────────────────┘  └────────────────┘   │
│          │                    │                     │            │
│          └────────────────────┴─────────────────────┘            │
│   Content-addressable     Many-to-many         Version tags      │
│        storage               linking             (semver)        │
└─────────────────────────────────────────────────────────────────┘
```

**Layer 1: Object Store**
- Physical storage of manifest JSON files
- Path format: `manifests/<64-char-hex-hash>.json`
- Supports: S3, GCS, Azure Blob, local filesystem

**Layer 2: Manifest Registry** (`manifest_files` table)
- Maps manifest hash → object store path
- Enables existence checks without object store queries
- Tracks creation timestamps

**Layer 3a: Dataset Linking** (`dataset_manifests` table)
- Many-to-many relationship: datasets ↔ manifests
- Allows multiple datasets to share the same manifest
- Enables manifest lifecycle management (prevents deletion of in-use manifests)

**Layer 3b: Version Tags** (`tags` table)
- Human-readable names pointing to manifest hashes
- Supports semantic versions and special tags
- Enables versioned references and rollbacks

## Database Schema

### manifest_files

Stores manifest metadata and object store location.

```sql
CREATE TABLE manifest_files (
    hash TEXT PRIMARY KEY,       -- SHA-256 hash (64 hex chars, lowercase)
    path TEXT NOT NULL,          -- Object store path
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```

**Columns**:
- `hash`: SHA-256 digest of canonical manifest JSON (64 lowercase hex chars, no `0x` prefix)
- `path`: Object store path (e.g., `manifests/abc123...def.json`)
- `created_at`: Registration timestamp (UTC)

**Properties**:
- Primary key on `hash` prevents duplicate registration
- No foreign keys (this is the root table)

### dataset_manifests

Many-to-many junction table linking datasets to manifests.
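Taken together, the three tiers let a symbolic reference be resolved to bytes in the object store with ordinary joins. A hedged sketch using the tables above (the full `dataset_manifests` schema follows below):

```sql
-- Illustrative only: resolve "eth_mainnet@1.2.0" to its object store path,
-- using the schema defined in this document.
SELECT mf.path
FROM tags t
JOIN dataset_manifests dm
  ON dm.namespace = t.namespace AND dm.name = t.name AND dm.hash = t.hash
JOIN manifest_files mf
  ON mf.hash = dm.hash
WHERE t.namespace = '_'
  AND t.name = 'eth_mainnet'
  AND t.version = '1.2.0';
```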
- -```sql -CREATE TABLE dataset_manifests ( - namespace TEXT NOT NULL, - name TEXT NOT NULL, - hash TEXT NOT NULL REFERENCES manifest_files(hash) ON DELETE CASCADE, - created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), - PRIMARY KEY (namespace, name, hash) -); - -CREATE INDEX idx_dataset_manifests_hash ON dataset_manifests(hash); -CREATE INDEX idx_dataset_manifests_dataset ON dataset_manifests(namespace, name); -``` - -**Columns**: -- `namespace`: Dataset namespace (e.g., `edgeandnode`, `_`) -- `name`: Dataset name (e.g., `eth_mainnet`) -- `hash`: Reference to manifest in `manifest_files` -- `created_at`: Link creation timestamp - -**Relationships**: -- Foreign key to `manifest_files(hash)` with `CASCADE DELETE` -- Composite primary key prevents duplicate links - -**Properties**: -- One manifest can be linked to multiple datasets (deduplication) -- One dataset can have multiple manifests (version history) -- Deleting a manifest cascades to remove all links - -### tags - -Version identifiers and symbolic names pointing to manifest hashes. - -```sql -CREATE TABLE tags ( - namespace TEXT NOT NULL, - name TEXT NOT NULL, - version TEXT NOT NULL, -- Semantic version or special tag - hash TEXT NOT NULL, -- Points to specific manifest - created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), - updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), - PRIMARY KEY (namespace, name, version), - FOREIGN KEY (namespace, name, hash) - REFERENCES dataset_manifests(namespace, name, hash) - ON DELETE CASCADE -); - -CREATE INDEX idx_tags_dataset ON tags(namespace, name); -``` - -**Columns**: -- `namespace`: Dataset namespace -- `name`: Dataset name -- `version`: Version identifier (semantic version or special tag) -- `hash`: Manifest hash this version points to -- `created_at`: Tag creation timestamp -- `updated_at`: Last update timestamp (for tag moves) - -**Relationships**: -- Foreign key to `dataset_manifests(namespace, name, hash)` with `CASCADE DELETE` -- Composite primary key on `(namespace, name, version)` prevents duplicate tags - -**Tag Types**: -1. **Semantic versions**: `1.0.0`, `2.1.3`, `0.0.1-alpha` -2. **Special tags**: `latest`, `dev` - -## Revision Types - -The system supports four types of revision identifiers: - -### 1. Semantic Version - -**Format**: `MAJOR.MINOR.PATCH[-PRERELEASE][+BUILD]` - -**Examples**: -- `1.0.0` - Production release -- `2.1.3` - Minor update -- `1.0.0-alpha` - Pre-release -- `1.0.0+20241120` - Build metadata - -**Properties**: -- Follows [Semantic Versioning 2.0.0](https://semver.org/) -- User-created during registration -- Immutable once created (updating requires new registration) -- Used for production deployments - -**Resolution**: Direct lookup in `tags` table - -### 2. Manifest Hash - -**Format**: 64-character lowercase hexadecimal string (SHA-256) - -**Example**: `b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9` - -**Properties**: -- Computed from canonical JSON serialization -- Immutable and globally unique -- Content-addressable -- Used for exact manifest references - -**Resolution**: Direct lookup in `manifest_files` table - -### 3. Special Tag: "latest" - -**Meaning**: The highest semantic version (by semver ordering) - -**Properties**: -- Automatically updated when a higher version is registered -- Transactional: uses `SELECT FOR UPDATE` to prevent race conditions -- Only considers semantic versions (excludes pre-releases by default) -- Used for production "stable" deployments - -**Resolution**: Query `tags` table for highest semver, return its hash - -### 4. 
Special Tag: "dev"

**Meaning**: The most recently registered manifest

**Properties**:
- Automatically updated on every `POST /datasets` call
- Used for development and testing
- May point to any manifest (no version requirements)
- Mutable and frequently changing

**Resolution**: Direct lookup in the `tags` table

## Registration Flow

### Two-Step Atomic Operation

The `POST /datasets` endpoint performs two atomic operations:

#### Step 1: Register Manifest and Link to Dataset

```
POST /datasets
Body: { namespace, name, version, manifest }

↓ Parse and validate manifest structure
↓ Canonicalize manifest (deterministic JSON serialization)
↓ Compute SHA-256 hash from canonical JSON
↓
↓ Call dataset_store.register_manifest_and_link():
  │
  ├─ Store manifest in object store
  │  └─ PUT manifests/<hash>.json
  │     └─ Idempotent: content-addressable storage
  │
  ├─ Register in manifest_files table
  │  └─ INSERT INTO manifest_files (hash, path)
  │     └─ ON CONFLICT DO NOTHING (idempotent)
  │
  └─ BEGIN TRANSACTION
     │
     ├─ Link manifest to dataset
     │  └─ INSERT INTO dataset_manifests (namespace, name, hash)
     │     └─ ON CONFLICT DO NOTHING (idempotent)
     │
     └─ Update "dev" tag
        └─ UPSERT INTO tags (namespace, name, 'dev', hash)
           └─ Always points to the most recently registered manifest

     COMMIT TRANSACTION
```

**Properties**:
- **Idempotent**: Safe to retry if the transaction fails
- **Atomic**: The "dev" tag always points to a valid, linked manifest
- **Automatic**: No manual "dev" tag management required

#### Step 2: Create/Update Version Tag

```
↓ Call dataset_store.set_dataset_version_tag(namespace, name, version, hash):
  │
  └─ BEGIN TRANSACTION
     │
     ├─ Upsert version tag
     │  └─ UPSERT INTO tags (namespace, name, version, hash)
     │     └─ Updates updated_at on conflict
     │
     ├─ Lock "latest" tag row
     │  └─ SELECT * FROM tags WHERE version = 'latest' FOR UPDATE
     │     └─ Prevents concurrent modifications
     │
     ├─ Compare versions
     │  └─ IF version > current_latest OR current_latest IS NULL:
     │
     └─ Update "latest" tag (if needed)
        └─ UPSERT INTO tags (namespace, name, 'latest', hash)
           └─ Only if the new version is higher

     COMMIT TRANSACTION
```

**Properties**:
- **Idempotent**: Re-registering the same version with the same manifest succeeds
- **Atomic**: The "latest" tag always points to the highest version
- **Transactional**: A row-level lock prevents race conditions
- **Automatic**: No manual "latest" tag management required
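Over the wire, both steps are driven by a single HTTP request. A hedged sketch against a locally running Admin API (port 1610, matching the quick-start template later in this change; the payload values are illustrative):

```bash
curl -X POST http://localhost:1610/datasets \
  -H "Content-Type: application/json" \
  -d '{"namespace": "_", "name": "eth_mainnet", "version": "1.2.0", "manifest": "{ ... }"}'
```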
}" -} - -↓ Canonicalize and hash manifest → abc123...def - -Step 1: register_manifest_and_link() - ├─ Store manifests/abc123...def.json in S3 - ├─ INSERT INTO manifest_files (hash='abc123...def', path='manifests/abc123...def.json') - ├─ INSERT INTO dataset_manifests (namespace='_', name='eth_mainnet', hash='abc123...def') - └─ UPSERT tags (namespace='_', name='eth_mainnet', version='dev', hash='abc123...def') - -Step 2: set_dataset_version_tag() - ├─ UPSERT tags (namespace='_', name='eth_mainnet', version='1.2.0', hash='abc123...def') - └─ IF 1.2.0 > current_latest: - └─ UPSERT tags (namespace='_', name='eth_mainnet', version='latest', hash='abc123...def') - -Result: - tags table now contains: - (_, eth_mainnet, '1.2.0', abc123...def) - (_, eth_mainnet, 'dev', abc123...def) - (_, eth_mainnet, 'latest', abc123...def) ← Only if 1.2.0 was highest -``` - -## Revision Resolution - -### Resolution Algorithm - -The `resolve_dataset_revision()` method converts a revision reference into a concrete manifest hash: - -```rust -pub async fn resolve_dataset_revision( - namespace: &Namespace, - name: &Name, - revision: &Revision, -) -> Result> { - match revision { - // Hash is already concrete, just verify it exists - Revision::Hash(hash) => { - let exists = manifest_exists(hash).await?; - Ok(exists.then_some(hash.clone())) - } - - // Semantic version: direct lookup in tags table - Revision::Version(version) => { - SELECT hash FROM tags - WHERE namespace = $1 AND name = $2 AND version = $3 - } - - // Latest: find highest semantic version - Revision::Latest => { - SELECT hash FROM tags - WHERE namespace = $1 AND name = $2 - AND version ~ '^[0-9]+\.[0-9]+\.[0-9]+.*' -- Semantic versions only - ORDER BY version DESC -- Semver ordering - LIMIT 1 - } - - // Dev: direct lookup in tags table - Revision::Dev => { - SELECT hash FROM tags - WHERE namespace = $1 AND name = $2 AND version = 'dev' - } - } -} -``` - -### Resolution Examples - -**Example 1: Version → Hash** -``` -Input: namespace="_", name="eth_mainnet", revision="1.2.0" -Query: SELECT hash FROM tags WHERE namespace='_' AND name='eth_mainnet' AND version='1.2.0' -Output: Some("abc123...def") -``` - -**Example 2: Latest → Hash** -``` -Input: namespace="_", name="eth_mainnet", revision="latest" -Query: SELECT hash FROM tags WHERE namespace='_' AND name='eth_mainnet' - AND version LIKE '[0-9]%' ORDER BY version DESC LIMIT 1 -Output: Some("abc123...def") -- Hash of version 1.2.0 (highest) -``` - -**Example 3: Dev → Hash** -``` -Input: namespace="_", name="eth_mainnet", revision="dev" -Query: SELECT hash FROM tags WHERE namespace='_' AND name='eth_mainnet' AND version='dev' -Output: Some("xyz789...abc") -- Most recently registered manifest -``` - -**Example 4: Hash → Hash** -``` -Input: namespace="_", name="eth_mainnet", revision="abc123...def" -Query: SELECT path FROM manifest_files WHERE hash='abc123...def' -Output: Some("abc123...def") -- Verified to exist -``` - -## Deployment Flow - -When deploying a dataset via `POST /datasets/{namespace}/{name}/versions/{revision}/deploy`: - -``` -1. Resolve revision to manifest hash - ↓ - revision="latest" → hash="abc123...def" - -2. Find which version tag points to this hash - ↓ - Query: SELECT version FROM tags - WHERE namespace=$1 AND name=$2 AND hash=$3 - AND version ~ '^[0-9]' -- Semantic versions only - ↓ - Result: version="1.2.0" - -3. Load full dataset using resolved version - ↓ - dataset_store.get_dataset(name="eth_mainnet", version="1.2.0") - -4. 
## Deployment Flow

When deploying a dataset via `POST /datasets/{namespace}/{name}/versions/{revision}/deploy`:

```
1. Resolve revision to manifest hash
   ↓
   revision="latest" → hash="abc123...def"

2. Find which version tag points to this hash
   ↓
   Query: SELECT version FROM tags
          WHERE namespace=$1 AND name=$2 AND hash=$3
          AND version ~ '^[0-9]'  -- Semantic versions only
   ↓
   Result: version="1.2.0"

3. Load full dataset using resolved version
   ↓
   dataset_store.get_dataset(name="eth_mainnet", version="1.2.0")

4. Schedule extraction job
   ↓
   scheduler.schedule_dataset_dump(dataset, end_block, parallelism)
```

**Why resolve to a version instead of using the hash directly?**
- The scheduler needs a version identifier for job tracking
- Historical jobs are recorded with version tags, not hashes
- This allows correlating deployed jobs with their registered versions
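The deploy endpoint itself takes an empty JSON body. A sketch against a local Admin API, mirroring the commands in the quick-start template later in this change (names and revision are illustrative):

```bash
curl -X POST http://localhost:1610/datasets/_/eth_mainnet/versions/latest/deploy \
  -H "Content-Type: application/json" \
  -d '{}'
```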
## Lifecycle Management

### Manifest Deletion

Manifests can only be deleted if **no datasets link to them**:

```sql
-- Check for links
SELECT COUNT(*) FROM dataset_manifests WHERE hash = $1;

-- If count = 0, safe to delete
BEGIN TRANSACTION;
  DELETE FROM manifest_files WHERE hash = $1;      -- Cascades to tags
  DELETE manifests/<hash>.json FROM object store;  -- Pseudo-SQL: object store call
COMMIT;
```

**Garbage Collection**:
- Query `list_orphaned_manifests()` to find unlinked manifests
- Batch-delete orphaned manifests periodically
- Prevents object store bloat

### Tag Lifecycle

**Semantic Version Tags**:
- Created explicitly during registration
- Never automatically deleted
- Can be manually deleted (leaves the manifest intact)

**"latest" Tag**:
- Created/updated automatically when a higher version is registered
- Never deleted (unless the dataset is deleted)
- Always points to the highest semver

**"dev" Tag**:
- Created/updated automatically on every registration
- Never deleted (unless the dataset is deleted)
- Always points to the most recent registration

### Dataset Deletion

Deleting a dataset cascades through the schema:

```sql
-- Delete dataset link
DELETE FROM dataset_manifests WHERE namespace = $1 AND name = $2;

-- Cascades to:
-- 1. All tags for this dataset (via FK constraint)
-- 2. If a manifest has no other links, it becomes eligible for GC
```

## Concurrency and Consistency

### Race Condition Prevention

**Problem**: Two concurrent registrations of different versions could leave the "latest" tag inconsistent.

**Solution**: Row-level locking with `SELECT FOR UPDATE`

```sql
BEGIN TRANSACTION;
  -- Lock the "latest" row to prevent concurrent modifications
  SELECT * FROM tags
  WHERE namespace = $1 AND name = $2 AND version = 'latest'
  FOR UPDATE;

  -- We now have exclusive access to decide whether to update "latest";
  -- other transactions block here until we commit

  IF new_version > current_latest THEN
    UPDATE tags SET hash = $3 WHERE ... version = 'latest';
  END IF;
COMMIT;
```

### Idempotency

All operations are idempotent:

**Manifest Storage**:
- Content-addressable: same content → same hash → same path
- Object store: `PUT` is idempotent
- Database: `ON CONFLICT DO NOTHING`

**Dataset Linking**:
- `INSERT ... ON CONFLICT DO NOTHING`
- Safe to retry on transaction failure

**Version Tagging**:
- `INSERT ... ON CONFLICT DO UPDATE`
- Re-tagging the same version with the same hash succeeds (no-op)
- Re-tagging the same version with a different hash updates the tag (version change)

## Best Practices

### For Development

- Use the `dev` tag for active development
- Register manifests without semantic versions (auto-updates `dev`)
- Deploy using `--reference namespace/name@dev`

### For Production

- Use semantic versions: `1.0.0`, `1.1.0`, etc.
- Deploy using `--reference namespace/name@latest` (stable)
- Pin critical deployments to specific versions: `--reference namespace/name@1.0.0`
- Never manually modify the `latest` tag (it is auto-managed)

### For Reproducibility

- Reference by manifest hash when exact reproducibility is required
- The hash never changes and is guaranteed immutable
- Use for audits, compliance, and research

### Version Numbering

Follow semantic versioning:
- **MAJOR**: Breaking changes to schema or queries
- **MINOR**: New tables or backward-compatible features
- **PATCH**: Bug fixes and optimizations

## Related Files

**Database**:
- `crates/core/metadata-db/migrations/20251016093912_add_manifests_and_tags_tables.sql` - Schema definition

**Rust Implementation**:
- `crates/core/dataset-store/src/lib.rs` - Core versioning API
- `crates/services/admin-api/src/handlers/datasets/register.rs` - Registration handler
- `crates/services/admin-api/src/handlers/datasets/deploy.rs` - Deployment handler

**Type Definitions**:
- `crates/common/datasets-common/src/revision.rs` - Revision enum (Rust)
- `crates/common/datasets-common/src/version.rs` - Semantic version (Rust)
- `crates/common/datasets-common/src/hash.rs` - Manifest hash (Rust)
- `typescript/amp/src/Model.ts` - Revision types (TypeScript)
diff --git a/typescript/amp/src/cli/commands/init.ts b/typescript/amp/src/cli/commands/init.ts
new file mode 100644
index 000000000..b5ab53f11
--- /dev/null
+++ b/typescript/amp/src/cli/commands/init.ts
@@ -0,0 +1,279 @@
import * as Command from "@effect/cli/Command"
import * as Options from "@effect/cli/Options"
import * as Prompt from "@effect/cli/Prompt"
import * as FileSystem from "@effect/platform/FileSystem"
import * as Path from "@effect/platform/Path"
import * as Console from "effect/Console"
import * as Effect from "effect/Effect"
import * as Match from "effect/Match"
import * as Option from "effect/Option"
import { localEvmRpc } from "../templates/local-evm-rpc.ts"
import type { Template, TemplateAnswers } from "../templates/Template.ts"
import { resolveTemplateFile, TemplateError } from "../templates/Template.ts"

/**
 * Available templates
 */
const TEMPLATES: Record<string, Template> = {
  "local-evm-rpc": localEvmRpc,
}

/**
 * Interactive prompts for dataset configuration
 */
const templatePrompt = Prompt.select({
  message: "Select a template to get started:",
  choices: [
    {
      title: "Local EVM RPC - Learn Amp with Anvil",
      description: "Local blockchain with 500 sample events. No external dependencies.",
      value: "local-evm-rpc",
    },
  ],
})

const datasetNamePrompt = Prompt.text({
  message: "Dataset name:",
  default: "my_dataset",
  validate: (input) =>
    /^[a-z_][a-z0-9_]*$/.test(input)
      ? Effect.succeed(input)
      : Effect.fail(
        "Dataset name must start with a letter or underscore and contain only lowercase letters, digits, and underscores",
      ),
})

const datasetVersionPrompt = Prompt.text({
  message: "Dataset version:",
  default: "0.1.0",
})

const projectNamePrompt = Prompt.text({
  message: "Project name:",
  default: "amp_project",
  validate: (input) =>
    input.trim().length > 0
      ? Effect.succeed(input.trim())
      : Effect.fail("Project name cannot be empty"),
})
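// Note: the dataset-name rule is intentionally enforced twice: interactively
// by datasetNamePrompt above, and again by validateDatasetName below so the
// non-interactive (--dataset-name / -y) path gets the same check.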
/**
 * Gets a template by name or returns an error
 */
const getTemplate = (name: string): Effect.Effect<Template, TemplateError> => {
  return Match.value(TEMPLATES[name]).pipe(
    Match.when(
      (t): t is Template => t !== undefined,
      (t) => Effect.succeed(t),
    ),
    Match.orElse(() =>
      Effect.fail(
        new TemplateError({
          message: `Template "${name}" not found. Available templates: ${Object.keys(TEMPLATES).join(", ")}`,
        }),
      )
    ),
  )
}

/**
 * Writes template files to the target directory
 */
const writeTemplateFiles = (
  template: Template,
  answers: TemplateAnswers,
  targetPath: string,
): Effect.Effect<void, TemplateError, FileSystem.FileSystem | Path.Path> =>
  Effect.gen(function*() {
    const fs = yield* FileSystem.FileSystem
    const path = yield* Path.Path

    // Ensure target directory exists
    yield* fs.makeDirectory(targetPath, { recursive: true }).pipe(
      Effect.mapError(
        (cause) =>
          new TemplateError({
            message: `Failed to create directory: ${targetPath}`,
            cause,
          }),
      ),
    )

    // Write each file
    for (const [filePath, fileContent] of Object.entries(template.files)) {
      const fullPath = path.join(targetPath, filePath)
      const content = resolveTemplateFile(fileContent, answers)

      // Ensure parent directory exists
      const parentDir = path.dirname(fullPath)
      yield* fs.makeDirectory(parentDir, { recursive: true }).pipe(
        Effect.mapError(
          (cause) =>
            new TemplateError({
              message: `Failed to create directory: ${parentDir}`,
              cause,
            }),
        ),
      )

      // Write file
      yield* fs.writeFileString(fullPath, content).pipe(
        Effect.mapError(
          (cause) =>
            new TemplateError({
              message: `Failed to write file: ${fullPath}`,
              cause,
            }),
        ),
      )

      yield* Console.log(`  ✓ Created ${filePath}`)
    }
  })

/**
 * Validates dataset name format
 */
const validateDatasetName = (name: string): Effect.Effect<string, TemplateError> => {
  return /^[a-z_][a-z0-9_]*$/.test(name)
    ? Effect.succeed(name)
    : Effect.fail(
      new TemplateError({
        message:
          `Invalid dataset name: "${name}". Dataset name must start with a letter or underscore and contain only lowercase letters, digits, and underscores`,
      }),
    )
}

/**
 * Core initialization logic
 */
const initializeProject = (
  datasetName: string,
  datasetVersion: string,
  projectName: string,
): Effect.Effect<void, TemplateError, FileSystem.FileSystem | Path.Path> =>
  Effect.gen(function*() {
    const path = yield* Path.Path
    const fs = yield* FileSystem.FileSystem

    // Validate dataset name
    yield* validateDatasetName(datasetName)

    // Use current directory
    const targetPath = path.resolve(".")

    // Check if amp.config.ts already exists
    const configPath = path.join(targetPath, "amp.config.ts")
    const configExists = yield* fs.exists(configPath).pipe(
      Effect.mapError(
        (cause) =>
          new TemplateError({
            message: `Failed to check for existing amp.config.ts`,
            cause,
          }),
      ),
    )

    if (configExists) {
      return yield* Effect.fail(
        new TemplateError({
          message:
            `amp.config.ts already exists in this directory. 
Remove it or run amp init in a different directory.`, + }), + ) + } + + // Get template (only local-evm-rpc for now) + const template = yield* getTemplate("local-evm-rpc") + + // Prepare answers + const answers: TemplateAnswers = { + projectName, + datasetName, + datasetVersion, + network: "anvil", + } + + yield* Console.log(`\nInitializing Amp project with template: ${template.name}`) + yield* Console.log(`Target directory: ${targetPath}\n`) + + // Write files + yield* writeTemplateFiles(template, answers, targetPath) + + // Run post-install hook if present + if (template.postInstall) { + yield* Console.log("\nRunning post-install") + yield* template.postInstall(targetPath) + } + + yield* Console.log(`\nProject initialized successfully\n`) + yield* Console.log(`See README.md for next steps\n`) + }) + +/** + * Initialize command with both interactive and non-interactive modes + * - Interactive mode: Prompts user for values (default when no flags provided) + * - Non-interactive mode: Uses flags or defaults (when --flags or -y provided) + */ +export const init = Command.make("init", { + args: { + datasetName: Options.text("dataset-name").pipe( + Options.withDescription("Dataset identifier (lowercase, alphanumeric, underscore only). Example: my_dataset"), + Options.optional, + ), + datasetVersion: Options.text("dataset-version").pipe( + Options.withDescription("Semantic version for the dataset. Example: 0.1.0"), + Options.optional, + ), + projectName: Options.text("project-name").pipe( + Options.withDescription("Human-readable project name used in generated README. Example: \"My Project\""), + Options.optional, + ), + yes: Options.boolean("yes").pipe( + Options.withAlias("y"), + Options.withDescription("Non-interactive mode: use default values without prompting"), + Options.withDefault(false), + ), + }, +}).pipe( + Command.withDescription("Initialize a new Amp project from a template"), + Command.withHandler(({ args }) => + Effect.gen(function*() { + // Determine if we should use interactive mode or flag-based mode + const hasAnyFlags = Option.isSome(args.datasetName) || + Option.isSome(args.datasetVersion) || + Option.isSome(args.projectName) + + // If -y/--yes flag is set, use all defaults + if (args.yes && !hasAnyFlags) { + yield* initializeProject("my_dataset", "0.1.0", "amp_project") + return + } + + // If any flags are provided, validate and use them (non-interactive mode) + if (hasAnyFlags) { + const datasetName = Option.getOrElse(args.datasetName, () => "my_dataset") + const datasetVersion = Option.getOrElse(args.datasetVersion, () => "0.1.0") + const projectName = Option.getOrElse(args.projectName, () => "amp_project") + + yield* initializeProject(datasetName, datasetVersion, projectName) + return + } + + // Interactive mode - prompt for all values + const templateChoice = yield* templatePrompt + const datasetNameAnswer = yield* datasetNamePrompt + const datasetVersionAnswer = yield* datasetVersionPrompt + const projectNameAnswer = yield* projectNamePrompt + + // Add spacing before initialization output + yield* Console.log("") + + // Verify template exists (should always pass with select prompt, but be defensive) + yield* getTemplate(templateChoice) + + yield* initializeProject(datasetNameAnswer, datasetVersionAnswer, projectNameAnswer) + }) + ), +) diff --git a/typescript/amp/src/cli/main.ts b/typescript/amp/src/cli/main.ts index 2eed7581f..2d3d05bcc 100755 --- a/typescript/amp/src/cli/main.ts +++ b/typescript/amp/src/cli/main.ts @@ -20,6 +20,7 @@ import { auth } from 
"./commands/auth/index.ts" import { build } from "./commands/build.ts" import { deploy } from "./commands/deploy.ts" import { dev } from "./commands/dev.ts" +import { init } from "./commands/init.ts" import { proxy } from "./commands/proxy.ts" import { publish } from "./commands/publish.ts" import { query } from "./commands/query.ts" @@ -39,7 +40,7 @@ const amp = Command.make("amp", { }, }).pipe( Command.withDescription("The Amp Command Line Interface"), - Command.withSubcommands([build, dev, deploy, query, proxy, register, publish, studio, auth]), + Command.withSubcommands([auth, build, deploy, dev, init, proxy, publish, query, register, studio]), Command.provide(({ args }) => Logger.minimumLogLevel(args.logs)), ) diff --git a/typescript/amp/src/cli/templates/Template.ts b/typescript/amp/src/cli/templates/Template.ts new file mode 100644 index 000000000..0927859c6 --- /dev/null +++ b/typescript/amp/src/cli/templates/Template.ts @@ -0,0 +1,59 @@ +import type * as Effect from "effect/Effect" +import * as Schema from "effect/Schema" + +/** + * Answers to template prompts + */ +export interface TemplateAnswers { + readonly projectName?: string | undefined + readonly datasetName: string + readonly datasetVersion?: string | undefined + readonly network?: string | undefined +} + +/** + * A file in a template that can be either a static string or dynamically generated + */ +export type TemplateFile = string | ((answers: TemplateAnswers) => string) + +/** + * Template definition + */ +export interface Template { + /** + * Template identifier (e.g., "local-evm-rpc") + */ + readonly name: string + + /** + * Human-readable description + */ + readonly description: string + + /** + * Map of file paths to their content + * File paths are relative to the project root + */ + readonly files: Record + + /** + * Optional post-installation hook + * Runs after files are written + */ + readonly postInstall?: (projectPath: string) => Effect.Effect +} + +/** + * Error type for template operations + */ +export class TemplateError extends Schema.TaggedError("TemplateError")("TemplateError", { + cause: Schema.Unknown.pipe(Schema.optional), + message: Schema.String, +}) {} + +/** + * Resolves a template file to its string content + */ +export const resolveTemplateFile = (file: TemplateFile, answers: TemplateAnswers): string => { + return typeof file === "function" ? file(answers) : file +} diff --git a/typescript/amp/src/cli/templates/local-evm-rpc.ts b/typescript/amp/src/cli/templates/local-evm-rpc.ts new file mode 100644 index 000000000..5518f6b85 --- /dev/null +++ b/typescript/amp/src/cli/templates/local-evm-rpc.ts @@ -0,0 +1,743 @@ +import type { Template, TemplateAnswers } from "./Template.ts" + +/** + * Local EVM RPC template for Anvil-based development + * + * This template provides a quick-start setup for learning Amp with a local blockchain. + * It includes a sample contract that generates 500 events for immediate querying. 
+ */ +export const localEvmRpc: Template = { + name: "local-evm-rpc", + description: "Local development with Anvil and sample data", + files: { + "amp.config.ts": (answers: TemplateAnswers) => + `import { defineDataset } from "@edgeandnode/amp" + +const event = (signature: string) => { + return \` + SELECT block_hash, tx_hash, block_num, timestamp, address, + evm_decode_log(topic1, topic2, topic3, data, '\${signature}') as event + FROM anvil_rpc.logs + WHERE topic0 = evm_topic('\${signature}') + \` +} + +const dataEmitted = event("DataEmitted(uint256 indexed id, address indexed sender, uint256 value, string message)") + +export default defineDataset(() => ({ + name: "${answers.datasetName}", + version: "${answers.datasetVersion || "0.1.0"}", + network: "anvil", + dependencies: { + anvil_rpc: "_/anvil_rpc@0.1.0", + }, + tables: { + events: { + sql: \` + SELECT + e.block_hash, + e.tx_hash, + e.address, + e.block_num, + e.timestamp, + e.event['id'] as event_id, + e.event['sender'] as sender, + e.event['value'] as value, + e.event['message'] as message + FROM (\${dataEmitted}) as e + ORDER BY e.block_num, e.event['id'] + \`, + }, + }, +})) +`, + + "README.md": (answers: TemplateAnswers) => + `# ${answers.projectName || answers.datasetName} + +Learn Amp with 500 sample events on a local blockchain. + +## What You Get + +- **500 events** ready to query immediately after setup +- **Local Anvil testnet** - no external dependencies +- **Sample queries** demonstrating Amp's SQL capabilities +- **Educational walkthrough** of Amp's data flow + +## Quick Start (5 steps, ~2 minutes) + +### Prerequisites + +Install these once: + +\`\`\`bash +# Amp daemon (extraction & query engine) +curl --proto '=https' --tlsv1.2 -sSf https://ampup.sh/install | sh + +# Foundry (Anvil testnet & Forge) +curl -L https://foundry.paradigm.xyz | bash && foundryup + +# jq (JSON processing - needed for dataset registration) +# macOS: +brew install jq +# Linux: +sudo apt-get install jq +\`\`\` + +### Step 1: Start Anvil + +Open a terminal and start the local blockchain: + +\`\`\`bash +anvil +\`\`\` + +**Keep this running.** You should see: +\`\`\` +Listening on 127.0.0.1:8545 +\`\`\` + +### Step 2: Generate 500 Sample Events + +In a **new terminal**, deploy the contract to generate events: + +\`\`\`bash +cd contracts + +# Install dependencies +forge install foundry-rs/forge-std --no-git + +# Deploy contract and emit 500 events +forge script script/EventEmitter.s.sol \\ + --broadcast \\ + --rpc-url http://localhost:8545 \\ + --sender 0xf39fd6e51aad88f6f4ce6ab8827279cfffb92266 \\ + --private-key 0xac0974bec39a17e36ba4a6b4d238ff944bacb478cbed5efcae784d7bf4f2ff80 + +cd .. 
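
# Optional sanity check (assumes Foundry's cast is on PATH):
# a non-zero block number confirms the script broadcast transactions.
cast block-number --rpc-url http://localhost:8545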
+\`\`\` + +This will: +- Deploy the EventEmitter contract +- Emit 500 \`DataEmitted\` events with different values +- Take ~5-10 seconds to complete + +### Step 3: Start Amp + +Start the Amp development server: + +\`\`\`bash +AMP_CONFIG=amp.toml ampd dev +\`\`\` + +**Keep this running.** You should see: +\`\`\` +Admin API started on port 1610 +Query servers started (Arrow Flight: 1602, JSON Lines: 1603) +\`\`\` + +### Step 4: Register Dataset + +In a **new terminal**, register the dataset with Amp: + +\`\`\`bash +# Register the provider (tells Amp where Anvil is) +curl -X POST http://localhost:1610/providers \\ + -H "Content-Type: application/json" \\ + -d '{ + "name": "anvil_rpc", + "kind": "evm-rpc", + "network": "anvil", + "url": "http://localhost:8545" + }' + +# Build registration payload +jq -n \\ + --arg manifest "$(cat manifests/anvil_rpc.json | jq -c .)" \\ + '{namespace: "_", name: "anvil_rpc", version: "0.1.0", manifest: $manifest}' \\ + > /tmp/register.json + +# Register the dataset +curl -X POST http://localhost:1610/datasets \\ + -H "Content-Type: application/json" \\ + -d @/tmp/register.json + +# Deploy to trigger extraction +curl -X POST http://localhost:1610/datasets/_/anvil_rpc/versions/0.1.0/deploy \\ + -H "Content-Type: application/json" \\ + -d '{}' +\`\`\` + +Wait ~10 seconds for extraction to complete. Watch the \`ampd dev\` terminal for extraction progress. + +### Step 5: Query Your Data + +Query the 500 events: + +\`\`\`bash +# Simple count +curl -X POST http://localhost:1603 --data "SELECT COUNT(*) as total_events FROM anvil_rpc.logs" + +# View first 10 events +curl -X POST http://localhost:1603 --data "SELECT * FROM anvil_rpc.logs LIMIT 10" + +# Get event distribution +curl -X POST http://localhost:1603 --data " + SELECT block_num, COUNT(*) as events_in_block + FROM anvil_rpc.logs + GROUP BY block_num + ORDER BY block_num" +\`\`\` + +**Success!** You now have a working Amp setup with real blockchain data. + +### Optional: Use Amp Studio (Visual Query Interface) + +For a better developer experience, use Amp Studio - a web-based query playground: + +\`\`\`bash +# If using published package (requires studio to be built) +npx @edgeandnode/amp studio + +# Or if developing Amp locally +bun /path/to/typescript/amp/src/cli/bun.ts studio +\`\`\` + +This opens a visual query builder at \`http://localhost:1615\` with: +- Syntax highlighting and auto-completion +- Formatted query results +- Pre-populated example queries from your dataset +- Real-time error feedback + +**Note**: Studio requires the frontend to be built. If it's not available, use curl commands below. + +--- + +## Learning Amp: Query Examples + +These examples demonstrate Amp's capabilities. Try them! + +### 1. Basic Filtering + +\`\`\`bash +# Events from a specific block +curl -X POST http://localhost:1603 --data " + SELECT * FROM anvil_rpc.logs + WHERE block_num = 1" + +# Events from blocks 1-5 +curl -X POST http://localhost:1603 --data " + SELECT block_num, log_index, address + FROM anvil_rpc.logs + WHERE block_num BETWEEN 1 AND 5" +\`\`\` + +### 2. Aggregations + +\`\`\`bash +# Count events per block +curl -X POST http://localhost:1603 --data " + SELECT block_num, COUNT(*) as event_count + FROM anvil_rpc.logs + GROUP BY block_num + ORDER BY block_num" + +# Get min/max/avg block numbers +curl -X POST http://localhost:1603 --data " + SELECT + MIN(block_num) as first_block, + MAX(block_num) as last_block, + COUNT(DISTINCT block_num) as total_blocks + FROM anvil_rpc.logs" +\`\`\` + +### 3. 
Decoded Events (Using Your Dataset) + +Your \`amp.config.ts\` defines decoded tables. Query them: + +\`\`\`bash +# Get decoded events (requires TypeScript dataset to be running) +curl -X POST http://localhost:1603 --data " + SELECT event_id, sender, value, message + FROM ${answers.datasetName}.events + LIMIT 10" + +# Aggregate by message +curl -X POST http://localhost:1603 --data " + SELECT message, COUNT(*) as count + FROM ${answers.datasetName}.events + GROUP BY message + ORDER BY count DESC" +\`\`\` + +### 4. Joining Tables + +\`\`\`bash +# Join logs with transactions +curl -X POST http://localhost:1603 --data " + SELECT + l.block_num, + l.log_index, + t.tx_hash, + t.from, + t.gas_used + FROM anvil_rpc.logs l + JOIN anvil_rpc.transactions t ON l.tx_hash = t.tx_hash + LIMIT 10" + +# Join with blocks +curl -X POST http://localhost:1603 --data " + SELECT + b.block_num, + b.timestamp, + COUNT(l.log_index) as event_count + FROM anvil_rpc.blocks b + LEFT JOIN anvil_rpc.logs l ON b.hash = l.block_hash + GROUP BY b.block_num, b.timestamp + ORDER BY b.block_num" +\`\`\` + +### 5. Advanced: Using Amp UDFs + +Amp provides custom SQL functions for Ethereum data: + +\`\`\`bash +# Get event topic hash +curl -X POST http://localhost:1603 --data " + SELECT evm_topic('DataEmitted(uint256,address,uint256,string)') as topic_hash" + +# Decode event manually +curl -X POST http://localhost:1603 --data " + SELECT + block_num, + evm_decode_log( + topic1, + topic2, + topic3, + data, + 'DataEmitted(uint256 indexed id, address indexed sender, uint256 value, string message)' + ) as decoded + FROM anvil_rpc.logs + WHERE topic0 = evm_topic('DataEmitted(uint256,address,uint256,string)') + LIMIT 5" +\`\`\` + +### 6. Performance: Working with All 500 Events + +\`\`\`bash +# Count all events by block +curl -X POST http://localhost:1603 --data " + SELECT block_num, COUNT(*) as events + FROM anvil_rpc.logs + GROUP BY block_num" + +# Find blocks with most events +curl -X POST http://localhost:1603 --data " + SELECT block_num, COUNT(*) as event_count + FROM anvil_rpc.logs + GROUP BY block_num + ORDER BY event_count DESC + LIMIT 10" +\`\`\` + +--- + +## Project Structure + +\`\`\` +. +├── amp.config.ts # TypeScript dataset (defines decoded event tables) +├── amp.toml # Amp daemon configuration +├── manifests/ +│ └── anvil_rpc.json # Raw dataset definition (blocks, logs, transactions) +├── providers/ +│ └── anvil.toml # Anvil RPC endpoint config +├── data/ # Parquet files (auto-created during extraction) +├── contracts/ +│ ├── src/ +│ │ └── EventEmitter.sol # Sample contract (emits 500 events) +│ ├── script/ +│ │ └── EventEmitter.s.sol # Deployment script +│ └── foundry.toml +└── README.md # This file +\`\`\` + +## How Amp Works: The Data Flow + +Understanding the architecture helps you extend this template: + +1. **Data Source** → Anvil local blockchain running on port 8545 +2. **Provider** → \`providers/anvil.toml\` tells Amp how to connect +3. **Raw Dataset** → \`manifests/anvil_rpc.json\` defines extraction schema +4. **Extraction** → \`ampd\` pulls data and stores as Parquet files in \`data/\` +5. **SQL Dataset** → \`amp.config.ts\` transforms raw data (decodes events) +6. **Query** → Query via HTTP (port 1603) or Arrow Flight (port 1602) + +**Key Insight**: Raw data (blocks, logs, transactions) is extracted once. Your SQL views in \`amp.config.ts\` transform it on-the-fly during queries. 
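
You can see the raw/derived split directly (the second query assumes your TypeScript dataset is registered, as in the examples above):

\`\`\`bash
# Raw: undecoded log topics and payload, extracted once into Parquet
curl -X POST http://localhost:1603 --data "SELECT topic0, data FROM anvil_rpc.logs LIMIT 3"

# Derived: the same rows decoded on-the-fly by your SQL view
curl -X POST http://localhost:1603 --data "SELECT event_id, message FROM ${answers.datasetName}.events LIMIT 3"
\`\`\`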
+ +## The Sample Contract + +The \`EventEmitter.sol\` contract emits one event type: + +\`\`\`solidity +event DataEmitted( + uint256 indexed id, // Sequential ID (0-499) + address indexed sender, // Event sender + uint256 value, // Random value + string message // Rotating message +); +\`\`\` + +The deployment script emits 500 events with varied data to demonstrate: +- Filtering by indexed fields (\`id\`, \`sender\`) +- Decoding non-indexed fields (\`value\`, \`message\`) +- Aggregating across blocks +- Joining with transaction data + +## Next Steps + +### 1. Deploy Your Own Contract + +Replace the sample contract: + +\`\`\`bash +# Add your contract to contracts/src/YourContract.sol +# Create deployment script in contracts/script/YourContract.s.sol +forge script script/YourContract.s.sol --broadcast --rpc-url http://localhost:8545 +\`\`\` + +### 2. Extract Your Events + +Update \`amp.config.ts\`: + +\`\`\`typescript +const myEvent = event("MyEvent(address indexed user, uint256 amount)") + +export default defineDataset(() => ({ + // ... existing config + tables: { + my_events: { + sql: \` + SELECT + e.block_num, + e.event['user'] as user, + e.event['amount'] as amount + FROM (\${myEvent}) as e + \`, + }, + }, +})) +\`\`\` + +### 3. Move to Testnet/Mainnet + +When ready to work with real networks: + +1. Update \`providers/\` with real RPC endpoints +2. Change \`network\` in manifests +3. Adjust \`start_block\` to avoid extracting entire chain +4. Use \`finalized_blocks_only: true\` for production + +### 4. Explore Advanced Features + +- **Custom UDFs** - Write JavaScript functions in your dataset +- **Dependencies** - Build datasets on top of other datasets +- **Materialization** - Cache computed results for performance +- **Real-time streaming** - Subscribe to new data as it arrives + +## Troubleshooting + +**No events showing up?** +\`\`\`bash +# Check Anvil is running +curl -X POST http://localhost:8545 -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' + +# Check ampd extracted data +ls -lah data/ + +# Re-deploy contract if needed +cd contracts && forge script script/EventEmitter.s.sol --broadcast --rpc-url http://localhost:8545 +\`\`\` + +**ampd not found?** +\`\`\`bash +curl --proto '=https' --tlsv1.2 -sSf https://ampup.sh/install | sh +# Restart terminal +\`\`\` + +**Port conflicts?** +- Anvil: 8545 +- Amp Admin: 1610 +- Amp Arrow Flight: 1602 +- Amp JSON Lines: 1603 +- Amp Studio: 1615 (if using \`amp studio\`) + +Stop conflicting services or change ports in config files. 
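
**Something else already on an Amp port?** (assumes lsof is available)
\`\`\`bash
# Example: see which process is listening on the Admin API port
lsof -i :1610
\`\`\`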
+ +**Want to start fresh?** +\`\`\`bash +# Stop ampd (Ctrl+C) and Anvil (Ctrl+C) +rm -rf data/ # Delete extracted data +rm -rf contracts/out/ # Delete compiled contracts +anvil # Restart Anvil +# Re-deploy contracts and re-register dataset +\`\`\` + +## Learn More + +- **Amp Documentation** - See \`docs/\` in the Amp repository +- **Foundry Book** - https://book.getfoundry.sh +- **DataFusion SQL** - https://datafusion.apache.org/user-guide/sql/ + +--- + +**Questions or issues?** Open an issue at https://github.com/edgeandnode/amp +`, + + "contracts/foundry.toml": `[profile.default] +src = "src" +out = "out" +libs = ["lib"] +solc_version = "0.8.20" + +# See more config options https://github.com/foundry-rs/foundry/blob/master/crates/config/README.md#all-options +`, + + "contracts/src/EventEmitter.sol": `// SPDX-License-Identifier: MIT +pragma solidity ^0.8.20; + +/** + * @title EventEmitter + * @notice Sample contract that emits 500 events for Amp demonstration + * @dev This contract is designed to generate diverse event data for learning Amp's query capabilities + */ +contract EventEmitter { + event DataEmitted( + uint256 indexed id, + address indexed sender, + uint256 value, + string message + ); + + /** + * @notice Emits a batch of events + * @param start Starting ID + * @param count Number of events to emit + */ + function emitBatch(uint256 start, uint256 count) external { + for (uint256 i = 0; i < count; i++) { + uint256 eventId = start + i; + + // Generate varied data for interesting queries + uint256 value = (eventId * 123) % 1000; // Pseudo-random values 0-999 + string memory message = _getMessage(eventId % 5); + + emit DataEmitted(eventId, msg.sender, value, message); + } + } + + /** + * @notice Returns a message based on index (creates 5 categories) + */ + function _getMessage(uint256 index) internal pure returns (string memory) { + if (index == 0) return "Action: Transfer"; + if (index == 1) return "Action: Mint"; + if (index == 2) return "Action: Burn"; + if (index == 3) return "Action: Swap"; + return "Action: Stake"; + } +} +`, + + "contracts/script/EventEmitter.s.sol": `// SPDX-License-Identifier: MIT +pragma solidity ^0.8.20; + +import {Script} from "forge-std/Script.sol"; +import {EventEmitter} from "../src/EventEmitter.sol"; + +/** + * @title EventEmitter Deployment Script + * @notice Deploys EventEmitter and emits 500 events in batches + * @dev Batches events to avoid gas limits and demonstrate multi-block extraction + */ +contract Deploy is Script { + // Emit events in batches to spread across multiple blocks + uint256 constant BATCH_SIZE = 50; + uint256 constant TOTAL_EVENTS = 500; + + function run() external { + vm.startBroadcast(); + + EventEmitter emitter = new EventEmitter(); + + // Emit 500 events in batches of 50 (10 batches total) + // This spreads events across multiple blocks for more interesting queries + for (uint256 batch = 0; batch < TOTAL_EVENTS / BATCH_SIZE; batch++) { + emitter.emitBatch(batch * BATCH_SIZE, BATCH_SIZE); + } + + vm.stopBroadcast(); + } +} +`, + + "contracts/remappings.txt": `forge-std/=lib/forge-std/src/ +`, + + ".gitignore": `# Amp +data/ +manifests/ +providers/ + +# Foundry +cache/ +out/ +broadcast/ +contracts/out/ +contracts/cache/ +contracts/broadcast/ +contracts/lib/ + +# Rust +target/ + +# Node (nested workspaces) +**/node_modules/ +**/dist/ +**/*.tsbuildinfo + +# Bun +bun.lockb + +# Environment +.env +.env.local + +# IDE +.vscode/ +.idea/ +*.swp +*.swo + +# OS +.DS_Store +`, + + "amp.toml": `# Amp configuration for local 
development with Anvil +# This config is used by ampd (the Rust daemon) + +# Where extracted parquet files are stored +data_dir = "data" + +# Directory containing provider configurations +providers_dir = "providers" + +# Directory containing dataset manifests +# Note: Manifests here are NOT auto-loaded. You must register datasets via the Admin API. +# See README for registration commands. +dataset_defs_dir = "manifests" + +# Optional: Temporary PostgreSQL will be used automatically in dev mode +# No need to configure metadata_db_url for local development +`, + + "providers/anvil.toml": `# Anvil local testnet provider configuration +kind = "evm-rpc" +network = "anvil" +url = "http://localhost:8545" +`, + + "manifests/anvil_rpc.json": `{ + "kind": "evm-rpc", + "network": "anvil", + "start_block": 0, + "finalized_blocks_only": false, + "tables": { + "blocks": { + "schema": { + "arrow": { + "fields": [ + { "name": "block_num", "type": "UInt64", "nullable": false }, + { "name": "timestamp", "type": { "Timestamp": ["Nanosecond", "+00:00"] }, "nullable": false }, + { "name": "hash", "type": { "FixedSizeBinary": 32 }, "nullable": false }, + { "name": "parent_hash", "type": { "FixedSizeBinary": 32 }, "nullable": false }, + { "name": "ommers_hash", "type": { "FixedSizeBinary": 32 }, "nullable": false }, + { "name": "miner", "type": { "FixedSizeBinary": 20 }, "nullable": false }, + { "name": "state_root", "type": { "FixedSizeBinary": 32 }, "nullable": false }, + { "name": "transactions_root", "type": { "FixedSizeBinary": 32 }, "nullable": false }, + { "name": "receipt_root", "type": { "FixedSizeBinary": 32 }, "nullable": false }, + { "name": "logs_bloom", "type": "Binary", "nullable": false }, + { "name": "difficulty", "type": { "Decimal128": [38, 0] }, "nullable": false }, + { "name": "gas_limit", "type": "UInt64", "nullable": false }, + { "name": "gas_used", "type": "UInt64", "nullable": false }, + { "name": "extra_data", "type": "Binary", "nullable": false }, + { "name": "mix_hash", "type": { "FixedSizeBinary": 32 }, "nullable": false }, + { "name": "nonce", "type": "UInt64", "nullable": false }, + { "name": "base_fee_per_gas", "type": { "Decimal128": [38, 0] }, "nullable": true }, + { "name": "withdrawals_root", "type": { "FixedSizeBinary": 32 }, "nullable": true }, + { "name": "blob_gas_used", "type": "UInt64", "nullable": true }, + { "name": "excess_blob_gas", "type": "UInt64", "nullable": true }, + { "name": "parent_beacon_root", "type": { "FixedSizeBinary": 32 }, "nullable": true } + ] + } + }, + "network": "anvil" + }, + "logs": { + "schema": { + "arrow": { + "fields": [ + { "name": "block_hash", "type": { "FixedSizeBinary": 32 }, "nullable": false }, + { "name": "block_num", "type": "UInt64", "nullable": false }, + { "name": "timestamp", "type": { "Timestamp": ["Nanosecond", "+00:00"] }, "nullable": false }, + { "name": "tx_hash", "type": { "FixedSizeBinary": 32 }, "nullable": false }, + { "name": "tx_index", "type": "UInt32", "nullable": false }, + { "name": "log_index", "type": "UInt32", "nullable": false }, + { "name": "address", "type": { "FixedSizeBinary": 20 }, "nullable": false }, + { "name": "topic0", "type": { "FixedSizeBinary": 32 }, "nullable": true }, + { "name": "topic1", "type": { "FixedSizeBinary": 32 }, "nullable": true }, + { "name": "topic2", "type": { "FixedSizeBinary": 32 }, "nullable": true }, + { "name": "topic3", "type": { "FixedSizeBinary": 32 }, "nullable": true }, + { "name": "data", "type": "Binary", "nullable": false } + ] + } + }, + "network": "anvil" + }, + 
"transactions": { + "schema": { + "arrow": { + "fields": [ + { "name": "block_hash", "type": { "FixedSizeBinary": 32 }, "nullable": false }, + { "name": "block_num", "type": "UInt64", "nullable": false }, + { "name": "timestamp", "type": { "Timestamp": ["Nanosecond", "+00:00"] }, "nullable": false }, + { "name": "tx_index", "type": "UInt32", "nullable": false }, + { "name": "tx_hash", "type": { "FixedSizeBinary": 32 }, "nullable": false }, + { "name": "to", "type": { "FixedSizeBinary": 20 }, "nullable": true }, + { "name": "nonce", "type": "UInt64", "nullable": false }, + { "name": "gas_price", "type": { "Decimal128": [38, 0] }, "nullable": true }, + { "name": "gas_limit", "type": "UInt64", "nullable": false }, + { "name": "value", "type": { "Decimal128": [38, 0] }, "nullable": false }, + { "name": "input", "type": "Binary", "nullable": false }, + { "name": "v", "type": "Binary", "nullable": false }, + { "name": "r", "type": "Binary", "nullable": false }, + { "name": "s", "type": "Binary", "nullable": false }, + { "name": "gas_used", "type": "UInt64", "nullable": false }, + { "name": "type", "type": "Int32", "nullable": false }, + { "name": "max_fee_per_gas", "type": { "Decimal128": [38, 0] }, "nullable": true }, + { "name": "max_priority_fee_per_gas", "type": { "Decimal128": [38, 0] }, "nullable": true }, + { "name": "max_fee_per_blob_gas", "type": { "Decimal128": [38, 0] }, "nullable": true }, + { "name": "from", "type": { "FixedSizeBinary": 20 }, "nullable": false }, + { "name": "status", "type": "Boolean", "nullable": false } + ] + } + }, + "network": "anvil" + } + } +} +`, + + "data/.gitkeep": "", + }, +} diff --git a/typescript/amp/test/cli/init.test.ts b/typescript/amp/test/cli/init.test.ts new file mode 100644 index 000000000..461233892 --- /dev/null +++ b/typescript/amp/test/cli/init.test.ts @@ -0,0 +1,142 @@ +import { describe, expect, it } from "vitest" +import { localEvmRpc } from "../../src/cli/templates/local-evm-rpc.ts" +import type { TemplateAnswers } from "../../src/cli/templates/Template.ts" +import { resolveTemplateFile } from "../../src/cli/templates/Template.ts" + +describe("amp init command", () => { + describe("template file resolution", () => { + it("should resolve static template files", () => { + const staticFile = "static content" + const answers: TemplateAnswers = { + datasetName: "test_dataset", + datasetVersion: "1.0.0", + projectName: "Test Project", + network: "anvil", + } + + const result = resolveTemplateFile(staticFile, answers) + expect(result).toBe("static content") + }) + + it("should resolve dynamic template files with answers", () => { + const dynamicFile = (answers: TemplateAnswers) => + `Dataset: ${answers.datasetName}, Version: ${answers.datasetVersion}` + + const answers: TemplateAnswers = { + datasetName: "test_dataset", + datasetVersion: "1.0.0", + projectName: "Test Project", + network: "anvil", + } + + const result = resolveTemplateFile(dynamicFile, answers) + expect(result).toBe("Dataset: test_dataset, Version: 1.0.0") + }) + }) + + describe("local-evm-rpc template", () => { + it("should have the correct template metadata", () => { + expect(localEvmRpc.name).toBe("local-evm-rpc") + expect(localEvmRpc.description).toBe("Local development with Anvil and sample data") + expect(localEvmRpc.files).toBeDefined() + }) + + it("should generate amp.config.ts with custom dataset name and version", () => { + const answers: TemplateAnswers = { + datasetName: "custom_data", + datasetVersion: "2.0.0", + projectName: "Custom Project", + network: "anvil", + 
} + + const configFile = localEvmRpc.files["amp.config.ts"] + const content = resolveTemplateFile(configFile, answers) + + expect(content).toContain("name: \"custom_data\"") + expect(content).toContain("version: \"2.0.0\"") + expect(content).toContain("network: \"anvil\"") + }) + + it("should generate README.md with project name", () => { + const answers: TemplateAnswers = { + datasetName: "my_dataset", + datasetVersion: "1.0.0", + projectName: "My Cool Project", + network: "anvil", + } + + const readmeFile = localEvmRpc.files["README.md"] + const content = resolveTemplateFile(readmeFile, answers) + + expect(content).toContain("# My Cool Project") + expect(content).toContain("Quick Start") + expect(content).toContain("500 events") + }) + + it("should include all required files", () => { + const requiredFiles = [ + "amp.config.ts", + "README.md", + "contracts/foundry.toml", + "contracts/src/EventEmitter.sol", + "contracts/script/EventEmitter.s.sol", + "contracts/remappings.txt", + ".gitignore", + ] + + for (const file of requiredFiles) { + expect(localEvmRpc.files[file]).toBeDefined() + } + }) + + it("should generate valid Solidity contract", () => { + const answers: TemplateAnswers = { + datasetName: "test_dataset", + datasetVersion: "1.0.0", + projectName: "Test", + network: "anvil", + } + + const eventEmitterFile = localEvmRpc.files["contracts/src/EventEmitter.sol"] + const content = resolveTemplateFile(eventEmitterFile, answers) + + expect(content).toContain("pragma solidity") + expect(content).toContain("contract EventEmitter") + expect(content).toContain("event DataEmitted") + expect(content).toContain("function emitBatch") + }) + }) + + describe("dataset name validation", () => { + it("should accept valid dataset names", () => { + const validNames = [ + "my_dataset", + "dataset_123", + "a", + "_underscore", + "lower_case_only", + ] + + for (const name of validNames) { + const regex = /^[a-z_][a-z0-9_]*$/ + expect(regex.test(name)).toBe(true) + } + }) + + it("should reject invalid dataset names", () => { + const invalidNames = [ + "MyDataset", // uppercase + "123dataset", // starts with number + "my-dataset", // contains hyphen + "my dataset", // contains space + "dataset!", // special character + "UPPERCASE", + ] + + for (const name of invalidNames) { + const regex = /^[a-z_][a-z0-9_]*$/ + expect(regex.test(name)).toBe(false) + } + }) + }) +}) diff --git a/typescript/amp/test/fixtures/contracts/lib/forge-std b/typescript/amp/test/fixtures/contracts/lib/forge-std deleted file mode 160000 index 8e40513d6..000000000 --- a/typescript/amp/test/fixtures/contracts/lib/forge-std +++ /dev/null @@ -1 +0,0 @@ -Subproject commit 8e40513d678f392f398620b3ef2b418648b33e89 diff --git a/typescript/amp/test/fixtures/contracts/lib/solady b/typescript/amp/test/fixtures/contracts/lib/solady deleted file mode 160000 index acd959aa4..000000000 --- a/typescript/amp/test/fixtures/contracts/lib/solady +++ /dev/null @@ -1 +0,0 @@ -Subproject commit acd959aa4bd04720d640bf4e6a5c71037510cc4b diff --git a/vitest.config.ts b/vitest.config.ts deleted file mode 100644 index edb33f218..000000000 --- a/vitest.config.ts +++ /dev/null @@ -1,7 +0,0 @@ -import { defineConfig } from "vitest/config" - -export default defineConfig({ - test: { - projects: ["typescript/*/vitest.config.ts"], - }, -})
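
Usage of the new command (flag and alias names as defined in `init.ts` above; the values are illustrative):

```bash
# Interactive: prompts for template, dataset name, version, and project name
amp init

# Non-interactive with defaults (my_dataset / 0.1.0 / amp_project)
amp init --yes

# Non-interactive with explicit values
amp init --dataset-name my_data --dataset-version 0.2.0 --project-name "My Project"
```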