@ahundt ahundt commented Dec 16, 2025

Problem

Happy crashes when the server is unreachable. Users see uncaught exceptions and can't use the CLI at all:

Error: connect ECONNREFUSED 127.0.0.1:3005
    at TCPConnectWrap.afterConnect

Related issues: #90 (Error 522), #71 (Error 500), #68 (no way to continue after MCP connect failure)

This is especially painful because Claude Code works fine locally - there's no reason happy should crash just because the remote sync server is down.

Solution

Graceful offline mode with background reconnection:

  1. Catch connection errors - ECONNREFUSED, ENOTFOUND, ETIMEDOUT, 5xx responses
  2. Continue in local mode - Claude Code runs normally, just without remote sync
  3. Background reconnection - Silently retries connection with exponential backoff
  4. Single warning - Shows one clear message, not repeated error spam
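Step 1 can be sketched as a small predicate. This is a minimal illustration, not the code in this PR: the `ServerCallError` shape and both function names are assumptions, based on Node.js socket errors carrying a string `code` and HTTP clients such as axios attaching `response.status`.

```typescript
// Assumed error shape (hypothetical): Node.js socket errors carry a string
// `code`; HTTP clients such as axios attach `response.status` on non-2xx.
interface ServerCallError {
  code?: string;
  response?: { status: number };
}

/** True for the three connection error codes listed in step 1. */
function isConnectionErrorCode(code: string | undefined): boolean {
  switch (code) {
    case "ECONNREFUSED":
    case "ENOTFOUND":
    case "ETIMEDOUT":
      return true;
    default:
      return false;
  }
}

/** True when the failure should switch the CLI into local mode. */
function isServerUnreachable(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as ServerCallError;
  if (isConnectionErrorCode(e.code)) return true;
  const status = e.response?.status;
  return status !== undefined && status >= 500 && status <= 599;
}
```

Anything the predicate rejects (for example a 401, or an ordinary programming error) would still surface through the normal error path.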

User Experience

Before:

Error: connect ECONNREFUSED...
[crash]

After:

⚠️ Happy server unreachable - continuing in local mode
[Claude Code runs normally]

Changes

| File | Change |
| --- | --- |
| src/api/api.ts | Connection error handling, return null on failure |
| src/utils/serverConnectionErrors.ts | Error classification, handling utilities, and background reconnection |
| src/claude/runClaude.ts | Handle null API response, continue locally |
| src/codex/runCodex.ts | Handle null API response, continue locally |

Technical Details

  • OfflineState singleton - Prevents duplicate warnings and reconnection attempts
  • Exponential backoff - 1s → 2s → 4s → 8s... (caps at 5 minutes)
  • Error classification - Distinguishes auth errors (401 - stop retrying) from transient errors (5xx - keep retrying)
  • Graceful recovery - When server comes back, reconnects without user action

Testing

# Test offline mode (stop server or use invalid URL)
HAPPY_SERVER_URL=http://localhost:9999 happy

# Should show warning and continue working

# Run unit tests
bun run test src/utils/serverConnectionErrors.test.ts

Why Merge This

When the Happy API server was unreachable, the CLI crashed with uncaught
exceptions. It now handles connection errors gracefully and continues in offline mode.

Changes:
- api.ts: Add connection error handling (ECONNREFUSED, ENOTFOUND, ETIMEDOUT)
  - getOrCreateSession: Returns null when server unreachable
  - getOrCreateMachine: Returns minimal Machine object when server unreachable
  - Updated return types to reflect null handling
- runClaude.ts, runCodex.ts: Handle null API responses with graceful exit
- Show clear user message: "⚠️ Happy server unreachable - continuing in local mode"
- Add comprehensive unit tests for server error scenarios
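The null-return contract listed above can be sketched as a wrapper around an API call. The helper names `nullIfUnreachable` and `startSession` are hypothetical and only illustrate the pattern. The real `getOrCreateSession` is not reproduced here.

```typescript
// Assumed error shape (hypothetical): Node.js network errors expose a string `code`.
function isConnectionError(err: unknown): boolean {
  const code = (err as { code?: unknown } | null)?.code;
  return code === "ECONNREFUSED" || code === "ENOTFOUND" || code === "ETIMEDOUT";
}

/**
 * Runs an API call and resolves to null when the server cannot be reached.
 * Any other error is rethrown so that real bugs are not hidden.
 */
async function nullIfUnreachable<T>(call: () => Promise<T>): Promise<T | null> {
  try {
    return await call();
  } catch (err) {
    if (isConnectionError(err)) return null;
    throw err;
  }
}

// Caller side: a null session means "run locally without remote sync".
async function startSession(createSession: () => Promise<{ id: string }>): Promise<string> {
  const session = await nullIfUnreachable(createSession);
  return session === null ? "local-only" : `synced:${session.id}`;
}
```

Making `null` part of the return type forces every caller to handle the offline case explicitly, which is what the updated return types are for.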

This allows users to continue using Happy CLI in local mode even when
the server is temporarily unavailable.

When Happy servers are unreachable, Claude/Codex now continue running
locally instead of exiting. Background reconnection attempts use
exponential backoff (5s-60s delay cap) with unlimited retries.

Previous behavior:
- Server unreachable at startup → process.exit(1)
- User loses their work context

What changed:
- src/utils/offlineReconnection.ts: NEW shared utility with:
  - Exponential backoff using existing exponentialBackoffDelay()
  - Unlimited retries (delay caps at 60s, retries continue forever)
  - Auth failure detection (401 stops retrying)
  - Race condition handling (cancel during async ops)
  - Generic TSession type for backend transparency
- src/utils/offlineReconnection.test.ts: NEW 24 comprehensive tests
- src/claude/runClaude.ts: Offline fallback using claudeLocal() with
  hot reconnection via sessionScanner (syncs all JSONL messages)
- src/codex/runCodex.ts: Offline fallback with session stub that
  swaps to real session on reconnection
- src/api/api.ts: Return null on connection errors for graceful handling
- src/api/api.test.ts: Tests for connection error handling
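The behaviour listed for the shared utility (unlimited retries, stop on auth failure, cancellation during async operations, generic `TSession`) can be sketched as follows. This is an illustration under assumptions, not the contents of offlineReconnection.ts: `startReconnection`, `AttemptResult`, and the option names are all hypothetical.

```typescript
type AttemptResult<TSession> =
  | { kind: "connected"; session: TSession }
  | { kind: "auth-failed" }   // e.g. HTTP 401: retrying cannot help
  | { kind: "unreachable" };  // transient: try again later

interface ReconnectOptions<TSession> {
  attempt: () => Promise<AttemptResult<TSession>>;
  onConnected: (session: TSession) => void;
  onAuthFailed: () => void;
  delayMs: (attemptIndex: number) => number;
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

function startReconnection<TSession>(
  opts: ReconnectOptions<TSession>,
): { cancel: () => void; done: Promise<void> } {
  let cancelled = false;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  const done = (async () => {
    for (let i = 0; !cancelled; i++) {
      await sleep(opts.delayMs(i));
      if (cancelled) return;
      const result = await opts.attempt();
      if (cancelled) return; // cancelled while the attempt was in flight
      if (result.kind === "connected") {
        opts.onConnected(result.session);
        return;
      }
      if (result.kind === "auth-failed") {
        opts.onAuthFailed();
        return;
      }
      // "unreachable": loop again; there is no retry limit.
    }
  })();

  return { cancel: () => { cancelled = true; }, done };
}
```

Checking the `cancelled` flag both after the sleep and after the attempt is what covers the race where the user exits while a request is still pending.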

User experience:
- Startup offline: "⚠️ Happy server unreachable - running Claude locally"
- On reconnect: "✅ Reconnected! Session syncing in background."
- Auth failure: "❌ Authentication failed. Please re-authenticate."

Previous behavior: When server was unreachable, three separate warning
messages would print from different call sites (api.getOrCreateSession,
api.getOrCreateMachine, and runClaude/runCodex), resulting in confusing
output like:
  ⚠️  Happy server unreachable - working in offline mode
  ⚠️  Happy server unreachable - working in offline mode
  ⚠️  Happy server unreachable - running Claude locally

What changed:
- offlineReconnection.ts: Added OfflineState class with simple online/offline
  state machine that prints warning ONCE on first offline transition
- offlineReconnection.ts: Added OfflineFailure type with operation, caller,
  errorCode, and url fields for detailed error context
- offlineReconnection.ts: Added ERROR_DESCRIPTIONS map for human-readable
  error code translations (ECONNREFUSED, ETIMEDOUT, etc.)
- api.ts: Changed console.log() to connectionState.fail() with full context
- runClaude.ts, runCodex.ts: Added connectionState.setBackend() before API
  calls, removed redundant printOfflineWarning() calls
- api.test.ts, offlineReconnection.test.ts: Updated assertions to use
  expect.stringContaining() and added connectionState.reset() in beforeEach
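The warn-once behaviour described above can be sketched as a two-state machine. This is a simplified stand-in for the `OfflineState` class in this commit: the `recover()` and `isOffline()` methods, the injected `print` callback, and the message wording are assumptions, and `setBackend()` and `reset()` are left out.

```typescript
// Fields follow the OfflineFailure description above; making `url` optional is an assumption.
interface OfflineFailure {
  operation: string; // e.g. "Session creation"
  caller: string;    // e.g. "api.getOrCreateSession"
  errorCode: string; // e.g. "ECONNREFUSED"
  url?: string;
}

class OfflineState {
  private status: "online" | "offline" = "online";
  private readonly failures: OfflineFailure[] = [];
  private readonly print: (line: string) => void;

  constructor(print: (line: string) => void = (line) => console.log(line)) {
    this.print = print;
  }

  /** Records a failure; prints only on the online -> offline transition. */
  fail(failure: OfflineFailure): void {
    this.failures.push(failure);
    if (this.status === "offline") return; // already warned
    this.status = "offline";
    this.print(
      `⚠️  Happy server unreachable - ${failure.operation} failed (${failure.errorCode}) [${failure.caller}]`,
    );
  }

  /** Back online: the next failure will warn again. */
  recover(): void {
    this.status = "online";
    this.failures.length = 0;
  }

  isOffline(): boolean {
    return this.status === "offline";
  }
}
```

Sharing one instance across `api.getOrCreateSession`, `api.getOrCreateMachine`, and the runners is what collapses the three duplicate warnings into one.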

New output format shows consolidated warning with actionable details:
  ⚠️  Happy server unreachable - running Claude locally

  Failed:
  • Session creation: server not accepting connections (ECONNREFUSED) [api.getOrCreateSession]

  → Local work continues normally
  → Will reconnect automatically when server available
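The consolidated output above could be produced by a formatter along these lines. This is a sketch only: `formatOfflineWarning` is a hypothetical name, and apart from the ECONNREFUSED text (copied from the sample output) the `ERROR_DESCRIPTIONS` entries are illustrative wording rather than the real map.

```typescript
interface OfflineFailure {
  operation: string;
  caller: string;
  errorCode: string;
}

const ERROR_DESCRIPTIONS: Record<string, string> = {
  ECONNREFUSED: "server not accepting connections", // wording from the sample output above
  ETIMEDOUT: "connection timed out",                // illustrative wording (assumption)
  ENOTFOUND: "server hostname not found",           // illustrative wording (assumption)
};

/** Builds the multi-line warning for one backend ("Claude", "Codex", ...). */
function formatOfflineWarning(backend: string, failures: OfflineFailure[]): string {
  const lines = [`⚠️  Happy server unreachable - running ${backend} locally`, "", "  Failed:"];
  for (const f of failures) {
    const description = ERROR_DESCRIPTIONS[f.errorCode] ?? "unknown error";
    lines.push(`  • ${f.operation}: ${description} (${f.errorCode}) [${f.caller}]`);
  }
  lines.push(
    "",
    "  → Local work continues normally",
    "  → Will reconnect automatically when server available",
  );
  return lines.join("\n");
}
```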
…nErrors

Previous behavior:
- offlineReconnection.ts handled all server errors including 403/409
- 403/409 showed "server unreachable" message (semantically wrong - server responded)
- Lost recovery action: no `happy doctor clean` guidance for re-auth conflicts
- Minimal machine object duplicated 3 times (DRY violation)
- ERROR_DESCRIPTIONS not exported (poor discoverability)
- path.test.ts mocked node:os which leaked to sessionScanner tests

What changed:
- src/utils/offlineReconnection.ts → src/utils/serverConnectionErrors.ts
  - Renamed for accurate description (connection errors, not just offline)
  - Export ERROR_DESCRIPTIONS for discoverability
  - Added `details?: string[]` to OfflineFailure for multi-line context
  - Updated module documentation

- src/api/api.ts
  - Extract createMinimalMachine() helper (DRY - 4 call sites)
  - 403/409 uses direct console.log (NOT connectionState) with recovery action:
    "Run 'happy doctor clean' to reset local state"
  - 5xx uses connectionState.fail() with details for auto-reconnect
  - All HTTP error handling in catch block (axios throws on non-2xx)

- src/claude/utils/path.test.ts
  - Remove vi.mock('node:os') that leaked to other tests
  - Use CLAUDE_CONFIG_DIR env var (code already supports it)
  - Cross-platform compatible, works with both npm and bun

- Updated imports in api.test.ts, runClaude.ts, runCodex.ts
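The split between server rejections and transient failures described above can be sketched as a status classifier. The type and function names are hypothetical. The 401, 403/409, and 5xx groupings come from this PR, while the "other" bucket for remaining statuses is an assumption.

```typescript
type ServerErrorKind =
  | "auth-failed" // 401: stop retrying
  | "rejected"    // 403/409: server responded, so "unreachable" would be wrong
  | "transient"   // 5xx: keep retrying in the background
  | "other";      // anything else: left to the normal error path (assumption)

function classifyHttpStatus(status: number): ServerErrorKind {
  if (status === 401) return "auth-failed";
  if (status === 403 || status === 409) return "rejected";
  if (status >= 500 && status <= 599) return "transient";
  return "other";
}

/** Only transient failures should start background reconnection. */
function shouldRetryInBackground(kind: ServerErrorKind): boolean {
  return kind === "transient";
}

/** Recovery hint shown for rejections; text taken from the api.ts change above. */
function recoveryHint(kind: ServerErrorKind): string | null {
  return kind === "rejected" ? "Run 'happy doctor clean' to reset local state" : null;
}
```

Keeping the 403/409 branch out of the offline state machine means a re-auth conflict never triggers a pointless reconnect loop.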

Why:
- 403/409 are server rejections, not "server unreachable" - semantic accuracy
- Users need `happy doctor clean` recovery action for re-auth conflicts
- Exported ERROR_DESCRIPTIONS helps developers find error handling code
- File rename improves discoverability: serverConnectionErrors describes content

Testable:
- All 144 tests pass (0 fail)
- HAPPY_SERVER_URL=http://localhost:59999 happy --print "test"
  Shows: "Machine registration failed: ECONNREFUSED - server not accepting connections"

Reverts:
- README.md: restore claude-code-router link (line 39)
- package.json: restore version 0.12.0 (was 0.12.0-0)
- src/index.ts: restore --claude-env parsing (lines 271-284)

Fixes:
- src/api/api.test.ts:7-10: apply vi.hoisted pattern for vitest compatibility

The vi.hoisted() wrapper ensures mock variables are available during
vitest's module hoisting phase, fixing "Cannot access 'mockFn' before
initialization" errors.