Skip to content

Conversation

@hanakannzashi
Copy link
Contributor

@hanakannzashi hanakannzashi commented Jan 8, 2026

Fix #309


Note

Introduces TEE/KMS-backed signing keys and async initialization for attestation.

  • Replaces AttestationService::new with async init that loads VPC info and derives ed25519/secp256k1 keys from dstack; falls back to ephemeral keys only in DEV and errors otherwise
  • Adds key-path constants, key-derivation/validation helpers, and logs signing addresses
  • Updates crates/api/src/lib.rs to await AttestationService::init
  • Simplifies nonce generation in get_attestation_report

Written by Cursor Bugbot for commit d6880c8. This will update automatically on new commits. Configure here.

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @hanakannzashi, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the AttestationService's key management strategy. It introduces a robust mechanism to derive cryptographic signing keys (ED25519 and ECDSA) from a dstack environment, ensuring that the service's signing addresses remain stable across deployments. This is achieved through the integration of HKDF for secure key derivation, with a fallback to ephemeral key generation for non-TEE or development environments. The changes enhance the security and consistency of attestation reports by providing a reliable source for cryptographic keys.

Highlights

  • Key Derivation from dstack/KMS: The system now attempts to derive stable ED25519 and ECDSA signing keys from a 'dstack' (Trusted Execution Environment/Key Management System) using HKDF, ensuring consistent signing addresses across service instances.
  • Fallback to Ephemeral Keys: If key derivation from 'dstack' fails (e.g., in non-TEE environments), the service gracefully falls back to generating ephemeral, cryptographically random signing keys.
  • HKDF Integration: The HMAC-based Key Derivation Function (HKDF) with SHA256 is introduced to securely derive distinct ED25519 and ECDSA keys from a single initial key material provided by 'dstack', using 'app_id' as salt and specific info strings.
  • ECDSA Key Derivation Robustness: A retry mechanism is implemented for ECDSA key derivation, allowing up to MAX_ECDSA_DERIVATION_ATTEMPTS to find a valid private key scalar, which is important given the probabilistic nature of deriving valid elliptic curve private keys.
  • Service Initialization Refactor: The AttestationService initialization method has been refactored to be async (init instead of new) to accommodate the asynchronous nature of interacting with dstack.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@claude
Copy link

claude bot commented Jan 8, 2026

Code Review - PR #346

I've reviewed this PR which refactors the AttestationService to derive cryptographic keys from dstack/KMS. Here are my findings:

⚠️ CRITICAL ISSUES

1. Security: Sensitive Key Material Not Zeroed After Use

Location: crates/services/src/attestation/mod.rs:164, 192

The derived key material (ed_secret, ecdsa_seed) contains sensitive cryptographic secrets but is never explicitly zeroed after use. While Rust will drop these values, best practice for cryptographic key material is to explicitly zero memory.

Risk: If a memory dump or crash occurs, these secrets could be recovered.

Fix: Use zeroize crate or explicitly zero arrays after key derivation:

use zeroize::Zeroize;

let mut ed_secret: [u8; 32] = [0u8; 32];
hk.expand(GATEWAY_KEY_INFO_ED25519, &mut ed_secret)?;
let ed25519_signing_key = SigningKey::from_bytes(&ed_secret);
ed_secret.zeroize(); // Explicit zeroing

// Similarly for ecdsa_seed
let mut ecdsa_seed: [u8; 32] = [0u8; 32];
hk.expand(&normalized_info, &mut ecdsa_seed)?;
if let Ok(sk) = EcdsaSigningKey::from_bytes(&ecdsa_seed.into()) {
    ecdsa_seed.zeroize(); // Zero before breaking
    ecdsa_signing_key_opt = Some(sk);
    break;
}
ecdsa_seed.zeroize(); // Zero after failed attempt

2. Security: KMS Key Material Logged on Error

Location: crates/services/src/attestation/mod.rs:155

The error message "failed to decode dstack derived key hex: {e}" could potentially leak key material if the error contains the invalid hex string.

Fix: Avoid including the error details that might contain key fragments:

let key_bytes = key_resp.decode_key().map_err(|_| {
    AttestationError::InternalError("failed to decode dstack derived key".to_string())
})?;

3. Production Safety: Missing Timeout on KMS Operations

Location: crates/services/src/attestation/mod.rs:139-152

The dstack client calls (client.info(), client.get_key()) have no timeout, which could block server startup indefinitely during KMS issues or network problems in production.

Risk: During rolling deployments or KMS outages, new instances may hang during initialization, preventing successful deployment.

Fix: Add timeout with fallback:

use tokio::time::{timeout, Duration};

let client = dstack_client::DstackClient::new(None);
let info = timeout(Duration::from_secs(10), client.info())
    .await
    .map_err(|_| AttestationError::InternalError("dstack info request timed out".to_string()))?
    .map_err(|e| AttestationError::InternalError(format\!("failed to get dstack info: {e:?}")))?;

4. Logic: Unnecessary Loop Iteration After Success

Location: crates/services/src/attestation/mod.rs:178-201

When an ECDSA key is successfully derived, the loop breaks but attempts counter doesn't reflect the actual number (it's always ctr + 1 calculated outside the loop).

Fix: Move attempts calculation to only when needed:

let mut ecdsa_signing_key_opt: Option<EcdsaSigningKey> = None;

for ctr in 0u16..=MAX_ECDSA_DERIVATION_ATTEMPTS {
    // ... derivation logic ...
    if let Ok(sk) = EcdsaSigningKey::from_bytes(&ecdsa_seed.into()) {
        if ctr > 0 {
            tracing::warn\!(
                "ECDSA key derivation required {} attempts (unusual but acceptable)",
                ctr + 1
            );
        }
        ecdsa_signing_key_opt = Some(sk);
        break;
    }
}

let ecdsa_signing_key = ecdsa_signing_key_opt.ok_or_else(|| {
    let attempts = MAX_ECDSA_DERIVATION_ATTEMPTS + 1;
    AttestationError::InternalError(format\!(
        "Failed to derive a valid secp256k1 signing key from dstack key material after {} attempts.",
        attempts
    ))
})?;

Minor Issues:

  1. Code Quality: The key_bytes from dstack should also be zeroed after HKDF initialization (line 161) to minimize exposure window.

  2. Performance: Consider caching the dstack client instance if multiple calls are made, though this is a startup-time concern only.


Summary

The PR implements a robust key derivation mechanism with good fallback logic. However, the security issues around sensitive key material handling are critical and should be addressed before merge, especially since this code runs in a TEE environment where memory security is paramount.

Status: ⚠️ Critical issues found - recommend fixing security concerns before merge.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a significant security and stability improvement by refactoring the AttestationService to derive signing keys from a Key Management Service (dstack) instead of generating ephemeral keys on startup. This ensures key stability across service instances. The implementation correctly uses HKDF for key derivation and includes a sensible fallback to ephemeral keys if derivation fails. The code is well-structured, includes appropriate logging, and the changes are sound. I have one minor suggestion to improve code conciseness.

Comment on lines 182 to 190
let normalized_info: Vec<u8> = if ctr == 0 {
GATEWAY_KEY_INFO_ECDSA.to_vec()
} else {
// Include counter in info to get different output for each retry attempt
let mut info = Vec::with_capacity(GATEWAY_KEY_INFO_ECDSA.len() + 2);
info.extend_from_slice(GATEWAY_KEY_INFO_ECDSA);
info.extend_from_slice(&ctr.to_be_bytes());
info
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For improved readability and conciseness, the creation of normalized_info can be simplified. The else block can be replaced with a call to concat(), which achieves the same result with less code while maintaining similar performance characteristics for this non-hot path.

            let normalized_info: Vec<u8> = if ctr == 0 {
                GATEWAY_KEY_INFO_ECDSA.to_vec()
            } else {
                // Include counter in info to get different output for each retry attempt
                [GATEWAY_KEY_INFO_ECDSA, &ctr.to_be_bytes()].concat()
            };

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the AttestationService to derive cryptographic signing keys from dstack KMS instead of generating ephemeral keys on each service startup. This ensures stable signing addresses across multiple instances of the same application running in TEE environments.

Key changes:

  • Changed AttestationService::new() to AttestationService::init() (async) to support key derivation
  • Implemented HKDF-based key derivation from dstack KMS for both Ed25519 and ECDSA signing keys
  • Added graceful fallback to ephemeral keys in DEV mode when dstack is unavailable
  • Refactored nonce generation code to use unwrap_or_else pattern for better readability

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 4 comments.

File Description
crates/services/src/attestation/mod.rs Core refactoring: added init() method with dstack key derivation, HKDF-based key generation, and fallback logic. Includes helper methods for key derivation and ephemeral key generation. Minor refactoring of nonce generation using unwrap_or_else.
crates/services/Cargo.toml Added hkdf = "0.12" dependency for secure key derivation
crates/api/src/lib.rs Updated call site to use async init() method instead of new()
Cargo.lock Updated lockfile with hkdf dependency

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@think-in-universe think-in-universe changed the title refactor: derive key from kms feat: derive key from kms Jan 8, 2026
@think-in-universe think-in-universe changed the title feat: derive key from kms feat: derive signing key from kms Jan 8, 2026
@think-in-universe
Copy link
Contributor

@claude review

@claude
Copy link

claude bot commented Jan 9, 2026

Code Review

Critical Issues Found

1. Production Safety: Unwrap in Service Initialization (CRITICAL)

Location: crates/api/src/lib.rs:294-302

The .unwrap() will panic and crash the server during startup if init() returns an error in production (non-DEV) environments when dstack is unavailable. This violates the production safety requirement for multi-cluster deployments.

Impact: Complete service outage during rolling updates or infrastructure issues.

Fix: Properly propagate the error instead of unwrapping. The init_domain_services_with_pool function should return Result and propagate this error up the call chain.


2. Cryptographic Security: Ed25519 All-Zeros Key Check is Insufficient

Location: crates/services/src/attestation/mod.rs:145-150

Ed25519 has other weak keys beyond all-zeros. The check only validates against one failure mode. While HKDF-SHA256 output should be cryptographically random, relying solely on an all-zeros check is insufficient.

Recommendation: Either remove the check entirely and trust HKDF cryptographic properties (preferred), or add comprehensive validation using ed25519-dalek key validation APIs.


3. Code Quality: Inconsistent Counter Initialization

Location: crates/services/src/attestation/mod.rs

Both ed25519 and ECDSA derivation loops always include counter=0 in the first derivation. Clarification needed: Should the first derivation attempt use an empty counter (for reproducibility) and only add counters for retries?


4. Performance: Unnecessary Allocation in Derivation Loop

Location: crates/services/src/attestation/mod.rs:140-145, 170-175

Allocates a new Vec on every iteration. Fix: Reuse the buffer by declaring it before the loop and using clear() inside.


Strengths

  1. Security: Proper use of HKDF-SHA256 with app_id as salt
  2. Reliability: Stable signing addresses across TEE instances
  3. Error Handling: Good fallback to ephemeral keys in DEV mode
  4. Code Structure: Clean separation of concerns

Summary

MUST FIX before merge: Issue 1 (unwrap removal)

SHOULD FIX: Issues 2, 3, 4

⚠️ Issues Found - Blocking Merge

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is being reviewed by Cursor Bugbot

Details

Your team is on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle for each member of your team.

To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

@think-in-universe
Copy link
Contributor

think-in-universe commented Jan 9, 2026

We can switch to the build-in get_key() function with algo argument (ecdsa or ed25519) after dstack v0.5.6 is released:

https://github.com/Dstack-TEE/dstack/blob/66beec6c05d89fb0f6126cc08a6ffc4d1ee9d0ca/guest-agent/src/rpc_service.rs#L231-L254

return Err(AttestationError::InternalError(format!(
"Failed to derive signing keys from dstack: {}. \
Ensure this service runs in a CVM/TEE with dstack available.",
e
Copy link
Contributor

@think-in-universe think-in-universe Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be fine to keep the errors in response and logs because the signing keys failed to generate. The error details is helpful to understand why the keys fail to create.

@think-in-universe think-in-universe deployed to Cloud API test env January 10, 2026 02:46 — with GitHub Actions Active
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enhancement: multiple cloud-api CVM instances should share the same signing address derived from KMS

3 participants