@jax-cn commented Dec 8, 2025

This PR introduces a new Inference Device (dev_inference) that provides an OpenAI-compatible API for local LLM inference. It also adds an SEV GPU Device (dev_sev_gpu) to support NVIDIA GPU TEE attestation, ensuring that inference workloads are secure and verifiable.

To support these features, the core HTTP handling logic has been extended to support Server-Sent Events (SSE) for streaming responses.
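
For context, an OpenAI-compatible streaming endpoint frames each token as an SSE data: event and terminates the stream with a [DONE] sentinel, roughly as follows (payload abbreviated):

data: {"choices":[{"index":0,"delta":{"content":"Paris"}}]}

data: [DONE]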

Key Changes

🚀 New Features

  • Inference Device (dev_inference.erl):
    • Implements an OpenAI-compatible API (e.g., /v1/chat/completions); a minimal handler sketch follows this list.
    • Manages the lifecycle of a local Python-based deterministic inference server.
    • Supports streaming responses via SSE.
  • GPU Attestation (dev_sev_gpu.erl and the native dev_sev_gpu scripts):
    • Added support for generating and verifying NVIDIA GPU TEE attestations within an AMD SEV-SNP environment.
    • Includes native Python scripts for interacting with the TEE environment.
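
As a rough illustration of the handler shape such a device might expose, here is a minimal sketch assuming the usual HyperBEAM convention of exporting device keys as three-argument functions over messages. The module name and body are illustrative, not the actual dev_inference.erl:

%% Illustrative sketch; not the actual dev_inference.erl.
-module(dev_inference_sketch).
-export([health/3]).

%% A device key is an exported function taking the base message, the request
%% message, and the node options, and returning {ok, Result}.
health(_Msg1, _Msg2, _Opts) ->
    {ok, #{ <<"status">> => 200, <<"body">> => <<"OK">> }}.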

🛠 Core Modifications

  • HTTP Streaming (hb_http.erl):
    • Updated reply/5 to handle stream_generator, enabling real-time token streaming for LLM responses (a streaming sketch follows this list).
    • Added proper CORS and header handling for event streams.
  • Configuration (hb_opts.erl):
    • Registered inference@1.0 and sev_gpu@1.0 devices.
    • Added default routing for /v1/.* to the local inference server.
    • Added inference_opts for model configuration (hash, name, size); these options, together with the LMDB setting below, are illustrated in the configuration sketch after this list.
  • Storage (hb_store_lmdb.erl):
    • Exposed max_readers configuration to optimize LMDB for high-concurrency read scenarios.
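
A minimal sketch of how such a streaming reply can be driven with Cowboy, which HyperBEAM's HTTP stack builds on. The function names and the {token, Data, Next} / done generator protocol here are assumptions for illustration, not the actual hb_http.erl internals:

%% Illustrative sketch only; names and the generator protocol are assumed.
stream_reply(Req, Generator) ->
    %% Open a chunked response with SSE headers and permissive CORS.
    Req2 = cowboy_req:stream_reply(200, #{
        <<"content-type">> => <<"text/event-stream">>,
        <<"cache-control">> => <<"no-cache">>,
        <<"access-control-allow-origin">> => <<"*">>
    }, Req),
    stream_loop(Req2, Generator).

stream_loop(Req, Generator) ->
    case Generator() of
        {token, Data, Next} ->
            %% Frame each token as an SSE `data:` event.
            ok = cowboy_req:stream_body(<<"data: ", Data/binary, "\n\n">>, nofin, Req),
            stream_loop(Req, Next);
        done ->
            %% Close the stream with the OpenAI-style sentinel.
            ok = cowboy_req:stream_body(<<"data: [DONE]\n\n">>, fin, Req)
    end.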
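
And a hypothetical node-options fragment illustrating the configuration and storage bullets above. Every key name (routes, inference_opts, max_readers) and the local port are educated guesses at the shape described here, not the exact hb_opts.erl schema:

%% Illustrative node options; key names and values are assumptions.
#{
    %% Route OpenAI-style paths to the local inference server.
    routes => [
        #{ <<"template">> => <<"/v1/.*">>,
           <<"node">> => #{ <<"prefix">> => <<"http://127.0.0.1:9000">> } }
    ],
    %% Describe the model to serve.
    inference_opts => #{
        <<"model_name">> => <<"qwen/qwen2.5-0.5b-instruct">>,
        <<"model_hash">> => <<"...">>,  %% integrity pin, left unspecified here
        <<"model_size">> => <<"0.5b">>
    },
    %% Raise the LMDB reader-slot limit for high-concurrency reads.
    store => #{ <<"store-module">> => hb_store_lmdb,
                <<"max_readers">> => 512 }
}.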

🧹 Maintenance

  • Lifecycle Management: Updated hb_app:stop/1 to ensure the inference server is gracefully shut down.
  • Build System: Updated rebar.config with new profiles and hooks for setting up the inference and GPU environments (an illustrative fragment follows).
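
A hypothetical rebar.config fragment showing the shape such a profile might take; the profile body and the hook script path are illustrative assumptions:

%% Illustrative only; the actual profile contents may differ.
{profiles, [
    {inference, [
        {pre_hooks, [
            %% e.g. prepare the Python environment for the inference server
            {compile, "./native/dev_inference/setup.sh"}
        ]}
    ]}
]}.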

Testing

  1. Run: Start the node with the inference profile: HB_PRINT=inference rebar3 as inference shell.
  2. Verify:
    2.1 Check /health endpoint.
curl --request GET \
  --url 'http://localhost:8734/~inference@1.0/health'

    2.2 Test a completion request to /v1/chat/completions (both streaming and non-streaming).

curl --request POST \
  --url 'http://localhost:8734/~inference@1.0/chat/completions' \
  --header 'content-type: application/json' \
  --data '{
  "model": "qwen/qwen2.5-0.5b-instruct",
  "messages": [
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ]
}
'
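
To exercise the streaming path, send the same request with "stream": true (the standard flag in OpenAI-compatible APIs); tokens should then arrive incrementally as SSE events:

curl --no-buffer --request POST \
  --url 'http://localhost:8734/~inference@1.0/chat/completions' \
  --header 'content-type: application/json' \
  --data '{
  "model": "qwen/qwen2.5-0.5b-instruct",
  "stream": true,
  "messages": [
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ]
}
'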

    2.3 Verify TEE attestation if running on supported hardware.
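
If the attestation device exposes its keys over the same path convention as the inference device, a report could be requested with something like the following; the generate key is a hypothetical name used for illustration, not an endpoint confirmed by this PR:

curl --request GET \
  --url 'http://localhost:8734/~sev_gpu@1.0/generate'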

Jax added 30 commits November 25, 2025 21:37
commit 97e92aa
Author: jax <jax@apus.network>
Date:   Thu Jul 24 03:18:30 2025 +0000

    optimize code and add files for TC

commit 9a2c4dd
Author: jax <jax@apus.network>
Date:   Wed Jul 23 13:21:16 2025 +0000

    add more comments

commit 626d356
Author: jax <jax@apus.network>
Date:   Wed Jul 23 11:40:24 2025 +0000

    add dev_sev_gpu for gpu attestation generation