[3.5] Local inference service using llama.cpp (GGUF-first) #5

@mikejmorgan-ai

Description

Deploy an always-on local inference daemon built on llama.cpp's llama-server: GGUF-first model format, CPU-only operation by default with transparent GPU acceleration when available, quantization tier support (Q4_K_M, Q5_K_M, Q8_0), model loading/unloading, streaming responses, and systemd integration.
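
As a rough sketch of the client surface this implies, the snippet below streams tokens from a running llama-server instance over its OpenAI-compatible chat completions endpoint. The host/port (localhost:8080), model alias, and prompt are placeholder assumptions for illustration, not decisions made in this epic.

```python
import json
import requests

# Assumption: llama-server is running locally with its OpenAI-compatible API
# enabled; adjust the base URL and model alias to the actual deployment.
BASE_URL = "http://localhost:8080"

def stream_chat(prompt: str, model: str = "local-gguf") -> str:
    """Stream a chat completion from llama-server and return the full text."""
    payload = {
        "model": model,                      # placeholder model alias
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,                      # ask for server-sent events
    }
    chunks = []
    with requests.post(f"{BASE_URL}/v1/chat/completions",
                       json=payload, stream=True, timeout=300) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines(decode_unicode=True):
            if not line or not line.startswith("data: "):
                continue                     # skip keep-alives / blank lines
            data = line[len("data: "):]
            if data.strip() == "[DONE]":     # end-of-stream sentinel
                break
            delta = json.loads(data)["choices"][0]["delta"]
            piece = delta.get("content", "")
            print(piece, end="", flush=True)
            chunks.append(piece)
    return "".join(chunks)

if __name__ == "__main__":
    stream_chat("Summarize what a GGUF quantization tier is in one sentence.")
```

A systemd unit wrapping llama-server would sit underneath this: clients only assume a reachable HTTP endpoint, which keeps the CPU-vs-GPU build choice and the loaded quantization tier transparent to callers.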

Scope

This epic covers 11 decisions and 10 tasks from the Cortex Linux planning system.

Source

  • Planning Tool: Skilliks
  • Module: See internal planning documentation

Tasks

Tasks will be added as sub-issues or checklist items as the specification is refined.


Epic generated from Cortex Linux strategic planning

    Labels

  • P0-critical: Day 1 features - MVP blockers
  • epic: major feature area with subtasks
