[3.5] Local inference service using llama.cpp (GGUF-first) #5

@mikejmorgan-ai

Description

Deploy an always-on local inference daemon built on llama.cpp's llama-server: GGUF-first model format, CPU-only operation by default with transparent GPU acceleration when available, quantization tier support (Q4_K_M, Q5_K_M, Q8_0), model loading/unloading, streaming responses, and systemd integration.
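
As a rough sketch of the client surface this implies, the snippet below streams tokens from a running llama-server instance over its OpenAI-compatible chat completions endpoint. The host/port (localhost:8080), model alias, and prompt are placeholder assumptions for illustration, not decisions made in this epic.

```python
import json
import requests

# Assumption: llama-server is running locally with its OpenAI-compatible API
# enabled; adjust the base URL and model alias to the actual deployment.
BASE_URL = "http://localhost:8080"

def stream_chat(prompt: str, model: str = "local-gguf") -> str:
    """Stream a chat completion from llama-server and return the full text."""
    payload = {
        "model": model,                      # placeholder model alias
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,                      # ask for server-sent events
    }
    chunks = []
    with requests.post(f"{BASE_URL}/v1/chat/completions",
                       json=payload, stream=True, timeout=300) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines(decode_unicode=True):
            if not line or not line.startswith("data: "):
                continue                     # skip keep-alives / blank lines
            data = line[len("data: "):]
            if data.strip() == "[DONE]":     # end-of-stream sentinel
                break
            delta = json.loads(data)["choices"][0]["delta"]
            piece = delta.get("content", "")
            print(piece, end="", flush=True)
            chunks.append(piece)
    return "".join(chunks)

if __name__ == "__main__":
    stream_chat("Summarize what a GGUF quantization tier is in one sentence.")
```

A systemd unit wrapping llama-server would sit underneath this: clients only assume a reachable HTTP endpoint, which keeps the CPU-vs-GPU build choice and the loaded quantization tier transparent to callers.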

Scope

This epic covers 11 decisions and 10 tasks from the Cortex Linux planning system.

Source

  • Planning Tool: Skilliks
  • Module: See internal planning documentation

Tasks

Tasks will be added as sub-issues or checklist items as the specification is refined.


Epic generated from Cortex Linux strategic planning

    Labels

  • P0-critical: Day 1 features - MVP blockers
  • epic: major feature area with subtasks
