feat: support mmap for model loading #1059

wbruna · 2025-12-06T19:43:54Z

Introduces a new --use-mmap flag that replaces model loading I/O operations with mmap + memcpy.

In my tests this speeds up model loading slightly, though the gain was never more than half a second. Its main benefit right now is validating the mmap backend implementation. Later, I plan to extend this so the mapped file can serve directly as weight storage for backends that use main memory.
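For reviewers unfamiliar with the pattern, here is a minimal sketch of "read via I/O" versus "mmap + copy", written in Python rather than the project's C++ so it runs anywhere. The function names, and the idea of addressing a tensor by `offset` and `size`, are assumptions for illustration only; this is not the code in this PR.

```python
import mmap


def read_tensor_with_io(path: str, offset: int, size: int) -> bytearray:
    """Baseline: seek to the tensor and read it into a destination buffer."""
    dst = bytearray(size)
    with open(path, "rb") as f:
        f.seek(offset)
        n = f.readinto(dst)
    if n != size:
        raise IOError(f"short read: wanted {size} bytes, got {n}")
    return dst


def read_tensor_with_mmap(path: str, offset: int, size: int) -> bytearray:
    """Map the whole file read-only, then copy the tensor bytes out of the mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file; an empty file cannot be mapped.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            if offset + size > len(mapped):
                raise IOError("tensor range extends past end of file")
            # Slicing the mapping copies the bytes out of the mapped region;
            # this plays the role of the memcpy step.
            return bytearray(mapped[offset:offset + size])
```

Both functions return the same bytes; only the path the data takes differs. Using the mapping directly as weight storage would mean skipping the final copy and keeping the mapping alive for as long as the weights are in use.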

I made the flag opt-in to be extra safe, but we could arguably follow the llama.cpp approach and enable mmap by default, with a --no-mmap flag to disable it.
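To make the two options concrete, here is a small sketch of opt-in versus opt-out flag handling. It uses Python's `argparse` purely for illustration; `build_parser` and the `use_mmap` destination are hypothetical names, and the project's real argument parsing is C++ and may look quite different.

```python
import argparse


def build_parser(opt_in: bool) -> argparse.ArgumentParser:
    """Build a parser where mmap is either opt-in or opt-out (illustrative only)."""
    parser = argparse.ArgumentParser()
    if opt_in:
        # Opt-in: mmap stays off unless the user asks for it.
        parser.add_argument("--use-mmap", dest="use_mmap",
                            action="store_true", default=False)
    else:
        # Opt-out: mmap is on unless the user disables it.
        parser.add_argument("--no-mmap", dest="use_mmap",
                            action="store_false", default=True)
    return parser


# Opt-in: the default path is unchanged for existing users.
assert build_parser(opt_in=True).parse_args([]).use_mmap is False
# Opt-out: every user exercises the mmap path unless they pass --no-mmap.
assert build_parser(opt_in=False).parse_args(["--no-mmap"]).use_mmap is False
```

Either way the loader sees a single boolean, so switching the default later would only touch the flag definition.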

I was only able to test (and build...) it under Linux, so additional testing is very welcome 🙂

wbruna added 2 commits December 6, 2025 16:10

feat: support mmap for model loading

f985f4b

fix a few obvious Windows build errors

db1592e
