You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Note: model instances inherit both command line arguments and environment variables from the router server.
1415
1420
1421
+
Alternatively, you can also add GGUF based preset (see next section)
1422
+
1423
+
### Model presets
1424
+
1425
+
Model presets allow advanced users to define custom configurations using an `.ini` file:
1426
+
1427
+
```sh
1428
+
llama-server --models-preset ./my-models.ini
1429
+
```
1430
+
1431
+
Each section in the file defines a new preset. Keys within a section correspond to command-line arguments (without leading dashes). For example, the argument `--n-gpu-layer 123` is written as `n-gpu-layer = 123`.
1432
+
1433
+
Short argument forms (e.g., `c`, `ngl`) and environment variable names (e.g., `LLAMA_ARG_N_GPU_LAYERS`) are also supported as keys.
1434
+
1435
+
Example:
1436
+
1437
+
```ini
1438
+
version = 1
1439
+
1440
+
; If the key corresponds to an existing model on the server,
1441
+
; this will be used as the default config for that model
1442
+
[ggml-org/MY-MODEL-GGUF:Q8_0]
1443
+
; string value
1444
+
chat-template = chatml
1445
+
; numeric value
1446
+
n-gpu-layer = 123
1447
+
; boolean value
1448
+
jinja = false
1449
+
; shorthand argument (for example, context size)
1450
+
c = 4096
1451
+
; environment variable name
1452
+
LLAMA_ARG_CACHE_RAM = 0
1453
+
; file paths are relative to server's CWD
1454
+
model-draft = ./my-models/draft.gguf
1455
+
; but it's RECOMMENDED to use absolute path
1456
+
model-draft = /Users/abc/my-models/draft.gguf
1457
+
1458
+
; If the key does NOT correspond to an existing model,
1459
+
; you need to specify at least the model path
1460
+
[custom_model]
1461
+
model = /Users/abc/my-awesome-model-Q4_K_M.gguf
1462
+
```
1463
+
1464
+
Note: some arguments are controlled by router (e.g., host, port, API key, HF repo, model alias). They will be removed or overwritten upload loading.
1465
+
1416
1466
### Routing requests
1417
1467
1418
1468
Requests are routed according to the requested model name.
0 commit comments