This repository was archived by the owner on Jul 4, 2025. It is now read-only.
If a GPU is not listed above, it is important to note that TensorRT-LLM is
@@ -254,14 +254,18 @@ The list of supported models is:
* [LLaMA-v2](examples/llama)
* [Mistral](examples/llama)
* [MPT](examples/mpt)
+* [mT5](examples/enc_dec)
* [OPT](examples/opt)
* [Qwen](examples/qwen)
* [Replit Code](examples/mpt)
* [SantaCoder](examples/gpt)
* [StarCoder](examples/gpt)
* [T5](examples/enc_dec)

-Note: [Encoder-Decoder](examples/enc_dec/) provides general encoder-decoder support that contains many encoder-decoder models such as T5, Flan-T5, etc. We unroll the exact model names in the list above to let users find specific models easier.
+Note: [Encoder-Decoder](examples/enc_dec/) provides general encoder-decoder
+support that covers many encoder-decoder models such as T5, Flan-T5, etc. We
+unroll the exact model names in the list above to let users find specific
+models more easily.
## Performance
@@ -325,7 +329,11 @@ enable plugins, for example: `--use_gpt_attention_plugin`.
* MPI + Slurm

TensorRT-LLM is an [MPI](https://en.wikipedia.org/wiki/Message_Passing_Interface)-aware package that uses [`mpi4py`](https://mpi4py.readthedocs.io/en/stable/). If you are running scripts in a [Slurm](https://slurm.schedmd.com/) environment, you might encounter interference:
-As a rule of thumb, if you are running TensorRT-LLM interactively on a Slurm node, prefix your commands with `mpirun -n 1` to run TensorRT-LLM in a dedicated MPI environment, not the one provided by your Slurm allocation.
+As a rule of thumb, if you are running TensorRT-LLM interactively on a Slurm
+node, prefix your commands with `mpirun -n 1` to run TensorRT-LLM in a
+dedicated MPI environment, not the one provided by your Slurm allocation.

For example: `mpirun -n 1 python3 examples/gpt/build.py ...`
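The rule of thumb above can be wrapped in a small launcher. The sketch below is a hypothetical helper, not part of TensorRT-LLM: it checks `SLURM_JOB_ID`, which Slurm exports inside an allocation (salloc/sbatch/srun), and prepends `mpirun -n 1` to the command only when that variable is set.

```python
# Hypothetical helper, not part of TensorRT-LLM: build the command line to
# run, prefixed with `mpirun -n 1` when we appear to be inside a Slurm
# allocation. Slurm exports SLURM_JOB_ID for jobs started via
# salloc/sbatch/srun.
import os


def launch_cmd(cmd, env=os.environ):
    """Return `cmd` (a list of argv strings), isolated from Slurm's MPI
    environment when a Slurm allocation is detected."""
    if env.get("SLURM_JOB_ID"):
        # Interactive work on a Slurm node: run in a dedicated single-rank
        # MPI environment instead of the one provided by the allocation.
        return ["mpirun", "-n", "1", *cmd]
    return list(cmd)


print(launch_cmd(["python3", "examples/gpt/build.py"],
                 env={"SLURM_JOB_ID": "4242"}))
# ['mpirun', '-n', '1', 'python3', 'examples/gpt/build.py']
```

Returning an argv list (rather than a shell string) keeps the helper safe to pass straight to `subprocess.run`.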
## Release notes

-* TensorRT-LLM requires TensorRT 9.1.0.4 and 23.08 containers.
+* TensorRT-LLM requires TensorRT 9.2 and 23.10 containers.
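A minimum-version requirement like the one above is easy to check programmatically. The sketch below is hypothetical (the `meets_minimum` helper is not a TensorRT-LLM API): it compares dotted version strings numerically rather than lexically, so `9.10` correctly ranks above `9.2` where a plain string comparison would not.

```python
# Hypothetical helper, not part of TensorRT-LLM: compare dotted version
# strings numerically so that e.g. "9.10" is newer than "9.2".
def meets_minimum(version: str, minimum: str) -> bool:
    def parse(v):
        return tuple(int(part) for part in v.split("."))
    # Tuples compare element-wise, so (9, 1, 0, 4) < (9, 2) and
    # (9, 10) >= (9, 2) both hold, as intended.
    return parse(version) >= parse(minimum)


print(meets_minimum("9.2", "9.2"))      # True: exact match
print(meets_minimum("9.1.0.4", "9.2"))  # False: 9.1.x predates 9.2
```

In practice you would typically feed in the installed version string (for the TensorRT Python package, `tensorrt.__version__`); the import is omitted so the sketch stays self-contained.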
### Change Log
+#### Versions 0.6.0 / 0.6.1
+
+* Models
+  * ChatGLM3
+  * InternLM (contributed by @wangruohui)
+  * Mistral 7B (developed in collaboration with Mistral.AI)
+  * MQA/GQA support to MPT (and GPT) models (contributed by @bheilbrun)
+  * Qwen (contributed by @Tlntin and @zhaohb)
+  * Replit Code V-1.5 3B (external contribution)
+  * T5, mT5, Flan-T5 (Python runtime only)
+
+* Features
+  * Add runtime statistics related to active requests and KV cache
+    utilization from the batch manager (see
+    the [batch manager](docs/source/batch_manager.md) documentation)
+  * Add `sequence_length` tensor to support proper lengths in beam-search