* update llm table in readme
* update llm inference readme
* update llm finetune/inference readme
* update llm readme & docs/tutorials/llm.rst
* remove installation.rst & blog publication
* update vision and audio to 0.18.1 and 2.3.1
* remove dependency_version.yml
* update the installation link
* update llm inference README for accuracy & phi3-mini beam
* add token-latency for phi-3
* change client gpu to MTL-H
* remove comments in the script
* use specific commit for itrex
* add wandb
* remove useless scripts
* set inc to v3.0
* update llm dependencies version
* update torch-ccl tag
* update link to release rather than xpu-main
README.md (45 additions, 12 deletions)
@@ -21,19 +21,52 @@ The extension can be loaded as a Python module for Python programs or linked as
 ## Large Language Models (LLMs) Optimization
 
-In the current technological landscape, Generative AI (GenAI) workloads and models have gained widespread attention and popularity. Large Language Models (LLMs) have emerged as the dominant models driving these GenAI applications. Starting from 2.1.0, specific optimizations for certain LLM models are introduced in the Intel® Extension for PyTorch\*. Check [LLM optimizations CPU](./examples/cpu/inference/python/llm) and [LLM optimizations GPU](./examples/gpu/inference/python/llm) for details.
+In the current technological landscape, Generative AI (GenAI) workloads and models have gained widespread attention and popularity. Large Language Models (LLMs) have emerged as the dominant models driving these GenAI applications. Starting from 2.1.0, specific optimizations for certain LLM models are introduced in the Intel® Extension for PyTorch\*. Check [LLM optimizations CPU](./examples/cpu/inference/python/llm) and [LLM optimizations GPU](./examples/gpu/llm) for details.
 
 ### Optimized Model List
 
-| MODEL FAMILY | Verified < MODEL ID > (Huggingface hub) | FP16 | Weight only quantization INT4 | Optimized on Intel® Data Center GPU Max Series (1550/1100) | Optimized on Intel® Arc™ A-Series Graphics (A770) |
+#### LLM Inference
+
+| MODEL FAMILY | Verified < MODEL ID > (Huggingface hub) | FP16 | Weight only quantization INT4 | Optimized on Intel® Data Center GPU Max Series (1550/1100) | Intel® Core™ Ultra Processors with Intel® Arc™ Graphics |
+
+Intel® Data Center Max 1550 GPU: supports all the models in the model list above. Intel® Core™ Ultra Processors with Intel® Arc™ Graphics: supports Llama 2 7B, Llama 3 8B, and Phi-3-Mini 3.8B.
+
+| MODEL FAMILY | Verified < MODEL ID > (Hugging Face hub) | Mixed Precision (BF16+FP32) | Full fine-tuning | LoRA | Intel® Data Center Max 1550 GPU | Intel® Core™ Ultra Processors with Intel® Arc™ Graphics |
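For context on the optimizations this hunk references, they are applied through the extension's `ipex.llm.optimize` frontend. The following is a minimal sketch of FP16 inference on an XPU device, not the PR's own example script: the model ID is taken from the verified list above, while the prompt and generation parameters are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import intel_extension_for_pytorch as ipex

# Load a model from the verified list in FP16 and move it to the Intel GPU.
model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model = model.eval().to("xpu")

# Apply the LLM-specific optimizations (e.g. fused ROPE, indirect-access KV cache).
model = ipex.llm.optimize(model, dtype=torch.float16, device="xpu")

# Prompt and max_new_tokens are illustrative, not values prescribed by this PR.
inputs = tokenizer("What is AI?", return_tensors="pt").to("xpu")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```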
docs/tutorials/llm.rst

 *Note*: The above verified models (including other models in the same model family, like "codellama/CodeLlama-7b-hf" from the LLAMA family) are well supported with all optimizations like indirect access KV cache, fused ROPE, and prepacked TPP Linear (fp16). For other LLM families, work is in progress to cover those optimizations, which will expand the model list above.
-Check `LLM best known practice <https://github.com/intel/intel-extension-for-pytorch/tree/release/xpu/2.1.30/examples/gpu/inference/python/llm>`_ for instructions to install/setup environment and example scripts..
+
+LLM fine-tuning
+~~~~~~~~~~~~~~~
+
+.. list-table::
+   :widths: auto
+   :header-rows: 1
+
+   * - Model Family
+     - Verified < MODEL ID > (Huggingface hub)
+     - Mixed Precision (BF16+FP32)
+     - Full fine-tuning
+     - LoRA
+     - Intel® Data Center Max 1550 GPU
+     - Intel® Core™ Ultra Processors with Intel® Arc™ Graphics
+   * - Llama2
+     - "meta-llama/Llama-2-7b-hf"
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+   * - Llama2
+     - "meta-llama/Llama-2-70b-hf"
+     - ✅
+     - ❎
+     - ✅
+     - ✅
+     - ❎
+   * - Llama3
+     - "meta-llama/Meta-Llama-3-8B"
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+   * - Qwen
+     - "Qwen/Qwen-7B"
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+     - ❎
+   * - Phi-3-mini 3.8B
+     - "Phi-3-mini-4k-instruct"
+     - ✅
+     - ✅
+     - ✅
+     - ❎
+     - ✅
+
+Check `LLM best known practice <https://github.com/intel/intel-extension-for-pytorch/tree/release/xpu/2.3.110/examples/gpu/llm>`_ for instructions to install/set up the environment and example scripts.
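As a companion to the LoRA column in the fine-tuning table above, here is a rough sketch of how a LoRA setup is typically wired up with Hugging Face `peft`. The rank, alpha, dropout, and target modules below are illustrative assumptions, not values taken from this PR; the scripts under `examples/gpu/llm` are the authoritative reference.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Base model in BF16, matching the "Mixed Precision (BF16+FP32)" column.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16
)

# LoRA adapter configuration; r / lora_alpha / target_modules are
# illustrative choices, not prescribed by this PR.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable

model = model.to("xpu")  # train on the Intel GPU from here on
```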
--`AOT` is a text string to enable `Ahead-Of-Time` compilation for specific GPU models. Check [tutorial](../../../../../docs/tutorials/technical_details/AOT.md) for details.<br />
+-`AOT` is a text string to enable `Ahead-Of-Time` compilation for specific GPU models, for example 'pvc,ats-m150' for Intel® Data Center GPU Max Series, Intel® Data Center GPU Flex Series, and Intel® Arc™ A-Series Graphics (A770). Check [tutorial](../../../docs/tutorials/technical_details/AOT.md) for details.<br />
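After installing a build (AOT-compiled or not), a quick sanity check that the installed wheels can see the GPU looks roughly like the sketch below. This assumes the 2.x XPU packages, where importing `intel_extension_for_pytorch` registers the `torch.xpu` namespace.

```python
import torch
import intel_extension_for_pytorch as ipex

print(torch.__version__, ipex.__version__)

# The xpu namespace is available once ipex has been imported.
print(torch.xpu.is_available())
if torch.xpu.is_available():
    print(torch.xpu.get_device_name(0))  # e.g. a Max Series or Arc device
```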