* update llm table in readme
* update llm inference readme
* update llm finetune/inference readme
* update llm readme & docs/tutorials/llm.rst
* remove installation.rst & blog publication
* update vision and audio to 0.18.1 and 2.3.1
* remove dependency_version.yml
* update the installation link
* update llm inference README for accuracy & phi3-mini beam
* add token-latency for phi-3
* change client gpu to MTL-H
* remove comments in the script
* use specific commit for itrex
* add wandb
* remove useless scripts
* set inc to v3.0
* update llm dependencies version
* update torch-ccl tag
* update link to release rather than xpu-main
README.md (45 additions, 12 deletions)
@@ -21,19 +21,52 @@ The extension can be loaded as a Python module for Python programs or linked as
 ## Large Language Models (LLMs) Optimization
 
-In the current technological landscape, Generative AI (GenAI) workloads and models have gained widespread attention and popularity. Large Language Models (LLMs) have emerged as the dominant models driving these GenAI applications. Starting from 2.1.0, specific optimizations for certain LLM models are introduced in the Intel® Extension for PyTorch\*. Check [LLM optimizations CPU](./examples/cpu/inference/python/llm) and [LLM optimizations GPU](./examples/gpu/inference/python/llm) for details.
+In the current technological landscape, Generative AI (GenAI) workloads and models have gained widespread attention and popularity. Large Language Models (LLMs) have emerged as the dominant models driving these GenAI applications. Starting from 2.1.0, specific optimizations for certain LLM models are introduced in the Intel® Extension for PyTorch\*. Check [LLM optimizations CPU](./examples/cpu/inference/python/llm) and [LLM optimizations GPU](./examples/gpu/llm) for details.
 
 ### Optimized Model List
 
-| MODEL FAMILY | Verified < MODEL ID > (Huggingface hub) | FP16 | Weight only quantization INT4 | Optimized on Intel® Data Center GPU Max Series (1550/1100) | Optimized on Intel® Arc™ A-Series Graphics (A770) |
+#### LLM Inference
+
+| MODEL FAMILY | Verified < MODEL ID > (Huggingface hub) | FP16 | Weight only quantization INT4 | Optimized on Intel® Data Center GPU Max Series (1550/1100) | Intel® Core™ Ultra Processors with Intel® Arc™ Graphics |
+
+Intel® Data Center Max 1550 GPU: supports all the models in the model list above. Intel® Core™ Ultra Processors with Intel® Arc™ Graphics: supports Llama 2 7B, Llama 3 8B, and Phi-3-Mini 3.8B.
+
+| MODEL FAMILY | Verified < MODEL ID > (Hugging Face hub) | Mixed Precision (BF16+FP32) | Full fine-tuning | LoRA | Intel® Data Center Max 1550 GPU | Intel® Core™ Ultra Processors with Intel® Arc™ Graphics |
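For context on the optimizations this hunk references, they are applied through the extension's `ipex.llm.optimize` frontend. The following is a minimal sketch of FP16 inference on an XPU device, not the PR's own example script: the model ID is taken from the verified list above, while the prompt and generation parameters are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import intel_extension_for_pytorch as ipex

# Load a model from the verified list in FP16 and move it to the Intel GPU.
model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model = model.eval().to("xpu")

# Apply the LLM-specific optimizations (e.g. fused ROPE, indirect-access KV cache).
model = ipex.llm.optimize(model, dtype=torch.float16, device="xpu")

# Prompt and max_new_tokens are illustrative, not values prescribed by this PR.
inputs = tokenizer("What is AI?", return_tensors="pt").to("xpu")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```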
docs/tutorials/llm.rst

 *Note*: The above verified models (including other models in the same model family, like "codellama/CodeLlama-7b-hf" from the LLAMA family) are well supported with all optimizations like indirect access KV cache, fused ROPE, and prepacked TPP Linear (fp16). For other LLM families, work is in progress to cover those optimizations, which will expand the model list above.
-Check `LLM best known practice <https://github.com/intel/intel-extension-for-pytorch/tree/release/xpu/2.1.30/examples/gpu/inference/python/llm>`_ for instructions to install/setup environment and example scripts..
+
+LLM fine-tuning
+~~~~~~~~~~~~~~~
+
+.. list-table::
+   :widths: auto
+   :header-rows: 1
+
+   * - Model Family
+     - Verified < MODEL ID > (Huggingface hub)
+     - Mixed Precision (BF16+FP32)
+     - Full fine-tuning
+     - LoRA
+     - Intel® Data Center Max 1550 GPU
+     - Intel® Core™ Ultra Processors with Intel® Arc™ Graphics
+   * - Llama2
+     - "meta-llama/Llama-2-7b-hf"
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+   * - Llama2
+     - "meta-llama/Llama-2-70b-hf"
+     - ✅
+     - ❎
+     - ✅
+     - ✅
+     - ❎
+   * - Llama3
+     - "meta-llama/Meta-Llama-3-8B"
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+   * - Qwen
+     - "Qwen/Qwen-7B"
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+     - ❎
+   * - Phi-3-mini 3.8B
+     - "Phi-3-mini-4k-instruct"
+     - ✅
+     - ✅
+     - ✅
+     - ❎
+     - ✅
+
+Check `LLM best known practice <https://github.com/intel/intel-extension-for-pytorch/tree/release/xpu/2.3.110/examples/gpu/llm>`_ for instructions to install/set up the environment and example scripts.
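As a companion to the LoRA column in the fine-tuning table above, here is a rough sketch of how a LoRA setup is typically wired up with Hugging Face `peft`. The rank, alpha, dropout, and target modules below are illustrative assumptions, not values taken from this PR; the scripts under `examples/gpu/llm` are the authoritative reference.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Base model in BF16, matching the "Mixed Precision (BF16+FP32)" column.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16
)

# LoRA adapter configuration; r / lora_alpha / target_modules are
# illustrative choices, not prescribed by this PR.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable

model = model.to("xpu")  # train on the Intel GPU from here on
```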
--`AOT` is a text string to enable `Ahead-Of-Time` compilation for specific GPU models. Check [tutorial](../../../../../docs/tutorials/technical_details/AOT.md) for details.<br />
+-`AOT` is a text string to enable `Ahead-Of-Time` compilation for specific GPU models, for example 'pvc,ats-m150' for Intel® Data Center GPU Max Series, Intel® Data Center GPU Flex Series, and Intel® Arc™ A-Series Graphics (A770). Check [tutorial](../../../docs/tutorials/technical_details/AOT.md) for details.<br />
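After installing a build (AOT-compiled or not), a quick sanity check that the installed wheels can see the GPU looks roughly like the sketch below. This assumes the 2.x XPU packages, where importing `intel_extension_for_pytorch` registers the `torch.xpu` namespace.

```python
import torch
import intel_extension_for_pytorch as ipex

print(torch.__version__, ipex.__version__)

# The xpu namespace is available once ipex has been imported.
print(torch.xpu.is_available())
if torch.xpu.is_available():
    print(torch.xpu.get_device_name(0))  # e.g. a Max Series or Arc device
```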