
Commit 90fdb70

Update llm doc and scripts (#5076)

* update example/llm README
* update llm inference README with llama optimization guide for both fp16 and woq
* install inc instead of inc&itrex
* woq run with xetla path
* remove accuracy failed log, verified with yuhua
* run with static cache, better perf
* add static cache for woq example
* remove wa OCL_ICD_VENDORS
* remove OCL_ICD_VENDORS in docs
* remove CCL_ROOT setting in docs
* add inc in inference requirements.txt
* add static cache in fp16/woq and add comment to explain it
* install and activate standalone dpcpp compiler for using torch.compile
* update from jun
* update transforemrs to 4.44.2
* ensure param format consistency for bash
* update optimize_transformers to ipex.llm.optimize
* remove IPEX_COMPUTE_ENGINE for common case, add it only in save quantized model scenario
* update optimize_transformers to ipex.llm.optimize
* Create README.md for cpp example
* update triton doc
* update for torch compile
* update known_issue for triton installation, link known issue to torch.compile guide
* update known issues
* fix typo
* format fix
* update 2 scenario for triton library issue
* update triton related issue
* Update torch_compile_gpu.md inference example
* Update requirements.txt for accelerate 1.1.1
* Update requirements.txt to specify huggingface-hub==0.25.2

1 parent d4d41f9 commit 90fdb70

20 files changed: +468 / -236 lines

docs/tutorials/features/torch_compile_gpu.md

Lines changed: 37 additions & 9 deletions
@@ -9,22 +9,22 @@ Intel® Extension for PyTorch\* now empowers users to seamlessly harness graph c
 # Required Dependencies
 
 **Verified version**:
-- `torch` : v2.3
-- `intel_extension_for_pytorch` : v2.3
-- `triton` : >= v3.0.0
+- `torch` : v2.5
+- `intel_extension_for_pytorch` : v2.5
+- `triton` : v3.1.0+91b14bf559
 
 
-Install [Intel® oneAPI Base Toolkit 2024.2.1](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html).
+Install [Intel® oneAPI DPC++/C++ Compiler 2025.0.4](https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compiler-download.html).
 
 Follow [Intel® Extension for PyTorch\* Installation](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/) to install `torch` and `intel_extension_for_pytorch` firstly.
 
 Triton could be directly installed using the following command:
 
 ```Bash
-pip install --pre pytorch-triton-xpu==3.0.0+1b2f15840e --index-url https://download.pytorch.org/whl/nightly/xpu
+pip install --pre pytorch-triton-xpu==3.1.0+91b14bf559 --index-url https://download.pytorch.org/whl/nightly/xpu
 ```
 
-Remember to activate the oneAPI basekit by following commands.
+Remember to activate the oneAPI DPC++/C++ Compiler with the following commands.
 
 ```bash
 # {dpcpproot} is the location for dpcpp ROOT path and it is where you installed oneAPI DPCPP, usually it is /opt/intel/oneapi/compiler/latest or ~/intel/oneapi/compiler/latest
@@ -39,19 +39,43 @@ source {dpcpproot}/env/vars.sh
 
 ```python
 import torch
+import torch.nn as nn
 import intel_extension_for_pytorch
 
-# create model
+# Define the SimpleNet model
+class SimpleNet(nn.Module):
+    def __init__(self):
+        super(SimpleNet, self).__init__()
+        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)
+        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1)
+        self.fc1 = nn.Linear(32 * 56 * 56, 128)
+        self.fc2 = nn.Linear(128, 10)
+        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
+        self.relu = nn.ReLU()
+
+    def forward(self, x):
+        x = self.pool(self.relu(self.conv1(x)))
+        x = self.pool(self.relu(self.conv2(x)))
+        x = x.view(-1, 32 * 56 * 56)
+        x = self.relu(self.fc1(x))
+        x = self.fc2(x)
+        return x
+
+# Create model
 model = SimpleNet().to("xpu")
 
-# compile model
+# Compile model
 compiled_model = torch.compile(model, options={"freezing": True})
 
-# inference main
+# Inference main
 input = torch.rand(64, 3, 224, 224, device=torch.device("xpu"))
 with torch.no_grad():
     with torch.xpu.amp.autocast(dtype=torch.float16):
         output = compiled_model(input)
+
+# Print the output shape
+print(output.shape)
+print("Done for inference with torch.compile")
 ```
 
 ## Training with torch.compile
@@ -76,3 +100,7 @@ optimizer.zero_grad()
 loss.backward()
 optimizer.step()
 ```
+
+## Troubleshooting
+
+If you encounter any issue related to `torch.compile` or `triton`, please refer to the Library Dependencies section in [known_issues](../known_issues.md).

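The training hunk above shows only the tail of the loop (`loss.backward()` / `optimizer.step()`). For context, here is a minimal sketch of one full training step with `torch.compile` on XPU; the model, loss, and optimizer below are illustrative assumptions, not taken from the commit.

```python
# Minimal training-step sketch with torch.compile on XPU.
# The tiny model, CrossEntropyLoss, and SGD are assumptions for illustration only.
import torch
import torch.nn as nn
import intel_extension_for_pytorch  # noqa: F401  # enables the "xpu" device

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 10)).to("xpu")
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Compile once; later calls reuse the compiled graph
compiled_model = torch.compile(model)

input = torch.rand(64, 3, 224, 224, device="xpu")
target = torch.randint(0, 10, (64,), device="xpu")

# One training step: forward under autocast, then the backward/step tail shown in the diff
with torch.xpu.amp.autocast(dtype=torch.bfloat16):
    output = compiled_model(input)
    loss = criterion(output, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print("loss:", loss.item())
```
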
docs/tutorials/getting_started.md

Lines changed: 0 additions & 9 deletions
@@ -51,12 +51,3 @@ More examples, including training and usage of low precision data types are avai
 
 There are some environment variables in runtime that can be used to configure executions on GPU. Please check [Advanced Configuration](./features/advanced_configuration.html#runtime-configuration) for more detailed information.
 
-Set `OCL_ICD_VENDORS` with default path `/etc/OpenCL/vendors`.
-Set `CCL_ROOT` if you are using multi-GPU.
-
-```bash
-export OCL_ICD_VENDORS=/etc/OpenCL/vendors
-export CCL_ROOT=${CONDA_PREFIX}
-python <script>
-```
-
docs/tutorials/known_issues.md

Lines changed: 34 additions & 9 deletions
@@ -83,24 +83,49 @@ Troubleshooting
 
 If you continue seeing similar issues for other shared object files, add the corresponding files under `${MKL_DPCPP_ROOT}/lib/intel64/` by `LD_PRELOAD`. Note that the suffix of the libraries may change (e.g. from .1 to .2), if more than one oneMKL library is installed on the system.
 
-- **Problem**: RuntimeError: could not create an engine.
-- **Cause**: `OCL_ICD_VENDORS` path is wrongly set when activate a exist conda environment.
-- **Solution**: `export OCL_ICD_VENDORS=/etc/OpenCL/vendors` after `conda activate`
-
-- **Problem**: If you encounter issues related to CCL environment variable configuration when running distributed tasks.
-- **Cause**: `CCL_ROOT` path is wrongly set.
-- **Solution**: `export CCL_ROOT=${CONDA_PREFIX}`
-
 - **Problem**: If you encounter issues related to MPI environment variable configuration when running distributed tasks.
 - **Cause**: MPI environment variable configuration not correct.
 - **Solution**: `conda deactivate` and then `conda activate` to activate the correct MPI environment variable automatically.
 
 ```
 conda deactivate
 conda activate
-export OCL_ICD_VENDORS=/etc/OpenCL/vendors
 ```
 
+
+- **Problem**: Runtime error related to the C++ compiler when using `torch.compile`: `Runtime Error: Failed to find C++ compiler. Please specify via CXX environment variable.`
+- **Cause**: The DPC++/C++ Compiler is not installed or not activated correctly.
+- **Solution**: [Install the DPC++/C++ Compiler](https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compiler-download.html) and activate it with the following commands.
+
+```bash
+# {dpcpproot} is the location for dpcpp ROOT path and it is where you installed oneAPI DPCPP, usually it is /opt/intel/oneapi/compiler/latest or ~/intel/oneapi/compiler/latest
+source {dpcpproot}/env/vars.sh
+```
+
+- **Problem**: RuntimeError: Cannot find a working triton installation. Either the package is not installed or it is too old. More information on installing Triton can be found at https://github.com/openai/triton
+- **Cause**: `pytorch-triton-xpu` is not installed.
+- **Solution**: Resolve the issue with the following command:
+
+```bash
+# Install the correct version of pytorch-triton-xpu
+pip install --pre pytorch-triton-xpu==3.1.0+91b14bf559 --index-url https://download.pytorch.org/whl/nightly/xpu
+```
+
+
+- **Problem**: LoweringException: ImportError: cannot import name 'intel' from 'triton._C.libtriton'
+- **Cause**: Installing the `triton` package causes `pytorch-triton-xpu` to stop working.
+- **Solution**: Resolve the issue with the following commands:
+
+```bash
+pip list | grep triton
+# If triton-related packages are listed, remove them
+pip uninstall triton
+pip uninstall pytorch-triton-xpu
+# Reinstall the correct version of pytorch-triton-xpu
+pip install --pre pytorch-triton-xpu==3.1.0+91b14bf559 --index-url https://download.pytorch.org/whl/nightly/xpu
+```
+
+
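After reinstalling `pytorch-triton-xpu`, a quick import check can confirm the fix took effect. This is a minimal sketch, not part of the documented solution; the `triton._C.libtriton.intel` module path is assumed from the error message quoted above.

```python
# Sanity check after reinstalling pytorch-triton-xpu (sketch; module path taken
# from the ImportError message above, not from official documentation).
import triton
print("triton version:", triton.__version__)

from triton._C.libtriton import intel  # raises ImportError if the XPU backend is missing
print("XPU Triton backend imported successfully")
```
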
 ## Performance Issue
 
 - **Problem**: Extended durations for data transfers from the host system to the device (H2D) and from the device back to the host system (D2H).

examples/gpu/inference/README.md

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+# Run Model Inference
+
+
+1. Command to compile example-app
+
+```
+$ cd example-app
+$ mkdir build
+$ cd build
+$ CC=icx CXX=icpx cmake -DCMAKE_PREFIX_PATH=<LIBPYTORCH_PATH> ..
+$ make
+```
+
+2. Use model_gen.py to generate the ResNet-50 JIT model and save it as resnet50.pt
+
+```
+python ../../model_gen.py
+```
+
+
+3. Run the example
+
+```
+./example-app resnet50.pt
+```

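Step 2 invokes `model_gen.py` without showing its contents. A minimal sketch of what such a script might look like (an assumption for illustration, not the script shipped in the repository) traces a torchvision ResNet-50 on XPU and saves the TorchScript module as `resnet50.pt`, which step 3 then passes to `example-app`.

```python
# Hypothetical model_gen.py-style sketch: trace ResNet-50 to TorchScript and save it
# for the C++ example-app to load. Not the actual script from the repository.
import torch
import torchvision.models as models
import intel_extension_for_pytorch  # noqa: F401  # enables the "xpu" device

model = models.resnet50(weights=None).eval().to("xpu")
example_input = torch.rand(1, 3, 224, 224, device="xpu")

with torch.no_grad():
    traced = torch.jit.trace(model, example_input)

traced.save("resnet50.pt")
print("Saved traced model to resnet50.pt")
```
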
examples/gpu/llm/README.md

Lines changed: 58 additions & 9 deletions
@@ -8,21 +8,70 @@ Here you can find benchmarking scripts for large language models (LLM) text gene
 
 ## Environment Setup
 
-### [Recommended] Docker-based environment setup with compilation from source
+### [Recommended] Docker-based environment setup with prebuilt wheel files
 
 ```bash
 # Get the Intel® Extension for PyTorch* source code
 git clone https://github.com/intel/intel-extension-for-pytorch.git
 cd intel-extension-for-pytorch
-git checkout xpu-main
+git checkout v2.5.10+xpu
+git submodule sync
+git submodule update --init --recursive
+
+# Build an image with the provided Dockerfile by installing Intel® Extension for PyTorch* with prebuilt wheels
+docker build -f examples/gpu/llm/Dockerfile -t ipex-llm:2510 .
+
+# Run the container with command below
+docker run -it --rm --privileged -v /dev/dri/by-path:/dev/dri/by-path ipex-llm:2510 bash
+
+# When the command prompt shows inside the docker container, enter llm examples directory
+cd llm
+
+# Activate environment variables
+source ./tools/env_activate.sh [inference|fine-tuning]
+```
+
+### Conda-based environment setup with prebuilt wheel files
+
+Make sure the driver packages are installed. Refer to [Installation Guide](https://intel.github.io/intel-extension-for-pytorch/#installation?platform=gpu&version=v2.5.10%2Bxpu&os=linux%2Fwsl2&package=pip).
+
+```bash
+
+# Get the Intel® Extension for PyTorch* source code
+git clone https://github.com/intel/intel-extension-for-pytorch.git
+cd intel-extension-for-pytorch
+git checkout v2.5.10+xpu
+git submodule sync
+git submodule update --init --recursive
+
+# Make sure GCC >= 11 is installed on your system.
+# Create a conda environment
+conda create -n llm python=3.10 -y
+conda activate llm
+# Setup the environment with the provided script
+cd examples/gpu/llm
+# If you want to install Intel® Extension for PyTorch\* with prebuilt wheels, use the commands below:
+bash ./tools/env_setup.sh 0x07
+conda deactivate
+conda activate llm
+source ./tools/env_activate.sh [inference|fine-tuning]
+```
+
+### Docker-based environment setup with compilation from source
+
+```bash
+# Get the Intel® Extension for PyTorch* source code
+git clone https://github.com/intel/intel-extension-for-pytorch.git
+cd intel-extension-for-pytorch
+git checkout v2.5.10+xpu
 git submodule sync
 git submodule update --init --recursive
 
 # Build an image with the provided Dockerfile by compiling Intel® Extension for PyTorch* from source
-docker build -f examples/gpu/llm/Dockerfile --build-arg COMPILE=ON -t ipex-llm:xpu-main .
+docker build -f examples/gpu/llm/Dockerfile --build-arg COMPILE=ON -t ipex-llm:2510 .
 
 # Run the container with command below
-docker run -it --rm --privileged -v /dev/dri/by-path:/dev/dri/by-path ipex-llm:xpu-main bash
+docker run -it --rm --privileged -v /dev/dri/by-path:/dev/dri/by-path ipex-llm:2510 bash
 
 # When the command prompt shows inside the docker container, enter llm examples directory
 cd llm
@@ -33,14 +82,14 @@ source ./tools/env_activate.sh [inference|fine-tuning]
 
 ### Conda-based environment setup with compilation from source
 
-Make sure the driver and Base Toolkit are installed. Refer to [Installation Guide](https://intel.github.io/intel-extension-for-pytorch/#installation?platform=gpu&version=v2.3.110%2Bxpu&os=linux%2Fwsl2&package=source).
+Make sure the driver and Base Toolkit are installed. Refer to [Installation Guide](https://intel.github.io/intel-extension-for-pytorch/#installation?platform=gpu&version=v2.5.10%2Bxpu&os=linux%2Fwsl2&package=source).
 
 ```bash
 
 # Get the Intel® Extension for PyTorch* source code
 git clone https://github.com/intel/intel-extension-for-pytorch.git
 cd intel-extension-for-pytorch
-git checkout xpu-main
+git checkout v2.5.10+xpu
 git submodule sync
 git submodule update --init --recursive
 
@@ -51,12 +100,12 @@ conda activate llm
 # Setup the environment with the provided script
 cd examples/gpu/llm
 # If you want to install Intel® Extension for PyTorch\* from source, use the commands below:
-# e.g. bash ./tools/env_setup.sh 3 /opt/intel/oneapi pvc
-bash ./tools/env_setup.sh 3 <ONEAPI_ROOT_DIR> <AOT>
+
+# e.g. bash ./tools/env_setup.sh 0x03 /opt/intel/oneapi/compiler/latest /opt/intel/oneapi/mkl/latest /opt/intel/oneapi/ccl/latest /opt/intel/oneapi/mpi/latest /opt/intel/oneapi/pti/latest pvc
+bash ./tools/env_setup.sh 0x03 <DPCPP_ROOT> <ONEMKL_ROOT> <ONECCL_ROOT> <MPI_ROOT> <PTI_ROOT> <AOT>
 
 conda deactivate
 conda activate llm
-export OCL_ICD_VENDORS=/etc/OpenCL/vendors
 source ./tools/env_activate.sh [inference|fine-tuning]
 ```
 
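The inference scripts activated by `env_activate.sh inference` center on `ipex.llm.optimize`, which this commit adopts in place of the former `optimize_transformers` call. A minimal FP16 sketch of that entry point on XPU follows; the model checkpoint, generation settings, and exact keyword arguments are assumptions for illustration rather than the commit's benchmarking code.

```python
# Hedged sketch of FP16 LLM inference with ipex.llm.optimize on XPU.
# The Llama-2 checkpoint and generation settings are illustrative assumptions.
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # assumed model choice
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).eval().to("xpu")

# Apply the LLM-specific optimizations (replaces the former ipex.optimize_transformers)
model = ipex.llm.optimize(model, dtype=torch.float16, device="xpu")

inputs = tokenizer("What is Intel Extension for PyTorch?", return_tensors="pt").to("xpu")
with torch.no_grad(), torch.xpu.amp.autocast(dtype=torch.float16):
    output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
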
examples/gpu/llm/fine-tuning/Llama2/README.md

Lines changed: 0 additions & 2 deletions
@@ -45,7 +45,6 @@ Remove the flags `--data_path` in fine-tuning command will load the guanaco-llam
 
 
 ```bash
-export OCL_ICD_VENDORS=/etc/OpenCL/vendors
 export CCL_PROCESS_LAUNCHER=none
 export TORCH_LLM_ALLREDUCE=1
 
@@ -84,7 +83,6 @@ Remove the flags `--data_path` in fine-tuning command will load the guanaco-llam
 
 
 ```bash
-export OCL_ICD_VENDORS=/etc/OpenCL/vendors
 export CCL_PROCESS_LAUNCHER=none
 export TORCH_LLM_ALLREDUCE=1
 
examples/gpu/llm/fine-tuning/Llama3/README.md

Lines changed: 0 additions & 3 deletions
@@ -26,7 +26,6 @@ Full-finetuning on single card will cause OOM.
 Example: Llama 3 8B LoRA fine-tuning on single card. The default dataset `financial_phrasebank` is loaded in `llama3_ft.py`.
 
 ```bash
-export OCL_ICD_VENDORS=/etc/OpenCL/vendors
 export TORCH_LLM_ALLREDUCE=1
 
 export model="meta-llama/Meta-Llama-3-8B"
@@ -57,7 +56,6 @@ Example: Llama 3 8B full fine-tuning, you can change the model name/path for ano
 
 
 ```bash
-export OCL_ICD_VENDORS=/etc/OpenCL/vendors
 export CCL_PROCESS_LAUNCHER=none
 export TORCH_LLM_ALLREDUCE=1
 
@@ -85,7 +83,6 @@ Example: Llama 3 8B LoRA fine-tuning, you can change the model name/path for ano
 
 
 ```bash
-export OCL_ICD_VENDORS=/etc/OpenCL/vendors
 export CCL_PROCESS_LAUNCHER=none
 export TORCH_LLM_ALLREDUCE=1
 
examples/gpu/llm/fine-tuning/Phi3/README.md

Lines changed: 0 additions & 6 deletions
@@ -21,7 +21,6 @@ wandb login
 **Note**: Not support full finetuning and flash attention on this platform.
 
 ```bash
-export OCL_ICD_VENDORS=/etc/OpenCL/vendors
 export model="microsoft/Phi-3-mini-4k-instruct"
 
 python phi3_ft.py \
@@ -47,7 +46,6 @@ python phi3_ft.py \
 Example: Phi-3 Mini 4k full fine-tuning on single card. The default dataset `financial_phrasebank` is loaded in `phi3_ft.py`.
 
 ```bash
-export OCL_ICD_VENDORS=/etc/OpenCL/vendors
 export TORCH_LLM_ALLREDUCE=1
 
 export model="microsoft/Phi-3-mini-4k-instruct"
@@ -71,7 +69,6 @@ python phi3_ft.py \
 Example: Phi-3 Mini 4k LoRA fine-tuning on single card. The default dataset `financial_phrasebank` is loaded in `phi3_ft.py`.
 
 ```bash
-export OCL_ICD_VENDORS=/etc/OpenCL/vendors
 export TORCH_LLM_ALLREDUCE=1
 
 export model="microsoft/Phi-3-mini-4k-instruct"
@@ -102,7 +99,6 @@ Example: Phi-3 Mini 4k full fine-tuning.
 
 
 ```bash
-export OCL_ICD_VENDORS=/etc/OpenCL/vendors
 export CCL_PROCESS_LAUNCHER=none
 export TORCH_LLM_ALLREDUCE=1
 
@@ -130,7 +126,6 @@ Example: Phi-3 Mini 4k LoRA fine-tuning.
 
 
 ```bash
-export OCL_ICD_VENDORS=/etc/OpenCL/vendors
 export CCL_PROCESS_LAUNCHER=none
 export TORCH_LLM_ALLREDUCE=1
 
@@ -159,7 +154,6 @@ Example: Phi3-Mini 4k LoRA fine-tuning.
 
 
 ```bash
-export OCL_ICD_VENDORS=/etc/OpenCL/vendors
 export CCL_PROCESS_LAUNCHER=none
 export TORCH_LLM_ALLREDUCE=1
 
examples/gpu/llm/fine-tuning/Qwen/README.md

Lines changed: 0 additions & 2 deletions
@@ -29,7 +29,6 @@ Example: Qwen 7B full fine-tuning.
 
 
 ```bash
-export OCL_ICD_VENDORS=/etc/OpenCL/vendors
 export CCL_PROCESS_LAUNCHER=none
 export TORCH_LLM_ALLREDUCE=1
 
@@ -61,7 +60,6 @@ accelerate launch --config_file "fsdp_config.yaml" qwen2_ft.py \
 Example: Qwen 7B LoRA fine-tuning.
 
 ```bash
-export OCL_ICD_VENDORS=/etc/OpenCL/vendors
 export CCL_PROCESS_LAUNCHER=none
 export TORCH_LLM_ALLREDUCE=1
 
examples/gpu/llm/fine-tuning/requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -4,4 +4,4 @@ fire
 tokenizers>=0.13.3
 wandb==0.17.5
 trl==0.9.4
-accelerate==0.28.0
+accelerate==1.1.1
