docs/tutorials/blogs_publications.md (1 addition, 0 deletions)
@@ -1,6 +1,7 @@
Blogs & Publications
====================

+* [Accelerate PyTorch\* INT8 Inference with New “X86” Quantization Backend on X86 CPUs](https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-pytorch-int8-inf-with-new-x86-backend.html)
* [Intel® Deep Learning Boost - Improve Inference Performance of BERT Base Model from Hugging Face for Network Security Technology Guide](https://networkbuilders.intel.com/solutionslibrary/intel-deep-learning-boost-improve-inference-performance-of-bert-base-model-from-hugging-face-for-network-security-technology-guide)
* [Accelerating PyTorch Transformers with Intel Sapphire Rapids, Part 1, Jan 2023](https://huggingface.co/blog/intel-sapphire-rapids)
* [Intel® Deep Learning Boost - Improve Inference Performance of BERT Base Model from Hugging Face for Network Security Technology Guide, Jan 2023](https://networkbuilders.intel.com/solutionslibrary/intel-deep-learning-boost-improve-inference-performance-of-bert-base-model-from-hugging-face-for-network-security-technology-guide)
-- Found Torch: /workspace/libtorch/lib/libtorch.so
--- Found INTEL_EXT_PT_CPU: TRUE
+-- Found IPEX: /workspace/libtorch/lib/libintel-ext-pt-cpu.so
-- Configuring done
-- Generating done
--- Build files have been written to: /workspace/build
+-- Build files have been written to: examples/cpu/inference/cpp/build

$ ldd example-app
...
@@ -307,4 +312,4 @@ $ ldd example-app

## Model Zoo

-Use cases that had already been optimized by Intel engineers are available at [Model Zoo forIntel® Architecture](https://github.com/IntelAI/models/tree/pytorch-r1.13-models). A bunch of PyTorch use cases for benchmarking are also available on the [GitHub page](https://github.com/IntelAI/models/tree/pytorch-r1.13-models/benchmarks#pytorch-use-cases). You can get performance benefits out-of-box by simply running sciptsin the Model Zoo.
+Use cases that have already been optimized by Intel engineers are available at [Model Zoo for Intel® Architecture](https://github.com/IntelAI/models/tree/pytorch-r2.0-models). A number of PyTorch use cases for benchmarking are also available on the [GitHub page](https://github.com/IntelAI/models/tree/pytorch-r2.0-models/benchmarks#pytorch-use-cases). You can get performance benefits out of the box by simply running scripts in the Model Zoo.
+PyTorch\* 2.0 introduces a new feature, `torch.compile`, to speed up PyTorch\* code. It makes PyTorch code run faster by JIT-compiling it into optimized kernels, all while requiring minimal code changes. Intel® Extension for PyTorch\* enables a backend, `ipex`, in `torch.compile` to optimize generation of the graph model.
+
+Usage is as simple as importing Intel® Extension for PyTorch\* and setting the `backend` parameter of `torch.compile` to `ipex`. While the optimizations with `torch.compile` apply to the backend, invoking the `ipex.optimize` function is also highly recommended to apply optimizations in the frontend.
+
+.. code-block:: python
+
+   import torch
+   import intel_extension_for_pytorch as ipex
+   ...
+   model = ipex.optimize(model)
+   model = torch.compile(model, backend='ipex')
+

ISA Dynamic Dispatching
-----------------------

@@ -182,3 +197,18 @@ For more detailed information, check `HyperTune <features/hypertune.md>`_.
   :maxdepth: 1

   features/hypertune
+
+Fast BERT Optimization (Experimental, *NEW feature from 2.0.0*)
+---------------------------------------------------------------
+
+Intel proposed a technique, Tensor Processing Primitives (TPP), a programming abstraction that strives for efficient, portable implementations of DL workloads with high productivity. TPPs define a compact yet versatile set of 2D-tensor operators (or a virtual Tensor ISA), which can subsequently be utilized as building blocks to construct complex operators on high-dimensional tensors.
+
+The TPP implementation is integrated into Intel® Extension for PyTorch\*. BERT can benefit from this new technique. An API, `ipex.fast_bert`, is provided for simple usage.
+
+For more detailed information, check `Fast BERT <features/fast_bert.md>`_.
+Intel proposed a technique, Tensor Processing Primitives (TPP), a programming abstraction that strives for efficient, portable implementations of DL workloads with high productivity. TPPs define a compact yet versatile set of 2D-tensor operators (or a virtual Tensor ISA), which can subsequently be utilized as building blocks to construct complex operators on high-dimensional tensors. Detailed contents are available in [*Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning & HPC Workloads*](https://arxiv.org/pdf/2104.05755.pdf).
+
+The TPP implementation is integrated into Intel® Extension for PyTorch\*. BERT can benefit from this new technique, for both training and inference.
+
+### Prerequisite
+
+- Transformers 4.6.0 ~ 4.20.0
+
+### Usage Example
+
+An API, `ipex.fast_bert`, is provided for simple usage. Usage of this API follows the pattern of the `ipex.optimize` function. A more detailed description of the API is available in the [Fast BERT API doc](../api_doc).
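Below is a minimal sketch of what such a call might look like for a Hugging Face BERT base model; the model name `bert-base-uncased`, the sequence length, and the `dtype` choice are illustrative assumptions of this sketch, not requirements of the API.

```python
import torch
from transformers import BertModel

import intel_extension_for_pytorch as ipex

# Load a stock BERT base model from Hugging Face Transformers (see Prerequisite above).
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

# Dummy input ids shaped like the data the model will be run with (illustrative shape).
vocab_size = model.config.vocab_size
data = torch.randint(vocab_size, size=[1, 384])

# Apply the Fast BERT (TPP) optimization; the dtype argument follows the
# ipex.optimize pattern and is an illustrative choice here.
model = ipex.fast_bert(model, dtype=torch.bfloat16)

with torch.no_grad():
    model(data)
```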
docs/tutorials/features/hypertune.md (3 additions, 3 deletions)
@@ -95,15 +95,15 @@ This is the script as an optimization function.
    'target_val' # optional. Target value of the objective function. Default is -float('inf')
```

-Have a look at the [example script](https://github.com/intel/intel-extension-for-pytorch/tree/v1.13.100+cpu/intel_extension_for_pytorch/cpu/hypertune/example/resnet50.py).
+Have a look at the [example script](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.0+cpu/intel_extension_for_pytorch/cpu/hypertune/example/resnet50.py).

## Usage Examples

**Tuning `ncore_per_instance` for minimum `latency`**

Suppose we want to tune `ncore_per_instance` for a single instance to minimize latency for resnet50 on a machine with two Intel(R) Xeon(R) Platinum 8180M CPUs. Each socket has 28 physical cores and another 28 logical cores.

-Run the following command with [example.yaml](https://github.com/intel/intel-extension-for-pytorch/tree/v1.13.100+cpu/intel_extension_for_pytorch/cpu/example/example.yaml) and [resnet50.py](https://github.com/intel/intel-extension-for-pytorch/tree/v1.13.100+cpu/intel_extension_for_pytorch/cpu/hypertune/example/resnet50.py):
+Run the following command with [example.yaml](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.0+cpu/intel_extension_for_pytorch/cpu/example/example.yaml) and [resnet50.py](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.0+cpu/intel_extension_for_pytorch/cpu/hypertune/example/resnet50.py):

-You will also find the tuning history in `<output_dir>/record.csv`. You can take [a sample csv file](https://github.com/intel/intel-extension-for-pytorch/tree/v1.13.100+cpu/intel_extension_for_pytorch/cpu/hypertune/example/record.csv) as a reference.
+You will also find the tuning history in `<output_dir>/record.csv`. You can take [a sample csv file](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.0+cpu/intel_extension_for_pytorch/cpu/hypertune/example/record.csv) as a reference.

Hypertune can also optimize multi-objective functions. Add as many objectives as you would like to your script.
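For a concrete sense of what the latency objective above measures, here is a minimal, illustrative resnet50 latency measurement in plain PyTorch; the batch size, iteration counts, and the trailing `print` are assumptions of this sketch, and the exact convention by which Hypertune consumes the objective value follows the linked `resnet50.py` example.

```python
import time

import torch
import torchvision.models as models

# Build resnet50 and a dummy input; the weights and input shape are illustrative choices.
model = models.resnet50(weights=None)
model.eval()
data = torch.rand(1, 3, 224, 224)

with torch.no_grad():
    # Warm-up iterations so one-time costs do not skew the measurement.
    for _ in range(10):
        model(data)
    # Average latency over a fixed number of timed iterations.
    start = time.time()
    for _ in range(30):
        model(data)
    latency = (time.time() - start) / 30

print(latency)  # the objective to minimize (seconds per iteration)
```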