Commit 70c8ac0

update docs for 2.0.0 release (#1476)
1 parent 5b44996 commit 70c8ac0

15 files changed: +147 -102 lines changed

docker/Dockerfile.prebuilt

Lines changed: 4 additions & 4 deletions
@@ -27,10 +27,10 @@ RUN ${PYTHON} -m pip --no-cache-dir install --upgrade \
  # Some TF tools expect a "python" binary
  RUN ln -s $(which ${PYTHON}) /usr/local/bin/python

- ARG IPEX_VERSION=1.13.100
- ARG PYTORCH_VERSION=1.13.1
- ARG TORCHAUDIO_VERSION=0.13.1
- ARG TORCHVISION_VERSION=0.14.1
+ ARG IPEX_VERSION=2.0.0
+ ARG PYTORCH_VERSION=2.0.0
+ ARG TORCHAUDIO_VERSION=2.0.0
+ ARG TORCHVISION_VERSION=0.15.0
  ARG TORCH_CPU_URL=https://download.pytorch.org/whl/cpu/torch_stable.html

  RUN \

docs/tutorials/api_doc.rst

Lines changed: 6 additions & 0 deletions
@@ -8,6 +8,12 @@ General
  .. autofunction:: optimize
  .. autoclass:: verbose

+ Fast Bert (Experimental)
+ ************************
+
+ .. currentmodule:: intel_extension_for_pytorch
+ .. autofunction:: fast_bert
+
  Graph Optimization
  ******************

docs/tutorials/blogs_publications.md

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
  Blogs & Publications
  ====================

+ * [Accelerate PyTorch\* INT8 Inference with New “X86” Quantization Backend on X86 CPUs](https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-pytorch-int8-inf-with-new-x86-backend.html)
  * [Intel® Deep Learning Boost - Improve Inference Performance of BERT Base Model from Hugging Face for Network Security Technology Guide](https://networkbuilders.intel.com/solutionslibrary/intel-deep-learning-boost-improve-inference-performance-of-bert-base-model-from-hugging-face-for-network-security-technology-guide)
  * [Accelerating PyTorch Transformers with Intel Sapphire Rapids, Part 1, Jan 2023](https://huggingface.co/blog/intel-sapphire-rapids)
  * [Intel® Deep Learning Boost - Improve Inference Performance of BERT Base Model from Hugging Face for Network Security Technology Guide, Jan 2023](https://networkbuilders.intel.com/solutionslibrary/intel-deep-learning-boost-improve-inference-performance-of-bert-base-model-from-hugging-face-for-network-security-technology-guide)

docs/tutorials/examples.md

Lines changed: 23 additions & 18 deletions
@@ -184,6 +184,11 @@ We recommend you take advantage of Intel® Extension for PyTorch\* with [TorchSc
  [//]: # (marker_inf_bert_ts_bf16)
  [//]: # (marker_inf_bert_ts_bf16)

+ ### Fast Bert (*Experimental*)
+
+ [//]: # (marker_inf_bert_fast_bf16)
+ [//]: # (marker_inf_bert_fast_bf16)
+
  ### INT8

  Starting from Intel® Extension for PyTorch\* 1.12.0, quantization feature supports both static and dynamic modes.

@@ -257,6 +262,9 @@ The example code below works for all data types.
  **Command for compilation**

  ```bash
+ $ cd examples/cpu/inference/cpp
+ $ mkdir build
+ $ cd build
  $ cmake -DCMAKE_PREFIX_PATH=<LIBPYTORCH_PATH> ..
  $ make
  ```

@@ -265,34 +273,31 @@ If *Found INTEL_EXT_PT_CPU* is shown as *TRUE*, the extension had been linked in

  ```bash
  $ cmake -DCMAKE_PREFIX_PATH=/workspace/libtorch ..
- -- The C compiler identification is GNU 9.3.0
- -- The CXX compiler identification is GNU 9.3.0
- -- Check for working C compiler: /usr/bin/cc
- -- Check for working C compiler: /usr/bin/cc -- works
+ -- The C compiler identification is GNU 11.2.1
+ -- The CXX compiler identification is GNU 11.2.1
  -- Detecting C compiler ABI info
  -- Detecting C compiler ABI info - done
+ -- Check for working C compiler: /usr/bin/cc - skipped
  -- Detecting C compile features
  -- Detecting C compile features - done
- -- Check for working CXX compiler: /usr/bin/c++
- -- Check for working CXX compiler: /usr/bin/c++ -- works
  -- Detecting CXX compiler ABI info
  -- Detecting CXX compiler ABI info - done
+ -- Check for working CXX compiler: /usr/bin/c++ - skipped
  -- Detecting CXX compile features
  -- Detecting CXX compile features - done
- -- Looking for pthread.h
- -- Looking for pthread.h - found
- -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
- -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
- -- Looking for pthread_create in pthreads
- -- Looking for pthread_create in pthreads - not found
- -- Looking for pthread_create in pthread
- -- Looking for pthread_create in pthread - found
- -- Found Threads: TRUE
+ CMake Warning at /workspace/libtorch/share/cmake/Torch/TorchConfig.cmake:22 (message):
+   static library kineto_LIBRARY-NOTFOUND not found.
+ Call Stack (most recent call first):
+   /workspace/libtorch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
+   /workspace/libtorch/share/cmake/IPEX/IPEXConfig.cmake:84 (FIND_PACKAGE)
+   CMakeLists.txt:4 (find_package)
+
+
  -- Found Torch: /workspace/libtorch/lib/libtorch.so
- -- Found INTEL_EXT_PT_CPU: TRUE
+ -- Found IPEX: /workspace/libtorch/lib/libintel-ext-pt-cpu.so
  -- Configuring done
  -- Generating done
- -- Build files have been written to: /workspace/build
+ -- Build files have been written to: examples/cpu/inference/cpp/build

  $ ldd example-app
  ...

@@ -307,4 +312,4 @@ $ ldd example-app

  ## Model Zoo

- Use cases that had already been optimized by Intel engineers are available at [Model Zoo for Intel® Architecture](https://github.com/IntelAI/models/tree/pytorch-r1.13-models). A bunch of PyTorch use cases for benchmarking are also available on the [GitHub page](https://github.com/IntelAI/models/tree/pytorch-r1.13-models/benchmarks#pytorch-use-cases). You can get performance benefits out-of-box by simply running scipts in the Model Zoo.
+ Use cases that had already been optimized by Intel engineers are available at [Model Zoo for Intel® Architecture](https://github.com/IntelAI/models/tree/pytorch-r2.0-models). A bunch of PyTorch use cases for benchmarking are also available on the [GitHub page](https://github.com/IntelAI/models/tree/pytorch-r2.0-models/benchmarks#pytorch-use-cases). You can get performance benefits out-of-box by simply running scripts in the Model Zoo.

docs/tutorials/features.rst

Lines changed: 30 additions & 0 deletions
@@ -18,6 +18,21 @@ Check the `API Documentation`_ for details of API functions. `Examples <examples
  Here are detailed discussions of specific feature topics, summarized in the rest
  of this document:

+ torch.compile (Experimental, *NEW feature from 2.0.0*)
+ ------------------------------------------------------
+
+ PyTorch* 2.0 introduces a new feature, `torch.compile`, to speed up PyTorch* code. It makes PyTorch code run faster by JIT-compiling PyTorch code into optimized kernels, all while requiring minimal code changes. Intel® Extension for PyTorch\* enables a backend, `ipex`, in `torch.compile` to optimize generation of the graph model.
+
+ Usage is as simple as importing Intel® Extension for PyTorch\* and setting the `backend` parameter of `torch.compile` to `ipex`. While `torch.compile` optimizations apply to the backend, invoking the `ipex.optimize` function is highly recommended as well to apply frontend optimizations.
+
+ .. code-block:: python
+
+    import torch
+    import intel_extension_for_pytorch as ipex
+    ...
+    model = ipex.optimize(model)
+    model = torch.compile(model, backend='ipex')
+
  ISA Dynamic Dispatching
  -----------------------

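For reference, a runnable fleshing-out of the `torch.compile` snippet added above; the resnet50 model and random input are illustrative assumptions, not part of the commit:

```python
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex

model = models.resnet50(weights=None)
model.eval()
data = torch.rand(1, 3, 224, 224)

# Frontend optimizations from the extension, then the ipex backend for torch.compile.
model = ipex.optimize(model)
model = torch.compile(model, backend="ipex")

with torch.no_grad():
    model(data)
```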
@@ -182,3 +197,18 @@ For more detailed information, check `HyperTune <features/hypertune.md>`_.
     :maxdepth: 1

     features/hypertune
+
+ Fast BERT Optimization (Experimental, *NEW feature from 2.0.0*)
+ ---------------------------------------------------------------
+
+ Intel proposed a technique, Tensor Processing Primitives (TPP), a programming abstraction striving for efficient, portable implementation of DL workloads with high productivity. TPPs define a compact, yet versatile set of 2D-tensor operators (or a virtual Tensor ISA), which subsequently can be utilized as building blocks to construct complex operators on high-dimensional tensors.
+
+ Implementation of TPP is integrated into Intel® Extension for PyTorch\*. BERT could benefit from this new technique. An API, `ipex.fast_bert`, is provided for simple usage.
+
+ For more detailed information, check `Fast BERT <features/fast_bert.md>`_.
+
+ .. toctree::
+    :hidden:
+    :maxdepth: 1
+
+    features/fast_bert
docs/tutorials/features/fast_bert.md

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+ Fast BERT (Experimental)
+ ========================
+
+ ### Feature Description
+
+ Intel proposed a technique, Tensor Processing Primitives (TPP), a programming abstraction striving for efficient, portable implementation of DL workloads with high productivity. TPPs define a compact, yet versatile set of 2D-tensor operators (or a virtual Tensor ISA), which subsequently can be utilized as building blocks to construct complex operators on high-dimensional tensors. Detailed contents are available at [*Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning & HPC Workloads*](https://arxiv.org/pdf/2104.05755.pdf).
+
+ Implementation of TPP is integrated into Intel® Extension for PyTorch\*. BERT could benefit from this new technique, for both training and inference.
+
+ ### Prerequisite
+
+ - Transformers 4.6.0 ~ 4.20.0
+
+ ### Usage Example
+
+ An API, `ipex.fast_bert`, is provided for simple usage. Usage of this API follows the pattern of the `ipex.optimize` function. A more detailed description of the API is available at [Fast BERT API doc](../api_doc).
+
+ [//]: # (marker_inf_bert_fast_bf16)
+ [//]: # (marker_inf_bert_fast_bf16)
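The usage-example section above carries only marker comments that pull real code in at documentation build time. As a rough sketch of the pattern it describes, mirroring the `ipex.optimize` calling convention; the Hugging Face model name and the `dtype` keyword are assumptions for illustration:

```python
import torch
from transformers import BertModel  # Transformers 4.6.0 ~ 4.20.0 per the prerequisite above
import intel_extension_for_pytorch as ipex

model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

# A dummy batch of token ids shaped like typical BERT input.
data = torch.randint(model.config.vocab_size, size=(1, 512))

# Apply the TPP-based Fast BERT optimization (dtype keyword assumed, following ipex.optimize).
model = ipex.fast_bert(model, dtype=torch.bfloat16)

with torch.no_grad():
    model(data)
```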

docs/tutorials/features/graph_capture.md

Lines changed: 2 additions & 19 deletions
@@ -7,22 +7,5 @@ This feature automatically applies a combination of TorchScript trace technique

  ### Usage Example

- ```python
- import torch
- import torchvision.models as models
-
- model = models.resnet50(pretrained=True)
- model.eval()
- data = torch.rand(1, 3, 224, 224)
-
- model = model.to(memory_format=torch.channels_last)
- data = data.to(memory_format=torch.channels_last)
-
- #################### code changes ####################
- import intel_extension_for_pytorch as ipex
- model = ipex.optimize(model, graph_mode=True)
- ######################################################
-
- with torch.no_grad():
-     model(data)
- ```
+ [//]: # (marker_feature_graph_capture)
+ [//]: # (marker_feature_graph_capture)

docs/tutorials/features/hypertune.md

Lines changed: 3 additions & 3 deletions
@@ -95,15 +95,15 @@ This is the script as an optimization function.
  'target_val' # optional. Target value of the objective function. Default is -float('inf')
  ```

- Have a look at the [example script](https://github.com/intel/intel-extension-for-pytorch/tree/v1.13.100+cpu/intel_extension_for_pytorch/cpu/hypertune/example/resnet50.py).
+ Have a look at the [example script](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.0+cpu/intel_extension_for_pytorch/cpu/hypertune/example/resnet50.py).

  ## Usage Examples

  **Tuning `ncore_per_instance` for minimum `latency`**

  Suppose we want to tune `ncore_per_instance` for a single instance to minimize latency for resnet50 on a machine with two Intel(R) Xeon(R) Platinum 8180M CPUs. Each socket has 28 physical cores and another 28 logical cores.

- Run the following command with [example.yaml](https://github.com/intel/intel-extension-for-pytorch/tree/v1.13.100+cpu/intel_extension_for_pytorch/cpu/example/example.yaml) and [resnet50.py](https://github.com/intel/intel-extension-for-pytorch/tree/v1.13.100+cpu/intel_extension_for_pytorch/cpu/hypertune/example/resnet50.py):
+ Run the following command with [example.yaml](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.0+cpu/intel_extension_for_pytorch/cpu/example/example.yaml) and [resnet50.py](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.0+cpu/intel_extension_for_pytorch/cpu/hypertune/example/resnet50.py):
  ```
  python -m intel_extension_for_pytorch.cpu.hypertune --conf_file <hypertune_directory>/example/example.yaml <hypertune_directory>/example/resnet50.py
  ```

@@ -115,6 +115,6 @@ latency: 12.339081764221191
  ```
  15 `ncore_per_instance` gave the minimum latency.

- You will also find the tuning history in `<output_dir>/record.csv`. You can take [a sample csv file](https://github.com/intel/intel-extension-for-pytorch/tree/v1.13.100+cpu/intel_extension_for_pytorch/cpu/hypertune/example/record.csv) as a reference.
+ You will also find the tuning history in `<output_dir>/record.csv`. You can take [a sample csv file](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.0+cpu/intel_extension_for_pytorch/cpu/hypertune/example/record.csv) as a reference.

  Hypertune can also optimize multi-objective function. Add as many objectives as you would like to your script.

docs/tutorials/features/int8_recipe_tuning_api.md

Lines changed: 2 additions & 36 deletions
@@ -7,39 +7,5 @@ Users need to provide a prepared model and some parameters required for tuning.

  ### Usage Example

- ```python
- model = torchvision.models.resnet50(pretrained=True)
- model.eval()
- data_loader = torch.utils.data.DataLoader(
-     datasets.ImageFolder(valdir, transforms.Compose([
-         transforms.Resize(256),
-         transforms.CenterCrop(224),
-         transforms.ToTensor(),
-         normalize,
-     ])),
-     batch_size=batch_size, shuffle=False,
-     num_workers=workers, pin_memory=True)
-
- # prepare model, do conv+bn folding, and init model quant_state.
- qconfig = ipex.quantization.default_static_qconfig
- data = torch.randn(1, 3, 224, 224)
- prepared_model = ipex.quantization.prepare(model, qconfig, example_inputs=data, inplace=False)
-
- ######################## recipe tuning with INC ########################
- def eval(prepared_model):
-     # return accuracy value
-     return evaluate(prepared_model, data_loader)
- tuned_model = ipex.quantization.autotune(prepared_model, data_loader, eval, sampling_size=[100],
-                                          accuracy_criterion={'relative': 0.01}, tuning_time=0)
- ########################################################################
-
- # run tuned model
- convert_model = ipex.quantization.convert(tuned_model)
- with torch.no_grad():
-     traced_model = torch.jit.trace(convert_model, data)
-     traced_model = torch.jit.freeze(traced_model)
-     traced_model(data)
-
- # save tuned qconfig file
- tuned_model.save_qconf_summary(qconf_summary = "tuned_conf.json")
- ```
+ [//]: # (marker_feature_int8_autotune)
+ [//]: # (marker_feature_int8_autotune)

docs/tutorials/getting_started.md

Lines changed: 16 additions & 14 deletions
@@ -26,36 +26,38 @@ In general, APIs invocation should follow orders below.

  1. `import intel_extension_for_pytorch as ipex`
  2. Invoke `optimize()` function to apply optimizations.
- 3. For Torchscript, invoke `torch.jit.trace()` and `torch.jit.freeze()`.
+ 3. Convert the imperative model to a graph model.
+    - For TorchScript, invoke `torch.jit.trace()` and `torch.jit.freeze()`.
+    - For TorchDynamo, invoke `torch.compile(model, backend="ipex")`. (*Experimental feature*, FP32 ONLY)

  **Note:** It is highly recommended to `import intel_extension_for_pytorch` right after `import torch`, prior to importing other packages.

  ```python
  import torch
- ####### import ipex ########
+ ############## import ipex ###############
  import intel_extension_for_pytorch as ipex
- ############################
+ ##########################################

  model = Model()
  model.eval()
  data = ...
- dtype=torch.float32 # torch.bfloat16

- ##### ipex.optimize() ######
- model = ipex.optimize(model, dtype=dtype)
- ############################
+ ############## TorchScript ###############
+ model = ipex.optimize(model, dtype=torch.bfloat16)

- ########## FP32 ############
- with torch.no_grad():
- ####### BF16 on CPU ########
- with torch.no_grad(), with torch.cpu.amp.autocast():
- ############################
- ###### Torchscript #######
+ with torch.no_grad(), torch.cpu.amp.autocast():
      model = torch.jit.trace(model, data)
      model = torch.jit.freeze(model)
- ###### Torchscript #######
+     model(data)
+ ##########################################

+ ############## TorchDynamo ###############
+ model = ipex.optimize(model)
+
+ model = torch.compile(model, backend="ipex")
+ with torch.no_grad():
      model(data)
+ ##########################################
  ```

  More examples, including training and usage of low precision data types are available at [Examples](./examples.md).
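As a runnable version of the TorchScript path added above — resnet50 and the random input are stand-ins for the doc's `Model()` and `data = ...` placeholders, not part of the commit — a minimal sketch:

```python
import torch
import torchvision.models as models
############## import ipex ###############
import intel_extension_for_pytorch as ipex
##########################################

# resnet50 stands in for the doc's Model() placeholder.
model = models.resnet50(weights=None)
model.eval()
data = torch.rand(1, 3, 224, 224)

############## TorchScript ###############
model = ipex.optimize(model, dtype=torch.bfloat16)

with torch.no_grad(), torch.cpu.amp.autocast():
    model = torch.jit.trace(model, data)
    model = torch.jit.freeze(model)
    model(data)
##########################################
```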
