Commit 70c8ac0

update docs for 2.0.0 release (#1476)
1 parent 5b44996 commit 70c8ac0

15 files changed: +147 -102 lines changed

docker/Dockerfile.prebuilt

Lines changed: 4 additions & 4 deletions
@@ -27,10 +27,10 @@ RUN ${PYTHON} -m pip --no-cache-dir install --upgrade \
  # Some TF tools expect a "python" binary
  RUN ln -s $(which ${PYTHON}) /usr/local/bin/python

- ARG IPEX_VERSION=1.13.100
- ARG PYTORCH_VERSION=1.13.1
- ARG TORCHAUDIO_VERSION=0.13.1
- ARG TORCHVISION_VERSION=0.14.1
+ ARG IPEX_VERSION=2.0.0
+ ARG PYTORCH_VERSION=2.0.0
+ ARG TORCHAUDIO_VERSION=2.0.0
+ ARG TORCHVISION_VERSION=0.15.0
  ARG TORCH_CPU_URL=https://download.pytorch.org/whl/cpu/torch_stable.html

  RUN \

docs/tutorials/api_doc.rst

Lines changed: 6 additions & 0 deletions
@@ -8,6 +8,12 @@ General
  .. autofunction:: optimize
  .. autoclass:: verbose

+ Fast Bert (Experimental)
+ ************************
+
+ .. currentmodule:: intel_extension_for_pytorch
+ .. autofunction:: fast_bert
+
  Graph Optimization
  ******************

docs/tutorials/blogs_publications.md

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
  Blogs & Publications
  ====================

+ * [Accelerate PyTorch\* INT8 Inference with New “X86” Quantization Backend on X86 CPUs](https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-pytorch-int8-inf-with-new-x86-backend.html)
  * [Intel® Deep Learning Boost - Improve Inference Performance of BERT Base Model from Hugging Face for Network Security Technology Guide](https://networkbuilders.intel.com/solutionslibrary/intel-deep-learning-boost-improve-inference-performance-of-bert-base-model-from-hugging-face-for-network-security-technology-guide)
  * [Accelerating PyTorch Transformers with Intel Sapphire Rapids, Part 1, Jan 2023](https://huggingface.co/blog/intel-sapphire-rapids)
  * [Intel® Deep Learning Boost - Improve Inference Performance of BERT Base Model from Hugging Face for Network Security Technology Guide, Jan 2023](https://networkbuilders.intel.com/solutionslibrary/intel-deep-learning-boost-improve-inference-performance-of-bert-base-model-from-hugging-face-for-network-security-technology-guide)

docs/tutorials/examples.md

Lines changed: 23 additions & 18 deletions
@@ -184,6 +184,11 @@ We recommend you take advantage of Intel® Extension for PyTorch\* with [TorchSc
  [//]: # (marker_inf_bert_ts_bf16)
  [//]: # (marker_inf_bert_ts_bf16)

+ ### Fast Bert (*Experimental*)
+
+ [//]: # (marker_inf_bert_fast_bf16)
+ [//]: # (marker_inf_bert_fast_bf16)
+
  ### INT8

  Starting from Intel® Extension for PyTorch\* 1.12.0, quantization feature supports both static and dynamic modes.

@@ -257,6 +262,9 @@ The example code below works for all data types.
  **Command for compilation**

  ```bash
+ $ cd examples/cpu/inference/cpp
+ $ mkdir build
+ $ cd build
  $ cmake -DCMAKE_PREFIX_PATH=<LIBPYTORCH_PATH> ..
  $ make
  ```

@@ -265,34 +273,31 @@ If *Found INTEL_EXT_PT_CPU* is shown as *TRUE*, the extension had been linked in

  ```bash
  $ cmake -DCMAKE_PREFIX_PATH=/workspace/libtorch ..
- -- The C compiler identification is GNU 9.3.0
- -- The CXX compiler identification is GNU 9.3.0
- -- Check for working C compiler: /usr/bin/cc
- -- Check for working C compiler: /usr/bin/cc -- works
+ -- The C compiler identification is GNU 11.2.1
+ -- The CXX compiler identification is GNU 11.2.1
  -- Detecting C compiler ABI info
  -- Detecting C compiler ABI info - done
+ -- Check for working C compiler: /usr/bin/cc - skipped
  -- Detecting C compile features
  -- Detecting C compile features - done
- -- Check for working CXX compiler: /usr/bin/c++
- -- Check for working CXX compiler: /usr/bin/c++ -- works
  -- Detecting CXX compiler ABI info
  -- Detecting CXX compiler ABI info - done
+ -- Check for working CXX compiler: /usr/bin/c++ - skipped
  -- Detecting CXX compile features
  -- Detecting CXX compile features - done
- -- Looking for pthread.h
- -- Looking for pthread.h - found
- -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
- -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
- -- Looking for pthread_create in pthreads
- -- Looking for pthread_create in pthreads - not found
- -- Looking for pthread_create in pthread
- -- Looking for pthread_create in pthread - found
- -- Found Threads: TRUE
+ CMake Warning at /workspace/libtorch/share/cmake/Torch/TorchConfig.cmake:22 (message):
+   static library kineto_LIBRARY-NOTFOUND not found.
+ Call Stack (most recent call first):
+   /workspace/libtorch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
+   /workspace/libtorch/share/cmake/IPEX/IPEXConfig.cmake:84 (FIND_PACKAGE)
+   CMakeLists.txt:4 (find_package)
+
+
  -- Found Torch: /workspace/libtorch/lib/libtorch.so
- -- Found INTEL_EXT_PT_CPU: TRUE
+ -- Found IPEX: /workspace/libtorch/lib/libintel-ext-pt-cpu.so
  -- Configuring done
  -- Generating done
- -- Build files have been written to: /workspace/build
+ -- Build files have been written to: examples/cpu/inference/cpp/build

  $ ldd example-app
  ...

@@ -307,4 +312,4 @@ $ ldd example-app

  ## Model Zoo

- Use cases that had already been optimized by Intel engineers are available at [Model Zoo for Intel® Architecture](https://github.com/IntelAI/models/tree/pytorch-r1.13-models). A bunch of PyTorch use cases for benchmarking are also available on the [GitHub page](https://github.com/IntelAI/models/tree/pytorch-r1.13-models/benchmarks#pytorch-use-cases). You can get performance benefits out-of-box by simply running scipts in the Model Zoo.
+ Use cases that had already been optimized by Intel engineers are available at [Model Zoo for Intel® Architecture](https://github.com/IntelAI/models/tree/pytorch-r2.0-models). A bunch of PyTorch use cases for benchmarking are also available on the [GitHub page](https://github.com/IntelAI/models/tree/pytorch-r2.0-models/benchmarks#pytorch-use-cases). You can get performance benefits out-of-box by simply running scripts in the Model Zoo.

docs/tutorials/features.rst

Lines changed: 30 additions & 0 deletions
@@ -18,6 +18,21 @@ Check the `API Documentation`_ for details of API functions. `Examples <examples
  Here are detailed discussions of specific feature topics, summarized in the rest
  of this document:

+ torch.compile (Experimental, *NEW feature from 2.0.0*)
+ ------------------------------------------------------
+
+ PyTorch* 2.0 introduces a new feature, `torch.compile`, to speed up PyTorch* code. It makes PyTorch code run faster by JIT-compiling PyTorch code into optimized kernels, all while requiring minimal code changes. Intel® Extension for PyTorch\* enables a backend, `ipex`, in `torch.compile` to optimize generation of the graph model.
+
+ Usage is as simple as importing Intel® Extension for PyTorch\* and setting the `backend` parameter of `torch.compile` to `ipex`. While `torch.compile` optimizations apply to the backend, invoking the `ipex.optimize` function is highly recommended as well to apply frontend optimizations.
+
+ .. code-block:: python
+
+    import torch
+    import intel_extension_for_pytorch as ipex
+    ...
+    model = ipex.optimize(model)
+    model = torch.compile(model, backend='ipex')
+
  ISA Dynamic Dispatching
  -----------------------

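For reference, a runnable fleshing-out of the `torch.compile` snippet added above; the resnet50 model and random input are illustrative assumptions, not part of the commit:

```python
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex

model = models.resnet50(weights=None)
model.eval()
data = torch.rand(1, 3, 224, 224)

# Frontend optimizations from the extension, then the ipex backend for torch.compile.
model = ipex.optimize(model)
model = torch.compile(model, backend="ipex")

with torch.no_grad():
    model(data)
```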
@@ -182,3 +197,18 @@ For more detailed information, check `HyperTune <features/hypertune.md>`_.
     :maxdepth: 1

     features/hypertune
+
+ Fast BERT Optimization (Experimental, *NEW feature from 2.0.0*)
+ ---------------------------------------------------------------
+
+ Intel proposed a technique, Tensor Processing Primitives (TPP), a programming abstraction striving for efficient, portable implementation of DL workloads with high productivity. TPPs define a compact, yet versatile set of 2D-tensor operators (or a virtual Tensor ISA), which subsequently can be utilized as building blocks to construct complex operators on high-dimensional tensors.
+
+ Implementation of TPP is integrated into Intel® Extension for PyTorch\*. BERT could benefit from this new technique. An API, `ipex.fast_bert`, is provided for simple usage.
+
+ For more detailed information, check `Fast BERT <features/fast_bert.md>`_.
+
+ .. toctree::
+    :hidden:
+    :maxdepth: 1
+
+    features/fast_bert
docs/tutorials/features/fast_bert.md

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+ Fast BERT (Experimental)
+ ========================
+
+ ### Feature Description
+
+ Intel proposed a technique, Tensor Processing Primitives (TPP), a programming abstraction striving for efficient, portable implementation of DL workloads with high productivity. TPPs define a compact, yet versatile set of 2D-tensor operators (or a virtual Tensor ISA), which subsequently can be utilized as building blocks to construct complex operators on high-dimensional tensors. Detailed contents are available at [*Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning & HPC Workloads*](https://arxiv.org/pdf/2104.05755.pdf).
+
+ Implementation of TPP is integrated into Intel® Extension for PyTorch\*. BERT could benefit from this new technique, for both training and inference.
+
+ ### Prerequisite
+
+ - Transformers 4.6.0 ~ 4.20.0
+
+ ### Usage Example
+
+ An API, `ipex.fast_bert`, is provided for simple usage. Usage of this API follows the pattern of the `ipex.optimize` function. A more detailed description of the API is available at [Fast BERT API doc](../api_doc).
+
+ [//]: # (marker_inf_bert_fast_bf16)
+ [//]: # (marker_inf_bert_fast_bf16)
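The usage-example section above carries only marker comments that pull real code in at documentation build time. As a rough sketch of the pattern it describes, mirroring the `ipex.optimize` calling convention; the Hugging Face model name and the `dtype` keyword are assumptions for illustration:

```python
import torch
from transformers import BertModel  # Transformers 4.6.0 ~ 4.20.0 per the prerequisite above
import intel_extension_for_pytorch as ipex

model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

# A dummy batch of token ids shaped like typical BERT input.
data = torch.randint(model.config.vocab_size, size=(1, 512))

# Apply the TPP-based Fast BERT optimization (dtype keyword assumed, following ipex.optimize).
model = ipex.fast_bert(model, dtype=torch.bfloat16)

with torch.no_grad():
    model(data)
```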

docs/tutorials/features/graph_capture.md

Lines changed: 2 additions & 19 deletions
@@ -7,22 +7,5 @@ This feature automatically applies a combination of TorchScript trace technique

  ### Usage Example

- ```python
- import torch
- import torchvision.models as models
-
- model = models.resnet50(pretrained=True)
- model.eval()
- data = torch.rand(1, 3, 224, 224)
-
- model = model.to(memory_format=torch.channels_last)
- data = data.to(memory_format=torch.channels_last)
-
- #################### code changes ####################
- import intel_extension_for_pytorch as ipex
- model = ipex.optimize(model, graph_mode=True)
- ######################################################
-
- with torch.no_grad():
-     model(data)
- ```
+ [//]: # (marker_feature_graph_capture)
+ [//]: # (marker_feature_graph_capture)

docs/tutorials/features/hypertune.md

Lines changed: 3 additions & 3 deletions
@@ -95,15 +95,15 @@ This is the script as an optimization function.
  'target_val' # optional. Target value of the objective function. Default is -float('inf')
  ```

- Have a look at the [example script](https://github.com/intel/intel-extension-for-pytorch/tree/v1.13.100+cpu/intel_extension_for_pytorch/cpu/hypertune/example/resnet50.py).
+ Have a look at the [example script](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.0+cpu/intel_extension_for_pytorch/cpu/hypertune/example/resnet50.py).

  ## Usage Examples

  **Tuning `ncore_per_instance` for minimum `latency`**

  Suppose we want to tune `ncore_per_instance` for a single instance to minimize latency for resnet50 on a machine with two Intel(R) Xeon(R) Platinum 8180M CPUs. Each socket has 28 physical cores and another 28 logical cores.

- Run the following command with [example.yaml](https://github.com/intel/intel-extension-for-pytorch/tree/v1.13.100+cpu/intel_extension_for_pytorch/cpu/example/example.yaml) and [resnet50.py](https://github.com/intel/intel-extension-for-pytorch/tree/v1.13.100+cpu/intel_extension_for_pytorch/cpu/hypertune/example/resnet50.py):
+ Run the following command with [example.yaml](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.0+cpu/intel_extension_for_pytorch/cpu/example/example.yaml) and [resnet50.py](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.0+cpu/intel_extension_for_pytorch/cpu/hypertune/example/resnet50.py):
  ```
  python -m intel_extension_for_pytorch.cpu.hypertune --conf_file <hypertune_directory>/example/example.yaml <hypertune_directory>/example/resnet50.py
  ```

@@ -115,6 +115,6 @@ latency: 12.339081764221191
  ```
  15 `ncore_per_instance` gave the minimum latency.

- You will also find the tuning history in `<output_dir>/record.csv`. You can take [a sample csv file](https://github.com/intel/intel-extension-for-pytorch/tree/v1.13.100+cpu/intel_extension_for_pytorch/cpu/hypertune/example/record.csv) as a reference.
+ You will also find the tuning history in `<output_dir>/record.csv`. You can take [a sample csv file](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.0+cpu/intel_extension_for_pytorch/cpu/hypertune/example/record.csv) as a reference.

  Hypertune can also optimize multi-objective function. Add as many objectives as you would like to your script.

docs/tutorials/features/int8_recipe_tuning_api.md

Lines changed: 2 additions & 36 deletions
@@ -7,39 +7,5 @@ Users need to provide a prepared model and some parameters required for tuning.

  ### Usage Example

- ```python
- model = torchvision.models.resnet50(pretrained=True)
- model.eval()
- data_loader = torch.utils.data.DataLoader(
-     datasets.ImageFolder(valdir, transforms.Compose([
-         transforms.Resize(256),
-         transforms.CenterCrop(224),
-         transforms.ToTensor(),
-         normalize,
-     ])),
-     batch_size=batch_size, shuffle=False,
-     num_workers=workers, pin_memory=True)
-
- # prepare model, do conv+bn folding, and init model quant_state.
- qconfig = ipex.quantization.default_static_qconfig
- data = torch.randn(1, 3, 224, 224)
- prepared_model = ipex.quantization.prepare(model, qconfig, example_inputs=data, inplace=False)
-
- ######################## recipe tuning with INC ########################
- def eval(prepared_model):
-     # return accuracy value
-     return evaluate(prepared_model, data_loader)
- tuned_model = ipex.quantization.autotune(prepared_model, data_loader, eval, sampling_size=[100],
-                                          accuracy_criterion={'relative': 0.01}, tuning_time=0)
- ########################################################################
-
- # run tuned model
- convert_model = ipex.quantization.convert(tuned_model)
- with torch.no_grad():
-     traced_model = torch.jit.trace(convert_model, data)
-     traced_model = torch.jit.freeze(traced_model)
-     traced_model(data)
-
- # save tuned qconfig file
- tuned_model.save_qconf_summary(qconf_summary = "tuned_conf.json")
- ```
+ [//]: # (marker_feature_int8_autotune)
+ [//]: # (marker_feature_int8_autotune)

docs/tutorials/getting_started.md

Lines changed: 16 additions & 14 deletions
@@ -26,36 +26,38 @@ In general, APIs invocation should follow orders below.

  1. `import intel_extension_for_pytorch as ipex`
  2. Invoke `optimize()` function to apply optimizations.
- 3. For Torchscript, invoke `torch.jit.trace()` and `torch.jit.freeze()`.
+ 3. Convert the imperative model to a graph model.
+    - For TorchScript, invoke `torch.jit.trace()` and `torch.jit.freeze()`.
+    - For TorchDynamo, invoke `torch.compile(model, backend="ipex")`. (*Experimental feature*, FP32 ONLY)

  **Note:** It is highly recommended to `import intel_extension_for_pytorch` right after `import torch`, prior to importing other packages.

  ```python
  import torch
- ####### import ipex ########
+ ############## import ipex ###############
  import intel_extension_for_pytorch as ipex
- ############################
+ ##########################################

  model = Model()
  model.eval()
  data = ...
- dtype=torch.float32 # torch.bfloat16

- ##### ipex.optimize() ######
- model = ipex.optimize(model, dtype=dtype)
- ############################
+ ############## TorchScript ###############
+ model = ipex.optimize(model, dtype=torch.bfloat16)

- ########## FP32 ############
- with torch.no_grad():
- ####### BF16 on CPU ########
- with torch.no_grad(), with torch.cpu.amp.autocast():
- ############################
- ###### Torchscript #######
+ with torch.no_grad(), torch.cpu.amp.autocast():
      model = torch.jit.trace(model, data)
      model = torch.jit.freeze(model)
- ###### Torchscript #######
+     model(data)
+ ##########################################

+ ############## TorchDynamo ###############
+ model = ipex.optimize(model)
+
+ model = torch.compile(model, backend="ipex")
+ with torch.no_grad():
      model(data)
+ ##########################################
  ```

  More examples, including training and usage of low precision data types are available at [Examples](./examples.md).
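As a runnable version of the TorchScript path added above — resnet50 and the random input are stand-ins for the doc's `Model()` and `data = ...` placeholders, not part of the commit — a minimal sketch:

```python
import torch
import torchvision.models as models
############## import ipex ###############
import intel_extension_for_pytorch as ipex
##########################################

# resnet50 stands in for the doc's Model() placeholder.
model = models.resnet50(weights=None)
model.eval()
data = torch.rand(1, 3, 224, 224)

############## TorchScript ###############
model = ipex.optimize(model, dtype=torch.bfloat16)

with torch.no_grad(), torch.cpu.amp.autocast():
    model = torch.jit.trace(model, data)
    model = torch.jit.freeze(model)
    model(data)
##########################################
```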
