Commit 407fd71

[doc] sync docs to release branch (#1645)
* Add api_doc.rst, features.rst, optimizer_fusion and installation
* remove DDP, hvd, DLpack from features.rst
1 parent 31875e7 commit 407fd71

File tree

4 files changed: +322 additions, -0 deletions

docs/tutorials/api_doc.rst

Lines changed: 108 additions & 0 deletions
API Documentation
#################

General
*******

.. currentmodule:: intel_extension_for_pytorch
.. autofunction:: optimize
.. currentmodule:: intel_extension_for_pytorch.xpu
.. StreamContext
.. can_device_access_peer
.. current_blas_handle
.. autofunction:: current_device
.. autofunction:: current_stream
.. default_stream
.. autoclass:: device
.. autofunction:: device_count
.. autoclass:: device_of
.. autofunction:: getDeviceIdListForCard
.. autofunction:: get_device_name
.. autofunction:: get_device_properties
.. get_gencode_flags
.. get_sync_debug_mode
.. autofunction:: init
.. ipc_collect
.. autofunction:: is_available
.. autofunction:: is_initialized
.. memory_usage
.. autofunction:: set_device
.. set_stream
.. autofunction:: stream
.. autofunction:: synchronize


Random Number Generator
***********************

.. currentmodule:: intel_extension_for_pytorch.xpu
.. autofunction:: get_rng_state
.. autofunction:: get_rng_state_all
.. autofunction:: set_rng_state
.. autofunction:: set_rng_state_all
.. autofunction:: manual_seed
.. autofunction:: manual_seed_all
.. autofunction:: seed
.. autofunction:: seed_all
.. autofunction:: initial_seed


Streams and events
******************

.. currentmodule:: intel_extension_for_pytorch.xpu
.. autoclass:: Stream
   :members:
.. ExternalStream
.. autoclass:: Event
   :members:

Memory management
*****************

.. currentmodule:: intel_extension_for_pytorch.xpu
.. autofunction:: empty_cache
.. list_gpu_processes
.. mem_get_info
.. autofunction:: memory_stats
.. autofunction:: memory_summary
.. autofunction:: memory_snapshot
.. autofunction:: memory_allocated
.. autofunction:: max_memory_allocated
.. reset_max_memory_allocated
.. autofunction:: memory_reserved
.. autofunction:: max_memory_reserved
.. set_per_process_memory_fraction
.. memory_cached
.. max_memory_cached
.. reset_max_memory_cached
.. autofunction:: reset_peak_memory_stats
.. caching_allocator_alloc
.. caching_allocator_delete


.. autofunction:: memory_stats_as_nested_dict
.. autofunction:: reset_accumulated_memory_stats

Other
*****

.. currentmodule:: intel_extension_for_pytorch.xpu
.. autofunction:: get_fp32_math_mode
.. autofunction:: set_fp32_math_mode

.. .. automodule:: intel_extension_for_pytorch.quantization
..    :members:

C++ API
*******

.. doxygenenum:: xpu::FP32_MATH_MODE

.. doxygenfunction:: xpu::set_fp32_math_mode

.. doxygenfunction:: xpu::get_queue_from_stream
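A note on tooling: the `autofunction`/`autoclass` directives above come from Sphinx's `sphinx.ext.autodoc` extension, while the `doxygenenum`/`doxygenfunction` directives require the Breathe bridge to a prior Doxygen XML build of the C++ sources. A minimal `conf.py` sketch follows; the project name and XML path are illustrative assumptions, not taken from this repository:

```python
# conf.py (sketch) -- Sphinx configuration enabling the directives used above.
extensions = [
    "sphinx.ext.autodoc",   # provides .. autofunction:: / .. autoclass::
    "breathe",              # provides .. doxygenenum:: / .. doxygenfunction::
]

# Breathe reads the XML output of a Doxygen run over the C++ sources.
# Both the project name "ipex" and the path are assumed examples.
breathe_projects = {"ipex": "../build/doxygen/xml"}
breathe_default_project = "ipex"
```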

docs/tutorials/features.rst

Lines changed: 90 additions & 0 deletions
Features
========

Ease-of-use Python API
----------------------

Intel® Extension for PyTorch\* provides simple frontend Python APIs and utilities to get performance optimizations such as operator optimization.

Check the `API Documentation <api_doc.html>`_ for details of API functions and `Examples <examples.md>`_ for helpful usage tips.

DPC++ Extension
---------------

Intel® Extension for PyTorch\* provides C++ APIs to get the DPC++ queue and configure the floating-point math mode.

Check the `API Documentation`_ for the details of API functions. `DPC++ Extension <features/DPC++_Extension.md>`_ describes how to write customized DPC++ kernels with a practical example and build them with setuptools and CMake.

.. toctree::
   :hidden:
   :maxdepth: 1

   features/DPC++_Extension

Here are detailed discussions of specific feature topics, summarized in the rest of this document:

Channels Last
-------------

Compared with the default NCHW memory format, using the channels_last (NHWC) memory format can further accelerate convolutional neural networks. In Intel® Extension for PyTorch\*, the NHWC memory format has been enabled for most key GPU operators.

For more detailed information, check `Channels Last <features/nhwc.md>`_.

.. toctree::
   :hidden:
   :maxdepth: 1

   features/nhwc

Auto Mixed Precision (AMP)
--------------------------

Support for Auto Mixed Precision (AMP) with BFloat16 and Float16 optimization of operators has been enabled in Intel® Extension for PyTorch\*. BFloat16 is the default low-precision floating-point data type when AMP is enabled. We suggest using AMP to accelerate convolutional and matmul-based neural networks.

For more detailed information, check `Auto Mixed Precision (AMP) <features/amp.md>`_.

.. toctree::
   :hidden:
   :maxdepth: 1

   features/amp

Advanced Configuration
----------------------

The default settings for Intel® Extension for PyTorch\* are sufficient for most use cases. However, if you want to customize Intel® Extension for PyTorch\*, advanced configuration is available at build time and runtime.

For more detailed information, check `Advanced Configuration <features/advanced_configuration.md>`_.

.. toctree::
   :hidden:
   :maxdepth: 1

   features/advanced_configuration

Optimizer Optimization
----------------------

Optimizers are a key part of training workloads. Intel® Extension for PyTorch\* supports operator fusion for computation in the optimizers.

For more detailed information, check `Optimizer Fusion <features/optimizer_fusion.md>`_.

.. toctree::
   :hidden:
   :maxdepth: 1

   features/optimizer_fusion

Simple Trace Tool
-----------------

Simple Trace is a built-in debugging tool that lets you control printing out the call stack for a piece of code. Once enabled, it automatically prints verbose messages about called operators in a stack format, with indentation to distinguish the context.

For more detailed information, check `Simple Trace Tool <features/simple_trace.md>`_.

.. toctree::
   :hidden:
   :maxdepth: 1

   features/simple_trace
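To make the NCHW-vs-NHWC distinction above concrete, here is a plain-Python sketch of how the same logical `(n, c, h, w)` element maps to a different linear storage offset under each layout (the helper names are illustrative, not part of the extension's API); the point is that in NHWC the channels of one pixel sit next to each other in memory, which is what convolution kernels exploit:

```python
def nchw_offset(n, c, h, w, C, H, W):
    # Default contiguous (NCHW) layout: w varies fastest, then h, c, n.
    return ((n * C + c) * H + h) * W + w

def nhwc_offset(n, c, h, w, C, H, W):
    # channels_last (NHWC) layout: c varies fastest, then w, h, n.
    return ((n * H + h) * W + w) * C + c
```

For a 3-channel image, `nhwc_offset(0, 1, 0, 0, 3, 4, 5)` is 1 (channel neighbors are adjacent), while `nchw_offset(0, 1, 0, 0, 3, 4, 5)` is 20 (channel neighbors are a whole H*W plane apart).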
docs/tutorials/features/optimizer_fusion.md

Lines changed: 40 additions & 0 deletions
Optimizer Fusion
================

## Introduction

As with TorchScript, operation fusion reduces the number of operators that will be executed and reduces overhead time. This methodology is also applied in Intel® Extension for PyTorch\* optimizer optimization. SGD and AdamW fusion for both FP32 and BF16 are supported at the current stage.

Let's examine the code in [SGD update](https://pytorch.org/docs/stable/generated/torch.optim.SGD.html?highlight=sgd#torch.optim.SGD) as an example.

```python
# original version
if weight_decay != 0:
    grad = grad.add(param, alpha=weight_decay)
if momentum != 0:
    buf = momentum_buffer_list[i]
    if buf is None:
        buf = torch.clone(grad).detach()
        momentum_buffer_list[i] = buf
    else:
        buf.mul_(momentum).add_(grad, alpha=1 - dampening)
    if nesterov:
        grad = grad.add(buf, alpha=momentum)
    else:
        grad = buf

param.add_(grad, alpha=-lr)
```

## Operation Fusion

One problem with the native implementation above is that it accesses the storage of `grad`, `param`, and `buf` several times. For large topologies, `grad` and `param` might not fit in the cache, so when the storage of `grad` is accessed again while executing the remaining clauses, the processor must read the data out of slow memory again instead of the much faster cache. This memory-bound bottleneck prevents good performance.

Operation fusion is a way to solve this problem. The clauses in the pseudo-code above are all element-wise operations, so we can fuse them into a single operation, as in the pseudo-code below.

```python
# fused version
sgd_fused_step(param, grad, buf, ...(other args))
```

After fusion, the single operation `sgd_fused_step` provides equivalent functionality but much better performance compared with the original version of [SGD update](https://pytorch.org/docs/stable/generated/torch.optim.SGD.html?highlight=sgd#torch.optim.SGD).
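The memory-access argument above can be illustrated with a plain-Python sketch. The functions below are hypothetical toy models (lists stand in for tensor storage; only the `weight_decay != 0`, `momentum != 0`, non-Nesterov path is covered), not the extension's actual fused kernels:

```python
def sgd_step(param, grad, buf, lr, weight_decay, momentum, dampening):
    # Unfused: three separate passes over the storage, so each element
    # of grad/param/buf is read from memory several times.
    grad = [g + weight_decay * p for g, p in zip(grad, param)]   # pass 1
    for i in range(len(buf)):                                    # pass 2
        buf[i] = momentum * buf[i] + (1.0 - dampening) * grad[i]
    return [p - lr * b for p, b in zip(param, buf)]              # pass 3

def sgd_fused_step(param, grad, buf, lr, weight_decay, momentum, dampening):
    # Fused: one pass; each element is loaded once, fully updated,
    # and written back while still hot in cache/registers.
    out = []
    for i in range(len(param)):
        g = grad[i] + weight_decay * param[i]
        buf[i] = momentum * buf[i] + (1.0 - dampening) * g
        out.append(param[i] - lr * buf[i])
    return out
```

Both functions compute the same update; the fused variant simply touches each element's storage once instead of three times, which is the entire source of the speedup on memory-bound optimizers.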

docs/tutorials/installation.md

Lines changed: 84 additions & 0 deletions
# Build and Install from Source Code

This is a guide to build an Intel® Extension for PyTorch* PyPI package from source and install it on Linux.

## Prepare

### Hardware Requirement

Verified Hardware Platforms:
- Intel® Data Center GPU Flex Series 170

### Software Requirements

- Ubuntu 20.04 (64-bit)
- Intel GPU Drivers
  - Intel® Data Center GPU Flex Series [419.40](https://dgpu-docs.intel.com/releases/stable_419_40_20220914.html)
- Intel® oneAPI Base Toolkit 2022.3
- Python 3.7-3.10

### Install Intel GPU Driver

|Release|OS|Intel GPU|Install Intel GPU Driver|
|-|-|-|-|
|v1.0.0|Ubuntu 20.04|Intel® Data Center GPU Flex Series|Refer to the [Installation Guides](https://dgpu-docs.intel.com/installation-guides/ubuntu/ubuntu-focal-dc.html) for the latest driver installation. If installing the verified Intel® Data Center GPU Flex Series driver [419.40](https://dgpu-docs.intel.com/releases/stable_419_40_20220914.html), append the specific version after each component, such as `sudo apt-get install intel-opencl-icd=22.28.23726.1+i419~u20.04`.|

### Install oneAPI Base Toolkit

Please refer to [Install oneAPI Base Toolkit Packages](https://www.intel.com/content/www/us/en/developer/tools/oneapi/toolkits.html#base-kit).

The following components of Intel® oneAPI Base Toolkit need to be installed:
- Intel® oneAPI DPC++ Compiler
- Intel® oneAPI Math Kernel Library (oneMKL)

The default installation location is `/opt/intel/oneapi` for the root account and `${HOME}/intel/oneapi` for other accounts.

### Configure the AOT

Please refer to the [AOT documentation](./AOT.md) for how to configure AOT (ahead-of-time compilation).

### Build and Install from Source Code

Make sure PyTorch is installed so that the extension works properly. For each PyTorch release, there is a corresponding release of the extension. Here are the PyTorch versions that we support and the mapping relationship:

|PyTorch Version|Intel® Extension for PyTorch* Version|
|--|--|
|[v1.10.\*](https://github.com/pytorch/pytorch/tree/v1.10.0 "v1.10.0")|[v1.10.\*](https://github.com/intel/intel-extension-for-pytorch/tree/v1.10.200)|

Build and install PyTorch:

```bash
$ git clone https://github.com/pytorch/pytorch.git
$ cd pytorch
# check out a specific release branch if needed
$ git checkout ${PYTORCH_RELEASE_BRANCH_NAME}
# apply the git patch to the PyTorch code, e.g., apply the patch for PyTorch v1.10
$ git apply ${intel_extension_for_pytorch_directory}/torch_patches/{xpu-1.10}.patch
$ git submodule update --init --recursive
$ pip install -r requirements.txt
# configure the MKL env to enable MKL features
$ source ${oneAPI_HOME}/mkl/latest/env/vars.sh
# build the PyPI package and install it locally
$ python setup.py bdist_wheel
$ pip install dist/*.whl
```

Build and install Intel® Extension for PyTorch*:

```bash
$ git clone -b xpu-master https://github.com/intel/intel-extension-for-pytorch.git
$ cd intel-extension-for-pytorch
# check out a specific release branch if needed
$ git checkout ${IPEX_RELEASE_BRANCH_NAME}
$ git submodule update --init --recursive
$ pip install -r requirements.txt
# configure the DPC++ compiler env
$ source ${oneAPI_HOME}/compiler/latest/env/vars.sh
# configure the MKL env to enable MKL features
$ source ${oneAPI_HOME}/mkl/latest/env/vars.sh
# build the PyPI package and install it locally
# (set USE_AOT_DEVLIST beforehand if AOT is configured)
$ USE_AOT_DEVLIST=${USE_AOT_DEVLIST} python setup.py bdist_wheel
$ pip install dist/*.whl
```
