docs/tutorials/features.rst (1 addition, 1 deletion)
@@ -104,7 +104,7 @@ Intel® Extension for PyTorch* also optimizes operators and implements several c
 .. autoclass:: MergedEmbeddingBag
 .. autoclass:: MergedEmbeddingBagWithSGD

-**Auto kernel selection** is a feature that enables users to tune for better performance with GEMM operations. It is provided as parameter –auto_kernel_selection, with boolean value, of the ipex.optimize() function. By default, the GEMM kernel is computed with oneMKL primitives. However, under certain circumstances oneDNN primitives run faster. Users are able to set –auto_kernel_selection to True to run GEMM kernels with oneDNN primitives.” -> "We aims to provide good default performance by leveraging the best of math libraries and enabled weights_prepack, and it has been verified with broad set of models. If you would like to try other alternatives, you can use auto_kernel_selection toggle in ipex.optimize to switch, and you can diesable weights_preack in ipex.optimize if you are concerning the memory footprint more than performance gain. However in majority cases, keeping default is what we recommend.
+**Auto kernel selection** is a feature that enables users to tune for better performance with GEMM operations. It is provided as parameter –auto_kernel_selection, with boolean value, of the ipex.optimize() function. By default, the GEMM kernel is computed with oneMKL primitives. However, under certain circumstances oneDNN primitives run faster. Users are able to set –auto_kernel_selection to True to run GEMM kernels with oneDNN primitives.” -> "We aim to provide good default performance by leveraging the best of math libraries and enabled weights_prepack, and it has been verified with broad set of models. If you would like to try other alternatives, you can use auto_kernel_selection toggle in ipex.optimize to switch, and you can disable weights_preack in ipex.optimize if you are concerning the memory footprint more than performance gain. However in majority cases, keeping default is what we recommend.
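For illustration, here is a minimal sketch of how these two knobs are toggled through `ipex.optimize()`; the tiny model and the dtype are placeholders, and the keyword names assume the current `intel_extension_for_pytorch` API:

```python
import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.ReLU()).eval()

# Default: keep the library-chosen GEMM backend and weights prepacking.
opt_model = ipex.optimize(model, dtype=torch.float32)

# Alternative: prefer oneDNN GEMM kernels, and disable weights prepacking
# when memory footprint matters more than the last bit of performance.
opt_model = ipex.optimize(
    model,
    dtype=torch.float32,
    auto_kernel_selection=True,
    weights_prepack=False,
)
```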
docs/tutorials/features/graph_capture.md (1 addition, 1 deletion)
@@ -3,7 +3,7 @@ Graph Capture (Experimental)
 ### Feature Description

-This feature automatically applies a combination of TorchScript trace technique and TorchDynamo to try to generate a graph model, for providing a good user experience while keep execution fast. Specifically, the process tries to generate a graph with TorchScript trace functionality first. In case of generation failure or incorrect results detected, it changes to TorchDynamo with TorchScript backend. Failure of the graph generation with TorchDynamo triggers a warning message. Meanwhile the generated graph model falls back to the original one. I.e. the inference workload runs in eager mode. Users can take advantage of this feature through a new knob `--graph_mode` of the `ipex.optimize()` function to automatically run into graph mode.
+This feature automatically applies a combination of TorchScript trace technique and TorchDynamo to try to generate a graph model, for providing a good user experience while keeping execution fast. Specifically, the process tries to generate a graph with TorchScript trace functionality first. In case of generation failure or incorrect results detected, it changes to TorchDynamo with TorchScript backend. Failure of the graph generation with TorchDynamo triggers a warning message. Meanwhile the generated graph model falls back to the original one. I.e. the inference workload runs in eager mode. Users can take advantage of this feature through a new knob `--graph_mode` of the `ipex.optimize()` function to automatically run into graph mode.
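As a rough usage sketch (assuming the `graph_mode` and `sample_input` keywords of the current `ipex.optimize()` API; the toy model is a placeholder):

```python
import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU()).eval()
x = torch.randn(8, 64)

# graph_mode=True asks ipex.optimize() to try TorchScript trace first, then
# TorchDynamo with the TorchScript backend, and finally fall back to eager mode.
opt_model = ipex.optimize(model, graph_mode=True, sample_input=x)

with torch.no_grad():
    y = opt_model(x)
```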
docs/tutorials/features/nhwc.md (3 additions, 3 deletions)
@@ -12,7 +12,7 @@ Look at the following image of illustrating NCHW and NHWC when N=1. Actually whe
 PyTorch refers to NCHW as `torch.contiguous_format` (the default memory format) and to NHWC as `torch.channels_last`, which is a new feature as of the 1.5 release.

-TensorFlow uses NHWC as the default memory format because NHWC has a performance advantage over NCHW. On CPU platforms, we propose to optimize Channels Last memory path for ihe following reasons:
+TensorFlow uses NHWC as the default memory format because NHWC has a performance advantage over NCHW. On CPU platforms, we propose to optimize Channels Last memory path for the following reasons:
 * **Performance** - NHWC performance is not as good as blocked memory format (nChw16c), but it is close, and much better performance than NCHW.
 * **User Experience** - Operator coverage of NHWC would be higher than blocked memory format (`to_mkldnn()` method), so user experience is better. To be specific, it is difficult to enable operators that manipulates `dim` on blocked format such as `sum(dim=?)`. You would need to convert tensor from blocked memory format back to NHWC using `to_dense()`, before feeding it into `sum()`. This is naturally supported on Channels Last memory format already.
 * **Upstream** - Will be easier since CPU doesn't hold secret ingredient and both inference and training will be covered.
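For orientation, a minimal PyTorch sketch of opting a model and its input into the Channels Last path (standard `torch` APIs; the conv layer and sizes are arbitrary examples):

```python
import torch

model = torch.nn.Conv2d(3, 16, kernel_size=3).eval()
x = torch.randn(1, 3, 224, 224)

# Convert both the weights and the activations to channels_last (NHWC).
model = model.to(memory_format=torch.channels_last)
x = x.to(memory_format=torch.channels_last)

with torch.no_grad():
    y = model(x)

# Convolution propagates the channels_last layout to its output.
print(y.is_contiguous(memory_format=torch.channels_last))
```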
@@ -74,7 +74,7 @@ Better to explain the concepts here with a diagram, the **dotted lines** indicat
 Before moving on, I feel it is necessary to explain how PyTorch organizes tensors in memory - the **layout**. Here we only focus on **dense** tensors, skip 'coo' layout of **sparse** tensor.

-The question itself can be reinterpreted as for a tensor of size <N, C, H, W>, how does PyTorch accesses the element with index <n, c, h, w> from memory, the answer is **stride**:
+The question itself can be reinterpreted as for a tensor of size <N, C, H, W>, how does PyTorch access the element with index <n, c, h, w> from memory, the answer is **stride**:
 ```
 tensor: <N, C, H, W>
 index:  <n, c, h, w>
 ```
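To make the stride idea concrete, a small illustrative snippet (not part of the original page) comparing the strides of the same <1, 3, 4, 4> tensor in contiguous (NCHW) and channels_last (NHWC) layouts:

```python
import torch

x = torch.randn(1, 3, 4, 4)      # NCHW, default contiguous layout
print(x.stride())                # (48, 16, 4, 1): offset = n*48 + c*16 + h*4 + w*1

y = x.to(memory_format=torch.channels_last)
print(y.shape)                   # sizes stay <N, C, H, W>: torch.Size([1, 3, 4, 4])
print(y.stride())                # (48, 1, 12, 3): channels are now the fastest-moving dimension
```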
@@ -146,7 +146,7 @@ The general guideline has been listed under reference [Writing-memory-format-awa
 ### c. Register oneDNN Kernel on Channels Last

-Registering a oneDNN kernel under Channels Last memory format on CPU is no different from [cuDNN](https://github.com/pytorch/pytorch/pull/23861): Only very few upper level changes are needed, such as accommodate 'contiguous()' to 'contiguous(suggested_memory_format)'. The automatic reorder of oneDNN weight shall been hidden in ideep.
+Registering a oneDNN kernel under Channels Last memory format on CPU is no different from [cuDNN](https://github.com/pytorch/pytorch/pull/23861): Only very few upper level changes are needed, such as accommodate 'contiguous()' to 'contiguous(suggested_memory_format)'. The automatic reorder of oneDNN weight shall have been hidden in ideep.
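The 'contiguous()' vs 'contiguous(suggested_memory_format)' distinction can also be observed from Python; this is only an illustrative sketch using standard `torch` calls (the change the text refers to lives in the C++ kernels):

```python
import torch

x = torch.randn(1, 8, 4, 4).to(memory_format=torch.channels_last)

# Plain contiguous() copies back to the default NCHW layout, dropping NHWC:
print(x.contiguous().is_contiguous(memory_format=torch.channels_last))  # False

# Passing the suggested memory format preserves the channels-last layout:
x_cl = x.contiguous(memory_format=torch.channels_last)
print(x_cl.is_contiguous(memory_format=torch.channels_last))            # True
```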
To ensure a smooth compilation, a script is provided in the GitHub repo. If you would like to compile the binaries from source, it is highly recommended to use this script.