
Commit 7768124

update document (#851)
1 parent 500fc79 commit 7768124

File tree: 2 files changed (+9, -3 lines changed)


docs/tutorials/features/runtime_extension.md

Lines changed: 4 additions & 2 deletions
@@ -7,7 +7,7 @@ Intel® Extension for PyTorch\* Runtime Extension provides a couple of PyTorch f
 2. Spawn asynchronous tasks via the Python frontend module `intel_extension_for_pytorch.cpu.runtime.Task`.
 3. Configure core bindings for OpenMP threads via the Python frontend `intel_extension_for_pytorch.cpu.runtime.pin`.

-Please **note**: Intel® Extension for PyTorch\* Runtime extension is still in the **POC** stage. The API is subject to change. More detailed descriptions are available at [API Documentation page](../api_doc.html).
+Please **note**: Intel® Extension for PyTorch\* Runtime extension is still in the **Experimental** stage. The API is subject to change. More detailed descriptions are available at [API Documentation page](../api_doc.html).

 ## Requirements

@@ -17,7 +17,9 @@ Intel® Extension for PyTorch\* Runtime Extension relies on `intel omp` to bind

 ### Example of Multi Stream Module

-Runtime extension supports weight-sharing multi-stream inference for throughput mode on CPU. You just need to convert the original model into multi stream model and run the new multi stream model as normal. The detailed description of parameters to create `MultiStreamModule` is available at [API Documentation page](../api_doc.html)
+Runtime extension supports weight-sharing multi-stream inference for throughput mode on CPU. You just need to convert the original model into a multi-stream model and run the new multi-stream model as normal. The detailed description of the parameters used to create `MultiStreamModule` is available at [API Documentation page](../api_doc.html).
+
+`MultiStreamModule` aims to improve inference performance in throughput mode. We recommend creating a `MultiStreamModule` object with the `num_streams` parameter set to "AUTO", which heuristically decides the number of streams; this usually provides reasonable performance. However, it may still not be optimal in some cases (refer to the section [Performance recipes](#performance-recipes) for details), where manual tuning of the number of streams is needed.

 The `MultiStreamModule` creates a number of streams based on the input parameter `num_streams` and binds cores to the streams based on the input parameter `cpu_pool`. If the number of cores inside `cpu_pool` is divisible by `num_streams`, the cores are allocated equally to each stream. If it is not divisible, with remainder N, one extra core is allocated to each of the first N streams. We suggest setting `num_streams` to a divisor of the number of cores inside `cpu_pool`.
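For readers skimming this commit, below is a minimal usage sketch of the multi-stream flow described in the documentation hunk above. It assumes the `intel_extension_for_pytorch.cpu.runtime` frontend the doc refers to (`CPUPool`, `MultiStreamModule`); the model, input shape, and core IDs are purely illustrative.

```python
import torch
import intel_extension_for_pytorch as ipex

# Illustrative eager-mode model; converting it to TorchScript also avoids
# the GIL warning this commit adds in multi_stream.py.
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU()).eval()
example_input = torch.randn(16, 64)
traced_model = torch.jit.freeze(torch.jit.trace(model, example_input).eval())

# Bind the streams to a set of physical cores (core IDs are illustrative).
cpu_pool = ipex.cpu.runtime.CPUPool(core_ids=list(range(8)))

# "AUTO" lets the runtime choose the number of streams heuristically;
# pass an integer (e.g. num_streams=4) when tuning manually.
multi_stream_model = ipex.cpu.runtime.MultiStreamModule(
    traced_model, num_streams="AUTO", cpu_pool=cpu_pool)

# Run it like a normal module: the batch is split along dim 0 across the
# streams and the per-stream outputs are concatenated back together.
output = multi_stream_model(example_input)
```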

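The core-allocation rule in the last paragraph of the changed section can be illustrated with a small stand-alone sketch; `split_cores` is a hypothetical helper written here only to show the arithmetic, not a function from the extension.

```python
# 14 cores across 4 streams: 14 = 4 * 3 + 2, so the first 2 streams get
# one extra core (4 cores each) and the remaining 2 streams get 3 cores each.
def split_cores(core_ids, num_streams):
    base, extra = divmod(len(core_ids), num_streams)
    groups, start = [], 0
    for stream_id in range(num_streams):
        size = base + (1 if stream_id < extra else 0)
        groups.append(core_ids[start:start + size])
        start += size
    return groups

print(split_cores(list(range(14)), 4))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10], [11, 12, 13]]
```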
intel_extension_for_pytorch/cpu/runtime/multi_stream.py

Lines changed: 5 additions & 1 deletion
@@ -5,6 +5,7 @@
 from .cpupool import CPUPool
 from .task import Task
 import copy
+import warnings

 class MultiStreamModuleHint(object):
     def __init__(self, *args, **kwargs):
@@ -91,6 +92,9 @@ def __init__(self,
                  output_concat_hint: MultiStreamModuleHint = default_multi_stream_module_concat_hint):
         super(MultiStreamModule, self).__init__()
         assert type(cpu_pool) is CPUPool, "Input of cpu_pool must be provided with type of ipex.cpu.runtime.CPUPool"
+        if not isinstance(model, torch.jit.ScriptModule):
+            warnings.warn("Creating MultiStreamModule on an nn.Module. This can be slow due "
+                          "to Python Global Interpreter Lock (GIL). Suggest to use JIT ScriptModule for better performance.")
         self.core_list = cpu_pool.core_ids
         if isinstance(num_streams, str):
             # For str input of num_streams, it must be "auto"
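The warning added in the hunk above exists because an eager-mode `nn.Module` executed from several worker threads keeps re-entering the Python interpreter and serializes on the GIL, while a TorchScript module largely runs outside it. A minimal way to satisfy the `isinstance(model, torch.jit.ScriptModule)` check (the module below is illustrative):

```python
import torch

class Net(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x)

# Scripting (or tracing) the module produces a torch.jit.ScriptModule,
# so the warning introduced above is not emitted when wrapping it.
scripted_model = torch.jit.script(Net().eval())
```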
@@ -215,7 +219,7 @@ def _do_get_input_for_each_stream(self, hint_object, input_object, stream_input_
             self.init_forward_status(input_object[idx_or_key].size(hint_object[idx_or_key]), stream_id)
             # Get the split input for each stream
             # Here we assume split along the outside dim, otherwise memory copy happens and obviously hurt multi stream module's performance.
-            if hint_object[idx_or_key] is 0:
+            if hint_object[idx_or_key] == 0:
                 # Split along dim 0, the slice will not create new tensor
                 stream_input_object[idx_or_key] = input_object[idx_or_key][self.current_split_start_idx:self.current_split_end_idx]
             else:
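On the last hunk: `is` tests object identity rather than numeric equality, so `hint_object[idx_or_key] is 0` only works when CPython happens to reuse the cached small-integer object, and Python 3.8+ flags the pattern with a `SyntaxWarning`. A quick stand-alone illustration (values are arbitrary):

```python
x = 1000
y = int("1000")     # built at runtime, so it is a distinct int object
print(x == y)       # True: value equality, which the split logic needs
print(x is y)       # usually False in CPython: identity, not equality
```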
