Commit 68cdc43

guangyey and gujinghui authored
[doc] add xpu-memory-management (#2504) (#2560)

add usm python API deprecated warning

Co-authored-by: Jinghui <jinghui.gu@intel.com>
(cherry picked from commit 1344b88)
1 parent c6c1c1d commit 68cdc43

5 files changed: +94 −4 lines changed

csrc/include/xpu/Utils.h

Lines changed: 4 additions & 0 deletions
@@ -30,6 +30,8 @@ namespace dpcpp {
 /// @param strides: strides.
 /// @param device_id: device id.
 /// @returns: Tensor.
+C10_DEPRECATED_MESSAGE(
+    "fromUSM is deprecated. Please use the USM-based DLPack solution instead.")
 IPEX_API at::Tensor fromUSM(
     void* src,
     const at::ScalarType stype,
@@ -40,6 +42,8 @@ IPEX_API at::Tensor fromUSM(
 /// Get a pointer of unified shared memory from a tensor.
 /// @param src: Tensor.
 /// @returns: a pointer of unified shared memory.
+C10_DEPRECATED_MESSAGE(
+    "toUSM is deprecated. Please use the USM-based DLPack solution instead.")
 IPEX_API void* toUSM(const at::Tensor& src);
 
 } // namespace dpcpp

docs/tutorials/technical_details.rst

Lines changed: 21 additions & 2 deletions
@@ -39,12 +39,31 @@ Optimizers are a key part of the training workloads. Intel® Extension for PyTor
 2. SplitSGD for BF16 training, which reduces the memory footprint of the master weights by half. **[CPU]**
 
 
-For more detailed information, check `Optimizer Fusion on CPU <technical_details/optimizer_fusion_cpu.md>`_, `Optimizer Fusion on GPU <technical_details/optimizer_fusion_gpu.md>`_ and `Split SGD <technical_details/split_sgd.html>`_
-
 .. toctree::
    :hidden:
    :maxdepth: 1
 
    technical_details/optimizer_fusion_cpu
    technical_details/optimizer_fusion_gpu
    technical_details/split_sgd
+
+
+.. _xpu-memory-management:
+
+Memory Management [GPU]
+---------------------------------
+
+Intel® Extension for PyTorch* uses a caching memory allocator to speed up memory allocations. This allows fast memory deallocation without any overhead.
+Allocations are associated with a SYCL device. The allocator attempts to find the smallest cached block in the reserved block pool that fits the requested size.
+If it is unable to find an appropriate memory block among the already allocated areas, the allocator allocates a new block of memory.
+
+For more detailed information, check `Memory Management <technical_details/memory_management.html>`_.
+
+.. toctree::
+   :hidden:
+   :maxdepth: 1
+
+   technical_details/memory_management
+
+
+For more detailed information, check `Optimizer Fusion on CPU <technical_details/optimizer_fusion_cpu.md>`_, `Optimizer Fusion on GPU <technical_details/optimizer_fusion_gpu.md>`_, `Split SGD <technical_details/split_sgd.html>`_ and `Memory Management <technical_details/memory_management.html>`_
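
The caching behavior described in the new section can be observed directly from Python. A minimal sketch, not part of this commit, assuming an XPU device is present and that `import intel_extension_for_pytorch` populates the `torch.xpu` namespace referenced by the docs above:

    import torch
    import intel_extension_for_pytorch  # noqa: F401  # assumed to register the torch.xpu.* memory APIs

    # Allocating a tensor makes the caching allocator reserve a block on the device.
    x = torch.empty(64 * 1024 * 1024, dtype=torch.float32, device="xpu")
    print("allocated:", torch.xpu.memory_allocated())  # bytes currently owned by tensors
    print("reserved: ", torch.xpu.memory_reserved())   # bytes held in the allocator's pool

    # Deleting the tensor returns its block to the cache: "allocated" drops,
    # while "reserved" typically stays the same because the block is kept for reuse.
    del x
    print("after del:", torch.xpu.memory_allocated(), torch.xpu.memory_reserved())

    # empty_cache() releases the unused cached blocks back to the device so other
    # GPU applications can use them; it does not free memory still owned by tensors.
    torch.xpu.empty_cache()
    print("after empty_cache:", torch.xpu.memory_reserved())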
docs/tutorials/technical_details/memory_management.rst

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+Memory Management
+=================
+
+You can use :meth:`~torch.xpu.memory_allocated` and
+:meth:`~torch.xpu.max_memory_allocated` to monitor memory occupied by
+tensors, and use :meth:`~torch.xpu.memory_reserved` and
+:meth:`~torch.xpu.max_memory_reserved` to monitor the total amount of memory
+managed by the caching allocator. Calling :meth:`~torch.xpu.empty_cache`
+releases all **unused** cached memory from PyTorch so that it can be used
+by other GPU applications. However, GPU memory occupied by tensors will not
+be freed, so it cannot increase the amount of GPU memory available for PyTorch.
+
+For more advanced users, we offer more comprehensive memory benchmarking via
+:meth:`~torch.xpu.memory_stats`. We also offer the capability to capture a
+complete snapshot of the memory allocator state via
+:meth:`~torch.xpu.memory_snapshot`, which can help you understand the
+underlying allocation patterns produced by your code.
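
To make the monitoring calls in this new document concrete, here is a short, hedged usage sketch (again assuming an XPU device and the `torch.xpu` namespace provided via `intel_extension_for_pytorch`):

    import torch
    import intel_extension_for_pytorch  # noqa: F401  # assumption: exposes the torch.xpu.* memory APIs

    def report(tag: str) -> None:
        # Current and peak usage for tensor-owned memory (allocated)
        # and for the caching allocator's pool (reserved).
        print(
            f"{tag}: allocated={torch.xpu.memory_allocated()}"
            f" max_allocated={torch.xpu.max_memory_allocated()}"
            f" reserved={torch.xpu.memory_reserved()}"
            f" max_reserved={torch.xpu.max_memory_reserved()}"
        )

    report("start")
    a = torch.randn(1024, 1024, device="xpu")
    b = a @ a                  # the matmul may push the peak counters up
    report("after matmul")
    del a, b
    torch.xpu.empty_cache()    # releases unused cached blocks only
    report("after cleanup")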

intel_extension_for_pytorch/xpu/memory.py

Lines changed: 36 additions & 2 deletions
@@ -14,7 +14,8 @@ def empty_cache() -> None:
     .. note::
         :func:`~torch.xpu.empty_cache` doesn't increase the amount of GPU
         memory available for PyTorch. However, it may help reduce fragmentation
-        of GPU memory in certain cases.
+        of GPU memory in certain cases. See :ref:`xpu-memory-management` for
+        more details about GPU memory management.
     """
     intel_extension_for_pytorch._C._emptyCache()
 
@@ -73,6 +74,10 @@ def memory_stats(device: Union[Device, int] = None) -> Dict[str, Any]:
         device (torch.device or int, optional): selected device. Returns
             statistics for the current device, given by :func:`~torch.xpu.current_device`,
             if :attr:`device` is ``None`` (default).
+
+    .. note::
+        See :ref:`xpu-memory-management` for more details about GPU memory
+        management.
     """
     result = []
 
@@ -109,6 +114,10 @@ def reset_accumulated_memory_stats(device: Union[Device, int] = None) -> None:
         device (torch.device or int, optional): selected device. Returns
            statistic for the current device, given by :func:`~torch.xpu.current_device`,
            if :attr:`device` is ``None`` (default).
+
+    .. note::
+        See :ref:`xpu-memory-management` for more details about GPU memory
+        management.
     """
     device = _get_device_index(device, optional=True)
     return intel_extension_for_pytorch._C._resetAccumulatedMemoryStats(device)
@@ -124,6 +133,10 @@ def reset_peak_memory_stats(device: Union[Device, int] = None) -> None:
         device (torch.device or int, optional): selected device. Returns
            statistic for the current device, given by :func:`~torch.xpu.current_device`,
            if :attr:`device` is ``None`` (default).
+
+    .. note::
+        See :ref:`xpu-memory-management` for more details about GPU memory
+        management.
     """
     device = _get_device_index(device, optional=True)
     return intel_extension_for_pytorch._C._resetPeakMemoryStats(device)
@@ -141,7 +154,8 @@ def memory_allocated(device: Union[Device, int] = None) -> int:
     .. note::
         This is likely less than the amount shown in sysman toolkit since some
        unused memory can be held by the caching allocator and some context
-        needs to be created on GPU.
+        needs to be created on GPU. See :ref:`xpu-memory-management` for more
+        details about GPU memory management.
     """
     return memory_stats(device=device)["allocated_bytes.all.current"]
 
@@ -160,6 +174,10 @@ def max_memory_allocated(device: Union[Device, int] = None) -> int:
         device (torch.device or int, optional): selected device. Returns
            statistic for the current device, given by :func:`~torch.xpu.current_device`,
            if :attr:`device` is ``None`` (default).
+
+    .. note::
+        See :ref:`xpu-memory-management` for more details about GPU memory
+        management.
     """
     return memory_stats(device=device)["allocated_bytes.all.peak"]
 
@@ -172,6 +190,10 @@ def memory_reserved(device: Union[Device, int] = None) -> int:
         device (torch.device or int, optional): selected device. Returns
            statistic for the current device, given by :func:`~torch.xpu.current_device`,
            if :attr:`device` is ``None`` (default).
+
+    .. note::
+        See :ref:`xpu-memory-management` for more details about GPU memory
+        management.
     """
     return memory_stats(device=device)["reserved_bytes.all.current"]
 
@@ -190,6 +212,10 @@ def max_memory_reserved(device: Union[Device, int] = None) -> int:
         device (torch.device or int, optional): selected device. Returns
            statistic for the current device, given by :func:`~torch.xpu.current_device`,
            if :attr:`device` is ``None`` (default).
+
+    .. note::
+        See :ref:`xpu-memory-management` for more details about GPU memory
+        management.
     """
     return memory_stats(device=device)["reserved_bytes.all.peak"]
 
@@ -199,6 +225,10 @@ def memory_snapshot():
 
     Interpreting the output of this function requires familiarity with the
     memory allocator internals.
+
+    .. note::
+        See :ref:`xpu-memory-management` for more details about GPU memory
+        management.
     """
     return intel_extension_for_pytorch._C._memorySnapshot()
 
@@ -216,6 +246,10 @@ def memory_summary(device: Union[Device, int] = None, abbreviated: bool = False)
            if :attr:`device` is ``None`` (default).
         abbreviated (bool, optional): whether to return an abbreviated summary
            (default: False).
+
+    .. note::
+        See :ref:`xpu-memory-management` for more details about GPU memory
+        management.
     """
     device = _get_device_index(device, optional=True)
     stats = memory_stats(device=device)
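
The statistics keys returned above (`allocated_bytes.all.current`, `allocated_bytes.all.peak`, `reserved_bytes.all.current`, `reserved_bytes.all.peak`) can also be read directly from `memory_stats`. A rough sketch of how these helpers fit together, assuming the same `torch.xpu` setup as in the earlier examples; the tensor shapes are arbitrary:

    import torch
    import intel_extension_for_pytorch  # noqa: F401  # assumption: provides the torch.xpu memory helpers

    x = torch.randn(4096, 4096, device="xpu")

    stats = torch.xpu.memory_stats()  # flat {key: value} dictionary of counters
    print(stats["allocated_bytes.all.current"])
    print(stats["allocated_bytes.all.peak"])
    print(stats["reserved_bytes.all.current"])

    # Human-readable roll-up of the same counters.
    print(torch.xpu.memory_summary(abbreviated=True))

    # Reset the peak trackers before measuring a specific region of code.
    torch.xpu.reset_peak_memory_stats()
    y = x @ x
    print("peak during matmul:", torch.xpu.max_memory_allocated())

    # Full allocator snapshot; interpreting it requires familiarity with the
    # allocator internals, as the docstring above notes.
    snapshot = torch.xpu.memory_snapshot()
    print(type(snapshot))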

intel_extension_for_pytorch/xpu/utils.py

Lines changed: 16 additions & 0 deletions
@@ -2,6 +2,7 @@
 import torch
 from .. import _C
 from enum import Enum
+import warnings
 from .. import frontend
 import intel_extension_for_pytorch  # noqa
 
@@ -25,12 +26,27 @@ def from_usm(src, dtype, shape, stride = None, device_id: int = -1) -> torch.Ten
            returned tensor is contiguous.
        device_id: the root device id where the USM pointer is allocated. Default: -1,
            if the user is not sure.
+
+    Warning: This is deprecated. Please use torch.from_dlpack instead.
     """
 
+    warnings.warn("from_usm is deprecated. Please use torch.from_dlpack instead.")
     return _C._from_usm(src, dtype, shape, stride, device_id)
 
 
 def to_usm(src: torch.Tensor):
+    """to_usm(src: torch.Tensor) -> PyCapsule
+
+    Converts a torch tensor allocated in USM (Unified Shared Memory) into a ``PyCapsule``,
+    which encapsulates a USM data pointer address.
+
+    Args:
+        src: a torch tensor.
+
+    Warning: This is deprecated. Please use torch.to_dlpack instead.
+    """
+
+    warnings.warn("to_usm is deprecated. Please use torch.to_dlpack instead.")
     return _C._to_usm(src)
 
 
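Since the warnings added here point users at DLPack, a minimal migration sketch may help. It assumes an XPU tensor, the standard `torch.from_dlpack` / `torch.utils.dlpack.to_dlpack` helpers, and that the deprecated functions are reachable as `ipex.xpu.from_usm` / `ipex.xpu.to_usm` (shown only in comments for contrast):

    import torch
    from torch.utils.dlpack import to_dlpack
    import intel_extension_for_pytorch as ipex  # noqa: F401

    x = torch.arange(16, dtype=torch.float32, device="xpu")

    # Deprecated path, which now emits a warning per this commit (illustrative):
    #   capsule = ipex.xpu.to_usm(x)
    #   y = ipex.xpu.from_usm(capsule, torch.float32, (16,))

    # Recommended USM-based DLPack path:
    capsule = to_dlpack(x)           # wrap the tensor's memory in a DLPack capsule
    y = torch.from_dlpack(capsule)   # rebuild a tensor sharing the same memory

    y[0] = 42.0
    print(x[0].item())               # 42.0 -- x and y share the same underlying storage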
