
Commit a7a8039

make explicit typing as default, polish examples and docs
1 parent 3e008c2 commit a7a8039

File tree

4 files changed, +163 -66 lines changed


docsrc/user_guide/mixed_precision.rst

Lines changed: 58 additions & 20 deletions
@@ -32,8 +32,9 @@ Consider the following PyTorch model which explicitly casts intermediate layer t
         return x
 
 
-If we compile the above model using Torch-TensorRT with the following settings, layer profiling logs indicate that all the layers are
-run in FP32. This is because TensorRT picks the kernels for layers which result in the best performance (i.e., weak typing in TensorRT).
+Before TensorRT 10.12, if we compile the above model using Torch-TensorRT with the following settings,
+layer profiling logs indicate that all the layers are run in FP32. This is because older TensorRT versions
+pick the kernels that give the best performance for each layer (i.e., weak typing).
 
 .. code-block:: python
 
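The compile settings that paragraph refers to are outside this hunk's context. As a rough, hedged sketch (not the documentation's exact snippet), the pre-10.12 weak-typing style of invocation looked like this, reusing the ``MyModule``, ``ep``, and ``inputs`` defined earlier in the guide:

    # Sketch only: weak typing lets TensorRT pick FP32 or FP16 kernels per layer
    # purely for performance, ignoring the dtypes written into the model.
    trt_gm = torch_tensorrt.dynamo.compile(
        ep,
        inputs=inputs,
        use_explicit_typing=False,           # weak typing (pre-TensorRT 10.12 behavior)
        enabled_precisions={torch.float16},  # allow FP16 kernels where TensorRT prefers them
    )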
@@ -49,8 +50,10 @@ run in FP32. This is because TensorRT picks the kernels for layers which result
     # Name: __myl_AddResMulSumAdd_myl0_2, LayerType: kgen, Inputs: [ { Name: __mye146_dconst, Dimensions: [30,40], Format/Datatype: Float }, { Name: linear3/addmm_2_constant_0 _ linear3/addmm_2_add_broadcast_to_same_shape_lhs_broadcast_constantFloat, Dimensions: [1,40], Format/Datatype: Float }, { Name: __myln_k_arg__bb1_3, Dimensions: [1,30], Format/Datatype: Float }, { Name: linear2/addmm_1_constant_0 _ linear2/addmm_1_add_broadcast_to_same_shape_lhs_broadcast_constantFloat, Dimensions: [1,30], Format/Datatype: Float }], Outputs: [ { Name: output0, Dimensions: [1,40], Format/Datatype: Float }], TacticName: __myl_AddResMulSumAdd_0xcdd0085ad25f5f45ac5fafb72acbffd6, StreamId: 0, Metadata:
 
 
-In order to respect the types specified by the user in the model (eg: in this case, ``linear2`` layer to run in FP16), users can enable
-the compilation setting ``use_explicit_typing=True``. Compiling with this option results in the following TensorRT logs:
+However, TensorRT 10.12 deprecated weak typing, so we must set ``use_explicit_typing=True`` to enable strong
+typing, which means users must specify the precision of the nodes in the model. In the case above we set the
+``linear2`` layer to run in FP16, so if we compile the model with the following settings, ``linear2`` will run
+in FP16 and the other layers will run in FP32, as shown in the following TensorRT logs:
 
 .. code-block:: python
 
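The strongly typed compile call this paragraph describes is likewise not visible in this hunk; a minimal sketch, assuming the same ``MyModule``, ``ep``, and ``inputs`` as above:

    # Sketch only: strong typing makes the engine honor the dtypes expressed in
    # the model, so the layer the model casts to FP16 (linear2) runs in FP16.
    trt_gm = torch_tensorrt.dynamo.compile(
        ep,
        inputs=inputs,
        use_explicit_typing=True,  # strong typing: user-specified precisions are respected
    )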
@@ -68,32 +71,67 @@ the compilation setting ``use_explicit_typing=True``. Compiling with this option
 Autocast
 ---------------
 
-Weak typing behavior in TensorRT is deprecated. However it is a good way to maximize performance. Therefore, in Torch-TensorRT,
-we want to provide a way to enable weak typing behavior in Torch-TensorRT, which is called `Autocast`.
+Weak typing is deprecated in TensorRT, but mixed precision remains a good way to maximize performance.
+Torch-TensorRT therefore provides a way to get mixed-precision behavior similar to weak typing in older
+TensorRT versions, called `Autocast`.
 
-Torch-TensorRT Autocast intelligently selects nodes to keep in FP32 precision to maintain model accuracy while benefiting from
-reduced precision on the rest of the nodes. Torch-TensorRT Autocast also supports users to specify which nodes to exclude from Autocast,
-considering some nodes might be more sensitive to affecting accuracy. In addition, Torch-TensorRT Autocast can cooperate with PyTorch
-native Autocast, allowing users to use both PyTorch and Torch-TensorRT Autocast in the same model. Torch-TensorRT respects the precision
-of the nodes within PyTorch Autocast.
+Before we dive into Torch-TensorRT Autocast, let's first take a look at PyTorch Autocast. PyTorch Autocast is
+context-based: it affects the precision of the nodes inside its context. For example, in PyTorch we can do the
+following:
 
-To enable Torch-TensorRT Autocast, users need to set both ``enable_autocast=True`` and ``use_explicit_typing=True``. For example,
+.. code-block:: python
+
+    x = self.linear1(x)
+    with torch.autocast(device_type="cuda", enabled=True, dtype=torch.float16):
+        x = self.linear2(x)
+    x = self.linear3(x)
+
+This runs ``linear2`` in FP16 while the other layers remain in FP32. Please refer to the `PyTorch Autocast documentation <https://docs.pytorch.org/docs/stable/amp.html#torch.autocast>`_ for more details.
+
+Unlike PyTorch Autocast, Torch-TensorRT Autocast is rule-based: it intelligently selects which nodes to keep in
+FP32 to maintain model accuracy while benefiting from reduced precision on the rest of the nodes. Torch-TensorRT
+Autocast also lets users specify which nodes to exclude from Autocast, since some nodes may be more sensitive to
+reduced precision. In addition, Torch-TensorRT Autocast can cooperate with PyTorch Autocast, allowing both to be
+used in the same model; Torch-TensorRT Autocast respects the precision of the nodes within a PyTorch Autocast
+context.
+
+To enable Torch-TensorRT Autocast, we need to set both ``enable_autocast=True`` and ``use_explicit_typing=True``.
+On top of these, we can also choose the reduced precision with ``autocast_low_precision_type``, and exclude
+certain nodes/ops from Autocast with ``autocast_excluded_nodes`` or ``autocast_excluded_ops``. For example,
 
 .. code-block:: python
 
+    class MyModule(torch.nn.Module):
+        def __init__(self):
+            super().__init__()
+            self.linear1 = torch.nn.Linear(10, 10)
+            self.linear2 = torch.nn.Linear(10, 30)
+            self.linear3 = torch.nn.Linear(30, 40)
+
+        def forward(self, x):
+            x = self.linear1(x)
+            x = self.linear2(x)
+            x = self.linear3(x)
+            return x
+
     inputs = [torch.randn((1, 10), dtype=torch.float32).cuda()]
     mod = MyModule().eval().cuda()
     ep = torch.export.export(mod, tuple(inputs))
-    trt_gm = torch_tensorrt.dynamo.compile(ep, inputs=inputs, enable_autocast=True, use_explicit_typing=True)
-
+    trt_gm = torch_tensorrt.dynamo.compile(
+        ep,
+        inputs=inputs,
+        enable_autocast=True,
+        use_explicit_typing=True,
+        autocast_low_precision_type=torch.float16,
+        autocast_excluded_nodes={"^linear2$"},
+    )
 
-Users can also specify the precision of the nodes by ``autocast_low_precision_type``, or ``autocast_excluded_nodes`` / ``autocast_excluded_ops``
-to exclude certain nodes/ops from Autocast.
+This model excludes ``linear2`` from Autocast, so ``linear2`` runs in FP32 and the other layers run in FP16.
 
-In summary, there are three ways in Torch-TensorRT to enable mixed precision:
-1. TRT chooses precision (weak typing): ``use_explicit_typing=False + enable_autocast=False``
-2. User specifies precision (strong typing): ``use_explicit_typing=True + enable_autocast=False``
-3. Autocast chooses precision (autocast + strong typing): ``use_explicit_typing=True + enable_autocast=True``
+In summary, there are now two ways in Torch-TensorRT to choose the precision of the nodes:
+1. User specifies precision (strong typing): ``use_explicit_typing=True + enable_autocast=False``
+2. Autocast chooses precision (autocast + strong typing): ``use_explicit_typing=True + enable_autocast=True``
 
 FP32 Accumulation
 -----------------
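The hunk above also mentions op-level exclusions via ``autocast_excluded_ops``, which the new example script in this commit exercises. A short hedged sketch of such an exclusion (the op string is taken from that example; any op not present in a given graph would simply have nothing to exclude):

    # Sketch only: exclude an entire ATen op from Autocast instead of matching
    # node names, so flatten calls keep their original FP32 precision.
    trt_gm = torch_tensorrt.dynamo.compile(
        ep,
        inputs=inputs,
        enable_autocast=True,
        use_explicit_typing=True,
        autocast_low_precision_type=torch.float16,
        autocast_excluded_ops={"torch.ops.aten.flatten.using_ints"},
    )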
Lines changed: 97 additions & 45 deletions
@@ -1,7 +1,21 @@
+"""
+.. _autocast_example:
+
+An example of using Torch-TensorRT Autocast
+================
+
+This example demonstrates how to use Torch-TensorRT Autocast with PyTorch Autocast to compile a mixed precision model.
+"""
+
 import torch
 import torch.nn as nn
 import torch_tensorrt
 
+# %% Mixed Precision Model
+#
+# We define a mixed precision model that consists of a few layers, a ``log`` operation, and an ``abs`` operation.
+# Among them, the ``fc1``, ``log``, and ``abs`` operations are within the PyTorch Autocast context with ``dtype=torch.float16``.
+
 
 class MixedPytorchAutocastModel(nn.Module):
     def __init__(self):
@@ -20,51 +34,89 @@ def __init__(self):
         self.fc1 = nn.Linear(16 * 8 * 8, 10)
 
     def forward(self, x):
-        x = self.conv1(x)
-        x = self.relu1(x)
-        x = self.pool1(x)
-        x = self.conv2(x)
-        x = self.relu2(x)
-        x = self.pool2(x)
-        x = self.flatten(x)
+        out1 = self.conv1(x)
+        out2 = self.relu1(out1)
+        out3 = self.pool1(out2)
+        out4 = self.conv2(out3)
+        out5 = self.relu2(out4)
+        out6 = self.pool2(out5)
+        out7 = self.flatten(out6)
         with torch.autocast(x.device.type, enabled=True, dtype=torch.float16):
-            x = self.fc1(x)
-            out = torch.log(
-                torch.abs(x) + 1
+            out8 = self.fc1(out7)
+            out9 = torch.log(
+                torch.abs(out8) + 1
             )  # log is fp32 due to Pytorch Autocast requirements
-        return out
-
-
-if __name__ == "__main__":
-    model = MixedPytorchAutocastModel().cuda().eval()
-    inputs = (torch.randn((8, 3, 32, 32), dtype=torch.float32, device="cuda"),)
-    ep = torch.export.export(model, inputs)
-    calibration_dataloader = torch.utils.data.DataLoader(
-        torch.utils.data.TensorDataset(*inputs), batch_size=2, shuffle=False
-    )
-
-    with torch_tensorrt.dynamo.Debugger(
-        "graphs",
-        logging_dir=".",
-        engine_builder_monitor=False,
-    ):
-        trt_autocast_mod = torch_tensorrt.compile(
-            ep.module(),
-            arg_inputs=inputs,
-            min_block_size=1,
-            use_python_runtime=True,
-            ##### weak typing #####
-            # use_explicit_typing=False,
-            # enabled_precisions={torch.float16},
-            ##### strong typing + autocast #####
-            use_explicit_typing=True,
-            enable_autocast=True,
-            autocast_low_precision_type=torch.float16,
-            autocast_excluded_nodes={"^conv1$", "relu"},
-            autocast_excluded_ops={"torch.ops.aten.flatten.using_ints"},
-            autocast_max_output_threshold=512,
-            autocast_max_depth_of_reduction=None,
-            autocast_calibration_dataloader=calibration_dataloader,
-        )
+        return x, out1, out2, out3, out4, out5, out6, out7, out8, out9
+
+
+# %%
+# Define the model, inputs, and calibration dataloader for Autocast, then run the original PyTorch model to get the reference outputs.
+
+model = MixedPytorchAutocastModel().cuda().eval()
+inputs = (torch.randn((8, 3, 32, 32), dtype=torch.float32, device="cuda"),)
+ep = torch.export.export(model, inputs)
+calibration_dataloader = torch.utils.data.DataLoader(
+    torch.utils.data.TensorDataset(*inputs), batch_size=2, shuffle=False
+)
+
+pytorch_outs = model(*inputs)
+
+# %% Compile the model with Torch-TensorRT Autocast
+#
+# We compile the model with Torch-TensorRT Autocast by setting ``enable_autocast=True``, ``use_explicit_typing=True``, and
+# ``autocast_low_precision_type=torch.bfloat16``. To illustrate, we exclude the ``conv1`` node, all nodes whose names
+# contain ``relu``, and the ``torch.ops.aten.flatten.using_ints`` ATen op from Autocast. In addition, we also set
+# ``autocast_max_output_threshold``, ``autocast_max_depth_of_reduction``, and ``autocast_calibration_dataloader``. Please refer to
+# the documentation for more details.
+
+trt_autocast_mod = torch_tensorrt.compile(
+    ep.module(),
+    arg_inputs=inputs,
+    min_block_size=1,
+    use_python_runtime=True,
+    use_explicit_typing=True,
+    enable_autocast=True,
+    autocast_low_precision_type=torch.bfloat16,
+    autocast_excluded_nodes={"^conv1$", "relu"},
+    autocast_excluded_ops={"torch.ops.aten.flatten.using_ints"},
+    autocast_max_output_threshold=512,
+    autocast_max_depth_of_reduction=None,
+    autocast_calibration_dataloader=calibration_dataloader,
+)
+
+autocast_outs = trt_autocast_mod(*inputs)
+
+# %% Verify the outputs
+#
+# We verify that both the dtypes and values of the model outputs are correct.
+# As expected, ``fc1`` is in FP16 because of PyTorch Autocast;
+# ``pool1``, ``conv2``, and ``pool2`` are in BF16 because of Torch-TensorRT Autocast;
+# the rest remain in FP32. Note that ``log`` is in FP32 because of PyTorch Autocast requirements.
+
+should_be_fp32 = [
+    autocast_outs[0],
+    autocast_outs[1],
+    autocast_outs[2],
+    autocast_outs[5],
+    autocast_outs[7],
+    autocast_outs[9],
+]
+should_be_fp16 = [
+    autocast_outs[8],
+]
+should_be_bf16 = [autocast_outs[3], autocast_outs[4], autocast_outs[6]]
 
-        autocast_outs = trt_autocast_mod(*inputs)
+assert all(
+    a.dtype == torch.float32 for a in should_be_fp32
+), "Some Autocast outputs are not float32!"
+assert all(
+    a.dtype == torch.float16 for a in should_be_fp16
+), "Some Autocast outputs are not float16!"
+assert all(
+    a.dtype == torch.bfloat16 for a in should_be_bf16
+), "Some Autocast outputs are not bfloat16!"
+for i, (a, w) in enumerate(zip(autocast_outs, pytorch_outs)):
+    assert torch.allclose(
+        a.to(torch.float32), w.to(torch.float32), atol=1e-2, rtol=1e-2
+    ), f"Autocast and Pytorch outputs do not match! autocast_outs[{i}] = {a}, pytorch_outs[{i}] = {w}"
+print("All dtypes and values match!")

py/torch_tensorrt/dynamo/_compiler.py

Lines changed: 7 additions & 0 deletions
@@ -543,6 +543,13 @@ def compile(
             stacklevel=2,
         )
 
+    if kwargs.get("use_explicit_typing", False) == False:
+        warnings.warn(
+            "`use_explicit_typing` is deprecated. This setting will be removed and you should enable autocast instead.",
+            DeprecationWarning,
+            stacklevel=2,
+        )
+
     if "truncate_long_and_double" in kwargs.keys():
         if truncate_double is not _defaults.TRUNCATE_DOUBLE:
             raise ValueError(
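The warning message steers users toward Autocast; a hedged sketch of the migration it suggests, keeping ``use_explicit_typing`` at its new default rather than passing ``False`` (``ep`` and ``inputs`` as in the docs above):

    # Sketch only: instead of relying on weak typing, keep strong typing (now the
    # default) and let Autocast choose which nodes to run in reduced precision.
    trt_gm = torch_tensorrt.dynamo.compile(
        ep,
        inputs=inputs,
        use_explicit_typing=True,  # now the default; see _defaults.py below
        enable_autocast=True,
        autocast_low_precision_type=torch.float16,
    )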

py/torch_tensorrt/dynamo/_defaults.py

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@
 ENGINE_CACHE_DIR = os.path.join(tempfile.gettempdir(), "torch_tensorrt_engine_cache")
 ENGINE_CACHE_SIZE = 5368709120  # 5GB
 CUSTOM_ENGINE_CACHE = None
-USE_EXPLICIT_TYPING = False
+USE_EXPLICIT_TYPING = True
 USE_FP32_ACC = False
 REFIT_IDENTICAL_ENGINE_WEIGHTS = False
 STRIP_ENGINE_WEIGHTS = False
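For context, a hedged sketch of what the flipped default means for a plain compile call with no typing-related arguments:

    # Sketch only: with USE_EXPLICIT_TYPING defaulting to True, this call now builds
    # a strongly typed engine, so layer dtypes follow the model rather than
    # TensorRT's performance-driven kernel selection.
    trt_gm = torch_tensorrt.dynamo.compile(ep, inputs=inputs)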
