 use_distributed_mode_trace (bool): Whether to use aot_autograd to trace the graph. This is enabled when DTensors or distributed tensors are present in the distributed model.
 enable_autocast (bool): Whether to enable autocast. If enabled, use_explicit_typing will be set to True.
 autocast_low_precision_type (Optional[Union[torch.dtype, dtype]]): The precision to reduce to. We currently support torch.float16 and torch.bfloat16. Default is None, which means no low precision is used.
-autocast_excluded_nodes (Collection[str]): The set of regex patterns to match node names that should remain in FP32. Default is [].
+autocast_excluded_nodes (Collection[str]): The set of regex patterns to match user-specified node names that should remain in FP32. Default is [].
 autocast_excluded_ops (Collection[Target]): The set of targets (ATen ops) that should remain in FP32. Default is [].
-autocast_data_max (float): Maximum absolute value for node outputs, nodes with outputs greater than this value will remain in FP32. Default is 512.
-autocast_max_depth_of_reduction (Optional[int]): Maximum depth of reduction allowed in low precision. Nodes with higher reduction depths will remain in FP32. If not provided, infinity will be used. Default is None.
+autocast_max_output_threshold (float): Maximum absolute value for node outputs; nodes with outputs greater than this value will remain in FP32. Default is 512.
+autocast_max_depth_of_reduction (Optional[int]): Maximum depth of reduction allowed in low precision. Nodes with higher reduction depths will remain in FP32. This helps prevent excessive accuracy loss in operations that are particularly sensitive to reduced precision, because deeper reductions can accumulate more rounding error in low-precision formats. If not provided, infinity will be used. Default is None.
 autocast_calibration_dataloader (Optional[torch.utils.data.DataLoader]): The dataloader to use for autocast calibration. Default is None.
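The options above describe independent rules, any one of which keeps a node in FP32. The sketch below shows how they might combine for a single node. It is an illustration only, not Torch-TensorRT's implementation: `NodeInfo` and `keep_in_fp32` are hypothetical names, op targets are plain strings standing in for ATen op objects, and applying each pattern with `re.match` (anchored at the start of the name) is an assumption about the matching semantics.

```python
import re
from dataclasses import dataclass
from typing import Collection, Optional


@dataclass
class NodeInfo:
    """Hypothetical, minimal view of a graph node used only in this sketch."""

    name: str  # FX node name, e.g. "softmax_3"
    target: str  # op identifier; a string stands in for an ATen op here
    max_abs_output: float  # largest |output| seen on calibration data
    reduction_depth: Optional[int] = None  # None for non-reduction ops


def keep_in_fp32(
    node: NodeInfo,
    excluded_nodes: Collection[str],
    excluded_ops: Collection[str],
    max_output_threshold: float = 512.0,
    max_depth_of_reduction: Optional[int] = None,
) -> bool:
    """Return True if any rule says the node should stay in FP32."""
    # autocast_excluded_nodes: regex patterns matched against the node name
    # (re.match is an assumption; the real semantics may differ).
    if any(re.match(pattern, node.name) for pattern in excluded_nodes):
        return True
    # autocast_excluded_ops: exact membership test on the node's target.
    if node.target in excluded_ops:
        return True
    # autocast_max_output_threshold: outputs whose magnitude is too large.
    if node.max_abs_output > max_output_threshold:
        return True
    # autocast_max_depth_of_reduction: None means no limit (infinity).
    if (
        max_depth_of_reduction is not None
        and node.reduction_depth is not None
        and node.reduction_depth > max_depth_of_reduction
    ):
        return True
    return False
```

In this sketch a node for which every rule returns False would be the one cast to `autocast_low_precision_type`; for example, a `"softmax_3"` node is kept in FP32 when `excluded_nodes=[r"softmax"]`, and an add whose outputs reach 600 is kept in FP32 under the default threshold of 512.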
py/torch_tensorrt/dynamo/_settings.py
+5 -5 (5 additions, 5 deletions)
@@ -8,11 +8,11 @@
 from torch_tensorrt.dynamo._defaults import (
     ASSUME_DYNAMIC_SHAPE_SUPPORT,
     AUTOCAST_CALIBRATION_DATALOADER,
-    AUTOCAST_DATA_MAX,
     AUTOCAST_EXCLUDED_NODES,
     AUTOCAST_EXCLUDED_OPS,
     AUTOCAST_LOW_PRECISION_TYPE,
     AUTOCAST_MAX_DEPTH_OF_REDUCTION,
+    AUTOCAST_MAX_OUTPUT_THRESHOLD,
     CACHE_BUILT_ENGINES,
     DISABLE_TF32,
     DLA_GLOBAL_DRAM_SIZE,
@@ -107,10 +107,10 @@ class CompilationSettings:
 use_distributed_mode_trace (bool): Whether to use aot_autograd to trace the graph. This is enabled when DTensors or distributed tensors are present in the distributed model.
 enable_autocast (bool): Whether to enable autocast. If enabled, use_explicit_typing will be set to True.
 autocast_low_precision_type (Optional[Union[torch.dtype, dtype]]): The precision to reduce to. We currently support torch.float16 and torch.bfloat16. Default is None, which means no low precision is used.
-autocast_excluded_nodes (Collection[str]): The set of regex patterns to match node names that should remain in FP32. Default is [].
+autocast_excluded_nodes (Collection[str]): The set of regex patterns to match user-specified node names that should remain in FP32. Default is [].
 autocast_excluded_ops (Collection[Target]): The set of targets (ATen ops) that should remain in FP32. Default is [].
-autocast_data_max (float): Maximum absolute value for node outputs, nodes with outputs greater than this value will remain in FP32. Default is 512.
-autocast_max_depth_of_reduction (Optional[int]): Maximum depth of reduction allowed in low precision. Nodes with higher reduction depths will remain in FP32. If not provided, infinity will be used. Default is None.
+autocast_max_output_threshold (float): Maximum absolute value for node outputs; nodes with outputs greater than this value will remain in FP32. Default is 512.
+autocast_max_depth_of_reduction (Optional[int]): Maximum depth of reduction allowed in low precision. Nodes with higher reduction depths will remain in FP32. This helps prevent excessive accuracy loss in operations that are particularly sensitive to reduced precision, because deeper reductions can accumulate more rounding error in low-precision formats. If not provided, infinity will be used. Default is None.
 autocast_calibration_dataloader (Optional[torch.utils.data.DataLoader]): The dataloader to use for autocast calibration. Default is None.
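To make the documented defaults and conventions concrete, here is a hypothetical dataclass that mirrors the autocast fields of `CompilationSettings` after the rename. It is a sketch, not the real class: `AutocastSettingsSketch` and `effective_max_depth_of_reduction` are invented names, the `False` defaults for `enable_autocast` and `use_explicit_typing` are assumptions, and enforcing "enable_autocast implies use_explicit_typing" in `__post_init__` is one possible way to honor the docstring, not necessarily where the library does it.

```python
import math
from dataclasses import dataclass, field
from typing import Any, Collection, Optional


@dataclass
class AutocastSettingsSketch:
    """Hypothetical stand-in for the autocast fields of CompilationSettings."""

    enable_autocast: bool = False  # assumed default
    use_explicit_typing: bool = False  # assumed default
    autocast_low_precision_type: Optional[Any] = None  # e.g. torch.float16
    autocast_excluded_nodes: Collection[str] = field(default_factory=list)
    autocast_excluded_ops: Collection[Any] = field(default_factory=list)
    autocast_max_output_threshold: float = 512.0  # documented default
    autocast_max_depth_of_reduction: Optional[int] = None  # None = no limit
    autocast_calibration_dataloader: Optional[Any] = None

    def __post_init__(self) -> None:
        # Documented behavior: enabling autocast also enables explicit typing.
        if self.enable_autocast:
            self.use_explicit_typing = True

    @property
    def effective_max_depth_of_reduction(self) -> float:
        # Documented behavior: "If not provided, infinity will be used."
        if self.autocast_max_depth_of_reduction is None:
            return math.inf
        return self.autocast_max_depth_of_reduction
```

Because this sketch only defines the new field name, passing the old `autocast_data_max` keyword raises `TypeError`; whether the real `CompilationSettings` keeps any alias for the old name is not shown in this diff.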
-"""Rule for keeping nodes with high depth of reduction in high precision."""
+"""
+Rule for keeping nodes with high depth of reduction in high precision. This helps prevent excessive accuracy loss in operations that are particularly sensitive to reduced precision, because deeper reductions can accumulate more rounding error in low-precision formats.
+Reduction ops are those that aggregate data across one or more axes, decreasing the dimensionality of the input tensor, such as convolution and GEMM.
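One plausible way to measure "depth of reduction" is the number of products summed into each output element. The sketch below computes it for a GEMM and a PyTorch-style convolution and applies the documented "None means infinity" limit. The function names are hypothetical, and whether Torch-TensorRT counts depth exactly this way is an assumption.

```python
import math
from typing import Optional, Sequence


def gemm_reduction_depth(a_shape: Sequence[int], b_shape: Sequence[int]) -> int:
    """A (..., M, K) @ B (..., K, N): each output element sums K products."""
    k = a_shape[-1]
    if len(b_shape) < 2 or b_shape[-2] != k:
        raise ValueError(f"inner dimensions do not match: {a_shape} @ {b_shape}")
    return k


def conv_reduction_depth(weight_shape: Sequence[int]) -> int:
    """PyTorch conv weights have shape (C_out, C_in // groups, *kernel_size).

    Each output element sums over C_in // groups channels times the kernel
    volume, i.e. the product of every weight dimension after the first.
    """
    return math.prod(weight_shape[1:])


def exceeds_max_depth(depth: int, max_depth: Optional[int] = None) -> bool:
    """Documented convention: max_depth=None means no limit (infinity)."""
    limit = math.inf if max_depth is None else max_depth
    return depth > limit
```

For example, a 7x7 convolution over 3 input channels has depth 3 * 7 * 7 = 147, while a (32, 4096) @ (4096, 1024) GEMM has depth 4096 and would stay in FP32 under a hypothetical limit of 1024.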
 """Trace the intermediate node outputs of a graph module.
+
+Args:
+    gm (torch.fx.GraphModule): The graph module whose intermediate node outputs are traced.
+    calibration_dataloader (torch.utils.data.DataLoader): The dataloader to use for tracing.
+    excluded_ops (Set[torch.fx.node.Target]): The set of node targets (ATen ops, higher-order ops, or Python operators) that should be excluded from the trace, for example `{torch.ops.higher_order.wrap_with_autocast, operator.getitem}`. Default is an empty set.
+
+Returns:
+    Dict[str, torch.Tensor]: A dictionary mapping each node name to its output tensor.
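A natural way to trace intermediate outputs of a `torch.fx.GraphModule` is to subclass `torch.fx.Interpreter` and override `run_node`, which executes one node at a time. The sketch below does this for every calibration batch. It is not the Torch-TensorRT implementation: `_OutputRecorder` and `trace_max_abs_outputs` are hypothetical names, and instead of storing full output tensors it keeps a running maximum of `|output|` per node (as a 0-d tensor), an assumed reduction chosen because that is the quantity the output-threshold rule compares against.

```python
from typing import Any, Dict, Iterable, Set

import torch
import torch.fx


class _OutputRecorder(torch.fx.Interpreter):
    """Hypothetical helper: executes a GraphModule node by node, recording outputs."""

    def __init__(self, gm: torch.fx.GraphModule, excluded_ops: Set[Any]) -> None:
        super().__init__(gm)
        self.excluded_ops = excluded_ops
        # node name -> running max of |output| across all batches (0-d tensor)
        self.max_abs: Dict[str, torch.Tensor] = {}

    def run_node(self, n: torch.fx.Node) -> Any:
        result = super().run_node(n)
        is_op = n.op in ("call_function", "call_method", "call_module")
        if (
            is_op
            and n.target not in self.excluded_ops
            and isinstance(result, torch.Tensor)
            and result.is_floating_point()
            and result.numel() > 0
        ):
            batch_max = result.detach().abs().max()
            prev = self.max_abs.get(n.name)
            self.max_abs[n.name] = (
                batch_max if prev is None else torch.maximum(prev, batch_max)
            )
        return result


def trace_max_abs_outputs(
    gm: torch.fx.GraphModule,
    batches: Iterable[Any],
    excluded_ops: Iterable[Any] = (),
) -> Dict[str, torch.Tensor]:
    """Run each calibration batch through gm; return per-node max |output|."""
    recorder = _OutputRecorder(gm, set(excluded_ops))
    with torch.no_grad():
        for batch in batches:
            # A batch may be a single tensor or a tuple/list of positional inputs.
            args = tuple(batch) if isinstance(batch, (tuple, list)) else (batch,)
            recorder.run(*args)
    return recorder.max_abs
```

Usage: trace a module with `torch.fx.symbolic_trace(module)` and pass a `DataLoader` (or any iterable of batches) as `batches`; placeholder and output nodes are skipped, as are any targets listed in `excluded_ops`.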