Align _choose_qparams_affine with _choose_scale_float8 behavior #3447
base: main
Conversation
Changes the keepdim default from False to True in _choose_qparams_affine to match _choose_scale_float8 behavior. This ensures scale/zero_point maintain the same rank as the input tensor, making downstream handling more consistent. Fixes pytorch#3324
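To illustrate the rank difference the description refers to, here is a minimal standalone sketch in plain PyTorch (not torchao's actual implementation; the per-row reduction is just an assumed example):

```python
import torch

# Per-row quantization of a (4, 8) tensor, i.e. one scale per row.
x = torch.randn(4, 8)

scale_old = x.abs().amax(dim=-1, keepdim=False)  # shape (4,)  - rank dropped
scale_new = x.abs().amax(dim=-1, keepdim=True)   # shape (4, 1) - same rank as x

# With keepdim=True the scale broadcasts against x without extra reshaping,
# which is the "more consistent downstream handling" the PR description means.
q = torch.clamp(torch.round(x / scale_new * 127), -128, 127).to(torch.int8)
print(scale_old.shape, scale_new.shape, q.shape)
```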
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3447
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.
❌ 13 New Failures: as of commit bdf1210 with merge base aa21b80, the following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
# Reshape scale and zero_point to match expected output shape
# This aligns with _choose_scale_float8 behavior
if keepdim:
    output_shape = [
        original_input_size[i] // block_size[i] for i in range(len(block_size))
    ]
    scale = scale.reshape(output_shape)
    zero_point = zero_point.reshape(output_shape)
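For concreteness, a shape-only sketch of what this hunk computes, using the input (10, 20) / block_size (10, 4) example from the review thread below:

```python
# Shape arithmetic only; values taken from the example discussed below.
original_input_size = (10, 20)
block_size = (10, 4)

output_shape = [
    original_input_size[i] // block_size[i] for i in range(len(block_size))
]
print(output_shape)  # [1, 5] -> one scale / zero_point per (10, 4) block
```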
is this needed? keepdim=True is already used in int8_tensor.py
Edit: oh OK, I think it is needed because of the reshapes we are doing before:
ao/torchao/quantization/quant_primitives.py
Line 1554 in aa21b80
input = input.view(shape_for_reduction)
Yes, this reshape is needed! int8_tensor.py passes scale/zero_point directly to quantize_affine, which internally reshapes the scale at line 461, so it doesn't need the output to be pre-reshaped. But the thing is, IntxUnpackedToInt8Tensor.__init__ (lines 131-136) asserts that scale.shape must exactly match tuple(n_blocks) before passing to quantize_affine:
assert scale.shape == tuple(n_blocks), (
    f"Expected scale to have shape {n_blocks} (inferred from block_size={block_size}), but got {scale.shape}"
)
basically without this reshape:
- keepdim=True gives scale shape like (1, 5, 1) for block_size (10, 4) on input (10, 20)
- But IntxUnpackedToInt8Tensor expects (1, 5)
- The assertion would fail!
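A standalone repro of those shapes in plain PyTorch (the internal view torchao builds may differ; this only reproduces the shapes quoted above):

```python
import torch

x = torch.randn(10, 20)
block_size = (10, 4)

# Group into (10, 4) blocks and reduce over the block dims with keepdim=True.
blocks = x.view(10, 5, 4)
scale_keepdim = blocks.abs().amax(dim=(0, 2), keepdim=True)
print(scale_keepdim.shape)  # torch.Size([1, 5, 1])

# IntxUnpackedToInt8Tensor expects scale.shape == n_blocks:
n_blocks = tuple(x.shape[i] // block_size[i] for i in range(len(block_size)))
print(n_blocks)  # (1, 5)

# The reshape added in this PR bridges the two shapes:
scale = scale_keepdim.reshape(n_blocks)
print(scale.shape)  # torch.Size([1, 5])
```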
yeah that's correct, I'll approve the CI to run all the tests to see, especially this one:
ao/test/quantization/test_quant_primitives.py
Line 569 in aa21b80
block_size = (3, 3, 2, 2)
also in a future PR we could remove some of the reshaping logic in quantize_affine/dequantize_affine as well:
ao/torchao/quantization/quant_primitives.py
Lines 453 to 461 in aa21b80
shape_for_reduction, reduction_dims = _get_reduction_params(
    block_size, input.size()
)
original_shape = input.shape
input = input.view(shape_for_reduction)
shape_after_reduction = shape_for_reduction
for i in reduction_dims:
    shape_after_reduction[i] = 1
scale = scale.view(shape_after_reduction)
also eventually remove the block_size arg from these ops (bc-breaking)
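Rough illustration of why that reshaping could become unnecessary (plain PyTorch, not the actual torchao code): a scale that already carries the reduced dims broadcasts against the block-shaped view of the input directly.

```python
import torch

x = torch.randn(10, 20)
blocks = x.view(10, 5, 4)  # block-shaped view for block_size (10, 4)

# A keepdim-shaped scale of shape (1, 5, 1) broadcasts against blocks with no
# extra scale.view(shape_after_reduction) step before quantizing.
scale = blocks.abs().amax(dim=(0, 2), keepdim=True) / 127.0
q = torch.clamp(torch.round(blocks / scale), -128, 127).to(torch.int8).view(10, 20)
print(q.shape, scale.shape)
```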
sure, sounds like a plan, happy to contribute
Thanks, I think it's a good start, we can remove
I see 25 integration tests failed due to backward compatibility issues with the
it's expected, I think maybe just don't change the default for now, but turn keepdim to True in these tests one by one to make sure these tests are fixed; fixing all the callsites before making the switch would be better
Changes the keepdim default from False to True in _choose_qparams_affine to match _choose_scale_float8 behavior. This ensures scale/zero_point maintain the same rank as the input tensor, making downstream handling more consistent.
Part 1 of fixing #3324
Changes

Core Changes (torchao/quantization/quant_primitives.py)
- keepdim: bool = False → keepdim: bool = True in both choose_qparams_affine (line 1220) and _choose_qparams_affine (line 1526), matching _choose_scale_float8 behavior
- Save original_input_size before reshaping to compute the correct output shape, aligning with _choose_scale_float8

Workflow Simplification (torchao/quantization/quantize_/workflows/intx/intx_unpacked_to_int8_tensor.py)

Test Updates (test/quantization/test_quant_primitives.py)
- test_choose_qparams tests now pass