Inpyo Hong, Youngwan Jo, Hyojeong Lee, Sunghyun Ahn, Kijung Lee and Sanghyun Park
Zero-shot quantization (ZSQ) enables neural network compression without the original training data, making it a promising solution for scenarios with restricted data access. To compensate for the lack of data, recent ZSQ methods typically rely on synthetic inputs generated from the full-precision model. However, these synthetic inputs often lead to activation distortion, especially under low-bit settings. To mitigate this, existing methods typically employ per-channel scaling, but they still suffer from severe computational overhead during the accumulation process. To overcome this critical bottleneck, we propose GranQ, a novel activation quantization framework that introduces an efficient pre-scaling strategy. Unlike conventional channel-wise methods that repeatedly perform scaling operations during accumulation, GranQ applies scaling factors in a pre-scaling step through fully vectorized computation, eliminating runtime scaling overhead. This design enables GranQ to maintain fine-grained quantization accuracy while significantly reducing the computational burden, particularly in low-bit quantization settings. Extensive experiments under quantization-aware training (QAT) settings demonstrate that GranQ consistently outperforms state-of-the-art ZSQ methods on CIFAR and ImageNet. In particular, our method achieves up to 5.45% higher accuracy in the 3-bit setting on CIFAR-100 and even surpasses the full-precision baseline on CIFAR-10. Code is available at https://github.com/Inpyo-Hong/GranQ.
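The pre-scaling idea can be illustrated with a short PyTorch sketch. This is a minimal sketch assuming per-channel min-max scaling of (N, C, H, W) activations; the function names `per_channel_scales` and `prescale_quantize` are illustrative assumptions, not the repository's actual API. The point is that the channel scales are applied once to the whole activation tensor via broadcasting, rather than channel by channel during accumulation.

```python
import torch

def per_channel_scales(x, n_bits=3):
    """Compute one scale per channel from the channel-wise max magnitude.

    Assumes x has shape (N, C, H, W); this layout and the min-max rule
    are illustrative assumptions, not GranQ's exact formulation.
    """
    qmax = 2 ** (n_bits - 1) - 1
    # Flatten every channel to a row and take its max absolute value.
    max_abs = x.transpose(0, 1).reshape(x.size(1), -1).abs().max(dim=1)[0]
    return (max_abs.clamp(min=1e-8) / qmax).view(1, -1, 1, 1)

def prescale_quantize(x, n_bits=3):
    """Vectorized pre-scaling: apply all channel scales in one broadcasted
    division, round once, then rescale -- no per-channel loop at runtime."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = per_channel_scales(x, n_bits)
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return q * scale

# Illustrative usage on a random activation map.
x = torch.randn(8, 64, 16, 16)
print((x - prescale_quantize(x, n_bits=3)).abs().mean())
```

Because the division, rounding, and rescaling are single broadcasted tensor operations, per-channel granularity is preserved without introducing a per-channel loop into the accumulation path.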
python==3.8.0
numpy==1.16.4
requests==2.21.0
pyhocon==0.3.51
torchvision==0.4.0
torch==1.2.0+cu92
Pillow==7.2.0
termcolor==1.1.0
Example of 3-bit quantization with GranQ:
conda create -n GranQ python=3.8
conda activate GranQ
cd CIFAR
pip install -r requirements.txt
bash run_3bit.sh            # 3-bit quantization on CIFAR-10
bash run_3bit_cifar100.sh   # 3-bit quantization on CIFAR-100
Accuracy comparison of GranQ with zero-shot quantization methods.