I am trying to use bitblas.Linear with A_dtype="int8" and W_dtype="int4".
Since both the activations and the weights are filled with ones and in_features = 16, every output element should be the dot product of two length-16 ones vectors, i.e. 16.0, but I got unexpected values. Reproduction script:
import torch
import bitblas

batch_size = 1
num_tokens = 2
in_features = 16
out_features = 32

model = bitblas.Linear(
    in_features=in_features,
    out_features=out_features,
    bias=False,
    A_dtype="int8",  # activation A dtype
    W_dtype="int4",  # weight W dtype
    accum_dtype="int32",  # accumulation dtype
    out_dtype="float32",  # output dtype
    # configs for weight-only quantization
    group_size=None,  # setting for grouped quantization
    with_scaling=False,  # setting for scaling factor
    with_zeros=False,  # setting for zeros
    zeros_mode=None,  # setting for how to calculate zeros
    # Target optimization var for dynamic symbolic.
    # For detailed information please check out docs/PythonAPI.md
    # By default, the optimization var is [1, 16, 32, 64, 128, 256, 512]
    opt_M=[1, 16, 32, 64, 128],
)

x = torch.ones((batch_size, num_tokens, in_features)).to(torch.int8)
w = torch.ones((out_features, in_features)).to(torch.int8)

x = x.cuda()
w = w.cuda()
model = model.cuda()
model.load_and_transform_weight(w)
model.eval()

with torch.no_grad():
    y = model(x)
print(y)

result:
tensor([[[ 32., 32., 32., 32., 32., 32., 32., 32., 32., 32.,
32., 32., 32., 32., 32., 32., 32., 32., 32., 32.,
32., 32., 32., 32., 32., 32., 32., 32., 32., 32.,
32., -104.],
[ 16., 16., 16., 16., 16., 16., 16., 16., 16., 16.,
16., 16., 16., 16., 16., 16., 16., 16., 16., 16.,
16., 16., 16., 16., 16., 16., 16., 16., 16., 16.,
16., 16.]]], device='cuda:0')
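
For comparison, here is the same GEMM done in plain PyTorch on the CPU (a minimal sketch that bypasses bitblas entirely; the names x_ref / w_ref / y_ref exist only for this check, and int32 is used to stand in for the int32 accumulator). It prints 16.0 everywhere, which is what I expected from the kernel:

import torch

batch_size, num_tokens, in_features, out_features = 1, 2, 16, 32

# Same all-ones inputs as above, upcast to int32 so the CPU matmul
# mimics the int8 x int4 -> int32 accumulation path.
x_ref = torch.ones((batch_size, num_tokens, in_features), dtype=torch.int32)
w_ref = torch.ones((out_features, in_features), dtype=torch.int32)

# Flatten the leading dims to a plain 2D GEMM: (2, 16) @ (16, 32).
y_ref = (x_ref.view(-1, in_features) @ w_ref.t()).view(
    batch_size, num_tokens, out_features
).to(torch.float32)
print(y_ref)  # every element is 16.0: the dot product of 16 ones with 16 ones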
Environment:
torch: 2.3.0+cu121
CUDA: 12.1
GPU: NVIDIA GeForce RTX 4090