I am trying to use bitblas.Linear with A_dtype="int8" and W_dtype="int4".
Since both the activations and the weights are filled with ones and in_features = 16, every output element should be the dot product of two length-16 ones vectors, i.e. 16.0, but I got unexpected values. Reproduction script:
import torch
import bitblas

batch_size = 1
num_tokens = 2
in_features = 16
out_features = 32

model = bitblas.Linear(
    in_features=in_features,
    out_features=out_features,
    bias=False,
    A_dtype="int8",  # activation A dtype
    W_dtype="int4",  # weight W dtype
    accum_dtype="int32",  # accumulation dtype
    out_dtype="float32",  # output dtype
    # configs for weight-only quantization
    group_size=None,  # setting for grouped quantization
    with_scaling=False,  # setting for scaling factor
    with_zeros=False,  # setting for zeros
    zeros_mode=None,  # setting for how to calculate zeros
    # Target optimization var for dynamic symbolic.
    # For detailed information please check out docs/PythonAPI.md
    # By default, the optimization var is [1, 16, 32, 64, 128, 256, 512]
    opt_M=[1, 16, 32, 64, 128],
)

x = torch.ones((batch_size, num_tokens, in_features)).to(torch.int8)
w = torch.ones((out_features, in_features)).to(torch.int8)

x = x.cuda()
w = w.cuda()
model = model.cuda()
model.load_and_transform_weight(w)
model.eval()

with torch.no_grad():
    y = model(x)
print(y)

result:
tensor([[[ 32., 32., 32., 32., 32., 32., 32., 32., 32., 32.,
32., 32., 32., 32., 32., 32., 32., 32., 32., 32.,
32., 32., 32., 32., 32., 32., 32., 32., 32., 32.,
32., -104.],
[ 16., 16., 16., 16., 16., 16., 16., 16., 16., 16.,
16., 16., 16., 16., 16., 16., 16., 16., 16., 16.,
16., 16., 16., 16., 16., 16., 16., 16., 16., 16.,
16., 16.]]], device='cuda:0')
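
For comparison, here is the same GEMM done in plain PyTorch on the CPU (a minimal sketch that bypasses bitblas entirely; the names x_ref / w_ref / y_ref exist only for this check, and int32 is used to stand in for the int32 accumulator). It prints 16.0 everywhere, which is what I expected from the kernel:

import torch

batch_size, num_tokens, in_features, out_features = 1, 2, 16, 32

# Same all-ones inputs as above, upcast to int32 so the CPU matmul
# mimics the int8 x int4 -> int32 accumulation path.
x_ref = torch.ones((batch_size, num_tokens, in_features), dtype=torch.int32)
w_ref = torch.ones((out_features, in_features), dtype=torch.int32)

# Flatten the leading dims to a plain 2D GEMM: (2, 16) @ (16, 32).
y_ref = (x_ref.view(-1, in_features) @ w_ref.t()).view(
    batch_size, num_tokens, out_features
).to(torch.float32)
print(y_ref)  # every element is 16.0: the dot product of 16 ones with 16 ones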
Environment:
torch: 2.3.0+cu121
CUDA: 12.1
GPU: NVIDIA GeForce RTX 4090