
Commit 1f09fc2

rogerxfeng8 and Zhenhuan Chen authored
fix woq int4 oom issue by deleting old weight (#4631) (#4650)
Co-authored-by: Zhenhuan Chen <zhenhuan.chen@intel.com>
1 parent 70d776d · commit 1f09fc2


intel_extension_for_pytorch/transformers/models/xpu/optimize_transformers/modules/transformer_modules/QuantizedMlp.py

Lines changed: 6 additions & 0 deletions

@@ -213,14 +213,20 @@ def transpose_inner(self):
         self.mlp_silu_qweight = torch.stack(
             (self.fc_out_quant.qweight, self.fc_in_quant.qweight)
         ).contiguous()
+        del self.fc_out_quant.qweight
+        del self.fc_in_quant.qweight
         self.mlp_silu_scales = torch.stack(
             (self.fc_out_quant.scales, self.fc_in_quant.scales)
         ).contiguous()
+        del self.fc_out_quant.scales
+        del self.fc_in_quant.scales
         self.mlp_silu_qzeros = None
         if self.fc_out_quant.qzeros is not None:
             self.mlp_silu_qzeros = torch.stack(
                 (self.fc_out_quant.qzeros, self.fc_in_quant.qzeros)
             ).contiguous()
+            del self.fc_out_quant.qzeros
+            del self.fc_in_quant.qzeros

     def inter_mm(self, hidden_states):
         assert self.fc_in_quant.blocksize == self.fc_out_quant.blocksize
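
Why the added `del` statements fix the OOM: `torch.stack` allocates a fresh contiguous buffer and copies both inputs into it, so until the per-layer attributes are deleted, the packed int4 weights, scales, and zero points are resident twice. Below is a minimal sketch of that effect; `FakeQuantLinear` and the tensor shapes are illustrative stand-ins, not code from the repository.

```python
import torch

class FakeQuantLinear:
    """Hypothetical stand-in for the fc_in_quant / fc_out_quant modules in the diff."""
    def __init__(self, rows, cols):
        # int4 weights are commonly packed into int32 storage.
        self.qweight = torch.randint(0, 2**31 - 1, (rows, cols), dtype=torch.int32)

fc_in_quant = FakeQuantLinear(4096, 512)
fc_out_quant = FakeQuantLinear(4096, 512)

# torch.stack copies both tensors into one new buffer; it never aliases its inputs.
mlp_silu_qweight = torch.stack(
    (fc_out_quant.qweight, fc_in_quant.qweight)
).contiguous()

# Without these deletes the modules keep referencing the originals, so the
# packed weights are held twice (originals + stacked copy). Freeing them
# right after the stack is the whole fix.
del fc_out_quant.qweight
del fc_in_quant.qweight
```

The commit applies this same stack-then-delete pattern to `scales` and `qzeros` as well, keeping peak device memory at one stacked copy per tensor group instead of two full copies.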
