
Commit a7c4a16

zhuyuhua-v and janghaeng-intel authored
Enable TF32 mode in GRU ops (#2512)
TF32 mode was not enabled for the GRU op. This brings a ~1.72x speedup on Molan.

Co-authored-by: Janghaeng Lee <janghaeng.lee@intel.com>
1 parent 5c7bfad commit a7c4a16


1 file changed: +8, -0 lines


csrc/gpu/oneDNN/GRU.h

Lines changed: 8 additions & 0 deletions
@@ -80,6 +80,10 @@ static inline Tensor gru_forward(
   pattr.set_scratchpad_mode(dnnl::scratchpad_mode::user);
 #endif
 
+  if (data_t == memory::data_type::f32) {
+    pattr.set_fpmath_mode(xpu::oneDNN::get_onednn_fpmath_mode());
+  }
+
   auto gru_forward_pd = lbr_gru_forward::primitive_desc(
       engine,
       train ? prop_kind::forward_training : prop_kind::forward_inference,
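The guard added in this hunk only changes the primitive attribute when the GRU runs in f32; bf16/f16 primitives keep oneDNN's default fpmath mode, which is `strict`. The sketch below models that decision in isolation. `DataType`, `FpMathMode`, `PrimitiveAttr` and `make_gru_attr` are hypothetical stand-ins, not the real `memory::data_type`, `dnnl::fpmath_mode`, `dnnl::primitive_attr` or `xpu::oneDNN::get_onednn_fpmath_mode()`; the configured mode is passed in as a plain argument.

```cpp
#include <cassert>

// Hypothetical stand-ins for oneDNN's memory::data_type and dnnl::fpmath_mode.
// They only model the decision made in the diff; they are not the real API.
enum class DataType { f32, bf16, f16 };
enum class FpMathMode { strict, tf32, any };

// Hypothetical stand-in for dnnl::primitive_attr; oneDNN defaults to strict.
struct PrimitiveAttr {
  FpMathMode fpmath = FpMathMode::strict;
  void set_fpmath_mode(FpMathMode m) { fpmath = m; }
};

// Mirrors the added lines: only f32 primitives receive the configured
// fpmath mode (e.g. tf32); other data types keep the strict default.
PrimitiveAttr make_gru_attr(DataType data_t, FpMathMode configured) {
  PrimitiveAttr pattr;
  if (data_t == DataType::f32) {
    pattr.set_fpmath_mode(configured);
  }
  return pattr;
}
```

With `configured = FpMathMode::tf32`, an f32 GRU gets `tf32` while a bf16 GRU still reports `strict`, so the change cannot alter the numerics of non-f32 paths.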
@@ -323,6 +327,10 @@ static inline std::tuple<Tensor, Tensor, Tensor, Tensor, Tensor> gru_backward(
   pattr.set_scratchpad_mode(dnnl::scratchpad_mode::user);
 #endif
 
+  if (data_dt == memory::data_type::f32) {
+    pattr.set_fpmath_mode(xpu::oneDNN::get_onednn_fpmath_mode());
+  }
+
   auto gru_forward_pd = lbr_gru_forward::primitive_desc(
       engine,
       prop_kind::forward_training,
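For intuition about what TF32 mode trades for the speedup: TF32 keeps f32's 8-bit exponent but only 10 explicit mantissa bits (f32 has 23). The sketch below emulates rounding an f32 value to TF32 precision while keeping it in a 32-bit container. Round-to-nearest-even is an assumption made for illustration; this is not the code path used by oneDNN or by the commit, and hardware may round differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Rounds an IEEE-754 float to the nearest TF32-representable value
// (1 sign, 8 exponent, 10 mantissa bits). Assumes round-to-nearest-even.
float round_to_tf32(float x) {
  if (!std::isfinite(x)) return x;  // leave inf/NaN untouched
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  const std::uint32_t dropped = 13;  // 23 f32 mantissa bits - 10 TF32 bits
  const std::uint32_t lsb = (bits >> dropped) & 1u;  // lowest kept bit
  // Adding (half - 1 + lsb) then truncating gives round-half-to-even.
  bits += (1u << (dropped - 1)) - 1u + lsb;
  bits &= ~((1u << dropped) - 1u);  // clear the 13 dropped mantissa bits
  std::memcpy(&x, &bits, sizeof x);
  return x;
}
```

For example, `1 + 2^-11` lies exactly halfway between `1` and `1 + 2^-10` and rounds to `1`, so GRU matmuls in TF32 mode can lose low-order bits that strict f32 would keep.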
