[AArch64] Eliminate redundant setcc on vector comparison results #171431
Conversation
@llvm/pr-subscribers-llvm-selectiondag @llvm/pr-subscribers-backend-aarch64

Author: Valeriy Savchenko (SavchenkoValeriy)

Changes

Vector comparisons produce all-zeros or all-ones per lane. For values with this property, comparing < 0 is an identity operation: a lane of -1 is negative and compares back to -1, while a lane of 0 is not and compares back to 0. This eliminates redundant cmlt/cmgt instructions after another comparison operation.

The optimization traces through shuffles, narrowing bitcasts, and DUPLANE operations that preserve the all-zeros-or-all-ones per-lane property.

Full diff: https://github.com/llvm/llvm-project/pull/171431.diff

2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 7a15d7b75f1b9..d8c381f2bd588 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -26556,6 +26556,46 @@ performVecReduceBitwiseCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI,
return SDValue();
}
+// Check if the value is derived from a vector comparison through operations
+// that preserve the all-zeros-or-all-ones property per lane.
+static bool isDerivedFromVectorCompare(SDValue V) {
+ switch (V.getOpcode()) {
+ case ISD::SETCC:
+ // Found a vector comparison - this is the source of 0/-1 values
+ return true;
+
+ case ISD::VECTOR_SHUFFLE:
+ case ISD::EXTRACT_SUBVECTOR:
+ // Any shuffle or subvector extract preserves the property
+ return isDerivedFromVectorCompare(V.getOperand(0));
+
+ case ISD::BITCAST: {
+ // Bitcast preserves the property only if source element size >=
+ // destination element size. This ensures each destination element
+ // is entirely within one source element.
+ // E.g., v4i32 -> v16i8 is safe (each byte is 0x00 or 0xFF)
+ // But v16i8 -> v4i32 is NOT safe (mixing bytes can create non-0/-1)
+ SDValue Src = V.getOperand(0);
+ EVT SrcVT = Src.getValueType();
+ EVT DstVT = V.getValueType();
+ if (SrcVT.isVector() && DstVT.isVector() &&
+ SrcVT.getScalarSizeInBits() >= DstVT.getScalarSizeInBits())
+ return isDerivedFromVectorCompare(Src);
+ return false;
+ }
+
+ case AArch64ISD::DUPLANE8:
+ case AArch64ISD::DUPLANE16:
+ case AArch64ISD::DUPLANE32:
+ case AArch64ISD::DUPLANE64:
+ // DUPLANE broadcasts one lane - preserves the property
+ return isDerivedFromVectorCompare(V.getOperand(0));
+
+ default:
+ return false;
+ }
+}
+
static SDValue performSETCCCombine(SDNode *N,
TargetLowering::DAGCombinerInfo &DCI,
SelectionDAG &DAG) {
@@ -26628,6 +26668,25 @@ static SDValue performSETCCCombine(SDNode *N,
EVT CmpVT = LHS.getValueType();
+ // setcc X, 0, setlt --> X (when X is derived from a vector comparison)
+ // setcc 0, X, setgt --> X (equivalent form)
+ //
+ // Vector comparisons produce all-zeros or all-ones per lane. For any value
+ // where each lane is either 0 or -1, comparing < 0 is an identity operation.
+ if (VT.isVector() && VT == CmpVT) {
+ SDValue Candidate;
+ // Match: setcc LHS, 0, setlt
+ if (Cond == ISD::SETLT && ISD::isConstantSplatVectorAllZeros(RHS.getNode()))
+ Candidate = LHS;
+ // Match: setcc 0, RHS, setgt (equivalent to RHS < 0)
+ else if (Cond == ISD::SETGT &&
+ ISD::isConstantSplatVectorAllZeros(LHS.getNode()))
+ Candidate = RHS;
+
+ if (Candidate && isDerivedFromVectorCompare(Candidate))
+ return Candidate;
+ }
+
// NOTE: This exists as a combine only because it proved too awkward to match
// splat(1) across all the NEON types during isel.
APInt SplatLHSVal;
diff --git a/llvm/test/CodeGen/AArch64/setcc-redundant-cmlt.ll b/llvm/test/CodeGen/AArch64/setcc-redundant-cmlt.ll
new file mode 100644
index 0000000000000..0510dd734769c
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/setcc-redundant-cmlt.ll
@@ -0,0 +1,115 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=aarch64 < %s | FileCheck %s
+
+define <4 x i32> @direct_setcc_lt0(<4 x i32> %a, <4 x i32> %b, <4 x i32> %x, <4 x i32> %y) {
+; CHECK-LABEL: direct_setcc_lt0:
+; CHECK: // %bb.0:
+; CHECK-NEXT: cmgt v0.4s, v1.4s, v0.4s
+; CHECK-NEXT: bsl v0.16b, v2.16b, v3.16b
+; CHECK-NEXT: ret
+ %cmp = icmp slt <4 x i32> %a, %b
+ %sext = sext <4 x i1> %cmp to <4 x i32>
+ %lt0 = icmp slt <4 x i32> %sext, zeroinitializer
+ %sel = select <4 x i1> %lt0, <4 x i32> %x, <4 x i32> %y
+ ret <4 x i32> %sel
+}
+
+define <4 x i32> @shuffle_setcc_lt0(<4 x i32> %a, <4 x i32> %b, <4 x i32> %x, <4 x i32> %y) {
+; CHECK-LABEL: shuffle_setcc_lt0:
+; CHECK: // %bb.0:
+; CHECK-NEXT: cmgt v0.4s, v1.4s, v0.4s
+; CHECK-NEXT: dup v0.4s, v0.s[2]
+; CHECK-NEXT: bsl v0.16b, v2.16b, v3.16b
+; CHECK-NEXT: ret
+ %cmp = icmp slt <4 x i32> %a, %b
+ %sext = sext <4 x i1> %cmp to <4 x i32>
+ %dup = shufflevector <4 x i32> %sext, <4 x i32> poison, <4 x i32> <i32 2, i32 2, i32 2, i32 2>
+ %lt0 = icmp slt <4 x i32> %dup, zeroinitializer
+ %sel = select <4 x i1> %lt0, <4 x i32> %x, <4 x i32> %y
+ ret <4 x i32> %sel
+}
+
+define <4 x i32> @direct_setcc_0gt(<4 x i32> %a, <4 x i32> %b, <4 x i32> %x, <4 x i32> %y) {
+; CHECK-LABEL: direct_setcc_0gt:
+; CHECK: // %bb.0:
+; CHECK-NEXT: cmgt v0.4s, v1.4s, v0.4s
+; CHECK-NEXT: bsl v0.16b, v2.16b, v3.16b
+; CHECK-NEXT: ret
+ %cmp = icmp slt <4 x i32> %a, %b
+ %sext = sext <4 x i1> %cmp to <4 x i32>
+ %gt0 = icmp sgt <4 x i32> zeroinitializer, %sext
+ %sel = select <4 x i1> %gt0, <4 x i32> %x, <4 x i32> %y
+ ret <4 x i32> %sel
+}
+
+define <8 x i16> @direct_setcc_lt0_v8i16(<8 x i16> %a, <8 x i16> %b, <8 x i16> %x, <8 x i16> %y) {
+; CHECK-LABEL: direct_setcc_lt0_v8i16:
+; CHECK: // %bb.0:
+; CHECK-NEXT: cmgt v0.8h, v1.8h, v0.8h
+; CHECK-NEXT: bsl v0.16b, v2.16b, v3.16b
+; CHECK-NEXT: ret
+ %cmp = icmp slt <8 x i16> %a, %b
+ %sext = sext <8 x i1> %cmp to <8 x i16>
+ %lt0 = icmp slt <8 x i16> %sext, zeroinitializer
+ %sel = select <8 x i1> %lt0, <8 x i16> %x, <8 x i16> %y
+ ret <8 x i16> %sel
+}
+
+define <4 x i32> @non_splat_shuffle(<4 x i32> %a, <4 x i32> %b, <4 x i32> %x, <4 x i32> %y) {
+; CHECK-LABEL: non_splat_shuffle:
+; CHECK: // %bb.0:
+; CHECK-NEXT: cmgt v0.4s, v1.4s, v0.4s
+; CHECK-NOT: cmlt
+ %cmp = icmp slt <4 x i32> %a, %b
+ %sext = sext <4 x i1> %cmp to <4 x i32>
+ %shuf = shufflevector <4 x i32> %sext, <4 x i32> poison, <4 x i32> <i32 3, i32 1, i32 2, i32 0>
+ %lt0 = icmp slt <4 x i32> %shuf, zeroinitializer
+ %sel = select <4 x i1> %lt0, <4 x i32> %x, <4 x i32> %y
+ ret <4 x i32> %sel
+}
+
+define <16 x i8> @bitcast_narrow(<4 x i32> %a, <4 x i32> %b, <16 x i8> %x, <16 x i8> %y) {
+; CHECK-LABEL: bitcast_narrow:
+; CHECK: // %bb.0:
+; CHECK-NEXT: cmgt v0.4s, v1.4s, v0.4s
+; CHECK-NEXT: bsl v0.16b, v2.16b, v3.16b
+; CHECK-NEXT: ret
+ %cmp = icmp slt <4 x i32> %a, %b
+ %sext = sext <4 x i1> %cmp to <4 x i32>
+ %bc = bitcast <4 x i32> %sext to <16 x i8>
+ %lt0 = icmp slt <16 x i8> %bc, zeroinitializer
+ %sel = select <16 x i1> %lt0, <16 x i8> %x, <16 x i8> %y
+ ret <16 x i8> %sel
+}
+
+define <8 x i16> @chain_shuffle_bitcast(<4 x i32> %a, <4 x i32> %b, <8 x i16> %x, <8 x i16> %y) {
+; CHECK-LABEL: chain_shuffle_bitcast:
+; CHECK: // %bb.0:
+; CHECK-NEXT: cmgt v0.4s, v1.4s, v0.4s
+; CHECK-NEXT: dup v0.4s, v0.s[2]
+; CHECK-NEXT: bsl v0.16b, v2.16b, v3.16b
+; CHECK-NEXT: ret
+ %cmp = icmp slt <4 x i32> %a, %b
+ %sext = sext <4 x i1> %cmp to <4 x i32>
+ %shuf = shufflevector <4 x i32> %sext, <4 x i32> poison, <4 x i32> <i32 2, i32 2, i32 2, i32 2>
+ %bc = bitcast <4 x i32> %shuf to <8 x i16>
+ %lt0 = icmp slt <8 x i16> %bc, zeroinitializer
+ %sel = select <8 x i1> %lt0, <8 x i16> %x, <8 x i16> %y
+ ret <8 x i16> %sel
+}
+
+; NEGATIVE TEST: Widening bitcast should NOT be optimized
+define <4 x i32> @bitcast_widen_negative(<16 x i8> %a, <16 x i8> %b, <4 x i32> %x, <4 x i32> %y) {
+; CHECK-LABEL: bitcast_widen_negative:
+; CHECK: // %bb.0:
+; CHECK-NEXT: cmgt v0.16b, v1.16b, v0.16b
+; CHECK-NEXT: cmlt v0.4s, v0.4s, #0
+; CHECK-NEXT: bsl v0.16b, v2.16b, v3.16b
+; CHECK-NEXT: ret
+ %cmp = icmp slt <16 x i8> %a, %b
+ %sext = sext <16 x i1> %cmp to <16 x i8>
+ %bc = bitcast <16 x i8> %sext to <4 x i32>
+ %lt0 = icmp slt <4 x i32> %bc, zeroinitializer
+ %sel = select <4 x i1> %lt0, <4 x i32> %x, <4 x i32> %y
+ ret <4 x i32> %sel
+}
davemgreen left a comment:
It sounds like you are trying to reimplement computeKnownBits / computeNumSignBits and perform optimizations based on them. A lot of the test cases already simplify in IR, but maybe not through a shuffle. I didn't look too much into the details; what would we be missing to make this happen through computeNumSignBits instead? Is there a generic DAG combine that could handle it if more AArch64 nodes were producing results for it?
Hmm, I didn't look in that direction. I did try the most basic case (the first test case) and it didn't get transformed, so I assumed that nothing handles these cases. Let me check and come back to you.
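(For context, the generic form being suggested here would key off SelectionDAG::ComputeNumSignBits: if every lane of a value is known to consist entirely of sign bits, each lane is already 0 or -1, so a signed compare against zero is the identity. A minimal sketch, assuming the same local variables as performSETCCCombine in the diff above; this is not the PR's code:

    // Sketch: fold "setcc X, 0, setlt" to X when every lane of X is known
    // to be 0 or -1, i.e. all of its bits match the sign bit.
    if (VT.isVector() && VT == CmpVT && Cond == ISD::SETLT &&
        ISD::isConstantSplatVectorAllZeros(RHS.getNode()) &&
        DAG.ComputeNumSignBits(LHS) == VT.getScalarSizeInBits())
      return LHS;

Such a check only sees through shuffles and AArch64 DUPLANE nodes to the extent that ComputeNumSignBits can look through them, which is exactly the gap the questions above are probing.)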
Force-pushed from f587fe3 to ac0e24a.
@davemgreen I moved it to SelectionDAG and predicated it on the boolean result being a -1/0 mask using …
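(The sentence above is cut off; presumably it refers to the target's boolean-contents query, though that is an assumption. The conventional way to gate a combine on vector setcc producing a 0/-1 mask is:

    // Assumed gating check (getBooleanContents is the existing
    // TargetLowering hook): bail out unless this target's vector setcc
    // produces an all-zeros/all-ones mask per lane.
    const TargetLowering &TLI = DAG.getTargetLoweringInfo();
    if (TLI.getBooleanContents(CmpVT) !=
        TargetLowering::ZeroOrNegativeOneBooleanContent)
      return SDValue();

AArch64's NEON vector compares do report ZeroOrNegativeOneBooleanContent, which is what makes the fold safe on this target.)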
I'm travelling right now, but does this overlap with #164946?
Force-pushed from ac0e24a to 43fb6f1.
From what I can see, it doesn't.
For values with all lanes being either 0 or -1, comparing < 0 is an identity operation.
Force-pushed from 43fb6f1 to 1864330.
Vector comparisons produce all-zeros or all-ones per lane. For values with this property, comparing < 0 is an identity operation. This eliminates redundant cmlt/cmgt instructions after another comparison operation.
The optimization traces through shuffles, narrowing bitcasts, and DUPLANE operations that preserve the all-zeros-or-all-ones per-lane property.