Conversation

@SavchenkoValeriy
Member

Vector comparisons produce all-zeros or all-ones in each lane. For a value with this property, a signed compare against zero (< 0) is an identity operation, so the redundant cmlt/cmgt instruction that would otherwise follow another comparison can be removed.

The optimization looks through shuffles, narrowing bitcasts, and DUPLANE operations, all of which preserve the all-zeros-or-all-ones per-lane property.
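
To make the identity concrete, here is a minimal standalone C++ sketch (plain C++, not LLVM code) that models a single lane: when a lane is already 0 or -1, the signed compare against zero reproduces the lane unchanged, which is what lets the second comparison instruction be dropped.

#include <cassert>
#include <cstdint>

// One lane of a vector comparison result is either 0 (false) or -1 (all bits
// set, i.e. true). Applying "x < 0" to such a lane yields -1 when x is -1 and
// 0 when x is 0, so it returns the lane unchanged.
static int32_t compare_lt_zero(int32_t lane) { return lane < 0 ? -1 : 0; }

int main() {
  const int32_t lanes[] = {0, -1};
  for (int32_t lane : lanes)
    assert(compare_lt_zero(lane) == lane); // identity on 0/-1 lanes
  return 0;
}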

@llvmbot
Member

llvmbot commented Dec 9, 2025

@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-aarch64

Author: Valeriy Savchenko (SavchenkoValeriy)

Changes

Vector comparisons produce all-zeros or all-ones in each lane. For a value with this property, a signed compare against zero (< 0) is an identity operation, so the redundant cmlt/cmgt instruction that would otherwise follow another comparison can be removed.

The optimization looks through shuffles, narrowing bitcasts, and DUPLANE operations, all of which preserve the all-zeros-or-all-ones per-lane property.


Full diff: https://github.com/llvm/llvm-project/pull/171431.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+59)
  • (added) llvm/test/CodeGen/AArch64/setcc-redundant-cmlt.ll (+115)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 7a15d7b75f1b9..d8c381f2bd588 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -26556,6 +26556,46 @@ performVecReduceBitwiseCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI,
   return SDValue();
 }
 
+// Check if the value is derived from a vector comparison through operations
+// that preserve the all-zeros-or-all-ones property per lane.
+static bool isDerivedFromVectorCompare(SDValue V) {
+  switch (V.getOpcode()) {
+  case ISD::SETCC:
+    // Found a vector comparison - this is the source of 0/-1 values
+    return true;
+
+  case ISD::VECTOR_SHUFFLE:
+  case ISD::EXTRACT_SUBVECTOR:
+    // Any shuffle or subvector extract preserves the property
+    return isDerivedFromVectorCompare(V.getOperand(0));
+
+  case ISD::BITCAST: {
+    // Bitcast preserves the property only if source element size >=
+    // destination element size. This ensures each destination element
+    // is entirely within one source element.
+    // E.g., v4i32 -> v16i8 is safe (each byte is 0x00 or 0xFF)
+    // But v16i8 -> v4i32 is NOT safe (mixing bytes can create non-0/-1)
+    SDValue Src = V.getOperand(0);
+    EVT SrcVT = Src.getValueType();
+    EVT DstVT = V.getValueType();
+    if (SrcVT.isVector() && DstVT.isVector() &&
+        SrcVT.getScalarSizeInBits() >= DstVT.getScalarSizeInBits())
+      return isDerivedFromVectorCompare(Src);
+    return false;
+  }
+
+  case AArch64ISD::DUPLANE8:
+  case AArch64ISD::DUPLANE16:
+  case AArch64ISD::DUPLANE32:
+  case AArch64ISD::DUPLANE64:
+    // DUPLANE broadcasts one lane - preserves the property
+    return isDerivedFromVectorCompare(V.getOperand(0));
+
+  default:
+    return false;
+  }
+}
+
 static SDValue performSETCCCombine(SDNode *N,
                                    TargetLowering::DAGCombinerInfo &DCI,
                                    SelectionDAG &DAG) {
@@ -26628,6 +26668,25 @@ static SDValue performSETCCCombine(SDNode *N,
 
   EVT CmpVT = LHS.getValueType();
 
+  // setcc X, 0, setlt --> X  (when X is derived from a vector comparison)
+  // setcc 0, X, setgt --> X  (equivalent form)
+  //
+  // Vector comparisons produce all-zeros or all-ones per lane. For any value
+  // where each lane is either 0 or -1, comparing < 0 is an identity operation.
+  if (VT.isVector() && VT == CmpVT) {
+    SDValue Candidate;
+    // Match: setcc LHS, 0, setlt
+    if (Cond == ISD::SETLT && ISD::isConstantSplatVectorAllZeros(RHS.getNode()))
+      Candidate = LHS;
+    // Match: setcc 0, RHS, setgt (equivalent to RHS < 0)
+    else if (Cond == ISD::SETGT &&
+             ISD::isConstantSplatVectorAllZeros(LHS.getNode()))
+      Candidate = RHS;
+
+    if (Candidate && isDerivedFromVectorCompare(Candidate))
+      return Candidate;
+  }
+
   // NOTE: This exists as a combine only because it proved too awkward to match
   // splat(1) across all the NEON types during isel.
   APInt SplatLHSVal;
diff --git a/llvm/test/CodeGen/AArch64/setcc-redundant-cmlt.ll b/llvm/test/CodeGen/AArch64/setcc-redundant-cmlt.ll
new file mode 100644
index 0000000000000..0510dd734769c
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/setcc-redundant-cmlt.ll
@@ -0,0 +1,115 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=aarch64 < %s | FileCheck %s
+
+define <4 x i32> @direct_setcc_lt0(<4 x i32> %a, <4 x i32> %b, <4 x i32> %x, <4 x i32> %y) {
+; CHECK-LABEL: direct_setcc_lt0:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    cmgt v0.4s, v1.4s, v0.4s
+; CHECK-NEXT:    bsl v0.16b, v2.16b, v3.16b
+; CHECK-NEXT:    ret
+  %cmp = icmp slt <4 x i32> %a, %b
+  %sext = sext <4 x i1> %cmp to <4 x i32>
+  %lt0 = icmp slt <4 x i32> %sext, zeroinitializer
+  %sel = select <4 x i1> %lt0, <4 x i32> %x, <4 x i32> %y
+  ret <4 x i32> %sel
+}
+
+define <4 x i32> @shuffle_setcc_lt0(<4 x i32> %a, <4 x i32> %b, <4 x i32> %x, <4 x i32> %y) {
+; CHECK-LABEL: shuffle_setcc_lt0:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    cmgt v0.4s, v1.4s, v0.4s
+; CHECK-NEXT:    dup v0.4s, v0.s[2]
+; CHECK-NEXT:    bsl v0.16b, v2.16b, v3.16b
+; CHECK-NEXT:    ret
+  %cmp = icmp slt <4 x i32> %a, %b
+  %sext = sext <4 x i1> %cmp to <4 x i32>
+  %dup = shufflevector <4 x i32> %sext, <4 x i32> poison, <4 x i32> <i32 2, i32 2, i32 2, i32 2>
+  %lt0 = icmp slt <4 x i32> %dup, zeroinitializer
+  %sel = select <4 x i1> %lt0, <4 x i32> %x, <4 x i32> %y
+  ret <4 x i32> %sel
+}
+
+define <4 x i32> @direct_setcc_0gt(<4 x i32> %a, <4 x i32> %b, <4 x i32> %x, <4 x i32> %y) {
+; CHECK-LABEL: direct_setcc_0gt:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    cmgt v0.4s, v1.4s, v0.4s
+; CHECK-NEXT:    bsl v0.16b, v2.16b, v3.16b
+; CHECK-NEXT:    ret
+  %cmp = icmp slt <4 x i32> %a, %b
+  %sext = sext <4 x i1> %cmp to <4 x i32>
+  %gt0 = icmp sgt <4 x i32> zeroinitializer, %sext
+  %sel = select <4 x i1> %gt0, <4 x i32> %x, <4 x i32> %y
+  ret <4 x i32> %sel
+}
+
+define <8 x i16> @direct_setcc_lt0_v8i16(<8 x i16> %a, <8 x i16> %b, <8 x i16> %x, <8 x i16> %y) {
+; CHECK-LABEL: direct_setcc_lt0_v8i16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    cmgt v0.8h, v1.8h, v0.8h
+; CHECK-NEXT:    bsl v0.16b, v2.16b, v3.16b
+; CHECK-NEXT:    ret
+  %cmp = icmp slt <8 x i16> %a, %b
+  %sext = sext <8 x i1> %cmp to <8 x i16>
+  %lt0 = icmp slt <8 x i16> %sext, zeroinitializer
+  %sel = select <8 x i1> %lt0, <8 x i16> %x, <8 x i16> %y
+  ret <8 x i16> %sel
+}
+
+define <4 x i32> @non_splat_shuffle(<4 x i32> %a, <4 x i32> %b, <4 x i32> %x, <4 x i32> %y) {
+; CHECK-LABEL: non_splat_shuffle:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    cmgt v0.4s, v1.4s, v0.4s
+; CHECK-NOT:     cmlt
+  %cmp = icmp slt <4 x i32> %a, %b
+  %sext = sext <4 x i1> %cmp to <4 x i32>
+  %shuf = shufflevector <4 x i32> %sext, <4 x i32> poison, <4 x i32> <i32 3, i32 1, i32 2, i32 0>
+  %lt0 = icmp slt <4 x i32> %shuf, zeroinitializer
+  %sel = select <4 x i1> %lt0, <4 x i32> %x, <4 x i32> %y
+  ret <4 x i32> %sel
+}
+
+define <16 x i8> @bitcast_narrow(<4 x i32> %a, <4 x i32> %b, <16 x i8> %x, <16 x i8> %y) {
+; CHECK-LABEL: bitcast_narrow:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    cmgt v0.4s, v1.4s, v0.4s
+; CHECK-NEXT:    bsl v0.16b, v2.16b, v3.16b
+; CHECK-NEXT:    ret
+  %cmp = icmp slt <4 x i32> %a, %b
+  %sext = sext <4 x i1> %cmp to <4 x i32>
+  %bc = bitcast <4 x i32> %sext to <16 x i8>
+  %lt0 = icmp slt <16 x i8> %bc, zeroinitializer
+  %sel = select <16 x i1> %lt0, <16 x i8> %x, <16 x i8> %y
+  ret <16 x i8> %sel
+}
+
+define <8 x i16> @chain_shuffle_bitcast(<4 x i32> %a, <4 x i32> %b, <8 x i16> %x, <8 x i16> %y) {
+; CHECK-LABEL: chain_shuffle_bitcast:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    cmgt v0.4s, v1.4s, v0.4s
+; CHECK-NEXT:    dup v0.4s, v0.s[2]
+; CHECK-NEXT:    bsl v0.16b, v2.16b, v3.16b
+; CHECK-NEXT:    ret
+  %cmp = icmp slt <4 x i32> %a, %b
+  %sext = sext <4 x i1> %cmp to <4 x i32>
+  %shuf = shufflevector <4 x i32> %sext, <4 x i32> poison, <4 x i32> <i32 2, i32 2, i32 2, i32 2>
+  %bc = bitcast <4 x i32> %shuf to <8 x i16>
+  %lt0 = icmp slt <8 x i16> %bc, zeroinitializer
+  %sel = select <8 x i1> %lt0, <8 x i16> %x, <8 x i16> %y
+  ret <8 x i16> %sel
+}
+
+; NEGATIVE TEST: Widening bitcast should NOT be optimized
+define <4 x i32> @bitcast_widen_negative(<16 x i8> %a, <16 x i8> %b, <4 x i32> %x, <4 x i32> %y) {
+; CHECK-LABEL: bitcast_widen_negative:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    cmgt v0.16b, v1.16b, v0.16b
+; CHECK-NEXT:    cmlt v0.4s, v0.4s, #0
+; CHECK-NEXT:    bsl v0.16b, v2.16b, v3.16b
+; CHECK-NEXT:    ret
+  %cmp = icmp slt <16 x i8> %a, %b
+  %sext = sext <16 x i1> %cmp to <16 x i8>
+  %bc = bitcast <16 x i8> %sext to <4 x i32>
+  %lt0 = icmp slt <4 x i32> %bc, zeroinitializer
+  %sel = select <4 x i1> %lt0, <4 x i32> %x, <4 x i32> %y
+  ret <4 x i32> %sel
+}

@davemgreen
Collaborator

It sounds like you are trying to reimplement computeKnownBits / computeNumSignBits and perform optimizations based on it. A lot of the test cases already simplify in IR, but maybe not through a shuffle. I didn't look too much into the details; what would we be missing to make this happen through computeNumSignBits instead? Is there a generic DAG combine that could handle it if more AArch64 nodes were producing results?

@SavchenkoValeriy
Member Author

It sounds like you are trying to reimplement computeKnownBits / computeNumSignBits and perform optimizations based on it. A lot of the test cases already simplify in IR, but maybe not through a shuffle. I didn't look too much into the details; what would we be missing to make this happen through computeNumSignBits instead? Is there a generic DAG combine that could handle it if more AArch64 nodes were producing results?

Hmm, I didn't look in that direction. I did try the most basic case (the first test case) and it didn't get transformed, so I assumed that nothing handles these cases. Let me check and come back to you.

llvmbot added the llvm:SelectionDAG label on Dec 9, 2025
@SavchenkoValeriy
Member Author

@davemgreen I moved it to SelectionDAG and predicated it on the boolean result being a -1/0 mask, using computeNumSignBits. Thanks for the suggestion!
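
For readers following along, here is a minimal sketch of the kind of check described above. It is illustrative only: the function name and its placement are assumptions, not the code in the updated patch. A vector whose lanes are all 0 or -1 is exactly one whose number of known sign bits equals its element width, which SelectionDAG::ComputeNumSignBits can report, so the redundant setcc can simply be replaced by its operand.

#include "llvm/CodeGen/SelectionDAG.h"

using namespace llvm;

// Sketch only: fold `setcc X, 0, setlt --> X` when every lane of X is already
// known to be 0 or -1 (i.e. the value is a sign mask / boolean vector).
static SDValue foldRedundantSignMaskSetCC(SDNode *N, SelectionDAG &DAG) {
  SDValue LHS = N->getOperand(0);
  SDValue RHS = N->getOperand(1);
  ISD::CondCode CC = cast<CondCodeSDNode>(N->getOperand(2))->get();
  EVT VT = N->getValueType(0);

  // Only the vector form where the setcc result type matches the operand type.
  if (!VT.isVector() || VT != LHS.getValueType())
    return SDValue();
  if (CC != ISD::SETLT || !ISD::isConstantSplatVectorAllZeros(RHS.getNode()))
    return SDValue();
  // All lanes 0 or -1 <=> the sign bit fills the whole element.
  if (DAG.ComputeNumSignBits(LHS) != VT.getScalarSizeInBits())
    return SDValue();
  return LHS;
}

Compared with the structural isDerivedFromVectorCompare walk in the original diff, ComputeNumSignBits already looks through shuffles and many bitcasts, and target-specific nodes can contribute through the target's sign-bit hooks.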

@RKSimon
Collaborator

RKSimon commented Dec 10, 2025

I'm travelling right now, but does this overlap with #164946?

@SavchenkoValeriy
Member Author

I'm travelling right now, but does this overlap with #164946?

From what I can see, it doesn't.
