[AArch64] Eliminate redundant setcc on vector comparison results #171431
Conversation
@llvm/pr-subscribers-llvm-selectiondag @llvm/pr-subscribers-backend-aarch64

Author: Valeriy Savchenko (SavchenkoValeriy)

Changes

Vector comparisons produce all-zeros or all-ones per lane. For values with this property, comparing < 0 is an identity operation: a lane of -1 is negative and compares back to -1, while a lane of 0 is not and compares back to 0. This eliminates redundant cmlt/cmgt instructions after another comparison operation.

The optimization traces through shuffles, narrowing bitcasts, and DUPLANE operations that preserve the all-zeros-or-all-ones per-lane property.

Full diff: https://github.com/llvm/llvm-project/pull/171431.diff

2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 7a15d7b75f1b9..d8c381f2bd588 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -26556,6 +26556,46 @@ performVecReduceBitwiseCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI,
return SDValue();
}
+// Check if the value is derived from a vector comparison through operations
+// that preserve the all-zeros-or-all-ones property per lane.
+static bool isDerivedFromVectorCompare(SDValue V) {
+ switch (V.getOpcode()) {
+ case ISD::SETCC:
+ // Found a vector comparison - this is the source of 0/-1 values
+ return true;
+
+ case ISD::VECTOR_SHUFFLE:
+ case ISD::EXTRACT_SUBVECTOR:
+ // Any shuffle or subvector extract preserves the property
+ return isDerivedFromVectorCompare(V.getOperand(0));
+
+ case ISD::BITCAST: {
+ // Bitcast preserves the property only if source element size >=
+ // destination element size. This ensures each destination element
+ // is entirely within one source element.
+ // E.g., v4i32 -> v16i8 is safe (each byte is 0x00 or 0xFF)
+ // But v16i8 -> v4i32 is NOT safe (mixing bytes can create non-0/-1)
+ SDValue Src = V.getOperand(0);
+ EVT SrcVT = Src.getValueType();
+ EVT DstVT = V.getValueType();
+ if (SrcVT.isVector() && DstVT.isVector() &&
+ SrcVT.getScalarSizeInBits() >= DstVT.getScalarSizeInBits())
+ return isDerivedFromVectorCompare(Src);
+ return false;
+ }
+
+ case AArch64ISD::DUPLANE8:
+ case AArch64ISD::DUPLANE16:
+ case AArch64ISD::DUPLANE32:
+ case AArch64ISD::DUPLANE64:
+ // DUPLANE broadcasts one lane - preserves the property
+ return isDerivedFromVectorCompare(V.getOperand(0));
+
+ default:
+ return false;
+ }
+}
+
static SDValue performSETCCCombine(SDNode *N,
TargetLowering::DAGCombinerInfo &DCI,
SelectionDAG &DAG) {
@@ -26628,6 +26668,25 @@ static SDValue performSETCCCombine(SDNode *N,
EVT CmpVT = LHS.getValueType();
+ // setcc X, 0, setlt --> X (when X is derived from a vector comparison)
+ // setcc 0, X, setgt --> X (equivalent form)
+ //
+ // Vector comparisons produce all-zeros or all-ones per lane. For any value
+ // where each lane is either 0 or -1, comparing < 0 is an identity operation.
+ if (VT.isVector() && VT == CmpVT) {
+ SDValue Candidate;
+ // Match: setcc LHS, 0, setlt
+ if (Cond == ISD::SETLT && ISD::isConstantSplatVectorAllZeros(RHS.getNode()))
+ Candidate = LHS;
+ // Match: setcc 0, RHS, setgt (equivalent to RHS < 0)
+ else if (Cond == ISD::SETGT &&
+ ISD::isConstantSplatVectorAllZeros(LHS.getNode()))
+ Candidate = RHS;
+
+ if (Candidate && isDerivedFromVectorCompare(Candidate))
+ return Candidate;
+ }
+
// NOTE: This exists as a combine only because it proved too awkward to match
// splat(1) across all the NEON types during isel.
APInt SplatLHSVal;
diff --git a/llvm/test/CodeGen/AArch64/setcc-redundant-cmlt.ll b/llvm/test/CodeGen/AArch64/setcc-redundant-cmlt.ll
new file mode 100644
index 0000000000000..0510dd734769c
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/setcc-redundant-cmlt.ll
@@ -0,0 +1,115 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=aarch64 < %s | FileCheck %s
+
+define <4 x i32> @direct_setcc_lt0(<4 x i32> %a, <4 x i32> %b, <4 x i32> %x, <4 x i32> %y) {
+; CHECK-LABEL: direct_setcc_lt0:
+; CHECK: // %bb.0:
+; CHECK-NEXT: cmgt v0.4s, v1.4s, v0.4s
+; CHECK-NEXT: bsl v0.16b, v2.16b, v3.16b
+; CHECK-NEXT: ret
+ %cmp = icmp slt <4 x i32> %a, %b
+ %sext = sext <4 x i1> %cmp to <4 x i32>
+ %lt0 = icmp slt <4 x i32> %sext, zeroinitializer
+ %sel = select <4 x i1> %lt0, <4 x i32> %x, <4 x i32> %y
+ ret <4 x i32> %sel
+}
+
+define <4 x i32> @shuffle_setcc_lt0(<4 x i32> %a, <4 x i32> %b, <4 x i32> %x, <4 x i32> %y) {
+; CHECK-LABEL: shuffle_setcc_lt0:
+; CHECK: // %bb.0:
+; CHECK-NEXT: cmgt v0.4s, v1.4s, v0.4s
+; CHECK-NEXT: dup v0.4s, v0.s[2]
+; CHECK-NEXT: bsl v0.16b, v2.16b, v3.16b
+; CHECK-NEXT: ret
+ %cmp = icmp slt <4 x i32> %a, %b
+ %sext = sext <4 x i1> %cmp to <4 x i32>
+ %dup = shufflevector <4 x i32> %sext, <4 x i32> poison, <4 x i32> <i32 2, i32 2, i32 2, i32 2>
+ %lt0 = icmp slt <4 x i32> %dup, zeroinitializer
+ %sel = select <4 x i1> %lt0, <4 x i32> %x, <4 x i32> %y
+ ret <4 x i32> %sel
+}
+
+define <4 x i32> @direct_setcc_0gt(<4 x i32> %a, <4 x i32> %b, <4 x i32> %x, <4 x i32> %y) {
+; CHECK-LABEL: direct_setcc_0gt:
+; CHECK: // %bb.0:
+; CHECK-NEXT: cmgt v0.4s, v1.4s, v0.4s
+; CHECK-NEXT: bsl v0.16b, v2.16b, v3.16b
+; CHECK-NEXT: ret
+ %cmp = icmp slt <4 x i32> %a, %b
+ %sext = sext <4 x i1> %cmp to <4 x i32>
+ %gt0 = icmp sgt <4 x i32> zeroinitializer, %sext
+ %sel = select <4 x i1> %gt0, <4 x i32> %x, <4 x i32> %y
+ ret <4 x i32> %sel
+}
+
+define <8 x i16> @direct_setcc_lt0_v8i16(<8 x i16> %a, <8 x i16> %b, <8 x i16> %x, <8 x i16> %y) {
+; CHECK-LABEL: direct_setcc_lt0_v8i16:
+; CHECK: // %bb.0:
+; CHECK-NEXT: cmgt v0.8h, v1.8h, v0.8h
+; CHECK-NEXT: bsl v0.16b, v2.16b, v3.16b
+; CHECK-NEXT: ret
+ %cmp = icmp slt <8 x i16> %a, %b
+ %sext = sext <8 x i1> %cmp to <8 x i16>
+ %lt0 = icmp slt <8 x i16> %sext, zeroinitializer
+ %sel = select <8 x i1> %lt0, <8 x i16> %x, <8 x i16> %y
+ ret <8 x i16> %sel
+}
+
+define <4 x i32> @non_splat_shuffle(<4 x i32> %a, <4 x i32> %b, <4 x i32> %x, <4 x i32> %y) {
+; CHECK-LABEL: non_splat_shuffle:
+; CHECK: // %bb.0:
+; CHECK-NEXT: cmgt v0.4s, v1.4s, v0.4s
+; CHECK-NOT: cmlt
+ %cmp = icmp slt <4 x i32> %a, %b
+ %sext = sext <4 x i1> %cmp to <4 x i32>
+ %shuf = shufflevector <4 x i32> %sext, <4 x i32> poison, <4 x i32> <i32 3, i32 1, i32 2, i32 0>
+ %lt0 = icmp slt <4 x i32> %shuf, zeroinitializer
+ %sel = select <4 x i1> %lt0, <4 x i32> %x, <4 x i32> %y
+ ret <4 x i32> %sel
+}
+
+define <16 x i8> @bitcast_narrow(<4 x i32> %a, <4 x i32> %b, <16 x i8> %x, <16 x i8> %y) {
+; CHECK-LABEL: bitcast_narrow:
+; CHECK: // %bb.0:
+; CHECK-NEXT: cmgt v0.4s, v1.4s, v0.4s
+; CHECK-NEXT: bsl v0.16b, v2.16b, v3.16b
+; CHECK-NEXT: ret
+ %cmp = icmp slt <4 x i32> %a, %b
+ %sext = sext <4 x i1> %cmp to <4 x i32>
+ %bc = bitcast <4 x i32> %sext to <16 x i8>
+ %lt0 = icmp slt <16 x i8> %bc, zeroinitializer
+ %sel = select <16 x i1> %lt0, <16 x i8> %x, <16 x i8> %y
+ ret <16 x i8> %sel
+}
+
+define <8 x i16> @chain_shuffle_bitcast(<4 x i32> %a, <4 x i32> %b, <8 x i16> %x, <8 x i16> %y) {
+; CHECK-LABEL: chain_shuffle_bitcast:
+; CHECK: // %bb.0:
+; CHECK-NEXT: cmgt v0.4s, v1.4s, v0.4s
+; CHECK-NEXT: dup v0.4s, v0.s[2]
+; CHECK-NEXT: bsl v0.16b, v2.16b, v3.16b
+; CHECK-NEXT: ret
+ %cmp = icmp slt <4 x i32> %a, %b
+ %sext = sext <4 x i1> %cmp to <4 x i32>
+ %shuf = shufflevector <4 x i32> %sext, <4 x i32> poison, <4 x i32> <i32 2, i32 2, i32 2, i32 2>
+ %bc = bitcast <4 x i32> %shuf to <8 x i16>
+ %lt0 = icmp slt <8 x i16> %bc, zeroinitializer
+ %sel = select <8 x i1> %lt0, <8 x i16> %x, <8 x i16> %y
+ ret <8 x i16> %sel
+}
+
+; NEGATIVE TEST: Widening bitcast should NOT be optimized
+define <4 x i32> @bitcast_widen_negative(<16 x i8> %a, <16 x i8> %b, <4 x i32> %x, <4 x i32> %y) {
+; CHECK-LABEL: bitcast_widen_negative:
+; CHECK: // %bb.0:
+; CHECK-NEXT: cmgt v0.16b, v1.16b, v0.16b
+; CHECK-NEXT: cmlt v0.4s, v0.4s, #0
+; CHECK-NEXT: bsl v0.16b, v2.16b, v3.16b
+; CHECK-NEXT: ret
+ %cmp = icmp slt <16 x i8> %a, %b
+ %sext = sext <16 x i1> %cmp to <16 x i8>
+ %bc = bitcast <16 x i8> %sext to <4 x i32>
+ %lt0 = icmp slt <4 x i32> %bc, zeroinitializer
+ %sel = select <4 x i1> %lt0, <4 x i32> %x, <4 x i32> %y
+ ret <4 x i32> %sel
+}
davemgreen left a comment:
It sounds like you are trying to reimplement computeKnownBits / computeNumSignBits and perform optimizations based on them. A lot of the test cases already simplify in IR, but maybe not through a shuffle. I didn't look too much into the details; what would we be missing to make this happen through computeNumSignBits instead? Is there a generic DAG combine that could handle it if more AArch64 nodes were producing results for it?
Hmm, I didn't look in that direction. I did try the most basic case (the first test case) and it didn't get transformed, so I assumed that nothing handles these cases. Let me check and come back to you.
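(For context, the generic form being suggested here would key off SelectionDAG::ComputeNumSignBits: if every lane of a value is known to consist entirely of sign bits, each lane is already 0 or -1, so a signed compare against zero is the identity. A minimal sketch, assuming the same local variables as performSETCCCombine in the diff above; this is not the PR's code:

    // Sketch: fold "setcc X, 0, setlt" to X when every lane of X is known
    // to be 0 or -1, i.e. all of its bits match the sign bit.
    if (VT.isVector() && VT == CmpVT && Cond == ISD::SETLT &&
        ISD::isConstantSplatVectorAllZeros(RHS.getNode()) &&
        DAG.ComputeNumSignBits(LHS) == VT.getScalarSizeInBits())
      return LHS;

Such a check only sees through shuffles and AArch64 DUPLANE nodes to the extent that ComputeNumSignBits can look through them, which is exactly the gap the questions above are probing.)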
Force-pushed from f587fe3 to ac0e24a.
@davemgreen I moved it to SelectionDAG and predicated it on the boolean result being a -1/0 mask using …
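(The sentence above is cut off; presumably it refers to the target's boolean-contents query, though that is an assumption. The conventional way to gate a combine on vector setcc producing a 0/-1 mask is:

    // Assumed gating check (getBooleanContents is the existing
    // TargetLowering hook): bail out unless this target's vector setcc
    // produces an all-zeros/all-ones mask per lane.
    const TargetLowering &TLI = DAG.getTargetLoweringInfo();
    if (TLI.getBooleanContents(CmpVT) !=
        TargetLowering::ZeroOrNegativeOneBooleanContent)
      return SDValue();

AArch64's NEON vector compares do report ZeroOrNegativeOneBooleanContent, which is what makes the fold safe on this target.)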
I'm travelling right now, but does this overlap with #164946?
Force-pushed from ac0e24a to 43fb6f1.
From what I can see, it doesn't.
For values with all lanes being either 0 or -1, comparing < 0 is an identity operation.
Force-pushed from 43fb6f1 to 1864330.
Vector comparisons produce all-zeros or all-ones per lane. For values with this property, comparing < 0 is an identity operation. This eliminates redundant cmlt/cmgt instructions after another comparison operation.
The optimization traces through shuffles, narrowing bitcasts, and DUPLANE operations that preserve the all-zeros-or-all-ones per-lane property.