[VPlan] First step towards VPlan cost modeling. #67934

fhahn · 2023-10-01T21:09:32Z

This adds a new computeCost interface to VPRecipeBase and implements it
for VPWidenRecipe and VPWidenIntOrFpInductionRecipe.

It also adds a getBestPlan function to LVP (LoopVectorizationPlanner), which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF. For recipes that do not yet implement computeCost, the
legacy cost for the underlying instruction is used.
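The fallback described above can be sketched in a few lines. This is a minimal model built on simplified stand-ins, not LLVM APIs. `Cost` is a `std::optional`, where `std::nullopt` plays the role of `InstructionCost::getInvalid()`. `Recipe` and `WidenAdd` are hypothetical, and the `LegacyCost` callback stands in for `CM.getInstructionCost`. A recipe that has no cost of its own and no underlying instruction is skipped, which mirrors the `continue` in the patch.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <vector>

// Stand-in for InstructionCost: std::nullopt models an invalid cost.
using Cost = std::optional<unsigned>;

// Base recipe: by default it cannot cost itself, like the new
// VPRecipeBase::computeCost returning InstructionCost::getInvalid().
struct Recipe {
  virtual ~Recipe() = default;
  virtual Cost computeCost(unsigned /*VF*/) const { return std::nullopt; }
  int UnderlyingInstr = -1; // ID of the underlying IR instruction, -1 if none
};

// A recipe that implements its own (hypothetical) cost.
struct WidenAdd : Recipe {
  Cost computeCost(unsigned /*VF*/) const override { return 1u; }
};

// Sums recipe costs; falls back to LegacyCost (modelling
// CM.getInstructionCost) when a recipe returns an invalid cost.
unsigned planCost(const std::vector<const Recipe *> &Recipes, unsigned VF,
                  const std::function<unsigned(int, unsigned)> &LegacyCost) {
  unsigned Total = 0;
  for (const Recipe *R : Recipes) {
    Cost C = R->computeCost(VF);
    if (!C) {
      if (R->UnderlyingInstr < 0)
        continue; // nothing to fall back on: skip, as the patch does
      C = LegacyCost(R->UnderlyingInstr, VF);
    }
    Total += *C;
  }
  return Total;
}
```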

The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree.
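The selection loop in getBestPlan can be sketched as follows, assuming fixed-width VFs and integer costs. `Candidate`, `isMoreProfitable` and `pickBestVF` are hypothetical simplifications. LLVM's own `isMoreProfitable` also handles scalable VFs and other details. When vectorization is forced, the patch initializes `BestCost` to `InstructionCost::getMax()`. Plain integers would overflow in the cross-multiplication below, so this sketch uses a flag instead.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical candidate: a fixed vector width (1 means scalar) and the
// cost of one loop iteration at that width.
struct Candidate {
  unsigned VF;
  std::uint64_t Cost;
};

// Simplified profitability test: A beats B if its cost per lane is lower.
// Cross-multiplying avoids integer division.
bool isMoreProfitable(const Candidate &A, const Candidate &B) {
  return A.Cost * B.VF < B.Cost * A.VF;
}

// Returns the most profitable VF, using the scalar cost as the baseline.
// When vectorization is forced, the scalar baseline never wins, so the first
// vector candidate replaces it unconditionally.
unsigned pickBestVF(std::uint64_t ScalarCost,
                    const std::vector<Candidate> &Cands,
                    bool ForceVectorization) {
  Candidate Best{1, ScalarCost};
  bool BestIsScalar = true;
  for (const Candidate &C : Cands) {
    if (C.VF == 1)
      continue; // the scalar plan is the baseline, not a candidate
    if ((ForceVectorization && BestIsScalar) || isMoreProfitable(C, Best)) {
      Best = C;
      BestIsScalar = false;
    }
  }
  return Best.VF;
}
```

After selection, the patch asserts that the width picked this way equals the legacy `VF.Width` before executing the plan.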

Builds on VPlan type inference (included in this PR as a separate commit).
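The following sketches memoized type inference in the spirit of VPTypeAnalysis. It uses a hypothetical miniature IR (`Value`, `Op`) and represents types as strings. As in the patch, binary operations take the type of their first operand, compares yield `i1`, and selects take the type of their true operand. Every result is cached per value.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical miniature IR: a value is either a live-in with a known type
// or an operation whose result type is inferred from its operands.
enum class Op { LiveIn, Add, ICmp, Select };

struct Value {
  Op Kind;
  std::string LiveInType;             // only meaningful for LiveIn
  std::vector<const Value *> Operands;
};

class TypeAnalysis {
  std::unordered_map<const Value *, std::string> CachedTypes;

public:
  std::string inferType(const Value *V) {
    auto It = CachedTypes.find(V);
    if (It != CachedTypes.end())
      return It->second; // already inferred
    std::string Ty;
    switch (V->Kind) {
    case Op::LiveIn:
      Ty = V->LiveInType;
      break;
    case Op::Add:
      // Binary ops: both operands must agree; the result has their type.
      Ty = inferType(V->Operands[0]);
      assert(Ty == inferType(V->Operands[1]) && "operand types differ");
      break;
    case Op::ICmp:
      Ty = "i1"; // compares produce i1 (scalar view)
      break;
    case Op::Select:
      Ty = inferType(V->Operands[1]); // type of the true value
      break;
    }
    CachedTypes[V] = Ty;
    return Ty;
  }
};
```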

llvmbot · 2023-10-01T21:11:04Z

@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-llvm-transforms

Changes

Patch is 26.93 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/67934.diff

7 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/CMakeLists.txt (+1)
(modified) llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h (+4)
(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+119-14)
(modified) llvm/lib/Transforms/Vectorize/VPlan.h (+29-3)
(added) llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (+225)
(added) llvm/lib/Transforms/Vectorize/VPlanAnalysis.h (+56)
(modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+92)

diff --git a/llvm/lib/Transforms/Vectorize/CMakeLists.txt b/llvm/lib/Transforms/Vectorize/CMakeLists.txt
index 998dfd956575d3c..9674094024b9ec7 100644
--- a/llvm/lib/Transforms/Vectorize/CMakeLists.txt
+++ b/llvm/lib/Transforms/Vectorize/CMakeLists.txt
@@ -6,6 +6,7 @@ add_llvm_component_library(LLVMVectorize
   Vectorize.cpp
   VectorCombine.cpp
   VPlan.cpp
+  VPlanAnalysis.cpp
   VPlanHCFGBuilder.cpp
   VPlanRecipes.cpp
   VPlanSLP.cpp
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
index 9691e1cd4f2ed00..08142fa014c178d 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
@@ -316,6 +316,8 @@ class LoopVectorizationPlanner {
   /// A builder used to construct the current plan.
   VPBuilder Builder;
 
+  InstructionCost computeCost(VPlan &Plan, ElementCount VF);
+
 public:
   LoopVectorizationPlanner(Loop *L, LoopInfo *LI, const TargetLibraryInfo *TLI,
                            const TargetTransformInfo &TTI,
@@ -339,6 +341,8 @@ class LoopVectorizationPlanner {
   /// Return the best VPlan for \p VF.
   VPlan &getBestPlanFor(ElementCount VF) const;
 
+  std::pair<VPlan &, ElementCount> getBestPlan();
+
   /// Generate the IR code for the body of the vectorized loop according to the
   /// best selected \p VF, \p UF and VPlan \p BestPlan.
   /// TODO: \p IsEpilogueVectorization is needed to avoid issues due to epilogue
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index cc17d91d4f43727..b34d11e516ebbc3 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -1679,21 +1679,11 @@ class LoopVectorizationCostModel {
   /// of elements.
   ElementCount getMaxLegalScalableVF(unsigned MaxSafeElements);
 
-  /// Returns the execution time cost of an instruction for a given vector
-  /// width. Vector width of one means scalar.
-  VectorizationCostTy getInstructionCost(Instruction *I, ElementCount VF);
-
   /// The cost-computation logic from getInstructionCost which provides
   /// the vector type as an output parameter.
   InstructionCost getInstructionCost(Instruction *I, ElementCount VF,
                                      Type *&VectorTy);
 
-  /// Return the cost of instructions in an inloop reduction pattern, if I is
-  /// part of that pattern.
-  std::optional<InstructionCost>
-  getReductionPatternCost(Instruction *I, ElementCount VF, Type *VectorTy,
-                          TTI::TargetCostKind CostKind);
-
   /// Calculate vectorization cost of memory instruction \p I.
   InstructionCost getMemoryInstructionCost(Instruction *I, ElementCount VF);
 
@@ -1839,6 +1829,15 @@ class LoopVectorizationCostModel {
   }
 
 public:
+  /// Returns the execution time cost of an instruction for a given vector
+  /// width. Vector width of one means scalar.
+  VectorizationCostTy getInstructionCost(Instruction *I, ElementCount VF);
+  /// Return the cost of instructions in an inloop reduction pattern, if I is
+  /// part of that pattern.
+  std::optional<InstructionCost>
+  getReductionPatternCost(Instruction *I, ElementCount VF, Type *VectorTy,
+                          TTI::TargetCostKind CostKind);
+
   /// The loop that we evaluate.
   Loop *TheLoop;
 
@@ -5369,7 +5368,7 @@ VectorizationFactor LoopVectorizationPlanner::selectVectorizationFactor(
             ? Candidate.Width.getKnownMinValue() * AssumedMinimumVscale
             : Candidate.Width.getFixedValue();
     LLVM_DEBUG(dbgs() << "LV: Vector loop of width " << i
-                      << " costs: " << (Candidate.Cost / Width));
+                      << " costs: " << Candidate.Cost / Width);
     if (i.isScalable())
       LLVM_DEBUG(dbgs() << " (assuming a minimum vscale of "
                         << AssumedMinimumVscale << ")");
@@ -7529,6 +7528,108 @@ LoopVectorizationPlanner::plan(ElementCount UserVF, unsigned UserIC) {
   return VF;
 }
 
+InstructionCost LoopVectorizationPlanner::computeCost(VPlan &Plan,
+                                                      ElementCount VF) {
+  InstructionCost Cost = 0;
+
+  VPBasicBlock *Header =
+      cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getEntry());
+
+  // Cost modeling for inductions is inaccurate in the legacy cost model. Try as
+  // to match it here initially during VPlan cost model bring up:
+  // * VPWidenIntOrFpInductionRecipes implement computeCost,
+  // * VPWidenPointerInductionRecipe costs seem to be 0 in the legacy cost model
+  // * other inductions only have a cost of 1 (i.e. the cost of the scalar
+  // induction increment).
+  unsigned NumWideIVs = count_if(Header->phis(), [](VPRecipeBase &R) {
+    return isa<VPWidenPointerInductionRecipe>(&R) ||
+           (isa<VPWidenIntOrFpInductionRecipe>(&R) &&
+            !cast<VPWidenIntOrFpInductionRecipe>(&R)->getTruncInst());
+  });
+  Cost += Legal->getInductionVars().size() - NumWideIVs;
+
+  for (VPBlockBase *Block : to_vector(vp_depth_first_shallow(Header))) {
+    if (auto *Region = dyn_cast<VPRegionBlock>(Block)) {
+      assert(Region->isReplicator());
+      VPBasicBlock *Then =
+          cast<VPBasicBlock>(Region->getEntry()->getSuccessors()[0]);
+      for (VPRecipeBase &R : *Then) {
+        if (isa<VPInstruction, VPScalarIVStepsRecipe>(&R))
+          continue;
+        auto *RepR = cast<VPReplicateRecipe>(&R);
+        Cost += CM.getInstructionCost(RepR->getUnderlyingInstr(), VF).first;
+      }
+      continue;
+    }
+
+    VPCostContext Ctx(CM.TTI, OrigLoop->getHeader()->getContext());
+    for (VPRecipeBase &R : *cast<VPBasicBlock>(Block)) {
+      InstructionCost RecipeCost = R.computeCost(VF, Ctx);
+      if (!RecipeCost.isValid()) {
+        if (auto *IG = dyn_cast<VPInterleaveRecipe>(&R)) {
+          RecipeCost = CM.getInstructionCost(IG->getInsertPos(), VF).first;
+        } else if (auto *WidenMem =
+                       dyn_cast<VPWidenMemoryInstructionRecipe>(&R)) {
+          RecipeCost =
+              CM.getInstructionCost(&WidenMem->getIngredient(), VF).first;
+        } else if (auto *I = dyn_cast_or_null<Instruction>(
+                       R.getVPSingleValue()->getUnderlyingValue()))
+          RecipeCost = CM.getInstructionCost(I, VF).first;
+        else
+          continue;
+      }
+      if (ForceTargetInstructionCost.getNumOccurrences() > 0)
+        RecipeCost = InstructionCost(ForceTargetInstructionCost);
+
+      LLVM_DEBUG({
+        dbgs() << "Cost of " << RecipeCost << " for " << VF << ": ";
+        R.dump();
+      });
+      Cost += RecipeCost;
+    }
+  }
+  Cost += 1;
+  LLVM_DEBUG(dbgs() << "Cost for " << VF << ": " << Cost << "\n");
+  return Cost;
+}
+
+std::pair<VPlan &, ElementCount> LoopVectorizationPlanner::getBestPlan() {
+  // If there is a single VPlan with a single VF, return it directly.
+  if (VPlans.size() == 1 && size(VPlans[0]->vectorFactors()) == 1) {
+    ElementCount VF = *VPlans[0]->vectorFactors().begin();
+    return {*VPlans[0], VF};
+  }
+
+  VPlan *BestPlan = &*VPlans[0];
+  assert(hasPlanWithVF(ElementCount::getFixed(1)));
+  ElementCount BestVF = ElementCount::getFixed(1);
+  InstructionCost ScalarCost = computeCost(
+      getBestPlanFor(ElementCount::getFixed(1)), ElementCount::getFixed(1));
+  InstructionCost BestCost = ScalarCost;
+  bool ForceVectorization = Hints.getForce() == LoopVectorizeHints::FK_Enabled;
+  if (ForceVectorization) {
+    // Ignore scalar width, because the user explicitly wants vectorization.
+    // Initialize cost to max so that VF = 2 is, at least, chosen during cost
+    // evaluation.
+    BestCost = InstructionCost::getMax();
+  }
+
+  for (auto &P : VPlans) {
+    for (ElementCount VF : P->vectorFactors()) {
+      if (VF.isScalar())
+        continue;
+      InstructionCost Cost = computeCost(*P, VF);
+      if (isMoreProfitable(VectorizationFactor(VF, Cost, ScalarCost),
+                           VectorizationFactor(BestVF, BestCost, ScalarCost))) {
+        BestCost = Cost;
+        BestVF = VF;
+        BestPlan = &*P;
+      }
+    }
+  }
+  return {*BestPlan, BestVF};
+}
+
 VPlan &LoopVectorizationPlanner::getBestPlanFor(ElementCount VF) const {
   assert(count_if(VPlans,
                   [VF](const VPlanPtr &Plan) { return Plan->hasVF(VF); }) ==
@@ -8595,7 +8696,7 @@ VPRecipeBuilder::tryToCreateWidenRecipe(Instruction *Instr,
         new VPWidenCastRecipe(CI->getOpcode(), Operands[0], CI->getType(), CI));
   }
 
-  return toVPRecipeResult(tryToWiden(Instr, Operands, VPBB, Plan));
+  return toVPRecipeResult(tryToWiden(Instr, Operands, VPBB, Plan));
 }
 
 void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
@@ -10161,8 +10262,12 @@ bool LoopVectorizePass::processLoop(Loop *L) {
                                VF.MinProfitableTripCount, IC, &LVL, &CM, BFI,
                                PSI, Checks);
 
-        VPlan &BestPlan = LVP.getBestPlanFor(VF.Width);
-        LVP.executePlan(VF.Width, IC, BestPlan, LB, DT, false);
+        const auto &[BestPlan, Width] = LVP.getBestPlan();
+        LLVM_DEBUG(dbgs() << "VF picked by VPlan cost model: " << Width
+                          << "\n");
+        assert(VF.Width == Width &&
+               "VPlan cost model and legacy cost model disagreed");
+        LVP.executePlan(Width, IC, BestPlan, LB, DT, false);
         ++LoopsVectorized;
 
         // Add metadata to disable runtime unrolling a scalar loop when there
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index e65a7ab2cd028ee..02d93915e3c8d6e 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -23,6 +23,7 @@
 #ifndef LLVM_TRANSFORMS_VECTORIZE_VPLAN_H
 #define LLVM_TRANSFORMS_VECTORIZE_VPLAN_H
 
+#include "VPlanAnalysis.h"
 #include "VPlanValue.h"
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/MapVector.h"
@@ -38,6 +39,7 @@
 #include "llvm/IR/DebugLoc.h"
 #include "llvm/IR/FMF.h"
 #include "llvm/IR/Operator.h"
+#include "llvm/Support/InstructionCost.h"
 #include <algorithm>
 #include <cassert>
 #include <cstddef>
@@ -697,6 +699,14 @@ class VPLiveOut : public VPUser {
 #endif
 };
 
+struct VPCostContext {
+  const TargetTransformInfo &TTI;
+  VPTypeAnalysis Types;
+
+  VPCostContext(const TargetTransformInfo &TTI, LLVMContext &Ctx)
+      : TTI(TTI), Types(Ctx) {}
+};
+
 /// VPRecipeBase is a base class modeling a sequence of one or more output IR
 /// instructions. VPRecipeBase owns the VPValues it defines through VPDef
 /// and is responsible for deleting its defined values. Single-value
@@ -762,6 +772,10 @@ class VPRecipeBase : public ilist_node_with_parent<VPRecipeBase, VPBasicBlock>,
   /// \returns an iterator pointing to the element after the erased one
   iplist<VPRecipeBase>::iterator eraseFromParent();
 
+  virtual InstructionCost computeCost(ElementCount VF, VPCostContext &Ctx) {
+    return InstructionCost::getInvalid();
+  }
+
   /// Returns the underlying instruction, if the recipe is a VPValue or nullptr
   /// otherwise.
   Instruction *getUnderlyingInstr() {
@@ -1167,6 +1181,10 @@ class VPWidenRecipe : public VPRecipeWithIRFlags, public VPValue {
   /// Produce widened copies of all Ingredients.
   void execute(VPTransformState &State) override;
 
+  unsigned getOpcode() const { return Opcode; }
+
+  InstructionCost computeCost(ElementCount VF, VPCostContext &Ctx) override;
+
 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
   /// Print the recipe.
   void print(raw_ostream &O, const Twine &Indent,
@@ -1458,9 +1476,11 @@ class VPWidenIntOrFpInductionRecipe : public VPHeaderPHIRecipe {
   bool isCanonical() const;
 
   /// Returns the scalar type of the induction.
-  const Type *getScalarType() const {
+  Type *getScalarType() const {
     return Trunc ? Trunc->getType() : IV->getType();
   }
+
+  InstructionCost computeCost(ElementCount VF, VPCostContext &Ctx) override;
 };
 
 class VPWidenPointerInductionRecipe : public VPHeaderPHIRecipe {
@@ -1747,6 +1767,8 @@ class VPInterleaveRecipe : public VPRecipeBase {
            "Op must be an operand of the recipe");
     return Op == getAddr() && !llvm::is_contained(getStoredValues(), Op);
   }
+
+  Instruction *getInsertPos() const { return IG->getInsertPos(); }
 };
 
 /// A recipe to represent inloop reduction operations, performing a reduction on
@@ -2080,7 +2102,7 @@ class VPCanonicalIVPHIRecipe : public VPHeaderPHIRecipe {
 #endif
 
   /// Returns the scalar type of the induction.
-  const Type *getScalarType() const {
+  Type *getScalarType() const {
     return getOperand(0)->getLiveInIRValue()->getType();
   }
 
@@ -2149,7 +2171,7 @@ class VPWidenCanonicalIVRecipe : public VPRecipeBase, public VPValue {
 #endif
 
   /// Returns the scalar type of the induction.
-  const Type *getScalarType() const {
+  Type *getScalarType() const {
     return cast<VPCanonicalIVPHIRecipe>(getOperand(0)->getDefiningRecipe())
         ->getScalarType();
   }
@@ -2596,6 +2618,10 @@ class VPlan {
 
   bool hasVF(ElementCount VF) { return VFs.count(VF); }
 
+  iterator_range<SmallSetVector<ElementCount, 2>::iterator> vectorFactors() {
+    return {VFs.begin(), VFs.end()};
+  }
+
   bool hasScalarVFOnly() const { return VFs.size() == 1 && VFs[0].isScalar(); }
 
   bool hasUF(unsigned UF) const { return UFs.empty() || UFs.contains(UF); }
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
new file mode 100644
index 000000000000000..088da81f950425c
--- /dev/null
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -0,0 +1,225 @@
+//===- VPlanAnalysis.cpp - Various Analyses working on VPlan ----*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "VPlanAnalysis.h"
+#include "VPlan.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "vplan"
+
+Type *VPTypeAnalysis::inferType(const VPBlendRecipe *R) {
+  return inferType(R->getIncomingValue(0));
+}
+
+Type *VPTypeAnalysis::inferType(const VPInstruction *R) {
+  switch (R->getOpcode()) {
+  case Instruction::Select:
+    return inferType(R->getOperand(1));
+  case VPInstruction::FirstOrderRecurrenceSplice:
+    return inferType(R->getOperand(0));
+  default:
+    llvm_unreachable("Unhandled instruction!");
+  }
+}
+
+Type *VPTypeAnalysis::inferType(const VPInterleaveRecipe *R) { return nullptr; }
+
+Type *VPTypeAnalysis::inferType(const VPReductionPHIRecipe *R) {
+  return R->getOperand(0)->getLiveInIRValue()->getType();
+}
+
+Type *VPTypeAnalysis::inferType(const VPWidenRecipe *R) {
+  unsigned Opcode = R->getOpcode();
+  switch (Opcode) {
+  case Instruction::ICmp:
+  case Instruction::FCmp:
+    return IntegerType::get(Ctx, 1);
+  case Instruction::UDiv:
+  case Instruction::SDiv:
+  case Instruction::SRem:
+  case Instruction::URem:
+  case Instruction::Add:
+  case Instruction::FAdd:
+  case Instruction::Sub:
+  case Instruction::FSub:
+  case Instruction::FNeg:
+  case Instruction::Mul:
+  case Instruction::FMul:
+  case Instruction::FDiv:
+  case Instruction::FRem:
+  case Instruction::Shl:
+  case Instruction::LShr:
+  case Instruction::AShr:
+  case Instruction::And:
+  case Instruction::Or:
+  case Instruction::Xor: {
+    Type *ResTy = inferType(R->getOperand(0));
+    if (Opcode != Instruction::FNeg) {
+      assert(ResTy == inferType(R->getOperand(1)));
+      CachedTypes[R->getOperand(1)] = ResTy;
+    }
+    return ResTy;
+  }
+  case Instruction::Freeze:
+    return inferType(R->getOperand(0));
+  default:
+    // This instruction is not vectorized by simple widening.
+    //    LLVM_DEBUG(dbgs() << "LV: Found an unhandled instruction: " << I);
+    llvm_unreachable("Unhandled instruction!");
+  }
+
+  return nullptr;
+}
+
+Type *VPTypeAnalysis::inferType(const VPWidenCallRecipe *R) {
+  auto &CI = *cast<CallInst>(R->getUnderlyingInstr());
+  return CI.getType();
+}
+
+Type *VPTypeAnalysis::inferType(const VPWidenIntOrFpInductionRecipe *R) {
+  return R->getScalarType();
+}
+
+Type *VPTypeAnalysis::inferType(const VPWidenMemoryInstructionRecipe *R) {
+  if (R->isStore())
+    return cast<StoreInst>(&R->getIngredient())->getValueOperand()->getType();
+
+  return cast<LoadInst>(&R->getIngredient())->getType();
+}
+
+Type *VPTypeAnalysis::inferType(const VPWidenSelectRecipe *R) {
+  return inferType(R->getOperand(1));
+}
+
+Type *VPTypeAnalysis::inferType(const VPReplicateRecipe *R) {
+  switch (R->getUnderlyingInstr()->getOpcode()) {
+  case Instruction::Call: {
+    unsigned CallIdx = R->getNumOperands() - (R->isPredicated() ? 2 : 1);
+    return cast<Function>(R->getOperand(CallIdx)->getLiveInIRValue())
+        ->getReturnType();
+  }
+  case Instruction::UDiv:
+  case Instruction::SDiv:
+  case Instruction::SRem:
+  case Instruction::URem:
+  case Instruction::Add:
+  case Instruction::FAdd:
+  case Instruction::Sub:
+  case Instruction::FSub:
+  case Instruction::FNeg:
+  case Instruction::Mul:
+  case Instruction::FMul:
+  case Instruction::FDiv:
+  case Instruction::FRem:
+  case Instruction::Shl:
+  case Instruction::LShr:
+  case Instruction::AShr:
+  case Instruction::And:
+  case Instruction::Or:
+  case Instruction::Xor:
+  case Instruction::ICmp:
+  case Instruction::FCmp: {
+    Type *ResTy = inferType(R->getOperand(0));
+    assert(ResTy == inferType(R->getOperand(1)));
+    CachedTypes[R->getOperand(1)] = ResTy;
+    return ResTy;
+  }
+  case Instruction::Trunc:
+  case Instruction::SExt:
+  case Instruction::ZExt:
+  case Instruction::FPExt:
+  case Instruction::FPTrunc:
+    return R->getUnderlyingInstr()->getType();
+  case Instruction::ExtractValue: {
+    return R->getUnderlyingValue()->getType();
+  }
+  case Instruction::Freeze:
+    return inferType(R->getOperand(0));
+  case Instruction::Load:
+    return cast<LoadInst>(R->getUnderlyingInstr())->getType();
+  case Instruction::Store:
+    return cast<StoreInst>(R->getUnderlyingInstr())
+        ->getValueOperand()
+        ->getType();
+  default:
+    llvm_unreachable("Unhandled instruction");
+  }
+
+  return nullptr;
+}
+
+Type *VPTypeAnalysis::inferType(const VPValue *V) {
+  auto Iter = CachedTypes.find(V);
+  if (Iter != CachedTypes.end())
+    return Iter->second;
+
+  Type *ResultTy = nullptr;
+  if (V->isLiveIn())
+    ResultTy = V->getLiveInIRValue()->getType();
+  else {
+    const VPRecipeBase *Def = V->getDefiningRecipe();
+    switch (Def->getVPDefID()) {
+    case VPDef::VPBlendSC:
+      ResultTy = inferType(cast<VPBlendRecipe>(Def));
+      break;
+    case VPDef::VPCanonicalIVPHISC:
+      ResultTy = cast<VPCanonicalIVPHIRecipe>(Def)->getScalarType();
+      break;
+    case VPDef::VPFirstOrderRecurrencePHISC:
+      ResultTy = Def->getOperand(0)->getLiveInIRValue()->getType();
+      break;
+    case VPDef::VPInstructionSC:
+      ResultTy = inferType(cast<VPInstruction>(Def));
+      break;
+    case VPDef::VPInterleaveSC:
+      ResultTy = V->getUnderlyingValue()
+                     ->getType(); // inferType(cast<VPInterleaveRecipe>(Def));
+      break;
+    case VPDef::VPPredInstPHISC:
+      ResultTy = inferType(Def->getOperand(0));
+      break;
+    case VPDef::VPReductionPHISC:
+      ResultTy = inferType(cast<VPReductionPHIRecipe>(Def));
+      break;
+    case VPDef::VPReplicateSC:
+      ResultTy = inferType(cast<VPReplicateRecipe>(Def));
+      break;
+    case VPDef::VPScalarIVStepsSC:
+      return inferType(Def->getOperand(0));
+      break;
+    case VPDef::VPWidenSC:
+      ResultTy = inferType(cast<VPWidenRecipe>(Def));
+      break;
+    case VPDef::VPWidenPHISC:
+      return inferType(Def->getOperand(0));
+    case VPDef::VPWidenPointerInductionSC:
+      return inferType(Def->getOperand(0));
+    case VPDef::VPWidenCallSC:
+      ResultTy = inferType(cast<VPWidenCallRecipe>(Def));
+      break;
+    case VPDef::VPWidenCastSC:
+      ResultTy = cast<VPWidenCastRecipe>(Def)->getResultType();
+      break;
+    case VPDef::VPWidenGEPSC:
+      ResultTy = PointerType::get(Ctx, 0);
+      break;
+    case VPDef::VPWidenIntOrFpInductionSC:
+      ResultTy = inferType(cast<VPWidenIntOrFpInductionRecipe>(Def));
+      break;
+    case VPDef::VPWidenMemory...
[truncated]

arcbbb · 2023-11-06T09:13:01Z

I am trying to mitigate the cost difference caused by removeDeadRecipes(), since the legacy cost model still counts them.
In my implementation, arcbbb@cba398c,
I collect all scalar costs that were not seen in previously visited recipes.
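One possible reading of this approach is sketched below, using hypothetical integer instruction IDs. Recipe costs are summed first. Then every instruction that the legacy model counts but no visited recipe covers is charged at its legacy cost. `VisitedRecipe` and `costWithUnseen` are illustrative names and are not taken from arcbbb@cba398c.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical record of a visited recipe: the instruction it covers and
// the cost computed for the recipe itself.
struct VisitedRecipe {
  int Instr;
  unsigned Cost;
};

// Sums recipe costs, then adds the legacy cost of every instruction the
// legacy model counts that no visited recipe covered (e.g. because its
// recipe was removed as dead).
unsigned costWithUnseen(const std::vector<VisitedRecipe> &Visited,
                        const std::map<int, unsigned> &LegacyCosts) {
  std::set<int> Seen;
  unsigned Total = 0;
  for (const VisitedRecipe &R : Visited) {
    Total += R.Cost;
    Seen.insert(R.Instr);
  }
  for (const auto &[Instr, C] : LegacyCosts)
    if (!Seen.count(Instr))
      Total += C; // not seen in any visited recipe
  return Total;
}
```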

Add test coverage for cost-model code-paths not covered by current unit tests in preparation for #67934.

fhahn · 2024-04-28T12:33:02Z

The latest update of the PR includes computing the costs of all VPlans for their associated VFs and then picking the best one. In particular, this also now includes computing costs of replicate regions.

In the initial version, the VPlan-based cost-model first asks the recipe for its cost (via computeCost). If that returns an invalid cost, it falls back to the legacy cost model. Initially, VPWidenRecipe::computeCost can compute the costs for almost all opcodes, except UDiv, SDiv, URem, SRem and recipes in reduction chains, which have more complex logic in the legacy cost-model.

I tested the latest version on a range of configurations and code-bases (llvm-test-suite + SPEC2017 + Clang bootstrap on AArch64 with and without SVE, with and without -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue, and X86 with AVX512), and the legacy and VPlan cost-models agree on the selected VF in all cases (there are 2 cases where the legacy cost model computes the cost inaccurately, which I'll submit fixes for shortly).

I separately added a number of test cases for loops where the two models disagreed before. There may still be cases where the assertion triggers due to missing coverage. It may also trigger in hand-written test cases that contain dead code, which VPlan transforms remove before computing the cost (at the moment this causes Transforms/LoopVectorize/ARM/tail-folding-counting-down.ll to fail), so we may want to either remove the assert or guard it by an option.

Another thing to note is that during cast simplifications we preserve the underlying instruction, so we can still use the legacy cost-model for the casts; otherwise we would also need to implement costing for casts directly. This is an area where there may be some differences between the legacy and VPlan-based cost-models, because the latter has more accurate information.

Going forward I think we should gradually move cost computation to the VPlan-based model and allow divergence as needed when the VPlan-based model more accurately estimates cost.

rengolin

A few comments; I'll let Gil/Ayal do the proper review.

rengolin · 2024-04-28T17:14:20Z

llvm/lib/Transforms/Vectorize/VPlan.h

@@ -2071,6 +2085,8 @@ class VPInterleaveRecipe : public VPRecipeBase {
           "Op must be an operand of the recipe");
    return Op == getAddr() && !llvm::is_contained(getStoredValues(), Op);
  }
+
+  Instruction *getInsertPos() const { return IG->getInsertPos(); }


Is this really used?

Cannot see where.

It's used in computeCostForRecipe at the moment

rengolin · 2024-04-28T17:17:08Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

-            new VPWidenCastRecipe(Instruction::CastOps(ExtOpcode), A, TruncTy);
-        VPC->insertBefore(&R);
+        VPValue *VPC;
+        if (auto *UV = R.getOperand(0)->getUnderlyingValue())


nit: couldn't you just set UV to nullptr? Or return nullptr from getUnderlyingValue?

Then this would just be a single call. It took me a second pass to parse the semantics here.

VPWidenCastRecipe would need to have a single constructor accepting a (possibly nullptr) CastInst* as its last parameter, to avoid the choice below.
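The single-constructor shape might look like the sketch below. It uses hypothetical stand-ins `CastInst` and `WidenCastRecipe` rather than the real LLVM classes. The underlying instruction is a defaulted, possibly null pointer. A call site can therefore forward whatever it found, for example via `dyn_cast_or_null`, without branching between two constructors.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an IR cast instruction.
struct CastInst {
  std::string Name;
};

// Hypothetical cast recipe with a single constructor: the underlying
// instruction is optional and defaults to null.
class WidenCastRecipe {
  unsigned Opcode;
  const CastInst *Underlying;

public:
  explicit WidenCastRecipe(unsigned Opc, const CastInst *UI = nullptr)
      : Opcode(Opc), Underlying(UI) {}

  unsigned getOpcode() const { return Opcode; }
  // When present, the legacy cost model could still be queried for it.
  bool hasUnderlyingInstr() const { return Underlying != nullptr; }
};

// Call site: forward whatever was found, possibly null, in one expression
// instead of an if/else choosing between two constructors.
WidenCastRecipe makeCast(unsigned Opcode, const CastInst *MaybeUI) {
  return WidenCastRecipe(Opcode, MaybeUI);
}
```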

Yes, could adjust to this effect.

ayalz

Very nice step forward!!

Making the last decision, namely, selecting which VPlan has the best cost, based on (partially) VPlan-based cost computation, is a good starting point, gradually allowing earlier cost-based decisions to take place along the VPlan-to-VPlan transformation pipeline.

Similar to how code-gen is simplified, modularized and kept consistent by breaking down ILV into VPlan/Region/Block/Recipe::execute() - in a gradual process which still utilizes ILV methods via VPTransformState, could compute-cost be driven by VPlan/Region/Block/Recipe::computeCost - initially utilizing CM methods internally where needed, by passing a CM* in VPCostContext? That should help keep code-gen and its cost aligned and consistent at each scope.
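The suggested hierarchy is sketched roughly below. Each level sums the costs of its children and passes a context object down. `CostContext`, `Recipe`, `Block` and `Region` are hypothetical. The optional `LegacyFallback` callback models the suggestion of making the legacy cost model reachable through the context.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical context threaded through every level, analogous to
// VPCostContext; LegacyFallback may be empty.
struct CostContext {
  unsigned VF;
  std::function<unsigned(int)> LegacyFallback;
};

struct Recipe {
  int UnderlyingInstr = -1; // -1: no underlying instruction
  int OwnCost = -1;         // -1: recipe cannot cost itself yet
  unsigned computeCost(const CostContext &Ctx) const {
    if (OwnCost >= 0)
      return static_cast<unsigned>(OwnCost);
    if (UnderlyingInstr >= 0 && Ctx.LegacyFallback)
      return Ctx.LegacyFallback(UnderlyingInstr);
    return 0;
  }
};

// A block's cost is the sum of its recipes' costs.
struct Block {
  std::vector<Recipe> Recipes;
  unsigned computeCost(const CostContext &Ctx) const {
    unsigned C = 0;
    for (const Recipe &R : Recipes)
      C += R.computeCost(Ctx);
    return C;
  }
};

// A region's cost is the sum of its blocks' costs.
struct Region {
  std::vector<Block> Blocks;
  unsigned computeCost(const CostContext &Ctx) const {
    unsigned C = 0;
    for (const Block &B : Blocks)
      C += B.computeCost(Ctx);
    return C;
  }
};
```

With the fallback behind an optional callback, it is easy to drop later. It is, however, also available to every recipe, which is the coupling concern raised in reply.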

Adding various comments inline after a first pass.

ayalz · 2024-05-06T20:12:56Z

llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h

@@ -361,6 +364,9 @@ class LoopVectorizationPlanner {
  /// Return the best VPlan for \p VF.
  VPlan &getBestPlanFor(ElementCount VF) const;

+  /// Return the most profitable plan.


nit: every plan contains its VF range; reduce the range of the best plan to a single value, instead of passing it alongside? Method should be const?

Marked as const (same as computeCost) and updated to restrict VFs, thanks!
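The sketch below narrows a plan to the chosen VF, so the VF no longer has to be returned alongside the plan. `Plan` and its `setVF` method are hypothetical simplifications that use plain integers for VFs.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical miniature plan carrying the set of VFs it is valid for.
class Plan {
  std::vector<unsigned> VFs;

public:
  explicit Plan(std::vector<unsigned> Init) : VFs(std::move(Init)) {}

  bool hasVF(unsigned VF) const {
    return std::find(VFs.begin(), VFs.end(), VF) != VFs.end();
  }

  // Restrict the plan to the selected VF only.
  void setVF(unsigned VF) {
    assert(hasVF(VF) && "cannot set a VF the plan does not support");
    VFs.assign(1, VF);
  }

  const std::vector<unsigned> &vectorFactors() const { return VFs; }
};
```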

ayalz · 2024-05-07T07:45:36Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+  }
+
+  VPlan *BestPlan = &*VPlans[0];
+  assert(hasPlanWithVF(ElementCount::getFixed(1)));


Suggested change

assert(hasPlanWithVF(ElementCount::getFixed(1)));

ElementCount ScalarVF = ElementCount::getFixed(1);

assert(hasPlanWithVF(ScalarVF) && "More than a single plan/VF w/o any plan having scalar VF");

Done, thanks!

ayalz · 2024-05-07T09:36:22Z

llvm/lib/Transforms/Vectorize/VPlan.h

@@ -699,6 +700,14 @@ class VPLiveOut : public VPUser {
 #endif
 };

+struct VPCostContext {


Document what this is for.

Done, thanks!

ayalz · 2024-05-07T09:38:20Z

llvm/lib/Transforms/Vectorize/VPlan.h

@@ -841,6 +854,7 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPValue {
  static inline bool classof(const VPRecipeBase *R) {
    switch (R->getVPDefID()) {
    case VPRecipeBase::VPDerivedIVSC:
+    case VPRecipeBase::VPEVLBasedIVPHISC:


Independent fix?

Split off to c3d2af0, thanks!

ayalz · 2024-05-07T09:40:30Z

llvm/lib/Transforms/Vectorize/VPlan.h

@@ -1349,6 +1363,8 @@ class VPWidenRecipe : public VPRecipeWithIRFlags {

  unsigned getOpcode() const { return Opcode; }

+  InstructionCost computeCost(ElementCount VF, VPCostContext &Ctx) override;


nit: better placed slightly above, next to execute() - given that the two are closely related.

Moved, thanks!

ayalz · 2024-05-08T19:21:18Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+  for (const auto &[IV, _] : Legal->getInductionVars()) {
+    Instruction *IVInc = cast<Instruction>(
+        IV->getIncomingValueForBlock(OrigLoop->getLoopLatch()));
+    InstructionCost RecipeCost = CM.getInstructionCost(IVInc, VF).first;


The use of "Recipe" may be confusing as no recipes are involved here, IVInc is an underlying Instruction.

Updated, thanks!

ayalz · 2024-05-08T19:24:59Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+      IVInc->dump();
+    });
+    Cost += RecipeCost;
+    SeenUI.insert(IVInc);


"SeenUI" may be confusing, it stands for both having pre-accounted for its cost here, and later whenever encountering a recipe with an underlying Instruction?

If IVInc is left for the regular scan over recipes, will its cost be computed differently than RecipeCost above?

Should reduction chains also be traversed and marked to compute their cost?

"SeenUI" may be confusing, it stands for both having pre-accounted for its cost here, and later whenever encountering a recipe with an underlying Instruction?

Renamed to SkipCostComputation to help clarify, and also removed the code to add all underlying instructions, as it should not be needed.

If IVInc is left for the regular scan over recipes, will its cost be computed differently than RecipeCost above?

The reason this is done as pre-processing step is that the VPlan may not have any recipes associated with the original induction increment instruction.

Should reduction chains also be traversed and marked to compute their cost?
Done, thanks!
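The pre-pass plus skip set described here is sketched below with hypothetical integer instruction IDs. Induction increments are charged up front from the legacy costs, because the plan may have no recipe for them, and are recorded in `SkipCostComputation`. The main scan then skips any recipe whose underlying instruction is already recorded, so nothing is counted twice. `RecipeInfo` and `computePlanCost` are illustrative names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical recipe summary: underlying instruction ID (-1 if none) and
// the cost the recipe would contribute.
struct RecipeInfo {
  int UnderlyingInstr;
  unsigned Cost;
};

unsigned computePlanCost(const std::vector<int> &IVIncrements,
                         const std::map<int, unsigned> &LegacyCost,
                         const std::vector<RecipeInfo> &Recipes) {
  std::set<int> SkipCostComputation;
  unsigned Total = 0;
  // Pre-pass: charge induction increments via the legacy costs, whether or
  // not the plan still contains recipes for them.
  for (int Inc : IVIncrements) {
    Total += LegacyCost.at(Inc);
    SkipCostComputation.insert(Inc);
  }
  // Main scan: skip recipes whose underlying instruction was already charged.
  for (const RecipeInfo &R : Recipes) {
    if (R.UnderlyingInstr >= 0 && SkipCostComputation.count(R.UnderlyingInstr))
      continue;
    Total += R.Cost;
  }
  return Total;
}
```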

ayalz · 2024-05-08T19:28:11Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+      cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getEntry());
+  for (VPBlockBase *Block : to_vector(vp_depth_first_shallow(Header))) {
+    if (auto *Region = dyn_cast<VPRegionBlock>(Block)) {
+      Cost += computeCostForReplicatorRegion(Region, VF, SeenUI, CM, CM.TTI,


This should ideally be a VPRegionBlock::computeCost(...) method?

I deliberately did not make this VPRegionBlock::computeCost(...), to avoid leaking/polluting the VPlan-based bits with the legacy cost-model, which may make it tempting to rely on.

ayalz · 2024-05-08T19:33:05Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+  VPValue *Cond = BOM->getOperand(0);
+
+  // Check if Cond is a uniform compare.
+  auto IsUniformCompare = [Cond]() {


Deserves to be more generally available.

Moved to vputils, thanks!

ayalz · 2024-05-08T19:34:26Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+      IsUniformCompare ||
+      match(Cond, m_ActiveLaneMask(m_VPValue(), m_VPValue())) ||
+      match(Cond, m_Binary<Instruction::ICmp>(m_VPValue(), m_VPValue())) ||
+      isa<VPActiveLaneMaskPHIRecipe>(Cond);


Deserves to use getHeaderMask();

At the moment, there's collectAllHeaderMasks, but it only collects the compare with wide canonical IV; we would need a variant that collects the multiple specialized variants, left as is for now.
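The difference between a collector that only recognizes the wide canonical IV compare and one that also accepts the specialized header-mask forms can be sketched as follows. `MaskShape`, `Cond` and both collector functions are hypothetical illustrations, not the actual `collectAllHeaderMasks` API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical shapes a header mask may take after different VPlan
// transforms; names are illustrative, not real VPlan classes.
enum class MaskShape {
  WideCanonicalIVCompare, // compare of the widened canonical IV
  ActiveLaneMask,         // active-lane-mask(index, trip count)
  ActiveLaneMaskPhi,      // phi carrying the lane mask between iterations
  Other
};

struct Cond {
  int Id;
  MaskShape Shape;
};

// Narrow collector: only the canonical-IV compare form.
std::vector<int> collectCanonicalHeaderMasks(const std::vector<Cond> &Conds) {
  std::vector<int> Res;
  for (const Cond &C : Conds)
    if (C.Shape == MaskShape::WideCanonicalIVCompare)
      Res.push_back(C.Id);
  return Res;
}

// Broader variant that also accepts the specialized forms.
std::vector<int> collectHeaderMaskVariants(const std::vector<Cond> &Conds) {
  std::vector<int> Res;
  for (const Cond &C : Conds)
    if (C.Shape != MaskShape::Other)
      Res.push_back(C.Id);
  return Res;
}
```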


fhahn

I think I forgot to mention this for completeness, but this depends on #89386

Similar to how code-gen is simplified, modularized and kept consistent by breaking down ILV into VPlan/Region/Block/Recipe::execute() (in a gradual process which still utilizes ILV methods via VPTransformState), could the cost computation be driven by VPlan/Region/Block/Recipe::computeCost, initially utilizing CM methods internally where needed by passing a CM* in VPCostContext? That should help keep code-gen and its cost aligned and consistent at each scope.

The patch intentionally avoided making CM part of VPCostContext to keep a clear separation between VPlan-based and legacy costs: this avoids leaking information from the legacy cost model and avoids introducing new uses of it at this point.
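One way to keep VPlan-based and legacy costs separate, sketched under stated assumptions: a hypothetical `CostContext` carries only target-level information, recipes that already implement `computeCost` return a value, and unmigrated recipes return `std::nullopt` so that the caller, not the context, supplies the legacy fallback. `RecipeBase`, `WidenRecipe` and `planCost` are illustrative names only.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <vector>

// Hypothetical cost context: target-level information only, no legacy model.
struct CostContext {
  unsigned VectorOpCost; // cost the target reports for one vector operation
};

struct RecipeBase {
  explicit RecipeBase(int Underlying = -1) : UnderlyingId(Underlying) {}
  virtual ~RecipeBase() = default;
  // Recipes not yet migrated to VPlan-based costing return std::nullopt.
  virtual std::optional<unsigned> computeCost(unsigned /*VF*/,
                                              const CostContext & /*Ctx*/) const {
    return std::nullopt;
  }
  int UnderlyingId; // id of the original instruction, -1 if none
};

// A migrated recipe computes its cost purely from the context.
struct WidenRecipe : RecipeBase {
  std::optional<unsigned> computeCost(unsigned /*VF*/,
                                      const CostContext &Ctx) const override {
    return Ctx.VectorOpCost;
  }
};

// The caller owns the legacy fallback, so the context never exposes it.
template <typename LegacyFn>
unsigned planCost(const std::vector<std::unique_ptr<RecipeBase>> &Recipes,
                  unsigned VF, const CostContext &Ctx, LegacyFn Legacy) {
  unsigned Cost = 0;
  for (const auto &R : Recipes) {
    if (std::optional<unsigned> C = R->computeCost(VF, Ctx))
      Cost += *C;
    else
      Cost += Legacy(R->UnderlyingId, VF);
  }
  return Cost;
}
```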

fhahn · 2024-05-09T16:02:09Z

llvm/lib/Transforms/Vectorize/VPlan.h

@@ -2071,6 +2085,8 @@ class VPInterleaveRecipe : public VPRecipeBase {
           "Op must be an operand of the recipe");
    return Op == getAddr() && !llvm::is_contained(getStoredValues(), Op);
  }
+
+  Instruction *getInsertPos() const { return IG->getInsertPos(); }


It's used in computeCostForRecipe at the moment

fhahn · 2024-05-09T17:08:26Z

llvm/lib/Transforms/Vectorize/VPlan.h

@@ -3182,6 +3198,10 @@ class VPlan {
    return any_of(VFs, [](ElementCount VF) { return VF.isScalable(); });
  }

+  iterator_range<SmallSetVector<ElementCount, 2>::iterator> vectorFactors() {


Done, thanks!

fhahn · 2024-05-09T17:11:01Z

llvm/lib/Transforms/Vectorize/VPlan.h

@@ -1349,6 +1363,8 @@ class VPWidenRecipe : public VPRecipeWithIRFlags {

  unsigned getOpcode() const { return Opcode; }

+  InstructionCost computeCost(ElementCount VF, VPCostContext &Ctx) override;


Moved, thanks!

fhahn · 2024-05-09T17:23:55Z

llvm/lib/Transforms/Vectorize/VPlan.h

@@ -699,6 +700,14 @@ class VPLiveOut : public VPUser {
 #endif
 };

+struct VPCostContext {


Done, thanks!

fhahn · 2024-05-09T17:26:30Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+  // If there is a single VPlan with a single VF, return it directly.
+  if (VPlans.size() == 1 && size(VPlans[0]->vectorFactors()) == 1) {
+    ElementCount VF = *VPlans[0]->vectorFactors().begin();
+    return {*VPlans[0], VF};
+  }
+
+  VPlan *BestPlan = &*VPlans[0];


Done, thanks!
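A hedged sketch of selecting the most profitable (plan, VF) pair. It assumes profitability is compared by cost per lane, using cross-multiplication (`A.Cost * B.VF < B.Cost * A.VF`) to avoid integer division; `Candidate` and `getBestCandidate` are hypothetical. A single candidate is returned as-is, which corresponds to the early exit in the snippet above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical candidate: a plan index, one of its VFs, and the estimated
// cost of one vector-loop iteration of that plan at that VF.
struct Candidate {
  unsigned PlanIdx;
  std::uint64_t VF;
  std::uint64_t Cost;
};

// A is more profitable than B if its cost per lane is lower, i.e.
// A.Cost / A.VF < B.Cost / B.VF, cross-multiplied to avoid division.
bool isMoreProfitable(const Candidate &A, const Candidate &B) {
  return A.Cost * B.VF < B.Cost * A.VF;
}

Candidate getBestCandidate(const std::vector<Candidate> &Cands) {
  assert(!Cands.empty() && "need at least one candidate");
  // With a single candidate the loop below never replaces it.
  Candidate Best = Cands.front();
  for (const Candidate &C : Cands)
    if (isMoreProfitable(C, Best))
      Best = C;
  return Best;
}
```

On ties the earlier candidate is kept, so smaller VFs listed first win.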

fhahn · 2024-05-10T14:32:03Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+  VPValue *Cond = BOM->getOperand(0);
+
+  // Check if Cond is a uniform compare.
+  auto IsUniformCompare = [Cond]() {


Moved to vputils, thanks!

fhahn · 2024-05-10T14:34:19Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+      IsUniformCompare ||
+      match(Cond, m_ActiveLaneMask(m_VPValue(), m_VPValue())) ||
+      match(Cond, m_Binary<Instruction::ICmp>(m_VPValue(), m_VPValue())) ||
+      isa<VPActiveLaneMaskPHIRecipe>(Cond);


At the moment, there's collectAllHeaderMasks, but it only collects the compare with wide canonical IV; we would need a variant that collects the multiple specialized variants, left as is for now.

fhahn · 2024-05-10T14:35:25Z

llvm/lib/Transforms/Vectorize/VPlan.h

@@ -1371,8 +1387,6 @@ class VPWidenCastRecipe : public VPRecipeWithIRFlags {
        ResultTy(ResultTy) {
    assert(UI.getOpcode() == Opcode &&
           "opcode of underlying cast doesn't match");
-    assert(UI.getType() == ResultTy &&
-           "result type of underlying cast doesn't match");


No: we retain the underlying instruction in the narrowed version of the cast, so we can still query the cost model for the underlying instruction even after VP2VP narrowing. This is needed until cast costs are handled completely in VPlan.

fhahn · 2024-05-10T14:35:42Z

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

+                                           VPCostContext &Ctx) {
+  VPWidenRecipe *Cur = this;
+  // Check if the recipe is used in a reduction chain. Let the legacy cost-model
+  // handle that case for now.


Code removed, as reduction chain costs are pre-computed

fhahn · 2024-05-10T14:35:54Z

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

+    if (auto *Next = dyn_cast<VPWidenRecipe>(*Cur->user_begin())) {
+      Cur = Next;
+      continue;
+    }
+    if (isa<VPReductionRecipe>(*Cur->user_begin()))
+      return InstructionCost::getInvalid();
+    break;


Code removed as per comment above, thanks!
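For reference, the check this removed code performed, namely whether a recipe feeds a reduction through a chain of single-user widened recipes, can be sketched with a hypothetical `Node` graph. `feedsReductionChain` is illustrative and assumes the chain is acyclic.

```cpp
#include <cassert>
#include <vector>

// Hypothetical recipe-graph node.
struct Node {
  enum Kind { Widen, Reduction, Other } K;
  std::vector<const Node *> Users;
};

// Returns true if N starts a chain of single-user widen nodes that ends in a
// reduction: the case the removed code deferred to the legacy cost model.
bool feedsReductionChain(const Node *N) {
  const Node *Cur = N;
  while (Cur->Users.size() == 1) {
    const Node *Next = Cur->Users.front();
    if (Next->K == Node::Widen) {
      Cur = Next; // keep following the single-user chain
      continue;
    }
    return Next->K == Node::Reduction;
  }
  return false; // zero or multiple users: not a simple chain
}
```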

fhahn · 2024-05-17T14:24:58Z

I put up an alternative version with most of the logic moved to the ::computeCost functions in VPlan, VPBasicBlock and VPRegionBlock in #92555

fhahn · 2024-05-22T09:06:01Z

Ping. All pending patches have landed now, and I just updated this PR to current main, as well as the one with the alternative structure, #92555.

fhahn · 2024-06-07T11:26:21Z

It sounds like the slightly stripped down version (no cost for VPWidenRecipe for now) is the preferred version: #92555

Closing this one here

@arcbbb

This adds a new interface to compute the cost of recipes, VPBasicBlocks, VPRegionBlocks and VPlan, initially falling back to the legacy cost model for all recipes. Follow-up patches will gradually migrate recipes to compute their own costs step-by-step. It also adds getBestPlan function to LVP which computes the cost of all VPlans and picks the most profitable one together with the most profitable VF. The VPlan selected by the VPlan cost model is executed and there is an assert to catch cases where the VPlan cost model and the legacy cost model disagree. Even though I checked a number of different build configurations on AArch64 and X86, there may be some differences that have been missed. Additional discussions and context can be found in @arcbbb's #67647 and #67934 which is an earlier version of the current PR. PR: #92555
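A minimal sketch of the layered interface described above, in which recipes, basic blocks, regions and the plan each expose a cost at a given VF and each level sums the level below. All types here (`Block`, `BasicBlock`, `RegionBlock`, `Plan`) are simplified stand-ins rather than the VPlan classes; replicate-region scaling and the legacy fallback are deliberately left out.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

// Stand-in for a recipe: a callback returning its cost at a given VF.
using RecipeCostFn = std::function<unsigned(unsigned VF)>;

struct Block {
  virtual ~Block() = default;
  virtual unsigned cost(unsigned VF) const = 0;
};

// A basic block's cost is the sum of its recipes' costs.
struct BasicBlock : Block {
  std::vector<RecipeCostFn> Recipes;
  unsigned cost(unsigned VF) const override {
    unsigned C = 0;
    for (const RecipeCostFn &R : Recipes)
      C += R(VF);
    return C;
  }
};

// A region's cost is the sum of its nested blocks (or regions).
struct RegionBlock : Block {
  std::vector<std::unique_ptr<Block>> Blocks;
  unsigned cost(unsigned VF) const override {
    unsigned C = 0;
    for (const auto &B : Blocks)
      C += B->cost(VF);
    return C;
  }
};

// The plan's cost at a VF is the cost of its top-level loop region.
struct Plan {
  RegionBlock Loop;
  unsigned cost(unsigned VF) const { return Loop.cost(VF); }
};
```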

@arcbbb

This reverts commit 46080ab. Extra tests have been added in 52d29eb. Original message: This adds a new interface to compute the cost of recipes, VPBasicBlocks, VPRegionBlocks and VPlan, initially falling back to the legacy cost model for all recipes. Follow-up patches will gradually migrate recipes to compute their own costs step-by-step. It also adds getBestPlan function to LVP which computes the cost of all VPlans and picks the most profitable one together with the most profitable VF. The VPlan selected by the VPlan cost model is executed and there is an assert to catch cases where the VPlan cost model and the legacy cost model disagree. Even though I checked a number of different build configurations on AArch64 and X86, there may be some differences that have been missed. Additional discussions and context can be found in @arcbbb's #67647 and #67934 which is an earlier version of the current PR. PR: #92555

@arcbbb

This reverts commit 6f538f6. Extra tests for crashes discovered when building Chromium have been added in fb86cb7, 3be7312. Original message: This adds a new interface to compute the cost of recipes, VPBasicBlocks, VPRegionBlocks and VPlan, initially falling back to the legacy cost model for all recipes. Follow-up patches will gradually migrate recipes to compute their own costs step-by-step. It also adds getBestPlan function to LVP which computes the cost of all VPlans and picks the most profitable one together with the most profitable VF. The VPlan selected by the VPlan cost model is executed and there is an assert to catch cases where the VPlan cost model and the legacy cost model disagree. Even though I checked a number of different build configurations on AArch64 and X86, there may be some differences that have been missed. Additional discussions and context can be found in @arcbbb's #67647 and #67934 which is an earlier version of the current PR. PR: #92555

@arcbbb

This reverts commit 6f538f6. A number of crashes have been fixed by separate fixes, including ttps://github.com//pull/96622. This version of the PR also pre-computes the costs for branches (except the latch) instead of computing their costs as part of the costing of replicate regions, as there may not be a direct correspondence between the original branches and the number of replicate regions. Original message: This adds a new interface to compute the cost of recipes, VPBasicBlocks, VPRegionBlocks and VPlan, initially falling back to the legacy cost model for all recipes. Follow-up patches will gradually migrate recipes to compute their own costs step-by-step. It also adds getBestPlan function to LVP which computes the cost of all VPlans and picks the most profitable one together with the most profitable VF. The VPlan selected by the VPlan cost model is executed and there is an assert to catch cases where the VPlan cost model and the legacy cost model disagree. Even though I checked a number of different build configurations on AArch64 and X86, there may be some differences that have been missed. Additional discussions and context can be found in @arcbbb's #67647 and #67934 which is an earlier version of the current PR. PR: #92555

llvmbot added vectorizers llvm:transforms labels Oct 1, 2023

fhahn mentioned this pull request Oct 2, 2023

[RFC][LV] VPlan-based cost model #67647

Closed

fhahn mentioned this pull request Oct 13, 2023

[VPlan] Add initial anlysis to infer scalar type of VPValues. #69013

Merged

fhahn force-pushed the vplan-cost branch from 10a6efc to 9557529 October 14, 2023 01:54

fhahn mentioned this pull request Dec 14, 2023

[LoopVectorize] Enable shuffle padding for masked interleaved accesses #75329

Open

fhahn added a commit that referenced this pull request Apr 27, 2024

[LV] Add additional cost model coverage for loops with casted inds.

6084dcb

Add test coverage for cost-model code-paths not covered by current unit tests in preparation for #67934.

fhahn force-pushed the vplan-cost branch from 9557529 to 893e28f April 28, 2024 12:19

llvmbot added the backend:RISC-V label Apr 28, 2024

fhahn requested a review from ayalz April 28, 2024 12:33

fhahn force-pushed the vplan-cost branch from 893e28f to 98230db April 28, 2024 12:33

fhahn requested review from rengolin and aniragil April 28, 2024 12:33

fhahn assigned arcbbb Apr 28, 2024

rengolin reviewed Apr 28, 2024

ayalz reviewed May 8, 2024

Merge remote-tracking branch 'origin/main' into vplan-cost

6330a67

fhahn added a commit that referenced this pull request May 9, 2024

[VPlan] VPEVLBasedIVPHI is a VPSingleDefRecipe.

c3d2af0

VPEVLBasedIVPHIRecipe inherits from VPSingleDefRecipe. Add VPEVLBasedIVPHISC to VPSingleDefRecipe::classof to make isa/dyn_cast & co work as expected. Split off #67934.

fhahn added 2 commits May 9, 2024 20:01

Merge remote-tracking branch 'origin/main' into vplan-cost

0da9e25

!fixup address latest comments, thanks!

52786ae

fhahn commented May 10, 2024

This was referenced May 14, 2024

[LoopVectorize] Add cost of generating tail-folding mask to the loop #90191

Open

[VPlan] First step towards VPlan cost modeling. #92555

Merged

fhahn added 5 commits May 21, 2024 10:10

Merge remote-tracking branch 'origin/main' into vplan-cost

32eaeb4

Merge remote-tracking branch 'origin/main' into vplan-cost

8ea5965

Merge remote-tracking branch 'origin/main' into vplan-cost

9f9c09f

Merge remote-tracking branch 'origin/main' into vplan-cost

6597912

!fixup more precisely match header mask.

6c1079b

fhahn closed this Jun 7, 2024

fhahn deleted the vplan-cost branch June 7, 2024 11:26

-  assert(hasPlanWithVF(ElementCount::getFixed(1)));
+  ElementCount ScalarVF = ElementCount::getFixed(1);
+  assert(hasPlanWithVF(ScalarVF) &&
+         "More than a single plan/VF w/o any plan having scalar VF");

[VPlan] First step towards VPlan cost modeling. #67934

fhahn commented Oct 1, 2023

llvmbot commented Oct 1, 2023

arcbbb commented Nov 6, 2023

fhahn commented Apr 28, 2024

rengolin left a comment

ayalz left a comment

fhahn left a comment

fhahn commented May 17, 2024

fhahn commented May 22, 2024

fhahn commented Jun 7, 2024
