[VPlan] First step towards VPlan cost modeling. #92555
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -344,6 +344,15 @@ class LoopVectorizationPlanner { | |
/// A builder used to construct the current plan. | ||
VPBuilder Builder; | ||
|
||
/// Computes the cost of \p Plan for vectorization factor \p VF. | ||
/// | ||
/// The current implementation requires access to the legacy cost model which | ||
/// is why it is kept separate from the VPlan-only cost infrastructure. | ||
/// | ||
/// TODO: Move to VPlan::computeCost once the use of the legacy cost model | ||
/// has been retired. | ||
Review comment: This has more to do with Legal::inductions and reductions, and their CM cost; the former are kept separate from VPlan and its cost implementation, rather than the latter, at the moment.
Reply: Updated, thanks! |
||
InstructionCost computeCost(VPlan &Plan, ElementCount VF) const; | ||
|
||
public: | ||
LoopVectorizationPlanner( | ||
Loop *L, LoopInfo *LI, DominatorTree *DT, const TargetLibraryInfo *TLI, | ||
|
@@ -365,6 +374,9 @@ class LoopVectorizationPlanner { | |
/// Return the best VPlan for \p VF. | ||
VPlan &getBestPlanFor(ElementCount VF) const; | ||
|
||
/// Return the most profitable plan. | ||
Review comment: Note this also fixes the best VF.
Reply: Updated, thanks! |
||
VPlan &getBestPlan() const; | ||
|
||
/// Generate the IR code for the vectorized loop captured in VPlan \p BestPlan | ||
/// according to the best selected \p VF and \p UF. | ||
/// | ||
|
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -59,6 +59,7 @@ | |||||||||||||||||||||||||||||||||||||
#include "VPlan.h" | ||||||||||||||||||||||||||||||||||||||
#include "VPlanAnalysis.h" | ||||||||||||||||||||||||||||||||||||||
#include "VPlanHCFGBuilder.h" | ||||||||||||||||||||||||||||||||||||||
#include "VPlanPatternMatch.h" | ||||||||||||||||||||||||||||||||||||||
#include "VPlanTransforms.h" | ||||||||||||||||||||||||||||||||||||||
#include "VPlanVerifier.h" | ||||||||||||||||||||||||||||||||||||||
#include "llvm/ADT/APInt.h" | ||||||||||||||||||||||||||||||||||||||
|
@@ -289,7 +290,7 @@ static cl::opt<unsigned> ForceTargetMaxVectorInterleaveFactor( | |||||||||||||||||||||||||||||||||||||
cl::desc("A flag that overrides the target's max interleave factor for " | ||||||||||||||||||||||||||||||||||||||
"vectorized loops.")); | ||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
-static cl::opt<unsigned> ForceTargetInstructionCost( | ||||||||||||||||||||||||||||||||||||||
+cl::opt<unsigned> ForceTargetInstructionCost( | ||||||||||||||||||||||||||||||||||||||
"force-target-instruction-cost", cl::init(0), cl::Hidden, | ||||||||||||||||||||||||||||||||||||||
cl::desc("A flag that overrides the target's expected cost for " | ||||||||||||||||||||||||||||||||||||||
"an instruction to a single constant value. Mostly " | ||||||||||||||||||||||||||||||||||||||
|
@@ -1621,6 +1622,16 @@ class LoopVectorizationCostModel { | |||||||||||||||||||||||||||||||||||||
/// \p VF is the vectorization factor chosen for the original loop. | ||||||||||||||||||||||||||||||||||||||
bool isEpilogueVectorizationProfitable(const ElementCount VF) const; | ||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
/// Return the cost of instructions in an inloop reduction pattern, if I is | ||||||||||||||||||||||||||||||||||||||
/// part of that pattern. | ||||||||||||||||||||||||||||||||||||||
Review comment (on lines +1616 to +1617): Suggested change (unrelated to this patch).
Reply: Will adjust separately. |
||||||||||||||||||||||||||||||||||||||
std::optional<InstructionCost> | ||||||||||||||||||||||||||||||||||||||
getReductionPatternCost(Instruction *I, ElementCount VF, Type *VectorTy, | ||||||||||||||||||||||||||||||||||||||
Review comment: Better called getInLoopReductionPatternCost()?
Reply: Will adjust separately.
Reply: Very well. Another suggestion is to use Invalid cost for "no cost" instead of optional. |
||||||||||||||||||||||||||||||||||||||
TTI::TargetCostKind CostKind) const; | ||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
/// Returns the execution time cost of an instruction for a given vector | ||||||||||||||||||||||||||||||||||||||
/// width. Vector width of one means scalar. | ||||||||||||||||||||||||||||||||||||||
VectorizationCostTy getInstructionCost(Instruction *I, ElementCount VF); | ||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
private: | ||||||||||||||||||||||||||||||||||||||
unsigned NumPredStores = 0; | ||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
|
@@ -1646,21 +1657,11 @@ class LoopVectorizationCostModel { | |||||||||||||||||||||||||||||||||||||
/// of elements. | ||||||||||||||||||||||||||||||||||||||
ElementCount getMaxLegalScalableVF(unsigned MaxSafeElements); | ||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
-/// Returns the execution time cost of an instruction for a given vector | ||||||||||||||||||||||||||||||||||||||
-/// width. Vector width of one means scalar. | ||||||||||||||||||||||||||||||||||||||
-VectorizationCostTy getInstructionCost(Instruction *I, ElementCount VF); | ||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
/// The cost-computation logic from getInstructionCost which provides | ||||||||||||||||||||||||||||||||||||||
/// the vector type as an output parameter. | ||||||||||||||||||||||||||||||||||||||
InstructionCost getInstructionCost(Instruction *I, ElementCount VF, | ||||||||||||||||||||||||||||||||||||||
Type *&VectorTy); | ||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
-/// Return the cost of instructions in an inloop reduction pattern, if I is | ||||||||||||||||||||||||||||||||||||||
-/// part of that pattern. | ||||||||||||||||||||||||||||||||||||||
-std::optional<InstructionCost> | ||||||||||||||||||||||||||||||||||||||
-getReductionPatternCost(Instruction *I, ElementCount VF, Type *VectorTy, | ||||||||||||||||||||||||||||||||||||||
-TTI::TargetCostKind CostKind) const; | ||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
/// Calculate vectorization cost of memory instruction \p I. | ||||||||||||||||||||||||||||||||||||||
InstructionCost getMemoryInstructionCost(Instruction *I, ElementCount VF); | ||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
|
@@ -7396,6 +7397,122 @@ LoopVectorizationPlanner::plan(ElementCount UserVF, unsigned UserIC) { | |||||||||||||||||||||||||||||||||||||
return VF; | ||||||||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
InstructionCost VPCostContext::getLegacyCost(Instruction *UI, ElementCount VF) { | ||||||||||||||||||||||||||||||||||||||
Review comment: const?
Reply: done, thanks! |
||||||||||||||||||||||||||||||||||||||
return CM.getInstructionCost(UI, VF).first; | ||||||||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
bool VPCostContext::skipCostComputation(Instruction *UI) const { | ||||||||||||||||||||||||||||||||||||||
return CM.VecValuesToIgnore.contains(UI) || SkipCostComputation.contains(UI); | ||||||||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
InstructionCost LoopVectorizationPlanner::computeCost(VPlan &Plan, | ||||||||||||||||||||||||||||||||||||||
Review comment: Suggested change
Reply: Renamed, thanks! |
||||||||||||||||||||||||||||||||||||||
ElementCount VF) const { | ||||||||||||||||||||||||||||||||||||||
InstructionCost Cost = 0; | ||||||||||||||||||||||||||||||||||||||
LLVMContext &LLVMCtx = OrigLoop->getHeader()->getContext(); | ||||||||||||||||||||||||||||||||||||||
VPCostContext CostCtx(CM.TTI, Legal->getWidestInductionType(), LLVMCtx, CM); | ||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
// Cost modeling for inductions is inaccurate in the legacy cost model | ||||||||||||||||||||||||||||||||||||||
Review comment: Worth indicating that this is restricted to the cost of the induction bump only.
Reply: Added as below, thanks! |
||||||||||||||||||||||||||||||||||||||
// compared to the recipes that are generated. To match it during initial | ||||||||||||||||||||||||||||||||||||||
// VPlan cost model bring-up, directly use the induction costs from the legacy | ||||||||||||||||||||||||||||||||||||||
// cost model and skip induction bump recipes. Note that we do this as | ||||||||||||||||||||||||||||||||||||||
// pre-processing; the VPlan may not have any recipes associated with the | ||||||||||||||||||||||||||||||||||||||
// original induction increment instruction. | ||||||||||||||||||||||||||||||||||||||
Review comment: ... in this case, if VPlan has a bump recipe w/o such association, its cost will be accumulated along with that of the original induction increment instruction below?
Reply: Yes, but that is not the case at the moment.
Reply: OK, worth clarifying in the comment? If original induction increment instructions do have recipes, is this pre-processing needed, in this preliminary version where recipe costs default to the CM cost of their underlying instructions? Perhaps to retain debug dumps. Instructions associated with in-loop reductions do need to be pre-processed in order to take their getReductionPatternCost() rather than their getInstructionCost().
Reply: Tried to clarify; at this point we cannot easily check if a recipe for the induction bump has been generated (one is created if there are other users). Pre-processing handles both cases (with and w/o widen recipe for the induction increment); hopefully the comment is clearer now. |
||||||||||||||||||||||||||||||||||||||
// TODO: Switch to more accurate costing based on VPlan. | ||||||||||||||||||||||||||||||||||||||
for (const auto &[IV, _] : Legal->getInductionVars()) { | ||||||||||||||||||||||||||||||||||||||
Instruction *IVInc = cast<Instruction>( | ||||||||||||||||||||||||||||||||||||||
IV->getIncomingValueForBlock(OrigLoop->getLoopLatch())); | ||||||||||||||||||||||||||||||||||||||
InstructionCost InductionCost = CM.getInstructionCost(IVInc, VF).first; | ||||||||||||||||||||||||||||||||||||||
LLVM_DEBUG({ | ||||||||||||||||||||||||||||||||||||||
dbgs() << "Cost of " << InductionCost << " for VF " << VF | ||||||||||||||||||||||||||||||||||||||
<< ":\n induction increment " << *IVInc << "\n"; | ||||||||||||||||||||||||||||||||||||||
IVInc->dump(); | ||||||||||||||||||||||||||||||||||||||
}); | ||||||||||||||||||||||||||||||||||||||
Cost += InductionCost; | ||||||||||||||||||||||||||||||||||||||
assert(!CostCtx.SkipCostComputation.contains(IVInc) && | ||||||||||||||||||||||||||||||||||||||
"Same IV increment for multiple inductions?"); | ||||||||||||||||||||||||||||||||||||||
CostCtx.SkipCostComputation.insert(IVInc); | ||||||||||||||||||||||||||||||||||||||
Review comment: Worth asserting IVInc is not already in there?
Reply: Added, thanks!
Reply: Suggested change consistent with the order of asserting/marking-dumping-accumulating the costs of reductions below; there they depend on having a cost, here it is independent of cost.
Reply: reordered, thanks! |
||||||||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
// The legacy cost model has special logic to compute the cost of in-loop | ||||||||||||||||||||||||||||||||||||||
// reductions, which may be smaller than the sum of all instructions involved | ||||||||||||||||||||||||||||||||||||||
// in the reduction. Pre-compute the cost for now. | ||||||||||||||||||||||||||||||||||||||
Review comment: Augment comment to also address AnyOf reductions.
Reply: Added, thanks! |
||||||||||||||||||||||||||||||||||||||
// TODO: Switch to costing based on VPlan once the logic has been ported. | ||||||||||||||||||||||||||||||||||||||
for (const auto &[RedPhi, RdxDesc] : Legal->getReductionVars()) { | ||||||||||||||||||||||||||||||||||||||
if (!CM.isInLoopReduction(RedPhi)) | ||||||||||||||||||||||||||||||||||||||
continue; | ||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
const auto &ChainOps = RdxDesc.getReductionOpChain(RedPhi, OrigLoop); | ||||||||||||||||||||||||||||||||||||||
SetVector<Instruction *> ReductionOperations(ChainOps.begin(), | ||||||||||||||||||||||||||||||||||||||
Review comment: Suggested change
Reply: Renamed, thanks! |
||||||||||||||||||||||||||||||||||||||
ChainOps.end()); | ||||||||||||||||||||||||||||||||||||||
// Also include the operands of instructions in the chain, as the cost-model | ||||||||||||||||||||||||||||||||||||||
// may mark extends as free. | ||||||||||||||||||||||||||||||||||||||
for (unsigned I = 0, E = ReductionOperations.size(); I != E; ++I) { | ||||||||||||||||||||||||||||||||||||||
Review comment: Better iterate over ChainOps directly here rather than over the first E entries of ReductionOperations? Only direct operands are visited.
Reply: Updated, thanks! |
||||||||||||||||||||||||||||||||||||||
for (Value *Op : ReductionOperations[I]->operands()) { | ||||||||||||||||||||||||||||||||||||||
if (auto *I = dyn_cast<Instruction>(Op)) | ||||||||||||||||||||||||||||||||||||||
ReductionOperations.insert(I); | ||||||||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||||||||
for (Instruction *I : ReductionOperations) { | ||||||||||||||||||||||||||||||||||||||
auto ReductionCost = CM.getReductionPatternCost( | ||||||||||||||||||||||||||||||||||||||
I, VF, ToVectorTy(I->getType(), VF), TTI::TCK_RecipThroughput); | ||||||||||||||||||||||||||||||||||||||
Review comment: Worth a comment that we precompute the cost of
Reply: Added, thanks. |
||||||||||||||||||||||||||||||||||||||
if (!ReductionCost) | ||||||||||||||||||||||||||||||||||||||
continue; | ||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
assert(!CostCtx.SkipCostComputation.contains(I) && | ||||||||||||||||||||||||||||||||||||||
"reduction op visited multiple times"); | ||||||||||||||||||||||||||||||||||||||
CostCtx.SkipCostComputation.insert(I); | ||||||||||||||||||||||||||||||||||||||
LLVM_DEBUG(dbgs() << "Cost of " << ReductionCost << " for VF " << VF | ||||||||||||||||||||||||||||||||||||||
<< ":\n in-loop reduction " << *I << "\n"); | ||||||||||||||||||||||||||||||||||||||
Cost += *ReductionCost; | ||||||||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
Review comment: Worth emphasizing that
Suggested change
Reply: Added, thanks! |
||||||||||||||||||||||||||||||||||||||
Cost += Plan.computeCost(VF, CostCtx); | ||||||||||||||||||||||||||||||||||||||
// Add the cost for the backedge. | ||||||||||||||||||||||||||||||||||||||
Cost += 1; | ||||||||||||||||||||||||||||||||||||||
Review comment: This can and should be taken care of by (loop) region::computeCost()?
Reply: Moved, thanks! |
||||||||||||||||||||||||||||||||||||||
LLVM_DEBUG(dbgs() << "Cost for VF " << VF << ": " << Cost << "\n"); | ||||||||||||||||||||||||||||||||||||||
return Cost; | ||||||||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||||||||
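The pre-processing in computeCost above can be sketched in isolation. This is a minimal, self-contained model and not the LLVM implementation: `ToyRecipe`, `LegacyCosts`, and `computeToyPlanCost` are hypothetical names, instructions are identified by strings, and a recipe's cost is assumed to be the legacy cost of its underlying instruction (0 if it has none). It illustrates why pre-costing an instruction and adding it to a skip set yields the same total whether or not the plan contains a recipe for that instruction.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a recipe: it only remembers which original
// instruction (if any) it was created from.
struct ToyRecipe {
  std::string Underlying; // empty if there is no underlying instruction
};

// Hypothetical stand-in for the legacy cost model: a per-instruction table.
using LegacyCosts = std::unordered_map<std::string, unsigned>;

// Costs of instructions in PreComputed are added once up front; any recipe
// whose underlying instruction was pre-costed is skipped afterwards.
unsigned computeToyPlanCost(const std::vector<ToyRecipe> &Recipes,
                            const LegacyCosts &Legacy,
                            const std::vector<std::string> &PreComputed) {
  unsigned Cost = 0;
  std::unordered_set<std::string> Skip;
  for (const std::string &Name : PreComputed) {
    bool Inserted = Skip.insert(Name).second;
    assert(Inserted && "instruction pre-costed twice");
    (void)Inserted;
    Cost += Legacy.at(Name);
  }
  for (const ToyRecipe &R : Recipes) {
    if (!R.Underlying.empty() && Skip.count(R.Underlying))
      continue; // already accounted for during pre-processing
    auto It = Legacy.find(R.Underlying);
    // Assumption of this sketch: recipes without a known underlying
    // instruction contribute no cost.
    Cost += It == Legacy.end() ? 0 : It->second;
  }
  return Cost;
}
```

With a table of `iv.next` = 1, `add` = 2, `load` = 4, pre-costing `iv.next` gives a total of 7 both for a plan that contains a recipe for `iv.next` and for one that does not, which is the property the review thread on the induction bump is about.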
|
||||||||||||||||||||||||||||||||||||||
VPlan &LoopVectorizationPlanner::getBestPlan() const { | ||||||||||||||||||||||||||||||||||||||
// If there is a single VPlan with a single VF, return it directly. | ||||||||||||||||||||||||||||||||||||||
VPlan &FirstPlan = *VPlans[0]; | ||||||||||||||||||||||||||||||||||||||
if (VPlans.size() == 1 && size(FirstPlan.vectorFactors()) == 1) | ||||||||||||||||||||||||||||||||||||||
return FirstPlan; | ||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
VPlan *BestPlan = &FirstPlan; | ||||||||||||||||||||||||||||||||||||||
ElementCount ScalarVF = ElementCount::getFixed(1); | ||||||||||||||||||||||||||||||||||||||
assert(hasPlanWithVF(ScalarVF) && | ||||||||||||||||||||||||||||||||||||||
"More than a single plan/VF w/o any plan having scalar VF"); | ||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
InstructionCost ScalarCost = | ||||||||||||||||||||||||||||||||||||||
computeCost(getBestPlanFor(ElementCount::getFixed(1)), ScalarVF); | ||||||||||||||||||||||||||||||||||||||
Review comment: Suggested change ?
Reply: Done, thanks! |
||||||||||||||||||||||||||||||||||||||
VectorizationFactor BestFactor(ScalarVF, ScalarCost, ScalarCost); | ||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
bool ForceVectorization = Hints.getForce() == LoopVectorizeHints::FK_Enabled; | ||||||||||||||||||||||||||||||||||||||
if (ForceVectorization) { | ||||||||||||||||||||||||||||||||||||||
// Ignore scalar width, because the user explicitly wants vectorization. | ||||||||||||||||||||||||||||||||||||||
// Initialize cost to max so that VF = 2 is, at least, chosen during cost | ||||||||||||||||||||||||||||||||||||||
// evaluation. | ||||||||||||||||||||||||||||||||||||||
BestFactor.Cost = InstructionCost::getMax(); | ||||||||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
for (auto &P : VPlans) { | ||||||||||||||||||||||||||||||||||||||
for (ElementCount VF : P->vectorFactors()) { | ||||||||||||||||||||||||||||||||||||||
if (VF.isScalar()) | ||||||||||||||||||||||||||||||||||||||
continue; | ||||||||||||||||||||||||||||||||||||||
InstructionCost Cost = computeCost(*P, VF); | ||||||||||||||||||||||||||||||||||||||
VectorizationFactor CurrentFactor(VF, Cost, ScalarCost); | ||||||||||||||||||||||||||||||||||||||
if (isMoreProfitable(CurrentFactor, BestFactor)) { | ||||||||||||||||||||||||||||||||||||||
BestFactor = CurrentFactor; | ||||||||||||||||||||||||||||||||||||||
BestPlan = &*P; | ||||||||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||||||||
BestPlan->setVF(BestFactor.Width); | ||||||||||||||||||||||||||||||||||||||
return *BestPlan; | ||||||||||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||||||||||
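The selection loop in getBestPlan can be illustrated with a small standalone model. The sketch below makes several assumptions that are not taken from the patch: `ToyFactor`, `isMoreProfitableToy`, and `pickBestWidth` are hypothetical names, costs are plain `double` values, and "more profitable" is assumed to mean a strictly lower cost per lane. The real `isMoreProfitable` may consider more than that. Forcing vectorization is modeled by starting from an infinite best cost, mirroring the `InstructionCost::getMax()` initialization above.

```cpp
#include <cassert>
#include <limits>
#include <vector>

struct ToyFactor {
  unsigned Width; // vectorization factor; 1 means scalar
  double Cost;    // cost of one iteration of the loop at this width
};

// Assumption of this sketch: A beats B if its cost per lane is strictly lower.
bool isMoreProfitableToy(const ToyFactor &A, const ToyFactor &B) {
  return A.Cost / A.Width < B.Cost / B.Width;
}

// Pick the width with the lowest per-lane cost, starting from the scalar
// loop unless vectorization is forced.
unsigned pickBestWidth(double ScalarCost,
                       const std::vector<ToyFactor> &Candidates,
                       bool ForceVectorization) {
  ToyFactor Best{1, ForceVectorization
                        ? std::numeric_limits<double>::infinity()
                        : ScalarCost};
  for (const ToyFactor &C : Candidates) {
    if (C.Width == 1)
      continue; // the scalar candidate is the starting point, not a contender
    if (isMoreProfitableToy(C, Best))
      Best = C;
  }
  return Best.Width;
}
```

For a scalar cost of 4 and candidates (VF 2, cost 10), (VF 4, cost 12), (VF 8, cost 40), the per-lane costs are 5, 3, and 5, so VF 4 is picked. If every vector candidate costs 5 per lane, the scalar loop wins unless vectorization is forced, in which case the first vector candidate is kept because later ties are not strictly better.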
|
||||||||||||||||||||||||||||||||||||||
VPlan &LoopVectorizationPlanner::getBestPlanFor(ElementCount VF) const { | ||||||||||||||||||||||||||||||||||||||
assert(count_if(VPlans, | ||||||||||||||||||||||||||||||||||||||
[VF](const VPlanPtr &Plan) { return Plan->hasVF(VF); }) == | ||||||||||||||||||||||||||||||||||||||
|
@@ -10253,8 +10370,15 @@ bool LoopVectorizePass::processLoop(Loop *L) { | |||||||||||||||||||||||||||||||||||||
VF.MinProfitableTripCount, IC, &LVL, &CM, BFI, | ||||||||||||||||||||||||||||||||||||||
PSI, Checks); | ||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
-VPlan &BestPlan = LVP.getBestPlanFor(VF.Width); | ||||||||||||||||||||||||||||||||||||||
-LVP.executePlan(VF.Width, IC, BestPlan, LB, DT, false); | ||||||||||||||||||||||||||||||||||||||
+VPlan &BestPlan = LVP.getBestPlan(); | ||||||||||||||||||||||||||||||||||||||
+assert(size(BestPlan.vectorFactors()) == 1 && | ||||||||||||||||||||||||||||||||||||||
+"Plan should have a single VF"); | ||||||||||||||||||||||||||||||||||||||
+ElementCount Width = *BestPlan.vectorFactors().begin(); | ||||||||||||||||||||||||||||||||||||||
+LLVM_DEBUG(dbgs() << "VF picked by VPlan cost model: " << Width | ||||||||||||||||||||||||||||||||||||||
+<< "\n"); | ||||||||||||||||||||||||||||||||||||||
+assert(VF.Width == Width && | ||||||||||||||||||||||||||||||||||||||
+"VPlan cost model and legacy cost model disagreed"); | ||||||||||||||||||||||||||||||||||||||
Review comment: Worth adding a comment in LVP::selectVectorizationFactor(), which selects the best VF based on the legacy cost model, that it is destined to retire once computing the best VF based on VPlan costs is confirmed to agree and stabilizes.
Reply: Added a comment to both the call site and the header for selectVectorizationFactor. With this patch, it is only used to cross-check the VPlan-based one, but the VPlan-based one will pick the plan to execute via getBestPlan in the main vector code path (the epilogue vectorization code path is not updated yet). |
||||||||||||||||||||||||||||||||||||||
+LVP.executePlan(Width, IC, BestPlan, LB, DT, false); | ||||||||||||||||||||||||||||||||||||||
++LoopsVectorized; | ||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||
// Add metadata to disable runtime unrolling a scalar loop when there | ||||||||||||||||||||||||||||||||||||||
|
Original file line number | Diff line number | Diff line change | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
|
@@ -52,6 +52,7 @@ using namespace llvm::VPlanPatternMatch; | |||||||||
namespace llvm { | ||||||||||
extern cl::opt<bool> EnableVPlanNativePath; | ||||||||||
} | ||||||||||
extern cl::opt<unsigned> ForceTargetInstructionCost; | ||||||||||
|
||||||||||
#define DEBUG_TYPE "vplan" | ||||||||||
|
||||||||||
|
@@ -730,6 +731,89 @@ void VPRegionBlock::execute(VPTransformState *State) { | |||||||||
State->Instance.reset(); | ||||||||||
} | ||||||||||
|
||||||||||
static InstructionCost computeCostForRecipe(VPRecipeBase *R, ElementCount VF, | ||||||||||
Review comment: Can this be folded into Recipe::computeCost()?
Reply: I don't think so, as VPRecipeBase::computeCost has the generic implementation to fall back on the legacy CM. Subclasses implementing it would all need to invoke
Reply: A public non-virtual
Reply: Moved as suggested, thanks! |
||||||||||
VPCostContext &Ctx) { | ||||||||||
if (auto *S = dyn_cast<VPSingleDefRecipe>(R)) { | ||||||||||
auto *UI = dyn_cast_or_null<Instruction>(S->getUnderlyingValue()); | ||||||||||
if (UI && Ctx.skipCostComputation(UI)) | ||||||||||
return 0; | ||||||||||
} | ||||||||||
|
||||||||||
InstructionCost RecipeCost = R->computeCost(VF, Ctx); | ||||||||||
if (ForceTargetInstructionCost.getNumOccurrences() > 0 && | ||||||||||
RecipeCost.isValid()) | ||||||||||
RecipeCost = InstructionCost(ForceTargetInstructionCost); | ||||||||||
|
||||||||||
LLVM_DEBUG({ | ||||||||||
dbgs() << "Cost of " << RecipeCost << " for VF " << VF << ": "; | ||||||||||
R->dump(); | ||||||||||
}); | ||||||||||
return RecipeCost; | ||||||||||
} | ||||||||||
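The override logic in computeCostForRecipe can be modeled on its own. In this sketch, which is an illustration under stated assumptions rather than the patch's code, `ToyCost` and `applyForcedCost` are hypothetical names, an invalid cost is modeled as an empty `std::optional`, and "the flag was passed on the command line" (the `getNumOccurrences() > 0` check) is modeled as an engaged optional. The point it shows is that a forced value replaces only valid costs, and that a forced value of 0 still counts as an override.

```cpp
#include <cassert>
#include <optional>

// An empty optional models an invalid cost in this sketch.
using ToyCost = std::optional<unsigned>;

// Replace a valid recipe cost by the forced value, if one was supplied.
// Invalid costs are passed through unchanged so that they keep invalidating
// the candidate they belong to.
ToyCost applyForcedCost(ToyCost RecipeCost, std::optional<unsigned> Forced) {
  if (Forced && RecipeCost)
    return *Forced;
  return RecipeCost;
}
```

Testing whether the option was supplied, rather than whether its value is non-zero, is what lets a forced cost of 0 take effect in this model.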
|
||||||||||
InstructionCost VPBasicBlock::computeCost(ElementCount VF, VPCostContext &Ctx) { | ||||||||||
Review comment: Should VPlan and VPBlockBase have
Reply: Done, thanks! |
||||||||||
InstructionCost Cost = 0; | ||||||||||
for (VPRecipeBase &R : *this) | ||||||||||
Review comment: Suggested change |
||||||||||
Cost += computeCostForRecipe(&R, VF, Ctx); | ||||||||||
return Cost; | ||||||||||
} | ||||||||||
|
||||||||||
InstructionCost VPRegionBlock::computeCost(ElementCount VF, | ||||||||||
VPCostContext &Ctx) { | ||||||||||
InstructionCost Cost = 0; | ||||||||||
if (!isReplicator()) { | ||||||||||
Review comment: Suggested change
Reply: Updated, thanks! |
||||||||||
for (VPBlockBase *Block : vp_depth_first_shallow(getEntry())) | ||||||||||
Cost += Block->computeCost(VF, Ctx); | ||||||||||
Review comment: Add cost of backedge here?
Reply: Done, thanks! |
||||||||||
return Cost; | ||||||||||
} | ||||||||||
|
||||||||||
// Compute the cost of a replicate region. Replicating isn't supported for | ||||||||||
// scalable vectors, so return an invalid cost for them. | ||||||||||
if (VF.isScalable()) | ||||||||||
return InstructionCost::getInvalid(); | ||||||||||
Review comment: If it isn't supported, should it be (prevented and) asserted, instead of built and cost invalidated?
Reply: Yes, but at the moment this is done via the cost. Might be worth adjusting (e.g. bail out during VPlan construction), but best done separately.
Reply: Worth leaving behind a TODO?
Reply: Added, thanks! |
||||||||||
|
||||||||||
// First compute the cost of the conditionally executed recipes, then account | ||||||||||
// for the branching cost, unless the mask is a header mask or a uniform | ||||||||||
// condition. | ||||||||||
using namespace llvm::VPlanPatternMatch; | ||||||||||
VPBasicBlock *Then = cast<VPBasicBlock>(getEntry()->getSuccessors()[0]); | ||||||||||
for (VPRecipeBase &R : *Then) | ||||||||||
Cost += computeCostForRecipe(&R, VF, Ctx); | ||||||||||
|
||||||||||
// Note the cost estimates below closely match the current legacy cost model. | ||||||||||
auto *BOM = cast<VPBranchOnMaskRecipe>(&getEntryBasicBlock()->front()); | ||||||||||
VPValue *Cond = BOM->getOperand(0); | ||||||||||
|
||||||||||
// Check if Cond is a uniform compare or a header mask and don't account for | ||||||||||
// branching costs. A uniform condition corresponds to a single branch per | ||||||||||
// VF, and the header mask will always be true except in the last iteration. | ||||||||||
VPValue *Op; | ||||||||||
bool IsHeaderMaskOrUniformCond = | ||||||||||
vputils::isUniformBoolean(Cond) || isa<VPActiveLaneMaskPHIRecipe>(Cond) || | ||||||||||
Also worth capturing
Done, thanks! |
||||||||||
match(Cond, m_ActiveLaneMask(m_VPValue(), m_VPValue())) || | ||||||||||
(match(Cond, m_Binary<Instruction::ICmp>(m_VPValue(), m_VPValue(Op))) && | ||||||||||
Op == getPlan()->getOrCreateBackedgeTakenCount()); | ||||||||||
if (IsHeaderMaskOrUniformCond) | ||||||||||
return Cost; | ||||||||||
Suggested change
Updated, thanks! |
||||||||||
|
||||||||||
// For the scalar case, we may not always execute the original predicated | ||||||||||
// block. Thus, scale the block's cost by the probability of executing it. | ||||||||||
// blockNeedsPredication from Legal is used so as to not include all blocks in | ||||||||||
blockNeedsPredication is no longer used here, which only checks if VF is scalar.
Done, thanks |
||||||||||
// tail folded loops. | ||||||||||
if (VF.isScalar()) | ||||||||||
return Cost / 2; | ||||||||||
Note that getReciprocalPredBlockProb() should be used, which currently returns 2?
Done, required moving it to a header. |
||||||||||
|
||||||||||
// Add the cost for branches around scalarized and predicated blocks. | ||||||||||
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput; | ||||||||||
auto *Vec_i1Ty = VectorType::get(IntegerType::getInt1Ty(Ctx.Ctx), VF); | ||||||||||
nit: may be clearer to do
Updated, thanks! |
||||||||||
return Cost + | ||||||||||
Ctx.TTI.getScalarizationOverhead( | ||||||||||
Vec_i1Ty, APInt::getAllOnes(VF.getFixedValue()), | ||||||||||
/*Insert*/ false, /*Extract*/ true, CostKind) + | ||||||||||
(Ctx.TTI.getCFInstrCost(Instruction::Br, CostKind) * | ||||||||||
VF.getFixedValue()); | ||||||||||
} | ||||||||||
|
||||||||||
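For readers following the thread, the control flow of the replicate-region costing above can be sketched outside of LLVM. This is only a toy model under stated assumptions: `ToyTargetCosts` and `toyReplicateRegionCost` are hypothetical names, `std::nullopt` stands in for an invalid cost, the scalarization overhead is assumed to be one extract per lane, and the default of 2 for the reciprocal block probability is taken from the review comment about `getReciprocalPredBlockProb()`.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-ins for target costs; these are NOT LLVM APIs.
struct ToyTargetCosts {
  unsigned ExtractPerLane; // assumed cost of extracting one i1 mask lane
  unsigned Branch;         // assumed cost of one conditional branch
};

// Toy version of the replicate-region cost logic in the diff.
// std::nullopt models an invalid cost.
std::optional<unsigned> toyReplicateRegionCost(
    const std::vector<unsigned> &ThenRecipeCosts, unsigned VF, bool IsScalable,
    bool IsHeaderMaskOrUniformCond, const ToyTargetCosts &TC,
    unsigned ReciprocalPredBlockProb = 2) {
  // Replication is not supported for scalable vectors.
  if (IsScalable)
    return std::nullopt;

  // Cost of the conditionally executed recipes.
  unsigned Cost = 0;
  for (unsigned C : ThenRecipeCosts)
    Cost += C;

  // Header masks and uniform conditions do not pay for per-lane branching.
  if (IsHeaderMaskOrUniformCond)
    return Cost;

  // Scalar case: scale by the probability of executing the predicated block.
  if (VF == 1)
    return Cost / ReciprocalPredBlockProb;

  // Vector case: one mask extract and one branch per lane.
  return Cost + TC.ExtractPerLane * VF + TC.Branch * VF;
}
```

With recipe costs of 4 and 6, VF = 4, an extract cost of 1 and a branch cost of 2, the toy model returns 10 + 4 + 8 = 22; the same block at VF = 1 costs 5.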
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) | ||||||||||
void VPRegionBlock::print(raw_ostream &O, const Twine &Indent, | ||||||||||
VPSlotTracker &SlotTracker) const { | ||||||||||
|
@@ -900,6 +984,13 @@ void VPlan::execute(VPTransformState *State) { | |||||||||
} | ||||||||||
} | ||||||||||
|
||||||||||
InstructionCost VPlan::computeCost(ElementCount VF, VPCostContext &Ctx) { | ||||||||||
InstructionCost Cost = 0; | ||||||||||
for (VPBlockBase *Block : vp_depth_first_shallow(getEntry())) | ||||||||||
Cost += Block->computeCost(VF, Ctx); | ||||||||||
return Cost; | ||||||||||
} | ||||||||||
|
||||||||||
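The plan-level cost is a plain sum over the blocks, so a single invalid block cost (for example, a replicate region with a scalable VF) makes the whole plan invalid. A minimal sketch of that propagation, assuming a toy `ToyCost` type that is not `llvm::InstructionCost` and a hypothetical helper `toyPlanCost`:

```cpp
#include <vector>

// Toy cost value: once invalid, it stays invalid under addition.
// This only models the behaviour the diff relies on; it is not LLVM's type.
struct ToyCost {
  long Value;
  bool Valid;

  ToyCost(long V = 0, bool IsValid = true) : Value(V), Valid(IsValid) {}

  static ToyCost invalid() { return ToyCost(0, false); }

  ToyCost &operator+=(const ToyCost &RHS) {
    Valid = Valid && RHS.Valid; // invalidity is sticky
    Value += RHS.Value;
    return *this;
  }
};

// Sum the per-block costs, mirroring the loop in VPlan::computeCost.
ToyCost toyPlanCost(const std::vector<ToyCost> &BlockCosts) {
  ToyCost Cost;
  for (const ToyCost &C : BlockCosts)
    Cost += C;
  return Cost;
}
```

A caller would check `Valid` before comparing the plan costs of different VFs.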
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) | ||||||||||
void VPlan::printLiveIns(raw_ostream &O) const { | ||||||||||
VPSlotTracker SlotTracker(this); | ||||||||||
|
@@ -1472,3 +1563,15 @@ VPValue *vputils::getOrCreateVPValueForSCEVExpr(VPlan &Plan, const SCEV *Expr, | |||||||||
Plan.addSCEVExpansion(Expr, Expanded); | ||||||||||
return Expanded; | ||||||||||
} | ||||||||||
|
||||||||||
bool vputils::isUniformBoolean(VPValue *Cond) { | ||||||||||
if (match(Cond, m_Not(m_VPValue()))) | ||||||||||
Cond = Cond->getDefiningRecipe()->getOperand(0); | ||||||||||
auto *R = Cond->getDefiningRecipe(); | ||||||||||
if (!R) | ||||||||||
return true; | ||||||||||
Worth adding a TODO to match additional patterns preserving uniformity of booleans, e.g., AND/OR/etc.?
Done, thanks! |
||||||||||
return match(R, m_Binary<Instruction::ICmp>(m_VPValue(), m_VPValue())) && | ||||||||||
all_of(R->operands(), [](VPValue *Op) { | ||||||||||
return vputils::isUniformAfterVectorization(Op); | ||||||||||
Comment on lines
Comment on lines +1531 to +1532
nit: may be simpler to check the two operands of ICmp directly.
Done, thanks! |
||||||||||
}); | ||||||||||
} |
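The matching logic in `isUniformBoolean` at this revision can be illustrated on a toy value graph. This is a sketch under assumptions: `ToyKind`, `ToyValue` and `toyIsUniformBoolean` are hypothetical, a `LiveIn` node models a `VPValue` without a defining recipe, and the `UniformAfterVectorization` flag stands in for `vputils::isUniformAfterVectorization`.

```cpp
#include <vector>

// Hypothetical node kinds for a toy value graph; not VPlan classes.
enum class ToyKind { LiveIn, Not, ICmp, Other };

struct ToyValue {
  ToyKind Kind;
  bool UniformAfterVectorization; // stand-in for the vputils query
  std::vector<const ToyValue *> Operands;
};

// Toy version of the check: look through one negation, treat live-ins as
// uniform, and otherwise require a compare whose operands are all uniform.
bool toyIsUniformBoolean(const ToyValue *Cond) {
  if (Cond->Kind == ToyKind::Not)
    Cond = Cond->Operands[0];
  if (Cond->Kind == ToyKind::LiveIn)
    return true;
  if (Cond->Kind != ToyKind::ICmp)
    return false;
  for (const ToyValue *Op : Cond->Operands)
    if (!Op->UniformAfterVectorization)
      return false;
  return true;
}
```

As the review notes, patterns such as AND/OR of uniform booleans are not recognized by this shape of check and would need additional matching.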
fixed, thanks!