
[RFC][LV] VPlan-based cost model #67647

Closed · wants to merge 3 commits
Conversation

@arcbbb (Contributor) commented Sep 28, 2023

This patch follows D89322 to add an initial skeleton of a VPlan-based cost model.

The difference is that instead of adding a cost() interface to VPRecipes,
all cost implementations are put together in VPlanCostModel.

This allows VPlanCostModel to concentrate on assigning costs to a VPlan,
thus separating the cost-model code from the VPlan IR, similar to LLVM IR cost
modeling.

Please let me know if you agree with the main idea of this patch.
If there is a general consensus, I'll proceed to implement the cost for the
other recipes for review.

Differential Revision: https://reviews.llvm.org/D158716

  • Address comments
  • Move VPCM object outside of the loop
  • Add getElementType() and getReturnElementType() for VPRecipe

Directly compute the cost of a VPlan after construction and track it
together with the plan. This allows moving the selection of the best VF to the
planner. This seems to be a good fit anyway, and removes code from the
cost model that is not directly related to assigning costs to a specific
plan/VF. Later this can be swapped out with computing the cost for a
plan directly.

This may help to simplify D142015.

Differential Revision: https://reviews.llvm.org/D143938
This patch follows D89322 to add an initial skeleton of a VPlan-based cost model.

The difference is that instead of adding a cost() interface to VPRecipes,
all cost implementations are put together in VPlanCostModel.

This allows VPlanCostModel to concentrate on assigning costs to a VPlan,
thus separating the cost-model code from the VPlan IR, similar to LLVM IR cost
modeling.

During the transition, it will still use the legacy model to obtain costs until
the cost calculations for all recipes are implemented.

Please let me know if you agree with the main idea of this patch.
If there is a general consensus, I'll proceed to implement the cost for the
other recipes for review.

Differential Revision: https://reviews.llvm.org/D158716

- Address comments
- Move VPCM object outside of the loop
- Add getElementType() and getReturnElementType()
@artagnon (Contributor) left a comment

General idea seems sound, but this needs tests to convince me that LV behavior doesn't change with VPlan-based costing.

@@ -0,0 +1,71 @@
//===- SiFive_VPlanCostModel.cpp - Vectorizer Cost Model ------------------===//
Contributor

s/SiFive_//

Member

Still wrong headers

@rengolin rengolin self-requested a review September 28, 2023 17:41
@rengolin
Member

General idea seems sound, but this needs tests to convince me that LV behavior doesn't change with VPlan-based costing.

Indeed. Similar comment on the original Phab thread. Needs existing tests with flag to not change behaviour based on new cost model.

@fhahn
Contributor

fhahn commented Oct 2, 2023

Thanks for sharing this as a PR; it is great to see people interested in pushing in that direction!

Some high-level thoughts:

  • Using the VPlan cost model to pick the vectorization factor seems like a great idea!
  • Bring-up should happen on-by-default and gradually; if there are any issues, we want to know about it ASAP, otherwise it might be very very difficult to flip the switch.
  • If possible, the VPlan based cost model implementation should be completely separated from the legacy cost model, to make sure the VPlan cost model can stand on its own (and to avoid temptation)
  • The cost model should be as independent of the underlying IR as possible; some recipes may not have an underlying instruction to use and even if they do, the type info may not be correct (e.g. when applying truncation as VPlan-to-VPlan transform, https://reviews.llvm.org/D149903)
  • I think it makes sense to have the code to compute costs for recipes in the recipe itself; the recipe already knows how to generate the code, so it seems like the right place to also compute the cost of the generated code; that makes it even more important to be completely independent of the legacy cost model.

I tried to sketch an alternative structure in #67934 which addresses some of those points. It consists of type inference for VPValues as a building block.

It only implements costs for VPWidenRecipe and VPWidenIntOrFpInductionRecipe (limited to computing the same cost as the legacy cost model). Those are used by a getBestVPlan helper in LoopVectorizationPlanner, which asks recipes for their cost and, where not implemented yet, falls back to using the legacy cost model as a stop-gap until all recipes can compute their cost. For testing, the patch uses getBestPlan to get the VPlan to execute, combined with an assertion that the most profitable VF matches the VF computed by the legacy cost model. At the moment, this passes for SPEC2017, MultiSource and SingleSource on AArch64.
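
The per-recipe fallback described above can be sketched roughly as follows. This is an illustrative stand-in, not actual LLVM code: `Recipe`, `Plan`, and the cost fields are invented here to show the shape of the scheme, where each recipe either reports its own VPlan-based cost or defers to the legacy model, and the planner picks the plan with the lowest cost per lane.

```cpp
#include <cassert>
#include <limits>
#include <optional>
#include <vector>

struct Recipe {
  // Set when VPlan-based costing is implemented for this recipe.
  std::optional<int> planBasedCost;
  int legacyCost; // stand-in for querying the legacy cost model
};

struct Plan {
  int vf; // vectorization factor this plan was built for
  std::vector<Recipe> recipes;
};

// Cost of one plan: prefer the recipe's own cost, else fall back.
int planCost(const Plan &P) {
  int Total = 0;
  for (const Recipe &R : P.recipes)
    Total += R.planBasedCost.value_or(R.legacyCost);
  return Total;
}

// Pick the most profitable plan: lowest cost per lane processed.
const Plan *getBestPlan(const std::vector<Plan> &Plans) {
  const Plan *Best = nullptr;
  double BestCostPerLane = std::numeric_limits<double>::max();
  for (const Plan &P : Plans) {
    double CostPerLane = double(planCost(P)) / P.vf;
    if (CostPerLane < BestCostPerLane) {
      BestCostPerLane = CostPerLane;
      Best = &P;
    }
  }
  return Best;
}
```

An assertion comparing the chosen plan's VF against the legacy model's choice (as the PR does) would then catch any divergence during the transition.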

Before filling out all the cost implementations, I think we should converge on the overall structure. This PR already implements costs for several recipes, which is great as hopefully we will be able to fill in the gaps soon!

@arcbbb
Contributor Author

arcbbb commented Oct 3, 2023

Thanks for sharing this as a PR; it is great to see people interested in pushing in that direction!

Some high-level thoughts:

  • Using the VPlan cost model to pick the vectorization factor seems like a great idea!
  • Bring-up should happen on-by-default and gradually; if there are any issues, we want to know about it ASAP, otherwise it might be very very difficult to flip the switch.
  • If possible, the VPlan based cost model implementation should be completely separated from the legacy cost model, to make sure the VPlan cost model can stand on its own (and to avoid temptation)
  • The cost model should be as independent of the underlying IR as possible; some recipes may not have an underlying instruction to use and even if they do, the type info may not be correct (e.g. when applying truncation as VPlan-to-VPlan transform, https://reviews.llvm.org/D149903)
  • I think it makes sense to have the code to compute costs for recipes in the recipe itself; the recipe already knows how to generate the code, so it seems like the right place to also compute the cost of the generated code; that makes it even more important to be completely independent of the legacy cost model.

I tried to sketch an alternative structure in #67934 which addresses some of those points. It consists of type inference for VPValues as a building block.

It only implements costs for VPWidenRecipe and VPWidenIntOrFpInductionRecipe (limited to computing the same cost as the legacy cost model). Those are used by a getBestVPlan helper in LoopVectorizationPlanner, which asks recipes for their cost and, where not implemented yet, falls back to using the legacy cost model as a stop-gap until all recipes can compute their cost. For testing, the patch uses getBestPlan to get the VPlan to execute, combined with an assertion that the most profitable VF matches the VF computed by the legacy cost model. At the moment, this passes for SPEC2017, MultiSource and SingleSource on AArch64.

Before filling out all the cost implementations, I think we should converge on the overall structure. This PR already implements costs for several recipes, which is great as hopefully we will be able to fill in the gaps soon!

I agree that using the legacy cost model during the transition period is a prudent approach to ensure stability,
and I appreciate the alternative approach of type inference - it seems like a better way to go!

About computing the cost in the recipe itself, I see your point.
In addition to computing the throughput cost, we might consider other costs like register spills and loop-exit overhead, so I prefer a separate class for these cost computations, to avoid growing the VPRecipeBase interface and to treat it purely as a kind of IR.

Do you want to get your PR reviewed and merged first?

Here's hoping we can get those implementations filled in soon!

@fhahn
Contributor

fhahn commented Oct 13, 2023

Do you want to get your PR reviewed and merged first?

I think it would be good to make sure we get the right foundation in first. I created a separate PR for the type analysis only to start with: #69013

About computing the cost in the recipe itself, I see your point.
In addition to computing the throughput cost, we might consider other costs like register spills and loop-exit overhead, so I prefer a separate class for these cost computations, to avoid growing the VPRecipeBase interface and to treat it purely as a kind of IR.

I think it makes sense to consider those additional cases once we get to it.

@arcbbb
Contributor Author

arcbbb commented Oct 17, 2023

I think it would be good to make sure we get the right foundation in first. I created a separate PR for the type analysis only to start with: #69013
I think it makes sense to consider those additional cases once we get to it.

I agree, let's focus on the migration for now.

@nikolaypanchenko
Contributor

Thanks for sharing this as a PR; it is great to see people interested in pushing in that direction!

Some high-level thoughts:

  • Using the VPlan cost model to pick the vectorization factor seems like a great idea!
  • Bring-up should happen on-by-default and gradually; if there are any issues, we want to know about it ASAP, otherwise it might be very very difficult to flip the switch.
  • If possible, the VPlan based cost model implementation should be completely separated from the legacy cost model, to make sure the VPlan cost model can stand on its own (and to avoid temptation)
  • The cost model should be as independent of the underlying IR as possible; some recipes may not have an underlying instruction to use and even if they do, the type info may not be correct (e.g. when applying truncation as VPlan-to-VPlan transform, https://reviews.llvm.org/D149903)
  • I think it makes sense to have the code to compute costs for recipes in the recipe itself; the recipe already knows how to generate the code, so it seems like the right place to also compute the cost of the generated code; that makes it even more important to be completely independent of the legacy cost model.

I tried to sketch an alternative structure in #67934 which addresses some of those points. It consists of type inference for VPValues as a building block.

It only implements costs for VPWidenRecipe and VPWidenIntOrFpInductionRecipe (limited to computing the same cost as the legacy cost model). Those are used by a getBestVPlan helper in LoopVectorizationPlanner, which asks recipes for their cost and, where not implemented yet, falls back to using the legacy cost model as a stop-gap until all recipes can compute their cost. For testing, the patch uses getBestPlan to get the VPlan to execute, combined with an assertion that the most profitable VF matches the VF computed by the legacy cost model. At the moment, this passes for SPEC2017, MultiSource and SingleSource on AArch64.

Before filling out all the cost implementations, I think we should converge on the overall structure. This PR already implements costs for several recipes, which is great as hopefully we will be able to fill in the gaps soon!

@fhahn
I understand that adding ::computeCost to VPlan/VPRecipe/VPInstruction/etc. is aligned with the current design of VPlan, which has its ::execute methods there. However, the cost model is usually something downstream compilers modify and may not want to upstream due to their policies. With that in mind, a dedicated object that takes a VPlan and computes its cost makes life easier for downstream compilers to modify the code without adding too much pain to pulldowns.
That is, if the upstream compiler has a base version:

class VPlanCostModel {
  virtual int64_t getCost(VPlanPtr);
  virtual int64_t getCost(VPInstruction *VPInst);
  ...
};

a downstream compiler that wants its own implementation may override some of these functions:

class VPlanCostModelDownstream : public VPlanCostModel {
  int64_t getCost(VPInstruction *VPInst) override;
  ...
};

with a minor change to how the VPCM is constructed.
In both cases an extra object is necessary, but the entire state/context of the cost model is isolated in VPlanCostModel.
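
As a concrete (hypothetical, non-LLVM) illustration of that shape, here is a minimal sketch in which the vendor subclass overrides only one hook and inherits the rest. The types and the opcode value are invented for the example:

```cpp
#include <cassert>

struct VPInstruction {
  int Opcode;
};

class VPlanCostModel {
public:
  virtual ~VPlanCostModel() = default;
  // Upstream default: every instruction costs 1 (placeholder heuristic).
  virtual long getCost(const VPInstruction &I) { return 1; }
};

// Downstream vendor overrides only what differs on its hardware;
// everything else falls through to the upstream base implementation.
class VPlanCostModelDownstream : public VPlanCostModel {
public:
  long getCost(const VPInstruction &I) override {
    if (I.Opcode == 42) // e.g. an operation that is expensive on this target
      return 10;
    return VPlanCostModel::getCost(I);
  }
};
```

The caller only needs to construct the derived model, so the vendor-specific code stays out of the shared costing logic entirely.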

By the way, do you know why the ::execute methods were added to VPlan instead of having something like a VPlanCodeGen?

@nikolaypanchenko
Contributor

@fhahn ping

Member

@rengolin rengolin left a comment


General idea seems sound, but this needs tests to convince me that LV behavior doesn't change with VPlan-based costing.

Indeed. Similar comment on the original Phab thread. Needs existing tests with flag to not change behaviour based on new cost model.

The cost model has moved inside the VPlan infrastructure and is no longer called from the same place in the same way. Please provide evidence that, even without the vplan-based cost model flag, the behaviour of the vectorizer hasn't changed.

Performance numbers on some benchmarks, traces on the test-suite showing what gets vectorized and what doesn't, etc.

If this is just about the migration, then it should behave in the same way as before without the optional flag.

@@ -0,0 +1,71 @@
//===- SiFive_VPlanCostModel.cpp - Vectorizer Cost Model ------------------===//
Member


Still wrong headers

@@ -370,6 +388,25 @@ class LoopVectorizationPlanner {
void adjustRecipesForReductions(VPBasicBlock *LatchVPBB, VPlanPtr &Plan,
VPRecipeBuilder &RecipeBuilder,
ElementCount MinVF);

/// Returns true when Factor A is more profitable than Factor B.
bool isMoreProfitable(const VectorizationFactor &A,
Member


I'd have guessed this is part of the cost model, not the planner.

assert(ExpectedCost.isValid() && "Unexpected invalid cost for scalar loop");
assert(VFCandidates.count(ElementCount::getFixed(1)) &&
"Expected Scalar VF to be a candidate");
bool LoopVectorizationPlanner::isMoreProfitable(
Member


Since working with VPlan is cheaper, we'll be doing a lot more cost-model checks and comparing the same VPlan against other VPlans, so I suggest we cache the cost of VPlans and only compute the ones that haven't been computed yet.

Otherwise we'll be re-running the cost checks across the same VPlans over and over.
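
The caching being suggested could be as simple as a memoization wrapper keyed by plan identity. The following is a toy sketch with invented stand-in types (`VPlanStub`, the per-recipe cost of 3), not the real planner:

```cpp
#include <cassert>
#include <unordered_map>

struct VPlanStub {
  int Id;      // stable identity of the plan
  int Recipes; // stand-in for the plan's contents
};

class CachedPlanCost {
  std::unordered_map<int, long> Cache; // plan id -> memoized cost
  int Computations = 0;                // instrumentation for the sketch
public:
  long getCost(const VPlanStub &P) {
    auto It = Cache.find(P.Id);
    if (It != Cache.end())
      return It->second; // repeated isMoreProfitable queries hit the cache
    ++Computations;
    long Cost = long(P.Recipes) * 3; // stand-in for the real recipe walk
    Cache.emplace(P.Id, Cost);
    return Cost;
  }
  int computations() const { return Computations; }
};
```

Comparing VF candidates pairwise then recomputes nothing: each plan's cost is walked once, however many comparisons it participates in.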

@fhahn
Contributor

fhahn commented Apr 28, 2024

@fhahn I understand that adding ::computeCost to VPlan/VPRecipe/VPInstruction/etc. is aligned with the current design of VPlan, which has its ::execute methods there. However, the cost model is usually something downstream compilers modify and may not want to upstream due to their policies. With that in mind, a dedicated object that takes a VPlan and computes its cost makes life easier for downstream compilers to modify the code without adding too much pain to pulldowns. That is, if the upstream compiler has a base version:

class VPlanCostModel {
  virtual int64_t getCost(VPlanPtr);
  virtual int64_t getCost(VPInstruction *VPInst);
  ...
};

a downstream compiler that wants its own implementation may override some of these functions:

class VPlanCostModelDownstream : public VPlanCostModel {
  int64_t getCost(VPInstruction *VPInst) override;
  ...
};

with a minor change to how the VPCM is constructed. In both cases an extra object is necessary, but the entire state/context of the cost model is isolated in VPlanCostModel.

At the moment, computing costs is framed in terms of the IR instructions we generate (i.e. what ::execute will generate) and target-specific info is pulled in via TTI only. In that sense, I think it makes sense to keep them defined in the same place. Do you by any chance have some examples where TTI would not be sufficient for downstream customizations?

In general, I think llvm tries to limit target-specific customizations in the middle-end to TTI and possibly via intrinsics to encourage as much code-sharing upstream as possible. I am not aware of other areas in the middle-end where other means are provided to customize for downstream users explicitly, although I may be missing something and I don't think it is an official policy that's spelled out anywhere.

btw, do you know why ::execute methods were added to a VPlan instead of having something like VPlanCodeGen ?

I think the original motivation was to have codegen for each recipe in the recipes directly, as it is strongly tied to the semantics of the recipe and ensure that it is easy to see what the implementation of the recipe looks like in terms of codegen.

General idea seems sound, but this needs tests to convince me that LV behavior doesn't change with VPlan-based costing.
The cost model has moved inside the VPlan infrastructure and is no longer called from the same place in the same way. Please provide evidence that, even without the vplan-based cost model flag, the behaviour of the vectorizer hasn't changed.

Performance numbers on some benchmarks, traces on the test-suite showing what gets vectorized and what doesn't, etc.

If this is just about the migration, then it should behave in the same way as before without the optional flag.

Finally had time to get #67934 updated to a version that has the VPlan-based and legacy cost-models agree on the VF for large code-bases (llvm-test-suite, SPEC2017, Clang bootstrap) and multiple configurations (AArch64 with/wo SVE, with/wo -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue, X86 with AVX512). In the PR, the VPlan-based cost-model is used to select the VF and best plan. Note it still falls back to the legacy cost-model for most cases (VPlan-based costing is implemented for VPWidenRecipe initially), but allows gradually transitioning.

@nikolaypanchenko
Contributor

General idea seems sound, but this needs tests to convince me that LV behavior doesn't change with VPlan-based costing.

Indeed. Similar comment on the original Phab thread. Needs existing tests with flag to not change behaviour based on new cost model.

The cost model has moved inside the VPlan infrastructure and is no longer called from the same place in the same way. Please provide evidence that, even without the vplan-based cost model flag, the behaviour of the vectorizer hasn't changed.

Performance numbers on some benchmarks, traces on the test-suite showing what gets vectorized and what doesn't, etc.

If this is just about the migration, then it should behave in the same way as before without the optional flag.

Is there a list of benchmarks we should care about for the community? Spec2k6, tsvc?
To me, enabling the VPlan-based cost model in one shot is not trivial work, not only because of benchmarks but also because of multiple targets. We could enable it per target, and we would take responsibility for doing that for RISC-V first.

@fhahn
Contributor

fhahn commented Apr 30, 2024

Is there a list of benchmarks we should care about for the community? Spec2k6, tsvc? To me, enabling the VPlan-based cost model in one shot is not trivial work, not only because of benchmarks but also because of multiple targets. We could enable it per target, and we would take responsibility for doing that for RISC-V first.

After my recent update to #67934, I think we should be able to gradually migrate to the VPlan-based cost-model on a per-recipe basis, starting with VPWidenRecipe. Other recipes can be converted gradually, reducing the potential number of regressions.

As I mentioned earlier, with #67934 , the chosen VFs agree with the legacy cost model for a large test set (llvm-test-suite (includes tsvc and many others) , SPEC2017, clang bootstrap, some large internal projects on multiple platforms and configs, more details in my previous comment #67647 (comment)). There's no pre-defined set of test suites I think, but the ones I mentioned should cover a lot of ground.

@nikolaypanchenko
Contributor

nikolaypanchenko commented Apr 30, 2024

At the moment, computing costs is framed in terms of the IR instructions we generate (i.e. what ::execute will generate) and target-specific info is pulled in via TTI only. In that sense, I think it makes sense to keep them defined in the same place. Do you by any chance have some examples where TTI would not be sufficient for downstream customizations?

Not sure I follow the TTI reasoning. llvm::Instruction does not have getCost or execute methods, unlike VPRecipe. So I do see TTI/BasicTTI as a dedicated object to compute the cost of instruction(s). Unlike the vectorizer's cost model, TTI was designed to be as caller-agnostic as possible, so it's the caller's responsibility to estimate other context-related information.
Regarding TTI's downstream customization, I'd say it's first important to note that X86, ARM, RISC-V, etc. have their own TTI implementations. At least for RISC-V we didn't find a reason to have our own TTI; however, it is possible to have a vendor-specific TTI.
My points about a separate object for the cost model are:

  • separation of concerns: VPlan represents a possible vectorization, the cost model estimates the cost of that vectorization.
  • a clear single instance to estimate the cost. Unless context doesn't matter, VPRecipe::getCost has to accept some "state", so I don't see how it's going to be beneficial from an encapsulation point of view.
    Regarding vendor-specific heuristics, as I mentioned earlier, a separate object to evaluate the cost does not mean the absence of a default cost model that a vendor can use as a base class: BasicTTI -> RISCVTTI

Something similar goes for ::execute and SelectionDAG.

In general, I think llvm tries to limit target-specific customizations in the middle-end to TTI and possibly via intrinsics to encourage as much code-sharing upstream as possible. I am not aware of other areas in the middle-end where other means are provided to customize for downstream users explicitly, although I may be missing something and I don't think it is an official policy that's spelled out anywhere.

Yeah, I can understand why "downstream has to deal with it" is beneficial for upstream. My biggest worry is that, with a variety of different hardware, this might become a big concern for everyone in the long run.

I think the original motivation was to have codegen for each recipe in the recipes directly, as it is strongly tied to the semantics of the recipe and ensure that it is easy to see what the implementation of the recipe looks like in terms of codegen.

Agreed, that sounds like a reasonable view. At the same time, a recipe itself has well-defined semantics, and there could be a few ways to generate code for it.

@rengolin
Member

There's no pre-defined set of test suites I think, but the ones I mentioned should cover a lot of ground.

Indeed, this isn't about performance versus each other, but about making sure the cost-model choices aren't different, by looking at the debug traces while running over those programs. They match, so we're good.

@arcbbb
Contributor Author

arcbbb commented May 6, 2024

At the moment, computing costs is framed in terms of the IR instructions we generate (i.e. what ::execute will generate) and target-specific info is pulled in via TTI only. In that sense, I think it makes sense to keep them defined in the same place. Do you by any chance have some examples where TTI would not be sufficient for downstream customizations?

I'm inclined to proceed with your approach outlined in #67934. However, I believe it's important to address the ongoing discussion in this thread regarding the potential separation of the cost model from VPlan.

@nikolaypanchenko's observation that TTI is caller-agnostic highlights the need for callers to estimate additional context-related information.
Expanding on this: while TTI does offer target-specific information at the instruction level, the consideration of cost could extend beyond the recipe level to cover basic-block and cross-basic-block levels. By separating the cost model from VPlan, we can integrate heuristics at both levels, including considerations like register pressure and spill/fill estimates. Such an approach aligns with our overall objectives by providing flexibility and granularity in our cost modeling.
Would you agree that this separation could enhance our cost-modeling capabilities?

@fhahn
Contributor

fhahn commented Jun 7, 2024

Not sure I follow the TTI reasoning. llvm::Instruction does not have getCost or execute methods, unlike VPRecipe. So I do see TTI/BasicTTI as a dedicated object to compute the cost of instruction(s). Unlike the vectorizer's cost model, TTI was designed to be as caller-agnostic as possible, so it's the caller's responsibility to estimate other context-related information. Regarding TTI's downstream customization, I'd say it's first important to note that X86, ARM, RISC-V, etc. have their own TTI implementations. At least for RISC-V we didn't find a reason to have our own TTI; however, it is possible to have a vendor-specific TTI. My points about a separate object for the cost model are:

  • separation of concerns: VPlan represents a possible vectorization, the cost model estimates the cost of that vectorization.
  • a clear single instance to estimate the cost. Unless context doesn't matter, VPRecipe::getCost has to accept some "state", so I don't see how it's going to be beneficial from an encapsulation point of view.
    Regarding vendor-specific heuristics, as I mentioned earlier, a separate object to evaluate the cost does not mean the absence of a default cost model that a vendor can use as a base class: BasicTTI -> RISCVTTI

With regards to TTI, recipes are defined in terms of the IR instructions they generate, so defining the cost of a recipe next to the place where its code is generated seems natural (and I think this also matches the original intended design). We need some state, but that should eventually be limited to TTI and some analyses like type inference.

Yeah, I can understand why "downstream has to deal with it" is beneficial for upstream. My biggest worry is that, with a variety of different hardware, this might become a big concern for everyone in the long run.

I also mention this in the response below, but I think the first step doesn't mean that everything needs to fit the initial structure in the long run. Once there's a concrete use case that benefits from a different structure we can adjust, while making sure that there's a clear improvement to the current state that's testable.

At the moment, computing costs is framed in terms of the IR instructions we generate (i.e. what ::execute will generate) and target-specific info is pulled in via TTI only. In that sense, I think it makes sense to keep them defined in the same place. Do you by any chance have some examples where TTI would not be sufficient for downstream customizations?

I'm inclined to proceed with your approach outlined in #67934. However, I believe it's important to address the ongoing discussion in this thread regarding the potential separation of the cost model from VPlan.

@nikolaypanchenko's observation that TTI is caller-agnostic highlights the need for callers to estimate additional context-related information. Expanding on this: while TTI does offer target-specific information at the instruction level, the consideration of cost could extend beyond the recipe level to cover basic-block and cross-basic-block levels. By separating the cost model from VPlan, we can integrate heuristics at both levels, including considerations like register pressure and spill/fill estimates. Such an approach aligns with our overall objectives by providing flexibility and granularity in our cost modeling. Would you agree that this separation could enhance our cost-modeling capabilities?

The initial VPlan-based cost model should IMO be solely focused on evaluating the cost per recipe, as this directly maps to the generated IR for which we can query costs. One of the design goals is to model concepts explicitly in VPlan, to avoid having to look across recipes, regions or blocks to compute an accurate cost for any given recipe (e.g. modeling interleave groups explicitly). The only concrete example I know of where that is not possible at the moment is arithmetic vector instructions that can also sign/zero-extend some of their operands.

Currently TTI tries to solve this by looking at the users of a given instruction to determine whether a cast is free or not. This won’t work with VPlan, as TTI cannot traverse VPlan’s def-use chains (nor should it, IMO). To address this, it could be modeled by a transform that replaces zext/add recipe pairs with a widening-add recipe for targets that have that capability.
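
A transform like the one described could look roughly like this. The recipe representation below (`Op`, `RecipeNode`, `foldZExtIntoAdd`) is a toy invented for illustration, not real VPlan code; it just shows the pattern-match-and-rewrite idea of folding a zext feeding an add into one widening-add node whose cost can then be queried as a single operation:

```cpp
#include <cassert>
#include <vector>

enum class Op { ZExt, Add, WideningAdd, Other };

struct RecipeNode {
  Op Opcode;
  std::vector<RecipeNode *> Operands;
};

// If R is an Add with a ZExt operand, rewrite it into a WideningAdd that
// consumes the ZExt's source directly; the ZExt then becomes dead and
// could be removed by a later cleanup.
bool foldZExtIntoAdd(RecipeNode &R) {
  if (R.Opcode != Op::Add)
    return false;
  for (RecipeNode *&Operand : R.Operands) {
    if (Operand->Opcode == Op::ZExt && Operand->Operands.size() == 1) {
      Operand = Operand->Operands[0]; // bypass the extend
      R.Opcode = Op::WideningAdd;
      return true;
    }
  }
  return false;
}
```

After such a rewrite, a per-recipe cost model never needs to inspect users to decide whether the extend is free: the combined recipe carries that information by construction.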

There will be other analyses that are not interested in the cost of a single recipe but in other metrics, e.g. resource usage or register pressure when deciding on the interleave count. But those seem to be completely separate from assigning a cost per instruction and could be done separately.

Switching to a VPlan-based cost model is likely to cause some disruption if we aren’t careful, so to start with I think we should focus on moving forward with targeted steps, with everything we bring up being used and tested by default. If in the future there are concrete cases where we could benefit from a different structure in terms of where and how cost is computed, we should re-evaluate at that point.

@fhahn
Contributor

fhahn commented Jun 7, 2024

Just to close the loop here as well, #92555 is the new version based on #67934 with the interfaces refactored/rearranged a bit.

fhahn added a commit that referenced this pull request Jun 13, 2024
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to 
compute their own costs step-by-step.

It also adds a getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.

The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.

Additional discussions and context can be found in @arcbbb's
#67647 and 
#67934 which is an earlier
version of the current PR.


PR: #92555
@nikolaypanchenko
Contributor

With regards to TTI, recipes are defined in terms of the IR instructions they generate, so defining the cost of a recipe next to the place where its code is generated seems natural (and I think this also matches the original intended design). We need some state, but that should eventually be limited to TTI and some analyses like type inference.

Isn't it more about code placement? CM::getCost(VPInstruction *) can be placed near the execute function to satisfy that requirement. Similarly, it's possible to place VPInstruction::getCost in a separate file and miss it during code review.

I also mention this in the response below, but I think the first step doesn't mean that everything needs to fit the initial structure in the long run. Once there's a concrete use case that benefits from a different structure we can adjust, while making sure that there's a clear improvement to the current state that's testable.

I'm still not sure I understand what the big no-no of a dedicated class to evaluate the cost is. What's the big yes-yes for your proposed approach?

One of the design goals is to model concepts explicitly in VPlan, to avoid having to look across recipes, regions or blocks to compute an accurate cost for any given recipe (e.g. modeling interleave groups explicitly). At the moment, the only concrete example I know of where that is not possible is arithmetic vector instructions that can also do a sign/zero-extend of some of their operands.

There are multiple cases where "looking across recipes" is needed. The most important one, which already exists in some form, is estimation of register pressure.

The initial VPlan-based cost model should IMO be solely focused on evaluating the cost per recipe, as this directly maps to the generated IR for which we can query cost.
There will be other analyses that are not interested in the cost of a single recipe but in other metrics, e.g. resource usage or register pressure when deciding on the interleave count. But those seem to be completely separate from assigning a cost per instruction and could be done completely separately.

Switching to a VPlan-based cost-model is likely to cause some disruption if we aren’t careful, so I think to start with we should focus on moving forward with targeted steps, with everything we bring up being used and tested by default. If in the future there are concrete cases where we could benefit from a different structure in terms of where cost is computed and how, we should re-evaluate once we are at this point.

I'd say it's fine, but since your plan is to introduce VPCostContext, which is another dedicated object to construct and carry, that already seems to be a plan to do more than only compute cost per recipe.

Just to close the loop here as well, #92555 is the new version based on #67934 with the interfaces refactored/rearranged a bit.

I do think something like that needs a broader discussion, not only between a couple of people. The Tuesday meeting seems to be the right venue.

@fhahn
Contributor

fhahn commented Jun 13, 2024

With regards to TTI, recipes are defined in terms of the IR instructions they generate, so defining the cost of a recipe next to the place where its code is generated seems natural (and I think this also matches the original intended design). We need some state, but that should be limited only to TTI and some analyses like type inference eventually

Isn't it more about code placement? CM::getCost(VPInstruction *) can be placed near the execute function to satisfy that requirement. Similarly, it's possible to place VPInstruction::getCost in a separate file and miss it during code review.

Sure, in the end the whole discussion boils down to code placement.

I also mention this in the response below, but I think the first step doesn't mean that everything needs to fit the initial structure in the long run. Once there's a concrete use case that benefits from a different structure we can adjust, while making sure that there's a clear improvement to the current state that's testable.

I'm still not sure I understand what the big no-no of a dedicated class to evaluate the cost is. What's the big yes-yes for your proposed approach?

I don't think there's a big yes-yes or no-no for either. I think the most important bit is getting started on transitioning the cost computations without regressions; using overloading is probably slightly more compact than having a separate cost model class with some kind of type switch initially, but I don't feel strongly about it. I mostly have been trying to better understand the benefit/desire to have it separate.

There are multiple cases where "looking across recipes" is needed. The most important one, which already exists in some form, is estimation of register pressure.

Is this the case in the current cost-model for instruction costs? We try to estimate the register pressure when picking the interleave factor, but that's completely separate from computing instruction costs at least at the moment?

The initial VPlan-based cost model should IMO be solely focused on evaluating the cost per recipe, as this directly maps to the generated IR for which we can query cost.
There will be other analyses that are not interested in the cost of a single recipe but in other metrics, e.g. resource usage or register pressure when deciding on the interleave count. But those seem to be completely separate from assigning a cost per instruction and could be done completely separately.

Switching to a VPlan-based cost-model is likely to cause some disruption if we aren’t careful, so I think to start with we should focus on moving forward with targeted steps, with everything we bring up being used and tested by default. If in the future there are concrete cases where we could benefit from a different structure in terms of where cost is computed and how, we should re-evaluate once we are at this point.

I'd say it's fine, but since your plan is to introduce VPCostContext, which is another dedicated object to construct and carry, that already seems to be a plan to do more than only compute cost per recipe.

I'm not sure why that would be the case? It only contains global info needed for TTI queries, plus state that's used temporarily during the transition from the legacy CM, which unfortunately seems needed during the initial bringup.
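For illustration, a minimal sketch of what such a context could look like (hypothetical names and fields, not the actual LLVM declaration): it bundles the analyses needed for TTI queries with the temporary transition state, and contains no per-recipe cost logic of its own.

```cpp
#include <unordered_set>

// Hypothetical stand-ins for the analyses the context carries.
struct TargetTransformInfo {
  unsigned getRegisterBitWidth() const { return 128; }
};
struct TypeAnalysis {
  unsigned inferWidthInBits(unsigned /*RecipeId*/) const { return 32; }
};

// Sketch of a cost context: global info for TTI queries plus transitional
// state listing recipes whose cost is still taken from the legacy model.
struct VPCostContext {
  const TargetTransformInfo &TTI;
  const TypeAnalysis &Types;
  std::unordered_set<unsigned> SkipCostComputation;

  bool skipForLegacy(unsigned RecipeId) const {
    return SkipCostComputation.count(RecipeId) != 0;
  }
};
```

The key property debated here is that the context is passive: it is constructed once from the analyses and threaded through the cost queries, rather than owning the per-recipe cost logic itself.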

@nikolaypanchenko
Contributor

I don't think there's a big yes-yes or no-no for either. I think the most important bit is getting started on transitioning the cost computations without regressions; using overloading is probably slightly more compact than having a separate cost model class with some kind of type switch initially, but I don't feel strongly about it. I mostly have been trying to better understand the benefit/desire to have it separate.

Understood. Having a proper separation of concerns, as I mentioned earlier, is a big reason for me (otherwise it's not clear what should be part of VPlan/Recipe and what should not: should transforms and verification be part of VPlan?)

Is this the case in the current cost-model for instruction costs? We try to estimate the register pressure when picking the interleave factor, but that's completely separate from computing instruction costs at least at the moment?

That's exactly the reason we added a VPlan-based cost model internally: the current regpressure heuristic is only used to trim down VF/IF candidates, but is not used later during loop cost estimation. Without it, a loop with multiple live registers and zexts was assumed profitable.

I'm not sure why that would be the case? It only contains global info needed for TTI queries, plus state that's used temporarily during the transition from the legacy CM, which unfortunately seems needed during the initial bringup.

If we only consider basic estimation, the entire debate from my side is "tomato-tomato": in both cases an object needs to be constructed:

VPCostContext Ctx(Analyses);
Plan.cost(...Ctx..., VF)

vs

VPlanCostModel CM(Analyses);
CM.cost(Plan, VF)

but when it becomes essential to know the context around an instruction, I do see that proper separation matters.
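The two shapes being compared can be made concrete in a toy form (all names illustrative, not the real interfaces): in one, a passive context is threaded through Plan::cost; in the other, a dedicated cost-model class walks the plan. For pure per-recipe summation the two are interchangeable, which is the "tomato-tomato" point.

```cpp
#include <vector>

struct Analyses { unsigned CostPerOp = 2; }; // stand-in for TTI etc.
struct Recipe { unsigned NumOps; };

struct Plan {
  std::vector<Recipe> Recipes;

  // Option 1: the plan walks itself; the context only carries analyses.
  struct VPCostContext { const Analyses &A; };
  unsigned cost(const VPCostContext &Ctx, unsigned VF) const {
    unsigned C = 0;
    for (const Recipe &R : Recipes)
      C += R.NumOps * Ctx.A.CostPerOp * VF;
    return C;
  }
};

// Option 2: a dedicated cost-model class owns the walk instead.
struct VPlanCostModel {
  const Analyses &A;
  unsigned cost(const Plan &P, unsigned VF) const {
    unsigned C = 0;
    for (const Recipe &R : P.Recipes)
      C += R.NumOps * A.CostPerOp * VF;
    return C;
  }
};
```

The difference only becomes material once cross-recipe information (e.g. register pressure) has to influence the per-recipe result, which is where the two designs start to diverge.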

fhahn added a commit that referenced this pull request Jun 14, 2024
This reverts commit 46080ab.

Extra tests have been added in 52d29eb.

Original message:
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to
compute their own costs step-by-step.

It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.

The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.

Additional discussions and context can be found in @arcbbb's
#67647 and
#67934 which is an earlier
version of the current PR.

PR: #92555
@fhahn
Contributor

fhahn commented Jun 14, 2024

I don't think there's a big yes-yes or no-no for either. I think the most important bit is getting started on transitioning the cost computations without regressions; using overloading is probably slightly more compact than having a separate cost model class with some kind of type switch initially, but I don't feel strongly about it. I mostly have been trying to better understand the benefit/desire to have it separate.

Understood. Having a proper separation of concerns, as I mentioned earlier, is a big reason for me (otherwise it's not clear what should be part of VPlan/Recipe and what should not: should transforms and verification be part of VPlan?)

Is this the case in the current cost-model for instruction costs? We try to estimate the register pressure when picking the interleave factor, but that's completely separate from computing instruction costs at least at the moment?

That's exactly the reason we added a VPlan-based cost model internally: the current regpressure heuristic is only used to trim down VF/IF candidates, but is not used later during loop cost estimation. Without it, a loop with multiple live registers and zexts was assumed profitable.

Would you be able to share an IR example for the ZExt issue? It would be interesting to understand whether it is similar to the AArch64 issue, and in particular whether Ext/other opcode combinations are involved.

(I recommitted 90fd99c to get additional signal on cases where it diverges, so I can fix those in parallel with wrapping up the discussion on how exactly to structure the code.)

@nikolaypanchenko
Contributor

Would you be able to share an IR example for the ZExt issue? It would be interesting to understand whether it is similar to the AArch64 issue, and in particular whether Ext/other opcode combinations are involved.

Sure: https://gist.github.com/nikolaypanchenko/41279c3330d6744cec22eda571fe9a4d

fhahn added a commit that referenced this pull request Jun 20, 2024
This reverts commit 6f538f6.

Extra tests for crashes discovered when building Chromium have been
added in fb86cb7, 3be7312.

Original message:
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to
compute their own costs step-by-step.

It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.

The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.

Additional discussions and context can be found in @arcbbb's
#67647 and
#67934 which is an earlier
version of the current PR.

PR: #92555
AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
This reverts commit 6f538f6.

Extra tests for crashes discovered when building Chromium have been
added in fb86cb7, 3be7312.

Original message:
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to
compute their own costs step-by-step.

It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.

The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.

Additional discussions and context can be found in @arcbbb's
llvm#67647 and
llvm#67934 which is an earlier
version of the current PR.

PR: llvm#92555
fhahn added a commit that referenced this pull request Jul 10, 2024
This reverts commit 6f538f6.

A number of crashes have been fixed by separate fixes, including
https://github.com//pull/96622. This version of the
PR also pre-computes the costs for branches (except the latch) instead
of computing their costs as part of costing of replicate regions, as
there may not be a direct correspondence between original branches and
number of replicate regions.

Original message:
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to
compute their own costs step-by-step.

It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.

The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.

Additional discussions and context can be found in @arcbbb's
#67647 and
#67934 which is an earlier
version of the current PR.

PR: #92555
aaryanshukla pushed a commit to aaryanshukla/llvm-project that referenced this pull request Jul 14, 2024
This reverts commit 6f538f6.

A number of crashes have been fixed by separate fixes, including
https://github.com/llvm/pull/96622. This version of the
PR also pre-computes the costs for branches (except the latch) instead
of computing their costs as part of costing of replicate regions, as
there may not be a direct correspondence between original branches and
number of replicate regions.

Original message:
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to
compute their own costs step-by-step.

It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.

The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.

Additional discussions and context can be found in @arcbbb's
llvm#67647 and
llvm#67934 which is an earlier
version of the current PR.

PR: llvm#92555
EthanLuisMcDonough pushed a commit to EthanLuisMcDonough/llvm-project that referenced this pull request Aug 13, 2024
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to 
compute their own costs step-by-step.

It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.

The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.

Additional discussions and context can be found in @arcbbb's
llvm#67647 and 
llvm#67934 which is an earlier
version of the current PR.


PR: llvm#92555
@arcbbb
Contributor Author

arcbbb commented Sep 26, 2024

As #92555 is merged, this can be closed.

@arcbbb arcbbb closed this Sep 26, 2024
8 participants