[FEA] Add new Expression execution APIs #7259

Open
revans2 opened this issue Dec 5, 2022 · 0 comments
Labels
feature request New feature or request reliability Features to improve reliability or bugs that severely impact the reliability of the plugin

revans2 commented Dec 5, 2022

Is your feature request related to a problem? Please describe.
In parallel with or after #7258, we should add new expression execution APIs that let us execute expressions without exceeding memory budgets.

There are several different classes of expressions that we need to worry about.

  • Unknown expressions. These are expressions like UDFs that have no way to fit into this pattern yet. Hopefully, once we have worked on this enough, we can make some of these APIs public so that RAPIDS UDFs can fit into this pattern too.
  • Up front expressions. These are expressions for which we can know, with high accuracy and based only on the number of input rows, how much memory they will take to process and to output a result. Things like Add, which take a fixed-size input and produce a fixed-size output all within a single kernel, fit in this category.
  • Predictable expressions. These are expressions for which we can predict roughly how much memory will be needed to process the data, but the prediction requires the actual data.
  • Stateful/Modifying expressions. These are expressions like IF/ELSE with children that have side effects, or many of the expressions that take higher order functions. They are special because they change the batch that is passed down to their children and keep some state around. IF/ELSE can split the rows in almost random ways. Many higher order functions can explode the input data and keep the original offsets as state.

There are also several expressions that are really only expressions in name, because they cannot be executed by a project. These include aggregations, window functions, and explode functions. We don't need to worry about them in this context.

The idea here is to take a few up front expressions and a few predictable expressions, and to write APIs and implementations for them that would allow an execution framework to avoid running past a memory budget. The hard part is that some expressions, like GpuCast, can be different things in different situations, so the APIs need to be somewhat dynamic and selectable at runtime.
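As a rough illustration of the runtime selection described above, here is a minimal, language-neutral sketch in Python. Everything in it is hypothetical: `MemoryClass`, the `Cast` class, and the rule that fixed-width to fixed-width casts are "up front" while anything else is "predictable" are assumptions made for the example, not the plugin's actual `GpuCast` behavior or API.

```python
from dataclasses import dataclass
from enum import Enum, auto


class MemoryClass(Enum):
    """The four classes of expressions listed in this issue."""
    UNKNOWN = auto()      # e.g. UDFs: no estimate available yet
    UP_FRONT = auto()     # cost depends only on the number of input rows
    PREDICTABLE = auto()  # cost can be predicted, but only from the actual data
    STATEFUL = auto()     # changes the batch passed to its children


# Assumed set of fixed-width types, for illustration only.
FIXED_WIDTH = {"int", "long", "float", "double"}


@dataclass(frozen=True)
class Cast:
    """Hypothetical stand-in for a cast expression (not the real GpuCast)."""
    from_type: str
    to_type: str

    def memory_class(self) -> MemoryClass:
        # The same expression reports a different class depending on
        # how it is configured, so the choice is made at runtime.
        if self.from_type in FIXED_WIDTH and self.to_type in FIXED_WIDTH:
            return MemoryClass.UP_FRONT
        # Assumption: variable-width results depend on the data itself.
        return MemoryClass.PREDICTABLE
```

The point of the sketch is only that the class is a property of a configured expression instance, not of the expression type.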

The idea is for up front expressions to provide an API that returns the output size and the intermediate memory used, based on the number of input rows. Predictable expressions would need a new execution API where their inputs are passed in as parameters along with a budget, instead of passing in a ColumnarBatch and letting the expression pull the results from its children. If the expression can complete within the allotted budget, it would just return the answer. If it could not, it would return an error rather than throw one. That would let the framework decide whether to take action, such as splitting up the inputs.
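The two API shapes described above could look roughly like the following Python sketch. All names (`MemoryEstimate`, `add_estimate`, `OverBudget`, `upper_with_budget`, `run_with_splitting`) are hypothetical, and the cost model (UTF-8 bytes of the input strings) is a toy assumption. It is meant to show the control flow, not a proposed implementation.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class MemoryEstimate:
    output_bytes: int
    intermediate_bytes: int


def add_estimate(num_rows: int, bytes_per_value: int = 8) -> MemoryEstimate:
    """Up front API: the estimate depends only on the row count.
    Assumes one fixed-width output value per row and no intermediates."""
    return MemoryEstimate(num_rows * bytes_per_value, 0)


@dataclass(frozen=True)
class OverBudget:
    """Returned (not raised) when the work would not fit in the budget."""
    needed_bytes: int


def upper_with_budget(values: List[str],
                      budget_bytes: int) -> Union[List[str], OverBudget]:
    """Predictable API: inputs and a budget are passed in as parameters.
    Toy cost model: the UTF-8 size of the input data."""
    needed = sum(len(v.encode("utf-8")) for v in values)
    if needed > budget_bytes:
        return OverBudget(needed)
    return [v.upper() for v in values]


def run_with_splitting(values: List[str], budget_bytes: int) -> List[str]:
    """Framework side: on OverBudget, split the input in half and retry."""
    result = upper_with_budget(values, budget_bytes)
    if not isinstance(result, OverBudget):
        return result
    if len(values) <= 1:
        # A single row that does not fit cannot be split any further.
        raise MemoryError(f"a single row needs {result.needed_bytes} bytes")
    mid = len(values) // 2
    return (run_with_splitting(values[:mid], budget_bytes)
            + run_with_splitting(values[mid:], budget_bytes))
```

Returning `OverBudget` as a value keeps the decision with the framework: the expression only reports that it cannot fit, and the caller chooses whether to split, spill, or give up.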

This will likely need to be developed going back and forth with the framework that executes the code. But for now I am splitting it up into a separate issue.

@revans2 revans2 added feature request New feature or request ? - Needs Triage Need team to review and classify labels Dec 5, 2022
@sameerz sameerz added the reliability Features to improve reliability or bugs that severely impact the reliability of the plugin label Dec 6, 2022
@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Dec 8, 2022