-
Notifications
You must be signed in to change notification settings - Fork 230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add UDF compiler implementations #497
Conversation
Hi, should we organize this change into smaller ones to make it more reviewable? |
I think the tests could be done separately, it is nearly 1/2 of the PR. It's hard to separate the other code, but will wait to see what others think. |
udf-compiler/src/main/scala/com/nvidia/spark/udf/CatalystExpressionBuilder.scala
Outdated
Show resolved
Hide resolved
udf-compiler/src/main/scala/com/nvidia/spark/udf/CatalystExpressionBuilder.scala
Outdated
Show resolved
Hide resolved
udf-compiler/src/main/scala/com/nvidia/spark/udf/LambdaReflection.scala
Outdated
Show resolved
Hide resolved
@seanprime7 please have your team review this PR. |
2eb7b62
to
941b63c
Compare
@seanprime7 Please check the history, not sure if it's ok. |
941b63c
to
8eb9b03
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me the commit history needs to be cleaned up more. I think we should only have UDF commits in this history.
c68a5cc
to
d4965e5
Compare
So FYI, all commits are going to get squashed into a single one when we merge. I think the above looks better, though I am not sure what happened with this commit: 4499487. I think that's OK, the end result is fine. |
@wjxiz1992 lets move forward here and address comments, and we can turn this PR into a non-draft. |
d4965e5
to
c4eee7e
Compare
udf-compiler/src/main/scala/com/nvidia/spark/udf/GpuScalaUDF.scala
Outdated
Show resolved
Hide resolved
@@ -277,5 +277,73 @@ However, Spark may produce different results for a compiled udf and the non-comp | |||
|
|||
When translating UDFs to Catalyst expressions, the supported UDF functions are limited: | |||
|
|||
| Operand type | Operation | | |||
| ------------------------------------------------------------------- | ------------------| | |||
| Operand type | Operation | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer it if these were multiple tables, one per "Operand type", with a description column. The description column could be redundant in most cases, but it may allow you to have a place to put extra limitations or issues for each operation supported.
Arithmetic Unary
Operation | Description |
---|---|
+x | Unary plus |
-x | Unary minus |
Arithmetic Binary
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we should put those tables into another md doc. As they are related to only the udf plugin and the table can be very long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wjxiz1992 Having another page for the UDF compiler may be what we end up doing, but I wouldn't worry about doing that yet. I would just continue where you are and try to follow the formatting in that page (https://nvidia.github.io/spark-rapids/docs/compatibility.html). This is very much relevant to the compatibility section.
@wjxiz1992 pls run
|
@wjxiz1992 please don't forget to update the |
@wjxiz1992 @seanprime7 I've diffed this code with what I had before. I think that it looks good from my perspective. We'll have to clean up some things like not requiring
And then enable it @seanprime7 please take a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If @seanprime7 is OK with this change I think I am ok with it. Pending some of the comments above. We can improve some things later, but this does plug the UDF compiler at the Project
level. I think we can work on improving it in follow up PRs and issues.
abs, atan, cos, cosh, sin, tan, tanh, ceil, floor, exp, log, log10, and sqrt
* Add UDF compiler implementations * Update related docs Signed-off-by: Allen Xu <allxu@nvidia.com> Co-authored-by: Sean Lee <selee@nvidia.com> Co-authored-by: Nicholas Edelman <nedelman@nvidia.com> Co-authored-by: Alessandro Bellina <abellina@nvidia.com>
790f986
to
83f92eb
Compare
build |
* Add javassist as a dependency * Simple implementation of the udf -> expression translation * Use the udf -> expressions translation when possible * Replace javassist with jvmci and broaden opcode coverage * Add support for ISUB. * Restructure the code. * Add an example for UDF to expr conversion * Correctly handle loads and stores with IFLT and GOTO * Remove combineExpr As Scala lambdas cannot have multiple returns, we can simply return the expression from the return instruction without worrying about combining multiple return values. * Fix a compile error and a traversal error * Simplify condition expressions. * Remove unnecessary import lines * Make CatalystExpressionBuilder a case class * Update connectBasicBlocks to detect implicit fallthrough edges * Remove the unnecessary LocalVariables class and allow adding conditionals to stack elements * Correctly update and propagate the entry condition for basic block * DCMPG, DCONST_0, DCONST_1, *RETURN, IFLE, IFGT, IFGE, IFEQ * Add more condition simplification rules * Change the traversal order with the conditional branches. The false path is now traversed before the true path so we can generate the conditions in the same evaluation order in the source. * Simplify cond repeatedly until it can't be simplified. Also add somem more simplification rules. * Rename simplifyCond to simplifyExpr * DLOAD_2, DSTORE_2, DADD * Add an example with short-circuit conditionals * Update the short circuit conditional example * Acos and Asin * Add the Apache-2 license text to the header * Add the jvm flags to enable JVMCI to scalatest * Use SparkException instead of Exception * syntax/scalastyle edits * Fix the brace style * ScalaUDF.scala:1065, remove braces from case statement * Add the support for the rest of *LOAD_* and *STORE_* instructions * Add support for *MUL and the rest of *ADD and *SUB * Adding tags to tests in UDFSuite.scala for individual test execution support * add OpcodeSuite.scala for UDF to expression catalyst builder testing * minor edits to IFEQ test * Add support for ?LOAD, ?CONST_<n>, DUP, ?2?, IFNE, IFNULL, and IFNONNULL * Adding script to generate OpcodeSuite reports, minor suite edits * Add support for more math ops abs, atan, cos, cosh, sin, tan, tanh, ceil, floor, exp, log, log10, and sqrt * expanding OpcodeSuite to cover more math, casting, and loading ops * Replace SharedSQLContext with SharedSparkSession in a test * use foldLeft instead of deprecated /: * Use SerializedLambda and javassist instead of JVMCI * Fix a comment * Move codes from spark to rapids repo * Add UDF compiler implementations * Add UDF compiler implementations * Update related docs Signed-off-by: Allen Xu <allxu@nvidia.com> Co-authored-by: Sean Lee <selee@nvidia.com> Co-authored-by: Nicholas Edelman <nedelman@nvidia.com> Co-authored-by: Alessandro Bellina <abellina@nvidia.com> Co-authored-by: Sean Lee <selee@nvidia.com> Co-authored-by: tester <nedelman@nvidia.com> Co-authored-by: Allen Xu <allxu@nvidia.com> Co-authored-by: Alessandro Bellina <abellina@nvidia.com>
* Add javassist as a dependency * Simple implementation of the udf -> expression translation * Use the udf -> expressions translation when possible * Replace javassist with jvmci and broaden opcode coverage * Add support for ISUB. * Restructure the code. * Add an example for UDF to expr conversion * Correctly handle loads and stores with IFLT and GOTO * Remove combineExpr As Scala lambdas cannot have multiple returns, we can simply return the expression from the return instruction without worrying about combining multiple return values. * Fix a compile error and a traversal error * Simplify condition expressions. * Remove unnecessary import lines * Make CatalystExpressionBuilder a case class * Update connectBasicBlocks to detect implicit fallthrough edges * Remove the unnecessary LocalVariables class and allow adding conditionals to stack elements * Correctly update and propagate the entry condition for basic block * DCMPG, DCONST_0, DCONST_1, *RETURN, IFLE, IFGT, IFGE, IFEQ * Add more condition simplification rules * Change the traversal order with the conditional branches. The false path is now traversed before the true path so we can generate the conditions in the same evaluation order in the source. * Simplify cond repeatedly until it can't be simplified. Also add somem more simplification rules. * Rename simplifyCond to simplifyExpr * DLOAD_2, DSTORE_2, DADD * Add an example with short-circuit conditionals * Update the short circuit conditional example * Acos and Asin * Add the Apache-2 license text to the header * Add the jvm flags to enable JVMCI to scalatest * Use SparkException instead of Exception * syntax/scalastyle edits * Fix the brace style * ScalaUDF.scala:1065, remove braces from case statement * Add the support for the rest of *LOAD_* and *STORE_* instructions * Add support for *MUL and the rest of *ADD and *SUB * Adding tags to tests in UDFSuite.scala for individual test execution support * add OpcodeSuite.scala for UDF to expression catalyst builder testing * minor edits to IFEQ test * Add support for ?LOAD, ?CONST_<n>, DUP, ?2?, IFNE, IFNULL, and IFNONNULL * Adding script to generate OpcodeSuite reports, minor suite edits * Add support for more math ops abs, atan, cos, cosh, sin, tan, tanh, ceil, floor, exp, log, log10, and sqrt * expanding OpcodeSuite to cover more math, casting, and loading ops * Replace SharedSQLContext with SharedSparkSession in a test * use foldLeft instead of deprecated /: * Use SerializedLambda and javassist instead of JVMCI * Fix a comment * Move codes from spark to rapids repo * Add UDF compiler implementations * Add UDF compiler implementations * Update related docs Signed-off-by: Allen Xu <allxu@nvidia.com> Co-authored-by: Sean Lee <selee@nvidia.com> Co-authored-by: Nicholas Edelman <nedelman@nvidia.com> Co-authored-by: Alessandro Bellina <abellina@nvidia.com> Co-authored-by: Sean Lee <selee@nvidia.com> Co-authored-by: tester <nedelman@nvidia.com> Co-authored-by: Allen Xu <allxu@nvidia.com> Co-authored-by: Alessandro Bellina <abellina@nvidia.com>
* Update cifar10 for 2.1 update best_acc migrating to job configs use number of gpus in resource test admin api scripts run data splitting on the fly as server-side handler add data download step to readme load valid set only at run time (validate function) submit jobs using admin api build in check for admin login explicitly point to model when using FedOpt train for full rounds make alpha configurable automatically refactor create dataset call submit jobs with different alpha values submit jobs using absolute path update readme update plot script update readme doc urls, set max_jobs=2 * remove unused variables from shell scripts Co-authored-by: YuanTingHsieh <yuantinghsieh@gmail.com>
Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com> Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
Signed-off-by: Allen Xu allxu@nvidia.com
Co-authored-by: Sean Lee selee@nvidia.com
Co-authored-by: Nicholas Edelman nedelman@nvidia.com
Co-authored-by: Alessandro Bellina abellina@nvidia.com