Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add UDF compiler implementations #497

Merged
merged 46 commits into from
Aug 12, 2020

Conversation

wjxiz1992
Copy link
Collaborator

@wjxiz1992 wjxiz1992 commented Aug 3, 2020

  • Add UDF compiler implementations
  • Add compatibility doc for the compiler

Signed-off-by: Allen Xu allxu@nvidia.com

Co-authored-by: Sean Lee selee@nvidia.com
Co-authored-by: Nicholas Edelman nedelman@nvidia.com
Co-authored-by: Alessandro Bellina abellina@nvidia.com

@wjxiz1992
Copy link
Collaborator Author

wjxiz1992 commented Aug 3, 2020

Hi, should we organize this change into smaller ones to make it more reviewable?
Or we need a detailed description for this compiler?

@abellina
Copy link
Collaborator

abellina commented Aug 3, 2020

I think the tests could be done separately, it is nearly 1/2 of the PR. It's hard to separate the other code, but will wait to see what others think.

@sameerz sameerz added the feature request New feature or request label Aug 3, 2020
@anfeng
Copy link

anfeng commented Aug 3, 2020

@seanprime7 please have your team review this PR.

@wjxiz1992 wjxiz1992 force-pushed the udf-compiler-impl branch 2 times, most recently from 2eb7b62 to 941b63c Compare August 5, 2020 10:23
@wjxiz1992
Copy link
Collaborator Author

@seanprime7 Please check the history, not sure if it's ok.

Copy link
Collaborator

@abellina abellina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me the commit history needs to be cleaned up more. I think we should only have UDF commits in this history.

@abellina
Copy link
Collaborator

abellina commented Aug 5, 2020

So FYI, all commits are going to get squashed into a single one when we merge. I think the above looks better, though I am not sure what happened with this commit: 4499487. I think that's OK, the end result is fine.

@abellina
Copy link
Collaborator

abellina commented Aug 5, 2020

@wjxiz1992 lets move forward here and address comments, and we can turn this PR into a non-draft.

@wjxiz1992 wjxiz1992 marked this pull request as ready for review August 6, 2020 08:15
docs/compatibility.md Outdated Show resolved Hide resolved
@@ -277,5 +277,73 @@ However, Spark may produce different results for a compiled udf and the non-comp

When translating UDFs to Catalyst expressions, the supported UDF functions are limited:

| Operand type | Operation |
| ------------------------------------------------------------------- | ------------------|
| Operand type | Operation |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer it if these were multiple tables, one per "Operand type", with a description column. The description column could be redundant in most cases, but it may allow you to have a place to put extra limitations or issues for each operation supported.

Arithmetic Unary

Operation Description
+x Unary plus
-x Unary minus

Arithmetic Binary

...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should put those tables into another md doc. As they are related to only the udf plugin and the table can be very long.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wjxiz1992 Having another page for the UDF compiler may be what we end up doing, but I wouldn't worry about doing that yet. I would just continue where you are and try to follow the formatting in that page (https://nvidia.github.io/spark-rapids/docs/compatibility.html). This is very much relevant to the compatibility section.

@abellina
Copy link
Collaborator

abellina commented Aug 7, 2020

@wjxiz1992 pls run mvn verify, I found some styling warnings. The solution for these warnings is to turn any [[reference]] to a reference with back-tick `reference`.

model contains 19 documentable templates
rapids-plugin-4-spark/udf-compiler/src/main/scala/com/nvidia/spark/udf/CatalystExpressionBuilder.scala:28: warning: Could not find any member to link for "function".
/**
^
rapids-plugin-4-spark/udf-compiler/src/main/scala/com/nvidia/spark/udf/CatalystExpressionBuilder.scala:28: warning: Could not find any member to link for "Expression".
/**
^
two warnings found

@abellina
Copy link
Collaborator

abellina commented Aug 7, 2020

@wjxiz1992 please don't forget to update the README.md in the udf-plugin directory s.t. the jar includes the scala version: https://github.com/NVIDIA/spark-rapids/tree/branch-0.2/udf-compiler

@abellina
Copy link
Collaborator

abellina commented Aug 7, 2020

@wjxiz1992 @seanprime7 I've diffed this code with what I had before. I think that it looks good from my perspective. We'll have to clean up some things like not requiring spark.sql.extensions, and add more places to replace UDFs, but this is a good place to start. It is disabled off by default, but things work if you setup the jar as documented in the README and also the sql extension:

--conf spark.sql.extensions=com.nvidia.spark.udf.Plugin

And then enable it spark.rapids.sql.udfCompiler.enabled. You can also toggle the enabled flag and the UDF compiler stops being invoked.

@seanprime7 please take a look.

abellina
abellina previously approved these changes Aug 7, 2020
Copy link
Collaborator

@abellina abellina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If @seanprime7 is OK with this change I think I am ok with it. Pending some of the comments above. We can improve some things later, but this does plug the UDF compiler at the Project level. I think we can work on improving it in follow up PRs and issues.

@wjxiz1992
Copy link
Collaborator Author

build

@razajafri razajafri merged commit 9d23835 into NVIDIA:branch-0.2 Aug 12, 2020
@wjxiz1992 wjxiz1992 deleted the udf-compiler-impl branch December 18, 2020 03:48
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* Add javassist as a dependency

* Simple implementation of the udf -> expression translation

* Use the udf -> expressions translation when possible

* Replace javassist with jvmci and broaden opcode coverage

* Add support for ISUB.

* Restructure the code.

* Add an example for UDF to expr conversion

* Correctly handle loads and stores with IFLT and GOTO

* Remove combineExpr

As Scala lambdas cannot have multiple returns, we can simply return the
expression from the return instruction without worrying about combining
multiple return values.

* Fix a compile error and a traversal error

* Simplify condition expressions.

* Remove unnecessary import lines

* Make CatalystExpressionBuilder a case class

* Update connectBasicBlocks to detect implicit fallthrough edges

* Remove the unnecessary LocalVariables class and allow adding conditionals to stack elements

* Correctly update and propagate the entry condition for basic block

* DCMPG, DCONST_0, DCONST_1, *RETURN, IFLE, IFGT, IFGE, IFEQ

* Add more condition simplification rules

* Change the traversal order with the conditional branches.

The false path is now traversed before the true path so we can generate
the conditions in the same evaluation order in the source.

* Simplify cond repeatedly until it can't be simplified.

Also add somem more simplification rules.

* Rename simplifyCond to simplifyExpr

* DLOAD_2, DSTORE_2, DADD

* Add an example with short-circuit conditionals

* Update the short circuit conditional example

* Acos and Asin

* Add the Apache-2 license text to the header

* Add the jvm flags to enable JVMCI to scalatest

* Use SparkException instead of Exception

* syntax/scalastyle edits

* Fix the brace style

* ScalaUDF.scala:1065, remove braces from case statement

* Add the support for the rest of *LOAD_* and *STORE_* instructions

* Add support for *MUL and the rest of *ADD and *SUB

* Adding tags to tests in UDFSuite.scala for individual test execution support

* add OpcodeSuite.scala for UDF to expression catalyst builder testing

* minor edits to IFEQ test

* Add support for ?LOAD, ?CONST_<n>, DUP, ?2?, IFNE, IFNULL, and IFNONNULL

* Adding script to generate OpcodeSuite reports, minor suite edits

* Add support for more math ops

abs, atan, cos, cosh, sin, tan, tanh, ceil, floor, exp, log,
log10, and sqrt

* expanding OpcodeSuite to cover more math, casting, and loading ops

* Replace SharedSQLContext with SharedSparkSession in a test

* use foldLeft instead of deprecated /:

* Use SerializedLambda and javassist instead of JVMCI

* Fix a comment

* Move codes from spark to rapids repo

* Add UDF compiler implementations

* Add UDF compiler implementations
* Update related docs

Signed-off-by: Allen Xu <allxu@nvidia.com>

Co-authored-by: Sean Lee <selee@nvidia.com>
Co-authored-by: Nicholas Edelman <nedelman@nvidia.com>
Co-authored-by: Alessandro Bellina <abellina@nvidia.com>

Co-authored-by: Sean Lee <selee@nvidia.com>
Co-authored-by: tester <nedelman@nvidia.com>
Co-authored-by: Allen Xu <allxu@nvidia.com>
Co-authored-by: Alessandro Bellina <abellina@nvidia.com>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* Add javassist as a dependency

* Simple implementation of the udf -> expression translation

* Use the udf -> expressions translation when possible

* Replace javassist with jvmci and broaden opcode coverage

* Add support for ISUB.

* Restructure the code.

* Add an example for UDF to expr conversion

* Correctly handle loads and stores with IFLT and GOTO

* Remove combineExpr

As Scala lambdas cannot have multiple returns, we can simply return the
expression from the return instruction without worrying about combining
multiple return values.

* Fix a compile error and a traversal error

* Simplify condition expressions.

* Remove unnecessary import lines

* Make CatalystExpressionBuilder a case class

* Update connectBasicBlocks to detect implicit fallthrough edges

* Remove the unnecessary LocalVariables class and allow adding conditionals to stack elements

* Correctly update and propagate the entry condition for basic block

* DCMPG, DCONST_0, DCONST_1, *RETURN, IFLE, IFGT, IFGE, IFEQ

* Add more condition simplification rules

* Change the traversal order with the conditional branches.

The false path is now traversed before the true path so we can generate
the conditions in the same evaluation order in the source.

* Simplify cond repeatedly until it can't be simplified.

Also add somem more simplification rules.

* Rename simplifyCond to simplifyExpr

* DLOAD_2, DSTORE_2, DADD

* Add an example with short-circuit conditionals

* Update the short circuit conditional example

* Acos and Asin

* Add the Apache-2 license text to the header

* Add the jvm flags to enable JVMCI to scalatest

* Use SparkException instead of Exception

* syntax/scalastyle edits

* Fix the brace style

* ScalaUDF.scala:1065, remove braces from case statement

* Add the support for the rest of *LOAD_* and *STORE_* instructions

* Add support for *MUL and the rest of *ADD and *SUB

* Adding tags to tests in UDFSuite.scala for individual test execution support

* add OpcodeSuite.scala for UDF to expression catalyst builder testing

* minor edits to IFEQ test

* Add support for ?LOAD, ?CONST_<n>, DUP, ?2?, IFNE, IFNULL, and IFNONNULL

* Adding script to generate OpcodeSuite reports, minor suite edits

* Add support for more math ops

abs, atan, cos, cosh, sin, tan, tanh, ceil, floor, exp, log,
log10, and sqrt

* expanding OpcodeSuite to cover more math, casting, and loading ops

* Replace SharedSQLContext with SharedSparkSession in a test

* use foldLeft instead of deprecated /:

* Use SerializedLambda and javassist instead of JVMCI

* Fix a comment

* Move codes from spark to rapids repo

* Add UDF compiler implementations

* Add UDF compiler implementations
* Update related docs

Signed-off-by: Allen Xu <allxu@nvidia.com>

Co-authored-by: Sean Lee <selee@nvidia.com>
Co-authored-by: Nicholas Edelman <nedelman@nvidia.com>
Co-authored-by: Alessandro Bellina <abellina@nvidia.com>

Co-authored-by: Sean Lee <selee@nvidia.com>
Co-authored-by: tester <nedelman@nvidia.com>
Co-authored-by: Allen Xu <allxu@nvidia.com>
Co-authored-by: Alessandro Bellina <abellina@nvidia.com>
pxLi pushed a commit to pxLi/spark-rapids that referenced this pull request May 12, 2022
* Update cifar10 for 2.1

update best_acc

migrating to job configs

use number of gpus in resource

test admin api scripts

run data splitting on the fly as server-side handler

add data download step to readme

load valid set only at run time (validate function)

submit jobs using admin api

build in check for admin login

explicitly point to model when using FedOpt

train for full rounds

make alpha configurable automatically

refactor create dataset call

submit jobs with different alpha values

submit jobs using absolute path

update readme

update plot script

update readme doc urls, set max_jobs=2

* remove unused variables from shell scripts

Co-authored-by: YuanTingHsieh <yuantinghsieh@gmail.com>
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants