Add UDF compiler implementations #497

wjxiz1992 · 2020-08-03T10:06:39Z

Add UDF compiler implementations
Add compatibility doc for the compiler

Signed-off-by: Allen Xu allxu@nvidia.com

Co-authored-by: Sean Lee selee@nvidia.com
Co-authored-by: Nicholas Edelman nedelman@nvidia.com
Co-authored-by: Alessandro Bellina abellina@nvidia.com

wjxiz1992 · 2020-08-03T10:07:58Z

Hi, should we organize this change into smaller ones to make it more reviewable?
Or we need a detailed description for this compiler?

abellina · 2020-08-03T12:54:39Z

I think the tests could be done separately, it is nearly 1/2 of the PR. It's hard to separate the other code, but will wait to see what others think.

udf-compiler/src/main/scala/com/nvidia/spark/udf/CFG.scala

udf-compiler/src/main/scala/com/nvidia/spark/udf/CatalystExpressionBuilder.scala

udf-compiler/src/main/scala/com/nvidia/spark/udf/LambdaReflection.scala

anfeng · 2020-08-03T20:45:47Z

@seanprime7 please have your team review this PR.

wjxiz1992 · 2020-08-05T10:39:39Z

@seanprime7 Please check the history, not sure if it's ok.

abellina

For me the commit history needs to be cleaned up more. I think we should only have UDF commits in this history.

abellina · 2020-08-05T19:30:43Z

So FYI, all commits are going to get squashed into a single one when we merge. I think the above looks better, though I am not sure what happened with this commit: 4499487. I think that's OK, the end result is fine.

abellina · 2020-08-05T19:33:34Z

@wjxiz1992 lets move forward here and address comments, and we can turn this PR into a non-draft.

udf-compiler/src/main/scala/com/nvidia/spark/udf/GpuScalaUDF.scala

docs/compatibility.md

abellina · 2020-08-06T16:52:58Z

docs/compatibility.md

@@ -277,5 +277,73 @@ However, Spark may produce different results for a compiled udf and the non-comp

 When translating UDFs to Catalyst expressions, the supported UDF functions are limited:

-| Operand type                                                | Operation |
-| ------------------------------------------------------------------- | ------------------|
+| Operand type             | Operation                                                |


I'd prefer it if these were multiple tables, one per "Operand type", with a description column. The description column could be redundant in most cases, but it may allow you to have a place to put extra limitations or issues for each operation supported.

Arithmetic Unary

Operation Description

+x Unary plus

-x Unary minus

Arithmetic Binary

...

I'm wondering if we should put those tables into another md doc. As they are related to only the udf plugin and the table can be very long.

@wjxiz1992 Having another page for the UDF compiler may be what we end up doing, but I wouldn't worry about doing that yet. I would just continue where you are and try to follow the formatting in that page (https://nvidia.github.io/spark-rapids/docs/compatibility.html). This is very much relevant to the compatibility section.

abellina · 2020-08-07T22:10:16Z

@wjxiz1992 pls run mvn verify, I found some styling warnings. The solution for these warnings is to turn any [[reference]] to a reference with back-tick `reference`.

model contains 19 documentable templates
rapids-plugin-4-spark/udf-compiler/src/main/scala/com/nvidia/spark/udf/CatalystExpressionBuilder.scala:28: warning: Could not find any member to link for "function".
/**
^
rapids-plugin-4-spark/udf-compiler/src/main/scala/com/nvidia/spark/udf/CatalystExpressionBuilder.scala:28: warning: Could not find any member to link for "Expression".
/**
^
two warnings found

abellina · 2020-08-07T22:24:22Z

@wjxiz1992 please don't forget to update the README.md in the udf-plugin directory s.t. the jar includes the scala version: https://github.com/NVIDIA/spark-rapids/tree/branch-0.2/udf-compiler

abellina · 2020-08-07T22:31:58Z

@wjxiz1992 @seanprime7 I've diffed this code with what I had before. I think that it looks good from my perspective. We'll have to clean up some things like not requiring spark.sql.extensions, and add more places to replace UDFs, but this is a good place to start. It is disabled off by default, but things work if you setup the jar as documented in the README and also the sql extension:

--conf spark.sql.extensions=com.nvidia.spark.udf.Plugin

And then enable it spark.rapids.sql.udfCompiler.enabled. You can also toggle the enabled flag and the UDF compiler stops being invoked.

@seanprime7 please take a look.

abellina

If @seanprime7 is OK with this change I think I am ok with it. Pending some of the comments above. We can improve some things later, but this does plug the UDF compiler at the Project level. I think we can work on improving it in follow up PRs and issues.

…support

abs, atan, cos, cosh, sin, tan, tanh, ceil, floor, exp, log, log10, and sqrt

* Add UDF compiler implementations * Update related docs Signed-off-by: Allen Xu <allxu@nvidia.com> Co-authored-by: Sean Lee <selee@nvidia.com> Co-authored-by: Nicholas Edelman <nedelman@nvidia.com> Co-authored-by: Alessandro Bellina <abellina@nvidia.com>

wjxiz1992 · 2020-08-12T01:34:12Z

build

* Add javassist as a dependency * Simple implementation of the udf -> expression translation * Use the udf -> expressions translation when possible * Replace javassist with jvmci and broaden opcode coverage * Add support for ISUB. * Restructure the code. * Add an example for UDF to expr conversion * Correctly handle loads and stores with IFLT and GOTO * Remove combineExpr As Scala lambdas cannot have multiple returns, we can simply return the expression from the return instruction without worrying about combining multiple return values. * Fix a compile error and a traversal error * Simplify condition expressions. * Remove unnecessary import lines * Make CatalystExpressionBuilder a case class * Update connectBasicBlocks to detect implicit fallthrough edges * Remove the unnecessary LocalVariables class and allow adding conditionals to stack elements * Correctly update and propagate the entry condition for basic block * DCMPG, DCONST_0, DCONST_1, *RETURN, IFLE, IFGT, IFGE, IFEQ * Add more condition simplification rules * Change the traversal order with the conditional branches. The false path is now traversed before the true path so we can generate the conditions in the same evaluation order in the source. * Simplify cond repeatedly until it can't be simplified. Also add somem more simplification rules. * Rename simplifyCond to simplifyExpr * DLOAD_2, DSTORE_2, DADD * Add an example with short-circuit conditionals * Update the short circuit conditional example * Acos and Asin * Add the Apache-2 license text to the header * Add the jvm flags to enable JVMCI to scalatest * Use SparkException instead of Exception * syntax/scalastyle edits * Fix the brace style * ScalaUDF.scala:1065, remove braces from case statement * Add the support for the rest of *LOAD_* and *STORE_* instructions * Add support for *MUL and the rest of *ADD and *SUB * Adding tags to tests in UDFSuite.scala for individual test execution support * add OpcodeSuite.scala for UDF to expression catalyst builder testing * minor edits to IFEQ test * Add support for ?LOAD, ?CONST_<n>, DUP, ?2?, IFNE, IFNULL, and IFNONNULL * Adding script to generate OpcodeSuite reports, minor suite edits * Add support for more math ops abs, atan, cos, cosh, sin, tan, tanh, ceil, floor, exp, log, log10, and sqrt * expanding OpcodeSuite to cover more math, casting, and loading ops * Replace SharedSQLContext with SharedSparkSession in a test * use foldLeft instead of deprecated /: * Use SerializedLambda and javassist instead of JVMCI * Fix a comment * Move codes from spark to rapids repo * Add UDF compiler implementations * Add UDF compiler implementations * Update related docs Signed-off-by: Allen Xu <allxu@nvidia.com> Co-authored-by: Sean Lee <selee@nvidia.com> Co-authored-by: Nicholas Edelman <nedelman@nvidia.com> Co-authored-by: Alessandro Bellina <abellina@nvidia.com> Co-authored-by: Sean Lee <selee@nvidia.com> Co-authored-by: tester <nedelman@nvidia.com> Co-authored-by: Allen Xu <allxu@nvidia.com> Co-authored-by: Alessandro Bellina <abellina@nvidia.com>

* Update cifar10 for 2.1 update best_acc migrating to job configs use number of gpus in resource test admin api scripts run data splitting on the fly as server-side handler add data download step to readme load valid set only at run time (validate function) submit jobs using admin api build in check for admin login explicitly point to model when using FedOpt train for full rounds make alpha configurable automatically refactor create dataset call submit jobs with different alpha values submit jobs using absolute path update readme update plot script update readme doc urls, set max_jobs=2 * remove unused variables from shell scripts Co-authored-by: YuanTingHsieh <yuantinghsieh@gmail.com>

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com> Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>

abellina reviewed Aug 3, 2020

View reviewed changes

udf-compiler/src/main/scala/com/nvidia/spark/udf/CFG.scala Outdated Show resolved Hide resolved

abellina reviewed Aug 3, 2020

View reviewed changes

udf-compiler/src/main/scala/com/nvidia/spark/udf/CFG.scala Show resolved Hide resolved

abellina reviewed Aug 3, 2020

View reviewed changes

udf-compiler/src/main/scala/com/nvidia/spark/udf/CatalystExpressionBuilder.scala Outdated Show resolved Hide resolved

abellina reviewed Aug 3, 2020

View reviewed changes

udf-compiler/src/main/scala/com/nvidia/spark/udf/CatalystExpressionBuilder.scala Outdated Show resolved Hide resolved

abellina requested changes Aug 3, 2020

View reviewed changes

udf-compiler/src/main/scala/com/nvidia/spark/udf/LambdaReflection.scala Outdated Show resolved Hide resolved

wjxiz1992 requested a review from seanprime7 August 3, 2020 13:29

sameerz added the feature request New feature or request label Aug 3, 2020

seanprime7 requested a review from abellina August 3, 2020 22:43

wjxiz1992 force-pushed the udf-compiler-impl branch 2 times, most recently from 2eb7b62 to 941b63c Compare August 5, 2020 10:23

wjxiz1992 force-pushed the udf-compiler-impl branch from 941b63c to 8eb9b03 Compare August 5, 2020 10:46

abellina requested changes Aug 5, 2020

View reviewed changes

wjxiz1992 force-pushed the udf-compiler-impl branch from c68a5cc to d4965e5 Compare August 5, 2020 14:49

wjxiz1992 force-pushed the udf-compiler-impl branch from d4965e5 to c4eee7e Compare August 6, 2020 08:14

wjxiz1992 marked this pull request as ready for review August 6, 2020 08:15

abellina reviewed Aug 6, 2020

View reviewed changes

udf-compiler/src/main/scala/com/nvidia/spark/udf/GpuScalaUDF.scala Outdated Show resolved Hide resolved

abellina reviewed Aug 6, 2020

View reviewed changes

docs/compatibility.md Outdated Show resolved Hide resolved

abellina reviewed Aug 6, 2020

View reviewed changes

abellina previously approved these changes Aug 7, 2020

View reviewed changes

wjxiz1992 dismissed abellina’s stale review via 790f986 August 11, 2020 08:01

Add javassist as a dependency

41e61bc

seanprime7 and others added 19 commits August 11, 2020 16:07

Use SparkException instead of Exception

acc17ed

syntax/scalastyle edits

f0150e9

Fix the brace style

5ee3801

ScalaUDF.scala:1065, remove braces from case statement

d256dc5

Add the support for the rest of *LOAD_* and *STORE_* instructions

766638b

Add support for *MUL and the rest of *ADD and *SUB

239c540

Adding tags to tests in UDFSuite.scala for individual test execution …

fdbb089

…support

add OpcodeSuite.scala for UDF to expression catalyst builder testing

fb7ac69

minor edits to IFEQ test

d429bf2

Add support for ?LOAD, ?CONST_<n>, DUP, ?2?, IFNE, IFNULL, and IFNONNULL

2d43aff

Adding script to generate OpcodeSuite reports, minor suite edits

2f3f82a

Add support for more math ops

2eec237

abs, atan, cos, cosh, sin, tan, tanh, ceil, floor, exp, log, log10, and sqrt

expanding OpcodeSuite to cover more math, casting, and loading ops

76ef25c

Replace SharedSQLContext with SharedSparkSession in a test

174bbbe

use foldLeft instead of deprecated /:

b374e83

Use SerializedLambda and javassist instead of JVMCI

9ba0a8e

Fix a comment

c8601d9

Move codes from spark to rapids repo

a7b1059

wjxiz1992 force-pushed the udf-compiler-impl branch from 790f986 to 83f92eb Compare August 11, 2020 08:10

seanprime7 approved these changes Aug 11, 2020

View reviewed changes

razajafri merged commit 9d23835 into NVIDIA:branch-0.2 Aug 12, 2020

wjxiz1992 deleted the udf-compiler-impl branch December 18, 2020 03:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add UDF compiler implementations #497

Add UDF compiler implementations #497

wjxiz1992 commented Aug 3, 2020 •

edited

Loading

wjxiz1992 commented Aug 3, 2020 •

edited

Loading

abellina commented Aug 3, 2020

anfeng commented Aug 3, 2020

wjxiz1992 commented Aug 5, 2020

abellina left a comment

abellina commented Aug 5, 2020

abellina commented Aug 5, 2020

abellina Aug 6, 2020

wjxiz1992 Aug 7, 2020

abellina Aug 7, 2020

abellina commented Aug 7, 2020

abellina commented Aug 7, 2020

abellina commented Aug 7, 2020 •

edited

Loading

abellina left a comment •

edited

Loading

wjxiz1992 commented Aug 12, 2020

Add UDF compiler implementations #497

Add UDF compiler implementations #497

Conversation

wjxiz1992 commented Aug 3, 2020 • edited Loading

wjxiz1992 commented Aug 3, 2020 • edited Loading

abellina commented Aug 3, 2020

anfeng commented Aug 3, 2020

wjxiz1992 commented Aug 5, 2020

abellina left a comment

Choose a reason for hiding this comment

abellina commented Aug 5, 2020

abellina commented Aug 5, 2020

abellina Aug 6, 2020

Choose a reason for hiding this comment

Arithmetic Unary

Arithmetic Binary

wjxiz1992 Aug 7, 2020

Choose a reason for hiding this comment

abellina Aug 7, 2020

Choose a reason for hiding this comment

abellina commented Aug 7, 2020

abellina commented Aug 7, 2020

abellina commented Aug 7, 2020 • edited Loading

abellina left a comment • edited Loading

Choose a reason for hiding this comment

wjxiz1992 commented Aug 12, 2020

wjxiz1992 commented Aug 3, 2020 •

edited

Loading

wjxiz1992 commented Aug 3, 2020 •

edited

Loading

abellina commented Aug 7, 2020 •

edited

Loading

abellina left a comment •

edited

Loading