Add UDF compiler implementations (#497)
* Add javassist as a dependency

* Simple implementation of the udf -> expression translation

* Use the udf -> expressions translation when possible

* Replace javassist with jvmci and broaden opcode coverage

* Add support for ISUB.

* Restructure the code.

* Add an example for UDF to expr conversion

* Correctly handle loads and stores with IFLT and GOTO

* Remove combineExpr

As Scala lambdas cannot have multiple returns, we can simply return the
expression from the return instruction without worrying about combining
multiple return values.

* Fix a compile error and a traversal error

* Simplify condition expressions.

* Remove unnecessary import lines

* Make CatalystExpressionBuilder a case class

* Update connectBasicBlocks to detect implicit fallthrough edges

* Remove the unnecessary LocalVariables class and allow adding conditionals to stack elements

* Correctly update and propagate the entry condition for basic block

* DCMPG, DCONST_0, DCONST_1, *RETURN, IFLE, IFGT, IFGE, IFEQ

* Add more condition simplification rules

* Change the traversal order with the conditional branches.

The false path is now traversed before the true path so we can generate
the conditions in the same evaluation order in the source.

* Simplify cond repeatedly until it can't be simplified.

Also add some more simplification rules.

* Rename simplifyCond to simplifyExpr

* DLOAD_2, DSTORE_2, DADD

* Add an example with short-circuit conditionals

* Update the short circuit conditional example

* Acos and Asin

* Add the Apache-2 license text to the header

* Add the jvm flags to enable JVMCI to scalatest

* Use SparkException instead of Exception

* syntax/scalastyle edits

* Fix the brace style

* ScalaUDF.scala:1065, remove braces from case statement

* Add the support for the rest of *LOAD_* and *STORE_* instructions

* Add support for *MUL and the rest of *ADD and *SUB

* Adding tags to tests in UDFSuite.scala for individual test execution support

* add OpcodeSuite.scala for UDF to expression catalyst builder testing

* minor edits to IFEQ test

* Add support for ?LOAD, ?CONST_<n>, DUP, ?2?, IFNE, IFNULL, and IFNONNULL

* Adding script to generate OpcodeSuite reports, minor suite edits

* Add support for more math ops

abs, atan, cos, cosh, sin, tan, tanh, ceil, floor, exp, log,
log10, and sqrt

* expanding OpcodeSuite to cover more math, casting, and loading ops

* Replace SharedSQLContext with SharedSparkSession in a test

* use foldLeft instead of deprecated /:

* Use SerializedLambda and javassist instead of JVMCI

* Fix a comment

* Move code from spark to rapids repo

* Add UDF compiler implementations

* Add UDF compiler implementations
* Update related docs

Signed-off-by: Allen Xu <allxu@nvidia.com>

Co-authored-by: Sean Lee <selee@nvidia.com>
Co-authored-by: Nicholas Edelman <nedelman@nvidia.com>
Co-authored-by: Alessandro Bellina <abellina@nvidia.com>

Co-authored-by: Sean Lee <selee@nvidia.com>
Co-authored-by: tester <nedelman@nvidia.com>
Co-authored-by: Allen Xu <allxu@nvidia.com>
Co-authored-by: Alessandro Bellina <abellina@nvidia.com>
5 people authored Aug 12, 2020
1 parent 3985f67 commit 9d23835
Showing 9 changed files with 1,743 additions and 9 deletions.
72 changes: 70 additions & 2 deletions docs/compatibility.md
@@ -277,5 +277,73 @@ However, Spark may produce different results for a compiled udf and the non-comp

When translating UDFs to Catalyst expressions, the supported UDF functions are limited:

| Operand type | Operation |
| -------------------------| ---------------------------------------------------------|
| Arithmetic Unary | +x |
| | -x |
| Arithmetic Binary | lhs + rhs |
| | lhs - rhs |
| | lhs * rhs |
| | lhs / rhs |
| | lhs % rhs |
| Logical | lhs && rhs |
| | lhs &#124;&#124; rhs |
| | !x |
| Equality and Relational | lhs == rhs |
| | lhs < rhs |
| | lhs <= rhs |
| | lhs > rhs |
| | lhs >= rhs |
| Bitwise | lhs & rhs |
| | lhs &#124; rhs |
| | lhs ^ rhs |
| | ~x |
| | lhs << rhs |
| | lhs >> rhs |
| | lhs >>> rhs |
| Conditional | if |
| | case |
| Math | abs(x) |
| | cos(x) |
| | acos(x) |
| | asin(x) |
| | tan(x) |
| | atan(x) |
| | tanh(x) |
| | cosh(x) |
| | ceil(x) |
| | floor(x) |
| | exp(x) |
| | log(x) |
| | log10(x) |
| | sqrt(x) |
| Type Cast | * |
| String | lhs + rhs |
| | lhs.equalsIgnoreCase(String rhs) |
| | x.toUpperCase() |
| | x.trim() |
| | x.substring(int begin) |
| | x.substring(int begin, int end) |
| | x.replace(char oldChar, char newChar) |
| | x.replace(CharSequence target, CharSequence replacement) |
| | x.startsWith(String prefix) |
| | lhs.equals(Object rhs) |
| | x.toLowerCase() |
| | x.length() |
| | x.endsWith(String suffix) |
| | lhs.concat(String rhs) |
| | x.isEmpty() |
| | String.valueOf(boolean b) |
| | String.valueOf(char c) |
| | String.valueOf(double d) |
| | String.valueOf(float f) |
| | String.valueOf(int i) |
| | String.valueOf(long l) |
| | x.contains(CharSequence s) |
| | x.indexOf(String str) |
| | x.indexOf(String str, int fromIndex) |
| | x.replaceAll(String regex, String replacement) |
| | x.split(String regex) |
| | x.split(String regex, int limit) |
| | x.getBytes() |
| | x.getBytes(String charsetName) |
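
As an illustration of the table above, here is a minimal sketch of Scala lambdas that use only listed operations (arithmetic, relational, conditional, `sqrt`, `abs`, `trim`, `toUpperCase`, `startsWith`). The object and value names are hypothetical and do not come from the plugin. The snippet has no Spark dependency so that it runs standalone, and whether a given lambda is actually translated may depend on the plugin version and on the bytecode the Scala compiler emits.

```scala
// Hypothetical examples: plain Scala lambdas restricted to operations
// listed in the table above. Nothing here calls the plugin itself.
object SupportedUdfSketch {
  // Arithmetic, relational, and conditional operations only.
  val clampedDiff: (Int, Int) => Int =
    (a, b) => if (a > b) a - b else 0

  // Math functions from the table: sqrt and abs.
  val rootOfAbs: Double => Double =
    x => math.sqrt(math.abs(x))

  // String methods from the table: trim, toUpperCase, startsWith.
  val isErrorTag: String => Boolean =
    s => s.trim().toUpperCase().startsWith("ERR")

  def main(args: Array[String]): Unit = {
    println(clampedDiff(7, 3))                // 7 > 3, so 7 - 3
    println(rootOfAbs(-9.0))                  // sqrt(abs(-9.0))
    println(isErrorTag("  error: disk full")) // trimmed, upper-cased prefix check
  }
}
```

In a Spark session, lambdas like these would typically be wrapped with `org.apache.spark.sql.functions.udf` before being applied to columns. A lambda that uses an operation outside the table would presumably not be translated to a Catalyst expression.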
2 changes: 1 addition & 1 deletion udf-compiler/README.md
@@ -20,6 +20,6 @@ export SPARK_HOME=[your spark distribution directory]
export JARS=[path to cudf 0.15 jar]
$SPARK_HOME/bin/spark-shell \
- --jars $JARS/cudf-0.15-SNAPSHOT-cuda10-1.jar,udf-compiler/target/rapids-4-spark-udf-0.2.0-SNAPSHOT.jar,sql-plugin/target/rapids-4-spark-sql_2.12-0.2.0-SNAPSHOT.jar \
+ --jars $JARS/cudf-0.15-SNAPSHOT-cuda10-1.jar,udf-compiler/target/rapids-4-spark-udf_2.12-0.2.0-SNAPSHOT.jar,sql-plugin/target/rapids-4-spark-sql_2.12-0.2.0-SNAPSHOT.jar \
--conf spark.sql.extensions="com.nvidia.spark.SQLPlugin,com.nvidia.spark.udf.Plugin"
```