Auto-register UDF extension when main plugin is set #1102

Merged · 3 commits · Nov 16, 2020
Changes from 1 commit
4 changes: 2 additions & 2 deletions docs/compatibility.md
@@ -290,10 +290,10 @@ Casting from string to timestamp currently has the following limitations.
Only timezone 'Z' (UTC) is supported. Casting unsupported formats will result in null values.

## UDF to Catalyst Expressions
To speed up UDF processing, spark-rapids introduces a udf-compiler extension that translates UDFs into Catalyst expressions.
To speed up UDF processing, spark-rapids introduces a udf-compiler extension that translates UDFs into Catalyst expressions. This compiler is injected into the Spark extensions automatically when `spark.plugins=com.nvidia.spark.SQLPlugin` is set, and it is disabled by default.
Collaborator

nit: I don't think we need this line. I don't think the user cares about the internal details of how we implemented it, just how to enable it, which is covered by the change below.

Collaborator Author

Right, reasonable to me. Updated.


To enable this operation on the GPU, set
[`spark.rapids.sql.udfCompiler.enabled`](configs.md#sql.udfCompiler.enabled) to `true`, and `spark.sql.extensions=com.nvidia.spark.udf.Plugin`.
[`spark.rapids.sql.udfCompiler.enabled`](configs.md#sql.udfCompiler.enabled) to `true`.
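
As a minimal sketch of how these configs might be set when building a SparkSession (it assumes the rapids-4-spark jar is already on the driver and executor classpaths; the app name is arbitrary):

```scala
import org.apache.spark.sql.SparkSession

// Config keys are the ones documented above; everything else is illustrative.
val spark = SparkSession.builder()
  .appName("udf-compiler-example")
  .config("spark.plugins", "com.nvidia.spark.SQLPlugin")      // auto-injects the udf-compiler extension
  .config("spark.rapids.sql.udfCompiler.enabled", "true")     // the compiler is disabled by default
  .getOrCreate()
```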

However, Spark may produce different results for a compiled UDF and the non-compiled one. For example, for a UDF of `x/y` where `y` happens to be `0`, the compiled Catalyst expression will return `NULL`, while the original UDF would fail the entire job with a `java.lang.ArithmeticException: / by zero`.
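
An illustrative sketch of that behavior difference, assuming a SparkSession `spark` configured as above (the column names and the `div` UDF are chosen here for illustration):

```scala
import org.apache.spark.sql.functions.udf
import spark.implicits._  // assumes `spark` is the SparkSession created above

// A UDF that divides its two arguments.
val div = udf((x: Int, y: Int) => x / y)
val df = Seq((1, 0)).toDF("x", "y")

// With spark.rapids.sql.udfCompiler.enabled=true, the lambda is translated to a
// Catalyst division, which yields NULL for y = 0.
// With the compiler disabled, the original Scala lambda throws
// java.lang.ArithmeticException: / by zero and fails the job.
df.select(div($"x", $"y")).show()
```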

2 changes: 1 addition & 1 deletion udf-compiler/README.md
@@ -14,6 +14,6 @@ How to run
----------

The UDF compiler is included in the rapids-4-spark jar that is produced by the `dist` maven project. Set up your cluster to run the RAPIDS Accelerator for Apache Spark
and set the spark config `spark.sql.extensions` to include `com.nvidia.spark.udf.Plugin`.
and this UDF plugin will be injected into the Spark extensions automatically when `com.nvidia.spark.SQLPlugin` is set.

The plugin is still disabled by default and you will need to set `spark.rapids.sql.udfCompiler.enabled` to `true` to enable it.
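
As a rough sketch of exercising the plugin once it is enabled (the UDF name, the view name, and the in-scope SparkSession `spark` are illustrative assumptions):

```scala
// Register a simple Scala UDF; when the compiler can translate it, the query
// plan shows the equivalent Catalyst arithmetic instead of a black-box UDF call.
spark.udf.register("plusOne", (x: Long) => x + 1)
spark.range(10).createOrReplaceTempView("t")
spark.sql("SELECT plusOne(id) FROM t").explain()
```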