
changes to support db 13.3+ #716

Merged: 3 commits into NVIDIA:branch-24.08 on Aug 25, 2024

Conversation

@eordentlich (Collaborator) commented Aug 23, 2024

Also adds the Databricks runtime version as an option to the Databricks benchmark script, along with a GPU-only option (i.e., no spark-rapids plugin).

Plus some other miscellaneous updates (e.g., to the logistic regression notebooks and spark-rapids versions).

Includes a temporary patch until NVIDIA/spark-rapids#10770 is fixed.
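
For illustration, a minimal sketch of how such options might be exposed in the benchmark script; the flag names --db-version and --gpu-only are hypothetical and not taken from the actual script.

import argparse

parser = argparse.ArgumentParser(description="Databricks benchmark runner")
# hypothetical flags; the real script's option names may differ
parser.add_argument("--db-version", default="13.3",
                    help="Databricks runtime version to benchmark against")
parser.add_argument("--gpu-only", action="store_true",
                    help="use GPU instances without the spark-rapids plugin (no GPU ETL)")
args = parser.parse_args()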

@eordentlich (Collaborator, Author) commented:

build

@lijinf2 (Collaborator) left a comment:

Looks good. Minor comments.

dim = len(cluster_centers[0])
# inject unsupported expr (slice) that is essentially a no-op
df_for_scoring = df_for_scoring.select(
    F.slice(feature_col, 1, dim).alias(feature_col), output_col
)

Collaborator:
Interesting. Any intuition as to why slice resolves the hang?

Collaborator Author:

Yes. slice is an unsupported expression in spark-rapids, so it falls back to the CPU, which injects ColumnarToRow and RowToColumnar transitions. These perform some batching that doesn't happen otherwise (but should once the patch lands).
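
For illustration, a minimal sketch (not from this PR) of how the fallback can be confirmed: with the spark-rapids plugin enabled, applying an unsupported expression such as slice should surface ColumnarToRow / RowToColumnar transitions in the physical plan.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([([1.0, 2.0, 3.0],)], ["features"])
dim = 3
# slice(features, 1, dim) returns the whole array, so the data is unchanged;
# with spark-rapids enabled the plan should show the CPU fallback transitions
df.select(F.slice("features", 1, dim).alias("features")).explain()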


- 6. Monitor progress periodically in case of a possible hang, to avoid incurring cloud costs in such cases.
+ 2. Monitor progress periodically in case of a possible hang, to avoid incurring cloud costs in such cases.

Collaborator:

Do we keep item numbers 5 and 6, or replace them with 1 and 2?

Collaborator Author:

Interesting. I didn't intend to change the numbers; will revert.

"spark.sql.execution.arrow.pyspark.enabled": "true",
"spark.sql.files.maxPartitionBytes": "2000000000000",
"spark.databricks.delta.optimizeWrite.enabled": "false",
"spark.rapids.sql.concurrentGpuTasks": "2"

Collaborator:

While we keep this here, do we need to remove it from gpu_cluster_spec.sh (the one without GPU ETL)?

Collaborator Author:

Yes. Will delete there. Good catch.

Signed-off-by: Erik Ordentlich <eordentlich@gmail.com> (all 3 commits)
@eordentlich (Collaborator, Author) commented:

build

@eordentlich eordentlich merged commit 36a0979 into NVIDIA:branch-24.08 Aug 25, 2024
2 checks passed