
changes to support db 13.3+ #716

Merged: 3 commits into NVIDIA:branch-24.08 on Aug 25, 2024

Conversation

@eordentlich (Collaborator) commented Aug 23, 2024

Also adds the Databricks runtime version as an option to the Databricks benchmark script, along with a GPU-only option (i.e., no spark-rapids plugin).

Plus some other miscellaneous updates (e.g., to the logistic regression notebooks and spark-rapids versions).

Includes a temporary patch until NVIDIA/spark-rapids#10770 is fixed.
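
For illustration, a minimal sketch of how such options might be exposed in the benchmark script; the flag names --db-version and --gpu-only are hypothetical and not taken from the actual script.

import argparse

parser = argparse.ArgumentParser(description="Databricks benchmark runner")
# hypothetical flags; the real script's option names may differ
parser.add_argument("--db-version", default="13.3",
                    help="Databricks runtime version to benchmark against")
parser.add_argument("--gpu-only", action="store_true",
                    help="use GPU instances without the spark-rapids plugin (no GPU ETL)")
args = parser.parse_args()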

@eordentlich (Collaborator, Author) commented:

build

@lijinf2 (Collaborator) left a comment:

Looks good. Minor comments.

dim = len(cluster_centers[0])
# inject unsupported expr (slice) that is essentially a no-op
df_for_scoring = df_for_scoring.select(
    F.slice(feature_col, 1, dim).alias(feature_col), output_col
)

Collaborator:
Interesting. Any intuition as to why slice resolves the hang?

Collaborator Author:

Yes. slice is an unsupported expression in spark-rapids, so it falls back to the CPU, which injects ColumnarToRow and RowToColumnar transitions. These perform some batching that doesn't happen otherwise (but should once the patch lands).
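
For illustration, a minimal sketch (not from this PR) of how the fallback can be confirmed: with the spark-rapids plugin enabled, applying an unsupported expression such as slice should surface ColumnarToRow / RowToColumnar transitions in the physical plan.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([([1.0, 2.0, 3.0],)], ["features"])
dim = 3
# slice(features, 1, dim) returns the whole array, so the data is unchanged;
# with spark-rapids enabled the plan should show the CPU fallback transitions
df.select(F.slice("features", 1, dim).alias("features")).explain()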


- 6. Monitor progress periodically in case of a possible hang, to avoid incurring cloud costs in such cases.
+ 2. Monitor progress periodically in case of a possible hang, to avoid incurring cloud costs in such cases.

Collaborator:

Do we keep item numbers 5 and 6, or replace them with 1 and 2?

Collaborator Author:

Interesting. I didn't intend to change the numbers; will revert.

"spark.sql.execution.arrow.pyspark.enabled": "true",
"spark.sql.files.maxPartitionBytes": "2000000000000",
"spark.databricks.delta.optimizeWrite.enabled": "false",
"spark.rapids.sql.concurrentGpuTasks": "2"

Collaborator:

While we keep this here, do we need to remove it from gpu_cluster_spec.sh (the one without GPU ETL)?

Collaborator Author:

Yes. Will delete there. Good catch.

Signed-off-by: Erik Ordentlich <eordentlich@gmail.com> (all 3 commits)
@eordentlich (Collaborator, Author) commented:

build

@eordentlich eordentlich merged commit 36a0979 into NVIDIA:branch-24.08 Aug 25, 2024
2 checks passed