Fix parquet read tests that fail on Databricks with date/time input [databricks] #9639
Conversation
Signed-off-by: Nghia Truong <nghiat@nvidia.com>
Seems like #9617 is broken, then, and the proper fix is to revert it? This looks like a regression in functionality compared to what we supported before: tests for dates/timestamps at the top level of the schema were passing on Databricks before.
It was not a bug but an intentional, calculated risk. In most cases the query runs as expected, GPU accelerated with no crashes. Yes, it can sometimes crash, but the alternative in #9617 is that all GPU Parquet reads on Databricks involving dates or timestamps are now disabled by default. That will be very impactful for many users, so it seems #9617 should not have been committed.
Okay, then I can revert the GPU tag logic in it.
This reverts commit d48f173.
Signed-off-by: Nghia Truong <nghiat@nvidia.com>
Before #9617, reading Parquet files fell back to the CPU in `LEGACY` rebase mode only if the input contained date/time columns nested inside other columns. After #9617, reading Parquet files always falls back to the CPU in `LEGACY` rebase mode if there is any date/time input at any nesting level. Since Databricks sets the rebase mode to `LEGACY` by default, some Parquet read tests now fail. This PR adds the explicit read config to fix them.

Closes #9636.
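As a rough sketch of the kind of fix described (the exact test names and helper functions in the PR are not shown here, so treat this as an assumption-laden illustration): the affected tests can pass an explicit datetime rebase read config so the reader uses `CORRECTED` mode instead of the Databricks `LEGACY` default, avoiding the unconditional CPU fallback. The config keys below are the standard Spark 3.x Parquet rebase options.

```python
# Hypothetical sketch: explicit read configs a Parquet test could pass so
# the rebase mode is CORRECTED rather than the Databricks LEGACY default.
# Key names are the standard Spark 3.x options; the surrounding test
# harness (e.g. assert_gpu_and_cpu_are_equal_collect) is assumed.
rebase_read_conf = {
    "spark.sql.parquet.datetimeRebaseModeInRead": "CORRECTED",
    "spark.sql.parquet.int96RebaseModeInRead": "CORRECTED",
}

def with_rebase_conf(base_conf=None):
    """Merge the explicit rebase read config into a test's Spark conf."""
    conf = dict(base_conf or {})
    conf.update(rebase_read_conf)
    return conf

print(with_rebase_conf({"spark.rapids.sql.enabled": "true"}))
```

A test would then pass `conf=with_rebase_conf(...)` to its read assertion so the same behavior is exercised on Databricks and non-Databricks Spark alike.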