Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fallback parquet reading with merged schema and native footer reader #5500

Merged
merged 3 commits into from
May 19, 2022

Conversation

sperlingxx
Copy link
Collaborator

@sperlingxx sperlingxx commented May 16, 2022

Fixes the test to not fail for #5493

Native footer reader for parquet fetches data fields totally based on read schema, which may lead to overflow if merge schema is enabled. When merge schema is enabled, the file schema of each file partition may not contain the complete (read) schema. In this situation, native footer reader will come up with incorrect footers.

To prevent this situation, we fallback the parquet reading to CPU if merge schema and native footer reader are both enabled.

Signed-off-by: sperlingxx <lovedreamf@gmail.com>
@sperlingxx sperlingxx requested a review from revans2 May 16, 2022 11:07
@sperlingxx
Copy link
Collaborator Author

build

@revans2
Copy link
Collaborator

revans2 commented May 16, 2022

Can we please just disable the test temporarily instead? The NATIVE footer parser is experimental so anyone using it is taking a risk. This stops the test from failing but does not fix any underlying issues.

@revans2
Copy link
Collaborator

revans2 commented May 16, 2022

@sperlingxx did you reproduce the failure? How does does the native footer lead to an overflow?

@sperlingxx
Copy link
Collaborator Author

Hi @revans2, I didn't reproduce it. I am sorry that I came to this conclusion too hurriedly.

@revans2
Copy link
Collaborator

revans2 commented May 18, 2022

build

@sperlingxx sperlingxx merged commit a5e1e1e into NVIDIA:branch-22.06 May 19, 2022
@sperlingxx sperlingxx deleted the fix_mergeschema_native branch May 19, 2022 09:23
@sameerz sameerz added the test Only impacts tests label May 19, 2022
@sameerz sameerz added this to the May 2 - May 20 milestone May 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
test Only impacts tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants