[DOC] Compatibility with Spark-330 AggregatePushDown on ORC files #4950

Closed
amahussein opened this issue Mar 14, 2022 · 0 comments · Fixed by #4957
Labels
documentation (Improvements or additions to documentation), spark 3.3+

Comments

@amahussein
Collaborator

Report needed documentation

Based on the discussion in SPARK-34960 (Aggregate (Min/Max/Count) push down for ORC), we need to document that the flag spark.sql.orc.aggregatePushdown should be disabled when reading ORC files written by the GPU.
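
For illustration, a minimal sketch of what the documented workaround could look like in PySpark. The helper name `disable_orc_aggregate_pushdown` and the example path are hypothetical; the only things assumed about the session object are the standard `spark.conf.set` / `spark.conf.get` runtime-config calls.

```python
# Sketch only: turn off ORC aggregate push down before reading
# ORC files that were written by the GPU.
ORC_AGG_PUSHDOWN_KEY = "spark.sql.orc.aggregatePushdown"


def disable_orc_aggregate_pushdown(spark):
    """Disable ORC aggregate push down on an existing session.

    `spark` is expected to be a SparkSession (or any object exposing
    the same `conf.set` / `conf.get` interface). Returns the value
    that is in effect afterwards so the caller can verify it.
    """
    spark.conf.set(ORC_AGG_PUSHDOWN_KEY, "false")
    return spark.conf.get(ORC_AGG_PUSHDOWN_KEY)


# Hypothetical usage with a real session (requires pyspark):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   disable_orc_aggregate_pushdown(spark)
#   spark.read.orc("/path/to/gpu_written_orc").selectExpr("count(*)").show()
```

The same setting can also be passed at launch time as `--conf spark.sql.orc.aggregatePushdown=false`.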

Why do we need this

  • Spark assumes that every ORC file contains file statistics, which does not follow the ORC specification.
  • Spark 3.3.0 is expected to be released before this issue is fixed, so we need to alert users that on Spark 3.3.0 reading an ORC file generated by the GPU might throw a runtime exception.
@amahussein added the "documentation" (Improvements or additions to documentation) and "? - Needs Triage" (Need team to review and classify) labels Mar 14, 2022
@amahussein self-assigned this Mar 14, 2022
@amahussein added this to the Feb 28 - Mar 18 milestone Mar 14, 2022
@sameerz removed the "? - Needs Triage" (Need team to review and classify) label Mar 15, 2022