[FEA] Improve performance of Cache plugin #1143

Open
3 of 7 tasks
razajafri opened this issue Nov 17, 2020 · 1 comment
Labels
  • epic: Issue that encompasses a significant feature or body of work
  • P2: Not required for release
  • performance: A performance related task/issue
  • Spark 3.1+: Bugs only related to Spark 3.1 or higher

Comments

razajafri (Collaborator) commented Nov 17, 2020

  • Support Decimals with negative scales by decomposing them into the long value (see "Support for Decimals with negative scale for Parquet Cached Batch Serializer", #2675)
  • Use a chunked writer when writing the CachedBatch (see linked issue)
  • Look into NVComp to see whether it can provide better performance than Parquet (might not be needed)
  • Look into predicate pushdown instead of reading rows and then throwing them away (see linked issue)
  • Only pass the needed conf instead of broadcasting the whole Map
  • Compression is AUTO right now, which may compromise performance. Update: we are now using SNAPPY
  • Improve PCBS (Parquet Cached Batch Serializer) CPU read performance. It is currently slower than the DefaultCachedBatchSerializer
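
For readers unfamiliar with the first task, here is a minimal Python sketch of what "decomposing a Decimal to the long value" could look like. It is an illustration only, not the plugin's code: the function names `decompose` and `recompose` are hypothetical, and it assumes the idea is to store the unscaled integer (checked to fit a signed 64-bit long) together with the scale, so that a negative scale never has to be represented in the stored decimal itself.

```python
from decimal import Decimal

# Bounds of a signed 64-bit long (assumption: the "long value" is 64-bit).
LONG_MIN = -(2 ** 63)
LONG_MAX = 2 ** 63 - 1


def decompose(value: Decimal) -> tuple:
    """Split a finite Decimal into (unscaled, scale).

    The result satisfies value == unscaled * 10 ** (-scale), so a
    negative scale means the unscaled value is multiplied by a power of ten.
    """
    if not value.is_finite():
        raise ValueError("NaN and infinities have no unscaled value")
    sign, digits, exponent = value.as_tuple()
    unscaled = int("".join(str(d) for d in digits))
    if sign:
        unscaled = -unscaled
    if not LONG_MIN <= unscaled <= LONG_MAX:
        raise OverflowError("unscaled value does not fit in a 64-bit long")
    return unscaled, -exponent


def recompose(unscaled: int, scale: int) -> Decimal:
    """Rebuild the Decimal exactly from the stored long and its scale."""
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(d) for d in str(abs(unscaled)))
    return Decimal((sign, digits, -scale))


if __name__ == "__main__":
    original = Decimal("1.23E+5")          # scale is -3
    unscaled, scale = decompose(original)
    print(unscaled, scale)
    print(recompose(unscaled, scale) == original)
```

In this sketch the scale would be kept as metadata next to the column of longs, so the round trip is exact and does not depend on the active decimal context.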
razajafri added the "feature request" (New feature or request) and "? - Needs Triage" (Need team to review and classify) labels on Nov 17, 2020
razajafri assigned razajafri and unassigned razajafri on Nov 17, 2020
sameerz added the "Spark 3.1+" (Bugs only related to Spark 3.1 or higher) label and removed the "? - Needs Triage" (Need team to review and classify) label on Nov 17, 2020
razajafri (Collaborator, Author) commented

This is a follow-on to the Spark Cache work that was done as part of #444.

sameerz added the "performance" (A performance related task/issue) and "epic" (Issue that encompasses a significant feature or body of work) labels and removed the "feature request" (New feature or request) label on Jun 2, 2021
mattahrens added the "P2" (Not required for release) label on Apr 27, 2022
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023
* Fix cmake to install libcudacxx and thrust

Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>

* Move changes down to section dependencies

---------

Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>