[FEA]Improve the file reading by using local file caching #1435

GaryShen2008 · 2020-12-29T05:05:58Z

Is your feature request related to a problem? Please describe.
When the same data files are loaded from a remote data source multiple times, a local file caching mechanism would significantly improve read performance.

Describe the solution you'd like
Alluxio is an open source project that provides exactly this kind of caching. We'd like to use Alluxio as the file caching service alongside our plugin to provide a solution for frequent remote reads.
We'd also like to optimize file reading and partitioning in our plugin based on Alluxio's core.
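
One way to picture the caching idea is as a path rewrite: a read aimed at a remote store is redirected to the matching Alluxio location, so repeated reads are served from the cache. The following is only a minimal sketch of that idea, not the plugin's implementation. The mapping, the bucket name, the host `alluxio-master`, and the function `to_cached_path` are all hypothetical; port 19998 is Alluxio's default master RPC port.

```python
from typing import Dict

# Hypothetical mapping from remote path prefixes to Alluxio mount points.
# The bucket and host names are placeholders, not values from this issue.
PATH_REPLACEMENTS: Dict[str, str] = {
    "s3://my-bucket/": "alluxio://alluxio-master:19998/my-bucket/",
}


def to_cached_path(path: str, replacements: Dict[str, str] = PATH_REPLACEMENTS) -> str:
    """Return the Alluxio URI for `path` if a prefix matches, else `path` unchanged."""
    # Check longer prefixes first so nested mounts resolve to the most specific one.
    for prefix in sorted(replacements, key=len, reverse=True):
        if path.startswith(prefix):
            return replacements[prefix] + path[len(prefix):]
    # No mapping applies: read directly from the original location.
    return path
```

Paths with no configured prefix fall through unchanged, so a reader built this way would still work against sources that are not mounted in Alluxio.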

Tasks:

Productionize the plugin to work with different filesystems (AWS, Azure, GCP, DBFS)
Create tests for Alluxio with different filesystems
Create a user guide for Alluxio settings
Verify Alluxio can run in an on-prem cluster

wbo4958 · 2021-02-26T03:05:24Z

We have merged PR #1562 for the V1 data source. We will discuss whether we need to add Alluxio support for the V2 data source. For now, I'm closing this issue, and I will open another issue if we plan to support the V2 data source.

tgravescs · 2021-03-01T16:34:29Z

Please open an issue to track the V2 data source; we can prioritize it as needed.

GaryShen2008 added the "feature request" and "? - Needs Triage" labels Dec 29, 2020

sameerz removed the "? - Needs Triage" label Jan 5, 2021

sameerz assigned tgravescs Jan 5, 2021

GaryShen2008 assigned wbo4958 Jan 10, 2021

wbo4958 closed this as completed Feb 26, 2021

sameerz added the "performance" label and removed the "feature request" label Mar 2, 2021

tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023

Merge pull request NVIDIA#1435 from NVIDIA/bot-auto-merge-branch-23.10

9f5d8e5

[auto-merge] bot-auto-merge-branch-23.10 to branch-23.12 [skip ci] [bot]
