Add abfs and abfss to the cloud scheme #4082

wbo4958 · 2021-11-11T06:35:17Z

Signed-off-by: Bobby Wang <wbo4958@gmail.com>

wbo4958 · 2021-11-11T06:35:26Z

build

jlowe · 2021-11-11T16:48:56Z

sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsConf.scala

-      "generally would be total separate from the executors and likely have a higher I/O read " +
-      "cost. Many times the cloud filesystems also get better throughput when you have multiple " +
-      "readers in parallel. This is used with spark.rapids.sql.format.parquet.reader.type")
+      "filesystems. Schemes already included: dbfs, s3, s3a, s3n, wasbs, gs, abfs, abfss. Cloud " +


Nit: It would be nice not to have to keep these manually in sync, and it would also be nice to list them alphabetically. For example, move the default cloud scheme list from GpuMultiFileReader to here with something like:

/** List of schemes that are always considered cloud storage schemes */
val DEFAULT_CLOUD_SCHEMES = Seq("abfs", "abfss", "dbfs", "gs", "s3", "s3a", "s3n", "wasbs")

val CLOUD_SCHEMES = conf("spark.rapids.cloudSchemes")
  [...]
    s"filesystems. Schemes already included: ${DEFAULT_CLOUD_SCHEMES.mkString(", ")}. Cloud " +

Then GpuMultiFileReader can create its cloud schemes HashSet from the list in RapidsConf.
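
As a rough illustration of that suggestion, here is a minimal, self-contained sketch of a single default list feeding both the doc string and the lookup set. The object name `CloudSchemeSketch` and the helpers `allCloudSchemes`, `isCloudPath`, and `docFragment` are hypothetical names invented for this example. They are not the plugin's actual API, and the real RapidsConf and GpuMultiFileReader code may be structured differently.

```scala
import java.net.URI
import scala.collection.immutable.HashSet

// Hypothetical sketch, not the plugin's real code.
object CloudSchemeSketch {
  /** Schemes always treated as cloud storage, kept in alphabetical order. */
  val DefaultCloudSchemes: Seq[String] =
    Seq("abfs", "abfss", "dbfs", "gs", "s3", "s3a", "s3n", "wasbs")

  /** Doc fragment generated from the list, so the two cannot drift apart. */
  val docFragment: String =
    s"Schemes already included: ${DefaultCloudSchemes.mkString(", ")}."

  /** Defaults plus any comma-separated, user-supplied schemes (assumed format). */
  def allCloudSchemes(userSchemes: Option[String]): Set[String] = {
    val extra = userSchemes.toSeq
      .flatMap(_.split(","))
      .map(_.trim.toLowerCase)
      .filter(_.nonEmpty)
    HashSet(DefaultCloudSchemes: _*) ++ extra
  }

  /** True when the path's URI scheme is in the given set; false if there is no scheme. */
  def isCloudPath(path: String, schemes: Set[String]): Boolean =
    Option(new URI(path).getScheme).exists(s => schemes.contains(s.toLowerCase))
}
```

With this shape, adding a scheme to the defaults means touching one `Seq`, and the documentation string follows automatically.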

jlowe

Looks OK to me. I'd like to see the nit addressed, but it can be a followup.

tgravescs · 2021-11-11T16:55:47Z

did we run any perf tests on this just to make sure this performed better?

wbo4958 · 2021-11-12T02:58:47Z

build

wbo4958 · 2021-11-12T09:49:47Z

> did we run any perf tests on this just to make sure this performed better?

@tgravescs

I just ran the perf test locally for COALESCING and cloud reading on 5000 ORC files (1.3G in total) residing in Azure.

                     total time
Cloud reading        506.653s
Coalescing reading   696.223s

jlowe · 2021-11-12T14:26:38Z

build

Add abfs and abfss to the cloud scheme

0eb76fe

Signed-off-by: Bobby Wang <wbo4958@gmail.com>

wbo4958 requested a review from tgravescs November 11, 2021 09:30

jlowe reviewed Nov 11, 2021

jlowe previously approved these changes Nov 11, 2021

resolve comments

5e0d641

wbo4958 dismissed jlowe’s stale review via 5e0d641 November 12, 2021 02:58

doc

f0934e2

tgravescs approved these changes Nov 12, 2021

jlowe approved these changes Nov 12, 2021

wbo4958 merged commit 602e754 into NVIDIA:branch-21.12 Nov 12, 2021

sameerz added the "task" label (Work required that improves the product but is not user facing) Nov 16, 2021

wbo4958 deleted the abfs-scheme branch February 17, 2022 00:33
