
Seeing performance differences of multi-threaded/coalesce/perfile Parquet reader type for a single file #1366

Closed
abellina opened this issue Dec 10, 2020 · 3 comments
Assignees
Labels
performance A performance related task/issue

Comments

@abellina
Collaborator

We ran into a case where a single Parquet file loaded an order of magnitude slower (800 ms vs. 80 ms) comparing the 0.3 snapshot of the plugin against 0.2.

But when we set spark.rapids.sql.format.parquet.reader.type=MULTITHREADED or spark.rapids.sql.format.parquet.reader.type=COALESCING, we see consistent or better results (<= 80 ms) in 0.3.

This seems interesting because it's a single 128 MB Parquet file, and a single batch comes out of it. So it looks like the reader has some overhead when left at the default AUTO. Note that the file was being read from hdfs://.
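The config above can also be set from an interactive Spark session; a minimal sketch, assuming a session named spark and an illustrative path:

```scala
// Override the RAPIDS plugin's Parquet reader type instead of the default AUTO.
// Valid values include MULTITHREADED, COALESCING, and PERFILE.
spark.conf.set("spark.rapids.sql.format.parquet.reader.type", "MULTITHREADED")

// Re-run the read that was slow under AUTO (path is illustrative, not the real one).
val df = spark.read.parquet("hdfs://namenode:9000/some/parquet/dir")
```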

The file in question was one of the Parquet parts of the web_sales table for TPC-DS at 1 TB, and the query where we saw a significant effect was q88. The file has decimals in it, and the test projected out the integer columns so that the GPU Parquet scan would be enabled:

val test = spark.read.parquet("hdfs://.../web_sales/ws_sold_date_sk=2450816")
test.select("ws_web_page_sk", "ws_order_number").write.mode("overwrite").parquet("test")
@abellina abellina added feature request New feature or request ? - Needs Triage Need team to review and classify labels Dec 10, 2020
@sameerz sameerz added performance A performance related task/issue and removed ? - Needs Triage Need team to review and classify feature request New feature or request labels Dec 15, 2020
@sameerz sameerz changed the title [FEA] Seeing performance differences of multi-threaded/coalesce/perfile Parquet reader type for a single file Seeing performance differences of multi-threaded/coalesce/perfile Parquet reader type for a single file Dec 15, 2020
@tgravescs
Collaborator

So when I run this on YARN and just look at the Parquet read stats on our YARN cluster, I see the opposite: the multi-threaded reader has a total scan time larger than AUTO = COALESCING. But when I run the entire TPC-DS query 9 on the YARN cluster using partitioned data, which from the comments I assume is what was done in raplab, the multi-threaded reader is faster. The issue here is the partitioning. If I pull in the PR to fix the coalescing reader with partitioning (#1200), then AUTO = COALESCING is much faster.

This was running the query with 10 GB of TPC-DS data that is partitioned:

  • multi-threaded = 288s
  • AUTO=coalescing = 420s
  • coalescing with the partitioning fix for issue #1200 = 164s

I ran q88 with similar results as well. Going by the path:
spark.read.parquet("hdfs:/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/web_sales/ws_sold_date_sk=2450816")

it certainly looks like partitioned data. I'm guessing this is what is happening in raplab as well, but we need to verify.

@tgravescs tgravescs self-assigned this Dec 17, 2020
@tgravescs
Collaborator

One thing we could do for the 0.3 release is default to the multi-threaded reader if we are worried that too many people use partitioned data.

@sameerz sameerz added this to the Dec 7 - Dec 18 milestone Dec 17, 2020
@tgravescs
Collaborator

tgravescs commented Dec 17, 2020

Looking at the logs of one application that was worse on raplab, we see that it's reading many partitioned files:

20/12/09 23:55:28 INFO GpuParquetMultiFilePartitionReaderFactory: Using the coalesce multi-file parquet reader, files: hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2451111/part-00016-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2452169/part-00041-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2452147/part-00052-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2451469/part-00068-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2452209/part-00045-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2451485/part-00188-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2452184/part-00068-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2451063/part-00005-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2452501/part-00086-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2452213/part-0005
2-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2451815/part-00152-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2452546/part-00108-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet task attemptid: 12158
20/12/09 23:55:28 INFO GpuParquetMultiFilePartitionReaderFactory: Using the coalesce multi-file parquet reader, files: hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2451027/part-00135-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2452467/part-00198-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2450886/part-00084-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2452327/part-00143-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2450842/part-00051-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2451003/part-00107-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2451243/part-00078-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2452407/part-00053-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2451238/part-00038-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2451661/part-0006
9-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2451208/part-00133-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2452337/part-00089-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2450835/part-00133-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2452092/part-00194-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2451555/part-00143-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2451282/part-00157-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2452122/part-00157-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2451959/part-00084-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2452051/part-00107-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2451252/part-00198-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,h
dfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2452487/part-00107-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2451380/part-00084-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2452296/part-00008-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet,hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2450930/part-00023-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet task attemptid: 12189

These end up being split into separate batches and thus are not efficient to transfer:

20/12/09 23:55:29 INFO MultiFileParquetPartitionReader: Partition values for the next file hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2450842/part-00051-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet doesn't match current hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2452327/part-00143-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet, splitting it into another batch!
20/12/09 23:55:29 INFO MultiFileParquetPartitionReader: Partition values for the next file hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2451003/part-00107-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet doesn't match current hdfs://host1:9000/data/tpcds_sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/store_sales/ss_sold_date_sk=2450842/part-00051-2e3057df-fc7e-4ba1-9a05-2147079e088e.c000.snappy.parquet, splitting it into another batch!
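The splitting behavior in the logs above can be illustrated with a small Scala sketch (not the plugin's actual code): a coalescing reader can only merge consecutive files into one batch while their partition values match, so interleaved partition keys force one small batch per file.

```scala
// Illustrative sketch only: FileSplit and coalesceBatches are hypothetical names.
case class FileSplit(path: String, partitionValue: String)

def coalesceBatches(files: Seq[FileSplit]): Seq[Seq[FileSplit]] =
  files.foldLeft(Vector.empty[Vector[FileSplit]]) { (batches, f) =>
    batches.lastOption match {
      case Some(last) if last.head.partitionValue == f.partitionValue =>
        batches.init :+ (last :+ f) // same partition value: extend current batch
      case _ =>
        batches :+ Vector(f)        // value changed: start a new batch
    }
  }
```

With the interleaved ss_sold_date_sk values seen in the logs, every file starts a new batch, which matches the repeated "splitting it into another batch!" messages.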

So I think this is a duplicate of #1200.
