Ensure column names are valid when writing benchmark query results to file #1247

andygrove · 2020-12-02T21:41:49Z

Ensure column names are valid when writing benchmark query results to file.

This closes #1246

Signed-off-by: Andy Grove <andygrove@nvidia.com>

abellina · 2020-12-02T21:43:07Z

Is this an issue for CPU plans as well? Or just the GPU plans? From the exceptions in the issue it does look like Spark is complaining about this.

andygrove · 2020-12-02T22:01:57Z

Yes, this happens on CPU as well:

org.apache.spark.sql.AnalysisException: Attribute name "round((sun_sales1 / sun_sales2), 2)" contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.;
	at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.checkConversionRequirement(ParquetSchemaConverter.scala:583)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.checkFieldName(ParquetSchemaConverter.scala:574)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport$.$anonfun$setSchema$2(ParquetWriteSupport.scala:472)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport$.$anonfun$setSchema$2$adapted(ParquetWriteSupport.scala:472)
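
For readers following along, here is a minimal sketch of the kind of renaming the exception asks for. It is not the code merged in this PR: the `ColumnNames` object and its `isValid` and `sanitize` helpers are hypothetical names chosen for illustration. The only detail taken from the thread is the set of rejected characters, copied from the exception message above. The sketch keeps names that are already valid, replaces each rejected character with an underscore, and appends a numeric suffix when two columns would otherwise collide.

```scala
// Hypothetical helper, not the implementation from this PR.
object ColumnNames {
  // Characters rejected by the Parquet writer, as listed in the
  // AnalysisException message: " ,;{}()\n\t="
  private val invalidChars: Set[Char] = " ,;{}()\n\t=".toSet

  /** True if the name is non-empty and contains none of the rejected characters. */
  def isValid(name: String): Boolean =
    name.nonEmpty && !name.exists(invalidChars.contains)

  /** Returns one valid, unique name per input name, preserving order. */
  def sanitize(names: Seq[String]): Seq[String] = {
    val used = scala.collection.mutable.HashSet[String]()
    names.zipWithIndex.map { case (name, i) =>
      // Valid names pass through unchanged; otherwise swap bad characters for '_'.
      val cleaned =
        if (isValid(name)) name
        else name.map(c => if (invalidChars.contains(c)) '_' else c)
      // An empty name has nothing to preserve, so fall back to a positional name.
      val base = if (cleaned.isEmpty) s"c$i" else cleaned
      // Append _1, _2, ... until the candidate does not clash with an earlier column.
      var candidate = base
      var n = 1
      while (used.contains(candidate)) {
        candidate = s"${base}_$n"
        n += 1
      }
      used += candidate
      candidate
    }
  }
}
```

With a Spark `DataFrame` named `df`, the result could be applied before writing with `df.toDF(ColumnNames.sanitize(df.columns): _*)`. Replacing characters rather than generating opaque names keeps the output columns recognizable, at the cost of longer names.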

abellina · 2020-12-02T22:14:38Z

build

sameerz · 2020-12-03T05:04:56Z

build

abellina · 2020-12-03T14:40:22Z

This PR needs to be upmerged to include the latest changes in branch-0.3

andygrove · 2020-12-03T14:50:55Z

build

andygrove · 2020-12-03T19:06:07Z

tests timed out - build

andygrove · 2020-12-03T19:10:05Z

build

andygrove added 2 commits December 2, 2020 14:31

Enforce unique column names when writing query output to file

3bc9127

preserve valid names

bea0577

Signed-off-by: Andy Grove <andygrove@nvidia.com>

andygrove added the benchmark label (Benchmarking, benchmarking tools) Dec 2, 2020

andygrove added this to the Nov 23 - Dec 4 milestone Dec 2, 2020

andygrove self-assigned this Dec 2, 2020

andygrove changed the title ~~Ensure column names are valid when writing benchmark query results to file~~ WIP: Ensure column names are valid when writing benchmark query results to file Dec 2, 2020

andygrove changed the title ~~WIP: Ensure column names are valid when writing benchmark query results to file~~ Ensure column names are valid when writing benchmark query results to file Dec 2, 2020

abellina approved these changes Dec 2, 2020

Merge remote-tracking branch 'nvidia/branch-0.3' into tpcds-column-names

b1df64d

andygrove merged commit 4679591 into NVIDIA:branch-0.3 Dec 3, 2020

andygrove deleted the tpcds-column-names branch December 3, 2020 21:53

tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023

Update submodule cudf to 62c4f99f79852a95f61f241c884e598c9164331d (NV…

c23b864

…IDIA#1247) Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
