Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Struct support for ParquetWriter #2514

Merged
merged 8 commits into from
May 28, 2021

Conversation

razajafri
Copy link
Collaborator

@razajafri razajafri commented May 26, 2021

Signed-off-by: Raza Jafri rjafri@nvidia.com

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@@ -2815,7 +2815,8 @@ object GpuOverrides {
exec[DataWritingCommandExec](
"Writing data",
ExecChecks((TypeSig.commonCudfTypes +
TypeSig.DECIMAL.withPsNote(TypeEnum.DECIMAL, "Only supported for Parquet")).nested(),
TypeSig.DECIMAL.withPsNote(TypeEnum.DECIMAL, "Only supported for Parquet") +
TypeSig.STRUCT.withPsNote(TypeEnum.STRUCT, "Only supported for Parquet")).nested(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we missing a test? When the struct support went in, the fact that we forgot to update this should have been caught by the test.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out @jlowe I remembered that we added struct support for GpuInMemoryTableScanExec and confused it with DataWritingCommandExec. We have to turn this on as part of this PR. I have updated the PR name to reflect it

@pxLi pxLi added the documentation Improvements or additions to documentation label May 27, 2021
@sameerz sameerz added this to the May 24 - Jun 4 milestone May 27, 2021
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@razajafri razajafri changed the title Updated supported_ops.md for DataWritingCommandExec to reflect Parquet Writer struct support Add Struct support for ParquetWriter May 28, 2021
@razajafri
Copy link
Collaborator Author

build

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
pytest.param([decimal_gen_default, parquet_basic_struct_gen,
StructGen([['child0', StructGen([['child1', byte_gen]])]]),
decimal_gen_scale_precision, decimal_gen_same_scale_precision, decimal_gen_64bit],
marks=pytest.mark.allow_non_gpu("CoalesceExec"))]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is CoalesceExec not on the GPU? and is there an open issue that will put it on the GPU? If not we should file one and reference it here. If so we should just reference it here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is fixed by #2530 which added struct support for coalesce.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the PR to not allow Coalesce with Structs to run on CPU

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code still has allow_non_gpu("CoalesceExec")?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, its not refflecting it here for some reason, unless you go to the last commit

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK you are seeing the latest, I wasn't seeing it until I looked at the commit specifically.

So that is needed because CoalesceExec doesn't support Decimals yet

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that is needed because CoalesceExec doesn't support Decimals yet

#2531 explicitly states it adds decimal support and appears to be testing it. So if that is somehow not working in practice an issue needs to be filed and fixed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for wasting your time. I didn't see _commonTypes was added and thought it was TypeSig.commonCudfTypes

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
…ts now

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@razajafri razajafri requested review from revans2 and jlowe May 28, 2021 20:01
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@jlowe
Copy link
Member

jlowe commented May 28, 2021

build

@razajafri razajafri merged commit 59b126c into NVIDIA:branch-21.06 May 28, 2021
@razajafri razajafri deleted the parquet-doc branch May 28, 2021 23:42
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* updated Parquet writer support docs

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

* adding Struct to allow list for ParquetWriter

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

* add nested struct test

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

* removed duplicate data type from test

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

* removed CoalesceExec from allowedOnCpu list because it supports structs now

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

* Decimals are supported by CoalesceExec

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

Co-authored-by: Raza Jafri <rjafri@nvidia.com>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* updated Parquet writer support docs

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

* adding Struct to allow list for ParquetWriter

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

* add nested struct test

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

* removed duplicate data type from test

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

* removed CoalesceExec from allowedOnCpu list because it supports structs now

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

* Decimals are supported by CoalesceExec

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

Co-authored-by: Raza Jafri <rjafri@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants