[CT-594] Support for Iceberg sort order #343

Closed
cccs-jc opened this issue May 3, 2022 · 3 comments
Labels
enhancement New feature or request Stale

Comments

cccs-jc commented May 3, 2022

When creating Iceberg tables, you can specify a sort order. This is important because a table that is partitioned will implicitly be sorted by Iceberg (overriding the sort statement in your query).

Sort orders are specified via an ALTER TABLE statement. In an Iceberg implementation, this feature should be configurable via a dbt config.

Reference: https://iceberg.apache.org/docs/latest/spark-ddl/

Example statement:

ALTER TABLE prod.db.sample WRITE ORDERED BY category ASC NULLS LAST, id DESC NULLS FIRST
@cccs-jc cccs-jc added enhancement New feature or request triage labels May 3, 2022
@github-actions github-actions bot changed the title Support for Iceberg sort order [CT-594] Support for Iceberg sort order May 3, 2022
@jtcohen6
Contributor

jtcohen6 commented May 3, 2022

Hey @cccs-jc, thanks for opening!

Sure, this sounds like a reasonable config option within the table materialization. It also sounds like something you could get working with a post_hook in the meantime.
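
As a rough illustration of the post_hook workaround, a model could look something like the sketch below. It reuses the WRITE ORDERED BY clause from the example statement in the issue. The upstream model name `raw_sample` is hypothetical, and `file_format='iceberg'` assumes an adapter build that accepts Iceberg as a file format, which dbt-spark did not officially do at the time of this thread.

```sql
-- models/sample.sql (hypothetical model name)
-- Sketch only: file_format='iceberg' is an assumption about the adapter in use.
{{
  config(
    materialized='table',
    file_format='iceberg',
    post_hook="ALTER TABLE {{ this }} WRITE ORDERED BY category ASC NULLS LAST, id DESC NULLS FIRST"
  )
}}

-- raw_sample is a placeholder for whatever upstream model feeds this table
select
    category,
    id
from {{ ref('raw_sample') }}
```

Because the hook runs after the table has been built, the write order would presumably apply to later writes rather than to the rows inserted by that same run, so it is worth checking that this matches the behavior you need.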

Following the thread in #294: dbt-spark doesn't (yet) officially support Iceberg, but it seems like this is something you've managed to get working over in https://github.com/cccs-jc/dbt-spark. There are a few moving pieces in play:

Perhaps it makes sense to wait on adding support for this specific config, until we have the right pieces in place to officially support Iceberg?

@jtcohen6 jtcohen6 removed the triage label May 3, 2022
@cccs-jc
Author

cccs-jc commented May 3, 2022

I agree. I mostly wanted to create the issue so we don't lose track of it. It's my understanding that the guys at Iceberg plan to contribute support for Iceberg in dbt-spark. I've pointed them to my POC, which does include some tidbits but is not complete or production grade.

For now, yes, we will use a pre-hook so that the table is always configured correctly when we insert.

@github-actions
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.
