-
Notifications
You must be signed in to change notification settings - Fork 149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize insert_overwrite incremental strategy with WRITE_TRUNCATE / Partition copy #167
Conversation
2274a2e
to
75d90db
Compare
d21d712
to
4c1e485
Compare
@@ -47,11 +47,16 @@ class PartitionConfig(dbtClassMixin): | |||
data_type: str = "date" | |||
granularity: str = "day" | |||
range: Optional[Dict[str, Any]] = None | |||
time_ingestion_partitioning: bool = False | |||
copy_partitions: bool = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@github-christophe-oudar - Is there a use case for copy_partitions outside of time_ingestion_partitioning or for not using copy_partitions with time_ingestion_partitioning? Just wondering if we need another config parameter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, absolutely, it's also much faster than a merge so I'm going to use it on all my models whether it's time ingestion partitioning or column type partitioning! It really fits my use case because I'm usually processing models partition per partition but it might not fit everyone (though honestly, most people in my opinion can afford a temporary loss of consistency for such a speed gain 👍 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a use case for time_ingestion_partitioning = True
and copy_partitions = False
? Should we allow users to do this?
@Fleid - I know you had concerns with adding more variables for users to set but I think we need to keep these separate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, here some use cases:
time_ingestion_partitioning = True
andcopy_partitions = False
=> for atomic merge for time ingestion partitioned table (useful for people that can't afford potential inconsistency on multiple partitions)time_ingestion_partitioning = False
andcopy_partitions = True
=>
for fast atomic insert overwrite on a single partition or fast non atomic on multiple partitions on column type partitioning (quite common at my company for small/medium tables)time_ingestion_partitioning = False
andcopy_partitions = False
=> before the 2 PRstime_ingestion_partitioning = True
andcopy_partitions = True
=> for fast atomic insert overwrite on a single partition or fast non atomic on multiple partitions (a common use case at my company for large tables as it's the best performing tables from what we tested - we might need to double check now)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great context! I'm going to do a follow up PR after we merge to update our documentation with this context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing some local testing, one question so far but overall it seems to function well. Hope to be able to move this and #136 forward this week
Also: apologies for the extreme delay on this, still getting ramped up but we do want to prioritize getting this merged post-Coalesce
CHANGELOG.md
Outdated
@@ -9,6 +9,11 @@ | |||
- Use dbt.tests.adapter.basic in tests (new test framework) ([#135](https://github.com/dbt-labs/dbt-bigquery/issues/135), [#142](https://github.com/dbt-labs/dbt-bigquery/pull/142)) | |||
- Adding pre-commit and black formatter hooks ([#147](https://github.com/dbt-labs/dbt-bigquery/pull/147)) | |||
- Adding pre-commit code changes ([#148](https://github.com/dbt-labs/dbt-bigquery/pull/148)) | |||
- Add support for ingestion time partitioned table using incremental materialization ([#136](https://github.com/dbt-labs/dbt-bigquery/pull/136)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: looks like you manually added a changelog entry but we're using changie? https://github.com/dbt-labs/dbt-bigquery/blob/main/CONTRIBUTING.md#updating-docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I'll update, it was done before changie was introduced
56352a8
to
333865f
Compare
@colin-rogers-dbt I rebased it, I think you have put the label to trigger the integration tests so that we can check there's no regression there too. |
dbt/include/bigquery/macros/materializations/incremental_strategy/insert_overwrite.sql
Show resolved
Hide resolved
333865f
to
803429b
Compare
803429b
to
5bcff7d
Compare
9ded8d9
to
35f0e87
Compare
35f0e87
to
0bf3706
Compare
This integ test is failing due to a known issue with our testing. No action required from @Kayrnt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
resolves #77
Description
copy_partitions
topartition_by
object (default to false), throw an error ifincremental_strategy != 'insert_overwrite'
.insert_overwrite
©_partitions
, we replace theMERGE
part by:Checklist
CHANGELOG.md
and added information about my change to the "dbt-bigquery next" section.