[CT-1080] BigQuery dataset expiration time #289

Closed
akashgangulyhf opened this issue Aug 25, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

@akashgangulyhf

Describe the feature

BigQuery supports automatically expiring datasets after a given time.
Currently, dbt doesn't support configuring the BQ dataset (schema).

Describe alternatives you've considered

We have to configure and create the dataset first in BQ and then use it from dbt.

Additional context

Creating the dataset first in BQ doesn't fully help in cases where we want to dynamically create datasets in a CI pipeline and have them expire automatically.

Who will this benefit?

On GitHub PRs, we can use this with dbt Core to create BQ datasets with a specific expiration time. That way we don't have to explicitly grant DELETE access to GitHub.

Are you interested in contributing this feature?

NA

@akashgangulyhf akashgangulyhf added enhancement New feature or request triage labels Aug 25, 2022
@github-actions github-actions bot changed the title BigQuery dataset expiration time [CT-1080] BigQuery dataset expiration time Aug 25, 2022
@jtcohen6
Contributor

jtcohen6 commented Sep 6, 2022

@akashgangulyhf Thanks for opening!

Since #183 was merged and will be included in v1.3, you will be able to define your own create_schema macro and have dbt use it instead of the default:

{% macro create_schema(relation) %}
  {%- call statement('create_schema') -%}
    create schema if not exists {{ relation.without_identifier() }}
    options (
        default_table_expiration_days = 0.0417 -- in days, not seconds (about 1 hour)
    )
  {% endcall %}
{% endmacro %}

You could configure that to take into account env vars or target values, so as to have different expiration policies in different environments (dev, CI, prod). But it would be tricky to configure on a schema-by-schema level; I'll say more about this below.
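As a rough sketch of the target-based variant described above, the override could look up an expiration value by `target.name`. The target names (`ci`, `dev`) and the expiration values here are hypothetical, chosen only for illustration; any target not listed falls back to creating the dataset without an expiration option.

```sql
{% macro create_schema(relation) %}
  {#- Hypothetical policy: target name -> default table expiration, in days. -#}
  {#- Targets not listed here (e.g. prod) get no expiration at all. -#}
  {%- set expiration_days = {'ci': 0.0417, 'dev': 7}.get(target.name) -%}
  {%- call statement('create_schema') -%}
    create schema if not exists {{ relation.without_identifier() }}
    {% if expiration_days is not none %}
    options (
        default_table_expiration_days = {{ expiration_days }}
    )
    {% endif %}
  {% endcall %}
{% endmacro %}
```

Because the policy is keyed on the target rather than on the schema, every dataset dbt creates in a given environment gets the same expiration, which is the limitation noted above.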


We're missing a mechanism for users to truly configure schemas/datasets in dbt today. We can say that, without necessarily implying that dbt should first / eventually become a totally generic and extensible framework for managing all types of database objects. It's already in the business of creating schemas/datasets, as soon as you define and run a model in a schema/dataset that does not yet exist. This has come up in the context of labels (#22), persisting descriptions/comments (dbt-labs/dbt-core#1714), grants (dev blog), and managing "orphaned" objects (dbt-labs/dbt-core#4957).

@Fleid something I'd be interested in talking about with you more!

@jtcohen6 jtcohen6 removed the triage label Sep 6, 2022
@jtcohen6
Contributor

jtcohen6 commented Sep 7, 2022

I'm going to close this issue in favor of a discussion, about how dbt might enable users to manage schemas more naturally in the future: dbt-labs/dbt-core#5781

I think the manual override of the create_schema macro offers a halfway decent workaround in the meantime.
