Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BigQuery labels on schemas/dataset #22

Closed
shedd opened this issue Dec 15, 2020 · 6 comments
Closed

BigQuery labels on schemas/dataset #22

shedd opened this issue Dec 15, 2020 · 6 comments
Labels
enhancement New feature or request Stale

Comments

@shedd
Copy link

shedd commented Dec 15, 2020

Describe the feature

We use labels on Google Cloud assets for inventory tracking purposes. For our Bigquery datasets, we need labels at the schema/dataset level.

It looks like it's possible to add other Bigquery schema-level configuration to the profile - e.g. dataset location.

Are there any plans to support labels at the dataset level?

Describe alternatives you've considered

We tested table-level labels implemented as per dbt-labs/dbt-core#1942 but our inventory tracking service (Vanta for SOC2 purposes) expects the labels at the dataset level.

Who will this benefit?

Other Bigquery users who are working with labels at the dataset level.

@jtcohen6
Copy link
Contributor

Thanks for the idea @shedd! It's helpful to know that this a requirement for integrating with other tools. I'm totally in favor of coming up with an approach here, I'm just not sure what it should be.

There's some helpful discussion over in dbt-labs/dbt-core#1714 around a similar idea, how to persist docs for BigQuery datasets. The same questions apply here:

  • how can labels can be attached to a schema/dataset?
  • how dbt actually sets the labels for the schemas/datasets -- does this only happen when dbt creates a schema/dataset? Or can these descriptions be updated? (relates to Update BQ labels on tables as they change  #21, we only update labels on models when they're created today)

Dataset location (#969) feels a bit different because it's an attribute of the connection, specified in profiles.yml. Of course, we could add a profile attribute that instructs dbt to label the target.dataset if it's creating it for the first time. What about other datasets dbt may create, though, in the case that a project is using custom schemas? Should that same label apply to them all?

For the purpose of integrating with your inventory tracking service, would it be sufficient to simply put a dbt label each dataset dbt creates? This wouldn't support custom labels, but it'd be easy enough to code up without developing new constructs, and it feels in line with our expectation that dbt creates and manages its own datasets of transformed models.

I'm tagging this bigquery to start, but if we come up with an approach we really like, we could see about extending it to other databases.

@jtcohen6 jtcohen6 transferred this issue from dbt-labs/dbt-core Oct 12, 2021
@jtcohen6 jtcohen6 added the enhancement New feature or request label Oct 12, 2021
@sparrovv
Copy link

Hey there,

Eventually this problem was solved with this macro:

{% macro update_dataset_labels(dataset_name, project) %}

   {% set update_labels_query %}
        ALTER schema `{{project}}`.{{dataset_name}}
        SET OPTIONS (
            labels=[("foo-name", "foo-value")]
        )
   {% endset %}

   {% set get_labels %}
        SELECT
            option_value
        FROM
            `{{project}}`.INFORMATION_SCHEMA.SCHEMATA_OPTIONS
        WHERE
          schema_name="{{dataset_name}}"
          and option_name="labels"
  {% endset%}

  {% do run_query(update_labels_query) %}
  {% set result =  run_query(get_labels).columns[0].values() %}
  {% do log(result, True) %}

{% endmacro %}
 dbt run-operation update_dataset_labels --args '{"dataset_name": "dbt_test", "project": "foo"}'

Tha macro could be made more generic, and source labels from the project.yml, but for the time being it works as expected.

@BenHizak
Copy link

BenHizak commented Jun 2, 2022

@sparrovv this is great. Thank you 👍

@github-actions
Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Nov 30, 2022
@BenHizak
Copy link

Is there any plan to make this native functionality (without a macro) in DBT?

@github-actions github-actions bot removed the Stale label Dec 1, 2022
@github-actions
Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Stale
Projects
None yet
Development

No branches or pull requests

4 participants