[macro] [false positives] environment-aware logic in config should not cause resources to Always selected by state:modified #9564

Closed
3 tasks done
Tracked by #9562
graciegoheen opened this issue Feb 13, 2024 · 4 comments · Fixed by #10487
Labels
enhancement New feature or request state: modified state Stateful selection (state:modified, defer)

Comments

@graciegoheen
Contributor

graciegoheen commented Feb 13, 2024

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Many folks want to have logic in their dbt project that purposely differs by environment.

For example, let's say I want to materialize a model as a view in dev but as a table in prod.

To accomplish this, I use a macro in my config block:

{{
    config(
        materialized = set_materialized_config()
    )
}}

with the following macro:

{% macro set_materialized_config() %}
  {% if target.name == 'prod' %}
    {% set mat = 'table' %}
  {% else %}
    {% set mat = 'view' %}
  {% endif %}
  
  {{ return(mat) }}
{% endmacro %}

When I run with --select state:modified and compare my dev environment to a manifest from prod, this model will ALWAYS be marked as modified. This is because the rendered materialized config differs: in prod it's table, in dev it's view.

Instead, if I've changed nothing about this model OR macro, it shouldn't be marked as modified because the macro logic hasn't changed.

If I've changed the model OR the macro, it should be marked as modified.
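
The desired behavior can be sketched in plain Python. This is only an illustration of the idea, not how dbt computes state:modified: `rendered_materialized`, `fingerprint`, and `config_as_written` are hypothetical names, and hashing the config as written together with the macro text is an assumed approach.

```python
import hashlib

# The macro text as it appears in the project (unrendered Jinja).
MACRO_SOURCE = (
    "{% macro set_materialized_config() %}"
    "{% if target.name == 'prod' %}{% set mat = 'table' %}"
    "{% else %}{% set mat = 'view' %}{% endif %}"
    "{{ return(mat) }}{% endmacro %}"
)


def rendered_materialized(target_name: str) -> str:
    """Python stand-in for the value the macro resolves to per target."""
    return "table" if target_name == "prod" else "view"


def fingerprint(unrendered_config: dict, macro_source: str) -> str:
    """Hash the config as written plus the macro text; the target is not an input."""
    payload = repr(sorted(unrendered_config.items())) + "\n" + macro_source
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical "unrendered" config: the expression exactly as typed in the model.
config_as_written = {"materialized": "set_materialized_config()"}

# Comparing rendered values differs between prod and dev (the false positive).
rendered_differs = rendered_materialized("prod") != rendered_materialized("dev")

# Comparing fingerprints: identical in both environments when nothing changed.
prod_fp = fingerprint(config_as_written, MACRO_SOURCE)
dev_fp = fingerprint(config_as_written, MACRO_SOURCE)
unchanged = prod_fp == dev_fp

# Editing the macro body changes the fingerprint, so the model is picked up.
edited_macro = MACRO_SOURCE.replace("'view'", "'ephemeral'")
macro_change_detected = fingerprint(config_as_written, edited_macro) != prod_fp
```

The point of the sketch is that the target name never enters the fingerprint, so environment-dependent output cannot cause a difference, while an edit to the macro text still does.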

Describe alternatives you've considered

A current workaround is to pull the raw Jinja out of my macro and put all environment-based logic directly into my dbt_project.yml file (configs in dbt_project.yml can't be set to the output of a macro).

If I instead configure this model like so:

models:
  <path_to_my_model>:
    +materialized: "{{ 'table' if target.name == 'prod' else 'view' }}"

it would NOT be picked up when selecting --select state:modified.

But this is cumbersome (it makes the code less DRY), unexpected, and can lead to an unnecessarily massive dbt_project.yml file.

Who will this benefit?

Anyone who wants to use state:modified and has environment-based logic in their dbt project.

See internal use case:

Are you interested in contributing this feature?

No response

Anything else?

Potential solution #6170 (comment)

This is relevant for all resources that can be configured, not just models.

@jtcohen6
Contributor

jtcohen6 commented Feb 13, 2024

Historical context on why we've said that this is hard, at least in the past:

We'd need some way to statically extract + save the "unrendered" value of materialized, as set_materialized_config(). Right now, both macros (config and set_materialized_config) are called + resolved in the same pass.
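
To make "statically extract" concrete, here is a deliberately naive Python sketch that reads the config call's keyword arguments as text without rendering any Jinja. It is an assumption-laden illustration, not dbt's parser: `CONFIG_CALL` and `unrendered_config` are hypothetical names, and the regex only handles a single config(...) block with one simple keyword argument per line.

```python
import re

MODEL_SQL = """{{
    config(
        materialized = set_materialized_config()
    )
}}

select 1 as id
"""

# Matches the whole {{ config(...) }} call and captures the raw argument text.
CONFIG_CALL = re.compile(r"\{\{\s*config\((?P<args>.*?)\)\s*\}\}", re.DOTALL)


def unrendered_config(sql: str) -> dict:
    """Return {key: expression-as-written} for the first config() call, if any."""
    match = CONFIG_CALL.search(sql)
    if match is None:
        return {}
    result = {}
    for line in match.group("args").splitlines():
        line = line.strip().rstrip(",")
        if "=" in line:
            # Split on the first "=" only, so the expression text stays intact.
            key, _, expr = line.partition("=")
            result[key.strip()] = expr.strip()
    return result


extracted = unrendered_config(MODEL_SQL)
```

A real implementation would need to walk the Jinja syntax tree rather than rely on a regex (multi-line expressions, nested calls with commas, and several config blocks all break this sketch), which is part of why the single-pass rendering makes this hard.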


An alternative approach is to let users put macros in their yaml configs, such as:

# models/path/to/my_model.yml
models:
  - name: my_model
    config:
      materialized: "{{ set_materialized_config() }}"

This would make it easier for us to save "{{ set_materialized_config() }}" as the unrendered materialized config. It's also much DRYer for the end user, compared with copy-pasting the same Jinja if expression over and over. But it also risks being substantially trickier & slower to parse, which is the biggest reason why we haven't done it in the past.

If we do go down that route, we might want a different UX — a "snippet"? a "pure macro"? — to make clear that these macros can only be static input-output machines. They can reference vars + env vars + target values, but they cannot make introspective queries against the data warehouse. This category already includes the "special" generate_x_name macros (for database/schema/alias), which dbt fully resolves at parse time instead of at runtime.
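
As a rough illustration of the "static input-output machine" idea, the Python sketch below models a macro whose result depends only on an explicit parse-time context. Everything here is hypothetical: `ParseContext`, its fields, and the `dev_materialization` var are invented for the example and are not dbt APIs.

```python
from dataclasses import dataclass, field
from typing import Mapping


@dataclass(frozen=True)
class ParseContext:
    """Hypothetical parse-time inputs: target values, vars, and env vars."""
    target_name: str
    vars: Mapping[str, str] = field(default_factory=dict)
    env: Mapping[str, str] = field(default_factory=dict)


def set_materialized_config(ctx: ParseContext) -> str:
    """A "pure macro": reads only ctx and never queries the warehouse."""
    if ctx.target_name == "prod":
        return "table"
    # Hypothetical var that lets a project override the non-prod default.
    return ctx.vars.get("dev_materialization", "view")


prod_value = set_materialized_config(ParseContext(target_name="prod"))
dev_value = set_materialized_config(ParseContext(target_name="dev"))
```

Because the function has no inputs beyond the context, it can be resolved at parse time, and the same context always yields the same value.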

@graciegoheen graciegoheen changed the title [macro] environment-aware logic in config should not cause resources to Always selected by state:modified [macro] [false positives] environment-aware logic in config should not cause resources to Always selected by state:modified Feb 14, 2024
@graciegoheen graciegoheen added state Stateful selection (state:modified, defer) state: modified labels Feb 14, 2024
@graciegoheen
Contributor Author

Related to #3277

@sasawatc

sasawatc commented Jul 3, 2024

I have a use case similar to the one @graciegoheen mentioned: dynamically setting my Snowflake warehouse based on my environment (I'm assuming from the title, as I don't have access). Personally, I find that being able to specify all model-specific config overrides in the .sql file itself makes that model's override settings more intuitive.

@marius-sb1

If we do go down that route, we might want a different UX — a "snippet"? a "pure macro"? — to make clear that these macros can only be static input-output machines. They can reference vars + env vars + target values, but they cannot make introspective queries against the data warehouse. This category already includes the "special" generate_x_name macros (for database/schema/alias), which dbt fully resolves at parse time instead of at runtime.

This might be a different issue, but it could also be related: the quoted paragraph touches on the generate_x_name macros, which are definitely related to my problem. We have a setup where we override generate_database_name to separate environments and projects into separate databases (it uses target.name from a dynamically generated profiles.yml + database/custom database).

Although we have the exact same database config across our environments, it seems that only the output of generate_database_name is stored in the manifest, so all our models are marked as modified when comparing dev to prod. The use case here is slim CI.
