Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-2809] Support ref in foreign key constraint expressions #8062

Closed
Tracked by #7979 ...
jtcohen6 opened this issue Jul 10, 2023 · 19 comments · Fixed by #10414 or #10489
Closed
Tracked by #7979 ...

[CT-2809] Support ref in foreign key constraint expressions #8062

jtcohen6 opened this issue Jul 10, 2023 · 19 comments · Fixed by #10414 or #10489
Assignees
Labels
enhancement New feature or request model_contracts multi_project paper_cut A small change that impacts lots of users in their day-to-day
Milestone

Comments

@jtcohen6
Copy link
Contributor

jtcohen6 commented Jul 10, 2023

Problem

    constraints:
      - type: FOREIGN_KEY # multi_column
        columns: [FIRST_COLUMN, SECOND_COLUMN, ...]
        expression: "OTHER_MODEL_SCHEMA.OTHER_MODEL_NAME (OTHER_MODEL_FIRST_COLUMN, OTHER_MODEL_SECOND_COLUMN, ...)"
    
    columns:
      - name: FIRST_COLUMN
        data_type: DATA_TYPE
        
        # column-level constraints
        constraints:
          - type: foreign_key
            expression: OTHER_MODEL_SCHEMA.OTHER_MODEL_NAME (OTHER_MODEL_COLUMN)

Because you must hard-code your database.schema.table name when setting a foreign key constraint:

  • DAG dependencies are incorrect
  • multi-environment is not supported (it's very hacky)

This feature has become more important now that warehouses use foreign key constraints for better performance.

Instead, we should support ref in foreign key constraint expression - both at the model and column level.

This is similar to how the relationships data test works.

models:
  - name: orders
    columns:
      - name: customer_id
        tests:
          - relationships:
              to: ref('customers')
              field: id

Current workaround

Having to use jinja to specify the expression based on the target:

- type: foreign_key
  expression: "{{ 'prod_dataset.' if target.name!='dev' else target.dataset ~ '.prod_dataset__' }}foreign_table(foreign_key)"

Acceptance criteria

  • For foreign key constraint, I can specify which table I want to reference using ref at the column level
    columns:
      - name: my_column
        data_type: int
        constraints:
          - type: foreign_key
              to: ref('my_other_model')
              to_column: other_my_column
  • or at the model level
    constraints:
      - type: foreign_key
        columns: [first_column, second_column, ...]
        to: ref('my_other_model')
        to_columns: [other_first_column, other_second_column, ...]

Notes from technical refinement

  • this is where the current rendering happens
  • here’s where we parse constraints currently

originally left as comment in #7417

I'm opening this issue to track upvotes/comments that could inform eventual prioritization. Is this something people want/need in their production workflows? Are happy to solve by other means in the meantime (e.g. dbt_constraints)?


If we were to take FK constraints more seriously, we're missing a pretty important ingredient, which is the ability to include & template ref inside the expression field — or providing more structure, i.e.

constraints:
  - type: foreign_key
    ref_table_name: ref('other_table_name')
    ref_column_names: ['id'] # could be multiple

Per #6754 (comment), we kicked that out of scope for v1.5, and we're unlikely to prioritize it while this remains a metadata-only (nonfunctional & unenforceable) feature on the majority of data platforms.

@jtcohen6 jtcohen6 added enhancement New feature or request model_contracts labels Jul 10, 2023
@github-actions github-actions bot changed the title Support ref in foreign key constraint expressions [CT-2809] Support ref in foreign key constraint expressions Jul 10, 2023
@jgillies
Copy link

jgillies commented Aug 3, 2023

An argument in favor of prioritizing this is that BigQuery now supports the use of foreign keys for optimizing joins.

https://cloud.google.com/blog/products/data-analytics/join-optimizations-with-bigquery-primary-and-foreign-keys?hl=en

@noahjgreen295
Copy link

I would also submit that, database enforcement implementation aside, forcing the usage of explicit <schema>.<table> hardcodings and not supporting ref() is a crack in dbt's abstraction model. On its own it's certainly not the end of the world, but these breaks in the overall architectural vision and product conceptualization tend to proliferate if left unaddressed.

@awal11
Copy link

awal11 commented Nov 17, 2023

Snowflake can also use foreign keys for optimizing joins:
https://docs.snowflake.com/en/user-guide/join-elimination#setting-the-rely-constraint-property-to-eliminate-unnecessary-joins

@Juniper-vdg
Copy link

Juniper-vdg commented Nov 22, 2023

I'd really be interested in referencing a FK constraint to a model that lives in a custom schema. The referred model lives in a custom schema that is dependent on an Environment Variable that is passed in at runtime, so I cannot hardcode a <schema>.<table> reference in my constraint as I do not know what it will be ahead of time.

Until dbt is enhanced to support ref() in a foreign key constraint, I cannot model my FKs in constraints.

@noahjgreen295
Copy link

Another reason to add this is to ensure that dbt builds DAG dependencies that support the foreign keys. Because there is no ref(), but instead the hard-coding specification of <schema.table>, there’s no way to for dbt to understand the DAG dependency that a foreign key constraint creates.

For example, let’s say I have 3 models: A, B, and C

B depends on A.

So if I say dbt run -m +B it will first build A, then B.

So far so good. Now, suppose I specify a foreign key constraint on a column in B, referring to a column in C. For this to work, C has to exist. In other words, there’s now a DAG dependency between B and C, for that reason.

But with that constraint specified, dbt run -m +B still just builds A and then B. The constraint itself causes an error, because C does not exist.

In any non-trivial sized DAG, this will cause constant errors in builds, because there is no guarantee of a thread getting to C before B.

The workaround is to force the dependency by placing a SQL-commented ref() in the model .sql, as described here. In other words, something like:

-- {{ ref('C') }}

But this is just more extra work, and it becomes difficult to maintain as it scales. So this is one more reason to support ref in foreign key constraints expressions in the .yml; i.e. all in the same place.

@horony
Copy link

horony commented Dec 6, 2023

Like Snowflake and BigQuery, Redshift also uses foreign keys for optimizing joins:
https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-defining-constraints.html

@elyobo
Copy link

elyobo commented Dec 20, 2023

During development we build into developer dependent datasets (e.g. dev_developer_name.dataset_name__model_name instead of dataset_name.model_name in production), so hard coding foreign keys seems impossible.

@Stochastic-Squirrel
Copy link

During development we build into developer dependent datasets (e.g. dev_developer_name.dataset_name__model_name instead of dataset_name.model_name in production), so hard coding foreign keys seems impossible.

@elyobo

The dependency issue raised by @noahjgreen295 will still be an issue and was a major issue for us in using this feature. Our pipelines were less reliable and there was essentially a race condition when running multiple models in parallel.

I use a similar naming convention to you and I used something like this in the model YAML

- type: foreign_key
  expression: "{{ 'warehouse' if target.name!='dev' else target.dataset }}.tableA(tableB_ForeignKey)"

you can define simple if-else logic in the brackets. This allows for the FKs to be created in a dev_developer_name schema under a dev target. Hope this helps!

@elyobo
Copy link

elyobo commented Dec 20, 2023

Thanks @Stochastic-Squirrel, I didn't realise you could do that; ends up something like this for ours and does indeed work, leaving the logic duplication (this is already handled in the naming macros that ref calls) and the dependency issue.

- type: foreign_key
  expression: "{{ 'prod_dataset.' if target.name!='dev' else target.dataset ~ '.prod_dataset__' }}foreign_table(foreign_key)"

Another option might be a post hooks alterations with alter table statements, but also not ideal. ref support would be ideal but can appreciate that it's a pain to implement.

@ataft
Copy link

ataft commented Feb 13, 2024

@jtcohen6 Given that Snowflake, Redshift and BigQuery use foreign keys to optimize joins, will this issue get re-prioritized? Also, I'll add that downstream tools can use PK/FK to infer table relationships, perhaps bumping the priority further.

@babschlott
Copy link

Any updates on the priority for this? I feel like dbt focus a lot in adding new features but pushes aside the improvement of great features already present...

@graciegoheen graciegoheen added the paper_cut A small change that impacts lots of users in their day-to-day label Jun 21, 2024
@vishaalkk
Copy link

Any updates on this? It defeats the purpose of foreign key constraints as we cannot use them because it seems that dbt is unable to build a correct DAG. I have to run the project a couple of times so that parent tables get built.

@MichelleArk MichelleArk self-assigned this Jul 2, 2024
@EliasAthey
Copy link

EliasAthey commented Jul 10, 2024

+1 for this functionality

I am currently using this workaround successfully. Specifying this in the in-sql config block, in the post_hook argument:

{% if (is_incremental == false) and execute%}
ALTER TABLE {{this}} ADD FOREIGN KEY ('my_root_column') REFERENCES {{ref('my_foreign_model')}} ('my_foreign_column')
{% endif %}

This applies the constraint only when the model is materialized for the first time, avoiding unnecessary runs.

@gskoff
Copy link

gskoff commented Jul 15, 2024

I would definitely appreciate a feature which fixes this open issue. I tried to implement the workaround below but found that my foreign table is built after the table I am defining the foreign key for. So, I had to go create the primary key for the foreign table manually in BigQuery in order for the production DAG to build properly.

- type: foreign_key expression: "{{ 'prod_dataset.' if target.name!='dev' else target.dataset ~ '.prod_dataset__' }}foreign_table(foreign_key)"

vperron added a commit to gip-inclusion/data-inclusion that referenced this issue Jul 22, 2024
The 'services' table has an implicit dependency to the 'structures' mart
as its contracy enforces a check on its structure_id key towards that
table.

Make sure DBT knows about it so it generates the structures first.

Referenced by dbt-labs/dbt-core#8062,
might be fixed in dbt-labs/dbt-common#163
last week but:
- not released
- not documented
- not sure the commit will actually help when I read it, needs more
  changes I suppose
vperron added a commit to gip-inclusion/data-inclusion that referenced this issue Jul 25, 2024
The 'services' table has an implicit dependency to the 'structures' mart
as its constraint enforces a check on its structure_id key towards that
table.

Make sure DBT knows about it so it generates the structures first.

Referenced by dbt-labs/dbt-core#8062,
might be fixed in dbt-labs/dbt-common#163
last week but:
- not released
- not documented
- not sure the commit will actually help when I read it, needs more
  changes I suppose
vperron added a commit to gip-inclusion/data-inclusion that referenced this issue Jul 26, 2024
The 'services' table has an implicit dependency to the 'structures' mart
as its constraint enforces a check on its structure_id key towards that
table.

Make sure DBT knows about it so it generates the structures first.

Referenced by dbt-labs/dbt-core#8062,
might be fixed in dbt-labs/dbt-common#163
last week but:
- not released
- not documented
- not sure the commit will actually help when I read it, needs more
  changes I suppose
@MichelleArk MichelleArk mentioned this issue Jul 26, 2024
5 tasks
vperron added a commit to gip-inclusion/data-inclusion that referenced this issue Aug 2, 2024
The 'services' table has an implicit dependency to the 'structures' mart
as its constraint enforces a check on its structure_id key towards that
table.

Make sure DBT knows about it so it generates the structures first.

Referenced by dbt-labs/dbt-core#8062,
might be fixed in dbt-labs/dbt-common#163
last week but:
- not released
- not documented
- not sure the commit will actually help when I read it, needs more
  changes I suppose
vperron added a commit to gip-inclusion/data-inclusion that referenced this issue Aug 5, 2024
The 'services' table has an implicit dependency to the 'structures' mart
as its constraint enforces a check on its structure_id key towards that
table.

Make sure DBT knows about it so it generates the structures first.

Referenced by dbt-labs/dbt-core#8062,
might be fixed in dbt-labs/dbt-common#163
last week but:
- not released
- not documented
- not sure the commit will actually help when I read it, needs more
  changes I suppose
vperron added a commit to gip-inclusion/data-inclusion that referenced this issue Aug 6, 2024
The 'services' table has an implicit dependency to the 'structures' mart
as its constraint enforces a check on its structure_id key towards that
table.

Make sure DBT knows about it so it generates the structures first.

Referenced by dbt-labs/dbt-core#8062,
might be fixed in dbt-labs/dbt-common#163
last week but:
- not released
- not documented
- not sure the commit will actually help when I read it, needs more
  changes I suppose
@shettynithya
Copy link

Hello Team,

I checked this pull request, it does not state that "ref" are now supported in the "expression" for constraints (foregin_key).

It does not work in the following version, so is it planned to be fixed ?

dbt --version
using legacy validation callback
Core:

  • installed: 1.8.4
  • latest: 1.8.4 - Up to date!

Plugins:

  • databricks: 1.8.4 - Up to date!
  • spark: 1.8.0 - Up to date!

@elyobo
Copy link

elyobo commented Aug 6, 2024

There has not been a release since before that PR was merged; did you check the code from that PR (which does show examples of ref() in a to option - it looks like expression has split into to (which takes the ref) and to_columns (the columns referenced in the to table) and if so did you use the new options, or are you using the older 1.8.4 release which doesn't have this change yet?

@shettynithya
Copy link

@elyobo thanks i tried out using to and to_columns with the 1.8.4 release, i got the following error message

Compilation Error in model customer_interactions (models/silver/customer_interactions.sql)
  No parent table defined for foreign key:  

I will try it later today with the beta 1.9 version today.

@caileyfitzgerald
Copy link

Is this enabled for DBT Cloud as well?

@graciegoheen graciegoheen added this to the v1.9 milestone Aug 28, 2024
@graciegoheen
Copy link
Contributor

This will be available in the 1.9 release of dbt-core!

If you're using dbt Cloud, you can access this early if you're running on versionless.

Documentation will be added soon -> dbt-labs/docs.getdbt.com#5983

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request model_contracts multi_project paper_cut A small change that impacts lots of users in their day-to-day
Projects
None yet