Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dbt Constraints / model contracts #341

Merged
merged 17 commits into from
Feb 16, 2023
Merged

dbt Constraints / model contracts #341

merged 17 commits into from
Feb 16, 2023

Conversation

b-per
Copy link
Contributor

@b-per b-per commented Dec 5, 2022

Linked to the dbt-core issues dbt-labs/dbt-core#6079 and dbt-labs/dbt-core#6079

These changes, along with the dbt-Core ones, allow defining column types and constraints on columns in Snowflake.

Description

  • Supports setting the 4 Snowflake constraints described in the docs
    • unique, not null, primary key and foreign key
    • ⚠️ It is important to note that only the not null (and the not null property of primary key) are actually checked today in Snowflake. There rest of the constraints are purely metadata ones, not verified when inserting data
  • While the dbt Core implementation supports checks, I don't think that those are supported in Snowflake today
  • With this implementation:
    • all the columns need to be listed in the YML file, in the correct order, and each of them require a correct data type
    • in case that the data type is missing or incorrect, the CTA fails

Related Core/Adapter Pull Requests

Must be reviewed with passing tests

Example of behavior

The following jaffle_shop example

models:
  - name: customers
    description: This table has basic information about a customer, as well as some derived facts based on a customer's orders
    config:
      constraints_enabled: true

    columns:
      - name: customer_id
        description: This is a unique identifier for a customer
        data_type: number(38,0)
        constraint: 
          - unique
          - not null
        tests:
          - unique
          - not_null

      - name: first_name
        data_type: varchar(100)
        description: customer's first name. pii.

      - name: last_name
        data_type: varchar(5)
        description: customer's last name. pii.

      - name: first_order
        description: date (utc) of a customer's first order
        data_type: date

      - name: most_recent_order
        description: date (utc) of a customer's most recent order
        data_type: date

      - name: number_of_orders
        description: count of the number of orders a customer has placed
        data_type: number(18,0)

      - name: total_order_amount
        description: total value (aud) of a customer's orders
        data_type: number(18,0)

generates the following SQL:

        create or replace transient table <DB>.<SCHEMA>.customers 
        
  
  (
    
      customer_id number(38,0)  unique  not null   ,
      first_name varchar(100)   ,
      last_name varchar(5)   ,
      first_order date   ,
      most_recent_order date   ,
      number_of_orders number(18,0)   ,
      total_order_amount number(18,0)   
  )
  

         as
        (with customers as (

    select * from AD_HOC.dbt_bperigaud.stg_customers

),
...

Checklist

@cla-bot cla-bot bot added the cla:yes label Dec 5, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Dec 5, 2022

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the dbt-snowflake contributing guide.

@sungchun12 sungchun12 self-requested a review December 5, 2022 15:25
@sungchun12
Copy link
Contributor

I should have communicated better that my PR #6271 actually uses the below config.

version: 2

models:
  - name: constraints_example
    docs:
      node_color: black
    config:
      constraints_enabled: true
    columns:
      - name: id
        data_type: integer
        description: hello
        constraints: ['not null','primary key']
        check: id > 0
        tests:
          - unique
      - name: color
        data_type: string
      - name: date_day
        data_type: date

{% set constraints = col['constraint'] -%}
{%- set checks = col['checks'] -%}
{%- if checks -%}
{{exceptions.warn("We noticed you have `checks` in your configs, these are NOT compatible with Snowflake and will be ignored")}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you show me a screenshot of the warning message working in your terminal? I couldn't get it to pop up on mine?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
I got it to work by copying your code to my branch. I'll figure out the pip installs later! Looks good @jtcohen6 what's your opinion on how the warning should look and where in the logs it should appear?

@sungchun12
Copy link
Contributor

@joshuataylor keep your eyes on this when it's ready for local testing on your machine!

@b-per b-per marked this pull request as ready for review December 19, 2022 17:35
@sungchun12
Copy link
Contributor

@joshuataylor try to test this out now!

"""


class TestMaterializedWithConstraints:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want the DDL test? I feel like any other change to create_table_as() would then require updating the tests for something unrelated.

I actually wanted to avoid a test checking the DDL.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on an internal slack thread, you don't need the DDL test, but the rollback test remains to be needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can create a rollback test. Because this PR leverages Snowflake's ability to set constraints as part of the CTAS statement I think that this use case is already covered by the existing CTAS tests but it is not too difficult for me to add another one here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks looks great!

@joshuataylor
Copy link
Contributor

@joshuataylor try to test this out now!

Sorry I've had family issues over the past week or so, so haven't had a chance to check this out yet.

I'm planning on checking this out over the next few days, but obviously with being the holiday season I won't expect an immediate reply on feedback 🚀

{%- endfor %}
)
{%- if ns.at_least_one_check -%}
{{exceptions.warn("We noticed you have `check` configs, these are NOT compatible with Snowflake and will be ignored")}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe link to https://docs.snowflake.com/en/sql-reference/constraints-overview.html ?

Maybe the wording can be similar to what they have?

Snowflake supports defining and maintaining constraints, but does not enforce them, except for NOT NULL constraints, which are always enforced.

@@ -0,0 +1,18 @@
{% macro snowflake__get_columns_spec_ddl() %}
{# loop through user_provided_columns to create DDL with data types and constraints #}
{% if config.get('constraints_enabled', False) %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, nice work!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joshuataylor
Copy link
Contributor

joshuataylor commented Dec 27, 2022

This PR brings me such joy 😍.

2023 is going to be the year we stop having to cast everything in SQL (so it doesn't randomly change 😬 ) and it'll work on a YAML level. 🎉🎉🎉

{% for i in user_provided_columns -%}
{%- set col = user_provided_columns[i] -%}
{% set constraints = col['constraints'] -%}
{%- set ns.at_least_one_check = ns.at_least_one_check or col['check'] %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{%- set ns.at_least_one_check = ns.at_least_one_check or col['check'] %}
{%- set ns.at_least_one_check = ns.at_least_one_check or col['constraints_check'] %}

Do a find and replace on all check configs in all files to be constraints_check by name.

@b-per
Copy link
Contributor Author

b-per commented Jan 17, 2023

Tests are failing due to Snowflake columns being internally stored in uppercase.
I raise a comment on the dbt-core PR to know if we handle this on the Core side or adapter side.

@Fleid Fleid added the tech_debt label Feb 1, 2023
@Fleid
Copy link
Contributor

Fleid commented Feb 1, 2023

I know it's not tech_debt but I'm temporarily using the label to sort through the backlog. Don't get offended please ;)

@dbeatty10 dbeatty10 changed the title Add support for the constraint config in Snowflake dbt Constraints / model contracts Feb 14, 2023
@jtcohen6 jtcohen6 merged commit 02c82a5 into main Feb 16, 2023
@jtcohen6 jtcohen6 deleted the dbt-constraints-snowflake branch February 16, 2023 11:01
@sungchun12
Copy link
Contributor

@joshuataylor It's official. You can give this a rougher test drive now :)

@mikealfare mikealfare restored the dbt-constraints-snowflake branch February 17, 2023 01:29
@mikealfare mikealfare deleted the dbt-constraints-snowflake branch February 17, 2023 02:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

[CT-1676] [Feature] Support constraints in Snowflake following the dbt Core implementation
6 participants