Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-53] [Bug] Possible regression - dbt 1.0 doesn't work properly if alias has quotes in them #4581

Closed
1 task done
jeremyyeo opened this issue Jan 18, 2022 · 6 comments
Closed
1 task done
Labels
bug Something isn't working stale Issues that have gone stale

Comments

@jeremyyeo
Copy link
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

With dbt 1.0, a model that has quotes in it's alias config ('"my_Model_1"') will run into an approximate match error on the next run of the model. dbt 0.21 will happily recreate that model on each run.

Note that project level quoting is unspecified in the dbt_project.yml for either case and this happens with both core and cloud.

Expected Behavior

dbt 1.0 should follow 0.21 behaviour and recreate the model on each run - maybe? Unless we're saying 1.0 is the well behaved pattern. Fyi, the use-case here is that some users have model/alias/table names that they want to only quote some of the time and not all of the time (such as setting the quoting config on the project level) and they have been using this pattern pre-1.0.

Steps To Reproduce

Create a simple model:

-- models/my_model_1.sql
{{ config(alias = '"my_Model_1"') }}

select 1 as user

And a simple dbt_project.yml file:

name: "snowflake"
version: "1.0.0"
config-version: 2
profile: "snowflake"
model-paths: ["models"]
target-path: "target"
clean-targets:
  - "target"
  - "dbt_modules"
  - "dbt_packages"
 
models:
  snowflake:
    +materialized: view

Then run twice:

dbt clean && dbt run
dbt clean && dbt run

Relevant log output

01:46:51.418915 [debug] [Thread-1  ]: Began running node model.snowflake.my_model_1
01:46:51.420406 [info ] [Thread-1  ]: 1 of 1 START view model dbt_jyeo."my_Model_1"................................... [RUN]
01:46:51.422083 [debug] [Thread-1  ]: Acquiring new snowflake connection "model.snowflake.my_model_1"
01:46:51.422523 [debug] [Thread-1  ]: Began compiling node model.snowflake.my_model_1
01:46:51.422864 [debug] [Thread-1  ]: Compiling model.snowflake.my_model_1
01:46:51.426938 [debug] [Thread-1  ]: Writing injected SQL for node "model.snowflake.my_model_1"
01:46:51.428018 [debug] [Thread-1  ]: finished collecting timing info
01:46:51.428321 [debug] [Thread-1  ]: Began executing node model.snowflake.my_model_1
01:46:51.448521 [debug] [Thread-1  ]: finished collecting timing info
01:46:51.449002 [debug] [Thread-1  ]: Compilation Error in model my_model_1 (models/my_model_1.sql)
  When searching for a relation, dbt found an approximate match. Instead of guessing 
  which relation to use, dbt will move on. Please delete "DEVELOPMENT"."DBT_JYEO"."my_Model_1", or rename it to be less ambiguous.
  Searched for: DEVELOPMENT.DBT_JYEO."MY_MODEL_1"
  Found: "DEVELOPMENT"."DBT_JYEO"."my_Model_1"
  
  > in macro create_or_replace_view (macros/materializations/models/view/create_or_replace_view.sql)
  > called by macro materialization_view_snowflake (macros/materializations/view.sql)
  > called by model my_model_1 (models/my_model_1.sql)
01:46:51.449359 [debug] [Thread-1  ]: Sending event: {'category': 'dbt', 'action': 'run_model', 'label': 'adce82d8-4d73-487c-8b9f-f9c1ca6c8e48', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x107ae59d0>]}
01:46:51.449707 [error] [Thread-1  ]: 1 of 1 ERROR creating view model dbt_jyeo."my_Model_1".......................... [�[31mERROR�[0m in 0.03s]
01:46:51.450058 [debug] [Thread-1  ]: Finished running node model.snowflake.my_model_1
01:46:51.451409 [debug] [MainThread]: Acquiring new snowflake connection "master"
01:46:51.451882 [info ] [MainThread]: 
01:46:51.452267 [info ] [MainThread]: Finished running 1 view model in 5.60s.
01:46:51.452577 [debug] [MainThread]: Connection 'master' was properly closed.
01:46:51.452795 [debug] [MainThread]: Connection 'model.snowflake.my_model_1' was properly closed.
01:46:51.458308 [info ] [MainThread]: 
01:46:51.458703 [info ] [MainThread]: �[31mCompleted with 1 error and 0 warnings:�[0m
01:46:51.459107 [info ] [MainThread]: 
01:46:51.459646 [error] [MainThread]: �[33mCompilation Error in model my_model_1 (models/my_model_1.sql)�[0m
01:46:51.460105 [error] [MainThread]:   When searching for a relation, dbt found an approximate match. Instead of guessing 
01:46:51.460545 [error] [MainThread]:   which relation to use, dbt will move on. Please delete "DEVELOPMENT"."DBT_JYEO"."my_Model_1", or rename it to be less ambiguous.
01:46:51.461050 [error] [MainThread]:   Searched for: DEVELOPMENT.DBT_JYEO."MY_MODEL_1"
01:46:51.461377 [error] [MainThread]:   Found: "DEVELOPMENT"."DBT_JYEO"."my_Model_1"
01:46:51.461813 [error] [MainThread]:   
01:46:51.462496 [error] [MainThread]:   > in macro create_or_replace_view (macros/materializations/models/view/create_or_replace_view.sql)
01:46:51.462891 [error] [MainThread]:   > called by macro materialization_view_snowflake (macros/materializations/view.sql)
01:46:51.463398 [error] [MainThread]:   > called by model my_model_1 (models/my_model_1.sql)
01:46:51.464069 [info ] [MainThread]: 
01:46:51.464997 [info ] [MainThread]: Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
01:46:51.465708 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1079bd190>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x107134490>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x107a9faf0>]}

Environment

- OS: macOS 12.1 (also tested in dbt Cloud)
- Python: 3.9.7
- dbt: 1.0.1

What database are you using dbt with?

snowflake

Additional Context

dbt 1.0.1

Screen Shot 2022-01-18 at 2 46 58 PM

dbt 0.21.0

Screen Shot 2022-01-18 at 2 45 05 PM

@jeremyyeo jeremyyeo added bug Something isn't working triage labels Jan 18, 2022
@jeremyyeo jeremyyeo changed the title [Bug] Possible regression - dbt 1.0 doesn't work properly if alias has quotes in them [CT-53] [Bug] Possible regression - dbt 1.0 doesn't work properly if alias has quotes in them Jan 18, 2022
@jeremyyeo
Copy link
Contributor Author

Seems relevant: #3835 (comment)

@ChenyuLInx
Copy link
Contributor

Hi @jeremyyeo , thanks for submitting the issue and steps to reproduce! And sorry for the delayed response! I talked with @jtcohen6 this morning about this. With the information we currently gathered, we think we are not going to fix this, but rather prefer users set all quoting config to False as recommended here. This is because a design choice Snowflake made to uppercase unquoted identifiers.

The quoting for the project can't be set for an individual model. However, the quote for sources is more configurable. Since we do understand sometimes the naming convention for those tables is out of the analyst's hand.

Let me know if you believe we definitely need to keep supporting the exact behavior in 0.21 for this case and we can reopen the issue.

@jeremyyeo
Copy link
Contributor Author

Hey @ChenyuLInx, keen to kick this back open again. There are some customers who have legacy reason that their tables have special characters, and thus they have been doing something like this on Snowflake all this while (pre-1.0) without errors:

-- my_table_model.sql
{{ config(alias='"FOO_/BAR123"') }}
select 1 as user_id

Now as soon as they upgrade to dbt 1.0, they are hit with the approximate match error on a subsequent dbt run. Cool so now what's our path forward from here?

We can try removing the quotes from the alias and set the project level quoting.identifier to true:

-- my_table_model.sql
{{ config(alias='FOO_/BAR123') }}
select 1 as user_id
# dbt_project.yml

quoting:
  identifier: true

Looks like this works:

image

Now what happens if they had previously been using a package like our logging package in their project (and so some of those audit logging tables already exists)?

Let's give it a shot:

# packages.yml
packages:
  - package: dbt-labs/logging
    version: 0.6.0
dbt deps
dbt run

We run into our good friend approximate match:

image

This means that now they have to do some tweaking (maybe drop the previous logging table or rename it in Snowflake, or change some of the code in the logging package, etc).

Now, I only tested this toy example (+ 1 package) but I imagine that if we get into incremental materializations and what not it is only going to become more tedious to upgrade.

So for users who cannot escape using special characters in their table names1, I propose we revert the behaviour or it is going to be difficult to figure out a happy path of migration from a pre-1.0 to a post-1.0 world for these users.

Footnotes

  1. Certainly the world will be a better place if everyone just used snake_case erywhere in dbt land - sql file names, alias names, etc and be done with casing 😉

@jeremyyeo jeremyyeo reopened this May 11, 2022
@jtcohen6
Copy link
Contributor

@jeremyyeo I'm very sympathetic to users who need to handle special characters in sources, over which they have no control. But is there really no choice for users who have special characters in model table names, over which they should have full control? It really leads to such a worse experience all around, both for the dbt developers, and for us trying to debug these issues.

I appreciate the complexity of the v1.0 migration, given that we're asking folks to upgrade pretty soon. I could see supporting an optional config (flag / env var) that optionally undoes the change made in #4076, only as a temporary salve while folks work on switching to snake-case model identifiers. Something like:

dbt --unsafe-cache-quote-behavior run
DBT_UNSAFE_CACHE_QUOTE_BEHAVIOR=True dbt run

@github-actions
Copy link
Contributor

github-actions bot commented Nov 8, 2022

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Nov 8, 2022
@github-actions
Copy link
Contributor

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest; add a comment to notify the maintainers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working stale Issues that have gone stale
Projects
None yet
Development

No branches or pull requests

3 participants