persist view column comments #866

Closed

Conversation

jurasan
Contributor

@jurasan jurasan commented Aug 9, 2023

resolves #372

Problem

View column description is not persisted in view schema.

Solution

View column description is persisted as a column level comment.
Used this dbt-snowflake commit as an example.
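As a rough illustration of the approach (a plain-Python sketch, not the actual Jinja macro; the function name and sample data are hypothetical):

```python
# Sketch of the PR's approach: match the columns returned by the view's query
# against the model's yml column descriptions, and turn each match into a
# column-level "comment" clause for the CREATE VIEW statement.
def column_comment_clauses(query_columns, model_columns):
    clauses = []
    for name in query_columns:
        description = model_columns.get(name, {}).get("description")
        if description:
            # escape single quotes so they survive inside the SQL string literal
            escaped = description.replace("'", "\\'")
            clauses.append(f"{name} comment '{escaped}'")
        else:
            clauses.append(name)
    return clauses

print(column_comment_clauses(
    ["id", "name"],
    {"id": {"description": "primary key"}},
))  # ["id comment 'primary key'", 'name']
```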

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX

I was not able to run tests on this. Here is the PR with the test, but the tests cannot call dbt-core macros.

@internalcode
    def _fail_with_undefined_error(
        self, *args: t.Any, **kwargs: t.Any
    ) -> "te.NoReturn":
        """Raise an :exc:`UndefinedError` when operations are performed
        on the undefined value.
        """
>       raise self._undefined_exception(self._undefined_message)
E       jinja2.exceptions.UndefinedError: 'get_columns_in_query' is undefined

../../miniconda3/lib/python3.10/site-packages/jinja2/runtime.py:852: UndefinedError
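A minimal sketch of why this error appears when a macro is rendered outside dbt: plain jinja2 has no get_columns_in_query in its context, so calling the name hits the undefined-value path shown in the traceback. (This uses bare jinja2, not the dbt test harness.)

```python
from jinja2 import Environment
from jinja2.exceptions import UndefinedError

env = Environment()
# dbt injects macros such as get_columns_in_query into the Jinja context;
# outside dbt the name resolves to Undefined, and calling it raises.
template = env.from_string("{{ get_columns_in_query(sql) }}")
try:
    template.render(sql="select 1 as id")
    error = None
except UndefinedError as exc:
    error = str(exc)

print(error)  # 'get_columns_in_query' is undefined
```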

@jurasan jurasan requested a review from a team as a code owner August 9, 2023 20:56
@cla-bot

cla-bot bot commented Aug 9, 2023

Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA.

In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above. If you have questions about the CLA, or if you believe you've received this message in error, please reach out through a comment on this PR.

CLA has not been signed by users: @jurasan

@jurasan
Contributor Author

jurasan commented Aug 10, 2023

@cla-bot check:

@cla-bot cla-bot bot added the cla:yes label Aug 10, 2023
@cla-bot

cla-bot bot commented Aug 10, 2023

The cla-bot has been summoned, and re-checked this pull request!

@colin-rogers-dbt
Contributor

@jurasan can you add a changelog summarizing this change?

@mikealfare
Contributor

mikealfare commented Aug 10, 2023

@jurasan The failing code check is for whitespace in dbt/include/spark/macros/adapters.sql. You can either fix it manually or run pre-commit run --all-files to have it fix it for you.

Also, I don't understand why the test in your linked PR wouldn't be able to run in this PR. We regularly call dbt-core macros. Could you talk more about that?

@Fleid
Contributor

Fleid commented Aug 10, 2023

@JCZuurmond for information only

@jurasan
Contributor Author

jurasan commented Aug 14, 2023

@mikealfare

@jurasan The failing code check is for whitespace in dbt/include/spark/macros/adapters.sql. You can either fix it manually or run pre-commit run --all-files to have it fix it for you.

fixed this

Also, I don't understand why the test in your linked PR wouldn't be able to run in this PR. We regularly call dbt-core macros. Could you talk more about that?

Yes, we do call dbt-core macros, but if you look at test_macros.py I don't think those are tested.
Maybe it's because of the way those macros are called in the __run_macro method?

@mikealfare mikealfare self-assigned this Aug 14, 2023
@jurasan jurasan closed this Aug 17, 2023
@jurasan jurasan reopened this Aug 17, 2023
@colin-rogers-dbt colin-rogers-dbt enabled auto-merge (squash) August 17, 2023 22:04
@colin-rogers-dbt
Contributor

Set this to auto-merge, looks like all tests are passing. @jurasan can you update your branch with the latest from main?

Collaborator

@JCZuurmond JCZuurmond left a comment

@jurasan : Thank you for your contribution.

If I understand correctly, the code does roughly the following:
If the persist_docs.columns flag is set to true, then comments are added to the views by matching the columns output by the view query against the columns in the matching model definition (yml file).

I have some doubts about executing the SQL when creating a view. A benefit of views is that they are computationally cheap to create. It would be an unexpected side effect to me if setting persist_docs.columns to true caused the view's SQL to be executed when creating the view.


{% macro spark__create_view_as(relation, sql) -%}
create or replace view {{ relation }}
{% if config.persist_column_docs() -%}
{% set model_columns = model.columns %}
Collaborator

Where are the model columns coming from?


{% macro spark__create_view_as(relation, sql) -%}
create or replace view {{ relation }}
{% if config.persist_column_docs() -%}
{% set model_columns = model.columns %}
{% set query_columns = get_columns_in_query(sql) %}
Collaborator

Does this mean that the SQL of the view is executed when creating the view with persist_docs.columns = true?


Issuing the get_columns_in_query query against one of the largest tables internally at Databricks returned in 148 ms. It is sufficiently performant.

Collaborator

Regardless of the time, does it execute the SQL to get the columns?


It executes a limit 0.
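The pattern being described can be sketched with any SQL engine (sqlite3 stands in for Spark here; the wrapper shape is an illustration, not dbt's exact SQL): wrap the view's SQL in a zero-row query and read the column names from the cursor metadata.

```python
import sqlite3

view_sql = "select 1 as id, 'alice' as name"  # hypothetical view body

conn = sqlite3.connect(":memory:")
# "limit 0" forces the engine to plan the query and expose its schema
# without materializing any rows.
cur = conn.execute(f"select * from ({view_sql}) limit 0")
columns = [d[0] for d in cur.description]
print(columns)  # ['id', 'name']
```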

@@ -229,9 +229,43 @@
{% endfor %}
{% endmacro %}

{% macro get_matched_column(column_name, column_dict) %}
Collaborator

Spark columns are case-sensitive; I would not do the upper/lower matching.
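The reviewer's point can be sketched in plain Python (the column names and the column_docs mapping are hypothetical): when two column names differ only by case, lowercasing both sides before matching makes them collide, and the wrong description can be attached.

```python
# Hypothetical model-columns mapping with two names that differ only by case.
column_docs = {"userId": "surrogate key", "userid": "legacy id"}

def match_case_insensitive(name, docs):
    # lowercasing collapses "userId" and "userid" into one key; the last wins
    lowered = {k.lower(): v for k, v in docs.items()}
    return lowered.get(name.lower())

def match_exact(name, docs):
    return docs.get(name)

print(match_case_insensitive("userId", column_docs))  # 'legacy id' -- wrong entry
print(match_exact("userId", column_docs))             # 'surrogate key'
```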

@@ -229,9 +229,29 @@
{% endfor %}
{% endmacro %}

{% macro get_column_comment_sql(column_name, column_dict) -%}
{% if column_name in column_dict and column_dict[column_name]["description"] -%}
{% set column_comment_clause = "comment '" ~ column_dict[column_name]["description"] ~ "'" %}

Filter with

| replace("'", "\\'")

Contributor

Is this what you mean? Could you give me a TL;DR of the filter's purpose?

Suggested change
{% set column_comment_clause = "comment '" ~ column_dict[column_name]["description"] ~ "'" %}
{% set column_comment_clause = "comment '" ~ column_dict[column_name]["description"] ~ "'" | replace("'", "\\'")%}


It's to escape single quotes so they can be used in the comments.

@benc-db benc-db Sep 21, 2023

You need it to operate only on the description; I think here you have it operating either on the whole string or only on the final ' (I'm not the best at Jinja). In my local copy I moved the extraction and filtering to a separate 'set' statement.


{% set escaped_description = column_dict[column_name]["description"] | replace("'", "\\'") %}
{% set column_comment_clause = "comment '" ~ escaped_description ~ "'" %}
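The precedence pitfall the thread circles around can be reproduced in plain jinja2: the | filter binds tighter than the ~ concatenation operator, so in the one-line version the replace filter applies only to the trailing "'" literal, leaving quotes inside the description unescaped. Escaping the description in its own set statement first avoids this. (A standalone sketch, not the dbt macro itself.)

```python
from jinja2 import Environment

env = Environment()
description = "it's"

# One-liner: the filter binds to the trailing "'" literal, not the whole string.
wrong = env.from_string(
    r"""{{ "comment '" ~ d ~ "'" | replace("'", "\\'") }}"""
).render(d=description)
print(wrong)  # comment 'it's\'  -- internal quote left unescaped

# Two-step version: escape the description first, then concatenate.
fixed = env.from_string(
    r"""{% set esc = d | replace("'", "\\'") %}comment '{{ esc }}'"""
).render(d=description)
print(fixed)  # comment 'it\'s'
```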

@colin-rogers-dbt
Contributor

resolved in #893

Successfully merging this pull request may close these issues:
[CT-764] Persist Column level comments when creating views