persist view column comments #866

Closed
6 changes: 6 additions & 0 deletions .changes/unreleased/Features-20230817-130731.yaml
@@ -0,0 +1,6 @@
kind: Features
body: Persist Column level comments when creating views
time: 2023-08-17T13:07:31.6812862Z
custom:
Author: jurasan
Issue: CT-764
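
For context, these column comments are only persisted when the model enables persist_docs for columns. A minimal sketch of that configuration, using a hypothetical project name and the standard dbt persist_docs settings:

# dbt_project.yml -- hypothetical project name
models:
  my_project:
    +persist_docs:
      relation: true
      columns: true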
34 changes: 34 additions & 0 deletions dbt/include/spark/macros/adapters.sql
@@ -229,9 +229,43 @@
{% endfor %}
{% endmacro %}

{% macro get_matched_column(column_name, column_dict) %}
Collaborator: Spark columns are case sensitive; I would not do the upper/lower matching.

{% if (column_name|upper in column_dict) -%}
{% set matched_column = column_name|upper -%}
{% elif (column_name|lower in column_dict) -%}
{% set matched_column = column_name|lower -%}
{% elif (column_name in column_dict) -%}
{% set matched_column = column_name -%}
{% else -%}
{% set matched_column = None -%}
{% endif -%}
{{ return(matched_column)}}
{% endmacro %}

{% macro get_column_comment_sql(column_name, column_dict) -%}
{% set matched_column = get_matched_column(column_name, column_dict) -%}
{% if matched_column and column_dict[matched_column]["description"] -%}
{% set column_comment_clause = "comment '" ~ column_dict[matched_column]["description"] ~ "'" %}
{%- endif -%}
{{ adapter.quote(column_name) }} {{ column_comment_clause }}
{% endmacro %}

{% macro get_persist_docs_column_list(model_columns, query_columns) %}
{% for column_name in query_columns %}
{{ get_column_comment_sql(column_name, model_columns) }}
{{- ", " if not loop.last else "" }}
{% endfor %}
{% endmacro %}

{% macro spark__create_view_as(relation, sql) -%}
create or replace view {{ relation }}
{% if config.persist_column_docs() -%}
{% set model_columns = model.columns %}
Collaborator: Where are the model columns coming from?

{% set query_columns = get_columns_in_query(sql) %}
Collaborator: Does this mean that the SQL of the view is executed when creating the view with persist_docs.columns = true?

Reply: Issuing the get_columns_in_query query against one of the largest tables internal to Databricks returned in 148 ms. It is sufficiently performant.

Collaborator: Regardless of the time, does it execute the SQL to get the columns?

Reply: It executes a limit 0.
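
For context, dbt-core's default get_columns_in_query macro wraps the model SQL in a zero-row probe of roughly this shape (a paraphrased sketch; the exact macro text in dbt-core may differ):

-- fetches column metadata without scanning any rows
select * from (
    {{ select_sql }}
) as __dbt_sbq
where false
limit 0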

(
{{ get_persist_docs_column_list(model_columns, query_columns) }}
)
{% endif %}
{{ comment_clause() }}
{%- set contract_config = config.get('contract') -%}
{%- if contract_config.enforced -%}
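
For illustration, with persist_docs columns enabled the macro above renders DDL roughly like the following sketch (hypothetical schema, view, and column names; Spark quotes identifiers with backticks, and columns without a description get no comment clause):

create or replace view my_schema.customers_view (
    `customer_id` comment 'Unique customer identifier',
    `customer_name`
)
comment 'One row per customer'
as
    select customer_id, customer_name from my_schema.customers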