enhancements for generate_source to include source/table descriptions and source name #64

Closed
kbrock91 opened this issue Jun 7, 2022 · 1 comment · Fixed by #66
Labels
enhancement New feature or request

Comments

kbrock91 commented Jun 7, 2022

Describe the feature

Currently, generate_source has an option to set include_descriptions = True, but this parameter only adds descriptions at the column level. Ideally, description placeholders would be generated for the source and its tables as well. Additionally, the only required parameter for the generate_source macro is the schema name, and there is no option to pass a separate name value. A user may want to give their source a name that differs from the schema name.

Describe alternatives you've considered

I can manually update the YAML generated by the current generate_source macro, but this is time-consuming and prone to YAML formatting issues.

Additional context

I have working code in my own dbt project that I believe handles both cases: description placeholders at the source and table level, and a source name that differs from the schema name. See below.

{% macro get_tables_in_schema(schema_name, database_name=target.database, table_pattern='%', exclude='') %}
    
    {% set tables=dbt_utils.get_relations_by_pattern(
        schema_pattern=schema_name,
        database=database_name,
        table_pattern=table_pattern,
        exclude=exclude
    ) %}

    {% set table_list= tables | map(attribute='identifier') %}

    {{ return(table_list | sort) }}

{% endmacro %}


---
{% macro generate_source(schema_name, name = schema_name, database_name=target.database, generate_columns=False, include_descriptions=False, table_pattern='%', exclude='') %}

{% set sources_yaml=[] %}
{% do sources_yaml.append('version: 2') %}
{% do sources_yaml.append('') %}
{% do sources_yaml.append('sources:') %}
{% do sources_yaml.append('  - name: ' ~ name | lower) %}

{% if include_descriptions %}
    {% do sources_yaml.append('    description: ""' ) %}
{% endif %}

{% if database_name != target.database %}
{% do sources_yaml.append('    database: ' ~ database_name | lower) %}
{% endif %}

{% if schema_name != name %}
{% do sources_yaml.append('    schema: ' ~ schema_name | lower) %}
{% endif %}

{% do sources_yaml.append('    tables:') %}

{% set tables=codegen.get_tables_in_schema(schema_name, database_name, table_pattern, exclude) %}

{% for table in tables %}
    {% do sources_yaml.append('      - name: ' ~ table | lower ) %}
    {% if include_descriptions %}
        {% do sources_yaml.append('        description: ""' ) %}
    {% endif %}
    {% if generate_columns %}
        {% do sources_yaml.append('        columns:') %}

        {% set table_relation=api.Relation.create(
            database=database_name,
            schema=schema_name,
            identifier=table
        ) %}

        {% set columns=adapter.get_columns_in_relation(table_relation) %}

        {% for column in columns %}
            {% do sources_yaml.append('          - name: ' ~ column.name | lower ) %}
            {% if include_descriptions %}
                {% do sources_yaml.append('            description: ""' ) %}
            {% endif %}
        {% endfor %}
        {% do sources_yaml.append('') %}

    {% endif %}

{% endfor %}

{% if execute %}

    {% set joined = sources_yaml | join ('\n') %}
    {{ log(joined, info=True) }}
    {% do return(joined) %}

{% endif %}

{% endmacro %}

A user would then be able to run something like this in the Cloud IDE to generate a more comprehensive source YAML:
{{ codegen.generate_source('tpch_sf001', name = 'tpch', database_name = 'raw', generate_columns = True, include_descriptions = True) }}
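
To make the branching in the proposed macro easier to follow, here is a rough plain-Python mirror of its logic. It is only a sketch, not dbt code: the function name `generate_source_yaml`, the `tables` dictionary (standing in for the warehouse lookups done by `get_tables_in_schema` and `adapter.get_columns_in_relation`), the `target_database` argument (standing in for `target.database`), and the `analytics` default are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def generate_source_yaml(
    schema_name: str,
    tables: Dict[str, List[str]],        # hypothetical: table name -> column names
    name: Optional[str] = None,
    database_name: str = "analytics",    # hypothetical default
    target_database: str = "analytics",  # stands in for target.database
    generate_columns: bool = False,
    include_descriptions: bool = False,
) -> str:
    """Mirror the proposed macro's branching and return the YAML text."""
    name = name or schema_name  # the macro defaults name to schema_name
    lines = ["version: 2", "", "sources:", f"  - name: {name.lower()}"]

    # Source-level description placeholder (new in the proposal).
    if include_descriptions:
        lines.append('    description: ""')
    # database is only written when it differs from the target database.
    if database_name != target_database:
        lines.append(f"    database: {database_name.lower()}")
    # schema is only written when the source name differs from the schema.
    if schema_name != name:
        lines.append(f"    schema: {schema_name.lower()}")

    lines.append("    tables:")
    for table in sorted(tables):
        lines.append(f"      - name: {table.lower()}")
        # Table-level description placeholder (new in the proposal).
        if include_descriptions:
            lines.append('        description: ""')
        if generate_columns:
            lines.append("        columns:")
            for column in tables[table]:
                lines.append(f"          - name: {column.lower()}")
                if include_descriptions:
                    lines.append('            description: ""')
            lines.append("")  # blank line after each table's columns
    return "\n".join(lines)


# Illustrative call; the table and column names are sample inputs only.
example = generate_source_yaml(
    "tpch_sf001",
    {"orders": ["o_orderkey"], "customer": ["c_custkey"]},
    name="tpch",
    database_name="raw",
    generate_columns=True,
    include_descriptions=True,
)
print(example)
```

Because the source name and schema name are compared directly, the `schema:` key appears only when the two differ, which keeps the output unchanged for anyone who does not pass `name`.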

Who will this benefit?

This will benefit anyone setting up new sources for the first time in their dbt project, and it will encourage those users to write descriptions at the source and table levels, improving their documentation. It will also eliminate confusion when a user passes include_descriptions = True without generate_columns = True.

Currently, the following command:
{{ codegen.generate_source('tpch_sf001', database_name = 'raw', include_descriptions = True) }}
generates YAML with no descriptions at all:

version: 2

sources:
  - name: tpch_sf001
    database: raw
    tables:
      - name: customer
      - name: lineitem
      - name: nation
      - name: orders
      - name: part
      - name: partsupp
      - name: region
      - name: supplier

As a user, I would expect this to still generate descriptions at the source and table levels.

Are you interested in contributing this feature?

Yes, I would love to contribute to this feature! I have some working code locally, but would appreciate a hand getting this into the dbt-codegen repo the right way!

kbrock91 added the enhancement label Jun 7, 2022

kbrock91 commented Jun 7, 2022

@dbeatty10 wanted to get your eyes on this!

kbrock91 linked a pull request Jun 10, 2022 that will close this issue