Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate_model_yaml doesn't recognize nested BigQuery columns #27

Closed
1 task done
RBozydar opened this issue Jul 21, 2020 · 9 comments · Fixed by #54
Closed
1 task done

Generate_model_yaml doesn't recognize nested BigQuery columns #27

RBozydar opened this issue Jul 21, 2020 · 9 comments · Fixed by #54
Labels
bug Something isn't working

Comments

@RBozydar
Copy link

Describe the bug

generate_model_yaml doesn't recognize nested bigquery columns

Steps to reproduce

Create a model with nested columns, either through STRUCT() or ARRAY(), for example:

STRUCT(
   source,
   medium,
   source_medium
) analytics

Expected results

The macro should print out the nested columns as well, now it prints out only the "main" column/field.
With the example above we should get
source, medium, source_medium and analytics as columns in the yaml output

Actual results

The macro prints out only the "main" column/field which is analytics

System information

The contents of your packages.yml file:

packages:
  - package: fishtown-analytics/dbt_utils
    version: 0.5.0
  - package: fishtown-analytics/codegen
    version: 0.1.0

Which database are you using dbt with?

  • bigquery

The output of dbt --version:

installed version: 0.17.1
   latest version: 0.17.1

The operating system you're using:
Ubuntu 18.04
The output of python --version:
Python 3.8.1

Are you interested in contributing the fix?

I would love to, unfortunately my knowledge of Jinja isn't sufficient so I'd appreciate some pointers

@RBozydar RBozydar added the bug Something isn't working label Jul 21, 2020
@clrcrl
Copy link
Contributor

clrcrl commented Jul 21, 2020

@jtcohen6 — subbing you in as my favorite BigQuery expert!

Do you agree that the expected behavior should be to create one column per key? Or do you think this would be nice as an optional flag?

@jtcohen6
Copy link
Contributor

jtcohen6 commented Jul 21, 2020

The current behavior of the dbt Docs is to include both the top-level field and each of the nested subfields:

Screen Shot 2020-07-21 at 10 01 11 AM

I agree with @RBozydar that I'd expect codegen to create one YML column entry for each of analytics, analytics.source, analytics.medium, and analytics.source_medium.

FWIW I think this is an enhancement, not a bug :) cf dbt-labs/dbt-core#2549

@bodschut
Copy link
Contributor

I experienced this issue myself today, wanting to generate the model yaml for a model with nested bigquery columns. @jtcohen6, since the macro here simply calls the get_columns_in_relation adapter method, we probably need to fix it there?

@jtcohen6
Copy link
Contributor

@bodschut Fair point! Care to open a new issue in the dbt-bigquery repo?

@bodschut
Copy link
Contributor

Actually @jtcohen6 I took a deeper look and you can access nested fields in the result of get_columns_in_relation. I'll try to work on a fix for the dbt-codegen macro to add those to the output.

@bodschut
Copy link
Contributor

Hi All

I have a fix for this issue, basically the generate_model_yaml.sql macro file should look like this:

{% macro generate_column_yaml(column, model_yaml, parent_column_name="") %}
    {% if parent_column_name %}
        {% set column_name = parent_column_name ~ "." ~ column.name %}
    {% else %}
        {% set column_name = column.name %}
    {% endif %}

    {% do model_yaml.append('      - name: ' ~ column_name | lower ) %}
    {% do model_yaml.append('        description: ""') %}
    {% do model_yaml.append('') %}
    {% if column.fields|length > 0 %}
        {% for child_column in column.fields %}
            {% set model_yaml = generate_column_yaml(child_column, model_yaml, parent_column_name=column_name) %}
        {% endfor %}
    {% endif %}
    {% do return(model_yaml) %}
{% endmacro %}

{% macro generate_model_yaml(model_name) %}

{% set model_yaml=[] %}

{% do model_yaml.append('version: 2') %}
{% do model_yaml.append('') %}
{% do model_yaml.append('models:') %}
{% do model_yaml.append('  - name: ' ~ model_name | lower) %}
{% do model_yaml.append('    description: ""') %}
{% do model_yaml.append('    columns:') %}

{% set relation=ref(model_name) %}
{%- set columns = adapter.get_columns_in_relation(relation) -%}

{% for column in columns %}
    {% set model_yaml = generate_column_yaml(column, model_yaml) %}
{% endfor %}

{% if execute %}

    {% set joined = model_yaml | join ('\n') %}
    {{ log(joined, info=True) }}
    {% do return(joined) %}

{% endif %}

{% endmacro %}

Essentially, I've split out the logic to generate the yaml for a column and used it recursively. I've tested it on a model with multiple nesting levels and it works. I used the output to add descriptions on nested columns and they are applied perfectly in bigquery.

@bodschut
Copy link
Contributor

bodschut commented Feb 2, 2022

@jtcohen6 any chance we can incorporate this? :-)

@jtcohen6
Copy link
Contributor

jtcohen6 commented Feb 2, 2022

@bodschut Nice work! It sounds like we can accomplish the fix with a change to code in this repo only. Would you be up to open a PR? :)

@bodschut
Copy link
Contributor

bodschut commented Feb 2, 2022

@jtcohen6 pull request is created. Can you help me to see if there are extra integration tests needed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants