Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support column_name and column_schema in BigQuery #2936

Closed
yu-iskw opened this issue Dec 4, 2020 · 2 comments
Closed

Support column_name and column_schema in BigQuery #2936

yu-iskw opened this issue Dec 4, 2020 · 2 comments
Labels
bigquery enhancement New feature or request

Comments

@yu-iskw
Copy link
Contributor

yu-iskw commented Dec 4, 2020

Describe the feature

According to the official documentation, the CREATE TABLE DDL in BigQuery allows us to define features of columns, such as NOT NULL. It would be great that dbt users can define schemas with {{ config() }}.

{{
  config(
    column_schemas={
      "required_value": "INT64 NOT NULL",
      "nullable_value": "STRING",
      "x": "STRUCT<x1 INT64 NOT NULL, x2 STRING> NOT NULL",
      "y": "ARRAY<INT64>"
    }
  )
}}

SELECT
  1 AS required_value,
  "a" AS nullable_value,
  STRUCT(
    1 AS x1,
    "a" AS x2
  ) AS x,
  [1, 2, 3] AS array_value

Describe alternatives you've considered

As far as I know, there is no alternative in dbt.

Additional context

The feature is specialized to BigQuery.

CREATE TABLE DDL

{CREATE TABLE | CREATE TABLE IF NOT EXISTS | CREATE OR REPLACE TABLE}
[[project_name.]dataset_name.]table_name
[(
  column_name column_schema[, ...]
)]
[PARTITION BY partition_expression]
[CLUSTER BY clustering_column_list]
[OPTIONS(table_option_list)]
[AS query_statement]

Who will this benefit?

All BigQuery users. Especially, we can reduce BigQuery slots to run not_null by defining columns with NOT NULL.

Are you interested in contributing this feature?

I want to contribute!

@jtcohen6
Copy link
Contributor

jtcohen6 commented Dec 4, 2020

@yu-iskw Thanks for the solid write-up! You're walking onto some well-trodden ground; I'd encourage you to check out some related discussion over in #1438 and #2191. That said, you deserve special commendation for submitting real code :)

Both Snowflake and BigQuery support the inclusion of column specifications (including constraints) within CTAs, so I'm totally in favor of adding this as an (optional) feature for both. (On Postgres, Redshift, and other databases, we would instead need to run create table followed by an insert statement—for more, see #1016.)

Rather than adding column_schemas as a new config, I think we'd want to leverage the resource YAML files where we already specify model properties. Over in #2191 (or even #1570), there are recommendations for adding a property named ddl_specifies_schema or strict—since the materialization will fail if the schema and query do not strictly agree. Following your example above, you would specify something like:

# schema.yml

models:
 - name: my_model
    ddl_specifies_schema: true   # also proposals to call this `strict`
    columns:
      - name: required_value
        data_type: int64
        constraint: not null   # or should this be inferred from a `not_null` test?
      - name: nullable_value
        data_type: string
      - name: x
        data_type: struct<x1 int64 not null, x2 string>
        constraint: not null
      - name: y
        constraint: array<int64>
        description: This is a very interesting column

If the ddl_specifies_schema property is true, then dbt will include the schema as part of its CTA:

create table my_model (
    required_value int64 not null,
    nullable_value string,
    x struct<x1 int64 not null, x2 string> not null,
    y array<int64>
) as (
    SELECT
      1 AS required_value,
      "a" AS nullable_value,
      STRUCT(
        1 AS x1,
        "a" AS x2
      ) AS x,
      [1, 2, 3] AS array_value
)

What do you think?

@jtcohen6
Copy link
Contributor

I'm going to close this, in favor of the more general proposals in the other issues linked above. I'd be happy to re-open and consider implementing within the (now moved) dbt-bigquery plugin.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bigquery enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants