Today, `dbt-bigquery` operates as a "base" (Python) adapter for most low-level operations. This means that standard operations like creating schemas, dropping schemas, and getting columns in relations are all actually API calls wired through the Python client.
As BigQuery adds more support for SQL, it's preferable to use it where possible, and where there are few trade-offs. Two reasons:

- It's clearer for end users to reason about, and to read in the logs.
- It moves `create_schema` + `drop_schema` into "user-space" code, allowing for custom reimplementations if desired. This is a capability users have on other databases. It's especially important for `create_schema` + `drop_schema`, since these don't pass through SQL/Jinja land at all when called as `adapter` methods from dbt's Python tasks, namely here.
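To make the "user-space" point concrete, here is a minimal sketch of what an overridable macro could look like, following dbt's adapter-dispatch naming convention (`bigquery__` prefix). The exact body is illustrative, not the shipped implementation:

```sql
-- Sketch: a user-space create_schema as a dispatched dbt macro.
-- Because it lives in SQL/Jinja land, users can shadow it in their
-- own project to customize behavior (e.g. add options or logging).
{% macro bigquery__create_schema(relation) %}
  {% call statement('create_schema') %}
    create schema if not exists {{ relation.without_identifier() }}
  {% endcall %}
{% endmacro %}
```

A project could then override this macro locally, which is impossible today while `create_schema` is a direct Python-client call inside the adapter.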
The trade-offs I can imagine:

- **Performance.** Generally, BigQuery's SQL "API" is slower than its Python client API for the equivalent operation (and occasionally more expensive).
- **Reliability.** `get_columns_in_relation` can return a contracted `SchemaField`, which we can turn into dbt `BigQueryColumn` objects much more easily than passing query results through `agate`.
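To illustrate the reliability point, here is a self-contained sketch using stand-in dataclasses (not the real `google.cloud.bigquery.SchemaField` or dbt's `BigQueryColumn`, just hypothetical shapes with similar fields): a typed API result maps losslessly to column objects, with no string re-parsing or type inference as in an `agate` round-trip.

```python
# Illustrative stand-ins only; field names mimic the real classes
# but this is not the actual google-cloud-bigquery or dbt API.
from dataclasses import dataclass

@dataclass(frozen=True)
class SchemaField:            # stand-in for google.cloud.bigquery.SchemaField
    name: str
    field_type: str           # e.g. "STRING", "INT64"
    mode: str = "NULLABLE"

@dataclass(frozen=True)
class BigQueryColumn:         # stand-in for dbt's column object
    name: str
    dtype: str
    mode: str

def columns_from_schema(fields):
    # Direct, lossless mapping: every attribute is already typed,
    # so no query output needs to be re-parsed through agate.
    return [BigQueryColumn(f.name, f.field_type, f.mode) for f in fields]

cols = columns_from_schema([
    SchemaField("id", "INT64", "REQUIRED"),
    SchemaField("email", "STRING"),
])
```

The SQL path would instead return rows of strings that have to be coerced back into typed column objects, which is where the reliability cost comes in.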
github-actions bot changed the title to "[CT-610] Prefer SQL for create_schema + drop_schema" on May 5, 2022
@boxysean @Victoriapm We didn't manage to sneak this in ahead of cutting v1.2.0-rc1, so this won't be in for v1.2. But it will be in for v1.3! We're planning to put out a first beta (v1.3.0-b1) around the same time as v1.2.0 final, at the end of the month.
reopening #30