Add validate_sql method to base adapter with implementation for SQLAdapters (#8001)

* Add dry_run method to base adapter with implementation for SQLAdapters

resolves #7839

In the CLI integration, MetricFlow will issue dry run queries as
part of its warehouse-level validation of the semantic manifest,
including all semantic model and metric definitions.

In most cases, issuing an `explain` query is adequate. However,
BigQuery does not support the `explain` keyword, so we cannot
simply prepend `explain` to our input queries and expect the
correct behavior across all contexts.

This commit adds a dry_run() method to the BaseAdapter which mirrors
the execute() method in that it simply delegates to the ConnectionManager.
It also adds a working implementation to the SQLConnectionManager and
includes a few test cases for adapter maintainers to try out on their own.

The current implementation should work out of the box with most
of our adapters. BigQuery will require us to implement the dry_run
method on the BigQueryConnectionManager, and community-maintained
adapters can opt in by enabling the test and ensuring their own
implementations work as expected.

Note - we decided to make these concrete methods that throw runtime
exceptions for direct descendants of BaseAdapter in order to avoid
forcing community adapter maintainers to implement a method that does
not currently have any use cases in dbt proper.

* Switch dry_run implementation to be macro-based

The common pattern for engine-specific SQL statement construction
in dbt is to provide a default macro which can then be overridden
on a per-adapter basis by either adapter maintainers or end users.
The advantage is that users can adopt alternative
SQL syntax for performance or other reasons, or even enable
local usage if an engine relies on a non-standard expression and
the adapter maintainer has not updated the package.

Although there are some risks here, they are minimal, and the benefit
of added expressiveness and consistency with other similar constructs
is clear, so we adopt this approach here.

* Improve error message for InvalidConnectionError in test_invalid_dry_run.

* Rename dry_run to validate_sql

The validate_sql name has less chance of colliding with dbt's
command nomenclature, both now and in some future where we have
dry-run operations.

* Rename macro and test files to validate_sql

* Fix changelog entry
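
The choice described in the commit message, a concrete base method that raises rather than an abstract one, can be illustrated with a minimal sketch. The class names (`ToyBaseAdapter`, `ToySQLAdapter`, `ToyCustomAdapter`) and the `try_validate` helper are hypothetical stand-ins, not dbt classes, and the returned string only stands in for an `AdapterResponse`.

```python
class ToyBaseAdapter:
    """Hypothetical stand-in for a base adapter (not dbt's BaseAdapter)."""

    def validate_sql(self, sql: str) -> str:
        # Concrete rather than abstract: subclasses that never override this
        # method can still be instantiated. Calling it fails loudly instead.
        raise NotImplementedError("`validate_sql` is not implemented for this adapter!")


class ToySQLAdapter(ToyBaseAdapter):
    """Stand-in for an adapter that ships a working implementation."""

    def validate_sql(self, sql: str) -> str:
        # Assumption for illustration: validation is modeled as building
        # the explain statement rather than sending it to an engine.
        return f"explain {sql}"


class ToyCustomAdapter(ToyBaseAdapter):
    """Direct descendant of the base class that has not opted in."""


def try_validate(adapter: ToyBaseAdapter, sql: str) -> str:
    """Call validate_sql and report adapters that have no implementation."""
    try:
        return adapter.validate_sql(sql)
    except NotImplementedError as exc:
        return f"unsupported: {exc}"


supported = try_validate(ToySQLAdapter(), "select 1")
unsupported = try_validate(ToyCustomAdapter(), "select 1")
```

Under this sketch, an adapter that ignores the method keeps working until something actually calls it, which matches the stated goal of not forcing maintainers to implement a method they do not yet need.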
tlento authored Jul 11, 2023
1 parent 07c3dcd commit 4ffd633
Showing 6 changed files with 132 additions and 5 deletions.
6 changes: 6 additions & 0 deletions .changes/unreleased/Features-20230629-175712.yaml
@@ -0,0 +1,6 @@
kind: Features
body: Add validate_sql method to BaseAdapter with implementation for SQLAdapter
time: 2023-06-29T17:57:12.599313-07:00
custom:
Author: tlento
Issue: "7839"
14 changes: 11 additions & 3 deletions core/dbt/adapters/base/impl.py
@@ -289,6 +289,17 @@ def execute(
"""
return self.connections.execute(sql=sql, auto_begin=auto_begin, fetch=fetch, limit=limit)

def validate_sql(self, sql: str) -> AdapterResponse:
"""Submit the given SQL to the engine for validation, but not execution.
This should throw an appropriate exception if the input SQL is invalid, although
in practice that will generally be handled by delegating to an existing method
for execution and allowing the error handler to take care of the rest.
:param str sql: The sql to validate
"""
raise NotImplementedError("`validate_sql` is not implemented for this adapter!")

@available.parse(lambda *a, **k: [])
def get_column_schema_from_query(self, sql: str) -> List[BaseColumn]:
"""Get a list of the Columns with names and data types from the given sql."""
@@ -785,7 +796,6 @@ def _make_match(
schema: str,
identifier: str,
) -> List[BaseRelation]:

matches = []

search = self._make_match_kwargs(database, schema, identifier)
@@ -1063,7 +1073,6 @@ def _get_one_catalog(
schemas: Set[str],
manifest: Manifest,
) -> agate.Table:

kwargs = {"information_schema": information_schema, "schemas": schemas}
table = self.execute_macro(
GET_CATALOG_MACRO_NAME,
@@ -1453,7 +1462,6 @@ def render_model_constraint(cls, constraint: ModelLevelConstraint) -> Optional[s
def catch_as_completed(
futures, # typing: List[Future[agate.Table]]
) -> Tuple[agate.Table, List[Exception]]:

# catalogs: agate.Table = agate.Table(rows=[])
tables: List[agate.Table] = []
exceptions: List[Exception] = []
1 change: 0 additions & 1 deletion core/dbt/adapters/sql/connections.py
@@ -52,7 +52,6 @@ def add_query(
bindings: Optional[Any] = None,
abridge_sql_log: bool = False,
) -> Tuple[Connection, Any]:

connection = self.get_thread_connection()
if auto_begin and connection.transaction_open is False:
self.begin()
31 changes: 30 additions & 1 deletion core/dbt/adapters/sql/impl.py
@@ -1,7 +1,7 @@
import agate
from typing import Any, Optional, Tuple, Type, List

-from dbt.contracts.connection import Connection
+from dbt.contracts.connection import Connection, AdapterResponse
from dbt.exceptions import RelationTypeNullError
from dbt.adapters.base import BaseAdapter, available
from dbt.adapters.cache import _make_ref_key_dict
@@ -22,6 +22,7 @@
TRUNCATE_RELATION_MACRO_NAME = "truncate_relation"
DROP_RELATION_MACRO_NAME = "drop_relation"
ALTER_COLUMN_TYPE_MACRO_NAME = "alter_column_type"
VALIDATE_SQL_MACRO_NAME = "validate_sql"


class SQLAdapter(BaseAdapter):
@@ -218,6 +219,34 @@ def check_schema_exists(self, database: str, schema: str) -> bool:
results = self.execute_macro(CHECK_SCHEMA_EXISTS_MACRO_NAME, kwargs=kwargs)
return results[0][0] > 0

def validate_sql(self, sql: str) -> AdapterResponse:
"""Submit the given SQL to the engine for validation, but not execution.
By default we simply prefix the query with the explain keyword and allow the
exceptions thrown by the underlying engine on invalid SQL inputs to bubble up
to the exception handler. For adjustments to the explain statement - such as
for adapters that have different mechanisms for hinting at query validation
or dry-run - callers may be able to override the validate_sql macro with
the addition of an <adapter>__validate_sql implementation.
:param str sql: The sql to validate
"""
kwargs = {
"sql": sql,
}
result = self.execute_macro(VALIDATE_SQL_MACRO_NAME, kwargs=kwargs)
# The statement macro always returns an AdapterResponse in the output AttrDict's
# `response` property, and we preserve the full payload in case we want to
# return fetched output for engines where explain plans are emitted as columnar
# results. Any macro override that deviates from this behavior may encounter an
# assertion error in the runtime.
adapter_response = result.response # type: ignore[attr-defined]
assert isinstance(adapter_response, AdapterResponse), (
f"Expected AdapterResponse from validate_sql macro execution, "
f"got {type(adapter_response)}."
)
return adapter_response

# This is for use in the test suite
def run_sql_for_tests(self, sql, fetch, conn):
cursor = conn.handle.cursor()
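
The explain-prefix approach in the `validate_sql` docstring above can be tried outside dbt. The sketch below is a rough illustration against SQLite, chosen only because it ships with Python; it is not a dbt adapter. The `validate_sql` and `table_exists` helpers are hypothetical names written for this example.

```python
import sqlite3


def validate_sql(conn: sqlite3.Connection, sql: str) -> list:
    """Ask the engine to plan `sql` without running it; raises on invalid SQL."""
    # Prefixing with `explain` mirrors the default behavior described above.
    return conn.execute(f"explain {sql}").fetchall()


def table_exists(conn: sqlite3.Connection, name: str) -> bool:
    """Check the SQLite catalog for a table with the given name."""
    row = conn.execute(
        "select count(*) from sqlite_master where type = 'table' and name = ?",
        (name,),
    ).fetchone()
    return row[0] > 0


conn = sqlite3.connect(":memory:")

# Valid SQL: a plan comes back, but the statement itself is not executed.
plan = validate_sql(conn, "create table demo (id integer)")
created = table_exists(conn, "demo")

# Invalid SQL: the engine's own error bubbles up to the caller.
try:
    validate_sql(conn, "Let's run some invalid SQL and see if we get an error!")
    invalid_rejected = False
except sqlite3.Error:
    invalid_rejected = True

conn.close()
```

The `create table` check is the kind of non-execution assertion an engine-specific test could make. Which exception type surfaces depends on the engine and driver; here it is a subclass of `sqlite3.Error`.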
10 changes: 10 additions & 0 deletions core/dbt/include/global_project/macros/adapters/validate_sql.sql
@@ -0,0 +1,10 @@
{% macro validate_sql(sql) -%}
{{ return(adapter.dispatch('validate_sql', 'dbt')(sql)) }}
{% endmacro %}

{% macro default__validate_sql(sql) -%}
{% call statement('validate_sql') -%}
explain {{ sql }}
{% endcall %}
{{ return(load_result('validate_sql')) }}
{% endmacro %}
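
The lookup that lets an `<adapter>__validate_sql` macro take precedence over `default__validate_sql` can be modeled in a few lines of Python. This is a simplified sketch, not dbt's dispatch implementation, whose actual resolution is richer. The `MACROS` registry, the `dispatch` helper, the `toydb` adapter name, and its explain syntax are all assumptions made for illustration.

```python
from typing import Callable, Dict

# Hypothetical macro registry: macro name -> function that renders SQL.
MACROS: Dict[str, Callable[[str], str]] = {
    "default__validate_sql": lambda sql: f"explain {sql}",
    # Assumed engine-specific override; the syntax is illustrative only.
    "toydb__validate_sql": lambda sql: f"explain (format json) {sql}",
}


def dispatch(macro_name: str, adapter_type: str) -> Callable[[str], str]:
    """Return `<adapter>__<macro>` if registered, else `default__<macro>`."""
    for prefix in (adapter_type, "default"):
        candidate = MACROS.get(f"{prefix}__{macro_name}")
        if candidate is not None:
            return candidate
    raise KeyError(f"No implementation of {macro_name!r} for {adapter_type!r}")


# An adapter with its own macro gets the override.
overridden = dispatch("validate_sql", "toydb")("select 1")
# Any other adapter falls back to the default explain prefix.
fallback = dispatch("validate_sql", "otherdb")("select 1")
```

In this model an engine without `explain` support only needs to register its own entry; callers of `validate_sql` stay unchanged.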
75 changes: 75 additions & 0 deletions tests/adapter/dbt/tests/adapter/utils/test_validate_sql.py
@@ -0,0 +1,75 @@
from typing import Type

import pytest

from dbt.adapters.base.impl import BaseAdapter
from dbt.exceptions import DbtRuntimeError, InvalidConnectionError


class BaseDryRunMethod:
"""Tests the behavior of the dry run method for the relevant adapters.
The valid and invalid SQL should work with most engines by default, but
both inputs can be overridden as needed for a given engine to get the correct
behavior.
The base method is meant to throw the appropriate custom exception when
validate_sql fails.
"""

@pytest.fixture(scope="class")
def valid_sql(self) -> str:
"""Returns a valid statement for issuing as a dry run query.
Ideally this would be checkable for non-execution. For example, we could use a
CREATE TABLE statement with an assertion that no table was created. However,
for most adapter types this is unnecessary - the EXPLAIN keyword has exactly the
behavior we want, and here we are essentially testing to make sure it is
supported. As such, we return a simple SELECT query, and leave it to
engine-specific test overrides to specify more detailed behavior as appropriate.
"""

return "select 1"

@pytest.fixture(scope="class")
def invalid_sql(self) -> str:
"""Returns an invalid statement for issuing a bad dry run query."""

return "Let's run some invalid SQL and see if we get an error!"

@pytest.fixture(scope="class")
def expected_exception(self) -> Type[Exception]:
"""Returns the Exception type thrown by a failed query.
Defaults to dbt.exceptions.DbtRuntimeError because that is the most common
base exception for adapters to throw."""
return DbtRuntimeError

def test_valid_dry_run(self, adapter: BaseAdapter, valid_sql: str) -> None:
"""Executes a dry run query on valid SQL. No news is good news."""
with adapter.connection_named("test_valid_sql_validation"):
adapter.validate_sql(valid_sql)

def test_invalid_dry_run(
self,
adapter: BaseAdapter,
invalid_sql: str,
expected_exception: Type[Exception],
) -> None:
"""Executes a dry run query on invalid SQL, expecting the exception."""
with pytest.raises(expected_exception=expected_exception) as excinfo:
with adapter.connection_named("test_invalid_sql_validation"):
adapter.validate_sql(invalid_sql)

# InvalidConnectionError is a subclass of DbtRuntimeError, so we have to handle
# it separately.
if excinfo.type == InvalidConnectionError:
raise ValueError(
"Unexpected InvalidConnectionError. This typically indicates a problem "
"with the test setup, rather than the expected error for an invalid "
"validate_sql query."
) from excinfo.value


class TestDryRunMethod(BaseDryRunMethod):
pass
