Use dbt postgres adapter wrapper for integration tests #614
Conversation
snyk failures are expected due to pyproject.toml changes
Sorry, clicked approve a little too soon. Taking a look still.
```python
if pd.isnull(cell):
    # use null keyword instead of isNA/None/etc.
    cells.append("null")
elif type(cell) in [str, pd.Timestamp]:
```
Generally works, but I vaguely recall running into type errors with one SQL engine where specifying a string when a time is expected threw an error. Cross that bridge if / when we get there.
BigQuery almost certainly has this issue.
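To make the concern concrete, here is a hedged sketch of how the cell-to-literal conversion might sidestep the string-vs-timestamp issue on engines like BigQuery. The helper name `cell_to_sql_literal` and the explicit `CAST` are assumptions for illustration, not the PR's actual implementation:

```python
import pandas as pd


def cell_to_sql_literal(cell: object) -> str:
    """Render a dataframe cell as a SQL literal string (illustrative sketch)."""
    if pd.isnull(cell):
        # Use the SQL null keyword instead of isNA/None/etc.
        return "null"
    if isinstance(cell, pd.Timestamp):
        # Some engines (e.g., BigQuery) may reject a bare string where a
        # timestamp is expected, so cast explicitly (assumption).
        return f"CAST('{cell.isoformat()}' AS TIMESTAMP)"
    if isinstance(cell, str):
        return f"'{cell}'"
    return str(cell)
```

Checking `pd.Timestamp` before `str` matters here, since both branches would otherwise render timestamps as plain quoted strings.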
```python
logger.info(f"Finished running the dry_run in {stop - start:.2f}s")
return


def create_table_from_dataframe(
```
This is straightforward, yet I have a suspicion that we might run into edge cases. Just a comment, no action needed.
Yeah, this is temporary. It'll be a nightmare to maintain as we extend across dialects, which is why I'm hoping to move to dbt managing dataset creation ASAP.
```diff
@@ -40,6 +41,8 @@ def test_query(sql_client: SqlClient) -> None:  # noqa: D


def test_query_with_execution_params(sql_client: SqlClient) -> None:
    """Test querying with execution parameters of all supported datatypes."""
    if isinstance(sql_client, AdapterBackedSqlClient):
```
Putting this check / skip logic into a function might be nice so that we can use the IDE tooling to update this later when support is added. It also reduces the number of isinstance calls.
Once all of our production engines are cut over to adapters I'll either remove these tests or keep them solely for a DuckDB "example" implementation of a SQLClient. I don't think the adapter-backed client will support bind parameters any time soon.
If we don't have one already we should add a test to make sure bind parameters are preserved and propagated through the rendering layers, though. I know they are, generally, but I don't remember what kind of testing we have around that behavior.
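The reviewer's suggestion above could look something like the following sketch. Both client classes are stand-ins, not the real MetricFlow types; the helper name `supports_bind_parameters` is an assumption:

```python
class AdapterBackedSqlClient:
    """Placeholder for the real dbt adapter-backed client."""


class DuckDbSqlClient:
    """Placeholder for the DuckDB 'example' SqlClient implementation."""


def supports_bind_parameters(sql_client: object) -> bool:
    """Single place to update once adapter-backed clients gain bind-parameter support.

    Centralizing the isinstance check means IDE tooling can find every
    skip site later, and tests call one helper instead of repeating the check.
    """
    return not isinstance(sql_client, AdapterBackedSqlClient)
```

A test would then read `if not supports_bind_parameters(sql_client): pytest.skip(...)` rather than scattering isinstance checks.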
```toml
[tool.hatch.envs.postgres-env]
description = "Dev environment for working with Postgres adapter"
pre-install-commands = [
    "pip install dbt-postgres",
]
```
What's the reason for installing via pip instead of specifying it as a dependency?
I could've sworn I had a comment about this, but clearly there isn't one. I'll add one before merging.
We don't depend on dbt-postgres, or any specific adapter, anywhere in the codebase, so we don't want to force an install in the normal dependency list. We could make it an env-specific dependency, but then we're bound to a version range, which means keeping the version ranges in sync between here and where we specify dbt-core, and dealing with things like dbt-semantic-interfaces version requirements.
This way we get the relevant adapter installed in the environment, the way an end user would, and pip takes care of the rest as best it's able.
Once we have all of the packaging laid out with dbt-metricflow we'll likely update this to use an editable dependency on the local package. The pre-install is apparently the only way hatch supports editable dependency installations, so that'll have to be here anyway, but that should allow us to work around dbt-semantic-interfaces version update conflicts in most circumstances.
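The promised explanatory comment might read roughly like this; the comment wording is an assumption based on the discussion above, not the text that was actually merged:

```toml
[tool.hatch.envs.postgres-env]
description = "Dev environment for working with Postgres adapter"
# We install dbt-postgres via pip rather than listing it as a dependency
# so we aren't pinned to a version range that must be kept in sync with
# dbt-core and dbt-semantic-interfaces. Pre-install commands are also the
# only mechanism hatch provides for editable installs, which we expect to
# need once dbt-metricflow packaging lands.
pre-install-commands = [
    "pip install dbt-postgres",
]
```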
This helper function was tagged as only being used in tests, but it was sitting in the main package. This moves it to the test package to avoid confusion as we restructure around dbt adapter integrations.
As part of the effort to provide basic metric querying capabilities for local installations of dbt core, MetricFlow is removing all engine-specific dependencies and instead delegating warehouse connection and query management to dbt adapters for integration tests and command line operations.

This commit is the first step towards moving onto dbt adapters. It provides a new wrapper class for dbt adapters, conforming to MetricFlow's SqlClient protocol, and delegating all operations down to the underlying adapter. It uses this adapter in test cases only, and sources it from a dummy project in the test fixtures directory. The dummy project is configured to allow for different adapters (configured through test runner environment variables), and in future it will likely be used as the basis for MetricFlow's core integration test table management.

Note the implementation of the SqlClient wrapper class has some highly temporary work-arounds to certain rough edges that exist between dbt's adapter interface and the current SqlClient requirements. Most of these arise from table and schema management calls, which are only used in the MetricFlow CLI tutorial and integration test suite. As such, we expect those methods to be removed from the SqlClient interface as we move management of these datasets to dbt.

Further, the dbt adapter fetching is a bit suspect, and not to be copied - we are relying on the dbtRunner()'s state loading to remain valid as we access certain internal APIs from dbt core to get the profile we need. All of this works, and we expect it to continue working, but we will need to maintain this connection and keep it up to date as the dbt core project updates its external accessors for the dbt profile and dbt project which ultimately initialize and house the adapter instance.
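The delegation pattern described above can be sketched as follows. This is a minimal illustration, not MetricFlow's actual wrapper: the class name matches the PR, but the method signatures and the fake adapter are assumptions:

```python
from typing import Protocol, Sequence, Tuple


class DbtAdapterLike(Protocol):
    """Minimal stand-in for a dbt adapter's execute interface (assumption)."""

    def execute(self, sql: str, fetch: bool) -> Tuple[str, Sequence[Sequence[object]]]:
        ...


class AdapterBackedSqlClient:
    """Conforms to a SqlClient-style interface by delegating to a dbt adapter."""

    def __init__(self, adapter: DbtAdapterLike) -> None:
        self._adapter = adapter

    def query(self, stmt: str) -> Sequence[Sequence[object]]:
        # Fetch result rows through the underlying adapter.
        _, rows = self._adapter.execute(stmt, fetch=True)
        return rows

    def execute(self, stmt: str) -> None:
        # Fire-and-forget statements (DDL, inserts) skip the fetch.
        self._adapter.execute(stmt, fetch=False)


class FakeAdapter:
    """In-memory fake adapter used only to demonstrate the delegation."""

    def __init__(self) -> None:
        self.statements: list = []

    def execute(self, sql: str, fetch: bool) -> Tuple[str, Sequence[Sequence[object]]]:
        self.statements.append(sql)
        return ("OK", [(1,)] if fetch else [])
```

The point of the shape is that the wrapper owns no connection logic of its own; everything, including the temporary table and schema management workarounds, bottoms out in the adapter.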
The switch to dbt adapters breaks the Postgres test run for generating snapshots. Since Postgres runs so fast, we simply run all tests via the relevant hatch env command.
The dbt adapter-backed SqlClient needs to skip some tests and depends on some custom package installations. This commit updates the logic for the former to depend only on DuckDB, which is not currently planned for migration to adapters. We also add a comment explaining the installation configuration in the pyproject.toml.