
Use dbt postgres adapter wrapper for integration tests #614

Merged
merged 4 commits into main on Jun 23, 2023

Conversation

tlento
Contributor

@tlento tlento commented Jun 22, 2023

As part of the effort to provide basic metric querying capabilities
for local installations of dbt core, MetricFlow is removing all
engine-specific dependencies and instead delegating warehouse
connection and query management to dbt adapters for integration
tests and command line operations.

This commit is the first step towards moving onto dbt adapters. It
provides a new wrapper class for dbt adapters, conforming to
MetricFlow's SqlClient protocol, and delegating all operations
down to the underlying adapter. It uses this adapter in test cases
only, and sources it from a dummy project in the test fixtures
directory. The dummy project is configured to allow for different
adapters (configured through test runner environment variables), and
in the future it will likely be used as the basis for MetricFlow's
core integration test table management.

Note that the implementation of the SqlClient wrapper class has some
highly temporary workarounds for rough edges that exist between
dbt's adapter interface and the current SqlClient requirements.
Most of these arise from table and schema management calls, which are
only used in the MetricFlow CLI tutorial and integration test suite.
As such, we expect those methods to be removed from the SqlClient
interface as we move management of these datasets to dbt.

Further, the dbt adapter fetching is a bit suspect and not to be
copied: we are relying on the dbtRunner()'s state loading to remain
valid as we access certain internal APIs from dbt core to get the
profile we need. All of this works, and we expect it to continue
working, but we will need to maintain this connection and keep it
up to date as the dbt core project updates its external accessors for
the dbt profile and dbt project which ultimately initialize and house
the adapter instance.
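The delegation pattern described above can be sketched as follows. This is a minimal illustration, not the actual MetricFlow implementation: the method names and the `Adapter` protocol are assumptions standing in for the real dbt adapter interface and SqlClient protocol.

```python
from typing import Any, List, Protocol, Tuple


class Adapter(Protocol):
    """Stand-in for a dbt adapter's execution interface (hypothetical shape)."""

    def execute(self, sql: str, fetch: bool = False) -> Any:
        ...


class AdapterBackedSqlClient:
    """Sketch of a SqlClient-style wrapper that delegates to a dbt adapter.

    The real class conforms to MetricFlow's SqlClient protocol; the two
    methods below are illustrative, not the actual interface.
    """

    def __init__(self, adapter: Adapter) -> None:
        self._adapter = adapter

    def query(self, stmt: str) -> List[Tuple[Any, ...]]:
        # Delegate query execution to the underlying adapter and return rows.
        return self._adapter.execute(stmt, fetch=True)

    def execute(self, stmt: str) -> None:
        # Statements with no result set (DDL, inserts) also pass straight through.
        self._adapter.execute(stmt, fetch=False)
```

The wrapper holds no engine-specific logic of its own, which is the point: warehouse connection and query management live entirely in the adapter.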

Contributor Author

tlento commented Jun 22, 2023

Current dependencies on/for this PR:

This comment was auto-generated by Graphite.

@tlento
Contributor Author

tlento commented Jun 22, 2023

Snyk failures are expected due to the pyproject.toml changes.

@tlento tlento had a problem deploying to DW_INTEGRATION_TESTS June 22, 2023 01:36 — with GitHub Actions Failure
@tlento tlento temporarily deployed to DW_INTEGRATION_TESTS June 22, 2023 01:43 — with GitHub Actions Inactive
Contributor

@plypaul plypaul left a comment


Sorry, clicked approve a little too soon. Taking a look still.

if pd.isnull(cell):
# use null keyword instead of isNA/None/etc.
cells.append("null")
elif type(cell) in [str, pd.Timestamp]:
Contributor

Generally works, but I vaguely recall running into type errors with one SQL engine where specifying a string when a time is expected threw an error. Cross that bridge if / when we get there.

Contributor Author

BigQuery almost certainly has this issue.
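The null-handling approach in the snippet above can be sketched as a standalone cell-rendering helper. This is an illustrative sketch only: the function name is hypothetical, `pd.isnull` is approximated with a plain NaN check to stay dependency-free, and the engine-specific quirks the reviewers mention (e.g. BigQuery rejecting string literals where a typed time value is expected) are deliberately ignored.

```python
import math
from datetime import datetime
from typing import Any


def cell_to_sql_literal(cell: Any) -> str:
    """Render one dataframe cell as a SQL literal (hypothetical helper).

    Mirrors the snippet under review: NULL-ish values become the bare
    `null` keyword, string-like values are quoted, everything else is
    stringified as-is.
    """
    if cell is None or (isinstance(cell, float) and math.isnan(cell)):
        # Use the `null` keyword instead of NaN/None/etc.
        return "null"
    if isinstance(cell, (str, datetime)):
        # Quote string-like values; naive '' escaping only, for the sketch.
        return "'" + str(cell).replace("'", "''") + "'"
    return str(cell)
```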

logger.info(f"Finished running the dry_run in {stop - start:.2f}s")
return

def create_table_from_dataframe(
Contributor

This is straightforward, and yet I suspect we might run into edge cases. Just a comment, no action needed.

Contributor Author

Yeah, this is temporary. It'll be a nightmare to maintain as we extend across dialects, which is why I'm hoping to move to dbt managing dataset creation ASAP.
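To make the dialect-maintenance concern concrete, here is a rough sketch of the kind of statement generation a hand-rolled `create_table_from_dataframe` has to do. The function name and signature are hypothetical; type names are supplied directly rather than inferred from a dataframe, and dialect differences in type names, quoting, and identifier rules are exactly the part this ignores.

```python
from typing import Any, Dict, List, Sequence


def build_create_and_insert(
    table_name: str,
    column_types: Dict[str, str],
    rows: Sequence[Sequence[Any]],
) -> List[str]:
    """Generate a CREATE TABLE statement plus one INSERT per row.

    Hypothetical sketch; real dialects differ on type names, quoting,
    and bulk-insert syntax, which is why delegating table management
    to dbt is attractive.
    """
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in column_types.items())
    statements = [f"CREATE TABLE {table_name} ({cols})"]
    for row in rows:
        values = ", ".join(
            "null" if v is None else f"'{v}'" if isinstance(v, str) else str(v)
            for v in row
        )
        statements.append(f"INSERT INTO {table_name} VALUES ({values})")
    return statements
```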

@@ -40,6 +41,8 @@ def test_query(sql_client: SqlClient) -> None: # noqa: D

def test_query_with_execution_params(sql_client: SqlClient) -> None:
"""Test querying with execution parameters of all supported datatypes."""
if isinstance(sql_client, AdapterBackedSqlClient):
Contributor

Putting this check / skip logic into a function might be nice, so that we can use IDE tooling to update it later when support is added. It also reduces isinstance calls.

Contributor Author

Once all of our production engines are cut over to adapters I'll either remove these tests or keep them solely for a DuckDB "example" implementation of a SQLClient. I don't think the adapter-backed client will support bind parameters any time soon.

If we don't have one already we should add a test to make sure bind parameters are preserved and propagated through the rendering layers, though. I know they are, generally, but I don't remember what kind of testing we have around that behavior.
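A minimal sketch of the helper-function suggestion from this thread, with a stand-in `AdapterBackedSqlClient` class so the example is self-contained (the helper name is hypothetical):

```python
class AdapterBackedSqlClient:
    """Stand-in for the real adapter-backed client class."""


def lacks_bind_parameter_support(sql_client: object) -> bool:
    """Centralized check for clients that cannot run bind-parameter tests.

    Wrapping the isinstance test in one function, as the review suggests,
    gives a single find-references target for IDE tooling when adapter
    support for bind parameters eventually lands.
    """
    return isinstance(sql_client, AdapterBackedSqlClient)
```

A test would then call something like `pytest.skip(...)` when `lacks_bind_parameter_support(sql_client)` is true, instead of repeating the isinstance check at every skip site.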

[tool.hatch.envs.postgres-env]
description = "Dev environment for working with Postgres adapter"
pre-install-commands = [
"pip install dbt-postgres",
Contributor

What's the reason for installing via pip instead of specifying it as a dependency?

Contributor Author

I could've sworn I had a comment about this, but clearly there isn't one. I'll add one before merging.

We don't depend on dbt-postgres, or any specific adapter, anywhere in the codebase, so we don't want to force an install in the normal dependency list. We could make it an env-specific dependency but then we're bound to a version range, which means keeping the version ranges in sync between here and where we specify dbt-core, and dealing with things like dbt-semantic-interfaces version requirements.

This way we get the relevant adapter installed in the environment, the way an end user would, and pip takes care of the rest as best it's able.

Once we have all of the packaging laid out with dbt-metricflow we'll likely update this to use an editable dependency on the local package. The pre-install is apparently the only way hatch supports editable dependency installations, so that'll have to be here anyway, but that should allow us to work around dbt-semantic-interfaces version update conflicts in most circumstances.

Base automatically changed from move-snapshots-to-engine-type-paths to main June 22, 2023 21:54
This helper function was tagged as only being used in tests, but
it was sitting in the main package. This moves it to the test package
to avoid confusion as we restructure around dbt adapter integrations.
The switch to dbt adapters breaks the Postgres test run for
generating snapshots. Since Postgres runs so fast we simply
run all tests via the relevant hatch env command.
The dbt adapter-backed SqlClient needs to skip some tests and
depends on some custom package installations. This commit updates
the logic for the former to depend only on DuckDB, which is not
currently planned for migration to adapters. We also add a comment
explaining the installation configuration in the pyproject.toml.
@tlento tlento merged commit 09281fb into main Jun 23, 2023
@tlento tlento deleted the add-dbt-adapter-shim branch June 23, 2023 19:30