Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Defer iff unselected reference does not exist in current env #2946

Merged
merged 2 commits into from
Dec 16, 2020

Conversation

jtcohen6
Copy link
Contributor

@jtcohen6 jtcohen6 commented Dec 12, 2020

resolves #2909

Description

Today, --defer is a blanket override: build the resources included by my selection criteria, and use the --state manifest's namespaces for any references to resources not included by my selection criteria.

Instead, this PR would make it a more flexible fallback: If there's a resource I want to build, and it references a resource I haven't selected, use the other manifest's namespace if and only if the upstream resource doesn't exist in my current environment namespace. In this way, --defer is more like a "fail-over" mechanism, using the deferred environment to fill in the gaps of whatever's missing in the current environment.

Pros:

  • Fixes current unavoidable behavior whereby dbt run --defer would always defer non-model resources (seeds and snapshots). If you've just seeded or snapshotted into your scratch/CI schema, you probably want to use those objects to build your models in the following step.
  • Fixes annoying behavior whereby subsequent run steps would defer resources run by previous run steps. This is a common use case for testing changed incremental models in CI by rerunning only them (dbt run -m state:modified ... followed by dbt run -m state:modified,config.materialized:incremental ...).
  • If we wanted to, we could reopen test deferral (Support deferred tests #2701). I cherry-picked 280803c (from Feature/defer tests #2706), and it just worked. This is after months after convincing myself (and other people, too) that test deferral was A Bad Thing, Actually. Really, I just didn't see a way to do it right. Of course, many of the caveats still apply, e.g. that relationships tests don't make a lot of sense in CI environments.

Cons:

  • If you have a lot of cruft hanging around in your scratch schema, and you want to use deferral as a "clean" override—use prod references for everything I'm not currently building—you'll need to drop the cruft first. You could see this as a minor regression from current behavior, though IMO it's very minor. The intended use case for --defer is dev, CI, and other scratch schemas that can be created, dropped, and recreated with aplomb.
  • If you instantiate your dev/CI/scratch schema by first zero-copy cloning on Snowflake, you wouldn't be able to defer any of your references. But you also wouldn't need to; cloning and deferral accomplish the same purpose.

If you buy the premise, the solution is very straightforward—a pleasant surprise. We made deferral a beta feature in v0.18 for exactly this reason: we knew there would be wrinkles to iron out. All the same, it's better to make this change sooner rather than later.

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot cla-bot bot added the cla:yes label Dec 12, 2020
@jtcohen6 jtcohen6 marked this pull request as ready for review December 14, 2020 19:37
Copy link
Contributor

@gshank gshank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty straightforward!

@jtcohen6 jtcohen6 mentioned this pull request Dec 15, 2020
4 tasks
Copy link
Contributor

@kwigley kwigley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I buy it, lgtm!

Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment about a potential KeyError, but beyond that, this LGTM!

I do wonder if we should rethink the flag/feature name --defer as we make this subtle but significant change in functionality. Happy to adjudicate that one outside the scope of this particular PR, but think that as we see this functionality rolling out of "beta" we should be really cognizant of the words we use to describe it! It's very cool and very powerful, and i feel like the biggest blocker to its utility could actually be people not understanding how to use it / what it is for :D

if (
node.resource_type in refables and
not node.is_ephemeral and
unique_id not in selected
unique_id not in selected and
not adapter.get_relation(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you know if this happens before or after a relation cache has been built? Curious if this implementation is going to run one introspective query per node in the deferral manifest, or if they'll all be cache lookups?

@@ -898,10 +899,14 @@ def merge_from_artifact(
refables = set(NodeType.refable())
merged = set()
for unique_id, node in other.nodes.items():
current = self.nodes[unique_id]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a scenario where unique_id is present in the other manifest, but not in this manifest? I'm thinking that could happen if a model was deleted, for instance. If I'm thinking about that right, then I think this could raise a KeyError, and we'd probably want to use self.nodes.get(unique_id) and do something smart if the result is None.

I might be way off-base here, it just jumped out at me though and wanted to confirm

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really good point, I was able to produce a KeyError here and changed the line to handle. this also gave me a clue to #2875 :)

with adapter.connection_named('master'):
self.create_schemas(adapter, selected_uids)
self.populate_adapter_cache(adapter)
self.defer_to_manifest(adapter, selected_uids)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok - i am seeing this now, and you can ignore my comment about caching above. Very clever, very cool. Leaving that comment above parce que je suis impressionné

@jtcohen6
Copy link
Contributor Author

Thank you all for taking a look! I'm excited that we'll manage to sneak this into v0.19.0.

I do wonder if we should rethink the flag/feature name --defer as we make this subtle but significant change in functionality...

I'm going through the exact same thought process, and definitely happy to continue it offline. I can't quite decide if "defer" better encapsulates the old behavior, the new behavior, or if it doesn't describe either very well at all. (I'm between #2 and #3.)

The features here feel tricky, as do the words—"state comparison," "job deferral"—and I can't decide if we'd be better served by something a bit more... colorful? I mean, --in-loco-parentis is just sitting there waiting for us.

@jtcohen6
Copy link
Contributor Author

I'm going to merge this. Let's continue the conversation elsewhere about potentially renaming --defer.

@jtcohen6 jtcohen6 merged commit aff7299 into dev/kiyoshi-kuromiya Dec 16, 2020
@jtcohen6 jtcohen6 deleted the fix/defer-if-not-exist branch December 16, 2020 16:22
@jtcohen6 jtcohen6 mentioned this pull request Dec 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Defer refers to outdated seed
4 participants