Defer iff unselected reference does not exist in current env #2946

jtcohen6 · 2020-12-12T00:29:27Z

resolves #2909

Description

Today, --defer is a blanket override: build the resources included by my selection criteria, and use the --state manifest's namespaces for any references to resources not included by my selection criteria.

Instead, this PR would make it a more flexible fallback: If there's a resource I want to build, and it references a resource I haven't selected, use the other manifest's namespace if and only if the upstream resource doesn't exist in my current environment namespace. In this way, --defer is more like a "fail-over" mechanism, using the deferred environment to fill in the gaps of whatever's missing in the current environment.

Pros:

Fixes current unavoidable behavior whereby dbt run --defer would always defer non-model resources (seeds and snapshots). If you've just seeded or snapshotted into your scratch/CI schema, you probably want to use those objects to build your models in the following step.
Fixes annoying behavior whereby subsequent run steps would defer resources run by previous run steps. This is a common use case for testing changed incremental models in CI by rerunning only them (dbt run -m state:modified ... followed by dbt run -m state:modified,config.materialized:incremental ...).
If we wanted to, we could reopen test deferral (Support deferred tests #2701). I cherry-picked 280803c (from Feature/defer tests #2706), and it just worked. This is after months after convincing myself (and other people, too) that test deferral was A Bad Thing, Actually. Really, I just didn't see a way to do it right. Of course, many of the caveats still apply, e.g. that relationships tests don't make a lot of sense in CI environments.

Cons:

If you have a lot of cruft hanging around in your scratch schema, and you want to use deferral as a "clean" override—use prod references for everything I'm not currently building—you'll need to drop the cruft first. You could see this as a minor regression from current behavior, though IMO it's very minor. The intended use case for --defer is dev, CI, and other scratch schemas that can be created, dropped, and recreated with aplomb.
If you instantiate your dev/CI/scratch schema by first zero-copy cloning on Snowflake, you wouldn't be able to defer any of your references. But you also wouldn't need to; cloning and deferral accomplish the same purpose.

If you buy the premise, the solution is very straightforward—a pleasant surprise. We made deferral a beta feature in v0.18 for exactly this reason: we knew there would be wrinkles to iron out. All the same, it's better to make this change sooner rather than later.

Checklist

I have signed the CLA
I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

gshank

Looks pretty straightforward!

kwigley

I buy it, lgtm!

drewbanin

One comment about a potential KeyError, but beyond that, this LGTM!

I do wonder if we should rethink the flag/feature name --defer as we make this subtle but significant change in functionality. Happy to adjudicate that one outside the scope of this particular PR, but think that as we see this functionality rolling out of "beta" we should be really cognizant of the words we use to describe it! It's very cool and very powerful, and i feel like the biggest blocker to its utility could actually be people not understanding how to use it / what it is for :D

drewbanin · 2020-12-16T04:05:05Z

core/dbt/contracts/graph/manifest.py

            if (
                node.resource_type in refables and
                not node.is_ephemeral and
-                unique_id not in selected
+                unique_id not in selected and
+                not adapter.get_relation(


do you know if this happens before or after a relation cache has been built? Curious if this implementation is going to run one introspective query per node in the deferral manifest, or if they'll all be cache lookups?

drewbanin · 2020-12-16T04:06:47Z

core/dbt/contracts/graph/manifest.py

@@ -898,10 +899,14 @@ def merge_from_artifact(
        refables = set(NodeType.refable())
        merged = set()
        for unique_id, node in other.nodes.items():
+            current = self.nodes[unique_id]


is there a scenario where unique_id is present in the other manifest, but not in this manifest? I'm thinking that could happen if a model was deleted, for instance. If I'm thinking about that right, then I think this could raise a KeyError, and we'd probably want to use self.nodes.get(unique_id) and do something smart if the result is None.

I might be way off-base here, it just jumped out at me though and wanted to confirm

really good point, I was able to produce a KeyError here and changed the line to handle. this also gave me a clue to #2875 :)

drewbanin · 2020-12-16T04:08:19Z

core/dbt/task/run.py

        with adapter.connection_named('master'):
            self.create_schemas(adapter, selected_uids)
            self.populate_adapter_cache(adapter)
+            self.defer_to_manifest(adapter, selected_uids)


ok - i am seeing this now, and you can ignore my comment about caching above. Very clever, very cool. Leaving that comment above parce que je suis impressionné

jtcohen6 · 2020-12-16T05:32:59Z

Thank you all for taking a look! I'm excited that we'll manage to sneak this into v0.19.0.

I do wonder if we should rethink the flag/feature name --defer as we make this subtle but significant change in functionality...

I'm going through the exact same thought process, and definitely happy to continue it offline. I can't quite decide if "defer" better encapsulates the old behavior, the new behavior, or if it doesn't describe either very well at all. (I'm between #2 and #3.)

The features here feel tricky, as do the words—"state comparison," "job deferral"—and I can't decide if we'd be better served by something a bit more... colorful? I mean, --in-loco-parentis is just sitting there waiting for us.

jtcohen6 · 2020-12-16T16:22:14Z

I'm going to merge this. Let's continue the conversation elsewhere about potentially renaming --defer.

cla-bot bot added the cla:yes label Dec 12, 2020

jtcohen6 marked this pull request as ready for review December 14, 2020 19:37

jtcohen6 requested review from gshank and kwigley December 14, 2020 19:37

gshank approved these changes Dec 14, 2020

View reviewed changes

jtcohen6 force-pushed the fix/defer-if-not-exist branch from e2c7465 to 80eedb5 Compare December 15, 2020 01:27

Check if relation exists before deferring

b025f20

jtcohen6 force-pushed the fix/defer-if-not-exist branch from 80eedb5 to b025f20 Compare December 15, 2020 03:24

jtcohen6 mentioned this pull request Dec 15, 2020

Feature: defer tests #2954

Merged

4 tasks

kwigley approved these changes Dec 15, 2020

View reviewed changes

drewbanin reviewed Dec 16, 2020

View reviewed changes

Handle keyerror if old node missing

08e425b

jtcohen6 merged commit aff7299 into dev/kiyoshi-kuromiya Dec 16, 2020

jtcohen6 deleted the fix/defer-if-not-exist branch December 16, 2020 16:22

jtcohen6 mentioned this pull request Dec 18, 2020

Review UX for --defer #2968

Closed

jtcohen6 mentioned this pull request Feb 5, 2022

[CT-168] Cache objects for selected resources only? #4688

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Defer iff unselected reference does not exist in current env #2946

Defer iff unselected reference does not exist in current env #2946

jtcohen6 commented Dec 12, 2020 •

edited

Loading

gshank left a comment

kwigley left a comment

drewbanin left a comment

drewbanin Dec 16, 2020

drewbanin Dec 16, 2020

jtcohen6 Dec 16, 2020

drewbanin Dec 16, 2020

jtcohen6 commented Dec 16, 2020

jtcohen6 commented Dec 16, 2020

Defer iff unselected reference does not exist in current env #2946

Defer iff unselected reference does not exist in current env #2946

Conversation

jtcohen6 commented Dec 12, 2020 • edited Loading

Description

Checklist

gshank left a comment

Choose a reason for hiding this comment

kwigley left a comment

Choose a reason for hiding this comment

drewbanin left a comment

Choose a reason for hiding this comment

drewbanin Dec 16, 2020

Choose a reason for hiding this comment

drewbanin Dec 16, 2020

Choose a reason for hiding this comment

jtcohen6 Dec 16, 2020

Choose a reason for hiding this comment

drewbanin Dec 16, 2020

Choose a reason for hiding this comment

jtcohen6 commented Dec 16, 2020

jtcohen6 commented Dec 16, 2020

jtcohen6 commented Dec 12, 2020 •

edited

Loading