Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature CT-426 manage schemas #5392

Closed

Conversation

bneijt
Copy link

@bneijt bneijt commented Jun 17, 2022

resolves #4957 CT-426

Description

Add support to manage (drop/monitor) schema for unmanaged models.

TODO because of draft

  • Update documentation
  • Test on different model types (view/table)
  • Test with multiple schema's
  • Test logging/warning behavior of runs

Checklist

@cla-bot
Copy link

cla-bot bot commented Jun 17, 2022

Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA.

In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above above. If you have questions about the CLA, or if you believe you've received this message in error, don't hesitate to ping @drewbanin.

CLA has not been signed by users: @bneijt

@bneijt bneijt force-pushed the feature/ct-426-manage-schemas branch from 06fa34d to 5b4be03 Compare June 17, 2022 14:13
@cla-bot
Copy link

cla-bot bot commented Jun 17, 2022

Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA.

In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above above. If you have questions about the CLA, or if you believe you've received this message in error, don't hesitate to ping @drewbanin.

CLA has not been signed by users: @bneijt

2 similar comments
@cla-bot
Copy link

cla-bot bot commented Jun 17, 2022

Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA.

In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above above. If you have questions about the CLA, or if you believe you've received this message in error, don't hesitate to ping @drewbanin.

CLA has not been signed by users: @bneijt

@cla-bot
Copy link

cla-bot bot commented Jun 17, 2022

Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA.

In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above above. If you have questions about the CLA, or if you believe you've received this message in error, don't hesitate to ping @drewbanin.

CLA has not been signed by users: @bneijt

@cla-bot cla-bot bot added the cla:yes label Jun 17, 2022
@bneijt bneijt changed the title Feature/ct 426 manage schemas Feature CT-426 manage schemas Jun 27, 2022
Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really appreciative of you both for opening up the PR, in which we can ground more concrete discussion. I took a first look through and left some initial comments.

I sense that the current implementation assumes certain patterns for how users are leveraging custom schemas across environments. I think it's good to be opinionated — it's an optional feature, people don't need to use it — so long as we're clear on what those opinions are. The most successful version of this feature asks us to also provide even clearer recommendations for how users ought to be structuring their DWH (logical database/schema organization), so that dbt knows how to clean up after them.

core/dbt/config/profile.py Outdated Show resolved Hide resolved
core/dbt/contracts/project.py Outdated Show resolved Hide resolved
core/dbt/task/run.py Outdated Show resolved Hide resolved
core/dbt/contracts/project.py Show resolved Hide resolved
core/dbt/task/run.py Outdated Show resolved Hide resolved
@agoblet agoblet force-pushed the feature/ct-426-manage-schemas branch from 53fbfbc to 3a17595 Compare September 16, 2022 12:59
@jtcohen6 jtcohen6 self-assigned this Sep 20, 2022
@jtcohen6 jtcohen6 added the Refinement Maintainer input needed label Sep 20, 2022
@agoblet
Copy link
Contributor

agoblet commented Oct 4, 2022

@jtcohen6 could you review this PR? I think that the only missing part is the docs. But for that I would like to know whether the feature is finalized as is.

@jtcohen6
Copy link
Contributor

jtcohen6 commented Oct 16, 2022

@agoblet Sorry for the delay! I've been away on vacation, and heads down getting ready for Coalesce (this week!). This is definitely on my list to review later in the month.

Thanks for the great conversation so far. If we can get this merge-ready, I would love to include it in an upcoming release. Next step here is on me to review and provide feedback.

@agoblet
Copy link
Contributor

agoblet commented Nov 7, 2022

@agoblet Sorry for the delay! I've been away on vacation, and heads down getting ready for Coalesce (this week!). This is definitely on my list to review later in the month.

Thanks for the great conversation so far. If we can get this merge-ready, I would love to include it in an upcoming release. Next step here is on me to review and provide feedback.

@jtcohen6 any news? Regarding the docs update, I think we only need to extend the dbt_project.yml spec

@agoblet
Copy link
Contributor

agoblet commented Nov 21, 2022

@jtcohen6 what is blocking us from moving forward with this? How can we solve it?

@agoblet
Copy link
Contributor

agoblet commented Nov 24, 2022

@jtcohen6 if you have no time for the review, maybe another reviewer could take a look, such as @gshank ?

@jtcohen6
Copy link
Contributor

@agoblet Sorry for the radio silence - this is still on me!

The remaining blocker at this point is me taking the time to sit down with the latest set of changes, and provide a functional review, including product/UX refinement as appropriate. Once I've signed off, I'll remove the Refinement label, add the ready_for_review label, and we'll tap in a software engineer to provide a code review. It's a big-enough addition that I want to give it the attention it deserves, and not rush through it.

Apologies that the limiting reagent here has been my time. We're still working through the backlog of new issues that came in during Coalesce.

To provide a sense of timing: Since this is a new feature, it would be included in the next minor version release after it's merged. The next minor version we have scheduled is v1.4, for release in January — so, we still have some leeway. I'm really appreciative of your patience in the meantime.

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay! I've had a chance to sit down with this for a few hours. There's a lot of promise here! I've also left a lot of feedback. Most of it is quite tactical, and I think we've got a solid path to the finish line.

# dbt_project.yml
managed-schemas:
  # if using postgres, 'database' should always be default / {target.database}
  - schema:                 # matches models without a custom schema configuration --> *probably* {target.schema}, but could vary if using a custom 'generate_schema_name' macro
    prune-models: warn
  - schema: another_schema  # matches models configured with 'schema: another_schema' --> where they actually land will vary based on 'generate_schema_name' macro!
    prune-models: warn

# What about when:
#  - using generate_schema_name_for_env, and every model has the same 'resolved' schema, but with different 'prune' actions?
#  - we just deleted the last model with the 'another_schema' config? should we raise a warning? still prune?
# - ... lots of other edge cases! ...

That feedback is in two forms: I dropped comments inline in this PR; and I also pushed up a commit with a whole bunch of code comments and illustrative changes (though not always the prettiest ones): 3a1c689

As far as next steps:

  • Once this has passed functional review, one of the Core engineers will give it code review as well. The good news is that the changes here are relatively self-contained. Still, tasks are a tricky business; while any significant refactoring should be out of scope, there might be some cleanup / reorg in order.
  • @dbeatty10 has offered to provide additional UX feedback. (Doug, I'd be happy to get your thoughts on any of the comments I left on this PR, or thoughts expressed in the commit linked above!)
  • We'll need to identify follow-on work for:

@bneijt @agoblet The question to answer first: Is this something you're still interested in working on, and carrying over the finish line? Or would you rather one of us to pick it up from here? (I completely understand if your interest & capacity have shifted in the months since you last worked on this—the delay is on me.)

core/dbt/task/manage.py Outdated Show resolved Hide resolved
core/dbt/task/manage.py Outdated Show resolved Hide resolved
core/dbt/task/manage.py Outdated Show resolved Hide resolved
core/dbt/task/manage.py Outdated Show resolved Hide resolved
warn_or_error(
f"Found unused model {target_database}.{target_schema}.{target_identifier}"
)
elif target_action == PruneModelsAction.DROP:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For visibility: What do you think about some info-level logging, right before dropping each schema? (The debug-level logs will probably include the SQL that's submitted by adapter.drop_relation.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agoblet will look into this

core/dbt/task/manage.py Show resolved Hide resolved
models_in_database[(target_database, target_schema, target_identifier)]
)

def _assert_schema_uniqueness(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than doing this validation check here, let's do it at parse time, while dbt is reading + validating dbt_project.yml. We can do that by defining a custom validate method on SchemaManagementConfiguration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agoblet will look into this

core/dbt/task/manage.py Show resolved Hide resolved
core/dbt/task/manage.py Show resolved Hide resolved
core/dbt/task/manage.py Show resolved Hide resolved
@jtcohen6 jtcohen6 removed their assignment Dec 14, 2022
@agoblet
Copy link
Contributor

agoblet commented Dec 16, 2022

@jtcohen6 thanks for the valuable feedback! We are still interested in finishing this. I will start fixing some of these things hopefully next week. What is your idea about the timeline for this PR? Do we still want to add it to the january release, or do you think that is too tight?

@dbeatty10
Copy link
Contributor

@jtcohen6 thanks for the valuable feedback! We are still interested in finishing this. I will start fixing some of these things hopefully next week. What is your idea about the timeline for this PR? Do we still want to add it to the january release, or do you think that is too tight?

Very excited about your work on this @agoblet ! I defer to @jtcohen6, but we might be too tight for the January release -- after today, we'll all be out of the office until January 3rd, and we're planning on cutting the release candidate (RC) for 1.4 just a handful of days later. The final release is planned for about two weeks after the RC.

@jtcohen6
Copy link
Contributor

jtcohen6 commented Jan 4, 2023

@agoblet Agree that the timing here might be tight for v1.4, since we're planning to put out the release candidate (= lock any code changes) next week. Apologies again for the delay on my side.

That's okay! I'd still be very excited to merge this PR, when it's ready, without adding undue rush. Our current plan (hope) is to put out an early beta of v1.5 in February (final release scheduled for April), including a bunch of internal API changes that are currently on a feature branch (feature/click-cli), so that folks can start testing those out early & often. That would also represent a great opportunity to get beta feedback for this feature!

One big thing that remains is merging / rebasing changes in main. If you do get around to that in the next few days, in addition to the feedback above, you can mark the PR as "ready" (not draft), and we can get someone from the engineering team to give a technical code review. Again, no undue rush — this is something folks have wanted for a long time, and I'm okay with making them wait a little longer if it means we're shipping something high-quality and thought-through.

@ChenyuLInx
Copy link
Contributor

Hey @agoblet, Thanks for contributing this! We are focusing on getting the feature/click-cli branch merged this/early next week. Once that's merged I will circle back and provide some tips on moving this PR into using the new click-cli, should be just some relative quick adjustment we need to do.

@agoblet
Copy link
Contributor

agoblet commented Jan 25, 2023

Great, thanks @ChenyuLInx , we did not rebase yet, will do that once the new cli is on main. I will also try to fix the remaining comments next week.

@ChenyuLInx ChenyuLInx self-requested a review January 26, 2023 06:19
Copy link
Contributor

@ChenyuLInx ChenyuLInx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bneijt We recently merged the feature branch! Let me know if there's any help needed moving this PR to the click framework!

core/dbt/main.py Outdated
@@ -448,6 +449,21 @@ def _build_debug_subparser(subparsers, base_subparser):
return sub


def _build_manage_subparser(subparsers, base_subparser):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This parameter definition and how we invoke the manage task will need to be moved to the new params.py and click command(link)

add noop and warn tests

improve tests

rename tests

add view dropping test

add unmanaged schema test

make tests more dry

Delete tmp.csv

Manage schemas is optional

Add --target-path as a CLI option. (dbt-labs#5402)

Include py.typed in MANIFEST.in (dbt-labs#5703)

This enables packages that install dbt-core from pypi to use mypy.

wip: move manage logic to separate command

Add manage command
@bneijt
Copy link
Author

bneijt commented Feb 4, 2024

I left the organisation I where I started this work and have not used dbt for a few years now. I nolonger wish to contribute to this and I’m closing this pr.

@bneijt bneijt closed this Feb 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla:yes cli Refinement Maintainer input needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CT-426] [Feature] Manage full schema content with dbt by dropping tables/views not maintained in models
5 participants