Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optional arguments to include database and schema properties in sources.yml generated from generate_source #123

Closed
jeremyholtzman opened this issue Apr 10, 2023 · 10 comments · Fixed by #124
Labels
enhancement New feature or request

Comments

@jeremyholtzman
Copy link
Contributor

Describe the feature

The generate_source macro should always include the properties for database and schema to explicitly call out where the source data comes from. This should be a best practice especially for those who are new to dbt and the source.yml files.

Describe alternatives you've considered

The alternative is the current method - where database and schema are only added if they are different from the target database and name argument respectively.

Additional context

Is this feature database-specific? Which database(s) is/are relevant? Please include any other relevant context here.
This should work across databases just as generate_source works currently.

Who will this benefit?

What kind of use case will this feature be useful for? Please be specific and provide examples, this will help us prioritize properly.

  • people learning dbt and how sources work
  • people new to a dbt project and want clarity on where their source data comes from

Are you interested in contributing this feature?

Yes

@dbeatty10
Copy link
Contributor

@jeremyholtzman thanks for opening this feature suggestion!

The docs for sources notes that:

*By default, schema will be the same as name. Add schema only if you want to use a source name that differs from the existing schema.

If we update the behavior of codegen, we'll want to re-consider the language in those docs at the same time.

Would like to get affirmation from one of the maintainers of dbt-project-evaluator of the best practices for the schema property.

@Grace do you know who could weigh in on automatically adding schema to codegen?

@jeremyholtzman
Copy link
Contributor Author

I did post an internal Slack poll in our training team channel and so far to see in general Do you think it’s best practice to always define both the database and schema properties for all sources configured in the sources.yml file?

So far, all 5 others (luckily enough including Grace) have seemed to agree that it is best practice, but I'll definitely let Grace weigh in here directly. If we do make this change, I'd definitely love to change the docs for sources like you mentioned

@jeremyholtzman
Copy link
Contributor Author

@dbeatty10 I think you meant to tag in @graciegoheen for your question above

@dbeatty10
Copy link
Contributor

Thanks @jeremyholtzman -- Indeed, I did intend to tag @graciegoheen.

The results of the internal Slack poll sound good to me -- let's proceed with reviewing the implementation you started in #124

@graciegoheen
Copy link
Contributor

I definitely like to do this when teaching/training new dbt users because it demonstrates all of the levers they have available for defining source definitions. That being said, it's not DRY and would lead to longer yml files - so I welcome a debate as to whether or not we would consider this "best practice."

Another enhancement here could be adjusting the words "schema" and "database" depending on the warehouse you're in - for example, in BigQuery use dataset (instead of schema) and project (instead of database).

@jeremyholtzman
Copy link
Contributor Author

I love the idea of adjusting the words schema and database depending on the warehouse!

Some points from the Slack thread that argue for adding the database and schema fields:

  • helps provide leverage to analysts - based on this post
  • I think it also helps “optimize for the long term” by showing them explicitly a lever than can pull in the future for when this happens
  • By building these into the yaml at first, it's really clear where things are pointed in your project. It also makes it way easier for when an update needs to be made, they go to the file and the config is already there, they just update it!

@jeremyholtzman
Copy link
Contributor Author

@dbeatty10 Below are the updated results from the poll for this feature:

  • 9 (including myself) agree that database and schema should be generated by default (64%)
  • 5 added an option for mixed feelings which didn't seem to explicitly take a stance on the default (29%)
  • 1 says no need to generate either (7%)

I'm not sure whether you think we should or shouldn't proceed with the task

@graciegoheen
Copy link
Contributor

Thoughts on just adding an optional argument to determine the behavior of this macro?

@jeremyholtzman
Copy link
Contributor Author

Yeah that might be the easiest solution, I'm good with that!

@jeremyholtzman jeremyholtzman changed the title Always include database and schema properties in sources.yml generated from generate_source Add optional parameters to include database and schema properties in sources.yml generated from generate_source May 1, 2023
@jeremyholtzman jeremyholtzman changed the title Add optional parameters to include database and schema properties in sources.yml generated from generate_source Add optional arguments to include database and schema properties in sources.yml generated from generate_source May 1, 2023
@dbeatty10
Copy link
Contributor

Thanks for soliciting all that feedback @jeremyholtzman ! 🏆

Agreed, let's go with @graciegoheen's suggestion of making this optional to preserve the current behavior, but also allow for richer configurability.

Specifically, let's add two new optional parameters to generate_source (both defaulting to False):

  • include_database
  • include_schema

When either or both of these are True, then they will explicitly include database and/or schema within the output YAML.

Further discussion of implementation details can take place in the associated pull request: #124.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants