Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse docs blocks when they appear in .sql files #979

Closed
ghost opened this issue Sep 6, 2018 · 15 comments
Closed

Parse docs blocks when they appear in .sql files #979

ghost opened this issue Sep 6, 2018 · 15 comments
Labels
dbt-docs [dbt feature] documentation site, powered by metadata artifacts enhancement New feature or request stale Issues that have gone stale

Comments

@ghost
Copy link

ghost commented Sep 6, 2018

Feature

Feature description

It's my belief that documentation being in a separate file from the code it's documenting will result in the documentation falling behind model development in all but the most disciplined organizations. Therefore, I believe that {% docs %} blocks should be parsed from .sql files in addition to .md files.

Granted, syntax highlighting is unlikely to be happy about the presence of markdown in a .sql file, but this is something we already deal with, having Jinja in .sql files. In the same way that we can nest a SQL comment within a Jinja comment, i.e. {% /* comment fo' reals, yo! */ #} so that both SQL and Jinja syntax highlighters will ignore the comment, users could conceivably wrap a docs block in a SQL block comment to the same effect, e.g.:

/* {% docs foo %}
## Bar!
{% enddocs %} */
select ...

Who will this benefit?

This would benefit people who wish to include documentation in the model itself, and would not negatively impact people who choose to document in the .yml file, or people who choose not to document at all (the heathens!)

@drewbanin drewbanin modified the milestone: 0.11.1 - Lucretia Mott Sep 7, 2018
@drewbanin
Copy link
Contributor

We could probably implement this by adding an option for docs blocks file extensions. Right now, dbt looks for them in .md files, but we could add a config option to also look in .sql files (or really, extensions of your choosing).

What happens if there are multiple docs blocks in a model file? This could happen if you wanted to document the model itself, and some of the resulting columns. Should dbt automatically infer that eg. the first docs block is the documentation for the model? Or, should we add some other specifier to the {% docs %} signature that indicates that it's for the current model/a specific column/etc?

I'm into this idea, and I think the next step is the flesh out the syntax that will ultimately be used here. Once we have a good grasp on the syntax and semantics, I think the implementation shouldn't be too tricky.

@jakebiesinger
Copy link

One possible convention:

  1. All docs blocks in .sql files are required to have the model name as a prefix.
  2. We document the model itself by looking for a block w/a name that matches the model name.
  3. We document columns by looking for blocks w/name model_name.column_name.
  4. No restrictions are placed on the positioning of any doc blocks inside the .sql file

Documenting models would be super-simple, and column docs could live right next to their column definitions. E.g., my_model.sql:

{% docs my_model %}
This model is sooo slick. Learn more [here](example.com)
{% enddocs %}

SELECT
  {% docs my_model.foo %}
    The number of `Foo`s we should create.
  {% enddocs %} 
  1 AS foo,
  2 AS bar

I don't think I'd mind having to explicitly say the model name or the column names. Doing so still beats defining this stuff in .md or .yml files IMHO.

@ghost
Copy link
Author

ghost commented Oct 2, 2018

The simplest solution might be to only parse a single docs block out of the .sql file and assume it to be the model documentation. Not the cleanest or most intuitive, but it would work.

Or it could be irrelevant entirely; since under the current implementation the docs block needs to be referenced in the schema.yml file, that behavior could be retained requiring the user to make that link once for each model, but thereafter only need to change the docs block in the model to update the documentation.

this_model.sql

{% docs something %}
...
{% enddocs %}

select
...

schema.yml

version: 2

models:
  - name: this_model
    description: {{ doc('something') }}
    ...

Lastly, maybe it's possible to have an unnamed docs block, only valid in a model file, which is always model documentation?

{% docs %}
...
{% enddocs %}

select
...

@jakebiesinger
Copy link

jakebiesinger commented Oct 2, 2018

I like the anonymous, single-block idea, and you're likely right that the implementation would be easier if we stick to single-blocks.

I'm not a big fan of having to go to the schema.yml to explicitly tie the docs to the model. IMHO, the user already told us that it's a docstring for the model by putting it in the model's .sql file!

On the other hand, it's a one-time cost for each model and would be better than what we have now.

But I also really like the idea of being able to document models and columns inline, without any other configuration. People might actually document stuff if we made it that easy.

jakebiesinger pushed a commit to jakebiesinger/dbt-core that referenced this issue Oct 3, 2018
Since the .sql files have macros that aren't available to the docs
parser (e.g., `ref`), we can instead get the AST for the docs macros and
directly read out the contents without rendering them.

This assumes that when people write `docs` blocks, they don't contain to
any other macros or other Jinja tomfoolery. It's hard for me to tell if
that's a reasonable assumption or not.

Relevant to dbt-labs#979
@tayloramurphy
Copy link

Turns out I'm basically asking for the same thing here #1051 but for custom tests!

@drewbanin drewbanin added dbt-docs [dbt feature] documentation site, powered by metadata artifacts enhancement New feature or request labels Oct 13, 2018
@MartinGuindon
Copy link

Yes please. 😁

@github-actions
Copy link
Contributor

github-actions bot commented Jan 7, 2022

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Jan 7, 2022
@martinspaeth
Copy link

martinspaeth commented Jan 7, 2022

We are also using dbt more and more and the project says a lot to me in many aspects. The documentation is very clear and the API is very straight forward. However, one point that also bothers me is the artificial separation of model definition and sql. I likewise think that the schema yml are not really necessary.

Background:

Currently we are thinking about splitting our schema yml into several yml's - one per model. A single yml file quickly becomes confusing.

Our folder structure would then look like this.


models
 - model_a.yml 
 - model_a.sql

 - model_b.yml 
 - model_b.sql

This would make the structure clearer with the current tool, but it would require a lot of typing. The model name as well as the schema yml structure are not necessary with an approach as it is described here.

Related topics

In the forum I came across a scipt that can add similar functionality:

https://discourse.getdbt.com/t/here-is-a-way-to-write-dbt-docs-as-sql-comment/1658

The approach described here is quite nice, but preprocessing outside the standard is not an option for us.

First MR's that would allow a development in this direction natively are unfortunately not continued: #1042 (comment) Also the described approach to integrate sql files via a config was unfortunately not pursued further.

I think the activities as well as the many likes of the issue deserve some attention :-)

Proposal

Here my 3 ct to the topic Documentation in the sql file.

The first anonymous docs block found describes the documentation of the model associated with the file:

Content of model_a.sql:

{% docs %}
This model is sooo slick. Learn more [here](example.com)
{% enddocs %}

SELECT
  {% docs .foo %}
    The number of `Foo`s we should create.
  {% enddocs %} 
  1 AS foo,
  2 AS bar

The parser could automatically enrich the anonymous docs block with the name of the file it was found in respectively the model name - in this case "docs model_a".

{% docs %}
This model is sooo slick. Learn more [here](example.com)
{% enddocs %}

Inline documentation of columns could be implemented using a leading "." :

  {% docs .foo %}
    The number of `Foo`s we should create.
  {% enddocs %} 

Again, the parser could extrapolize the model name: {% docs model_a.foo %}

The schema. yml handling in dbt docs could then default to "description: {{ doc("model_name") }}" if no model description was explicitly specified. Analogous for columns
then description: {{ doc("model_name.columnName") }} would be possible.

This approach would be, from what I can see so far, backwards compatible and implementable with manageable effort.

What this approach does not yet cover: how to deal with tests. if a readable format can also be found to include them directly in sql, it would be possible to describe all the information belonging to the model directly in the model.

@github-actions github-actions bot removed the stale Issues that have gone stale label Jan 8, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Jul 7, 2022

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Jul 7, 2022
@github-actions
Copy link
Contributor

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest; add a comment to notify the maintainers.

@alexscott-ff
Copy link

Just wanted to get a read here from maintainers if possible - is this something that has been discussed and nixed, or is it of interest? We are now in the position of documenting both the .sql files AND putting documentation in schema.yml and it leads to documentation getting out of sync.

The documentation within the model file itself is preferable when reviewing PRs changing a model, updating an existing model, and/or updating documentation about an existing model. schema.yml often gets forgotten about so our documentation tends to get stale.

@visserp
Copy link

visserp commented Nov 24, 2023

We're also dealing with this issue. Mostly writing docs in separate .md files but it creates file clutter and having the documentation separated from the SQL makes it harder to make changes as the goal and intentions of the model are detached from the code.

Having documentation with the code would be best as everything the model is trying to achieve is all in the same context

@Phil-T1
Copy link

Phil-T1 commented Nov 29, 2023

This is definitely an issue.

As a developer you really want the same information in both places without the unnecessary duplication and maintenance.

Being able to add the doc block before the SQL makes so much more sense.

@ktuft-cbh
Copy link

Agree with the recent comments; this problem is more acute than ever as our projects & docs grow in size & complexity. Multi-project instances, Python support, Semantic layer, metrics - it's not easy for a developer to keep track of, especially the why of certain things. SQL-embedded doc blocks feel like a natural solution that would fit very well within the modern dbt paradigm.

@nicodp-leap
Copy link

Totally agree with recent comments.
Having the info/documentation in the same file where the model is defined is something needed to avoid unnecessary duplication.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dbt-docs [dbt feature] documentation site, powered by metadata artifacts enhancement New feature or request stale Issues that have gone stale
Projects
None yet
Development

No branches or pull requests

10 participants