Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updates DocumentParser to look for docs in .sql files #1042

Closed

Conversation

jakebiesinger
Copy link

Since the .sql files have macros that aren't available to the docs
parser (e.g., ref), we can instead get the AST for the docs macros and
directly read out the contents without rendering them.

This assumes that when people write docs blocks, they don't contain to
any other macros or other Jinja tomfoolery. It's hard for me to tell if
that's a reasonable assumption or not. If not, we can be more careful. This
is just meant to be a jumping-off point.

Relevant to #979

Since the .sql files have macros that aren't available to the docs
parser (e.g., `ref`), we can instead get the AST for the docs macros and
directly read out the contents without rendering them.

This assumes that when people write `docs` blocks, they don't contain to
any other macros or other Jinja tomfoolery. It's hard for me to tell if
that's a reasonable assumption or not.

Relevant to dbt-labs#979
@jakebiesinger
Copy link
Author

For whatever reason, I can't see the AppVeyor failure-- it just loads a blank screen for me when I hit the Details link.

@cmcarthur
Copy link
Member

@jakebiesinger thanks for the PR! I want to think on this a bit.

My intuition is that we should not hack in these one-off Jinja AST manipulations, since, as you noted they can break subtly if docs blocks contain certain constructs, e.g. ref. I suspect there are other situations like this, e.g. docs blocks inside another block, or regular macros defined in model SQL files.

Were you able to test this on your machine?

@jakebiesinger
Copy link
Author

In the current implementation,

  • There's no problem with the docs blocks appearing inside of other kinds of blocks. The find_all method will find {grand-}*child docs blocks.
  • There would be a problem if someone tried to stick some other template inside of a docs block, though. It's not clear what macros we want to support inside of docs blocks though? E.g., ref doesn't work in current docs blocks (it'd be nice if it resolved to the fully-qualified DB name or something, but right now it yields nothing).

It seems to me that we should support docs blocks in all sql files, including models, macros, and analysis. #1043 enables this for models, but it'd be trivial to also port that to macros and close #1041. The idea behind using the AST rather than rendering the template came from #746, where it's suggested that you shouldn't have to render macros if all you're trying to do is discover dependencies. We could alternatively grab the docs blocks from the AST like we're doing here and render only the contents of those blocks.

I tested the new functionality locally, but haven't run your unit/integration tests locally just yet. As I mentioned, I can't see anything in your appveyor instance, so I can't even see what the current failure is. I'll try to slice some time off to get a propoer dev env running here in the next day or so.

@drewbanin
Copy link
Contributor

@jakebiesinger just took a look at this -- Appveyor didn't run at all because this PR has conflicts with the base branch.

I agree with @cmcarthur that AST inspection probably isn't the right level of abstraction for solving this problem. We certainly do want to render the docs block contents with jinja.

I think there is also some functionality missing with this code, as I can indeed see the docs entry in the manifest, but the documentation isn't actually patched into the relevant model, so it doesn't show up in the documentation website.

Thanks for taking a stab at this! It's definitely a useful jumping off point. Forgetting the implementation: the goal here is to colocate documentation within models, right?

Is there any reason we can't just create a dbt_project.yml config option for docs file extensions? Right now it's [.md], but updating it to [.md, .sql] would make it possible to define these docs right inside of the model file. You'd still need to reference the docs block in a schema.yml file, but it gets like 80% of the way there.

I imagine we could do something like auto-document models if the docs resource path == the model path. I don't actually think that's a good idea (why wouldn't you want to be explicit in your schema.yml file? And where did you get your syntax highlighter :p). Regardless of that implementation, I also think it's very doable if we choose to do it.

Let me know what you think about all of this @jakebiesinger

@jakebiesinger
Copy link
Author

I'm not fussed either way about AST vs rendering. I was attempting to simplify by pruning features I didn't think were important. I'll poke again at the generated documentation; I definitely tested that, but I was also working on #1043 so I may have missed something.

A dbt_project.yml override for docs filetypes SGTM. Regarding using the resource path expression to auto-attach to models, I have one concern: I want to attach column docs in the same way. Perhaps a resource path + in-doc label could be used? So, in mymodel.sql, I could have a {% docs columnName %} which would get auto-attached to mymodel.columnName.

Speaking of #1043, yeah, I'm a fan of having documentation include as low a barrier to entry as possible. I firmly believe a non-negligible % of docs would either not be written or would not be correctly attached if we require both a docs block AND a schema.yml update. We're fairly new to DBT, but generating a schema.yml file at all is in many cases an afterthought.

On reflection, I'm actually of the opinion that everything that's currently in the schema.yml file could as easily go into separate model .sql files, including all docs, and tests/constraints. It's not distracting IMHO to know what to expect about the table I'm about to look at, and to have everything about the table and its columns all in one place.

@drewbanin
Copy link
Contributor

going to close this out as I think we decided that

Is there any reason we can't just create a dbt_project.yml config option for docs file extensions? Right now it's [.md], but updating it to [.md, .sql] would make it possible to define these docs right inside of the model file. You'd still need to reference the docs block in a schema.yml file, but it gets like 80% of the way there.

is a more appropriate approach for this change. Thanks for opening this @jakebiesinger!

@polmonso
Copy link

Did this have a follow-up issue/PR? I couldn't find it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants