[dagster-dbt] rework dagster dbt logging, cleanup #8681
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary & Motivation
resolves: #8601 and unticked requests for the ability to turn off
--log-format json
.This is a whole rabbit hole, but essentially the way we were handling the output of the dbt cli command was a bit confusing and somewhat incorrect. This corrects some issues and refactors a bit of code around that area.
First, this resolves a longstanding issue where, if your dbt command had an error, the error message would not show human-readable information (as we were not correctly parsing the json lines). The below screenshot is taken after the changes. Prior to the changes, the parsed output (the thing that you can actually read) would be empty, and the main visible metadata would be the bottom thing (which is pretty unreadable).
Next, this allows the user to tell Dagster not to insert the "--log-format json" flag when invoking dbt. In an ideal world, there'd be no reason to turn this off, because there should be no material impact on execution when using this flag, and we should be able to parse out json-formatted log lines and display them in dagit better than just reading raw text.
However, it turns out that due to a bug in the dbt-bigquery adapter, this log format flag can actually cause dbt to produce unhelpful errors in cases where it should emit helpful ones: dbt-labs/dbt-bigquery#206
I submitted a PR to correct this behavior on dbt's side, but it's possible that there are other places where this could crop up (and it's unclear exactly when my PR would get merged in), so it's now a config option. I tried to keep the dagit experience relatively consistent when this option is turned off, but some sacrifices had to be made. For example, we can't tell what log level each message was originally at, so all logs are set to the INFO level when you do this. There's also a dbt-produced timestamp at the start of each of these lines, which I decided not to attempt to remove:
Finally, this PR adds a new target argument to load_assets_from_dbt_manifest. This is due to an unfortunate git mishap, but it's a tiny low risk change so it's here now.
How I Tested These Changes
unit tests, dagit workflow