
Write microbatch compiled + run code to separate target files #10743

Merged: 8 commits merged into main from split-compiled-path on Sep 24, 2024


@MichelleArk (Contributor) commented Sep 19, 2024

Resolves #10714

Problem

During execution of microbatch models, each batch is re-compiled with batch-level time filters. Currently, every batch writes its compiled/run .sql file to the same path, so each write clobbers the previous batch's file, when there should be one file per compiled/run batch.

Solution

Pass an additional argument, split_suffix (naming suggestions welcome!), that splits the write path by the suffix, so each batch's file is written under a top-level directory named without the suffix.
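The path-splitting idea can be sketched as below. This is a minimal, hypothetical sketch, not the PR's code: the function name `build_write_path`, its parameters, and the exact layout (a directory named after the model file's stem, with the suffix appended to the file name) are assumptions chosen to illustrate how a per-batch suffix keeps batches from overwriting each other.

```python
from pathlib import Path
from typing import Optional


def build_write_path(
    target_path: str,
    subdirectory: str,
    original_file_path: str,
    split_suffix: Optional[str] = None,
) -> str:
    """Hypothetical sketch of where a node's compiled/run SQL is written.

    Without a suffix: <target_path>/<subdirectory>/<original_file_path>.
    With a suffix (assumed layout): the file moves into a directory named
    after the model, and the suffix is appended to the file stem.
    """
    path = Path(target_path) / subdirectory / original_file_path
    if split_suffix:
        # e.g. models/my_model.sql -> models/my_model/my_model_<suffix>.sql
        path = path.parent / path.stem / f"{path.stem}_{split_suffix}{path.suffix}"
    return str(path)


# On POSIX this prints target/compiled/my_project/models/my_model.sql
print(build_write_path("target", "compiled/my_project", "models/my_model.sql"))
# On POSIX this prints target/compiled/my_project/models/my_model/my_model_2020-01-01.sql
print(build_write_path("target", "compiled/my_project", "models/my_model.sql", "2020-01-01"))
```

Because each batch passes a different suffix (for example, its batch start), every batch gets a distinct path, while a call without a suffix still resolves to the original, unsuffixed location.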

Checklist

  • I have read the contributing guide and understand what's expected of me.
  • I have run this code in development, and it appears to resolve the stated issue.
  • This PR includes tests, or tests are not required or relevant for this PR.
  • This PR has no interface changes (e.g., macros, CLI, logs, JSON artifacts, config files, adapter interface, etc.) or this PR has already received feedback and approval from Product or DX.
  • This PR includes type annotations for new and modified functions.

@cla-bot added the cla:yes label Sep 19, 2024

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.


@MichelleArk changed the title from "first pass: split_suffix" to "Write microbatch compiled + run code to separate target files" Sep 20, 2024
@MichelleArk marked this pull request as ready for review September 20, 2024 16:27
@MichelleArk requested a review from a team as a code owner September 20, 2024 16:27
```python
with patch_microbatch_end_time("2020-01-03 13:57:00"):
    run_dbt(["run", "--event-time-start", "2020-01-01"])

# Compiled paths - compiled model without filter only
```
Contributor Author

After discussion with @graciegoheen, we decided it could still be useful to see the non-batched (no filters applied) model file, and that the top level is where users would expect to find it. I've added a test to formalize this expected behaviour, but it would be easy to change if we get beta feedback!
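The expected layout for the scenario in the test excerpt (`--event-time-start 2020-01-01`, end time patched to 2020-01-03 13:57:00) can be sketched as follows. This only computes expected paths and does not run dbt. It assumes daily batches that run from the start date up to and including the day containing the end timestamp, a hypothetical model named `microbatch_model`, and the hypothetical `<model>/<model>_<date>.sql` naming for batch files. dbt's actual batch boundaries and file names may differ.

```python
from datetime import date, datetime, timedelta
from pathlib import Path
from typing import List


def daily_batch_starts(start: date, end: datetime) -> List[date]:
    """Assumed batching: one batch per day, from `start` up to and
    including the day that contains `end`."""
    starts = []
    current = start
    while current <= end.date():
        starts.append(current)
        current += timedelta(days=1)
    return starts


def expected_compiled_files(
    model_dir: Path, model_name: str, batch_starts: List[date]
) -> List[Path]:
    # The non-batched (unfiltered) model file stays at the top level...
    files = [model_dir / f"{model_name}.sql"]
    # ...and each batch gets its own file in a directory named after the model.
    files += [
        model_dir / model_name / f"{model_name}_{d.isoformat()}.sql"
        for d in batch_starts
    ]
    return files


batches = daily_batch_starts(date(2020, 1, 1), datetime(2020, 1, 3, 13, 57))
for p in expected_compiled_files(Path("target/compiled/test/models"), "microbatch_model", batches):
    print(p)
```

Under these assumptions the run yields four expected files: the top-level unfiltered `microbatch_model.sql` plus one file each for 2020-01-01, 2020-01-02 and 2020-01-03.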

@QMalcolm (Contributor) left a comment

Looks good! Thanks for doing this work. Found one aesthetic thing, but not blocking.

Comment on lines 575 to 583
```python
def format_batch_start(self, batch_start: Optional[datetime]) -> Optional[str]:
    if batch_start is None:
        return batch_start

    # Non-hourly batches are labeled by date only; hourly batches keep the full timestamp.
    return str(
        batch_start.date()
        if (batch_start and self.config.batch_size != BatchSize.hour)
        else batch_start
    )
```
Contributor

nit

Feels somewhat weird that this is on the ParsedNode class instead of the MicrobatchBuilder class. Just feels out of place.

Contributor Author

Totally agree! I think at first I didn't do this because it's needed in providers.py, which didn't … Added it as a static method. Good call 👍
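Once the helper moves off ParsedNode it can no longer read `self.config`, so a static-method version has to take the batch size as a parameter. Below is a minimal sketch under that assumption. The `BatchSize` enum is a local stand-in for dbt's own, and the exact signature that was merged may differ.

```python
from datetime import datetime
from enum import Enum
from typing import Optional


class BatchSize(str, Enum):
    # Hypothetical local stand-in for dbt's batch-size enum.
    hour = "hour"
    day = "day"
    month = "month"
    year = "year"


class MicrobatchBuilder:
    @staticmethod
    def format_batch_start(
        batch_start: Optional[datetime], batch_size: BatchSize
    ) -> Optional[str]:
        """Render a batch start as a string suitable for a path suffix."""
        if batch_start is None:
            return None
        # Hourly batches need the time component to stay distinct;
        # coarser batches are identified by their date alone.
        return str(batch_start.date() if batch_size != BatchSize.hour else batch_start)


start = datetime(2020, 1, 1, 13)
print(MicrobatchBuilder.format_batch_start(start, BatchSize.day))   # 2020-01-01
print(MicrobatchBuilder.format_batch_start(start, BatchSize.hour))  # 2020-01-01 13:00:00
```

Passing the batch size explicitly keeps the helper free of node state, so code that has no ParsedNode instance at hand can call it with just the model's configured batch size.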

@MichelleArk merged commit a1e4753 into main Sep 24, 2024
65 of 66 checks passed
@MichelleArk deleted the split-compiled-path branch September 24, 2024 13:12
Development: successfully merging this pull request may close these issues:
  • Write compiled code to one file per batch during execution of microbatch models