-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Run tests after model creation, persisting updates only if the tests pass #1054
Comments
I think this is a great idea. There are a few inter-related issues and features which I'll try to start picking apart. This would be a pretty major architecture change, so I think we will all benefit if we can get clear on the requirements, priorities, and streams of work. TransactionsI feel strongly that DBT should be running in pseudo-transactions(henceforth "transactions") when it's running in production (i.e., there are downstream users querying the transformations). Here's why: in the middle of a DBT run, end users could receive invalid query results if two relations have become inconsistent. Imagine you're calculating an order rate among users as the share of users that have an order. If this query gets execute after the Proposed architecture:
In this world, end users will never be querying data that are part old and part new. Concerns:
DBT Run ModesAs I've alluding to above, you probably do not want this transaction-like behavior when you're developing. To support this, I'd recommend implementing DBT run "modes" that allow users to control how DBT behaves with respect to handling this transaction-like behavior. To start, I'd propose the following modes (not mutually exclusive):
Note: I'm not at all committed to these names and I'm sure we can come up with better ones. TestsOnce the above concepts are implemented, we can add some configuration so that tests can cause "errors" which may or may not halt a DBT run depending upon which run mode we're in. One could imagine another configuration flag that indicates whether or not tests should be counted as "errors" for the purpose of a dbt run. The last feature, which I expect people will want, would be the ability to configure tests with severity levels. So we could introduce "log", "warn", and "error" types (or whatever) for tests, then we could update the configuration options so that different test failure levels can be handled differently. In my mind, this feature comes last. |
See also #1005 which is relevant. |
Shooting blind here, but would it be possible to use tagging under the hood somehow to implement this? After running This would also help in cases where you run successfully, test and find a few models failing tests, fix the issues with the models, and want to only re-run the models that ran successfully but didn't test successfully. In that case you could just run with tag |
@joshtemple I really like the workflow you're describing! I'm unsure if tags are going to be the best way to do it, but I think this is absolutely something that dbt should support |
hey @michael Kaminsky -- this is a pretty significant departure from how dbt currently works. I just took another scan through here, and I think your description of an all-or-nothing build is most closely aligned with how I think this should work. I think there are still some unanswered questions here, and I think the first step in making headway on this issue is probably making sure that we have most/all of our bases covered. Some questions I have:
|
Great questions @drewbanin ! Going to table bigquery for now and come back to it (I've never used it, so am fuzzy on a bunch of important details that we'd need to solve it). How should incremental models work in this paradigm?Great question, and something I hadn't thought of yet. As I've conceptualized it, there are two issues:
Options:
Easy, but sort of defeats the purpose of this whole exercise as it doesn't ensure consistency during your builds
There will be some performance penalty for models that are downstream of the incremental model, but it's unclear how much (depends on how big the change is and how the data are distributed).
This will be slow (unsure how slow??) for large tables, though notably not nearly as slow as rebuilding the whole table -- we're basically just making redshift to do a lot of extra disk I/O for nothing. Might actually be an okay compromise, but would want to test it with someone who has bigger-data to get a sense of performance impact. How do we interleave tests with models?I don't think we need to implement this yet. I think we should table and come back to it when we want to implement the testing part of this (phase 2). At what “level” are the old/new relations swapped? Do we rename whole schemas/datasets? Or the individual database relations?The way I use dbt, I could / would do it at the schema level (dbt-managed schemas are wholly separate from non-dbt managed schemas). From what I've gathered in the slack channel, not everyone follows this rule so we'd probably have to do everything at the relation / model level. LMK if I'm missing something here, though! I'm thinking this would be trivial as long as we keep track of where everything is in the staging schema and where everything "belongs" but let me know if I'm just not thinking of something. |
@mikekaminsky I really like your I'm always super wary of modifying dbt's materialization code, as we've done seemingly innocuous but wholly incorrect things in the past. There's a whole list of things to know like:
There is more to this list, but I just wanted to sketch out the kind of thinking that goes into a feature like this. These constraints are individually workable, but together, make it challenging to design a feature that 1) operates as intended and 2) uses each database to its fullest potential. In the past, we've built features that weren't implemented on certain databases due to limitations of the database itself. BigQuery, for instance, only got Last, I agree that we should tackle the model building part of this first, then add testing into the mix separately. That change has more to do with how dbt's graph selector includes nodes in a given run, and should "just work" once this "atomic swap" paradigm is implemented. PS I think we'll need to make dbt drop these new relations somehow. A failed run should stick around long enough to debug the problem, and then should be easy to clean up with a dbt command IMO. |
Lots of good thoughts, @drewbanin! I'd be interested in moving this forward without bigquery support (especially because of the draconian ddl limits!!) You definitely have a better perspective on where / how this can go wrong given the diversity of implementations you've seen. I've only really seen a handful (and all of them with redshift), so I just haven't seen these issues crop up. I'll defer to you if this is worth moving forward on! Regarding having DBT clean up, it sort of depends on how you implement it. We could have DBT drop the old, failed relations, or we could also just let those get blown away when the next successful run happens (if we build everything in a dedicated staging schema like Another comment incoming on late-binding views ... |
When building views in atomic mode, we need to make sure they are not late-binding. Let's imagine we have In atomic mode, we run At the end of the run, we run So the view would need to be "regular" in order for this to work. Since we use late-binding views to get around the |
@mikekaminsky I totally buy the premise but I'm unsure about that implementation. For one, relations on Redshift can't be renamed across schemas :) Will keep this in mind, but I think that we'll likely be best served by late binding views in the general case. Redshift (and Postgres) are really the only databases out there that support bound views, so I'd be hesitant to leverage them as a core part of this strategy. |
"Redshift chokes if you look at it funny" quote of the century. There's another potential option @mikekaminsky 's Downsides are that it commits implicitly, and can't be run in a transaction, also needs a vacuum, although Redshift does that automatically these days so don't worry about it. Upsides are that it is a metadata operation and almost instant. CREATE TABLE "tempy" LIKE destination_table
.. do the incremental insert into "tempy"
...create the view over "tempy" & destination_table
... run the tests
--if the tests succeed:
BEGIN TRANSACTION;
DELETE FROM destination_table WHERE exists ( select unique_key from "tempy" table );
ALTER TABLE destination_table APPEND FROM "tempy" table;
COMMIT; There might be a cleverer way to do it that doesn't lead to a timespan where
|
hey @davehowell - cool idea! We played around with
dbt's incremental models currently use |
If you create a permanent (I called it "tempy" because it's permanent but you would drop it later) table using "like" then the properties are identical, you've cloned the metadata of the destination table. That the increment comes from a select statement is inconsequential, because the "append" is from "tempy" to the destination table. If the incremental select can be coerced directly into the destination table then it follows it will successfully be inserted into a table just "like" it. Sure when the incremental model is first run it determines the datatypes of that destination because CTAS, but in this scenario the datatypes have been set in stone already. |
Oh, roger, i think i misread dbt actually does some amount of "column expansion" for incremental models currently. The first time you run an incremental model, it's possible that the type of your In order to determine the datatypes for the newly selected data, we'd probably need to create a temp table from a select statement, inspect the types, then insert into the tempy table, then append into the destination table. That would ameliorate the benefits of the The alternative is of course to not do column expansion (or try to handle changing column type inconsistencies at all!) which is honestly a pretty reasonable answer. It just means that folks might be running a whole lot of --full-refreshes :) In the interest of completeness, I think that another solution to this is to have users specify their model types in their sql statements, eg: select
first_name::varchar(128) as first_name,
last_name::varchar(128) as last_name,
... I think if we imposed this responsibility on users, then the Final thoughts: If you'd like, we can re-open the discussion on #395 to discuss the use of |
Ah yes that would be a problem. I knew about the column expansion, that's bitten me in a weird edge case where the dist-key field became a candidate for expansion, and altering dist-keys is another case where Redshift chokes. Of course, the fix was exactly as you suggested, explicitly casting the column type :) I think column-expansion is really nice for analysts, same with CTAS type-inference generally, so it totally makes sense for the dbt use-case of accelerating the analyst workflow, but I secretly hate inferred types and have started casting explicitly anyway. I also apply some encoding/compression as well with macros on first-run of incremental models. Actually I hate that I have to do so much hand-holding for Redshift and I keep hearing the wonders of Snowflake...but I digress. So yeah I totally agree that column-expansion is a case of being too nice to users, but now it's there you can't really take it away without making analysts face up to the realities of strong types. Maybe allow disabling of that feature, or disabling it when this testing mode is enabled. Or maybe just as you said take the functionality away and make people use explicit types. In most cases they wouldn't be needed anyway. |
IMO this is the correct implementation, very similar to post-hook running it inside transaction. rollingback if something fails. Most of the databases are supporting transactions and it should be possible to run {DML + Testst (Selects)} and do rollback of the transaction. For the databases not supporting transactions it should be up to the macro to implement rollback scenario.
In general i would avoid drop creating target model/tables, or swapping them. for-example: bigquery supports following actions inside transaction and these are sufficient to implement the feature. |
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days. |
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest; add a comment to notify the maintainers. |
Feature
Feature description
There have been a bunch of suggestions int Slack to run tests during model creation, and only persist changes if the tests pass. Here is a sampling of such suggestions:
jdub:
Josh Temple:
Leon:
tmurphy:
Who will this benefit?
A paradigm like this would help ensure that errant transformations are caught before they are deployed to production.
Things to think about:
There's lots more to discuss here too, so please drop your thoughts in the comments below!
The text was updated successfully, but these errors were encountered: