
Deadlocks on target server when more than 1 dbt thread #52

Open
IliasIB opened this issue Aug 8, 2023 · 3 comments


IliasIB commented Aug 8, 2023

Expected Behaviour

dbt can run many tests in parallel without causing database deadlocks.
dbt can query information_schema and do DDL on separate threads without deadlocks.

Actual Behaviour

Intermittent deadlocks. So far only observed on dbt build (with tests) and dbt test.

('40001', '[40001] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Transaction (Process ID 237) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. (1205) (SQLExecDirectW)')

Steps to reproduce

dbt test --threads 10

Configuration

A set of small generic dbt tests - I have ~50 tests (see the sketch after this list)
dbt 1.4.6
Fabric Warehouse
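
For context, a minimal sketch of the kind of generic-test configuration described above, assuming standard dbt conventions - the model and column names are hypothetical placeholders; only the test types (unique, not_null, relationships) reflect the actual dbt feature involved:

```yaml
# models/schema.yml -- model and column names are hypothetical;
# each generic test below becomes its own query at `dbt test` time,
# which is what runs concurrently when threads > 1
version: 2

models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: customer_id
        tests:
          - not_null
          - relationships:
              to: ref('stg_customers')
              field: customer_id
```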

Possible causes

This issue seems to be similar to the one for dbt-sqlserver.

Other information

Current workaround: limit dbt threads to 1 (see the sketch below).

Just like in the original dbt-sqlserver issue, the workaround works reliably, and dbt run is unaffected.
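
A minimal sketch of where that limit lives, assuming a standard profiles.yml; the connection fields are illustrative placeholders (consult the adapter docs for the real ones), and the relevant setting is threads. It can also be overridden per invocation, e.g. dbt test --threads 1:

```yaml
# profiles.yml -- connection fields are illustrative placeholders;
# the relevant setting is `threads`, which controls how many
# test/model statements dbt issues against the warehouse in parallel
my_project:
  target: fabric
  outputs:
    fabric:
      type: fabric
      driver: "ODBC Driver 18 for SQL Server"
      server: "<fabric-warehouse-sql-endpoint>"
      database: "<warehouse-name>"
      schema: dbo
      threads: 1  # workaround: serializes the work and avoids the deadlocks
```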

@prdpsvs prdpsvs self-assigned this Aug 24, 2023
@prdpsvs prdpsvs added the bug Something isn't working label Aug 24, 2023

IlonaTag commented Sep 28, 2023

I've experienced the same situation when I run dbt run on a Fabric warehouse with threads > 1. It works fine at first, but after 20-30 models have been processed successfully, the process suddenly hangs in a way that makes the warehouse completely inaccessible. I have to manually kill the hanging/deadlocked insert statements to be able to proceed. I wonder how a deadlock can occur at all, since the models are all different data objects.

The only workaround seems to be setting threads to 1.

I've also tested it in combination with Airflow, where I set up a separate dbt run process for each model. It makes no difference: as soon as I run more than one task in parallel, it works for 20-30 models and then hangs completely. After the first hang, unfortunately, parallelism does not work at all anymore.

I also experienced quite poor performance, especially with incremental models when there are 0 changes; in that case, dbt run seems to run forever. This is hard to understand, because my scenario uses the Adventure Works database, which contains only small volumes of data.

A solution would be highly appreciated.


IlonaTag commented Oct 15, 2023

In the meantime, the situation has changed slightly. I can now run a complete dbt run (with threads=10) from dbt Core without any problem and with good performance; the deadlocks only occur when I run dbt run separately for each single model in parallel via Apache Airflow (with a parallelism of 26). The good thing: the retries work, so it can "heal itself".
[screenshot omitted]


IlonaTag commented Oct 18, 2023

If I also lower the degree of parallelism in Airflow to 10 or less, it runs somewhat more stably (fewer deadlocks). Performance, by the way, is significantly worse than with a single dbt run command.

@nszoni nszoni mentioned this issue Feb 23, 2024