Database Deadlocks with MSSQL 2019 #348

rafaelhofmann · 2022-12-20T13:27:35Z

Prerequisites

I am running the latest version (We are running 11.5)
I checked the documentation and found no answer
I checked to make sure that this issue has not already been filed

For bug reports

When we execute around 300 - 400 tasks per minute, we experience database deadlocks on the table scheduled_tasks.
The deadlock report from the MSSQL database shows a deadlock between either one of the indexes execution_time_idx or last_heartbeat_idx and the clustered primary key of the table.

The deadlock happens when a Custom-Task tries to reschedule itself for another execution (PollingTaskConfiguration:42 in the stacktrace below) with executionOperations.reschedule.

Workaround

For now we have deleted both indexes to stop the deadlocks from happening (so far successfully), but probably at the cost of reduced throughput.

Steps to Reproduce the bug

Currently unclear

Context

DB-Scheduler Version : 11.5
Java Version : 1.8
Spring Boot (check for Yes) : [x]
Database and Version : Microsoft SQL Server 2019
Database Transaction Isolation Level: Read Committed Snapshot

Logs


Failed while completing execution Execution: task=polling-task, id=polling-task-155175, executionTime=2022-12-17T23:28:53.936Z, picked=true, pickedBy=workflow-engine-36-v6vm9, lastHeartbeat=2022-12-17T23:29:03.339Z, version=642. Execution will likely remain scheduled and locked/picked. The execution should be detected as dead after a while, and handled according to the tasks DeadExecutionHandler.
                com.github.kagkarlsson.jdbc.SQLRuntimeException: com.microsoft.sqlserver.jdbc.SQLServerException: Transaction (Process ID 71) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
                at com.github.kagkarlsson.jdbc.JdbcRunner.translateException(JdbcRunner.java:126)
                at com.github.kagkarlsson.jdbc.JdbcRunner.lambda$execute$2(JdbcRunner.java:93)
                at com.github.kagkarlsson.jdbc.JdbcRunner.withConnection(JdbcRunner.java:140)
                at com.github.kagkarlsson.jdbc.JdbcRunner.execute(JdbcRunner.java:66)
                at com.github.kagkarlsson.jdbc.JdbcRunner.execute(JdbcRunner.java:54)
                at com.github.kagkarlsson.scheduler.jdbc.JdbcTaskRepository.rescheduleInternal(JdbcTaskRepository.java:251)
                at com.github.kagkarlsson.scheduler.jdbc.JdbcTaskRepository.reschedule(JdbcTaskRepository.java:247)
                at com.github.kagkarlsson.scheduler.task.ExecutionOperations.reschedule(ExecutionOperations.java:62)
                at XXX.PollingTaskConfiguration.lambda$null$1(PollingTaskConfiguration.java:42)
                at com.github.kagkarlsson.scheduler.ExecutePicked.complete(ExecutePicked.java:104)
                at com.github.kagkarlsson.scheduler.ExecutePicked.executePickedExecution(ExecutePicked.java:88)
                at com.github.kagkarlsson.scheduler.ExecutePicked.run(ExecutePicked.java:68)
                at com.github.kagkarlsson.scheduler.FetchCandidates.lambda$run$1(FetchCandidates.java:89)
                at java.util.Optional.ifPresent(Optional.java:159)
                at com.github.kagkarlsson.scheduler.FetchCandidates.lambda$run$2(FetchCandidates.java:87)
                at com.github.kagkarlsson.scheduler.Executor.lambda$addToQueue$0(Executor.java:52)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                at java.lang.Thread.run(Thread.java:750)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Transaction (Process ID 71) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
                at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:265)
                at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1676)
                at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:615)
                at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:537)
                at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7730)
                at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:3786)
                at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:268)
                at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:242)
                at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.execute(SQLServerPreparedStatement.java:515)
                at com.zaxxer.hikari.pool.ProxyPreparedStatement.execute(ProxyPreparedStatement.java:44)
                at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.execute(HikariProxyPreparedStatement.java)
                at com.github.kagkarlsson.jdbc.JdbcRunner.lambda$execute$2(JdbcRunner.java:85)
                ... 17 common frames omitted

The text was updated successfully, but these errors were encountered:

kagkarlsson · 2022-12-21T07:32:03Z

I do not know what is causing this. I'm surprised a deadlock is happening this frequent. Would you list your db-scheduler config? (the spring boot properties)

kagkarlsson · 2022-12-21T07:35:56Z

Similar to #327

kagkarlsson · 2022-12-21T07:44:45Z

This might be related to issue discussed here: https://stackoverflow.com/a/59196002
MSSQL leaving locks on rows it traverses to find the row to update

rafaelhofmann · 2022-12-21T11:38:59Z

This is our config. We have been using it for more than two years now without any problem. Only when the amount of tasks increased we had the deadlock problem. Roughly around 300 to 400 tasks per minute are the threshold when the problem starts to appear.

db-scheduler:
  delay-startup-until-context-ready: true
  immediate-execution-enabled: true
  polling-interval: 10s
  polling-limit: 5
  threads: 2

I have also attached the MSSQL deadlock report:

<event name="xml_deadlock_report" package="sqlserver" timestamp="2022-12-07T09:01:29.190Z">
    <data name="xml_report">
        <value>
            <deadlock>
                <victim-list>
                    <victimProcess id="process26c6d7ab468" />
                </victim-list>
                <process-list>
                    <process id="process26c6d7ab468" taskpriority="0" logused="1004"
                        waitresource="KEY: 9:72057594058113024 (4c22163c40c7)" waittime="4657" ownerId="5421774626"
                        transactionname="UPDATE" lasttranstarted="2022-12-07T10:01:24.530" XDES="0x261abad0428"
                        lockMode="U" schedulerid="7" kpid="17324" status="suspended" spid="61" sbid="0" ecid="0"
                        priority="0" trancount="2" lastbatchstarted="2022-12-07T10:01:24.530"
                        lastbatchcompleted="2022-12-07T10:01:24.523" lastattention="1900-01-01T00:00:00.523"
                        clientapp="Microsoft JDBC Driver for SQL Server" hostname="workflow-engine-28-cfvv7" hostpid="0"
                        loginname="db-user" isolationlevel="read committed (2)" xactid="5421774626" currentdb="9"
                        currentdbname="DB" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056">
                        <executionStack>
                            <frame procname="adhoc" line="1" stmtstart="198" stmtend="560"
                                sqlhandle="0x0200000056b25017aaa8a864d1f0a432b6a123b48afea3c30000000000000000000000000000000000000000">
                                unknown</frame>
                            <frame procname="unknown" line="1"
                                sqlhandle="0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000">
                                unknown</frame>
                        </executionStack>
                        <inputbuf>
                            (@P0 bit,@P1 nvarchar(4000),@P2 datetime2,@P3 bit,@P4 nvarchar(4000),@P5 nvarchar(4000),@P6
                            bigint)
                            update scheduled_tasks
                            set picked = @P0, picked_by = @P1, last_heartbeat = @P2, version = version + 1
                            where picked = @P3 and task_name = @P4 and task_instance = @P5 and version = @P6
                        </inputbuf>
                    </process>
                    <process id="process241cd639468" taskpriority="0" logused="1352"
                        waitresource="KEY: 9:72057594046775296 (f692f3e2bbb3)" waittime="4657" ownerId="5421774627"
                        transactionname="UPDATE" lasttranstarted="2022-12-07T10:01:24.530" XDES="0x26433a14428"
                        lockMode="U" schedulerid="14" kpid="13960" status="suspended" spid="56" sbid="0" ecid="0"
                        priority="0" trancount="2" lastbatchstarted="2022-12-07T10:01:24.530"
                        lastbatchcompleted="2022-12-07T10:01:24.527" lastattention="1900-01-01T00:00:00.527"
                        clientapp="Microsoft JDBC Driver for SQL Server" hostname="workflow-engine-28-cfvv7" hostpid="0"
                        loginname="db-user" isolationlevel="read committed (2)" xactid="5421774627" currentdb="9"
                        currentdbname="DB" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056">
                        <executionStack>
                            <frame procname="adhoc" line="1" stmtstart="324" stmtend="868"
                                sqlhandle="0x02000000008cbf296485126f49222b7d62069155af5cd8830000000000000000000000000000000000000000">
                                unknown </frame>
                            <frame procname="unknown" line="1"
                                sqlhandle="0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000">
                                unknown </frame>
                        </executionStack>
                        <inputbuf>
                            (@P0 bit,@P1 nvarchar(4000),@P2 datetime2,@P3 datetime2,@P4 datetime2,@P5 int,@P6
                            datetime2,@P7 varbinary(8000),@P8 nvarchar(4000),@P9 nvarchar(4000),@P10 bigint)

                            update scheduled_tasks

                            set picked = @P0, picked_by = @P1, last_heartbeat = @P2, last_success = @P3, last_failure =
                            @P4, consecutive_failures = @P5, execution_time = @P6, task_data = @P7, version = version +
                            1

                            where task_name = @P8 and task_instance = @P9 and version = @P10
                        </inputbuf>
                    </process>
                </process-list>
                <resource-list>
                    <keylock hobtid="72057594058113024" dbid="9" objectname="DB.workflow.scheduled_tasks"
                        indexname="execution_time_idx" id="lock2646d22ce00" mode="X"
                        associatedObjectId="72057594058113024">
                        <owner-list>
                            <owner id="process241cd639468" mode="X" />
                        </owner-list>
                        <waiter-list>
                            <waiter id="process26c6d7ab468" mode="U" requestType="wait" />
                        </waiter-list>
                    </keylock>
                    <keylock hobtid="72057594046775296" dbid="9" objectname="DB.workflow.scheduled_tasks"
                        indexname="PK__schedule__0BE25508E6B4E257" id="lock23e68e7aa80" mode="X"
                        associatedObjectId="72057594046775296">
                        <owner-list>
                            <owner id="process26c6d7ab468" mode="X" />
                        </owner-list>
                        <waiter-list>
                            <waiter id="process241cd639468" mode="U" requestType="wait" />
                        </waiter-list>
                    </keylock>
                </resource-list>
            </deadlock>
        </value>
    </data>
</event>

kagkarlsson · 2022-12-26T19:25:52Z

Would you also paste your table and index definitions for the scheduled_tasks table?

kagkarlsson · 2022-12-26T19:34:59Z

I have no experience with MSSQL, but I am curious if it makes any difference if we make the primary key NONCLUSTERED?

rafaelhofmann · 2023-01-03T14:34:33Z

Would you also paste your table and index definitions for the scheduled_tasks table?

The initial table definition was without the indexes:

create table workflow.scheduled_tasks (
  task_name varchar(250) not null,
  task_instance varchar(250) not null,
  task_data  nvarchar(max),
  execution_time datetimeoffset  not null,
  picked bit,
  picked_by text,
  last_success datetimeoffset ,
  last_failure datetimeoffset ,
  consecutive_failures INT,
  last_heartbeat datetimeoffset ,
  [version] BIGINT not null,
  PRIMARY KEY (task_name, task_instance)
)

We later added both indexes when we updated the library version and saw the change in the release notes:

CREATE INDEX execution_time_idx ON workflow.scheduled_tasks (execution_time);
CREATE INDEX last_heartbeat_idx ON workflow.scheduled_tasks (last_heartbeat);

rafaelhofmann · 2023-01-03T14:35:33Z

I have no experience with MSSQL, but I am curious if it makes any difference if we make the primary key NONCLUSTERED?

Sounds good in theory. We will try to test this on a UAT instance and see if we can reproduce the problem.

rafaelhofmann · 2023-01-15T10:35:46Z

A quick update from my experiments. The non-clustered primary key surprisingly has worsened the situation and leads to a lot more deadlocks on my local MSSQL instance compared to the clustered primary.
Currently I am experimenting with lock hints (more precise: READPAST, ROWLOCK, and UPDLOCK) for the SELECT and UPDATE queries and implementing the lock-and-fetch strategy for MssqlJdbcCustomization.

kagkarlsson · 2023-01-24T11:12:29Z

Not sure but lock-and-fetch for MSSQL possibly needs JdbcRunner to support transaction spanning multiple statement

kagkarlsson · 2023-04-21T08:03:26Z

I was able to reproduce locally when using AzureSqlEdge, but the problem went away when I added with (readpast) to the select due query.

Not getting the problem for the new lock-and-fetch for mssql.

See #371

@kagkarlsson

…unner transaction support. (#371) * Adding support for `lock-and-fetch` aka. `SELECT FOR UPDATE .. SKIP LOCKED` for MSSQL/SqlServer. Though testing has shown that lock-and-fetch are prone to deadlocks, so it is not recommended until that is understood/resolved. However, a query-hint was also added to `fetch-and-lock-on-execute` for MSSQL, and it appears this resolves issues with deadlocks for that strategy. * Updates jdbc-handling dependency `micro-jdbc` * Re-written transaction-handling (in `micro-jdbc`) * Shading `micro-jdbc` dependency * Explicit limit also for SqlServer, MySQL, Oracle * Adds ClusterTest for Sql Server (test concurrency) ## Fixes * #264 * #337 * #348 ? * #273 ? ## Further work / fix later * Mysql v8 skip locked syntax * MariaDB skip locked syntax * Oracle skip locked syntax ## Reminders - [x] Added/ran automated tests - [x] Update README and/or examples --- cc @kagkarlsson

kagkarlsson · 2024-07-01T12:29:12Z

Closing as I am hoping this was fixed with the new query-hints for SQL Server

kagkarlsson mentioned this issue Dec 21, 2022

scheduler.RunUntilShutdown causes a deadlock #327

Open

kagkarlsson mentioned this issue Apr 17, 2023

feat: Support lock-and-fetch for MSSQL. Explicit LIMIT for all. #371

Merged

2 tasks

kagkarlsson mentioned this issue Jul 15, 2023

Pessimistic locking cluster mode #410

Closed

kagkarlsson closed this as completed Jul 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Database Deadlocks with MSSQL 2019 #348

Database Deadlocks with MSSQL 2019 #348

rafaelhofmann commented Dec 20, 2022 •

edited

Loading

kagkarlsson commented Dec 21, 2022

kagkarlsson commented Dec 21, 2022

kagkarlsson commented Dec 21, 2022

rafaelhofmann commented Dec 21, 2022

kagkarlsson commented Dec 26, 2022

kagkarlsson commented Dec 26, 2022

rafaelhofmann commented Jan 3, 2023

rafaelhofmann commented Jan 3, 2023

rafaelhofmann commented Jan 15, 2023

kagkarlsson commented Jan 24, 2023

kagkarlsson commented Apr 21, 2023

kagkarlsson commented Jul 1, 2024

Database Deadlocks with MSSQL 2019 #348

Database Deadlocks with MSSQL 2019 #348

Comments

rafaelhofmann commented Dec 20, 2022 • edited Loading

Prerequisites

For bug reports

Workaround

Steps to Reproduce the bug

Context

Logs

kagkarlsson commented Dec 21, 2022

kagkarlsson commented Dec 21, 2022

kagkarlsson commented Dec 21, 2022

rafaelhofmann commented Dec 21, 2022

kagkarlsson commented Dec 26, 2022

kagkarlsson commented Dec 26, 2022

rafaelhofmann commented Jan 3, 2023

rafaelhofmann commented Jan 3, 2023

rafaelhofmann commented Jan 15, 2023

kagkarlsson commented Jan 24, 2023

kagkarlsson commented Apr 21, 2023

kagkarlsson commented Jul 1, 2024

rafaelhofmann commented Dec 20, 2022 •

edited

Loading