Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port schema to MySQL 5.7 / MariaDB 10.2 #203

Merged
merged 3 commits into from
Jul 19, 2024
Merged

Port schema to MySQL 5.7 / MariaDB 10.2 #203

merged 3 commits into from
Jul 19, 2024

Conversation

@cla-bot cla-bot bot added the cla/signed CLA is signed by all contributors of a PR label May 28, 2024
@Al2Klimov Al2Klimov force-pushed the mysql-schema-154 branch 2 times, most recently from 87b2fb5 to 7bf3c7d Compare May 28, 2024 10:39
Copy link
Collaborator

@julianbrost julianbrost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you spin this up for a test without any other compatibility issues?

schema/pgsql/upgrades/025.sql Outdated Show resolved Hide resolved
schema/pgsql/upgrades/025.sql Outdated Show resolved Hide resolved
schema/mysql/schema.sql Outdated Show resolved Hide resolved
@Al2Klimov
Copy link
Member Author

Good catch! I completely forgot the GHA!

@Al2Klimov Al2Klimov force-pushed the mysql-schema-154 branch 4 times, most recently from 1c25cc3 to 74d69b5 Compare May 29, 2024 13:56
schema/mysql/schema.sql Outdated Show resolved Hide resolved
schema/pgsql/schema.sql Outdated Show resolved Hide resolved
@Al2Klimov Al2Klimov marked this pull request as draft June 3, 2024 14:14
@nilmerg
Copy link
Member

nilmerg commented Jun 7, 2024

Then, why is the username column of contact not case insensitive in neither schema yet?

@Al2Klimov
Copy link
Member Author

Because it's not mandatory for porting the schema. Just an accidental finding.

@nilmerg
Copy link
Member

nilmerg commented Jun 7, 2024

Then open a bug, referencing the Web issue. Don't reference it here, if you don't plan to change it here.

@Al2Klimov
Copy link
Member Author

Whether I change something here depends on whether this PR or Icinga/icinga-notifications-web#196 is completed first.

@nilmerg
Copy link
Member

nilmerg commented Jun 10, 2024

That's fine. But then create a separate issue! It doesn't magically change otherwise. This PR is about the MySQL port, not about fixes in the Postgres schema.

@Al2Klimov Al2Klimov force-pushed the mysql-schema-154 branch 2 times, most recently from 09175f1 to 058eb59 Compare June 14, 2024 11:34
@Al2Klimov Al2Klimov marked this pull request as ready for review June 14, 2024 11:35
@Al2Klimov Al2Klimov force-pushed the mysql-schema-154 branch 2 times, most recently from 50e0d37 to 1690d69 Compare June 17, 2024 13:48
@oxzi
Copy link
Member

oxzi commented Jul 16, 2024

Continuing #203 (review).

Turns out, the DELETE query sets the lock. When removing it, my test does not run into a deadlock.

_, err = tx.NamedExecContext(ctx, `DELETE FROM "object_extra_tag" WHERE "object_id" = :object_id`, extraTag)
if err != nil {
return nil, fmt.Errorf("failed to delete object extra tags: %w", err)
}

To be continued... again.

doc/02-Installation.md Outdated Show resolved Hide resolved
.github/workflows/tests_with_database.yml Outdated Show resolved Hide resolved
.github/workflows/tests_with_database.yml Outdated Show resolved Hide resolved
doc/02-Installation.md Outdated Show resolved Hide resolved
@oxzi
Copy link
Member

oxzi commented Jul 17, 2024

Continuing #203 (review), #203 (comment).

The object_extra_tag table has a composite primary key of the object_id and tag columns, being a binary(32) and varchar(255) respectively.

A SELECT query with a WHERE clause against a specific object_id results in a ref query. However, when creating a DELETE query with the same clause, range is chosen. Note that the latter has no ref.

MariaDB [notifications]> explain select * from object_extra_tag where object_id = 0xc08011a318c4479d2db5b9268b3ff0b7e7c3ea53b2da1aaaabb3378b4d98;
+------+-------------+------------------+------+---------------+---------+---------+-------+------+-------------+
| id   | select_type | table            | type | possible_keys | key     | key_len | ref   | rows | Extra       |
+------+-------------+------------------+------+---------------+---------+---------+-------+------+-------------+
|    1 | SIMPLE      | object_extra_tag | ref  | PRIMARY       | PRIMARY | 32      | const | 1    | Using where |
+------+-------------+------------------+------+---------------+---------+---------+-------+------+-------------+
1 row in set (0.001 sec)

MariaDB [notifications]> explain delete from object_extra_tag where object_id = 0xc08011a318c4479d2db5b9268b3ff0b7e7c3ea53b2da1aaaabb3378b4d98;
+------+-------------+------------------+-------+---------------+---------+---------+------+------+-------------+
| id   | select_type | table            | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+------+-------------+------------------+-------+---------------+---------+---------+------+------+-------------+
|    1 | SIMPLE      | object_extra_tag | range | PRIMARY       | PRIMARY | 32      | NULL | 1    | Using where |
+------+-------------+------------------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.001 sec)

While I am not completely certain, I would guess based on SHOW ENGINE INNODB STATUS that the DELETE query creates a next-key lock. The locked interval seems to start from the largest value up to infinity, also known as supremum.

RECORD LOCKS space id 4595 page no 3 n bits 8 index PRIMARY of table "notifications"."object_extra_tag" trx id 696018 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

After some tinkering, I was able to reproduce this with to conflicting sessions. First, I selected the maximum object_id to insert two values afterwards, to run into the supremum interval.

MariaDB [notifications]> select max(id) from object;
+--------------------------------------------------------------------+
| max(id)                                                            |
+--------------------------------------------------------------------+
| 0xFF5916B73A281B19D0F57696F8973FBF8AD6A7DFB85EF62CD6A64641E4DA2DFF |
+--------------------------------------------------------------------+
1 row in set (0.000 sec)
Transaction 1 Transaction 2
start transaction;
start transaction;
insert into object (id, source_id, name) values (0xFF5916B73A281B19D0F57696F8973FBF8AD6A7DFB85EF62CD6A64641E4DAF00F, 1, 'foo');
insert into object (id, source_id, name) values (0xFF5916B73A281B19D0F57696F8973FBF8AD6A7DFB85EF62CD6A64641E4DAF0FF, 1, 'bar');
delete from object_extra_tag where object_id = 0xFF5916B73A281B19D0F57696F8973FBF8AD6A7DFB85EF62CD6A64641E4DAF00F;
delete from object_extra_tag where object_id = 0xFF5916B73A281B19D0F57696F8973FBF8AD6A7DFB85EF62CD6A64641E4DAF0FF;
insert into object_extra_tag values (0xFF5916B73A281B19D0F57696F8973FBF8AD6A7DFB85EF62CD6A64641E4DAF00F, 'foo', '');
insert into object_extra_tag values (0xFF5916B73A281B19D0F57696F8973FBF8AD6A7DFB85EF62CD6A64641E4DAF0FF, 'foo', '');
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

Furthermore, I tried adding a new INDEX on object_extra_tag(object_id), but this changed nothing. I tried some other options, but those only made things worse. I even achieved an ALL type once.

For comparison, the query plan generated by PostgreSQL looks identical, at least for the selection part.

notifications=# explain select * from object_extra_tag where object_id = '\xefb1df788174bb134c82056ce6a850916e030ed0aefe1f2d7be659e5c1bcf131';
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on object_extra_tag  (cost=4.45..23.19 rows=5 width=56)
   Recheck Cond: (object_id = '\xefb1df788174bb134c82056ce6a850916e030ed0aefe1f2d7be659e5c1bcf131'::bytea)
   ->  Bitmap Index Scan on pk_object_extra_tag  (cost=0.00..4.45 rows=5 width=0)
         Index Cond: (object_id = '\xefb1df788174bb134c82056ce6a850916e030ed0aefe1f2d7be659e5c1bcf131'::bytea)
(4 rows)

notifications=# explain delete from object_extra_tag where object_id = '\xefb1df788174bb134c82056ce6a850916e030ed0aefe1f2d7be659e5c1bcf131';
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Delete on object_extra_tag  (cost=4.45..23.19 rows=0 width=0)
   ->  Bitmap Heap Scan on object_extra_tag  (cost=4.45..23.19 rows=5 width=6)
         Recheck Cond: (object_id = '\xefb1df788174bb134c82056ce6a850916e030ed0aefe1f2d7be659e5c1bcf131'::bytea)
         ->  Bitmap Index Scan on pk_object_extra_tag  (cost=0.00..4.45 rows=5 width=0)
               Index Cond: (object_id = '\xefb1df788174bb134c82056ce6a850916e030ed0aefe1f2d7be659e5c1bcf131'::bytea)
(5 rows)

@julianbrost
Copy link
Collaborator

If MySQL isn't able to handle what we do in concurrent transactions, we have to change something anyways. Even without understanding the exact cause yet, I think there are some obvious things that we can try:

  1. The transaction currently isn't retried on such a failure. That can be added. That sounds like something we'd want anyway but of course it would be nicer if we wouldn't have to rely on this often.
  2. The current implementation was pretty much the easiest one to update the extra tags: delete all existing ones of the object and then (re)insert all. Alternative implementations could be tried so experimentally check if they show better behavior. Something like reading what's currently stored and just applying the delta. Or maybe add something like and "tag" not in (?, ?, ...) to the delete statement (however, I have low hopes for that one).

@Al2Klimov Al2Klimov force-pushed the mysql-schema-154 branch 2 times, most recently from 9653e73 to 0406d24 Compare July 17, 2024 09:16
@oxzi
Copy link
Member

oxzi commented Jul 17, 2024

If MySQL isn't able to handle what we do in concurrent transactions, we have to change something anyways. Even without understanding the exact cause yet, I think there are some obvious things that we can try:

1. The transaction currently isn't retried on such a failure. That can be added. That sounds like something we'd want anyway but of course it would be nicer if we wouldn't have to rely on this often.

Even though I am not the biggest fan of retrying everything, it might be a strategy here. Especially since situations like this could be lurking anywhere in the codebase.

2. The current implementation was pretty much the easiest one to update the extra tags: delete all existing ones of the object and then (re)insert all. Alternative implementations could be tried so experimentally check if they show better behavior. Something like reading what's currently stored and just applying the delta. Or maybe add something like `and "tag" not in (?, ?, ...)` to the `delete` statement (however, I have low hopes for that one).

After brainstorming together with @yhabteab, he suggested the same. Unfortunately, this does not worked either.

Thus, I just moved the logic from the database into the code, which worked, but isn't something I am proud of.

diff --git a/internal/object/object.go b/internal/object/object.go
index fc15c26..3a3e9c9 100644
--- a/internal/object/object.go
+++ b/internal/object/object.go
@@ -114,14 +114,26 @@ func FromEvent(ctx context.Context, db *database.DB, ev *event.Event) (*Object,
 		return nil, fmt.Errorf("failed to upsert object id tags: %w", err)
 	}
 
-	extraTag := &ExtraTagRow{ObjectId: newObject.ID}
-	_, err = tx.NamedExecContext(ctx, `DELETE FROM "object_extra_tag" WHERE "object_id" = :object_id`, extraTag)
+	var tags []string
+	stmt = db.Rebind(`SELECT "tag" FROM "object_extra_tag" WHERE "object_id" = ?`)
+	err = tx.SelectContext(ctx, &tags, stmt, newObject.ID)
 	if err != nil {
-		return nil, fmt.Errorf("failed to delete object extra tags: %w", err)
+		return nil, fmt.Errorf("failed to fetch object_extra_tags: %w", err)
+	}
+
+	stmt = db.Rebind(`DELETE FROM "object_extra_tag" WHERE "object_id" = ? AND "tag" = ?`)
+	for _, tag := range tags {
+		if _, ok := ev.ExtraTags[tag]; ok {
+			continue
+		}
+		_, err := tx.ExecContext(ctx, stmt, newObject.ID, tag)
+		if err != nil {
+			return nil, fmt.Errorf("failed to delete object extra tags: %w", err)
+		}
 	}
 
 	if len(ev.ExtraTags) > 0 {
-		stmt, _ := db.BuildInsertStmt(extraTag)
+		stmt, _ := db.BuildUpsertStmt(&ExtraTagRow{ObjectId: newObject.ID})
 		_, err = tx.NamedExecContext(ctx, stmt, mapToTagRows(newObject.ID, ev.ExtraTags))
 		if err != nil {
 			return nil, fmt.Errorf("failed to insert object extra tags: %w", err)

Edit: This might just work because the DELETE query is never being executed. In our test cases, the extra tags do not change.. D'oh!

@Al2Klimov Al2Klimov force-pushed the mysql-schema-154 branch 2 times, most recently from b686876 to d6c3668 Compare July 17, 2024 09:40
@julianbrost
Copy link
Collaborator

Edit: This might just work because the DELETE query is never being executed. In our test cases, the extra tags do not change.. D'oh!

In general, those should stay unchanged most of the time. So in combination with retrying on deadlocks, this might be good enough.

@oxzi
Copy link
Member

oxzi commented Jul 17, 2024

Edit: This might just work because the DELETE query is never being executed. In our test cases, the extra tags do not change.. D'oh!

In general, those should stay unchanged most of the time. So in combination with retrying on deadlocks, this might be good enough.

Out of curiosity, I changed the code to DELETE each object extra tag directly addressed by both its primary key fields. Well, this also results in a dead lock, as EXPLAIN already tells us.

MariaDB [notifications]> explain select * from object_extra_tag where object_id = 0xFFFEE3DA6EE137D96E7FDD6757EE7518E7B1A6687884F7A993397DD74DC4D683 and tag = 'servicegroup/webserver';
+------+-------------+------------------+-------+---------------+---------+---------+-------------+------+-------+
| id   | select_type | table            | type  | possible_keys | key     | key_len | ref         | rows | Extra |
+------+-------------+------------------+-------+---------------+---------+---------+-------------+------+-------+
|    1 | SIMPLE      | object_extra_tag | const | PRIMARY       | PRIMARY | 1054    | const,const | 1    |       |
+------+-------------+------------------+-------+---------------+---------+---------+-------------+------+-------+
1 row in set (0.001 sec)

MariaDB [notifications]> explain delete from object_extra_tag where object_id = 0xFFFEE3DA6EE137D96E7FDD6757EE7518E7B1A6687884F7A993397DD74DC4D683 and tag = 'servicegroup/webserver';
+------+-------------+------------------+-------+---------------+---------+---------+------+------+-------------+
| id   | select_type | table            | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+------+-------------+------------------+-------+---------------+---------+---------+------+------+-------------+
|    1 | SIMPLE      | object_extra_tag | range | PRIMARY       | PRIMARY | 1054    | NULL | 1    | Using where |
+------+-------------+------------------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.001 sec)

While starting to question what "database engine" even means1, I would second the idea to retry deadlocks.

Footnotes

  1. Maybe just three CSV files in a trench coat.

@Al2Klimov
Copy link
Member Author

While starting to question what "database engine" even means1, I would second the idea to retry deadlocks.

Footnotes

  1. Maybe just three CSV files in a trench coat.

How ironic. You've built up this whole thing on a DB which allows nothing and now a DB which can nothing crosses your way.

schema/mysql/schema.sql Outdated Show resolved Hide resolved
@yhabteab
Copy link
Member

I don't know exactly how that's even possible but here you're.

2024-07-17T16:02:11.160+0200    INFO    incident        Contact "Icinga Admin" role changed from subscriber to manager  {"object": "docker-master!swap", "incident": "#5"}
2024-07-17T16:02:11.185+0200    ERROR   incident        Can't insert incident event to the database     {"object": "docker-master!swap", "incident": "#5", "error": "Error 1062 (23000): Duplicate entry '5-348' for key 'PRIMARY'"}
2024-07-17T16:02:11.185+0200    ERROR   icinga2 Cannot process event    {"source-id": 1, "event": "[time=2024-07-17 16:02:11.150006 +0200 CEST m=+0.912499501 type=\"acknowledgement-set\" severity=]", "error": "can't insert incident event to the database"}

---

2024-07-17T16:08:16.259+0200    ERROR   incident        Can't insert incident event to the database     {"object": "docker-master!swap", "incident": "#5", "error": "Error 1452 (23000): Cannot add or update a child row: a foreign key constraint fails (\"notifications\".\"incident_event\", CONSTRAINT \"fk_incident_event_event\" FOREIGN KEY (\"event_id\") REFERENCES \"event\" (\"id\"))"}
2024-07-17T16:08:16.260+0200    ERROR   icinga2 Cannot process event    {"source-id": 1, "event": "[time=2024-07-17 16:08:16.231294 +0200 CEST m=+0.969004418 type=\"acknowledgement-set\" severity=]", "error": "can't insert incident event to the database"}

@yhabteab
Copy link
Member

I don't know exactly how that's even possible but here you're.

2024-07-17T16:02:11.160+0200    INFO    incident        Contact "Icinga Admin" role changed from subscriber to manager  {"object": "docker-master!swap", "incident": "#5"}
2024-07-17T16:02:11.185+0200    ERROR   incident        Can't insert incident event to the database     {"object": "docker-master!swap", "incident": "#5", "error": "Error 1062 (23000): Duplicate entry '5-348' for key 'PRIMARY'"}
2024-07-17T16:02:11.185+0200    ERROR   icinga2 Cannot process event    {"source-id": 1, "event": "[time=2024-07-17 16:02:11.150006 +0200 CEST m=+0.912499501 type=\"acknowledgement-set\" severity=]", "error": "can't insert incident event to the database"}

---

2024-07-17T16:08:16.259+0200    ERROR   incident        Can't insert incident event to the database     {"object": "docker-master!swap", "incident": "#5", "error": "Error 1452 (23000): Cannot add or update a child row: a foreign key constraint fails (\"notifications\".\"incident_event\", CONSTRAINT \"fk_incident_event_event\" FOREIGN KEY (\"event_id\") REFERENCES \"event\" (\"id\"))"}
2024-07-17T16:08:16.260+0200    ERROR   icinga2 Cannot process event    {"source-id": 1, "event": "[time=2024-07-17 16:08:16.231294 +0200 CEST m=+0.969004418 type=\"acknowledgement-set\" severity=]", "error": "can't insert incident event to the database"}

Fixed in #243 🙈

@julianbrost
Copy link
Collaborator

failed to insert object extra tags: Error 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

Regarding this error, @oxzi pointed the following out to me:

It should currently only be possible to get this error within the test cases. The code issuing the queries running into a conflict are actually done while holding a mutex:

cacheMu.Lock()
defer cacheMu.Unlock()
newObject := new(Object)
object, objectExists := cache[id.String()]
if !objectExists {
newObject = New(db, ev)
newObject.ID = id
} else {
*newObject = *object
newObject.ExtraTags = ev.ExtraTags
newObject.Name = ev.Name
newObject.URL = utils.ToDBString(ev.URL)
if ev.Mute.Valid {
if ev.Mute.Bool {
newObject.MuteReason = utils.ToDBString(ev.MuteReason)
} else {
// The ongoing event unmutes the object, so reset the mute reason to null.
newObject.MuteReason = types.String{}
}
}
}
tx, err := db.BeginTxx(ctx, nil)
if err != nil {
return nil, fmt.Errorf("failed to start object database transaction: %w", err)
}
defer func() { _ = tx.Rollback() }()
stmt, _ := db.BuildUpsertStmt(&Object{})
_, err = tx.NamedExecContext(ctx, stmt, newObject)
if err != nil {
return nil, fmt.Errorf("failed to insert object: %w", err)
}
stmt, _ = db.BuildUpsertStmt(&IdTagRow{})
_, err = tx.NamedExecContext(ctx, stmt, mapToTagRows(newObject.ID, ev.Tags))
if err != nil {
return nil, fmt.Errorf("failed to upsert object id tags: %w", err)
}
extraTag := &ExtraTagRow{ObjectId: newObject.ID}
_, err = tx.NamedExecContext(ctx, `DELETE FROM "object_extra_tag" WHERE "object_id" = :object_id`, extraTag)
if err != nil {
return nil, fmt.Errorf("failed to delete object extra tags: %w", err)
}
if len(ev.ExtraTags) > 0 {
stmt, _ := db.BuildInsertStmt(extraTag)
_, err = tx.NamedExecContext(ctx, stmt, mapToTagRows(newObject.ID, ev.ExtraTags))
if err != nil {
return nil, fmt.Errorf("failed to insert object extra tags: %w", err)
}
}
if err = tx.Commit(); err != nil {
return nil, fmt.Errorf("can't commit object database transaction: %w", err)
}

Thus, an Icinga Notifications daemon won't deadlock with itself here. However, when running the tests, go test can run the tests for different packages in parallel within different processes. Note that t.Parallel() actually only affects the parallelism between tests within one package.

So I took the liberty to prepare a commit (8a824f1) that would be ready to use here that disables this behavior. And indeed, with it, the tests work reliably: https://github.com/Icinga/icinga-notifications/actions/runs/9977349345

Only running a single test case against a given database sounds like a good idea in general, so I consider setting -p 1 there a good idea regardless of this PR.

(There's still the issue that we probably want to be able to perform these database queries in parallel at some point, but that's a story for another day and PR.)

@Al2Klimov Feel free to adopt this commit in this PR, that will save you a rebase (otherwise, I'd create it as an independent PR).

@yhabteab
Copy link
Member

Apart from #203 (comment) looks fine for me, I don't know exactly why you replaced text with mediumtext throughout the schema, even though it's not necessary in most cases.

Al2Klimov and others added 2 commits July 18, 2024 16:55
By default, multiple packages may be tested in parallel in different processes.
Passing -p 1 reduces this to one process to prevent test cases in different
packages from accessing the same database. Note that t.Parallel() only affects
parallelism within one process, i.e.  within the tests of one package.

Co-authored-by: Alvar Penning <alvar.penning@icinga.com>
Copy link
Member

@yhabteab yhabteab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LFTM.

@julianbrost julianbrost merged commit db7fee5 into main Jul 19, 2024
26 checks passed
@julianbrost julianbrost deleted the mysql-schema-154 branch July 19, 2024 08:38
@oxzi
Copy link
Member

oxzi commented Jul 19, 2024

One last thought regarding #203 (comment):

I just had the idea to explicitly use index hints and gave it a last try. First, I have added another INDEX being identical to the primary key.

CREATE INDEX idx_object_extra_tag_object_id_tag ON object_extra_tag(object_id, tag);

Then, I crafted a DELETE statement using this index, which requires the quirky multi-table DELETE syntax:

MariaDB [notifications]> explain delete object_extra_tag.* from object_extra_tag use index(idx_object_extra_tag_object_id_tag) where object_id = 0x00D1C60C34E2B0ADF413EAAFCF31D29CE23A632B57FBC3E7F30312A3F8343571\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: object_extra_tag
         type: ref
possible_keys: idx_object_extra_tag_object_id_tag
          key: idx_object_extra_tag_object_id_tag
      key_len: 32
          ref: const
         rows: 1
        Extra: Using where
1 row in set (0.000 sec)

As one can see, this query would now be of the ref type. Not const, but something1. Thus, I have updated Icinga Notifcations, only to run in the same deadlock again. After manually debugging the query and locks2 I found out that even this ref query creates a next-key lock, potentially hitting the supremum. So far.

Click here to see the abandoned diff

diff --git a/internal/object/object.go b/internal/object/object.go
index 9e54253..be0591e 100644
--- a/internal/object/object.go
+++ b/internal/object/object.go
@@ -114,8 +114,22 @@ func FromEvent(ctx context.Context, db *database.DB, ev *event.Event) (*Object,
                return nil, fmt.Errorf("failed to upsert object id tags: %w", err)
        }

+       if tx.DriverName() == database.MySQL {
+               // Without referring an index, the query will choose the suboptimal "range" join type query strategy. Doing so
+               // will result in a next-key lock, resulting in a potential deadlock.
+               // https://github.com/Icinga/icinga-notifications/pull/203#issuecomment-2232740619
+               //
+               // Note: MySQL/MariaDB only supports indexes for DELETEs when using multi-table statements.
+               // https://dev.mysql.com/doc/refman/8.0/en/index-hints.html
+               stmt = `DELETE object_extra_tag.* FROM "object_extra_tag" USE INDEX (idx_object_extra_tag_object_id_tag)`
+       } else {
+               // PostgreSQL just does the right thing by default :)
+               stmt = `DELETE FROM "object_extra_tag"`
+       }
+       stmt += ` WHERE "object_id" = :object_id`
+
        extraTag := &ExtraTagRow{ObjectId: newObject.ID}
-       _, err = tx.NamedExecContext(ctx, `DELETE FROM "object_extra_tag" WHERE "object_id" = :object_id`, extraTag)
+       _, err = tx.NamedExecContext(ctx, stmt, extraTag)
        if err != nil {
                return nil, fmt.Errorf("failed to delete object extra tags: %w", err)
        }
diff --git a/schema/mysql/schema.sql b/schema/mysql/schema.sql
index 37040cf..8810f14 100644
--- a/schema/mysql/schema.sql
+++ b/schema/mysql/schema.sql
@@ -269,6 +269,10 @@ CREATE TABLE object_extra_tag (
     CONSTRAINT fk_object_extra_tag_object FOREIGN KEY (object_id) REFERENCES object(id)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

+-- This index is identical to the primary key, but is allowed to be used as an index hint in both MySQL and MariaDB.
+-- Using it for DELETE statements optimizes the query's "join type" from "range" to "ref".
+CREATE INDEX idx_object_extra_tag_object_id_tag ON object_extra_tag(object_id, tag);
+
 CREATE TABLE event (
     id bigint NOT NULL AUTO_INCREMENT,
     time bigint NOT NULL,

Footnotes

  1. Creating an index just for object_id without key does not change a thing.

  2. Enabling the InnoDB Monitor and Lock Monitor does help and shows more output in SHOW ENGINE INNODB STATUS, https://mariadb.com/kb/en/innodb-monitors/.

@julianbrost
Copy link
Collaborator

First, I have added another INDEX being identical to the primary key.

I'm not entirely sure what the purpose of that second identical index would have been here. Should it have some or did you just miss the following sentence in the documentation? 😅

To refer to a primary key, use the name PRIMARY.

@oxzi
Copy link
Member

oxzi commented Jul 19, 2024

First, I have added another INDEX being identical to the primary key.

I'm not entirely sure what the purpose of that second identical index would have been here. Should it have some or did you just miss the following sentence in the documentation? 😅

To refer to a primary key, use the name PRIMARY.

Honestly, I was not quite sure if this would work in MariaDB, thus I wanted to be certain. For example, MariaDB's "documentation" does not mention USE KEY or keys at all, https://mariadb.com/kb/en/use-index/, https://mariadb.com/kb/en/index-hints-how-to-force-query-plans/.

I just tested it, and both USE KEY (PRIMARY) and USE INDEX (PRIMARY) do work on MariaDB. However, it "works" as good as the duplicate index does, not very good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla/signed CLA is signed by all contributors of a PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MySQL Schema?
6 participants