Commit 3cf2c7a
Retry once after getting a deadlock when attempting to decrement a semaphore
This tries to address a tricky deadlock we've seen about once every couple of days,
where three jobs that compete for the semaphore are enqueued at the same time.
One of them wins at creating the semaphore, and the other two transactions acquire
a shared lock on the newly created semaphore row, by key. They then try to upgrade
that lock to an exclusive lock to perform an UPDATE (attempting to decrement the
semaphore), which leads to a deadlock because each of them is waiting for the other
to release its shared lock.
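The lock-upgrade cycle described above can be illustrated with a toy model. This is only a sketch: `RowLock` and the transaction names are hypothetical, and it models nothing of InnoDB beyond the rule that an exclusive request must wait for every other holder of a shared lock on the same row.

```ruby
require "set"

# Toy model of the lock state of a single row (hypothetical, not InnoDB code).
class RowLock
  def initialize
    @shared_holders = Set.new
    @waiting_for = {} # transaction => set of transactions it is blocked by
  end

  # Shared locks are compatible with each other, so this always succeeds.
  def acquire_shared(txn)
    @shared_holders << txn
  end

  # An exclusive request is granted only if no other transaction holds a
  # shared lock; otherwise the requester is recorded as waiting.
  def request_exclusive(txn)
    blockers = @shared_holders - [txn]
    @waiting_for[txn] = blockers unless blockers.empty?
    blockers.empty?
  end

  # True when two transactions are each waiting for the other.
  def deadlock?
    @waiting_for.any? do |txn, blockers|
      blockers.any? { |other| @waiting_for.fetch(other, Set.new).include?(txn) }
    end
  end
end

lock = RowLock.new
lock.acquire_shared(:job_b)
lock.acquire_shared(:job_c)
lock.request_exclusive(:job_b) # blocked by job_c's shared lock
lock.request_exclusive(:job_c) # blocked by job_b's shared lock
puts lock.deadlock?
```

Neither transaction can proceed until the other releases its shared lock, so the database has to roll one of them back, as the log below shows.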
From `SHOW ENGINE INNODB STATUS`:
```
------------------------
LATEST DETECTED DEADLOCK
------------------------
2023-12-27 07:57:28 140410341029440
*** (1) TRANSACTION:
TRANSACTION 1972990032, ACTIVE 1 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 1
MySQL thread id 3012240, OS thread handle 140409154041408, query id 7398762432 bigip-vip-new.rw-ash-int.37signals.com 10.20.0.24 haystack_app updating
UPDATE `solid_queue_semaphores` SET value = value - 1, expires_at = '2023-12-27 08:12:28.002702' WHERE (value > 0) AND `solid_queue_semaphores`.`key` = 'RR::ProcessJob/C/64961261'
*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 14 page no 426 n bits 304 index index_solid_queue_semaphores_on_key of table `haystack_solidqueue_production`.`solid_queue_semaphores` trx id 1972990032 lock mode S
Record lock, heap no 199 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 30; hex 526563656970743a3a526563697069656e743a3a50726f63657373696e67; asc RR::Process; (total 50 bytes);
1: len 8; hex 80000000004224c4; asc B$ ;;
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 14 page no 426 n bits 304 index index_solid_queue_semaphores_on_key of table `haystack_solidqueue_production`.`solid_queue_semaphores` trx id 1972990032 lock_mode X locks rec but not gap waiting
Record lock, heap no 199 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 30; hex 526563656970743a3a526563697069656e743a3a50726f63657373696e67; asc RR::Process; (total 50 bytes);
1: len 8; hex 80000000004224c4; asc B$ ;;
*** (2) TRANSACTION:
TRANSACTION 1972990013, ACTIVE 1 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 1
MySQL thread id 3012575, OS thread handle 140275687212608, query id 7398762530 bigip-vip.sc-chi-int.37signals.com 10.10.0.37 haystack_app updating
UPDATE `solid_queue_semaphores` SET value = value - 1, expires_at = '2023-12-27 08:12:28.007153' WHERE (value > 0) AND `solid_queue_semaphores`.`key` = 'RR::ProcessJob/C/64961261'
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 14 page no 426 n bits 304 index index_solid_queue_semaphores_on_key of table `haystack_solidqueue_production`.`solid_queue_semaphores` trx id 1972990013 lock mode S
Record lock, heap no 199 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 30; hex 526563656970743a3a526563697069656e743a3a50726f63657373696e67; asc RR::Process; (total 50 bytes);
1: len 8; hex 80000000004224c4; asc B$ ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 14 page no 426 n bits 304 index index_solid_queue_semaphores_on_key of table `haystack_solidqueue_production`.`solid_queue_semaphores` trx id 1972990013 lock_mode X locks rec but not gap waiting
Record lock, heap no 199 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 30; hex 526563656970743a3a526563697069656e743a3a50726f63657373696e67; asc RR::Process; (total 50 bytes);
1: len 8; hex 80000000004224c4; asc B$ ;;
*** WE ROLL BACK TRANSACTION (2)
```
With this change, the transaction that gets killed because of the deadlock will
try to wait again, but this time without holding a shared lock: it won't try to
create the semaphore, because we know the semaphore already exists.
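A minimal sketch of the retry-once idea, under stated assumptions: `Deadlocked`, `FakeSemaphoreStore`, and `acquire_slot` are hypothetical names invented for illustration (in a Rails app the error would likely be `ActiveRecord::Deadlocked`), and the in-memory store only imitates the create-then-decrement flow. It is not the code from this commit.

```ruby
# Hypothetical stand-in for the database adapter's deadlock error.
class Deadlocked < StandardError; end

# Hypothetical in-memory stand-in for the semaphores table (one key only).
class FakeSemaphoreStore
  attr_reader :value

  def initialize(limit:, deadlocks: 0)
    @limit = limit
    @value = nil            # nil means the row does not exist yet
    @deadlocks = deadlocks  # number of decrements that will fail with a deadlock
  end

  # The first caller creates the row and claims one slot; later callers get false.
  def create
    return false if @value
    @value = @limit - 1
    true
  end

  # Claims a slot if one is free; may raise a simulated deadlock.
  def decrement
    if @deadlocks > 0
      @deadlocks -= 1
      raise Deadlocked
    end
    return false unless @value && @value > 0
    @value -= 1
    true
  end
end

# First attempt: create or decrement. After a deadlock, retry exactly once
# and skip creation, since the row is known to exist at that point.
def acquire_slot(store)
  store.create || store.decrement
rescue Deadlocked
  store.decrement
end

store = FakeSemaphoreStore.new(limit: 2, deadlocks: 1)
store.create              # another job wins the creation
puts acquire_slot(store)  # deadlocks once, then succeeds on the retry
```

A second deadlock during the retry is not rescued and propagates to the caller, which keeps the retry bounded to a single attempt.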
A problem that could happen here would be something deleting the semaphore while
we're retrying. However, that should be OK, as we only delete semaphores as part
of periodic maintenance, and only expired ones. This retry is necessary when the
semaphore has just been created, so we can assume it won't expire and be deleted
from under us at that very moment.

1 parent 8606ec8 commit 3cf2c7a
2 files changed (+71, -50) in app/models/solid_queue and test/integration