Demotion reloaded #2
Conversation
To implement a new throttling policy for RT cgroups, the existing mechanism is removed from rt.c.
Signed-off-by: Luca Abeni <luca.abeni@santannapisa.it>
Cc: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Alessio Balsini <a.balsini@sssup.it>
The runtime of RT tasks controlled by cgroups is enforced by the SCHED_DEADLINE scheduling class, based on the runtime and period parameters (the deadline is set equal to the period). A sched_dl_entity may also represent a group of RT tasks, providing an rt_rq.
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
Signed-off-by: Luca Abeni <luca.abeni@santannapisa.it>
Cc: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Alessio Balsini <a.balsini@sssup.it>
Add a pointer to the rt_rq the task was on when it was demoted (before any migrations of the demoted task). This allows locking the correct rq when manipulating the rt_se's cfs_throttle_task lists.
Signed-off-by: Andres Oportus <andresoportus@google.com>
…ottled dl (rt group) tasks
/*
	if (running)
		put_prev_task(rq, p);
*/
Why is this commented out?
I replied to the email, but I do not see the replies here... So, here it is again:
this function is invoked by cfs_throttle_rt_tasks(), which is invoked by update_curr_rt().
Invoking put_prev_task() would result in another invocation of update_curr_rt(), potentially causing some issues.
OK. Also, considering that put_prev_task_rt() would only enqueue the task in the pushable list (and we don't want that), it seems safe to remove it.
Now that I remember: I removed it because of a crash I was seeing, which I concluded was caused by infinite recursion (update_curr_rt -> cfs_throttle_rt_tasks -> __setprio_fifo -> put_prev_task_rt -> update_curr_rt -> cfs_throttle_rt_tasks -> ...)
enqueue_task(cpu_rq(cpu), p, ENQUEUE_REPLENISH | ENQUEUE_MOVE | ENQUEUE_RESTORE);

check_class_changed(cpu_rq(cpu), p, prev_class, oldprio);
out:
This label doesn't seem to be used.
Uhmm... Right. I suspect it is a leftover from some previous change; I am going to check.
Ok, I checked:
your original patch contained an

	if (p->sched_class == &rt_sched_class)
		goto out;

near the beginning of __setprio_fifo().
Since I think that entering __setprio_fifo() with sched_class == rt_sched_class should never happen, I changed this to

	BUG_ON(p->sched_class == &rt_sched_class);

but I forgot to remove the "out:" label.
On 22 June 2017 at 12:37, Juri Lelli ***@***.***> wrote:
In kernel/sched/core.c
<#2 (comment)>:
> + const struct sched_class *prev_class;
+
+ lockdep_assert_held(&rq->lock);
+
+ oldprio = p->prio;
+ prev_class = p->sched_class;
+ queued = task_on_rq_queued(p);
+ running = task_current(rq, p);
+ BUG_ON(!rt_throttled(p));
+
+ if (queued)
+ dequeue_task(rq, p, DEQUEUE_SAVE | DEQUEUE_MOVE);
+/*
+ if (running)
+ put_prev_task(rq, p);
+*/
Why is this commented out?
This is the code demoting a task from RT to CFS; it is invoked when the
runtime (of the dl_entity associated with the RT runqueue) becomes
negative. This is done by update_curr_rt(). Invoking put_prev_task() would
invoke update_curr_rt() again, resulting in potential issues.
So, periodic and periodic1 don't seem to have problems (anymore).
Uhm... This crash looks like the previous one... I am going to check if I
can reproduce it
…On 22 June 2017 at 14:21, Juri Lelli ***@***.***> wrote:
So, periodic and periodic1 doesn't seem to have problems (anymore).
But, periodic2 generates the following (when run for over 100 sec):
[ 147.659662] Unable to handle kernel NULL pointer dereference at virtual address 00000038
[ 147.667862] pgd = ffffff800a7ac000
[ 147.671300] [00000038] *pgd=000000007bffe003, *pud=000000007bffe003, *pmd=0000000000000000
[ 147.679683] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[ 147.685326] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.43-HCBS-Demotion-05354-g332859bfbe08-dirty #4
[ 147.694823] Hardware name: HiKey Development Board (DT)
[ 147.700106] task: ffffffc035156100 ti: ffffffc035158000 task.ti: ffffffc035158000
[ 147.707681] PC is at set_next_entity+0x2c/0x10a0
[ 147.712351] LR is at pick_next_task_fair+0xb0/0xd10
[ 147.717281] pc : [<ffffff800810a3d8>] lr : [<ffffff800811908c>] pstate: 600001c5
[ 147.724759] sp : ffffffc03515bd50
[ 147.728107] x29: ffffffc03515bd50 x28: ffffff8008d60428
[ 147.733491] x27: ffffff8008d60000 x26: ffffffc0794a6f80
[ 147.738873] x25: ffffffc035156700 x24: 0000000000000000
[ 147.744254] x23: ffffff8009854000 x22: ffffffc0794a6f98 [ 147.749209] CPU0: update max cpu_capacity 1024
[ 147.753970]
[ 147.755650] x21: ffffffc0794a7038 x20: ffffffc0794a6f80
[ 147.761028] x19: 0000000000000000 x18: 0000000000000000
[ 147.766405] x17: 0000000000000000 x16: 0000000000000000
[ 147.771783] x15: 0000000000000000 x14: 0000000000000000
[ 147.777160] x13: 0000000000000000 x12: 0000000034d5d91d
[ 147.782537] x11: ffffff8008d60420 x10: 0000000000000005
[ 147.787914] x9 : ffffff80098f7000 x8 : 0000000000000004
[ 147.793292] x7 : ffffff8008d40980 x6 : 0000000000000000
[ 147.798668] x5 : 0000000000000080 x4 : ffffff8008118fdc
[ 147.804044] x3 : 0000000000000001 x2 : ffffff80081069ec
[ 147.809421] x1 : 0000000000000000 x0 : ffffff800811908c
[ 147.814801]
[ 147.814801] SP: 0xffffffc03515bcd0:
[ 147.819817] bcd0 794a6f98 ffffffc0 09854000 ffffff80 00000000 00000000 35156700 ffffffc0
[ 147.828134] bcf0 794a6f80 ffffffc0 08d60000 ffffff80 08d60428 ffffff80 3515bd50 ffffffc0
[ 147.836448] bd10 0811908c ffffff80 3515bd50 ffffffc0 0810a3d8 ffffff80 600001c5 00000000
[ 147.844763] bd30 3515bd80 ffffffc0 081316d4 ffffff80 ffffffff ffffffff 35158000 ffffffc0
[ 147.853079] bd50 3515bde0 ffffffc0 0811908c ffffff80 00000000 00000000 794a6f80 ffffffc0
[ 147.861393] bd70 794a7038 ffffffc0 794a6f98 ffffffc0 09854000 ffffff80 00000000 00000000
[ 147.869708] bd90 35156700 ffffffc0 794a6f80 ffffffc0 08d60000 ffffff80 08d60428 ffffff80
[ 147.878024] bdb0 794a7038 ffffffc0 794a6f80 ffffffc0 794a7038 ffffffc0 794a6f98 ffffffc0
[ 147.886347]
[ 147.886347] X20: 0xffffffc0794a6f00:
[ 147.891451] 6f00 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 147.899767] 6f20 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 147.908082] 6f40 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 147.916397] 6f60 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 147.924713] 6f80 f009f008 dead4ead 00000003 00000000 35156100 ffffffc0 099b0f60 ffffff80
[ 147.933028] 6fa0 09bcd430 ffffff80 09bdbac0 ffffff80 0900fe08 ffffff80 00000003 00000000
[ 147.941343] 6fc0 080fc5e8 ffffff80 00000001 00000000 00000000 00000000 00000000 00000000
[ 147.949658] 6fe0 00000004 00000000 00000010 00000000 0000001b 00000000 ffff6b0f 00000000
[ 147.957974]
[ 147.957974] X21: 0xffffffc0794a6fb8:
[ 147.963078] 6fb8 00000003 00000000 080fc5e8 ffffff80 00000001 00000000 00000000 00000000
[ 147.971394] 6fd8 00000000 00000000 00000004 00000000 00000010 00000000 0000001b 00000000
[ 147.979709] 6ff8 ffff6b0f 00000000 00000000 00000000 00000000 00000000 00000001 00000000
[ 147.988025] 7018 000000ce 00000000 00000000 00000000 000043b8 00000000 00007b88 00000000
[ 147.996339] 7038 000000ce 00000000 00000000 00000000 00000001 00000001 7db4db90 00000008
[ 148.004654] 7058 18552f7c 00000034 44a07210 ffffffc0 00000000 00000000 00000000 00000000
[ 148.012969] 7078 00000000 00000000 00000000 00000000 00000000 00000000 0000001e 00000000
[ 148.021284] 7098 6132d986 00000022 002356fe 00000000 00b376ae 00000052 00000030 00000000
[ 148.029600]
[ 148.029600] X22: 0xffffffc0794a6f18:
[ 148.034704] 6f18 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 148.043018] 6f38 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 148.051334] 6f58 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 148.059649] 6f78 00000000 00000000 f009f008 dead4ead 00000003 00000000 35156100 ffffffc0
[ 148.067964] 6f98 099b0f60 ffffff80 09bcd430 ffffff80 09bdbac0 ffffff80 0900fe08 ffffff80
[ 148.076279] 6fb8 00000003 00000000 080fc5e8 ffffff80 00000001 00000000 00000000 00000000
[ 148.084594] 6fd8 00000000 00000000 00000004 00000000 00000010 00000000 0000001b 00000000
[ 148.092909] 6ff8 ffff6b0f 00000000 00000000 00000000 00000000 00000000 00000001 00000000
[ 148.101226]
[ 148.101226] X25: 0xffffffc035156680:
[ 148.106330] 6680 00000001 00000000 00000000 00000000 00000001 00000000 00000000 00000000
[ 148.114645] 66a0 00000000 00000000 00000000 00000000 00000000 dead4ead ffffffff 00000000
[ 148.122960] 66c0 ffffffff ffffffff 099ad6e8 ffffff80 00000000 00000000 00000000 00000000
[ 148.131276] 66e0 0900a3b0 ffffff80 00000000 00000000 00000000 00000000 00000000 00000000
[ 148.139592] 6700 00002b64 00000000 03938700 00000000 03938700 00000000 00000000 00000000
[ 148.147907] 6720 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 148.156223] 6740 35156740 ffffffc0 35156740 ffffffc0 35156750 ffffffc0 35156750 ffffffc0
[ 148.164539] 6760 35156760 ffffffc0 35156760 ffffffc0 00000000 00000000 3d922040 ffffffc0
[ 148.172857]
[ 148.172857] X26: 0xffffffc0794a6f00:
[ 148.177960] 6f00 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 148.186275] 6f20 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 148.194590] 6f40 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 148.202905] 6f60 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 148.211220] 6f80 f009f008 dead4ead 00000003 00000000 35156100 ffffffc0 099b0f60 ffffff80
[ 148.219535] 6fa0 09bcd430 ffffff80 09bdbac0 ffffff80 0900fe08 ffffff80 00000003 00000000
[ 148.227850] 6fc0 080fc5e8 ffffff80 00000001 00000000 00000000 00000000 00000000 00000000
[ 148.236165] 6fe0 00000004 00000000 00000010 00000000 0000001b 00000000 ffff6b0f 00000000
[ 148.244482]
[ 148.244482] X29: 0xffffffc03515bcd0:
[ 148.249585] bcd0 794a6f98 ffffffc0 09854000 ffffff80 00000000 00000000 35156700 ffffffc0
[ 148.257900] bcf0 794a6f80 ffffffc0 08d60000 ffffff80 08d60428 ffffff80 3515bd50 ffffffc0
[ 148.266215] bd10 0811908c ffffff80 3515bd50 ffffffc0 0810a3d8 ffffff80 600001c5 00000000
[ 148.274531] bd30 3515bd80 ffffffc0 081316d4 ffffff80 ffffffff ffffffff 35158000 ffffffc0
[ 148.282846] bd50 3515bde0 ffffffc0 0811908c ffffff80 00000000 00000000 794a6f80 ffffffc0
[ 148.291161] bd70 794a7038 ffffffc0 794a6f98 ffffffc0 09854000 ffffff80 00000000 00000000
[ 148.299477] bd90 35156700 ffffffc0 794a6f80 ffffffc0 08d60000 ffffff80 08d60428 ffffff80
[ 148.307792] bdb0 794a7038 ffffffc0 794a6f80 ffffffc0 794a7038 ffffffc0 794a6f98 ffffffc0
[ 148.316107]
[ 148.317611] Process swapper/3 (pid: 0, stack limit = 0xffffffc035158020)
[ 148.324384] Stack: (0xffffffc03515bd50 to 0xffffffc03515c000)
[ 148.330191] bd40: ffffffc03515bde0 ffffff800811908c
[ 148.338107] bd60: 0000000000000000 ffffffc0794a6f80 ffffffc0794a7038 ffffffc0794a6f98
[ 148.346022] bd80: ffffff8009854000 0000000000000000 ffffffc035156700 ffffffc0794a6f80
[ 148.353937] bda0: ffffff8008d60000 ffffff8008d60428 ffffffc0794a7038 ffffffc0794a6f80
[ 148.361852] bdc0: ffffffc0794a7038 ffffffc0794a6f98 ffffff8009854000 ffffff80081316d4
[ 148.369767] bde0: ffffffc03515be90 ffffff8008d40cb4 ffffffc0794a6f80 ffffffc035156100
[ 148.377683] be00: 0000000000000000 ffffffc0794a6f98 ffffff8009854000 0000000000000000
[ 148.385598] be20: ffffffc035156700 ffffffc0794a6f80 ffffff8008d60000 ffffff8008d60428
[ 148.393513] be40: ffffff80093def80 ffffffc035156100 ffffffc0794a7038 ffffff80098f7cd0
[ 148.401428] be60: ffffffc0794a7038 ffffff8008d605f0 ffffffc000000000 ffffffc035156100
[ 148.409343] be80: ffffffc0794a6f80 ffffffc035156100 ffffffc03515bf20 ffffff8008d41574
[ 148.417258] bea0: ffffffc035158000 ffffff8008d5f000 ffffff80099a0000 ffffffc071d94400
[ 148.425173] bec0: ffffff8009946ab8 ffffff8009218cc0 ffffff80093ddc50 ffffffc035158000
[ 148.433088] bee0: ffffff800999e000 ffffff8009852000 ffffffc03515bf20 ffffff8008d4156c
[ 148.441004] bf00: ffffffc035158000 ffffff8008d5f000 ffffff80099a0000 ffffff8008d41574
[ 148.448919] bf20: ffffffc03515bf40 ffffff8008d415f0 ffffff8009852000 ffffff8008d5f000
[ 148.456834] bf40: ffffffc03515bf50 ffffff8008121754 ffffffc03515bfc0 ffffff8008090e64
[ 148.464749] bf60: 0000000000000003 ffffff800989e080 ffffffc035158000 0000000000000000
[ 148.472663] bf80: 0000000000000000 0000000000000000 00000000027a9000 00000000027ac000
[ 148.480579] bfa0: ffffff80080828d0 0000000000000000 00000000ffffffff ffffffc035158000
[ 148.488494] bfc0: 0000000000000000 0000000000d4d03c 0000000034d5d91d 0000000000000e12
[ 148.496409] bfe0: 0000000000000000 0000000000000000 00ee003e00e900a5 e9db62ffd3fb42ff
[ 148.504322] Call trace:
[ 148.506793] Exception stack(0xffffffc03515bb80 to 0xffffffc03515bcb0)
[ 148.513303] bb80: 0000000000000000 0000008000000000 ffffffc03515bd50 ffffff800810a3d8
[ 148.521218] bba0: 0000000000000055 0000000000000114 ffffffc03515bcd0 ffffff8008136434
[ 148.529134] bbc0: ffffffc035158000 ffffff800a6f0288 0000000000000000 0000000000000000
[ 148.537048] bbe0: 0000000000000002 0000000000000001 0000000000000000 ffffff800816cae8
[ 148.544963] bc00: 00000000000001c0 ffffff80099a0468 0000000000000000 0000000000000000
[ 148.552878] bc20: ffffff800811908c 0000000000000000 ffffff80081069ec 0000000000000001
[ 148.560793] bc40: ffffff8008118fdc 0000000000000080 0000000000000000 ffffff8008d40980
[ 148.568708] bc60: 0000000000000004 ffffff80098f7000 0000000000000005 ffffff8008d60420
[ 148.576622] bc80: 0000000034d5d91d 0000000000000000 0000000000000000 0000000000000000
[ 148.584536] bca0: 0000000000000000 0000000000000000
[ 148.589468] [<ffffff800810a3d8>] set_next_entity+0x2c/0x10a0
[ 148.595189] [<ffffff800811908c>] pick_next_task_fair+0xb0/0xd10
[ 148.601176] [<ffffff8008d40cb4>] __schedule+0x420/0xc10
[ 148.606458] [<ffffff8008d41574>] schedule+0x40/0xa0
[ 148.611389] [<ffffff8008d415f0>] schedule_preempt_disabled+0x1c/0x2c
[ 148.617815] [<ffffff8008121754>] cpu_startup_entry+0x13c/0x464
[ 148.623713] [<ffffff8008090e64>] secondary_start_kernel+0x164/0x1b4
[ 148.630046] [<0000000000d4d03c>] 0xd4d03c
[ 148.634099] Code: aa0103f3 aa0003f5 aa1e03e0 d503201f (b9403a60)
[ 148.749686] BUG: spinlock lockup suspected on CPU#0, kworker/0:1/578
[ 148.756117] lock: 0xffffffc0794a6f80, .magic: dead4ead, .owner: swapper/3/0, .owner_cpu: 3
[ 148.764563] CPU: 0 PID: 578 Comm: kworker/0:1 Tainted: G D 4.4.43-HCBS-Demotion-05354-g332859bfbe08-dirty #4
[ 148.775637] Hardware name: HiKey Development Board (DT)
[ 148.780924] Workqueue: events_freezable thermal_zone_device_check
[ 148.787086] Call trace:
[ 148.789559] [<ffffff800808ae98>] dump_backtrace+0x0/0x1e0
[ 148.795016] [<ffffff800808b098>] show_stack+0x20/0x28
[ 148.800125] [<ffffff8008553374>] dump_stack+0xa8/0xe0
[ 148.805231] [<ffffff800813a4d4>] spin_dump+0x78/0x9c
[ 148.810248] [<ffffff800813a7c8>] do_raw_spin_lock+0x180/0x1b4
[ 148.816057] [<ffffff8008d46fb4>] _raw_spin_lock_irqsave+0x78/0x98
[ 148.822217] [<ffffff8008123a60>] cpufreq_notifier_trans+0x128/0x14c
[ 148.828552] [<ffffff80080ef154>] notifier_call_chain+0x64/0x9c
[ 148.834449] [<ffffff80080efbdc>] __srcu_notifier_call_chain+0xa0/0xf0
[ 148.840958] [<ffffff80080efc64>] srcu_notifier_call_chain+0x38/0x44
[ 148.847296] [<ffffff80088f5644>] cpufreq_notify_transition+0xfc/0x2e0
[ 148.853807] [<ffffff80088f7bec>] cpufreq_freq_transition_end+0x3c/0xb0
[ 148.860405] [<ffffff80088f84a0>] __cpufreq_driver_target+0x1dc/0x320
[ 148.866829] [<ffffff80088fa460>] cpufreq_governor_performance+0x50/0x60
[ 148.873516] [<ffffff80088f6034>] __cpufreq_governor+0xb8/0x1ec
[ 148.879411] [<ffffff80088f6994>] cpufreq_set_policy+0x2ac/0x3f0
[ 148.885394] [<ffffff80088f9164>] cpufreq_update_policy+0x84/0x114
[ 148.891555] [<ffffff80088da4ec>] cpufreq_set_cur_state+0x64/0x94
[ 148.897626] [<ffffff80088d4ca4>] thermal_cdev_update.part.26+0x9c/0x22c
[ 148.904312] [<ffffff80088d5b48>] power_actor_set_power+0x70/0x9c
[ 148.910384] [<ffffff80088d9bc0>] power_allocator_throttle+0x4c8/0xad8
[ 148.916893] [<ffffff80088d4e9c>] handle_thermal_trip.part.21+0x68/0x334
[ 148.923579] [<ffffff80088d56e4>] thermal_zone_device_update+0xb8/0x280
[ 148.930177] [<ffffff80088d58cc>] thermal_zone_device_check+0x20/0x2c
[ 148.936601] [<ffffff80080e55a8>] process_one_work+0x1f8/0x70c
[ 148.942408] [<ffffff80080e5bf8>] worker_thread+0x13c/0x4a4
[ 148.947953] [<ffffff80080ed5cc>] kthread+0xe8/0xfc
[ 148.952796] [<ffffff8008085ed0>] ret_from_fork+0x10/0x40
[ 149.166404] BUG: spinlock lockup suspected on CPU#4, periodic2.sh/2858
[ 149.173013] lock: 0xffffffc0794a6f80, .magic: dead4ead, .owner: swapper/3/0, .owner_cpu: 3
[ 149.181457] CPU: 4 PID: 2858 Comm: periodic2.sh Tainted: G D 4.4.43-HCBS-Demotion-05354-g332859bfbe08-dirty #4
[ 149.192707] Hardware name: HiKey Development Board (DT)
[ 149.197985] Call trace:
[ 149.200458] [<ffffff800808ae98>] dump_backtrace+0x0/0x1e0
[ 149.205915] [<ffffff800808b098>] show_stack+0x20/0x28
[ 149.211021] [<ffffff8008553374>] dump_stack+0xa8/0xe0
[ 149.216126] [<ffffff800813a4d4>] spin_dump+0x78/0x9c
[ 149.221144] [<ffffff800813a7c8>] do_raw_spin_lock+0x180/0x1b4
[ 149.226952] [<ffffff8008d46f20>] _raw_spin_lock+0x6c/0x88
[ 149.232411] [<ffffff80080fae64>] __task_rq_lock+0x58/0xdc
[ 149.237868] [<ffffff80080ffb10>] wake_up_new_task+0xdc/0x318
[ 149.243588] [<ffffff80080c5f14>] _do_fork+0xfc/0x6f0
[ 149.248606] [<ffffff80080c6658>] SyS_clone+0x44/0x50
[ 149.253623] [<ffffff8008085f30>] el0_svc_naked+0x24/0x28
[ 149.640303] BUG: spinlock lockup suspected on CPU#3, swapper/3/0
[ 149.646376] lock: 0xffffffc0794a6f80, .magic: dead4ead, .owner: swapper/3/0, .owner_cpu: 3
[ 149.654819] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G D 4.4.43-HCBS-Demotion-05354-g332859bfbe08-dirty #4
[ 149.665542] Hardware name: HiKey Development Board (DT)
[ 149.670820] Call trace:
[ 149.673291] [<ffffff800808ae98>] dump_backtrace+0x0/0x1e0
[ 149.678748] [<ffffff800808b098>] show_stack+0x20/0x28
[ 149.683853] [<ffffff8008553374>] dump_stack+0xa8/0xe0
[ 149.688959] [<ffffff800813a4d4>] spin_dump+0x78/0x9c
[ 149.693976] [<ffffff800813a7c8>] do_raw_spin_lock+0x180/0x1b4
[ 149.699783] [<ffffff8008d46f20>] _raw_spin_lock+0x6c/0x88
[ 149.705240] [<ffffff80080fc5e8>] scheduler_tick+0x50/0x2ac
[ 149.710785] [<ffffff800815cf78>] update_process_times+0x58/0x70
[ 149.716769] [<ffffff80081703a4>] tick_sched_timer+0x7c/0xfc
[ 149.722401] [<ffffff800815d5b8>] __hrtimer_run_queues+0x164/0x624
[ 149.728560] [<ffffff800815eb74>] hrtimer_interrupt+0xb0/0x1f4
[ 149.734369] [<ffffff8008933150>] arch_timer_handler_phys+0x3c/0x48
[ 149.740618] [<ffffff8008149e90>] handle_percpu_devid_irq+0xe8/0x3d0
[ 149.746954] [<ffffff8008145104>] generic_handle_irq+0x34/0x4c
[ 149.752761] [<ffffff80081451ac>] __handle_domain_irq+0x90/0xf8
[ 149.758656] [<ffffff8008082544>] gic_handle_irq+0x64/0xc4
[ 149.764113] Exception stack(0xffffffc0792e0050 to 0xffffffc0792e0180)
[ 149.770622] 0040: ffffffc03515b900 0000008000000000
[ 149.778537] 0060: ffffffc03515ba30 ffffff8008d47230 0000000060000145 ffffffc035156100
[ 149.786452] 0080: ffffffc03515ba30 ffffffc03515b900 0000000000000000 0000000000000000
[ 149.794366] 00a0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 149.802281] 00c0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 149.810196] 00e0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 149.818111] 0100: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 149.826026] 0120: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 149.833940] 0140: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 149.841855] 0160: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 149.849770] [<ffffff80080857b8>] el1_irq+0xb8/0x130
[ 149.854699] [<ffffff8008d47230>] _raw_spin_unlock_irq+0x3c/0x74
[ 149.860682] [<ffffff800808b168>] die+0xc8/0x1b4
[ 149.865262] [<ffffff800809b36c>] __do_kernel_fault.part.6+0x7c/0x90
[ 149.871601] [<ffffff80080986a0>] do_translation_fault+0x0/0xec
[ 149.877498] [<ffffff8008098768>] do_translation_fault+0xc8/0xec
[ 149.883481] [<ffffff80080822e0>] do_mem_abort+0x54/0xb4
[ 149.888761] Exception stack(0xffffffc03515bb80 to 0xffffffc03515bcb0)
[ 149.895272] bb80: 0000000000000000 0000008000000000 ffffffc03515bd50 ffffff800810a3d8
[ 149.903187] bba0: 0000000000000055 0000000000000114 ffffffc03515bcd0 ffffff8008136434
[ 149.911102] bbc0: ffffffc035158000 ffffff800a6f0288 0000000000000000 0000000000000000
[ 149.919017] bbe0: 0000000000000002 0000000000000001 0000000000000000 ffffff800816cae8
[ 149.926932] bc00: 00000000000001c0 ffffff80099a0468 0000000000000000 0000000000000000
[ 149.934847] bc20: ffffff800811908c 0000000000000000 ffffff80081069ec 0000000000000001
[ 149.942762] bc40: ffffff8008118fdc 0000000000000080 0000000000000000 ffffff8008d40980
[ 149.950676] bc60: 0000000000000004 ffffff80098f7000 0000000000000005 ffffff8008d60420
[ 149.958591] bc80: 0000000034d5d91d 0000000000000000 0000000000000000 0000000000000000
[ 149.966505] bca0: 0000000000000000 0000000000000000
[ 149.971434] [<ffffff80080855c8>] el1_da+0x18/0x78
[ 149.976189] [<ffffff800811908c>] pick_next_task_fair+0xb0/0xd10
[ 149.982174] [<ffffff8008d40cb4>] __schedule+0x420/0xc10
[ 149.987456] [<ffffff8008d41574>] schedule+0x40/0xa0
[ 149.992387] [<ffffff8008d415f0>] schedule_preempt_disabled+0x1c/0x2c
[ 149.998809] [<ffffff8008121754>] cpu_startup_entry+0x13c/0x464
[ 150.004705] [<ffffff8008090e64>] secondary_start_kernel+0x164/0x1b4
[ 150.011038] [<0000000000d4d03c>] 0xd4d03c
So, it crashes after it finishes switching between RT and OTHER?
Luca
[ 148.544963] bc00: 00000000000001c0 ffffff80099a0468 0000000000000000 0000000000000000
[ 148.552878] bc20: ffffff800811908c 0000000000000000 ffffff80081069ec 0000000000000001
[ 148.560793] bc40: ffffff8008118fdc 0000000000000080 0000000000000000 ffffff8008d40980
[ 148.568708] bc60: 0000000000000004 ffffff80098f7000 0000000000000005 ffffff8008d60420
[ 148.576622] bc80: 0000000034d5d91d 0000000000000000 0000000000000000 0000000000000000
[ 148.584536] bca0: 0000000000000000 0000000000000000
[ 148.589468] [<ffffff800810a3d8>] set_next_entity+0x2c/0x10a0
[ 148.595189] [<ffffff800811908c>] pick_next_task_fair+0xb0/0xd10
[ 148.601176] [<ffffff8008d40cb4>] __schedule+0x420/0xc10
[ 148.606458] [<ffffff8008d41574>] schedule+0x40/0xa0
[ 148.611389] [<ffffff8008d415f0>] schedule_preempt_disabled+0x1c/0x2c
[ 148.617815] [<ffffff8008121754>] cpu_startup_entry+0x13c/0x464
[ 148.623713] [<ffffff8008090e64>] secondary_start_kernel+0x164/0x1b4
[ 148.630046] [<0000000000d4d03c>] 0xd4d03c
[ 148.634099] Code: aa0103f3 aa0003f5 aa1e03e0 d503201f (b9403a60)
[ 148.749686] BUG: spinlock lockup suspected on CPU#0, kworker/0:1/578
[ 148.756117] lock: 0xffffffc0794a6f80, .magic: dead4ead, .owner: swapper/3/0, .owner_cpu: 3
[ 148.764563] CPU: 0 PID: 578 Comm: kworker/0:1 Tainted: G D 4.4.43-HCBS-Demotion-05354-g332859bfbe08-dirty #4
[ 148.775637] Hardware name: HiKey Development Board (DT)
[ 148.780924] Workqueue: events_freezable thermal_zone_device_check
[ 148.787086] Call trace:
[ 148.789559] [<ffffff800808ae98>] dump_backtrace+0x0/0x1e0
[ 148.795016] [<ffffff800808b098>] show_stack+0x20/0x28
[ 148.800125] [<ffffff8008553374>] dump_stack+0xa8/0xe0
[ 148.805231] [<ffffff800813a4d4>] spin_dump+0x78/0x9c
[ 148.810248] [<ffffff800813a7c8>] do_raw_spin_lock+0x180/0x1b4
[ 148.816057] [<ffffff8008d46fb4>] _raw_spin_lock_irqsave+0x78/0x98
[ 148.822217] [<ffffff8008123a60>] cpufreq_notifier_trans+0x128/0x14c
[ 148.828552] [<ffffff80080ef154>] notifier_call_chain+0x64/0x9c
[ 148.834449] [<ffffff80080efbdc>] __srcu_notifier_call_chain+0xa0/0xf0
[ 148.840958] [<ffffff80080efc64>] srcu_notifier_call_chain+0x38/0x44
[ 148.847296] [<ffffff80088f5644>] cpufreq_notify_transition+0xfc/0x2e0
[ 148.853807] [<ffffff80088f7bec>] cpufreq_freq_transition_end+0x3c/0xb0
[ 148.860405] [<ffffff80088f84a0>] __cpufreq_driver_target+0x1dc/0x320
[ 148.866829] [<ffffff80088fa460>] cpufreq_governor_performance+0x50/0x60
[ 148.873516] [<ffffff80088f6034>] __cpufreq_governor+0xb8/0x1ec
[ 148.879411] [<ffffff80088f6994>] cpufreq_set_policy+0x2ac/0x3f0
[ 148.885394] [<ffffff80088f9164>] cpufreq_update_policy+0x84/0x114
[ 148.891555] [<ffffff80088da4ec>] cpufreq_set_cur_state+0x64/0x94
[ 148.897626] [<ffffff80088d4ca4>] thermal_cdev_update.part.26+0x9c/0x22c
[ 148.904312] [<ffffff80088d5b48>] power_actor_set_power+0x70/0x9c
[ 148.910384] [<ffffff80088d9bc0>] power_allocator_throttle+0x4c8/0xad8
[ 148.916893] [<ffffff80088d4e9c>] handle_thermal_trip.part.21+0x68/0x334
[ 148.923579] [<ffffff80088d56e4>] thermal_zone_device_update+0xb8/0x280
[ 148.930177] [<ffffff80088d58cc>] thermal_zone_device_check+0x20/0x2c
[ 148.936601] [<ffffff80080e55a8>] process_one_work+0x1f8/0x70c
[ 148.942408] [<ffffff80080e5bf8>] worker_thread+0x13c/0x4a4
[ 148.947953] [<ffffff80080ed5cc>] kthread+0xe8/0xfc
[ 148.952796] [<ffffff8008085ed0>] ret_from_fork+0x10/0x40
[ 149.166404] BUG: spinlock lockup suspected on CPU#4, periodic2.sh/2858
[ 149.173013] lock: 0xffffffc0794a6f80, .magic: dead4ead, .owner: swapper/3/0, .owner_cpu: 3
[ 149.181457] CPU: 4 PID: 2858 Comm: periodic2.sh Tainted: G D 4.4.43-HCBS-Demotion-05354-g332859bfbe08-dirty #4
[ 149.192707] Hardware name: HiKey Development Board (DT)
[ 149.197985] Call trace:
[ 149.200458] [<ffffff800808ae98>] dump_backtrace+0x0/0x1e0
[ 149.205915] [<ffffff800808b098>] show_stack+0x20/0x28
[ 149.211021] [<ffffff8008553374>] dump_stack+0xa8/0xe0
[ 149.216126] [<ffffff800813a4d4>] spin_dump+0x78/0x9c
[ 149.221144] [<ffffff800813a7c8>] do_raw_spin_lock+0x180/0x1b4
[ 149.226952] [<ffffff8008d46f20>] _raw_spin_lock+0x6c/0x88
[ 149.232411] [<ffffff80080fae64>] __task_rq_lock+0x58/0xdc
[ 149.237868] [<ffffff80080ffb10>] wake_up_new_task+0xdc/0x318
[ 149.243588] [<ffffff80080c5f14>] _do_fork+0xfc/0x6f0
[ 149.248606] [<ffffff80080c6658>] SyS_clone+0x44/0x50
[ 149.253623] [<ffffff8008085f30>] el0_svc_naked+0x24/0x28
[ 149.640303] BUG: spinlock lockup suspected on CPU#3, swapper/3/0
[ 149.646376] lock: 0xffffffc0794a6f80, .magic: dead4ead, .owner: swapper/3/0, .owner_cpu: 3
[ 149.654819] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G D 4.4.43-HCBS-Demotion-05354-g332859bfbe08-dirty #4
[ 149.665542] Hardware name: HiKey Development Board (DT)
[ 149.670820] Call trace:
[ 149.673291] [<ffffff800808ae98>] dump_backtrace+0x0/0x1e0
[ 149.678748] [<ffffff800808b098>] show_stack+0x20/0x28
[ 149.683853] [<ffffff8008553374>] dump_stack+0xa8/0xe0
[ 149.688959] [<ffffff800813a4d4>] spin_dump+0x78/0x9c
[ 149.693976] [<ffffff800813a7c8>] do_raw_spin_lock+0x180/0x1b4
[ 149.699783] [<ffffff8008d46f20>] _raw_spin_lock+0x6c/0x88
[ 149.705240] [<ffffff80080fc5e8>] scheduler_tick+0x50/0x2ac
[ 149.710785] [<ffffff800815cf78>] update_process_times+0x58/0x70
[ 149.716769] [<ffffff80081703a4>] tick_sched_timer+0x7c/0xfc
[ 149.722401] [<ffffff800815d5b8>] __hrtimer_run_queues+0x164/0x624
[ 149.728560] [<ffffff800815eb74>] hrtimer_interrupt+0xb0/0x1f4
[ 149.734369] [<ffffff8008933150>] arch_timer_handler_phys+0x3c/0x48
[ 149.740618] [<ffffff8008149e90>] handle_percpu_devid_irq+0xe8/0x3d0
[ 149.746954] [<ffffff8008145104>] generic_handle_irq+0x34/0x4c
[ 149.752761] [<ffffff80081451ac>] __handle_domain_irq+0x90/0xf8
[ 149.758656] [<ffffff8008082544>] gic_handle_irq+0x64/0xc4
[ 149.764113] Exception stack(0xffffffc0792e0050 to 0xffffffc0792e0180)
[ 149.770622] 0040: ffffffc03515b900 0000008000000000
[ 149.778537] 0060: ffffffc03515ba30 ffffff8008d47230 0000000060000145 ffffffc035156100
[ 149.786452] 0080: ffffffc03515ba30 ffffffc03515b900 0000000000000000 0000000000000000
[ 149.794366] 00a0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 149.802281] 00c0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 149.810196] 00e0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 149.818111] 0100: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 149.826026] 0120: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 149.833940] 0140: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 149.841855] 0160: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 149.849770] [<ffffff80080857b8>] el1_irq+0xb8/0x130
[ 149.854699] [<ffffff8008d47230>] _raw_spin_unlock_irq+0x3c/0x74
[ 149.860682] [<ffffff800808b168>] die+0xc8/0x1b4
[ 149.865262] [<ffffff800809b36c>] __do_kernel_fault.part.6+0x7c/0x90
[ 149.871601] [<ffffff80080986a0>] do_translation_fault+0x0/0xec
[ 149.877498] [<ffffff8008098768>] do_translation_fault+0xc8/0xec
[ 149.883481] [<ffffff80080822e0>] do_mem_abort+0x54/0xb4
[ 149.888761] Exception stack(0xffffffc03515bb80 to 0xffffffc03515bcb0)
[ 149.895272] bb80: 0000000000000000 0000008000000000 ffffffc03515bd50 ffffff800810a3d8
[ 149.903187] bba0: 0000000000000055 0000000000000114 ffffffc03515bcd0 ffffff8008136434
[ 149.911102] bbc0: ffffffc035158000 ffffff800a6f0288 0000000000000000 0000000000000000
[ 149.919017] bbe0: 0000000000000002 0000000000000001 0000000000000000 ffffff800816cae8
[ 149.926932] bc00: 00000000000001c0 ffffff80099a0468 0000000000000000 0000000000000000
[ 149.934847] bc20: ffffff800811908c 0000000000000000 ffffff80081069ec 0000000000000001
[ 149.942762] bc40: ffffff8008118fdc 0000000000000080 0000000000000000 ffffff8008d40980
[ 149.950676] bc60: 0000000000000004 ffffff80098f7000 0000000000000005 ffffff8008d60420
[ 149.958591] bc80: 0000000034d5d91d 0000000000000000 0000000000000000 0000000000000000
[ 149.966505] bca0: 0000000000000000 0000000000000000
[ 149.971434] [<ffffff80080855c8>] el1_da+0x18/0x78
[ 149.976189] [<ffffff800811908c>] pick_next_task_fair+0xb0/0xd10
[ 149.982174] [<ffffff8008d40cb4>] __schedule+0x420/0xc10
[ 149.987456] [<ffffff8008d41574>] schedule+0x40/0xa0
[ 149.992387] [<ffffff8008d415f0>] schedule_preempt_disabled+0x1c/0x2c
[ 149.998809] [<ffffff8008121754>] cpu_startup_entry+0x13c/0x464
[ 150.004705] [<ffffff8008090e64>] secondary_start_kernel+0x164/0x1b4
[ 150.011038] [<0000000000d4d03c>] 0xd4d03c
While switching between the two classes yes. Exactly when I'm not sure without adding some debug output.
Uhm... So, I do not understand... The script seems to switch between FIFO and
OTHER for 10 seconds (20 cycles with sleep 0.5), and the crash happens more
than 10s after the start of the test, right?
Luca
No sorry I wasn't clear. I extended the test to 200s and the crash seems to happen after 100s (but this varies).
Ok; I will increase the number of cycles to 400 and retest
Luca
In some cases, the scheduler invokes set_curr_task() before enqueueing the task. But if the task is enqueued as RT, it can be demoted during enqueue... In this case, set_curr_task() is called for the RT scheduling class, but the task ends up being enqueued in the CFS rq... And set_curr_task() is not invoked for the CFS scheduling class! Fix this by explicitly invoking the CFS set_curr_task() (if needed) in case of demotion, before enqueueing in the CFS rq.
The demotion mechanism currently still has some bugs that can be triggered by using the "periodic1.sh" or "periodic2.sh" scripts. After some experiments, it turned out that these changes improve the stability of the patchset (with this patch, the demotion mechanism can survive 33 minutes of "periodic1.sh" or "periodic2.sh"). The bugs are probably still there, though.
…l calls Provide a different lockdep key for rxrpc_call::user_mutex when the call is made on a kernel socket, such as by the AFS filesystem. The problem is that lockdep registers a false positive between userspace calling the sendmsg syscall on a user socket where call->user_mutex is held whilst userspace memory is accessed whereas the AFS filesystem may perform operations with mmap_sem held by the caller. In such a case, the following warning is produced. ====================================================== WARNING: possible circular locking dependency detected 4.14.0-fscache+ torvalds#243 Tainted: G E ------------------------------------------------------ modpost/16701 is trying to acquire lock: (&vnode->io_lock){+.+.}, at: [<ffffffffa000fc40>] afs_begin_vnode_operation+0x33/0x77 [kafs] but task is already holding lock: (&mm->mmap_sem){++++}, at: [<ffffffff8104376a>] __do_page_fault+0x1ef/0x486 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&mm->mmap_sem){++++}: __might_fault+0x61/0x89 _copy_from_iter_full+0x40/0x1fa rxrpc_send_data+0x8dc/0xff3 rxrpc_do_sendmsg+0x62f/0x6a1 rxrpc_sendmsg+0x166/0x1b7 sock_sendmsg+0x2d/0x39 ___sys_sendmsg+0x1ad/0x22b __sys_sendmsg+0x41/0x62 do_syscall_64+0x89/0x1be return_from_SYSCALL_64+0x0/0x75 -> #2 (&call->user_mutex){+.+.}: __mutex_lock+0x86/0x7d2 rxrpc_new_client_call+0x378/0x80e rxrpc_kernel_begin_call+0xf3/0x154 afs_make_call+0x195/0x454 [kafs] afs_vl_get_capabilities+0x193/0x198 [kafs] afs_vl_lookup_vldb+0x5f/0x151 [kafs] afs_create_volume+0x2e/0x2f4 [kafs] afs_mount+0x56a/0x8d7 [kafs] mount_fs+0x6a/0x109 vfs_kern_mount+0x67/0x135 do_mount+0x90b/0xb57 SyS_mount+0x72/0x98 do_syscall_64+0x89/0x1be return_from_SYSCALL_64+0x0/0x75 -> #1 (k-sk_lock-AF_RXRPC){+.+.}: lock_sock_nested+0x74/0x8a rxrpc_kernel_begin_call+0x8a/0x154 afs_make_call+0x195/0x454 [kafs] afs_fs_get_capabilities+0x17a/0x17f [kafs] afs_probe_fileserver+0xf7/0x2f0 [kafs] 
afs_select_fileserver+0x83f/0x903 [kafs] afs_fetch_status+0x89/0x11d [kafs] afs_iget+0x16f/0x4f8 [kafs] afs_mount+0x6c6/0x8d7 [kafs] mount_fs+0x6a/0x109 vfs_kern_mount+0x67/0x135 do_mount+0x90b/0xb57 SyS_mount+0x72/0x98 do_syscall_64+0x89/0x1be return_from_SYSCALL_64+0x0/0x75 -> #0 (&vnode->io_lock){+.+.}: lock_acquire+0x174/0x19f __mutex_lock+0x86/0x7d2 afs_begin_vnode_operation+0x33/0x77 [kafs] afs_fetch_data+0x80/0x12a [kafs] afs_readpages+0x314/0x405 [kafs] __do_page_cache_readahead+0x203/0x2ba filemap_fault+0x179/0x54d __do_fault+0x17/0x60 __handle_mm_fault+0x6d7/0x95c handle_mm_fault+0x24e/0x2a3 __do_page_fault+0x301/0x486 do_page_fault+0x236/0x259 page_fault+0x22/0x30 __clear_user+0x3d/0x60 padzero+0x1c/0x2b load_elf_binary+0x785/0xdc7 search_binary_handler+0x81/0x1ff do_execveat_common.isra.14+0x600/0x888 do_execve+0x1f/0x21 SyS_execve+0x28/0x2f do_syscall_64+0x89/0x1be return_from_SYSCALL_64+0x0/0x75 other info that might help us debug this: Chain exists of: &vnode->io_lock --> &call->user_mutex --> &mm->mmap_sem Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&mm->mmap_sem); lock(&call->user_mutex); lock(&mm->mmap_sem); lock(&vnode->io_lock); *** DEADLOCK *** 1 lock held by modpost/16701: #0: (&mm->mmap_sem){++++}, at: [<ffffffff8104376a>] __do_page_fault+0x1ef/0x486 stack backtrace: CPU: 0 PID: 16701 Comm: modpost Tainted: G E 4.14.0-fscache+ torvalds#243 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Call Trace: dump_stack+0x67/0x8e print_circular_bug+0x341/0x34f check_prev_add+0x11f/0x5d4 ? add_lock_to_list.isra.12+0x8b/0x8b ? add_lock_to_list.isra.12+0x8b/0x8b ? __lock_acquire+0xf77/0x10b4 __lock_acquire+0xf77/0x10b4 lock_acquire+0x174/0x19f ? afs_begin_vnode_operation+0x33/0x77 [kafs] __mutex_lock+0x86/0x7d2 ? afs_begin_vnode_operation+0x33/0x77 [kafs] ? afs_begin_vnode_operation+0x33/0x77 [kafs] ? 
afs_begin_vnode_operation+0x33/0x77 [kafs] afs_begin_vnode_operation+0x33/0x77 [kafs] afs_fetch_data+0x80/0x12a [kafs] afs_readpages+0x314/0x405 [kafs] __do_page_cache_readahead+0x203/0x2ba ? filemap_fault+0x179/0x54d filemap_fault+0x179/0x54d __do_fault+0x17/0x60 __handle_mm_fault+0x6d7/0x95c handle_mm_fault+0x24e/0x2a3 __do_page_fault+0x301/0x486 do_page_fault+0x236/0x259 page_fault+0x22/0x30 RIP: 0010:__clear_user+0x3d/0x60 RSP: 0018:ffff880071e93da0 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 000000000000011c RCX: 000000000000011c RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000060f720 RBP: 000000000060f720 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: ffff8800b5459b68 R12: ffff8800ce150e00 R13: 000000000060f720 R14: 00000000006127a8 R15: 0000000000000000 padzero+0x1c/0x2b load_elf_binary+0x785/0xdc7 search_binary_handler+0x81/0x1ff do_execveat_common.isra.14+0x600/0x888 do_execve+0x1f/0x21 SyS_execve+0x28/0x2f do_syscall_64+0x89/0x1be entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7fdb6009ee07 RSP: 002b:00007fff566d9728 EFLAGS: 00000246 ORIG_RAX: 000000000000003b RAX: ffffffffffffffda RBX: 000055ba57280900 RCX: 00007fdb6009ee07 RDX: 000055ba5727f270 RSI: 000055ba5727cac0 RDI: 000055ba57280900 RBP: 000055ba57280900 R08: 00007fff566d9700 R09: 0000000000000000 R10: 000055ba5727cac0 R11: 0000000000000246 R12: 0000000000000000 R13: 000055ba5727cac0 R14: 000055ba5727f270 R15: 0000000000000000 Signed-off-by: David Howells <dhowells@redhat.com>
Jiri Pirko says:

====================
mlxsw: GRE offloading fixes

Petr says:

This patchset fixes a couple bugs in offloading GRE tunnels in mlxsw driver.

Patch #1 fixes a problem that local routes pointing at a GRE tunnel device are offloaded even if that netdevice is down.

Patch #2 detects that as a result of moving a GRE netdevice to a different VRF, two tunnels now have a conflict of local addresses, something that the mlxsw driver can't offload.

Patch #3 fixes a FIB abort caused by forming a route pointing at a GRE tunnel that is eligible for offloading but already onloaded.

Patch #4 fixes a problem that next hops migrated to a new RIF kept the old RIF reference, which went dangling shortly afterwards.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
In the function brcmf_sdio_firmware_callback() the driver is unbound from the sdio function devices in the error path. However, the order in which it is done resulted in a use-after-free issue (see brcmf_ops_sdio_remove() in bcmsdh.c). Hence change the order and first unbind sdio function #2 device and then unbind sdio function #1 device. Cc: stable@vger.kernel.org # v4.12.x Fixes: 7a51461 ("brcmfmac: unbind all devices upon failure in firmware callback") Reported-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com> Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Franky Lin <franky.lin@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Default value of pcc_subspace_idx is -1. Make sure to check pcc_subspace_idx before using the same as array index. This will avoid following KASAN warnings too. [ 15.113449] ================================================================== [ 15.116983] BUG: KASAN: global-out-of-bounds in cppc_get_perf_caps+0xf3/0x3b0 [ 15.116983] Read of size 8 at addr ffffffffb9a5c0d8 by task swapper/0/1 [ 15.116983] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 4.15.0-rc2+ #2 [ 15.116983] Hardware name: Dell Inc. OptiPlex 7040/0Y7WYT, BIOS 1.2.8 01/26/2016 [ 15.116983] Call Trace: [ 15.116983] dump_stack+0x7c/0xbb [ 15.116983] print_address_description+0x1df/0x290 [ 15.116983] kasan_report+0x28a/0x370 [ 15.116983] ? cppc_get_perf_caps+0xf3/0x3b0 [ 15.116983] cppc_get_perf_caps+0xf3/0x3b0 [ 15.116983] ? cpc_read+0x210/0x210 [ 15.116983] ? __rdmsr_on_cpu+0x90/0x90 [ 15.116983] ? rdmsrl_on_cpu+0xa9/0xe0 [ 15.116983] ? rdmsr_on_cpu+0x100/0x100 [ 15.116983] ? wrmsrl_on_cpu+0x9c/0xd0 [ 15.116983] ? wrmsrl_on_cpu+0x9c/0xd0 [ 15.116983] ? wrmsr_on_cpu+0xe0/0xe0 [ 15.116983] __intel_pstate_cpu_init.part.16+0x3a2/0x530 [ 15.116983] ? intel_pstate_init_cpu+0x197/0x390 [ 15.116983] ? show_no_turbo+0xe0/0xe0 [ 15.116983] ? __lockdep_init_map+0xa0/0x290 [ 15.116983] intel_pstate_cpu_init+0x30/0x60 [ 15.116983] cpufreq_online+0x155/0xac0 [ 15.116983] cpufreq_add_dev+0x9b/0xb0 [ 15.116983] subsys_interface_register+0x1ae/0x290 [ 15.116983] ? bus_unregister_notifier+0x40/0x40 [ 15.116983] ? mark_held_locks+0x83/0xb0 [ 15.116983] ? _raw_write_unlock_irqrestore+0x32/0x60 [ 15.116983] ? intel_pstate_setup+0xc/0x104 [ 15.116983] ? intel_pstate_setup+0xc/0x104 [ 15.116983] ? cpufreq_register_driver+0x1ce/0x2b0 [ 15.116983] cpufreq_register_driver+0x1ce/0x2b0 [ 15.116983] ? intel_pstate_setup+0x104/0x104 [ 15.116983] intel_pstate_register_driver+0x3a/0xa0 [ 15.116983] intel_pstate_init+0x3c4/0x434 [ 15.116983] ? intel_pstate_setup+0x104/0x104 [ 15.116983] ? 
intel_pstate_setup+0x104/0x104 [ 15.116983] do_one_initcall+0x9c/0x206 [ 15.116983] ? parameq+0xa0/0xa0 [ 15.116983] ? initcall_blacklisted+0x150/0x150 [ 15.116983] ? lock_downgrade+0x2c0/0x2c0 [ 15.116983] kernel_init_freeable+0x327/0x3f0 [ 15.116983] ? start_kernel+0x612/0x612 [ 15.116983] ? _raw_spin_unlock_irq+0x29/0x40 [ 15.116983] ? finish_task_switch+0xdd/0x320 [ 15.116983] ? finish_task_switch+0x8e/0x320 [ 15.116983] ? rest_init+0xd0/0xd0 [ 15.116983] kernel_init+0xf/0x11a [ 15.116983] ? rest_init+0xd0/0xd0 [ 15.116983] ret_from_fork+0x24/0x30 [ 15.116983] The buggy address belongs to the variable: [ 15.116983] __key.36299+0x38/0x40 [ 15.116983] Memory state around the buggy address: [ 15.116983] ffffffffb9a5bf80: fa fa fa fa 00 fa fa fa fa fa fa fa 00 fa fa fa [ 15.116983] ffffffffb9a5c000: fa fa fa fa 00 fa fa fa fa fa fa fa 00 fa fa fa [ 15.116983] >ffffffffb9a5c080: fa fa fa fa 00 fa fa fa fa fa fa fa 00 00 00 00 [ 15.116983] ^ [ 15.116983] ffffffffb9a5c100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 15.116983] ffffffffb9a5c180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 15.116983] ================================================================== Fixes: 85b1407 (ACPI / CPPC: Make CPPC ACPI driver aware of PCC subspace IDs) Reported-by: Changbin Du <changbin.du@intel.com> Signed-off-by: George Cherian <george.cherian@cavium.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
…kernel/git/kvmarm/kvmarm into HEAD

KVM/arm fixes for 5.4, take #2

Special PMU edition:
- Fix cycle counter truncation
- Fix cycle counter overflow limit on pure 64bit system
- Allow chained events to be actually functional
- Correct sample period after overflow
All bonding device has same lockdep key and subclass is initialized with nest_level. But actual nest_level value can be changed when a lower device is attached. And at this moment, the subclass should be updated but it seems to be unsafe. So this patch makes bonding use dynamic lockdep key instead of the subclass. Test commands: ip link add bond0 type bond for i in {1..5} do let A=$i-1 ip link add bond$i type bond ip link set bond$i master bond$A done ip link set bond5 master bond0 Splat looks like: [ 307.992912] WARNING: possible recursive locking detected [ 307.993656] 5.4.0-rc3+ torvalds#96 Tainted: G W [ 307.994367] -------------------------------------------- [ 307.995092] ip/761 is trying to acquire lock: [ 307.995710] ffff8880513aac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding] [ 307.997045] but task is already holding lock: [ 307.997923] ffff88805fcbac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding] [ 307.999215] other info that might help us debug this: [ 308.000251] Possible unsafe locking scenario: [ 308.001137] CPU0 [ 308.001533] ---- [ 308.001915] lock(&(&bond->stats_lock)->rlock#2/2); [ 308.002609] lock(&(&bond->stats_lock)->rlock#2/2); [ 308.003302] *** DEADLOCK *** [ 308.004310] May be due to missing lock nesting notation [ 308.005319] 3 locks held by ip/761: [ 308.005830] #0: ffffffff9fcc42b0 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x466/0x8a0 [ 308.006894] #1: ffff88805fcbac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding] [ 308.008243] #2: ffffffff9f9219c0 (rcu_read_lock){....}, at: bond_get_stats+0x9f/0x500 [bonding] [ 308.009422] stack backtrace: [ 308.010124] CPU: 0 PID: 761 Comm: ip Tainted: G W 5.4.0-rc3+ torvalds#96 [ 308.011097] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 308.012179] Call Trace: [ 308.012601] dump_stack+0x7c/0xbb [ 308.013089] __lock_acquire+0x269d/0x3de0 [ 308.013669] ? 
register_lock_class+0x14d0/0x14d0 [ 308.014318] lock_acquire+0x164/0x3b0 [ 308.014858] ? bond_get_stats+0xb8/0x500 [bonding] [ 308.015520] _raw_spin_lock_nested+0x2e/0x60 [ 308.016129] ? bond_get_stats+0xb8/0x500 [bonding] [ 308.017215] bond_get_stats+0xb8/0x500 [bonding] [ 308.018454] ? bond_arp_rcv+0xf10/0xf10 [bonding] [ 308.019710] ? rcu_read_lock_held+0x90/0xa0 [ 308.020605] ? rcu_read_lock_sched_held+0xc0/0xc0 [ 308.021286] ? bond_get_stats+0x9f/0x500 [bonding] [ 308.021953] dev_get_stats+0x1ec/0x270 [ 308.022508] bond_get_stats+0x1d1/0x500 [bonding] Fixes: d3fff6c ("net: add netdev_lockdep_set_classes() helper") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When the extent tree is modified, it should be protected by inode cluster lock and ip_alloc_sem. The extent tree is accessed and modified in the ocfs2_prepare_inode_for_write, but isn't protected by ip_alloc_sem. The following is a case. The function ocfs2_fiemap is accessing the extent tree, which is modified at the same time. kernel BUG at fs/ocfs2/extent_map.c:475! invalid opcode: 0000 [#1] SMP Modules linked in: tun ocfs2 ocfs2_nodemanager configfs ocfs2_stackglue [...] CPU: 16 PID: 14047 Comm: o2info Not tainted 4.1.12-124.23.1.el6uek.x86_64 #2 Hardware name: Oracle Corporation ORACLE SERVER X7-2L/ASM, MB MECH, X7-2L, BIOS 42040600 10/19/2018 task: ffff88019487e200 ti: ffff88003daa4000 task.ti: ffff88003daa4000 RIP: ocfs2_get_clusters_nocache.isra.11+0x390/0x550 [ocfs2] Call Trace: ocfs2_fiemap+0x1e3/0x430 [ocfs2] do_vfs_ioctl+0x155/0x510 SyS_ioctl+0x81/0xa0 system_call_fastpath+0x18/0xd8 Code: 18 48 c7 c6 60 7f 65 a0 31 c0 bb e2 ff ff ff 48 8b 4a 40 48 8b 7a 28 48 c7 c2 78 2d 66 a0 e8 38 4f 05 00 e9 28 fe ff ff 0f 1f 00 <0f> 0b 66 0f 1f 44 00 00 bb 86 ff ff ff e9 13 fe ff ff 66 0f 1f RIP ocfs2_get_clusters_nocache.isra.11+0x390/0x550 [ocfs2] ---[ end trace c8aa0c8180e869dc ]--- Kernel panic - not syncing: Fatal exception Kernel Offset: disabled This issue can be reproduced every week in a production environment. This issue is related to the usage mode. If others use ocfs2 in this mode, the kernel will panic frequently. 
[akpm@linux-foundation.org: coding style fixes] [Fix new warning due to unused function by removing said function - Linus ] Link: http://lkml.kernel.org/r/1568772175-2906-2-git-send-email-sunny.s.zhang@oracle.com Signed-off-by: Shuning Zhang <sunny.s.zhang@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Gang He <ghe@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
FuzzUSB (a variant of syzkaller) found a free-while-still-in-use bug in the USB scatter-gather library:

BUG: KASAN: use-after-free in atomic_read include/asm-generic/atomic-instrumented.h:26 [inline]
BUG: KASAN: use-after-free in usb_hcd_unlink_urb+0x5f/0x170 drivers/usb/core/hcd.c:1607
Read of size 4 at addr ffff888065379610 by task kworker/u4:1/27
CPU: 1 PID: 27 Comm: kworker/u4:1 Not tainted 5.5.11 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
Workqueue: scsi_tmf_2 scmd_eh_abort_handler
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xce/0x128 lib/dump_stack.c:118
 print_address_description.constprop.4+0x21/0x3c0 mm/kasan/report.c:374
 __kasan_report+0x153/0x1cb mm/kasan/report.c:506
 kasan_report+0x12/0x20 mm/kasan/common.c:639
 check_memory_region_inline mm/kasan/generic.c:185 [inline]
 check_memory_region+0x152/0x1b0 mm/kasan/generic.c:192
 __kasan_check_read+0x11/0x20 mm/kasan/common.c:95
 atomic_read include/asm-generic/atomic-instrumented.h:26 [inline]
 usb_hcd_unlink_urb+0x5f/0x170 drivers/usb/core/hcd.c:1607
 usb_unlink_urb+0x72/0xb0 drivers/usb/core/urb.c:657
 usb_sg_cancel+0x14e/0x290 drivers/usb/core/message.c:602
 usb_stor_stop_transport+0x5e/0xa0 drivers/usb/storage/transport.c:937

This bug occurs when cancellation of the S-G transfer races with transfer completion. When that happens, usb_sg_cancel() may continue to access the transfer's URBs after usb_sg_wait() has freed them.

The bug is caused by the fact that usb_sg_cancel() does not take any sort of reference to the transfer, and so there is nothing to prevent the URBs from being deallocated while the routine is trying to use them.

The fix is to take such a reference by incrementing the transfer's io->count field while the cancellation is in progress and decrementing it afterward. The transfer's URBs are not deallocated until io->complete is triggered, which happens when io->count reaches zero.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-and-tested-by: Kyungtae Kim <kt0755@gmail.com> CC: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/Pine.LNX.4.44L0.2003281615140.14837-100000@netrider.rowland.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…f fs_info::journal_info [BUG] One run of btrfs/063 triggered the following lockdep warning: ============================================ WARNING: possible recursive locking detected 5.6.0-rc7-custom+ torvalds#48 Not tainted -------------------------------------------- kworker/u24:0/7 is trying to acquire lock: ffff88817d3a46e0 (sb_internal#2){.+.+}, at: start_transaction+0x66c/0x890 [btrfs] but task is already holding lock: ffff88817d3a46e0 (sb_internal#2){.+.+}, at: start_transaction+0x66c/0x890 [btrfs] other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(sb_internal#2); lock(sb_internal#2); *** DEADLOCK *** May be due to missing lock nesting notation 4 locks held by kworker/u24:0/7: #0: ffff88817b495948 ((wq_completion)btrfs-endio-write){+.+.}, at: process_one_work+0x557/0xb80 #1: ffff888189ea7db8 ((work_completion)(&work->normal_work)){+.+.}, at: process_one_work+0x557/0xb80 #2: ffff88817d3a46e0 (sb_internal#2){.+.+}, at: start_transaction+0x66c/0x890 [btrfs] #3: ffff888174ca4da8 (&fs_info->reloc_mutex){+.+.}, at: btrfs_record_root_in_trans+0x83/0xd0 [btrfs] stack backtrace: CPU: 0 PID: 7 Comm: kworker/u24:0 Not tainted 5.6.0-rc7-custom+ torvalds#48 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Workqueue: btrfs-endio-write btrfs_work_helper [btrfs] Call Trace: dump_stack+0xc2/0x11a __lock_acquire.cold+0xce/0x214 lock_acquire+0xe6/0x210 __sb_start_write+0x14e/0x290 start_transaction+0x66c/0x890 [btrfs] btrfs_join_transaction+0x1d/0x20 [btrfs] find_free_extent+0x1504/0x1a50 [btrfs] btrfs_reserve_extent+0xd5/0x1f0 [btrfs] btrfs_alloc_tree_block+0x1ac/0x570 [btrfs] btrfs_copy_root+0x213/0x580 [btrfs] create_reloc_root+0x3bd/0x470 [btrfs] btrfs_init_reloc_root+0x2d2/0x310 [btrfs] record_root_in_trans+0x191/0x1d0 [btrfs] btrfs_record_root_in_trans+0x90/0xd0 [btrfs] start_transaction+0x16e/0x890 [btrfs] btrfs_join_transaction+0x1d/0x20 [btrfs] btrfs_finish_ordered_io+0x55d/0xcd0 [btrfs] 
finish_ordered_fn+0x15/0x20 [btrfs] btrfs_work_helper+0x116/0x9a0 [btrfs] process_one_work+0x632/0xb80 worker_thread+0x80/0x690 kthread+0x1a3/0x1f0 ret_from_fork+0x27/0x50 It's pretty hard to reproduce, only one hit so far. [CAUSE] This is because we're calling btrfs_join_transaction() without re-using the current running one: btrfs_finish_ordered_io() |- btrfs_join_transaction() <<< Call #1 |- btrfs_record_root_in_trans() |- btrfs_reserve_extent() |- btrfs_join_transaction() <<< Call #2 Normally such btrfs_join_transaction() call should re-use the existing one, without trying to re-start a transaction. But the problem is, in btrfs_join_transaction() call #1, we call btrfs_record_root_in_trans() before initializing current::journal_info. And in btrfs_join_transaction() call #2, we're relying on current::journal_info to avoid such deadlock. [FIX] Call btrfs_record_root_in_trans() after we have initialized current::journal_info. CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
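The ordering bug the fix addresses — publishing the task-local journal pointer only after calling a helper that may re-enter the join path — can be modeled in a few lines of userspace C. This is an illustrative sketch, not the real btrfs code: `journal_info`, `join_transaction()` and the refcount are simplified stand-ins for `current->journal_info` and `btrfs_join_transaction()`.

```c
#include <assert.h>
#include <stddef.h>

struct transaction { int refs; };

/* Models current->journal_info: a task-local pointer to the running handle. */
static struct transaction *journal_info;

static struct transaction running;

static struct transaction *join_transaction(void)
{
	/* Nested call: reuse the already-running transaction instead of
	 * trying to start (and possibly block on) a new one. */
	if (journal_info) {
		journal_info->refs++;
		return journal_info;
	}
	running.refs = 1;
	/* Must be set BEFORE calling any helper (like the real
	 * btrfs_record_root_in_trans()) that may re-enter
	 * join_transaction(); setting it afterwards is exactly the bug
	 * that made call #2 re-acquire sb_internal. */
	journal_info = &running;
	return &running;
}
```

In this model a nested join returns the same handle with a bumped refcount, which is only possible because the pointer was published before any re-entrant work ran.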
…kernel/git/kvmarm/kvmarm into kvm-master KVM/arm fixes for Linux 5.7, take #2 - Fix compilation with Clang - Correctly initialize GICv4.1 in the absence of a virtual ITS - Move SP_EL0 save/restore to the guest entry/exit code - Handle PC wrap around on 32bit guests, and narrow all 32bit registers on userspace access
abs_vdebt is an atomic_64 which tracks how much over budget a given cgroup is and controls the activation of the use_delay mechanism. Once a cgroup goes over budget from forced IOs, it has to pay it back with its future budget. The progress guarantee on debt paying comes from the iocg being active - active iocgs are processed by the periodic timer, which ensures that as time passes the debts dissipate and the iocg returns to normal operation. However, both iocg activation and vdebt handling are asynchronous and a sequence like the following may happen. 1. The iocg is in the process of being deactivated by the periodic timer. 2. A bio enters ioc_rqos_throttle(), calls iocg_activate() which returns without doing anything because it still sees that the iocg is already active. 3. The iocg is deactivated. 4. The bio from #2 is over budget but needs to be forced. It increases abs_vdebt and goes over the threshold and enables use_delay. 5. IO control is enabled for the iocg's subtree and now IOs are attributed to the descendant cgroups and the iocg itself no longer issues IOs. This leaves the iocg with stuck abs_vdebt - it has debt but is inactive and has no further IOs which can activate it. This can end up unduly punishing all the descendant cgroups. The usual throttling path has the same issue - the iocg must be active while throttled to ensure that future events will wake it up - and solves the problem by synchronizing the throttling path with a spinlock. abs_vdebt handling is another form of overage handling and shares a lot of characteristics including the fact that it isn't in the hottest path. This patch fixes the above and other possible races by strictly synchronizing abs_vdebt and use_delay handling with iocg->waitq.lock. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Vlad Dmitriev <vvd@fb.com> Cc: stable@vger.kernel.org # v5.4+ Fixes: e1518f6 ("blk-iocost: Don't let merges push vtime into the future") Signed-off-by: Jens Axboe <axboe@kernel.dk>
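The core idea of the fix — the activity check and the debt update must happen under one lock, so a concurrent deactivation cannot slip between "saw active" and "added debt" — can be sketched with a pthread mutex standing in for iocg->waitq.lock. The struct layout and function name below are simplified stand-ins, not the real blk-iocost code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

struct iocg {
	pthread_mutex_t lock;   /* stands in for iocg->waitq.lock */
	bool active;
	uint64_t abs_vdebt;
};

/* Charge forced-IO debt only if the iocg is still active, atomically with
 * the activity check: deactivation (which also takes the lock) can no
 * longer race in between and strand the debt on an inactive iocg. */
static bool iocg_charge_vdebt(struct iocg *g, uint64_t cost)
{
	bool charged;

	pthread_mutex_lock(&g->lock);
	charged = g->active;
	if (charged)
		g->abs_vdebt += cost;
	pthread_mutex_unlock(&g->lock);
	return charged;         /* false: caller must (re)activate and retry */
}
```

The trade-off the commit message points out is that this path is not hot, so paying for a lock here is acceptable in exchange for the progress guarantee.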
Since 5.7-rc1, on btrfs we have a percpu counter initialization for which we always pass a GFP_KERNEL gfp_t argument (this happens since commit 2992df7 ("btrfs: Implement DREW lock")). That is safe in some contexts but not in others where allowing fs reclaim could lead to a deadlock because we are either holding some btrfs lock needed for a transaction commit or holding a btrfs transaction handle open. Because of that we surround the call to the function that initializes the percpu counter with a NOFS context using memalloc_nofs_save() (this is done at btrfs_init_fs_root()). However it turns out that this is not enough to prevent a possible deadlock because percpu_alloc() determines if it is in an atomic context by looking exclusively at the gfp flags passed to it (GFP_KERNEL in this case) and it is not aware that a NOFS context is set. Because percpu_alloc() thinks it is in a non-atomic context it locks the pcpu_alloc_mutex. This can result in a btrfs deadlock when pcpu_balance_workfn() is running, has acquired that mutex and is waiting for reclaim, while the btrfs task that called percpu_counter_init() (and therefore percpu_alloc()) is holding either the btrfs commit_root semaphore or a transaction handle (done at fs/btrfs/backref.c: iterate_extent_inodes()), which prevents reclaim from finishing as an attempt to commit the current btrfs transaction will deadlock. Lockdep reports this issue with the following trace: ====================================================== WARNING: possible circular locking dependency detected 5.6.0-rc7-btrfs-next-77 #1 Not tainted ------------------------------------------------------ kswapd0/91 is trying to acquire lock: ffff8938a3b3fdc8 (&delayed_node->mutex){+.+.}, at: __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs] but task is already holding lock: ffffffffb4f0dbc0 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30 which lock already depends on the new lock.
the existing dependency chain (in reverse order) is: -> #4 (fs_reclaim){+.+.}: fs_reclaim_acquire.part.0+0x25/0x30 __kmalloc+0x5f/0x3a0 pcpu_create_chunk+0x19/0x230 pcpu_balance_workfn+0x56a/0x680 process_one_work+0x235/0x5f0 worker_thread+0x50/0x3b0 kthread+0x120/0x140 ret_from_fork+0x3a/0x50 -> #3 (pcpu_alloc_mutex){+.+.}: __mutex_lock+0xa9/0xaf0 pcpu_alloc+0x480/0x7c0 __percpu_counter_init+0x50/0xd0 btrfs_drew_lock_init+0x22/0x70 [btrfs] btrfs_get_fs_root+0x29c/0x5c0 [btrfs] resolve_indirect_refs+0x120/0xa30 [btrfs] find_parent_nodes+0x50b/0xf30 [btrfs] btrfs_find_all_leafs+0x60/0xb0 [btrfs] iterate_extent_inodes+0x139/0x2f0 [btrfs] iterate_inodes_from_logical+0xa1/0xe0 [btrfs] btrfs_ioctl_logical_to_ino+0xb4/0x190 [btrfs] btrfs_ioctl+0x165a/0x3130 [btrfs] ksys_ioctl+0x87/0xc0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x5c/0x260 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #2 (&fs_info->commit_root_sem){++++}: down_write+0x38/0x70 btrfs_cache_block_group+0x2ec/0x500 [btrfs] find_free_extent+0xc6a/0x1600 [btrfs] btrfs_reserve_extent+0x9b/0x180 [btrfs] btrfs_alloc_tree_block+0xc1/0x350 [btrfs] alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs] __btrfs_cow_block+0x122/0x5a0 [btrfs] btrfs_cow_block+0x106/0x240 [btrfs] commit_cowonly_roots+0x55/0x310 [btrfs] btrfs_commit_transaction+0x509/0xb20 [btrfs] sync_filesystem+0x74/0x90 generic_shutdown_super+0x22/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0x100/0x160 task_work_run+0x93/0xc0 exit_to_usermode_loop+0xf9/0x100 do_syscall_64+0x20d/0x260 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #1 (&space_info->groups_sem){++++}: down_read+0x3c/0x140 find_free_extent+0xef6/0x1600 [btrfs] btrfs_reserve_extent+0x9b/0x180 [btrfs] btrfs_alloc_tree_block+0xc1/0x350 [btrfs] alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs] __btrfs_cow_block+0x122/0x5a0 [btrfs] btrfs_cow_block+0x106/0x240 [btrfs] btrfs_search_slot+0x50c/0xd60 [btrfs] btrfs_lookup_inode+0x3a/0xc0 [btrfs] 
__btrfs_update_delayed_inode+0x90/0x280 [btrfs] __btrfs_commit_inode_delayed_items+0x81f/0x870 [btrfs] __btrfs_run_delayed_items+0x8e/0x180 [btrfs] btrfs_commit_transaction+0x31b/0xb20 [btrfs] iterate_supers+0x87/0xf0 ksys_sync+0x60/0xb0 __ia32_sys_sync+0xa/0x10 do_syscall_64+0x5c/0x260 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #0 (&delayed_node->mutex){+.+.}: __lock_acquire+0xef0/0x1c80 lock_acquire+0xa2/0x1d0 __mutex_lock+0xa9/0xaf0 __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs] btrfs_evict_inode+0x40d/0x560 [btrfs] evict+0xd9/0x1c0 dispose_list+0x48/0x70 prune_icache_sb+0x54/0x80 super_cache_scan+0x124/0x1a0 do_shrink_slab+0x176/0x440 shrink_slab+0x23a/0x2c0 shrink_node+0x188/0x6e0 balance_pgdat+0x31d/0x7f0 kswapd+0x238/0x550 kthread+0x120/0x140 ret_from_fork+0x3a/0x50 other info that might help us debug this: Chain exists of: &delayed_node->mutex --> pcpu_alloc_mutex --> fs_reclaim Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(fs_reclaim); lock(pcpu_alloc_mutex); lock(fs_reclaim); lock(&delayed_node->mutex); *** DEADLOCK *** 3 locks held by kswapd0/91: #0: (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30 #1: (shrinker_rwsem){++++}, at: shrink_slab+0x12f/0x2c0 #2: (&type->s_umount_key#43){++++}, at: trylock_super+0x16/0x50 stack backtrace: CPU: 1 PID: 91 Comm: kswapd0 Not tainted 5.6.0-rc7-btrfs-next-77 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x8f/0xd0 check_noncircular+0x170/0x190 __lock_acquire+0xef0/0x1c80 lock_acquire+0xa2/0x1d0 __mutex_lock+0xa9/0xaf0 __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs] btrfs_evict_inode+0x40d/0x560 [btrfs] evict+0xd9/0x1c0 dispose_list+0x48/0x70 prune_icache_sb+0x54/0x80 super_cache_scan+0x124/0x1a0 do_shrink_slab+0x176/0x440 shrink_slab+0x23a/0x2c0 shrink_node+0x188/0x6e0 balance_pgdat+0x31d/0x7f0 kswapd+0x238/0x550 kthread+0x120/0x140 ret_from_fork+0x3a/0x50 This could be fixed by making btrfs 
pass GFP_NOFS instead of GFP_KERNEL to percpu_counter_init() in contexts where it is not reclaim-safe, however, that type of approach is discouraged since memalloc_[nofs|noio]_save() were introduced. Therefore this change makes pcpu_alloc() check for an existing nofs/noio context before deciding whether it is in an atomic context or not. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Link: http://lkml.kernel.org/r/20200430164356.15543-1-fdmanana@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
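A minimal model of what the fix does — masking the caller's gfp flags with the task's saved scope before classifying the allocation — might look like the following. The flag values and helper names here are invented stand-ins for the kernel's PF_MEMALLOC_NOFS and current_gfp_context(), chosen only to make the sketch self-contained.

```c
#include <assert.h>

#define MY_GFP_FS           (1u << 0)
#define MY_GFP_IO           (1u << 1)
#define MY_GFP_KERNEL       (MY_GFP_FS | MY_GFP_IO)
#define MY_PF_MEMALLOC_NOFS (1u << 0)

/* Models current->flags for a single task. */
static unsigned int task_flags;

/* Models current_gfp_context(): strip fs reclaim from the mask when the
 * task entered a memalloc_nofs_save() scope, regardless of what gfp flags
 * the caller passed (the caller said GFP_KERNEL, the scope says NOFS). */
static unsigned int gfp_in_context(unsigned int gfp)
{
	if (task_flags & MY_PF_MEMALLOC_NOFS)
		gfp &= ~MY_GFP_FS;
	return gfp;
}

/* The pcpu_alloc()-style decision: treat the allocation as "atomic"
 * (no mutex, no fs reclaim) when fs reclaim is not allowed in context. */
static int is_atomic_alloc(unsigned int gfp)
{
	return !(gfp_in_context(gfp) & MY_GFP_FS);
}
```

With the old behavior (classifying on the raw gfp flags alone), the NOFS scope would be invisible and the allocator would happily take its mutex; consulting the task scope first is what breaks the reclaim dependency.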
With SO_RCVLOWAT, under memory pressure, it is possible to enter a state where: 1. We have not received enough bytes to satisfy SO_RCVLOWAT. 2. We have not entered buffer pressure (see tcp_rmem_pressure()). 3. But, we do not have enough buffer space to accept more packets. In this case, we advertise 0 rwnd (due to #3) but the application does not drain the receive queue (no wakeup because of #1 and #2) so the flow stalls. Modify the heuristic for SO_RCVLOWAT so that, if we are advertising rwnd <= rcv_mss, we force a wakeup to prevent a stall. Without this patch, setting tcp_rmem to 6143 and disabling TCP autotune causes a stalled flow. With this patch, no stall occurs. This is with RPC-style traffic with large messages. Fixes: 03f45c8 ("tcp: avoid extra wakeups for SO_RCVLOWAT users") Signed-off-by: Arjun Roy <arjunroy@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20201023184709.217614-1-arjunroy.kdev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
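The modified wakeup heuristic condenses to a small decision function. This is an illustrative sketch of the logic described above, not the kernel's actual receive-wakeup code, and the function and parameter names are invented.

```c
#include <assert.h>
#include <stdbool.h>

/* Should the receiving application be woken? Wake if SO_RCVLOWAT is
 * satisfied, if we are under receive-buffer pressure, or - the new case -
 * if the advertised window has collapsed to at most one MSS, which would
 * otherwise leave sender and receiver each waiting on the other. */
static bool rcv_should_wake(unsigned int avail, unsigned int rcvlowat,
			    unsigned int rwnd, unsigned int rcv_mss,
			    bool mem_pressure)
{
	if (avail >= rcvlowat)		/* condition #1 no longer holds */
		return true;
	if (mem_pressure)		/* condition #2 no longer holds */
		return true;
	return rwnd <= rcv_mss;		/* new: zero/tiny window forces wakeup */
}
```

The third clause is what breaks the deadlock in the stalled-flow scenario: the application is woken so it drains the queue and reopens the window even though SO_RCVLOWAT is not yet satisfied.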
Dave reported a problem with my rwsem conversion patch where we got the following lockdep splat: ====================================================== WARNING: possible circular locking dependency detected 5.9.0-default+ #1297 Not tainted ------------------------------------------------------ kswapd0/76 is trying to acquire lock: ffff9d5d25df2530 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs] but task is already holding lock: ffffffffa40cbba0 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #4 (fs_reclaim){+.+.}-{0:0}: __lock_acquire+0x582/0xac0 lock_acquire+0xca/0x430 fs_reclaim_acquire.part.0+0x25/0x30 kmem_cache_alloc+0x30/0x9c0 alloc_inode+0x81/0x90 iget_locked+0xcd/0x1a0 kernfs_get_inode+0x1b/0x130 kernfs_get_tree+0x136/0x210 sysfs_get_tree+0x1a/0x50 vfs_get_tree+0x1d/0xb0 path_mount+0x70f/0xa80 do_mount+0x75/0x90 __x64_sys_mount+0x8e/0xd0 do_syscall_64+0x2d/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #3 (kernfs_mutex){+.+.}-{3:3}: __lock_acquire+0x582/0xac0 lock_acquire+0xca/0x430 __mutex_lock+0xa0/0xaf0 kernfs_add_one+0x23/0x150 kernfs_create_dir_ns+0x58/0x80 sysfs_create_dir_ns+0x70/0xd0 kobject_add_internal+0xbb/0x2d0 kobject_add+0x7a/0xd0 btrfs_sysfs_add_block_group_type+0x141/0x1d0 [btrfs] btrfs_read_block_groups+0x1f1/0x8c0 [btrfs] open_ctree+0x981/0x1108 [btrfs] btrfs_mount_root.cold+0xe/0xb0 [btrfs] legacy_get_tree+0x2d/0x60 vfs_get_tree+0x1d/0xb0 fc_mount+0xe/0x40 vfs_kern_mount.part.0+0x71/0x90 btrfs_mount+0x13b/0x3e0 [btrfs] legacy_get_tree+0x2d/0x60 vfs_get_tree+0x1d/0xb0 path_mount+0x70f/0xa80 do_mount+0x75/0x90 __x64_sys_mount+0x8e/0xd0 do_syscall_64+0x2d/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #2 (btrfs-extent-00){++++}-{3:3}: __lock_acquire+0x582/0xac0 lock_acquire+0xca/0x430 down_read_nested+0x45/0x220 __btrfs_tree_read_lock+0x35/0x1c0 [btrfs] 
__btrfs_read_lock_root_node+0x3a/0x50 [btrfs] btrfs_search_slot+0x6d4/0xfd0 [btrfs] check_committed_ref+0x69/0x200 [btrfs] btrfs_cross_ref_exist+0x65/0xb0 [btrfs] run_delalloc_nocow+0x446/0x9b0 [btrfs] btrfs_run_delalloc_range+0x61/0x6a0 [btrfs] writepage_delalloc+0xae/0x160 [btrfs] __extent_writepage+0x262/0x420 [btrfs] extent_write_cache_pages+0x2b6/0x510 [btrfs] extent_writepages+0x43/0x90 [btrfs] do_writepages+0x40/0xe0 __writeback_single_inode+0x62/0x610 writeback_sb_inodes+0x20f/0x500 wb_writeback+0xef/0x4a0 wb_do_writeback+0x49/0x2e0 wb_workfn+0x81/0x340 process_one_work+0x233/0x5d0 worker_thread+0x50/0x3b0 kthread+0x137/0x150 ret_from_fork+0x1f/0x30 -> #1 (btrfs-fs-00){++++}-{3:3}: __lock_acquire+0x582/0xac0 lock_acquire+0xca/0x430 down_read_nested+0x45/0x220 __btrfs_tree_read_lock+0x35/0x1c0 [btrfs] __btrfs_read_lock_root_node+0x3a/0x50 [btrfs] btrfs_search_slot+0x6d4/0xfd0 [btrfs] btrfs_lookup_inode+0x3a/0xc0 [btrfs] __btrfs_update_delayed_inode+0x93/0x2c0 [btrfs] __btrfs_commit_inode_delayed_items+0x7de/0x850 [btrfs] __btrfs_run_delayed_items+0x8e/0x140 [btrfs] btrfs_commit_transaction+0x367/0xbc0 [btrfs] btrfs_mksubvol+0x2db/0x470 [btrfs] btrfs_mksnapshot+0x7b/0xb0 [btrfs] __btrfs_ioctl_snap_create+0x16f/0x1a0 [btrfs] btrfs_ioctl_snap_create_v2+0xb0/0xf0 [btrfs] btrfs_ioctl+0xd0b/0x2690 [btrfs] __x64_sys_ioctl+0x6f/0xa0 do_syscall_64+0x2d/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (&delayed_node->mutex){+.+.}-{3:3}: check_prev_add+0x91/0xc60 validate_chain+0xa6e/0x2a20 __lock_acquire+0x582/0xac0 lock_acquire+0xca/0x430 __mutex_lock+0xa0/0xaf0 __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs] btrfs_evict_inode+0x3cc/0x560 [btrfs] evict+0xd6/0x1c0 dispose_list+0x48/0x70 prune_icache_sb+0x54/0x80 super_cache_scan+0x121/0x1a0 do_shrink_slab+0x16d/0x3b0 shrink_slab+0xb1/0x2e0 shrink_node+0x230/0x6a0 balance_pgdat+0x325/0x750 kswapd+0x206/0x4d0 kthread+0x137/0x150 ret_from_fork+0x1f/0x30 other info that might help us debug this: Chain exists of: 
&delayed_node->mutex --> kernfs_mutex --> fs_reclaim Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(fs_reclaim); lock(kernfs_mutex); lock(fs_reclaim); lock(&delayed_node->mutex); *** DEADLOCK *** 3 locks held by kswapd0/76: #0: ffffffffa40cbba0 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30 #1: ffffffffa40b8b58 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x54/0x2e0 #2: ffff9d5d322390e8 (&type->s_umount_key#26){++++}-{3:3}, at: trylock_super+0x16/0x50 stack backtrace: CPU: 2 PID: 76 Comm: kswapd0 Not tainted 5.9.0-default+ #1297 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014 Call Trace: dump_stack+0x77/0x97 check_noncircular+0xff/0x110 ? save_trace+0x50/0x470 check_prev_add+0x91/0xc60 validate_chain+0xa6e/0x2a20 ? save_trace+0x50/0x470 __lock_acquire+0x582/0xac0 lock_acquire+0xca/0x430 ? __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs] __mutex_lock+0xa0/0xaf0 ? __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs] ? __lock_acquire+0x582/0xac0 ? __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs] ? btrfs_evict_inode+0x30b/0x560 [btrfs] ? __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs] __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs] btrfs_evict_inode+0x3cc/0x560 [btrfs] evict+0xd6/0x1c0 dispose_list+0x48/0x70 prune_icache_sb+0x54/0x80 super_cache_scan+0x121/0x1a0 do_shrink_slab+0x16d/0x3b0 shrink_slab+0xb1/0x2e0 shrink_node+0x230/0x6a0 balance_pgdat+0x325/0x750 kswapd+0x206/0x4d0 ? finish_wait+0x90/0x90 ? balance_pgdat+0x750/0x750 kthread+0x137/0x150 ? kthread_mod_delayed_work+0xc0/0xc0 ret_from_fork+0x1f/0x30 This happens because we are still holding the path open when we start adding the sysfs files for the block groups, which creates a dependency on fs_reclaim via the tree lock. Fix this by dropping the path before we start doing anything with sysfs. 
Reported-by: David Sterba <dsterba@suse.com> CC: stable@vger.kernel.org # 5.8+ Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Very sporadically I had test case btrfs/069 from fstests hanging (for years, it is not a recent regression), with the following traces in dmesg/syslog: [162301.160628] BTRFS info (device sdc): dev_replace from /dev/sdd (devid 2) to /dev/sdg started [162301.181196] BTRFS info (device sdc): scrub: finished on devid 4 with status: 0 [162301.287162] BTRFS info (device sdc): dev_replace from /dev/sdd (devid 2) to /dev/sdg finished [162513.513792] INFO: task btrfs-transacti:1356167 blocked for more than 120 seconds. [162513.514318] Not tainted 5.9.0-rc6-btrfs-next-69 #1 [162513.514522] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [162513.514747] task:btrfs-transacti state:D stack: 0 pid:1356167 ppid: 2 flags:0x00004000 [162513.514751] Call Trace: [162513.514761] __schedule+0x5ce/0xd00 [162513.514765] ? _raw_spin_unlock_irqrestore+0x3c/0x60 [162513.514771] schedule+0x46/0xf0 [162513.514844] wait_current_trans+0xde/0x140 [btrfs] [162513.514850] ? finish_wait+0x90/0x90 [162513.514864] start_transaction+0x37c/0x5f0 [btrfs] [162513.514879] transaction_kthread+0xa4/0x170 [btrfs] [162513.514891] ? btrfs_cleanup_transaction+0x660/0x660 [btrfs] [162513.514894] kthread+0x153/0x170 [162513.514897] ? kthread_stop+0x2c0/0x2c0 [162513.514902] ret_from_fork+0x22/0x30 [162513.514916] INFO: task fsstress:1356184 blocked for more than 120 seconds. [162513.515192] Not tainted 5.9.0-rc6-btrfs-next-69 #1 [162513.515431] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [162513.515680] task:fsstress state:D stack: 0 pid:1356184 ppid:1356177 flags:0x00004000 [162513.515682] Call Trace: [162513.515688] __schedule+0x5ce/0xd00 [162513.515691] ? _raw_spin_unlock_irqrestore+0x3c/0x60 [162513.515697] schedule+0x46/0xf0 [162513.515712] wait_current_trans+0xde/0x140 [btrfs] [162513.515716] ? 
finish_wait+0x90/0x90 [162513.515729] start_transaction+0x37c/0x5f0 [btrfs] [162513.515743] btrfs_attach_transaction_barrier+0x1f/0x50 [btrfs] [162513.515753] btrfs_sync_fs+0x61/0x1c0 [btrfs] [162513.515758] ? __ia32_sys_fdatasync+0x20/0x20 [162513.515761] iterate_supers+0x87/0xf0 [162513.515765] ksys_sync+0x60/0xb0 [162513.515768] __do_sys_sync+0xa/0x10 [162513.515771] do_syscall_64+0x33/0x80 [162513.515774] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [162513.515781] RIP: 0033:0x7f5238f50bd7 [162513.515782] Code: Bad RIP value. [162513.515784] RSP: 002b:00007fff67b978e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a2 [162513.515786] RAX: ffffffffffffffda RBX: 000055b1fad2c560 RCX: 00007f5238f50bd7 [162513.515788] RDX: 00000000ffffffff RSI: 000000000daf0e74 RDI: 000000000000003a [162513.515789] RBP: 0000000000000032 R08: 000000000000000a R09: 00007f5239019be0 [162513.515791] R10: fffffffffffff24f R11: 0000000000000206 R12: 000000000000003a [162513.515792] R13: 00007fff67b97950 R14: 00007fff67b97906 R15: 000055b1fad1a340 [162513.515804] INFO: task fsstress:1356185 blocked for more than 120 seconds. [162513.516064] Not tainted 5.9.0-rc6-btrfs-next-69 #1 [162513.516329] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [162513.516617] task:fsstress state:D stack: 0 pid:1356185 ppid:1356177 flags:0x00000000 [162513.516620] Call Trace: [162513.516625] __schedule+0x5ce/0xd00 [162513.516628] ? _raw_spin_unlock_irqrestore+0x3c/0x60 [162513.516634] schedule+0x46/0xf0 [162513.516647] wait_current_trans+0xde/0x140 [btrfs] [162513.516650] ? finish_wait+0x90/0x90 [162513.516662] start_transaction+0x4d7/0x5f0 [btrfs] [162513.516679] btrfs_setxattr_trans+0x3c/0x100 [btrfs] [162513.516686] __vfs_setxattr+0x66/0x80 [162513.516691] __vfs_setxattr_noperm+0x70/0x200 [162513.516697] vfs_setxattr+0x6b/0x120 [162513.516703] setxattr+0x125/0x240 [162513.516709] ? lock_acquire+0xb1/0x480 [162513.516712] ? mnt_want_write+0x20/0x50 [162513.516721] ? 
rcu_read_lock_any_held+0x8e/0xb0 [162513.516723] ? preempt_count_add+0x49/0xa0 [162513.516725] ? __sb_start_write+0x19b/0x290 [162513.516727] ? preempt_count_add+0x49/0xa0 [162513.516732] path_setxattr+0xba/0xd0 [162513.516739] __x64_sys_setxattr+0x27/0x30 [162513.516741] do_syscall_64+0x33/0x80 [162513.516743] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [162513.516745] RIP: 0033:0x7f5238f56d5a [162513.516746] Code: Bad RIP value. [162513.516748] RSP: 002b:00007fff67b97868 EFLAGS: 00000202 ORIG_RAX: 00000000000000bc [162513.516750] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f5238f56d5a [162513.516751] RDX: 000055b1fbb0d5a0 RSI: 00007fff67b978a0 RDI: 000055b1fbb0d470 [162513.516753] RBP: 000055b1fbb0d5a0 R08: 0000000000000001 R09: 00007fff67b97700 [162513.516754] R10: 0000000000000004 R11: 0000000000000202 R12: 0000000000000004 [162513.516756] R13: 0000000000000024 R14: 0000000000000001 R15: 00007fff67b978a0 [162513.516767] INFO: task fsstress:1356196 blocked for more than 120 seconds. [162513.517064] Not tainted 5.9.0-rc6-btrfs-next-69 #1 [162513.517365] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [162513.517763] task:fsstress state:D stack: 0 pid:1356196 ppid:1356177 flags:0x00004000 [162513.517780] Call Trace: [162513.517786] __schedule+0x5ce/0xd00 [162513.517789] ? _raw_spin_unlock_irqrestore+0x3c/0x60 [162513.517796] schedule+0x46/0xf0 [162513.517810] wait_current_trans+0xde/0x140 [btrfs] [162513.517814] ? finish_wait+0x90/0x90 [162513.517829] start_transaction+0x37c/0x5f0 [btrfs] [162513.517845] btrfs_attach_transaction_barrier+0x1f/0x50 [btrfs] [162513.517857] btrfs_sync_fs+0x61/0x1c0 [btrfs] [162513.517862] ? 
__ia32_sys_fdatasync+0x20/0x20 [162513.517865] iterate_supers+0x87/0xf0 [162513.517869] ksys_sync+0x60/0xb0 [162513.517872] __do_sys_sync+0xa/0x10 [162513.517875] do_syscall_64+0x33/0x80 [162513.517878] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [162513.517881] RIP: 0033:0x7f5238f50bd7 [162513.517883] Code: Bad RIP value. [162513.517885] RSP: 002b:00007fff67b978e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a2 [162513.517887] RAX: ffffffffffffffda RBX: 000055b1fad2c560 RCX: 00007f5238f50bd7 [162513.517889] RDX: 0000000000000000 RSI: 000000007660add2 RDI: 0000000000000053 [162513.517891] RBP: 0000000000000032 R08: 0000000000000067 R09: 00007f5239019be0 [162513.517893] R10: fffffffffffff24f R11: 0000000000000206 R12: 0000000000000053 [162513.517895] R13: 00007fff67b97950 R14: 00007fff67b97906 R15: 000055b1fad1a340 [162513.517908] INFO: task fsstress:1356197 blocked for more than 120 seconds. [162513.518298] Not tainted 5.9.0-rc6-btrfs-next-69 #1 [162513.518672] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [162513.519157] task:fsstress state:D stack: 0 pid:1356197 ppid:1356177 flags:0x00000000 [162513.519160] Call Trace: [162513.519165] __schedule+0x5ce/0xd00 [162513.519168] ? _raw_spin_unlock_irqrestore+0x3c/0x60 [162513.519174] schedule+0x46/0xf0 [162513.519190] wait_current_trans+0xde/0x140 [btrfs] [162513.519193] ? finish_wait+0x90/0x90 [162513.519206] start_transaction+0x4d7/0x5f0 [btrfs] [162513.519222] btrfs_create+0x57/0x200 [btrfs] [162513.519230] lookup_open+0x522/0x650 [162513.519246] path_openat+0x2b8/0xa50 [162513.519270] do_filp_open+0x91/0x100 [162513.519275] ? find_held_lock+0x32/0x90 [162513.519280] ? lock_acquired+0x33b/0x470 [162513.519285] ? do_raw_spin_unlock+0x4b/0xc0 [162513.519287] ? 
_raw_spin_unlock+0x29/0x40 [162513.519295] do_sys_openat2+0x20d/0x2d0 [162513.519300] do_sys_open+0x44/0x80 [162513.519304] do_syscall_64+0x33/0x80 [162513.519307] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [162513.519309] RIP: 0033:0x7f5238f4a903 [162513.519310] Code: Bad RIP value. [162513.519312] RSP: 002b:00007fff67b97758 EFLAGS: 00000246 ORIG_RAX: 0000000000000055 [162513.519314] RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f5238f4a903 [162513.519316] RDX: 0000000000000000 RSI: 00000000000001b6 RDI: 000055b1fbb0d470 [162513.519317] RBP: 00007fff67b978c0 R08: 0000000000000001 R09: 0000000000000002 [162513.519319] R10: 00007fff67b974f7 R11: 0000000000000246 R12: 0000000000000013 [162513.519320] R13: 00000000000001b6 R14: 00007fff67b97906 R15: 000055b1fad1c620 [162513.519332] INFO: task btrfs:1356211 blocked for more than 120 seconds. [162513.519727] Not tainted 5.9.0-rc6-btrfs-next-69 #1 [162513.520115] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [162513.520508] task:btrfs state:D stack: 0 pid:1356211 ppid:1356178 flags:0x00004002 [162513.520511] Call Trace: [162513.520516] __schedule+0x5ce/0xd00 [162513.520519] ? _raw_spin_unlock_irqrestore+0x3c/0x60 [162513.520525] schedule+0x46/0xf0 [162513.520544] btrfs_scrub_pause+0x11f/0x180 [btrfs] [162513.520548] ? finish_wait+0x90/0x90 [162513.520562] btrfs_commit_transaction+0x45a/0xc30 [btrfs] [162513.520574] ? start_transaction+0xe0/0x5f0 [btrfs] [162513.520596] btrfs_dev_replace_finishing+0x6d8/0x711 [btrfs] [162513.520619] btrfs_dev_replace_by_ioctl.cold+0x1cc/0x1fd [btrfs] [162513.520639] btrfs_ioctl+0x2a25/0x36f0 [btrfs] [162513.520643] ? do_sigaction+0xf3/0x240 [162513.520645] ? find_held_lock+0x32/0x90 [162513.520648] ? do_sigaction+0xf3/0x240 [162513.520651] ? lock_acquired+0x33b/0x470 [162513.520655] ? _raw_spin_unlock_irq+0x24/0x50 [162513.520657] ? lockdep_hardirqs_on+0x7d/0x100 [162513.520660] ? _raw_spin_unlock_irq+0x35/0x50 [162513.520662] ? 
do_sigaction+0xf3/0x240 [162513.520671] ? __x64_sys_ioctl+0x83/0xb0 [162513.520672] __x64_sys_ioctl+0x83/0xb0 [162513.520677] do_syscall_64+0x33/0x80 [162513.520679] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [162513.520681] RIP: 0033:0x7fc3cd307d87 [162513.520682] Code: Bad RIP value. [162513.520684] RSP: 002b:00007ffe30a56bb8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [162513.520686] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fc3cd307d87 [162513.520687] RDX: 00007ffe30a57a30 RSI: 00000000ca289435 RDI: 0000000000000003 [162513.520689] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [162513.520690] R10: 0000000000000008 R11: 0000000000000202 R12: 0000000000000003 [162513.520692] R13: 0000557323a212e0 R14: 00007ffe30a5a520 R15: 0000000000000001 [162513.520703] Showing all locks held in the system: [162513.520712] 1 lock held by khungtaskd/54: [162513.520713] #0: ffffffffb40a91a0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x15/0x197 [162513.520728] 1 lock held by in:imklog/596: [162513.520729] #0: ffff8f3f0d781400 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x4d/0x60 [162513.520782] 1 lock held by btrfs-transacti/1356167: [162513.520784] #0: ffff8f3d810cc848 (&fs_info->transaction_kthread_mutex){+.+.}-{3:3}, at: transaction_kthread+0x4a/0x170 [btrfs] [162513.520798] 1 lock held by btrfs/1356190: [162513.520800] #0: ffff8f3d57644470 (sb_writers#15){.+.+}-{0:0}, at: mnt_want_write_file+0x22/0x60 [162513.520805] 1 lock held by fsstress/1356184: [162513.520806] #0: ffff8f3d576440e8 (&type->s_umount_key#62){++++}-{3:3}, at: iterate_supers+0x6f/0xf0 [162513.520811] 3 locks held by fsstress/1356185: [162513.520812] #0: ffff8f3d57644470 (sb_writers#15){.+.+}-{0:0}, at: mnt_want_write+0x20/0x50 [162513.520815] #1: ffff8f3d80a650b8 (&type->i_mutex_dir_key#10){++++}-{3:3}, at: vfs_setxattr+0x50/0x120 [162513.520820] #2: ffff8f3d57644690 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40e/0x5f0 [btrfs] [162513.520833] 1 lock held by 
fsstress/1356196: [162513.520834] #0: ffff8f3d576440e8 (&type->s_umount_key#62){++++}-{3:3}, at: iterate_supers+0x6f/0xf0 [162513.520838] 3 locks held by fsstress/1356197: [162513.520839] #0: ffff8f3d57644470 (sb_writers#15){.+.+}-{0:0}, at: mnt_want_write+0x20/0x50 [162513.520843] #1: ffff8f3d506465e8 (&type->i_mutex_dir_key#10){++++}-{3:3}, at: path_openat+0x2a7/0xa50 [162513.520846] #2: ffff8f3d57644690 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40e/0x5f0 [btrfs] [162513.520858] 2 locks held by btrfs/1356211: [162513.520859] #0: ffff8f3d810cde30 (&fs_info->dev_replace.lock_finishing_cancel_unmount){+.+.}-{3:3}, at: btrfs_dev_replace_finishing+0x52/0x711 [btrfs] [162513.520877] #1: ffff8f3d57644690 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40e/0x5f0 [btrfs] This was weird because the stack traces show that a transaction commit, triggered by a device replace operation, is blocking trying to pause any running scrubs but there are no stack traces of blocked tasks doing a scrub. After poking around with drgn, I noticed there was a scrub task that was constantly running and blocking for short periods of time: >>> t = find_task(prog, 1356190) >>> prog.stack_trace(t) #0 __schedule+0x5ce/0xcfc #1 schedule+0x46/0xe4 #2 schedule_timeout+0x1df/0x475 #3 btrfs_reada_wait+0xda/0x132 #4 scrub_stripe+0x2a8/0x112f #5 scrub_chunk+0xcd/0x134 #6 scrub_enumerate_chunks+0x29e/0x5ee #7 btrfs_scrub_dev+0x2d5/0x91b #8 btrfs_ioctl+0x7f5/0x36e7 #9 __x64_sys_ioctl+0x83/0xb0 #10 do_syscall_64+0x33/0x77 #11 entry_SYSCALL_64+0x7c/0x156 Which corresponds to: int btrfs_reada_wait(void *handle) { struct reada_control *rc = handle; struct btrfs_fs_info *fs_info = rc->fs_info; while (atomic_read(&rc->elems)) { if (!atomic_read(&fs_info->reada_works_cnt)) reada_start_machine(fs_info); wait_event_timeout(rc->wait, atomic_read(&rc->elems) == 0, (HZ + 9) / 10); } (...)
So the counter "rc->elems" was set to 1 and never decreased to 0, causing the scrub task to loop forever in that function. Then I used the following script for drgn to check the readahead requests: $ cat dump_reada.py import sys import drgn from drgn import NULL, Object, cast, container_of, execscript, \ reinterpret, sizeof from drgn.helpers.linux import * mnt_path = b"/home/fdmanana/btrfs-tests/scratch_1" mnt = None for mnt in for_each_mount(prog, dst = mnt_path): pass if mnt is None: sys.stderr.write(f'Error: mount point {mnt_path} not found\n') sys.exit(1) fs_info = cast('struct btrfs_fs_info *', mnt.mnt.mnt_sb.s_fs_info) def dump_re(re): nzones = re.nzones.value_() print(f're at {hex(re.value_())}') print(f'\t logical {re.logical.value_()}') print(f'\t refcnt {re.refcnt.value_()}') print(f'\t nzones {nzones}') for i in range(nzones): dev = re.zones[i].device name = dev.name.str.string_() print(f'\t\t dev id {dev.devid.value_()} name {name}') print() for _, e in radix_tree_for_each(fs_info.reada_tree): re = cast('struct reada_extent *', e) dump_re(re) $ drgn dump_reada.py re at 0xffff8f3da9d25ad8 logical 38928384 refcnt 1 nzones 1 dev id 0 name b'/dev/sdd' $ So there was one readahead extent with a single zone corresponding to the source device of that last device replace operation logged in dmesg/syslog. Also the ID of that zone's device was 0 which is a special value set in the source device of a device replace operation when the operation finishes (constant BTRFS_DEV_REPLACE_DEVID set at btrfs_dev_replace_finishing()), confirming again that device /dev/sdd was the source of a device replace operation. 
Normally there should be as many zones in the readahead extent as there are devices, and I wasn't expecting the extent to be in a block group with a 'single' profile, so I went and confirmed with the following drgn script that there weren't any single profile block groups:

$ cat dump_block_groups.py
import sys
import drgn
from drgn import NULL, Object, cast, container_of, execscript, \
    reinterpret, sizeof
from drgn.helpers.linux import *

mnt_path = b"/home/fdmanana/btrfs-tests/scratch_1"

mnt = None
for mnt in for_each_mount(prog, dst = mnt_path):
    pass

if mnt is None:
    sys.stderr.write(f'Error: mount point {mnt_path} not found\n')
    sys.exit(1)

fs_info = cast('struct btrfs_fs_info *', mnt.mnt.mnt_sb.s_fs_info)

BTRFS_BLOCK_GROUP_DATA = (1 << 0)
BTRFS_BLOCK_GROUP_SYSTEM = (1 << 1)
BTRFS_BLOCK_GROUP_METADATA = (1 << 2)
BTRFS_BLOCK_GROUP_RAID0 = (1 << 3)
BTRFS_BLOCK_GROUP_RAID1 = (1 << 4)
BTRFS_BLOCK_GROUP_DUP = (1 << 5)
BTRFS_BLOCK_GROUP_RAID10 = (1 << 6)
BTRFS_BLOCK_GROUP_RAID5 = (1 << 7)
BTRFS_BLOCK_GROUP_RAID6 = (1 << 8)
BTRFS_BLOCK_GROUP_RAID1C3 = (1 << 9)
BTRFS_BLOCK_GROUP_RAID1C4 = (1 << 10)

def bg_flags_string(bg):
    flags = bg.flags.value_()
    ret = ''
    if flags & BTRFS_BLOCK_GROUP_DATA:
        ret = 'data'
    if flags & BTRFS_BLOCK_GROUP_METADATA:
        if len(ret) > 0:
            ret += '|'
        ret += 'meta'
    if flags & BTRFS_BLOCK_GROUP_SYSTEM:
        if len(ret) > 0:
            ret += '|'
        ret += 'system'
    if flags & BTRFS_BLOCK_GROUP_RAID0:
        ret += ' raid0'
    elif flags & BTRFS_BLOCK_GROUP_RAID1:
        ret += ' raid1'
    elif flags & BTRFS_BLOCK_GROUP_DUP:
        ret += ' dup'
    elif flags & BTRFS_BLOCK_GROUP_RAID10:
        ret += ' raid10'
    elif flags & BTRFS_BLOCK_GROUP_RAID5:
        ret += ' raid5'
    elif flags & BTRFS_BLOCK_GROUP_RAID6:
        ret += ' raid6'
    elif flags & BTRFS_BLOCK_GROUP_RAID1C3:
        ret += ' raid1c3'
    elif flags & BTRFS_BLOCK_GROUP_RAID1C4:
        ret += ' raid1c4'
    else:
        ret += ' single'
    return ret

def dump_bg(bg):
    print()
    print(f'block group at {hex(bg.value_())}')
    print(f'\t start {bg.start.value_()} length {bg.length.value_()}')
    print(f'\t flags {bg.flags.value_()} - {bg_flags_string(bg)}')

bg_root = fs_info.block_group_cache_tree.address_of_()
for bg in rbtree_inorder_for_each_entry('struct btrfs_block_group', bg_root, 'cache_node'):
    dump_bg(bg)

$ drgn dump_block_groups.py

block group at 0xffff8f3d673b0400
	 start 22020096 length 16777216
	 flags 258 - system raid6

block group at 0xffff8f3d53ddb400
	 start 38797312 length 536870912
	 flags 260 - meta raid6

block group at 0xffff8f3d5f4d9c00
	 start 575668224 length 2147483648
	 flags 257 - data raid6

block group at 0xffff8f3d08189000
	 start 2723151872 length 67108864
	 flags 258 - system raid6

block group at 0xffff8f3db70ff000
	 start 2790260736 length 1073741824
	 flags 260 - meta raid6

block group at 0xffff8f3d5f4dd800
	 start 3864002560 length 67108864
	 flags 258 - system raid6

block group at 0xffff8f3d67037000
	 start 3931111424 length 2147483648
	 flags 257 - data raid6
$

So there were only 2 reasons left for having a readahead extent with a single zone: reada_find_zone(), called when creating a readahead extent, returned NULL either because we failed to find the corresponding block group or because a memory allocation failed.

With some additional and custom tracing I figured out that on every further occurrence of the problem the block group had just been deleted when we were looping to create the zones for the readahead extent (at reada_find_extent()), so we ended up with only one zone in the readahead extent, corresponding to a device that ends up getting replaced.

So after figuring that out it became obvious why the hang happens:

1) Task A starts a scrub on any device of the filesystem, except for device /dev/sdd;

2) Task B starts a device replace with /dev/sdd as the source device;

3) Task A calls btrfs_reada_add() from scrub_stripe() and it is currently starting to scrub a stripe from block group X. This call to btrfs_reada_add() is the one for the extent tree.
When btrfs_reada_add() calls reada_add_block(), it passes the logical address of the extent tree's root node as its 'logical' argument - a value of 38928384;

4) Task A then enters reada_find_extent(), called from reada_add_block(). It finds there isn't any existing readahead extent for the logical address 38928384, so it proceeds to the path of creating a new one. It calls btrfs_map_block() to find out which stripes exist for the block group X. On the first iteration of the for loop that iterates over the stripes, it finds the stripe for device /dev/sdd, so it creates one zone for that device and adds it to the readahead extent. Before getting into the second iteration of the loop, the cleanup kthread deletes block group X because it was empty. So in the iterations for the remaining stripes it does not add more zones to the readahead extent, because the calls to reada_find_zone() returned NULL because they couldn't find block group X anymore. As a result the new readahead extent has a single zone, corresponding to the device /dev/sdd;

5) Before task A returns to btrfs_reada_add() and queues the readahead job for the readahead work queue, task B finishes the device replace and at btrfs_dev_replace_finishing() swaps the device /dev/sdd with the new device /dev/sdg;

6) Task A returns to reada_add_block(), which increments the counter "->elems" of the reada_control structure allocated at btrfs_reada_add(). Then it returns back to btrfs_reada_add() and calls reada_start_machine(). This queues a job in the readahead work queue to run the function reada_start_machine_worker(), which calls __reada_start_machine(). At __reada_start_machine() we take the device list mutex and for each device found in the current device list, we call reada_start_machine_dev() to start the readahead work. However at this point the device /dev/sdd was already freed and is not in the device list anymore.
This means the corresponding readahead for the extent at 38928384 is never started, and therefore the "->elems" counter of the reada_control structure allocated at btrfs_reada_add() never goes down to 0, causing the call to btrfs_reada_wait(), done by the scrub task, to wait forever.

Note that the readahead request can be made either after the device replace started or before it started, however in practice it is very unlikely that a device replace is able to start after a readahead request is made and is able to complete before the readahead request completes - maybe only on a very small and nearly empty filesystem.

This hang however is not the only problem we can have with readahead and device removals. When the readahead extent has zones other than the one corresponding to the device that is being removed (either by a device replace or a device remove operation), we risk having a use-after-free on the device when dropping the last reference of the readahead extent. For example, if we create a readahead extent with two zones, one for the device /dev/sdd and one for the device /dev/sde:

1) Before the readahead worker starts, the device /dev/sdd is removed, and the corresponding btrfs_device structure is freed. However the readahead extent still has the zone pointing to the device structure;

2) When the readahead worker starts, it only finds device /dev/sde in the current device list of the filesystem;

3) It starts the readahead work, at reada_start_machine_dev(), using the device /dev/sde;

4) Then when it finishes reading the extent from device /dev/sde, it calls __readahead_hook() which ends up dropping the last reference on the readahead extent through the last call to reada_extent_put();

5) At reada_extent_put() it iterates over each zone of the readahead extent and attempts to delete an element from the device's 'reada_extents' radix tree, resulting in a use-after-free, as the device pointer of the zone for /dev/sdd is now stale.
We can also access the device after dropping the last reference of a zone, through reada_zone_release(), also called by reada_extent_put().

And a device remove suffers the same problem, however since it shrinks the device size down to zero before removing the device, it is very unlikely to still have readahead requests not completed by the time we free the device, the only possibility is if the device has very little space allocated.

While the hang problem is exclusive to scrub, since it is currently the only user of btrfs_reada_add() and btrfs_reada_wait(), the use-after-free problem affects any path that triggers readahead, which includes btree_readahead_hook() and __readahead_hook() (a readahead worker can trigger readahead for the children of a node) for example - any path that ends up calling reada_add_block() can trigger the use-after-free after a device is removed.

So fix this by waiting for any readahead requests for a device to complete before removing a device, ensuring that while waiting for existing ones no new ones can be made.

This problem has been around for a very long time - the readahead code was added in 2011, device remove exists since 2008 and device replace was introduced in 2013, hard to pick a specific commit for a git Fixes tag.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
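The fix described above follows a common drain pattern: block new requests for the device, then wait for every outstanding one to finish before freeing it. A minimal sketch of that pattern with a counter and a condition variable, in the spirit of the "rc->elems" counter (all class and method names here are invented for illustration, not btrfs code):

```python
import threading

class ReadaheadTracker:
    """Tracks in-flight readahead requests for one device.

    Illustrative only: the names are hypothetical, not the btrfs API.
    """
    def __init__(self):
        self.lock = threading.Lock()
        self.cond = threading.Condition(self.lock)
        self.inflight = 0
        self.removing = False

    def start_request(self):
        """Return False once the device is going away, so callers
        cannot queue new readahead against a dying device."""
        with self.cond:
            if self.removing:
                return False
            self.inflight += 1
            return True

    def end_request(self):
        with self.cond:
            self.inflight -= 1
            if self.inflight == 0:
                self.cond.notify_all()

    def wait_for_requests(self):
        """Called before freeing the device: refuse new requests,
        then wait until every outstanding one has completed."""
        with self.cond:
            self.removing = True
            while self.inflight:
                self.cond.wait()
```

Once `wait_for_requests()` returns, no request holds a pointer to the device and none can be created, so freeing it cannot race with a readahead worker.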
When enabling qgroups we walk the tree_root and then add a qgroup item for every root that we have. This creates a lock dependency on the tree_root and qgroup_root, which results in the following lockdep splat (with tree locks using rwsem), e.g. in tests btrfs/017 or btrfs/022:

======================================================
WARNING: possible circular locking dependency detected
5.9.0-default+ #1299 Not tainted
------------------------------------------------------
btrfs/24552 is trying to acquire lock:
ffff9142dfc5f630 (btrfs-quota-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]

but task is already holding lock:
ffff9142dfc5d0b0 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (btrfs-root-00){++++}-{3:3}:
       __lock_acquire+0x3fb/0x730
       lock_acquire.part.0+0x6a/0x130
       down_read_nested+0x46/0x130
       __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
       __btrfs_read_lock_root_node+0x3a/0x50 [btrfs]
       btrfs_search_slot_get_root+0x11d/0x290 [btrfs]
       btrfs_search_slot+0xc3/0x9f0 [btrfs]
       btrfs_insert_item+0x6e/0x140 [btrfs]
       btrfs_create_tree+0x1cb/0x240 [btrfs]
       btrfs_quota_enable+0xcd/0x790 [btrfs]
       btrfs_ioctl_quota_ctl+0xc9/0xe0 [btrfs]
       __x64_sys_ioctl+0x83/0xa0
       do_syscall_64+0x2d/0x70
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #0 (btrfs-quota-00){++++}-{3:3}:
       check_prev_add+0x91/0xc30
       validate_chain+0x491/0x750
       __lock_acquire+0x3fb/0x730
       lock_acquire.part.0+0x6a/0x130
       down_read_nested+0x46/0x130
       __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
       __btrfs_read_lock_root_node+0x3a/0x50 [btrfs]
       btrfs_search_slot_get_root+0x11d/0x290 [btrfs]
       btrfs_search_slot+0xc3/0x9f0 [btrfs]
       btrfs_insert_empty_items+0x58/0xa0 [btrfs]
       add_qgroup_item.part.0+0x72/0x210 [btrfs]
       btrfs_quota_enable+0x3bb/0x790 [btrfs]
       btrfs_ioctl_quota_ctl+0xc9/0xe0 [btrfs]
       __x64_sys_ioctl+0x83/0xa0
       do_syscall_64+0x2d/0x70
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(btrfs-root-00);
                               lock(btrfs-quota-00);
                               lock(btrfs-root-00);
  lock(btrfs-quota-00);

 *** DEADLOCK ***

5 locks held by btrfs/24552:
 #0: ffff9142df431478 (sb_writers#10){.+.+}-{0:0}, at: mnt_want_write_file+0x22/0xa0
 #1: ffff9142f9b10cc0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl_quota_ctl+0x7b/0xe0 [btrfs]
 #2: ffff9142f9b11a08 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_enable+0x3b/0x790 [btrfs]
 #3: ffff9142df431698 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x406/0x510 [btrfs]
 #4: ffff9142dfc5d0b0 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]

stack backtrace:
CPU: 1 PID: 24552 Comm: btrfs Not tainted 5.9.0-default+ #1299
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
Call Trace:
 dump_stack+0x77/0x97
 check_noncircular+0xf3/0x110
 check_prev_add+0x91/0xc30
 validate_chain+0x491/0x750
 __lock_acquire+0x3fb/0x730
 lock_acquire.part.0+0x6a/0x130
 ? __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
 ? lock_acquire+0xc4/0x140
 ? __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
 down_read_nested+0x46/0x130
 ? __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
 __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
 ? btrfs_root_node+0xd9/0x200 [btrfs]
 __btrfs_read_lock_root_node+0x3a/0x50 [btrfs]
 btrfs_search_slot_get_root+0x11d/0x290 [btrfs]
 btrfs_search_slot+0xc3/0x9f0 [btrfs]
 btrfs_insert_empty_items+0x58/0xa0 [btrfs]
 add_qgroup_item.part.0+0x72/0x210 [btrfs]
 btrfs_quota_enable+0x3bb/0x790 [btrfs]
 btrfs_ioctl_quota_ctl+0xc9/0xe0 [btrfs]
 __x64_sys_ioctl+0x83/0xa0
 do_syscall_64+0x2d/0x70
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix this by dropping the path whenever we find a root item, add the qgroup item, and then re-lookup the root item we found and continue processing roots.
Reported-by: David Sterba <dsterba@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
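The fix works because the tree_root lock and the qgroup tree lock are never held at the same time: the search path into the tree_root is dropped before each qgroup insertion, and the walk resumes with a fresh lookup afterwards. A toy sketch of that drop-then-re-lookup pattern (all names are invented; plain Python locks and a dict stand in for the btree locks and root items):

```python
import threading

tree_lock = threading.Lock()    # stands in for the tree_root node lock
quota_lock = threading.Lock()   # stands in for the qgroup tree lock

# fake root items keyed by objectid, standing in for tree_root contents
roots = {5: "fs_root", 256: "subvol_a", 257: "subvol_b"}
qgroup_items = []

def add_qgroup_item(rootid):
    # takes only the quota lock; the tree lock is NOT held here
    with quota_lock:
        qgroup_items.append(rootid)

def qgroup_enable():
    key = 0
    while True:
        # look up the next root item while holding the tree lock
        with tree_lock:
            next_ids = sorted(k for k in roots if k >= key)
            if not next_ids:
                break
            found = next_ids[0]
        # tree lock dropped before taking the quota lock, so the two
        # locks are never held together and no A->B / B->A cycle exists
        add_qgroup_item(found)
        # re-lookup resumes from just after the item we processed
        key = found + 1
```

Resuming from `found + 1` is the sketch equivalent of re-searching for the root item after releasing the path: the position is re-derived from the key, not from a cached pointer into the tree.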
I got the following lockdep splat with tree locks converted to rwsem patches on btrfs/104:

======================================================
WARNING: possible circular locking dependency detected
5.9.0+ #102 Not tainted
------------------------------------------------------
btrfs-cleaner/903 is trying to acquire lock:
ffff8e7fab6ffe30 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x170

but task is already holding lock:
ffff8e7fab628a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&fs_info->commit_root_sem){++++}-{3:3}:
       down_read+0x40/0x130
       caching_thread+0x53/0x5a0
       btrfs_work_helper+0xfa/0x520
       process_one_work+0x238/0x540
       worker_thread+0x55/0x3c0
       kthread+0x13a/0x150
       ret_from_fork+0x1f/0x30

-> #2 (&caching_ctl->mutex){+.+.}-{3:3}:
       __mutex_lock+0x7e/0x7b0
       btrfs_cache_block_group+0x1e0/0x510
       find_free_extent+0xb6e/0x12f0
       btrfs_reserve_extent+0xb3/0x1b0
       btrfs_alloc_tree_block+0xb1/0x330
       alloc_tree_block_no_bg_flush+0x4f/0x60
       __btrfs_cow_block+0x11d/0x580
       btrfs_cow_block+0x10c/0x220
       commit_cowonly_roots+0x47/0x2e0
       btrfs_commit_transaction+0x595/0xbd0
       sync_filesystem+0x74/0x90
       generic_shutdown_super+0x22/0x100
       kill_anon_super+0x14/0x30
       btrfs_kill_super+0x12/0x20
       deactivate_locked_super+0x36/0xa0
       cleanup_mnt+0x12d/0x190
       task_work_run+0x5c/0xa0
       exit_to_user_mode_prepare+0x1df/0x200
       syscall_exit_to_user_mode+0x54/0x280
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #1 (&space_info->groups_sem){++++}-{3:3}:
       down_read+0x40/0x130
       find_free_extent+0x2ed/0x12f0
       btrfs_reserve_extent+0xb3/0x1b0
       btrfs_alloc_tree_block+0xb1/0x330
       alloc_tree_block_no_bg_flush+0x4f/0x60
       __btrfs_cow_block+0x11d/0x580
       btrfs_cow_block+0x10c/0x220
       commit_cowonly_roots+0x47/0x2e0
       btrfs_commit_transaction+0x595/0xbd0
       sync_filesystem+0x74/0x90
       generic_shutdown_super+0x22/0x100
       kill_anon_super+0x14/0x30
       btrfs_kill_super+0x12/0x20
       deactivate_locked_super+0x36/0xa0
       cleanup_mnt+0x12d/0x190
       task_work_run+0x5c/0xa0
       exit_to_user_mode_prepare+0x1df/0x200
       syscall_exit_to_user_mode+0x54/0x280
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #0 (btrfs-root-00){++++}-{3:3}:
       __lock_acquire+0x1167/0x2150
       lock_acquire+0xb9/0x3d0
       down_read_nested+0x43/0x130
       __btrfs_tree_read_lock+0x32/0x170
       __btrfs_read_lock_root_node+0x3a/0x50
       btrfs_search_slot+0x614/0x9d0
       btrfs_find_root+0x35/0x1b0
       btrfs_read_tree_root+0x61/0x120
       btrfs_get_root_ref+0x14b/0x600
       find_parent_nodes+0x3e6/0x1b30
       btrfs_find_all_roots_safe+0xb4/0x130
       btrfs_find_all_roots+0x60/0x80
       btrfs_qgroup_trace_extent_post+0x27/0x40
       btrfs_add_delayed_data_ref+0x3fd/0x460
       btrfs_free_extent+0x42/0x100
       __btrfs_mod_ref+0x1d7/0x2f0
       walk_up_proc+0x11c/0x400
       walk_up_tree+0xf0/0x180
       btrfs_drop_snapshot+0x1c7/0x780
       btrfs_clean_one_deleted_snapshot+0xfb/0x110
       cleaner_kthread+0xd4/0x140
       kthread+0x13a/0x150
       ret_from_fork+0x1f/0x30

other info that might help us debug this:

Chain exists of:
  btrfs-root-00 --> &caching_ctl->mutex --> &fs_info->commit_root_sem

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&fs_info->commit_root_sem);
                               lock(&caching_ctl->mutex);
                               lock(&fs_info->commit_root_sem);
  lock(btrfs-root-00);

 *** DEADLOCK ***

3 locks held by btrfs-cleaner/903:
 #0: ffff8e7fab628838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: cleaner_kthread+0x6e/0x140
 #1: ffff8e7faadac640 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x40b/0x5c0
 #2: ffff8e7fab628a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80

stack backtrace:
CPU: 0 PID: 903 Comm: btrfs-cleaner Not tainted 5.9.0+ #102
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
 dump_stack+0x8b/0xb0
 check_noncircular+0xcf/0xf0
 __lock_acquire+0x1167/0x2150
 ? __bfs+0x42/0x210
 lock_acquire+0xb9/0x3d0
 ? __btrfs_tree_read_lock+0x32/0x170
 down_read_nested+0x43/0x130
 ? __btrfs_tree_read_lock+0x32/0x170
 __btrfs_tree_read_lock+0x32/0x170
 __btrfs_read_lock_root_node+0x3a/0x50
 btrfs_search_slot+0x614/0x9d0
 ? find_held_lock+0x2b/0x80
 btrfs_find_root+0x35/0x1b0
 ? do_raw_spin_unlock+0x4b/0xa0
 btrfs_read_tree_root+0x61/0x120
 btrfs_get_root_ref+0x14b/0x600
 find_parent_nodes+0x3e6/0x1b30
 btrfs_find_all_roots_safe+0xb4/0x130
 btrfs_find_all_roots+0x60/0x80
 btrfs_qgroup_trace_extent_post+0x27/0x40
 btrfs_add_delayed_data_ref+0x3fd/0x460
 btrfs_free_extent+0x42/0x100
 __btrfs_mod_ref+0x1d7/0x2f0
 walk_up_proc+0x11c/0x400
 walk_up_tree+0xf0/0x180
 btrfs_drop_snapshot+0x1c7/0x780
 ? btrfs_clean_one_deleted_snapshot+0x73/0x110
 btrfs_clean_one_deleted_snapshot+0xfb/0x110
 cleaner_kthread+0xd4/0x140
 ? btrfs_alloc_root+0x50/0x50
 kthread+0x13a/0x150
 ? kthread_create_worker_on_cpu+0x40/0x40
 ret_from_fork+0x1f/0x30
BTRFS info (device sdb): disk space caching is enabled
BTRFS info (device sdb): has skinny extents

This happens because qgroups does a backref lookup when we create a delayed ref. From here it may have to look up a root from an indirect ref, which does a normal lookup on the tree_root, which takes the read lock on the tree_root nodes.

To fix this we need to add a variant for looking up roots that searches the commit root of the tree_root. Then when we do the backref search using the commit root we are sure to not take any locks on the tree_root nodes. This gets rid of the lockdep splat when running btrfs/104.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
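Searching the commit root is safe without locks because it is an immutable snapshot: once published, nothing can change under the reader. A minimal sketch of that snapshot-read idea (hypothetical names, not the btrfs API; `MappingProxyType` stands in for the frozen commit root):

```python
import threading
from types import MappingProxyType

class Tree:
    """A live tree guarded by a lock, plus an immutable committed
    snapshot that readers may search without taking any lock
    (loosely analogous to searching a btrfs commit root)."""
    def __init__(self):
        self.lock = threading.Lock()
        self.live = {}
        self.commit_root = MappingProxyType({})

    def insert(self, key, value):
        with self.lock:
            self.live[key] = value

    def commit(self):
        with self.lock:
            # publish an immutable copy; readers holding an old
            # snapshot reference keep seeing a consistent tree
            self.commit_root = MappingProxyType(dict(self.live))

    def search_commit_root(self, key):
        # lock-free: the snapshot can never change underneath us,
        # so no lock ordering against the live tree is created
        return self.commit_root.get(key)
```

The trade-off is the same as in the commit message: the snapshot may lag the live tree by one commit, which is acceptable for the backref lookup but means readers must tolerate slightly stale data.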
Ido Schimmel says: ==================== mlxsw: Various fixes This patch set contains various fixes for mlxsw. Patch #1 ensures that only link modes that are supported by both the device and the driver are advertised. When a link mode that is not supported by the driver is negotiated by the device, it will be presented as an unknown speed by ethtool, causing the bond driver to wrongly assume that the link is down. Patch #2 fixes a trivial memory leak upon module removal. Patch #3 fixes a use-after-free that syzkaller was able to trigger once on a slow emulator after a few months of fuzzing. ==================== Link: https://lore.kernel.org/r/20201024133733.2107509-1-idosch@idosch.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
…/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for v5.10, take #2

- Fix compilation error when PMD and PUD are folded
- Fix regression of the RAZ behaviour of ID_AA64ZFR0_EL1
While doing a memory hot-unplug operation on a PowerPC VM running 1024 CPUs with 11TB of RAM, I hit the following panic:

BUG: Kernel NULL pointer dereference on read at 0x00000007
Faulting instruction address: 0xc000000000456048
Oops: Kernel access of bad area, sig: 11 [#2]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS= 2048 NUMA pSeries
Modules linked in: rpadlpar_io rpaphp
CPU: 160 PID: 1 Comm: systemd Tainted: G D 5.9.0 #1
NIP: c000000000456048 LR: c000000000455fd4 CTR: c00000000047b350
REGS: c00006028d1b77a0 TRAP: 0300 Tainted: G D (5.9.0)
MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24004228 XER: 00000000
CFAR: c00000000000f1b0 DAR: 0000000000000007 DSISR: 40000000 IRQMASK: 0
GPR00: c000000000455fd4 c00006028d1b7a30 c000000001bec800 0000000000000000
GPR04: 0000000000000dc0 0000000000000000 00000000000374ef c00007c53df99320
GPR08: 000007c53c980000 0000000000000000 000007c53c980000 0000000000000000
GPR12: 0000000000004400 c00000001e8e4400 0000000000000000 0000000000000f6a
GPR16: 0000000000000000 c000000001c25930 c000000001d62528 00000000000000c1
GPR20: c000000001d62538 c00006be469e9000 0000000fffffffe0 c0000000003c0ff8
GPR24: 0000000000000018 0000000000000000 0000000000000dc0 0000000000000000
GPR28: c00007c513755700 c000000001c236a4 c00007bc4001f800 0000000000000001
NIP [c000000000456048] __kmalloc_node+0x108/0x790
LR [c000000000455fd4] __kmalloc_node+0x94/0x790
Call Trace:
 kvmalloc_node+0x58/0x110
 mem_cgroup_css_online+0x10c/0x270
 online_css+0x48/0xd0
 cgroup_apply_control_enable+0x2c4/0x470
 cgroup_mkdir+0x408/0x5f0
 kernfs_iop_mkdir+0x90/0x100
 vfs_mkdir+0x138/0x250
 do_mkdirat+0x154/0x1c0
 system_call_exception+0xf8/0x200
 system_call_common+0xf0/0x27c
Instruction dump:
e93e0000 e90d0030 39290008 7cc9402a e94d0030 e93e0000 7ce95214 7f89502a
2fbc0000 419e0018 41920230 e9270010 <89290007> 7f994800 419e0220 7ee6bb78

This points to the following code:

mm/slub.c:2851
		if (unlikely(!object || !node_match(page, node))) {
    c000000000456038:	00 00 bc 2f 	cmpdi   cr7,r28,0
    c00000000045603c:	18 00 9e 41 	beq     cr7,c000000000456054 <__kmalloc_node+0x114>
node_match():
mm/slub.c:2491
	if (node != NUMA_NO_NODE && page_to_nid(page) != node)
    c000000000456040:	30 02 92 41 	beq     cr4,c000000000456270 <__kmalloc_node+0x330>
page_to_nid():
include/linux/mm.h:1294
    c000000000456044:	10 00 27 e9 	ld      r9,16(r7)
    c000000000456048:	07 00 29 89 	lbz     r9,7(r9)	<<<< r9 = NULL
node_match():
mm/slub.c:2491
    c00000000045604c:	00 48 99 7f 	cmpw    cr7,r25,r9
    c000000000456050:	20 02 9e 41 	beq     cr7,c000000000456270 <__kmalloc_node+0x330>

The panic occurred in slab_alloc_node() when checking for the page's node:

	object = c->freelist;
	page = c->page;
	if (unlikely(!object || !node_match(page, node))) {
		object = __slab_alloc(s, gfpflags, node, addr, c);
		stat(s, ALLOC_SLOWPATH);

The issue is that object is not NULL while page is NULL which is odd but may happen if the cache flush happened after loading object but before loading page. Thus checking for the page pointer is required too.

The cache flush is done through an inter-processor interrupt when a piece of memory is off-lined. That interrupt is triggered when a memory hot-unplug operation is initiated and offline_pages() is calling the slub's MEM_GOING_OFFLINE callback slab_mem_going_offline_callback() which is calling flush_cpu_slab(). If that interrupt is caught between the reading of c->freelist and the reading of c->page, this could lead to such a situation. That situation is expected and the later call to this_cpu_cmpxchg_double() will detect the change to c->freelist and redo the whole operation.

In commit 6159d0f ("mm/slub.c: page is always non-NULL in node_match()") the check on the page pointer was removed assuming that page is always valid when it is called. It happens that this is not true in that particular case, so check for page before calling node_match() here.

Fixes: 6159d0f ("mm/slub.c: page is always non-NULL in node_match()")
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Scott Cheloha <cheloha@linux.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201027190406.33283-1-ldufour@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
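The fast path above reads two related per-CPU fields without any lock and relies on a later cmpxchg to detect races, which is why the initial check must validate both fields, not just the object. A toy model of that pattern (all names invented; a Python lock and an identity recheck stand in for this_cpu_cmpxchg_double()):

```python
import threading

class PerCpuCache:
    """Toy model of the slub fast path: 'freelist' and 'page' can be
    cleared together by a flush arriving from another context."""
    def __init__(self):
        self.lock = threading.Lock()
        self.page = {"nid": 0}
        self.freelist = ["obj0", "obj1"]

    def flush(self):
        # like flush_cpu_slab() from the hot-unplug IPI
        with self.lock:
            self.page = None
            self.freelist = None

def slow_path(cache, node):
    # refill, like __slab_alloc() (purely illustrative)
    with cache.lock:
        cache.page = {"nid": node if node is not None else 0}
        cache.freelist = ["new0", "new1"]
        return cache.freelist.pop(0)

def alloc(cache, node):
    while True:
        # snapshot both fields; a flush may land between the two reads
        obj_list = cache.freelist
        page = cache.page
        # check BOTH: the object list may be non-None while page is
        # already None, exactly the race the fix guards against
        if not obj_list or page is None or \
                (node is not None and page["nid"] != node):
            return slow_path(cache, node)
        with cache.lock:
            # recheck under the lock (stand-in for cmpxchg_double);
            # if a flush raced with us, retry from the top
            if cache.freelist is obj_list and cache.page is page:
                return cache.freelist.pop(0)
```

The key property mirrored here is that the cheap unsynchronized check may see a torn pair of values, so it must be safe for every combination, and correctness is restored by the synchronized recheck-and-retry.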
This fix is for a failure that occurred in the DWARF unwind perf test. Stack unwinders may probe memory when looking for frames. Memory sanitizer will poison and track uninitialized memory on the stack, and on the heap if the value is copied to the heap. This can lead to false memory sanitizer failures for the use of an uninitialized value. Avoid this problem by removing the poison on the copied stack.

The full msan failure with track origins looks like:

==2168==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x559ceb10755b in handle_cfi elfutils/libdwfl/frame_unwind.c:648:8
    #1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
    #2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
    #3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
    #4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
    #5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
    #6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
    #7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
    #8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
    #9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
    #10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    #11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    #12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    #13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    #14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    #15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    #16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    #17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    #18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    #19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    #20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    #21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    #22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    #23 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was stored to memory at
    #0 0x559ceb106acf in __libdwfl_frame_reg_set elfutils/libdwfl/frame_unwind.c:77:22
    #1 0x559ceb106acf in handle_cfi elfutils/libdwfl/frame_unwind.c:627:13
    #2 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
    #3 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
    #4 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
    #5 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
    #6 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
    #7 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
    #8 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
    #9 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
    #10 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
    #11 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    #12 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    #13 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    #14 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    #15 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    #16 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    #17 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    #18 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    #19 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    #20 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    #21 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    #22 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    #23 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    #24 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was stored to memory at
    #0 0x559ceb106a54 in handle_cfi elfutils/libdwfl/frame_unwind.c:613:9
    #1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
    #2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
    #3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
    #4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
    #5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
    #6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
    #7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
    #8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
    #9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
    #10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    #11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    #12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    #13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    #14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    #15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    #16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    #17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    #18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    #19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    #20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    #21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    #22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    #23 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was stored to memory at
    #0 0x559ceaff8800 in memory_read tools/perf/util/unwind-libdw.c:156:10
    #1 0x559ceb10f053 in expr_eval elfutils/libdwfl/frame_unwind.c:501:13
    #2 0x559ceb1060cc in handle_cfi elfutils/libdwfl/frame_unwind.c:603:18
    #3 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
    #4 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
    #5 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
    #6 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
    #7 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
    #8 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
    #9 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
    #10 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
    #11 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
    #12 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    #13 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    #14 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    #15 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    #16 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    #17 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    #18 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    #19 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    #20 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    #21 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    #22 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    #23 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    #24 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    #25 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was stored to memory at
    #0 0x559cea9027d9 in __msan_memcpy llvm/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:1558:3
    #1 0x559cea9d2185 in sample_ustack tools/perf/arch/x86/tests/dwarf-unwind.c:41:2
    #2 0x559cea9d202c in test__arch_unwind_sample tools/perf/arch/x86/tests/dwarf-unwind.c:72:9
    #3 0x559ceabc9cbd in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:106:6
    #4 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    #5 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    #6 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    #7 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    #8 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    #9 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    #10 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    #11 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    #12 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    #13 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    #14 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    #15 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    #16 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    #17 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was created by an
allocation of 'bf' in the stack frame of function 'perf_event__synthesize_mmap_events' #0 0x559ceafc5f60 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:445 SUMMARY: MemorySanitizer: use-of-uninitialized-value elfutils/libdwfl/frame_unwind.c:648:8 in handle_cfi Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: clang-built-linux@googlegroups.com Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandeep Dasgupta <sdasgup@google.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20201113182053.754625-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Actually, the burst size is equal to '1 << desc->rqcfg.brst_size'; we should use the burst size, not desc->rqcfg.brst_size.

dma memcpy performance on Rockchip RV1126 @ 1512MHz A7, 1056MHz LPDDR3, 200MHz DMA:

dmatest:
  /# echo dma0chan0 > /sys/module/dmatest/parameters/channel
  /# echo 4194304 > /sys/module/dmatest/parameters/test_buf_size
  /# echo 8 > /sys/module/dmatest/parameters/iterations
  /# echo y > /sys/module/dmatest/parameters/norandom
  /# echo y > /sys/module/dmatest/parameters/verbose
  /# echo 1 > /sys/module/dmatest/parameters/run

  dmatest: dma0chan0-copy0: result #1: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000
  dmatest: dma0chan0-copy0: result #2: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000
  dmatest: dma0chan0-copy0: result #3: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000
  dmatest: dma0chan0-copy0: result #4: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000
  dmatest: dma0chan0-copy0: result #5: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000
  dmatest: dma0chan0-copy0: result #6: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000
  dmatest: dma0chan0-copy0: result #7: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000
  dmatest: dma0chan0-copy0: result #8: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000

Before:
  dmatest: dma0chan0-copy0: summary 8 tests, 0 failures 48 iops 200338 KB/s (0)

After this patch:
  dmatest: dma0chan0-copy0: summary 8 tests, 0 failures 179 iops 734873 KB/s (0)

After this patch and increasing the dma clk to 400MHz:
  dmatest: dma0chan0-copy0: summary 8 tests, 0 failures 259 iops 1062929 KB/s (0)

Signed-off-by: Sugar Zhang <sugar.zhang@rock-chips.com>
Link: https://lore.kernel.org/r/1605326106-55681-1-git-send-email-sugar.zhang@rock-chips.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Ido Schimmel says:

====================
mlxsw: Couple of fixes

Patch #1 fixes firmware flashing when CONFIG_MLXSW_CORE=y and CONFIG_MLXFW=m.

Patch #2 prevents EMAD transactions from needlessly failing when the system is under heavy load by using exponential backoff.

Please consider patch #2 for stable.
====================

Link: https://lore.kernel.org/r/20201117173352.288491-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
…asid()

While digesting the XSAVE-related horrors which got introduced with the supervisor/user split, the recent addition of ENQCMD-related functionality got on the radar and turned out to be similarly broken.

update_pasid(), which is only required when X86_FEATURE_ENQCMD is available, is invoked from two places:

 1) From switch_to() for the incoming task

 2) Via a SMP function call from the IOMMU/SVM code

#1 is half-ways correct as it hacks around the brokenness of get_xsave_addr() by enforcing the state to be 'present', but all the conditionals in that code are completely pointless for that. Also the invocation is just useless overhead because at that point it's guaranteed that TIF_NEED_FPU_LOAD is set on the incoming task and all of this can be handled at return to user space.

#2 is broken beyond repair. The comment in the code claims that it is safe to invoke this in an IPI, but that's just wishful thinking. FPU state of a running task is protected by fregs_lock() which is nothing else than a local_bh_disable(). As BH-disabled regions run usually with interrupts enabled the IPI can hit a code section which modifies FPU state and there is absolutely no guarantee that any of the assumptions which are made for the IPI case is true.

Also the IPI is sent to all CPUs in mm_cpumask(mm), but the IPI is invoked with a NULL pointer argument, so it can hit a completely unrelated task and unconditionally force an update for nothing. Worse, it can hit a kernel thread which operates on a user space address space and set a random PASID for it.

The offending commit does not cleanly revert, but it's sufficient to force disable X86_FEATURE_ENQCMD and to remove the broken update_pasid() code to make this dysfunctional all over the place. Anything more complex would require more surgery and none of the related functions outside of the x86 core code are blatantly wrong, so removing those would be overkill.

As nothing enables the PASID bit in the IA32_XSS MSR yet, which is required to make this actually work, this cannot result in a regression except for related out of tree train-wrecks, but they are broken already today.

Fixes: 20f0afd ("x86/mmu: Allocate/free a PASID")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/87mtsd6gr9.ffs@nanos.tec.linutronix.de
ASan reported a memory leak caused by info_linear not being deallocated. The info_linear was allocated in perf_event__synthesize_one_bpf_prog(). This patch adds the corresponding free() when bpf_prog_info_node is freed in perf_env__purge_bpf().

  $ sudo ./perf record -- sleep 5
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.025 MB perf.data (8 samples) ]

  =================================================================
  ==297735==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 7688 byte(s) in 19 object(s) allocated from:
    #0 0x4f420f in malloc (/home/user/linux/tools/perf/perf+0x4f420f)
    #1 0xc06a74 in bpf_program__get_prog_info_linear /home/user/linux/tools/lib/bpf/libbpf.c:11113:16
    #2 0xb426fe in perf_event__synthesize_one_bpf_prog /home/user/linux/tools/perf/util/bpf-event.c:191:16
    #3 0xb42008 in perf_event__synthesize_bpf_events /home/user/linux/tools/perf/util/bpf-event.c:410:9
    #4 0x594596 in record__synthesize /home/user/linux/tools/perf/builtin-record.c:1490:8
    #5 0x58c9ac in __cmd_record /home/user/linux/tools/perf/builtin-record.c:1798:8
    #6 0x58990b in cmd_record /home/user/linux/tools/perf/builtin-record.c:2901:8
    #7 0x7b2a20 in run_builtin /home/user/linux/tools/perf/perf.c:313:11
    #8 0x7b12ff in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8
    #9 0x7b2583 in run_argv /home/user/linux/tools/perf/perf.c:409:2
    #10 0x7b0d79 in main /home/user/linux/tools/perf/perf.c:539:3
    #11 0x7fa357ef6b74 in __libc_start_main /usr/src/debug/glibc-2.33-8.fc34.x86_64/csu/../csu/libc-start.c:332:16

Signed-off-by: Riccardo Mancini <rickyman7@gmail.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Link: http://lore.kernel.org/lkml/20210602224024.300485-1-rickyman7@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add the following Telit FD980 composition 0x1056:

Cfg #1: mass storage
Cfg #2: rndis, tty, adb, tty, tty, tty, tty

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Link: https://lore.kernel.org/r/20210803194711.3036-1-dnlplm@gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
Often some test cases like btrfs/161 trigger lockdep splats that complain about a possible unsafe lock scenario, due to the fact that during mount, when reading the chunk tree, we end up calling blkdev_get_by_path() while holding a read lock on a leaf of the chunk tree. That produces a lockdep splat like the following:

[ 3653.683975] ======================================================
[ 3653.685148] WARNING: possible circular locking dependency detected
[ 3653.686301] 5.15.0-rc7-btrfs-next-103 #1 Not tainted
[ 3653.687239] ------------------------------------------------------
[ 3653.688400] mount/447465 is trying to acquire lock:
[ 3653.689320] ffff8c6b0c76e528 (&disk->open_mutex){+.+.}-{3:3}, at: blkdev_get_by_dev.part.0+0xe7/0x320
[ 3653.691054] but task is already holding lock:
[ 3653.692155] ffff8c6b0a9f39e0 (btrfs-chunk-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x24/0x110 [btrfs]
[ 3653.693978] which lock already depends on the new lock.
[ 3653.695510] the existing dependency chain (in reverse order) is:
[ 3653.696915] -> #3 (btrfs-chunk-00){++++}-{3:3}:
[ 3653.698053]        down_read_nested+0x4b/0x140
[ 3653.698893]        __btrfs_tree_read_lock+0x24/0x110 [btrfs]
[ 3653.699988]        btrfs_read_lock_root_node+0x31/0x40 [btrfs]
[ 3653.701205]        btrfs_search_slot+0x537/0xc00 [btrfs]
[ 3653.702234]        btrfs_insert_empty_items+0x32/0x70 [btrfs]
[ 3653.703332]        btrfs_init_new_device+0x563/0x15b0 [btrfs]
[ 3653.704439]        btrfs_ioctl+0x2110/0x3530 [btrfs]
[ 3653.705405]        __x64_sys_ioctl+0x83/0xb0
[ 3653.706215]        do_syscall_64+0x3b/0xc0
[ 3653.706990]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3653.708040] -> #2 (sb_internal#2){.+.+}-{0:0}:
[ 3653.708994]        lock_release+0x13d/0x4a0
[ 3653.709533]        up_write+0x18/0x160
[ 3653.710017]        btrfs_sync_file+0x3f3/0x5b0 [btrfs]
[ 3653.710699]        __loop_update_dio+0xbd/0x170 [loop]
[ 3653.711360]        lo_ioctl+0x3b1/0x8a0 [loop]
[ 3653.711929]        block_ioctl+0x48/0x50
[ 3653.712442]        __x64_sys_ioctl+0x83/0xb0
[ 3653.712991]        do_syscall_64+0x3b/0xc0
[ 3653.713519]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3653.714233] -> #1 (&lo->lo_mutex){+.+.}-{3:3}:
[ 3653.715026]        __mutex_lock+0x92/0x900
[ 3653.715648]        lo_open+0x28/0x60 [loop]
[ 3653.716275]        blkdev_get_whole+0x28/0x90
[ 3653.716867]        blkdev_get_by_dev.part.0+0x142/0x320
[ 3653.717537]        blkdev_open+0x5e/0xa0
[ 3653.718043]        do_dentry_open+0x163/0x390
[ 3653.718604]        path_openat+0x3f0/0xa80
[ 3653.719128]        do_filp_open+0xa9/0x150
[ 3653.719652]        do_sys_openat2+0x97/0x160
[ 3653.720197]        __x64_sys_openat+0x54/0x90
[ 3653.720766]        do_syscall_64+0x3b/0xc0
[ 3653.721285]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3653.721986] -> #0 (&disk->open_mutex){+.+.}-{3:3}:
[ 3653.722775]        __lock_acquire+0x130e/0x2210
[ 3653.723348]        lock_acquire+0xd7/0x310
[ 3653.723867]        __mutex_lock+0x92/0x900
[ 3653.724394]        blkdev_get_by_dev.part.0+0xe7/0x320
[ 3653.725041]        blkdev_get_by_path+0xb8/0xd0
[ 3653.725614]        btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs]
[ 3653.726332]        open_fs_devices+0xd7/0x2c0 [btrfs]
[ 3653.726999]        btrfs_read_chunk_tree+0x3ad/0x870 [btrfs]
[ 3653.727739]        open_ctree+0xb8e/0x17bf [btrfs]
[ 3653.728384]        btrfs_mount_root.cold+0x12/0xde [btrfs]
[ 3653.729130]        legacy_get_tree+0x30/0x50
[ 3653.729676]        vfs_get_tree+0x28/0xc0
[ 3653.730192]        vfs_kern_mount.part.0+0x71/0xb0
[ 3653.730800]        btrfs_mount+0x11d/0x3a0 [btrfs]
[ 3653.731427]        legacy_get_tree+0x30/0x50
[ 3653.731970]        vfs_get_tree+0x28/0xc0
[ 3653.732486]        path_mount+0x2d4/0xbe0
[ 3653.732997]        __x64_sys_mount+0x103/0x140
[ 3653.733560]        do_syscall_64+0x3b/0xc0
[ 3653.734080]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3653.734782] other info that might help us debug this:
[ 3653.735784] Chain exists of: &disk->open_mutex --> sb_internal#2 --> btrfs-chunk-00
[ 3653.737123] Possible unsafe locking scenario:
[ 3653.737865]        CPU0                    CPU1
[ 3653.738435]        ----                    ----
[ 3653.739007]   lock(btrfs-chunk-00);
[ 3653.739449]                                lock(sb_internal#2);
[ 3653.740193]                                lock(btrfs-chunk-00);
[ 3653.740955]   lock(&disk->open_mutex);
[ 3653.741431] *** DEADLOCK ***
[ 3653.742176] 3 locks held by mount/447465:
[ 3653.742739]  #0: ffff8c6acf85c0e8 (&type->s_umount_key#44/1){+.+.}-{3:3}, at: alloc_super+0xd5/0x3b0
[ 3653.744114]  #1: ffffffffc0b28f70 (uuid_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x59/0x870 [btrfs]
[ 3653.745563]  #2: ffff8c6b0a9f39e0 (btrfs-chunk-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x24/0x110 [btrfs]
[ 3653.747066] stack backtrace:
[ 3653.747723] CPU: 4 PID: 447465 Comm: mount Not tainted 5.15.0-rc7-btrfs-next-103 #1
[ 3653.748873] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 3653.750592] Call Trace:
[ 3653.750967]  dump_stack_lvl+0x57/0x72
[ 3653.751526]  check_noncircular+0xf3/0x110
[ 3653.752136]  ? stack_trace_save+0x4b/0x70
[ 3653.752748]  __lock_acquire+0x130e/0x2210
[ 3653.753356]  lock_acquire+0xd7/0x310
[ 3653.753898]  ? blkdev_get_by_dev.part.0+0xe7/0x320
[ 3653.754596]  ? lock_is_held_type+0xe8/0x140
[ 3653.755125]  ? blkdev_get_by_dev.part.0+0xe7/0x320
[ 3653.755729]  ? blkdev_get_by_dev.part.0+0xe7/0x320
[ 3653.756338]  __mutex_lock+0x92/0x900
[ 3653.756794]  ? blkdev_get_by_dev.part.0+0xe7/0x320
[ 3653.757400]  ? do_raw_spin_unlock+0x4b/0xa0
[ 3653.757930]  ? _raw_spin_unlock+0x29/0x40
[ 3653.758437]  ? bd_prepare_to_claim+0x129/0x150
[ 3653.758999]  ? trace_module_get+0x2b/0xd0
[ 3653.759508]  ? try_module_get.part.0+0x50/0x80
[ 3653.760072]  blkdev_get_by_dev.part.0+0xe7/0x320
[ 3653.760661]  ? devcgroup_check_permission+0xc1/0x1f0
[ 3653.761288]  blkdev_get_by_path+0xb8/0xd0
[ 3653.761797]  btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs]
[ 3653.762454]  open_fs_devices+0xd7/0x2c0 [btrfs]
[ 3653.763055]  ? clone_fs_devices+0x8f/0x170 [btrfs]
[ 3653.763689]  btrfs_read_chunk_tree+0x3ad/0x870 [btrfs]
[ 3653.764370]  ? kvm_sched_clock_read+0x14/0x40
[ 3653.764922]  open_ctree+0xb8e/0x17bf [btrfs]
[ 3653.765493]  ? super_setup_bdi_name+0x79/0xd0
[ 3653.766043]  btrfs_mount_root.cold+0x12/0xde [btrfs]
[ 3653.766780]  ? rcu_read_lock_sched_held+0x3f/0x80
[ 3653.767488]  ? kfree+0x1f2/0x3c0
[ 3653.767979]  legacy_get_tree+0x30/0x50
[ 3653.768548]  vfs_get_tree+0x28/0xc0
[ 3653.769076]  vfs_kern_mount.part.0+0x71/0xb0
[ 3653.769718]  btrfs_mount+0x11d/0x3a0 [btrfs]
[ 3653.770381]  ? rcu_read_lock_sched_held+0x3f/0x80
[ 3653.771086]  ? kfree+0x1f2/0x3c0
[ 3653.771574]  legacy_get_tree+0x30/0x50
[ 3653.772136]  vfs_get_tree+0x28/0xc0
[ 3653.772673]  path_mount+0x2d4/0xbe0
[ 3653.773201]  __x64_sys_mount+0x103/0x140
[ 3653.773793]  do_syscall_64+0x3b/0xc0
[ 3653.774333]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3653.775094] RIP: 0033:0x7f648bc45aaa

This happens because btrfs_read_chunk_tree(), which is called only during mount, ends up acquiring the mutex open_mutex of a block device while holding a read lock on a leaf of the chunk tree, while other paths need to acquire other locks before locking extent buffers of the chunk tree.

Since at mount time when we call btrfs_read_chunk_tree() we know that we don't have other tasks running in parallel and modifying the chunk tree, we can simply skip locking of chunk tree extent buffers. So do that and move the assertion that checks the fs is not yet mounted to the top block of btrfs_read_chunk_tree(), with a comment before doing it.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In thread__comm_len(), strlen() is called outside of the thread->comm_lock critical section, which may cause a UAF problem if comm__free() is called by the process_thread concurrently.

The backtrace of the core file is as follows:

  (gdb) bt
  #0  __strlen_evex () at ../sysdeps/x86_64/multiarch/strlen-evex.S:77
  #1  0x000055ad15d31de5 in thread__comm_len (thread=0x7f627d20e300) at util/thread.c:320
  #2  0x000055ad15d4fade in hists__calc_col_len (h=0x7f627d295940, hists=0x55ad1772bfe0) at util/hist.c:103
  #3  hists__calc_col_len (hists=0x55ad1772bfe0, h=0x7f627d295940) at util/hist.c:79
  #4  0x000055ad15d52c8c in output_resort (hists=hists@entry=0x55ad1772bfe0, prog=0x0, use_callchain=false, cb=cb@entry=0x0, cb_arg=0x0) at util/hist.c:1926
  #5  0x000055ad15d530a4 in evsel__output_resort_cb (evsel=evsel@entry=0x55ad1772bde0, prog=prog@entry=0x0, cb=cb@entry=0x0, cb_arg=cb_arg@entry=0x0) at util/hist.c:1945
  #6  0x000055ad15d53110 in evsel__output_resort (evsel=evsel@entry=0x55ad1772bde0, prog=prog@entry=0x0) at util/hist.c:1950
  #7  0x000055ad15c6ae9a in perf_top__resort_hists (t=t@entry=0x7ffcd9cbf4f0) at builtin-top.c:311
  #8  0x000055ad15c6cc6d in perf_top__print_sym_table (top=0x7ffcd9cbf4f0) at builtin-top.c:346
  #9  display_thread (arg=0x7ffcd9cbf4f0) at builtin-top.c:700
  #10 0x00007f6282fab4fa in start_thread (arg=<optimized out>) at pthread_create.c:443
  #11 0x00007f628302e200 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

The reason is that strlen() gets a pointer to memory that has already been freed. The string pointer is stored in the structure comm_str, which corresponds to an rb_tree node; when the node is erased, the memory of the string is also freed. In thread__comm_len(), the pointer is obtained within the thread->comm_lock critical section, but passed to strlen() outside of it, and the perf process_thread may call comm__free() concurrently, causing this segfault.

The process is as follows:

  display_thread                        process_thread
  --------------                        --------------
  thread__comm_len
    -> thread__comm_str
       # held the comm read lock
    -> __thread__comm_str(thread)
       # released the comm read lock
                                        thread__delete
                                          # held the comm write lock
                                          -> comm__free
                                            -> comm_str__put(comm->comm_str)
                                              -> zfree(&cs->str)
                                          # released the comm write lock
  # The memory of the string pointed to by comm has been freed.
    -> thread->comm_len = strlen(comm);

This patch expands the critical section of thread->comm_lock in thread__comm_len(), so that strlen() is called safely.

Signed-off-by: Wenyu Liu <liuwenyu7@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Feilong Lin <linfeilong@huawei.com>
Cc: Hewenliang <hewenliang4@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Yunfeng Ye <yeyunfeng@huawei.com>
Link: https://lore.kernel.org/r/322bfb49-840b-f3b6-9ef1-f9ec3435b07e@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
I got a report of a msan failure like below:

  $ sudo perf lock con -ab -- sleep 1
  ...
  ==224416==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x5651160d6c96 in lock_contention_read util/bpf_lock_contention.c:290:8
    #1 0x565115f90870 in __cmd_contention builtin-lock.c:1919:3
    #2 0x565115f90870 in cmd_lock builtin-lock.c:2385:8
    #3 0x565115f03a83 in run_builtin perf.c:330:11
    #4 0x565115f03756 in handle_internal_command perf.c:384:8
    #5 0x565115f02d53 in run_argv perf.c:428:2
    #6 0x565115f02d53 in main perf.c:562:3
    #7 0x7f43553bc632 in __libc_start_main
    #8 0x565115e865a9 in _start

It was because the 'key' variable is not initialized. Actually it'd be set by bpf_map_get_next_key() but msan didn't seem to understand it. Let's make msan happy by initializing the variable.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230324001922.937634-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Seen in "perf stat --bpf-counters --for-each-cgroup test" running in a container:

  libbpf: Failed to bump RLIMIT_MEMLOCK (err = -1), you might need to do it explicitly!
  libbpf: Error in bpf_object__probe_loading():Operation not permitted(1). Couldn't load trivial BPF program. Make sure your kernel supports BPF (CONFIG_BPF_SYSCALL=y) and/or that RLIMIT_MEMLOCK is set to big enough value.
  libbpf: failed to load object 'bperf_cgroup_bpf'
  libbpf: failed to load BPF skeleton 'bperf_cgroup_bpf': -1
  Failed to load cgroup skeleton
    #0 0x55f28a650981 in list_empty tools/include/linux/list.h:189
    #1 0x55f28a6593b4 in evsel__exit util/evsel.c:1518
    #2 0x55f28a6596af in evsel__delete util/evsel.c:1544
    #3 0x55f28a89d166 in bperf_cgrp__destroy util/bpf_counter_cgroup.c:283
    #4 0x55f28a899e9a in bpf_counter__destroy util/bpf_counter.c:816
    #5 0x55f28a659455 in evsel__exit util/evsel.c:1520
    #6 0x55f28a6596af in evsel__delete util/evsel.c:1544
    #7 0x55f28a640d4d in evlist__purge util/evlist.c:148
    #8 0x55f28a640ea6 in evlist__delete util/evlist.c:169
    #9 0x55f28a4efbf2 in cmd_stat tools/perf/builtin-stat.c:2598
    #10 0x55f28a6050c2 in run_builtin tools/perf/perf.c:330
    #11 0x55f28a605633 in handle_internal_command tools/perf/perf.c:384
    #12 0x55f28a6059fb in run_argv tools/perf/perf.c:428
    #13 0x55f28a6061d3 in main tools/perf/perf.c:562

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230410205659.3131608-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
…us union field

If bperf (perf tools that use BPF skels) sets evsel->leader_skel or evsel->follower_skel then it appears that evsel->bpf_skel is set and can trigger the following use-after-free:

  ==13575==ERROR: AddressSanitizer: heap-use-after-free on address 0x60c000014080 at pc 0x55684b939880 bp 0x7ffdfcf30d70 sp 0x7ffdfcf30d68
  READ of size 8 at 0x60c000014080 thread T0
    #0 0x55684b93987f in sample_filter_bpf__destroy tools/perf/bpf_skel/sample_filter.skel.h:44:11
    #1 0x55684b93987f in perf_bpf_filter__destroy tools/perf/util/bpf-filter.c:155:2
    #2 0x55684b98f71e in evsel__exit tools/perf/util/evsel.c:1521:2
    #3 0x55684b98a352 in evsel__delete tools/perf/util/evsel.c:1547:2
    #4 0x55684b981918 in evlist__purge tools/perf/util/evlist.c:148:3
    #5 0x55684b981918 in evlist__delete tools/perf/util/evlist.c:169:2
    #6 0x55684b887d60 in cmd_stat tools/perf/builtin-stat.c:2598:2
    ..

  0x60c000014080 is located 0 bytes inside of 128-byte region [0x60c000014080,0x60c000014100)
  freed by thread T0 here:
    #0 0x55684b780e86 in free compiler-rt/lib/asan/asan_malloc_linux.cpp:52:3
    #1 0x55684b9462da in bperf_cgroup_bpf__destroy tools/perf/bpf_skel/bperf_cgroup.skel.h:61:2
    #2 0x55684b9462da in bperf_cgrp__destroy tools/perf/util/bpf_counter_cgroup.c:282:2
    #3 0x55684b944c75 in bpf_counter__destroy tools/perf/util/bpf_counter.c:819:2
    #4 0x55684b98f716 in evsel__exit tools/perf/util/evsel.c:1520:2
    #5 0x55684b98a352 in evsel__delete tools/perf/util/evsel.c:1547:2
    #6 0x55684b981918 in evlist__purge tools/perf/util/evlist.c:148:3
    #7 0x55684b981918 in evlist__delete tools/perf/util/evlist.c:169:2
    #8 0x55684b887d60 in cmd_stat tools/perf/builtin-stat.c:2598:2
    ...

  previously allocated by thread T0 here:
    #0 0x55684b781338 in calloc compiler-rt/lib/asan/asan_malloc_linux.cpp:77:3
    #1 0x55684b944e25 in bperf_cgroup_bpf__open_opts tools/perf/bpf_skel/bperf_cgroup.skel.h:73:35
    #2 0x55684b944e25 in bperf_cgroup_bpf__open tools/perf/bpf_skel/bperf_cgroup.skel.h:97:9
    #3 0x55684b944e25 in bperf_load_program tools/perf/util/bpf_counter_cgroup.c:55:9
    #4 0x55684b944e25 in bperf_cgrp__load tools/perf/util/bpf_counter_cgroup.c:178:23
    #5 0x55684b889289 in __run_perf_stat tools/perf/builtin-stat.c:713:7
    #6 0x55684b889289 in run_perf_stat tools/perf/builtin-stat.c:949:8
    #7 0x55684b888029 in cmd_stat tools/perf/builtin-stat.c:2537:12

Resolve by clearing 'evsel->bpf_skel' as part of bpf_counter__destroy().

Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Link: http://lore.kernel.org/lkml/20230411051718.267228-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
gpi_ch_init() doesn't lock the ctrl_lock mutex, so there is no need to unlock it too. Instead the mutex is handled by the function gpi_alloc_chan_resources(), which properly locks and unlocks the mutex.

  =====================================
  WARNING: bad unlock balance detected!
  6.3.0-rc5-00253-g99792582ded1-dirty #15 Not tainted
  -------------------------------------
  kworker/u16:0/9 is trying to release lock (&gpii->ctrl_lock) at:
  [<ffffb99d04e1284c>] gpi_alloc_chan_resources+0x108/0x5bc
  but there are no more locks to release!

  other info that might help us debug this:
  6 locks held by kworker/u16:0/9:
   #0: ffff575740010938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x220/0x594
   #1: ffff80000809bdd0 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x220/0x594
   #2: ffff575740f2a0f8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x38/0x188
   #3: ffff57574b5570f8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x38/0x188
   #4: ffffb99d06a2f180 (of_dma_lock){+.+.}-{3:3}, at: of_dma_request_slave_channel+0x138/0x280
   #5: ffffb99d06a2ee20 (dma_list_mutex){+.+.}-{3:3}, at: dma_get_slave_channel+0x28/0x10c

  stack backtrace:
  CPU: 7 PID: 9 Comm: kworker/u16:0 Not tainted 6.3.0-rc5-00253-g99792582ded1-dirty #15
  Hardware name: Google Pixel 3 (DT)
  Workqueue: events_unbound deferred_probe_work_func
  Call trace:
   dump_backtrace+0xa0/0xfc
   show_stack+0x18/0x24
   dump_stack_lvl+0x60/0xac
   dump_stack+0x18/0x24
   print_unlock_imbalance_bug+0x130/0x148
   lock_release+0x270/0x300
   __mutex_unlock_slowpath+0x48/0x2cc
   mutex_unlock+0x20/0x2c
   gpi_alloc_chan_resources+0x108/0x5bc
   dma_chan_get+0x84/0x188
   dma_get_slave_channel+0x5c/0x10c
   gpi_of_dma_xlate+0x110/0x1a0
   of_dma_request_slave_channel+0x174/0x280
   dma_request_chan+0x3c/0x2d4
   geni_i2c_probe+0x544/0x63c
   platform_probe+0x68/0xc4
   really_probe+0x148/0x2ac
   __driver_probe_device+0x78/0xe0
   driver_probe_device+0x3c/0x160
   __device_attach_driver+0xb8/0x138
   bus_for_each_drv+0x84/0xe0
   __device_attach+0x9c/0x188
   device_initial_probe+0x14/0x20
   bus_probe_device+0xac/0xb0
   device_add+0x60c/0x7d8
   of_device_add+0x44/0x60
   of_platform_device_create_pdata+0x90/0x124
   of_platform_bus_create+0x15c/0x3c8
   of_platform_populate+0x58/0xf8
   devm_of_platform_populate+0x58/0xbc
   geni_se_probe+0xf0/0x164
   platform_probe+0x68/0xc4
   really_probe+0x148/0x2ac
   __driver_probe_device+0x78/0xe0
   driver_probe_device+0x3c/0x160
   __device_attach_driver+0xb8/0x138
   bus_for_each_drv+0x84/0xe0
   __device_attach+0x9c/0x188
   device_initial_probe+0x14/0x20
   bus_probe_device+0xac/0xb0
   deferred_probe_work_func+0x8c/0xc8
   process_one_work+0x2bc/0x594
   worker_thread+0x228/0x438
   kthread+0x108/0x10c
   ret_from_fork+0x10/0x20

Fixes: 5d0c353 ("dmaengine: qcom: Add GPI dma driver")
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20230409233355.453741-1-dmitry.baryshkov@linaro.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Hayes Wang says:

====================
r8152: fix 2.5G devices

v3: For patch #2, modify the comment.

v2: For patch #1, remove inline for fc_pause_on_auto() and fc_pause_off_auto(), and update the commit message. For patch #2, define the magic value for OCP register 0xa424.

v1: These patches are used to fix some issues of RTL8156.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Sai Krishna says:

====================
octeontx2: Miscellaneous fixes

This patchset includes the following fixes.

Patch #1: Fix the race condition while updating the APR table
Patch #2: Fix the end bit position in the NPC scan config
Patch #3: Fix the depth of the CAM and MEM table entries
Patch #4: Increase the size of DMAC filter flows
Patch #5: Fix a driver crash resulting from invalid interface type information retrieved from firmware
Patch #6: Fix an incorrect mask used while installing filters involving fragmented packets
Patch #7: Fixes for NPC field hash extract w.r.t. IPV6 hash reduction and IPV6 field hash configuration
Patch #8: Fixes for the NPC hardware parser configuration destination address hash and IPV6 endianness issues
Patch #9: Skip mbox initialization for PFs disabled by firmware
Patch #10: Disable packet I/O in case of mailbox timeout
Patch #11: Detach LF resources in case of VF probe failure
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Demotion reloaded, without migration