Minor improvement for smsc95xx netusb driver performance. #139

SiarheiVolkau · 2014-11-26T05:42:44Z

Reduce number of memcpy's by 1-2 improve transmit performance by 2-4%, or reduce cpu usage on a comparable value.

Reduce number of memcpy's by 1-2 improve transmit performance by 2-4% or reduce cpu usage on a comparable value.

Corrects the following checkpatch gripes: WARNING: quoted string split across lines torvalds#95: FILE: drivers/mfd/ab3100-core.c:95: + "write error (write register) " + "%d bytes transferred (expected 2)\n", WARNING: quoted string split across lines torvalds#139: FILE: drivers/mfd/ab3100-core.c:139: + "write error (write test register) " + "%d bytes transferred (expected 2)\n", WARNING: quoted string split across lines torvalds#175: FILE: drivers/mfd/ab3100-core.c:175: + "write error (send register address) " + "%d bytes transferred (expected 1)\n", WARNING: quoted string split across lines torvalds#193: FILE: drivers/mfd/ab3100-core.c:193: + "write error (read register) " + "%d bytes transferred (expected 1)\n", WARNING: quoted string split across lines torvalds#241: FILE: drivers/mfd/ab3100-core.c:241: + "write error (send first register address) " + "%d bytes transferred (expected 1)\n", WARNING: quoted string split across lines torvalds#256: FILE: drivers/mfd/ab3100-core.c:256: + "write error (read register page) " + "%d bytes transferred (expected %d)\n", WARNING: quoted string split across lines torvalds#299: FILE: drivers/mfd/ab3100-core.c:299: + "write error (maskset send address) " + "%d bytes transferred (expected 1)\n", WARNING: quoted string split across lines torvalds#314: FILE: drivers/mfd/ab3100-core.c:314: + "write error (maskset read register) " + "%d bytes transferred (expected 1)\n", WARNING: quoted string split across lines torvalds#334: FILE: drivers/mfd/ab3100-core.c:334: + "write error (write register) " + "%d bytes transferred (expected 2)\n", WARNING: please, no spaces at the start of a line torvalds#374: FILE: drivers/mfd/ab3100-core.c:374: + return blocking_notifier_chain_unregister(&ab3100->event_subscribers,$ WARNING: Prefer seq_puts to seq_printf torvalds#458: FILE: drivers/mfd/ab3100-core.c:458: + seq_printf(s, "AB3100 registers:\n"); WARNING: quoted string split across lines torvalds#564: FILE: drivers/mfd/ab3100-core.c:564: + "debug write reg[0x%02x] with 0x%02x, " + "after readback: 0x%02x\n", WARNING: quoted string split across lines torvalds#723: FILE: drivers/mfd/ab3100-core.c:723: + "AB3100 P1E variant detected, " + "forcing chip to 32KHz\n"); WARNING: quoted string split across lines torvalds#882: FILE: drivers/mfd/ab3100-core.c:882: + "could not communicate with the AB3100 analog " + "baseband chip\n"); WARNING: quoted string split across lines torvalds#906: FILE: drivers/mfd/ab3100-core.c:906: + dev_err(&client->dev, "accepting it anyway. Please update " + "the driver.\n"); total: 0 errors, 15 warnings, 999 lines checked Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Lee Jones <lee.jones@linaro.org>

If EIO happens after we have dropped j_state_lock, we won't notice that the journal has been aborted. So it is reasonable to move this check after we have grabbed the j_checkpoint_mutex and re-grabbed the j_state_lock. This patch helps to prevent false positive complain after EIO. #DMESG: __jbd2_log_wait_for_space: needed 8448 blocks and only had 8386 space available __jbd2_log_wait_for_space: no way to get more journal space in ram1-8 ------------[ cut here ]------------ WARNING: CPU: 15 PID: 6739 at fs/jbd2/checkpoint.c:168 __jbd2_log_wait_for_space+0x188/0x200() Modules linked in: brd iTCO_wdt lpc_ich mfd_core igb ptp dm_mirror dm_region_hash dm_log dm_mod CPU: 15 PID: 6739 Comm: fsstress Tainted: G W 3.17.0-rc2-00429-g684de57 torvalds#139 Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011 00000000000000a8 ffff88077aaab878 ffffffff815c1a8c 00000000000000a8 0000000000000000 ffff88077aaab8b8 ffffffff8106ce8c ffff88077aaab898 ffff8807c57e6000 ffff8807c57e6028 0000000000002100 ffff8807c57e62f0 Call Trace: [<ffffffff815c1a8c>] dump_stack+0x51/0x6d [<ffffffff8106ce8c>] warn_slowpath_common+0x8c/0xc0 [<ffffffff8106ceda>] warn_slowpath_null+0x1a/0x20 [<ffffffff812419f8>] __jbd2_log_wait_for_space+0x188/0x200 [<ffffffff8123be9a>] start_this_handle+0x4da/0x7b0 [<ffffffff810990e5>] ? local_clock+0x25/0x30 [<ffffffff810aba87>] ? lockdep_init_map+0xe7/0x180 [<ffffffff8123c5bc>] jbd2__journal_start+0xdc/0x1d0 [<ffffffff811f2414>] ? __ext4_new_inode+0x7f4/0x1330 [<ffffffff81222a38>] __ext4_journal_start_sb+0xf8/0x110 [<ffffffff811f2414>] __ext4_new_inode+0x7f4/0x1330 [<ffffffff810ac359>] ? lock_release_holdtime+0x29/0x190 [<ffffffff812025bb>] ext4_create+0x8b/0x150 [<ffffffff8117fe3b>] vfs_create+0x7b/0xb0 [<ffffffff8118097b>] do_last+0x7db/0xcf0 [<ffffffff8117e31d>] ? inode_permission+0x4d/0x50 [<ffffffff811845d2>] path_openat+0x242/0x590 [<ffffffff81191a76>] ? __alloc_fd+0x36/0x140 [<ffffffff81184a6a>] do_filp_open+0x4a/0xb0 [<ffffffff81191b61>] ? __alloc_fd+0x121/0x140 [<ffffffff81172f20>] do_sys_open+0x170/0x220 [<ffffffff8117300e>] SyS_open+0x1e/0x20 [<ffffffff811715d6>] SyS_creat+0x16/0x20 [<ffffffff815c7e12>] system_call_fastpath+0x16/0x1b ---[ end trace cd71c831f82059db ]--- Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Elizafox · 2015-01-08T13:48:36Z

Linus doesn't accept pull requests from GitHub. Consult https://github.com/torvalds/linux/blob/master/Documentation/HOWTO instead.

Stress testing which injects soft offlining events for the process which iterates "mmap-pagefault-munmap" loop can trigger BUG_ON in __free_one_page due to PageHWPoison flag. If page migration succeeds, the source page is supposed to be freed after migration. But it seems that there can be a strange page state where it's almost a free page, but it is somewhat reachable via drain_pages_zone. maybe due to a race between __pagevec_lru_add_fn and putback_lru_page, there could be [ 14.025761] Soft offlining page 0x70fe1 at 0x70100008d000 [ 14.029400] Soft offlining page 0x705fb at 0x70300008d000 [ 14.030208] page:ffffea0001c3f840 count:0 mapcount:0 mapping: (null) index:0x2 [ 14.031186] flags: 0x1fffff80800000(hwpoison) [ 14.031186] page dumped because: VM_BUG_ON_PAGE(page->flags & ((1 << 25) - 1)) [ 14.031186] ------------[ cut here ]------------ [ 14.031186] kernel BUG at /src/linux-dev/mm/page_alloc.c:585! [ 14.031186] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC [ 14.031186] Modules linked in: cfg80211 rfkill crc32c_intel microcode ppdev parport_pc pcspkr serio_raw virtio_balloon parport i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi floppy [ 14.031186] CPU: 3 PID: 1779 Comm: test_base_madv_ Not tainted 4.0.0-v4.0-150511-1451-00009-g82360a3730e6 torvalds#139 [ 14.031186] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 14.031186] task: ffff88007d33bae0 ti: ffff88007a114000 task.ti: ffff88007a114000 [ 14.031186] RIP: 0010:[<ffffffff811a806a>] [<ffffffff811a806a>] free_pcppages_bulk+0x52a/0x6f0 [ 14.031186] RSP: 0018:ffff88007a117d28 EFLAGS: 00010096 [ 14.031186] RAX: 0000000000000042 RBX: ffffea0001c3f860 RCX: 0000000000000006 [ 14.031186] RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffff88011f50d3d0 [ 14.031186] RBP: ffff88007a117da8 R08: 000000000000000a R09: 00000000fffffffe [ 14.031186] R10: 0000000000001d3e R11: 0000000000000002 R12: 0000000000070fe1 [ 14.031186] R13: 0000000000000000 R14: 0000000000000000 R15: ffffea0001c3f840 [ 14.031186] FS: 00007f8a8e3e1740(0000) GS:ffff88011f500000(0000) knlGS:0000000000000000 [ 14.031186] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 14.031186] CR2: 00007f78c7341258 CR3: 000000007bb08000 CR4: 00000000000007e0 [ 14.031186] Stack: [ 14.031186] ffff88011f5189c8 ffff88011f5189b8 ffffea0001c3f840 ffff88011f518998 [ 14.031186] ffffea0001d30cc0 0000001200000002 0000000200000012 0000000000000003 [ 14.031186] ffff88007ffda6c0 000000000000000a ffff88007a117dd8 ffff88011f518998 [ 14.031186] Call Trace: [ 14.031186] [<ffffffff811a8380>] ? page_alloc_cpu_notify+0x50/0x50 [ 14.031186] [<ffffffff811a82bd>] drain_pages_zone+0x3d/0x50 [ 14.031186] [<ffffffff811a839d>] drain_local_pages+0x1d/0x30 [ 14.031186] [<ffffffff81122a96>] on_each_cpu_mask+0x46/0x80 [ 14.031186] [<ffffffff811a5e8b>] drain_all_pages+0x14b/0x1e0 [ 14.031186] [<ffffffff812151a2>] soft_offline_page+0x432/0x6e0 [ 14.031186] [<ffffffff811e2dac>] SyS_madvise+0x73c/0x780 [ 14.031186] [<ffffffff810dcb3f>] ? put_prev_task_fair+0x2f/0x50 [ 14.031186] [<ffffffff81143f74>] ? __audit_syscall_entry+0xc4/0x120 [ 14.031186] [<ffffffff8105bdac>] ? do_audit_syscall_entry+0x6c/0x70 [ 14.031186] [<ffffffff8105cc63>] ? syscall_trace_enter_phase1+0x103/0x170 [ 14.031186] [<ffffffff816f908e>] ? int_check_syscall_exit_work+0x34/0x3d [ 14.031186] [<ffffffff816f8e72>] system_call_fastpath+0x12/0x17 [ 14.031186] Code: ff 89 45 b4 48 8b 45 c0 48 83 b8 a8 00 00 00 00 0f 85 e3 fb ff ff 0f 1f 00 0f 0b 48 8b 7d 90 48 c7 c6 e8 95 a6 81 e8 e6 32 02 00 <0f> 0b 8b 45 cc 49 89 47 30 41 8b 47 18 83 f8 ff 0f 85 10 ff ff [ 14.031186] RIP [<ffffffff811a806a>] free_pcppages_bulk+0x52a/0x6f0 [ 14.031186] RSP <ffff88007a117d28> [ 14.031186] ---[ end trace 53926436e76d1f35 ]--- ate_pages() This looks strange to me because soft offline makes target page free at first then mark PageHWPoison. We call drain_all_pages() ... Currently hard offline (memory_failure()) and soft offline isolate the in-use target page differently. Hard offline keeps its refcount (so it intentionally leaks error pages,) but soft offline frees it then marks PageHWPoison. This "free then mark PageHWPoison" behavior sometimes doesn't work due to pcplist (commit 9ab3b59 "mm: hwpoison: drop lru_add_drain_all() in __soft_offline_page()" refers to it, but unfortunately it didn't solve this problem.) That's because page freeing from drain_pages_zone() can't handle hwpoisoned page and it's still on a pcplist after page migration with "freed" status (so drain_all_pages() doesn't work.) I don't have clear idea about a fix on freeing code, but think that we should avoid freeing hwpoisoned page as hard offline code does. So this patch does this. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

Stress testing showed that soft offline events for a process iterating "mmap-pagefault-munmap" loop can trigger VM_BUG_ON(PAGE_FLAGS_CHECK_AT_PREP) in __free_one_page(): [ 14.025761] Soft offlining page 0x70fe1 at 0x70100008d000 [ 14.029400] Soft offlining page 0x705fb at 0x70300008d000 [ 14.030208] page:ffffea0001c3f840 count:0 mapcount:0 mapping: (null) index:0x2 [ 14.031186] flags: 0x1fffff80800000(hwpoison) [ 14.031186] page dumped because: VM_BUG_ON_PAGE(page->flags & ((1 << 25) - 1)) [ 14.031186] ------------[ cut here ]------------ [ 14.031186] kernel BUG at /src/linux-dev/mm/page_alloc.c:585! [ 14.031186] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC [ 14.031186] Modules linked in: cfg80211 rfkill crc32c_intel microcode ppdev parport_pc pcspkr serio_raw virtio_balloon parport i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi floppy [ 14.031186] CPU: 3 PID: 1779 Comm: test_base_madv_ Not tainted 4.0.0-v4.0-150511-1451-00009-g82360a3730e6 torvalds#139 [ 14.031186] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 14.031186] task: ffff88007d33bae0 ti: ffff88007a114000 task.ti: ffff88007a114000 [ 14.031186] RIP: 0010:[<ffffffff811a806a>] [<ffffffff811a806a>] free_pcppages_bulk+0x52a/0x6f0 [ 14.031186] RSP: 0018:ffff88007a117d28 EFLAGS: 00010096 [ 14.031186] RAX: 0000000000000042 RBX: ffffea0001c3f860 RCX: 0000000000000006 [ 14.031186] RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffff88011f50d3d0 [ 14.031186] RBP: ffff88007a117da8 R08: 000000000000000a R09: 00000000fffffffe [ 14.031186] R10: 0000000000001d3e R11: 0000000000000002 R12: 0000000000070fe1 [ 14.031186] R13: 0000000000000000 R14: 0000000000000000 R15: ffffea0001c3f840 [ 14.031186] FS: 00007f8a8e3e1740(0000) GS:ffff88011f500000(0000) knlGS:0000000000000000 [ 14.031186] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 14.031186] CR2: 00007f78c7341258 CR3: 000000007bb08000 CR4: 00000000000007e0 [ 14.031186] Stack: [ 14.031186] ffff88011f5189c8 ffff88011f5189b8 ffffea0001c3f840 ffff88011f518998 [ 14.031186] ffffea0001d30cc0 0000001200000002 0000000200000012 0000000000000003 [ 14.031186] ffff88007ffda6c0 000000000000000a ffff88007a117dd8 ffff88011f518998 [ 14.031186] Call Trace: [ 14.031186] [<ffffffff811a8380>] ? page_alloc_cpu_notify+0x50/0x50 [ 14.031186] [<ffffffff811a82bd>] drain_pages_zone+0x3d/0x50 [ 14.031186] [<ffffffff811a839d>] drain_local_pages+0x1d/0x30 [ 14.031186] [<ffffffff81122a96>] on_each_cpu_mask+0x46/0x80 [ 14.031186] [<ffffffff811a5e8b>] drain_all_pages+0x14b/0x1e0 [ 14.031186] [<ffffffff812151a2>] soft_offline_page+0x432/0x6e0 [ 14.031186] [<ffffffff811e2dac>] SyS_madvise+0x73c/0x780 [ 14.031186] [<ffffffff810dcb3f>] ? put_prev_task_fair+0x2f/0x50 [ 14.031186] [<ffffffff81143f74>] ? __audit_syscall_entry+0xc4/0x120 [ 14.031186] [<ffffffff8105bdac>] ? do_audit_syscall_entry+0x6c/0x70 [ 14.031186] [<ffffffff8105cc63>] ? syscall_trace_enter_phase1+0x103/0x170 [ 14.031186] [<ffffffff816f908e>] ? int_check_syscall_exit_work+0x34/0x3d [ 14.031186] [<ffffffff816f8e72>] system_call_fastpath+0x12/0x17 [ 14.031186] Code: ff 89 45 b4 48 8b 45 c0 48 83 b8 a8 00 00 00 00 0f 85 e3 fb ff ff 0f 1f 00 0f 0b 48 8b 7d 90 48 c7 c6 e8 95 a6 81 e8 e6 32 02 00 <0f> 0b 8b 45 cc 49 89 47 30 41 8b 47 18 83 f8 ff 0f 85 10 ff ff [ 14.031186] RIP [<ffffffff811a806a>] free_pcppages_bulk+0x52a/0x6f0 [ 14.031186] RSP <ffff88007a117d28> [ 14.031186] ---[ end trace 53926436e76d1f35 ]--- When soft offline successfully migrates page, the source page is supposed to be freed. But there is a race condition where a source page looks isolated (i.e. the refcount is 0 and the PageHWPoison is set) but somewhat linked to pcplist. Then another soft offline event calls drain_all_pages() and tries to free such hwpoisoned page, which is forbidden. This odd page state seems to happen due to the race between put_page() in putback_lru_page() and __pagevec_lru_add_fn(). But I don't want to play with tweaking drain code as done in commit 9ab3b59 "mm: hwpoison: drop lru_add_drain_all() in __soft_offline_page()", or to change page freeing code for this soft offline's purpose. Instead, let's think about the difference between hard offline and soft offline. There is an interesting difference in how to isolate the in-use page between these, that is, hard offline marks PageHWPoison of the target page at first, and doesn't free it by keeping its refcount 1. OTOH, soft offline tries to free the target page then marks PageHWPoison. This difference might be the source of complexity and result in bugs like the above. So making soft offline isolate with keeping refcount can be a solution for this problem. We can give to page migration code the "reason" which shows the caller, so avoiding calling putback_lru_page() when called from soft offline does what we need. With this change, target pages of soft offline never be reused without changing migratetype, so this patch also removes it. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

Stress testing showed that soft offline events for a process iterating "mmap-pagefault-munmap" loop can trigger VM_BUG_ON(PAGE_FLAGS_CHECK_AT_PREP) in __free_one_page(): [ 14.025761] Soft offlining page 0x70fe1 at 0x70100008d000 [ 14.029400] Soft offlining page 0x705fb at 0x70300008d000 [ 14.030208] page:ffffea0001c3f840 count:0 mapcount:0 mapping: (null) index:0x2 [ 14.031186] flags: 0x1fffff80800000(hwpoison) [ 14.031186] page dumped because: VM_BUG_ON_PAGE(page->flags & ((1 << 25) - 1)) [ 14.031186] ------------[ cut here ]------------ [ 14.031186] kernel BUG at /src/linux-dev/mm/page_alloc.c:585! [ 14.031186] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC [ 14.031186] Modules linked in: cfg80211 rfkill crc32c_intel microcode ppdev parport_pc pcspkr serio_raw virtio_balloon parport i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi floppy [ 14.031186] CPU: 3 PID: 1779 Comm: test_base_madv_ Not tainted 4.0.0-v4.0-150511-1451-00009-g82360a3730e6 torvalds#139 [ 14.031186] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 14.031186] task: ffff88007d33bae0 ti: ffff88007a114000 task.ti: ffff88007a114000 [ 14.031186] RIP: 0010:[<ffffffff811a806a>] [<ffffffff811a806a>] free_pcppages_bulk+0x52a/0x6f0 [ 14.031186] RSP: 0018:ffff88007a117d28 EFLAGS: 00010096 [ 14.031186] RAX: 0000000000000042 RBX: ffffea0001c3f860 RCX: 0000000000000006 [ 14.031186] RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffff88011f50d3d0 [ 14.031186] RBP: ffff88007a117da8 R08: 000000000000000a R09: 00000000fffffffe [ 14.031186] R10: 0000000000001d3e R11: 0000000000000002 R12: 0000000000070fe1 [ 14.031186] R13: 0000000000000000 R14: 0000000000000000 R15: ffffea0001c3f840 [ 14.031186] FS: 00007f8a8e3e1740(0000) GS:ffff88011f500000(0000) knlGS:0000000000000000 [ 14.031186] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 14.031186] CR2: 00007f78c7341258 CR3: 000000007bb08000 CR4: 00000000000007e0 [ 14.031186] Stack: [ 14.031186] ffff88011f5189c8 ffff88011f5189b8 ffffea0001c3f840 ffff88011f518998 [ 14.031186] ffffea0001d30cc0 0000001200000002 0000000200000012 0000000000000003 [ 14.031186] ffff88007ffda6c0 000000000000000a ffff88007a117dd8 ffff88011f518998 [ 14.031186] Call Trace: [ 14.031186] [<ffffffff811a8380>] ? page_alloc_cpu_notify+0x50/0x50 [ 14.031186] [<ffffffff811a82bd>] drain_pages_zone+0x3d/0x50 [ 14.031186] [<ffffffff811a839d>] drain_local_pages+0x1d/0x30 [ 14.031186] [<ffffffff81122a96>] on_each_cpu_mask+0x46/0x80 [ 14.031186] [<ffffffff811a5e8b>] drain_all_pages+0x14b/0x1e0 [ 14.031186] [<ffffffff812151a2>] soft_offline_page+0x432/0x6e0 [ 14.031186] [<ffffffff811e2dac>] SyS_madvise+0x73c/0x780 [ 14.031186] [<ffffffff810dcb3f>] ? put_prev_task_fair+0x2f/0x50 [ 14.031186] [<ffffffff81143f74>] ? __audit_syscall_entry+0xc4/0x120 [ 14.031186] [<ffffffff8105bdac>] ? do_audit_syscall_entry+0x6c/0x70 [ 14.031186] [<ffffffff8105cc63>] ? syscall_trace_enter_phase1+0x103/0x170 [ 14.031186] [<ffffffff816f908e>] ? int_check_syscall_exit_work+0x34/0x3d [ 14.031186] [<ffffffff816f8e72>] system_call_fastpath+0x12/0x17 [ 14.031186] Code: ff 89 45 b4 48 8b 45 c0 48 83 b8 a8 00 00 00 00 0f 85 e3 fb ff ff 0f 1f 00 0f 0b 48 8b 7d 90 48 c7 c6 e8 95 a6 81 e8 e6 32 02 00 <0f> 0b 8b 45 cc 49 89 47 30 41 8b 47 18 83 f8 ff 0f 85 10 ff ff [ 14.031186] RIP [<ffffffff811a806a>] free_pcppages_bulk+0x52a/0x6f0 [ 14.031186] RSP <ffff88007a117d28> [ 14.031186] ---[ end trace 53926436e76d1f35 ]--- When soft offline successfully migrates page, the source page is supposed to be freed. But there is a race condition where a source page looks isolated (i.e. the refcount is 0 and the PageHWPoison is set) but somewhat linked to pcplist. Then another soft offline event calls drain_all_pages() and tries to free such hwpoisoned page, which is forbidden. This odd page state seems to happen due to the race between put_page() in putback_lru_page() and __pagevec_lru_add_fn(). But I don't want to play with tweaking drain code as done in commit 9ab3b59 "mm: hwpoison: drop lru_add_drain_all() in __soft_offline_page()", or to change page freeing code for this soft offline's purpose. Instead, let's think about the difference between hard offline and soft offline. There is an interesting difference in how to isolate the in-use page between these, that is, hard offline marks PageHWPoison of the target page at first, and doesn't free it by keeping its refcount 1. OTOH, soft offline tries to free the target page then marks PageHWPoison. This difference might be the source of complexity and result in bugs like the above. So making soft offline isolate with keeping refcount can be a solution for this problem. We can pass to page migration code the "reason" which shows the caller, so let's use this more to avoid calling putback_lru_page() when called from soft offline, which effectively does the isolation for soft offline. With this change, target pages of soft offline never be reused without changing migratetype, so this patch also removes the related code. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

Stress testing showed that soft offline events for a process iterating "mmap-pagefault-munmap" loop can trigger VM_BUG_ON(PAGE_FLAGS_CHECK_AT_PREP) in __free_one_page(): [ 14.025761] Soft offlining page 0x70fe1 at 0x70100008d000 [ 14.029400] Soft offlining page 0x705fb at 0x70300008d000 [ 14.030208] page:ffffea0001c3f840 count:0 mapcount:0 mapping: (null) index:0x2 [ 14.031186] flags: 0x1fffff80800000(hwpoison) [ 14.031186] page dumped because: VM_BUG_ON_PAGE(page->flags & ((1 << 25) - 1)) [ 14.031186] ------------[ cut here ]------------ [ 14.031186] kernel BUG at /src/linux-dev/mm/page_alloc.c:585! [ 14.031186] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC [ 14.031186] Modules linked in: cfg80211 rfkill crc32c_intel microcode ppdev parport_pc pcspkr serio_raw virtio_balloon parport i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi floppy [ 14.031186] CPU: 3 PID: 1779 Comm: test_base_madv_ Not tainted 4.0.0-v4.0-150511-1451-00009-g82360a3730e6 torvalds#139 [ 14.031186] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 14.031186] task: ffff88007d33bae0 ti: ffff88007a114000 task.ti: ffff88007a114000 [ 14.031186] RIP: 0010:[<ffffffff811a806a>] [<ffffffff811a806a>] free_pcppages_bulk+0x52a/0x6f0 [ 14.031186] RSP: 0018:ffff88007a117d28 EFLAGS: 00010096 [ 14.031186] RAX: 0000000000000042 RBX: ffffea0001c3f860 RCX: 0000000000000006 [ 14.031186] RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffff88011f50d3d0 [ 14.031186] RBP: ffff88007a117da8 R08: 000000000000000a R09: 00000000fffffffe [ 14.031186] R10: 0000000000001d3e R11: 0000000000000002 R12: 0000000000070fe1 [ 14.031186] R13: 0000000000000000 R14: 0000000000000000 R15: ffffea0001c3f840 [ 14.031186] FS: 00007f8a8e3e1740(0000) GS:ffff88011f500000(0000) knlGS:0000000000000000 [ 14.031186] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 14.031186] CR2: 00007f78c7341258 CR3: 000000007bb08000 CR4: 00000000000007e0 [ 14.031186] Stack: [ 14.031186] ffff88011f5189c8 ffff88011f5189b8 ffffea0001c3f840 ffff88011f518998 [ 14.031186] ffffea0001d30cc0 0000001200000002 0000000200000012 0000000000000003 [ 14.031186] ffff88007ffda6c0 000000000000000a ffff88007a117dd8 ffff88011f518998 [ 14.031186] Call Trace: [ 14.031186] [<ffffffff811a8380>] ? page_alloc_cpu_notify+0x50/0x50 [ 14.031186] [<ffffffff811a82bd>] drain_pages_zone+0x3d/0x50 [ 14.031186] [<ffffffff811a839d>] drain_local_pages+0x1d/0x30 [ 14.031186] [<ffffffff81122a96>] on_each_cpu_mask+0x46/0x80 [ 14.031186] [<ffffffff811a5e8b>] drain_all_pages+0x14b/0x1e0 [ 14.031186] [<ffffffff812151a2>] soft_offline_page+0x432/0x6e0 [ 14.031186] [<ffffffff811e2dac>] SyS_madvise+0x73c/0x780 [ 14.031186] [<ffffffff810dcb3f>] ? put_prev_task_fair+0x2f/0x50 [ 14.031186] [<ffffffff81143f74>] ? __audit_syscall_entry+0xc4/0x120 [ 14.031186] [<ffffffff8105bdac>] ? do_audit_syscall_entry+0x6c/0x70 [ 14.031186] [<ffffffff8105cc63>] ? syscall_trace_enter_phase1+0x103/0x170 [ 14.031186] [<ffffffff816f908e>] ? int_check_syscall_exit_work+0x34/0x3d [ 14.031186] [<ffffffff816f8e72>] system_call_fastpath+0x12/0x17 [ 14.031186] Code: ff 89 45 b4 48 8b 45 c0 48 83 b8 a8 00 00 00 00 0f 85 e3 fb ff ff 0f 1f 00 0f 0b 48 8b 7d 90 48 c7 c6 e8 95 a6 81 e8 e6 32 02 00 <0f> 0b 8b 45 cc 49 89 47 30 41 8b 47 18 83 f8 ff 0f 85 10 ff ff [ 14.031186] RIP [<ffffffff811a806a>] free_pcppages_bulk+0x52a/0x6f0 [ 14.031186] RSP <ffff88007a117d28> [ 14.031186] ---[ end trace 53926436e76d1f35 ]--- When soft offline successfully migrates page, the source page is supposed to be freed. But there is a race condition where a source page looks isolated (i.e. the refcount is 0 and the PageHWPoison is set) but somewhat linked to pcplist. Then another soft offline event calls drain_all_pages() and tries to free such hwpoisoned page, which is forbidden. This odd page state seems to happen due to the race between put_page() in putback_lru_page() and __pagevec_lru_add_fn(). But I don't want to play with tweaking drain code as done in commit 9ab3b59 "mm: hwpoison: drop lru_add_drain_all() in __soft_offline_page()", or to change page freeing code for this soft offline's purpose. Instead, let's think about the difference between hard offline and soft offline. There is an interesting difference in how to isolate the in-use page between these, that is, hard offline marks PageHWPoison of the target page at first, and doesn't free it by keeping its refcount 1. OTOH, soft offline tries to free the target page then marks PageHWPoison. This difference might be the source of complexity and result in bugs like the above. So making soft offline isolate with keeping refcount can be a solution for this problem. We can pass to page migration code the "reason" which shows the caller, so let's use this more to avoid calling putback_lru_page() when called from soft offline, which effectively does the isolation for soft offline. With this change, target pages of soft offline never be reused without changing migratetype, so this patch also removes the related code. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Tony Luck <tony.luck@intel.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Stress testing showed that soft offline events for a process iterating "mmap-pagefault-munmap" loop can trigger VM_BUG_ON(PAGE_FLAGS_CHECK_AT_PREP) in __free_one_page(): Soft offlining page 0x70fe1 at 0x70100008d000 Soft offlining page 0x705fb at 0x70300008d000 page:ffffea0001c3f840 count:0 mapcount:0 mapping: (null) index:0x2 flags: 0x1fffff80800000(hwpoison) page dumped because: VM_BUG_ON_PAGE(page->flags & ((1 << 25) - 1)) ------------[ cut here ]------------ kernel BUG at /src/linux-dev/mm/page_alloc.c:585! invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC Modules linked in: cfg80211 rfkill crc32c_intel microcode ppdev parport_pc pcspkr serio_raw virtio_balloon parport i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi floppy CPU: 3 PID: 1779 Comm: test_base_madv_ Not tainted 4.0.0-v4.0-150511-1451-00009-g82360a3730e6 #139 RIP: free_pcppages_bulk+0x52a/0x6f0 Call Trace: drain_pages_zone+0x3d/0x50 drain_local_pages+0x1d/0x30 on_each_cpu_mask+0x46/0x80 drain_all_pages+0x14b/0x1e0 soft_offline_page+0x432/0x6e0 SyS_madvise+0x73c/0x780 system_call_fastpath+0x12/0x17 Code: ff 89 45 b4 48 8b 45 c0 48 83 b8 a8 00 00 00 00 0f 85 e3 fb ff ff 0f 1f 00 0f 0b 48 8b 7d 90 48 c7 c6 e8 95 a6 81 e8 e6 32 02 00 <0f> 0b 8b 45 cc 49 89 47 30 41 8b 47 18 83 f8 ff 0f 85 10 ff ff RIP [<ffffffff811a806a>] free_pcppages_bulk+0x52a/0x6f0 RSP <ffff88007a117d28> ---[ end trace 53926436e76d1f35 ]--- When soft offline successfully migrates page, the source page is supposed to be freed. But there is a race condition where a source page looks isolated (i.e. the refcount is 0 and the PageHWPoison is set) but somewhat linked to pcplist. Then another soft offline event calls drain_all_pages() and tries to free such hwpoisoned page, which is forbidden. This odd page state seems to happen due to the race between put_page() in putback_lru_page() and __pagevec_lru_add_fn(). But I don't want to play with tweaking drain code as done in commit 9ab3b59 "mm: hwpoison: drop lru_add_drain_all() in __soft_offline_page()", or to change page freeing code for this soft offline's purpose. Instead, let's think about the difference between hard offline and soft offline. There is an interesting difference in how to isolate the in-use page between these, that is, hard offline marks PageHWPoison of the target page at first, and doesn't free it by keeping its refcount 1. OTOH, soft offline tries to free the target page then marks PageHWPoison. This difference might be the source of complexity and result in bugs like the above. So making soft offline isolate with keeping refcount can be a solution for this problem. We can pass to page migration code the "reason" which shows the caller, so let's use this more to avoid calling putback_lru_page() when called from soft offline, which effectively does the isolation for soft offline. With this change, target pages of soft offline never be reused without changing migratetype, so this patch also removes the related code. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Tony Luck <tony.luck@intel.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

WARNING: line over 80 characters torvalds#110: FILE: drivers/block/drbd/drbd_bitmap.c:1010: + page = mempool_alloc(drbd_md_io_page_pool, __GFP_HIGHMEM|__GFP_RECLAIM); WARNING: line over 80 characters torvalds#139: FILE: drivers/block/nvme-core.c:1039: + ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen, __GFP_RECLAIM); WARNING: line over 80 characters torvalds#466: FILE: include/linux/gfp.h:110: +#define __GFP_RECLAIM ((__force gfp_t)(___GFP_DIRECT_RECLAIM|___GFP_KSWAPD_RECLAIM)) ERROR: code indent should use tabs where possible torvalds#547: FILE: kernel/power/swap.c:978: +^I^I __get_free_page(__GFP_RECLAIM | __GFP_HIGH);$ ERROR: code indent should use tabs where possible torvalds#557: FILE: kernel/power/swap.c:1245: +^I^I __GFP_RECLAIM | __GFP_HIGH :$ ERROR: code indent should use tabs where possible torvalds#558: FILE: kernel/power/swap.c:1246: +^I^I __GFP_RECLAIM | __GFP_NOWARN |$ WARNING: line over 80 characters torvalds#570: FILE: lib/percpu_ida.c:138: + * used for internal memory allocations); thus if passed __GFP_RECLAIM we may sleep ERROR: code indent should use tabs where possible torvalds#596: FILE: mm/failslab.c:19: + if (failslab.ignore_gfp_reclaim && (gfpflags & __GFP_RECLAIM))$ WARNING: please, no spaces at the start of a line torvalds#596: FILE: mm/failslab.c:19: + if (failslab.ignore_gfp_reclaim && (gfpflags & __GFP_RECLAIM))$ WARNING: line over 80 characters torvalds#617: FILE: mm/filemap.c:2717: + * this page (__GFP_IO), and whether the call may block (__GFP_RECLAIM & __GFP_FS). total: 4 errors, 6 warnings, 463 lines checked NOTE: Whitespace errors detected. You may wish to use scripts/cleanpatch or scripts/cleanfile ./patches/mm-page_alloc-rename-__gfp_wait-to-__gfp_reclaim.patch has style problems, please review. NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

[ Upstream commit add05ce ] Stress testing showed that soft offline events for a process iterating "mmap-pagefault-munmap" loop can trigger VM_BUG_ON(PAGE_FLAGS_CHECK_AT_PREP) in __free_one_page(): Soft offlining page 0x70fe1 at 0x70100008d000 Soft offlining page 0x705fb at 0x70300008d000 page:ffffea0001c3f840 count:0 mapcount:0 mapping: (null) index:0x2 flags: 0x1fffff80800000(hwpoison) page dumped because: VM_BUG_ON_PAGE(page->flags & ((1 << 25) - 1)) ------------[ cut here ]------------ kernel BUG at /src/linux-dev/mm/page_alloc.c:585! invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC Modules linked in: cfg80211 rfkill crc32c_intel microcode ppdev parport_pc pcspkr serio_raw virtio_balloon parport i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi floppy CPU: 3 PID: 1779 Comm: test_base_madv_ Not tainted 4.0.0-v4.0-150511-1451-00009-g82360a3730e6 torvalds#139 RIP: free_pcppages_bulk+0x52a/0x6f0 Call Trace: drain_pages_zone+0x3d/0x50 drain_local_pages+0x1d/0x30 on_each_cpu_mask+0x46/0x80 drain_all_pages+0x14b/0x1e0 soft_offline_page+0x432/0x6e0 SyS_madvise+0x73c/0x780 system_call_fastpath+0x12/0x17 Code: ff 89 45 b4 48 8b 45 c0 48 83 b8 a8 00 00 00 00 0f 85 e3 fb ff ff 0f 1f 00 0f 0b 48 8b 7d 90 48 c7 c6 e8 95 a6 81 e8 e6 32 02 00 <0f> 0b 8b 45 cc 49 89 47 30 41 8b 47 18 83 f8 ff 0f 85 10 ff ff RIP [<ffffffff811a806a>] free_pcppages_bulk+0x52a/0x6f0 RSP <ffff88007a117d28> ---[ end trace 53926436e76d1f35 ]--- When soft offline successfully migrates page, the source page is supposed to be freed. But there is a race condition where a source page looks isolated (i.e. the refcount is 0 and the PageHWPoison is set) but somewhat linked to pcplist. Then another soft offline event calls drain_all_pages() and tries to free such hwpoisoned page, which is forbidden. This odd page state seems to happen due to the race between put_page() in putback_lru_page() and __pagevec_lru_add_fn(). But I don't want to play with tweaking drain code as done in commit 9ab3b59 "mm: hwpoison: drop lru_add_drain_all() in __soft_offline_page()", or to change page freeing code for this soft offline's purpose. Instead, let's think about the difference between hard offline and soft offline. There is an interesting difference in how to isolate the in-use page between these, that is, hard offline marks PageHWPoison of the target page at first, and doesn't free it by keeping its refcount 1. OTOH, soft offline tries to free the target page then marks PageHWPoison. This difference might be the source of complexity and result in bugs like the above. So making soft offline isolate with keeping refcount can be a solution for this problem. We can pass to page migration code the "reason" which shows the caller, so let's use this more to avoid calling putback_lru_page() when called from soft offline, which effectively does the isolation for soft offline. With this change, target pages of soft offline never be reused without changing migratetype, so this patch also removes the related code. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Tony Luck <tony.luck@intel.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

Stress testing showed that soft offline events for a process iterating "mmap-pagefault-munmap" loop can trigger VM_BUG_ON(PAGE_FLAGS_CHECK_AT_PREP) in __free_one_page(): [ 14.025761] Soft offlining page 0x70fe1 at 0x70100008d000 [ 14.029400] Soft offlining page 0x705fb at 0x70300008d000 [ 14.030208] page:ffffea0001c3f840 count:0 mapcount:0 mapping: (null) index:0x2 [ 14.031186] flags: 0x1fffff80800000(hwpoison) [ 14.031186] page dumped because: VM_BUG_ON_PAGE(page->flags & ((1 << 25) - 1)) [ 14.031186] ------------[ cut here ]------------ [ 14.031186] kernel BUG at /src/linux-dev/mm/page_alloc.c:585! [ 14.031186] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC [ 14.031186] Modules linked in: cfg80211 rfkill crc32c_intel microcode ppdev parport_pc pcspkr serio_raw virtio_balloon parport i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi floppy [ 14.031186] CPU: 3 PID: 1779 Comm: test_base_madv_ Not tainted 4.0.0-v4.0-150511-1451-00009-g82360a3730e6 torvalds#139 [ 14.031186] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 14.031186] task: ffff88007d33bae0 ti: ffff88007a114000 task.ti: ffff88007a114000 [ 14.031186] RIP: 0010:[<ffffffff811a806a>] [<ffffffff811a806a>] free_pcppages_bulk+0x52a/0x6f0 [ 14.031186] RSP: 0018:ffff88007a117d28 EFLAGS: 00010096 [ 14.031186] RAX: 0000000000000042 RBX: ffffea0001c3f860 RCX: 0000000000000006 [ 14.031186] RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffff88011f50d3d0 [ 14.031186] RBP: ffff88007a117da8 R08: 000000000000000a R09: 00000000fffffffe [ 14.031186] R10: 0000000000001d3e R11: 0000000000000002 R12: 0000000000070fe1 [ 14.031186] R13: 0000000000000000 R14: 0000000000000000 R15: ffffea0001c3f840 [ 14.031186] FS: 00007f8a8e3e1740(0000) GS:ffff88011f500000(0000) knlGS:0000000000000000 [ 14.031186] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 14.031186] CR2: 00007f78c7341258 CR3: 000000007bb08000 CR4: 00000000000007e0 [ 14.031186] Stack: [ 14.031186] ffff88011f5189c8 ffff88011f5189b8 ffffea0001c3f840 ffff88011f518998 [ 14.031186] ffffea0001d30cc0 0000001200000002 0000000200000012 0000000000000003 [ 14.031186] ffff88007ffda6c0 000000000000000a ffff88007a117dd8 ffff88011f518998 [ 14.031186] Call Trace: [ 14.031186] [<ffffffff811a8380>] ? page_alloc_cpu_notify+0x50/0x50 [ 14.031186] [<ffffffff811a82bd>] drain_pages_zone+0x3d/0x50 [ 14.031186] [<ffffffff811a839d>] drain_local_pages+0x1d/0x30 [ 14.031186] [<ffffffff81122a96>] on_each_cpu_mask+0x46/0x80 [ 14.031186] [<ffffffff811a5e8b>] drain_all_pages+0x14b/0x1e0 [ 14.031186] [<ffffffff812151a2>] soft_offline_page+0x432/0x6e0 [ 14.031186] [<ffffffff811e2dac>] SyS_madvise+0x73c/0x780 [ 14.031186] [<ffffffff810dcb3f>] ? put_prev_task_fair+0x2f/0x50 [ 14.031186] [<ffffffff81143f74>] ? __audit_syscall_entry+0xc4/0x120 [ 14.031186] [<ffffffff8105bdac>] ? do_audit_syscall_entry+0x6c/0x70 [ 14.031186] [<ffffffff8105cc63>] ? syscall_trace_enter_phase1+0x103/0x170 [ 14.031186] [<ffffffff816f908e>] ? int_check_syscall_exit_work+0x34/0x3d [ 14.031186] [<ffffffff816f8e72>] system_call_fastpath+0x12/0x17 [ 14.031186] Code: ff 89 45 b4 48 8b 45 c0 48 83 b8 a8 00 00 00 00 0f 85 e3 fb ff ff 0f 1f 00 0f 0b 48 8b 7d 90 48 c7 c6 e8 95 a6 81 e8 e6 32 02 00 <0f> 0b 8b 45 cc 49 89 47 30 41 8b 47 18 83 f8 ff 0f 85 10 ff ff [ 14.031186] RIP [<ffffffff811a806a>] free_pcppages_bulk+0x52a/0x6f0 [ 14.031186] RSP <ffff88007a117d28> [ 14.031186] ---[ end trace 53926436e76d1f35 ]--- When soft offline successfully migrates page, the source page is supposed to be freed. But there is a race condition where a source page looks isolated (i.e. the refcount is 0 and the PageHWPoison is set) but somewhat linked to pcplist. Then another soft offline event calls drain_all_pages() and tries to free such hwpoisoned page, which is forbidden. This odd page state seems to happen due to the race between put_page() in putback_lru_page() and __pagevec_lru_add_fn(). But I don't want to play with tweaking drain code as done in commit 9ab3b59 "mm: hwpoison: drop lru_add_drain_all() in __soft_offline_page()", or to change page freeing code for this soft offline's purpose. Instead, let's think about the difference between hard offline and soft offline. There is an interesting difference in how to isolate the in-use page between these, that is, hard offline marks PageHWPoison of the target page at first, and doesn't free it by keeping its refcount 1. OTOH, soft offline tries to free the target page then marks PageHWPoison. This difference might be the source of complexity and result in bugs like the above. So making soft offline isolate with keeping refcount can be a solution for this problem. We can pass to page migration code the "reason" which shows the caller, so let's use this more to avoid calling putback_lru_page() when called from soft offline, which effectively does the isolation for soft offline. With this change, target pages of soft offline never be reused without changing migratetype, so this patch also removes the related code. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Tony Luck <tony.luck@intel.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

While hacking on kTLS, I ran into the following panic from an unprivileged netserver / netperf TCP session: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 PGD 800000037f378067 P4D 800000037f378067 PUD 3c0e61067 PMD 0 Oops: 0010 [#1] SMP KASAN PTI CPU: 1 PID: 2289 Comm: netserver Not tainted 4.17.0+ torvalds#139 Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016 RIP: 0010: (null) Code: Bad RIP value. RSP: 0018:ffff88036abcf740 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff88036f5f6800 RCX: 1ffff1006debed26 RDX: ffff88036abcf920 RSI: ffff8803cb1a4f00 RDI: ffff8803c258c280 RBP: ffff8803c258c280 R08: ffff8803c258c280 R09: ffffed006f559d48 R10: ffff88037aacea43 R11: ffffed006f559d49 R12: ffff8803c258c280 R13: ffff8803cb1a4f20 R14: 00000000000000db R15: ffffffffc168a350 FS: 00007f7e631f4700(0000) GS:ffff8803d1c80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 00000003ccf64005 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? tls_sw_poll+0xa4/0x160 [tls] ? sock_poll+0x20a/0x680 ? do_select+0x77b/0x11a0 ? poll_schedule_timeout.constprop.12+0x130/0x130 ? pick_link+0xb00/0xb00 ? read_word_at_a_time+0x13/0x20 ? vfs_poll+0x270/0x270 ? deref_stack_reg+0xad/0xe0 ? __read_once_size_nocheck.constprop.6+0x10/0x10 [...] Debugging further, it turns out that calling into ctx->sk_poll() is invalid since sk_poll itself is NULL which was saved from the original TCP socket in order for tls_sw_poll() to invoke it. Looks like the recent conversion from poll to poll_mask callback started in 1525242 ("net: add support for ->poll_mask in proto_ops") missed to eventually convert kTLS, too: TCP's ->poll was converted over to the ->poll_mask in commit 2c7d3da ("net/tcp: convert to ->poll_mask") and therefore kTLS wrongly saved the ->poll old one which is now NULL. Convert kTLS over to use ->poll_mask instead. Also instead of POLLIN | POLLRDNORM use the proper EPOLLIN | EPOLLRDNORM bits as the case in tcp_poll_mask() as well that is mangled here. Fixes: 2c7d3da ("net/tcp: convert to ->poll_mask") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Watson <davejwatson@fb.com>

While hacking on kTLS, I ran into the following panic from an unprivileged netserver / netperf TCP session: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 PGD 800000037f378067 P4D 800000037f378067 PUD 3c0e61067 PMD 0 Oops: 0010 [#1] SMP KASAN PTI CPU: 1 PID: 2289 Comm: netserver Not tainted 4.17.0+ torvalds#139 Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016 RIP: 0010: (null) Code: Bad RIP value. RSP: 0018:ffff88036abcf740 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff88036f5f6800 RCX: 1ffff1006debed26 RDX: ffff88036abcf920 RSI: ffff8803cb1a4f00 RDI: ffff8803c258c280 RBP: ffff8803c258c280 R08: ffff8803c258c280 R09: ffffed006f559d48 R10: ffff88037aacea43 R11: ffffed006f559d49 R12: ffff8803c258c280 R13: ffff8803cb1a4f20 R14: 00000000000000db R15: ffffffffc168a350 FS: 00007f7e631f4700(0000) GS:ffff8803d1c80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 00000003ccf64005 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? tls_sw_poll+0xa4/0x160 [tls] ? sock_poll+0x20a/0x680 ? do_select+0x77b/0x11a0 ? poll_schedule_timeout.constprop.12+0x130/0x130 ? pick_link+0xb00/0xb00 ? read_word_at_a_time+0x13/0x20 ? vfs_poll+0x270/0x270 ? deref_stack_reg+0xad/0xe0 ? __read_once_size_nocheck.constprop.6+0x10/0x10 [...] Debugging further, it turns out that calling into ctx->sk_poll() is invalid since sk_poll itself is NULL which was saved from the original TCP socket in order for tls_sw_poll() to invoke it. Looks like the recent conversion from poll to poll_mask callback started in 1525242 ("net: add support for ->poll_mask in proto_ops") missed to eventually convert kTLS, too: TCP's ->poll was converted over to the ->poll_mask in commit 2c7d3da ("net/tcp: convert to ->poll_mask") and therefore kTLS wrongly saved the ->poll old one which is now NULL. Convert kTLS over to use ->poll_mask instead. Also instead of POLLIN | POLLRDNORM use the proper EPOLLIN | EPOLLRDNORM bits as the case in tcp_poll_mask() as well that is mangled here. Fixes: 2c7d3da ("net/tcp: convert to ->poll_mask") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Watson <davejwatson@fb.com> Tested-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>

WARNING: line over 80 characters torvalds#34: FILE: fs/ocfs2/alloc.c:1484: + status = ocfs2_error(ocfs2_metadata_cache_get_super(et->et_ci), ERROR: code indent should use tabs where possible torvalds#35: FILE: fs/ocfs2/alloc.c:1485: +^I^I^I^I "Owner %llu has empty extent list (next_free_rec == 0)\n",$ WARNING: line over 80 characters torvalds#36: FILE: fs/ocfs2/alloc.c:1486: + (unsigned long long)ocfs2_metadata_cache_owner(et->et_ci)); ERROR: code indent should use tabs where possible torvalds#36: FILE: fs/ocfs2/alloc.c:1486: +^I^I^I^I (unsigned long long)ocfs2_metadata_cache_owner(et->et_ci));$ WARNING: line over 80 characters torvalds#46: FILE: fs/ocfs2/alloc.c:1492: + status = ocfs2_error(ocfs2_metadata_cache_get_super(et->et_ci), ERROR: code indent should use tabs where possible torvalds#47: FILE: fs/ocfs2/alloc.c:1493: +^I^I^I^I "Owner %llu has extent list where extent # %d has no physical block start\n",$ WARNING: line over 80 characters torvalds#48: FILE: fs/ocfs2/alloc.c:1494: + (unsigned long long)ocfs2_metadata_cache_owner(et->et_ci), i); ERROR: code indent should use tabs where possible torvalds#48: FILE: fs/ocfs2/alloc.c:1494: +^I^I^I^I (unsigned long long)ocfs2_metadata_cache_owner(et->et_ci), i);$ WARNING: line over 80 characters torvalds#61: FILE: fs/ocfs2/alloc.c:3215: + ret = ocfs2_error(ocfs2_metadata_cache_get_super(et->et_ci), ERROR: code indent should use tabs where possible torvalds#62: FILE: fs/ocfs2/alloc.c:3216: +^I^I^I^I "Owner %llu has empty extent block at %llu\n",$ WARNING: line over 80 characters torvalds#63: FILE: fs/ocfs2/alloc.c:3217: + (unsigned long long)ocfs2_metadata_cache_owner(et->et_ci), ERROR: code indent should use tabs where possible torvalds#63: FILE: fs/ocfs2/alloc.c:3217: +^I^I^I^I (unsigned long long)ocfs2_metadata_cache_owner(et->et_ci),$ WARNING: line over 80 characters torvalds#64: FILE: fs/ocfs2/alloc.c:3218: + (unsigned long long)le64_to_cpu(eb->h_blkno)); ERROR: code indent should use tabs where possible torvalds#64: FILE: fs/ocfs2/alloc.c:3218: +^I^I^I^I (unsigned long long)le64_to_cpu(eb->h_blkno));$ ERROR: code indent should use tabs where possible torvalds#79: FILE: fs/ocfs2/alloc.c:4412: +^I^I^I^I^I "Extent block #%llu has an invalid l_next_free_rec of %d. It should have matched the l_count of %d\n",$ WARNING: line over 80 characters torvalds#80: FILE: fs/ocfs2/alloc.c:4413: + (unsigned long long)le64_to_cpu(eb->h_blkno), ERROR: code indent should use tabs where possible torvalds#80: FILE: fs/ocfs2/alloc.c:4413: +^I^I^I^I^I (unsigned long long)le64_to_cpu(eb->h_blkno),$ WARNING: line over 80 characters torvalds#81: FILE: fs/ocfs2/alloc.c:4414: + le16_to_cpu(new_el->l_next_free_rec), ERROR: code indent should use tabs where possible torvalds#81: FILE: fs/ocfs2/alloc.c:4414: +^I^I^I^I^I le16_to_cpu(new_el->l_next_free_rec),$ WARNING: line over 80 characters torvalds#82: FILE: fs/ocfs2/alloc.c:4415: + le16_to_cpu(new_el->l_count)); ERROR: code indent should use tabs where possible torvalds#82: FILE: fs/ocfs2/alloc.c:4415: +^I^I^I^I^I le16_to_cpu(new_el->l_count));$ ERROR: code indent should use tabs where possible torvalds#96: FILE: fs/ocfs2/alloc.c:4466: +^I^I^I^I^I "Extent block #%llu has an invalid l_next_free_rec of %d\n",$ WARNING: line over 80 characters torvalds#97: FILE: fs/ocfs2/alloc.c:4467: + (unsigned long long)le64_to_cpu(eb->h_blkno), ERROR: code indent should use tabs where possible torvalds#97: FILE: fs/ocfs2/alloc.c:4467: +^I^I^I^I^I (unsigned long long)le64_to_cpu(eb->h_blkno),$ WARNING: line over 80 characters torvalds#98: FILE: fs/ocfs2/alloc.c:4468: + le16_to_cpu(new_el->l_next_free_rec)); ERROR: code indent should use tabs where possible torvalds#98: FILE: fs/ocfs2/alloc.c:4468: +^I^I^I^I^I le16_to_cpu(new_el->l_next_free_rec));$ WARNING: line over 80 characters torvalds#114: FILE: fs/ocfs2/localalloc.c:666: + status = ocfs2_error(osb->sb, "local alloc inode %llu says it has %u used bits, but a count shows %u\n", WARNING: line over 80 characters torvalds#115: FILE: fs/ocfs2/localalloc.c:667: + (unsigned long long)le64_to_cpu(alloc->i_blkno), ERROR: code indent should use tabs where possible torvalds#115: FILE: fs/ocfs2/localalloc.c:667: +^I^I^I (unsigned long long)le64_to_cpu(alloc->i_blkno),$ ERROR: code indent should use tabs where possible torvalds#116: FILE: fs/ocfs2/localalloc.c:668: +^I^I^I le32_to_cpu(alloc->id1.bitmap1.i_used),$ ERROR: code indent should use tabs where possible torvalds#117: FILE: fs/ocfs2/localalloc.c:669: +^I^I^I ocfs2_local_alloc_count_bits(alloc));$ ERROR: code indent should use tabs where possible torvalds#138: FILE: fs/ocfs2/quota_local.c:142: +^I^I^I "Quota file %llu is probably corrupted! Requested to read block %Lu but file has size only %Lu\n",$ WARNING: %Lu is non-standard C, use %llu torvalds#138: FILE: fs/ocfs2/quota_local.c:142: + "Quota file %llu is probably corrupted! Requested to read block %Lu but file has size only %Lu\n", ERROR: code indent should use tabs where possible torvalds#139: FILE: fs/ocfs2/quota_local.c:143: +^I^I^I (unsigned long long)OCFS2_I(inode)->ip_blkno,$ ERROR: code indent should use tabs where possible torvalds#140: FILE: fs/ocfs2/quota_local.c:144: +^I^I^I (unsigned long long)v_block,$ ERROR: code indent should use tabs where possible torvalds#141: FILE: fs/ocfs2/quota_local.c:145: +^I^I^I (unsigned long long)i_size_read(inode));$ total: 21 errors, 15 warnings, 108 lines checked NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace. NOTE: Whitespace errors detected. You may wish to use scripts/cleanpatch or scripts/cleanfile ./patches/ocfs2-return-erofs-when-filesystem-becomes-read-only.patch has style problems, please review. NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

[ Upstream commit 803f317 ] Fix an integer underflow that leads to a null pointer dereference in 'mt7601u_rx_skb_from_seg()'. The variable 'dma_len' in the URB packet could be manipulated, which could trigger an integer underflow of 'seg_len' in 'mt7601u_rx_process_seg()'. This underflow subsequently causes the 'bad_frame' checks in 'mt7601u_rx_skb_from_seg()' to be bypassed, eventually leading to a dereference of the pointer 'p', which is a null pointer. Ensure that 'dma_len' is greater than 'min_seg_len'. Found by a modified version of syzkaller. KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] CPU: 0 PID: 12 Comm: ksoftirqd/0 Tainted: G W O 5.14.0+ torvalds#139 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 RIP: 0010:skb_add_rx_frag+0x143/0x370 Code: e2 07 83 c2 03 38 ca 7c 08 84 c9 0f 85 86 01 00 00 4c 8d 7d 08 44 89 68 08 48 b8 00 00 00 00 00 fc ff df 4c 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cd 01 00 00 48 8b 45 08 a8 01 0f 85 3d 01 00 00 RSP: 0018:ffffc900000cfc90 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: ffff888115520dc0 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffff8881118430c0 RDI: ffff8881118430f8 RBP: 0000000000000000 R08: 0000000000000e09 R09: 0000000000000010 R10: ffff888111843017 R11: ffffed1022308602 R12: 0000000000000000 R13: 0000000000000e09 R14: 0000000000000010 R15: 0000000000000008 FS: 0000000000000000(0000) GS:ffff88811a800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000004035af40 CR3: 00000001157f2000 CR4: 0000000000750ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: mt7601u_rx_tasklet+0xc73/0x1270 ? mt7601u_submit_rx_buf.isra.0+0x510/0x510 ? tasklet_action_common.isra.0+0x79/0x2f0 tasklet_action_common.isra.0+0x206/0x2f0 __do_softirq+0x1b5/0x880 ? tasklet_unlock+0x30/0x30 run_ksoftirqd+0x26/0x50 smpboot_thread_fn+0x34f/0x7d0 ? smpboot_register_percpu_thread+0x370/0x370 kthread+0x3a1/0x480 ? set_kthread_struct+0x120/0x120 ret_from_fork+0x1f/0x30 Modules linked in: 88XXau(O) 88x2bu(O) ---[ end trace 57f34f93b4da0f9b ]--- RIP: 0010:skb_add_rx_frag+0x143/0x370 Code: e2 07 83 c2 03 38 ca 7c 08 84 c9 0f 85 86 01 00 00 4c 8d 7d 08 44 89 68 08 48 b8 00 00 00 00 00 fc ff df 4c 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cd 01 00 00 48 8b 45 08 a8 01 0f 85 3d 01 00 00 RSP: 0018:ffffc900000cfc90 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: ffff888115520dc0 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffff8881118430c0 RDI: ffff8881118430f8 RBP: 0000000000000000 R08: 0000000000000e09 R09: 0000000000000010 R10: ffff888111843017 R11: ffffed1022308602 R12: 0000000000000000 R13: 0000000000000e09 R14: 0000000000000010 R15: 0000000000000008 FS: 0000000000000000(0000) GS:ffff88811a800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000004035af40 CR3: 00000001157f2000 CR4: 0000000000750ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20221229092906.2328282-1-jisoo.jang@yonsei.ac.kr Signed-off-by: Sasha Levin <sashal@kernel.org>

PAGE_ALIGN(x) macro gives the next highest value which is multiple of pagesize. But if x is already page aligned then it simply returns x. Then if x passed is 0 in dax_zero_range() function, that means the length gets passed as 0 to ->iomap_begin(). In ext2 it calls ext2_get_blocks -> max_blocks as 0 and hits bug_on here in ext2_get_blocks(). BUG_ON(maxblocks == 0); Instead we should be calling dax_truncate_page() here which takes care of length before calling dax_zero_range(). This can be easily triggered with following on fsdax mounted pmem device. dd if=/dev/zero of=file count=1 bs=512 truncate -s 0 file [79.525838] EXT2-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk [79.529376] ext2 filesystem being mounted at /mnt1/test supports timestamps until 2038 (0x7fffffff) [93.793207] ------------[ cut here ]------------ [93.795102] kernel BUG at fs/ext2/inode.c:637! [93.796904] invalid opcode: 0000 [#1] PREEMPT SMP PTI [93.798659] CPU: 0 PID: 1192 Comm: truncate Not tainted 6.3.0-rc2-xfstests-00056-g131086faa369 torvalds#139 [93.806459] RIP: 0010:ext2_get_blocks.constprop.0+0x524/0x610 <...> [93.835298] Call Trace: [93.836253] <TASK> [93.837103] ? lock_acquire+0xf8/0x110 [93.838479] ? d_lookup+0x69/0xd0 [93.839779] ext2_iomap_begin+0xa7/0x1c0 [93.841154] iomap_iter+0xc7/0x150 [93.842425] dax_zero_range+0x6e/0xa0 [93.843813] ext2_setsize+0x176/0x1b0 [93.845164] ext2_setattr+0x151/0x200 [93.846467] notify_change+0x341/0x4e0 [93.847805] ? lock_acquire+0xf8/0x110 [93.849143] ? do_truncate+0x74/0xe0 [93.850452] ? do_truncate+0x84/0xe0 [93.851739] do_truncate+0x84/0xe0 [93.852974] do_sys_ftruncate+0x2b4/0x2f0 [93.854404] do_syscall_64+0x3f/0x90 [93.855789] entry_SYSCALL_64_after_hwframe+0x72/0xdc Signed-off-by: Ritesh Harjani <ritesh.list@gmail.com>

PAGE_ALIGN(x) macro gives the next highest value which is multiple of pagesize. But if x is already page aligned then it simply returns x. So, if x passed is 0 in dax_zero_range() function, that means the length gets passed as 0 to ->iomap_begin(). In ext2 it then calls ext2_get_blocks -> max_blocks as 0 and hits bug_on here in ext2_get_blocks(). BUG_ON(maxblocks == 0); Instead we should be calling dax_truncate_page() here which takes case of it. i.e. it only calls dax_zero_range if the offset is not page/block aligned. This can be easily triggered with following on fsdax mounted pmem device. dd if=/dev/zero of=file count=1 bs=512 truncate -s 0 file [79.525838] EXT2-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk [79.529376] ext2 filesystem being mounted at /mnt1/test supports timestamps until 2038 (0x7fffffff) [93.793207] ------------[ cut here ]------------ [93.795102] kernel BUG at fs/ext2/inode.c:637! [93.796904] invalid opcode: 0000 [#1] PREEMPT SMP PTI [93.798659] CPU: 0 PID: 1192 Comm: truncate Not tainted 6.3.0-rc2-xfstests-00056-g131086faa369 torvalds#139 [93.806459] RIP: 0010:ext2_get_blocks.constprop.0+0x524/0x610 <...> [93.835298] Call Trace: [93.836253] <TASK> [93.837103] ? lock_acquire+0xf8/0x110 [93.838479] ? d_lookup+0x69/0xd0 [93.839779] ext2_iomap_begin+0xa7/0x1c0 [93.841154] iomap_iter+0xc7/0x150 [93.842425] dax_zero_range+0x6e/0xa0 [93.843813] ext2_setsize+0x176/0x1b0 [93.845164] ext2_setattr+0x151/0x200 [93.846467] notify_change+0x341/0x4e0 [93.847805] ? lock_acquire+0xf8/0x110 [93.849143] ? do_truncate+0x74/0xe0 [93.850452] ? do_truncate+0x84/0xe0 [93.851739] do_truncate+0x84/0xe0 [93.852974] do_sys_ftruncate+0x2b4/0x2f0 [93.854404] do_syscall_64+0x3f/0x90 [93.855789] entry_SYSCALL_64_after_hwframe+0x72/0xdc Signed-off-by: Ritesh Harjani <ritesh.list@gmail.com>

PAGE_ALIGN(x) macro gives the next highest value which is multiple of pagesize. But if x is already page aligned then it simply returns x. So, if x passed is 0 in dax_zero_range() function, that means the length gets passed as 0 to ->iomap_begin(). In ext2 it then calls ext2_get_blocks -> max_blocks as 0 and hits bug_on here in ext2_get_blocks(). BUG_ON(maxblocks == 0); Instead we should be calling dax_truncate_page() here which takes care of it. i.e. it only calls dax_zero_range if the offset is not page/block aligned. This can be easily triggered with following on fsdax mounted pmem device. dd if=/dev/zero of=file count=1 bs=512 truncate -s 0 file [79.525838] EXT2-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk [79.529376] ext2 filesystem being mounted at /mnt1/test supports timestamps until 2038 (0x7fffffff) [93.793207] ------------[ cut here ]------------ [93.795102] kernel BUG at fs/ext2/inode.c:637! [93.796904] invalid opcode: 0000 [#1] PREEMPT SMP PTI [93.798659] CPU: 0 PID: 1192 Comm: truncate Not tainted 6.3.0-rc2-xfstests-00056-g131086faa369 torvalds#139 [93.806459] RIP: 0010:ext2_get_blocks.constprop.0+0x524/0x610 <...> [93.835298] Call Trace: [93.836253] <TASK> [93.837103] ? lock_acquire+0xf8/0x110 [93.838479] ? d_lookup+0x69/0xd0 [93.839779] ext2_iomap_begin+0xa7/0x1c0 [93.841154] iomap_iter+0xc7/0x150 [93.842425] dax_zero_range+0x6e/0xa0 [93.843813] ext2_setsize+0x176/0x1b0 [93.845164] ext2_setattr+0x151/0x200 [93.846467] notify_change+0x341/0x4e0 [93.847805] ? lock_acquire+0xf8/0x110 [93.849143] ? do_truncate+0x74/0xe0 [93.850452] ? do_truncate+0x84/0xe0 [93.851739] do_truncate+0x84/0xe0 [93.852974] do_sys_ftruncate+0x2b4/0x2f0 [93.854404] do_syscall_64+0x3f/0x90 [93.855789] entry_SYSCALL_64_after_hwframe+0x72/0xdc Signed-off-by: Ritesh Harjani <ritesh.list@gmail.com>

PAGE_ALIGN(x) macro gives the next highest value which is multiple of pagesize. But if x is already page aligned then it simply returns x. So, if x passed is 0 in dax_zero_range() function, that means the length gets passed as 0 to ->iomap_begin(). In ext2 it then calls ext2_get_blocks -> max_blocks as 0 and hits bug_on here in ext2_get_blocks(). BUG_ON(maxblocks == 0); Instead we should be calling dax_truncate_page() here which takes care of it. i.e. it only calls dax_zero_range if the offset is not page/block aligned. This can be easily triggered with following on fsdax mounted pmem device. dd if=/dev/zero of=file count=1 bs=512 truncate -s 0 file [79.525838] EXT2-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk [79.529376] ext2 filesystem being mounted at /mnt1/test supports timestamps until 2038 (0x7fffffff) [93.793207] ------------[ cut here ]------------ [93.795102] kernel BUG at fs/ext2/inode.c:637! [93.796904] invalid opcode: 0000 [#1] PREEMPT SMP PTI [93.798659] CPU: 0 PID: 1192 Comm: truncate Not tainted 6.3.0-rc2-xfstests-00056-g131086faa369 torvalds#139 [93.806459] RIP: 0010:ext2_get_blocks.constprop.0+0x524/0x610 <...> [93.835298] Call Trace: [93.836253] <TASK> [93.837103] ? lock_acquire+0xf8/0x110 [93.838479] ? d_lookup+0x69/0xd0 [93.839779] ext2_iomap_begin+0xa7/0x1c0 [93.841154] iomap_iter+0xc7/0x150 [93.842425] dax_zero_range+0x6e/0xa0 [93.843813] ext2_setsize+0x176/0x1b0 [93.845164] ext2_setattr+0x151/0x200 [93.846467] notify_change+0x341/0x4e0 [93.847805] ? lock_acquire+0xf8/0x110 [93.849143] ? do_truncate+0x74/0xe0 [93.850452] ? do_truncate+0x84/0xe0 [93.851739] do_truncate+0x84/0xe0 [93.852974] do_sys_ftruncate+0x2b4/0x2f0 [93.854404] do_syscall_64+0x3f/0x90 [93.855789] entry_SYSCALL_64_after_hwframe+0x72/0xdc Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>

PAGE_ALIGN(x) macro gives the next highest value which is multiple of pagesize. But if x is already page aligned then it simply returns x. So, if x passed is 0 in dax_zero_range() function, that means the length gets passed as 0 to ->iomap_begin(). In ext2 it then calls ext2_get_blocks -> max_blocks as 0 and hits bug_on here in ext2_get_blocks(). BUG_ON(maxblocks == 0); Instead we should be calling dax_truncate_page() here which takes care of it. i.e. it only calls dax_zero_range if the offset is not page/block aligned. This can be easily triggered with following on fsdax mounted pmem device. dd if=/dev/zero of=file count=1 bs=512 truncate -s 0 file [79.525838] EXT2-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk [79.529376] ext2 filesystem being mounted at /mnt1/test supports timestamps until 2038 (0x7fffffff) [93.793207] ------------[ cut here ]------------ [93.795102] kernel BUG at fs/ext2/inode.c:637! [93.796904] invalid opcode: 0000 [#1] PREEMPT SMP PTI [93.798659] CPU: 0 PID: 1192 Comm: truncate Not tainted 6.3.0-rc2-xfstests-00056-g131086faa369 torvalds#139 [93.806459] RIP: 0010:ext2_get_blocks.constprop.0+0x524/0x610 <...> [93.835298] Call Trace: [93.836253] <TASK> [93.837103] ? lock_acquire+0xf8/0x110 [93.838479] ? d_lookup+0x69/0xd0 [93.839779] ext2_iomap_begin+0xa7/0x1c0 [93.841154] iomap_iter+0xc7/0x150 [93.842425] dax_zero_range+0x6e/0xa0 [93.843813] ext2_setsize+0x176/0x1b0 [93.845164] ext2_setattr+0x151/0x200 [93.846467] notify_change+0x341/0x4e0 [93.847805] ? lock_acquire+0xf8/0x110 [93.849143] ? do_truncate+0x74/0xe0 [93.850452] ? do_truncate+0x84/0xe0 [93.851739] do_truncate+0x84/0xe0 [93.852974] do_sys_ftruncate+0x2b4/0x2f0 [93.854404] do_syscall_64+0x3f/0x90 [93.855789] entry_SYSCALL_64_after_hwframe+0x72/0xdc Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>

[ Upstream commit 803f317 ] Fix an integer underflow that leads to a null pointer dereference in 'mt7601u_rx_skb_from_seg()'. The variable 'dma_len' in the URB packet could be manipulated, which could trigger an integer underflow of 'seg_len' in 'mt7601u_rx_process_seg()'. This underflow subsequently causes the 'bad_frame' checks in 'mt7601u_rx_skb_from_seg()' to be bypassed, eventually leading to a dereference of the pointer 'p', which is a null pointer. Ensure that 'dma_len' is greater than 'min_seg_len'. Found by a modified version of syzkaller. KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] CPU: 0 PID: 12 Comm: ksoftirqd/0 Tainted: G W O 5.14.0+ torvalds#139 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 RIP: 0010:skb_add_rx_frag+0x143/0x370 Code: e2 07 83 c2 03 38 ca 7c 08 84 c9 0f 85 86 01 00 00 4c 8d 7d 08 44 89 68 08 48 b8 00 00 00 00 00 fc ff df 4c 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cd 01 00 00 48 8b 45 08 a8 01 0f 85 3d 01 00 00 RSP: 0018:ffffc900000cfc90 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: ffff888115520dc0 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffff8881118430c0 RDI: ffff8881118430f8 RBP: 0000000000000000 R08: 0000000000000e09 R09: 0000000000000010 R10: ffff888111843017 R11: ffffed1022308602 R12: 0000000000000000 R13: 0000000000000e09 R14: 0000000000000010 R15: 0000000000000008 FS: 0000000000000000(0000) GS:ffff88811a800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000004035af40 CR3: 00000001157f2000 CR4: 0000000000750ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: mt7601u_rx_tasklet+0xc73/0x1270 ? mt7601u_submit_rx_buf.isra.0+0x510/0x510 ? tasklet_action_common.isra.0+0x79/0x2f0 tasklet_action_common.isra.0+0x206/0x2f0 __do_softirq+0x1b5/0x880 ? tasklet_unlock+0x30/0x30 run_ksoftirqd+0x26/0x50 smpboot_thread_fn+0x34f/0x7d0 ? smpboot_register_percpu_thread+0x370/0x370 kthread+0x3a1/0x480 ? set_kthread_struct+0x120/0x120 ret_from_fork+0x1f/0x30 Modules linked in: 88XXau(O) 88x2bu(O) ---[ end trace 57f34f93b4da0f9b ]--- RIP: 0010:skb_add_rx_frag+0x143/0x370 Code: e2 07 83 c2 03 38 ca 7c 08 84 c9 0f 85 86 01 00 00 4c 8d 7d 08 44 89 68 08 48 b8 00 00 00 00 00 fc ff df 4c 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cd 01 00 00 48 8b 45 08 a8 01 0f 85 3d 01 00 00 RSP: 0018:ffffc900000cfc90 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: ffff888115520dc0 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffff8881118430c0 RDI: ffff8881118430f8 RBP: 0000000000000000 R08: 0000000000000e09 R09: 0000000000000010 R10: ffff888111843017 R11: ffffed1022308602 R12: 0000000000000000 R13: 0000000000000e09 R14: 0000000000000010 R15: 0000000000000008 FS: 0000000000000000(0000) GS:ffff88811a800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000004035af40 CR3: 00000001157f2000 CR4: 0000000000750ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20221229092906.2328282-1-jisoo.jang@yonsei.ac.kr Signed-off-by: Sasha Levin <sashal@kernel.org>

PAGE_ALIGN(x) macro gives the next highest value which is multiple of pagesize. But if x is already page aligned then it simply returns x. So, if x passed is 0 in dax_zero_range() function, that means the length gets passed as 0 to ->iomap_begin(). In ext2 it then calls ext2_get_blocks -> max_blocks as 0 and hits bug_on here in ext2_get_blocks(). BUG_ON(maxblocks == 0); Instead we should be calling dax_truncate_page() here which takes care of it. i.e. it only calls dax_zero_range if the offset is not page/block aligned. This can be easily triggered with following on fsdax mounted pmem device. dd if=/dev/zero of=file count=1 bs=512 truncate -s 0 file [79.525838] EXT2-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk [79.529376] ext2 filesystem being mounted at /mnt1/test supports timestamps until 2038 (0x7fffffff) [93.793207] ------------[ cut here ]------------ [93.795102] kernel BUG at fs/ext2/inode.c:637! [93.796904] invalid opcode: 0000 [#1] PREEMPT SMP PTI [93.798659] CPU: 0 PID: 1192 Comm: truncate Not tainted 6.3.0-rc2-xfstests-00056-g131086faa369 torvalds#139 [93.806459] RIP: 0010:ext2_get_blocks.constprop.0+0x524/0x610 <...> [93.835298] Call Trace: [93.836253] <TASK> [93.837103] ? lock_acquire+0xf8/0x110 [93.838479] ? d_lookup+0x69/0xd0 [93.839779] ext2_iomap_begin+0xa7/0x1c0 [93.841154] iomap_iter+0xc7/0x150 [93.842425] dax_zero_range+0x6e/0xa0 [93.843813] ext2_setsize+0x176/0x1b0 [93.845164] ext2_setattr+0x151/0x200 [93.846467] notify_change+0x341/0x4e0 [93.847805] ? lock_acquire+0xf8/0x110 [93.849143] ? do_truncate+0x74/0xe0 [93.850452] ? do_truncate+0x84/0xe0 [93.851739] do_truncate+0x84/0xe0 [93.852974] do_sys_ftruncate+0x2b4/0x2f0 [93.854404] do_syscall_64+0x3f/0x90 [93.855789] entry_SYSCALL_64_after_hwframe+0x72/0xdc Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>

PAGE_ALIGN(x) macro gives the next highest value which is multiple of pagesize. But if x is already page aligned then it simply returns x. So, if x passed is 0 in dax_zero_range() function, that means the length gets passed as 0 to ->iomap_begin(). In ext2 it then calls ext2_get_blocks -> max_blocks as 0 and hits bug_on here in ext2_get_blocks(). BUG_ON(maxblocks == 0); Instead we should be calling dax_truncate_page() here which takes care of it. i.e. it only calls dax_zero_range if the offset is not page/block aligned. This can be easily triggered with following on fsdax mounted pmem device. dd if=/dev/zero of=file count=1 bs=512 truncate -s 0 file [79.525838] EXT2-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk [79.529376] ext2 filesystem being mounted at /mnt1/test supports timestamps until 2038 (0x7fffffff) [93.793207] ------------[ cut here ]------------ [93.795102] kernel BUG at fs/ext2/inode.c:637! [93.796904] invalid opcode: 0000 [#1] PREEMPT SMP PTI [93.798659] CPU: 0 PID: 1192 Comm: truncate Not tainted 6.3.0-rc2-xfstests-00056-g131086faa369 torvalds#139 [93.806459] RIP: 0010:ext2_get_blocks.constprop.0+0x524/0x610 <...> [93.835298] Call Trace: [93.836253] <TASK> [93.837103] ? lock_acquire+0xf8/0x110 [93.838479] ? d_lookup+0x69/0xd0 [93.839779] ext2_iomap_begin+0xa7/0x1c0 [93.841154] iomap_iter+0xc7/0x150 [93.842425] dax_zero_range+0x6e/0xa0 [93.843813] ext2_setsize+0x176/0x1b0 [93.845164] ext2_setattr+0x151/0x200 [93.846467] notify_change+0x341/0x4e0 [93.847805] ? lock_acquire+0xf8/0x110 [93.849143] ? do_truncate+0x74/0xe0 [93.850452] ? do_truncate+0x84/0xe0 [93.851739] do_truncate+0x84/0xe0 [93.852974] do_sys_ftruncate+0x2b4/0x2f0 [93.854404] do_syscall_64+0x3f/0x90 [93.855789] entry_SYSCALL_64_after_hwframe+0x72/0xdc CC: stable@vger.kernel.org Fixes: 2aa3048 ("iomap: switch iomap_zero_range to use iomap_iter") Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <046a58317f29d9603d1068b2bbae47c2332c17ae.1682069716.git.ritesh.list@gmail.com>

commit fcced95 upstream. PAGE_ALIGN(x) macro gives the next highest value which is multiple of pagesize. But if x is already page aligned then it simply returns x. So, if x passed is 0 in dax_zero_range() function, that means the length gets passed as 0 to ->iomap_begin(). In ext2 it then calls ext2_get_blocks -> max_blocks as 0 and hits bug_on here in ext2_get_blocks(). BUG_ON(maxblocks == 0); Instead we should be calling dax_truncate_page() here which takes care of it. i.e. it only calls dax_zero_range if the offset is not page/block aligned. This can be easily triggered with following on fsdax mounted pmem device. dd if=/dev/zero of=file count=1 bs=512 truncate -s 0 file [79.525838] EXT2-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk [79.529376] ext2 filesystem being mounted at /mnt1/test supports timestamps until 2038 (0x7fffffff) [93.793207] ------------[ cut here ]------------ [93.795102] kernel BUG at fs/ext2/inode.c:637! [93.796904] invalid opcode: 0000 [#1] PREEMPT SMP PTI [93.798659] CPU: 0 PID: 1192 Comm: truncate Not tainted 6.3.0-rc2-xfstests-00056-g131086faa369 torvalds#139 [93.806459] RIP: 0010:ext2_get_blocks.constprop.0+0x524/0x610 <...> [93.835298] Call Trace: [93.836253] <TASK> [93.837103] ? lock_acquire+0xf8/0x110 [93.838479] ? d_lookup+0x69/0xd0 [93.839779] ext2_iomap_begin+0xa7/0x1c0 [93.841154] iomap_iter+0xc7/0x150 [93.842425] dax_zero_range+0x6e/0xa0 [93.843813] ext2_setsize+0x176/0x1b0 [93.845164] ext2_setattr+0x151/0x200 [93.846467] notify_change+0x341/0x4e0 [93.847805] ? lock_acquire+0xf8/0x110 [93.849143] ? do_truncate+0x74/0xe0 [93.850452] ? do_truncate+0x84/0xe0 [93.851739] do_truncate+0x84/0xe0 [93.852974] do_sys_ftruncate+0x2b4/0x2f0 [93.854404] do_syscall_64+0x3f/0x90 [93.855789] entry_SYSCALL_64_after_hwframe+0x72/0xdc CC: stable@vger.kernel.org Fixes: 2aa3048 ("iomap: switch iomap_zero_range to use iomap_iter") Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <046a58317f29d9603d1068b2bbae47c2332c17ae.1682069716.git.ritesh.list@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

BugLink: https://bugs.launchpad.net/bugs/2036075 commit fcced95 upstream. PAGE_ALIGN(x) macro gives the next highest value which is multiple of pagesize. But if x is already page aligned then it simply returns x. So, if x passed is 0 in dax_zero_range() function, that means the length gets passed as 0 to ->iomap_begin(). In ext2 it then calls ext2_get_blocks -> max_blocks as 0 and hits bug_on here in ext2_get_blocks(). BUG_ON(maxblocks == 0); Instead we should be calling dax_truncate_page() here which takes care of it. i.e. it only calls dax_zero_range if the offset is not page/block aligned. This can be easily triggered with following on fsdax mounted pmem device. dd if=/dev/zero of=file count=1 bs=512 truncate -s 0 file [79.525838] EXT2-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk [79.529376] ext2 filesystem being mounted at /mnt1/test supports timestamps until 2038 (0x7fffffff) [93.793207] ------------[ cut here ]------------ [93.795102] kernel BUG at fs/ext2/inode.c:637! [93.796904] invalid opcode: 0000 [#1] PREEMPT SMP PTI [93.798659] CPU: 0 PID: 1192 Comm: truncate Not tainted 6.3.0-rc2-xfstests-00056-g131086faa369 torvalds#139 [93.806459] RIP: 0010:ext2_get_blocks.constprop.0+0x524/0x610 <...> [93.835298] Call Trace: [93.836253] <TASK> [93.837103] ? lock_acquire+0xf8/0x110 [93.838479] ? d_lookup+0x69/0xd0 [93.839779] ext2_iomap_begin+0xa7/0x1c0 [93.841154] iomap_iter+0xc7/0x150 [93.842425] dax_zero_range+0x6e/0xa0 [93.843813] ext2_setsize+0x176/0x1b0 [93.845164] ext2_setattr+0x151/0x200 [93.846467] notify_change+0x341/0x4e0 [93.847805] ? lock_acquire+0xf8/0x110 [93.849143] ? do_truncate+0x74/0xe0 [93.850452] ? do_truncate+0x84/0xe0 [93.851739] do_truncate+0x84/0xe0 [93.852974] do_sys_ftruncate+0x2b4/0x2f0 [93.854404] do_syscall_64+0x3f/0x90 [93.855789] entry_SYSCALL_64_after_hwframe+0x72/0xdc CC: stable@vger.kernel.org Fixes: 2aa3048 ("iomap: switch iomap_zero_range to use iomap_iter") Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <046a58317f29d9603d1068b2bbae47c2332c17ae.1682069716.git.ritesh.list@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

Minor improvement for smsc95xx netusb driver performance.

fadbf30

Reduce number of memcpy's by 1-2 improve transmit performance by 2-4% or reduce cpu usage on a comparable value.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Minor improvement for smsc95xx netusb driver performance. #139

Minor improvement for smsc95xx netusb driver performance. #139

SiarheiVolkau commented Nov 26, 2014

Elizafox commented Jan 8, 2015

Minor improvement for smsc95xx netusb driver performance. #139

Are you sure you want to change the base?

Minor improvement for smsc95xx netusb driver performance. #139

Conversation

SiarheiVolkau commented Nov 26, 2014

Elizafox commented Jan 8, 2015