Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Endless: FIMC driver misc fixes #1

Merged
merged 6 commits into from
Mar 25, 2014
Merged

Endless: FIMC driver misc fixes #1

merged 6 commits into from
Mar 25, 2014

Conversation

ndufresne
Copy link

I've made several fixes tot he fimc drivers. The one that are not backport has been submitted for review to linux-media ML.

For tiled format, we need to allocated a multiple of the row size. A good
example is for 1280x720, wich get adjusted to 1280x736. In tiles this mean Y
plane is 20x23 and UV plane 20x12. Because of the rounding, the previous code
would only have enough space to fit half of the last row.
The supported planar YUV format (YUV420P and YUV422P) has 3 planes, where
the bytesperline for each should be width, widht/2, width/2. It is expected that
width has been aligned to a multiple of 4 to stay word aligned.
Depth and payload is defined per memory plane. It's better to iterate using
number of memory planes. this was not causing much issue since the rest
of the arrays involved where intialized to zero.
All YUV 422 has 16bit per pixels.
This formula did not take into account the required tiled alignement for
NV12MT format. As this was already computed an store in payload array
initially, simply reuse that value.
This is not allowed by the spec, and only served the purpose of hiding bugs
in size calculation so far.
dsd added a commit that referenced this pull request Mar 25, 2014
Endless: FIMC driver misc fixes
@dsd dsd merged commit 7b3e0af into dsd:endless Mar 25, 2014
dsd pushed a commit that referenced this pull request Apr 2, 2014
commit 2172fa7 upstream.

Setting an empty security context (length=0) on a file will
lead to incorrectly dereferencing the type and other fields
of the security context structure, yielding a kernel BUG.
As a zero-length security context is never valid, just reject
all such security contexts whether coming from userspace
via setxattr or coming from the filesystem upon a getxattr
request by SELinux.

Setting a security context value (empty or otherwise) unknown to
SELinux in the first place is only possible for a root process
(CAP_MAC_ADMIN), and, if running SELinux in enforcing mode, only
if the corresponding SELinux mac_admin permission is also granted
to the domain by policy.  In Fedora policies, this is only allowed for
specific domains such as livecd for setting down security contexts
that are not defined in the build host policy.

Reproducer:
su
setenforce 0
touch foo
setfattr -n security.selinux foo

Caveat:
Relabeling or removing foo after doing the above may not be possible
without booting with SELinux disabled.  Any subsequent access to foo
after doing the above will also trigger the BUG.

BUG output from Matthew Thode:
[  473.893141] ------------[ cut here ]------------
[  473.962110] kernel BUG at security/selinux/ss/services.c:654!
[  473.995314] invalid opcode: 0000 [hardkernel#6] SMP
[  474.027196] Modules linked in:
[  474.058118] CPU: 0 PID: 8138 Comm: ls Tainted: G      D   I
3.13.0-grsec #1
[  474.116637] Hardware name: Supermicro X8ST3/X8ST3, BIOS 2.0
07/29/10
[  474.149768] task: ffff8805f50cd010 ti: ffff8805f50cd488 task.ti:
ffff8805f50cd488
[  474.183707] RIP: 0010:[<ffffffff814681c7>]  [<ffffffff814681c7>]
context_struct_compute_av+0xce/0x308
[  474.219954] RSP: 0018:ffff8805c0ac3c38  EFLAGS: 00010246
[  474.252253] RAX: 0000000000000000 RBX: ffff8805c0ac3d94 RCX:
0000000000000100
[  474.287018] RDX: ffff8805e8aac000 RSI: 00000000ffffffff RDI:
ffff8805e8aaa000
[  474.321199] RBP: ffff8805c0ac3cb8 R08: 0000000000000010 R09:
0000000000000006
[  474.357446] R10: 0000000000000000 R11: ffff8805c567a000 R12:
0000000000000006
[  474.419191] R13: ffff8805c2b74e88 R14: 00000000000001da R15:
0000000000000000
[  474.453816] FS:  00007f2e75220800(0000) GS:ffff88061fc00000(0000)
knlGS:0000000000000000
[  474.489254] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  474.522215] CR2: 00007f2e74716090 CR3: 00000005c085e000 CR4:
00000000000207f0
[  474.556058] Stack:
[  474.584325]  ffff8805c0ac3c98 ffffffff811b549b ffff8805c0ac3c98
ffff8805f1190a40
[  474.618913]  ffff8805a6202f08 ffff8805c2b74e88 00068800d0464990
ffff8805e8aac860
[  474.653955]  ffff8805c0ac3cb8 000700068113833a ffff880606c75060
ffff8805c0ac3d94
[  474.690461] Call Trace:
[  474.723779]  [<ffffffff811b549b>] ? lookup_fast+0x1cd/0x22a
[  474.778049]  [<ffffffff81468824>] security_compute_av+0xf4/0x20b
[  474.811398]  [<ffffffff8196f419>] avc_compute_av+0x2a/0x179
[  474.843813]  [<ffffffff8145727b>] avc_has_perm+0x45/0xf4
[  474.875694]  [<ffffffff81457d0e>] inode_has_perm+0x2a/0x31
[  474.907370]  [<ffffffff81457e76>] selinux_inode_getattr+0x3c/0x3e
[  474.938726]  [<ffffffff81455cf6>] security_inode_getattr+0x1b/0x22
[  474.970036]  [<ffffffff811b057d>] vfs_getattr+0x19/0x2d
[  475.000618]  [<ffffffff811b05e5>] vfs_fstatat+0x54/0x91
[  475.030402]  [<ffffffff811b063b>] vfs_lstat+0x19/0x1b
[  475.061097]  [<ffffffff811b077e>] SyS_newlstat+0x15/0x30
[  475.094595]  [<ffffffff8113c5c1>] ? __audit_syscall_entry+0xa1/0xc3
[  475.148405]  [<ffffffff8197791e>] system_call_fastpath+0x16/0x1b
[  475.179201] Code: 00 48 85 c0 48 89 45 b8 75 02 0f 0b 48 8b 45 a0 48
8b 3d 45 d0 b6 00 8b 40 08 89 c6 ff ce e8 d1 b0 06 00 48 85 c0 49 89 c7
75 02 <0f> 0b 48 8b 45 b8 4c 8b 28 eb 1e 49 8d 7d 08 be 80 01 00 00 e8
[  475.255884] RIP  [<ffffffff814681c7>]
context_struct_compute_av+0xce/0x308
[  475.296120]  RSP <ffff8805c0ac3c38>
[  475.328734] ---[ end trace f076482e9d754adc ]---

Reported-by:  Matthew Thode <mthode@mthode.org>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
dsd pushed a commit that referenced this pull request Apr 2, 2014
…rm_data

commit ac323d8 upstream.

Fix NULL pointer dereference of "chip->pdata" if platform_data was not
supplied to the driver.

The driver during probe stored the pointer to the platform_data:
	chip->pdata = client->dev.platform_data;
Later it was dereferenced in max17040_get_online() and
max17040_get_status().

If platform_data was not supplied, the NULL pointer exception would
happen:

[    6.626094] Unable to handle kernel  of a at virtual address 00000000
[    6.628557] pgd = c0004000
[    6.632868] [00000000] *pgd=66262564
[    6.634636] Unable to handle kernel paging request at virtual address e6262000
[    6.642014] pgd = de468000
[    6.644700] [e6262000] *pgd=00000000
[    6.648265] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[    6.653552] Modules linked in:
[    6.656598] CPU: 0 PID: 31 Comm: kworker/0:1 Not tainted 3.10.14-02717-gc58b4b4 torvalds#505
[    6.664334] Workqueue: events max17040_work
[    6.668488] task: dfa11b80 ti: df9f6000 task.ti: df9f6000
[    6.673873] PC is at show_pte+0x80/0xb8
[    6.677687] LR is at show_pte+0x3c/0xb8
[    6.681503] pc : [<c001b7b8>]    lr : [<c001b774>]    psr: 600f0113
[    6.681503] sp : df9f7d58  ip : 600f0113  fp : 00000009
[    6.692965] r10: 00000000  r9 : 00000000  r8 : dfa11b80
[    6.698171] r7 : df9f7ea0  r6 : e6262000  r5 : 00000000  r4 : 00000000
[    6.704680] r3 : 00000000  r2 : e6262000  r1 : 600f0193  r0 : c05b3750
[    6.711194] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[    6.718485] Control: 10c53c7d  Table: 5e46806a  DAC: 00000015
[    6.724218] Process kworker/0:1 (pid: 31, stack limit = 0xdf9f6238)
[    6.730465] Stack: (0xdf9f7d58 to 0xdf9f8000)
[    6.914325] [<c001b7b8>] (show_pte+0x80/0xb8) from [<c047107c>] (__do_kernel_fault.part.9+0x44/0x74)
[    6.923425] [<c047107c>] (__do_kernel_fault.part.9+0x44/0x74) from [<c001bb7c>] (do_page_fault+0x2c4/0x360)
[    6.933144] [<c001bb7c>] (do_page_fault+0x2c4/0x360) from [<c0008400>] (do_DataAbort+0x34/0x9c)
[    6.941825] [<c0008400>] (do_DataAbort+0x34/0x9c) from [<c000e5d8>] (__dabt_svc+0x38/0x60)
[    6.950058] Exception stack(0xdf9f7ea0 to 0xdf9f7ee8)
[    6.955099] 7ea0: df0c1790 00000000 00000002 00000000 df0c1794 df0c1790 df0c1790 00000042
[    6.963271] 7ec0: df0c1794 00000001 00000000 00000009 00000000 df9f7ee8 c0306268 c0306270
[    6.971419] 7ee0: a00f0113 ffffffff
[    6.974902] [<c000e5d8>] (__dabt_svc+0x38/0x60) from [<c0306270>] (max17040_work+0x8c/0x144)
[    6.983317] [<c0306270>] (max17040_work+0x8c/0x144) from [<c003f364>] (process_one_work+0x138/0x440)
[    6.992429] [<c003f364>] (process_one_work+0x138/0x440) from [<c003fa64>] (worker_thread+0x134/0x3b8)
[    7.001628] [<c003fa64>] (worker_thread+0x134/0x3b8) from [<c00454bc>] (kthread+0xa4/0xb0)
[    7.009875] [<c00454bc>] (kthread+0xa4/0xb0) from [<c000eb28>] (ret_from_fork+0x14/0x2c)
[    7.017943] Code: e1a03005 e2422480 e0826104 e59f002c (e7922104)
[    7.024017] ---[ end trace 73bc7006b9cc5c79 ]---

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Fixes: c6f4a42
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
dsd pushed a commit that referenced this pull request Jun 10, 2014
Avoid circular mutex lock by pushing the dev->lock to the .fini
callback on each extension.

As em28xx-dvb, em28xx-alsa and em28xx-rc have their own data
structures, and don't touch at the common structure during .fini,
only em28xx-v4l needs to be locked.

[   90.994317] ======================================================
[   90.994356] [ INFO: possible circular locking dependency detected ]
[   90.994395] 3.13.0-rc1+ hardkernel#24 Not tainted
[   90.994427] -------------------------------------------------------
[   90.994458] khubd/54 is trying to acquire lock:
[   90.994490]  (&card->controls_rwsem){++++.+}, at: [<ffffffffa0177b08>] snd_ctl_dev_free+0x28/0x60 [snd]
[   90.994656]
[   90.994656] but task is already holding lock:
[   90.994688]  (&dev->lock){+.+.+.}, at: [<ffffffffa040db81>] em28xx_close_extension+0x31/0x90 [em28xx]
[   90.994843]
[   90.994843] which lock already depends on the new lock.
[   90.994843]
[   90.994874]
[   90.994874] the existing dependency chain (in reverse order) is:
[   90.994905]
-> #1 (&dev->lock){+.+.+.}:
[   90.995057]        [<ffffffff810b8fa3>] __lock_acquire+0xb43/0x1330
[   90.995121]        [<ffffffff810b9f82>] lock_acquire+0xa2/0x120
[   90.995182]        [<ffffffff816a5b6c>] mutex_lock_nested+0x5c/0x3c0
[   90.995245]        [<ffffffffa0422cca>] em28xx_vol_put_mute+0x1ba/0x1d0 [em28xx_alsa]
[   90.995309]        [<ffffffffa017813d>] snd_ctl_elem_write+0xfd/0x140 [snd]
[   90.995376]        [<ffffffffa01791c2>] snd_ctl_ioctl+0xe2/0x810 [snd]
[   90.995442]        [<ffffffff811db8b0>] do_vfs_ioctl+0x300/0x520
[   90.995504]        [<ffffffff811dbb51>] SyS_ioctl+0x81/0xa0
[   90.995568]        [<ffffffff816b1929>] system_call_fastpath+0x16/0x1b
[   90.995630]
-> #0 (&card->controls_rwsem){++++.+}:
[   90.995780]        [<ffffffff810b7a47>] check_prevs_add+0x947/0x950
[   90.995841]        [<ffffffff810b8fa3>] __lock_acquire+0xb43/0x1330
[   90.995901]        [<ffffffff810b9f82>] lock_acquire+0xa2/0x120
[   90.995962]        [<ffffffff816a762b>] down_write+0x3b/0xa0
[   90.996022]        [<ffffffffa0177b08>] snd_ctl_dev_free+0x28/0x60 [snd]
[   90.996088]        [<ffffffffa017a255>] snd_device_free+0x65/0x140 [snd]
[   90.996154]        [<ffffffffa017a751>] snd_device_free_all+0x61/0xa0 [snd]
[   90.996219]        [<ffffffffa0173af4>] snd_card_do_free+0x14/0x130 [snd]
[   90.996283]        [<ffffffffa0173f14>] snd_card_free+0x84/0x90 [snd]
[   90.996349]        [<ffffffffa0423397>] em28xx_audio_fini+0x97/0xb0 [em28xx_alsa]
[   90.996411]        [<ffffffffa040dba6>] em28xx_close_extension+0x56/0x90 [em28xx]
[   90.996475]        [<ffffffffa040f639>] em28xx_usb_disconnect+0x79/0x90 [em28xx]
[   90.996539]        [<ffffffff814a06e7>] usb_unbind_interface+0x67/0x1d0
[   90.996620]        [<ffffffff8142920f>] __device_release_driver+0x7f/0xf0
[   90.996682]        [<ffffffff814292a5>] device_release_driver+0x25/0x40
[   90.996742]        [<ffffffff81428b0c>] bus_remove_device+0x11c/0x1a0
[   90.996801]        [<ffffffff81425536>] device_del+0x136/0x1d0
[   90.996863]        [<ffffffff8149e0c0>] usb_disable_device+0xb0/0x290
[   90.996923]        [<ffffffff814930c5>] usb_disconnect+0xb5/0x1d0
[   90.996984]        [<ffffffff81495ab6>] hub_port_connect_change+0xd6/0xad0
[   90.997044]        [<ffffffff814967c3>] hub_events+0x313/0x9b0
[   90.997105]        [<ffffffff81496e95>] hub_thread+0x35/0x170
[   90.997165]        [<ffffffff8108ea2f>] kthread+0xff/0x120
[   90.997226]        [<ffffffff816b187c>] ret_from_fork+0x7c/0xb0
[   90.997287]
[   90.997287] other info that might help us debug this:
[   90.997287]
[   90.997318]  Possible unsafe locking scenario:
[   90.997318]
[   90.997348]        CPU0                    CPU1
[   90.997378]        ----                    ----
[   90.997408]   lock(&dev->lock);
[   90.997497]                                lock(&card->controls_rwsem);
[   90.997607]                                lock(&dev->lock);
[   90.997697]   lock(&card->controls_rwsem);
[   90.997786]
[   90.997786]  *** DEADLOCK ***
[   90.997786]
[   90.997817] 5 locks held by khubd/54:
[   90.997847]  #0:  (&__lockdep_no_validate__){......}, at: [<ffffffff81496564>] hub_events+0xb4/0x9b0
[   90.998025]  #1:  (&__lockdep_no_validate__){......}, at: [<ffffffff81493076>] usb_disconnect+0x66/0x1d0
[   90.998204]  #2:  (&__lockdep_no_validate__){......}, at: [<ffffffff8142929d>] device_release_driver+0x1d/0x40
[   90.998383]  #3:  (em28xx_devlist_mutex){+.+.+.}, at: [<ffffffffa040db77>] em28xx_close_extension+0x27/0x90 [em28xx]
[   90.998567]  hardkernel#4:  (&dev->lock){+.+.+.}, at: [<ffffffffa040db81>] em28xx_close_extension+0x31/0x90 [em28xx]

Reviewed-by: Frank Schäfer <fschaefer.oss@googlemail.com>
Tested-by: Antti Palosaari <crope@iki.fi>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
dsd pushed a commit that referenced this pull request Jun 10, 2014
When booting a kexec/kdump kernel on a system that has specific memory
hotplug regions the boot will fail with warnings like:

 swapper/0: page allocation failure: order:9, mode:0x84d0
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-65.el7.x86_64 #1
 Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.S013.032920111005 03/29/2011
  0000000000000000 ffff8800341bd8c8 ffffffff815bcc67 ffff8800341bd950
  ffffffff8113b1a0 ffff880036339b00 0000000000000009 00000000000084d0
  ffff8800341bd950 ffffffff815b87ee 0000000000000000 0000000000000200
 Call Trace:
  [<ffffffff815bcc67>] dump_stack+0x19/0x1b
  [<ffffffff8113b1a0>] warn_alloc_failed+0xf0/0x160
  [<ffffffff815b87ee>] ?  __alloc_pages_direct_compact+0xac/0x196
  [<ffffffff8113f14f>] __alloc_pages_nodemask+0x7ff/0xa00
  [<ffffffff815b417c>] vmemmap_alloc_block+0x62/0xba
  [<ffffffff815b41e9>] vmemmap_alloc_block_buf+0x15/0x3b
  [<ffffffff815b1ff6>] vmemmap_populate+0xb4/0x21b
  [<ffffffff815b461d>] sparse_mem_map_populate+0x27/0x35
  [<ffffffff815b400f>] sparse_add_one_section+0x7a/0x185
  [<ffffffff815a1e9f>] __add_pages+0xaf/0x240
  [<ffffffff81047359>] arch_add_memory+0x59/0xd0
  [<ffffffff815a21d9>] add_memory+0xb9/0x1b0
  [<ffffffff81333b9c>] acpi_memory_device_add+0x18d/0x26d
  [<ffffffff81309a01>] acpi_bus_device_attach+0x7d/0xcd
  [<ffffffff8132379d>] acpi_ns_walk_namespace+0xc8/0x17f
  [<ffffffff81309984>] ? acpi_bus_type_and_status+0x90/0x90
  [<ffffffff81309984>] ? acpi_bus_type_and_status+0x90/0x90
  [<ffffffff81323c8c>] acpi_walk_namespace+0x95/0xc5
  [<ffffffff8130a6d6>] acpi_bus_scan+0x8b/0x9d
  [<ffffffff81a2019a>] acpi_scan_init+0x63/0x160
  [<ffffffff81a1ffb5>] acpi_init+0x25d/0x2a6
  [<ffffffff81a1fd58>] ? acpi_sleep_proc_init+0x2a/0x2a
  [<ffffffff810020e2>] do_one_initcall+0xe2/0x190
  [<ffffffff819e20c4>] kernel_init_freeable+0x17c/0x207
  [<ffffffff819e18d0>] ? do_early_param+0x88/0x88
  [<ffffffff8159fea0>] ? rest_init+0x80/0x80
  [<ffffffff8159feae>] kernel_init+0xe/0x180
  [<ffffffff815cca2c>] ret_from_fork+0x7c/0xb0
  [<ffffffff8159fea0>] ? rest_init+0x80/0x80
 Mem-Info:
 Node 0 DMA per-cpu:
 CPU    0: hi:    0, btch:   1 usd:   0
 Node 0 DMA32 per-cpu:
 CPU    0: hi:   42, btch:   7 usd:   0
 active_anon:0 inactive_anon:0 isolated_anon:0
  active_file:0 inactive_file:0 isolated_file:0
  unevictable:0 dirty:0 writeback:0 unstable:0
  free:872 slab_reclaimable:13 slab_unreclaimable:1880
  mapped:0 shmem:0 pagetables:0 bounce:0
  free_cma:0

because the system has run out of memory at boot time.  This occurs
because of the following sequence in the boot:

Main kernel boots and sets E820 map.  The second kernel is booted with a
map generated by the kdump service using memmap= and memmap=exactmap.
These parameters are added to the kernel parameters of the kexec/kdump
kernel.   The kexec/kdump kernel has limited memory resources so as not
to severely impact the main kernel.

The system then panics and the kdump/kexec kernel boots (which is a
completely new kernel boot).  During this boot ACPI is initialized and the
kernel (as can be seen above) traverses the ACPI namespace and finds an
entry for a memory device to be hotadded.

ie)

  [<ffffffff815a1e9f>] __add_pages+0xaf/0x240
  [<ffffffff81047359>] arch_add_memory+0x59/0xd0
  [<ffffffff815a21d9>] add_memory+0xb9/0x1b0
  [<ffffffff81333b9c>] acpi_memory_device_add+0x18d/0x26d
  [<ffffffff81309a01>] acpi_bus_device_attach+0x7d/0xcd
  [<ffffffff8132379d>] acpi_ns_walk_namespace+0xc8/0x17f
  [<ffffffff81309984>] ? acpi_bus_type_and_status+0x90/0x90
  [<ffffffff81309984>] ? acpi_bus_type_and_status+0x90/0x90
  [<ffffffff81323c8c>] acpi_walk_namespace+0x95/0xc5
  [<ffffffff8130a6d6>] acpi_bus_scan+0x8b/0x9d
  [<ffffffff81a2019a>] acpi_scan_init+0x63/0x160
  [<ffffffff81a1ffb5>] acpi_init+0x25d/0x2a6

At this point the kernel adds page table information and the the kexec/kdump
kernel runs out of memory.

This can also be reproduced by using the memmap=exactmap and mem=X
parameters on the main kernel and booting.

This patchset resolves the problem by adding a kernel parameter,
acpi_no_memhotplug, to disable ACPI memory hotplug.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
dsd pushed a commit that referenced this pull request Jun 10, 2014
On AMD family 10h we see following error messages while waking up from
S3 for all non-boot CPUs leading to a failed IBS initialization:

 Enabling non-boot CPUs ...
 smpboot: Booting Node 0 Processor 1 APIC 0x1
 [Firmware Bug]: cpu 1, try to use APIC500 (LVT offset 0) for vector 0x400, but the register is already in use for vector 0xf9 on another cpu
 perf: IBS APIC setup failed on cpu #1
 process: Switch to broadcast mode on CPU1
 CPU1 is up
 ...
 ACPI: Waking up from system sleep state S3

Reason for this is that during suspend the LVT offset for the IBS
vector gets lost and needs to be reinialized while resuming.

The offset is read from the IBSCTL msr. On family 10h the offset needs
to be 1 as offset 0 is used for the MCE threshold interrupt, but
firmware assings it for IBS to 0 too. The kernel needs to reprogram
the vector. The msr is a readonly node msr, but a new value can be
written via pci config space access. The reinitialization is
implemented for family 10h in setup_ibs_ctl() which is forced during
IBS setup.

This patch fixes IBS setup after waking up from S3 by adding
resume/supend hooks for the boot cpu which does the offset
reinitialization.

Marking it as stable to let distros pick up this fix.

Signed-off-by: Robert Richter <rric@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org> v3.2..
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1389797849-5565-1-git-send-email-rric.net@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
dsd pushed a commit that referenced this pull request Jun 10, 2014
The sdhci_execute_tuning routine gets lock separately by
disable_irq(host->irq);
spin_lock(&host->lock);
It will cause the following lockdep error message since the &host->lock
could also be got in irq context.
Use spin_lock_irqsave/spin_unlock_restore instead to get rid of
this error message.

[ INFO: inconsistent lock state ]
3.13.0-rc1+ hardkernel#287 Not tainted
---------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
kworker/u2:1/33 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (&(&host->lock)->rlock){?.-...}, at: [<8045f7f4>] sdhci_execute_tuning+0x4c/0x710
{IN-HARDIRQ-W} state was registered at:
  [<8005f030>] mark_lock+0x140/0x6ac
  [<80060760>] __lock_acquire+0xb30/0x1cbc
  [<800620d0>] lock_acquire+0x70/0x84
  [<8061d1c8>] _raw_spin_lock+0x30/0x40
  [<804605cc>] sdhci_irq+0x24/0xa68
  [<8006b1d4>] handle_irq_event_percpu+0x54/0x18c
  [<8006b350>] handle_irq_event+0x44/0x64
  [<8006e50c>] handle_fasteoi_irq+0xa0/0x170
  [<8006a8f0>] generic_handle_irq+0x30/0x44
  [<8000f238>] handle_IRQ+0x54/0xbc
  [<8000864c>] gic_handle_irq+0x30/0x64
  [<80013024>] __irq_svc+0x44/0x5c
  [<80329bf4>] dev_vprintk_emit+0x50/0x58
  [<80329c24>] dev_printk_emit+0x28/0x30
  [<80329fec>] __dev_printk+0x4c/0x90
  [<8032a180>] dev_err+0x3c/0x48
  [<802dd4f0>] _regulator_get+0x158/0x1cc
  [<802dd5b4>] regulator_get_optional+0x18/0x1c
  [<80461df4>] sdhci_add_host+0x42c/0xbd8
  [<80464820>] sdhci_esdhc_imx_probe+0x378/0x67c
  [<8032ee88>] platform_drv_probe+0x20/0x50
  [<8032d48c>] driver_probe_device+0x118/0x234
  [<8032d690>] __driver_attach+0x9c/0xa0
  [<8032b89c>] bus_for_each_dev+0x68/0x9c
  [<8032cf44>] driver_attach+0x20/0x28
  [<8032cbc8>] bus_add_driver+0x148/0x1f4
  [<8032dce0>] driver_register+0x80/0x100
  [<8032ee54>] __platform_driver_register+0x50/0x64
  [<8084b094>] sdhci_esdhc_imx_driver_init+0x18/0x20
  [<80008980>] do_one_initcall+0x108/0x16c
  [<8081cca4>] kernel_init_freeable+0x10c/0x1d0
  [<80611b28>] kernel_init+0x10/0x120
  [<8000e9c8>] ret_from_fork+0x14/0x2c
irq event stamp: 805
hardirqs last  enabled at (805): [<8061d43c>] _raw_spin_unlock_irqrestore+0x38/0x4c
hardirqs last disabled at (804): [<8061d2c8>] _raw_spin_lock_irqsave+0x24/0x54
softirqs last  enabled at (570): [<8002b824>] __do_softirq+0x1c4/0x290
softirqs last disabled at (561): [<8002bcf4>] irq_exit+0xb4/0x10c

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&host->lock)->rlock);
  <Interrupt>
    lock(&(&host->lock)->rlock);

 *** DEADLOCK ***

2 locks held by kworker/u2:1/33:
 #0:  (kmmcd){.+.+..}, at: [<8003db18>] process_one_work+0x128/0x468
 #1:  ((&(&host->detect)->work)){+.+...}, at: [<8003db18>] process_one_work+0x128/0x468

stack backtrace:
CPU: 0 PID: 33 Comm: kworker/u2:1 Not tainted 3.13.0-rc1+ hardkernel#287
Workqueue: kmmcd mmc_rescan
Backtrace:
[<80012160>] (dump_backtrace+0x0/0x10c) from [<80012438>] (show_stack+0x18/0x1c)
 r6:bfad0900 r5:00000000 r4:8088ecc8 r3:bfad0900
[<80012420>] (show_stack+0x0/0x1c) from [<806169ec>] (dump_stack+0x84/0x9c)
[<80616968>] (dump_stack+0x0/0x9c) from [<806147b4>] (print_usage_bug+0x260/0x2d0)
 r5:8076ba88 r4:80977410
[<80614554>] (print_usage_bug+0x0/0x2d0) from [<8005f0d0>] (mark_lock+0x1e0/0x6ac)
 r9:8005e678 r8:00000000 r7:bfad0900 r6:00001015 r5:bfad0cd0
r4:00000002
[<8005eef0>] (mark_lock+0x0/0x6ac) from [<80060234>] (__lock_acquire+0x604/0x1cbc)
[<8005fc30>] (__lock_acquire+0x0/0x1cbc) from [<800620d0>] (lock_acquire+0x70/0x84)
[<80062060>] (lock_acquire+0x0/0x84) from [<8061d1c8>] (_raw_spin_lock+0x30/0x40)
 r7:00000000 r6:bfb63000 r5:00000000 r4:bfb60568
[<8061d198>] (_raw_spin_lock+0x0/0x40) from [<8045f7f4>] (sdhci_execute_tuning+0x4c/0x710)
 r4:bfb60000
[<8045f7a8>] (sdhci_execute_tuning+0x0/0x710) from [<80453454>] (mmc_sd_init_card+0x5f8/0x660)
[<80452e5c>] (mmc_sd_init_card+0x0/0x660) from [<80453748>] (mmc_attach_sd+0xb4/0x180)
 r9:bf92d400 r8:8065f364 r7:00061a80 r6:bfb60000 r5:8065f358
r4:bfb60000
[<80453694>] (mmc_attach_sd+0x0/0x180) from [<8044d9f8>] (mmc_rescan+0x284/0x2f0)
 r5:8065f358 r4:bfb602f8
[<8044d774>] (mmc_rescan+0x0/0x2f0) from [<8003db94>] (process_one_work+0x1a4/0x468)
 r8:00000000 r7:bfb55eb0 r6:bf80dc00 r5:bfb602f8 r4:bfb35980
r3:8044d774
[<8003d9f0>] (process_one_work+0x0/0x468) from [<8003e850>] (worker_thread+0x118/0x3e0)
[<8003e738>] (worker_thread+0x0/0x3e0) from [<80044de0>] (kthread+0xd4/0xf0)
[<80044d0c>] (kthread+0x0/0xf0) from [<8000e9c8>] (ret_from_fork+0x14/0x2c)
 r7:00000000 r6:00000000 r5:80044d0c r4:bfb37b40

Signed-off-by: Dong Aisheng <b29396@freescale.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Chris Ball <chris@printf.net>
dsd pushed a commit that referenced this pull request Jun 10, 2014
…/git/gregkh/usb

Pull USB updates from Greg KH:
 "Here's the big USB pull request for 3.14-rc1

  Lots of little things all over the place, and the usual USB gadget
  updates, and XHCI fixes (some for an issue reported by a lot of
  people).  USB PHY updates as well as chipidea updates and fixes.

  All of these have been in the linux-next tree with no reported issues"

* tag 'usb-3.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (318 commits)
  usb: chipidea: udc: using MultO at TD as real mult value for ISO-TX
  usb: chipidea: need to mask INT_STATUS when write otgsc
  usb: chipidea: put hw_phymode_configure before ci_usb_phy_init
  usb: chipidea: Fix Internal error: : 808 [#1] ARM related to STS flag
  usb: chipidea: imx: set CI_HDRC_IMX28_WRITE_FIX for imx28
  usb: chipidea: add freescale imx28 special write register method
  usb: ehci: add freescale imx28 special write register method
  usb: core: check for valid id_table when using the RefId feature
  usb: cdc-wdm: resp_count can be 0 even if WDM_READ is set
  usb: core: bail out if user gives an unknown RefId when using new_id
  usb: core: allow a reference device for new_id
  usb: core: add sanity checks when using bInterfaceClass with new_id
  USB: image: correct spelling mistake in comment
  USB: c67x00: correct spelling mistakes in comments
  usb: delete non-required instances of include <linux/init.h>
  usb:hub set hub->change_bits when over-current happens
  Revert "usb: chipidea: imx: set CI_HDRC_IMX28_WRITE_FIX for imx28"
  xhci: Set scatter-gather limit to avoid failed block writes.
  xhci: Avoid infinite loop when sg urb requires too many trbs
  usb: gadget: remove unused variable in gr_queue_int()
  ...
dsd pushed a commit that referenced this pull request Jun 10, 2014
The rport timers must be stopped before the SRP initiator destroys the
resources associated with the SCSI host. This is necessary because
otherwise the callback functions invoked from the SRP transport layer
could trigger a use-after-free. Stopping the rport timers before
invoking scsi_remove_host() can trigger long delays in the SCSI error
handler if a transport layer failure occurs while scsi_remove_host()
is in progress. Hence move the code for stopping the rport timers from
srp_rport_release() into a new function and invoke that function after
scsi_remove_host() has finished. This patch fixes the following
sporadic kernel crash:

     kernel BUG at include/asm-generic/dma-mapping-common.h:64!
     invalid opcode: 0000 [#1] SMP
     RIP: 0010:[<ffffffffa03b20b1>]  [<ffffffffa03b20b1>] srp_unmap_data+0x121/0x130 [ib_srp]
     Call Trace:
     [<ffffffffa03b20fc>] srp_free_req+0x3c/0x80 [ib_srp]
     [<ffffffffa03b2188>] srp_finish_req+0x48/0x70 [ib_srp]
     [<ffffffffa03b21fb>] srp_terminate_io+0x4b/0x60 [ib_srp]
     [<ffffffffa03a6fb5>] __rport_fail_io_fast+0x75/0x80 [scsi_transport_srp]
     [<ffffffffa03a7438>] rport_fast_io_fail_timedout+0x88/0xc0 [scsi_transport_srp]
     [<ffffffff8108b370>] worker_thread+0x170/0x2a0
     [<ffffffff81090876>] kthread+0x96/0xa0
     [<ffffffff8100c0ca>] child_rip+0xa/0x20

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
dsd pushed a commit that referenced this pull request Jun 10, 2014
cond_resched_lock(cinfo->lock) is called everywhere else while holding
the cinfo->lock spinlock.  Not holding this lock while calling
transfer_commit_list in filelayout_recover_commit_reqs causes the BUG
below.

It's true that we can't hold this lock while calling pnfs_put_lseg,
because that might try to lock the inode lock - which might be the
same lock as cinfo->lock.

To reproduce, mount a 2 DS pynfs server and run an O_DIRECT command
that crosses a stripe boundary and is not page aligned, such as:

 dd if=/dev/zero of=/mnt/f bs=17000 count=1 oflag=direct

BUG: sleeping function called from invalid context at linux/fs/nfs/nfs4filelayout.c:1161
in_atomic(): 0, irqs_disabled(): 0, pid: 27, name: kworker/0:1
2 locks held by kworker/0:1/27:
 #0:  (events){.+.+.+}, at: [<ffffffff810501d7>] process_one_work+0x175/0x3a5
 #1:  ((&dreq->work)){+.+...}, at: [<ffffffff810501d7>] process_one_work+0x175/0x3a5
CPU: 0 PID: 27 Comm: kworker/0:1 Not tainted 3.13.0-rc3-branch-dros_testing+ hardkernel#21
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
Workqueue: events nfs_direct_write_schedule_work [nfs]
 0000000000000000 ffff88007a39bbb8 ffffffff81491256 ffff88007b87a130  ffff88007a39bbd8 ffffffff8105f103 ffff880079614000 ffff880079617d40  ffff88007a39bc20 ffffffffa011603e ffff880078988b98 0000000000000000
Call Trace:
 [<ffffffff81491256>] dump_stack+0x4d/0x66
 [<ffffffff8105f103>] __might_sleep+0x100/0x105
 [<ffffffffa011603e>] transfer_commit_list+0x94/0xf1 [nfs_layout_nfsv41_files]
 [<ffffffffa01160d6>] filelayout_recover_commit_reqs+0x3b/0x68 [nfs_layout_nfsv41_files]
 [<ffffffffa00ba53a>] nfs_direct_write_reschedule+0x9f/0x1d6 [nfs]
 [<ffffffff810705df>] ? mark_lock+0x1df/0x224
 [<ffffffff8106e617>] ? trace_hardirqs_off_caller+0x37/0xa4
 [<ffffffff8106e691>] ? trace_hardirqs_off+0xd/0xf
 [<ffffffffa00ba8f8>] nfs_direct_write_schedule_work+0x9d/0xb7 [nfs]
 [<ffffffff810501d7>] ? process_one_work+0x175/0x3a5
 [<ffffffff81050258>] process_one_work+0x1f6/0x3a5
 [<ffffffff810501d7>] ? process_one_work+0x175/0x3a5
 [<ffffffff8105187e>] worker_thread+0x149/0x1f5
 [<ffffffff81051735>] ? rescuer_thread+0x28d/0x28d
 [<ffffffff81056d74>] kthread+0xd2/0xda
 [<ffffffff81056ca2>] ? __kthread_parkme+0x61/0x61
 [<ffffffff8149e66c>] ret_from_fork+0x7c/0xb0
 [<ffffffff81056ca2>] ? __kthread_parkme+0x61/0x61

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
dsd pushed a commit that referenced this pull request Jun 10, 2014
These changes correct the following issues with jumbo frames on the
stmmac driver:

1) The Synopsys EMAC can be configured to support different FIFO
sizes at core configuration time. There's no way to query the
controller and know the FIFO size, so the driver needs to get this
information from the device tree in order to know how to correctly
handle MTU changes and setting up dma buffers. The default
max-frame-size is as currently used, which is the size of a jumbo
frame.

2) The driver was enabling Jumbo frames by default, but was not allocating
dma buffers of sufficient size to handle the maximum possible packet
size that could be received. This led to memory corruption since DMAs were
occurring beyond the extent of the allocated receive buffers for certain types
of network traffic.

kernel BUG at net/core/skbuff.c:126!
Internal error: Oops - BUG: 0 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 563 Comm: sockperf Not tainted 3.13.0-rc6-01523-gf7111b9 hardkernel#31
task: ef35e580 ti: ef252000 task.ti: ef252000
PC is at skb_panic+0x60/0x64
LR is at skb_panic+0x60/0x64
pc : [<c03c7c3c>]    lr : [<c03c7c3c>]    psr: 60000113
sp : ef253c18  ip : 60000113  fp : 00000000
r10: ef3a5400  r9 : 00000ebc  r8 : ef3a546c
r7 : ee59f000  r6 : ee59f084  r5 : ee59ff40  r4 : ee59f140
r3 : 000003e2  r2 : 00000007  r1 : c0b9c420  r0 : 0000007d
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 10c5387d  Table: 2e8ac04a  DAC: 00000015
Process sockperf (pid: 563, stack limit = 0xef252248)
Stack: (0xef253c18 to 0xef254000)
3c00:                                                       00000ebc ee59f000
3c20: ee59f084 ee59ff40 ee59f140 c04a9cd8 ee8c50c0 00000ebc ee59ff40 00000000
3c40: ee59f140 c02d0ef0 00000056 ef1eda80 ee8c50c0 00000ebc 22bbef29 c0318f8c
3c60: 00000056 ef3a547c ffe2c716 c02c9c90 c0ba1298 ef3a5838 ef3a5838 ef3a5400
3c80: 000020c0 ee573840 000055cb ef3f2050 c053f0e0 c0319214 22b9b085 22d92813
3ca0: 00001c80 004b8e00 ef3a5400 ee573840 ef3f2064 22d92813 ef3f2064 000055cb
3cc0: ef3f2050 c031a19c ef252000 00000000 00000000 c0561bc0 00000000 ff00ffff
3ce0: c05621c0 ef3a5400 ef3f2064 ee573840 00000020 ef3f2064 000055cb ef3f2050
3d00: c053f0e0 c031cad0 c053e740 00000e60 00000000 00000000 ee573840 ef3a5400
3d20: ef0a6e00 00000000 ef3f2064 c032507c 00010000 00000020 c0561bc0 c0561bc0
3d40: ee599850 c032799c 00000000 ee573840 c055a380 ef3a5400 00000000 ef3f2064
3d60: ef3f2050 c032799c 0101c7c0 2b6755cb c059a280 c030e4d8 000055cb ffffffff
3d80: ee574fc0 c055a380 ee574000 ee573840 00002b67 ee573840 c03fe9c4 c053fa68
3da0: c055a380 00001f6f 00000000 ee573840 c053f0e0 c0304fdc ef0a6e01 ef3f2050
3dc0: ee573858 ef031000 ee573840 c03055d8 c0ba0c40 ef000f40 00100100 c053f0dc
3de0: c053ffdc c053f0f0 00000008 00000000 ef031000 c02da948 00001140 00000000
3e00: c0563c78 ef253e5f 00000020 ee573840 00000020 c053f0f0 ef313400 ee573840
3e20: c053f0e0 00000000 00000000 c05380c0 ef313400 00001000 00000015 c02df280
3e40: ee574000 ef001e00 00000000 00001080 00000042 005cd980 ef031500 ef031500
3e60: 00000000 c02df824 ef031500 c053e390 c0541084 f00b1e00 c05925e8 c02df864
3e80: 00001f5c ef031440 c053e390 c0278524 00000002 00000000 c0b9eb48 c02df280
3ea0: ee8c7180 00000100 c0542ca8 00000015 00000040 ef031500 ef031500 ef031500
3ec0: c027803c ef252000 00000040 000000ec c05380c0 c0b9eb40 c0b9eb48 c02df940
3ee0: ef060780 ffffa4dd c0564a9c c056343c 002e80a8 00000080 ef031500 00000001
3f00: c053808c ef252000 fffec100 00000003 00000004 002e80a8 0000000c c00258f0
3f20: 002e80a8 c005e704 00000005 00000100 c05634d0 c0538080 c05333e0 00000000
3f40: 0000000a c0565580 c05380c0 ffffa4dc c05434f4 00400100 00000004 c0534cd4
3f60: 00000098 00000000 fffec100 002e80a8 00000004 002e80a8 002a20e0 c0025da8
3f80: c0534cd4 c000f020 fffec10c c053ea60 ef253fb0 c0008530 0000ffe2 b6ef67f4
3fa0: 40000010 ffffffff 00000124 c0012f3c 0000ffe2 002e80f0 0000ffe2 00004000
3fc0: becb6338 becb6334 00000004 00000124 002e80a8 00000004 002e80a8 002a20e0
3fe0: becb6300 becb62f4 002773bb b6ef67f4 40000010 ffffffff 00000000 00000000
[<c03c7c3c>] (skb_panic+0x60/0x64) from [<c02d0ef0>] (skb_put+0x4c/0x50)
[<c02d0ef0>] (skb_put+0x4c/0x50) from [<c0318f8c>] (tcp_collapse+0x314/0x3ec)
[<c0318f8c>] (tcp_collapse+0x314/0x3ec) from [<c0319214>]
(tcp_try_rmem_schedule+0x1b0/0x3c4)
[<c0319214>] (tcp_try_rmem_schedule+0x1b0/0x3c4) from [<c031a19c>]
(tcp_data_queue+0x480/0xe6c)
[<c031a19c>] (tcp_data_queue+0x480/0xe6c) from [<c031cad0>]
(tcp_rcv_established+0x180/0x62c)
[<c031cad0>] (tcp_rcv_established+0x180/0x62c) from [<c032507c>]
(tcp_v4_do_rcv+0x13c/0x31c)
[<c032507c>] (tcp_v4_do_rcv+0x13c/0x31c) from [<c032799c>]
(tcp_v4_rcv+0x718/0x73c)
[<c032799c>] (tcp_v4_rcv+0x718/0x73c) from [<c0304fdc>]
(ip_local_deliver+0x98/0x274)
[<c0304fdc>] (ip_local_deliver+0x98/0x274) from [<c03055d8>]
(ip_rcv+0x420/0x758)
[<c03055d8>] (ip_rcv+0x420/0x758) from [<c02da948>]
(__netif_receive_skb_core+0x44c/0x5bc)
[<c02da948>] (__netif_receive_skb_core+0x44c/0x5bc) from [<c02df280>]
(netif_receive_skb+0x48/0xb4)
[<c02df280>] (netif_receive_skb+0x48/0xb4) from [<c02df824>]
(napi_gro_flush+0x70/0x94)
[<c02df824>] (napi_gro_flush+0x70/0x94) from [<c02df864>]
(napi_complete+0x1c/0x34)
[<c02df864>] (napi_complete+0x1c/0x34) from [<c0278524>]
(stmmac_poll+0x4e8/0x5c8)
[<c0278524>] (stmmac_poll+0x4e8/0x5c8) from [<c02df940>]
(net_rx_action+0xc4/0x1e4)
[<c02df940>] (net_rx_action+0xc4/0x1e4) from [<c00258f0>]
(__do_softirq+0x12c/0x2e8)
[<c00258f0>] (__do_softirq+0x12c/0x2e8) from [<c0025da8>] (irq_exit+0x78/0xac)
[<c0025da8>] (irq_exit+0x78/0xac) from [<c000f020>] (handle_IRQ+0x44/0x90)
[<c000f020>] (handle_IRQ+0x44/0x90) from [<c0008530>]
(gic_handle_irq+0x2c/0x5c)
[<c0008530>] (gic_handle_irq+0x2c/0x5c) from [<c0012f3c>]
(__irq_usr+0x3c/0x60)

3) The driver was setting the dma buffer size after allocating dma buffers,
which caused a system panic when changing the MTU.

BUG: Bad page state in process ifconfig  pfn:2e850
page:c0b72a00 count:0 mapcount:0 mapping:  (null) index:0x0
page flags: 0x200(arch_1)
Modules linked in:
CPU: 0 PID: 566 Comm: ifconfig Not tainted 3.13.0-rc6-01523-gf7111b9 hardkernel#29
[<c001547c>] (unwind_backtrace+0x0/0xf8) from [<c00122dc>]
(show_stack+0x10/0x14)
[<c00122dc>] (show_stack+0x10/0x14) from [<c03c793c>] (dump_stack+0x70/0x88)
[<c03c793c>] (dump_stack+0x70/0x88) from [<c00b2620>] (bad_page+0xc8/0x118)
[<c00b2620>] (bad_page+0xc8/0x118) from [<c00b302c>]
(get_page_from_freelist+0x744/0x870)
[<c00b302c>] (get_page_from_freelist+0x744/0x870) from [<c00b40f4>]
(__alloc_pages_nodemask+0x118/0x86c)
[<c00b40f4>] (__alloc_pages_nodemask+0x118/0x86c) from [<c00b4858>]
(__get_free_pages+0x10/0x54)
[<c00b4858>] (__get_free_pages+0x10/0x54) from [<c00cba1c>]
(kmalloc_order_trace+0x24/0xa0)
[<c00cba1c>] (kmalloc_order_trace+0x24/0xa0) from [<c02d199c>]
(__kmalloc_reserve.isra.21+0x24/0x70)
[<c02d199c>] (__kmalloc_reserve.isra.21+0x24/0x70) from [<c02d240c>]
(__alloc_skb+0x68/0x13c)
[<c02d240c>] (__alloc_skb+0x68/0x13c) from [<c02d3930>]
(__netdev_alloc_skb+0x3c/0xe8)
[<c02d3930>] (__netdev_alloc_skb+0x3c/0xe8) from [<c0279378>]
(stmmac_open+0x63c/0x1024)
[<c0279378>] (stmmac_open+0x63c/0x1024) from [<c02e18cc>]
(__dev_open+0xa0/0xfc)
[<c02e18cc>] (__dev_open+0xa0/0xfc) from [<c02e1b40>]
(__dev_change_flags+0x94/0x158)
[<c02e1b40>] (__dev_change_flags+0x94/0x158) from [<c02e1c24>]
(dev_change_flags+0x18/0x48)
[<c02e1c24>] (dev_change_flags+0x18/0x48) from [<c0337bc0>]
(devinet_ioctl+0x638/0x700)
[<c0337bc0>] (devinet_ioctl+0x638/0x700) from [<c02c7aec>]
(sock_ioctl+0x64/0x290)
[<c02c7aec>] (sock_ioctl+0x64/0x290) from [<c0100890>]
(do_vfs_ioctl+0x78/0x5b8)
[<c0100890>] (do_vfs_ioctl+0x78/0x5b8) from [<c0100e0c>] (SyS_ioctl+0x3c/0x5c)
[<c0100e0c>] (SyS_ioctl+0x3c/0x5c) from [<c000e760>]

The fixes have been verified using reproducible, automated testing.

Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dsd pushed a commit that referenced this pull request Jun 10, 2014
 If we load the null_blk module with bs=8k we get following oops:
[ 3819.812190] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 3819.812387] IP: [<ffffffff81170aa5>] create_empty_buffers+0x28/0xaf
[ 3819.812527] PGD 219244067 PUD 215a06067 PMD 0
[ 3819.812640] Oops: 0000 [#1] SMP
[ 3819.812772] Modules linked in: null_blk(+)

 Fix that by resetting block size to PAGE_SIZE if it is greater than PAGE_SIZE

Reported-by: Sumanth <sumantk2@linux.vnet.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Reviewed-by: Matias Bjorling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
dsd pushed a commit that referenced this pull request Jun 10, 2014
Fix broken inline assembly contraints for cmpxchg64 on 32bit.

Fixes this crash:

specification exception: 0006 [#1] SMP
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.0 hardkernel#4
task: 005a16c8 ti: 00592000 task.ti: 00592000
Krnl PSW : 070ce000 8029abd6 (lockref_get+0x3e/0x9c)
...
Krnl Code: 8029abcc: a71a0001           ahi     %r1,1
           8029abd0: 1852               lr      %r5,%r2
          #8029abd2: bb40f064           cds     %r4,%r0,100(%r15)
          >8029abd6: 1943               cr      %r4,%r3
           8029abd8: 1815               lr      %r1,%r5
Call Trace:
([<0000000078e01870>] 0x78e01870)
 [<000000000021105a>] sysfs_mount+0xd2/0x1c8
 [<00000000001b551e>] mount_fs+0x3a/0x134
 [<00000000001ce768>] vfs_kern_mount+0x44/0x11c
 [<00000000001ce864>] kern_mount_data+0x24/0x3c
 [<00000000005cc4b8>] sysfs_init+0x74/0xd4
 [<00000000005cb5b4>] mnt_init+0xe0/0x1fc
 [<00000000005cb16a>] vfs_caches_init+0xb6/0x14c
 [<00000000005be794>] start_kernel+0x318/0x33c
 [<000000000010001c>] _stext+0x1c/0x80

Reported-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
dsd pushed a commit that referenced this pull request Jun 10, 2014
If we have EEH error happens to the adapter and we have to remove
it from the system for some reasons (e.g. more than 5 EEH errors
detected from the device in last hour), the adapter will be disabled
for towice separately by eeh_err_detected() and remove_one(), which
will incur following unexpected backtrace. The patch tries to avoid
it.

WARNING: at drivers/pci/pci.c:1431
CPU: 12 PID: 121 Comm: eehd Not tainted 3.13.0-rc7+ #1
task: c0000001823a3780 ti: c00000018240c000 task.ti: c00000018240c000
NIP: c0000000003c1e40 LR: c0000000003c1e3c CTR: 0000000001764c5c
REGS: c00000018240f470 TRAP: 0700   Not tainted  (3.13.0-rc7+)
MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI>  CR: 28000024  XER: 00000004
CFAR: c000000000706528 SOFTE: 1
GPR00: c0000000003c1e3c c00000018240f6f0 c0000000010fe1f8 0000000000000035
GPR04: 0000000000000000 0000000000000000 00000000003ae509 0000000000000000
GPR08: 000000000000346f 0000000000000000 0000000000000000 0000000000003fef
GPR12: 0000000028000022 c00000000ec93000 c0000000000c11b0 c000000184ac3e40
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 c0000000009398d8 c00000000101f9c0 c0000001860ae000
GPR28: c000000182ba0000 00000000000001f0 c0000001860ae6f8 c0000001860ae000
NIP [c0000000003c1e40] .pci_disable_device+0xd0/0xf0
LR [c0000000003c1e3c] .pci_disable_device+0xcc/0xf0
Call Trace:
[c0000000003c1e3c] .pci_disable_device+0xcc/0xf0 (unreliable)
[d0000000073881c4] .remove_one+0x174/0x320 [cxgb4]
[c0000000003c57e0] .pci_device_remove+0x60/0x100
[c00000000046396c] .__device_release_driver+0x9c/0x120
[c000000000463a20] .device_release_driver+0x30/0x60
[c0000000003bcdb4] .pci_stop_bus_device+0x94/0xd0
[c0000000003bcf48] .pci_stop_and_remove_bus_device+0x18/0x30
[c00000000003f548] .pcibios_remove_pci_devices+0xa8/0x140
[c000000000035c00] .eeh_handle_normal_event+0xa0/0x3c0
[c000000000035f50] .eeh_handle_event+0x30/0x2b0
[c0000000000362c4] .eeh_event_handler+0xf4/0x1b0
[c0000000000c12b8] .kthread+0x108/0x130
[c00000000000a168] .ret_from_kernel_thread+0x5c/0x74

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dsd pushed a commit that referenced this pull request Jun 10, 2014
…r thp split

After thp split in hwpoison_user_mappings(), we hold page lock on the
raw error page only between try_to_unmap, hence we are in danger of race
condition.

I found in the RHEL7 MCE-relay testing that we have "bad page" error
when a memory error happens on a thp tail page used by qemu-kvm:

  Triggering MCE exception on CPU 10
  mce: [Hardware Error]: Machine check events logged
  MCE exception done on CPU 10
  MCE 0x38c535: Killing qemu-kvm:8418 due to hardware memory corruption
  MCE 0x38c535: dirty LRU page recovery: Recovered
  qemu-kvm[8418]: segfault at 20 ip 00007ffb0f0f229a sp 00007fffd6bc5240 error 4 in qemu-kvm[7ffb0ef14000+420000]
  BUG: Bad page state in process qemu-kvm  pfn:38c400
  page:ffffea000e310000 count:0 mapcount:0 mapping:          (null) index:0x7ffae3c00
  page flags: 0x2fffff0008001d(locked|referenced|uptodate|dirty|swapbacked)
  Modules linked in: hwpoison_inject mce_inject vhost_net macvtap macvlan ...
  CPU: 0 PID: 8418 Comm: qemu-kvm Tainted: G   M        --------------   3.10.0-54.0.1.el7.mce_test_fixed.x86_64 #1
  Hardware name: NEC NEC Express5800/R120b-1 [N8100-1719F]/MS-91E7-001, BIOS 4.6.3C19 02/10/2011
  Call Trace:
    dump_stack+0x19/0x1b
    bad_page.part.59+0xcf/0xe8
    free_pages_prepare+0x148/0x160
    free_hot_cold_page+0x31/0x140
    free_hot_cold_page_list+0x46/0xa0
    release_pages+0x1c1/0x200
    free_pages_and_swap_cache+0xad/0xd0
    tlb_flush_mmu.part.46+0x4c/0x90
    tlb_finish_mmu+0x55/0x60
    exit_mmap+0xcb/0x170
    mmput+0x67/0xf0
    vhost_dev_cleanup+0x231/0x260 [vhost_net]
    vhost_net_release+0x3f/0x90 [vhost_net]
    __fput+0xe9/0x270
    ____fput+0xe/0x10
    task_work_run+0xc4/0xe0
    do_exit+0x2bb/0xa40
    do_group_exit+0x3f/0xa0
    get_signal_to_deliver+0x1d0/0x6e0
    do_signal+0x48/0x5e0
    do_notify_resume+0x71/0xc0
    retint_signal+0x48/0x8c

The reason of this bug is that a page fault happens before unlocking the
head page at the end of memory_failure().  This strange page fault is
trying to access to address 0x20 and I'm not sure why qemu-kvm does
this, but anyway as a result the SIGSEGV makes qemu-kvm exit and on the
way we catch the bad page bug/warning because we try to free a locked
page (which was the former head page.)

To fix this, this patch suggests to shift page lock from head page to
tail page just after thp split.  SIGSEGV still happens, but it affects
only error affected VMs, not a whole system.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>        [3.9+] # a3e0f9e "mm/memory-failure.c: transfer page count from head page to tail page after split thp"
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dsd pushed a commit that referenced this pull request Jun 10, 2014
skb_kill_datagram() does not dequeue the skb when MSG_PEEK is unset.
This leaves a free'd skb on the queue, resulting a double-free later.

Without this, the following oops can occur:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff8154fcf7>] skb_dequeue+0x47/0x70
PGD 0
Oops: 0002 [#1] SMP
Modules linked in: af_rxrpc ...
CPU: 0 PID: 1191 Comm: listen Not tainted 3.12.0+ hardkernel#4
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff8801183536b0 ti: ffff880035c92000 task.ti: ffff880035c92000
RIP: 0010:[<ffffffff8154fcf7>] skb_dequeue+0x47/0x70
RSP: 0018:ffff880035c93db8  EFLAGS: 00010097
RAX: 0000000000000246 RBX: ffff8800d2754b00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffff8800d254c084
RBP: ffff880035c93dd0 R08: ffff880035c93cf0 R09: ffff8800d968f270
R10: 0000000000000000 R11: 0000000000000293 R12: ffff8800d254c070
R13: ffff8800d254c084 R14: ffff8800cd861240 R15: ffff880119b39720
FS:  00007f37a969d740(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 00000000d4413000 CR4: 00000000000006f0
Stack:
 ffff8800d254c000 ffff8800d254c070 ffff8800d254c2c0 ffff880035c93df8
 ffffffffa041a5b8 ffff8800cd844c80 ffffffffa04385a0 ffff8800cd844cb0
 ffff880035c93e18 ffffffff81546cef ffff8800d45fea00 0000000000000008
Call Trace:
 [<ffffffffa041a5b8>] rxrpc_release+0x128/0x2e0 [af_rxrpc]
 [<ffffffff81546cef>] sock_release+0x1f/0x80
 [<ffffffff81546d62>] sock_close+0x12/0x20
 [<ffffffff811aaba1>] __fput+0xe1/0x230
 [<ffffffff811aad3e>] ____fput+0xe/0x10
 [<ffffffff810862cc>] task_work_run+0xbc/0xe0
 [<ffffffff8106a3be>] do_exit+0x2be/0xa10
 [<ffffffff8116dc47>] ? do_munmap+0x297/0x3b0
 [<ffffffff8106ab8f>] do_group_exit+0x3f/0xa0
 [<ffffffff8106ac04>] SyS_exit_group+0x14/0x20
 [<ffffffff8166b069>] system_call_fastpath+0x16/0x1b


Signed-off-by: Tim Smith <tim@electronghost.co.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
dsd pushed a commit that referenced this pull request Jun 10, 2014
When the PR host is running on a POWER8 machine in POWER8 mode, it
will use doorbell interrupts for IPIs.  If one of them arrives while
we are in the guest, we pop out of the guest with trap number 0xA00,
which isn't handled by kvmppc_handle_exit_pr, leading to the following
BUG_ON:

[  331.436215] exit_nr=0xa00 | pc=0x1d2c | msr=0x800000000000d032
[  331.437522] ------------[ cut here ]------------
[  331.438296] kernel BUG at arch/powerpc/kvm/book3s_pr.c:982!
[  331.439063] Oops: Exception in kernel mode, sig: 5 [#2]
[  331.439819] SMP NR_CPUS=1024 NUMA pSeries
[  331.440552] Modules linked in: tun nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw virtio_net kvm binfmt_misc ibmvscsi scsi_transport_srp scsi_tgt virtio_blk
[  331.447614] CPU: 11 PID: 1296 Comm: qemu-system-ppc Tainted: G      D      3.11.7-200.2.fc19.ppc64p7 #1
[  331.448920] task: c0000003bdc8c000 ti: c0000003bd32c000 task.ti: c0000003bd32c000
[  331.450088] NIP: d0000000025d6b9c LR: d0000000025d6b98 CTR: c0000000004cfdd0
[  331.451042] REGS: c0000003bd32f420 TRAP: 0700   Tainted: G      D       (3.11.7-200.2.fc19.ppc64p7)
[  331.452331] MSR: 800000000282b032 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI>  CR: 28004824  XER: 20000000
[  331.454616] SOFTE: 1
[  331.455106] CFAR: c000000000848bb8
[  331.455726]
GPR00: d0000000025d6b98 c0000003bd32f6a0 d0000000026017b8 0000000000000032
GPR04: c0000000018627f8 c000000001873208 320d0a3030303030 3030303030643033
GPR08: c000000000c490a8 0000000000000000 0000000000000000 0000000000000002
GPR12: 0000000028004822 c00000000fdc6300 0000000000000000 00000100076ec310
GPR16: 000000002ae343b8 00003ffffd397398 0000000000000000 0000000000000000
GPR20: 00000100076f16f4 00000100076ebe60 0000000000000008 ffffffffffffffff
GPR24: 0000000000000000 0000008001041e60 0000000000000000 0000008001040ce8
GPR28: c0000003a2d80000 0000000000000a00 0000000000000001 c0000003a2681810
[  331.466504] NIP [d0000000025d6b9c] .kvmppc_handle_exit_pr+0x75c/0xa80 [kvm]
[  331.466999] LR [d0000000025d6b98] .kvmppc_handle_exit_pr+0x758/0xa80 [kvm]
[  331.467517] Call Trace:
[  331.467909] [c0000003bd32f6a0] [d0000000025d6b98] .kvmppc_handle_exit_pr+0x758/0xa80 [kvm] (unreliable)
[  331.468553] [c0000003bd32f750] [d0000000025d98f0] kvm_start_lightweight+0xb4/0xc4 [kvm]
[  331.469189] [c0000003bd32f920] [d0000000025d7648] .kvmppc_vcpu_run_pr+0xd8/0x270 [kvm]
[  331.469838] [c0000003bd32f9c0] [d0000000025cf748] .kvmppc_vcpu_run+0xc8/0xf0 [kvm]
[  331.470790] [c0000003bd32fa50] [d0000000025cc19c] .kvm_arch_vcpu_ioctl_run+0x5c/0x1b0 [kvm]
[  331.471401] [c0000003bd32fae0] [d0000000025c4888] .kvm_vcpu_ioctl+0x478/0x730 [kvm]
[  331.472026] [c0000003bd32fc90] [c00000000026192c] .do_vfs_ioctl+0x4dc/0x7a0
[  331.472561] [c0000003bd32fd80] [c000000000261cc4] .SyS_ioctl+0xd4/0xf0
[  331.473095] [c0000003bd32fe30] [c000000000009ed8] syscall_exit+0x0/0x98
[  331.473633] Instruction dump:
[  331.473766] 4bfff9b4 2b9d0800 419efc18 60000000 60420000 3d220000 e8bf11a0 e8df12a8
[  331.474733] 7fa4eb78 e8698660 48015165 e8410028 <0fe00000> 813f00e4 3ba00000 39290001
[  331.475386] ---[ end trace 49fc47d994c1f8f2 ]---
[  331.479817]

This fixes the problem by making kvmppc_handle_exit_pr() recognize the
interrupt.  We also need to jump to the doorbell interrupt handler in
book3s_segment.S to handle the interrupt on the way out of the guest.
Having done that, there's nothing further to be done in
kvmppc_handle_exit_pr().

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
dsd pushed a commit that referenced this pull request Jun 10, 2014
There is race condition when call netif_napi_add() after
register_netdevice(), as ->open() can be called without napi initialized
and trigger BUG_ON() on napi_enable(), like on below messages:

[    9.699863] sky2: driver version 1.30
[    9.699960] sky2 0000:02:00.0: Yukon-2 EC Ultra chip revision 2
[    9.700020] sky2 0000:02:00.0: irq 45 for MSI/MSI-X
[    9.700498] ------------[ cut here ]------------
[    9.703391] kernel BUG at include/linux/netdevice.h:501!
[    9.703391] invalid opcode: 0000 [#1] PREEMPT SMP
<snip>
[    9.830018] Call Trace:
[    9.830018]  [<fa996169>] sky2_open+0x309/0x360 [sky2]
[    9.830018]  [<c1007210>] ? via_no_dac+0x40/0x40
[    9.830018]  [<c1007210>] ? via_no_dac+0x40/0x40
[    9.830018]  [<c135ed4b>] __dev_open+0x9b/0x120
[    9.830018]  [<c1431cbe>] ? _raw_spin_unlock_bh+0x1e/0x20
[    9.830018]  [<c135efd9>] __dev_change_flags+0x89/0x150
[    9.830018]  [<c135f148>] dev_change_flags+0x18/0x50
[    9.830018]  [<c13bb8e0>] devinet_ioctl+0x5d0/0x6e0
[    9.830018]  [<c13bcced>] inet_ioctl+0x6d/0xa0

To fix the problem patch changes the order of initialization.

Bug report:
https://bugzilla.kernel.org/show_bug.cgi?id=67151

Reported-and-tested-by: ebrahim.azarisooreh@gmail.com
Signed-off-by: Stanislaw Gruszka <stf_xl@wp.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
dsd pushed a commit that referenced this pull request Jun 10, 2014
After the change titled "Btrfs: add support for inode properties", if
btrfs was built-in the kernel (i.e. not as a module), it would cause a
kernel panic, as reported recently by Fengguang:

[    2.024722] BUG: unable to handle kernel NULL pointer dereference at           (null)
[    2.027814] IP: [<ffffffff81501594>] crc32c+0xc/0x6b
[    2.028684] PGD 0
[    2.028684] Oops: 0000 [#1] SMP
[    2.028684] Modules linked in:
[    2.028684] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc7-04795-ga7b57c2 #1
[    2.028684] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    2.028684] task: ffff88000edba100 ti: ffff88000edd6000 task.ti: ffff88000edd6000
[    2.028684] RIP: 0010:[<ffffffff81501594>]  [<ffffffff81501594>] crc32c+0xc/0x6b
[    2.028684] RSP: 0000:ffff88000edd7e58  EFLAGS: 00010246
[    2.028684] RAX: 0000000000000000 RBX: ffffffff82295550 RCX: 0000000000000000
[    2.028684] RDX: 0000000000000011 RSI: ffffffff81efe393 RDI: 00000000fffffffe
[    2.028684] RBP: ffff88000edd7e60 R08: 0000000000000003 R09: 0000000000015d20
[    2.028684] R10: ffffffff81ef225e R11: ffffffff811b0222 R12: ffffffffffffffff
[    2.028684] R13: 0000000000000239 R14: 0000000000000000 R15: 0000000000000000
[    2.028684] FS:  0000000000000000(0000) GS:ffff88000fa00000(0000) knlGS:0000000000000000
[    2.028684] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    2.028684] CR2: 0000000000000000 CR3: 000000000220c000 CR4: 00000000000006f0
[    2.028684] Stack:
[    2.028684]  ffffffff82295550 ffff88000edd7e80 ffffffff8238af62 ffffffff8238ac05
[    2.028684]  0000000000000000 ffff88000edd7e98 ffffffff8238ac0f ffffffff8238ac05
[    2.028684]  ffff88000edd7f08 ffffffff810002ba ffff88000edd7f00 ffffffff810e2404
[    2.028684] Call Trace:
[    2.028684]  [<ffffffff8238af62>] btrfs_props_init+0x4f/0x96
[    2.028684]  [<ffffffff8238ac05>] ? ftrace_define_fields_btrfs_space_reservation+0x145/0x145
[    2.028684]  [<ffffffff8238ac0f>] init_btrfs_fs+0xa/0xf0
[    2.028684]  [<ffffffff8238ac05>] ? ftrace_define_fields_btrfs_space_reservation+0x145/0x145
[    2.028684]  [<ffffffff810002ba>] do_one_initcall+0xa4/0x13a
[    2.028684]  [<ffffffff810e2404>] ? parse_args+0x25f/0x33d
[    2.028684]  [<ffffffff8234cf75>] kernel_init_freeable+0x1aa/0x230
[    2.028684]  [<ffffffff8234c785>] ? do_early_param+0x88/0x88
[    2.028684]  [<ffffffff819f61b5>] ? rest_init+0x89/0x89
[    2.028684]  [<ffffffff819f61c3>] kernel_init+0xe/0x109

The issue here is that the initialization function of btrfs (super.c:init_btrfs_fs)
started using crc32c (from lib/libcrc32c.c). But when it needs to call crc32c (as
part of the properties initialization routine), the libcrc32c is not yet initialized,
so crc32c derreferenced a NULL pointer (lib/libcrc32c.c:tfm), causing the kernel
panic on boot.

The approach to fix this is to use crypto component directly to use its crc32c (which
is basically what lib/libcrc32c.c is, a wrapper around crypto). This is what ext4 is
doing as well, it uses crypto directly to get crc32c functionality.

Verified this works both when btrfs is built-in and when it's loadable kernel module.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
dsd pushed a commit that referenced this pull request Jun 10, 2014
…NULL

In the gen_pool_dma_alloc() the dma pointer can be NULL and while
assigning gen_pool_virt_to_phys(pool, vaddr) to dma caused the following
crash on da850 evm:

   Unable to handle kernel NULL pointer dereference at virtual address 00000000
   Internal error: Oops: 805 [#1] PREEMPT ARM
   Modules linked in:
   CPU: 0 PID: 1 Comm: swapper Tainted: G        W    3.13.0-rc1-00001-g0609e45-dirty hardkernel#5
   task: c4830000 ti: c4832000 task.ti: c4832000
   PC is at gen_pool_dma_alloc+0x30/0x3c
   LR is at gen_pool_virt_to_phys+0x74/0x80
   Process swapper, call trace:
     gen_pool_dma_alloc+0x30/0x3c
     davinci_pm_probe+0x40/0xa8
     platform_drv_probe+0x1c/0x4c
     driver_probe_device+0x98/0x22c
     __driver_attach+0x8c/0x90
     bus_for_each_dev+0x6c/0x8c
     bus_add_driver+0x124/0x1d4
     driver_register+0x78/0xf8
     platform_driver_probe+0x20/0xa4
     davinci_init_late+0xc/0x14
     init_machine_late+0x1c/0x28
     do_one_initcall+0x34/0x15c
     kernel_init_freeable+0xe4/0x1ac
     kernel_init+0x8/0xec

This patch fixes the above.

[akpm@linux-foundation.org: update kerneldoc]
Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Cc: Philipp Zabel <p.zabel@pengutronix.de>
Cc: Nicolin Chen <b42378@freescale.com>
Cc: Joe Perches <joe@perches.com>
Cc: Sachin Kamat <sachin.kamat@linaro.org>
Cc: <stable@vger.kernel.org>	[3.13.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dsd pushed a commit that referenced this pull request Jun 10, 2014
…rm_data

Fix NULL pointer dereference of "chip->pdata" if platform_data was not
supplied to the driver.

The driver during probe stored the pointer to the platform_data:
	chip->pdata = client->dev.platform_data;
Later it was dereferenced in max17040_get_online() and
max17040_get_status().

If platform_data was not supplied, the NULL pointer exception would
happen:

[    6.626094] Unable to handle kernel  of a at virtual address 00000000
[    6.628557] pgd = c0004000
[    6.632868] [00000000] *pgd=66262564
[    6.634636] Unable to handle kernel paging request at virtual address e6262000
[    6.642014] pgd = de468000
[    6.644700] [e6262000] *pgd=00000000
[    6.648265] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[    6.653552] Modules linked in:
[    6.656598] CPU: 0 PID: 31 Comm: kworker/0:1 Not tainted 3.10.14-02717-gc58b4b4 torvalds#505
[    6.664334] Workqueue: events max17040_work
[    6.668488] task: dfa11b80 ti: df9f6000 task.ti: df9f6000
[    6.673873] PC is at show_pte+0x80/0xb8
[    6.677687] LR is at show_pte+0x3c/0xb8
[    6.681503] pc : [<c001b7b8>]    lr : [<c001b774>]    psr: 600f0113
[    6.681503] sp : df9f7d58  ip : 600f0113  fp : 00000009
[    6.692965] r10: 00000000  r9 : 00000000  r8 : dfa11b80
[    6.698171] r7 : df9f7ea0  r6 : e6262000  r5 : 00000000  r4 : 00000000
[    6.704680] r3 : 00000000  r2 : e6262000  r1 : 600f0193  r0 : c05b3750
[    6.711194] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[    6.718485] Control: 10c53c7d  Table: 5e46806a  DAC: 00000015
[    6.724218] Process kworker/0:1 (pid: 31, stack limit = 0xdf9f6238)
[    6.730465] Stack: (0xdf9f7d58 to 0xdf9f8000)
[    6.914325] [<c001b7b8>] (show_pte+0x80/0xb8) from [<c047107c>] (__do_kernel_fault.part.9+0x44/0x74)
[    6.923425] [<c047107c>] (__do_kernel_fault.part.9+0x44/0x74) from [<c001bb7c>] (do_page_fault+0x2c4/0x360)
[    6.933144] [<c001bb7c>] (do_page_fault+0x2c4/0x360) from [<c0008400>] (do_DataAbort+0x34/0x9c)
[    6.941825] [<c0008400>] (do_DataAbort+0x34/0x9c) from [<c000e5d8>] (__dabt_svc+0x38/0x60)
[    6.950058] Exception stack(0xdf9f7ea0 to 0xdf9f7ee8)
[    6.955099] 7ea0: df0c1790 00000000 00000002 00000000 df0c1794 df0c1790 df0c1790 00000042
[    6.963271] 7ec0: df0c1794 00000001 00000000 00000009 00000000 df9f7ee8 c0306268 c0306270
[    6.971419] 7ee0: a00f0113 ffffffff
[    6.974902] [<c000e5d8>] (__dabt_svc+0x38/0x60) from [<c0306270>] (max17040_work+0x8c/0x144)
[    6.983317] [<c0306270>] (max17040_work+0x8c/0x144) from [<c003f364>] (process_one_work+0x138/0x440)
[    6.992429] [<c003f364>] (process_one_work+0x138/0x440) from [<c003fa64>] (worker_thread+0x134/0x3b8)
[    7.001628] [<c003fa64>] (worker_thread+0x134/0x3b8) from [<c00454bc>] (kthread+0xa4/0xb0)
[    7.009875] [<c00454bc>] (kthread+0xa4/0xb0) from [<c000eb28>] (ret_from_fork+0x14/0x2c)
[    7.017943] Code: e1a03005 e2422480 e0826104 e59f002c (e7922104)
[    7.024017] ---[ end trace 73bc7006b9cc5c79 ]---

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Fixes: c6f4a42
Cc: <stable@vger.kernel.org>
dsd pushed a commit that referenced this pull request Jun 10, 2014
With this patch, the conntrack refcount is initially set to zero and
it is bumped once it is added to any of the list, so we fulfill
Eric's golden rule which is that all released objects always have a
refcount that equals zero.

Andrey Vagin reports that nf_conntrack_free can't be called for a
conntrack with non-zero ref-counter, because it can race with
nf_conntrack_find_get().

A conntrack slab is created with SLAB_DESTROY_BY_RCU. Non-zero
ref-counter says that this conntrack is used. So when we release
a conntrack with non-zero counter, we break this assumption.

CPU1                                    CPU2
____nf_conntrack_find()
                                        nf_ct_put()
                                         destroy_conntrack()
                                        ...
                                        init_conntrack
                                         __nf_conntrack_alloc (set use = 1)
atomic_inc_not_zero(&ct->use) (use = 2)
                                         if (!l4proto->new(ct, skb, dataoff, timeouts))
                                          nf_conntrack_free(ct); (use = 2 !!!)
                                        ...
                                        __nf_conntrack_alloc (set use = 1)
 if (!nf_ct_key_equal(h, tuple, zone))
  nf_ct_put(ct); (use = 0)
   destroy_conntrack()
                                        /* continue to work with CT */

After applying the path "[PATCH] netfilter: nf_conntrack: fix RCU
race in nf_conntrack_find_get" another bug was triggered in
destroy_conntrack():

<4>[67096.759334] ------------[ cut here ]------------
<2>[67096.759353] kernel BUG at net/netfilter/nf_conntrack_core.c:211!
...
<4>[67096.759837] Pid: 498649, comm: atdd veid: 666 Tainted: G         C ---------------    2.6.32-042stab084.18 #1 042stab084_18 /DQ45CB
<4>[67096.759932] RIP: 0010:[<ffffffffa03d99ac>]  [<ffffffffa03d99ac>] destroy_conntrack+0x15c/0x190 [nf_conntrack]
<4>[67096.760255] Call Trace:
<4>[67096.760255]  [<ffffffff814844a7>] nf_conntrack_destroy+0x17/0x30
<4>[67096.760255]  [<ffffffffa03d9bb5>] nf_conntrack_find_get+0x85/0x130 [nf_conntrack]
<4>[67096.760255]  [<ffffffffa03d9fb2>] nf_conntrack_in+0x352/0xb60 [nf_conntrack]
<4>[67096.760255]  [<ffffffffa048c771>] ipv4_conntrack_local+0x51/0x60 [nf_conntrack_ipv4]
<4>[67096.760255]  [<ffffffff81484419>] nf_iterate+0x69/0xb0
<4>[67096.760255]  [<ffffffff814b5b00>] ? dst_output+0x0/0x20
<4>[67096.760255]  [<ffffffff814845d4>] nf_hook_slow+0x74/0x110
<4>[67096.760255]  [<ffffffff814b5b00>] ? dst_output+0x0/0x20
<4>[67096.760255]  [<ffffffff814b66d5>] raw_sendmsg+0x775/0x910
<4>[67096.760255]  [<ffffffff8104c5a8>] ? flush_tlb_others_ipi+0x128/0x130
<4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255]  [<ffffffff814c136a>] inet_sendmsg+0x4a/0xb0
<4>[67096.760255]  [<ffffffff81444e93>] ? sock_sendmsg+0x13/0x140
<4>[67096.760255]  [<ffffffff81444f97>] sock_sendmsg+0x117/0x140
<4>[67096.760255]  [<ffffffff8102e299>] ? native_smp_send_reschedule+0x49/0x60
<4>[67096.760255]  [<ffffffff81519beb>] ? _spin_unlock_bh+0x1b/0x20
<4>[67096.760255]  [<ffffffff8109d930>] ? autoremove_wake_function+0x0/0x40
<4>[67096.760255]  [<ffffffff814960f0>] ? do_ip_setsockopt+0x90/0xd80
<4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255]  [<ffffffff814457c9>] sys_sendto+0x139/0x190
<4>[67096.760255]  [<ffffffff810efa77>] ? audit_syscall_entry+0x1d7/0x200
<4>[67096.760255]  [<ffffffff810ef7c5>] ? __audit_syscall_exit+0x265/0x290
<4>[67096.760255]  [<ffffffff81474daf>] compat_sys_socketcall+0x13f/0x210
<4>[67096.760255]  [<ffffffff8104dea3>] ia32_sysret+0x0/0x5

I have reused the original title for the RFC patch that Andrey posted and
most of the original patch description.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Andrew Vagin <avagin@parallels.com>
Cc: Florian Westphal <fw@strlen.de>
Reported-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
dsd pushed a commit that referenced this pull request Jun 10, 2014
Setting an empty security context (length=0) on a file will
lead to incorrectly dereferencing the type and other fields
of the security context structure, yielding a kernel BUG.
As a zero-length security context is never valid, just reject
all such security contexts whether coming from userspace
via setxattr or coming from the filesystem upon a getxattr
request by SELinux.

Setting a security context value (empty or otherwise) unknown to
SELinux in the first place is only possible for a root process
(CAP_MAC_ADMIN), and, if running SELinux in enforcing mode, only
if the corresponding SELinux mac_admin permission is also granted
to the domain by policy.  In Fedora policies, this is only allowed for
specific domains such as livecd for setting down security contexts
that are not defined in the build host policy.

Reproducer:
su
setenforce 0
touch foo
setfattr -n security.selinux foo

Caveat:
Relabeling or removing foo after doing the above may not be possible
without booting with SELinux disabled.  Any subsequent access to foo
after doing the above will also trigger the BUG.

BUG output from Matthew Thode:
[  473.893141] ------------[ cut here ]------------
[  473.962110] kernel BUG at security/selinux/ss/services.c:654!
[  473.995314] invalid opcode: 0000 [hardkernel#6] SMP
[  474.027196] Modules linked in:
[  474.058118] CPU: 0 PID: 8138 Comm: ls Tainted: G      D   I
3.13.0-grsec #1
[  474.116637] Hardware name: Supermicro X8ST3/X8ST3, BIOS 2.0
07/29/10
[  474.149768] task: ffff8805f50cd010 ti: ffff8805f50cd488 task.ti:
ffff8805f50cd488
[  474.183707] RIP: 0010:[<ffffffff814681c7>]  [<ffffffff814681c7>]
context_struct_compute_av+0xce/0x308
[  474.219954] RSP: 0018:ffff8805c0ac3c38  EFLAGS: 00010246
[  474.252253] RAX: 0000000000000000 RBX: ffff8805c0ac3d94 RCX:
0000000000000100
[  474.287018] RDX: ffff8805e8aac000 RSI: 00000000ffffffff RDI:
ffff8805e8aaa000
[  474.321199] RBP: ffff8805c0ac3cb8 R08: 0000000000000010 R09:
0000000000000006
[  474.357446] R10: 0000000000000000 R11: ffff8805c567a000 R12:
0000000000000006
[  474.419191] R13: ffff8805c2b74e88 R14: 00000000000001da R15:
0000000000000000
[  474.453816] FS:  00007f2e75220800(0000) GS:ffff88061fc00000(0000)
knlGS:0000000000000000
[  474.489254] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  474.522215] CR2: 00007f2e74716090 CR3: 00000005c085e000 CR4:
00000000000207f0
[  474.556058] Stack:
[  474.584325]  ffff8805c0ac3c98 ffffffff811b549b ffff8805c0ac3c98
ffff8805f1190a40
[  474.618913]  ffff8805a6202f08 ffff8805c2b74e88 00068800d0464990
ffff8805e8aac860
[  474.653955]  ffff8805c0ac3cb8 000700068113833a ffff880606c75060
ffff8805c0ac3d94
[  474.690461] Call Trace:
[  474.723779]  [<ffffffff811b549b>] ? lookup_fast+0x1cd/0x22a
[  474.778049]  [<ffffffff81468824>] security_compute_av+0xf4/0x20b
[  474.811398]  [<ffffffff8196f419>] avc_compute_av+0x2a/0x179
[  474.843813]  [<ffffffff8145727b>] avc_has_perm+0x45/0xf4
[  474.875694]  [<ffffffff81457d0e>] inode_has_perm+0x2a/0x31
[  474.907370]  [<ffffffff81457e76>] selinux_inode_getattr+0x3c/0x3e
[  474.938726]  [<ffffffff81455cf6>] security_inode_getattr+0x1b/0x22
[  474.970036]  [<ffffffff811b057d>] vfs_getattr+0x19/0x2d
[  475.000618]  [<ffffffff811b05e5>] vfs_fstatat+0x54/0x91
[  475.030402]  [<ffffffff811b063b>] vfs_lstat+0x19/0x1b
[  475.061097]  [<ffffffff811b077e>] SyS_newlstat+0x15/0x30
[  475.094595]  [<ffffffff8113c5c1>] ? __audit_syscall_entry+0xa1/0xc3
[  475.148405]  [<ffffffff8197791e>] system_call_fastpath+0x16/0x1b
[  475.179201] Code: 00 48 85 c0 48 89 45 b8 75 02 0f 0b 48 8b 45 a0 48
8b 3d 45 d0 b6 00 8b 40 08 89 c6 ff ce e8 d1 b0 06 00 48 85 c0 49 89 c7
75 02 <0f> 0b 48 8b 45 b8 4c 8b 28 eb 1e 49 8d 7d 08 be 80 01 00 00 e8
[  475.255884] RIP  [<ffffffff814681c7>]
context_struct_compute_av+0xce/0x308
[  475.296120]  RSP <ffff8805c0ac3c38>
[  475.328734] ---[ end trace f076482e9d754adc ]---

Reported-by:  Matthew Thode <mthode@mthode.org>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
dsd pushed a commit that referenced this pull request Jul 7, 2014
We need to delete un-finished td from current request's td list
at ep_dequeue API, otherwise, this non-user td will be remained
at td list before this request is freed. So if we do ep_queue->
ep_dequeue->ep_queue sequence, when the complete interrupt for
the second ep_queue comes, we search td list for this request,
the first td (added by the first ep_queue) will be handled, and
its status is still active, so we will consider the this transfer
still not be completed, but in fact, it has completed. It causes
the peripheral side considers it never receives current data for
this transfer.

We met this problem when do "Error Recovery Test - Device Configured"
test item for USBCV2 MSC test, the host has never received ACK for
the IN token for CSW due to peripheral considers it does not get this
CBW, the USBCV test log like belows:

--------------------------------------------------------------------------
INFO
Issuing BOT MSC Reset, reset should always succeed
INFO
Retrieving status on CBW endpoint
INFO
CBW endpoint status = 0x0
INFO
Retrieving status on CSW endpoint
INFO
CSW endpoint status = 0x0
INFO
Issuing required command (Test Unit Ready) to verify device has recovered
INFO
Issuing CBW (attempt #1):
INFO
|----- CBW LUN                  = 0x0
INFO
|----- CBW Flags                = 0x0
INFO
|----- CBW Data Transfer Length = 0x0
INFO
|----- CBW CDB Length           = 0x6
INFO
|----- CBW CDB-00 = 0x0
INFO
|----- CBW CDB-01 = 0x0
INFO
|----- CBW CDB-02 = 0x0
INFO
|----- CBW CDB-03 = 0x0
INFO
|----- CBW CDB-04 = 0x0
INFO
|----- CBW CDB-05 = 0x0
INFO
Issuing CSW : try 1
INFO
CSW Bulk Request timed out!
ERROR
Failed CSW phase : should have been success or stall
FAIL
(5.3.4) The CSW status value must be 0x00, 0x01, or 0x02.
ERROR
BOTCommonMSCRequest failed:  error=80004000

Cc: Andrzej Pietrasiewicz <andrzej.p@samsung.com>
Cc: stable@vger.kernel.org
Signed-off-by: Peter Chen <peter.chen@freescale.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
dsd pushed a commit that referenced this pull request Jul 7, 2014
d911d98 ("kernfs: make kernfs_notify() trigger inotify events
too") added fsnotify triggering to kernfs_notify() which requires a
sleepable context.  There are already existing users of
kernfs_notify() which invoke it from an atomic context and in general
it's silly to require a sleepable context for triggering a
notification.

The following is an invalid context bug triggerd by md invoking
sysfs_notify() from IO completion path.

 BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
 in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
 2 locks held by swapper/1/0:
  #0:  (&(&vblk->vq_lock)->rlock){-.-...}, at: [<ffffffffa0039042>] virtblk_done+0x42/0xe0 [virtio_blk]
  #1:  (&(&bitmap->counts.lock)->rlock){-.....}, at: [<ffffffff81633718>] bitmap_endwrite+0x68/0x240
 irq event stamp: 33518
 hardirqs last  enabled at (33515): [<ffffffff8102544f>] default_idle+0x1f/0x230
 hardirqs last disabled at (33516): [<ffffffff818122ed>] common_interrupt+0x6d/0x72
 softirqs last  enabled at (33518): [<ffffffff810a1272>] _local_bh_enable+0x22/0x50
 softirqs last disabled at (33517): [<ffffffff810a29e0>] irq_enter+0x60/0x80
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.16.0-0.rc2.git2.1.fc21.x86_64 #1
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  0000000000000000 f90db13964f4ee05 ffff88007d403b80 ffffffff81807b4c
  0000000000000000 ffff88007d403ba8 ffffffff810d4f14 0000000000000000
  0000000000441800 ffff880078fa1780 ffff88007d403c38 ffffffff8180caf2
 Call Trace:
  <IRQ>  [<ffffffff81807b4c>] dump_stack+0x4d/0x66
  [<ffffffff810d4f14>] __might_sleep+0x184/0x240
  [<ffffffff8180caf2>] mutex_lock_nested+0x42/0x440
  [<ffffffff812d76a0>] kernfs_notify+0x90/0x150
  [<ffffffff8163377c>] bitmap_endwrite+0xcc/0x240
  [<ffffffffa00de863>] close_write+0x93/0xb0 [raid1]
  [<ffffffffa00df029>] r1_bio_write_done+0x29/0x50 [raid1]
  [<ffffffffa00e0474>] raid1_end_write_request+0xe4/0x260 [raid1]
  [<ffffffff813acb8b>] bio_endio+0x6b/0xa0
  [<ffffffff813b46c4>] blk_update_request+0x94/0x420
  [<ffffffff813bf0ea>] blk_mq_end_io+0x1a/0x70
  [<ffffffffa00392c2>] virtblk_request_done+0x32/0x80 [virtio_blk]
  [<ffffffff813c0648>] __blk_mq_complete_request+0x88/0x120
  [<ffffffff813c070a>] blk_mq_complete_request+0x2a/0x30
  [<ffffffffa0039066>] virtblk_done+0x66/0xe0 [virtio_blk]
  [<ffffffffa002535a>] vring_interrupt+0x3a/0xa0 [virtio_ring]
  [<ffffffff81116177>] handle_irq_event_percpu+0x77/0x340
  [<ffffffff8111647d>] handle_irq_event+0x3d/0x60
  [<ffffffff81119436>] handle_edge_irq+0x66/0x130
  [<ffffffff8101c3e4>] handle_irq+0x84/0x150
  [<ffffffff818146ad>] do_IRQ+0x4d/0xe0
  [<ffffffff818122f2>] common_interrupt+0x72/0x72
  <EOI>  [<ffffffff8105f706>] ? native_safe_halt+0x6/0x10
  [<ffffffff81025454>] default_idle+0x24/0x230
  [<ffffffff81025f9f>] arch_cpu_idle+0xf/0x20
  [<ffffffff810f5adc>] cpu_startup_entry+0x37c/0x7b0
  [<ffffffff8104df1b>] start_secondary+0x25b/0x300

This patch fixes it by punting the notification delivery through a
work item.  This ends up adding an extra pointer to kernfs_elem_attr
enlarging kernfs_node by a pointer, which is not ideal but not a very
big deal either.  If this turns out to be an actual issue, we can move
kernfs_elem_attr->size to kernfs_node->iattr later.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
dsd pushed a commit that referenced this pull request Jul 7, 2014
Often when starting a transaction we commit the currently running transaction,
which can end up writing block group caches when the current process has its
journal_info set to NULL (and not to a transaction). This makes our assertion
at btrfs_check_data_free_space() (current_journal != NULL) fail, resulting
in a crash/hang. Therefore fix it by setting journal_info.

Two different traces of this issue follow below.

1)

    [51502.241936] BTRFS: assertion failed: current->journal_info, file: fs/btrfs/extent-tree.c, line: 3670
    [51502.242213] ------------[ cut here ]------------
    [51502.242493] kernel BUG at fs/btrfs/ctree.h:3964!
    [51502.242669] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
    (...)
    [51502.244010] Call Trace:
    [51502.244010]  [<ffffffffa02bc025>] btrfs_check_data_free_space+0x395/0x3a0 [btrfs]
    [51502.244010]  [<ffffffffa02c3bdc>] btrfs_write_dirty_block_groups+0x4ac/0x640 [btrfs]
    [51502.244010]  [<ffffffffa0357a6a>] commit_cowonly_roots+0x164/0x226 [btrfs]
    [51502.244010]  [<ffffffffa02d53cd>] btrfs_commit_transaction+0x4ed/0xab0 [btrfs]
    [51502.244010]  [<ffffffff8168ec7b>] ? _raw_spin_unlock+0x2b/0x40
    [51502.244010]  [<ffffffffa02d6259>] start_transaction+0x459/0x620 [btrfs]
    [51502.244010]  [<ffffffffa02d67ab>] btrfs_start_transaction+0x1b/0x20 [btrfs]
    [51502.244010]  [<ffffffffa02d73e1>] __unlink_start_trans+0x31/0xe0 [btrfs]
    [51502.244010]  [<ffffffffa02dea67>] btrfs_unlink+0x37/0xc0 [btrfs]
    [51502.244010]  [<ffffffff811bb054>] ? do_unlinkat+0x114/0x2a0
    [51502.244010]  [<ffffffff811baebc>] vfs_unlink+0xcc/0x150
    [51502.244010]  [<ffffffff811bb1a0>] do_unlinkat+0x260/0x2a0
    [51502.244010]  [<ffffffff811a9ef4>] ? filp_close+0x64/0x90
    [51502.244010]  [<ffffffff810aaea6>] ? trace_hardirqs_on_caller+0x16/0x1e0
    [51502.244010]  [<ffffffff81349cab>] ? trace_hardirqs_on_thunk+0x3a/0x3f
    [51502.244010]  [<ffffffff811be9eb>] SyS_unlinkat+0x1b/0x40
    [51502.244010]  [<ffffffff81698452>] system_call_fastpath+0x16/0x1b
    [51502.244010] Code: 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 89 f1 48 c7 c2 71 13 36 a0 48 89 fe 31 c0 48 c7 c7 b8 43 36 a0 48 89 e5 e8 5d b0 32 e1 <0f> 0b 0f 1f 44 00 00 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5
    [51502.244010] RIP  [<ffffffffa03575da>] assfail.constprop.88+0x1e/0x20 [btrfs]

2)

    [25405.097230] BTRFS: assertion failed: current->journal_info, file: fs/btrfs/extent-tree.c, line: 3670
    [25405.097488] ------------[ cut here ]------------
    [25405.097767] kernel BUG at fs/btrfs/ctree.h:3964!
    [25405.097940] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
    (...)
    [25405.100008] Call Trace:
    [25405.100008]  [<ffffffffa02bc025>] btrfs_check_data_free_space+0x395/0x3a0 [btrfs]
    [25405.100008]  [<ffffffffa02c3bdc>] btrfs_write_dirty_block_groups+0x4ac/0x640 [btrfs]
    [25405.100008]  [<ffffffffa035755a>] commit_cowonly_roots+0x164/0x226 [btrfs]
    [25405.100008]  [<ffffffffa02d53cd>] btrfs_commit_transaction+0x4ed/0xab0 [btrfs]
    [25405.100008]  [<ffffffff8109c170>] ? bit_waitqueue+0xc0/0xc0
    [25405.100008]  [<ffffffffa02d6259>] start_transaction+0x459/0x620 [btrfs]
    [25405.100008]  [<ffffffffa02d67ab>] btrfs_start_transaction+0x1b/0x20 [btrfs]
    [25405.100008]  [<ffffffffa02e3407>] btrfs_create+0x47/0x210 [btrfs]
    [25405.100008]  [<ffffffffa02d74cc>] ? btrfs_permission+0x3c/0x80 [btrfs]
    [25405.100008]  [<ffffffff811bc63b>] vfs_create+0x9b/0x130
    [25405.100008]  [<ffffffff811bcf19>] do_last+0x849/0xe20
    [25405.100008]  [<ffffffff811b9409>] ? link_path_walk+0x79/0x820
    [25405.100008]  [<ffffffff811bd5b5>] path_openat+0xc5/0x690
    [25405.100008]  [<ffffffff810ab07d>] ? trace_hardirqs_on+0xd/0x10
    [25405.100008]  [<ffffffff811cdcd2>] ? __alloc_fd+0x32/0x1d0
    [25405.100008]  [<ffffffff811be2a3>] do_filp_open+0x43/0xa0
    [25405.100008]  [<ffffffff811cddf1>] ? __alloc_fd+0x151/0x1d0
    [25405.100008]  [<ffffffff811abcfc>] do_sys_open+0x13c/0x230
    [25405.100008]  [<ffffffff810aaea6>] ? trace_hardirqs_on_caller+0x16/0x1e0
    [25405.100008]  [<ffffffff811abe12>] SyS_open+0x22/0x30
    [25405.100008]  [<ffffffff81698452>] system_call_fastpath+0x16/0x1b
    [25405.100008] Code: 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 89 f1 48 c7 c2 51 13 36 a0 48 89 fe 31 c0 48 c7 c7 d0 43 36 a0 48 89 e5 e8 6d b5 32 e1 <0f> 0b 0f 1f 44 00 00 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5
    [25405.100008] RIP  [<ffffffffa03570ca>] assfail.constprop.88+0x1e/0x20 [btrfs]

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
dsd pushed a commit that referenced this pull request Jul 7, 2014
With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE,
the following is triggered at early boot:

  SMP: Total of 8 processors activated.
  devtmpfs: initialized
  Unable to handle kernel NULL pointer dereference at virtual address 00000008
  pgd = fffffe0000050000
  [00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
  Internal error: Oops: 96000006 [#1] SMP
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ hardkernel#44
  task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
  PC is at __list_add+0x10/0xd4
  LR is at free_one_page+0x270/0x638
  ...
  Call trace:
    __list_add+0x10/0xd4
    free_one_page+0x26c/0x638
    __free_pages_ok.part.52+0x84/0xbc
    __free_pages+0x74/0xbc
    init_cma_reserved_pageblock+0xe8/0x104
    cma_init_reserved_areas+0x190/0x1e4
    do_one_initcall+0xc4/0x154
    kernel_init_freeable+0x204/0x2a8
    kernel_init+0xc/0xd4

This happens because init_cma_reserved_pageblock() calls
__free_one_page() with pageblock_order as page order but it is bigger
than MAX_ORDER.  This in turn causes accesses past zone->free_list[].

Fix the problem by changing init_cma_reserved_pageblock() such that it
splits pageblock into individual MAX_ORDER pages if pageblock is bigger
than a MAX_ORDER page.

In cases where !CONFIG_HUGETLB_PAGE_SIZE_VARIABLE, which is all
architectures expect for ia64, powerpc and tile at the moment, the
�pageblock_order > MAX_ORDER� condition will be optimised out since both
sides of the operator are constants.  In cases where pageblock size is
variable, the performance degradation should not be significant anyway
since init_cma_reserved_pageblock() is called only at boot time at most
MAX_CMA_AREAS times which by default is eight.

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Reported-by: Mark Salter <msalter@redhat.com>
Tested-by: Mark Salter <msalter@redhat.com>
Tested-by: Christopher Covington <cov@codeaurora.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: <stable@vger.kernel.org>	[3.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dsd pushed a commit that referenced this pull request Jul 7, 2014
There are a couple of seq_files which use the single_open() interface.
This interface requires that the whole output must fit into a single
buffer.

E.g.  for /proc/stat allocation failures have been observed because an
order-4 memory allocation failed due to memory fragmentation.  In such
situations reading /proc/stat is not possible anymore.

Therefore change the seq_file code to fallback to vmalloc allocations
which will usually result in a couple of order-0 allocations and hence
also work if memory is fragmented.

For reference a call trace where reading from /proc/stat failed:

  sadc: page allocation failure: order:4, mode:0x1040d0
  CPU: 1 PID: 192063 Comm: sadc Not tainted 3.10.0-123.el7.s390x #1
  [...]
  Call Trace:
    show_stack+0x6c/0xe8
    warn_alloc_failed+0xd6/0x138
    __alloc_pages_nodemask+0x9da/0xb68
    __get_free_pages+0x2e/0x58
    kmalloc_order_trace+0x44/0xc0
    stat_open+0x5a/0xd8
    proc_reg_open+0x8a/0x140
    do_dentry_open+0x1bc/0x2c8
    finish_open+0x46/0x60
    do_last+0x382/0x10d0
    path_openat+0xc8/0x4f8
    do_filp_open+0x46/0xa8
    do_sys_open+0x114/0x1f0
    sysc_tracego+0x14/0x1a

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: David Rientjes <rientjes@google.com>
Cc: Ian Kent <raven@themaw.net>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Thorsten Diehl <thorsten.diehl@de.ibm.com>
Cc: Andrea Righi <andrea@betterlinux.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dsd pushed a commit that referenced this pull request Jul 15, 2014
Writing to either "cpuset.cpus" or "cpuset.mems" file flushes
cpuset_hotplug_work so that cpu or memory hotunplug doesn't end up
migrating tasks off a cpuset after new resources are added to it.

As cpuset_hotplug_work calls into cgroup core via
cgroup_transfer_tasks(), this flushing adds the dependency to cgroup
core locking from cpuset_write_resmak().  This used to be okay because
cgroup interface files were protected by a different mutex; however,
8353da1 ("cgroup: remove cgroup_tree_mutex") simplified the
cgroup core locking and this dependency became a deadlock hazard -
cgroup file removal performed under cgroup core lock tries to drain
on-going file operation which is trying to flush cpuset_hotplug_work
blocked on the same cgroup core lock.

The locking simplification was done because kernfs added an a lot
easier way to deal with circular dependencies involving kernfs active
protection.  Let's use the same strategy in cpuset and break active
protection in cpuset_write_resmask().  While it isn't the prettiest,
this is a very rare, likely unique, situation which also goes away on
the unified hierarchy.

The commands to trigger the deadlock warning without the patch and the
lockdep output follow.

 localhost:/ # mount -t cgroup -o cpuset xxx /cpuset
 localhost:/ # mkdir /cpuset/tmp
 localhost:/ # echo 1 > /cpuset/tmp/cpuset.cpus
 localhost:/ # echo 0 > cpuset/tmp/cpuset.mems
 localhost:/ # echo $$ > /cpuset/tmp/tasks
 localhost:/ # echo 0 > /sys/devices/system/cpu/cpu1/online

  ======================================================
  [ INFO: possible circular locking dependency detected ]
  3.16.0-rc1-0.1-default+ hardkernel#7 Not tainted
  -------------------------------------------------------
  kworker/1:0/32649 is trying to acquire lock:
   (cgroup_mutex){+.+.+.}, at: [<ffffffff8110e3d7>] cgroup_transfer_tasks+0x37/0x150

  but task is already holding lock:
   (cpuset_hotplug_work){+.+...}, at: [<ffffffff81085412>] process_one_work+0x192/0x520

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #2 (cpuset_hotplug_work){+.+...}:
  ...
  -> #1 (s_active#175){++++.+}:
  ...
  -> #0 (cgroup_mutex){+.+.+.}:
  ...

  other info that might help us debug this:

  Chain exists of:
    cgroup_mutex --> s_active#175 --> cpuset_hotplug_work

   Possible unsafe locking scenario:

	 CPU0                    CPU1
	 ----                    ----
    lock(cpuset_hotplug_work);
				 lock(s_active#175);
				 lock(cpuset_hotplug_work);
    lock(cgroup_mutex);

   *** DEADLOCK ***

  2 locks held by kworker/1:0/32649:
   #0:  ("events"){.+.+.+}, at: [<ffffffff81085412>] process_one_work+0x192/0x520
   #1:  (cpuset_hotplug_work){+.+...}, at: [<ffffffff81085412>] process_one_work+0x192/0x520

  stack backtrace:
  CPU: 1 PID: 32649 Comm: kworker/1:0 Not tainted 3.16.0-rc1-0.1-default+ hardkernel#7
 ...
  Call Trace:
   [<ffffffff815a5f78>] dump_stack+0x72/0x8a
   [<ffffffff810c263f>] print_circular_bug+0x10f/0x120
   [<ffffffff810c481e>] check_prev_add+0x43e/0x4b0
   [<ffffffff810c4ee6>] validate_chain+0x656/0x7c0
   [<ffffffff810c53d2>] __lock_acquire+0x382/0x660
   [<ffffffff810c57a9>] lock_acquire+0xf9/0x170
   [<ffffffff815aa13f>] mutex_lock_nested+0x6f/0x380
   [<ffffffff8110e3d7>] cgroup_transfer_tasks+0x37/0x150
   [<ffffffff811129c0>] hotplug_update_tasks_insane+0x110/0x1d0
   [<ffffffff81112bbd>] cpuset_hotplug_update_tasks+0x13d/0x180
   [<ffffffff811148ec>] cpuset_hotplug_workfn+0x18c/0x630
   [<ffffffff810854d4>] process_one_work+0x254/0x520
   [<ffffffff810875dd>] worker_thread+0x13d/0x3d0
   [<ffffffff8108e0c8>] kthread+0xf8/0x100
   [<ffffffff815acaec>] ret_from_fork+0x7c/0xb0

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Li Zefan <lizefan@huawei.com>
Tested-by: Li Zefan <lizefan@huawei.com>
dsd pushed a commit that referenced this pull request Jul 15, 2014
After unbinding the driver memory was corrupted by double free of
clk_lookup structure. This lead to OOPS when re-binding the driver
again.

The driver allocated memory for 'clk_lookup' with devm_kzalloc. During
driver removal this memory was freed twice: once by clkdev_drop() and
second by devm code.

Kernel panic log:
[   30.839284] Unable to handle kernel paging request at virtual address 5f343173
[   30.846476] pgd = dee14000
[   30.849165] [5f343173] *pgd=00000000
[   30.852703] Internal error: Oops: 805 [#1] PREEMPT SMP ARM
[   30.858166] Modules linked in:
[   30.861208] CPU: 0 PID: 1 Comm: bash Not tainted 3.16.0-rc2-00239-g94bdf617b07e-dirty hardkernel#40
[   30.869364] task: df478000 ti: df480000 task.ti: df480000
[   30.874752] PC is at clkdev_add+0x2c/0x38
[   30.878738] LR is at clkdev_add+0x18/0x38
[   30.882732] pc : [<c0350908>]    lr : [<c03508f4>]    psr: 60000013
[   30.882732] sp : df481e78  ip : 00000001  fp : c0700ed8
[   30.894187] r10: 0000000c  r9 : 00000000  r8 : c07b0e3c
[   30.899396] r7 : 00000002  r6 : df45f9d0  r5 : df421390  r4 : c0700d6c
[   30.905906] r3 : 5f343173  r2 : c0700d84  r1 : 60000013  r0 : c0700d6c
[   30.912417] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[   30.919534] Control: 10c53c7d  Table: 5ee1406a  DAC: 00000015
[   30.925262] Process bash (pid: 1, stack limit = 0xdf480240)
[   30.930817] Stack: (0xdf481e78 to 0xdf482000)
[   30.935159] 1e60:                                                       00001000 df6de610
[   30.943321] 1e80: df7f4558 c0355650 c05ec6ec c0700eb0 df6de600 df7f4510 dec9d69c 00000014
[   30.951480] 1ea0: 00167b48 df6de610 c0700e30 c0713518 00000000 c0700e30 dec9d69c 00000006
[   30.959639] 1ec0: 00167b48 c02c1b7c c02c1b64 df6de610 c07aff48 c02c0420 c06fb150 c047cc20
[   30.967798] 1ee0: df6de610 df6de610 c0700e30 df6de644 c06fb150 0000000c dec9d690 c02bef90
[   30.975957] 1f00: dec9c6c0 dece4c00 df481f80 dece4c00 0000000c c02be73c 0000000c c016ca8c
[   30.984116] 1f20: c016ca48 00000000 00000000 c016c1f4 00000000 00000000 b6f18000 df481f80
[   30.992276] 1f40: df7f66c0 0000000c df480000 df480000 b6f18000 c011094c df47839c 60000013
[   31.000435] 1f60: 00000000 00000000 df7f66c0 df7f66c0 0000000c df480000 b6f18000 c0110dd4
[   31.008594] 1f80: 00000000 00000000 0000000c b6ec05d8 0000000c b6f18000 00000004 c000f2a8
[   31.016753] 1fa0: 00001000 c000f0e0 b6ec05d8 0000000c 00000001 b6f18000 0000000c 00000000
[   31.024912] 1fc0: b6ec05d8 0000000c b6f18000 00000004 0000000c 00000001 00000000 00167b48
[   31.033071] 1fe0: 00000000 bed83a80 b6e004f0 b6e5122c 60000010 00000001 ffffffff ffffffff
[   31.041248] [<c0350908>] (clkdev_add) from [<c0355650>] (s2mps11_clk_probe+0x2b4/0x3b4)
[   31.049223] [<c0355650>] (s2mps11_clk_probe) from [<c02c1b7c>] (platform_drv_probe+0x18/0x48)
[   31.057728] [<c02c1b7c>] (platform_drv_probe) from [<c02c0420>] (driver_probe_device+0x13c/0x384)
[   31.066579] [<c02c0420>] (driver_probe_device) from [<c02bef90>] (bind_store+0x88/0xd8)
[   31.074564] [<c02bef90>] (bind_store) from [<c02be73c>] (drv_attr_store+0x20/0x2c)
[   31.082118] [<c02be73c>] (drv_attr_store) from [<c016ca8c>] (sysfs_kf_write+0x44/0x48)
[   31.090016] [<c016ca8c>] (sysfs_kf_write) from [<c016c1f4>] (kernfs_fop_write+0xc0/0x17c)
[   31.098176] [<c016c1f4>] (kernfs_fop_write) from [<c011094c>] (vfs_write+0xa0/0x1c4)
[   31.105899] [<c011094c>] (vfs_write) from [<c0110dd4>] (SyS_write+0x40/0x8c)
[   31.112931] [<c0110dd4>] (SyS_write) from [<c000f0e0>] (ret_fast_syscall+0x0/0x3c)
[   31.120481] Code: e2842018 e584501c e1a00004 e885000c (e5835000)
[   31.126596] ---[ end trace efad45bfa3a61b05 ]---
[   31.131181] Kernel panic - not syncing: Fatal exception
[   31.136368] CPU1: stopping
[   31.139054] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D       3.16.0-rc2-00239-g94bdf617b07e-dirty hardkernel#40
[   31.148697] [<c0016480>] (unwind_backtrace) from [<c0012950>] (show_stack+0x10/0x14)
[   31.156419] [<c0012950>] (show_stack) from [<c0480db8>] (dump_stack+0x80/0xcc)
[   31.163622] [<c0480db8>] (dump_stack) from [<c001499c>] (handle_IPI+0x130/0x15c)
[   31.170998] [<c001499c>] (handle_IPI) from [<c000862c>] (gic_handle_irq+0x60/0x68)
[   31.178549] [<c000862c>] (gic_handle_irq) from [<c0013480>] (__irq_svc+0x40/0x70)
[   31.186009] Exception stack(0xdf4bdf88 to 0xdf4bdfd0)
[   31.191046] df80:                   ffffffed 00000000 00000000 00000000 df4bc000 c06d042c
[   31.199207] dfa0: 00000000 ffffffed c06d03c0 00000000 c070c288 00000000 00000000 df4bdfd0
[   31.207363] dfc0: c0010324 c0010328 60000013 ffffffff
[   31.212402] [<c0013480>] (__irq_svc) from [<c0010328>] (arch_cpu_idle+0x28/0x30)
[   31.219783] [<c0010328>] (arch_cpu_idle) from [<c005f150>] (cpu_startup_entry+0x2c4/0x3f0)
[   31.228027] [<c005f150>] (cpu_startup_entry) from [<400086c4>] (0x400086c4)
[   31.234968] ---[ end Kernel panic - not syncing: Fatal exception

Fixes: 7cc560d ("clk: s2mps11: Add support for s2mps11")
Cc: <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Reviewed-by: Yadwinder Singh Brar <yadi.brar@samsung.com>
Signed-off-by: Mike Turquette <mturquette@linaro.org>
dsd pushed a commit that referenced this pull request Jul 15, 2014
When hot-adding and onlining CPU, kernel panic occurs, showing following
call trace.

  BUG: unable to handle kernel paging request at 0000000000001d08
  IP: [<ffffffff8114acfd>] __alloc_pages_nodemask+0x9d/0xb10
  PGD 0
  Oops: 0000 [#1] SMP
  ...
  Call Trace:
   [<ffffffff812b8745>] ? cpumask_next_and+0x35/0x50
   [<ffffffff810a3283>] ? find_busiest_group+0x113/0x8f0
   [<ffffffff81193bc9>] ? deactivate_slab+0x349/0x3c0
   [<ffffffff811926f1>] new_slab+0x91/0x300
   [<ffffffff815de95a>] __slab_alloc+0x2bb/0x482
   [<ffffffff8105bc1c>] ? copy_process.part.25+0xfc/0x14c0
   [<ffffffff810a3c78>] ? load_balance+0x218/0x890
   [<ffffffff8101a679>] ? sched_clock+0x9/0x10
   [<ffffffff81105ba9>] ? trace_clock_local+0x9/0x10
   [<ffffffff81193d1c>] kmem_cache_alloc_node+0x8c/0x200
   [<ffffffff8105bc1c>] copy_process.part.25+0xfc/0x14c0
   [<ffffffff81114d0d>] ? trace_buffer_unlock_commit+0x4d/0x60
   [<ffffffff81085a80>] ? kthread_create_on_node+0x140/0x140
   [<ffffffff8105d0ec>] do_fork+0xbc/0x360
   [<ffffffff8105d3b6>] kernel_thread+0x26/0x30
   [<ffffffff81086652>] kthreadd+0x2c2/0x300
   [<ffffffff81086390>] ? kthread_create_on_cpu+0x60/0x60
   [<ffffffff815f20ec>] ret_from_fork+0x7c/0xb0
   [<ffffffff81086390>] ? kthread_create_on_cpu+0x60/0x60

In my investigation, I found the root cause is wq_numa_possible_cpumask.
All entries of wq_numa_possible_cpumask is allocated by
alloc_cpumask_var_node(). And these entries are used without initializing.
So these entries have wrong value.

When hot-adding and onlining CPU, wq_update_unbound_numa() is called.
wq_update_unbound_numa() calls alloc_unbound_pwq(). And alloc_unbound_pwq()
calls get_unbound_pool(). In get_unbound_pool(), worker_pool->node is set
as follow:

3592         /* if cpumask is contained inside a NUMA node, we belong to that node */
3593         if (wq_numa_enabled) {
3594                 for_each_node(node) {
3595                         if (cpumask_subset(pool->attrs->cpumask,
3596                                            wq_numa_possible_cpumask[node])) {
3597                                 pool->node = node;
3598                                 break;
3599                         }
3600                 }
3601         }

But wq_numa_possible_cpumask[node] does not have correct cpumask. So, wrong
node is selected. As a result, kernel panic occurs.

By this patch, all entries of wq_numa_possible_cpumask are allocated by
zalloc_cpumask_var_node to initialize them. And the panic disappeared.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Fixes: bce9038 ("workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[]")
dsd pushed a commit that referenced this pull request Jul 15, 2014
This fixes the following lockdep complaint:

[ INFO: possible circular locking dependency detected ]
3.16.0-rc2-mm1+ hardkernel#7 Tainted: G           O  
-------------------------------------------------------
kworker/u24:0/4356 is trying to acquire lock:
 (&(&sbi->s_es_lru_lock)->rlock){+.+.-.}, at: [<ffffffff81285fff>] __ext4_es_shrink+0x4f/0x2e0

but task is already holding lock:
 (&ei->i_es_lock){++++-.}, at: [<ffffffff81286961>] ext4_es_insert_extent+0x71/0x180

which lock already depends on the new lock.

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&ei->i_es_lock);
                               lock(&(&sbi->s_es_lru_lock)->rlock);
                               lock(&ei->i_es_lock);
  lock(&(&sbi->s_es_lru_lock)->rlock);

 *** DEADLOCK ***

6 locks held by kworker/u24:0/4356:
 #0:  ("writeback"){.+.+.+}, at: [<ffffffff81071d00>] process_one_work+0x180/0x560
 #1:  ((&(&wb->dwork)->work)){+.+.+.}, at: [<ffffffff81071d00>] process_one_work+0x180/0x560
 #2:  (&type->s_umount_key#22){++++++}, at: [<ffffffff811a9c74>] grab_super_passive+0x44/0x90
 #3:  (jbd2_handle){+.+...}, at: [<ffffffff812979f9>] start_this_handle+0x189/0x5f0
 hardkernel#4:  (&ei->i_data_sem){++++..}, at: [<ffffffff81247062>] ext4_map_blocks+0x132/0x550
 hardkernel#5:  (&ei->i_es_lock){++++-.}, at: [<ffffffff81286961>] ext4_es_insert_extent+0x71/0x180

stack backtrace:
CPU: 0 PID: 4356 Comm: kworker/u24:0 Tainted: G           O   3.16.0-rc2-mm1+ hardkernel#7
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Workqueue: writeback bdi_writeback_workfn (flush-253:0)
 ffffffff8213dce0 ffff880014b07538 ffffffff815df0bb 0000000000000007
 ffffffff8213e040 ffff880014b07588 ffffffff815db3dd ffff880014b07568
 ffff880014b07610 ffff88003b868930 ffff88003b868908 ffff88003b868930
Call Trace:
 [<ffffffff815df0bb>] dump_stack+0x4e/0x68
 [<ffffffff815db3dd>] print_circular_bug+0x1fb/0x20c
 [<ffffffff810a7a3e>] __lock_acquire+0x163e/0x1d00
 [<ffffffff815e89dc>] ? retint_restore_args+0xe/0xe
 [<ffffffff815ddc7b>] ? __slab_alloc+0x4a8/0x4ce
 [<ffffffff81285fff>] ? __ext4_es_shrink+0x4f/0x2e0
 [<ffffffff810a8707>] lock_acquire+0x87/0x120
 [<ffffffff81285fff>] ? __ext4_es_shrink+0x4f/0x2e0
 [<ffffffff8128592d>] ? ext4_es_free_extent+0x5d/0x70
 [<ffffffff815e6f09>] _raw_spin_lock+0x39/0x50
 [<ffffffff81285fff>] ? __ext4_es_shrink+0x4f/0x2e0
 [<ffffffff8119760b>] ? kmem_cache_alloc+0x18b/0x1a0
 [<ffffffff81285fff>] __ext4_es_shrink+0x4f/0x2e0
 [<ffffffff812869b8>] ext4_es_insert_extent+0xc8/0x180
 [<ffffffff812470f4>] ext4_map_blocks+0x1c4/0x550
 [<ffffffff8124c4c4>] ext4_writepages+0x6d4/0xd00
	...

Reported-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Minchan Kim <minchan@kernel.org>
Cc: stable@vger.kernel.org
Cc: Zheng Liu <gnehzuil.liu@gmail.com>
dsd pushed a commit that referenced this pull request Jul 21, 2014
Madalin-Cristian reported crashs happening after a recent commit
(5a4ae5f "vlan: unnecessary to check if vlan_pcpu_stats is NULL")

-----------------------------------------------------------------------
root@p5040ds:~# vconfig add eth8 1
root@p5040ds:~# vconfig rem eth8.1
Unable to handle kernel paging request for data at address 0x2bc88028
Faulting instruction address: 0xc058e950
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=8 CoreNet Generic
Modules linked in:
CPU: 3 PID: 2167 Comm: vconfig Tainted: G        W     3.16.0-rc3-00346-g65e85bf #2
task: e7264d90 ti: e2c2c000 task.ti: e2c2c000
NIP: c058e950 LR: c058ea30 CTR: c058e900
REGS: e2c2db20 TRAP: 0300   Tainted: G        W      (3.16.0-rc3-00346-g65e85bf)
MSR: 00029002 <CE,EE,ME>  CR: 48000428  XER: 20000000
DEAR: 2bc88028 ESR: 00000000
GPR00: c047299c e2c2dbd0 e7264d90 00000000 2bc88000 00000000 ffffffff 00000000
GPR08: 0000000f 00000000 000000ff 00000000 28000422 10121928 10100000 10100000
GPR16: 10100000 00000000 c07c5968 00000000 00000000 00000000 e2c2dc48 e7838000
GPR24: c07c5bac c07c58a8 e77290cc c07b0000 00000000 c05de6c0 e7838000 e2c2dc48
NIP [c058e950] vlan_dev_get_stats64+0x50/0x170
LR [c058ea30] vlan_dev_get_stats64+0x130/0x170
Call Trace:
[e2c2dbd0] [ffffffea] 0xffffffea (unreliable)
[e2c2dc20] [c047299c] dev_get_stats+0x4c/0x140
[e2c2dc40] [c0488ca8] rtnl_fill_ifinfo+0x3d8/0x960
[e2c2dd70] [c0489f4c] rtmsg_ifinfo+0x6c/0x110
[e2c2dd90] [c04731d4] rollback_registered_many+0x344/0x3b0
[e2c2ddd0] [c047332c] rollback_registered+0x2c/0x50
[e2c2ddf0] [c0476058] unregister_netdevice_queue+0x78/0xf0
[e2c2de00] [c058d800] unregister_vlan_dev+0xc0/0x160
[e2c2de20] [c058e360] vlan_ioctl_handler+0x1c0/0x550
[e2c2de90] [c045d11c] sock_ioctl+0x28c/0x2f0
[e2c2deb0] [c010d070] do_vfs_ioctl+0x90/0x7b0
[e2c2df20] [c010d7d0] SyS_ioctl+0x40/0x80
[e2c2df40] [c000f924] ret_from_syscall+0x0/0x3c

Fix this problem by freeing percpu stats from dev->destructor() instead
of ndo_uninit()

Reported-by: Madalin-Cristian Bucur <madalin.bucur@freescale.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Madalin-Cristian Bucur <madalin.bucur@freescale.com>
Fixes: 5a4ae5f ("vlan: unnecessary to check if vlan_pcpu_stats is NULL")
Cc: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dsd pushed a commit that referenced this pull request Jul 21, 2014
When in repair-mode and TCP_RECV_QUEUE is set, we end up calling
tcp_push with mss_now being 0. If data is in the send-queue and
tcp_set_skb_tso_segs gets called, we crash because it will divide by
mss_now:

[  347.151939] divide error: 0000 [#1] SMP
[  347.152907] Modules linked in:
[  347.152907] CPU: 1 PID: 1123 Comm: packetdrill Not tainted 3.16.0-rc2 hardkernel#4
[  347.152907] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[  347.152907] task: f5b88540 ti: f3c82000 task.ti: f3c82000
[  347.152907] EIP: 0060:[<c1601359>] EFLAGS: 00210246 CPU: 1
[  347.152907] EIP is at tcp_set_skb_tso_segs+0x49/0xa0
[  347.152907] EAX: 00000b67 EBX: f5acd080 ECX: 00000000 EDX: 00000000
[  347.152907] ESI: f5a28f40 EDI: f3c88f00 EBP: f3c83d10 ESP: f3c83d00
[  347.152907]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  347.152907] CR0: 80050033 CR2: 083158b0 CR3: 35146000 CR4: 000006b0
[  347.152907] Stack:
[  347.152907]  c167f9d9 f5acd080 000005b4 00000002 f3c83d20 c16013e6 f3c88f00 f5acd080
[  347.152907]  f3c83da0 c1603b5a f3c83d38 c10a0188 00000000 00000000 f3c83d84 c10acc85
[  347.152907]  c1ad5ec0 00000000 00000000 c1ad679c 010003e0 00000000 00000000 f3c88fc8
[  347.152907] Call Trace:
[  347.152907]  [<c167f9d9>] ? apic_timer_interrupt+0x2d/0x34
[  347.152907]  [<c16013e6>] tcp_init_tso_segs+0x36/0x50
[  347.152907]  [<c1603b5a>] tcp_write_xmit+0x7a/0xbf0
[  347.152907]  [<c10a0188>] ? up+0x28/0x40
[  347.152907]  [<c10acc85>] ? console_unlock+0x295/0x480
[  347.152907]  [<c10ad24f>] ? vprintk_emit+0x1ef/0x4b0
[  347.152907]  [<c1605716>] __tcp_push_pending_frames+0x36/0xd0
[  347.152907]  [<c15f4860>] tcp_push+0xf0/0x120
[  347.152907]  [<c15f7641>] tcp_sendmsg+0xf1/0xbf0
[  347.152907]  [<c116d920>] ? kmem_cache_free+0xf0/0x120
[  347.152907]  [<c106a682>] ? __sigqueue_free+0x32/0x40
[  347.152907]  [<c106a682>] ? __sigqueue_free+0x32/0x40
[  347.152907]  [<c114f0f0>] ? do_wp_page+0x3e0/0x850
[  347.152907]  [<c161c36a>] inet_sendmsg+0x4a/0xb0
[  347.152907]  [<c1150269>] ? handle_mm_fault+0x709/0xfb0
[  347.152907]  [<c15a006b>] sock_aio_write+0xbb/0xd0
[  347.152907]  [<c1180b79>] do_sync_write+0x69/0xa0
[  347.152907]  [<c1181023>] vfs_write+0x123/0x160
[  347.152907]  [<c1181d55>] SyS_write+0x55/0xb0
[  347.152907]  [<c167f0d8>] sysenter_do_call+0x12/0x28

This can easily be reproduced with the following packetdrill-script (the
"magic" with netem, sk_pacing and limit_output_bytes is done to prevent
the kernel from pushing all segments, because hitting the limit without
doing this is not so easy with packetdrill):

0   socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0  setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0

+0  bind(3, ..., ...) = 0
+0  listen(3, 1) = 0

+0  < S 0:0(0) win 32792 <mss 1460>
+0  > S. 0:0(0) ack 1 <mss 1460>
+0.1  < . 1:1(0) ack 1 win 65000

+0  accept(3, ..., ...) = 4

// This forces that not all segments of the snd-queue will be pushed
+0 `tc qdisc add dev tun0 root netem delay 10ms`
+0 `sysctl -w net.ipv4.tcp_limit_output_bytes=2`
+0 setsockopt(4, SOL_SOCKET, 47, [2], 4) = 0

+0 write(4,...,10000) = 10000
+0 write(4,...,10000) = 10000

// Set tcp-repair stuff, particularly TCP_RECV_QUEUE
+0 setsockopt(4, SOL_TCP, 19, [1], 4) = 0
+0 setsockopt(4, SOL_TCP, 20, [1], 4) = 0

// This now will make the write push the remaining segments
+0 setsockopt(4, SOL_SOCKET, 47, [20000], 4) = 0
+0 `sysctl -w net.ipv4.tcp_limit_output_bytes=130000`

// Now we will crash
+0 write(4,...,1000) = 1000

This happens since ec34232 (tcp: fix retransmission in repair
mode). Prior to that, the call to tcp_push was prevented by a check for
tp->repair.

The patch fixes it, by adding the new goto-label out_nopush. When exiting
tcp_sendmsg and a push is not required, which is the case for tp->repair,
we go to this label.

When repairing and calling send() with TCP_RECV_QUEUE, the data is
actually put in the receive-queue. So, no push is required because no
data has been added to the send-queue.

Cc: Andrew Vagin <avagin@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Fixes: ec34232 (tcp: fix retransmission in repair mode)
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Acked-by: Andrew Vagin <avagin@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dsd pushed a commit that referenced this pull request Oct 9, 2014
commit 1d147bf upstream.

There is a race between the TX path and the STA wakeup: while
a station is sleeping, mac80211 buffers frames until it wakes
up, then the frames are transmitted. However, the RX and TX
path are concurrent, so the packet indicating wakeup can be
processed while a packet is being transmitted.

This can lead to a situation where the buffered frames list
is emptied on the one side, while a frame is being added on
the other side, as the station is still seen as sleeping in
the TX path.

As a result, the newly added frame will not be send anytime
soon. It might be sent much later (and out of order) when the
station goes to sleep and wakes up the next time.

Additionally, it can lead to the crash below.

Fix all this by synchronising both paths with a new lock.
Both path are not fastpath since they handle PS situations.

In a later patch we'll remove the extra skb queue locks to
reduce locking overhead.

BUG: unable to handle kernel
NULL pointer dereference at 000000b0
IP: [<ff6f1791>] ieee80211_report_used_skb+0x11/0x3e0 [mac80211]
*pde = 00000000
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
EIP: 0060:[<ff6f1791>] EFLAGS: 00210282 CPU: 1
EIP is at ieee80211_report_used_skb+0x11/0x3e0 [mac80211]
EAX: e5900da0 EBX: 00000000 ECX: 00000001 EDX: 00000000
ESI: e41d00c0 EDI: e5900da0 EBP: ebe458e4 ESP: ebe458b0
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 000000b0 CR3: 25a78000 CR4: 000407d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process iperf (pid: 3934, ti=ebe44000 task=e757c0b0 task.ti=ebe44000)
iwlwifi 0000:02:00.0: I iwl_pcie_enqueue_hcmd Sending command LQ_CMD (#4e), seq: 0x0903, 92 bytes at 3[3]:9
Stack:
 e403b32c ebe458c4 00200002 00200286 e403b338 ebe458cc c10960bb e5900da0
 ff76a6ec ebe458d8 00000000 e41d00c0 e5900da0 ebe458f0 ff6f1b75 e403b210
 ebe4598c ff723dc1 00000000 ff76a6ec e597c978 e403b758 00000002 00000002
Call Trace:
 [<ff6f1b75>] ieee80211_free_txskb+0x15/0x20 [mac80211]
 [<ff723dc1>] invoke_tx_handlers+0x1661/0x1780 [mac80211]
 [<ff7248a5>] ieee80211_tx+0x75/0x100 [mac80211]
 [<ff7249bf>] ieee80211_xmit+0x8f/0xc0 [mac80211]
 [<ff72550e>] ieee80211_subif_start_xmit+0x4fe/0xe20 [mac80211]
 [<c149ef70>] dev_hard_start_xmit+0x450/0x950
 [<c14b9aa9>] sch_direct_xmit+0xa9/0x250
 [<c14b9c9b>] __qdisc_run+0x4b/0x150
 [<c149f732>] dev_queue_xmit+0x2c2/0xca0

Reported-by: Yaara Rozenblum <yaara.rozenblum@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Reviewed-by: Stanislaw Gruszka <sgruszka@redhat.com>
[reword commit log, use a separate lock]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
dsd pushed a commit that referenced this pull request Oct 9, 2014
commit c59053a upstream.

In the first place, the loop 'for' in the macro 'for_each_isci_host'
(drivers/scsi/isci/host.h:314) is incorrect, because it accesses
the 3rd element of 2 element array. After the 2nd iteration it executes
the instruction:
        ihost = to_pci_info(pdev)->hosts[2]
(while the size of the 'hosts' array equals 2) and reads an
out of range element.

In the second place, this loop is incorrectly optimized by GCC v4.8
(see http://marc.info/?l=linux-kernel&m=138998871911336&w=2).
As a result, on platforms with two SCU controllers,
the loop is executed more times than it can be (for i=0,1 and 2).
It causes kernel panic during entering the S3 state
and the following oops after 'rmmod isci':

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8131360b>] __list_add+0x1b/0xc0
Oops: 0000 [#1] SMP
RIP: 0010:[<ffffffff8131360b>]  [<ffffffff8131360b>] __list_add+0x1b/0xc0
Call Trace:
  [<ffffffff81661b84>] __mutex_lock_slowpath+0x114/0x1b0
  [<ffffffff81661c3f>] mutex_lock+0x1f/0x30
  [<ffffffffa03e97cb>] sas_disable_events+0x1b/0x50 [libsas]
  [<ffffffffa03e9818>] sas_unregister_ha+0x18/0x60 [libsas]
  [<ffffffffa040316e>] isci_unregister+0x1e/0x40 [isci]
  [<ffffffffa0403efd>] isci_pci_remove+0x5d/0x100 [isci]
  [<ffffffff813391cb>] pci_device_remove+0x3b/0xb0
  [<ffffffff813fbf7f>] __device_release_driver+0x7f/0xf0
  [<ffffffff813fc8f8>] driver_detach+0xa8/0xb0
  [<ffffffff813fbb8b>] bus_remove_driver+0x9b/0x120
  [<ffffffff813fcf2c>] driver_unregister+0x2c/0x50
  [<ffffffff813381f3>] pci_unregister_driver+0x23/0x80
  [<ffffffffa04152f8>] isci_exit+0x10/0x1e [isci]
  [<ffffffff810d199b>] SyS_delete_module+0x16b/0x2d0
  [<ffffffff81012a21>] ? do_notify_resume+0x61/0xa0
  [<ffffffff8166ce29>] system_call_fastpath+0x16/0x1b

The loop has been corrected.
This patch fixes kernel panic during entering the S3 state
and the above oops.

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Reviewed-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Tested-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
dsd pushed a commit that referenced this pull request Oct 9, 2014
commit d25f06e upstream.

vmxnet3's netpoll driver is incorrectly coded.  It directly calls
vmxnet3_do_poll, which is the driver internal napi poll routine.  As the netpoll
controller method doesn't block real napi polls in any way, there is a potential
for race conditions in which the netpoll controller method and the napi poll
method run concurrently.  The result is data corruption causing panics such as this
one recently observed:
PID: 1371   TASK: ffff88023762caa0  CPU: 1   COMMAND: "rs:main Q:Reg"
 #0 [ffff88023abd5780] machine_kexec at ffffffff81038f3b
 #1 [ffff88023abd57e0] crash_kexec at ffffffff810c5d92
 #2 [ffff88023abd58b0] oops_end at ffffffff8152b570
 #3 [ffff88023abd58e0] die at ffffffff81010e0b
 hardkernel#4 [ffff88023abd5910] do_trap at ffffffff8152add4
 hardkernel#5 [ffff88023abd5970] do_invalid_op at ffffffff8100cf95
 hardkernel#6 [ffff88023abd5a10] invalid_op at ffffffff8100bf9b
    [exception RIP: vmxnet3_rq_rx_complete+1968]
    RIP: ffffffffa00f1e80  RSP: ffff88023abd5ac8  RFLAGS: 00010086
    RAX: 0000000000000000  RBX: ffff88023b5dcee0  RCX: 00000000000000c0
    RDX: 0000000000000000  RSI: 00000000000005f2  RDI: ffff88023b5dcee0
    RBP: ffff88023abd5b48   R8: 0000000000000000   R9: ffff88023a3b6048
    R10: 0000000000000000  R11: 0000000000000002  R12: ffff8802398d4cd8
    R13: ffff88023af35140  R14: ffff88023b60c890  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 hardkernel#7 [ffff88023abd5b50] vmxnet3_do_poll at ffffffffa00f204a [vmxnet3]
 hardkernel#8 [ffff88023abd5b80] vmxnet3_netpoll at ffffffffa00f209c [vmxnet3]
 hardkernel#9 [ffff88023abd5ba0] netpoll_poll_dev at ffffffff81472bb7

The fix is to do as other drivers do, and have the poll controller call the top
half interrupt handler, which schedules a napi poll properly to recieve frames

Tested by myself, successfully.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Shreyas Bhatewara <sbhatewara@vmware.com>
CC: "VMware, Inc." <pv-drivers@vmware.com>
CC: "David S. Miller" <davem@davemloft.net>
Reviewed-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
dsd pushed a commit that referenced this pull request Oct 9, 2014
commit 9ef7506 upstream.

A few of the simpler TTM drivers (cirrus, ast, mgag200) do not implement
this function.  Yet can end up somehow with an evicted bo:

  BUG: unable to handle kernel NULL pointer dereference at           (null)
  IP: [<          (null)>]           (null)
  PGD 16e761067 PUD 16e6cf067 PMD 0
  Oops: 0010 [#1] SMP
  Modules linked in: bnep bluetooth rfkill fuse ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables sg btrfs zlib_deflate raid6_pq xor dm_queue_length iTCO_wdt iTCO_vendor_support coretemp kvm dcdbas dm_service_time microcode serio_raw pcspkr lpc_ich mfd_core i7core_edac edac_core ses enclosure ipmi_si ipmi_msghandler shpchp acpi_power_meter mperf nfsd auth_rpcgss nfs_acl lockd uinput sunrpc dm_multipath xfs libcrc32c ata_generic pata_acpi sr_mod cdrom
   sd_mod usb_storage mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit lpfc drm_kms_helper ttm crc32c_intel ata_piix bfa drm ixgbe libata i2c_core mdio crc_t10dif ptp crct10dif_common pps_core scsi_transport_fc dca scsi_tgt megaraid_sas bnx2 dm_mirror dm_region_hash dm_log dm_mod
  CPU: 16 PID: 2572 Comm: X Not tainted 3.10.0-86.el7.x86_64 #1
  Hardware name: Dell Inc. PowerEdge R810/0H235N, BIOS 0.3.0 11/14/2009
  task: ffff8801799dabc0 ti: ffff88016c884000 task.ti: ffff88016c884000
  RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
  RSP: 0018:ffff88016c885ad8  EFLAGS: 00010202
  RAX: ffffffffa04e94c0 RBX: ffff880178937a20 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000240004 RDI: ffff880178937a00
  RBP: ffff88016c885b60 R08: 00000000000171a0 R09: ffff88007cf171a0
  R10: ffffea0005842540 R11: ffffffff810487b9 R12: ffff880178937b30
  R13: ffff880178937a00 R14: ffff88016c885b78 R15: ffff880179929400
  FS:  00007f81ba2ef980(0000) GS:ffff88007cf00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 000000016e763000 CR4: 00000000000007e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Stack:
   ffffffffa0306fae ffff8801799295c0 0000000000260004 0000000000000001
   ffff88016c885b60 ffffffffa0307669 00ff88007cf17738 ffff88017cf17700
   ffff880178937a00 ffff880100000000 ffff880100000000 0000000079929400
  Call Trace:
   [<ffffffffa0306fae>] ? ttm_bo_handle_move_mem+0x54e/0x5b0 [ttm]
   [<ffffffffa0307669>] ? ttm_bo_mem_space+0x169/0x340 [ttm]
   [<ffffffffa0307bd7>] ttm_bo_move_buffer+0x117/0x130 [ttm]
   [<ffffffff81130001>] ? perf_event_init_context+0x141/0x220
   [<ffffffffa0307cb1>] ttm_bo_validate+0xc1/0x130 [ttm]
   [<ffffffffa04e7377>] mgag200_bo_pin+0x87/0xc0 [mgag200]
   [<ffffffffa04e56c4>] mga_crtc_cursor_set+0x474/0xbb0 [mgag200]
   [<ffffffff811971d2>] ? __mem_cgroup_commit_charge+0x152/0x3b0
   [<ffffffff815c4182>] ? mutex_lock+0x12/0x2f
   [<ffffffffa0201433>] drm_mode_cursor_common+0x123/0x170 [drm]
   [<ffffffffa0205231>] drm_mode_cursor_ioctl+0x41/0x50 [drm]
   [<ffffffffa01f5ca2>] drm_ioctl+0x502/0x630 [drm]
   [<ffffffff815cbab4>] ? __do_page_fault+0x1f4/0x510
   [<ffffffff8101cb68>] ? __restore_xstate_sig+0x218/0x4f0
   [<ffffffff811b4445>] do_vfs_ioctl+0x2e5/0x4d0
   [<ffffffff8124488e>] ? file_has_perm+0x8e/0xa0
   [<ffffffff811b46b1>] SyS_ioctl+0x81/0xa0
   [<ffffffff815d05d9>] system_call_fastpath+0x16/0x1b
  Code:  Bad RIP value.
  RIP  [<          (null)>]           (null)
   RSP <ffff88016c885ad8>
  CR2: 0000000000000000

Signed-off-by: Rob Clark <rclark@redhat.com>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
dsd pushed a commit that referenced this pull request Oct 9, 2014
commit 3367da5 upstream.

Creating a large file on a JFFS2 partition sometimes crashes with this call
trace:

[  306.476000] CPU 13 Unable to handle kernel paging request at virtual address c0000000dfff8002, epc == ffffffffc03a80a8, ra == ffffffffc03a8044
[  306.488000] Oops[#1]:
[  306.488000] Cpu 13
[  306.492000] $ 0   : 0000000000000000 0000000000000000 0000000000008008 0000000000008007
[  306.500000] $ 4   : c0000000dfff8002 000000000000009f c0000000e0007cde c0000000ee95fa58
[  306.508000] $ 8   : 0000000000000001 0000000000008008 0000000000010000 ffffffffffff8002
[  306.516000] $12   : 0000000000007fa9 000000000000ff0e 000000000000ff0f 80e55930aebb92bb
[  306.524000] $16   : c0000000e0000000 c0000000ee95fa5c c0000000efc80000 ffffffffc09edd70
[  306.532000] $20   : ffffffffc2b60000 c0000000ee95fa58 0000000000000000 c0000000efc80000
[  306.540000] $24   : 0000000000000000 0000000000000004
[  306.548000] $28   : c0000000ee950000 c0000000ee95f738 0000000000000000 ffffffffc03a8044
[  306.556000] Hi    : 00000000000574a5
[  306.560000] Lo    : 6193b7a7e903d8c9
[  306.564000] epc   : ffffffffc03a80a8 jffs2_rtime_compress+0x98/0x198
[  306.568000]     Tainted: G        W
[  306.572000] ra    : ffffffffc03a8044 jffs2_rtime_compress+0x34/0x198
[  306.580000] Status: 5000f8e3    KX SX UX KERNEL EXL IE
[  306.584000] Cause : 00800008
[  306.588000] BadVA : c0000000dfff8002
[  306.592000] PrId  : 000c1100 (Netlogic XLP)
[  306.596000] Modules linked in:
[  306.596000] Process dd (pid: 170, threadinfo=c0000000ee950000, task=c0000000ee6e0858, tls=0000000000c47490)
[  306.608000] Stack : 7c547f377ddc7ee4 7ffc7f967f5d7fae 7f617f507fc37ff4 7e7d7f817f487f5f
        7d8e7fec7ee87eb3 7e977ff27eec7f9e 7d677ec67f917f67 7f3d7e457f017ed7
        7fd37f517f867eb2 7fed7fd17ca57e1d 7e5f7fe87f257f77 7fd77f0d7ede7fdb
        7fba7fef7e197f99 7fde7fe07ee37eb5 7f5c7f8c7fc67f65 7f457fb87f847e93
        7f737f3e7d137cd9 7f8e7e9c7fc47d25 7dbb7fac7fb67e52 7ff17f627da97f64
        7f6b7df77ffa7ec5 80057ef17f357fb3 7f767fa27dfc7fd5 7fe37e8e7fd07e53
        7e227fcf7efb7fa1 7f547e787fa87fcc 7fcb7fc57f5a7ffb 7fc07f6c7ea97e80
        7e2d7ed17e587ee0 7fb17f9d7feb7f31 7f607e797e887faa 7f757fdd7c607ff3
        7e877e657ef37fbd 7ec17fd67fe67ff7 7ff67f797ff87dc4 7eef7f3a7c337fa6
        7fe57fc97ed87f4b 7ebe7f097f0b8003 7fe97e2a7d997cba 7f587f987f3c7fa9
        ...
[  306.676000] Call Trace:
[  306.680000] [<ffffffffc03a80a8>] jffs2_rtime_compress+0x98/0x198
[  306.684000] [<ffffffffc0394f10>] jffs2_selected_compress+0x110/0x230
[  306.692000] [<ffffffffc039508c>] jffs2_compress+0x5c/0x388
[  306.696000] [<ffffffffc039dc58>] jffs2_write_inode_range+0xd8/0x388
[  306.704000] [<ffffffffc03971bc>] jffs2_write_end+0x16c/0x2d0
[  306.708000] [<ffffffffc01d3d90>] generic_file_buffered_write+0xf8/0x2b8
[  306.716000] [<ffffffffc01d4e7c>] __generic_file_aio_write+0x1ac/0x350
[  306.720000] [<ffffffffc01d50a0>] generic_file_aio_write+0x80/0x168
[  306.728000] [<ffffffffc021f7dc>] do_sync_write+0x94/0xf8
[  306.732000] [<ffffffffc021ff6c>] vfs_write+0xa4/0x1a0
[  306.736000] [<ffffffffc02202e8>] SyS_write+0x50/0x90
[  306.744000] [<ffffffffc0116cc0>] handle_sys+0x180/0x1a0
[  306.748000]
[  306.748000]
Code: 020b202d  0205282d  90a50000 <90840000> 14a40038  00000000  0060602d  0000282d  016c5823
[  306.760000] ---[ end trace 79dd088435be02d0 ]---
Segmentation fault

This crash is caused because the 'positions' is declared as an array of signed
short. The value of position is in the range 0..65535, and will be converted
to a negative number when the position is greater than 32767 and causes a
corruption and crash. Changing the definition to 'unsigned short' fixes this
issue

Signed-off-by: Jayachandran C <jchandra@broadcom.com>
Signed-off-by: Kamlakant Patel <kamlakant.patel@broadcom.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
dsd pushed a commit that referenced this pull request Oct 9, 2014
commit 41bf1a2 upstream.

mounting JFFS2 partition sometimes crashes with this call trace:

[ 1322.240000] Kernel bug detected[#1]:
[ 1322.244000] Cpu 2
[ 1322.244000] $ 0   : 0000000000000000 0000000000000018 000000003ff00070 0000000000000001
[ 1322.252000] $ 4   : 0000000000000000 c0000000f3980150 0000000000000000 0000000000010000
[ 1322.260000] $ 8   : ffffffffc09cd5f8 0000000000000001 0000000000000088 c0000000ed300de8
[ 1322.268000] $12   : e5e19d9c5f613a45 ffffffffc046d464 0000000000000000 66227ba5ea67b74e
[ 1322.276000] $16   : c0000000f1769c00 c0000000ed1e0200 c0000000f3980150 0000000000000000
[ 1322.284000] $20   : c0000000f3a80000 00000000fffffffc c0000000ed2cfbd8 c0000000f39818f0
[ 1322.292000] $24   : 0000000000000004 0000000000000000
[ 1322.300000] $28   : c0000000ed2c0000 c0000000ed2cfab8 0000000000010000 ffffffffc039c0b0
[ 1322.308000] Hi    : 000000000000023c
[ 1322.312000] Lo    : 000000000003f802
[ 1322.316000] epc   : ffffffffc039a9f8 check_tn_node+0x88/0x3b0
[ 1322.320000]     Not tainted
[ 1322.324000] ra    : ffffffffc039c0b0 jffs2_do_read_inode_internal+0x1250/0x1e48
[ 1322.332000] Status: 5400f8e3    KX SX UX KERNEL EXL IE
[ 1322.336000] Cause : 00800034
[ 1322.340000] PrId  : 000c1004 (Netlogic XLP)
[ 1322.344000] Modules linked in:
[ 1322.348000] Process jffs2_gcd_mtd7 (pid: 264, threadinfo=c0000000ed2c0000, task=c0000000f0e68dd8, tls=0000000000000000)
[ 1322.356000] Stack : c0000000f1769e30 c0000000ed010780 c0000000ed010780 c0000000ed300000
        c0000000f1769c00 c0000000f3980150 c0000000f3a80000 00000000fffffffc
        c0000000ed2cfbd8 ffffffffc039c0b0 ffffffffc09c6340 0000000000001000
        0000000000000dec ffffffffc016c9d8 c0000000f39805a0 c0000000f3980180
        0000008600000000 0000000000000000 0000000000000000 0000000000000000
        0001000000000dec c0000000f1769d98 c0000000ed2cfb18 0000000000010000
        0000000000010000 0000000000000044 c0000000f3a80000 c0000000f1769c00
        c0000000f3d207a8 c0000000f1769d98 c0000000f1769de0 ffffffffc076f9c0
        0000000000000009 0000000000000000 0000000000000000 ffffffffc039cf90
        0000000000000017 ffffffffc013fbdc 0000000000000001 000000010003e61c
        ...
[ 1322.424000] Call Trace:
[ 1322.428000] [<ffffffffc039a9f8>] check_tn_node+0x88/0x3b0
[ 1322.432000] [<ffffffffc039c0b0>] jffs2_do_read_inode_internal+0x1250/0x1e48
[ 1322.440000] [<ffffffffc039cf90>] jffs2_do_crccheck_inode+0x70/0xd0
[ 1322.448000] [<ffffffffc03a1b80>] jffs2_garbage_collect_pass+0x160/0x870
[ 1322.452000] [<ffffffffc03a392c>] jffs2_garbage_collect_thread+0xdc/0x1f0
[ 1322.460000] [<ffffffffc01541c8>] kthread+0xb8/0xc0
[ 1322.464000] [<ffffffffc0106d18>] kernel_thread_helper+0x10/0x18
[ 1322.472000]
[ 1322.472000]
Code: 67bd0050  94a4002c  2c830001 <00038036> de050218  2403fffc  0080a82d  00431824  24630044
[ 1322.480000] ---[ end trace b052bb90e97dfbf5 ]---

The variable csize in structure jffs2_tmp_dnode_info is of type uint16_t, but it
is used to hold the compressed data length(csize) which is declared as uint32_t.
So, when the value of csize exceeds 16bits, it gets truncated when assigned to
tn->csize. This is causing a kernel BUG.
Changing the definition of csize in jffs2_tmp_dnode_info to uint32_t fixes the issue.

Signed-off-by: Ajesh Kunhipurayil Vijayan <ajesh@broadcom.com>
Signed-off-by: Kamlakant Patel <kamlakant.patel@broadcom.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
dsd pushed a commit that referenced this pull request Oct 9, 2014
commit a585f87 upstream.

The scenario here is that someone calls enable_irq_wake() from somewhere
in the code. This will result in the lockdep producing a backtrace as can
be seen below. In my case, this problem is triggered when using the wl1271
(TI WlCore) driver found in drivers/net/wireless/ti/ .

The problem cause is rather obvious from the backtrace, but let's outline
the dependency. enable_irq_wake() grabs the IRQ buslock in irq_set_irq_wake(),
which in turns calls mxs_gpio_set_wake_irq() . But mxs_gpio_set_wake_irq()
calls enable_irq_wake() again on the one-level-higher IRQ , thus it tries to
grab the IRQ buslock again in irq_set_irq_wake() . Because the spinlock in
irq_set_irq_wake()->irq_get_desc_buslock()->__irq_get_desc_lock() is not
marked as recursive, lockdep will spew the stuff below.

We know we can safely re-enter the lock, so use IRQ_GC_INIT_NESTED_LOCK to
fix the spew.

 =============================================
 [ INFO: possible recursive locking detected ]
 3.10.33-00012-gf06b763-dirty hardkernel#61 Not tainted
 ---------------------------------------------
 kworker/0:1/18 is trying to acquire lock:
  (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88

 but task is already holding lock:
  (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&irq_desc_lock_class);
   lock(&irq_desc_lock_class);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 3 locks held by kworker/0:1/18:
  #0:  (events){.+.+.+}, at: [<c0036308>] process_one_work+0x134/0x4a4
  #1:  ((&fw_work->work)){+.+.+.}, at: [<c0036308>] process_one_work+0x134/0x4a4
  #2:  (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88

 stack backtrace:
 CPU: 0 PID: 18 Comm: kworker/0:1 Not tainted 3.10.33-00012-gf06b763-dirty hardkernel#61
 Workqueue: events request_firmware_work_func
 [<c0013eb4>] (unwind_backtrace+0x0/0xf0) from [<c0011c74>] (show_stack+0x10/0x14)
 [<c0011c74>] (show_stack+0x10/0x14) from [<c005bb08>] (__lock_acquire+0x140c/0x1a64)
 [<c005bb08>] (__lock_acquire+0x140c/0x1a64) from [<c005c6a8>] (lock_acquire+0x9c/0x104)
 [<c005c6a8>] (lock_acquire+0x9c/0x104) from [<c051d5a4>] (_raw_spin_lock_irqsave+0x44/0x58)
 [<c051d5a4>] (_raw_spin_lock_irqsave+0x44/0x58) from [<c00685f0>] (__irq_get_desc_lock+0x48/0x88)
 [<c00685f0>] (__irq_get_desc_lock+0x48/0x88) from [<c0068e78>] (irq_set_irq_wake+0x20/0xf4)
 [<c0068e78>] (irq_set_irq_wake+0x20/0xf4) from [<c027260c>] (mxs_gpio_set_wake_irq+0x1c/0x24)
 [<c027260c>] (mxs_gpio_set_wake_irq+0x1c/0x24) from [<c0068cf4>] (set_irq_wake_real+0x30/0x44)
 [<c0068cf4>] (set_irq_wake_real+0x30/0x44) from [<c0068ee4>] (irq_set_irq_wake+0x8c/0xf4)
 [<c0068ee4>] (irq_set_irq_wake+0x8c/0xf4) from [<c0310748>] (wlcore_nvs_cb+0x10c/0x97c)
 [<c0310748>] (wlcore_nvs_cb+0x10c/0x97c) from [<c02be5e8>] (request_firmware_work_func+0x38/0x58)
 [<c02be5e8>] (request_firmware_work_func+0x38/0x58) from [<c0036394>] (process_one_work+0x1c0/0x4a4)
 [<c0036394>] (process_one_work+0x1c0/0x4a4) from [<c0036a4c>] (worker_thread+0x138/0x394)
 [<c0036a4c>] (worker_thread+0x138/0x394) from [<c003cb74>] (kthread+0xa4/0xb0)
 [<c003cb74>] (kthread+0xa4/0xb0) from [<c000ee00>] (ret_from_fork+0x14/0x34)
 wlcore: loaded

Signed-off-by: Marek Vasut <marex@denx.de>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
dsd pushed a commit that referenced this pull request Oct 9, 2014
commit 2b90563 upstream.

When stopping nfsd, I got BUG messages, and soft lockup messages,
The problem is cuased by double rb_erase() in nfs4_state_destroy_net()
and destroy_client().

This patch just let nfsd traversing unconfirmed client through
hash-table instead of rbtree.

[ 2325.021995] BUG: unable to handle kernel NULL pointer dereference at
          (null)
[ 2325.022809] IP: [<ffffffff8133c18c>] rb_erase+0x14c/0x390
[ 2325.022982] PGD 7a91b067 PUD 7a33d067 PMD 0
[ 2325.022982] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 2325.022982] Modules linked in: nfsd(OF) cfg80211 rfkill bridge stp
llc snd_intel8x0 snd_ac97_codec ac97_bus auth_rpcgss nfs_acl serio_raw
e1000 i2c_piix4 ppdev snd_pcm snd_timer lockd pcspkr joydev parport_pc
snd parport i2c_core soundcore microcode sunrpc ata_generic pata_acpi
[last unloaded: nfsd]
[ 2325.022982] CPU: 1 PID: 2123 Comm: nfsd Tainted: GF          O
3.14.0-rc8+ #2
[ 2325.022982] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
VirtualBox 12/01/2006
[ 2325.022982] task: ffff88007b384800 ti: ffff8800797f6000 task.ti:
ffff8800797f6000
[ 2325.022982] RIP: 0010:[<ffffffff8133c18c>]  [<ffffffff8133c18c>]
rb_erase+0x14c/0x390
[ 2325.022982] RSP: 0018:ffff8800797f7d98  EFLAGS: 00010246
[ 2325.022982] RAX: ffff880079c1f010 RBX: ffff880079f4c828 RCX:
0000000000000000
[ 2325.022982] RDX: 0000000000000000 RSI: ffff880079bcb070 RDI:
ffff880079f4c810
[ 2325.022982] RBP: ffff8800797f7d98 R08: 0000000000000000 R09:
ffff88007964fc70
[ 2325.022982] R10: 0000000000000000 R11: 0000000000000400 R12:
ffff880079f4c800
[ 2325.022982] R13: ffff880079bcb000 R14: ffff8800797f7da8 R15:
ffff880079f4c860
[ 2325.022982] FS:  0000000000000000(0000) GS:ffff88007f900000(0000)
knlGS:0000000000000000
[ 2325.022982] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2325.022982] CR2: 0000000000000000 CR3: 000000007a3ef000 CR4:
00000000000006e0
[ 2325.022982] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 2325.022982] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 2325.022982] Stack:
[ 2325.022982]  ffff8800797f7de0 ffffffffa0191c6e ffff8800797f7da8
ffff8800797f7da8
[ 2325.022982]  ffff880079f4c810 ffff880079bcb000 ffffffff81cc26c0
ffff880079c1f010
[ 2325.022982]  ffff880079bcb070 ffff8800797f7e28 ffffffffa01977f2
ffff8800797f7df0
[ 2325.022982] Call Trace:
[ 2325.022982]  [<ffffffffa0191c6e>] destroy_client+0x32e/0x3b0 [nfsd]
[ 2325.022982]  [<ffffffffa01977f2>] nfs4_state_shutdown_net+0x1a2/0x220
[nfsd]
[ 2325.022982]  [<ffffffffa01700b8>] nfsd_shutdown_net+0x38/0x70 [nfsd]
[ 2325.022982]  [<ffffffffa017013e>] nfsd_last_thread+0x4e/0x80 [nfsd]
[ 2325.022982]  [<ffffffffa001f1eb>] svc_shutdown_net+0x2b/0x30 [sunrpc]
[ 2325.022982]  [<ffffffffa017064b>] nfsd_destroy+0x5b/0x80 [nfsd]
[ 2325.022982]  [<ffffffffa0170773>] nfsd+0x103/0x130 [nfsd]
[ 2325.022982]  [<ffffffffa0170670>] ? nfsd_destroy+0x80/0x80 [nfsd]
[ 2325.022982]  [<ffffffff810a8232>] kthread+0xd2/0xf0
[ 2325.022982]  [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40
[ 2325.022982]  [<ffffffff816c493c>] ret_from_fork+0x7c/0xb0
[ 2325.022982]  [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40
[ 2325.022982] Code: 48 83 e1 fc 48 89 10 0f 84 02 01 00 00 48 3b 41 10
0f 84 08 01 00 00 48 89 51 08 48 89 fa e9 74 ff ff ff 0f 1f 40 00 48 8b
50 10 <f6> 02 01 0f 84 93 00 00 00 48 8b 7a 10 48 85 ff 74 05 f6 07 01
[ 2325.022982] RIP  [<ffffffff8133c18c>] rb_erase+0x14c/0x390
[ 2325.022982]  RSP <ffff8800797f7d98>
[ 2325.022982] CR2: 0000000000000000
[ 2325.022982] ---[ end trace 28c27ed011655e57 ]---

[  228.064071] BUG: soft lockup - CPU#0 stuck for 22s! [nfsd:558]
[  228.064428] Modules linked in: ip6t_rpfilter ip6t_REJECT cfg80211
xt_conntrack rfkill ebtable_nat ebtable_broute bridge stp llc
ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6
nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw
ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security
iptable_raw nfsd(OF) auth_rpcgss nfs_acl lockd snd_intel8x0
snd_ac97_codec ac97_bus joydev snd_pcm snd_timer e1000 sunrpc snd ppdev
parport_pc serio_raw pcspkr i2c_piix4 microcode parport soundcore
i2c_core ata_generic pata_acpi
[  228.064539] CPU: 0 PID: 558 Comm: nfsd Tainted: GF          O
3.14.0-rc8+ #2
[  228.064539] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
VirtualBox 12/01/2006
[  228.064539] task: ffff880076adec00 ti: ffff880074616000 task.ti:
ffff880074616000
[  228.064539] RIP: 0010:[<ffffffff8133ba17>]  [<ffffffff8133ba17>]
rb_next+0x27/0x50
[  228.064539] RSP: 0018:ffff880074617de0  EFLAGS: 00000282
[  228.064539] RAX: ffff880074478010 RBX: ffff88007446f860 RCX:
0000000000000014
[  228.064539] RDX: ffff880074478010 RSI: 0000000000000000 RDI:
ffff880074478010
[  228.064539] RBP: ffff880074617de0 R08: 0000000000000000 R09:
0000000000000012
[  228.064539] R10: 0000000000000001 R11: ffffffffffffffec R12:
ffffea0001d11a00
[  228.064539] R13: ffff88007f401400 R14: ffff88007446f800 R15:
ffff880074617d50
[  228.064539] FS:  0000000000000000(0000) GS:ffff88007f800000(0000)
knlGS:0000000000000000
[  228.064539] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  228.064539] CR2: 00007fe9ac6ec000 CR3: 000000007a5d6000 CR4:
00000000000006f0
[  228.064539] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  228.064539] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[  228.064539] Stack:
[  228.064539]  ffff880074617e28 ffffffffa01ab7db ffff880074617df0
ffff880074617df0
[  228.064539]  ffff880079273000 ffffffff81cc26c0 ffffffff81cc26c0
0000000000000000
[  228.064539]  0000000000000000 ffff880074617e48 ffffffffa01840b8
ffffffff81cc26c0
[  228.064539] Call Trace:
[  228.064539]  [<ffffffffa01ab7db>] nfs4_state_shutdown_net+0x18b/0x220
[nfsd]
[  228.064539]  [<ffffffffa01840b8>] nfsd_shutdown_net+0x38/0x70 [nfsd]
[  228.064539]  [<ffffffffa018413e>] nfsd_last_thread+0x4e/0x80 [nfsd]
[  228.064539]  [<ffffffffa00aa1eb>] svc_shutdown_net+0x2b/0x30 [sunrpc]
[  228.064539]  [<ffffffffa018464b>] nfsd_destroy+0x5b/0x80 [nfsd]
[  228.064539]  [<ffffffffa0184773>] nfsd+0x103/0x130 [nfsd]
[  228.064539]  [<ffffffffa0184670>] ? nfsd_destroy+0x80/0x80 [nfsd]
[  228.064539]  [<ffffffff810a8232>] kthread+0xd2/0xf0
[  228.064539]  [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40
[  228.064539]  [<ffffffff816c493c>] ret_from_fork+0x7c/0xb0
[  228.064539]  [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40
[  228.064539] Code: 1f 44 00 00 55 48 8b 17 48 89 e5 48 39 d7 74 3b 48
8b 47 08 48 85 c0 75 0e eb 25 66 0f 1f 84 00 00 00 00 00 48 89 d0 48 8b
50 10 <48> 85 d2 75 f4 5d c3 66 90 48 3b 78 08 75 f6 48 8b 10 48 89 c7

Fixes: ac55fdc (nfsd: move the confirmed and unconfirmed hlists...)
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
dsd pushed a commit that referenced this pull request Oct 9, 2014
…s during lockd_up

commit 679b033 upstream.

We had a Fedora ABRT report with a stack trace like this:

kernel BUG at net/sunrpc/svc.c:550!
invalid opcode: 0000 [#1] SMP
[...]
CPU: 2 PID: 913 Comm: rpc.nfsd Not tainted 3.13.6-200.fc20.x86_64 #1
Hardware name: Hewlett-Packard HP ProBook 4740s/1846, BIOS 68IRR Ver. F.40 01/29/2013
task: ffff880146b00000 ti: ffff88003f9b8000 task.ti: ffff88003f9b8000
RIP: 0010:[<ffffffffa0305fa8>]  [<ffffffffa0305fa8>] svc_destroy+0x128/0x130 [sunrpc]
RSP: 0018:ffff88003f9b9de0  EFLAGS: 00010206
RAX: ffff88003f829628 RBX: ffff88003f829600 RCX: 00000000000041ee
RDX: 0000000000000000 RSI: 0000000000000286 RDI: 0000000000000286
RBP: ffff88003f9b9de8 R08: 0000000000017360 R09: ffff88014fa97360
R10: ffffffff8114ce57 R11: ffffea00051c9c00 R12: ffff88003f829600
R13: 00000000ffffff9e R14: ffffffff81cc7cc0 R15: 0000000000000000
FS:  00007f4fde284840(0000) GS:ffff88014fa80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4fdf5192f8 CR3: 00000000a569a000 CR4: 00000000001407e0
Stack:
 ffff88003f792300 ffff88003f9b9e18 ffffffffa02de02a 0000000000000000
 ffffffff81cc7cc0 ffff88003f9cb000 0000000000000008 ffff88003f9b9e60
 ffffffffa033bb35 ffffffff8131c86c ffff88003f9cb000 ffff8800a5715008
Call Trace:
 [<ffffffffa02de02a>] lockd_up+0xaa/0x330 [lockd]
 [<ffffffffa033bb35>] nfsd_svc+0x1b5/0x2f0 [nfsd]
 [<ffffffff8131c86c>] ? simple_strtoull+0x2c/0x50
 [<ffffffffa033c630>] ? write_pool_threads+0x280/0x280 [nfsd]
 [<ffffffffa033c6bb>] write_threads+0x8b/0xf0 [nfsd]
 [<ffffffff8114efa4>] ? __get_free_pages+0x14/0x50
 [<ffffffff8114eff6>] ? get_zeroed_page+0x16/0x20
 [<ffffffff811dec51>] ? simple_transaction_get+0xb1/0xd0
 [<ffffffffa033c098>] nfsctl_transaction_write+0x48/0x80 [nfsd]
 [<ffffffff811b8b34>] vfs_write+0xb4/0x1f0
 [<ffffffff811c3f99>] ? putname+0x29/0x40
 [<ffffffff811b9569>] SyS_write+0x49/0xa0
 [<ffffffff810fc2a6>] ? __audit_syscall_exit+0x1f6/0x2a0
 [<ffffffff816962e9>] system_call_fastpath+0x16/0x1b
Code: 31 c0 e8 82 db 37 e1 e9 2a ff ff ff 48 8b 07 8b 57 14 48 c7 c7 d5 c6 31 a0 48 8b 70 20 31 c0 e8 65 db 37 e1 e9 f4 fe ff ff 0f 0b <0f> 0b 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55
RIP  [<ffffffffa0305fa8>] svc_destroy+0x128/0x130 [sunrpc]
 RSP <ffff88003f9b9de0>

Evidently, we created some lockd sockets and then failed to create
others. make_socks then returned an error and we tried to tear down the
svc, but svc->sv_permsocks was not empty so we ended up tripping over
the BUG() in svc_destroy().

Fix this by ensuring that we tear down any live sockets we created when
socket creation is going to return an error.

Fixes: 786185b (SUNRPC: move per-net operations from...)
Reported-by: Raphos <raphoszap@laposte.net>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
dsd pushed a commit that referenced this pull request Oct 9, 2014
commit 55f6714 upstream.

When I decrease the value of nr_hugepage in procfs a lot, softlockup
happens.  It is because there is no chance of context switch during this
process.

On the other hand, when I allocate a large number of hugepages, there is
some chance of context switch.  Hence softlockup doesn't happen during
this process.  So it's necessary to add the context switch in the
freeing process as same as allocating process to avoid softlockup.

When I freed 12 TB hugapages with kernel-2.6.32-358.el6, the freeing
process occupied a CPU over 150 seconds and following softlockup message
appeared twice or more.

$ echo 6000000 > /proc/sys/vm/nr_hugepages
$ cat /proc/sys/vm/nr_hugepages
6000000
$ grep ^Huge /proc/meminfo
HugePages_Total:   6000000
HugePages_Free:    6000000
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
$ echo 0 > /proc/sys/vm/nr_hugepages

BUG: soft lockup - CPU#16 stuck for 67s! [sh:12883] ...
Pid: 12883, comm: sh Not tainted 2.6.32-358.el6.x86_64 #1
Call Trace:
  free_pool_huge_page+0xb8/0xd0
  set_max_huge_pages+0x128/0x190
  hugetlb_sysctl_handler_common+0x113/0x140
  hugetlb_sysctl_handler+0x1e/0x20
  proc_sys_call_handler+0x97/0xd0
  proc_sys_write+0x14/0x20
  vfs_write+0xb8/0x1a0
  sys_write+0x51/0x90
  __audit_syscall_exit+0x265/0x290
  system_call_fastpath+0x16/0x1b

I have not confirmed this problem with upstream kernels because I am not
able to prepare the machine equipped with 12TB memory now.  However I
confirmed that the amount of decreasing hugepages was directly
proportional to the amount of required time.

I measured required times on a smaller machine.  It showed 130-145
hugepages decreased in a millisecond.

  Amount of decreasing     Required time      Decreasing rate
  hugepages                     (msec)         (pages/msec)
  ------------------------------------------------------------
  10,000 pages == 20GB         70 -  74          135-142
  30,000 pages == 60GB        208 - 229          131-144

It means decrement of 6TB hugepages will trigger softlockup with the
default threshold 20sec, in this decreasing rate.

Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
dsd pushed a commit that referenced this pull request Oct 9, 2014
commit ec4cb1a upstream.

When heavily exercising xattr code the assertion that
jbd2_journal_dirty_metadata() shouldn't return error was triggered:

WARNING: at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1237
jbd2_journal_dirty_metadata+0x1ba/0x260()

CPU: 0 PID: 8877 Comm: ceph-osd Tainted: G    W 3.10.0-ceph-00049-g68d04c9 #1
Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
 ffffffff81a1d3c8 ffff880214469928 ffffffff816311b0 ffff880214469968
 ffffffff8103fae0 ffff880214469958 ffff880170a9dc30 ffff8802240fbe80
 0000000000000000 ffff88020b366000 ffff8802256e7510 ffff880214469978
Call Trace:
 [<ffffffff816311b0>] dump_stack+0x19/0x1b
 [<ffffffff8103fae0>] warn_slowpath_common+0x70/0xa0
 [<ffffffff8103fb2a>] warn_slowpath_null+0x1a/0x20
 [<ffffffff81267c2a>] jbd2_journal_dirty_metadata+0x1ba/0x260
 [<ffffffff81245093>] __ext4_handle_dirty_metadata+0xa3/0x140
 [<ffffffff812561f3>] ext4_xattr_release_block+0x103/0x1f0
 [<ffffffff81256680>] ext4_xattr_block_set+0x1e0/0x910
 [<ffffffff8125795b>] ext4_xattr_set_handle+0x38b/0x4a0
 [<ffffffff810a319d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff81257b32>] ext4_xattr_set+0xc2/0x140
 [<ffffffff81258547>] ext4_xattr_user_set+0x47/0x50
 [<ffffffff811935ce>] generic_setxattr+0x6e/0x90
 [<ffffffff81193ecb>] __vfs_setxattr_noperm+0x7b/0x1c0
 [<ffffffff811940d4>] vfs_setxattr+0xc4/0xd0
 [<ffffffff8119421e>] setxattr+0x13e/0x1e0
 [<ffffffff811719c7>] ? __sb_start_write+0xe7/0x1b0
 [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
 [<ffffffff8118c65c>] ? fget_light+0x3c/0x130
 [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
 [<ffffffff8118f1f8>] ? __mnt_want_write+0x58/0x70
 [<ffffffff811946be>] SyS_fsetxattr+0xbe/0x100
 [<ffffffff816407c2>] system_call_fastpath+0x16/0x1b

The reason for the warning is that buffer_head passed into
jbd2_journal_dirty_metadata() didn't have journal_head attached. This is
caused by the following race of two ext4_xattr_release_block() calls:

CPU1                                CPU2
ext4_xattr_release_block()          ext4_xattr_release_block()
lock_buffer(bh);
/* False */
if (BHDR(bh)->h_refcount == cpu_to_le32(1))
} else {
  le32_add_cpu(&BHDR(bh)->h_refcount, -1);
  unlock_buffer(bh);
                                    lock_buffer(bh);
                                    /* True */
                                    if (BHDR(bh)->h_refcount == cpu_to_le32(1))
                                      get_bh(bh);
                                      ext4_free_blocks()
                                        ...
                                        jbd2_journal_forget()
                                          jbd2_journal_unfile_buffer()
                                          -> JH is gone
  error = ext4_handle_dirty_xattr_block(handle, inode, bh);
  -> triggers the warning

We fix the problem by moving ext4_handle_dirty_xattr_block() under the
buffer lock. Sadly this cannot be done in nojournal mode as that
function can call sync_dirty_buffer() which would deadlock. Luckily in
nojournal mode the race is harmless (we only dirty already freed buffer)
and thus for nojournal mode we leave the dirtying outside of the buffer
lock.

Reported-by: Sage Weil <sage@inktank.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
dsd pushed a commit that referenced this pull request Oct 9, 2014
commit 10164c2 upstream.

Fix driver new_id sysfs-attribute removal deadlock by making sure to
not hold any locks that the attribute operations grab when removing the
attribute.

Specifically, usb_serial_deregister holds the table mutex when
deregistering the driver, which includes removing the new_id attribute.
This can lead to a deadlock as writing to new_id increments the
attribute's active count before trying to grab the same mutex in
usb_serial_probe.

The deadlock can easily be triggered by inserting a sleep in
usb_serial_deregister and writing the id of an unbound device to new_id
during module unload.

As the table mutex (in this case) is used to prevent subdriver unload
during probe, it should be sufficient to only hold the lock while
manipulating the usb-serial driver list during deregister. A racing
probe will then either fail to find a matching subdriver or fail to get
the corresponding module reference.

Since v3.15-rc1 this also triggers the following lockdep warning:

======================================================
[ INFO: possible circular locking dependency detected ]
3.15.0-rc2 hardkernel#123 Tainted: G        W
-------------------------------------------------------
modprobe/190 is trying to acquire lock:
 (s_active#4){++++.+}, at: [<c0167aa0>] kernfs_remove_by_name_ns+0x4c/0x94

but task is already holding lock:
 (table_lock){+.+.+.}, at: [<bf004d84>] usb_serial_deregister+0x3c/0x78 [usbserial]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (table_lock){+.+.+.}:
       [<c0075f84>] __lock_acquire+0x1694/0x1ce4
       [<c0076de8>] lock_acquire+0xb4/0x154
       [<c03af3cc>] _raw_spin_lock+0x4c/0x5c
       [<c02bbc24>] usb_store_new_id+0x14c/0x1ac
       [<bf007eb4>] new_id_store+0x68/0x70 [usbserial]
       [<c025f568>] drv_attr_store+0x30/0x3c
       [<c01690e0>] sysfs_kf_write+0x5c/0x60
       [<c01682c0>] kernfs_fop_write+0xd4/0x194
       [<c010881c>] vfs_write+0xbc/0x198
       [<c0108e4c>] SyS_write+0x4c/0xa0
       [<c000f880>] ret_fast_syscall+0x0/0x48

-> #0 (s_active#4){++++.+}:
       [<c03a7a28>] print_circular_bug+0x68/0x2f8
       [<c0076218>] __lock_acquire+0x1928/0x1ce4
       [<c0076de8>] lock_acquire+0xb4/0x154
       [<c0166b70>] __kernfs_remove+0x254/0x310
       [<c0167aa0>] kernfs_remove_by_name_ns+0x4c/0x94
       [<c0169fb8>] remove_files.isra.1+0x48/0x84
       [<c016a2fc>] sysfs_remove_group+0x58/0xac
       [<c016a414>] sysfs_remove_groups+0x34/0x44
       [<c02623b8>] driver_remove_groups+0x1c/0x20
       [<c0260e9c>] bus_remove_driver+0x3c/0xe4
       [<c026235c>] driver_unregister+0x38/0x58
       [<bf007fb4>] usb_serial_bus_deregister+0x84/0x88 [usbserial]
       [<bf004db4>] usb_serial_deregister+0x6c/0x78 [usbserial]
       [<bf005330>] usb_serial_deregister_drivers+0x2c/0x4c [usbserial]
       [<bf016618>] usb_serial_module_exit+0x14/0x1c [sierra]
       [<c009d6cc>] SyS_delete_module+0x184/0x210
       [<c000f880>] ret_fast_syscall+0x0/0x48

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(table_lock);
                               lock(s_active#4);
                               lock(table_lock);
  lock(s_active#4);

 *** DEADLOCK ***

1 lock held by modprobe/190:
 #0:  (table_lock){+.+.+.}, at: [<bf004d84>] usb_serial_deregister+0x3c/0x78 [usbserial]

stack backtrace:
CPU: 0 PID: 190 Comm: modprobe Tainted: G        W     3.15.0-rc2 hardkernel#123
[<c0015e10>] (unwind_backtrace) from [<c0013728>] (show_stack+0x20/0x24)
[<c0013728>] (show_stack) from [<c03a9a54>] (dump_stack+0x24/0x28)
[<c03a9a54>] (dump_stack) from [<c03a7cac>] (print_circular_bug+0x2ec/0x2f8)
[<c03a7cac>] (print_circular_bug) from [<c0076218>] (__lock_acquire+0x1928/0x1ce4)
[<c0076218>] (__lock_acquire) from [<c0076de8>] (lock_acquire+0xb4/0x154)
[<c0076de8>] (lock_acquire) from [<c0166b70>] (__kernfs_remove+0x254/0x310)
[<c0166b70>] (__kernfs_remove) from [<c0167aa0>] (kernfs_remove_by_name_ns+0x4c/0x94)
[<c0167aa0>] (kernfs_remove_by_name_ns) from [<c0169fb8>] (remove_files.isra.1+0x48/0x84)
[<c0169fb8>] (remove_files.isra.1) from [<c016a2fc>] (sysfs_remove_group+0x58/0xac)
[<c016a2fc>] (sysfs_remove_group) from [<c016a414>] (sysfs_remove_groups+0x34/0x44)
[<c016a414>] (sysfs_remove_groups) from [<c02623b8>] (driver_remove_groups+0x1c/0x20)
[<c02623b8>] (driver_remove_groups) from [<c0260e9c>] (bus_remove_driver+0x3c/0xe4)
[<c0260e9c>] (bus_remove_driver) from [<c026235c>] (driver_unregister+0x38/0x58)
[<c026235c>] (driver_unregister) from [<bf007fb4>] (usb_serial_bus_deregister+0x84/0x88 [usbserial])
[<bf007fb4>] (usb_serial_bus_deregister [usbserial]) from [<bf004db4>] (usb_serial_deregister+0x6c/0x78 [usbserial])
[<bf004db4>] (usb_serial_deregister [usbserial]) from [<bf005330>] (usb_serial_deregister_drivers+0x2c/0x4c [usbserial])
[<bf005330>] (usb_serial_deregister_drivers [usbserial]) from [<bf016618>] (usb_serial_module_exit+0x14/0x1c [sierra])
[<bf016618>] (usb_serial_module_exit [sierra]) from [<c009d6cc>] (SyS_delete_module+0x184/0x210)
[<c009d6cc>] (SyS_delete_module) from [<c000f880>] (ret_fast_syscall+0x0/0x48)

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
dsd pushed a commit that referenced this pull request Oct 9, 2014
[ Upstream commit dc8eaaa ]

When I open the LOCKDEP config and run these steps:

modprobe 8021q
vconfig add eth2 20
vconfig add eth2.20 30
ifconfig eth2 xx.xx.xx.xx

then the Call Trace happened:

[32524.386288] =============================================
[32524.386293] [ INFO: possible recursive locking detected ]
[32524.386298] 3.14.0-rc2-0.7-default+ hardkernel#35 Tainted: G           O
[32524.386302] ---------------------------------------------
[32524.386306] ifconfig/3103 is trying to acquire lock:
[32524.386310]  (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff814275f4>] dev_mc_sync+0x64/0xb0
[32524.386326]
[32524.386326] but task is already holding lock:
[32524.386330]  (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff8141af83>] dev_set_rx_mode+0x23/0x40
[32524.386341]
[32524.386341] other info that might help us debug this:
[32524.386345]  Possible unsafe locking scenario:
[32524.386345]
[32524.386350]        CPU0
[32524.386352]        ----
[32524.386354]   lock(&vlan_netdev_addr_lock_key/1);
[32524.386359]   lock(&vlan_netdev_addr_lock_key/1);
[32524.386364]
[32524.386364]  *** DEADLOCK ***
[32524.386364]
[32524.386368]  May be due to missing lock nesting notation
[32524.386368]
[32524.386373] 2 locks held by ifconfig/3103:
[32524.386376]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff81431d42>] rtnl_lock+0x12/0x20
[32524.386387]  #1:  (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff8141af83>] dev_set_rx_mode+0x23/0x40
[32524.386398]
[32524.386398] stack backtrace:
[32524.386403] CPU: 1 PID: 3103 Comm: ifconfig Tainted: G           O 3.14.0-rc2-0.7-default+ hardkernel#35
[32524.386409] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[32524.386414]  ffffffff81ffae40 ffff8800d9625ae8 ffffffff814f68a2 ffff8800d9625bc8
[32524.386421]  ffffffff810a35fb ffff8800d8a8d9d0 00000000d9625b28 ffff8800d8a8e5d0
[32524.386428]  000003cc00000000 0000000000000002 ffff8800d8a8e5f8 0000000000000000
[32524.386435] Call Trace:
[32524.386441]  [<ffffffff814f68a2>] dump_stack+0x6a/0x78
[32524.386448]  [<ffffffff810a35fb>] __lock_acquire+0x7ab/0x1940
[32524.386454]  [<ffffffff810a323a>] ? __lock_acquire+0x3ea/0x1940
[32524.386459]  [<ffffffff810a4874>] lock_acquire+0xe4/0x110
[32524.386464]  [<ffffffff814275f4>] ? dev_mc_sync+0x64/0xb0
[32524.386471]  [<ffffffff814fc07a>] _raw_spin_lock_nested+0x2a/0x40
[32524.386476]  [<ffffffff814275f4>] ? dev_mc_sync+0x64/0xb0
[32524.386481]  [<ffffffff814275f4>] dev_mc_sync+0x64/0xb0
[32524.386489]  [<ffffffffa0500cab>] vlan_dev_set_rx_mode+0x2b/0x50 [8021q]
[32524.386495]  [<ffffffff8141addf>] __dev_set_rx_mode+0x5f/0xb0
[32524.386500]  [<ffffffff8141af8b>] dev_set_rx_mode+0x2b/0x40
[32524.386506]  [<ffffffff8141b3cf>] __dev_open+0xef/0x150
[32524.386511]  [<ffffffff8141b177>] __dev_change_flags+0xa7/0x190
[32524.386516]  [<ffffffff8141b292>] dev_change_flags+0x32/0x80
[32524.386524]  [<ffffffff8149ca56>] devinet_ioctl+0x7d6/0x830
[32524.386532]  [<ffffffff81437b0b>] ? dev_ioctl+0x34b/0x660
[32524.386540]  [<ffffffff814a05b0>] inet_ioctl+0x80/0xa0
[32524.386550]  [<ffffffff8140199d>] sock_do_ioctl+0x2d/0x60
[32524.386558]  [<ffffffff81401a52>] sock_ioctl+0x82/0x2a0
[32524.386568]  [<ffffffff811a7123>] do_vfs_ioctl+0x93/0x590
[32524.386578]  [<ffffffff811b2705>] ? rcu_read_lock_held+0x45/0x50
[32524.386586]  [<ffffffff811b39e5>] ? __fget_light+0x105/0x110
[32524.386594]  [<ffffffff811a76b1>] SyS_ioctl+0x91/0xb0
[32524.386604]  [<ffffffff815057e2>] system_call_fastpath+0x16/0x1b

========================================================================

The reason is that all of the addr_lock_key for vlan dev have the same class,
so if we change the status for vlan dev, the vlan dev and its real dev will
hold the same class of addr_lock_key together, so the warning happened.

we should distinguish the lock depth for vlan dev and its real dev.

v1->v2: Convert the vlan_netdev_addr_lock_key to an array of eight elements, which
	could support to add 8 vlan id on a same vlan dev, I think it is enough for current
	scene, because a netdev's name is limited to IFNAMSIZ which could not hold 8 vlan id,
	and the vlan dev would not meet the same class key with its real dev.

	The new function vlan_dev_get_lockdep_subkey() will return the subkey and make the vlan
	dev could get a suitable class key.

v2->v3: According David's suggestion, I use the subclass to distinguish the lock key for vlan dev
	and its real dev, but it make no sense, because the difference for subclass in the
	lock_class_key doesn't mean that the difference class for lock_key, so I use lock_depth
	to distinguish the different depth for every vlan dev, the same depth of the vlan dev
	could have the same lock_class_key, I import the MAX_LOCK_DEPTH from the include/linux/sched.h,
	I think it is enough here, the lockdep should never exceed that value.

v3->v4: Add a huge array of locking keys will waste static kernel memory and is not a appropriate method,
	we could use _nested() variants to fix the problem, calculate the depth for every vlan dev,
	and use the depth as the subclass for addr_lock_key.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
dsd pushed a commit that referenced this pull request Oct 9, 2014
[ Upstream commit b14878c ]

Currently, it is possible to create an SCTP socket, then switch
auth_enable via sysctl setting to 1 and crash the system on connect:

Oops[#1]:
CPU: 0 PID: 0 Comm: swapper Not tainted 3.14.1-mipsgit-20140415 #1
task: ffffffff8056ce80 ti: ffffffff8055c000 task.ti: ffffffff8055c000
[...]
Call Trace:
[<ffffffff8043c4e8>] sctp_auth_asoc_set_default_hmac+0x68/0x80
[<ffffffff8042b300>] sctp_process_init+0x5e0/0x8a4
[<ffffffff8042188c>] sctp_sf_do_5_1B_init+0x234/0x34c
[<ffffffff804228c8>] sctp_do_sm+0xb4/0x1e8
[<ffffffff80425a08>] sctp_endpoint_bh_rcv+0x1c4/0x214
[<ffffffff8043af68>] sctp_rcv+0x588/0x630
[<ffffffff8043e8e8>] sctp6_rcv+0x10/0x24
[<ffffffff803acb50>] ip6_input+0x2c0/0x440
[<ffffffff8030fc00>] __netif_receive_skb_core+0x4a8/0x564
[<ffffffff80310650>] process_backlog+0xb4/0x18c
[<ffffffff80313cbc>] net_rx_action+0x12c/0x210
[<ffffffff80034254>] __do_softirq+0x17c/0x2ac
[<ffffffff800345e0>] irq_exit+0x54/0xb0
[<ffffffff800075a4>] ret_from_irq+0x0/0x4
[<ffffffff800090ec>] rm7k_wait_irqoff+0x24/0x48
[<ffffffff8005e388>] cpu_startup_entry+0xc0/0x148
[<ffffffff805a88b0>] start_kernel+0x37c/0x398
Code: dd0900b8  000330f8  0126302d <dcc60000> 50c0fff1  0047182a  a48306a0
03e00008  00000000
---[ end trace b530b0551467f2fd ]---
Kernel panic - not syncing: Fatal exception in interrupt

What happens while auth_enable=0 in that case is, that
ep->auth_hmacs is initialized to NULL in sctp_auth_init_hmacs()
when endpoint is being created.

After that point, if an admin switches over to auth_enable=1,
the machine can crash due to NULL pointer dereference during
reception of an INIT chunk. When we enter sctp_process_init()
via sctp_sf_do_5_1B_init() in order to respond to an INIT chunk,
the INIT verification succeeds and while we walk and process
all INIT params via sctp_process_param() we find that
net->sctp.auth_enable is set, therefore do not fall through,
but invoke sctp_auth_asoc_set_default_hmac() instead, and thus,
dereference what we have set to NULL during endpoint
initialization phase.

The fix is to make auth_enable immutable by caching its value
during endpoint initialization, so that its original value is
being carried along until destruction. The bug seems to originate
from the very first days.

Fix in joint work with Daniel Borkmann.

Reported-by: Joshua Kinard <kumba@gentoo.org>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Tested-by: Joshua Kinard <kumba@gentoo.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
dsd pushed a commit that referenced this pull request Oct 9, 2014
[ Upstream commit 5a008ff ]

The MC8305 module got an additional entry added based solely on
information from a Windows driver *.inf file. We now have the
actual descriptor layout from one of these modules, and it
consists of two alternate configurations where cfg #1 is a
normal Gobi 2k layout and cfg #2 is MBIM only, using interface
numbers 5 and 6 for MBIM control and data. The extra Windows
driver entry for interface number 5 was most likely a bug.

Deleting the bogus entry to avoid unnecessary qmi_wwan probe
failures when using the MBIM configuration.

Reported-by: Lana Black <sickmind@lavabit.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
dsd pushed a commit that referenced this pull request Oct 9, 2014
commit aa07c71 upstream.

After setting ACL for directory, I got two problems that caused
by the cached zero-length default posix acl.

This patch make sure nfsd4_set_nfs4_acl calls ->set_acl
with a NULL ACL structure if there are no entries.

Thanks for Christoph Hellwig's advice.

First problem:
............ hang ...........

Second problem:
[ 1610.167668] ------------[ cut here ]------------
[ 1610.168320] kernel BUG at /root/nfs/linux/fs/nfsd/nfs4acl.c:239!
[ 1610.168320] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 1610.168320] Modules linked in: nfsv4(OE) nfs(OE) nfsd(OE)
rpcsec_gss_krb5 fscache ip6t_rpfilter ip6t_REJECT cfg80211 xt_conntrack
rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw
auth_rpcgss nfs_acl snd_intel8x0 ppdev lockd snd_ac97_codec ac97_bus
snd_pcm snd_timer e1000 pcspkr parport_pc snd parport serio_raw joydev
i2c_piix4 sunrpc(OE) microcode soundcore i2c_core ata_generic pata_acpi
[last unloaded: nfsd]
[ 1610.168320] CPU: 0 PID: 27397 Comm: nfsd Tainted: G           OE
3.15.0-rc1+ hardkernel#15
[ 1610.168320] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
VirtualBox 12/01/2006
[ 1610.168320] task: ffff88005ab653d0 ti: ffff88005a944000 task.ti:
ffff88005a944000
[ 1610.168320] RIP: 0010:[<ffffffffa034d5ed>]  [<ffffffffa034d5ed>]
_posix_to_nfsv4_one+0x3cd/0x3d0 [nfsd]
[ 1610.168320] RSP: 0018:ffff88005a945b00  EFLAGS: 00010293
[ 1610.168320] RAX: 0000000000000001 RBX: ffff88006700bac0 RCX:
0000000000000000
[ 1610.168320] RDX: 0000000000000000 RSI: ffff880067c83f00 RDI:
ffff880068233300
[ 1610.168320] RBP: ffff88005a945b48 R08: ffffffff81c64830 R09:
0000000000000000
[ 1610.168320] R10: ffff88004ea85be0 R11: 000000000000f475 R12:
ffff880068233300
[ 1610.168320] R13: 0000000000000003 R14: 0000000000000002 R15:
ffff880068233300
[ 1610.168320] FS:  0000000000000000(0000) GS:ffff880077800000(0000)
knlGS:0000000000000000
[ 1610.168320] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1610.168320] CR2: 00007f5bcbd3b0b9 CR3: 0000000001c0f000 CR4:
00000000000006f0
[ 1610.168320] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 1610.168320] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 1610.168320] Stack:
[ 1610.168320]  ffffffff00000000 0000000b67c83500 000000076700bac0
0000000000000000
[ 1610.168320]  ffff88006700bac0 ffff880068233300 ffff88005a945c08
0000000000000002
[ 1610.168320]  0000000000000000 ffff88005a945b88 ffffffffa034e2d5
000000065a945b68
[ 1610.168320] Call Trace:
[ 1610.168320]  [<ffffffffa034e2d5>] nfsd4_get_nfs4_acl+0x95/0x150 [nfsd]
[ 1610.168320]  [<ffffffffa03400d6>] nfsd4_encode_fattr+0x646/0x1e70 [nfsd]
[ 1610.168320]  [<ffffffff816a6e6e>] ? kmemleak_alloc+0x4e/0xb0
[ 1610.168320]  [<ffffffffa0327962>] ?
nfsd_setuser_and_check_port+0x52/0x80 [nfsd]
[ 1610.168320]  [<ffffffff812cd4bb>] ? selinux_cred_prepare+0x1b/0x30
[ 1610.168320]  [<ffffffffa0341caa>] nfsd4_encode_getattr+0x5a/0x60 [nfsd]
[ 1610.168320]  [<ffffffffa0341e07>] nfsd4_encode_operation+0x67/0x110
[nfsd]
[ 1610.168320]  [<ffffffffa033844d>] nfsd4_proc_compound+0x21d/0x810 [nfsd]
[ 1610.168320]  [<ffffffffa0324d9b>] nfsd_dispatch+0xbb/0x200 [nfsd]
[ 1610.168320]  [<ffffffffa00850cd>] svc_process_common+0x46d/0x6d0 [sunrpc]
[ 1610.168320]  [<ffffffffa0085433>] svc_process+0x103/0x170 [sunrpc]
[ 1610.168320]  [<ffffffffa032472f>] nfsd+0xbf/0x130 [nfsd]
[ 1610.168320]  [<ffffffffa0324670>] ? nfsd_destroy+0x80/0x80 [nfsd]
[ 1610.168320]  [<ffffffff810a5202>] kthread+0xd2/0xf0
[ 1610.168320]  [<ffffffff810a5130>] ? insert_kthread_work+0x40/0x40
[ 1610.168320]  [<ffffffff816c1ebc>] ret_from_fork+0x7c/0xb0
[ 1610.168320]  [<ffffffff810a5130>] ? insert_kthread_work+0x40/0x40
[ 1610.168320] Code: 78 02 e9 e7 fc ff ff 31 c0 31 d2 31 c9 66 89 45 ce
41 8b 04 24 66 89 55 d0 66 89 4d d2 48 8d 04 80 49 8d 5c 84 04 e9 37 fd
ff ff <0f> 0b 90 0f 1f 44 00 00 55 8b 56 08 c7 07 00 00 00 00 8b 46 0c
[ 1610.168320] RIP  [<ffffffffa034d5ed>] _posix_to_nfsv4_one+0x3cd/0x3d0
[nfsd]
[ 1610.168320]  RSP <ffff88005a945b00>
[ 1610.257313] ---[ end trace 838254e3e352285b ]---

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
dsd pushed a commit that referenced this pull request Oct 9, 2014
commit 72abc8f upstream.

I hit the same assert failed as Dolev Raviv reported in Kernel v3.10
shows like this:

[ 9641.164028] UBIFS assert failed in shrink_tnc at 131 (pid 13297)
[ 9641.234078] CPU: 1 PID: 13297 Comm: mmap.test Tainted: G           O 3.10.40 #1
[ 9641.234116] [<c0011a6c>] (unwind_backtrace+0x0/0x12c) from [<c000d0b0>] (show_stack+0x20/0x24)
[ 9641.234137] [<c000d0b0>] (show_stack+0x20/0x24) from [<c0311134>] (dump_stack+0x20/0x28)
[ 9641.234188] [<c0311134>] (dump_stack+0x20/0x28) from [<bf22425c>] (shrink_tnc_trees+0x25c/0x350 [ubifs])
[ 9641.234265] [<bf22425c>] (shrink_tnc_trees+0x25c/0x350 [ubifs]) from [<bf2245ac>] (ubifs_shrinker+0x25c/0x310 [ubifs])
[ 9641.234307] [<bf2245ac>] (ubifs_shrinker+0x25c/0x310 [ubifs]) from [<c00cdad8>] (shrink_slab+0x1d4/0x2f8)
[ 9641.234327] [<c00cdad8>] (shrink_slab+0x1d4/0x2f8) from [<c00d03d0>] (do_try_to_free_pages+0x300/0x544)
[ 9641.234344] [<c00d03d0>] (do_try_to_free_pages+0x300/0x544) from [<c00d0a44>] (try_to_free_pages+0x2d0/0x398)
[ 9641.234363] [<c00d0a44>] (try_to_free_pages+0x2d0/0x398) from [<c00c6a60>] (__alloc_pages_nodemask+0x494/0x7e8)
[ 9641.234382] [<c00c6a60>] (__alloc_pages_nodemask+0x494/0x7e8) from [<c00f62d8>] (new_slab+0x78/0x238)
[ 9641.234400] [<c00f62d8>] (new_slab+0x78/0x238) from [<c031081c>] (__slab_alloc.constprop.42+0x1a4/0x50c)
[ 9641.234419] [<c031081c>] (__slab_alloc.constprop.42+0x1a4/0x50c) from [<c00f80e8>] (kmem_cache_alloc_trace+0x54/0x188)
[ 9641.234459] [<c00f80e8>] (kmem_cache_alloc_trace+0x54/0x188) from [<bf227908>] (do_readpage+0x168/0x468 [ubifs])
[ 9641.234553] [<bf227908>] (do_readpage+0x168/0x468 [ubifs]) from [<bf2296a0>] (ubifs_readpage+0x424/0x464 [ubifs])
[ 9641.234606] [<bf2296a0>] (ubifs_readpage+0x424/0x464 [ubifs]) from [<c00c17c0>] (filemap_fault+0x304/0x418)
[ 9641.234638] [<c00c17c0>] (filemap_fault+0x304/0x418) from [<c00de694>] (__do_fault+0xd4/0x530)
[ 9641.234665] [<c00de694>] (__do_fault+0xd4/0x530) from [<c00e10c0>] (handle_pte_fault+0x480/0xf54)
[ 9641.234690] [<c00e10c0>] (handle_pte_fault+0x480/0xf54) from [<c00e2bf8>] (handle_mm_fault+0x140/0x184)
[ 9641.234716] [<c00e2bf8>] (handle_mm_fault+0x140/0x184) from [<c0316688>] (do_page_fault+0x150/0x3ac)
[ 9641.234737] [<c0316688>] (do_page_fault+0x150/0x3ac) from [<c000842c>] (do_DataAbort+0x3c/0xa0)
[ 9641.234759] [<c000842c>] (do_DataAbort+0x3c/0xa0) from [<c0314e38>] (__dabt_usr+0x38/0x40)

After analyzing the code, I found a condition that may cause this failed
in correct operations. Thus, I think this assertion is wrong and should be
removed.

Suppose there are two clean znodes and one dirty znode in TNC. So the
per-filesystem atomic_t @clean_zn_cnt is (2). If commit start, dirty_znode
is set to COW_ZNODE in get_znodes_to_commit() in case of potentially ops
on this znode. We clear COW bit and DIRTY bit in write_index() without
@tnc_mutex locked. We don't increase @clean_zn_cnt in this place. As the
comments in write_index() shows, if another process hold @tnc_mutex and
dirty this znode after we clean it, @clean_zn_cnt would be decreased to (1).
We will increase @clean_zn_cnt to (2) with @tnc_mutex locked in
free_obsolete_znodes() to keep it right.

If shrink_tnc() performs between decrease and increase, it will release
other 2 clean znodes it holds and found @clean_zn_cnt is less than zero
(1 - 2 = -1), then hit the assertion. Because free_obsolete_znodes() will
soon correct @clean_zn_cnt and no harm to fs in this case, I think this
assertion could be removed.

2 clean zondes and 1 dirty znode, @clean_zn_cnt == 2

Thread A (commit)         Thread B (write or others)       Thread C (shrinker)
->write_index
   ->clear_bit(DIRTY_NODE)
   ->clear_bit(COW_ZNODE)

            @clean_zn_cnt == 2
                          ->mutex_locked(&tnc_mutex)
                          ->dirty_cow_znode
                              ->!ubifs_zn_cow(znode)
                              ->!test_and_set_bit(DIRTY_NODE)
                              ->atomic_dec(&clean_zn_cnt)
                          ->mutex_unlocked(&tnc_mutex)

            @clean_zn_cnt == 1
                                                           ->mutex_locked(&tnc_mutex)
                                                           ->shrink_tnc
                                                             ->destroy_tnc_subtree
                                                             ->atomic_sub(&clean_zn_cnt, 2)
                                                             ->ubifs_assert  <- hit
                                                           ->mutex_unlocked(&tnc_mutex)

            @clean_zn_cnt == -1
->mutex_lock(&tnc_mutex)
->free_obsolete_znodes
   ->atomic_inc(&clean_zn_cnt)
->mutux_unlock(&tnc_mutex)

            @clean_zn_cnt == 0 (correct after shrink)

Signed-off-by: hujianyang <hujianyang@huawei.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
dsd pushed a commit that referenced this pull request Oct 9, 2014
commit 60e1751 upstream.

Avoid that closing /dev/infiniband/umad<n> or /dev/infiniband/issm<n>
triggers a use-after-free.  __fput() invokes f_op->release() before it
invokes cdev_put().  Make sure that the ib_umad_device structure is
freed by the cdev_put() call instead of f_op->release().  This avoids
that changing the port mode from IB into Ethernet and back to IB
followed by restarting opensmd triggers the following kernel oops:

    general protection fault: 0000 [#1] PREEMPT SMP
    RIP: 0010:[<ffffffff810cc65c>]  [<ffffffff810cc65c>] module_put+0x2c/0x170
    Call Trace:
     [<ffffffff81190f20>] cdev_put+0x20/0x30
     [<ffffffff8118e2ce>] __fput+0x1ae/0x1f0
     [<ffffffff8118e35e>] ____fput+0xe/0x10
     [<ffffffff810723bc>] task_work_run+0xac/0xe0
     [<ffffffff81002a9f>] do_notify_resume+0x9f/0xc0
     [<ffffffff814b8398>] int_signal+0x12/0x17

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=75051
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
dsd pushed a commit that referenced this pull request Oct 9, 2014
commit 0430e49 upstream.

Commit 8aac627 "move exit_task_namespaces() outside of exit_notify"
introduced the kernel opps since the kernel v3.10, which happens when
Apparmor and IMA-appraisal are enabled at the same time.

----------------------------------------------------------------------
[  106.750167] BUG: unable to handle kernel NULL pointer dereference at
0000000000000018
[  106.750221] IP: [<ffffffff811ec7da>] our_mnt+0x1a/0x30
[  106.750241] PGD 0
[  106.750254] Oops: 0000 [#1] SMP
[  106.750272] Modules linked in: cuse parport_pc ppdev bnep rfcomm
bluetooth rpcsec_gss_krb5 nfsd auth_rpcgss nfs_acl nfs lockd sunrpc
fscache dm_crypt intel_rapl x86_pkg_temp_thermal intel_powerclamp
kvm_intel snd_hda_codec_hdmi kvm crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul
ablk_helper cryptd snd_hda_codec_realtek dcdbas snd_hda_intel
snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_seq_midi
snd_seq_midi_event snd_rawmidi psmouse snd_seq microcode serio_raw
snd_timer snd_seq_device snd soundcore video lpc_ich coretemp mac_hid lp
parport mei_me mei nbd hid_generic e1000e usbhid ahci ptp hid libahci
pps_core
[  106.750658] CPU: 6 PID: 1394 Comm: mysqld Not tainted 3.13.0-rc7-kds+ hardkernel#15
[  106.750673] Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A08
09/19/2012
[  106.750689] task: ffff8800de804920 ti: ffff880400fca000 task.ti:
ffff880400fca000
[  106.750704] RIP: 0010:[<ffffffff811ec7da>]  [<ffffffff811ec7da>]
our_mnt+0x1a/0x30
[  106.750725] RSP: 0018:ffff880400fcba60  EFLAGS: 00010286
[  106.750738] RAX: 0000000000000000 RBX: 0000000000000100 RCX:
ffff8800d51523e7
[  106.750764] RDX: ffffffffffffffea RSI: ffff880400fcba34 RDI:
ffff880402d20020
[  106.750791] RBP: ffff880400fcbae0 R08: 0000000000000000 R09:
0000000000000001
[  106.750817] R10: 0000000000000000 R11: 0000000000000001 R12:
ffff8800d5152300
[  106.750844] R13: ffff8803eb8df510 R14: ffff880400fcbb28 R15:
ffff8800d51523e7
[  106.750871] FS:  0000000000000000(0000) GS:ffff88040d200000(0000)
knlGS:0000000000000000
[  106.750910] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  106.750935] CR2: 0000000000000018 CR3: 0000000001c0e000 CR4:
00000000001407e0
[  106.750962] Stack:
[  106.750981]  ffffffff813434eb ffff880400fcbb20 ffff880400fcbb18
0000000000000000
[  106.751037]  ffff8800de804920 ffffffff8101b9b9 0001800000000000
0000000000000100
[  106.751093]  0000010000000000 0000000000000002 000000000000000e
ffff8803eb8df500
[  106.751149] Call Trace:
[  106.751172]  [<ffffffff813434eb>] ? aa_path_name+0x2ab/0x430
[  106.751199]  [<ffffffff8101b9b9>] ? sched_clock+0x9/0x10
[  106.751225]  [<ffffffff8134a68d>] aa_path_perm+0x7d/0x170
[  106.751250]  [<ffffffff8101b945>] ? native_sched_clock+0x15/0x80
[  106.751276]  [<ffffffff8134aa73>] aa_file_perm+0x33/0x40
[  106.751301]  [<ffffffff81348c5e>] common_file_perm+0x8e/0xb0
[  106.751327]  [<ffffffff81348d78>] apparmor_file_permission+0x18/0x20
[  106.751355]  [<ffffffff8130c853>] security_file_permission+0x23/0xa0
[  106.751382]  [<ffffffff811c77a2>] rw_verify_area+0x52/0xe0
[  106.751407]  [<ffffffff811c789d>] vfs_read+0x6d/0x170
[  106.751432]  [<ffffffff811cda31>] kernel_read+0x41/0x60
[  106.751457]  [<ffffffff8134fd45>] ima_calc_file_hash+0x225/0x280
[  106.751483]  [<ffffffff8134fb52>] ? ima_calc_file_hash+0x32/0x280
[  106.751509]  [<ffffffff8135022d>] ima_collect_measurement+0x9d/0x160
[  106.751536]  [<ffffffff810b552d>] ? trace_hardirqs_on+0xd/0x10
[  106.751562]  [<ffffffff8134f07c>] ? ima_file_free+0x6c/0xd0
[  106.751587]  [<ffffffff81352824>] ima_update_xattr+0x34/0x60
[  106.751612]  [<ffffffff8134f0d0>] ima_file_free+0xc0/0xd0
[  106.751637]  [<ffffffff811c9635>] __fput+0xd5/0x300
[  106.751662]  [<ffffffff811c98ae>] ____fput+0xe/0x10
[  106.751687]  [<ffffffff81086774>] task_work_run+0xc4/0xe0
[  106.751712]  [<ffffffff81066fad>] do_exit+0x2bd/0xa90
[  106.751738]  [<ffffffff8173c958>] ? retint_swapgs+0x13/0x1b
[  106.751763]  [<ffffffff8106780c>] do_group_exit+0x4c/0xc0
[  106.751788]  [<ffffffff81067894>] SyS_exit_group+0x14/0x20
[  106.751814]  [<ffffffff8174522d>] system_call_fastpath+0x1a/0x1f
[  106.751839] Code: c3 0f 1f 44 00 00 55 48 89 e5 e8 22 fe ff ff 5d c3
0f 1f 44 00 00 55 65 48 8b 04 25 c0 c9 00 00 48 8b 80 28 06 00 00 48 89
e5 5d <48> 8b 40 18 48 39 87 c0 00 00 00 0f 94 c0 c3 0f 1f 80 00 00 00
[  106.752185] RIP  [<ffffffff811ec7da>] our_mnt+0x1a/0x30
[  106.752214]  RSP <ffff880400fcba60>
[  106.752236] CR2: 0000000000000018
[  106.752258] ---[ end trace 3c520748b4732721 ]---
----------------------------------------------------------------------

The reason for the oops is that IMA-appraisal uses "kernel_read()" when
file is closed. kernel_read() honors LSM security hook which calls
Apparmor handler, which uses current->nsproxy->mnt_ns. The 'guilty'
commit changed the order of cleanup code so that nsproxy->mnt_ns was
not already available for Apparmor.

Discussion about the issue with Al Viro and Eric W. Biederman suggested
that kernel_read() is too high-level for IMA. Another issue, except
security checking, that was identified is mandatory locking. kernel_read
honors it as well and it might prevent IMA from calculating necessary hash.
It was suggested to use simplified version of the function without security
and locking checks.

This patch introduces special version ima_kernel_read(), which skips security
and mandatory locking checking. It prevents the kernel oops to happen.

Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants