Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot boot using vhost-scsi controller. #2

Closed
cuinutanix opened this issue May 4, 2017 · 4 comments · May be fixed by open-power-host-os/slof#1
Closed

Cannot boot using vhost-scsi controller. #2

cuinutanix opened this issue May 4, 2017 · 4 comments · May be fixed by open-power-host-os/slof#1
Assignees

Comments

@cuinutanix
Copy link

cuinutanix commented May 4, 2017

After SLOF successfully probes scsi disks on a virtio-scsi controller, it sends a write to the PCI_COMMAND register with value 0x0 which disables the PCI device, which qemu translates to virtio_set_status(vdev, vdev->status & ~VIRTIO_CONFIG_S_DRIVER_OK); which tells the vhost backend that the driver wants the device stopped.

qemu's own implementation of virtio-scsi seems to ignore the fact that the pci device is disabled, and will happily continue to process scsi requests from the ring, and guests are able to boot. However, with a vhost-scsi backend, qemu tells the vhost backend to stop processing the ring, and the guest does not boot.

Here is a gdb stack trace of when qemu disables the vhost-scsi device:

#0  0x00000000202f63e8 in vhost_user_scsi_set_status (vdev=0x2173ffb0, 
    status=11 '\v')
    at /home/mcui/workspace/NXPower/qemu/hw/scsi/vhost-user-scsi.c:43
#1  0x000000002030fd9c in virtio_set_status (vdev=0x2173ffb0, val=11 '\v')
    at /home/mcui/workspace/NXPower/qemu/hw/virtio/virtio.c:898
#2  0x00000000205c4b38 in virtio_write_config (pci_dev=0x21737ba0, address=4, 
    val=1048832, len=4) at hw/virtio/virtio-pci.c:633
#3  0x000000002055bcec in pci_host_config_write_common (pci_dev=0x21737ba0, 
    addr=<optimized out>, limit=<optimized out>, val=<optimized out>, 
    len=<optimized out>) at hw/pci/pci_host.c:66
#4  0x0000000020341810 in finish_write_pci_config (spapr=<optimized out>, 
    buid=<optimized out>, addr=<optimized out>, size=<optimized out>, 
    val=<optimized out>, rets=2121530032)
    at /home/mcui/workspace/NXPower/qemu/hw/ppc/spapr_pci.c:200
#5  0x000000002033c99c in spapr_rtas_call (cpu=<optimized out>, 
    spapr=<optimized out>, token=<optimized out>, nargs=<optimized out>, 
    args=<optimized out>, nret=<optimized out>, rets=<optimized out>)
    at /home/mcui/workspace/NXPower/qemu/hw/ppc/spapr_rtas.c:665
#6  0x0000000020336784 in h_rtas (cpu=0x3fffb6260010, spapr=0x210aee80, 
    opcode=<optimized out>, args=<optimized out>)
    at /home/mcui/workspace/NXPower/qemu/hw/ppc/spapr_hcall.c:667
#7  0x0000000020339338 in spapr_hypercall (cpu=0x3fffb6260010, opcode=61440, 
    args=0x3fffb6200030)
    at /home/mcui/workspace/NXPower/qemu/hw/ppc/spapr_hcall.c:1243
#8  0x000000002040c6b4 in kvm_arch_handle_exit (cs=0x3fffb6260010, 
    run=0x3fffb6200000)
    at /home/mcui/workspace/NXPower/qemu/target-ppc/kvm.c:1817
#9  0x00000000202a25f8 in kvm_cpu_exec (cpu=0x3fffb6260010)
    at /home/mcui/workspace/NXPower/qemu/kvm-all.c:2016
#10 0x0000000020288350 in qemu_kvm_cpu_thread_fn (arg=0x3fffb6260010)
    at /home/mcui/workspace/NXPower/qemu/cpus.c:998
#11 0x00003fffb7488728 in start_thread () from /lib64/libpthread.so.0
#12 0x00003fffb73bd210 in clone () from /lib64/libc.so.6

You would need to set up vhost-scsi in order to reproduce the exact same issue. You can observe that the VIRTIO_CONFIG_S_DRIVER_OK status bit being cleared even with a standard virtio-scsi backend by setting a breakpoint on virtio_set_status() and observe that the status goes from 0 to 0xf (fully functional), then after the bus scan completes, the status is set to 0xb, with VIRTIO_CONFIG_S_DRIVER_OK cleared.

@mdroth
Copy link

mdroth commented May 4, 2017

the register value in the trace seems to be disabling io/mem too? that could be pci-device-disable in board-qemu/slof/pci-device_1af4_1004.fs

i wonder if simply removing the pci-device-disable from that file allow the boot to continue?

@cuinutanix
Copy link
Author

No it doesn't. It just removes 1 call to pci-device-disable but there are others, from pci-device.fs:

\ prepare the device for subsequent use
\ this word should be overloaded by the device file (if present)
\ the device file can call this file before implementing
\ its own open functionality
: open
        puid >r             \ save the old puid
        my-puid TO puid     \ set up the puid to the devices Hostbridge
        pci-master-enable   \ And enable Bus Master, IO and MEM access again.
        pci-mem-enable      \ enable mem access
        pci-io-enable       \ enable io access
        r> TO puid          \ restore puid
        true
;

\ close the previously opened device
\ this word should be overloaded by the device file (if present)
\ the device file can call this file after its implementation
\ of own close functionality
: close 
        puid >r             \ save the old puid
        my-puid TO puid     \ set up the puid
        pci-device-disable  \ and disable the device
        r> TO puid          \ restore puid
;

Node that the open method enables the device, but that's not getting called.

@cuinutanix
Copy link
Author

OK, I think I figured it out, clue is in the comment:

\ prepare the device for subsequent use
\ this word should be overloaded by the device file (if present)
\ the device file can call this file before implementing
\ its own open functionality

virtio-scsi.fs does not have the "open" word.

@laggarcia
Copy link
Member

Closing as per previous comment.

cuinutanix referenced this issue in NXPower/qemu May 17, 2017
RH-Author: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: <20160704205732.27022-6-marcandre.lureau@redhat.com>
Patchwork-id: 70942
O-Subject: [RHEV-7.3 qemu-kvm-rhev PATCH v2 5/5] char: do not use atexit cleanup handler
Bugzilla: 1347077
RH-Acked-by: Xiao Wang <jasowang@redhat.com>
RH-Acked-by: Laurent Vivier <lvivier@redhat.com>
RH-Acked-by: Victor Kaplansky <vkaplans@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>

From: Marc-André Lureau <marcandre.lureau@redhat.com>

It turns out qemu is calling exit() in various places from various
threads without taking much care of resources state. The atexit()
cleanup handlers cannot easily destroy resources that are in use (by
the same thread or other).

Since c1111a2, TCG arm guests run into the following abort() when
running tests, the chardev mutex is locked during the write, so
qemu_mutex_destroy() returns an error:

 #0  0x00007fffdbb806f5 in raise () at /lib64/libc.so.6
 #1  0x00007fffdbb822fa in abort () at /lib64/libc.so.6
 #2  0x00005555557616fe in error_exit (err=<optimized out>, msg=msg@entry=0x555555c38c30 <__func__.14622> "qemu_mutex_destroy")
     at /home/drjones/code/qemu/util/qemu-thread-posix.c:39
 #3  0x0000555555b0be20 in qemu_mutex_destroy (mutex=mutex@entry=0x5555566aa0e0) at /home/drjones/code/qemu/util/qemu-thread-posix.c:57
 open-power-host-os#4  0x00005555558aab00 in qemu_chr_free_common (chr=0x5555566aa0e0) at /home/drjones/code/qemu/qemu-char.c:4029
 open-power-host-os#5  0x00005555558b05f9 in qemu_chr_delete (chr=<optimized out>) at /home/drjones/code/qemu/qemu-char.c:4038
 open-power-host-os#6  0x00005555558b05f9 in qemu_chr_delete (chr=<optimized out>) at /home/drjones/code/qemu/qemu-char.c:4044
 open-power-host-os#7  0x00005555558b062c in qemu_chr_cleanup () at /home/drjones/code/qemu/qemu-char.c:4557
 open-power-host-os#8  0x00007fffdbb851e8 in __run_exit_handlers () at /lib64/libc.so.6
 open-power-host-os#9  0x00007fffdbb85235 in  () at /lib64/libc.so.6
 open-power-host-os#10 0x00005555558d1b39 in testdev_write (testdev=0x5555566aa0a0) at /home/drjones/code/qemu/backends/testdev.c:71
 open-power-host-os#11 0x00005555558d1b39 in testdev_write (chr=<optimized out>, buf=0x7fffc343fd9a "", len=0) at /home/drjones/code/qemu/backends/testdev.c:95
 open-power-host-os#12 0x00005555558adced in qemu_chr_fe_write (s=0x5555566aa0e0, buf=buf@entry=0x7fffc343fd98 "0q", len=len@entry=2) at /home/drjones/code/qemu/qemu-char.c:282

Instead of using a atexit() handler, only run the chardev cleanup as
initially proposed at the end of main(), where there are less chances
(hic) of conflicts or other races.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reported-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

(cherry picked from commit dd9250d5d99a2671584f65b7d0745355bbbe353f)
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
cuinutanix referenced this issue in NXPower/qemu May 17, 2017
RH-Author: Markus Armbruster <armbru@redhat.com>
Message-id: <1469602394-4025-2-git-send-email-armbru@redhat.com>
Patchwork-id: 71467
O-Subject: [RHEV-7.3 qemu-kvm-rhev PATCH] block: drop support for using qcow[2] encryption with system emulators
Bugzilla: 1336659
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Acked-by: Laurent Vivier <lvivier@redhat.com>
RH-Acked-by: Daniel P. Berrange <berrange@redhat.com>

From: "Daniel P. Berrange" <berrange@redhat.com>

Back in the 2.3.0 release we declared qcow[2] encryption as
deprecated, warning people that it would be removed in a future
release.

  commit a1f688f
  Author: Markus Armbruster <armbru@redhat.com>
  Date:   Fri Mar 13 21:09:40 2015 +0100

    block: Deprecate QCOW/QCOW2 encryption

The code still exists today, but by a (happy?) accident we entirely
broke the ability to use qcow[2] encryption in the system emulators
in the 2.4.0 release due to

  commit 8336aaf
  Author: Daniel P. Berrange <berrange@redhat.com>
  Date:   Tue May 12 17:09:18 2015 +0100

    qcow2/qcow: protect against uninitialized encryption key

This commit was designed to prevent future coding bugs which
might cause QEMU to read/write data on an encrypted block
device in plain text mode before a decryption key is set.

It turns out this preventative measure was a little too good,
because we already had a long standing bug where QEMU read
encrypted data in plain text mode during system emulator
startup, in order to guess disk geometry:

  Thread 10 (Thread 0x7fffd3fff700 (LWP 30373)):
  #0  0x00007fffe90b1a28 in raise () at /lib64/libc.so.6
  #1  0x00007fffe90b362a in abort () at /lib64/libc.so.6
  #2  0x00007fffe90aa227 in __assert_fail_base () at /lib64/libc.so.6
  #3  0x00007fffe90aa2d2 in  () at /lib64/libc.so.6
  open-power-host-os#4  0x000055555587ae19 in qcow2_co_readv (bs=0x5555562accb0, sector_num=0, remaining_sectors=1, qiov=0x7fffffffd260) at block/qcow2.c:1229
  open-power-host-os#5  0x000055555589b60d in bdrv_aligned_preadv (bs=bs@entry=0x5555562accb0, req=req@entry=0x7fffd3ffea50, offset=offset@entry=0, bytes=bytes@entry=512, align=align@entry=512, qiov=qiov@entry=0x7fffffffd260, flags=0) at block/io.c:908
  open-power-host-os#6  0x000055555589b8bc in bdrv_co_do_preadv (bs=0x5555562accb0, offset=0, bytes=512, qiov=0x7fffffffd260, flags=<optimized out>) at block/io.c:999
  open-power-host-os#7  0x000055555589c375 in bdrv_rw_co_entry (opaque=0x7fffffffd210) at block/io.c:544
  open-power-host-os#8  0x000055555586933b in coroutine_thread (opaque=0x555557876310) at coroutine-gthread.c:134
  open-power-host-os#9  0x00007ffff64e1835 in g_thread_proxy (data=0x5555562b5590) at gthread.c:778
  open-power-host-os#10 0x00007ffff6bb760a in start_thread () at /lib64/libpthread.so.0
  open-power-host-os#11 0x00007fffe917f59d in clone () at /lib64/libc.so.6

  Thread 1 (Thread 0x7ffff7ecab40 (LWP 30343)):
  #0  0x00007fffe91797a9 in syscall () at /lib64/libc.so.6
  #1  0x00007ffff64ff87f in g_cond_wait (cond=cond@entry=0x555555e085f0 <coroutine_cond>, mutex=mutex@entry=0x555555e08600 <coroutine_lock>) at gthread-posix.c:1397
  #2  0x00005555558692c3 in qemu_coroutine_switch (co=<optimized out>) at coroutine-gthread.c:117
  #3  0x00005555558692c3 in qemu_coroutine_switch (from_=0x5555562b5e30, to_=to_@entry=0x555557876310, action=action@entry=COROUTINE_ENTER) at coroutine-gthread.c:175
  open-power-host-os#4  0x0000555555868a90 in qemu_coroutine_enter (co=0x555557876310, opaque=0x0) at qemu-coroutine.c:116
  open-power-host-os#5  0x0000555555859b84 in thread_pool_completion_bh (opaque=0x7fffd40010e0) at thread-pool.c:187
  open-power-host-os#6  0x0000555555859514 in aio_bh_poll (ctx=ctx@entry=0x5555562953b0) at async.c:85
  open-power-host-os#7  0x0000555555864d10 in aio_dispatch (ctx=ctx@entry=0x5555562953b0) at aio-posix.c:135
  open-power-host-os#8  0x0000555555864f75 in aio_poll (ctx=ctx@entry=0x5555562953b0, blocking=blocking@entry=true) at aio-posix.c:291
  open-power-host-os#9  0x000055555589c40d in bdrv_prwv_co (bs=bs@entry=0x5555562accb0, offset=offset@entry=0, qiov=qiov@entry=0x7fffffffd260, is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at block/io.c:591
  open-power-host-os#10 0x000055555589c503 in bdrv_rw_co (bs=bs@entry=0x5555562accb0, sector_num=sector_num@entry=0, buf=buf@entry=0x7fffffffd2e0 "\321,", nb_sectors=nb_sectors@entry=21845, is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at block/io.c:614
  open-power-host-os#11 0x000055555589c562 in bdrv_read_unthrottled (nb_sectors=21845, buf=0x7fffffffd2e0 "\321,", sector_num=0, bs=0x5555562accb0) at block/io.c:622
  open-power-host-os#12 0x000055555589c562 in bdrv_read_unthrottled (bs=0x5555562accb0, sector_num=sector_num@entry=0, buf=buf@entry=0x7fffffffd2e0 "\321,", nb_sectors=nb_sectors@entry=21845) at block/io.c:634
    nb_sectors@entry=1) at block/block-backend.c:504
  open-power-host-os#14 0x0000555555752e9f in guess_disk_lchs (blk=blk@entry=0x5555562a5290, pcylinders=pcylinders@entry=0x7fffffffd52c, pheads=pheads@entry=0x7fffffffd530, psectors=psectors@entry=0x7fffffffd534) at hw/block/hd-geometry.c:68
  open-power-host-os#15 0x0000555555752ff7 in hd_geometry_guess (blk=0x5555562a5290, pcyls=pcyls@entry=0x555557875d1c, pheads=pheads@entry=0x555557875d20, psecs=psecs@entry=0x555557875d24, ptrans=ptrans@entry=0x555557875d28) at hw/block/hd-geometry.c:133
  open-power-host-os#16 0x0000555555752b87 in blkconf_geometry (conf=conf@entry=0x555557875d00, ptrans=ptrans@entry=0x555557875d28, cyls_max=cyls_max@entry=65536, heads_max=heads_max@entry=16, secs_max=secs_max@entry=255, errp=errp@entry=0x7fffffffd5e0) at hw/block/block.c:71
  open-power-host-os#17 0x0000555555799bc4 in ide_dev_initfn (dev=0x555557875c80, kind=IDE_HD) at hw/ide/qdev.c:174
  open-power-host-os#18 0x0000555555768394 in device_realize (dev=0x555557875c80, errp=0x7fffffffd640) at hw/core/qdev.c:247
  open-power-host-os#19 0x0000555555769a81 in device_set_realized (obj=0x555557875c80, value=<optimized out>, errp=0x7fffffffd730) at hw/core/qdev.c:1058
  open-power-host-os#20 0x00005555558240ce in property_set_bool (obj=0x555557875c80, v=<optimized out>, opaque=0x555557875de0, name=<optimized out>, errp=0x7fffffffd730)
        at qom/object.c:1514
  open-power-host-os#21 0x0000555555826c87 in object_property_set_qobject (obj=obj@entry=0x555557875c80, value=value@entry=0x55555784bcb0, name=name@entry=0x55555591cb3d "realized", errp=errp@entry=0x7fffffffd730) at qom/qom-qobject.c:24
  open-power-host-os#22 0x0000555555825760 in object_property_set_bool (obj=obj@entry=0x555557875c80, value=value@entry=true, name=name@entry=0x55555591cb3d "realized", errp=errp@entry=0x7fffffffd730) at qom/object.c:905
  open-power-host-os#23 0x000055555576897b in qdev_init_nofail (dev=dev@entry=0x555557875c80) at hw/core/qdev.c:380
  open-power-host-os#24 0x0000555555799ead in ide_create_drive (bus=bus@entry=0x555557629630, unit=unit@entry=0, drive=0x5555562b77e0) at hw/ide/qdev.c:122
  open-power-host-os#25 0x000055555579a746 in pci_ide_create_devs (dev=dev@entry=0x555557628db0, hd_table=hd_table@entry=0x7fffffffd830) at hw/ide/pci.c:440
  open-power-host-os#26 0x000055555579b165 in pci_piix3_ide_init (bus=<optimized out>, hd_table=0x7fffffffd830, devfn=<optimized out>) at hw/ide/piix.c:218
  open-power-host-os#27 0x000055555568ca55 in pc_init1 (machine=0x5555562960a0, pci_enabled=1, kvmclock_enabled=<optimized out>) at /home/berrange/src/virt/qemu/hw/i386/pc_piix.c:256
  open-power-host-os#28 0x0000555555603ab2 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4249

So the safety net is correctly preventing QEMU reading cipher
text as if it were plain text, during startup and aborting QEMU
to avoid bad usage of this data.

For added fun this bug only happens if the encrypted qcow2
file happens to have data written to the first cluster,
otherwise the cluster won't be allocated and so qcow2 would
not try the decryption routines at all, just return all 0's.

That no one even noticed, let alone reported, this bug that
has shipped in 2.4.0, 2.5.0 and 2.6.0 shows that the number
of actual users of encrypted qcow2 is approximately zero.

So rather than fix the crash, and backport it to stable
releases, just go ahead with what we have warned users about
and disable any use of qcow2 encryption in the system
emulators. qemu-img/qemu-io/qemu-nbd are still able to access
qcow2 encrypted images for the sake of data conversion.

In the future, qcow2 will gain support for the alternative
luks format, but when this happens it'll be using the
'-object secret' infrastructure for getting keys, which
avoids this problematic scenario entirely.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 8c0dcbc)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
cuinutanix referenced this issue in NXPower/qemu May 17, 2017
RH-Author: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: <20160810200144.31810-2-marcandre.lureau@redhat.com>
Patchwork-id: 71914
O-Subject: [RHEV-7.3 qemu-kvm-rhev PATCH 1/2] monitor: fix crash when leaving qemu with spice audio
Bugzilla: 1355704
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Markus Armbruster <armbru@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>

Since aa5cb7f, the chardevs are being cleaned up when leaving
qemu. However, the monitor has still references to them, which may
lead to crashes when running atexit() and trying to send monitor
events:

 #0  0x00007fffdb18f6f5 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
 #1  0x00007fffdb1912fa in __GI_abort () at abort.c:89
 #2  0x0000555555c263e7 in error_exit (err=22, msg=0x555555d47980 <__func__.13537> "qemu_mutex_lock") at util/qemu-thread-posix.c:39
 #3  0x0000555555c26488 in qemu_mutex_lock (mutex=0x5555567a2420) at util/qemu-thread-posix.c:66
 open-power-host-os#4  0x00005555558c52db in qemu_chr_fe_write (s=0x5555567a2420, buf=0x55555740dc40 "{\"timestamp\": {\"seconds\": 1470041716, \"microseconds\": 989699}, \"event\": \"SPICE_DISCONNECTED\", \"data\": {\"server\": {\"port\": \"5900\", \"family\": \"ipv4\", \"host\": \"127.0.0.1\"}, \"client\": {\"port\": \"40272\", \"f"..., len=240) at qemu-char.c:280
 open-power-host-os#5  0x0000555555787cad in monitor_flush_locked (mon=0x5555567bd9e0) at /home/elmarco/src/qemu/monitor.c:311
 open-power-host-os#6  0x0000555555787e46 in monitor_puts (mon=0x5555567bd9e0, str=0x5555567a44ef "") at /home/elmarco/src/qemu/monitor.c:353
 open-power-host-os#7  0x00005555557880fe in monitor_json_emitter (mon=0x5555567bd9e0, data=0x5555567c73a0) at /home/elmarco/src/qemu/monitor.c:401
 open-power-host-os#8  0x00005555557882d2 in monitor_qapi_event_emit (event=QAPI_EVENT_SPICE_DISCONNECTED, qdict=0x5555567c73a0) at /home/elmarco/src/qemu/monitor.c:472
 open-power-host-os#9  0x000055555578838f in monitor_qapi_event_queue (event=QAPI_EVENT_SPICE_DISCONNECTED, qdict=0x5555567c73a0, errp=0x7fffffffca88) at /home/elmarco/src/qemu/monitor.c:497
 open-power-host-os#10 0x0000555555c15541 in qapi_event_send_spice_disconnected (server=0x5555571139d0, client=0x5555570d0db0, errp=0x5555566c0428 <error_abort>) at qapi-event.c:1038
 open-power-host-os#11 0x0000555555b11bc6 in channel_event (event=3, info=0x5555570d6c00) at ui/spice-core.c:248
 open-power-host-os#12 0x00007fffdcc9983a in adapter_channel_event (event=3, info=0x5555570d6c00) at reds.c:120
 open-power-host-os#13 0x00007fffdcc99a25 in reds_handle_channel_event (reds=0x5555567a9d60, event=3, info=0x5555570d6c00) at reds.c:324
 open-power-host-os#14 0x00007fffdcc7d4c4 in main_dispatcher_self_handle_channel_event (self=0x5555567b28b0, event=3, info=0x5555570d6c00) at main-dispatcher.c:175
 open-power-host-os#15 0x00007fffdcc7d5b1 in main_dispatcher_channel_event (self=0x5555567b28b0, event=3, info=0x5555570d6c00) at main-dispatcher.c:194
 open-power-host-os#16 0x00007fffdcca7674 in reds_stream_push_channel_event (s=0x5555570d9910, event=3) at reds-stream.c:354
 open-power-host-os#17 0x00007fffdcca749b in reds_stream_free (s=0x5555570d9910) at reds-stream.c:323
 open-power-host-os#18 0x00007fffdccb5dad in snd_disconnect_channel (channel=0x5555576a89a0) at sound.c:229
 open-power-host-os#19 0x00007fffdccb9e57 in snd_detach_common (worker=0x555557739720) at sound.c:1589
 open-power-host-os#20 0x00007fffdccb9f0e in snd_detach_playback (sin=0x5555569fe3f8) at sound.c:1602
 open-power-host-os#21 0x00007fffdcca3373 in spice_server_remove_interface (sin=0x5555569fe3f8) at reds.c:3387
 open-power-host-os#22 0x00005555558ff6e2 in line_out_fini (hw=0x5555569fe370) at audio/spiceaudio.c:152
 open-power-host-os#23 0x00005555558f909e in audio_atexit () at audio/audio.c:1754
 open-power-host-os#24 0x00007fffdb1941e8 in __run_exit_handlers (status=0, listp=0x7fffdb5175d8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:82
 open-power-host-os#25 0x00007fffdb194235 in __GI_exit (status=<optimized out>) at exit.c:104
 open-power-host-os#26 0x00007fffdb17b738 in __libc_start_main (main=0x5555558d7874 <main>, argc=67, argv=0x7fffffffcf48, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffcf38) at ../csu/libc-start.c:323

Add a monitor_cleanup() functions to remove all the monitors before
cleaning up the chardev. Note that we are "losing" some events that
used to be sent during atexit().

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20160801112343.29082-2-marcandre.lureau@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

(cherry picked from commit 2ef4571)
BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1355704
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
cuinutanix referenced this issue in NXPower/qemu May 17, 2017
RH-Author: Gerd Hoffmann <kraxel@redhat.com>
Message-id: <1470920000-23113-2-git-send-email-kraxel@redhat.com>
Patchwork-id: 71946
O-Subject: [RHEL-7.3 qemu-kvm-rhev PATCH 1/3] vnc: don't crash getting server info if lsock is NULL
Bugzilla: 1359655
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Marcel Apfelbaum <marcel@redhat.com>
RH-Acked-by: Markus Armbruster <armbru@redhat.com>

From: "Daniel P. Berrange" <berrange@redhat.com>

When VNC is started with '-vnc none' there will be no
listener socket present. When we try to populate the
VncServerInfo we'll crash accessing a NULL 'lsock'
field.

 #0  qio_channel_socket_get_local_address (ioc=0x0, errp=errp@entry=0x7ffd5b8aa0f0) at io/channel-socket.c:33
 #1  0x00007f4b9a297d6f in vnc_init_basic_info_from_server_addr (errp=0x7ffd5b8aa0f0, info=0x7f4b9d425460, ioc=<optimized out>)  at ui/vnc.c:146
 #2  vnc_server_info_get (vd=0x7f4b9e858000) at ui/vnc.c:223
 #3  0x00007f4b9a29d318 in vnc_qmp_event (vs=0x7f4b9ef82000, vs=0x7f4b9ef82000, event=QAPI_EVENT_VNC_CONNECTED) at ui/vnc.c:279
 open-power-host-os#4  vnc_connect (vd=vd@entry=0x7f4b9e858000, sioc=sioc@entry=0x7f4b9e8b3a20, skipauth=skipauth@entry=true, websocket=websocket @entry=false) at ui/vnc.c:2994
 open-power-host-os#5  0x00007f4b9a29e8c8 in vnc_display_add_client (id=<optimized out>, csock=<optimized out>, skipauth=<optimized out>) at ui/v nc.c:3825
 open-power-host-os#6  0x00007f4b9a18d8a1 in qmp_marshal_add_client (args=<optimized out>, ret=<optimized out>, errp=0x7ffd5b8aa230) at qmp-marsh al.c:123
 open-power-host-os#7  0x00007f4b9a0b53f5 in handle_qmp_command (parser=<optimized out>, tokens=<optimized out>) at /usr/src/debug/qemu-2.6.0/mon itor.c:3922
 open-power-host-os#8  0x00007f4b9a348580 in json_message_process_token (lexer=0x7f4b9c78dfe8, input=0x7f4b9c7350e0, type=JSON_RCURLY, x=111, y=5 9) at qobject/json-streamer.c:94
 open-power-host-os#9  0x00007f4b9a35cfeb in json_lexer_feed_char (lexer=lexer@entry=0x7f4b9c78dfe8, ch=125 '}', flush=flush@entry=false) at qobj ect/json-lexer.c:310
 open-power-host-os#10 0x00007f4b9a35d0ae in json_lexer_feed (lexer=0x7f4b9c78dfe8, buffer=<optimized out>, size=<optimized out>) at qobject/json -lexer.c:360
 open-power-host-os#11 0x00007f4b9a348679 in json_message_parser_feed (parser=<optimized out>, buffer=<optimized out>, size=<optimized out>) at q object/json-streamer.c:114
 open-power-host-os#12 0x00007f4b9a0b3a1b in monitor_qmp_read (opaque=<optimized out>, buf=<optimized out>, size=<optimized out>) at /usr/src/deb ug/qemu-2.6.0/monitor.c:3938
 open-power-host-os#13 0x00007f4b9a186751 in tcp_chr_read (chan=<optimized out>, cond=<optimized out>, opaque=0x7f4b9c7add40) at qemu-char.c:2895
 open-power-host-os#14 0x00007f4b92b5c79a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
 open-power-host-os#15 0x00007f4b9a2bb0c0 in glib_pollfds_poll () at main-loop.c:213
 open-power-host-os#16 os_host_main_loop_wait (timeout=<optimized out>) at main-loop.c:258
 open-power-host-os#17 main_loop_wait (nonblocking=<optimized out>) at main-loop.c:506
 open-power-host-os#18 0x00007f4b9a0835cf in main_loop () at vl.c:1934
 open-power-host-os#19 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4667

Do an upfront check for a NULL lsock and report an error to
the caller, which matches behaviour from before

  commit 04d2529
  Author: Daniel P. Berrange <berrange@redhat.com>
  Date:   Fri Feb 27 16:20:57 2015 +0000

    ui: convert VNC server to use QIOChannelSocket

where getsockname() would be given a FD value -1 and thus report
an error to the caller.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1470134726-15697-2-git-send-email-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 624cdd4)
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
cuinutanix referenced this issue in NXPower/qemu May 17, 2017
RH-Author: Gerd Hoffmann <kraxel@redhat.com>
Message-id: <1470920000-23113-3-git-send-email-kraxel@redhat.com>
Patchwork-id: 71947
O-Subject: [RHEL-7.3 qemu-kvm-rhev PATCH 2/3] vnc: fix crash when vnc_server_info_get has an error
Bugzilla: 1359655
RH-Acked-by: Thomas Huth <thuth@redhat.com>
RH-Acked-by: Marcel Apfelbaum <marcel@redhat.com>
RH-Acked-by: Markus Armbruster <armbru@redhat.com>

From: "Daniel P. Berrange" <berrange@redhat.com>

The vnc_server_info_get will allocate the VncServerInfo
struct and then call vnc_init_basic_info_from_server_addr
to populate the basic fields. If this returns an error
though, the qapi_free_VncServerInfo call will then crash
because the VncServerInfo struct instance was not properly
NULL-initialized and thus contains random stack garbage.

 #0  0x00007f1987c8e6f5 in raise () at /lib64/libc.so.6
 #1  0x00007f1987c902fa in abort () at /lib64/libc.so.6
 #2  0x00007f1987ccf600 in __libc_message () at /lib64/libc.so.6
 #3  0x00007f1987cd7d4a in _int_free () at /lib64/libc.so.6
 open-power-host-os#4  0x00007f1987cdb2ac in free () at /lib64/libc.so.6
 open-power-host-os#5  0x00007f198b654f6e in g_free () at /lib64/libglib-2.0.so.0
 open-power-host-os#6  0x0000559193cdcf54 in visit_type_str (v=v@entry=
     0x5591972f14b0, name=name@entry=0x559193de1e29 "host", obj=obj@entry=0x5591961dbfa0, errp=errp@entry=0x7fffd7899d80)
     at qapi/qapi-visit-core.c:255
 open-power-host-os#7  0x0000559193cca8f3 in visit_type_VncBasicInfo_members (v=v@entry=
     0x5591972f14b0, obj=obj@entry=0x5591961dbfa0, errp=errp@entry=0x7fffd7899dc0) at qapi-visit.c:12307
 open-power-host-os#8  0x0000559193ccb523 in visit_type_VncServerInfo_members (v=v@entry=
     0x5591972f14b0, obj=0x5591961dbfa0, errp=errp@entry=0x7fffd7899e00) at qapi-visit.c:12632
 open-power-host-os#9  0x0000559193ccb60b in visit_type_VncServerInfo (v=v@entry=
     0x5591972f14b0, name=name@entry=0x0, obj=obj@entry=0x7fffd7899e48, errp=errp@entry=0x0) at qapi-visit.c:12658
 open-power-host-os#10 0x0000559193cb53d8 in qapi_free_VncServerInfo (obj=<optimized out>) at qapi-types.c:3970
 open-power-host-os#11 0x0000559193c1e6ba in vnc_server_info_get (vd=0x7f1951498010) at ui/vnc.c:233
 open-power-host-os#12 0x0000559193c24275 in vnc_connect (vs=0x559197b2f200, vs=0x559197b2f200, event=QAPI_EVENT_VNC_CONNECTED) at ui/vnc.c:284
 open-power-host-os#13 0x0000559193c24275 in vnc_connect (vd=vd@entry=0x7f1951498010, sioc=sioc@entry=0x559196bf9c00, skipauth=skipauth@entry=tru e, websocket=websocket@entry=false) at ui/vnc.c:3039
 open-power-host-os#14 0x0000559193c25806 in vnc_display_add_client (id=<optimized out>, csock=<optimized out>, skipauth=<optimized out>)
     at ui/vnc.c:3877
 open-power-host-os#15 0x0000559193a90c28 in qmp_marshal_add_client (args=<optimized out>, ret=<optimized out>, errp=0x7fffd7899f90)
     at qmp-marshal.c:105
 open-power-host-os#16 0x000055919399c2b7 in handle_qmp_command (parser=<optimized out>, tokens=<optimized out>)
     at /home/berrange/src/virt/qemu/monitor.c:3971
 open-power-host-os#17 0x0000559193ce3307 in json_message_process_token (lexer=0x559194ab0838, input=0x559194a6d940, type=JSON_RCURLY, x=111, y=1 2) at qobject/json-streamer.c:105
 open-power-host-os#18 0x0000559193cfa90d in json_lexer_feed_char (lexer=lexer@entry=0x559194ab0838, ch=125 '}', flush=flush@entry=false)
     at qobject/json-lexer.c:319
 open-power-host-os#19 0x0000559193cfaa1e in json_lexer_feed (lexer=0x559194ab0838, buffer=<optimized out>, size=<optimized out>)
     at qobject/json-lexer.c:369
 open-power-host-os#20 0x0000559193ce33c9 in json_message_parser_feed (parser=<optimized out>, buffer=<optimized out>, size=<optimized out>)
     at qobject/json-streamer.c:124
 open-power-host-os#21 0x000055919399a85b in monitor_qmp_read (opaque=<optimized out>, buf=<optimized out>, size=<optimized out>)
     at /home/berrange/src/virt/qemu/monitor.c:3987
 open-power-host-os#22 0x0000559193a87d00 in tcp_chr_read (chan=<optimized out>, cond=<optimized out>, opaque=0x559194a7d900)
     at qemu-char.c:2895
 open-power-host-os#23 0x00007f198b64f703 in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
 open-power-host-os#24 0x0000559193c484b3 in main_loop_wait () at main-loop.c:213
 open-power-host-os#25 0x0000559193c484b3 in main_loop_wait (timeout=<optimized out>) at main-loop.c:258
 open-power-host-os#26 0x0000559193c484b3 in main_loop_wait (nonblocking=<optimized out>) at main-loop.c:506
 open-power-host-os#27 0x0000559193964c55 in main () at vl.c:1908
 open-power-host-os#28 0x0000559193964c55 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4603

This was introduced in

  commit 98481bf
  Author: Eric Blake <eblake@redhat.com>
  Date:   Mon Oct 26 16:34:45 2015 -0600

    vnc: Hoist allocation of VncBasicInfo to callers

which added error reporting for vnc_init_basic_info_from_server_addr
but didn't change the g_malloc calls to g_malloc0.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1470134726-15697-3-git-send-email-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 3e7f136)
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
cuinutanix referenced this issue in NXPower/qemu May 17, 2017
RH-Author: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: <20161129073543.13711-11-marcandre.lureau@redhat.com>
Patchwork-id: 72914
O-Subject: [RHEV-7.3.z qemu-kvm-rhev PATCH 10/10] ahci: fix sglist leak on retry
Bugzilla: 1397745
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Laurent Vivier <lvivier@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>

ahci-test /x86_64/ahci/io/dma/lba28/retry triggers the following leak:

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0x7fc4b2a25e20 in malloc (/lib64/libasan.so.3+0xc6e20)
    #1 0x7fc4993bce58 in g_malloc (/lib64/libglib-2.0.so.0+0x4ee58)
    #2 0x556a187d4b34 in ahci_populate_sglist hw/ide/ahci.c:896
    #3 0x556a187d8237 in ahci_dma_prepare_buf hw/ide/ahci.c:1367
    open-power-host-os#4 0x556a187b5a1a in ide_dma_cb hw/ide/core.c:844
    open-power-host-os#5 0x556a187d7eec in ahci_start_dma hw/ide/ahci.c:1333
    open-power-host-os#6 0x556a187b650b in ide_start_dma hw/ide/core.c:921
    open-power-host-os#7 0x556a187b61e6 in ide_sector_start_dma hw/ide/core.c:911
    open-power-host-os#8 0x556a187b9e26 in cmd_write_dma hw/ide/core.c:1486
    open-power-host-os#9 0x556a187bd519 in ide_exec_cmd hw/ide/core.c:2027
    open-power-host-os#10 0x556a187d71c5 in handle_reg_h2d_fis hw/ide/ahci.c:1204
    open-power-host-os#11 0x556a187d7681 in handle_cmd hw/ide/ahci.c:1254
    open-power-host-os#12 0x556a187d168a in check_cmd hw/ide/ahci.c:510
    open-power-host-os#13 0x556a187d0afc in ahci_port_write hw/ide/ahci.c:314
    open-power-host-os#14 0x556a187d105d in ahci_mem_write hw/ide/ahci.c:435
    open-power-host-os#15 0x556a1831d959 in memory_region_write_accessor /home/elmarco/src/qemu/memory.c:525
    open-power-host-os#16 0x556a1831dc35 in access_with_adjusted_size /home/elmarco/src/qemu/memory.c:591
    open-power-host-os#17 0x556a18323ce3 in memory_region_dispatch_write /home/elmarco/src/qemu/memory.c:1262
    open-power-host-os#18 0x556a1828cf67 in address_space_write_continue /home/elmarco/src/qemu/exec.c:2578
    open-power-host-os#19 0x556a1828d20b in address_space_write /home/elmarco/src/qemu/exec.c:2635
    open-power-host-os#20 0x556a1828d92b in address_space_rw /home/elmarco/src/qemu/exec.c:2737
    open-power-host-os#21 0x556a1828daf7 in cpu_physical_memory_rw /home/elmarco/src/qemu/exec.c:2746
    open-power-host-os#22 0x556a183068d3 in cpu_physical_memory_write /home/elmarco/src/qemu/include/exec/cpu-common.h:72
    open-power-host-os#23 0x556a18308194 in qtest_process_command /home/elmarco/src/qemu/qtest.c:382
    open-power-host-os#24 0x556a18309999 in qtest_process_inbuf /home/elmarco/src/qemu/qtest.c:573
    open-power-host-os#25 0x556a18309a4a in qtest_read /home/elmarco/src/qemu/qtest.c:585
    open-power-host-os#26 0x556a18598b85 in qemu_chr_be_write_impl /home/elmarco/src/qemu/qemu-char.c:387
    open-power-host-os#27 0x556a18598c52 in qemu_chr_be_write /home/elmarco/src/qemu/qemu-char.c:399
    open-power-host-os#28 0x556a185a2afa in tcp_chr_read /home/elmarco/src/qemu/qemu-char.c:2902
    open-power-host-os#29 0x556a18cbaf52 in qio_channel_fd_source_dispatch io/channel-watch.c:84

Follow John Snow recommendation:
  Everywhere else ncq_err is used, it is accompanied by a list cleanup
  except for ncq_cb, which is the case you are fixing here.

  Move the sglist destruction inside of ncq_err and then delete it from
  the other two locations to keep it tidy.

  Call dma_buf_commit in ide_dma_cb after the early return. Though, this
  is also a little wonky because this routine does more than clear the
  list, but it is at the moment the centralized "we're done with the
  sglist" function and none of the other side effects that occur in
  dma_buf_commit will interfere with the reset that occurs from
  ide_restart_bh, I think

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>

(cherry picked from commit 5839df7)
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
cuinutanix referenced this issue in NXPower/qemu May 17, 2017
RH-Author: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: <20170104133125.1134-1-marcandre.lureau@redhat.com>
Patchwork-id: 73171
O-Subject: [RHEV-7.3.z qemu-kvm-rhev PATCH] net: don't poke at chardev internal QemuOpts
Bugzilla: 1410200
RH-Acked-by: Michael S. Tsirkin <mst@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
RH-Acked-by: Victor Kaplansky <vkaplans@redhat.com>

From: "Daniel P. Berrange" <berrange@redhat.com>

The vhost-user & colo code is poking at the QemuOpts instance
in the CharDriverState struct, not realizing that it is valid
for this to be NULL. e.g. the following crash shows a codepath
where it will be NULL:

 Program terminated with signal SIGSEGV, Segmentation fault.
 #0  0x000055baf6ab4adc in qemu_opt_foreach (opts=0x0, func=0x55baf696b650 <net_vhost_chardev_opts>, opaque=0x7ffc51368c00, errp=0x7ffc51368e48) at util/qemu-option.c:617
 617         QTAILQ_FOREACH(opt, &opts->head, next) {
 [Current thread is 1 (Thread 0x7f1d4970bb40 (LWP 6603))]
 (gdb) bt
 #0  0x000055baf6ab4adc in qemu_opt_foreach (opts=0x0, func=0x55baf696b650 <net_vhost_chardev_opts>, opaque=0x7ffc51368c00, errp=0x7ffc51368e48) at util/qemu-option.c:617
 #1  0x000055baf696b7da in net_vhost_parse_chardev (opts=0x55baf8ff9260, errp=0x7ffc51368e48) at net/vhost-user.c:314
 #2  0x000055baf696b985 in net_init_vhost_user (netdev=0x55baf8ff9250, name=0x55baf879d270 "hostnet2", peer=0x0, errp=0x7ffc51368e48) at net/vhost-user.c:360
 #3  0x000055baf6960216 in net_client_init1 (object=0x55baf8ff9250, is_netdev=true, errp=0x7ffc51368e48) at net/net.c:1051
 open-power-host-os#4  0x000055baf6960518 in net_client_init (opts=0x55baf776e7e0, is_netdev=true, errp=0x7ffc51368f00) at net/net.c:1108
 open-power-host-os#5  0x000055baf696083f in netdev_add (opts=0x55baf776e7e0, errp=0x7ffc51368f00) at net/net.c:1186
 open-power-host-os#6  0x000055baf69608c7 in qmp_netdev_add (qdict=0x55baf7afaf60, ret=0x7ffc51368f50, errp=0x7ffc51368f48) at net/net.c:1205
 open-power-host-os#7  0x000055baf6622135 in handle_qmp_command (parser=0x55baf77fb590, tokens=0x7f1d24011960) at /path/to/qemu.git/monitor.c:3978
 open-power-host-os#8  0x000055baf6a9d099 in json_message_process_token (lexer=0x55baf77fb598, input=0x55baf75acd20, type=JSON_RCURLY, x=113, y=19) at qobject/json-streamer.c:105
 open-power-host-os#9  0x000055baf6abf7aa in json_lexer_feed_char (lexer=0x55baf77fb598, ch=125 '}', flush=false) at qobject/json-lexer.c:319
 open-power-host-os#10 0x000055baf6abf8f2 in json_lexer_feed (lexer=0x55baf77fb598, buffer=0x7ffc51369170 "}R\204\367\272U", size=1) at qobject/json-lexer.c:369
 open-power-host-os#11 0x000055baf6a9d13c in json_message_parser_feed (parser=0x55baf77fb590, buffer=0x7ffc51369170 "}R\204\367\272U", size=1) at qobject/json-streamer.c:124
 open-power-host-os#12 0x000055baf66221f7 in monitor_qmp_read (opaque=0x55baf77fb530, buf=0x7ffc51369170 "}R\204\367\272U", size=1) at /path/to/qemu.git/monitor.c:3994
 open-power-host-os#13 0x000055baf6757014 in qemu_chr_be_write_impl (s=0x55baf7610a40, buf=0x7ffc51369170 "}R\204\367\272U", len=1) at qemu-char.c:387
 open-power-host-os#14 0x000055baf6757076 in qemu_chr_be_write (s=0x55baf7610a40, buf=0x7ffc51369170 "}R\204\367\272U", len=1) at qemu-char.c:399
 open-power-host-os#15 0x000055baf675b3b0 in tcp_chr_read (chan=0x55baf90244b0, cond=G_IO_IN, opaque=0x55baf7610a40) at qemu-char.c:2927
 open-power-host-os#16 0x000055baf6a5d655 in qio_channel_fd_source_dispatch (source=0x55baf7610df0, callback=0x55baf675b25a <tcp_chr_read>, user_data=0x55baf7610a40) at io/channel-watch.c:84
 open-power-host-os#17 0x00007f1d3e80cbbd in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
 open-power-host-os#18 0x000055baf69d3720 in glib_pollfds_poll () at main-loop.c:213
 open-power-host-os#19 0x000055baf69d37fd in os_host_main_loop_wait (timeout=126000000) at main-loop.c:258
 open-power-host-os#20 0x000055baf69d38ad in main_loop_wait (nonblocking=0) at main-loop.c:506
 open-power-host-os#21 0x000055baf676587b in main_loop () at vl.c:1908
 open-power-host-os#22 0x000055baf676d3bf in main (argc=101, argv=0x7ffc5136a6c8, envp=0x7ffc5136a9f8) at vl.c:4604
 (gdb) p opts
 $1 = (QemuOpts *) 0x0

The crash occurred when attaching vhost-user net via QMP:

{
    "execute": "chardev-add",
    "arguments": {
        "id": "charnet2",
        "backend": {
            "type": "socket",
            "data": {
                "addr": {
                    "type": "unix",
                    "data": {
                        "path": "/var/run/openvswitch/vhost-user1"
                    }
                },
                "wait": false,
                "server": false
            }
        }
    },
    "id": "libvirt-19"
}
{
    "return": {

    },
    "id": "libvirt-19"
}
{
    "execute": "netdev_add",
    "arguments": {
        "type": "vhost-user",
        "chardev": "charnet2",
        "id": "hostnet2"
    },
    "id": "libvirt-20"
}

Code using chardevs should not be poking at the internals of the
CharDriverState struct. What vhost-user wants is a chardev that is
operating as reconnectable network service, along with the ability
to do FD passing over the connection. The colo code simply wants
a network service. Add a feature concept to the char drivers so
that chardev users can query the actual features they wish to have
supported. The QemuOpts member is removed to prevent future mistakes
in this area.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

(cherry picked from commit 0a73336)

BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1394140
Brew: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=12298577

[ Marc-André - backport: drop colo-compare bits ]
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>

Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
aik pushed a commit that referenced this issue Sep 18, 2017
"nc" is freed after hotplug vhost-user, but the watcher is not removed.
The QEMU crash when the watcher access the "nc" when socket disconnects.

    Program received signal SIGSEGV, Segmentation fault.
    #0  object_get_class (obj=obj@entry=0x2) at qom/object.c:750
    #1  0x00007f9bb4180da1 in qemu_chr_fe_disconnect (be=<optimized out>) at chardev/char-fe.c:372
    #2  0x00007f9bb40d1100 in net_vhost_user_watch (chan=<optimized out>, cond=<optimized out>, opaque=<optimized out>) at net/vhost-user.c:188
    #3  0x00007f9baf97f99a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
    #4  0x00007f9bb41d7ebc in glib_pollfds_poll () at util/main-loop.c:213
    #5  os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261
    #6  main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:515
    #7  0x00007f9bb3e266a7 in main_loop () at vl.c:1917
    #8  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4786

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
aik pushed a commit that referenced this issue Nov 22, 2017
currently for sh4 cpu_model argument for '-cpu' option
could be either 'cpu model' name or cpu_typename.

however typically '-cpu' takes 'cpu model' name and
cpu type for sh4 target isn't advertised publicly
('-cpu help' prints only 'cpu model' names) so we
shouldn't care about this use case (it's more of a bug).

1. Drop '-cpu cpu_typename' to align with the rest of
   targets.
2. Compose searched for typename from cpu model and use
   it with object_class_by_name() directly instead of
   over-complicated
       object_class_get_list()
       g_slist_find_custom() + superh_cpu_name_compare()

With #1 droped, #2 could be used for both lookups which
simplifies superh_cpu_class_by_name() quite a bit.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <1507211474-188400-23-git-send-email-imammedo@redhat.com>
[ehabkost: Include fixup sent by Igor]
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
aik pushed a commit that referenced this issue Dec 15, 2017
Migration of pseries is broken with TCG because
QEMU tries to restore KVM MMU state unconditionally.

The result is a SIGSEGV in kvm_vm_ioctl():

  #0  kvm_vm_ioctl (s=0x0, type=-2146390353)
      at qemu/accel/kvm/kvm-all.c:2032
  #1  0x00000001003e3e2c in kvmppc_configure_v3_mmu (cpu=<optimized out>,
      radix=<optimized out>, gtse=<optimized out>, proc_tbl=<optimized out>)
      at qemu/target/ppc/kvm.c:396
  #2  0x00000001002f8b88 in spapr_post_load (opaque=0x1019103c0,
      version_id=<optimized out>) at qemu/hw/ppc/spapr.c:1578
  #3  0x000000010059e4cc in vmstate_load_state (f=0x106230000,
      vmsd=0x1009479e0 <vmstate_spapr>, opaque=0x1019103c0,
      version_id=<optimized out>) at qemu/migration/vmstate.c:165
  #4  0x00000001005987e0 in vmstate_load (f=<optimized out>, se=<optimized out>)
      at qemu/migration/savevm.c:748

This patch fixes the problem by not calling the KVM function with the
TCG mode.

Fixes: d39c90f ("spapr: Fix migration of Radix guests")
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
aik pushed a commit that referenced this issue Dec 15, 2017
when qemu is started with '-no-acpi' CLI option, an attempt
to unplug a CPU using device_del results in null pointer
dereference at:

  #0 object_get_class
  #1 pc_machine_device_unplug_request_cb
  #2 qmp_marshal_device_del

which is caused by pcms->acpi_dev == NULL due to ACPI support
being disabled.

Considering that ACPI support is necessary for unplug to work,
check that it's enabled and fail unplug request gracefully
if no acpi device were found.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
aik pushed a commit that referenced this issue Jan 31, 2018
The passed-through device might be an express device. In this case the
old code allocated a too small emulated config space in
pci_config_alloc() since pci_config_size() returned the size for a
non-express device. This leads to an out-of-bound write in
xen_pt_config_reg_init(), which sometimes results in crashes. So set
is_express as already done for KVM in vfio-pci.

Shortened ASan report:

==17512==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x611000041648 at pc 0x55e0fdac51ff bp 0x7ffe4af07410 sp 0x7ffe4af07408
WRITE of size 2 at 0x611000041648 thread T0
    #0 0x55e0fdac51fe in memcpy /usr/include/x86_64-linux-gnu/bits/string3.h:53
    #1 0x55e0fdac51fe in stw_he_p include/qemu/bswap.h:330
    #2 0x55e0fdac51fe in stw_le_p include/qemu/bswap.h:379
    #3 0x55e0fdac51fe in pci_set_word include/hw/pci/pci.h:490
    #4 0x55e0fdac51fe in xen_pt_config_reg_init hw/xen/xen_pt_config_init.c:1991
    #5 0x55e0fdac51fe in xen_pt_config_init hw/xen/xen_pt_config_init.c:2067
    #6 0x55e0fdabcf4d in xen_pt_realize hw/xen/xen_pt.c:830
    #7 0x55e0fdf59666 in pci_qdev_realize hw/pci/pci.c:2034
    #8 0x55e0fdda7d3d in device_set_realized hw/core/qdev.c:914
[...]

0x611000041648 is located 8 bytes to the right of 256-byte region [0x611000041540,0x611000041640)
allocated by thread T0 here:
    #0 0x7ff596a94bb8 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xd9bb8)
    #1 0x7ff57da66580 in g_malloc0 (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x50580)
    #2 0x55e0fdda7d3d in device_set_realized hw/core/qdev.c:914
[...]

Signed-off-by: Simon Gaiser <hw42@ipsumj.de>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
aik pushed a commit that referenced this issue Jan 31, 2018
bdrv_unref() requires the AioContext lock because bdrv_flush() uses
BDRV_POLL_WHILE(), which assumes the AioContext is currently held.  If
BDRV_POLL_WHILE() runs without AioContext held the
pthread_mutex_unlock() call in aio_context_release() fails.

This patch moves bdrv_unref() into the AioContext locked region to solve
the following pthread_mutex_unlock() failure:

  #0  0x00007f566181969b in raise () at /lib64/libc.so.6
  #1  0x00007f566181b3b1 in abort () at /lib64/libc.so.6
  #2  0x00005592cd590458 in error_exit (err=<optimized out>, msg=msg@entry=0x5592cdaf6d60 <__func__.23977> "qemu_mutex_unlock") at util/qemu-thread-posix.c:36
  #3  0x00005592cd96e738 in qemu_mutex_unlock (mutex=mutex@entry=0x5592ce9505e0) at util/qemu-thread-posix.c:96
  #4  0x00005592cd969b69 in aio_context_release (ctx=ctx@entry=0x5592ce950580) at util/async.c:507
  #5  0x00005592cd8ead78 in bdrv_flush (bs=bs@entry=0x5592cfa87210) at block/io.c:2478
  #6  0x00005592cd89df30 in bdrv_close (bs=0x5592cfa87210) at block.c:3207
  #7  0x00005592cd89df30 in bdrv_delete (bs=0x5592cfa87210) at block.c:3395
  #8  0x00005592cd89df30 in bdrv_unref (bs=0x5592cfa87210) at block.c:4418
  #9  0x00005592cd6b7f86 in qmp_transaction (dev_list=<optimized out>, has_props=<optimized out>, props=<optimized out>, errp=errp@entry=0x7ffe4a1fc9d8) at blockdev.c:2308

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20171206144550.22295-2-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
aik pushed a commit that referenced this issue Jan 31, 2018
There is a small chance that iothread_stop() hangs as follows:

  Thread 3 (Thread 0x7f63eba5f700 (LWP 16105)):
  #0  0x00007f64012c09b6 in ppoll () at /lib64/libc.so.6
  #1  0x000055959992eac9 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
  #2  0x000055959992eac9 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:322
  #3  0x0000559599930711 in aio_poll (ctx=0x55959bdb83c0, blocking=blocking@entry=true) at util/aio-posix.c:629
  #4  0x00005595996806fe in iothread_run (opaque=0x55959bd78400) at iothread.c:59
  #5  0x00007f640159f609 in start_thread () at /lib64/libpthread.so.0
  #6  0x00007f64012cce6f in clone () at /lib64/libc.so.6

  Thread 1 (Thread 0x7f640b45b280 (LWP 16103)):
  #0  0x00007f64015a0b6d in pthread_join () at /lib64/libpthread.so.0
  #1  0x00005595999332ef in qemu_thread_join (thread=<optimized out>) at util/qemu-thread-posix.c:547
  #2  0x00005595996808ae in iothread_stop (iothread=<optimized out>) at iothread.c:91
  #3  0x000055959968094d in iothread_stop_iter (object=<optimized out>, opaque=<optimized out>) at iothread.c:102
  #4  0x0000559599857d97 in do_object_child_foreach (obj=obj@entry=0x55959bdb8100, fn=fn@entry=0x559599680930 <iothread_stop_iter>, opaque=opaque@entry=0x0, recurse=recurse@entry=false) at qom/object.c:852
  #5  0x0000559599859477 in object_child_foreach (obj=obj@entry=0x55959bdb8100, fn=fn@entry=0x559599680930 <iothread_stop_iter>, opaque=opaque@entry=0x0) at qom/object.c:867
  #6  0x0000559599680a6e in iothread_stop_all () at iothread.c:341
  #7  0x000055959955b1d5 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4913

The relevant code from iothread_run() is:

  while (!atomic_read(&iothread->stopping)) {
      aio_poll(iothread->ctx, true);

and iothread_stop():

  iothread->stopping = true;
  aio_notify(iothread->ctx);
  ...
  qemu_thread_join(&iothread->thread);

The following scenario can occur:

1. IOThread:
  while (!atomic_read(&iothread->stopping)) -> stopping=false

2. Main loop:
  iothread->stopping = true;
  aio_notify(iothread->ctx);

3. IOThread:
  aio_poll(iothread->ctx, true); -> hang

The bug is explained by the AioContext->notify_me doc comments:

  "If this field is 0, everything (file descriptors, bottom halves,
  timers) will be re-evaluated before the next blocking poll(), thus the
  event_notifier_set call can be skipped."

The problem is that "everything" does not include checking
iothread->stopping.  This means iothread_run() will block in aio_poll()
if aio_notify() was called just before aio_poll().

This patch fixes the hang by replacing aio_notify() with
aio_bh_schedule_oneshot().  This makes aio_poll() or g_main_loop_run()
to return.

Implementing this properly required a new bool running flag.  The new
flag prevents races that are tricky if we try to use iothread->stopping.
Now iothread->stopping is purely for iothread_stop() and
iothread->running is purely for the iothread_run() thread.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20171207201320.19284-6-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
aik pushed a commit that referenced this issue Jan 31, 2018
If we create a thread with QEMU_THREAD_DETACHED mode, QEMU may get a segfault with low probability.

The backtrace is:
   #0  0x00007f46c60291d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
   #1  0x00007f46c602a8c8 in __GI_abort () at abort.c:90
   #2  0x00000000008543c9 in PAT_abort ()
   #3  0x000000000085140d in patchIllInsHandler ()
   #4  <signal handler called>
   #5  pthread_detach (th=139933037614848) at pthread_detach.c:50
   #6  0x0000000000829759 in qemu_thread_create (thread=thread@entry=0x7ffdaa8205e0, name=name@entry=0x94d94a "io-task-worker", start_routine=start_routine@entry=0x7eb9a0 <qio_task_thread_worker>,
       arg=arg@entry=0x3f5cf70, mode=mode@entry=1) at util/qemu_thread_posix.c:512
   #7  0x00000000007ebc96 in qio_task_run_in_thread (task=0x31db2c0, worker=worker@entry=0x7e7e40 <qio_channel_socket_connect_worker>, opaque=0xcd23380, destroy=0x7f1180 <qapi_free_SocketAddress>)
       at io/task.c:141
   #8  0x00000000007e7f33 in qio_channel_socket_connect_async (ioc=ioc@entry=0x626c0b0, addr=<optimized out>, callback=callback@entry=0x55e080 <qemu_chr_socket_connected>, opaque=opaque@entry=0x42862c0,
       destroy=destroy@entry=0x0) at io/channel_socket.c:194
   #9  0x000000000055bdd1 in socket_reconnect_timeout (opaque=0x42862c0) at qemu_char.c:4744
   #10 0x00007f46c72483b3 in g_timeout_dispatch () from /usr/lib64/libglib-2.0.so.0
   #11 0x00007f46c724799a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
   #12 0x000000000076c646 in glib_pollfds_poll () at main_loop.c:228
   #13 0x000000000076c6eb in os_host_main_loop_wait (timeout=348000000) at main_loop.c:273
   #14 0x000000000076c815 in main_loop_wait (nonblocking=nonblocking@entry=0) at main_loop.c:521
   #15 0x000000000056a511 in main_loop () at vl.c:2076
   #16 0x0000000000420705 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4940

The cause of this problem is a glibc bug; for more information, see
https://sourceware.org/bugzilla/show_bug.cgi?id=19951.
The solution for this bug is to use pthread_attr_setdetachstate.

There is a similar issue with pthread_setname_np, which is moved
from creating thread to created thread.

Signed-off-by: linzhecheng <linzhecheng@huawei.com>
Message-Id: <20171128044656.10592-1-linzhecheng@huawei.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
[Simplify the code by removing qemu_thread_set_name, and free the arguments
 before invoking the start routine. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
aik pushed a commit that referenced this issue Jan 31, 2018
The find_desc_by_name() from util/qemu-option.c relies on the .name not being
NULL to call strcmp(). This check becomes unsafe when the list is not
NULL-terminated, which is the case of nbd_runtime_opts in block/nbd.c, and can
result in segmentation fault when strcmp() tries to access an invalid memory:

    #0 0x00007fff8c75f7d4 in __strcmp_power9 () from /lib64/libc.so.6
    #1 0x00000000102d3ec8 in find_desc_by_name (desc=0x1036d6f0, name=0x28e46670 "server.path") at util/qemu-option.c:166
    #2 0x00000000102d93e0 in qemu_opts_absorb_qdict (opts=0x28e47a80, qdict=0x28e469a0, errp=0x7fffec247c98) at util/qemu-option.c:1026
    #3 0x000000001012a2e4 in nbd_open (bs=0x28e42290, options=0x28e469a0, flags=24578, errp=0x7fffec247d80) at block/nbd.c:406
    #4 0x00000000100144e8 in bdrv_open_driver (bs=0x28e42290, drv=0x1036e070 <bdrv_nbd_unix>, node_name=0x0, options=0x28e469a0, open_flags=24578, errp=0x7fffec247f50) at block.c:1135
    #5 0x0000000010015b04 in bdrv_open_common (bs=0x28e42290, file=0x0, options=0x28e469a0, errp=0x7fffec247f50) at block.c:1395

>From gdb, the desc[i].name was not NULL and resulted in strcmp() accessing an
invalid memory:

    >>> p desc[5]
    $8 = {
      name = 0x1037f098 "R27A",
      type = 1561964883,
      help = 0xc0bbb23e <error: Cannot access memory at address 0xc0bbb23e>,
      def_value_str = 0x2 <error: Cannot access memory at address 0x2>
    }
    >>> p desc[6]
    $9 = {
      name = 0x103dac78 <__gcov0.do_qemu_init_bdrv_nbd_init> "\001",
      type = 272101528,
      help = 0x29ec0b754403e31f <error: Cannot access memory at address 0x29ec0b754403e31f>,
      def_value_str = 0x81f343b9 <error: Cannot access memory at address 0x81f343b9>
    }

This patch fixes the segmentation fault in strcmp() by adding a NULL element at
the end of nbd_runtime_opts.desc list, which is the common practice to most of
other structs like runtime_opts in block/null.c. Thus, the desc[i].name != NULL
check becomes safe because it will not evaluate to true when .desc list reached
its end.

Reported-by: R. Nageswara Sastry <nasastry@in.ibm.com>
Buglink: https://bugs.launchpad.net/qemu/+bug/1727259
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.vnet.ibm.com>
Message-Id: <20180105133241.14141-2-muriloo@linux.vnet.ibm.com>
CC: qemu-stable@nongnu.org
Fixes: 7ccc44f
Signed-off-by: Eric Blake <eblake@redhat.com>
aik pushed a commit that referenced this issue Jan 31, 2018
/public/qobject_is_equal_conversion: OK

=================================================================
==14396==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 56 byte(s) in 1 object(s) allocated from:
    #0 0x7f07682c5850 in malloc (/lib64/libasan.so.4+0xde850)
    #1 0x7f0767d12f0c in g_malloc ../glib/gmem.c:94
    #2 0x7f0767d131cf in g_malloc_n ../glib/gmem.c:331
    #3 0x562bd767371f in do_test_equality /home/elmarco/src/qq/tests/check-qobject.c:49
    #4 0x562bd7674a35 in qobject_is_equal_dict_test /home/elmarco/src/qq/tests/check-qobject.c:267
    #5 0x7f0767d37b04 in test_case_run ../glib/gtestutils.c:2237
    #6 0x7f0767d37ec4 in g_test_run_suite_internal ../glib/gtestutils.c:2321
    #7 0x7f0767d37f6d in g_test_run_suite_internal ../glib/gtestutils.c:2333
    #8 0x7f0767d38184 in g_test_run_suite ../glib/gtestutils.c:2408
    #9 0x7f0767d36e0d in g_test_run ../glib/gtestutils.c:1674
    #10 0x562bd7674e75 in main /home/elmarco/src/qq/tests/check-qobject.c:327
    #11 0x7f0766009039 in __libc_start_main (/lib64/libc.so.6+0x21039)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20180104160523.22995-9-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
aik pushed a commit that referenced this issue Jan 31, 2018
Note that data_dir[] will now point to allocated strings.

Fixes:
Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0x7f1448181850 in malloc (/lib64/libasan.so.4+0xde850)
    #1 0x7f1446ed8f0c in g_malloc ../glib/gmem.c:94
    #2 0x7f1446ed91cf in g_malloc_n ../glib/gmem.c:331
    #3 0x7f1446ef739a in g_strsplit ../glib/gstrfuncs.c:2364
    #4 0x55cf276439d7 in main /home/elmarco/src/qq/vl.c:4311
    #5 0x7f143dfad039 in __libc_start_main (/lib64/libc.so.6+0x21039)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180104160523.22995-10-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
aik pushed a commit that referenced this issue Jan 31, 2018
Fixes leaks such as:

Direct leak of 2 byte(s) in 1 object(s) allocated from:
    #0 0x7eff58beb850 in malloc (/lib64/libasan.so.4+0xde850)
    #1 0x7eff57942f0c in g_malloc ../glib/gmem.c:94
    #2 0x7eff579431cf in g_malloc_n ../glib/gmem.c:331
    #3 0x7eff5795f6eb in g_strdup ../glib/gstrfuncs.c:363
    #4 0x55db720f1d46 in readline_hist_add /home/elmarco/src/qq/util/readline.c:258
    #5 0x55db720f2d34 in readline_handle_byte /home/elmarco/src/qq/util/readline.c:387
    #6 0x55db71539d00 in monitor_read /home/elmarco/src/qq/monitor.c:3896
    #7 0x55db71f9be35 in qemu_chr_be_write_impl /home/elmarco/src/qq/chardev/char.c:167
    #8 0x55db71f9bed3 in qemu_chr_be_write /home/elmarco/src/qq/chardev/char.c:179
    #9 0x55db71fa013c in fd_chr_read /home/elmarco/src/qq/chardev/char-fd.c:66
    #10 0x55db71fe18a8 in qio_channel_fd_source_dispatch /home/elmarco/src/qq/io/channel-watch.c:84
    #11 0x7eff5793a90b in g_main_dispatch ../glib/gmain.c:3182
    #12 0x7eff5793b7ac in g_main_context_dispatch ../glib/gmain.c:3847
    #13 0x55db720af3bd in glib_pollfds_poll /home/elmarco/src/qq/util/main-loop.c:214
    #14 0x55db720af505 in os_host_main_loop_wait /home/elmarco/src/qq/util/main-loop.c:261
    #15 0x55db720af6d6 in main_loop_wait /home/elmarco/src/qq/util/main-loop.c:515
    #16 0x55db7184e0de in main_loop /home/elmarco/src/qq/vl.c:1995
    #17 0x55db7185e956 in main /home/elmarco/src/qq/vl.c:4914
    #18 0x7eff4ea17039 in __libc_start_main (/lib64/libc.so.6+0x21039)

(while at it, use g_new0(ReadLineState), it's a bit easier to read)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20180104160523.22995-11-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
aik pushed a commit that referenced this issue Jan 31, 2018
ASAN complains about:

==8856==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffd8a1fe168 at pc 0x561136cb4451 bp 0x7ffd8a1fe130 sp 0x7ffd8a1fd8e0
READ of size 16 at 0x7ffd8a1fe168 thread T0
    #0 0x561136cb4450 in __asan_memcpy (/home/elmarco/src/qq/build/tests/test-crypto-ivgen+0x110450)
    #1 0x561136d2a6a7 in qcrypto_ivgen_essiv_calculate /home/elmarco/src/qq/crypto/ivgen-essiv.c:83:5
    #2 0x561136d29af8 in qcrypto_ivgen_calculate /home/elmarco/src/qq/crypto/ivgen.c:72:12
    #3 0x561136d07c8e in test_ivgen /home/elmarco/src/qq/tests/test-crypto-ivgen.c:148:5
    #4 0x7f77772c3b04 in test_case_run /home/elmarco/src/gnome/glib/builddir/../glib/gtestutils.c:2237
    #5 0x7f77772c3ec4 in g_test_run_suite_internal /home/elmarco/src/gnome/glib/builddir/../glib/gtestutils.c:2321
    #6 0x7f77772c3f6d in g_test_run_suite_internal /home/elmarco/src/gnome/glib/builddir/../glib/gtestutils.c:2333
    #7 0x7f77772c3f6d in g_test_run_suite_internal /home/elmarco/src/gnome/glib/builddir/../glib/gtestutils.c:2333
    #8 0x7f77772c3f6d in g_test_run_suite_internal /home/elmarco/src/gnome/glib/builddir/../glib/gtestutils.c:2333
    #9 0x7f77772c4184 in g_test_run_suite /home/elmarco/src/gnome/glib/builddir/../glib/gtestutils.c:2408
    #10 0x7f77772c2e0d in g_test_run /home/elmarco/src/gnome/glib/builddir/../glib/gtestutils.c:1674
    #11 0x561136d0799b in main /home/elmarco/src/qq/tests/test-crypto-ivgen.c:173:12
    #12 0x7f77756e6039 in __libc_start_main (/lib64/libc.so.6+0x21039)
    #13 0x561136c13d89 in _start (/home/elmarco/src/qq/build/tests/test-crypto-ivgen+0x6fd89)

Address 0x7ffd8a1fe168 is located in stack of thread T0 at offset 40 in frame
    #0 0x561136d2a40f in qcrypto_ivgen_essiv_calculate /home/elmarco/src/qq/crypto/ivgen-essiv.c:76

  This frame has 1 object(s):
    [32, 40) 'sector.addr' <== Memory access at offset 40 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow (/home/elmarco/src/qq/build/tests/test-crypto-ivgen+0x110450) in __asan_memcpy
Shadow bytes around the buggy address:
  0x100031437bd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100031437be0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100031437bf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100031437c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100031437c10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x100031437c20: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00[f3]f3 f3
  0x100031437c30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100031437c40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100031437c50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100031437c60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100031437c70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb

It looks like the rest of the code copes with ndata being larger than
sizeof(sector), so limit the memcpy() range.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <20180104160523.22995-13-marcandre.lureau@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
aik pushed a commit that referenced this issue Jan 31, 2018
Direct leak of 160 byte(s) in 4 object(s) allocated from:
    #0 0x55ed7678cda8 in calloc (/home/elmarco/src/qq/build/x86_64-softmmu/qemu-system-x86_64+0x797da8)
    #1 0x7f3f5e725f75 in g_malloc0 /home/elmarco/src/gnome/glib/builddir/../glib/gmem.c:124
    #2 0x55ed778aa3a7 in query_option_descs /home/elmarco/src/qq/util/qemu-config.c:60:16
    #3 0x55ed778aa307 in get_drive_infolist /home/elmarco/src/qq/util/qemu-config.c:140:19
    #4 0x55ed778a9f40 in qmp_query_command_line_options /home/elmarco/src/qq/util/qemu-config.c:254:36
    #5 0x55ed76d4868c in qmp_marshal_query_command_line_options /home/elmarco/src/qq/build/qmp-marshal.c:3078:14
    #6 0x55ed77855dd5 in do_qmp_dispatch /home/elmarco/src/qq/qapi/qmp-dispatch.c:104:5
    #7 0x55ed778558cc in qmp_dispatch /home/elmarco/src/qq/qapi/qmp-dispatch.c:131:11
    #8 0x55ed768b592f in handle_qmp_command /home/elmarco/src/qq/monitor.c:3840:11
    #9 0x55ed7786ccfe in json_message_process_token /home/elmarco/src/qq/qobject/json-streamer.c:105:5
    #10 0x55ed778fe37c in json_lexer_feed_char /home/elmarco/src/qq/qobject/json-lexer.c:323:13
    #11 0x55ed778fdde6 in json_lexer_feed /home/elmarco/src/qq/qobject/json-lexer.c:373:15
    #12 0x55ed7786cd83 in json_message_parser_feed /home/elmarco/src/qq/qobject/json-streamer.c:124:12
    #13 0x55ed768b559e in monitor_qmp_read /home/elmarco/src/qq/monitor.c:3882:5
    #14 0x55ed77714f29 in qemu_chr_be_write_impl /home/elmarco/src/qq/chardev/char.c:167:9
    #15 0x55ed77714fde in qemu_chr_be_write /home/elmarco/src/qq/chardev/char.c:179:9
    #16 0x55ed7772ffad in tcp_chr_read /home/elmarco/src/qq/chardev/char-socket.c:440:13
    #17 0x55ed7777113b in qio_channel_fd_source_dispatch /home/elmarco/src/qq/io/channel-watch.c:84:12
    #18 0x7f3f5e71d90b in g_main_dispatch /home/elmarco/src/gnome/glib/builddir/../glib/gmain.c:3182
    #19 0x7f3f5e71e7ac in g_main_context_dispatch /home/elmarco/src/gnome/glib/builddir/../glib/gmain.c:3847
    #20 0x55ed77886ffc in glib_pollfds_poll /home/elmarco/src/qq/util/main-loop.c:214:9
    #21 0x55ed778865fd in os_host_main_loop_wait /home/elmarco/src/qq/util/main-loop.c:261:5
    #22 0x55ed77886222 in main_loop_wait /home/elmarco/src/qq/util/main-loop.c:515:11
    #23 0x55ed76d2a4df in main_loop /home/elmarco/src/qq/vl.c:1995:9
    #24 0x55ed76d1cb4a in main /home/elmarco/src/qq/vl.c:4914:5
    #25 0x7f3f555f6039 in __libc_start_main (/lib64/libc.so.6+0x21039)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180104160523.22995-14-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
aik pushed a commit that referenced this issue Jan 31, 2018
The coroutine is not finished by the time the test ends, resulting in
ASAN warning:

==7005==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 312 byte(s) in 1 object(s) allocated from:
    #0 0x7fd35290fa38 in __interceptor_calloc (/lib64/libasan.so.4+0xdea38)
    #1 0x7fd3506c5f75 in g_malloc0 ../glib/gmem.c:124
    #2 0x55994af03e47 in qemu_coroutine_new /home/elmarco/src/qemu/util/coroutine-ucontext.c:144
    #3 0x55994aefed99 in qemu_coroutine_create /home/elmarco/src/qemu/util/qemu-coroutine.c:76
    #4 0x55994ac1eb50 in verify_entered_step_1 /home/elmarco/src/qemu/tests/test-coroutine.c:80
    #5 0x55994af03c75 in coroutine_trampoline /home/elmarco/src/qemu/util/coroutine-ucontext.c:119
    #6 0x7fd34ec02bef  (/lib64/libc.so.6+0x50bef)

Do not yield() to let the coroutine terminate.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20180104160523.22995-17-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
aik pushed a commit that referenced this issue Jan 31, 2018
Spotted thanks to ASAN:

==25226==ERROR: AddressSanitizer: global-buffer-overflow on address 0x556715a1f120 at pc 0x556714b6f6b1 bp 0x7ffcdfac1360 sp 0x7ffcdfac1350
READ of size 1 at 0x556715a1f120 thread T0
    #0 0x556714b6f6b0 in init_disasm /home/elmarco/src/qemu/disas/s390.c:219
    #1 0x556714b6fa6a in print_insn_s390 /home/elmarco/src/qemu/disas/s390.c:294
    #2 0x55671484d031 in monitor_disas /home/elmarco/src/qemu/disas.c:635
    #3 0x556714862ec0 in memory_dump /home/elmarco/src/qemu/monitor.c:1324
    #4 0x55671486342a in hmp_memory_dump /home/elmarco/src/qemu/monitor.c:1418
    #5 0x5567148670be in handle_hmp_command /home/elmarco/src/qemu/monitor.c:3109
    #6 0x5567148674ed in qmp_human_monitor_command /home/elmarco/src/qemu/monitor.c:613
    #7 0x556714b00918 in qmp_marshal_human_monitor_command /home/elmarco/src/qemu/build/qmp-marshal.c:1704
    #8 0x556715138a3e in do_qmp_dispatch /home/elmarco/src/qemu/qapi/qmp-dispatch.c:104
    #9 0x556715138f83 in qmp_dispatch /home/elmarco/src/qemu/qapi/qmp-dispatch.c:131
    #10 0x55671485cf88 in handle_qmp_command /home/elmarco/src/qemu/monitor.c:3839
    #11 0x55671514e80b in json_message_process_token /home/elmarco/src/qemu/qobject/json-streamer.c:105
    #12 0x5567151bf2dc in json_lexer_feed_char /home/elmarco/src/qemu/qobject/json-lexer.c:323
    #13 0x5567151bf827 in json_lexer_feed /home/elmarco/src/qemu/qobject/json-lexer.c:373
    #14 0x55671514ee62 in json_message_parser_feed /home/elmarco/src/qemu/qobject/json-streamer.c:124
    #15 0x556714854b1f in monitor_qmp_read /home/elmarco/src/qemu/monitor.c:3881
    #16 0x556715045440 in qemu_chr_be_write_impl /home/elmarco/src/qemu/chardev/char.c:172
    #17 0x556715047184 in qemu_chr_be_write /home/elmarco/src/qemu/chardev/char.c:184
    #18 0x55671505a8e6 in tcp_chr_read /home/elmarco/src/qemu/chardev/char-socket.c:440
    #19 0x5567150943c3 in qio_channel_fd_source_dispatch /home/elmarco/src/qemu/io/channel-watch.c:84
    #20 0x7fb90292b90b in g_main_dispatch ../glib/gmain.c:3182
    #21 0x7fb90292c7ac in g_main_context_dispatch ../glib/gmain.c:3847
    #22 0x556715162eca in glib_pollfds_poll /home/elmarco/src/qemu/util/main-loop.c:214
    #23 0x556715163001 in os_host_main_loop_wait /home/elmarco/src/qemu/util/main-loop.c:261
    #24 0x5567151631fa in main_loop_wait /home/elmarco/src/qemu/util/main-loop.c:515
    #25 0x556714ad6d3b in main_loop /home/elmarco/src/qemu/vl.c:1950
    #26 0x556714ade329 in main /home/elmarco/src/qemu/vl.c:4865
    #27 0x7fb8fe5c9009 in __libc_start_main (/lib64/libc.so.6+0x21009)
    #28 0x5567147af4d9 in _start (/home/elmarco/src/qemu/build/s390x-softmmu/qemu-system-s390x+0xf674d9)

0x556715a1f120 is located 32 bytes to the left of global variable 'char_hci_type_info' defined in '/home/elmarco/src/qemu/hw/bt/hci-csr.c:493:23' (0x556715a1f140) of size 104
0x556715a1f120 is located 8 bytes to the right of global variable 's390_opcodes' defined in '/home/elmarco/src/qemu/disas/s390.c:860:33' (0x556715a15280) of size 40600

This fix is based on Andreas Arnez <arnez@linux.vnet.ibm.com> upstream
commit:
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commitdiff;h=9ace48f3d7d80ce09c5df60cccb433470410b11b

2014-08-19  Andreas Arnez  <arnez@linux.vnet.ibm.com>

       * s390-dis.c (init_disasm): Simplify initialization of
       opc_index[].  This also fixes an access after the last element
       of s390_opcodes[].

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180104160523.22995-19-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
aik pushed a commit that referenced this issue Jan 31, 2018
… into staging

x86 queue, 2018-01-17

Highlight: new CPU models that expose CPU features that guests
can use to mitigate CVE-2017-5715 (Spectre variant #2).

# gpg: Signature made Thu 18 Jan 2018 02:00:03 GMT
# gpg:                using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* remotes/ehabkost/tags/x86-pull-request:
  i386: Add EPYC-IBPB CPU model
  i386: Add new -IBRS versions of Intel CPU models
  i386: Add FEAT_8000_0008_EBX CPUID feature word
  i386: Add spec-ctrl CPUID bit
  i386: Add support for SPEC_CTRL MSR
  i386: Change X86CPUDefinition::model_id to const char*
  target/i386: add clflushopt to "Skylake-Server" cpu model
  pc: add 2.12 machine types

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
aik pushed a commit that referenced this issue Jan 31, 2018
…e object

missed in 60765b6.

  Thread 1 "qemu-system-aarch64" received signal SIGSEGV, Segmentation fault.
  address_space_init (as=0x0, root=0x55555726e410, name=name@entry=0x555555e3f0a7 "sdhci-dma") at memory.c:3050
  3050	    as->root = root;
  (gdb) bt
  #0  address_space_init (as=0x0, root=0x55555726e410, name=name@entry=0x555555e3f0a7 "sdhci-dma") at memory.c:3050
  #1  0x0000555555af62c3 in sdhci_sysbus_realize (dev=<optimized out>, errp=0x7fff7f931150) at hw/sd/sdhci.c:1564
  #2  0x00005555558b25e5 in zynqmp_sdhci_realize (dev=0x555557051520, errp=0x7fff7f931150) at hw/sd/zynqmp-sdhci.c:151
  #3  0x0000555555a2e7f3 in device_set_realized (obj=0x555557051520, value=<optimized out>, errp=0x7fff7f931270) at hw/core/qdev.c:966
  #4  0x0000555555ba3f74 in property_set_bool (obj=0x555557051520, v=<optimized out>, name=<optimized out>, opaque=0x555556e04a20,
      errp=0x7fff7f931270) at qom/object.c:1906
  #5  0x0000555555ba51f4 in object_property_set (obj=obj@entry=0x555557051520, v=v@entry=0x5555576dbd60,
      name=name@entry=0x555555dd6306 "realized", errp=errp@entry=0x7fff7f931270) at qom/object.c:1102

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180123132051.24448-1-f4bug@amsat.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
aik pushed a commit that referenced this issue Mar 29, 2018
Fixes the following ASAN report:

Direct leak of 128 byte(s) in 8 object(s) allocated from:
    #0 0x7fefce311850 in malloc (/lib64/libasan.so.4+0xde850)
    #1 0x7fefcdd5ef0c in g_malloc ../glib/gmem.c:94
    #2 0x559b976faff0 in create_ahci_io_test /home/elmarco/src/qemu/tests/ahci-test.c:1810

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180215212552.26997-6-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
aik pushed a commit that referenced this issue Mar 29, 2018
Fix the following ASAN reports:

==20125==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 24 byte(s) in 1 object(s) allocated from:
    #0 0x7f0faea03a38 in __interceptor_calloc (/lib64/libasan.so.4+0xdea38)
    #1 0x7f0fae450f75 in g_malloc0 ../glib/gmem.c:124
    #2 0x562fffd526fc in machine_start /home/elmarco/src/qemu/tests/sdhci-test.c:180

Indirect leak of 152 byte(s) in 1 object(s) allocated from:
    #0 0x7f0faea03850 in malloc (/lib64/libasan.so.4+0xde850)
    #1 0x7f0fae450f0c in g_malloc ../glib/gmem.c:94
    #2 0x562fffd5d21d in qpci_init_pc /home/elmarco/src/qemu/tests/libqos/pci-pc.c:122

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180215212552.26997-7-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
aik pushed a commit that referenced this issue Mar 29, 2018
Spotted by ASAN:
QTEST_QEMU_BINARY=aarch64-softmmu/qemu-system-aarch64 tests/boot-serial-test

Direct leak of 48 byte(s) in 1 object(s) allocated from:
    #0 0x7ff8a9b0ca38 in __interceptor_calloc (/lib64/libasan.so.4+0xdea38)
    #1 0x7ff8a8ea7f75 in g_malloc0 ../glib/gmem.c:124
    #2 0x55fef3d99129 in error_setv /home/elmarco/src/qemu/util/error.c:59
    #3 0x55fef3d99738 in error_setg_internal /home/elmarco/src/qemu/util/error.c:95
    #4 0x55fef323acb2 in load_elf_hdr /home/elmarco/src/qemu/hw/core/loader.c:393
    #5 0x55fef2d15776 in arm_load_elf /home/elmarco/src/qemu/hw/arm/boot.c:830
    #6 0x55fef2d16d39 in arm_load_kernel_notify /home/elmarco/src/qemu/hw/arm/boot.c:1022
    #7 0x55fef3dc634d in notifier_list_notify /home/elmarco/src/qemu/util/notify.c:40
    #8 0x55fef2fc3182 in qemu_run_machine_init_done_notifiers /home/elmarco/src/qemu/vl.c:2716
    #9 0x55fef2fcbbd1 in main /home/elmarco/src/qemu/vl.c:4679
    #10 0x7ff89dfed009 in __libc_start_main (/lib64/libc.so.6+0x21009)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
aik pushed a commit that referenced this issue Mar 29, 2018
Spotted by ASAN:

elmarco@boraha:~/src/qemu/build (master *%)$ QTEST_QEMU_BINARY=aarch64-softmmu/qemu-system-aarch64 tests/boot-serial-test
/aarch64/boot-serial/virt: ** (process:19740): DEBUG: 18:39:30.275: foo /tmp/qtest-boot-serial-cXaS94D
=================================================================
==19740==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000069648 at pc 0x7f1d2201cc54 bp 0x7fff331f6a40 sp 0x7fff331f61e8
READ of size 4 at 0x603000069648 thread T0
    #0 0x7f1d2201cc53  (/lib64/libasan.so.4+0xafc53)
    #1 0x55bc86685ee3 in load_aarch64_image /home/elmarco/src/qemu/hw/arm/boot.c:894
    #2 0x55bc86687217 in arm_load_kernel_notify /home/elmarco/src/qemu/hw/arm/boot.c:1047
    #3 0x55bc877363b5 in notifier_list_notify /home/elmarco/src/qemu/util/notify.c:40
    #4 0x55bc869331ea in qemu_run_machine_init_done_notifiers /home/elmarco/src/qemu/vl.c:2716
    #5 0x55bc8693bc39 in main /home/elmarco/src/qemu/vl.c:4679
    #6 0x7f1d1652c009 in __libc_start_main (/lib64/libc.so.6+0x21009)
    #7 0x55bc86255cc9 in _start (/home/elmarco/src/qemu/build/aarch64-softmmu/qemu-system-aarch64+0x1ae5cc9)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
aik pushed a commit that referenced this issue Mar 29, 2018
Spotted thanks to ASAN:
QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 tests/migration-test -p /x86_64/migration/bad_dest

==30302==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 48 byte(s) in 1 object(s) allocated from:
    #0 0x7f60efba1a38 in __interceptor_calloc (/lib64/libasan.so.4+0xdea38)
    #1 0x7f60eef3cf75 in g_malloc0 ../glib/gmem.c:124
    #2 0x55ca9094702c in error_copy /home/elmarco/src/qemu/util/error.c:203
    #3 0x55ca9037a30f in migrate_set_error /home/elmarco/src/qemu/migration/migration.c:1139
    #4 0x55ca9037a462 in migrate_fd_error /home/elmarco/src/qemu/migration/migration.c:1150
    #5 0x55ca9038162b in migrate_fd_connect /home/elmarco/src/qemu/migration/migration.c:2411
    #6 0x55ca90386e41 in migration_channel_connect /home/elmarco/src/qemu/migration/channel.c:81
    #7 0x55ca9038335e in socket_outgoing_migration /home/elmarco/src/qemu/migration/socket.c:85
    #8 0x55ca9083dd3a in qio_task_complete /home/elmarco/src/qemu/io/task.c:142
    #9 0x55ca9083d6cc in gio_task_thread_result /home/elmarco/src/qemu/io/task.c:88
    #10 0x7f60eef37317 in g_idle_dispatch ../glib/gmain.c:5552
    #11 0x7f60eef3490b in g_main_dispatch ../glib/gmain.c:3182
    #12 0x7f60eef357ac in g_main_context_dispatch ../glib/gmain.c:3847
    #13 0x55ca90927231 in glib_pollfds_poll /home/elmarco/src/qemu/util/main-loop.c:214
    #14 0x55ca90927420 in os_host_main_loop_wait /home/elmarco/src/qemu/util/main-loop.c:261
    #15 0x55ca909275fa in main_loop_wait /home/elmarco/src/qemu/util/main-loop.c:515
    #16 0x55ca8fc1c2a4 in main_loop /home/elmarco/src/qemu/vl.c:1942
    #17 0x55ca8fc2eb3a in main /home/elmarco/src/qemu/vl.c:4724
    #18 0x7f60e4082009 in __libc_start_main (/lib64/libc.so.6+0x21009)

Indirect leak of 45 byte(s) in 1 object(s) allocated from:
    #0 0x7f60efba1850 in malloc (/lib64/libasan.so.4+0xde850)
    #1 0x7f60eef3cf0c in g_malloc ../glib/gmem.c:94
    #2 0x7f60eef3d1cf in g_malloc_n ../glib/gmem.c:331
    #3 0x7f60eef596eb in g_strdup ../glib/gstrfuncs.c:363
    #4 0x55ca90947085 in error_copy /home/elmarco/src/qemu/util/error.c:204
    #5 0x55ca9037a30f in migrate_set_error /home/elmarco/src/qemu/migration/migration.c:1139
    #6 0x55ca9037a462 in migrate_fd_error /home/elmarco/src/qemu/migration/migration.c:1150
    #7 0x55ca9038162b in migrate_fd_connect /home/elmarco/src/qemu/migration/migration.c:2411
    #8 0x55ca90386e41 in migration_channel_connect /home/elmarco/src/qemu/migration/channel.c:81
    #9 0x55ca9038335e in socket_outgoing_migration /home/elmarco/src/qemu/migration/socket.c:85
    #10 0x55ca9083dd3a in qio_task_complete /home/elmarco/src/qemu/io/task.c:142
    #11 0x55ca9083d6cc in gio_task_thread_result /home/elmarco/src/qemu/io/task.c:88
    #12 0x7f60eef37317 in g_idle_dispatch ../glib/gmain.c:5552
    #13 0x7f60eef3490b in g_main_dispatch ../glib/gmain.c:3182
    #14 0x7f60eef357ac in g_main_context_dispatch ../glib/gmain.c:3847
    #15 0x55ca90927231 in glib_pollfds_poll /home/elmarco/src/qemu/util/main-loop.c:214
    #16 0x55ca90927420 in os_host_main_loop_wait /home/elmarco/src/qemu/util/main-loop.c:261
    #17 0x55ca909275fa in main_loop_wait /home/elmarco/src/qemu/util/main-loop.c:515
    #18 0x55ca8fc1c2a4 in main_loop /home/elmarco/src/qemu/vl.c:1942
    #19 0x55ca8fc2eb3a in main /home/elmarco/src/qemu/vl.c:4724
    #20 0x7f60e4082009 in __libc_start_main (/lib64/libc.so.6+0x21009)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180306170959.3921-1-marcandre.lureau@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
aik pushed a commit that referenced this issue Mar 29, 2018
Found thanks to ASAN:

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0x7efe20417a38 in __interceptor_calloc (/lib64/libasan.so.4+0xdea38)
    #1 0x7efe1f7b2f75 in g_malloc0 ../glib/gmem.c:124
    #2 0x7efe1f7b3249 in g_malloc0_n ../glib/gmem.c:355
    #3 0x558272879162 in sev_get_info /home/elmarco/src/qemu/target/i386/sev.c:414
    #4 0x55827285113b in hmp_info_sev /home/elmarco/src/qemu/target/i386/monitor.c:684
    #5 0x5582724043b8 in handle_hmp_command /home/elmarco/src/qemu/monitor.c:3333

Fixes: 6303631
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180319175823.22111-1-marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
aik pushed a commit that referenced this issue May 15, 2018
Fix leak spotted by ASAN:

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0x7fe1abb80a38 in __interceptor_calloc (/lib64/libasan.so.4+0xdea38)
    #1 0x7fe1aaf1bf75 in g_malloc0 ../glib/gmem.c:124
    #2 0x7fe1aaf1c249 in g_malloc0_n ../glib/gmem.c:355
    #3 0x55f4841cfaa9 in postcopy_ram_fault_thread /home/elmarco/src/qemu/migration/postcopy-ram.c:596
    #4 0x55f48479447b in qemu_thread_start /home/elmarco/src/qemu/util/qemu-thread-posix.c:504
    #5 0x7fe1a043550a in start_thread (/lib64/libpthread.so.0+0x750a)

Regression introduced with commit 00fa4fc.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180321113644.21899-1-marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
aik pushed a commit that referenced this issue May 15, 2018
This fixes leaks found by ASAN such as:
  GTESTER tests/test-blockjob
=================================================================
==31442==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 24 byte(s) in 1 object(s) allocated from:
    #0 0x7f88483cba38 in __interceptor_calloc (/lib64/libasan.so.4+0xdea38)
    #1 0x7f8845e1bd77 in g_malloc0 ../glib/gmem.c:129
    #2 0x7f8845e1c04b in g_malloc0_n ../glib/gmem.c:360
    #3 0x5584d2732498 in block_job_txn_new /home/elmarco/src/qemu/blockjob.c:172
    #4 0x5584d2739b28 in block_job_create /home/elmarco/src/qemu/blockjob.c:973
    #5 0x5584d270ae31 in mk_job /home/elmarco/src/qemu/tests/test-blockjob.c:34
    #6 0x5584d270b1c1 in do_test_id /home/elmarco/src/qemu/tests/test-blockjob.c:57
    #7 0x5584d270b65c in test_job_ids /home/elmarco/src/qemu/tests/test-blockjob.c:118
    #8 0x7f8845e40b69 in test_case_run ../glib/gtestutils.c:2255
    #9 0x7f8845e40f29 in g_test_run_suite_internal ../glib/gtestutils.c:2339
    #10 0x7f8845e40fd2 in g_test_run_suite_internal ../glib/gtestutils.c:2351
    #11 0x7f8845e411e9 in g_test_run_suite ../glib/gtestutils.c:2426
    #12 0x7f8845e3fe72 in g_test_run ../glib/gtestutils.c:1692
    #13 0x5584d270d6e2 in main /home/elmarco/src/qemu/tests/test-blockjob.c:377
    #14 0x7f8843641f29 in __libc_start_main (/lib64/libc.so.6+0x20f29)

Add an assert to make sure that the job doesn't have associated txn before free().

[Jeff Cody: N.B., used updated patch provided by John Snow]

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
aik pushed a commit that referenced this issue May 15, 2018
Free the AIO context earlier than the GMainContext (if we have) to
workaround a glib2 bug that GSource context pointer is not cleared even
if the context has already been destroyed (while it should).

The patch itself only changed the order to destroy the objects, no
functional change at all. Without this workaround, we can encounter
qmp-test hang with oob (and possibly any other use case when iothread is
used with GMainContexts):

  #0  0x00007f35ffe45334 in __lll_lock_wait () from /lib64/libpthread.so.0
  #1  0x00007f35ffe405d8 in _L_lock_854 () from /lib64/libpthread.so.0
  #2  0x00007f35ffe404a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
  #3  0x00007f35fc5b9c9d in g_source_unref_internal (source=0x24f0600, context=0x7f35f0000960, have_lock=0) at gmain.c:1685
  #4  0x0000000000aa6672 in aio_context_unref (ctx=0x24f0600) at /root/qemu/util/async.c:497
  #5  0x000000000065851c in iothread_instance_finalize (obj=0x24f0380) at /root/qemu/iothread.c:129
  #6  0x0000000000962d79 in object_deinit (obj=0x24f0380, type=0x242e960) at /root/qemu/qom/object.c:462
  #7  0x0000000000962e0d in object_finalize (data=0x24f0380) at /root/qemu/qom/object.c:476
  #8  0x0000000000964146 in object_unref (obj=0x24f0380) at /root/qemu/qom/object.c:924
  #9  0x0000000000965880 in object_finalize_child_property (obj=0x24ec640, name=0x24efca0 "mon_iothread", opaque=0x24f0380) at /root/qemu/qom/object.c:1436
  #10 0x0000000000962c33 in object_property_del_child (obj=0x24ec640, child=0x24f0380, errp=0x0) at /root/qemu/qom/object.c:436
  #11 0x0000000000962d26 in object_unparent (obj=0x24f0380) at /root/qemu/qom/object.c:455
  #12 0x0000000000658f00 in iothread_destroy (iothread=0x24f0380) at /root/qemu/iothread.c:365
  #13 0x00000000004c67a8 in monitor_cleanup () at /root/qemu/monitor.c:4663
  #14 0x0000000000669e27 in main (argc=16, argv=0x7ffc8b1ae2f8, envp=0x7ffc8b1ae380) at /root/qemu/vl.c:4749

The glib2 bug is fixed in commit 26056558b ("gmain: allow
g_source_get_context() on destroyed sources", 2012-07-30), so the first
good version is glib2 2.33.10. But we still support building with
glib as old as 2.28, so we need the workaround.

Let's make sure we destroy the GSources first before its owner context
until we drop support for glib older than 2.33.10.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180409083956.1780-1-peterx@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
aik pushed a commit that referenced this issue May 15, 2018
Eric Auger reported the problem days ago that OOB broke ARM when running
with libvirt:

http://lists.gnu.org/archive/html/qemu-devel/2018-03/msg06231.html

The problem was that the monitor dispatcher bottom half was bound to
qemu_aio_context now, which could be polled unexpectedly in block code.
We should keep the dispatchers run in iohandler_ctx just like what we
did before the Out-Of-Band series (chardev uses qio, and qio binds
everything with iohandler_ctx).

If without this change, QMP dispatcher might be run even before reaching
main loop in block IO path, for example, in a stack like (the ARM case,
"cont" command handler run even during machine init phase):

        #0  qmp_cont ()
        #1  0x00000000006bd210 in qmp_marshal_cont ()
        #2  0x0000000000ac05c4 in do_qmp_dispatch ()
        #3  0x0000000000ac07a0 in qmp_dispatch ()
        #4  0x0000000000472d60 in monitor_qmp_dispatch_one ()
        #5  0x000000000047302c in monitor_qmp_bh_dispatcher ()
        #6  0x0000000000acf374 in aio_bh_call ()
        #7  0x0000000000acf428 in aio_bh_poll ()
        #8  0x0000000000ad5110 in aio_poll ()
        #9  0x0000000000a08ab8 in blk_prw ()
        #10 0x0000000000a091c4 in blk_pread ()
        #11 0x0000000000734f94 in pflash_cfi01_realize ()
        #12 0x000000000075a3a4 in device_set_realized ()
        #13 0x00000000009a26cc in property_set_bool ()
        #14 0x00000000009a0a40 in object_property_set ()
        #15 0x00000000009a3a08 in object_property_set_qobject ()
        #16 0x00000000009a0c8c in object_property_set_bool ()
        #17 0x0000000000758f94 in qdev_init_nofail ()
        #18 0x000000000058e190 in create_one_flash ()
        #19 0x000000000058e2f4 in create_flash ()
        #20 0x00000000005902f0 in machvirt_init ()
        #21 0x00000000007635cc in machine_run_board_init ()
        #22 0x00000000006b135c in main ()

Actually the problem is more severe than that.  After we switched to the
qemu AIO handler it means the monitor dispatcher code can even be called
with nested aio_poll(), then it can be an explicit aio_poll() inside
another main loop aio_poll() which could be racy too; breaking code
like TPM and 9p that use nested event loops.

Switch to use the iohandler_ctx for monitor dispatchers.

My sincere thanks to Eric Auger who offered great help during both
debugging and verifying the problem.  The ARM test was carried out by
applying this patch upon QEMU 2.12.0-rc0 and problem is gone after the
patch.

A quick test of mine shows that after this patch applied we can pass all
raw iotests even with OOB on by default.

CC: Eric Blake <eblake@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
Reported-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180410044942.17059-1-peterx@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants