Allow overriding / providing vmcoreinfo data #350

brenns10 · 2023-08-22T20:31:03Z

I had a bit of a hack that let me specify my own vmcoreinfo note in case the one provided is incorrect or missing.

Hyper-V generates incorrect vmcoreinfo when producing a kdump from a hypervisor snapshot. See: https://github.com/Azure/azure-linux-utils/issues/12
QEMU/libvirt guests can be missing vmcoreinfo if the hypervisor or guest didn't enable support for the vmcoreinfo virtual device (-device vmcoreinfo).

In either case, the vmcoreinfo note itself is generally still present somewhere in guest memory, and it's not actually that hard to find. I have a tool that can search the pages of an ELF or kdump vmcore for the vmcoreinfo note quite quickly, based on the observation that the note only appears at a PAGE_SIZE boundary and starts with OSRELEASE=.
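A minimal sketch of that page-aligned search, assuming a flat physical memory image where file offset equals physical address (for ELF or compressed kdump files you would first need to read physical pages through something like libkdumpfile). The 4 KiB page size and the function names are assumptions for illustration; this is not the tool mentioned above.

```python
import mmap

PAGE_SIZE = 4096  # assumption: 4 KiB pages
MARKER = b"OSRELEASE="


def find_vmcoreinfo_candidates(image, page_size=PAGE_SIZE):
    """Return [(offset, text), ...] for every page-aligned candidate note.

    `image` is a bytes, bytearray or mmap object holding a flat physical
    memory image, so only page-aligned offsets need to be checked.
    """
    candidates = []
    for offset in range(0, len(image) - len(MARKER) + 1, page_size):
        if image[offset:offset + len(MARKER)] != MARKER:
            continue
        # The vmcoreinfo text lives in a single page and is NUL-terminated.
        end = image.find(b"\0", offset, offset + page_size)
        if end == -1:
            end = min(offset + page_size, len(image))
        text = bytes(image[offset:end]).decode("ascii", errors="replace")
        candidates.append((offset, text))
    return candidates


def find_vmcoreinfo_in_file(path, page_size=PAGE_SIZE):
    """Memory-map a raw image file and scan it without reading it all into RAM."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as image:
            return find_vmcoreinfo_candidates(image, page_size)
```

Returning every candidate rather than only the first hit is deliberate: stray copies of the text could exist elsewhere in memory, so each candidate would still need a sanity check.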

Once you've used such a tool to recover the note, drgn should have an advanced feature somewhere that lets you supply your own vmcoreinfo note. Crash doesn't offer this exact feature, but it does let you specify architecture and configuration details, such as the KASLR offset, that would normally come from the note.
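For reference, the KASLR offset is carried in the note itself: on x86-64 the kernel appends a KERNELOFFSET=<hex> line, and the runtime address of a kernel-image symbol is its link-time address plus that offset. A small sketch under those assumptions; the helper names are hypothetical, and treating a missing line as offset 0 is a simplification.

```python
def parse_kerneloffset(vmcoreinfo: str) -> int:
    """Return the KASLR offset from a KERNELOFFSET=<hex> line, or 0 if absent."""
    for line in vmcoreinfo.splitlines():
        key, sep, value = line.partition("=")
        if sep and key == "KERNELOFFSET":
            return int(value, 16)  # the value is hex without a 0x prefix
    return 0  # simplification: no line is treated as "no offset"


def relocate(link_time_addr: int, kaslr_offset: int) -> int:
    """Runtime address of a kernel-image symbol: link-time address + KASLR offset."""
    return (link_time_addr + kaslr_offset) & 0xFFFF_FFFF_FFFF_FFFF
```

For example, with KERNELOFFSET=1e000000, a symbol linked at 0xffffffff81000000 would be found at 0xffffffff9f000000.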

The hack I had let you specify the vmcoreinfo note contents in an environment variable, which was a bad API. Spitballing a few options here (mostly thinking of the Python-facing API):

- Allow set_core_dump() (and friends) to accept an optional buffer containing the vmcoreinfo. This raises the question of whether set_kernel() should allow it too, since that's functionally the same as set_core_dump("/proc/kcore").
- Add an optional parameter to the Program constructor.
- Allow the user to specify the path of a file containing vmcoreinfo via an environment variable.

Of those, I prefer the optional parameter to the Program constructor. It's nice because it mirrors the platform parameter. Also, in cases where vmcoreinfo is broken, the other fields drgn relies on to determine the platform are typically broken too, so drgn would fail to read the vmcore unless the platform is specified as well.

@osandov if you have a preferred API, let me know. Feel free to assign me to this (I can't update assignees, labels, etc.). I'll probably submit something within the next month or two when I find time for it.


Homura2333 · 2023-09-01T09:35:22Z

Hi, I ran into an issue after running makedumpfile -R new_vmcore < vmcore and then sudo drgn -c new_vmcore. I received the following error:

```
drgn 0.0.23+98.g5a362aa.dirty (using Python 3.8.13, elfutils 0.176, with libkdumpfile)
Traceback (most recent call last):
  File "/usr/local/bin/drgn", line 33, in <module>
    sys.exit(load_entry_point('drgn==0.0.23+98.g5a362aa.dirty', 'console_scripts', 'drgn')())
  File "/usr/local/lib/python3.8/site-packages/drgn-0.0.23+98.g5a362aa.dirty-py3.8-linux-x86_64.egg/drgn/cli.py", line 263, in _main
    prog.set_core_dump(args.core)
Exception: kdump_vmcoreinfo_raw: linux.vmcoreinfo.raw is not set
```

Is this error related to the issue you mentioned?

brenns10 · 2023-09-01T14:30:13Z

Yes, this error is at least related to this issue! It comes from here in the code.

Essentially, the vmcoreinfo note contains metadata that drgn needs in order to interpret the vmcore, but your vmcore does not have a vmcoreinfo note, or at least not one that is easily accessible. A resolution to this issue might allow you to open this vmcore in drgn.
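The note is plain text made of KEY=VALUE lines, with keys such as OSRELEASE, PAGESIZE, KERNELOFFSET, SYMBOL(name), SIZE(name), OFFSET(name.field) and NUMBER(name). A rough sketch of parsing it and checking for a few entries a debugger is likely to need; the REQUIRED tuple is an assumption for illustration, not drgn's exact list of requirements.

```python
def parse_vmcoreinfo(text: str) -> dict:
    """Split KEY=VALUE lines into a dict; values are kept as strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            info[key] = value
    return info


# Assumption: a minimal illustrative set of entries, not drgn's actual checks.
REQUIRED = ("OSRELEASE", "PAGESIZE", "SYMBOL(swapper_pg_dir)")


def missing_fields(text: str) -> list:
    """Return the REQUIRED keys that do not appear in the note text."""
    info = parse_vmcoreinfo(text)
    return [key for key in REQUIRED if key not in info]
```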

A few questions, if you don't mind helping us out:

How did you generate this vmcore? I'm guessing it was with QEMU or some other hypervisor?
What is the kernel version that was running in the vmcore? Was it any particular distribution kernel?
Would you mind trying out the program below to see if it can extract the necessary data?

Instructions for compiling and using the program (I assume you have the libkdumpfile headers available, since it looks like you built drgn from source):

```shell
git clone https://github.com/brenns10/kernel_stuff/
cd kernel_stuff/vmcoreinfo
# dumpphys{,.c} is bash brace expansion for "dumpphys dumpphys.c"
gcc -g -o dumpphys{,.c} -lkdumpfile
./dumpphys -i -c PATH/TO/VMCORE -o vmcoreinfo.txt
```

Let me know the output and provide the vmcoreinfo.txt if you can.

Homura2333 · 2023-09-04T02:25:50Z

Thank you so much for your assistance! I truly appreciate your help.
Regarding the issue: the vmcore was generated by a lightweight virtual machine hypervisor. Both the hypervisor and the kernel are developed internally and closed-source, so unfortunately I can't provide more specific details.
I followed your instructions. Here are the results:

kdump_read: Wrong page size: 0

Please let me know if there is anything else I can provide or if there are any further steps you recommend.

brenns10 · 2023-09-04T03:29:27Z

kdump_read: Wrong page size: 0

That's an error I've never seen before. It comes from libkdumpfile, and it means there's a page whose encoded size is 0. That's very odd, and libkdumpfile treats it as a sign of a corrupted file. "Corrupted" could mean a few different things: (1) the data was damaged, which is unlikely but possible; (2) the hypervisor generated the diskdump format incorrectly; or (3) the hypervisor generated correct data but libkdumpfile hit a bug. I'd put my money on (2); (3) does happen, but rarely, and (1) is quite unlikely.
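To make "encoded size of 0" concrete: in the compressed kdump (diskdump) format written by makedumpfile, each dumped page has a descriptor recording where its data lives and how many bytes it occupies. The sketch below flags descriptors whose size is 0 or larger than a page. The 24-byte little-endian layout mirrors makedumpfile's page_desc_t on a 64-bit host and is an assumption here; libkdumpfile's actual checks may differ.

```python
import struct

# Assumed layout of page_desc_t on a 64-bit little-endian host:
#   off_t offset; unsigned int size; unsigned int flags; unsigned long long page_flags
PAGE_DESC = struct.Struct("<qIIQ")


def suspicious_page_descs(table: bytes, page_size: int = 4096):
    """Yield (index, offset, size) for descriptors whose size cannot be valid.

    `index` is the position in the descriptor table (the n-th dumped page),
    not a page frame number.
    """
    for index, (offset, size, _flags, _page_flags) in enumerate(PAGE_DESC.iter_unpack(table)):
        # Compressed pages are smaller than a page, uncompressed pages are
        # exactly one page; zero or oversized entries indicate corruption.
        if size == 0 or size > page_size:
            yield index, offset, size
```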

In any case, I pushed some changes to that tool in the kernel_stuff repository. Please pull, rebuild, and add the -p option to the ./dumpphys command. This makes it continue reading after the first kdump_read() failure, so we can see whether the whole file is broken or just some of the data.
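The idea behind -p can be sketched as: try every page, keep going after failures, and report which page frame ranges could not be read. Here read_page is a hypothetical callback assumed to raise OSError on failure; this is not the dumpphys implementation.

```python
from typing import Callable, List, Tuple


def failed_ranges(num_pages: int, read_page: Callable[[int], bytes]) -> List[Tuple[int, int]]:
    """Read every page and return the [start, end) PFN ranges that failed."""
    ranges = []
    start = None  # first PFN of the current failing run, if any
    for pfn in range(num_pages):
        try:
            read_page(pfn)
            ok = True
        except OSError:  # assumption: the reader signals failure with OSError
            ok = False
        if not ok and start is None:
            start = pfn
        elif ok and start is not None:
            ranges.append((start, pfn))
            start = None
    if start is not None:
        ranges.append((start, num_pages))
    return ranges
```

A single range covering everything would suggest the whole file is unreadable; a few small ranges would point at localized damage.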

Do you have any tool which is actually capable of reading this file? It would help me to understand whether this is just an issue you observe with Drgn, or whether this entire vmcore is problematic.

iostapyshyn · 2024-05-24T16:26:14Z

@brenns10

Do you think it would be possible to attach drgn to a raw memory dump of a virtual machine (e.g., QEMU's memory-backend-file) by providing a vmcoreinfo? I'm trying to periodically halt and debug a QEMU guest, where dumping via dump-guest-memory is overkill (the resulting vmcore takes a lot of disk space and time, and can be thrown away immediately).

I constructed a custom drgn Program with add_memory_segment, but it seems to be missing virtual-to-physical address translation and the other symbols specified by vmcoreinfo.

brenns10 · 2024-05-24T22:40:06Z

Hi @iostapyshyn

Providing vmcoreinfo to drgn should be enough to do this, at least in theory. In practice, though, there's not yet a way to make a custom Program have the IS_LINUX_KERNEL flag, and all the associated linux-specific behavior. (Note that the API doesn't have any place you can set flags.) The only way that flag gets applied is when loading a core dump (or /proc/kcore) if drgn can detect it.

When loading the core dump, if a kernel dump is detected, drgn adds a special memory reader which does page table translation. If you never load a core dump, because you're adding a custom memory reader, then drgn will never add those memory readers, nor will it set IS_LINUX_KERNEL.
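To illustrate what a page-table-translating reader does, here is a heavily simplified sketch of an x86-64 4-level page walk over a flat physical memory image. It assumes file offset equals physical address and that you already know the physical address of the top-level table (for the kernel, that corresponds to swapper_pg_dir); it ignores 5-level paging and other architectures, and it is not drgn's implementation, which is written in C.

```python
import struct

PRESENT = 1 << 0
PAGE_SIZE_BIT = 1 << 7               # PS bit: entry maps a 1 GiB / 2 MiB page
ADDR_MASK = 0x000F_FFFF_FFFF_F000    # physical address bits 12..51


def read_u64(mem, phys: int) -> int:
    """Read a little-endian 64-bit page table entry at a physical address."""
    return struct.unpack_from("<Q", mem, phys)[0]


def translate(mem, pgd_phys: int, vaddr: int) -> int:
    """Translate a virtual address with x86-64 4-level paging.

    `mem` is a flat physical memory image (offset == physical address) and
    `pgd_phys` is the physical address of the top-level (PML4) table.
    """
    table = pgd_phys
    for shift in (39, 30, 21, 12):  # PML4, PDPT, PD, PT index positions
        index = (vaddr >> shift) & 0x1FF
        entry = read_u64(mem, table + index * 8)
        if not entry & PRESENT:
            raise LookupError(f"address {vaddr:#x} is not mapped")
        # PDPT and PD entries with PS set map a huge page directly.
        if shift in (30, 21) and entry & PAGE_SIZE_BIT:
            page_mask = (1 << shift) - 1
            return (entry & ADDR_MASK & ~page_mask) | (vaddr & page_mask)
        if shift == 12:
            return (entry & ADDR_MASK) | (vaddr & 0xFFF)
        table = entry & ADDR_MASK  # descend to the next-level table
    raise AssertionError("unreachable")
```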

The branches I've worked on carry on with that assumption. They do let you give drgn some of the information it needs, but they don't (yet) get page table translation working with a custom memory reader.

I have a branch called vmcoreinfo which I went ahead and rebased/updated. Like I said, it won't work for your particular use case, but I think I need to start getting this merged before we can really start thinking about solutions to the other problems I've outlined. It's just been on the back burner for me.

iostapyshyn · 2024-05-27T12:45:51Z

@brenns10

Thanks for the answer, it all makes perfect sense!

In practice, though, there's not yet a way to make a custom Program have the IS_LINUX_KERNEL flag, and all the associated linux-specific behavior. (Note that the API doesn't have any place you can set flags.) The only way that flag gets applied is when loading a core dump (or /proc/kcore) if drgn can detect it.

I have implemented a Program.set_image function in my branch here that loads a raw memory image, reads VMCOREINFO from an environment variable, and sets IS_LINUX_KERNEL. Here's a usage example:

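```python
import drgn

platform = drgn.Platform(drgn.Architecture.X86_64,
                         drgn.PlatformFlags.IS_LITTLE_ENDIAN | drgn.PlatformFlags.IS_64_BIT)
prog = drgn.Program(platform)
# set_image() exists only in the branch mentioned above, not in upstream drgn;
# it reads VMCOREINFO from an environment variable.
prog.set_image("/dev/shm/qemu-ram")
```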

It works wonderfully, but internally it isn't as pretty as I'd hoped, due to the following QEMU quirk:

When using a file as a QEMU memory backend (via -object memory-backend-file,id=mem,size=16G,mem-path=/dev/shm/qemu-ram,share=o -machine memory-backend=mem), the APIC/HPET region (0xc0000000-0x100000000, just below the 4 GiB boundary) is skipped in the file, so file offsets no longer match physical addresses above 4 GiB. Applying a simple offset when reading above 0x100000000 is enough to fix this, but it is highly architecture-dependent and not general enough (to implement #172, for example).
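The offset fix described above amounts to a small physical-address-to-file-offset mapping. A sketch, assuming the hole boundaries from this comment (0xc0000000 to 0x100000000); the real boundaries depend on the QEMU machine type and guest memory size, so they are parameters here.

```python
# Assumed layout, per the observation above: RAM below HOLE_START is stored at
# the same file offset, [HOLE_START, HOLE_END) is not stored at all, and RAM
# from HOLE_END upward follows immediately after the low RAM in the file.
HOLE_START = 0xC000_0000
HOLE_END = 0x1_0000_0000


def gpa_to_file_offset(gpa: int, hole_start: int = HOLE_START, hole_end: int = HOLE_END) -> int:
    """Map a guest physical address to an offset in the memory-backend file."""
    if gpa < hole_start:
        return gpa
    if gpa < hole_end:
        raise ValueError(f"{gpa:#x} falls in the skipped region; it has no backing in the file")
    return gpa - (hole_end - hole_start)
```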

I also had to fix the address translation issue that you mention in #396; I'll comment there on that shortly.

brenns10 · 2024-05-28T21:35:23Z

prog.set_image("/dev/shm/qemu-ram")

That is really cool. It's quite exciting that this is possible with so few code changes on the drgn side. I agree that it may not be general enough; the address mismatch issue sounds pretty architecture-specific. But I think this reveals two interesting features that could be useful for custom programs:

- It would be nice to be able to use drgn_read_memory_file via Program.add_memory_segment, so we could have custom file-backed programs without the overhead of a Python read_fn and without needing to write custom C.
- It would be nice to have some sort of Program.set_linux_kernel() which could use the previously established memory segments and vmcoreinfo to set the IS_LINUX_KERNEL flag and attach the Linux kernel read_memory_via_pgtable reader.
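To show the per-read cost the first item refers to, here is what a Python read_fn for a file-backed segment might look like. It assumes drgn invokes read_fn(address, count, offset, physical) with offset relative to the segment start; the FileSegmentReader class is hypothetical. Every read goes through a Python call plus an os.pread(), which is the overhead a native file-backed segment would avoid.

```python
import os


class FileSegmentReader:
    """Hypothetical read_fn serving a memory segment from a region of a file.

    Assumes drgn calls read_fn(address, count, offset, physical), where
    `offset` is relative to the start of the segment.
    """

    def __init__(self, path: str, file_base: int = 0):
        self._fd = os.open(path, os.O_RDONLY)
        self._file_base = file_base  # file offset where this segment begins

    def __call__(self, address: int, count: int, offset: int, physical: bool) -> bytes:
        data = os.pread(self._fd, count, self._file_base + offset)
        if len(data) != count:
            # A real reader might raise drgn.FaultError here instead.
            raise OSError(f"short read at {address:#x}")
        return data

    def close(self) -> None:
        os.close(self._fd)
```

Registration might then look like prog.add_memory_segment(0, 0xc0000000, FileSegmentReader(path), physical=True), plus a second segment at 0x100000000 using file_base=0xc0000000 for the QEMU layout described earlier.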

With these two things, my vmcoreinfo changes, and a fix for address translation with swapper_pg_dir like what you have, your program could be created entirely in Python. What's more, this would probably open the door for other interesting Python-based Linux backends that could be used even before we have full gdbstub support.

brenns10 · 2024-05-30T19:22:23Z

Closed in #396!

brenns10 mentioned this issue May 24, 2024: Support setting custom vmcoreinfo #396 (merged)

brenns10 closed this as completed May 30, 2024
