Allow overriding / providing vmcoreinfo data #350

Closed
brenns10 opened this issue Aug 22, 2023 · 9 comments

@brenns10
Contributor

I had a bit of a hack that let me specify my own vmcoreinfo note, in case the one provided is incorrect or missing.

In either case, the vmcoreinfo note itself is generally still present somewhere in guest memory, and it's not that hard to find. I have a tool that can search the pages of an ELF/kdump vmcore for the vmcoreinfo note quite quickly, based on the observation that the note only appears at a PAGE_SIZE boundary and starts with OSRELEASE=.
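A rough Python sketch of that page-boundary scan, assuming a raw physical memory image (for ELF or kdump-compressed files, the pages would first have to be read out through something like libkdumpfile). The 4 KiB page size and the NUL-terminated note text are assumptions, and find_vmcoreinfo is a hypothetical helper, not the tool mentioned above.

```python
from typing import Iterator, Tuple

PAGE_SIZE = 4096  # assumption: 4 KiB pages
MARKER = b"OSRELEASE="


def find_vmcoreinfo(image, page_size: int = PAGE_SIZE,
                    max_len: int = 4096) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, text) for every page that begins with OSRELEASE=.

    image can be bytes or an mmap.mmap object (both support slicing and
    find(sub, start, end)). The note text is assumed to end at the first
    NUL byte; otherwise it is cut off after max_len bytes.
    """
    for off in range(0, len(image) - len(MARKER) + 1, page_size):
        # Only page-aligned offsets are checked, which keeps the scan cheap.
        if image[off:off + len(MARKER)] == MARKER:
            end = image.find(b"\0", off, off + max_len)
            if end == -1:
                end = min(off + max_len, len(image))
            yield off, bytes(image[off:end])
```

For a multi-gigabyte image, passing an mmap.mmap opened with access=mmap.ACCESS_READ avoids loading the whole file into memory.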

Supposing you've used such a tool, drgn should have an advanced feature somewhere that lets you supply your own vmcoreinfo note. Crash doesn't offer this exact feature, but it does let you specify architecture and configuration details, such as the KASLR offset, that would normally be found in the note.
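The note itself is plain KEY=value text, which is why details such as the KASLR offset can be recovered from it once the note is found. A minimal parsing sketch, assuming one entry per line and that KERNELOFFSET is written in hex without a 0x prefix (as x86-64 kernels do, as far as I know); parse_vmcoreinfo and kaslr_offset are hypothetical names.

```python
from typing import Dict


def parse_vmcoreinfo(text: str) -> Dict[str, str]:
    """Split vmcoreinfo text into a KEY -> value dict (one KEY=value per line)."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first '=' only; keys like SYMBOL(swapper_pg_dir) stay intact.
        key, sep, value = line.partition("=")
        if sep:  # skip blank or malformed lines
            info[key] = value
    return info


def kaslr_offset(info: Dict[str, str]) -> int:
    """Return KERNELOFFSET as an int (hex, no 0x prefix).

    This sketch treats a missing entry as an offset of 0.
    """
    return int(info.get("KERNELOFFSET", "0"), 16)
```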

The hack I had let you specify the vmcoreinfo note contents in an environment variable, which was a bad API. Spitballing a few APIs here (mostly thinking of the Python-facing API):

  1. Allow set_core_dump() (and friends) to accept an optional buffer which contains the vmcoreinfo. This would raise the question of whether set_kernel() would allow it, since that's functionally the same as set_core_dump("/proc/kcore").
  2. Add an optional parameter to the Program constructor.
  3. Allow the user to specify the file path for a file containing vmcoreinfo, via an environment variable.

Of those, I prefer the optional parameter to the Program constructor. It mirrors the existing platform parameter, and that pairing is useful: when vmcoreinfo is broken, the other fields drgn relies on to determine the platform are typically broken too, so drgn would fail to read the vmcore unless the platform were specified as well.

@osandov, if you have a preferred API, let me know. Feel free to assign this to me (I can't update assignees, labels, etc.). I'll probably submit something within the next month or two when I find time for it.

@Homura2333

Hi, I encountered an issue after running makedumpfile -R new_vmcore < vmcore followed by sudo drgn -c new_vmcore. I received the following error:

drgn 0.0.23+98.g5a362aa.dirty (using Python 3.8.13, elfutils 0.176, with libkdumpfile)
Traceback (most recent call last):
  File "/usr/local/bin/drgn", line 33, in <module>
    sys.exit(load_entry_point('drgn==0.0.23+98.g5a362aa.dirty', 'console_scripts', 'drgn')())
  File "/usr/local/lib/python3.8/site-packages/drgn-0.0.23+98.g5a362aa.dirty-py3.8-linux-x86_64.egg/drgn/cli.py", line 263, in _main
    prog.set_core_dump(args.core)
Exception: kdump_vmcoreinfo_raw: linux.vmcoreinfo.raw is not set

Is this error related to the issue you mentioned?

@brenns10
Contributor Author

brenns10 commented Sep 1, 2023

Yes, this error is at least related to this issue! It comes from here in the code.

Essentially, the vmcoreinfo note contains metadata that drgn needs in order to interpret the vmcore, but your vmcore does not have a vmcoreinfo note (or at least, not one that is easily accessible). A resolution to this issue might allow you to open this vmcore in drgn.

A few questions, if you don't mind helping us out:

  1. How did you generate this vmcore? I'm guessing it was with Qemu or some other hypervisor?
  2. What is the kernel version that was running in the vmcore? Was it any particular distribution kernel?
  3. Would you mind trying out the program below to see if it can extract the necessary data?

Instructions for compiling and using the program (I assume you have the libkdumpfile headers available, since it looks like you built drgn from source):

git clone https://github.com/brenns10/kernel_stuff/
cd kernel_stuff/vmcoreinfo
gcc -g -o dumpphys{,.c} -lkdumpfile
./dumpphys -i -c PATH/TO/VMCORE -o vmcoreinfo.txt

Let me know the output and provide the vmcoreinfo.txt if you can.

@Homura2333

Thank you so much for your assistance! I truly appreciate your help.
Regarding the issue, the vmcore was generated by a lightweight virtual machine hypervisor. Both the hypervisor and the kernel are developed internally and are closed-source, so unfortunately I cannot provide more specific details.
I followed your instructions. Here are the results:

kdump_read: Wrong page size: 0

Please let me know if there is anything else I can provide or if there are any further steps you recommend.

@brenns10
Contributor Author

brenns10 commented Sep 4, 2023

kdump_read: Wrong page size: 0

That's an error I've never seen before. The error is coming from libkdumpfile. It indicates that there's a page with encoded size of 0. That's very odd, and libkdumpfile believes that it indicates a corrupted file. Corrupted could mean a few different things: (1) damaged data, which is unlikely but possible, or (2) the hypervisor generated the diskdump format incorrectly, or (3) the hypervisor generated correct data but libkdumpfile hit a bug. I'd put my money on (2), but (3) does happen rarely, and (1) is quite unlikely.

In any case, I've pushed some changes to that tool in the kernel_stuff repository. Please pull, rebuild, and add the -p option to the ./dumpphys command. With -p, the tool continues reading after the first kdump_read() failure, so we can see whether the whole file is broken or just some of the data.
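To illustrate what "keep reading after a failure" buys you, here is a sketch that scans a table of kdump-compressed page descriptors and collects every entry with an implausible size instead of stopping at the first one. The 24-byte little-endian layout (offset, size, flags, page_flags) is assumed from makedumpfile's diskdump format, flagging size > page_size is an extra assumption beyond the size-0 case above, and check_page_descs is a hypothetical helper, not part of dumpphys or libkdumpfile.

```python
import struct
from typing import List, Tuple

# Assumed page descriptor layout (64-bit, little-endian):
#   off_t offset; unsigned int size; unsigned int flags;
#   unsigned long long page_flags;
PAGE_DESC = struct.Struct("<qIIQ")


def check_page_descs(table: bytes, page_size: int = 4096) -> Tuple[int, List[int]]:
    """Return (number of descriptors checked, indices with a bad size).

    Unlike a reader that aborts on the first bad page, this keeps going,
    so you can tell whether the whole file is damaged or only some pages.
    """
    bad: List[int] = []
    count = len(table) // PAGE_DESC.size
    for i in range(count):
        _offset, size, _flags, _page_flags = PAGE_DESC.unpack_from(
            table, i * PAGE_DESC.size)
        # A stored page should be neither empty nor larger than a raw page.
        if size == 0 or size > page_size:
            bad.append(i)
    return count, bad
```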

Do you have any tool that is actually capable of reading this file? Knowing that would help me understand whether this is an issue specific to drgn, or whether the entire vmcore is problematic.

@iostapyshyn
Contributor

iostapyshyn commented May 24, 2024

@brenns10

Do you think it would be possible to attach drgn to a raw memory dump of a virtual machine (e.g., QEMU's memory-backend-file) by providing a vmcoreinfo? I'm trying to halt and debug a QEMU guest periodically, and dumping via dump-guest-memory is overkill: the resulting vmcore takes a lot of disk space and time to write, only to be thrown away immediately.

I constructed a custom drgn Program with add_memory_segment, but I seem to be missing virtual-to-physical address translation and the other symbols specified by vmcoreinfo.
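One way to wire a raw RAM file into a custom Program is a file-backed read callback. drgn's add_memory_segment() calls read_fn(address, count, offset, physical), where offset is relative to the start of the segment; the sketch below assumes that signature and a Unix host (os.pread). file_read_fn is a hypothetical helper, and it keeps the file descriptor open for the life of the program.

```python
import os
from typing import Callable

# Signature drgn uses for add_memory_segment() read callbacks.
ReadFn = Callable[[int, int, int, bool], bytes]


def file_read_fn(path: str, file_base: int = 0) -> ReadFn:
    """Build a read_fn that serves one memory segment from a raw RAM file.

    file_base is the file offset that corresponds to the start of the segment.
    """
    fd = os.open(path, os.O_RDONLY)  # intentionally left open

    def read_fn(address: int, count: int, offset: int, physical: bool) -> bytes:
        data = os.pread(fd, count, file_base + offset)
        if len(data) != count:
            raise OSError(
                f"short read at {address:#x}: got {len(data)} of {count} bytes")
        return data

    return read_fn
```

With drgn installed, registering it would look roughly like prog.add_memory_segment(0, os.path.getsize(path), file_read_fn(path), True). This only provides physical reads; kernel virtual addresses still need to be translated through the guest's page tables.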

@brenns10
Contributor Author

brenns10 commented May 24, 2024

Hi @iostapyshyn

Providing vmcoreinfo to drgn should be enough to do this, at least in theory. In practice, though, there's not yet a way to make a custom Program have the IS_LINUX_KERNEL flag, and all the associated linux-specific behavior. (Note that the API doesn't have any place you can set flags.) The only way that flag gets applied is when loading a core dump (or /proc/kcore) if drgn can detect it.

When loading the core dump, if a kernel dump is detected, drgn adds a special memory reader that does page table translation. If you never load a core dump because you're adding a custom memory reader instead, drgn will never add those memory readers, nor will it set IS_LINUX_KERNEL.
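To make "page table translation" concrete, here is a sketch of the walk such a reader performs on x86-64 with 4-level paging (5-level paging is not handled). read_u64 is an assumed callback returning one little-endian u64 from physical memory, pgd_phys would be something like the physical address of swapper_pg_dir, and translate is a hypothetical name; drgn's own implementation is written in C and covers more cases.

```python
from typing import Callable

ReadU64 = Callable[[int], int]  # reads one u64 at a physical address

PRESENT = 1 << 0
PAGE_SIZE_BIT = 1 << 7             # PS: 1 GiB page in a PDPTE, 2 MiB page in a PDE
ADDR_MASK = 0x000F_FFFF_FFFF_F000  # physical address bits 12..51


def translate(read_u64: ReadU64, pgd_phys: int, vaddr: int) -> int:
    """Translate vaddr with x86-64 4-level paging rooted at pgd_phys."""
    table = pgd_phys
    # (index shift, physical-address mask if this level can map a large page)
    levels = [
        (39, None),                    # PML4
        (30, 0x000F_FFFF_C000_0000),   # PDPT (1 GiB pages)
        (21, 0x000F_FFFF_FFE0_0000),   # PD (2 MiB pages)
        (12, None),                    # PT
    ]
    for shift, large_mask in levels:
        index = (vaddr >> shift) & 0x1FF
        entry = read_u64(table + index * 8)
        if not entry & PRESENT:
            raise LookupError(f"{vaddr:#x} not mapped (level shift {shift})")
        if large_mask is not None and entry & PAGE_SIZE_BIT:
            # Large page: remaining low bits of vaddr are the page offset.
            return (entry & large_mask) | (vaddr & ((1 << shift) - 1))
        table = entry & ADDR_MASK  # next table, or the 4 KiB frame at the last level
    return table | (vaddr & 0xFFF)
```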

The branches I've worked on carry on with that assumption. They do allow you to give drgn some amount of information it would need, but they wouldn't (yet) get the page table translation working with a custom memory reader.

I have a branch called vmcoreinfo, which I went ahead and rebased/updated. As I said, it won't work for your particular use case, but I think I need to start getting it merged before we can really start thinking about solutions to the other problems I've outlined. It's just been on the back burner for me.

@iostapyshyn
Contributor

iostapyshyn commented May 27, 2024

@brenns10

Thanks for the answer, it all makes perfect sense!

In practice, though, there's not yet a way to make a custom Program have the IS_LINUX_KERNEL flag, and all the associated linux-specific behavior. (Note that the API doesn't have any place you can set flags.) The only way that flag gets applied is when loading a core dump (or /proc/kcore) if drgn can detect it.

I have implemented a Program.set_image() function in my branch here. It loads a raw memory image, reads VMCOREINFO from an environment variable, and sets IS_LINUX_KERNEL. Here's a usage example:

import drgn

platform = drgn.Platform(drgn.Architecture.X86_64,
                         drgn.PlatformFlags.IS_LITTLE_ENDIAN | drgn.PlatformFlags.IS_64_BIT)
prog = drgn.Program(platform)
# set_image() exists only in the branch above; VMCOREINFO comes from the environment
prog.set_image("/dev/shm/qemu-ram")

It works wonderfully, but isn't as pretty internally as I hoped due to the following QEMU quirk:

When using a file as a QEMU memory backend (via -object memory-backend-file,id=mem,size=16G,mem-path=/dev/shm/qemu-ram,share=on -machine memory-backend=mem), the APIC & HPET region (0xc0000000-0x100000000, just below the 4 GiB boundary) is skipped in the file, so addresses above 4 GiB no longer match their file offsets. Applying a simple offset when reading above 0x100000000 is enough, but this is highly architecture-dependent and not general enough (to implement #172, for example).
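The offset workaround described above can be expressed as a small table of RAM chunks. This sketch hardcodes the 0xc0000000 hole start and the 4 GiB resume address from this particular QEMU setup; both depend on the machine type and RAM size, and ram_segments and phys_to_file_offset are hypothetical helpers.

```python
from typing import List, Tuple

# Assumed layout for this QEMU configuration (machine- and RAM-size-specific).
LOW_RAM_END = 0xC000_0000       # low RAM stops where the MMIO hole starts
HIGH_RAM_START = 0x1_0000_0000  # RAM resumes at 4 GiB


def ram_segments(file_size: int) -> List[Tuple[int, int, int]]:
    """Return (guest_phys_start, size, file_offset) for each RAM chunk."""
    low = min(file_size, LOW_RAM_END)
    segments = [(0, low, 0)]
    if file_size > LOW_RAM_END:
        # The file has no bytes for the hole, so high RAM starts right after low RAM.
        segments.append((HIGH_RAM_START, file_size - LOW_RAM_END, LOW_RAM_END))
    return segments


def phys_to_file_offset(paddr: int, file_size: int) -> int:
    """Map a guest physical address to its offset in the RAM file."""
    for start, size, file_off in ram_segments(file_size):
        if start <= paddr < start + size:
            return file_off + (paddr - start)
    raise LookupError(f"{paddr:#x} is not backed by the RAM file")
```

Each (guest_phys_start, size, file_offset) tuple could also be registered as its own physical memory segment, which keeps the hole out of the read path instead of special-casing it inside one callback.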

I also had to fix the address translation issue that you mention in #396; I'll comment there on that shortly.

@brenns10
Contributor Author

prog.set_image("/dev/shm/qemu-ram")

That is really cool. It's quite exciting that this is possible with so few code changes on the drgn side. I agree that it may not be general enough; the address mismatch issue sounds pretty architecture-specific. But I think this reveals two interesting items that could be useful for custom programs.

  1. It would be nice to be able to add drgn_read_memory_file via Program.add_memory_segment, so we could have custom file-backed programs without the overhead of a Python read_fn and without needing to write custom C.
  2. It would be nice to have some sort of Program.set_linux_kernel() which could use the previously established memory segments and vmcoreinfo to set the IS_LINUX_KERNEL flag and associate the Linux kernel read_memory_via_pgtable reader.

With these two things, my vmcoreinfo changes, and a fix for address translation with swapper_pg_dir like what you have, your program could be created entirely in Python. What's more, this would probably open the door for other interesting Python-based Linux backends that could be used even before we have full gdbstub support.

@brenns10
Contributor Author

Closed in #396!
