
Excessive memory consumption #90

Closed · pkgw opened this issue Oct 4, 2022 · 12 comments · Fixed by #173

Comments

@pkgw (Owner) commented Oct 4, 2022

As reported in the discussion of #66, it seems that this program sometimes eats up an amount of memory that is totally out of line with what it should be doing. E.g., "Analyzing a 123MB binary OOMs my 24GB RAM machine."

I don't use this tool myself anymore and unfortunately I'm not in a position to spend time on it, but maybe the proposal in #47 would help with addressing this. The approach taken by the code for this program is about as straightforward as it can get, so I feel like there must be a problem with one of the underlying libraries (or their Rust bindings).

CC @rowanworth @Ristovski @oldherl

@HanabishiRecca (Contributor) commented Oct 21, 2023

I managed to reduce the memory consumption dramatically by splitting the section data into chunks of 1 MiB.

E.g., currently I am not able to analyze an Electron binary of ~158 MiB; the process gets killed by the OOM killer on my 32 GB RAM machine.

Doing something like

for chunk in data.chunks(1024 * 1024) {
    let insns = cs
        .disasm_all(chunk, 0)
        .expect("couldn't disassemble section");
    // ... scan `insns` for instruction groups as before ...
}

reduces memory consumption down to ~500 MiB, and the analysis finishes successfully.

A fairly dumb and straightforward solution.

The only downside I can think of: some instructions may fall on a chunk boundary and be split in half, which means they will be skipped as invalid.
That seems quite unlikely though, and since there is usually more than one instruction of a given kind, I don't see any difference in practice.

@pkgw (Owner) commented Oct 21, 2023

I'm still quite surprised about this memory consumption issue, but if a chunking approach helps, then great!

If you overlap the chunks by a little bit, would that avoid the pitfall of missing instructions that fall on a boundary? I suppose there is then a possibility that an incomplete instruction will be mis-parsed?

Alternatively, maybe there's a way to figure out the byte offset and size of the final decoded instruction, and start the next chunk at a known boundary?
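
Something like this, perhaps (a completely untested sketch, reusing the `data` and `cs` names from the snippet above):

let mut offset = 0;
while offset < data.len() {
    let end = data.len().min(offset + 1024 * 1024);
    let insns = cs
        .disasm_all(&data[offset..end], 0)
        .expect("couldn't disassemble section");
    let Some(last) = insns.iter().last() else {
        break; // nothing decoded at all; bail out
    };
    // ... process `insns` as before ...

    // address() is relative to the 0 we passed to disasm_all, so this
    // is the byte offset just past the last decoded instruction,
    // i.e. a known instruction boundary to restart the next chunk from.
    offset += last.address() as usize + last.bytes().len();
}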

@HanabishiRecca (Contributor)

> I'm still quite surprised about this memory consumption issue

Well, detail mode stores a large amount of information for every instruction. Disabling it reduces memory consumption significantly, but then we obviously lose the ability to identify instruction groups.
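
For reference, detail mode is the capstone-rs setting that makes the group information available in the first place:

// Group info requires detail mode; turning it off is what saves the
// memory, at the cost of insn_detail() no longer being usable.
cs.set_detail(true).expect("couldn't enable detail mode");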

> If you overlap the chunks by a little bit, would that avoid the pitfall of missing instructions that fall on a boundary? I suppose there is then a possibility that an incomplete instruction will be mis-parsed?

Maybe. When shrinking down to 4 KiB chunks, I am able to see some discrepancies.

> Alternatively, maybe there's a way to figure out the byte offset and size of the final decoded instruction, and start the next chunk at a known boundary?

I think it is technically possible.
But in that case I would rather look at cs.disasm_count. I don't know why the capstone-rs API is so impractical, but maybe we can do something with it.

@HanabishiRecca (Contributor) commented Oct 21, 2023

Actually yes, disasm_count seems to work very well. The solution I came up with is

let data = sect.data().expect("couldn't get section data");
let mut offset = 0;

// `cs` is the Capstone handle, `seen_groups` a HashSet of group ids,
// and `describe_group` a lookup helper from the surrounding program.
loop {
    let rest = &data[offset..];
    if rest.is_empty() {
        break;
    }

    // Decode exactly one instruction from the remaining bytes.
    let insns = cs
        .disasm_count(rest, 0, 1)
        .expect("couldn't disassemble section");

    // Capstone stops at a byte it cannot decode, returning zero
    // instructions; skip one byte so `offset` cannot get stuck.
    if insns.is_empty() {
        offset += 1;
        continue;
    }

    for insn in insns.iter() {
        offset += insn.bytes().len();

        let Ok(detail) = cs.insn_detail(insn) else {
            continue;
        };

        // Report each instruction group the first time it is seen.
        for group_code in detail.groups() {
            if seen_groups.insert(group_code.0) {
                if let Some(mnemonic) = insn.mnemonic() {
                    if let Some(desc) = describe_group(&group_code.0) {
                        println!("{} ({})", desc, mnemonic);
                    }
                }
            }
        }
    }
}

Reading instructions one by one works like a charm.
Memory consumption drops even lower than with the chunked approach (< 100 MiB), and it is also significantly faster (~9 secs for one-by-one vs ~28 secs for chunks).

In fact it is generally 3x faster than the current 0.6.1 release on my machine, even for small binaries. disasm_all seems to be inefficient for some reason.

@pkgw (Owner) commented Oct 21, 2023

I am also very surprised that going one-by-one is faster, but I'll take it! Maybe there are arrays that are realloc'ed with inefficient copying around, or something.

It might be a little more helpful to break when disasm_count() returns an error, rather than panicking (something that this program should be better at in general).

@HanabishiRecca (Contributor)

> Maybe there are arrays that are realloc'ed with inefficient copying around, or something.

I guess so, because performance only degrades as the number of instructions requested via disasm_count() increases. That is counterintuitive, since batch operations are usually cheaper.

> It might be a little more helpful to break when disasm_count() returns an error

Maybe, but I don't really know in which situations it can fail. Invalid instructions are simply skipped, as we know.
It is also tricky because in that case we don't know how many bytes to advance offset by. Maybe a simple offset += 1 would suffice, though; I don't really know.
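
Something like this, perhaps (a hypothetical replacement for the expect() in the loop above):

// Bail out of this section on a hard disassembler error instead of
// panicking; bumping `offset` by 1 to resync would be the alternative.
let insns = match cs.disasm_count(rest, 0, 1) {
    Ok(insns) => insns,
    Err(e) => {
        eprintln!("disassembly failed at offset {offset}: {e}");
        break;
    }
};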

@HanabishiRecca (Contributor)

To summarize, I was unable to find an explanation of when disasm_count() can fail. I assume it only happens when the underlying C library cannot allocate memory or something similar, which means we should exit the app in that case anyway.

So I am going to push the PR, as the current implementation seems good according to my testing.

@HanabishiRecca (Contributor)

Btw, there is cs_disasm_iter in the original C library, which does effectively what we want, but it is not implemented by capstone-rs: capstone-rust/capstone-rs#1

It may be worth trying to call it directly via capstone-sys; that could result in another significant performance uplift.

@HanabishiRecca (Contributor) commented Oct 22, 2023

Well, my test shows that cs_disasm_iter is only slightly faster: roughly a 9 sec -> 8 sec improvement for a big binary.
I don't think such a marginal speedup is worth it right now, as it introduces a lot of new complexity and unsafety. (Here you can see how horrendous it is.)

So for now I will go with the current PR.

@pkgw (Owner) commented Oct 22, 2023

Thank you for investigating and documenting your findings!

pkgw closed this as completed in #173 on Oct 22, 2023
@HanabishiRecca (Contributor) commented Oct 22, 2023

Btw, I noticed that simply updating dependencies to the latest versions improves the performance a bit.
So consider doing cargo update before the next release. 👍

@HanabishiRecca (Contributor)

I posted too late 🙃
