
Large number of page faults occurring in fn rav1d_submit_frame #1358

Open
ivanloz opened this issue Sep 13, 2024 · 6 comments

ivanloz commented Sep 13, 2024

(Copied from #1294 (comment) -- @kkysen for visibility, thanks!)

I've been looking into what the remaining sources of overhead might be and wanted to chime in with some of my findings.

One major discrepancy I noticed when running benchmarks between dav1d and rav1d was an order-of-magnitude difference in the number of madvise calls and page faults. I also saw more context switches in rav1d than in dav1d (likely related to the page faults?). This may explain at least some of the performance difference seen.

Digging into this, I found that the majority (~82%) of the page faults in rav1d come from rav1d::src::decode::rav1d_submit_frame. Specifically, the stack trace points to:

<alloc::boxed::Box<[rav1d::src::refmvs::RefMvsTemporalBlock]> as core::iter::traits::collect::FromIterator<rav1d::src::refmvs::RefMvsTemporalBlock>>::from_iter::<core::iter::adapters::map::Map<core::ops::range::Range<usize>, rav1d::src::decode::rav1d_submit_frame::{closure#1}>>

I believe that corresponds to this closure:

rav1d/src/decode.rs

Lines 5223 to 5227 in 7d72409

f.mvs = Some(
    (0..f.sb128h as usize * 16 * (f.b4_stride >> 1) as usize)
        .map(|_| Default::default())
        .collect(),
);

This is the equivalent operation in dav1d:

rav1d/src/decode.c

Lines 3623 to 3624 in 7d72409

f->mvs_ref = dav1d_ref_create_using_pool(c->refmvs_pool,
    sizeof(*f->mvs) * f->sb128h * 16 * (f->b4_stride >> 1));

Here dav1d_submit_frame allocates using dav1d_ref_create_using_pool, which calls into dav1d_mem_pool_pop and allocates from pooled memory (initialized in dav1d_mem_pool_init). This likely reduces the number of allocator calls.
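
For illustration, the pooling idea looks roughly like the minimal Rust sketch below (not dav1d's actual dav1d_mem_pool implementation): buffers go back to the pool when released and are handed out again on the next pop, so steady-state decoding stops hitting the system allocator, and faulting in fresh pages, for every frame.

use std::sync::Mutex;

// Minimal sketch of a memory pool; dav1d's real pool is structured differently.
struct MemPool {
    free: Mutex<Vec<Box<[u8]>>>,
}

impl MemPool {
    // Reuse a previously released buffer if one is big enough, otherwise fall
    // back to a fresh allocation. Note that a reused buffer is handed back
    // as-is, with its old contents, not re-zeroed.
    fn pop(&self, size: usize) -> Box<[u8]> {
        let mut free = self.free.lock().unwrap();
        match free.pop() {
            Some(buf) if buf.len() >= size => buf,
            _ => vec![0u8; size].into_boxed_slice(),
        }
    }

    // Return a buffer to the pool instead of freeing it.
    fn push(&self, buf: Box<[u8]>) {
        self.free.lock().unwrap().push(buf);
    }
}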

The switch away from pooled memory in rav1d looks to have been introduced as part of 6420e5a, PR #984.

@kkysen kkysen changed the title Large number of page faults occurring in rav1d_submit_frame Large number of page faults occurring in fn rav1d_submit_frame Sep 17, 2024

CrazyboyQCD commented Sep 19, 2024

We could use other allocators like jemalloc and mimalloc for comparison.
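
For example, a low-effort way to try that (a sketch, assuming the mimalloc crate, or tikv-jemallocator for jemalloc, is added as a dependency of the benchmark build) is to override the global allocator and rerun the benchmarks:

use mimalloc::MiMalloc;

// Route every heap allocation through mimalloc instead of the system
// allocator; tikv_jemallocator::Jemalloc can be swapped in the same way.
#[global_allocator]
static GLOBAL: MiMalloc = MiMalloc;

fn main() {
    // Run the usual decode benchmark here; only the allocator changes.
}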


kkysen commented Sep 19, 2024

It's possible that if we just changed this to use an explicitly zeroed allocation, it will be optimized enough. If the performance of that is good, that's much easier than pooling it again and managing the lifetime for that.


kkysen commented Sep 19, 2024

The optimization Rust does for zeroed allocations is only done for types that implement IsZero, which is only implemented for the primitive types and for arrays of them. Thus, doing it for custom types won't work out of the box. We could, however, allocate zeroed u8s (or maybe a larger alignment primitive for more aligned types), align it, and then use FromZeroes to transmute it. There is also the issue of how we can initialize an Arc<[T]> this way, but I think we can figure that out.

@ivanloz, if you want to see if doing this for f.mvs fixed the performance issue you measured on your machine, that'd be great. We can just try a quick and dirty solution to see if it's worth pursuing first before trying to figure out a good API for it and doing it for all of our Vecs, AlignedVecs, Box<[_]>s, and Arc<[_]>s that are zero-initialized.
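
A rough sketch of that manual route (illustrative only, not existing rav1d code, and the helper name is made up): ask the allocator for already-zeroed, correctly aligned memory and treat it as a slice of T, with the FromZeroes bound justifying that the all-zeroes bit pattern is a valid T.

use std::alloc::{alloc_zeroed, handle_alloc_error, Layout};
use zerocopy::FromZeroes;

fn new_zeroed_box_slice<T: FromZeroes>(len: usize) -> Box<[T]> {
    assert!(std::mem::size_of::<T>() != 0, "sketch does not handle ZSTs");
    if len == 0 {
        return Vec::new().into_boxed_slice();
    }
    let layout = Layout::array::<T>(len).unwrap();
    unsafe {
        let ptr = alloc_zeroed(layout).cast::<T>();
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        // alloc_zeroed hands back zeroed memory (for large requests the kernel
        // can supply lazily-zeroed pages), so there is no per-element
        // initialization loop like the map(Default::default).collect() above.
        Box::from_raw(std::ptr::slice_from_raw_parts_mut(ptr, len))
    }
}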


kkysen commented Sep 19, 2024

zerocopy (if we enable the "alloc" feature) actually already has FromZeroes::new_{box_slice,vec}_zeroed that do this optimization, though more directly. It doesn't have an Arc one, though, as that's more complex, but it should be enough for testing I think.
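
For the quick test, the allocation behind f.mvs could then look roughly like this (assuming zerocopy 0.7 with the "alloc" and "derive" features, a #[derive(FromZeroes)] on RefMvsTemporalBlock, and a made-up helper name; the real f.mvs wraps the slice in further types, so this is only the shape of the change):

use crate::src::refmvs::RefMvsTemporalBlock; // would need #[derive(FromZeroes)]
use zerocopy::FromZeroes;

fn alloc_mvs_zeroed(sb128h: usize, b4_stride: usize) -> Box<[RefMvsTemporalBlock]> {
    // Goes through alloc_zeroed instead of the per-element Default/collect
    // loop that the page-fault profile pointed at.
    RefMvsTemporalBlock::new_box_slice_zeroed(sb128h * 16 * (b4_stride >> 1))
}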


ivanloz commented Sep 19, 2024

While I've got experience tracking down performance issues, I'm a bit of a novice when it comes to actually writing Rust. This sounds like it should be a relatively straightforward change to put together, but I haven't had much luck implementing it.

If you (or someone else) can provide the patch I'd be happy to test it out and report the results.


kkysen commented Sep 19, 2024

> While I've got experience tracking down performance issues, I'm a bit of a novice when it comes to actually writing Rust. This sounds like it should be a relatively straightforward change to put together, but I haven't had much luck implementing it.
>
> If you (or someone else) can provide the patch I'd be happy to test it out and report the results.

Sure, I'll work on it when I get some more time. If there's not much of a rush, we'll also probably want to wait for Arc::new_zeroed_slice (rust-lang/rust#129396) to stabilize. The similar new_uninit/new_uninit_slice functions were just recently stabilized (rust-lang/rust#63291), so hopefully there is still momentum to stabilize the zeroed versions sooner. Otherwise, we'll have to use Arc<Box<[T]>>s instead of Arc<[T]>s (in release mode) to get this zeroed-initialization optimization, as there's no other way to (safely/stably) create a zeroed Arc.
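
As a sketch of that interim workaround (assuming zerocopy's FromZeroes and a T that implements it; the helper is hypothetical): wrap the zeroed Box in an Arc rather than converting it into an Arc<[T]>, since Arc::from(Box<[T]>) copies the elements into the Arc's own allocation and would touch every page again.

use std::sync::Arc;
use zerocopy::FromZeroes;

fn shared_zeroed<T: FromZeroes>(len: usize) -> Arc<Box<[T]>> {
    // Arc::new only moves the Box (a pointer), so the zeroed slice itself is
    // never copied or written here. Once Arc::new_zeroed_slice
    // (rust-lang/rust#129396) stabilizes, this could return an Arc<[T]> directly.
    Arc::new(T::new_box_slice_zeroed(len))
}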

@kkysen kkysen self-assigned this Sep 19, 2024