
Profiling Ui - Round 1 #31

Open
bvssvni opened this issue Nov 2, 2015 · 15 comments

@bvssvni
Member

bvssvni commented Nov 2, 2015

Testing to see if we can make some performance improvements in Conrod.

Rendering 21 buttons. Profiling using Instruments on OSX.

[Screenshot: 2015-11-02 at 21.38.02]

With UI:

[Screenshot: 2015-11-02 at 21.38.13]

Without UI (the spikes are when turning UI on to start/stop profiling):

[Screenshot: 2015-11-02 at 21.38.26]

@bvssvni
Member Author

bvssvni commented Nov 2, 2015

I also ran a benchmark on the texture_swap example to get an estimate of how fast it could be. On my machine, one textured rectangle should take approximately 0.000016 seconds. If we include the frame and extra stuff, let's say 0.00005 seconds per button, then rendering 21 buttons should take no more than 0.00105 seconds, about 1 millisecond in the worst case.
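The worst-case budget above is simple arithmetic. A tiny sketch of it, using the 0.00005 s per-button allowance from this comment (the helper name `worst_case_seconds` is illustrative only, not part of Conrod or Piston):

```rust
/// Worst-case time for `buttons` buttons at a fixed per-button cost (seconds).
fn worst_case_seconds(per_button_s: f64, buttons: u32) -> f64 {
    per_button_s * buttons as f64
}

fn main() {
    // ~0.000016 s per textured rectangle, rounded up to 0.00005 s per button
    // to cover the frame and other extras (figures from the comment above).
    let per_button_s = 0.00005;
    let total = worst_case_seconds(per_button_s, 21);
    println!("21 buttons: {:.5} s (about {:.2} ms)", total, total * 1e3);
}
```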

Need an estimate of how much time Conrod spends per button.

@bvssvni
Member Author

bvssvni commented Nov 2, 2015

Notice: This was tested in debug mode, so the result is invalid.

Running in bench mode for 2000 frames (total time in seconds):

21 buttons: 20.701
11 buttons: 13.141
1 button: 5.600

(20.701 - 5.600)/20/2000 = 0.000377525
(13.141 - 5.600)/10/2000 = 0.00037705

This is 7.5 times slower than it could be.
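The per-button figures above treat the 1-button run as fixed overhead, then divide the remaining time by the number of extra buttons and by the frame count. A small sketch of that calculation with a hypothetical helper `per_button_seconds`, fed with the debug-mode numbers from this comment:

```rust
/// Marginal cost of one extra button per frame, from two total-time runs:
/// (total with N buttons - total with 1 button) / (N - 1) / frames.
fn per_button_seconds(total_n: f64, total_1: f64, extra_buttons: u32, frames: u32) -> f64 {
    (total_n - total_1) / extra_buttons as f64 / frames as f64
}

fn main() {
    let frames = 2000;
    let baseline = 5.600; // total seconds with 1 button (debug build)
    let a = per_button_seconds(20.701, baseline, 20, frames); // 21 buttons
    let b = per_button_seconds(13.141, baseline, 10, frames); // 11 buttons
    // Per-button allowance estimated from texture_swap earlier in the thread.
    let budget = 0.00005;
    println!("{:.9} s, {:.9} s per button; {:.2}x the budget", a, b, a / budget);
}
```

Because the inputs come from a debug build, the result carries the same caveat as the note at the top of this comment.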

@mitchmindtree
Copy link

Hmmm, does the profiler's call tree give you a percentage distribution of where the most time is being spent?

Last time I checked, I think the drawing in elmesque was a big bottleneck for conrod, so maybe these are worth checking out:

  • draw_element
  • draw_form

@bvssvni
Copy link
Member Author

bvssvni commented Nov 4, 2015

@mitchmindtree I suspect that the biggest bottleneck is not the drawing but the construction of the elements. Construction makes use of allocation, which is a slow operation in a rendering loop.

I have an idea: since most of the interface consists of rectangles, perhaps we can find a solution where rectangle shapes are cheap?
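One possible reading of "cheap rectangles", sketched under assumptions: instead of building boxed, recursive Form/Element values every frame, widgets push plain-value primitives into a flat list that is cleared and reused. `Primitive` and `DrawList` are hypothetical names, not Conrod's or elmesque's API. After the first frame the `Vec` already has enough capacity, so adding the same rectangles again normally does not allocate.

```rust
/// Hypothetical draw primitive, stored by value (no Box per shape).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Primitive {
    Rect { x: f64, y: f64, w: f64, h: f64, color: [f32; 4] },
    // Text, images, etc. would be further variants.
}

/// Hypothetical per-frame list of primitives, reused across frames.
struct DrawList {
    prims: Vec<Primitive>,
}

impl DrawList {
    fn new() -> Self {
        DrawList { prims: Vec::new() }
    }

    /// Drop last frame's primitives but keep the allocated capacity.
    fn begin_frame(&mut self) {
        self.prims.clear();
    }

    fn rect(&mut self, x: f64, y: f64, w: f64, h: f64, color: [f32; 4]) {
        self.prims.push(Primitive::Rect { x, y, w, h, color });
    }
}

fn main() {
    let mut list = DrawList::new();
    let blue = [0.0, 0.0, 1.0, 1.0];
    for _frame in 0..3 {
        list.begin_frame();
        // 21 buttons of 60x30 stacked vertically, as in the benchmark.
        for i in 0..21 {
            list.rect(0.0, i as f64 * 30.0, 60.0, 30.0, blue);
        }
    }
    println!("{} primitives, capacity {}", list.prims.len(), list.prims.capacity());
}
```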

@bvssvni
Copy link
Member Author

bvssvni commented Nov 4, 2015

Btw, I don't want to draw conclusions at this point. There could be something in the drawing that makes it slow. However, the profiler shows _platform_memmove at the top, which suggests something to do with memory.

@mitchmindtree

Ahh I see! I always worried that all the boxing elmesque uses in its recursive Form and Element data structures might be an issue in this way 😟

I've been thinking about this for a while too (making layout cheaper). All of the unnecessary boxing that occurs happens within the Widget::draw implementations due to the elmesque Form and Element related functions. Widget::draw is basically just a way for a widget to produce a description of how it is to be drawn, but without actually drawing the widget (that happens later when Ui::draw or Ui::draw_if_changed is called). I've been thinking of two options for making the graphics "description" and layout stuff cheaper:

1. Re-think how elmesque's recursive Form and Element data structures work internally.

It might be nice to try this first as it might be a bit easier/quicker than option 2? However I still haven't been able to think of a more efficient way to do this that doesn't involve changing the elm-style purely functional API or adding some hidden global state graph or something. Maybe a solution could be to use carboxyl's FRP API? I'm not certain if this would be a benefit though without also re-working conrod to use it, which would be a pretty huge/unnecessary process.

2. Create a conrod specific graphics description/layout system inspired by elmesque.

This would be quite the breaking change, however there could be a number of benefits to this approach:

  • The design would not be restricted to elmesque's aims to stay representative of the elm-lang graphics modules.
  • We could remove the confusing distinction between the Form and Element types and consolidate them under one Form type for example.
  • It could be easier for a Widget to re-use and update its existing graphics description/layout (elmesque doesn't provide a way to do this).
  • It would be easier to provide a simple conrod backend API (see conrod#576, "Conrod should expose (or at least re-export) everything a user needs to use their own graphics and event backends"), as we could easily make the rendering of our custom system generic over our own backend (this might be trickier or not as nice using elmesque).
  • One less dependency - slightly easier to maintain.

I can picture this being done as a "graphics layout tree", where rather than building the tree using boxed recursion, we provide a little tree data structure (a wrapper around something like rose_tree) to each widget which only gets allocated once and can be re-used every Widget::draw. The API could be very similar to what it is now, just with the tree being described in the data structure rather than the type recursion, removing the need for all the elmesque related allocations. We might even be able to do all widgets within a single tree: rather than providing each widget with its own unique tree, we could provide a safe wrapper around the branch of the master tree that represents that widget's graphics layout.
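A minimal sketch of the "graphics layout tree" idea, assuming an index-based arena (a `Vec` of nodes linked by indices) rather than rose_tree's actual API, which isn't shown here. `LayoutTree`, `NodeId`, `Node`, and `Shape` are hypothetical names. Nodes point at each other by index instead of through `Box`, and `reset` keeps the vector's capacity, so a tree rebuilt on every draw stops allocating once it has reached its peak size.

```rust
/// Hypothetical handle to a node in the arena.
type NodeId = usize;

/// Hypothetical graphics description for one node.
#[derive(Clone, Copy, Debug)]
enum Shape {
    Group,
    Rect { w: f64, h: f64 },
}

/// Children are linked by index (first child / next sibling), so adding a
/// node never allocates a per-node Box or Vec.
struct Node {
    shape: Shape,
    first_child: Option<NodeId>,
    last_child: Option<NodeId>,
    next_sibling: Option<NodeId>,
}

impl Node {
    fn leaf(shape: Shape) -> Self {
        Node { shape, first_child: None, last_child: None, next_sibling: None }
    }
}

/// Hypothetical arena-backed tree, allocated once and reused every frame.
struct LayoutTree {
    nodes: Vec<Node>,
}

impl LayoutTree {
    fn new() -> Self {
        LayoutTree { nodes: Vec::new() }
    }

    /// Forget last frame's nodes (keeping capacity) and return a fresh root.
    fn reset(&mut self) -> NodeId {
        self.nodes.clear();
        self.nodes.push(Node::leaf(Shape::Group));
        0
    }

    /// Append `shape` as the last child of `parent` and return its id.
    fn add_child(&mut self, parent: NodeId, shape: Shape) -> NodeId {
        let id = self.nodes.len();
        self.nodes.push(Node::leaf(shape));
        let last = self.nodes[parent].last_child;
        match last {
            Some(prev) => self.nodes[prev].next_sibling = Some(id),
            None => self.nodes[parent].first_child = Some(id),
        }
        self.nodes[parent].last_child = Some(id);
        id
    }

    /// Iterate over the direct children of `parent`, in insertion order.
    fn children(&self, parent: NodeId) -> impl Iterator<Item = NodeId> + '_ {
        let mut next = self.nodes[parent].first_child;
        std::iter::from_fn(move || {
            let cur = next?;
            next = self.nodes[cur].next_sibling;
            Some(cur)
        })
    }
}

fn main() {
    let mut tree = LayoutTree::new();
    // Rebuild the same description for a few frames; after the first frame
    // the node Vec already has enough capacity and does not reallocate.
    for _frame in 0..3 {
        let root = tree.reset();
        for _ in 0..21 {
            let button = tree.add_child(root, Shape::Group);
            tree.add_child(button, Shape::Rect { w: 60.0, h: 30.0 });
        }
    }
    let area: f64 = tree
        .nodes
        .iter()
        .map(|n| match n.shape {
            Shape::Rect { w, h } => w * h,
            Shape::Group => 0.0,
        })
        .sum();
    println!("{} buttons, {} nodes, rect area {}", tree.children(0).count(), tree.nodes.len(), area);
}
```

A single master tree for all widgets, as suggested above, would fit the same layout: each widget would be handed the `NodeId` of its own branch instead of a separate tree.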

Hmmm

I think I've been heavily leaning towards option 2 for a while now - despite being quite a bit of work, it feels like a much more future proof option having something specific to conrod. Lemme know your thoughts or if you have any other ideas! I might have a crack at this tree thing tonight in a separate branch while it's fresh in my mind.

@bvssvni
Member Author

bvssvni commented Nov 15, 2015

This profiling was done in debug mode, so it doesn't represent a reliable benchmark. The previous estimates of the time spent rendering a button are not representative of the time spent in a release build of an application. Conrod is a lot faster in release mode.

Improved optimizations in the Rust compiler might also have affected the results. I forgot to include the version used in this test.

In order to evaluate PistonDevelopers/conrod#626 properly, we need to redo the estimates before Elmesque is removed from Conrod. This gives us a baseline for the performance improvement we can expect from switching to primitives.

Ran again for 2000 frames, but now in release mode, running the built binary directly so that the overhead from Cargo is excluded.

rustc 1.6.0-nightly (5b4986fa5 2015-11-08)

    cargo build --release --example hello_world
    time ./target/release/examples/hello_world

Code changes:

    let mut frames = 0..2000;
    // Benchmark mode: update and render without sleeping between frames.
    for mut e in window.bench_mode(true) {
        // Count render events and stop after 2000 frames.
        if e.render_args().is_some() {
            if frames.next().is_none() { break; }
        }
        ...
        if !capture_cursor {
            ui.handle_event(&e);
            e.draw_2d(|c, g| {
                use conrod::*;

                widget_ids!(REFRESH);

                // First button, anchored to the top left of the window.
                Button::new()
                    .color(color::blue())
                    .top_left()
                    .dimensions(60.0, 30.0)
                    .label("refresh")
                    .react(|| {})
                    .set(REFRESH, &mut ui);

                // 20 more buttons, each placed directly below the previous one.
                for i in 0..20 {
                    Button::new()
                        .color(color::blue())
                        .down(0.0)
                        .dimensions(60.0, 30.0)
                        .label("refresh")
                        .react(|| {})
                        .set(REFRESH + 1 + i, &mut ui);
                }

                ui.draw(c, g);
            });
        }
    }

Ran 3 times using Conrod 0.22.2, discarding the slowest:

0m6.009s
0m6.011s

Ran 3 times using mitchmindtree/conrod@861726a (before Elmesque was removed from Conrod), discarding the slowest:

0m6.400s
0m6.266s

We see that PistonDevelopers/conrod#626 is in the same ballpark, but a little slower. Notice that Elmesque is not removed yet, and the PR is still work-in-progress, so it looks promising.
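One way to put a number on "a little slower": average the kept runs and take the ratio of the two versions. `mean` and `mean_excluding_slowest` are hypothetical helpers. Since the discarded third runs aren't listed above, `main` only averages the kept timings; `mean_excluding_slowest` shows the discard step for when all three runs are at hand.

```rust
/// Arithmetic mean of a non-empty slice.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Average of the runs after discarding the single slowest one.
fn mean_excluding_slowest(runs: &[f64]) -> f64 {
    assert!(runs.len() >= 2, "need at least two runs");
    let slowest = runs.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    (runs.iter().sum::<f64>() - slowest) / (runs.len() - 1) as f64
}

fn main() {
    // Kept runs (seconds for 2000 frames, release build); slowest already dropped.
    let v0222 = mean(&[6.009, 6.011]);
    let pr = mean(&[6.400, 6.266]);
    println!("0.22.2: {:.3} s, 861726a: {:.3} s, ratio {:.3}", v0222, pr, pr / v0222);
}
```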

@bvssvni
Member Author

bvssvni commented Nov 15, 2015

Made a spreadsheet that I will add to the piston-examples repo:

rustc 1.6.0-nightly (5b4986fa5 2015-11-08)

[Screenshot: 2015-11-15 at 20.04.58]

I am getting approximately 15 microseconds in release mode (two runs).

@bvssvni
Member Author

bvssvni commented Nov 15, 2015

In debug mode I get 46 microseconds, which is about 3 times slower than release mode.

[Screenshot: 2015-11-15 at 20.26.21]

@bvssvni
Member Author

bvssvni commented Nov 15, 2015

One weakness of the texture_swap estimate is that performance is sensitive to the size of the textures. I expect it to have approximately the same characteristics across hardware, so that you could calculate the worst case for a texture of a given size.

@bvssvni
Member Author

bvssvni commented Nov 15, 2015

I generalized the spreadsheet for estimating O(N) stuff in Turbine. Here I measure buttons using Conrod 0.22.2 with rustc 1.6.0-nightly (5b4986fa5 2015-11-08):

[Screenshot: 2015-11-15 at 21.52.38]

I get about 69 microseconds per button. You can see the curve bends slightly upward, which is probably why the accuracy of the prediction is only around 85%: the more buttons, the more time it spends per button.

Note that one button is ignored; it becomes part of the background overhead.
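The spreadsheet itself isn't reproduced here, so this is only a sketch of the same kind of O(N) estimate, assuming an ordinary least-squares line through (extra buttons, time) points. Using x = buttons - 1 makes the intercept the one-button background overhead and the slope the time per additional button. The spreadsheet's "accuracy" percentage is not defined in this thread; the sketch reports R² as a stand-in goodness-of-fit measure, which is an assumption and not necessarily what the spreadsheet computes. The sample data in `main` is synthetic, not measured.

```rust
/// Ordinary least-squares fit y = a + b*x; returns (a, b, r_squared).
/// Needs at least two distinct x values.
fn linear_fit(points: &[(f64, f64)]) -> (f64, f64, f64) {
    let n = points.len() as f64;
    let mean_x = points.iter().map(|p| p.0).sum::<f64>() / n;
    let mean_y = points.iter().map(|p| p.1).sum::<f64>() / n;
    let (mut sxx, mut sxy, mut syy) = (0.0, 0.0, 0.0);
    for &(x, y) in points {
        sxx += (x - mean_x) * (x - mean_x);
        sxy += (x - mean_x) * (y - mean_y);
        syy += (y - mean_y) * (y - mean_y);
    }
    let b = sxy / sxx;
    let a = mean_y - b * mean_x;
    // R^2 for a simple linear fit; a constant y is treated as a perfect fit.
    let r2 = if syy == 0.0 { 1.0 } else { (sxy * sxy) / (sxx * syy) };
    (a, b, r2)
}

fn main() {
    // Synthetic (buttons, seconds per frame) samples, NOT measured data.
    let samples: [(u32, f64); 4] = [(1, 0.0020), (6, 0.0024), (11, 0.0028), (21, 0.0036)];
    // The first button is folded into the background overhead: x = extra buttons.
    let points: Vec<(f64, f64)> = samples.iter().map(|&(n, t)| ((n - 1) as f64, t)).collect();
    let (overhead, per_button, r2) = linear_fit(&points);
    println!("overhead {:.6} s, {:.1} us/button, R^2 {:.3}", overhead, per_button * 1e6, r2);
}
```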

@bvssvni
Member Author

bvssvni commented Nov 15, 2015

Here are buttons with Conrod mitchmindtree/conrod@861726a (before Elmesque is removed) on rustc 1.6.0-nightly (5b4986fa5 2015-11-08):

[Screenshot: 2015-11-15 at 22.10.39]

As before, when I measured total time, this is a little slower. It also shows that Conrod spends more time per button as more buttons are added, compared to 0.22.2. An ideal O(N) algorithm would have an accuracy of 100%, but this shows 79%.

This type of estimation could be useful not just for checking how fast something is, but also for seeing whether changes improve algorithmic complexity.
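A complementary check for complexity changes, sketched as an assumption rather than what the spreadsheet does: compute the marginal cost per item between consecutive measurements. For an O(N) cost these stay roughly constant; a steady increase corresponds to the upward bend seen in the plots. `marginal_costs` and `bends_upward` are hypothetical helpers, and the samples in `main` are synthetic.

```rust
/// Marginal seconds per extra item between consecutive (count, seconds)
/// samples. Assumes samples are sorted by strictly increasing count.
fn marginal_costs(samples: &[(u32, f64)]) -> Vec<f64> {
    samples
        .windows(2)
        .map(|w| (w[1].1 - w[0].1) / (w[1].0 - w[0].0) as f64)
        .collect()
}

/// True if each marginal cost exceeds the previous one by more than `tol`,
/// i.e. the cost per item keeps growing (worse than O(N)).
fn bends_upward(costs: &[f64], tol: f64) -> bool {
    costs.len() >= 2 && costs.windows(2).all(|w| w[1] > w[0] + tol)
}

fn main() {
    // Synthetic samples for illustration only, not measured data.
    let samples: [(u32, f64); 3] = [(1, 0.0020), (11, 0.0027), (21, 0.0036)];
    let costs = marginal_costs(&samples);
    println!("marginal costs: {:?}, bends upward: {}", costs, bends_upward(&costs, 1e-9));
}
```

Running it on measurements taken before and after a change would show whether the per-item cost has flattened out.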

@mitchmindtree

Will take a look at this more closely soon, but just thought I'd mention that elmesque hasn't actually been removed in that new PR just yet, and I would expect it to be a bit slower in its current state :) I'll let you know once I've actually removed it and expect things to be faster (y)


@bvssvni
Member Author

bvssvni commented Nov 15, 2015

@mitchmindtree Yeah, I knew that. I'm doing this to test the method so we know what it says. I wrote "before Elmesque is removed" where it is relevant.

@bvssvni
Member Author

bvssvni commented Nov 15, 2015

Measuring buttons in debug mode using Conrod 0.22.2 with rustc 1.6.0-nightly (5b4986fa5 2015-11-08):

[Screenshot: 2015-11-15 at 23.37.50]

About 416 microseconds per button, which is 6 times slower than release mode.

This shows something interesting: the algorithm becomes almost linear in debug mode. I think this is because the overhead inherent in the design drowns in the noise and only becomes significant when the compiler generates optimized machine code. Maybe this indicates that extra allocations don't matter much compared to optimization, which is a bit surprising.
