
Using overdraw optimization by itself #1

Closed
IntellectualKitty opened this issue Mar 25, 2017 · 7 comments


@IntellectualKitty

It would be nice to be able to use the overdraw optimizer by itself both for testing purposes but also because I am using unusual geometry with many concavities, which means that the pixel processing far outweighs the cost of vertex transformation.

@zeux
Owner

zeux commented Apr 29, 2017

Sorry for the long delay; somehow my notifications were disabled.

So the way the overdraw optimizer works is by ordering clusters of triangles with a heuristic based on each cluster's normal and position. It relies on the fact that the vcache optimizer creates clusters that are reasonably local in terms of topology; it then splits each cluster into smaller clusters based on the ACMR threshold, and then runs the sorting.

If you care more about overdraw than about vertex transform cost, it should be sufficient to run the overdraw optimizer (after the vcache optimizer) with a high threshold, e.g. 1.5 or 2; this means the cluster splitting is more aggressive and the overdraw optimizer has more opportunities for reordering.

Given a very high threshold, the clusters will be split into individual triangles. I'm not sure what the overdraw optimization efficiency of this would be - if in your testing using threshold 3 gives good overdraw results then I could add a method that generates single-triangle clusters and immediately runs overdraw optimization, but I am not sure it's a good tradeoff - generally running vcache optimization does not penalize the overdraw too much as long as threshold is above one (1.1 or 1.2 are usually reasonable values).

@IntellectualKitty
Author

From a practical standpoint, I agree with every point that you made. From a purely educational standpoint, I still think it would be beneficial to be able to contrast pure vertex cache optimization with pure overdraw optimization in order to compare the effect of optimization aimed at the GPU's vertex processing vs. its pixel processing.

One of the things that I've been surprised by in my testing is how responsive GPU performance can be to vertex cache and overdraw optimization. However, that responsiveness is often not exposed because it depends on how close to optimal a triangle mesh is to begin with – and that seems to vary a great deal from model to model. I ultimately ended up writing a pair of functions to randomly order vertices and triangles to establish a baseline for performance testing, and I found that it was relatively easy to roughly double performance against that baseline.
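The randomization functions mentioned above aren't shown in the thread (and the author's version was written for Cinder), but the idea can be sketched with the standard library alone: shuffle the index buffer at triangle granularity so each triangle stays intact while all cache locality is destroyed.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <random>
#include <vector>

// Shuffle a triangle list so consecutive triangles share no cache locality,
// producing a worst-case-ish baseline for ACMR/performance measurements.
// Each group of 3 indices (one triangle) is kept intact.
std::vector<uint32_t> shuffleTriangles(const std::vector<uint32_t>& indices, uint32_t seed = 42)
{
    // Group indices into triangles so each triangle moves as a unit.
    std::vector<std::array<uint32_t, 3>> tris;
    for (size_t i = 0; i + 2 < indices.size(); i += 3)
        tris.push_back({indices[i], indices[i + 1], indices[i + 2]});

    std::mt19937 rng(seed); // fixed seed keeps the baseline reproducible
    std::shuffle(tris.begin(), tris.end(), rng);

    std::vector<uint32_t> result;
    result.reserve(indices.size());
    for (const auto& t : tris)
        result.insert(result.end(), t.begin(), t.end());
    return result;
}
```

A vertex-level shuffle (remapping vertex indices through a shuffled permutation) can be layered on top of this to also defeat pre-transform fetch locality.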

I learned a lot in that process, but I still don't feel like I have the complete picture of the effect of geometry optimization, since I haven't been able to compare pure overdraw optimization directly. So, from an educational perspective, it would still be nice to be able to evaluate pure overdraw optimization.

As I mentioned, I found it very illuminating to compare optimized geometry against completely randomized geometry. One of the things that I like about your library and tool is the statistics reporting. If you're interested in adding randomized geometry as a baseline, I'd be happy to share that code with you. I doubt you'll need it, since it's obviously fairly simple to do and the code I have is written for Cinder, but I appreciate you taking the time to get back to me, so I wanted to make the offer.

@zeux
Owner

zeux commented May 17, 2017

I have implemented an overdraw analyzer (sort of WIP, as I might attempt to make it a bit faster, but it works), and I have added a random shuffle mode and a pure-overdraw mode (using the existing function with a special cluster setup) to the demo program. Here are results from two large meshes I was testing on:

Using buddha.obj (549409 vertices, 1087474 triangles)
Original       : ACMR 1.395487 ATVR 2.762161 Overdraw 1.202008 in 0.000000 msec
Random Shuffle : ACMR 2.999890 ATVR 5.937839 Overdraw 1.220027 in 62.500000 msec
Cache          : ACMR 0.637773 ATVR 1.262376 Overdraw 1.238767 in 93.750000 msec
Cache+Overdraw : ACMR 0.716197 ATVR 1.417607 Overdraw 1.092349 in 140.625000 msec
Overdraw Only  : ACMR 2.985486 ATVR 5.909328 Overdraw 1.085637 in 203.125000 msec

Using dragon.obj (438976 vertices, 871306 triangles)
Original       : ACMR 1.528173 ATVR 3.033209 Overdraw 1.226321 in 0.000000 msec
Random Shuffle : ACMR 2.999837 ATVR 5.954257 Overdraw 1.218979 in 31.250000 msec
Cache          : ACMR 0.633917 ATVR 1.258237 Overdraw 1.220777 in 93.750000 msec
Cache+Overdraw : ACMR 0.711258 ATVR 1.411747 Overdraw 1.074942 in 125.000000 msec
Overdraw Only  : ACMR 2.985012 ATVR 5.924832 Overdraw 1.072744 in 187.500000 msec

Overdraw Only mode is currently pretty inefficient as it hits some not-quite-optimized paths in the overdraw code due to how it's constructing the inputs; I'll take a look at this.
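To make the Overdraw column above concrete: the metric is the ratio of fragments shaded to pixels covered, so 1.0 means no pixel was shaded twice. The toy simulation below (my own illustration, not part of meshoptimizer; it simplifies geometry to axis-aligned rectangles and assumes early-z rejection) shows why front-to-back ordering drives the ratio toward 1.0:

```cpp
#include <vector>

// Toy model of the overdraw metric: overdraw = fragments shaded / pixels covered.
// Rasterizes axis-aligned rectangles into a depth buffer with early-z,
// counting how many fragments pass the depth test at draw time.
struct Rect { int x0, y0, x1, y1; float z; }; // half-open [x0,x1) x [y0,y1)

float overdrawRatio(const std::vector<Rect>& draws, int width, int height)
{
    std::vector<float> depth(width * height, 1e9f);
    std::vector<bool> covered(width * height, false);
    int shaded = 0;
    for (const Rect& r : draws)
        for (int y = r.y0; y < r.y1; ++y)
            for (int x = r.x0; x < r.x1; ++x) {
                int i = y * width + x;
                covered[i] = true;
                if (r.z < depth[i]) { depth[i] = r.z; ++shaded; } // early-z pass: shade and update depth
            }
    int coveredCount = 0;
    for (bool c : covered) coveredCount += c;
    return coveredCount ? float(shaded) / float(coveredCount) : 0.f;
}
```

With a near 10x6 rectangle (z = 0.2) and a far 10x6 rectangle (z = 0.8) overlapping in 30 pixels, drawing far-then-near shades 120 fragments over 90 covered pixels (overdraw 1.33), while near-then-far lets early-z reject the 30 occluded fragments, shading 90 over 90 (overdraw 1.0).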

Based on these results, my preference is not to add overdraw-only optimization - users of the library may find it enticing without realizing that the overdraw benefits are marginal and the vertex cache penalties are very large - but I'll see if it can be made slightly easier to run (e.g. maybe the clusters vector can be empty; unfortunately, this would make it easy to forget to fill the clusters vector by mistake when running the combined vertex cache + overdraw optimization).

@IntellectualKitty
Author

First of all, thank you for putting so much time and effort into this.

Also, I understand your concerns about not wanting to confuse users with overdraw-only optimization. As a user of the library, I, too, do not want to go down the wrong path by over-emphasizing the importance of overdraw optimization.

The concern that I have is that the two models you tested with are basically convex, at least in relative terms, and therefore intrinsically have little capacity for excessive overdraw. In contrast, the geometry that I am concerned about is roughly similar to the following example:

[image: cargo ship stacked with shipping containers]

Layer upon layer of the cargo containers in the picture would have lots of overdraw, which is where my concerns are focused.

I realize that this is something of a special case since most often geometry is similar to the models that you tested. However, I don't think that the case that I am proposing is that uncommon in the real world, even if it occurs far less often. Here is another, unfortunately related, example:

[image: stacked wooden pallets, low-poly 3D model]

From both the results you've posted and the helpful discussions that we've had, I am starting to doubt that overdraw-only optimization will be more helpful than vertex cache and overdraw optimization (with a suitably large threshold) even in the extreme cases that I am proposing. But I do think it bears further testing.

I've attached some geometry of the type that I'm suggesting in case it's helpful.
PlyStacks.obj.zip

Again, thank you for the time that you've put into this and the insight that you've shared.

@zeux
Owner

zeux commented May 18, 2017

PlyStacks.obj is interesting:

Using PlyStacks.obj (3120 vertices, 1560 triangles)
Original       : ACMR 2.000000 ATVR 1.000000 Overdraw 2.242937 in 0.000000 msec
Random Shuffle : ACMR 2.989743 ATVR 1.494872 Overdraw 1.777862 in 0.000000 msec
Cache          : ACMR 2.000000 ATVR 1.000000 Overdraw 2.242937 in 0.000000 msec
Cache+Overdraw : ACMR 2.000000 ATVR 1.000000 Overdraw 1.000000 in 0.000000 msec
Overdraw Only  : ACMR 2.104487 ATVR 1.052244 Overdraw 1.000000 in 0.000000 msec

ACMR here is pretty bad overall because the model is basically built out of disjoint quads (so 2 vertices/triangle is the best case), but the overdraw optimization is incredibly effective, with or without vertex cache :)

My larger point I guess is that there is already a simple way to do overdraw-only optimization. I'm doing some cleanup/optimization work on the overdraw code so I'll see if it can be simplified, but it's not too bad as it is: https://github.com/zeux/meshoptimizer/blob/master/demo/main.cpp#L154-L164

@IntellectualKitty
Author

Sorry that I did not respond sooner. My e-mail notification about your last message only included the first line, so I thought you were still investigating.

Thank you again for taking the time to look into this. I've learned a lot from these discussions, and you've cleared up the doubts that I had about the Tipsy algorithm with regard to my prior concerns.

I've learned a lot from the statistics comparing the methods, so I hope that you keep them, since I believe they will be similarly helpful for others. I had some prior experience with Tipsy (I obtained the original code from the article's author a number of years back) before I started using your library, but this has still been very illuminating for me.

Thank you again for the hard work that you've put into this.

@zeux
Owner

zeux commented May 25, 2017

Thank you for starting the conversation! This thread led me to add the overdraw analyzer to this project, which can be used to better understand the performance benefits and tradeoffs of the overdraw optimizer. I think the conclusion is basically this:

By default, you don't gain many more overdraw-reordering opportunities by only slightly splitting the clusters that the post-transform cache optimizer produces; this is definitely true for Tipsy but might be a general property of post-transform optimization algorithms. While it's unwise to just use the clusters Tipsy produces without any extra splitting (kThreshold = 0.f), slightly splitting them (kThreshold in [1.f, 1.05f]) produces enough data that the heuristics-based sort works well.

If you want to be very aggressive about overdraw optimization, you can use a kThreshold of 3.f, and then you might as well not do the post-transform optimization at all, because at threshold 3 the triangle order will be completely reshuffled. This is what the overdraw-only pass in the demo program does. Frequently this results in overdraw that's marginally better than the combined approach, although I have seen one model with a lot of overlap (hairball from http://graphics.cs.williams.edu/data/meshes.xml) where the difference can be substantial.

Additionally, I have found that the pre-defined clustering that Tipsy provides, and the framework of the algorithm, is sort of a mixed blessing. Sometimes these extra splits are very important for overdraw sorting because they contain inherent connectivity information and naturally isolate geometrical clusters. On the flip side, this means that the threshold argument for the algorithm is misleading: when you set it to 1.05, you aren't actually guaranteed that ACMR is within 5% of the original, because the initial cluster split sometimes hurts ACMR, despite what the paper suggests. You can experiment with this by disregarding the cluster vector the post-transform pass provides and just using a vector { 1 } with a smaller threshold (like 1.05).

zeux added a commit that referenced this issue Oct 9, 2019
When -cf setting is specified, we will save uncompressed binary data to
the fallback buffer so that loaders that don't support the compression
extension can load the data.

This requires adjusting the JSON format for the buffer views; this
change simply implements scaffolding necessary to support this - we now
save an additional .fallback.bin file when requested and reference it
from the JSON blob as a buffer #1.