
Using overdraw optimization by itself #1

Closed
IntellectualKitty opened this issue Mar 25, 2017 · 7 comments


@IntellectualKitty

It would be nice to be able to use the overdraw optimizer by itself both for testing purposes but also because I am using unusual geometry with many concavities, which means that the pixel processing far outweighs the cost of vertex transformation.

@zeux
Owner

zeux commented Apr 29, 2017

Sorry for the long delay; somehow my notifications were disabled.

So the way the overdraw optimizer works is by ordering clusters of triangles with a heuristic based on each cluster's normal and position. It relies on the fact that the vcache optimizer creates clusters that are reasonably local in terms of topology; it then splits each cluster into smaller clusters based on the ACMR threshold, and then runs the sorting.

If you care more about overdraw than about vertex transform cost, it should be sufficient to run the overdraw optimizer (after the vcache optimizer) with a high threshold, e.g. 1.5 or 2; this means the cluster splitting is more aggressive and the overdraw optimizer has more opportunities for reordering.

Given a very high threshold, the clusters will be split into individual triangles. I'm not sure what the overdraw optimization efficiency of this would be - if in your testing using threshold 3 gives good overdraw results then I could add a method that generates single-triangle clusters and immediately runs overdraw optimization, but I am not sure it's a good tradeoff - generally running vcache optimization does not penalize the overdraw too much as long as threshold is above one (1.1 or 1.2 are usually reasonable values).

@IntellectualKitty
Author

From a practical standpoint, I agree with every point that you made. From a purely educational standpoint, I still think it would be beneficial to be able to contrast pure vertex cache optimization with pure overdraw optimization in order to compare the effect of optimization aimed at the GPU's vertex processing vs. its pixel processing.

One of the things that I've been surprised by in my testing is how responsive GPU performance can be to vertex cache and overdraw optimization. However, that responsiveness is often not exposed because it depends on how close to optimal a triangle mesh is to begin with – and that seems to vary a great deal from model to model. I ultimately ended up writing a pair of functions to randomly order vertices and triangles to establish a baseline for performance testing, and I found that it was relatively easy to roughly double performance against that baseline.
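The randomization functions mentioned above aren't shown in the thread (and the author's version was written for Cinder), but the idea can be sketched with the standard library alone: shuffle the index buffer at triangle granularity so each triangle stays intact while all cache locality is destroyed.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <random>
#include <vector>

// Shuffle a triangle list so consecutive triangles share no cache locality,
// producing a worst-case-ish baseline for ACMR/performance measurements.
// Each group of 3 indices (one triangle) is kept intact.
std::vector<uint32_t> shuffleTriangles(const std::vector<uint32_t>& indices, uint32_t seed = 42)
{
    // Group indices into triangles so each triangle moves as a unit.
    std::vector<std::array<uint32_t, 3>> tris;
    for (size_t i = 0; i + 2 < indices.size(); i += 3)
        tris.push_back({indices[i], indices[i + 1], indices[i + 2]});

    std::mt19937 rng(seed); // fixed seed keeps the baseline reproducible
    std::shuffle(tris.begin(), tris.end(), rng);

    std::vector<uint32_t> result;
    result.reserve(indices.size());
    for (const auto& t : tris)
        result.insert(result.end(), t.begin(), t.end());
    return result;
}
```

A vertex-level shuffle (remapping vertex indices through a shuffled permutation) can be layered on top of this to also defeat pre-transform fetch locality.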

I learned a lot in that process, but I still don't feel like I have the complete picture of the effect of geometry optimization, since I haven't been able to compare pure overdraw optimization directly. So, from an educational perspective, it would still be nice to be able to evaluate pure overdraw optimization.

As I mentioned, I found it very illuminating to compare optimized geometry against completely randomized geometry. One of the things that I like about your library and tool is the statistics reporting. If you're interested in adding randomized geometry as a baseline, I'd be happy to share that code with you. I doubt you'll need it, since it's obviously fairly simple to do and the code I have is written for Cinder, but I appreciate you taking the time to get back to me, so I wanted to make the offer.

@zeux
Owner

zeux commented May 17, 2017

I have implemented an overdraw analyzer (sort of WIP, as I might attempt to make it a bit faster, but it works), and I have added a random shuffle mode and a pure-overdraw mode (using the existing function with a special cluster setup) to the demo program. Here are results from two large meshes I was testing on:

Using buddha.obj (549409 vertices, 1087474 triangles)
Original       : ACMR 1.395487 ATVR 2.762161 Overdraw 1.202008 in 0.000000 msec
Random Shuffle : ACMR 2.999890 ATVR 5.937839 Overdraw 1.220027 in 62.500000 msec
Cache          : ACMR 0.637773 ATVR 1.262376 Overdraw 1.238767 in 93.750000 msec
Cache+Overdraw : ACMR 0.716197 ATVR 1.417607 Overdraw 1.092349 in 140.625000 msec
Overdraw Only  : ACMR 2.985486 ATVR 5.909328 Overdraw 1.085637 in 203.125000 msec

Using dragon.obj (438976 vertices, 871306 triangles)
Original       : ACMR 1.528173 ATVR 3.033209 Overdraw 1.226321 in 0.000000 msec
Random Shuffle : ACMR 2.999837 ATVR 5.954257 Overdraw 1.218979 in 31.250000 msec
Cache          : ACMR 0.633917 ATVR 1.258237 Overdraw 1.220777 in 93.750000 msec
Cache+Overdraw : ACMR 0.711258 ATVR 1.411747 Overdraw 1.074942 in 125.000000 msec
Overdraw Only  : ACMR 2.985012 ATVR 5.924832 Overdraw 1.072744 in 187.500000 msec

Overdraw Only mode is currently pretty inefficient as it hits some not-quite-optimized paths in the overdraw code due to how it's constructing the inputs; I'll take a look at this.
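To make the Overdraw column above concrete: the metric is the ratio of fragments shaded to pixels covered, so 1.0 means no pixel was shaded twice. The toy simulation below (my own illustration, not part of meshoptimizer; it simplifies geometry to axis-aligned rectangles and assumes early-z rejection) shows why front-to-back ordering drives the ratio toward 1.0:

```cpp
#include <vector>

// Toy model of the overdraw metric: overdraw = fragments shaded / pixels covered.
// Rasterizes axis-aligned rectangles into a depth buffer with early-z,
// counting how many fragments pass the depth test at draw time.
struct Rect { int x0, y0, x1, y1; float z; }; // half-open [x0,x1) x [y0,y1)

float overdrawRatio(const std::vector<Rect>& draws, int width, int height)
{
    std::vector<float> depth(width * height, 1e9f);
    std::vector<bool> covered(width * height, false);
    int shaded = 0;
    for (const Rect& r : draws)
        for (int y = r.y0; y < r.y1; ++y)
            for (int x = r.x0; x < r.x1; ++x) {
                int i = y * width + x;
                covered[i] = true;
                if (r.z < depth[i]) { depth[i] = r.z; ++shaded; } // early-z pass: shade and update depth
            }
    int coveredCount = 0;
    for (bool c : covered) coveredCount += c;
    return coveredCount ? float(shaded) / float(coveredCount) : 0.f;
}
```

With a near 10x6 rectangle (z = 0.2) and a far 10x6 rectangle (z = 0.8) overlapping in 30 pixels, drawing far-then-near shades 120 fragments over 90 covered pixels (overdraw 1.33), while near-then-far lets early-z reject the 30 occluded fragments, shading 90 over 90 (overdraw 1.0).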

Based on these results, my preference is not to add overdraw-only optimization - users of the library may find it enticing without realizing that the overdraw benefits are marginal and the vertex cache penalties are very large - but I'll see if it can be made slightly easier to run (e.g. maybe the clusters vector can be empty; unfortunately, this would make it easy to forget to fill the clusters vector by mistake when running the combined vertex cache + overdraw optimization).

@IntellectualKitty
Author

First of all, thank you for putting so much time and effort into this.

Also, I understand your concerns about not wanting to confuse users with overdraw-only optimization. As a user of the library, I, too, do not want to go down the wrong path by over-emphasizing the importance of overdraw optimization.

The concern that I have is that the two models you tested with are basically convex, at least in relative terms, and therefore intrinsically have little capacity for excessive overdraw. In contrast, the geometry that I am concerned about is roughly similar to the following example:

[image: cargo ship stacked with shipping containers]

Layer upon layer of the cargo containers in the picture would have lots of overdraw, which is where my concerns are focused.

I realize that this is something of a special case since most often geometry is similar to the models that you tested. However, I don't think that the case that I am proposing is that uncommon in the real world, even if it occurs far less often. Here is another, unfortunately related, example:

[image: stacked wooden pallets, low-poly 3D model]

From both the results you've posted and the helpful discussions that we've had, I am starting to doubt that overdraw-only optimization will be more helpful than vertex cache and overdraw optimization (with a suitably large threshold) even in the extreme cases that I am proposing. But I do think it bears further testing.

I've attached some geometry of the type that I'm suggesting in case it's helpful.
PlyStacks.obj.zip

Again, thank you for the time that you've put into this and the insight that you've shared.

@zeux
Owner

zeux commented May 18, 2017

PlyStacks.obj is interesting:

Using PlyStacks.obj (3120 vertices, 1560 triangles)
Original       : ACMR 2.000000 ATVR 1.000000 Overdraw 2.242937 in 0.000000 msec
Random Shuffle : ACMR 2.989743 ATVR 1.494872 Overdraw 1.777862 in 0.000000 msec
Cache          : ACMR 2.000000 ATVR 1.000000 Overdraw 2.242937 in 0.000000 msec
Cache+Overdraw : ACMR 2.000000 ATVR 1.000000 Overdraw 1.000000 in 0.000000 msec
Overdraw Only  : ACMR 2.104487 ATVR 1.052244 Overdraw 1.000000 in 0.000000 msec

ACMR here is pretty bad overall because the model is basically built out of disjoint quads (so 2 vertices/triangle is the best case), but the overdraw optimization is incredibly effective, with or without vertex cache :)

My larger point I guess is that there is already a simple way to do overdraw-only optimization. I'm doing some cleanup/optimization work on the overdraw code so I'll see if it can be simplified, but it's not too bad as it is: https://github.com/zeux/meshoptimizer/blob/master/demo/main.cpp#L154-L164

@IntellectualKitty
Author

Sorry that I did not respond sooner. My e-mail notification about your last message only included the first line, so I thought you were still investigating.

Thank you again for taking the time to look into this. I've learned a lot from these discussions, and you've cleared up the doubts that I had about the Tipsy algorithm with regard to my prior concerns.

I've learned a lot from the statistics comparing the methods, so I hope that you keep them, since I believe they will be similarly helpful for others. I had some prior experience with Tipsy (I obtained the original code from the article's author a number of years back) before I started using your library, but this has still been very illuminating for me.

Thank you again for the hard work that you've put into this.

@zeux
Owner

zeux commented May 25, 2017

Thank you for starting the conversation! This thread led me to add the overdraw analyzer to this project, which can be used to better understand the performance benefits and tradeoffs of the overdraw optimizer. I think the conclusion is basically this:

By default, you don't gain many more overdraw-reordering opportunities by only slightly splitting the clusters that the post-transform cache optimizer produces; this is definitely true for Tipsy but might be a general property of post-transform optimization algorithms. While it's unwise to just use the clusters Tipsy produces without any extra splitting (kThreshold = 0.f), slightly splitting them (kThreshold in [1.f, 1.05f]) produces enough data that the heuristics-based sort works well.

If you want to be very aggressive about overdraw optimization, you can use a kThreshold of 3.f, and then you might as well not do the post-transform optimization at all, because at threshold 3 the triangle order will be completely reshuffled. This is what the overdraw-only pass in the demo program does. Frequently this results in overdraw that's marginally better than the combined approach, although I have seen one model with a lot of overlap (hairball from http://graphics.cs.williams.edu/data/meshes.xml) where the difference can be substantial.

Additionally, I have found that the pre-defined clustering that Tipsy provides, and the framework of the algorithm, is sort of a mixed blessing. Sometimes these extra splits are very important for overdraw sorting because they contain inherent connectivity information and naturally isolate geometrical clusters. On the flip side, this means that the threshold argument for the algorithm is misleading: when you set it to 1.05, you aren't actually guaranteed that ACMR is within 5% of the original, because the initial cluster split sometimes hurts ACMR, despite what the paper suggests. You can experiment with this by disregarding the cluster vector the post-transform pass provides and just using a vector { 1 } with a smaller threshold (like 1.05).

zeux added a commit that referenced this issue Oct 9, 2019
When -cf setting is specified, we will save uncompressed binary data to
the fallback buffer so that loaders that don't support the compression
extension can load the data.

This requires adjusting the JSON format for the buffer views; this
change simply implements scaffolding necessary to support this - we now
save an additional .fallback.bin file when requested and reference it
from the JSON blob as a buffer #1.