Using overdraw optimization by itself #1
Sorry for the long delay; somehow my notifications were disabled.

The way the overdraw optimizer works is by ordering clusters of triangles with a heuristic based on the normal/position of each cluster. It relies on the fact that the vcache optimizer creates clusters that are reasonably local in terms of topology; it then splits each cluster into smaller clusters based on the ACMR threshold and runs the sort.

If you care more about overdraw than about vertex transform cost, it should be sufficient to run the overdraw optimizer (after the vcache optimizer) with a high threshold, e.g. 1.5 or 2; this means the cluster splitting is more aggressive and the overdraw optimizer has more opportunities for reordering. Given a very high threshold, the clusters will be split into individual triangles. I'm not sure what the overdraw optimization efficiency of this would be - if in your testing a threshold of 3 gives good overdraw results, I could add a method that generates single-triangle clusters and immediately runs overdraw optimization, but I'm not sure it's a good tradeoff - generally, running vcache optimization does not penalize overdraw too much as long as the threshold is above one (1.1 or 1.2 are usually reasonable values).
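(As a side note for readers: the ACMR mentioned above - average cache miss ratio, i.e. vertex cache misses per triangle - can be approximated with a simple FIFO post-transform cache model. The sketch below is a simplified illustration under assumed parameters - a cache size of 16 and a plain FIFO policy - not the library's actual analyzer, which models caches in more detail.)

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Simplified ACMR (average cache miss ratio) calculator using a FIFO
// post-transform cache model. Lower is better; 3.0 means every index
// missed the cache, ~0.5 is the theoretical best for a regular grid.
// The cache size of 16 is an assumption for illustration only.
double computeACMR(const std::vector<unsigned int>& indices, size_t cacheSize = 16)
{
    std::deque<unsigned int> cache; // FIFO of recently transformed vertices
    size_t misses = 0;

    for (unsigned int index : indices)
    {
        bool hit = false;
        for (unsigned int cached : cache)
            if (cached == index) { hit = true; break; }

        if (!hit)
        {
            ++misses;
            cache.push_back(index);
            if (cache.size() > cacheSize)
                cache.pop_front(); // evict the oldest entry
        }
    }

    size_t triangleCount = indices.size() / 3;
    return triangleCount ? double(misses) / double(triangleCount) : 0.0;
}
```

For example, two triangles sharing an edge ({0,1,2, 1,2,3}) transform 4 distinct vertices over 2 triangles, giving an ACMR of 2.0, while two disjoint triangles give the worst case of 3.0.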
From a practical standpoint, I agree with every point that you made. From a purely educational standpoint, I still think it would be beneficial to be able to contrast pure vertex cache optimization with pure overdraw optimization, in order to compare the effect of optimization aimed at the GPU's vertex processing vs. its pixel processing.

One of the things that has surprised me in my testing is how responsive GPU performance can be to vertex cache and overdraw optimization. However, that responsiveness is often not exposed, because it depends on how close to optimal a triangle mesh is to begin with - and that seems to vary a great deal from model to model. I ultimately ended up writing a pair of functions to randomly order vertices and triangles to establish a baseline for performance testing, and I found that it was relatively easy to roughly double performance against that baseline. I learned a lot in that process, but I still don't feel like I have the complete picture of the effect of geometry optimization, since I haven't been able to compare pure overdraw optimization directly. So, from an educational perspective, it would still be nice to be able to evaluate pure overdraw optimization.

As I mentioned, I found it very illuminating to compare optimized geometry against completely randomized geometry. One of the things that I like about your library and tool is the statistics reporting. If you're interested in adding randomized geometry as a baseline, I'd be happy to share that code with you. I doubt you'll need it, since it's obviously fairly simple to do and the code I have is written for Cinder, but I appreciate you taking the time to get back to me, so I wanted to make the offer.
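(For readers who want to reproduce the randomized baseline described above: the idea is to shuffle the order of whole triangles, not individual indices, so the mesh itself is unchanged while all cache/overdraw coherence is destroyed. The sketch below is a hypothetical standalone version - `shuffleTriangles` is not the commenter's Cinder code or a library function.)

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <random>
#include <vector>

// Randomize triangle order while preserving each triangle's winding.
// This gives a pessimized baseline against which to measure how much
// vertex cache / overdraw optimization actually helps on the GPU.
void shuffleTriangles(std::vector<unsigned int>& indices, unsigned int seed)
{
    std::vector<std::array<unsigned int, 3>> tris(indices.size() / 3);
    for (size_t i = 0; i < tris.size(); ++i)
        tris[i] = { indices[3 * i + 0], indices[3 * i + 1], indices[3 * i + 2] };

    std::mt19937 rng(seed); // fixed seed keeps benchmark runs reproducible
    std::shuffle(tris.begin(), tris.end(), rng);

    for (size_t i = 0; i < tris.size(); ++i)
    {
        indices[3 * i + 0] = tris[i][0];
        indices[3 * i + 1] = tris[i][1];
        indices[3 * i + 2] = tris[i][2];
    }
}
```

A similar pass that remaps vertex buffer order would randomize pre-transform fetch locality as well.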
I have implemented an overdraw analyzer (sort of a WIP, as I might attempt to make it a bit faster, but it works), and I have added a random shuffle and a pure overdraw pass (using the existing function with a special cluster/etc. setup) to the demo program. Here are the results from two large meshes I was testing on:
Based on these results, my preference is not to add overdraw-only optimization - users of the library may find it enticing without realizing that the overdraw benefits are marginal and the vertex cache penalties are very large - but I'll see if it can be made slightly easier to run (e.g. maybe the clusters vector can be empty - unfortunately, that would make it easy to forget to fill the clusters vector by mistake when running the combined vertex cache + overdraw optimization).
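(As a rough illustration of what an overdraw analyzer measures - shaded fragments versus covered pixels under a depth test - here is a toy software rasterizer. This is only a sketch under simplifying assumptions (orthographic top-down projection, flat per-triangle depth, tiny fixed grid), not the analyzer added to the library.)

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

struct Vec3 { float x, y, z; };

// Toy overdraw estimator: orthographically rasterize triangles in draw
// order onto a W x H grid (viewing along -Z), applying an early-Z style
// depth test, and report (fragments shaded) / (pixels covered).
// 1.0 means no overdraw; 2.0 means every covered pixel was shaded twice.
double estimateOverdraw(const std::vector<Vec3>& verts,
                        const std::vector<unsigned int>& indices,
                        int W, int H)
{
    std::vector<float> depth(size_t(W) * H, std::numeric_limits<float>::infinity());
    std::vector<char> covered(size_t(W) * H, 0);
    size_t shaded = 0;

    for (size_t t = 0; t + 2 < indices.size(); t += 3)
    {
        Vec3 a = verts[indices[t]], b = verts[indices[t + 1]], c = verts[indices[t + 2]];

        for (int y = 0; y < H; ++y)
            for (int x = 0; x < W; ++x)
            {
                float px = x + 0.5f, py = y + 0.5f;
                // edge functions (signed areas); accept either winding
                float w0 = (b.x - a.x) * (py - a.y) - (b.y - a.y) * (px - a.x);
                float w1 = (c.x - b.x) * (py - b.y) - (c.y - b.y) * (px - b.x);
                float w2 = (a.x - c.x) * (py - c.y) - (a.y - c.y) * (px - c.x);
                bool inside = (w0 >= 0 && w1 >= 0 && w2 >= 0) ||
                              (w0 <= 0 && w1 <= 0 && w2 <= 0);
                if (!inside) continue;

                size_t p = size_t(y) * W + x;
                covered[p] = 1;
                float z = (a.z + b.z + c.z) / 3.f; // flat depth; fine for this sketch
                if (z < depth[p]) { depth[p] = z; ++shaded; } // depth test passes: shade
            }
    }

    size_t coveredCount = 0;
    for (char cov : covered) coveredCount += cov;
    return coveredCount ? double(shaded) / double(coveredCount) : 0.0;
}
```

With this metric, two stacked full-screen quads give an overdraw of 1.0 when drawn front to back (the far quad fails the depth test) and 2.0 when drawn back to front - which is exactly the ordering effect the overdraw optimizer exploits.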
First of all, thank you for putting so much time and effort into this. Also, I understand your concerns about not wanting to confuse users with overdraw-only optimization. As a user of the library, I, too, do not want to go down the wrong path by over-emphasizing the importance of overdraw optimization.

The concern that I have is that the two models you tested are basically convex, at least in relative terms, and therefore intrinsically have little capacity for excessive overdraw. In contrast, the geometry I am concerned about is roughly similar to the following example: layer upon layer of the cargo containers in the picture would have lots of overdraw, which is where my concerns are focused. I realize that this is something of a special case, since geometry is most often similar to the models you tested. However, I don't think the case I am proposing is that uncommon in the real world, even if it occurs far less often. Here is another, unfortunately related, example:

From both the results you've posted and the helpful discussions we've had, I am starting to doubt that overdraw-only optimization will be more helpful than combined vertex cache and overdraw optimization (with a suitably large threshold), even in the extreme cases I am proposing. But I do think it bears further testing. I've attached some geometry of the type I'm suggesting in case it's helpful. Again, thank you for the time you've put into this and the insight you've shared.
PlyStacks.obj is interesting:
ACMR here is pretty bad overall because the model is basically built out of disjoint quads (so 2 vertices/triangle is the best case), but the overdraw optimization is incredibly effective, with or without vertex cache :)

My larger point, I guess, is that there is already a simple way to do overdraw-only optimization. I'm doing some cleanup/optimization work on the overdraw code, so I'll see if it can be simplified, but it's not too bad as it is: https://github.com/zeux/meshoptimizer/blob/master/demo/main.cpp#L154-L164
Sorry that I did not respond sooner; my e-mail notification about your last message only included the first line, so I thought you were still investigating. Thank you again for taking the time to look into this. I've learned a lot from these discussions, and you've cleared up the doubts I had about the Tipsy algorithm with regard to my prior concerns. I've also learned a lot from the statistics comparing the methods, so I hope you keep them, since I believe they will be similarly helpful for others. I had some prior experience with Tipsy before I started using your library (I obtained the original code from the article's author a number of years back), but this has still been very illuminating for me. Thank you again for the hard work you've put into this.
Thank you for starting the conversation! This thread led me to add the overdraw analyzer to this project, which can be used to better understand the performance benefits/tradeoffs of the overdraw optimizer. I think the conclusion is basically this:

By default, you don't gain that many more overdraw reordering opportunities if you just slightly split the clusters that the post-transform cache optimizer produces; this is definitely true for Tipsy, but it might be a general property of post-transform optimization algorithms. While it's unwise to just use the clusters Tipsy produces without any extra splitting (kThreshold = 0.f), slightly splitting them (kThreshold in [1, 1.05f]) produces enough data that the heuristics-based sort works well.

If you want to be very aggressive about overdraw optimization, you can try a kThreshold of 3.f, and then you might as well not do the post-transform optimization at all, because at threshold 3 the triangle order will be completely reshuffled. This is what the overdraw-only pass in the demo program does. Frequently this results in overdraw that's marginally better than the combined approach, although I have seen one model with a lot of overlap (hairball from http://graphics.cs.williams.edu/data/meshes.xml) where the difference can be substantial.

Additionally, I have found that the pre-defined clustering Tipsy provides, and the framework of the algorithm, is something of a mixed blessing. Sometimes these extra splits are very important for overdraw sorting, because they contain inherent connectivity information and naturally isolate geometrical clusters; on the flip side, this means the threshold argument for the algorithm is misleading - when you set it to 1.05, you aren't actually guaranteed that ACMR is within 5% of the original, because the initial cluster split sometimes hurts ACMR despite what the paper suggests.
You can experiment with this by disregarding the cluster vector the post-transform pass provides, and just using a vector { 1 } with a smaller threshold (like 1.05). |
It would be nice to be able to use the overdraw optimizer by itself, both for testing purposes and because I am using unusual geometry with many concavities, which means that pixel processing far outweighs the cost of vertex transformation.