
Elaborate on skinning weights sum #1749

Merged
merged 3 commits into from
Nov 11, 2020
Conversation

lexaknyazev
Member

Following the discussion from KhronosGroup/glTF-Validator#132.

Rationale is borrowed from KhronosGroup/glTF-Validator#124 (comment) by @zeux.

/cc @vpenades

@scurest

scurest commented Jan 24, 2020

I brought up this specific case back when normalized weights were first made required: #1352 (comment).

My impression was that some tolerance for this problem was supposed to be allowed?

Isn't this a breaking change? There's probably plenty of code that quantizes weights by multiplying by 255 and rounding.

@lexaknyazev
Member Author

@scurest
It seems that the allowed tolerance could only be observed with float32 weights. This means that with unorm8 components, exporters should renormalize weights after quantization.

Implementation example.
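The renormalization step can be sketched as follows (a minimal sketch assuming unorm8 weights, folding the residual into the largest component; the linked implementation may use a different strategy):

```python
def quantize_weights_unorm8(weights):
    """Quantize float weights to unorm8 so the stored bytes sum to exactly 255.

    Hypothetical helper, not the linked implementation: the rounding
    residual is folded into the largest weight.
    """
    q = [round(w * 255) for w in weights]
    residual = 255 - sum(q)
    if residual != 0:
        # Adjust the largest weight so the quantized sum is exact.
        largest = max(range(len(weights)), key=lambda i: weights[i])
        q[largest] += residual
    return q
```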

@scurest

scurest commented Jan 24, 2020

Isn't this a breaking change?

@lexaknyazev
Member Author

There's probably plenty of code that quantizes weights by multiplying by 255 and rounding.

There were plenty of reports of models (even in our own Sample-Models repo) behaving incorrectly because of non-unit sum (like 0.7) or inaccurate weights quantization (see screenshots in the linked thread). The alternative would be to require engines to always preprocess weights on load.

Isn't this a breaking change?

This PR just clarifies what "unit linear sum" means for specific component types.

@vpenades
Contributor

vpenades commented Jan 24, 2020

I'm surprised with the results seen in the screenshots here: KhronosGroup/glTF-Validator#124 (comment)

While it might be true that this is related to the weights not having exactly the same values as the originals, I suspect this is also happening because the original model's hands have some undesired influences from unrelated bones, and quantized weights have made this more apparent. So in most cases it might not be so noticeable. In fact, in other cases where the renormalization is uneven, it might have undesirable effects too!

I also share the fear @scurest expressed: even if the schema clarifies this scenario, everybody has been assuming some level of tolerance for all component types. After all, "if some tolerance is allowed for the float component type, why not allow it for others?" So to me, even if it is a clarification, it's also a breaking change.

So maybe a compromise solution would be to have TWO tolerances: if the weights have an exact sum of 1, all is good; if the sum is merely very close to 1, issue a warning instead of an error. This would allow current implementations to keep working, and give due warning to those that want to walk the extra mile.

In any case, the specification and gltf-validator need to be aligned, so developers following the specification will not hit gltf-validator errors.

@zeux
Contributor

zeux commented Jan 24, 2020

While it might be true that this is related to weights not having exactly the same value as the originals, I suspect this is happening also because the original model's hands have some undesired influences from unrelated bones, and this has been made more apparent by quantized weights.

I don't think the result is very surprising. If you have any surface that's skinned to multiple bones, the influences of the bones will gradually decrease/increase. As a result, the sum error after quantization will fluctuate between -1/255, 0/255 and 1/255 (occasionally getting to +-2/255 for 4-weight sources). This should just naturally occur for any regions that have >2 weights. If you have exactly 2 weights, then I think this will be rare since the sum should mostly result in precisely 255. But even if you have 2 weights, and they are both exactly equal to 0.5, naive quantization will introduce an error of 1/255.
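This fluctuation is easy to reproduce (an illustrative sketch, not code from the thread):

```python
def naive_sum_error(weights):
    """How many 1/255 steps the naively rounded weights drift from a unit sum."""
    return sum(round(w * 255) for w in weights) - 255

# Two weights of exactly 0.5: each quantizes from 127.5, and rounding
# pushes the stored sum one step off 255.
print(naive_sum_error([0.5, 0.5]))

# Sweeping a two-bone blend shows the error staying within one step of zero
# while occasionally flipping sign.
errors = {naive_sum_error([t / 100, 1 - t / 100]) for t in range(101)}
print(sorted(errors))
```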

When you have any error, you will see a problem. Consider the character in that screenshot. The hands look like they are at about +-0.6m from the center of the character along the X axis. The error of 1/255 results in 0.2cm in the steady position, which is 10% of the finger width. I think this is why the fingers are visibly distorted even in the first screenshot. Now, if you use world-space skinning (which three.js does, and it is suboptimal for a few reasons but a straightforward implementation of glTF spec would also skin in world space), then moving the character just 5 meters to the right would place the hands at 5.6m from the origin, and so the error becomes 2cm. 50 meters, and it's 20 cm. This is exactly what you see in the screenshots.
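The magnitudes above follow from trivial arithmetic (a back-of-the-envelope sketch; it assumes world-space skinning, where a weight-sum error scales the vertex position directly):

```python
def skinning_drift_m(weight_sum_error, vertex_distance_m):
    """World-space position error caused by a non-unit weight sum (simplified
    model: the error scales linearly with the vertex's distance from the origin)."""
    return weight_sum_error * vertex_distance_m

# Hands ~0.6 m from the center: a 1/255 error gives roughly 2 mm of drift.
print(skinning_drift_m(1 / 255, 0.6))
# Character moved 5 m away (hands at ~5.6 m): the same error gives ~2 cm.
print(skinning_drift_m(1 / 255, 5.6))
```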

The reason why you see the vertices "oscillating" is because you have the error of the sum oscillating.

For 16-bit weights, the problem is much less severe because the error from naive quantization is 256 times smaller - so you have to move the character much further from the origin to see it. But for 8-bit weights it's severe.

Having said that, it might make sense to change this to be a warning? I am not sure I fully understand the distinction.

@vpenades
Contributor

Having said that, it might make sense to change this to be a warning? I am not sure I fully understand the distinction.

Currently gltf-validator issues an error when it finds weights that are not perfectly normalized. Given that the "naive normalization" approach is what most developers will naturally take, maybe being so strict is not good for adoption.

But given that not many developers are writing glTFs with packed weights, I don't think being strict will affect many developers either...

@scurest

scurest commented Jan 24, 2020

The alternative would be to require engines to always preprocess weights on load.

Or normalize them in the shader or whatever. CesiumMan has non-normalized weights so any engine that can load it is already doing some kind of normalization.

Having the exporter write round(255*weight) and then having the engine normalize after de-quantization is the best case for accuracy. Any adjustments you do to make them sum to 255 increases the absolute error.


Btw the algorithm linked above is not necessarily the best way to do it, even with just four weights. It concentrates all the error in the maximal element.

Here are some weights quantized with just round(255*weight). These are the best possible errors. The quantized column sums to 257.

| weight | quantized | error |
| ------ | --------- | ----- |
| 0.7    | 179       | 0.002 |
| 0.1    | 26        | 0.002 |
| 0.1    | 26        | 0.002 |
| 0.1    | 26        | 0.002 |

Adjusting the quantized weights to sum to 255 by modifying only the maximal element increases the absolute error.

| weight | quantized | error  |
| ------ | --------- | ------ |
| 0.7    | 177       | -0.006 |
| 0.1    | 26        | 0.002  |
| 0.1    | 26        | 0.002  |
| 0.1    | 26        | 0.002  |

OTOH, if you spread it over two elements, the absolute error is still the best possible:

| weight | quantized | error  |
| ------ | --------- | ------ |
| 0.7    | 178       | -0.002 |
| 0.1    | 26        | 0.002  |
| 0.1    | 26        | 0.002  |
| 0.1    | 25        | -0.002 |
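A sketch of this spreading strategy (my interpretation of the suggestion, not code from the thread): adjust components one step at a time, cancelling the largest existing rounding errors first.

```python
def quantize_spread(weights):
    """Quantize float weights to unorm8 with the sum forced to exactly 255,
    spreading the correction across components one step at a time."""
    q = [round(w * 255) for w in weights]
    residual = 255 - sum(q)
    if residual != 0:
        step = 1 if residual > 0 else -1
        # When subtracting, start with the components that rounding pushed
        # up the most (and vice versa), so each tweak cancels existing error.
        err = [qi - wi * 255 for qi, wi in zip(q, weights)]
        order = sorted(range(len(q)), key=lambda i: step * err[i])
        for k in range(abs(residual)):
            q[order[k % len(q)]] += step
    return q
```

Each adjusted component moves by a single step, so for small residuals the per-component error stays at the rounding minimum while the sum becomes exact.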

@zeux
Contributor

zeux commented Jan 24, 2020

Btw the algorithm linked above is not necessarily the best way to do it, even with just four weights. It concentrates all the error in the maximal element.

Yes, spreading the error across multiple elements can reduce the per-component error, but this is less important compared to minimizing the error in the sum.

I agree that normalizing in the shader can work around these issues, but do all engines do that? three.js clearly doesn't (it normalizes during load - even that should be unnecessary, it's only doing this because some models were broken)

@scurest

scurest commented Jan 24, 2020

If it's normalizing on load, shouldn't it be adjusting these quantized weights at load? Or does it only normalize floats or something?

@vpenades
Contributor

@zeux How does CesiumMan look in three.js?

I think we're at a sort of crossroads, where we need to choose between requiring renormalization after quantization (which, as @scurest said, does not guarantee to solve the problem) and requiring engines to renormalize on load.

If it's normalizing on load, shouldn't it be adjusting these quantized weights at load? Or does it only normalize floats or something?

I believe it's the floats, so the deviation is evenly distributed across all weights, instead of overshooting one weight.

@zeux
Contributor

zeux commented Jan 24, 2020

it does not guarantee to solve the problem

Can you expand on this?

If it's normalizing on load, shouldn't it be adjusting these quantized weights at load?

It only normalizes floating point weights right now. In general I don't think the specification should assume that all loaders need to normalize weights on load, the only reason why three.js did it is that bugs with broken models were reported and needed to be fixed.

@lexaknyazev
Member Author

After all, "if some tolerance is allowed in the Float component, why not allow it in others?" , so to me, even if it is a clarification, it's also a breaking change.

Technically, the allowed level of tolerance (2e-7 per non-zero weight in the validator) does not depend on the component type - it's just too small to be achievable with unorm8 (which is limited to 1/255 steps).
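The rule as described can be sketched like this (my reading of the behavior described in this thread, not the validator's actual source):

```python
def weight_sum_within_tolerance(weights, per_weight_tolerance=2e-7):
    """Validator rule as described above: the allowed deviation from 1.0
    scales with the number of non-zero weights for the vertex."""
    nonzero_count = sum(1 for w in weights if w != 0.0)
    return abs(sum(weights) - 1.0) <= per_weight_tolerance * nonzero_count
```

The smallest non-zero deviation representable with unorm8 weights is 1/255 ≈ 0.0039, orders of magnitude above this threshold, so quantized weights effectively must sum to exactly 255.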

even if it is a clarification, it's also a breaking change

At first, the spec didn't mention the weights sum at all. Then, after multiple reports of unexpected behavior, we agreed on "linear unit sum" in general. Now, since this check has been implemented and we have some specific examples of its impact, it's time to make the spec more concrete.

This change certainly invalidates assets with obviously wrong weights - that's fine. There are also assets that were valid all along - that's great. Everything in the middle has some issues that may be visible only in a particular pose or with exaggerated scale (when engines do not fix that somehow, like CesiumMan in three.js).

Keep in mind that one of the primary goals of the validator is to improve glTF compatibility with different engines/platforms/environments by subjecting assets to ever-tightening rules. A less picky workflow may knowingly override the corresponding validation severity code or mute it completely.

Having said that, it might make sense to change this to be a warning? I am not sure I fully understand the distinction.

In my opinion, a warning should not lead to a broken render (in general).

we're in a sort of crossroads ...

As a rule of thumb, glTF prefers putting the burden on exporters.

@lexaknyazev
Member Author

Please review the updated spec language.

@donmccurdy
Contributor

The wording threads the needle of (1) not normatively specifying something too arbitrary, (2) explaining the motivation clearly, and (3) setting expectations for the validator. Looks good to me, thanks @lexaknyazev. 👍

@lexaknyazev
Member Author

Final nit. The validation note says:

The threshold in the official validation tool is set to 2e-7 times the number of non-zero weights per vertex.

It may be unclear from what number of weights threshold accumulation starts, i.e., whether [0, 0, 0, 0.9999998] is a valid accessor element. Currently, this case is allowed by the validator.

@vpenades
Contributor

Additionally, given how exquisitely precise this requirement is, I would also suggest picking the old CesiumMan and other models that have invalid weights and moving them to the gltf-generation collection as negative load cases.

@donmccurdy
Contributor

It may be unclear from what number of weights threshold accumulation starts, i.e., whether [0, 0, 0, 0.9999998] is a valid accessor element. Currently, this case is allowed by the validator.

The validator's behavior, considering that case to be one non-zero weight with a total threshold of 2e-7, seems like the clearest option to me.

@emackey
Member

emackey commented Jan 31, 2020

I'm not a skinning expert, so @donmccurdy or @bghgary can one of you merge this when ready?

@donmccurdy
Contributor

@lexaknyazev your comment (#1749 (comment)) asks about the wording, and to me the wording seems clear. Is this OK to merge, or was there more clarification you feel is needed, like saying "regardless of weight order"?

@lexaknyazev
Member Author

@donmccurdy
It's not exactly about the order. To put it simply: when there's only one "active" (i.e. non-zero) weight, should it be strictly 1.0?

Some implementations read only three weights and use w4 = 1.0 - w1 - w2 - w3.
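That pattern can be sketched as follows (illustrative; the names are mine):

```python
def reconstruct_fourth_weight(w1, w2, w3):
    """Derive the fourth skinning weight instead of reading it, as some
    implementations do."""
    return 1.0 - w1 - w2 - w3

# Such an implementation sees exactly 1.0 for a vertex stored as
# [0, 0, 0, 0.9999998], whatever the stored fourth value is.
```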

@donmccurdy
Contributor

Ok, I have no preference on that.

@donmccurdy
Contributor

To put it simply: when there's only one "active" (i.e. non-zero) weight, should it be strictly 1.0?

Let's skip adding further requirements for now, a single clear requirement for the skinning sum is a good starting point.

@lexaknyazev
Member Author

Let's skip adding further requirements for now, a single clear requirement for the skinning sum is a good starting point.

It's more like a disambiguation rather than an extra requirement.

2e-7 * num_non_zero_weights allows [0, 0, 0, 0.9999998], which makes little sense when zeros represent missing joints. At the very least, I'd like the spec to take a clear position on the validity of such a case.

@donmccurdy
Contributor

Sorry, I can't understand why 2e-7 * num_non_zero_weights would not be sufficient, even when there's just one non-zero weight. It can be extremely difficult to correctly convert various DCC tools' skinning representations into any standard format, and I don't see much benefit to adding another branch in the validation requirements for this case.

@emackey
Member

emackey commented Oct 27, 2020

Having written my own skinning implementation earlier this year, I finally understand certain issues such as the importance of the weights summing to 1.0. That said, it does make sense to allow some tolerance, particularly for systems that use different floating-point representations and must convert. With such tolerance allowed, the [0, 0, 0, 0.9999998] case should also be allowed, for consistency. That case doesn't appear any more harmful than any other 2e-7 tolerance.

@lexaknyazev
Member Author

After going through the original rationale one more time, I agree with allowing the [0, 0, 0, 0.9999998] case for consistency.

@donmccurdy @emackey
Please take a final look at the PR's language.


> **Implementation Note:** Since the allowed threshold is much lower than the minimum possible step for quantized component types, exporters should just renormalize the weight sum after quantization.

Without these requirements, vertices would be deformed significantly because the weight error would get multiplied by the joint position. For example, an error of `1/255` in the weight sum would result in an unacceptably large difference in the joint position.
Member

Should this line be part of the implementation note immediately above it?

Contributor

I think I would merge the non-implementation-note lines into a single paragraph, followed by two implementation notes. No need to interleave them sentence by sentence.

@donmccurdy
Contributor

Other than the minor layout issue @emackey mentioned, this looks good to me. 👍

@lexaknyazev lexaknyazev merged commit 0a61cf8 into master Nov 11, 2020
@lexaknyazev lexaknyazev deleted the weight-sum branch November 11, 2020 14:00
emackey added a commit that referenced this pull request Nov 17, 2020
@shrinktofit

@lexaknyazev What happens if some vertices are not skinned, e.g. they have zero-sum weights?
