
Elaborate on skinning weights sum #1749

Merged
merged 3 commits into from
Nov 11, 2020
Conversation

lexaknyazev
Member

Following the discussion from KhronosGroup/glTF-Validator#132.

Rationale is borrowed from KhronosGroup/glTF-Validator#124 (comment) by @zeux.

/cc @vpenades

@scurest

scurest commented Jan 24, 2020

I brought up this specific case back when normalized weights were first made required: #1352 (comment).

My impression was that some tolerance for this problem was supposed to be allowed?

Isn't this a breaking change? There's probably plenty of code that quantizes weights by multiplying by 255 and rounding.

@lexaknyazev
Member Author

@scurest
It seems that the allowed tolerance could only be observed with float32 weights. This means that with unorm8 components, exporters should renormalize weights after quantization.

Implementation example.
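The renormalization step can be sketched as follows (a minimal sketch assuming unorm8 weights, folding the residual into the largest component; the linked implementation may use a different strategy):

```python
def quantize_weights_unorm8(weights):
    """Quantize float weights to unorm8 so the stored bytes sum to exactly 255.

    Hypothetical helper, not the linked implementation: the rounding
    residual is folded into the largest weight.
    """
    q = [round(w * 255) for w in weights]
    residual = 255 - sum(q)
    if residual != 0:
        # Adjust the largest weight so the quantized sum is exact.
        largest = max(range(len(weights)), key=lambda i: weights[i])
        q[largest] += residual
    return q
```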

@scurest

scurest commented Jan 24, 2020

Isn't this a breaking change?

@lexaknyazev
Member Author

There's probably plenty of code that quantizes weights by multiplying by 255 and rounding.

There were plenty of reports of models (even in our own Sample-Models repo) behaving incorrectly because of non-unit sum (like 0.7) or inaccurate weights quantization (see screenshots in the linked thread). The alternative would be to require engines to always preprocess weights on load.

Isn't this a breaking change?

This PR just clarifies what "unit linear sum" means for specific component types.

@vpenades
Contributor

vpenades commented Jan 24, 2020

I'm surprised with the results seen in the screenshots here: KhronosGroup/glTF-Validator#124 (comment)

While it might be true that this is related to the weights not having exactly the same values as the originals, I suspect this is also happening because the original model's hands have some undesired influences from unrelated bones, and quantized weights have made this more apparent. So in most cases it might not be so noticeable. In fact, in other cases where the renormalization is uneven, it might have undesirable effects too!

I also share the fear @scurest expressed: even if the schema clarifies this scenario, everybody has been assuming some level of tolerance for all component types. After all, "if some tolerance is allowed for the float component type, why not allow it for others?" So to me, even if it is a clarification, it's also a breaking change.

So maybe a compromise solution would be to have TWO tolerances: if the weights have an exact sum of 1, all is good; if the sum is merely very close to 1, issue a warning instead of an error. This would allow current implementations to keep working, and give due warning to those that want to walk the extra mile.

In any case, the specification and gltf-validator need to be aligned, so developers following the specification will not hit gltf-validator errors.

@zeux
Contributor

zeux commented Jan 24, 2020

While it might be true that this is related to weights not having exactly the same value as the originals, I suspect this is happening also because the original model's hands have some undesired influences from unrelated bones, and this has been made more apparent by quantized weights.

I don't think the result is very surprising. If you have any surface that's skinned to multiple bones, the influences of the bones will gradually decrease/increase. As a result, the sum error after quantization will fluctuate between -1/255, 0/255 and 1/255 (occasionally getting to +-2/255 for 4-weight sources). This should just naturally occur for any regions that have >2 weights. If you have exactly 2 weights, then I think this will be rare since the sum should mostly result in precisely 255. But even if you have 2 weights, and they are both exactly equal to 0.5, naive quantization will introduce an error of 1/255.
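This fluctuation is easy to reproduce (an illustrative sketch, not code from the thread):

```python
def naive_sum_error(weights):
    """How many 1/255 steps the naively rounded weights drift from a unit sum."""
    return sum(round(w * 255) for w in weights) - 255

# Two weights of exactly 0.5: each quantizes from 127.5, and rounding
# pushes the stored sum one step off 255.
print(naive_sum_error([0.5, 0.5]))

# Sweeping a two-bone blend shows the error staying within one step of zero
# while occasionally flipping sign.
errors = {naive_sum_error([t / 100, 1 - t / 100]) for t in range(101)}
print(sorted(errors))
```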

When you have any error, you will see a problem. Consider the character in that screenshot. The hands look like they are at about +-0.6m from the center of the character along the X axis. The error of 1/255 results in 0.2cm in the steady position, which is 10% of the finger width. I think this is why the fingers are visibly distorted even in the first screenshot. Now, if you use world-space skinning (which three.js does, and it is suboptimal for a few reasons but a straightforward implementation of glTF spec would also skin in world space), then moving the character just 5 meters to the right would place the hands at 5.6m from the origin, and so the error becomes 2cm. 50 meters, and it's 20 cm. This is exactly what you see in the screenshots.
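The magnitudes above follow from trivial arithmetic (a back-of-the-envelope sketch; it assumes world-space skinning, where a weight-sum error scales the vertex position directly):

```python
def skinning_drift_m(weight_sum_error, vertex_distance_m):
    """World-space position error caused by a non-unit weight sum (simplified
    model: the error scales linearly with the vertex's distance from the origin)."""
    return weight_sum_error * vertex_distance_m

# Hands ~0.6 m from the center: a 1/255 error gives roughly 2 mm of drift.
print(skinning_drift_m(1 / 255, 0.6))
# Character moved 5 m away (hands at ~5.6 m): the same error gives ~2 cm.
print(skinning_drift_m(1 / 255, 5.6))
```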

The reason why you see the vertices "oscillating" is because you have the error of the sum oscillating.

For 16-bit weights, the problem is much less severe because the error from naive quantization is 256 times smaller - so you have to move the character much further from the origin to see it. But for 8-bit weights it's severe.

Having said that, it might make sense to change this to be a warning? I am not sure I fully understand the distinction.

@vpenades
Contributor

Having said that, it might make sense to change this to be a warning? I am not sure I fully understand the distinction.

Currently gltf-validator issues an error when it finds weights that are not perfectly normalized. Given that the "naive normalization" approach is what most developers will naturally take, maybe being so strict is not good for adoption.

But given that not many developers are writing glTFs with packed weights, I don't think being strict will affect many developers either...

@scurest

scurest commented Jan 24, 2020

The alternative would be to require engines to always preprocess weights on load.

Or normalize them in the shader or whatever. CesiumMan has non-normalized weights so any engine that can load it is already doing some kind of normalization.

Having the exporter write round(255*weight) and then having the engine normalize after de-quantization is the best case for accuracy. Any adjustments you do to make them sum to 255 increases the absolute error.


Btw the algorithm linked above is not necessarily the best way to do it, even with just four weights. It concentrates all the error in the maximal element.

Here are some weights quantized with just round(255*weight). These are the best possible errors. The quantized column sums to 257.

| weight | quantized | error |
| ------ | --------- | ----- |
| 0.7    | 179       | 0.002 |
| 0.1    | 26        | 0.002 |
| 0.1    | 26        | 0.002 |
| 0.1    | 26        | 0.002 |

Adjusting the quantized weights to sum to 255 by modifying only the maximal element increases the absolute error.

| weight | quantized | error  |
| ------ | --------- | ------ |
| 0.7    | 177       | -0.006 |
| 0.1    | 26        | 0.002  |
| 0.1    | 26        | 0.002  |
| 0.1    | 26        | 0.002  |

OTOH, if you spread it over two elements, the absolute error is still the best possible:

| weight | quantized | error  |
| ------ | --------- | ------ |
| 0.7    | 178       | -0.002 |
| 0.1    | 26        | 0.002  |
| 0.1    | 26        | 0.002  |
| 0.1    | 25        | -0.002 |
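A sketch of this spreading strategy (my interpretation of the suggestion, not code from the thread): adjust components one step at a time, cancelling the largest existing rounding errors first.

```python
def quantize_spread(weights):
    """Quantize float weights to unorm8 with the sum forced to exactly 255,
    spreading the correction across components one step at a time."""
    q = [round(w * 255) for w in weights]
    residual = 255 - sum(q)
    if residual != 0:
        step = 1 if residual > 0 else -1
        # When subtracting, start with the components that rounding pushed
        # up the most (and vice versa), so each tweak cancels existing error.
        err = [qi - wi * 255 for qi, wi in zip(q, weights)]
        order = sorted(range(len(q)), key=lambda i: step * err[i])
        for k in range(abs(residual)):
            q[order[k % len(q)]] += step
    return q
```

Each adjusted component moves by a single step, so for small residuals the per-component error stays at the rounding minimum while the sum becomes exact.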

@zeux
Contributor

zeux commented Jan 24, 2020

Btw the algorithm linked above is not necessarily the best way to do it, even with just four weights. It concentrates all the error in the maximal element.

Yes, spreading the error across multiple elements can reduce the per-component error, but this is less important compared to minimizing the error in the sum.

I agree that normalizing in the shader can work around these issues, but do all engines do that? three.js clearly doesn't (it normalizes during load - even that should be unnecessary, it's only doing this because some models were broken)

@scurest

scurest commented Jan 24, 2020

If it's normalizing on load, shouldn't it be adjusting these quantized weights at load? Or does it only normalize floats or something?

@vpenades
Contributor

@zeux How does CesiumMan look in three.js?

I think we're at a sort of crossroads, where we need to choose between requiring renormalization after quantization (which, as @scurest said, does not guarantee to solve the problem) and requiring engines to renormalize on load.

If it's normalizing on load, shouldn't it be adjusting these quantized weights at load? Or does it only normalize floats or something?

I believe it's the floats, so the deviation is evenly distributed across all weights, instead of overshooting one weight.

@zeux
Contributor

zeux commented Jan 24, 2020

it does not guarantee to solve the problem

Can you expand on this?

If it's normalizing on load, shouldn't it be adjusting these quantized weights at load?

It only normalizes floating point weights right now. In general I don't think the specification should assume that all loaders need to normalize weights on load, the only reason why three.js did it is that bugs with broken models were reported and needed to be fixed.

@lexaknyazev
Member Author

After all, "if some tolerance is allowed in the Float component, why not allow it in others?" , so to me, even if it is a clarification, it's also a breaking change.

Technically, the allowed level of tolerance (2e-7 per non-zero weight in the validator) does not depend on the component type - it's just too small to be achievable with unorm8 (which is limited to 1/255 steps).
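The rule as described can be sketched like this (my reading of the behavior described in this thread, not the validator's actual source):

```python
def weight_sum_within_tolerance(weights, per_weight_tolerance=2e-7):
    """Validator rule as described above: the allowed deviation from 1.0
    scales with the number of non-zero weights for the vertex."""
    nonzero_count = sum(1 for w in weights if w != 0.0)
    return abs(sum(weights) - 1.0) <= per_weight_tolerance * nonzero_count
```

The smallest non-zero deviation representable with unorm8 weights is 1/255 ≈ 0.0039, orders of magnitude above this threshold, so quantized weights effectively must sum to exactly 255.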

even if it is a clarification, it's also a breaking change

At first, the spec didn't mention the weights sum at all. Then, after multiple reports of unexpected behavior, we agreed on "linear unit sum" in general. Now, since this check has been implemented and we have some specific examples of its impact, it's time to make the spec more concrete.

This change certainly invalidates assets with obviously wrong weights - that's fine. There are also assets that were valid all along - that's great. Everything in the middle has some issues that may be visible only in a particular pose or with exaggerated scale (when engines do not fix that somehow, like CesiumMan in three.js).

Keep in mind that one of the primary goals of the validator is to improve glTF compatibility with different engines/platforms/environments by subjecting assets to ever-tightening rules. A less picky workflow may knowingly override the corresponding validation severity code or mute it completely.

Having said that, it might make sense to change this to be a warning? I am not sure I fully understand the distinction.

In my opinion, a warning should not lead to a broken render (in general).

we're in a sort of crossroads ...

As a rule of thumb, glTF prefers putting the burden on exporters.

@lexaknyazev
Member Author

Please review the updated spec language.

@donmccurdy
Contributor

The wording threads the needle of (1) not normatively specifying something too arbitrary, (2) explaining the motivation clearly, and (3) setting expectations for the validator. Looks good to me, thanks @lexaknyazev. 👍

@lexaknyazev
Member Author

Final nit. The validation note says:

The threshold in the official validation tool is set to 2e-7 times the number of non-zero weights per vertex.

It may be unclear from what number of weights threshold accumulation starts, i.e., whether [0, 0, 0, 0.9999998] is a valid accessor element. Currently, this case is allowed by the validator.

@vpenades
Contributor

Additionally, given how exquisitely precise this requirement is, I would also suggest picking the old CesiumMan and other models that have invalid weights and moving them to the gltf-generation collection as negative load cases.

@donmccurdy
Contributor

It may be unclear from what number of weights threshold accumulation starts, i.e., whether [0, 0, 0, 0.9999998] is a valid accessor element. Currently, this case is allowed by the validator.

The validator's behavior, considering that case to be one non-zero weight with a total threshold of 2e-7, seems like the clearest option to me.

@emackey
Member

emackey commented Jan 31, 2020

I'm not a skinning expert, so @donmccurdy or @bghgary can one of you merge this when ready?

@donmccurdy
Contributor

@lexaknyazev your comment (#1749 (comment)) asks about the wording, and to me the wording seems clear. Is this OK to merge, or was there more clarification you feel is needed, like saying "regardless of weight order"?

@lexaknyazev
Member Author

@donmccurdy
It's not exactly about the order. To put it simply: when there's only one "active" (i.e. non-zero) weight, should it be strictly 1.0?

Some implementations read only three weights and use w4 = 1.0 - w1 - w2 - w3.
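That pattern can be sketched as follows (illustrative; the names are mine):

```python
def reconstruct_fourth_weight(w1, w2, w3):
    """Derive the fourth skinning weight instead of reading it, as some
    implementations do."""
    return 1.0 - w1 - w2 - w3

# Such an implementation sees exactly 1.0 for a vertex stored as
# [0, 0, 0, 0.9999998], whatever the stored fourth value is.
```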

@donmccurdy
Contributor

Ok, I have no preference on that.

@donmccurdy
Contributor

To put it simply: when there's only one "active" (i.e. non-zero) weight, should it be strictly 1.0?

Let's skip adding further requirements for now, a single clear requirement for the skinning sum is a good starting point.

@lexaknyazev
Member Author

Let's skip adding further requirements for now, a single clear requirement for the skinning sum is a good starting point.

It's more like a disambiguation rather than an extra requirement.

2e-7 * num_non_zero_weights allows [0, 0, 0, 0.9999998], which makes little sense when zeros represent missing joints. At the very least, I'd like the spec to take a clear position on the validity of such a case.

@donmccurdy
Contributor

Sorry, I can't understand why 2e-7 * num_non_zero_weights would not be sufficient, even when there's just one non-zero weight. It can be extremely difficult to correctly convert various DCC tools' skinning representations into any standard format, and I don't see much benefit to adding another branch in the validation requirements for this case.

@emackey
Member

emackey commented Oct 27, 2020

Having written my own skinning implementation earlier this year, I finally understand certain issues such as the importance of the weights summing to 1.0. That said, it does make sense to allow some tolerance, particularly for systems that use different floating-point representations and must convert. With such tolerance allowed, the [0, 0, 0, 0.9999998] case should also be allowed, for consistency. That case doesn't appear any more harmful than any other 2e-7 tolerance.

@lexaknyazev
Member Author

After going through the original rationale one more time, I agree with allowing the [0, 0, 0, 0.9999998] case for consistency.

@donmccurdy @emackey
Please take a final look at the PR's language.


> **Implementation Note:** Since the allowed threshold is much lower than the minimum possible step for quantized component types, exporters should just renormalize the weight sum after quantization.

Without these requirements, vertices would be deformed significantly because the weight error would get multiplied by the joint position. For example, an error of `1/255` in the weight sum would result in an unacceptably large difference in the joint position.
Member

Should this line be part of the implementation note immediately above it?

Contributor

I think I would merge the non-implementation-note lines into a single paragraph, followed by two implementation notes. No need to interleave them sentence by sentence.

@donmccurdy
Contributor

Other than the minor layout issue @emackey mentioned, this looks good to me. 👍

@lexaknyazev lexaknyazev merged commit 0a61cf8 into master Nov 11, 2020
@lexaknyazev lexaknyazev deleted the weight-sum branch November 11, 2020 14:00
emackey added a commit that referenced this pull request Nov 17, 2020
@shrinktofit

@lexaknyazev What happens if some vertices are not skinned, e.g. they have zero-sum weights?
