Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spec clarification: Buffer views without buffers, buffers without URIs and GLB #1684

Closed
zeux opened this issue Oct 10, 2019 · 20 comments · Fixed by #1686
Closed

Spec clarification: Buffer views without buffers, buffers without URIs and GLB #1684

zeux opened this issue Oct 10, 2019 · 20 comments · Fixed by #1686

Comments

@zeux
Copy link
Contributor

zeux commented Oct 10, 2019

This requires some context so bear with me.

Original use case

I'm working on a new extension that allows glTF/GLB files to specify compressed buffer views, using a custom compression technique optimized for typical geometry/animation data. It's really important that the compression is applied at the bufferView level - applying it at the buffer level conflates data with multiple strides / meaning and necessitates decompressing the entire buffer into memory at once serially; applying it at the accessor level makes it impossible to upload bufferViews to the GPU efficiently, and inflates the JSON blob.

There are two important use cases I'd like to cover with this extension:

  • Data is unconditionally compressed; loading the glTF file requires the extension support
  • Data is provided in both compressed and raw form; loaders that don't support the extension can load the raw data from a separate buffer

To achieve this, I'm trying to use the usual technique of extension specifying "override" data for bufferView - here's an example for usecase 2:

   "extensionsUsed":["MESHOPT_compression"],
   "buffers":[ 
      { 
         "uri":"BrainStem.bin",
         "byteLength":394128
      },
      { 
         "uri":"BrainStem.fallback.bin",
         "byteLength":1166032
      }
   ],
   "bufferViews":[ 
      { 
         "buffer":1,
         "byteOffset":0,
         "byteLength":136336,
         "byteStride":4,
         "target":34962,
         "extensions":{ 
            "MESHOPT_compression":{ 
               "buffer":0,
               "byteOffset":0,
               "byteLength":2646,
               "byteStride":4,
               "mode":0,
               "count":34084
            }
         }
      },

As you can see, there are two buffers - main buffer with compressed data and fallback buffer with uncompressed data. The bufferView refers to buffer 1 (fallback), but compression extension overrides this - if the loader supports the extension, there will not be a need to load the fallback buffer at all. Commonly used GLTF JS loaders use lazy buffer loading, loading buffers as they parse buffer views that refer to them, so this works great.

When data like this is stored in the GLB file, the JSON blob is the same except that the first, main, buffer doesn't have a URI - this also works well.

What doesn't work quite as well is the "data is unconditionally compressed" workflow - I'd like to be able to generate a file without fallback buffers (with extension marked as required).

Dummy buffers

The cleanest solution would have been to omit the buffer reference in bufferView for this case, but glTF spec doesn't allow that - buffer reference is mandatory even in presence of extension. In #1680 it was suggested that a buffer without a URI can be used instead. I've implemented it and it works well in practice - the example above turns into:

   "extensionsUsed":["MESHOPT_compression"],
   "extensionsRequired":["MESHOPT_compression"],
   "buffers":[ 
      { 
         "uri":"BrainStem.bin",
         "byteLength":394128
      },
      { 
         "byteLength":1166032
      }
   ],

Which is basically the same modulo the URI reference. In practice, loaders won' try to load the fallback buffer since all references are through bufferViews that have the required extension, so it seems fine and the spec doesn't require "uri" or "data" to be present. However, the current validator doesn't like this technique when used in GLB file. I tried to fix it in KhronosGroup/glTF-Validator#122 but this breaks some existing tests in the validator and it's not clear whether they are spec-conformant.

Dummy buffers and GLB files

The issue is of course that the first buffer without a URI when used in a glTF file is "special" - it refers to the BIN chunk. The validator "helpfully" points out that if a buffer without a URI is not the first buffer, it might be the case that the file is encoded incorrectly.

When the file with required compression (no fallback) is encoded in GLB, I currently produce this:

   "extensionsUsed":["MESHOPT_compression"],
   "extensionsRequired":["MESHOPT_compression"],
   "buffers":[ 
      { 
         "byteLength":394128
      },
      { 
         "byteLength":1166032
      }
   ],

However note that the semantics of the two buffers is actually quite different.

The first buffer is a BIN chunk buffer reference - it contains compressed data.
The second buffer is a dummy fallback buffer - it is not backed by any data, and it only exists to be referred-to from bufferViews that should never use this reference because they have a compressed view and the extension support is required.

What is the exact semantics of dummy buffers wrt GLB?

Thus, there are a few questions that would be good to resolve to understand the interactions here.

  1. If a GLB file has a BIN chunk, does the first buffer have to refer to it? The specification doesn't mandate this. It's probably not very sensible to not refer to the BIN chunk, but this is not explicitly stated as far as I can tell. For example, is it valid to have a BIN chunk, and have no buffers defined in the file?

  2. Since a GLB file does not have to have a BIN chunk, is a GLB file without a BIN chunk valid if the first buffer in the JSON content is a dummy buffer (without URI)?

  3. If a GLB file does have a BIN chunk, and the first buffer refers to it, are all other dummy buffers valid with the same semantics that they have in glTF files?

  4. Also, what is the semantics of a buffer without a URI or data or BIN chunk? :) What should loaders do if they do try to load data from a buffer like this?

  5. In the future, allowing for multiple BIN chunks seems worthwhile. Are the answers to the questions above going to be compatible with this?

As a concrete example, validation suite has a test that asserts that the following JSON, when encoded into a GLB with a BIN chunk, is invalid:

{"asset":{"version":"2.0"},
"buffers":[{"byteLength":1,"uri":"data:application/gltf-buffer,0"},{"byteLength":4}]}

But is that really true? Sure, there's a BIN chunk, and the first buffer clearly doesn't refer to it - but the validator assumes that the second buffer does and I don't see why that is. This file looks valid to me even if it's odd.

Alternatives: omitting the buffer reference?

Please note that I'm not strongly attached to the concept of dummy buffers in the first place. I would be happier in fact if I could omit the buffer reference from the bufferView. However, that requires changing the core spec - is it possible to do this at this point?

Although the semantics of GLB <-> dummy buffer interaction still should be clarified, if we can change the spec to make the bufferView -> buffer reference optional (noting that in absence of a buffer reference, a required extension is expected to specify the data source), that would make the output files cleaner and less ambiguous.

If we do this we could:

  • Update the spec to make the bufferView.buffer reference optional
  • Add text to the effect of "bufferView.buffer reference is providing the data; in absence of this reference, the view data should be provided by required extensions"

This would relax the spec without making it too lax.

Alternatives: faking bufferLength?

If the .buffer property is sacred, then we could point it at the compressed buffer instead of a dummy buffer. However, this has a problem wrt bufferLength value. For the sake of the example below, let's assume that the scene just has one bufferView (very realistic! it can be an unindexed point cloud with just point positions), and that the compressed data is smaller than uncompressed data. In this case, the buffer with compressed data will be smaller than the bufferLength implied by the accessor. Now let's follow the spec:

  1. "Each accessor must fit its bufferView, i.e., accessor.byteOffset + STRIDE * (accessor.count - 1) + SIZE_OF_ELEMENT must be less than or equal to bufferView.length." => therefore bufferView length (presumably .byteLength is implied here) must be that of the uncompressed data, or larger.

  2. "A bufferView represents a subset of data in a buffer, defined by a byte offset into the buffer specified in the byteOffset property and a total byte length specified by the byteLength property of the buffer view." - while this technically doesn't strictly say that a buffer view must fit its buffer, it seems that this was considered obvious through "subset" wording. Validator agrees (bufferViewTooLong error is generated otherwise). Therefore, the buffer.byteLength must also be that of the uncompressed data or larger.

As a result, we can not fake bufferLength of either the bufferView or the buffer and we need a dummy buffer or we need to cut the bufferView.buffer reference.

@lexaknyazev
Copy link
Member

As a concrete example, validation suite has a test that asserts that the following JSON, when encoded into a GLB with a BIN chunk, is invalid...

This test is based on under-defined assumptions, so it should not be seen as a reference.

that requires changing the core spec - is it possible to do this at this point?

That doesn't require changing the core spec. The question is whether required extensions can make pre-existing mandatory properties optional.

/cc @bghgary @donmccurdy

@zeux
Copy link
Contributor Author

zeux commented Oct 10, 2019

One possible way of resolving this cleanly then could be:

  • Updating the spec to state that required extension can make pre-existing mandatory properties optional
  • Updating the spec to state that a buffer must have data, URI or be the first buffer in a GLB file referring to a BIN chunk. A similar disambiguation exists for images - the property reference for images specifies a few ways to designate image data but the spec says that an image must use one of them.

One issue with this is that the validator won't be able to validate files without knowing the semantics of the required extension - if I try to use this right now, validator (rightfully) complains that "Property 'buffer' must be defined". If we changed the core spec to specify that buffer in bufferView optional this could be resolved.

@bghgary
Copy link
Contributor

bghgary commented Oct 10, 2019

That doesn't require changing the core spec. The question is whether required extensions can make pre-existing mandatory properties optional.

I'm not sure about this. If we don't change the schema, then any implementation that uses the schema to validate will fail even if the extension is required. But changing the schema is a breaking change that shouldn't be allowed. Maybe point the unused buffer to the same buffer as the extension?

@donmccurdy
Copy link
Contributor

Maybe point the unused buffer to the same buffer as the extension?

That would be my thought, too. For the "unconditionally compressed data" workflow:

   "bufferViews":[ 
      { 
         "buffer":0,
         "byteOffset":0,
         "byteLength":2646,
         "byteStride":4,
         "target":34962,
         "extensions":{ 
            "MESHOPT_compression":{ 
               "buffer":0,
               "byteOffset":0,
               "byteLength":2646,
               "byteStride":4,
               "mode":0,
               "count":34084
            }
         }
      },

That's a (small) amount of redundancy, and obviously requires recognizing the extension, but for this workflow the extension is required anyway.

@zeux
Copy link
Contributor Author

zeux commented Oct 10, 2019

Maybe point the unused buffer to the same buffer as the extension?

The issue with this is buffer lengths - uncompressed data is larger than compressed data, so in some cases you can end up in situations where the entire buffer is smaller than each individual uncompressed chunk. As a result, the file will fail validation because byteLength in a bufferView is larger than the byteLength of the referenced buffer.

@donmccurdy
Copy link
Contributor

There's no uncompressed data in this scenario, though – there is no need to specify the byteLength the data had before compression, use the byteLength of the compressed data instead. Validation will fail, unless/until the validator recognizes the extension, but that can be true for any required extension.

@zeux
Copy link
Contributor Author

zeux commented Oct 10, 2019

There's no uncompressed data in this scenario, though – there is no need to specify the byteLength the data had before compression, use the byteLength of the compressed data instead. Validation will fail, unless/until the validator recognizes the extension, but that can be true for any required extension.

You do need to specify the uncompressed data length - this is necessary to create the GPU buffers with the right size (byteLength in the extension example above refers to the compressed representation of data).

But separately, if it's okay for validation to fail, this would suggest that the required extension makes a previously stated validation requirement not crucial. If this is possible, I don't see the difference between this and dropping the buffer reference? (e.g. if it's okay for byteLength in the bufferView to be non-sensical wrt the referenced buffer, why is it not okay to omit the buffer reference in the first place - both are violations of the spec?)

@donmccurdy
Copy link
Contributor

Validation will fail, unless/until the validator recognizes the extension...

But separately, if it's okay for validation to fail...

Hm I think I misstated this. The presence of a required extension doesn't permit any-and-all validation to fail, it just means the validator may not be useful, and ideally we'd avoid that.

if it's okay for byteLength in the bufferView to be non-sensical wrt the referenced buffer

It wasn't my intention say the byteLength should be nonsensical, I'd missed that you wanted to preserve a reference to the size of the uncompressed data so that GPU buffers could be pre-allocated.

For the earlier question, whether required extensions can make pre-existing mandatory properties optional, I don't know. It might be OK, but I share @bghgary's concern, and would prefer to avoid this if we can find a way around it.

To suggest another alternative, keeping the bufferView itself schema-compliant and including the length of the uncompressed data, what about:

   "bufferViews":[ 
      { 
         "buffer":0,
         "byteOffset":0,
         "byteLength":2646, // size of compressed buffer
         "byteStride":4,
         "target":34962,
         "extensions":{ 
            "MESHOPT_compression":{ 
               "buffer":0,
               "byteOffset":0,
               "byteLength":2646, // size of compressed buffer
               "byteLengthUncompressed": 136336,
               "byteStride":4,
               "mode":0,
               "count":34084
            }
         }
      },

In that case:

  1. bufferView.buffer points to a buffer of length bufferView.byteLength
  2. MESHOPT_compression.buffer points to a buffer of length MESHOPT_compression.byteLength
  3. MESHOPT_compression.byteLengthUncompressed provides the information needed to allocate the GPU buffer.

If we're concerned about the redundancy, most of the properties of the extension could be made optional overrides of the base bufferView, so that this would be functionally equivalent to the version above:

   "bufferViews":[ 
      { 
         "buffer":0,
         "byteOffset":0,
         "byteLength":2646,
         "byteStride":4,
         "target":34962,
         "extensions":{ 
            "MESHOPT_compression":{ 
               "byteLengthUncompressed": 136336,
               "mode":0,
               "count":34084
            }
         }
      },

@zeux
Copy link
Contributor Author

zeux commented Oct 11, 2019

To suggest another alternative, keeping the bufferView itself schema-compliant and including the length of the uncompressed data, what about:

The core issue with bufferView.bufferLength is that it's also important to keep accessors valid - accessors (rightfully!) validate that they request a correct (accessible) range from the buffer view they reference. So I don't see a clean way of avoiding specifying bufferLength on the bufferView.

@zeux
Copy link
Contributor Author

zeux commented Oct 11, 2019

Hm I think I misstated this. The presence of a required extension doesn't permit any-and-all validation to fail, it just means the validator may not be useful, and ideally we'd avoid that.

Right, here's what I'm trying to achieve:

  1. I would like it to make it so that this change doesn't require validator to recognize this extension to keep validation clean. This is similar for DRACO and I think all other existing extensions in general - an extension doesn't break validation just because it's required.

  2. I am perfectly content with validator becoming "less useful" if this extension is required. For example, validator can't perform validation of index values or attribute values when this extension is required because it can't decompress the data - which is fine!

  3. I would love to figure out the cleanest solution that satisfies pt.1

Obviously if pt1 is completely off the table I'm fine with breaking validator, and possibly adding support for this extension to validator later, etc. But this creates confusion - e.g. if validator produces lots of errors, people will report bugs (they already have!), etc. So the optimal case is that validation is clean short of "this required extension isn't recognized by the validator".

@zeux
Copy link
Contributor Author

zeux commented Oct 11, 2019

To focus the discussion I have reworded the original post by adding the "Alternatives: omitting the buffer reference?" section instead of "Alternatives" (more clearly spelling out what we could do), and adding "Alternatives: faking bufferLength?" to carefully explain why we can not reuse the same buffer or lie about the bufferLength.

@donmccurdy
Copy link
Contributor

Ok, yeah it's tricky to make this clean. 😅 Actually feels a lot like the Draco extension discussions that ended with accessor.bufferView being left undefined on primitive attributes – they're just metadata about the accessor, and the extension defines where the bytes are in the uncompressed data. I don't know much about your compression technique but that may be another option?

@zeux
Copy link
Contributor Author

zeux commented Oct 11, 2019

Unless I'm misunderstanding, Draco approach doesn't work well because it requires per-accessor specification. It is extremely redundant in terms of JSON size, and prevents efficient decoding directly into GPU memory.

@donmccurdy
Copy link
Contributor

donmccurdy commented Oct 11, 2019

Well, let's see if this fits your goals. Suppose:

  1. accessor.bufferView is left undefined for mesh attributes
  2. accessor.byteOffset reflects the correct offset into the decompressed bufferView
  3. A mesh- or primitive-level extension defines the compressed bufferView for all attributes of that mesh.
  4. That bufferView may or may not have an uncompressed fallback. If not, it's defined according to Spec clarification: Buffer views without buffers, buffers without URIs and GLB #1684 (comment).

Example:

{
  meshes: [
    {
      primitives: [ { attributes: { POSITION: 0, NORMAL: 1 } }
      extensions: { MESHOPT_compression: { bufferView: 0 } }
    }
  ],
  accessors: [
    { bufferView: undefined, byteOffset: 0, ... },
    { bufferView: undefined, byteOffset: 1024, ... },
  ],
  bufferViews: [
    buffer: 0,
    byteLength: 2646,
    ...
    extensions: { 
      MESHOPT_compression: { 
        byteLengthUncompressed: 136336,
        mode: 0,
        count: 34084
      }
    }
  ],
  ...
}

There's a bit more indirection here — you can't load the accessor without knowing the mesh from which it's referenced. That's true of Draco as well, and a couple developers have complained about it, but as far as I know it hasn't actually been a barrier to adoption.

I believe something like that would keep your accessors and bufferviews valid and enable decoding into GPU memory. I'll defer to @lexaknyazev on whether I've trampled on any schema requirements at this point. 😇

@zeux
Copy link
Contributor Author

zeux commented Oct 11, 2019

The extension can be used to compress any buffer view - indices, geometry, animation data. So this would require overriding accessors everywhere where accessors can be specified. This is dramatically increasing the scope of the extension, as well as implementation complexity - the Babylon.JS loader right now is ~30 lines of meaningful JavaScript code and I would really like to keep it that way: https://github.com/zeux/meshoptimizer/blob/master/demo/babylon.MESHOPT_compression.js#L18-L47. Additionally this substantially increases the amount of JSON overhead which can matter in some cases, such as scenes with animation where each node needs several accessors.

@donmccurdy
Copy link
Contributor

donmccurdy commented Oct 11, 2019

The extension can be used to compress any buffer view - indices, geometry, animation data.

Oof. That is elegant, and I congratulate you on it, and this ruins my tidy ideas above. 😣

I think it comes back to your five questions about semantics of dummy buffers, then. I don't know all of those answers but here's an attempt:

  1. If a GLB file has a BIN chunk, does the first buffer have to refer to it? The specification doesn't mandate this. It's probably not very sensible to not refer to the BIN chunk, but this is not explicitly stated as far as I can tell. For example, is it valid to have a BIN chunk, and have no buffers defined in the file?
  1. Since a GLB file does not have to have a BIN chunk, is a GLB file without a BIN chunk valid if the first buffer in the JSON content is a dummy buffer (without URI)?

These two appear to be underdefined now. We could probably clarify this in the spec without considering it a breaking change, or your extension could impose additional restrictions.

  1. If a GLB file does have a BIN chunk, and the first buffer refers to it, are all other dummy buffers valid with the same semantics that they have in glTF files?

Can the later dummy buffers omit the URI? The spec allows that.

  1. Also, what is the semantics of a buffer without a URI or data or BIN chunk? :) What should loaders do if they do try to load data from a buffer like this?

This is left undefined; I would expect nothing useful of it. :)

  1. In the future, allowing for multiple BIN chunks seems worthwhile. Are the answers to the questions above going to be compatible with this?

I would strongly hope that if multiple BIN chunks are allowed in the future, they'll be referenced by a more explicit mechanism than sequential buffers omitting the URI. So let's say yes.

I'm half tempted to suggest a property on the dummy buffers too. Not that implementations would strictly need it, but just to make the whole thing less confusing to see. Like:

"buffers":[
  {"byteLength": 1024 },
  {"byteLength": 4096, "extensions": { "MESHOPT_compression": { "empty": true } } }
]

@zeux
Copy link
Contributor Author

zeux commented Oct 11, 2019

Can the later dummy buffers omit the URI? The spec allows that.

That's the question! :) This started as a validator bug (KhronosGroup/glTF-Validator#121) and the rest is history.

I suspect that the most frictionless course of action is to:

  1. Declare the validator behavior a bug and declare the test that the validator considers invalid, actually valid
  2. Fix the validator bug (Only treat first buffer as .glb-embedded glTF-Validator#122)
  3. Clarify the wording in the spec. Given that any use of dummy buffers in GLB files currently produces validation errors with near certainty I would be surprised if this issue was ever encountered before.

I'm half tempted to suggest a property on the dummy buffers too. Not that implementations would strictly need it, but just to make the whole thing less confusing to see. Like:

I was actually thinking about this as well yeah. This wouldn't serve any purpose normally but if, say, a loader is honoring the extension, it can use this to say "hey I don't actually need to load the buffer eagerly" (all JS loaders load buffers lazily but in theory you can load eagerly).

@zeux
Copy link
Contributor Author

zeux commented Oct 11, 2019

Thinking about this further, the answers to

Since a GLB file does not have to have a BIN chunk, is a GLB file without a BIN chunk valid if the first buffer in the JSON content is a dummy buffer (without URI)?
If a GLB file does have a BIN chunk, and the first buffer refers to it, are all other dummy buffers valid with the same semantics that they have in glTF files?

really must be "yes". Imagine a glTF file with an arbitrary mix of dummy buffers and buffers with a URI. Given a tool that takes the glTF file and wraps it into a GLB file without a BIN chunk, it really needs to be the case that the buffer arrangement becomes valid. Given a tool that takes one of the buffers with a URI (say, first one) and converts it to a BIN chunk, there's no reason to expect other buffers to become invalid.

In other words, since dummy buffers are clearly valid, albeit slightly strange, in glTF files, they really must be valid with the same semantics in GLB files as well, modulo the first one.

I also wholeheartedly agree with the desire to specify the mapping more explicitly so let's assume that it doesn't matter how multiple BIN chunks behave - if/when this happens, buffer can get a "glbIndex" attribute or some such.

Thus I propose that we amend the spec, particularly the "GLB-stored Buffer" section, to say:

(existing)

glTF asset could use GLB file container to pack all resources into one file. glTF Buffer referring to GLB-stored BIN chunk, must have buffer.uri property undefined, and it must be the first element of buffers array; byte length of BIN chunk could be up to 3 bytes bigger than JSON-defined buffer.byteLength to satisfy GLB padding requirements.

(addition)

Any glTF Buffer with undefined buffer.uri property that is not the first element does not refer to the GLB-stored BIN chunk.

This should resolve the validation issues with minimal and hopefully uncontroversial change.

(I will also note that the spec technically allows the first buffer that refers to the GLB BIN chunk to have non-empty buffer.data property but let's not fix it here...)

@donmccurdy
Copy link
Contributor

Any glTF Buffer with undefined buffer.uri property that is not the first element does not refer to the GLB-stored BIN chunk.

I'd agree with that change, and perhaps go a little further to say ... does not refer to the GLB-stored BIN chunk, and the behavior of such buffers is left undefined to accommodate future extensions and specification versions..

What do you mean by buffer.data?

@zeux
Copy link
Contributor Author

zeux commented Oct 11, 2019

What do you mean by buffer.data?

Sorry - I got my wires crossed thinking about buffer specification modes. Your correction sounds good as well. I will draft a PR to this effect (if there are objections to this change I’m happy to discuss alternatives but this seems safe and conservative).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants