Consider removing Base 64 / Data URIs in future glTF version #1915

Closed
donmccurdy opened this issue Dec 1, 2020 · 12 comments
Labels
breaking change Changes under consideration for a future glTF spec version, which would require breaking changes.
Milestone

Comments

@donmccurdy
Contributor

donmccurdy commented Dec 1, 2020

Currently glTF has three common packing arrangements:

  • (1) .gltf + .bin + textures (separate)
  • (2) .gltf (embedded)
  • (3) .glb (binary embedded)

Option (2), a .gltf with embedded binary data, adds 20-30% to the overall file size, and a non-trivial amount of extra processing cost to parse the base64 data (at least in JS). This has repeatedly been a stumbling block for three.js users, who understandably don't know the difference between the different options. I've had a difficult time explaining the difference to users — I think users sort of understand the .glb vs .gltf difference, but not the two types of .gltf file.
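For a rough sense of where the size increase comes from: base64 emits four ASCII characters for every three input bytes, so the binary payload itself grows by about a third, and the whole-file figure depends on how much of the file is binary. The Node.js sketch below only illustrates that arithmetic; `dataUriSize` is a hypothetical helper and the 3 MB buffer is placeholder data.

```javascript
// Sketch: compare a raw binary buffer with its base64 Data URI form.
// `dataUriSize` is a hypothetical helper, not part of any glTF library.
function dataUriSize(byteLength, mimeType = 'application/octet-stream') {
  const prefix = `data:${mimeType};base64,`;
  // Base64 emits 4 characters per 3 input bytes, rounded up with padding.
  return prefix.length + 4 * Math.ceil(byteLength / 3);
}

const raw = Buffer.alloc(3000000); // placeholder for 3 MB of geometry data
const uri = 'data:application/octet-stream;base64,' + raw.toString('base64');

console.log('raw bytes:      ', raw.length);
console.log('data URI chars: ', uri.length);
console.log('predicted chars:', dataUriSize(raw.length));
console.log('growth factor:  ', (uri.length / raw.length).toFixed(2));
```

Transport compression such as gzip recovers part of this overhead, but the decoded string still has to be parsed and converted back to binary on load.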

I'm not aware of any confirmed need for the embedded .gltf Data URIs except for convenient debugging within a single file. Seeing users ship slower, larger embedded .gltf files without understanding the performance cost, or even compare glTF unfavorably to other formats because of size differences caused by these Data URIs, has me wondering whether we are paying too high a cost for this debugging feature.

Perhaps if it had a different file extension (.gltf-debug?) it would be easier to communicate to users, but I'm tempted to suggest that we drop the option for Data URIs in the next version of glTF (whenever that might be). Thoughts?

@donmccurdy donmccurdy added the breaking change Changes under consideration for a future glTF spec version, which would require breaking changes. label Dec 1, 2020
@donmccurdy donmccurdy added this to the glTF Next milestone Dec 1, 2020
@donmccurdy
Contributor Author

In any case, we should also try to ensure that tools don't use (2) as their default output. A few do this today, such as glTF-Pipeline.

@lexaknyazev
Member

I wouldn't call data URIs a "debugging feature" since text editors usually do not like huge multi-megabyte strings. Data URIs have almost no extra debugging value compared to keeping binary resources in external files (the latter could be more efficiently edited with specialized tools).

Instead, I think that data URIs in glTF should be viewed just like data URIs in HTML/CSS - used only for very small binary payloads. The pros/cons are basically the same across all web technologies.

+1 for ensuring that tools do not produce them by default. Maybe add a new validation issue when the embedded binary size exceeds a certain threshold?
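A size check of this kind could look roughly like the following Node.js sketch. It is not glTF-Validator code: `findLargeDataUris`, `decodedSize`, and the 100 KiB default are assumptions made for illustration, and only `buffers` and `images` entries are inspected.

```javascript
// Hypothetical threshold; the discussion does not fix a value at this point.
const MAX_EMBEDDED_BYTES = 100 * 1024;

// Estimate the decoded byte size of a data URI without decoding it.
function decodedSize(uri) {
  const comma = uri.indexOf(',');
  const header = uri.slice(0, comma);
  const payload = uri.slice(comma + 1);
  // Non-base64 payloads: fall back to the character count as a rough bound.
  if (!/;base64$/i.test(header)) return payload.length;
  const padding = payload.endsWith('==') ? 2 : payload.endsWith('=') ? 1 : 0;
  return Math.floor((payload.length * 3) / 4) - padding;
}

// Return one entry per embedded resource whose decoded size exceeds `limit`.
function findLargeDataUris(gltf, limit = MAX_EMBEDDED_BYTES) {
  const issues = [];
  for (const key of ['buffers', 'images']) {
    (gltf[key] || []).forEach((entry, index) => {
      if (typeof entry.uri === 'string' && entry.uri.startsWith('data:')) {
        const bytes = decodedSize(entry.uri);
        if (bytes > limit) {
          issues.push({ pointer: `/${key}/${index}/uri`, bytes });
        }
      }
    });
  }
  return issues;
}
```

A tool could call `findLargeDataUris(JSON.parse(text))` after reading a .gltf file and report each returned pointer as a warning rather than an error, since large data URIs are wasteful but still valid.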

@donmccurdy
Contributor Author

Maybe add a new validation issue when the embedded binary size exceeds a certain threshold?

That's a great idea. 👍

@zeux
Contributor

zeux commented Dec 12, 2020

This is perhaps a bit tangential, but it may complement the proposal well: if we simultaneously add a way to store multiple buffers in a GLB instead of just a single one, it would clean up the buffer storage story. Migration between .gltf + external files and .glb would become simpler because the existing buffer view / buffer structure could be maintained, and in some cases GLB loaders could omit specific buffers that aren't necessary (due to use of fallbacks for unsupported extensions, LOD-type extensions, etc.).

@prideout

As an aside, I've seen data URIs used in the images array, but the spec only mentions them in the context of the buffers array. Maybe the spec should be more explicit about where they are allowed.

@zeux and I have contributed to the cgltf library, which does not support data URIs in images, and I don't know if it should.

@zeux
Contributor

zeux commented Mar 17, 2021

@prideout I've definitely seen data URIs used in images; cgltf will preserve this data but doesn't decode it in any way, so an application that uses cgltf would need to process data URIs itself. gltfpack does this: it decodes the Base64-encoded data and stores it in a binary buffer (see https://github.com/zeux/meshoptimizer/blob/master/gltf/write.cpp#L723 and the calls to parseDataUri).
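For readers unfamiliar with what that processing involves, here is a minimal Node.js sketch of decoding a data URI into a binary buffer. It is not the gltfpack implementation (which is C++); `decodeDataUri` is a hypothetical name, and the non-base64 branch assumes a percent-encoded text payload.

```javascript
// Sketch: split a data URI into its media type and decoded bytes.
// Returns null for anything that is not a data URI (e.g. "scene.bin").
function decodeDataUri(uri) {
  const match = /^data:([^;,]*)((?:;[^;,]*)*),(.*)$/s.exec(uri);
  if (!match) return null;
  const [, mimeType, params, payload] = match;
  const isBase64 = params.split(';').includes('base64');
  const data = isBase64
    ? Buffer.from(payload, 'base64')
    // Assumption: non-base64 payloads are percent-encoded UTF-8 text.
    : Buffer.from(decodeURIComponent(payload), 'utf8');
  return { mimeType: mimeType || 'text/plain', data };
}

const decoded = decodeDataUri('data:application/octet-stream;base64,AAEC');
console.log(decoded.mimeType, decoded.data);
```

The same function applies to `buffers[i].uri` and `images[i].uri`; an application would typically replace the decoded entry with a buffer view or an external file.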

@lexaknyazev
Member

@prideout
It's defined in the URIs section, so it applies to all URI usages. Using data: with images on the web is trivial, although I understand that it doesn't apply as easily to native apps.

@ideiasfrescas

I would like to mention that users may want to do some server-side processing of the image and then create a whole new model based on a template one. Using PHP to load images and post changes to the file would be a case against dropping data URIs.

@wallabyway

wallabyway commented Aug 5, 2021

@donmccurdy
While on the topic of making breaking changes to .glb format...

Regarding - (3) .glb (binary embedded)

I would like to suggest adding msgpack type, to replace the json utf-8 string type, for the structure json chunk inside the .glb file spec.

(happy to try other alternatives to msgpack serialization, but let's discuss that later)...

THE TOOLING PROBLEM:
There's a problem working with large glTF files between tools (and interfaces). They don't serialize/deserialize well.

For example:
I have a tooling pipeline that generates a glTF + .bin file set. This is then run through gltfpack for optimization (unfortunately, gltf-transform got overwhelmed).

The glTF content is massive: gigabytes in size and hours to generate (even with Node/V8 large-memory settings). For example, trees with leaves, just waiting to be de-duplicated and turned into glTF mesh-instance structures. See #1699

I'm using node.js, and here is the crux of the problem:

let buf = Buffer.from(JSON.stringify(gltf))

This line fills memory, and crashes because there are so many nodes, accessors, bufferviews, etc.

Ironically, this line (serializing the glTF structure into a JSON string / Uint8Array) is needed both for saving to a .gltf file and for generating the JSON chunk of a .glb stream.

@zeux
I tried to integrate the gltfpack Node.js interface directly, but it still requires a .glb stream, which means I still need to serialize with JSON.stringify(gltf).

THE WORKAROUND(S)
I can generate the large glTF file, by separately serializing sub-parts of the glTF (nodes, accessors, bufferviews, etc) or I can use a different serializer, like msgpack.

This worked (i.e., serializing the JSON structure into a msgpack file)!
The msgpack files were 4x smaller, serialization took minutes instead of hours, and the output compressed 2x better with gzip. It was easy to find integration tools.
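The other workaround mentioned above, serializing sub-parts of the glTF separately, can be sketched in Node.js as follows. This is an illustration only: `writeGltfIncrementally` is a hypothetical helper, it assumes the object holds plain JSON data with no `undefined` values, and a real pipeline would likely batch writes through a buffered stream.

```javascript
const fs = require('fs');

// Sketch: write a glTF object as JSON without ever building the full string.
// Top-level arrays (nodes, accessors, bufferViews, ...) are written one
// element at a time, so each JSON.stringify call stays small.
function writeGltfIncrementally(gltf, path) {
  const fd = fs.openSync(path, 'w');
  try {
    fs.writeSync(fd, '{');
    Object.entries(gltf).forEach(([key, value], i) => {
      if (i > 0) fs.writeSync(fd, ',');
      fs.writeSync(fd, JSON.stringify(key) + ':');
      if (Array.isArray(value)) {
        fs.writeSync(fd, '[');
        value.forEach((item, j) => {
          fs.writeSync(fd, (j > 0 ? ',' : '') + JSON.stringify(item));
        });
        fs.writeSync(fd, ']');
      } else {
        fs.writeSync(fd, JSON.stringify(value));
      }
    });
    fs.writeSync(fd, '}');
  } finally {
    fs.closeSync(fd);
  }
}
```

The output is ordinary JSON, so existing loaders can read it unchanged; the trade-off against msgpack is that the file stays as large as before.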

[Screenshot dated 2021-08-05; image courtesy of @petrbroz]

It got me thinking...

For the .glb format only: if we changed the structured JSON chunk from a JSON string to msgpack, would this be better for the glTF tooling ecosystem?

https://github.com/KhronosGroup/glTF/tree/master/specification/2.0#chunks

(on second thought, this is insanity. I should probably just use msgpack for the whole thing... including the .bin files, since msgpack encodes binary arrays in a compatible way to gltf .bin format... NVM)

@donmccurdy
Contributor Author

I think @lexaknyazev has suggested that something like this might be appropriate for GLB v3 as well (#1560 (comment)). There is a real cost to it: currently it takes about 15 additional lines of code in my implementations to add GLB support to an existing glTF parser, and switching to an entirely different serialization would almost certainly increase that. But glTF has grown far beyond the web now, and JSON is not as much "at home" on other platforms, so perhaps it is something we should just do. Whether it would be msgpack or FlatBuffers or something else, I don't know; that is tricky because file formats can outlive individual libraries. Perhaps this is worth starting a separate issue to discuss.
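To show why GLB support is cheap to add on top of a JSON-based parser, here is a minimal Node.js sketch of reading the GLB 2.0 container: a 12-byte header followed by length-prefixed chunks. `parseGlb` is a hypothetical name and this is not code from three.js or any other loader; it checks only the magic number and ignores unknown chunk types.

```javascript
const GLB_MAGIC = 0x46546c67;  // ASCII "glTF", little-endian
const CHUNK_JSON = 0x4e4f534a; // ASCII "JSON"
const CHUNK_BIN = 0x004e4942;  // ASCII "BIN\0"

// Sketch: split a GLB Buffer into its parsed JSON and binary chunk.
function parseGlb(glb) {
  if (glb.readUInt32LE(0) !== GLB_MAGIC) throw new Error('Not a GLB file');
  const version = glb.readUInt32LE(4);
  const totalLength = glb.readUInt32LE(8);
  let json = null;
  let bin = null;
  let offset = 12;
  while (offset + 8 <= totalLength) {
    const chunkLength = glb.readUInt32LE(offset);
    const chunkType = glb.readUInt32LE(offset + 4);
    const chunk = glb.subarray(offset + 8, offset + 8 + chunkLength);
    if (chunkType === CHUNK_JSON) json = JSON.parse(chunk.toString('utf8'));
    else if (chunkType === CHUNK_BIN) bin = chunk;
    offset += 8 + chunkLength;
  }
  return { version, json, bin };
}
```

After this step the `json` object can be handed to the existing glTF code path, with `bin` standing in for the buffer that has no `uri`. A different serialization for the JSON chunk would replace the single `JSON.parse` call with a dependency on a decoder library.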

@lexaknyazev
Member

The updated spec is more explicit about the size increase introduced by Data URIs.

@donmccurdy
Contributor Author

Unfortunately these files are already common; I don't think language in the spec is going to change that. We have the maintainers of the Blender addon saying "never ever use glTF Embedded", and I'm trying to communicate that same thing to three.js users as well, but it's difficult because it looks like a perfectly normal glTF file.

Ideally I'd like to see authoring tools stop using "glTF Embedded" entirely – it's a trap for users. I'm planning to start showing warnings in three.js any time glTF Embedded files exceed 100kb. I'd also like to propose we remove the option from Blender.

Related: #1117

echadwick-artist added a commit to KhronosGroup/glTF-Sample-Assets that referenced this issue May 23, 2023
Clarifications for glTF Embedded, for more details see KhronosGroup/glTF#1915

6 participants