fs.openAsBlob() does not work properly for files > 2GB #52585

Open
joelrbrandt opened this issue Apr 18, 2024 · 7 comments
Labels
fs Issues and PRs related to the fs subsystem / file system.

Comments

@joelrbrandt
Contributor

joelrbrandt commented Apr 18, 2024

Version

v21.7.3 (and v20.12.2)

Platform

Darwin bender.local 23.4.0 Darwin Kernel Version 23.4.0: Fri Mar 15 00:10:42 PDT 2024; root:xnu-10063.101.17~1/RELEASE_ARM64_T6000 arm64

Subsystem

fs

What steps will reproduce the bug?

Steps to reproduce bug:

  1. Generate a 3 GB file with dd if=/dev/random of=random.bin bs=1048576 count=3072
  2. Launch node (v21.7.3 and v20.12.2 both repro for me)
  3. const b = await fs.openAsBlob("random.bin")
  4. b.slice(2**31-2, 2**31-1) <-- works
  5. b.slice(2**31-1, 2**31) <-- returns a slice of zero length
  6. b.slice(2**31, 2**31+1) <-- crashes

Full textual output of doing this given below.

Also, for files greater than 4GB, the size of the blob is wrong. It appears to be exactly 4*(2**30) less than the actual size. For example, for a 5*(2**30) byte file, the blob size is reported as 1073741824, which is (2**30). And, for a file that is exactly 4*(2**30), it reports a size of zero.
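
The sizes reported above are consistent with the file size wrapping around an unsigned 32-bit value somewhere along the way. That is only a hypothesis drawn from the numbers in this report, not something confirmed against the Node.js source. A minimal sketch of the arithmetic, where `truncateToUint32` is a hypothetical helper name:

```javascript
// Hypothetical helper: models a size stored in an unsigned 32-bit field,
// which wraps modulo 2**32. This is an assumption about the cause, not
// code taken from Node.js.
function truncateToUint32(size) {
  return size % 2 ** 32;
}

const GiB = 2 ** 30;
console.log(truncateToUint32(3 * GiB)); // 3221225472 (fits, unchanged)
console.log(truncateToUint32(4 * GiB)); // 0
console.log(truncateToUint32(5 * GiB)); // 1073741824
```

The last two values match the blob sizes observed for the 4*(2**30) and 5*(2**30) byte files.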

How often does it reproduce? Is there a required condition?

Always reproduces.

What is the expected behavior? Why is that the expected behavior?

Can operate on the Blob normally.

What do you see instead?

Described above. See output below.

Additional information

$ dd if=/dev/random of=random.bin bs=1048576 count=3072
3072+0 records in
3072+0 records out
3221225472 bytes transferred in 4.213743 secs (764457033 bytes/sec)
$ ls -l random.bin
-rw-r--r--  1 jbrandt  staff  3221225472 Apr 18 15:04 random.bin
$ node -v
v21.7.3
$ node
Welcome to Node.js v21.7.3.
Type ".help" for more information.
> const b = await fs.openAsBlob("random.bin")
undefined
> b
Blob { size: 3221225472, type: '' }
> b.slice(2**31-2, 2**31-1)
Blob { size: 1, type: '' }
> b.slice(2**31-1, 2**31)
Blob { size: 0, type: '' }
> b.slice(2**31, 2**31+1)

  #  node[96070]: static void node::Blob::ToSlice(const FunctionCallbackInfo<v8::Value> &) at ../src/node_blob.cc:247
  #  Assertion failed: args[0]->IsUint32()

----- Native stack trace -----

 1: 0x1041dfc0c node::Assert(node::AssertionInfo const&) [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
 2: 0x10589b4a4 node::Blob::ToSlice(v8::FunctionCallbackInfo<v8::Value> const&) (.cold.2) [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
 3: 0x1041a8994 node::Blob::ToSlice(v8::FunctionCallbackInfo<v8::Value> const&) [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
 4: 0x104c32a38 Builtins_CallApiCallbackGeneric [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
 5: 0x104c30b84 Builtins_InterpreterEntryTrampoline [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
 6: 0x104c30b84 Builtins_InterpreterEntryTrampoline [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
 7: 0x104c2e8ac Builtins_JSEntryTrampoline [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
 8: 0x104c2e594 Builtins_JSEntry [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
 9: 0x1044daef8 v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
10: 0x1044db0e0 v8::internal::Execution::CallScript(v8::internal::Isolate*, v8::internal::Handle<v8::internal::JSFunction>, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>) [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
11: 0x10439ef68 v8::Script::Run(v8::Local<v8::Context>, v8::Local<v8::Data>) [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
12: 0x1041c8d1c node::contextify::ContextifyScript::EvalMachine(v8::Local<v8::Context>, node::Environment*, long long, bool, bool, bool, v8::MicrotaskQueue*, v8::FunctionCallbackInfo<v8::Value> const&) [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
13: 0x1041c87fc node::contextify::ContextifyScript::RunInContext(v8::FunctionCallbackInfo<v8::Value> const&) [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
14: 0x104c32a38 Builtins_CallApiCallbackGeneric [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
15: 0x104c30b84 Builtins_InterpreterEntryTrampoline [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
16: 0x104c30b84 Builtins_InterpreterEntryTrampoline [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
17: 0x104c30b84 Builtins_InterpreterEntryTrampoline [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
18: 0x104c30b84 Builtins_InterpreterEntryTrampoline [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
19: 0x104c30b84 Builtins_InterpreterEntryTrampoline [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
20: 0x10a6531d0 
21: 0x10a6ad6ac 
22: 0x104c30b84 Builtins_InterpreterEntryTrampoline [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
23: 0x104c30b84 Builtins_InterpreterEntryTrampoline [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
24: 0x10a66b508 
25: 0x10a6767a8 
26: 0x10a66cb7c 
27: 0x10a65307c 
28: 0x10a6ad6ac 
29: 0x10a6a73f0 
30: 0x104c69e04 Builtins_GeneratorPrototypeNext [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
31: 0x10a65c654 
32: 0x10a65307c 
33: 0x10a6ad6ac 
34: 0x10a66d390 
35: 0x10a6754b8 
36: 0x10a66e24c 
37: 0x10a675b48 
38: 0x10a64f78c 
39: 0x104c2e8ac Builtins_JSEntryTrampoline [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
40: 0x104c2e594 Builtins_JSEntry [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
41: 0x1044daef8 v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
42: 0x1044da478 v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*) [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
43: 0x1043b2404 v8::Function::Call(v8::Local<v8::Context>, v8::Local<v8::Value>, int, v8::Local<v8::Value>*) [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
44: 0x104100fc0 node::InternalMakeCallback(node::Environment*, v8::Local<v8::Object>, v8::Local<v8::Object>, v8::Local<v8::Function>, int, v8::Local<v8::Value>*, node::async_context) [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
45: 0x104117c00 node::AsyncWrap::MakeCallback(v8::Local<v8::Function>, int, v8::Local<v8::Value>*) [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
46: 0x1042af900 node::StreamBase::CallJSOnreadMethod(long, v8::Local<v8::ArrayBuffer>, unsigned long, node::StreamBase::StreamBaseJSChecks) [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
47: 0x1042b0f7c node::EmitToJSStreamListener::OnStreamRead(long, uv_buf_t const&) [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
48: 0x1042b526c node::LibuvStreamWrap::OnUvRead(long, uv_buf_t const*) [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
49: 0x1042b59a0 node::LibuvStreamWrap::ReadStart()::$_1::__invoke(uv_stream_s*, long, uv_buf_t const*) [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
50: 0x104c1a390 uv__stream_io [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
51: 0x104c21da4 uv__io_poll [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
52: 0x104c0fda4 uv_run [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
53: 0x1041016f0 node::SpinEventLoopInternal(node::Environment*) [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
54: 0x10421f754 node::NodeMainInstance::Run(node::ExitCode*, node::Environment*) [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
55: 0x10421f4d4 node::NodeMainInstance::Run() [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
56: 0x10419eb48 node::Start(int, char**) [/Users/jbrandt/.nvm/versions/node/v21.7.3/bin/node]
57: 0x19e7060e0 start [/usr/lib/dyld]

----- JavaScript stack trace -----

1: slice (node:internal/blob:265:21)
2: REPL5:1:3
3: runInThisContext (node:vm:136:12)
4: defaultEval (node:repl:598:22)
5: bound (node:domain:432:15)
6: runBound (node:domain:443:12)
7: onLine (node:repl:927:10)
8: emit (node:events:531:35)
9: emit (node:domain:488:12)
10: [_onLine] (node:internal/readline/interface:416:12)


zsh: abort      node
@joelrbrandt joelrbrandt changed the title fs.openAsBlob does not work properly for files > 2GB fs.openAsBlob() does not work properly for files > 2GB Apr 18, 2024
@joelrbrandt
Contributor Author

joelrbrandt commented Apr 18, 2024

FWIW, this also reproduces on a linux/arm64 container image running in Docker for Mac.

root@f7a53c9fe57c:/# uname -a
Linux f7a53c9fe57c 6.6.16-linuxkit #1 SMP Fri Feb 16 11:54:02 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

root@f7a53c9fe57c:/# node -v       
v20.12.2

root@f7a53c9fe57c:/# dd if=/dev/random of=random.bin bs=1048576 count=3072
3072+0 records in
3072+0 records out
3221225472 bytes (3.2 GB, 3.0 GiB) copied, 8.45959 s, 381 MB/s

root@f7a53c9fe57c:/# node
Welcome to Node.js v20.12.2.
Type ".help" for more information.
> const b = await fs.openAsBlob("random.bin")
undefined
> b
Blob { size: 3221225472, type: '' }
> b.slice(2**31-1, 2**31)
Blob { size: 0, type: '' }
> b.slice(2**31, 2**31+1)

  #  node[15]: static void node::Blob::ToSlice(const v8::FunctionCallbackInfo<v8::Value>&) at ../src/node_blob.cc:247
  #  Assertion failed: args[0]->IsUint32()

----- Native stack trace -----

 1: 0xcab09c node::Assert(node::AssertionInfo const&) [node]
 2: 0xc65b38 node::Blob::ToSlice(v8::FunctionCallbackInfo<v8::Value> const&) [node]
 3: 0xf1f228 v8::internal::FunctionCallbackArguments::Call(v8::internal::CallHandlerInfo) [node]
 4: 0xf1f9e8  [node]
 5: 0xf1fe00 v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) [node]
 6: 0x189c964  [node]

----- JavaScript stack trace -----

1: slice (node:internal/blob:266:21)
2: REPL4:1:3
3: runInThisContext (node:vm:136:12)
4: defaultEval (node:repl:598:22)
5: bound (node:domain:432:15)
6: runBound (node:domain:443:12)
7: onLine (node:repl:927:10)
8: emit (node:events:530:35)
9: emit (node:domain:488:12)
10: [_onLine] (node:internal/readline/interface:416:12)


Aborted

@VoltrexKeyva VoltrexKeyva added the fs Issues and PRs related to the fs subsystem / file system. label Apr 19, 2024
@kylo5aby
Contributor

kylo5aby commented Apr 19, 2024

Hi @joelrbrandt, in the implementation of Blob.slice, it's hinted that the start and end should be within the uint32 range.
Since this is not documented, and a Blob is actually allowed to have indices outside the uint32 range, should we document the limit, or validate the index range instead of relying on coercions such as start | 0 and end | 0? /cc @nodejs/buffer
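
For reference, this is what the `| 0` coercion mentioned above does to offsets around 2**31. The snippet shows plain JavaScript semantics only; how the Blob implementation combines these values internally is not shown here and should not be inferred from it.

```javascript
// Plain JavaScript semantics: `x | 0` converts x to a signed 32-bit integer,
// so values at or above 2**31 wrap around to negative numbers.
const probes = [2 ** 31 - 2, 2 ** 31 - 1, 2 ** 31, 2 ** 31 + 1];
const coerced = probes.map((n) => n | 0);

for (let i = 0; i < probes.length; i++) {
  console.log(`${probes[i]} | 0 = ${coerced[i]}`);
}
// 2147483646 | 0 = 2147483646
// 2147483647 | 0 = 2147483647
// 2147483648 | 0 = -2147483648
// 2147483649 | 0 = -2147483647
```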

@joelrbrandt
Contributor Author

@kylo5aby thanks for your quick response (and sorry for my slow follow-up).

it's hinted that the start and end should be within the uint32 range

I'm a bit confused by this part of your response. Where is this hinted? In the node implementation?

I don't see any mention of a 32-bit limitation in the File API spec. So, it doesn't seem like the Blob.prototype.slice() API should be limited to the uint32 range. Also, if it were limited to the uint32 range, it should go up to 2**32, not 2**31. But, the .slice() API supports negative values (distance from end), so maybe you meant int32?

Independent of the issue with .slice(), node reports the wrong size for Blobs that are larger than 4GB.

Interestingly, if I get a stream of the Blob (with Blob.prototype.stream()), I can read all the bytes in Blobs larger than 4GB.

See below where I do the following:

  1. Generate 5GB of random data
  2. Hash that data with openssl
  3. Launch node v20.12.2
  4. Open that random data in node with fs.openAsBlob
  5. Observe the incorrectly reported size of 1GB (instead of 5GB)
  6. Construct a stream, and send all the bytes to a hash using async iteration of the stream
  7. Digest the hash to exactly the same value returned by openssl
$ dd if=/dev/random of=random.bin bs=1048576 count=5120
5120+0 records in
5120+0 records out
5368709120 bytes transferred in 6.613665 secs (811760063 bytes/sec)
$ openssl dgst random.bin 
SHA256(random.bin)= f80fac6d9dc913e33d2c6a69783fd8fc3b4ed18f9c844aca801358e437096e6c
$ node
Welcome to Node.js v20.12.2.
Type ".help" for more information.
> b = await fs.openAsBlob("random.bin")
Blob { size: 1073741824, type: '' }
> b.size
1073741824
> const { createHash } = require("node:crypto")
undefined
> const h = createHash("sha256")
undefined
> const s = b.stream()
undefined
> for await (const a of s) {
... h.update(a);
... }
undefined
> h.digest("hex")
'f80fac6d9dc913e33d2c6a69783fd8fc3b4ed18f9c844aca801358e437096e6c'
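
The same idea can be used to compare the reported `size` with the number of bytes the stream actually yields. This is a minimal sketch; `countStreamedBytes` is a hypothetical helper name, and the small in-memory Blob below only demonstrates the call, not the > 4GB case.

```javascript
// Hypothetical helper: counts the bytes a Blob actually yields when streamed,
// so the total can be compared with the reported `blob.size`.
async function countStreamedBytes(blob) {
  let total = 0;
  for await (const chunk of blob.stream()) {
    total += chunk.byteLength;
  }
  return total;
}

// Usage with a small in-memory Blob (Blob is a global in Node.js 18+).
const small = new Blob([new Uint8Array(1024), 'abc']);
countStreamedBytes(small).then((streamed) => {
  console.log({ streamed, reported: small.size });
});
```

For a blob returned by fs.openAsBlob() on a file larger than 4GB, the two numbers would be expected to differ on the affected versions, given the hash result above.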

Finally, here's a codepen that can be used to check that files > 4GB work fine in all major modern browsers (open console to see output): https://codepen.io/joelrbrandt/pen/wvZRXvY

@joelrbrandt
Contributor Author

joelrbrandt commented Apr 23, 2024

Finally, if fs.openAsBlob() is going to have a 4GB limit (and Blob.prototype.slice() is going to have a 2GB limit), it would be much better if they threw RangeError exceptions rather than crashing node on out-of-range values.

Buffer.alloc() does this:

> b = Buffer.alloc(2**33)
Uncaught:
RangeError [ERR_OUT_OF_RANGE]: The value of "size" is out of range. It must be >= 0 && <= 4294967296. Received 8_589_934_592
    at Function.alloc (node:buffer:389:3) {
  code: 'ERR_OUT_OF_RANGE'
}
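
Until something like that exists, a caller could guard against the crash in user code. The following is a rough sketch only: `checkedSlice` and `ASSUMED_MAX_OFFSET` are hypothetical names, and the limit of 2**31 - 1 is an assumption based on the behavior reported in this issue, not a documented Node.js constant.

```javascript
// Hypothetical wrapper, not part of Node.js: rejects offsets outside an
// assumed safe limit instead of passing them to Blob.prototype.slice().
const ASSUMED_MAX_OFFSET = 2 ** 31 - 1; // assumption taken from this report

function checkedSlice(blob, start = 0, end = blob.size, contentType = '') {
  for (const [name, value] of [['start', start], ['end', end]]) {
    if (!Number.isInteger(value) || Math.abs(value) > ASSUMED_MAX_OFFSET) {
      throw new RangeError(
        `The value of "${name}" is out of range. Received ${value}`);
    }
  }
  return blob.slice(start, end, contentType);
}

// Usage with a small in-memory Blob.
const blob = new Blob(['hello world']);
console.log(checkedSlice(blob, 0, 5).size); // 5
try {
  checkedSlice(blob, 2 ** 31, 2 ** 31 + 1);
} catch (err) {
  console.log(err.name); // RangeError
}
```

Note that the wrapper is deliberately conservative: with the default `end`, it also throws for any blob whose reported size exceeds the assumed limit.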

@kylo5aby
Contributor

kylo5aby commented Apr 23, 2024

Hi @joelrbrandt, Thank you for the information.

  1. For fs.openAsBlob(), if it reads a file larger than 4GB, the generated blob has the wrong size, as you mentioned:

Observe the incorrectly reported size of 1GB (instead of 5GB)

There may be a bug in fs.openAsBlob() that sets the size incorrectly. Do you think it works as expected, @jasnell? AFAIK, new Blob() can correctly set the size.

  2. The uint32 range applies to Blob.slice rather than to Blob itself. It allows negative indices, as you mentioned; these are converted to positive values under the hood and then checked against the uint32 range. I have a PR, #52588, that tries to handle a large start or end (larger than 2^31).

> b.slice(2**31-1, 2**31)
Blob { size: 0, type: '' }
> b.slice(2**31, 2**31+1)

  #  node[15]: static void node::Blob::ToSlice(const v8::FunctionCallbackInfo<v8::Value>&) at ../src/node_blob.cc:247
  #  Assertion failed: args[0]->IsUint32()

@joelrbrandt
Contributor Author

@kylo5aby

AFAIK, new Blob() can correctly set the size

I'm not sure if this is accurate, either. Trying to construct a new Blob from a set of ArrayBuffers that total a size greater than 4GB also fails:

$ node
Welcome to Node.js v18.13.0.
Type ".help" for more information.
> const arrs = []
undefined
> for (let i = 0; i < 5 ; i++) {
...     arrs.push(new Uint8Array(2**30))
... }
5
> arrs.reduce((acc, cur) => acc + cur.byteLength, 0)
5368709120
> const b = new Blob(arrs)
Uncaught:
RangeError [ERR_BUFFER_TOO_LARGE]: Cannot create a Buffer larger than 4294967295 bytes
    at __node_internal_captureLargerStackTrace (node:internal/errors:491:5)
    at new NodeError (node:internal/errors:400:5)
    at new Blob (node:internal/blob:165:13) {
  code: 'ERR_BUFFER_TOO_LARGE'
}

However, this code also works fine in a browser, resulting in a 5GB Blob.

@kylo5aby
Contributor

kylo5aby commented Apr 24, 2024

You are right. In versions up to v21, on 64-bit systems, the maximum size of a Buffer was 4GB. However, in the build I last used, compiled from the main branch, this limit has been raised, which is why I said new Blob can create a blob larger than 4GB.
Conversely, fs.openAsBlob can successfully create blobs larger than 4GB, but the reported size may be wrong.

nodejs-github-bot pushed a commit that referenced this issue May 6, 2024
PR-URL: #52588
Refs: #52585
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>