package ext2fs crashes Node 14.4.0 #76

Closed
srguiwiz opened this issue Jun 20, 2020 · 11 comments

@srguiwiz

srguiwiz commented Jun 20, 2020

This crash appears to be related to Node.js pull request 33321, which apparently brings partial relief but is not a full fix. I am reporting it here too, because I cannot yet tell who should fix it, or how.

For comparison, the example given here runs fine with Node 12.18.1 (an LTS release) and crashes with Node 14.4.0 (the latest release).

Reproduce with a small project with the following package.json

{
  "private": true,
  "name": "ext2fs-example",
  "version": "1.0.0",
  "description": "Demo using ext2fs.",
  "scripts": {
    "start": "node index.js"
  },
  "license": "SEE LICENSE IN LICENSE",
  "dependencies": {
    "ext2fs": "^1.0.31",
    "file-disk": "^6.0.1",
    "node-fetch": "^2.6.0"
  }
}

and with the following index.js

//
const util = require("util");
const fs = require("fs");

const ext2fs = require("ext2fs");
const fileDisk = require("file-disk");

const ext2fsMount = util.promisify(ext2fs.mount).bind(ext2fs);
const ext2fsUmount = util.promisify(ext2fs.umount).bind(ext2fs);

const diskImageFileOrigin = "https://github.com/balena-io-modules/node-ext2fs/raw/master/test/fixtures/ext2.img";
const diskImageFileDownload = "ext2.img";
const diskImageFile = "example.img";

// onetime setup
async function getSampleDiskImage () {
  const fetch = require("node-fetch");
  const body = await fetch(diskImageFileOrigin);
  const arrayBuffer = await body.arrayBuffer();
  await fs.promises.writeFile(diskImageFileDownload, Buffer.from(arrayBuffer));
}
// setup
async function ensureDiskImage() {
  try {
    await fs.promises.access(diskImageFileDownload);
  } catch (e) {
    console.log(`trying first once to download ${diskImageFileDownload} from ${diskImageFileOrigin}`);
    await getSampleDiskImage();
    console.log("done downloading");
  }
  console.log(`making ${diskImageFile} as a fresh copy of ${diskImageFileDownload}`);
  await fs.promises.copyFile(diskImageFileDownload, diskImageFile);
}

// the actual action
async function doSomethingWithDiskImage() {
  let diskImageFileHandle;
  let diskImageFileDisk;
  let diskImageFilesystem;
  try {
    diskImageFileHandle = await fs.promises.open(diskImageFile, "r+");
    diskImageFileDisk = new fileDisk.FileDisk(diskImageFileHandle.fd, false); // fd, readOnly
    diskImageFilesystem = await ext2fsMount(diskImageFileDisk, {}); // disk, options
    // promisify as needed
    diskImageFilesystem.mkdirAsync = util.promisify(diskImageFilesystem.mkdir);
    diskImageFilesystem.readdirAsync = util.promisify(diskImageFilesystem.readdir);
    //
    console.log("going to do something with disk image");
    await diskImageFilesystem.mkdirAsync("/somedir", { recursive: true }); // path, options e.g. {recursive:true}
    const filesInDir = await diskImageFilesystem.readdirAsync("/");
    console.log("files in dir: ", filesInDir);
    //
    console.log("done something with disk image, apparently no crash");
  } catch (e) {
    console.error("Error: ", e);
  } finally {
    if (diskImageFileDisk) await diskImageFileDisk.flush();
    if (diskImageFilesystem) await ext2fsUmount(diskImageFilesystem);
    if (diskImageFileDisk) await diskImageFileDisk.flush();
    if (diskImageFileHandle) await diskImageFileHandle.close();
  }
}

// invoke
(async () => {
  await ensureDiskImage();

  await doSomethingWithDiskImage();

  console.log("the end");
}
)();

If it runs fine (Node 12.18.1), it shows the expected output, like

making example.img as a fresh copy of ext2.img
going to do something with disk image
files in dir:  [ '1', '2', '3', '4', '5', 'lost+found', 'somedir' ]
done something with disk image, apparently no crash
the end

If it fails (Node 14.4.0), it shows something like

making example.img as a fresh copy of ext2.img
going to do something with disk image
#
# Fatal error in , line 0
# Check failed: result.second.
#
#
#
#FailureMessage Object: 0x7ffd8dfb16f0
 1: 0xa975a1  [node]
 2: 0x1972504 V8_Fatal(char const*, ...) [node]
 3: 0xe5c609 v8::internal::GlobalBackingStoreRegistry::Register(std::shared_ptr<v8::internal::BackingStore>) [node]
 4: 0xbc98d8 v8::ArrayBuffer::GetBackingStore() [node]
 5: 0xa037ef node::Buffer::Data(v8::Local<v8::Object>) [node]
 6: 0xa3750f  [node]
 7: 0xc06c4b  [node]
 8: 0xc081f6  [node]
 9: 0xc08876 v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) [node]
10: 0x13a9fb9  [node]
Illegal instruction (core dumped)

This trace is from running on CentOS 7.8. For correctness, the required npm rebuild was done after each change of Node version (versions switched with sudo n).

This problem in Node.js apparently occurs only with concurrency and addon code, and hence only in some use cases of Node, which is presumably why the first, partial fix there was accepted. As shown here, there apparently still is a problem.

@srguiwiz
Author

@zvin Today we have learned that this has to be addressed in the addon, i.e. here in the ext2fs package. It cannot be corrected in Node.js:

The problem can’t really be corrected in Node.js (without non-trivial overhead), so this is something to be addressed in the addons themselves, unfortunately.

Originally posted by @addaleax in nodejs/node#33321 (comment)

For better understanding, there is more information in Node.js issue 32463, which led to the above-mentioned Node.js pull request 33321.

One important insight seems to be:

Except for the race condition, so, with node 14.x (also the 14.3 that implements the fix), it's no (more?) correct to have multiple create_node_buffer pointing at the same c buffer?

I am not sure I can fix this quickly, because this looks like a big package to someone new to it. Any hints?
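
To make sure I understand the constraint correctly, here is a tiny standalone NAN method, purely as an illustration (it is not taken from this package), which I believe trips the same check on Node 14 as in the trace above:

#include <nan.h>

// The memory is owned elsewhere, so neither Buffer must free it.
static void noop_free(char *, void *) {}

// Wrapping the same pointer in two Buffers registers the same backing memory
// twice with V8; looking up both backing stores on Node 14 should hit the
// "Check failed: result.second" abort seen in the stack trace.
NAN_METHOD(WrapTwice) {
  static char data[1024];
  v8::Local<v8::Object> a =
      Nan::NewBuffer(data, sizeof(data), noop_free, NULL).ToLocalChecked();
  v8::Local<v8::Object> b =
      Nan::NewBuffer(data, sizeof(data), noop_free, NULL).ToLocalChecked();
  node::Buffer::Data(a);  // on Node 14 this goes through GetBackingStore()
  node::Buffer::Data(b);
  info.GetReturnValue().Set(Nan::True());
}

NAN_MODULE_INIT(Init) { Nan::Export(target, "wrapTwice", WrapTwice); }
NODE_MODULE(wrap_twice_example, Init)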

@srguiwiz
Author

@zvin I am in node_ext2fs.cc, looking at the two invocations of node::Buffer::Data, one in ReadWorker and one in WriteWorker, as that method is mentioned in the trace and seems to fit the problem described on the Node.js side.

Alas, this is not code I am currently fluent in.

@srguiwiz
Author

Now looking in js_io.cc at the NewBuffer call and the comment above it.

I have added a printf in js_io.cc:

    buffer = NewBuffer((char*) s->data, size, noop, NULL).ToLocalChecked();
    printf("s->data is %x, size is %d, buffer is %x\n", s->data, size, buffer);

I am seeing the same address for s->data twice before it crashes.

Apparently that is exactly what is no longer allowed by Node, as linked above.

For example:

making example.img as a fresh copy of ext2.img
going to mount disk image
s->data is ac000c00, size is 1024, buffer is 49530c0
s->data is ac001490, size is 1024, buffer is 49530c0
s->data is ac001db0, size is 1024, buffer is 49530c0
s->data is ac0022e0, size is 1024, buffer is 49530c0
going to do something with disk image
s->data is a40009b0, size is 1024, buffer is 49530c0
s->data is a4001d40, size is 1024, buffer is 49530c0
s->data is ac0022e0, size is 1024, buffer is 49530c0
#
# Fatal error in , line 0
# Check failed: result.second.
#
#
#
#FailureMessage Object: 0x7ffdcaa61910
 1: 0xa975a1  [node]
 2: 0x1972504 V8_Fatal(char const*, ...) [node]
 3: 0xe5c609 v8::internal::GlobalBackingStoreRegistry::Register(std::shared_ptr<v8::internal::BackingStore>) [node]
 4: 0xbc98d8 v8::ArrayBuffer::GetBackingStore() [node]
 5: 0xa037ef node::Buffer::Data(v8::Local<v8::Object>) [node]
 6: 0xa3750f  [node]
 7: 0xc06c4b  [node]
 8: 0xc081f6  [node]
 9: 0xc08876 v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) [node]
10: 0x13a9fb9  [node]
Illegal instruction (core dumped)

Here is the slightly changed index.js I'm now using:

//

const util = require("util");
const fs = require("fs");

const ext2fs = require("ext2fs");
const fileDisk = require("file-disk");

const ext2fsMount = util.promisify(ext2fs.mount).bind(ext2fs);
const ext2fsUmount = util.promisify(ext2fs.umount).bind(ext2fs);

const diskImageFileOrigin = "https://github.com/balena-io-modules/node-ext2fs/raw/master/test/fixtures/ext2.img";
const diskImageFileDownload = "ext2.img";
const diskImageFile = "example.img";

// onetime setup
async function getSampleDiskImage () {
  const fetch = require("node-fetch");
  const body = await fetch(diskImageFileOrigin);
  const arrayBuffer = await body.arrayBuffer();
  await fs.promises.writeFile(diskImageFileDownload, Buffer.from(arrayBuffer));
}
// setup
async function ensureDiskImage() {
  try {
    await fs.promises.access(diskImageFileDownload);
  } catch (e) {
    console.log(`trying first once to download ${diskImageFileDownload} from ${diskImageFileOrigin}`);
    await getSampleDiskImage();
    console.log("done downloading");
  }
  console.log(`making ${diskImageFile} as a fresh copy of ${diskImageFileDownload}`);
  await fs.promises.copyFile(diskImageFileDownload, diskImageFile);
}

// the actual action
async function doSomethingWithDiskImage() {
  let diskImageFileHandle;
  let diskImageFileDisk;
  let diskImageFilesystem;
  try {
    diskImageFileHandle = await fs.promises.open(diskImageFile, "r+");
    diskImageFileDisk = new fileDisk.FileDisk(diskImageFileHandle.fd, false); // fd, readOnly
    console.log("going to mount disk image");
    diskImageFilesystem = await ext2fsMount(diskImageFileDisk, {}); // disk, options
    // promisify as needed
    diskImageFilesystem.mkdirAsync = util.promisify(diskImageFilesystem.mkdir);
    diskImageFilesystem.readdirAsync = util.promisify(diskImageFilesystem.readdir);
    //
    console.log("going to do something with disk image");
    // await diskImageFilesystem.mkdirAsync("/somedir", { recursive: true }); // path, options e.g. {recursive:true}
    let filesInDir = await diskImageFilesystem.readdirAsync("/");
    // filesInDir = await diskImageFilesystem.readdirAsync("/");
    filesInDir = await diskImageFilesystem.readdirAsync("/1");
    // filesInDir = await diskImageFilesystem.readdirAsync("/2");
    // filesInDir = await diskImageFilesystem.readdirAsync("/3");
    // filesInDir = await diskImageFilesystem.readdirAsync("/");
    console.log("files in dir: ", filesInDir);
    //
    console.log("done something with disk image, apparently no crash");
  } catch (e) {
    console.error("Error: ", e);
  } finally {
    if (diskImageFileDisk) await diskImageFileDisk.flush();
    if (diskImageFilesystem) await ext2fsUmount(diskImageFilesystem);
    if (diskImageFileDisk) await diskImageFileDisk.flush();
    if (diskImageFileHandle) await diskImageFileHandle.close();
  }
}

// invoke
(async () => {
  await ensureDiskImage();

  await doSomethingWithDiskImage();

  console.log("the end");
}
)();

@srguiwiz
Author

It seems that making a copy before the call is considered safe, or even recommended, by the Node developers. After a read, one would have to copy the data back.

In js_io.cc, in js_request, I am trying CopyBuffer instead of NewBuffer:

    //buffer = NewBuffer((char*) s->data, size, noop, NULL).ToLocalChecked();
    buffer = CopyBuffer((char*) s->data, size).ToLocalChecked();

and then, in js_request_done, if it was a read, trying to copy back from the buffer into s->data:

  if (ret->IsNull()) {
    s->ret = 0;
    if (s->type == 2) {
      ???
      memcpy(s->data, buffer_data, buffer_length);
    }

I have tried a few things where the ??? is, including node::Buffer::Data, GetBackingStore, and various data/size/length methods, e.g. a backing store's Data() and ByteLength().

This also included adding fields for the buffer and its size to the struct request_state_t.

While I think making a copy of the data is the right idea, I am failing to get it to run successfully.

With one version I get something like char *node::Buffer::Data(Local<v8::Value>): Assertion val->IsArrayBufferView() failed, which is a symptom of me not knowing enough about coding a Node addon in C++.

Can someone more fluent with this code try this kind of copy before and after the call?
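
To make it more concrete, here is roughly the shape I have in mind, as a sketch only; the helper names, and the assumption that the Buffer handed to JavaScript is still reachable when the request completes, are mine, not the actual js_io.cc code.

#include <nan.h>
#include <cstring>

// Hand JavaScript a copy of the C buffer instead of wrapping the pointer itself.
v8::Local<v8::Object> make_js_copy(const char *data, size_t size) {
  Nan::EscapableHandleScope scope;
  return scope.Escape(Nan::CopyBuffer(data, size).ToLocalChecked());
}

// After a read request (type 2 in the existing code), copy the data JavaScript
// wrote into the Buffer back into the C buffer that e2fsprogs expects.
void copy_back_if_read(int type, v8::Local<v8::Object> buffer,
                       void *dest, size_t dest_size) {
  if (type != 2) return;             // only reads need data copied back
  size_t n = node::Buffer::Length(buffer);
  if (n > dest_size) n = dest_size;  // be defensive about sizes
  memcpy(dest, node::Buffer::Data(buffer), n);
}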

@zvin
Contributor

zvin commented Jun 23, 2020

Thanks for the investigation @srguiwiz, I'll have a look as soon as possible.
Copying buffers seems like a waste of CPU time. Maybe we can track the data pointers and reuse Buffers if they already exist?
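
Roughly what I have in mind, only as a sketch; the obvious caveat is that entries would have to be invalidated when the C side frees or reuses the memory, otherwise a recycled malloc address would return a stale Buffer:

#include <nan.h>
#include <map>

static void noop_free(char *, void *) {}

// Remember which Buffer already wraps a given data pointer, so the same
// pointer is never handed to Nan::NewBuffer twice.
static std::map<char *, Nan::Persistent<v8::Object> *> wrapped;

v8::Local<v8::Object> buffer_for(char *data, size_t size) {
  Nan::EscapableHandleScope scope;
  std::map<char *, Nan::Persistent<v8::Object> *>::iterator it = wrapped.find(data);
  if (it != wrapped.end()) {
    return scope.Escape(Nan::New(*it->second));          // reuse the existing Buffer
  }
  v8::Local<v8::Object> buf =
      Nan::NewBuffer(data, size, noop_free, NULL).ToLocalChecked();
  wrapped[data] = new Nan::Persistent<v8::Object>(buf);  // remember it for next time
  return scope.Escape(buf);
}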

@srguiwiz
Author

An update in two sections: first the current status here, then a general analysis in the next reply.

If in addition to buffer we add two more local variables

  v8::Local<v8::Value> buffer;
  std::shared_ptr<v8::BackingStore> backing_store;
  uint8_t *buffer_data;

and then, in js_request, right after NewBuffer, try to find where the data is

    buffer = NewBuffer((uint8_t*) s->data, size, noop, NULL).ToLocalChecked();
    backing_store = v8::Local<v8::ArrayBuffer>::Cast(buffer)->GetBackingStore();
    buffer_data = backing_store->Data();

that works fine.

However, if we keep the buffer around in the struct request_state_t, then by the time we try the same GetBackingStore() in js_request_done we get a

Segmentation fault (core dumped)

as if the Buffer were no longer valid; a plausible explanation is that we are coming back after an async run_on_default_loop.
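
If that is the explanation, then presumably the handle has to be kept as a Nan::Persistent rather than as a raw v8::Local in request_state_t, since a Local only lives as long as the HandleScope it was created in. A minimal sketch of what I mean; the field name is mine, not the actual struct:

#include <nan.h>

struct request_state_t {
  // v8::Local<v8::Value> buffer;      // would dangle after the async hop
  Nan::Persistent<v8::Object> buffer;  // stays valid across the event loop
};

// In js_request: take a persistent reference to the Buffer handed to JavaScript.
void remember_buffer(request_state_t *s, v8::Local<v8::Object> buf) {
  s->buffer.Reset(buf);
}

// In js_request_done: turn it back into a Local, then release it.
v8::Local<v8::Object> recall_buffer(request_state_t *s) {
  Nan::EscapableHandleScope scope;
  v8::Local<v8::Object> buf = Nan::New(s->buffer);
  s->buffer.Reset();
  return scope.Escape(buf);
}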

As mentioned earlier, this must be done with awareness of the type and of when the buffer is null, and only for a read (type 2) do we need the data back from the buffer. Even when handling these conditions, the problem occurs.

The plan was to switch from NewBuffer to CopyBuffer, but what I describe here is a reduced version of the code where development is stuck on this problem.

With that annoying change in Node 14 it doesn't run at all. For now, anything that prevents it from crashing is good, even with an extra copy.

While using the adjective "annoying", I also respect that the Node developer folk have to "deal with" what is given to them by the V8 developer folk. It wasn't within one organization that this could have been "discussed better before releasing".

Important to consider when testing this problem: minimal activity doesn't crash, because minimal activity happens not to reuse any address from malloc. For example, with the test code I have provided, if it is limited (by commenting out) to reading only one directory and nothing else, it doesn't crash; reading a second directory happens to trigger the crash. How soon someone gets hit by the problem may depend on the platform and on how memory management is implemented, but under CentOS 7.8, for example, it is as I describe here: it crashes after barely more than one operation on the ext2 file system.

@srguiwiz
Author

A discussion of node-ext2fs in light of Node PR 33321 and Node issue 32463.

In most other circumstances one could angrily complain about Node 14 as described, asking "why do you guys have such a ridiculous constraint?" In this special case, if I understand it correctly, they get a free pass because Node is built on V8, and the two are made by different organizations. Those who develop V8 have a different focus than those who develop Node, which incorporates V8.

The constraint is: one must not accidentally invoke Nan::NewBuffer twice in a row with the same address as the pointer to the data, or else the whole app crashes. Details are described in the Node PR and issue. It can happen accidentally if malloc hands out the same address again.

This has specific implications for coding the ext2fs package for Node.

Part of the challenge here is the layers of calls back and forth between Node and C++.

The ext2fs package lives in, and is invoked from, Node. We can agree that is the point, for ease of use. Host file system access is provided by Node, so a disk image can be opened.

The ext2fs package makes C++ or C calls into the native ext file system library e2fsprogs. The point is reuse of proven code rather than reinventing the wheel. I think it links against e2fsprogs, and does not merely call e2fsprogs command line programs, is that correct?

This matters because when e2fsprogs wants data from a disk image, the ext2fs package needs to call back into the JavaScript side of the ext2fs package to get that data.

It also matters because the ext2fs package isn't really in control of the size of reads and writes, or the number of requests, that the native code of e2fsprogs might want from the disk image.

Therefore, as I see it, the ext2fs package cannot preallocate from its JavaScript side the ArrayBuffers that would be needed.

That is why I am looking at ways for the ext2fs package to create ArrayBuffers in a way that won't crash this new version of Node.

For writes, Nan::CopyBuffer may be good enough. But after reads we need to get the data back.

Having listed this, I don't want to limit anyone's thoughts.

@srguiwiz
Author

I now see in disk.js, at E2FS_BLK_READ, that the callback apparently is given bytesRead and buffer. I would explore whether that buffer can be passed on, via async.cc, so that it arrives in js_request_done, and then try whether a GetBackingStore() from it succeeds.

As mentioned, I had no luck passing the buffer via request_state_t, hence I am considering this way instead. That is, if I can figure out how.

I have not yet done the steps described in this reply. Stepping away from the computer for a while.

@srguiwiz
Author

Apparently it is possible to pass bytesRead and buffer back to C++ by changing the callback in function request in lib/disk.js to pass the args along, like so:

  request(type, offset, length, buffer, fn, s, callback2) {
    const callback = (err, ...args) => {
      callback2(err, fn, s, ...args);
    };

From there I tried to look at them in src/async.cc in NAN_METHOD(callback_wrapper): apparently I got the buffer from info[4], and apparently I successfully called GetBackingStore(), but then Data() returned 0, a nullptr.

    v8::Local<v8::Value> buffer = info[4];
    std::shared_ptr<v8::BackingStore> backing_store = v8::Local<v8::ArrayBuffer>::Cast(buffer)->GetBackingStore();
    uint8_t *buffer_data = backing_store->Data();

Again, I am just rigging things here, without being properly set up for C++ development and without having refreshed my C++ in years. There could be some obvious flaw in this.

I am thinking that maybe something happens that is similar to Detach having been called.
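
One possible flaw I can see in my own snippet above: a Node.js Buffer is a Uint8Array view over an ArrayBuffer, not an ArrayBuffer itself, so casting info[4] straight to v8::ArrayBuffer may be what yields the bad Data(). Going through the typed-array view, or simply through node::Buffer::Data, might behave better. A sketch of what I mean (the helper name is mine):

#include <nan.h>
#include <cstdint>
#include <memory>

// Resolve the data pointer of a Buffer passed from JavaScript (e.g. info[4])
// by going through its Uint8Array view and the view's byte offset.
uint8_t *buffer_pointer(v8::Local<v8::Value> value, size_t *length_out) {
  if (!node::Buffer::HasInstance(value)) return NULL;
  v8::Local<v8::Uint8Array> view = value.As<v8::Uint8Array>();
  std::shared_ptr<v8::BackingStore> store = view->Buffer()->GetBackingStore();
  *length_out = view->ByteLength();
  return static_cast<uint8_t *>(store->Data()) + view->ByteOffset();
  // Equivalent shortcut: (uint8_t *) node::Buffer::Data(value)
  // together with node::Buffer::Length(value).
}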

@srguiwiz
Author

Also, I happen to have noticed that all reads seem to be of size 1024 in my test with ext2. If so, maybe it would alternatively be possible to allocate a buffer of that size from JavaScript, in disk.js? Viability would depend on concurrency versus execution in strict order, and on the question of scope. This just corrects my earlier statement that reads could be any size; it doesn't seem like the preferred choice to me now, but I did notice it.

@zvin
Contributor

zvin commented Nov 5, 2020

This should no longer be an issue since v3.0.0, as we build it as a WASM module.

@zvin zvin closed this as completed Nov 5, 2020