Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multiread #13

Closed
dominictarr opened this issue Feb 24, 2013 · 11 comments
Closed

multiread #13

dominictarr opened this issue Feb 24, 2013 · 11 comments
Labels
discussion Discussion enhancement New feature or request help wanted Extra attention is needed

Comments

@dominictarr
Copy link
Contributor

An operation that is like a batch, but for gets, but without the keys being adjacent in order.

I mention this because at campjs @rvagg and I discussed where the boundry between userland and core should be, regarding issues like: #9 (comment)

We couldn't think of many features that you might want to do in the C binding, but we decided that we should evaluate them on a case-by-case basis for a bit, until it becomes clearer.

So this issue is mainly intended for discussion, although it may be useful.

hmm, so for the purposes of discussion, the api is

db.getAll(arrayOfKeys, cb)

Would an operation like this also benefit from crossing the c-js boundry only once,
passing an array each way?

@rvagg
Copy link
Member

rvagg commented Feb 24, 2013

Should benefit very much, I think @juliangruber had something similar in Leveled? We can do multi-threaded get()s at the moment so it's not all bad, but we have a limit to the number of threads.

It's interesting to think about this in relation to a range delete cause they could share a similar API.

db.getAll(array, cb)
db.delAll(array, cb)
db.delRange(options, cb) // where options is the same as for db.readStream()
db.getRange(options, cb) // where options is the same as for db.readStream()

Of course the getRange() on is a bit trickier cause of potential size problems, perhaps we should put an upper limit on it or at least enforce the use of a 'limit' property so at least they know what they are doing (they could put limit:99999 if they wanted but then at least it'd be explicit).

@juliangruber
Copy link
Member

I had db#range(from, to, cb) and db#find(glob, cb), no multiget though. For those methods the performance gain was huge because iterators require so many js-c boundry crossings.

With a multiget the gain would increase with the numbers of keys given. I'm not sure yet about when this would pay off. Someone would have to do an implementation so we can see.

Oh, and I'm for overloading the db#get function for this.

@dominictarr
Copy link
Contributor Author

getRange could just alias to iterator, with the limit set very large.
I'm opposed to adding safety features to leveldown, but am happy to have them in levelup.

see section about the "The Hole Hawg" in this article http://artlung.com/smorgasborg/C_R_Y_P_T_O_N_O_M_I_C_O_N.shtml

@rvagg
Copy link
Member

rvagg commented May 12, 2014

ref #9

Marking as "help wanted" if someone wants to come up with an interesting solution for this that we may want to consider merging in. Again, unlikely to be something exposed at the LevelUP layer but might be nice to have down here.

@vweevers vweevers added the enhancement New feature or request label Dec 31, 2018
@matonga
Copy link

matonga commented Jan 2, 2019

Hi there,

I've implemented this functionality, it looks like this:

db = leveldown (...), db.open, etc...

db.map (keys, [options, ], function (error, values) {
    // whatever you want to do here
});

options are the same ones used by .get (... options, ...).

keys is an array of strings / buffers.
values is an array of strings / buffers / undefineds, returned in same order of supplied keys. Keys not found are left undefined in resulting values array, keys found are returned as buffers or strings (depending on supplied options or defaults).

In my use case I'm actually checking for pre-existing non-consecutive keys before insertion. I did some benchmarks with 1 million keys and found some interesting results:

  • If the keys are all pre-existing, .map returns values three times faster than using .get
  • If the keys don't exist, .map returns values about ten times faster. This is usually the case for me so I'm going to fork leveldown, but anyway... I have a question to see if I can submit a pull request:

I followed other methods implementation, so had to edit both abstract-leveldown and leveldown. Added these methods:

abstract-leveldown -> AbstractLevelDOWN.prototype.map (parameter validation, options defaults, call to this._map)
leveldown -> LevelDOWN.prototype._map (call to binding.map_do)
binding.cc -> MapWorker, NAPI_METHOD(map_do)

My question is if I should pull-request the two separate github projects (abstract-leveldown & leveldown) or should I modify the implementation to keep all changes inside levedown?

abstract-leveldown could polyfill ._map implementation when not found, by issuing ._get by hand for every supplied key instead (however this would be is a lot slower) (benchmarked it too).

Don't hesitate to ask me any question, english is not my native language and I may have screwed the explanation a bit 😄

@vweevers
Copy link
Member

vweevers commented Jan 2, 2019

@matonga nice work!

With iterator.seek(), we've made it possible for "multiread" aka "batch get" to be implemented in userland too. Or indeed as an abstract-leveldown fallback, though at the moment leveldown is the only implementation that supports seek(). See Level/levelup#491 (comment):

I think, rather than a batch get, I prefer to officially expose iterators with a seek() method. This not only enables fetching multiple non-contiguous keys but also getting multiple ranges of keys. You seek, read some, seek again. Also you get backpressure and can do things like skip scans (for compound aka composite indexes).

Would be interesting to compare that against your native solution. I'm guessing yours will be faster because in theory it only has to cross the C++/JS boundary once. Is your code on GitHub?

@vweevers
Copy link
Member

vweevers commented Jan 2, 2019

My question is if I should pull-request the two separate github projects (abstract-leveldown & leveldown) or should I modify the implementation to keep all changes inside levedown?

It's fine for a new feature to live in leveldown for a while, for people to play with. The same happened with iterator.seek(). The barrier for new abstract-leveldown features is fairly high: there must be a good use case (that can't be solved with existing features), a solid API and at least a few other implementations that can support it (in theory).

matonga pushed a commit to matonga/leveldown that referenced this issue Jan 2, 2019
@matonga
Copy link

matonga commented Jan 2, 2019

I'm guessing yours will be faster because in theory it only has to cross the C++/JS boundary once. Is your code on GitHub?

@vweevers that's right, I find it works faster, maybe because it crosses the C++/JS boundary once or maybe because there is less overhead (one Javascript call to map() vs. many calls to get() / seek()), or maybe it's a combination of both. I've just uploaded the code to GitHub:

matonga@b86b0a1

Any suggestions welcome. Guess I should write some test/map-test.js for a proper pull request?

The barrier for new abstract-leveldown features is fairly high: there must be a good use case (that can't be solved with existing features), a solid API and at least a few other implementations that can support it (in theory).

Just in case, I've moved all code to leveldown.

@vweevers
Copy link
Member

vweevers commented Jan 2, 2019

matonga@b86b0a1

Any suggestions welcome. Guess I should write some test/map-test.js for a proper pull request?

Looks like a viable solution. We can discuss details in the PR :) It's up to you whether you want to write a test before opening a PR. It may help us to understand the intended behavior. It would be great if you can share your benchmarks as well (code and results). Thanks for taking this on!

matonga pushed a commit to matonga/leveldown-hyper that referenced this issue Jun 19, 2019
@vweevers
Copy link
Member

Level/community#101

@matonga
Copy link

matonga commented Oct 1, 2021

@vweevers thank you! good to see a multiread implementation being merged into leveldown and related repos at last.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion Discussion enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

5 participants