This repository has been archived by the owner on Sep 9, 2021. It is now read-only.

query interface #9

Open
pgte opened this issue Jul 5, 2017 · 13 comments
Labels
status/ready Ready to be worked

Comments

@pgte
Contributor

pgte commented Jul 5, 2017

While trying to adapt a datastore to a Leveldown interface, I came across an impedance mismatch. Mind you, I'm new to the datastore ecosystem, so I may be very wrong.

The first part of it is that a query returns a pull-stream. While I love pull-streams, transforming them into a Leveldown iterator interface is not trivial as far as I know. You may argue that the pull-stream interface is superior, but my guess is that very few developers are familiar with it. Also, there are more standard alternatives, ranging from Node streams to ES6 iterators.
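To make the adaptation concrete, here is a minimal sketch of one way to expose a pull-stream source through a Leveldown-style iterator. It assumes the source emits `{ key, value }` entries and that the iterator only needs `next(callback)` and `end(callback)`. The names `valuesSource` and `sourceToIterator` are hypothetical, and the source is hand-rolled so the sketch runs without the pull-stream package.

```javascript
// A pull-stream source is a function read(abort, cb); cb(true) signals the end.
// This hand-rolled source emits the items of an array, one per read.
function valuesSource (items) {
  let i = 0
  return function read (abort, cb) {
    if (abort) return cb(abort)
    if (i >= items.length) return cb(true)
    cb(null, items[i++])
  }
}

// Hypothetical adapter: wraps a source of { key, value } entries in an object
// with Leveldown-style next(callback) and end(callback) methods.
function sourceToIterator (read) {
  let closed = false
  return {
    next (callback) {
      if (closed) return callback(new Error('cannot call next() after end()'))
      read(null, (end, entry) => {
        if (end === true) return callback() // exhausted: no key, no value
        if (end) return callback(end) // the stream reported an error
        callback(null, entry.key, entry.value)
      })
    },
    end (callback) {
      if (closed) return callback(new Error('end() already called'))
      closed = true
      read(true, () => callback()) // abort the underlying source
    }
  }
}

// Usage: drain the iterator into an array.
const collected = []
const it = sourceToIterator(valuesSource([
  { key: '/a', value: 1 },
  { key: '/b', value: 2 }
]))
function step () {
  it.next((err, key, value) => {
    if (err) throw err
    if (key === undefined) return it.end(() => {})
    collected.push([key, value])
    step()
  })
}
step()
console.log(collected)
```

The sketch ignores snapshots, seeking and back-pressure details, which is where most of the real difficulty would lie.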

The second part (and to me, the one representing more impedance) is the query options. With the exception of prefix, the query options require providing a function, which is not easily (or not at all) translatable to a database query. This, I guess, forces implementations to do a full scan and filter the data in memory, which may be terrible performance-wise.

One option that I like would be to provide a declarative querying interface similar to the Leveldown one, which would allow us to translate queries into back-end options in 99% of cases.
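As a rough illustration of what such a declarative query could look like, the sketch below uses the Leveldown-style range options `gt`, `gte`, `lt`, `lte`, `reverse` and `limit`. The helpers `matchesRange` and `rangeQuery` are hypothetical, and the in-memory evaluation only stands in for a back end: because the options are plain data, a real store could hand them to its native range scan instead.

```javascript
// Returns true when the key falls inside the declared bounds.
function matchesRange (key, { gt, gte, lt, lte } = {}) {
  if (gt !== undefined && !(key > gt)) return false
  if (gte !== undefined && !(key >= gte)) return false
  if (lt !== undefined && !(key < lt)) return false
  if (lte !== undefined && !(key <= lte)) return false
  return true
}

// Evaluates a declarative range query over entries sorted by key.
// A limit of -1 means "no limit".
function rangeQuery (sortedEntries, options = {}) {
  const { reverse = false, limit = -1 } = options
  let hits = sortedEntries.filter(([key]) => matchesRange(key, options))
  if (reverse) hits = hits.reverse()
  return limit >= 0 ? hits.slice(0, limit) : hits
}

// Usage: [key, value] pairs, already sorted by key.
const entries = [
  ['/a/1', 'x'],
  ['/a/2', 'y'],
  ['/b/1', 'z'],
  ['/c/1', 'w']
]
console.log(rangeQuery(entries, { gte: '/a/', lt: '/b/' }))
console.log(rangeQuery(entries, { gt: '/a/1', reverse: true, limit: 2 }))
```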

@dignifiedquire
Member

@pgte I am confused. I already did the work of writing a generic level interface for datastore that does all this work: http://github.com/ipfs/js-datastore-level (it accepts any leveldown-compatible implementation).

@dignifiedquire
Member

The conversion from iterator to pull-stream is done here: https://github.com/ipfs/js-datastore-level/blob/master/src/index.js#L90
It's a bit tricky, but it works quite well as far as I understand.
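For readers unfamiliar with that direction of the conversion, the following is a simplified sketch of the general idea and not the code at the linked line. It assumes an iterator with Leveldown-style `next(callback)` and `end(callback)` methods. The names `arrayIterator`, `iteratorToSource` and `collect` are hypothetical.

```javascript
// Hypothetical in-memory stand-in for a Leveldown-style iterator.
function arrayIterator (entries) {
  let i = 0
  return {
    next (callback) {
      if (i >= entries.length) return callback() // exhausted
      const { key, value } = entries[i++]
      callback(null, key, value)
    },
    end (callback) { callback() }
  }
}

// Wraps the iterator as a pull-stream source: read(abort, cb).
function iteratorToSource (iterator) {
  let finished = false
  return function read (abort, cb) {
    if (finished) return cb(true)
    if (abort) {
      finished = true
      return iterator.end((err) => cb(err || abort))
    }
    iterator.next((err, key, value) => {
      if (err || key === undefined) {
        // Release the iterator on error or exhaustion, then signal the end.
        finished = true
        return iterator.end((endErr) => cb(err || endErr || true))
      }
      cb(null, { key, value })
    })
  }
}

// Minimal sink that drains a source into an array.
function collect (read, done) {
  const out = []
  read(null, function next (end, data) {
    if (end) return done(end === true ? null : end, out)
    out.push(data)
    read(null, next)
  })
}

let result
collect(
  iteratorToSource(arrayIterator([
    { key: '/a', value: 1 },
    { key: '/b', value: 2 }
  ])),
  (err, entries) => {
    if (err) throw err
    result = entries
  }
)
console.log(result)
```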

@dignifiedquire
Member

In terms of the options that we support, this is a 1:1 port of the interfaces that Go provides, so if we want to change anything there, we should consider those settings first.

@dignifiedquire
Member

> While trying to adapt a datastore into a Leveldown interface,

Oh, I am sorry, I misunderstood. You are trying to go the other way around; I haven't looked into that yet.

@dignifiedquire
Member

The reason I ended up not using the leveldown interface is twofold:

  1. It is missing some options that Go implements, which I wanted to support and which we use in the DHT, especially prefix.
  2. We already have one lazy iterative interface in the code base, pull-streams, and the datastores should fit into it as well as possible. Using pull-streams for this seemed the natural way to go, as otherwise, in modules like the DHT, I would have to adapt the iterator to a pull-stream anyway.

@dignifiedquire
Member

@pgte
Contributor Author

pgte commented Jul 5, 2017

@dignifiedquire that's a great example. Here you mostly have to create a full iterator that iterates over the entire DB snapshot, while filtering it in memory:
https://github.com/ipfs/js-datastore-level/blob/master/src/index.js#L96-L100
It's not efficient, wouldn't you say?

@dignifiedquire
Copy link
Member

It's not great, but leveldown doesn't expose in-database filtering in the way that I need, so I'm not seeing how this could be improved.

@dignifiedquire
Member

Namely, it does not allow any sort of key-based filtering directly, without pulling all entries out.

@pgte
Contributor Author

pgte commented Jul 5, 2017

@dignifiedquire yeah, it allows for key partitioning and range queries. I understand that's very limited, but it caters to most use cases I've seen for a kv-store; you just have to decide wisely about the key partitioning / subleveling and perhaps implement materialised views.
I thought the datastore interface was meant for those cases.
What use cases is interface-datastore trying to solve?

@dignifiedquire
Member

Abstract storage layers including, but not limited to, file systems, key-value stores and SQL databases, with a way to combine all of those into path-like namespaces. Similar to the goals described here

In addition, one important goal is to support all operations that ipfs needs to achieve feature parity with go-ipfs, and to be able to read and write repos the same way go-ipfs does.

@pgte
Contributor Author

pgte commented Jul 6, 2017

My opinion is that the query interface is perhaps too generic to enable any efficient implementation.
I propose that we enable some form of query options that allow range queries on keys.

Without this, for instance, I'm not able to translate a LevelDB query into a datastore query in a way that is efficient at runtime.
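To make the mismatch concrete, here is a sketch of what translating Leveldown-style range options into a function-based query might look like. It assumes a query object with a `filters` array of predicates that each receive a `{ key, value }` entry, which is an assumption about the query shape rather than a confirmed API, and it uses plain string keys for simplicity. The names `rangeToFunctionQuery` and `runQuery` are hypothetical. Because the range ends up inside an opaque predicate, the stand-in store has to visit every entry.

```javascript
// Translates { gte, lt } bounds into a predicate-based query (assumed shape).
function rangeToFunctionQuery ({ gte, lt }) {
  return {
    filters: [
      (entry) =>
        (gte === undefined || entry.key >= gte) &&
        (lt === undefined || entry.key < lt)
    ]
  }
}

// Stand-in store that can only run predicates, so it scans every entry
// and counts how many it had to look at.
function runQuery (entries, query) {
  let visited = 0
  const results = entries.filter((entry) => {
    visited++
    return query.filters.every((f) => f(entry))
  })
  return { results, visited }
}

const store = [
  { key: '/a/1', value: 'x' },
  { key: '/a/2', value: 'y' },
  { key: '/b/1', value: 'z' },
  { key: '/c/1', value: 'w' }
]
const outcome = runQuery(store, rangeToFunctionQuery({ gte: '/a/', lt: '/b/' }))
console.log(outcome.results.length, 'matches after visiting', outcome.visited, 'entries')
```

With range options expressed as data, a store backed by a sorted index could in principle stop after the last matching key instead of visiting all entries.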

@daviddias added the status/ready (Ready to be worked) label Aug 25, 2018
@Gozala
Contributor

Gozala commented May 21, 2020

> The second part (and to me, the one representing more impedance) is the query options. The query options, with the exception of prefix, imply providing a function, which is not easily (or not at all) translatable to a database query. This, I guess, forces implementations to do a full scan and filter data in memory, which may be terrible performance-wise.

This is also something I'm running into in an attempt to move js-ipfs into a shared worker (ipfs/js-ipfs#3022). The problem is that you cannot pass functions across threads, so you'd basically have to send all the data from the worker to the main thread and then filter it there. I think it would be better to represent the query as data and leave more complicated filtering as an exercise for the user. That way:

  • Queries could be optimized for the cases that @pgte mentioned and for multithreaded use cases.
  • This would work better with ipfs-http-client, so that the host can filter data without passing it on to the client.
  • It generally fits better with systems that cross language boundaries.
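A small sketch of the "query as data" idea follows. The query shape (`prefix`, `filters` with `field`/`op`/`value`, `limit`) and the helper `runDataQuery` are hypothetical and not an existing interface-datastore API, and a JSON round-trip stands in for crossing a worker or HTTP boundary.

```javascript
// Hypothetical declarative query: plain data, so it survives serialization.
const query = {
  prefix: '/blocks/',
  filters: [{ field: 'key', op: 'lt', value: '/blocks/m' }],
  limit: 10
}

// Simulate crossing a process or thread boundary with a JSON round-trip.
const received = JSON.parse(JSON.stringify(query))

// A function-based filter does not survive the same trip: it becomes null.
const lossy = JSON.parse(JSON.stringify({
  filters: [(entry) => entry.key < '/blocks/m']
}))

// The receiving side interprets the data with a fixed set of operators.
const ops = {
  lt: (a, b) => a < b,
  lte: (a, b) => a <= b,
  gt: (a, b) => a > b,
  gte: (a, b) => a >= b,
  eq: (a, b) => a === b
}

function runDataQuery (entries, q) {
  const hits = entries.filter((entry) =>
    entry.key.startsWith(q.prefix || '') &&
    (q.filters || []).every((f) => ops[f.op](entry[f.field], f.value)))
  return q.limit === undefined ? hits : hits.slice(0, q.limit)
}

const data = [
  { key: '/blocks/a', value: 1 },
  { key: '/blocks/z', value: 2 },
  { key: '/pins/a', value: 3 }
]
console.log(runDataQuery(data, received))
console.log(lossy.filters[0])
```

Keeping the operator set small and closed is what lets the host, or an implementation in another language, evaluate the query without executing caller-supplied code.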


4 participants