Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve support for external storage of sketches in RocksDB indexes #415

Open
ctb opened this issue Aug 12, 2024 · 1 comment
Open

Improve support for external storage of sketches in RocksDB indexes #415

ctb opened this issue Aug 12, 2024 · 1 comment

Comments

@ctb
Copy link
Collaborator

ctb commented Aug 12, 2024

In #381, we found a problem when using RocksDB inverted indexes with zip files.

Starting from #390 (comment),

I found the problem in the storage::InnerStorage struct, which was opening the zip file using a relative path:

InnerStorage::from_spec: opening zip://list.sig.zip

Conveniently 😭 this is not a new bug or problem, this is the same problem that I discussed ad nauseum over in sourmash-bio/sourmash#3008, where we found it to be a problem for manifests - triggered initially by sourmash-bio/sourmash#3053.

In brief, the problem is this: when we refer to an external storage, how do we interpret non-absolute paths?

As I wrote in #3008, "I am slowly coming around to the idea that loading things relative to the manifest path is correct."

So I think the Right Fix would be to rejigger the path to the zip file to be interpreted relative to the RocksDB location.

However, there is another fun component to this, which is that I'm not sure it's documented anywhere that RocksDB indices created by this plugin store the sketches externally, which is needed for gather (but not for manysearch)... Per @luizirber on slack,

me:

do RocksDB disk rev indexes, once created, require access to the original signatures to function?

@luizirber:

Yes for gather, no for search
You can also build the index with internal sigs, but I don’t think it’s the default


In sourmash-bio/sourmash#3250, @luizirber implemented internal storage for RocksDB, and over in #390 that is now being set as the default for new RocksDB indexes; documentation is being updated as well, in a new PR.

However, the original issue of relative path interpretation still remains, so this issue is here to remind us of all that went before.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant