Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

relative path interpretation for manifests is a little fubar #3053

Closed
ctb opened this issue Feb 29, 2024 · 2 comments · Fixed by #3054
Closed

relative path interpretation for manifests is a little fubar #3053

ctb opened this issue Feb 29, 2024 · 2 comments · Fixed by #3054

Comments

@ctb
Copy link
Contributor

ctb commented Feb 29, 2024

(this is what is triggering the problem in #3048 (comment))

with the following directory structure:

annie-test-1/sig.csv
annie-test-1/sig.zip
annie-test-2/mf.csv

mf.csv built by doing:

sourmash sig check annie-test-1/sig.zip --picklist annie-test-1/sig.csv:md5:md5 -m annie-test-2/mf.csv

then mf.csv looks like this:

# SOURMASH-MANIFEST-VERSION: 1.0
internal_location,md5,md5short,ksize,moltype,num,scaled,n_hashes,with_abundance,name,filename
annie-test-1/sig.zip,c11126d0591db94cd3d1c8568499375f,c11126d0,31,DNA,0,1000,1478,False,"CP001941.1 Aciduliprofundum boonei T469, complete genome",1.fa

and then

sourmash sig cat annie-test-2/mf.csv -o /dev/null -d

yields

sourmash.exceptions.IndexNotLoaded: Cannot load sourmash index: annie-test-2/annie-test-1/sig.zip

more info soon

related to:

@ctb
Copy link
Contributor Author

ctb commented Feb 29, 2024

The problem in the code is here:

https://github.com/sourmash-bio/sourmash/blob/latest/src/sourmash/save_load.py#L162-L171

here:

https://github.com/sourmash-bio/sourmash/blob/latest/src/sourmash/index/__init__.py#L1157-L1158

and here:

https://github.com/sourmash-bio/sourmash/blob/latest/src/sourmash/index/__init__.py#L1188-L1192

basically, the prefix is set from the dirname of the manifest, and then prepended wilhe-nilhe to the internal_locations of the rows in the manifest.

A workaround that @AnneliektH used successfully was to build the manifest with sig check using absolute paths to the zip files.

@ctb
Copy link
Contributor Author

ctb commented Mar 3, 2024

see discussion in #3048 (comment); once #3054 is merged, solution will be to use sourmash sig check --relpath.

@ctb ctb closed this as completed in #3054 Mar 8, 2024
ctb added a commit that referenced this issue Mar 8, 2024
…` and `sig collect` (#3054)

This PR updates `sig collect` and `sig check` so that they can produce
standalone manifests that work properly with default sourmash loading
behavior. The default behavior produces broken manifests in some
situations and is not changed, but will be deprecated in v5.

## Details

Currently, `sig collect` and `sig check` default to producing standalone
manifests with internal path locations relative to the current working
directory. This conflicts with the default `StandaloneManifest` behavior
implemented in `save_load.py` that loads path locations relative to the
manifest location. As a result, whenever the manifest was in a
subdirectory, the standalone manifests output by `sig check` and `sig
collect` were broken. The only way to make good manifests in this
situation was to use `sig collect --abspath`, but `sig check` didn't
support `--abspath`, and using absolute paths is brittle in situations
where you want to distribute manifests.

This PR adds `--relpath` to both `sig check` and `sig collect`, and adds
`--abspath` to `sig check`. It also demonstrates the bad behavior in
tests and annotates the tests appropriately.

See
#3008 (comment)
for more detailed discussion of why I think `--relpath` is the right
behavior for the future.

- [x] adds `--abspath` and `--relpath` to `sig check`, to properly
support relative paths;
- [x] adds `--relpath` to `sig collect`, to properly support relative
paths;
- [x] documents this behavior properly for creating standalone
manifests;
- [ ] create issue to change default `sig check` and `sig collect`
behavior for v4, and disable cwd behavior.

Techie TODO:
- [x] explicitly test `relpath` and `abspath` behavior in `sig check`;
- [x] explicitly test `relpath` behavior in `sig collect`
- [x] write some tests for `sig check` and `sig collect` to explore the
relative path loading issue, with all three combinations of relpath: mf
in cwd, sigs in subdir; mf in subdir, sigs in cwd; mf in subdir, sigs in
subdir.

Related issues:
* Addresses #3008
* Addresses issues in
#3048 by updating `sig
check` to support `--relpath`;
* Fixes #3053 -
`--relpath` again

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
ctb added a commit that referenced this issue Mar 20, 2024
This PR:
* fixes a minor nit in `sourmash sig collect` output where it said
"loaded 0 signatures"
* updates a lot of the documentation around standalone manifests to
encourage their use
* in tandem, modifies docs to discourage loading from
pathlists/from-files and directory hierarchies

TODO:
- [x] look at TODO item re directories in sig collect
- [x] think about adding
#3023 information into
docs about lazy loading; maybe in the advanced databases document?
- [x] update `sig manifest` docs to point out that they do not generate
standalone manifests
- [x] revisit branchwater plugin documentation to, to either make issues
or make changes
- [x] update `sig check` and `sig collect` to tell people to expand
their paths ref #3039
- [x] update docs more to recommend against pathlists and directories
per #3040

Related issues:
* sourmash-bio/sourmash_plugin_branchwater#235
* Fixes #3048
* Fixes #3009 by
recommending `sig collect` and `sig check` instead of `sig manifest` for
making standalone manifests
* #3053
* Fixes #3023
* Fixes #3039
* Fixes #3040

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant