manifest testing and grokking (branchwater plugin) #237

Closed
ctb opened this issue Feb 24, 2024 · 2 comments

Comments

@ctb
Collaborator

ctb commented Feb 24, 2024

this manifest works for multisearch:

# SOURMASH-MANIFEST-VERSION: 1.0
internal_location,md5,md5short,ksize,moltype,num,scaled,n_hashes,with_abundance,name,filename
podar-small/1.fa.sig.gz,c11126d0591db94cd3d1c8568499375f,c11126d0,31,DNA,0,1000,1478,0,"CP001941.1 Aciduliprofundum boonei T469, complete genome",podar-ref/1.fa

with the internal location being podar-small/1.fa.sig.gz, a sig.gz file.

this manifest does not work with multisearch:

# SOURMASH-MANIFEST-VERSION: 1.0
internal_location,md5,md5short,ksize,moltype,num,scaled,n_hashes,with_abundance,name,filename
podar-small/1.fa.sig.zip,c11126d0591db94cd3d1c8568499375f,c11126d0,31,DNA,0,1000,1478,False,"CP001941.1 Aciduliprofundum boonei T469, complete genome",podar-ref/1.fa

because the internal location is a zip file, podar-small/1.fa.sig.zip.

This is inconsistent with sourmash (which, ok, sure) and I think this should work.

(No need to fix it just yet, just trying to understand the scope of what maybe needs to be done)
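To scope this, it may help to list which manifest rows point at zip files. The following is a minimal sketch using only the Python standard library; the function name `manifest_locations` is hypothetical and not part of sourmash or the plugin. It assumes the manifest layout shown above: one leading `#` comment line followed by a CSV header.

```python
import csv
import io

def manifest_locations(manifest_text):
    """Return (internal_location, is_zip) pairs for each manifest row.

    Hypothetical helper for illustration only; not a sourmash API.
    """
    lines = manifest_text.splitlines()
    # Skip the leading '# SOURMASH-MANIFEST-VERSION' comment line, if present.
    if lines and lines[0].startswith("#"):
        lines = lines[1:]
    # csv handles the quoted 'name' column, which contains a comma.
    reader = csv.DictReader(io.StringIO("\n".join(lines)))
    return [
        (row["internal_location"], row["internal_location"].endswith(".zip"))
        for row in reader
    ]

# The failing manifest from this comment, copied verbatim.
MANIFEST = (
    "# SOURMASH-MANIFEST-VERSION: 1.0\n"
    "internal_location,md5,md5short,ksize,moltype,num,scaled,n_hashes,"
    "with_abundance,name,filename\n"
    "podar-small/1.fa.sig.zip,c11126d0591db94cd3d1c8568499375f,c11126d0,31,"
    'DNA,0,1000,1478,False,"CP001941.1 Aciduliprofundum boonei T469, '
    'complete genome",podar-ref/1.fa\n'
)

for location, is_zip in manifest_locations(MANIFEST):
    print(location, "zip" if is_zip else "not zip")
```

Checking the file extension is only a heuristic; it says nothing about what the file actually contains.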

@ctb
Collaborator Author

ctb commented Feb 24, 2024

This works with manysearch: sig.gz x zip

sourmash scripts manysearch podar-small/1.fa.sig.gz podar-small/1.fa.sig.zip -o out

as does sig.gz x sig.gz:

sourmash scripts manysearch podar-small/1.fa.sig.gz podar-small/1.fa.sig.gz -o out

and also zip x zip:

sourmash scripts manysearch podar-small/1.fa.sig.zip podar-small/1.fa.sig.zip -o out

manifest with .sig.gz file

If we build a manifest using a .sig.gz file with:

sourmash sig collect -F csv -o xyz.mf podar-small/1.fa.sig.gz

then manifest x manifest works!

sourmash scripts manysearch xyz.mf xyz.mf -o out 

🎉

manifest with .zip file

If we build a manifest from a zip file, we hit problems:

sourmash sig collect -F csv -o xyz.mf podar-small/1.fa.sig.zip

and now using that as a list for manysearch fails:

sourmash scripts manysearch podar-small/1.fa.sig.zip xyz.mf -o out

yields WARNING: skipped 1 search paths - no compatible signatures.

Likewise, using the manifest as a query fails with a different error:

sourmash scripts manysearch xyz.mf podar-small/1.fa.sig.zip -o out

yields

pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: Error: Failed to load query record: CP001941.1 Aciduliprofundum boonei T469, complete genome

can pathlists contain zip files?

nope - if we create xyz.paths.txt like so,

ls -1 podar-small/1.fa.sig.zip > xyz.paths.txt

and then run

sourmash scripts multisearch xyz.paths.txt xyz.paths.txt -o out

we get

ksize: 31 / scaled: 1000 / moltype: DNA / threshold: 0.01
searching all sketches in 'xyz.paths.txt' against 'xyz.paths.txt' using 8 threads
Reading query(s) from: 'xyz.paths.txt'
Sketch loading error: expected value at line 1 column 1
WARNING: could not load sketches from path 'podar-small/1.fa.sig.zip'
No valid signatures found in query pathlist 'xyz.paths.txt'
WARNING: 1 query paths failed to load. See error messages above.
No query signatures loaded, exiting.
Reading search(s) from: 'xyz.paths.txt'
Sketch loading error: expected value at line 1 column 1
WARNING: could not load sketches from path 'podar-small/1.fa.sig.zip'
No valid signatures found in search pathlist 'xyz.paths.txt'
WARNING: 1 search paths failed to load. See error messages above.
No search signatures loaded, exiting.
DONE. Processed 0 comparisons
...multisearch is done! results in 'out'
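The "expected value at line 1 column 1" message reads like a JSON parser failing on the first byte, which would be consistent with the zip file being read as if it were a JSON signature file. That is an assumption about the plugin's behavior, not something confirmed here. A small Python illustration of the same failure mode, using only the standard library (the helper `try_parse_as_json` is hypothetical):

```python
import io
import json
import zipfile

# Build a tiny in-memory zip so the example needs no files on disk.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("example.sig", "[]")
raw = buf.getvalue()

def try_parse_as_json(data):
    """Return None if data parses as JSON, else the parser's error message."""
    try:
        # latin-1 maps every byte to a character, so decoding never fails
        # and the error we see comes from the JSON parser itself.
        json.loads(data.decode("latin-1"))
    except json.JSONDecodeError as exc:
        return str(exc)
    return None

print(raw[:2])                 # zip archives start with the bytes 'PK'
print(try_parse_as_json(raw))  # the parser gives up at the very first character
```

Python's wording differs from the message in the log above, but both point at line 1, column 1.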

conclusions?

I think we need to be able to load zip files from pathlists and manifests :)
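One way to picture this is a loader that decides how to open each path from the file's contents rather than from where the path came from (command line, pathlist, or manifest). The sketch below is a Python illustration of that dispatch only; `container_kind` is a hypothetical name, and this is not how the plugin, which is written in Rust, is implemented.

```python
import gzip
import io
import zipfile

def container_kind(data):
    """Classify raw file bytes as 'zip', 'gzip', or 'plain'.

    Hypothetical helper: a real loader would then open the file with the
    matching reader instead of always assuming one format.
    """
    if zipfile.is_zipfile(io.BytesIO(data)):
        return "zip"
    if data[:2] == b"\x1f\x8b":  # gzip magic number
        return "gzip"
    return "plain"

# Build one small example of each kind in memory.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("example.sig", "[]")

examples = {
    "1.fa.sig.zip": zip_buf.getvalue(),
    "1.fa.sig.gz": gzip.compress(b"[]"),
    "1.fa.sig": b"[]",
}

for name, data in examples.items():
    print(name, "->", container_kind(data))
```

Sniffing the contents avoids relying on file extensions, at the cost of reading the start (and, for zip, the end) of each file before loading it.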

@ctb
Collaborator Author

ctb commented Mar 5, 2024

closing in favor of #264 and #266.

@ctb ctb closed this as completed Mar 5, 2024