Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider organization of metadata files to avoid name clashes with salmon #22

Closed
rob-p opened this issue Jul 20, 2021 · 8 comments
Closed
Labels
enhancement New feature or request

Comments

@rob-p
Copy link
Contributor

rob-p commented Jul 20, 2021

As pointed out by @jashapiro in issue 688 in the salmon repo, there is a name collision between the cmd_info.json file written by alevin-fry and the one written by salmon alevin if the quantifications are placed in the same directory as the input RAD file. Further, both tools produce their own meta_info.json file, though a clash is avoided because one writes this to an aux subdirectory while the other doesn't.

We should design a better scheme for naming and placement of metadata files, so that names don't clash even if the quantification output is written to the same directory as the initial RAD file.

@rob-p rob-p added the enhancement New feature or request label Jul 20, 2021
@rob-p
Copy link
Contributor Author

rob-p commented Jul 22, 2021

Here is my current proposal. I would appreciate any feedback @DongzeHE, @k3yavi, @hiraksarkar, @mohsenzakeri and @jashapiro.

Get rid of the cmd_info.json generated by the quant command of alevin-fry. At this point in the process, we don't have access to the information that was available to salmon when generating it's cmd_info.json file, and the spoofed version we created was only intended to aid in alevinQC compatibility. But since we've already started discussing with @csoneson adding alevin-fry support to alevinQC directly, we can assume this spoofing will no longer be needed.

Rename meta_info.json as output by alevin-fry quant to quant.json. This matches the naming convention used by other stages in the pipeline, like collate. This also avoids potential name clashes with the meta_info.json written by salmon alevin. However, this rename will require us to change the relevant load_fry functions in the usefulaf repo as well as in fishpond. For that, I propose that we first look for quant.json, and if we don't find it, look for meta_info.json for backward compatibility.

Any thoughts on this proposal?

@jashapiro
Copy link

This sounds quite reasonable to me.

I would also request that as this is implemented, you also add version info to each of the .json outputs. This is already present in the generate_permit_list.json file, but not in collate.json or the current meta-info.json (to be quant.json)

rob-p pushed a commit that referenced this issue Jul 22, 2021
This bumps the version of some deps and the version of fry and
libradicl to 0.4.1.

This also addresses #22 by changing the name of meta_info.json to
quant.json, and adding a version_str field to the quant.json output.
@rob-p
Copy link
Contributor Author

rob-p commented Jul 22, 2021

Hi @jashapiro,

This has been implemented in 0.4.1 which has been tagged and merged into bioconda (bioconda/bioconda-recipes#29716). Hopefully the image should show up on biocontainers soon. Once you confirm on your side, we can close this issue.

--Rob

@jashapiro
Copy link

Thanks @rob-p, I will look at it soon!

@DongzeHE
Copy link
Contributor

I like the neat idea of changing the name of alevin-fry quant from cmd_info.json to something like quant.json. If we decide to go this way, I will take care of the implementation of read_fry() everywhere.

@rob-p
Copy link
Contributor Author

rob-p commented Jul 22, 2021

@DongzeHE --- I implemented this and made the changes in the python and R functions in usefulaf. I also updated tutorials. Could you take care of updating in the same way in fishpond? You can see how I handled it in usefulaf (basically, I look for quant.json first, and if I don't find it, I fall back to meta_info.json assuming it's from an older version).

@jashapiro
Copy link

jashapiro commented Jul 23, 2021

Just confirming that I have tested this version (using the docker image), and the updates look great. Thanks for the additional info in each of the output files! Thanks again @rob-p!

-Josh

@rob-p
Copy link
Contributor Author

rob-p commented Jul 23, 2021

Awesome --- I think we'll consider this issue closed then.

@rob-p rob-p closed this as completed Jul 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants