A fasta file with all phage-positive contigs? #182
Comments
Yes, that is correct. I had to remove it because reviewers said that users would just accept this file as the result without thinking or checking the other outputs. In the report, go to the phage prediction by contig and then
Thanks for the answer!
Well, for users it would be more convenient, but I also understand the reviewers' point of view (without checking the results, papers could be full of false-positive results) 👍🏽 `tail -n+2 final_report.utf8.csv | tr -d '"' | cut -f2 -d"," > contig_IDs_of_interest.txt` gives you a list of the contig IDs of interest, and via
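A minimal sketch of the two steps discussed here: listing the contig IDs, then pulling the matching sequences. It assumes the contig ID sits in column 2 of the report CSV (as the `cut -f2` command implies) and that no field contains a comma. The function names `extract_ids` and `subset_fasta` and the file name `input.fa` are made up for illustration, and the awk subsetting is only a dependency-free stand-in for seqkit, not what WtP does internally.

```shell
# Print the unique contig IDs from column 2 of the report CSV.
# ASSUMPTION: one header row, comma-separated, no commas inside fields.
extract_ids() {
  tail -n +2 "$1" | tr -d '"' | cut -d',' -f2 | sort -u
}

# Print only the FASTA records whose ID (first word after '>') is listed.
# $1 = file with one ID per line (must not be empty), $2 = FASTA file
subset_fasta() {
  awk 'NR == FNR { ids[$1]; next }                 # first file: remember IDs
       /^>/ { name = substr($1, 2); keep = (name in ids) }
       keep' "$1" "$2"                             # print header + sequence lines
}

# Usage (file names are hypothetical):
#   extract_ids final_report.utf8.csv > contig_IDs_of_interest.txt
#   subset_fasta contig_IDs_of_interest.txt input.fa > contigs_of_interest.fa
# With seqkit installed, the second step could instead be:
#   seqkit grep -f contig_IDs_of_interest.txt input.fa > contigs_of_interest.fa
```

The awk version matches on the exact ID, so `contig_1` does not accidentally pull in `contig_10`.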
Yeah, I understand the point, but what are the "contigs of interest" in this case? What are the criteria that define them? Would there be a simple way to have a subset of contigs that e.g.
Previously, I used the quality summary table to select the contig IDs that met these criteria and then, knowing the needed IDs, extracted the corresponding FASTA sequences from the common FASTA output file. I did this half-manually, so I am a bit lost with the command line now.
Ahh, OK 👍🏽 got it now, sorry. So the contigs of interest depend on you (based on the other outputs). Of course you can also do that with the CheckV table:
but then you need to parse the downloaded and filtered CheckV table yourself to extract the contigs and sequences of interest (via seqkit).
To sum up: if you want, you can upload the downloaded, filtered table and I can write the command line for you, so you have a list of contig IDs that you can extract from your FASTA input file.
Thanks! I think I got it. I was missing the fact that one can filter prediction values in the online table, sorry, now I see that ;) What actually are the F1 scores by Ho et al.? To filter the contigs predicted by all used tools, would you recommend using these F1 scores for high-confidence predictions, or e.g. 0.7 in the sum_normed column?
Oki. Ho et al. benchmarked the tools we use in WtP (we can only use benchmarked tools, as they were "tested" and accepted by reviewers 👍🏽). The F1 score is defined as the harmonic mean of precision and recall (check here for a better explanation). As I understand it, it tells you how "reliable/trustworthy" the phage prediction tools being used are.
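For reference, the harmonic-mean definition mentioned above can be written out explicitly. This is the standard textbook formula, with $P$ for precision, $R$ for recall, and TP, FP, FN for the counts of true positives, false positives and false negatives; it is not copied from the Ho et al. benchmark itself:

$$F_1 = \frac{2PR}{P + R} = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}, \qquad P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$$

As a made-up example, a tool with precision 0.9 and recall 0.6 has $F_1 = 2 \cdot 0.54 / 1.5 = 0.72$, which is below the arithmetic mean of 0.75 because the harmonic mean penalizes an imbalance between the two.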
Thanks for the explanation! Would you consider prediction values > 0.7 as "positive"? What was the threshold for generating the file with all phage-positive contigs in the previous version, or was it just everything above 0?
At the time, I set a filter above 0.5 (or the user was able to define a filter and set a value), and the contigs above this value were collected in the phage-positive contig file. Today I would recommend to
Thanks a lot for all the help! I have managed to make a FASTA file with all positive contigs (p > 0.75), following your instructions, but using seqtk in the end, as I already had it installed. I think there is a typo in lines
should be
I will further explore the set I have, to extract the contigs based on the criteria I mentioned above. By the way, the chromomap file has never opened nicely for me; it has been impossible to scroll. It might just be too large, with hundreds of thousands of contigs.
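The threshold-based selection described in this comment could be sketched roughly as follows. This is not the script from the thread: the function name `ids_above` and the file names are hypothetical, the column is looked up by its header name (for example `sum_normed`), and the contig ID is assumed to be in column 2 with no commas inside any field.

```shell
# Print contig IDs whose value in a named column is above a threshold.
# $1 = CSV report, $2 = column name, $3 = threshold
# ASSUMPTION: one header row, contig ID in column 2, no commas inside fields.
ids_above() {
  tr -d '"' < "$1" | awk -F',' -v col="$2" -v thr="$3" '
    NR == 1 {                                   # locate the column in the header
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
      next
    }
    ($c + 0) > (thr + 0) { print $2 }           # numeric comparison, strictly above
  '
}

# Usage (file names are hypothetical):
#   ids_above final_report.utf8.csv sum_normed 0.75 > positive_ids.txt
#   seqtk subseq input.fa positive_ids.txt > phage_positive_contigs.fa
```

Non-numeric entries such as empty cells evaluate to 0 in the comparison, so they are never reported as positive.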
It's just an example name for the folder where you run the commands and then use seqkit. Okay, thanks.
Yeah, of course, it can be named whatever, just in this example script
the folder is originally called
Ahh, now I got it. Thanks for the clarification. Now it should be correct 👍🏽
No problem! Thanks for all the effort, WtP is great! ;)
Thanks for using the tool :)
Hi,
Is it correct that there is currently no final FASTA file containing all phage-positive contig sequences from the tools used? I think it used to exist when I was using WtP 1-2 years ago, but I can't find it in the updated version. Are only the individual FASTA files from each program available now (in the raw_data folder)?
Best regards,
Tatiana