A fasta file with all phage-positive contigs? #182

deminatanja · 2023-01-25T10:30:18Z

Hi,

Is it correct that there is currently no final fasta file containing all phage-positive contig sequences from the tools used? I think one existed when I was using WtP 1-2 years ago, but I can't find it in the updated version. Are only the individual fasta files from each program available now (in the raw_data folder)?

Best regards,
Tatiana

mult1fractal · 2023-01-25T10:52:14Z

Yes, that is correct. I had to remove it because reviewers said that users would just accept this file as the result without thinking and checking the other outputs.
But you can still extract the contigs of interest yourself:

In the report, go to the "Phage prediction by contig" table and then:

# Filter the "Phage prediction by contig" table to your liking
# Click on the CSV button (this will download the "Phage prediction by contig" table)
# Open your Linux terminal
mkdir contigs_of_interest
cd contigs_of_interest
# Copy the downloaded "Phage prediction by contig" table to the contigs_of_interest folder
# Copy the input fasta to the contigs_of_interest folder (replace /foo/bar with your own path)
cp WtP_results/your_sample/Input_fasta/your_input_fasta.fa.gz /foo/bar/contigs_of_interest
# Get the contig IDs of interest
tail -n+2 final_report.utf8.csv | tr -d '"' | cut -f2 -d"," > contig_IDs_of_interest.txt
# Via Docker: use seqkit to extract the contigs of interest from your input fasta file
docker run --rm -it -v "$PWD":/input nanozoo/seqkit:0.13.2--cd66104
cd /input
seqkit grep --pattern-file contig_IDs_of_interest.txt your_input_fasta.fa.gz > contigs_of_interest.fa
# Finally, leave the Docker container with ctrl + d

deminatanja · 2023-01-25T11:21:28Z

Thanks for the answer!
I have to disagree with the reviewers :) , one could still extract a subset of contigs from that file and it was very handy.
When we get the IDs of the contigs of interest, what does ' tail -n+2 final_report.utf8.csv | tr -d '"' | cut -f2 -d"," > contig_IDs_of_interest.txt ' actually do?

mult1fractal · 2023-01-25T11:31:04Z

Well, for users it would be more convenient, but I also understand the reviewers' point of view (without checking the results, papers could be full of false-positive results) 👍🏽

tail -n+2 final_report.utf8.csv | tr -d '"' | cut -f2 -d"," > contig_IDs_of_interest.txt

This gives you a list of the contig IDs of interest, and with seqkit you can then extract those contigs from your input fasta file.
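The one-liner can be taken apart stage by stage. The sketch below runs the same three commands on a tiny made-up table. The column names and values in `demo_csv` are hypothetical; the only things assumed about the real download are that every field is quoted and that the contig ID sits in the second column. `demo_csv` and `extract_ids` are helper names invented for this illustration.

```shell
# Hypothetical stand-in for the downloaded table; the real columns may differ.
demo_csv() {
  printf '%s\n' \
    '"","contig","tool_a","sum_normed"' \
    '"1","contig_12","1","0.91"' \
    '"2","contig_47","0.8","0.78"'
}

# Same three stages as the one-liner, reading from stdin instead of a file.
extract_ids() {
  tail -n +2 |      # start output at line 2, i.e. skip the header row
  tr -d '"' |       # delete every double quote
  cut -d ',' -f 2   # keep only the second comma-separated field
}

demo_csv | extract_ids
```

This should print `contig_12` and `contig_47`, one per line. Note that `cut` splits on every comma, so the approach would break if a field itself contained a comma.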

deminatanja · 2023-01-25T11:34:05Z

Yeah, I understand the point, but what are the "contigs of interest" in this case? What are the criteria that define them?

Would there be a simple way to have a subset of contigs that e.g.

were predicted by all the used tools,
have at least 1 viral gene,
10 kbp long,
or can be less than 10 kbp if >50% complete.

Previously, I used the quality summary table to select the contig IDs that met these criteria and then, knowing the IDs, extracted the corresponding fasta sequences from the common fasta output file. I did this half-manually, so I am a bit lost with the command line now.

mult1fractal · 2023-01-25T11:43:42Z

Ahhh, OK 👍🏽 got it now, sorry.

So the contigs of interest depend on you (based on the other outputs).
You filter the table (phage prediction by contig) to your needs, e.g. prediction values >0.7, download the filtered table and execute the code I provided.

Of course, you can also do that with the CheckV table:

have at least 1 viral gene,
10 kbp long,
or can be less than 10 kbp if >50% complete.

But then you need to parse the downloaded, filtered CheckV table yourself to get the contig IDs and extract the sequences of interest (via seqkit).
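As a rough sketch of such parsing, the function below reads a CheckV-style table and prints the IDs that satisfy one reading of the criteria above: at least 1 viral gene, and either at least 10 kbp long or more than 50% complete. It assumes a tab-separated file with a header row containing the columns `contig_id`, `contig_length`, `viral_genes` and `completeness`; a table downloaded from the report may be formatted differently, and `filter_checkv` is a name made up for this example.

```shell
# Assumes a tab-separated table with a header row that contains the columns
# contig_id, contig_length, viral_genes and completeness.
filter_checkv() {
  awk -F '\t' '
    NR == 1 {                              # map column names to positions
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    {
      len  = $(col["contig_length"]) + 0
      vg   = $(col["viral_genes"]) + 0
      comp = $(col["completeness"]) + 0    # a value of "NA" becomes 0
      if (vg >= 1 && (len >= 10000 || comp > 50)) print $(col["contig_id"])
    }
  ' "$1"
}
```

Something like `filter_checkv quality_summary.tsv > contig_IDs_of_interest.txt` would then produce an ID list that can go into the same `seqkit grep --pattern-file` call as before. The criterion "predicted by all the used tools" is not covered here, because it depends on the prediction table rather than on CheckV.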

mult1fractal · 2023-01-25T11:50:18Z

To sum up:
Unfortunately, this has to be done manually by the user, because I can't predict what users want or how they filter.

If you want, you can upload the downloaded, filtered table, and I can write the command line for you so that you get a list of contig IDs to extract from your fasta input file.

deminatanja · 2023-01-25T11:56:04Z

Thanks! I think I got it. I was missing the fact that one can filter prediction values in the table online, sorry, now I see that ;)

What exactly are the F1 scores by Ho et al.? To filter the contigs predicted by all the tools used, would you recommend using these F1 scores for high-confidence predictions, or e.g. 0.7 in the sum_normed column?

mult1fractal · 2023-01-25T12:11:34Z

Oki

Ho et al. benchmarked the tools we use in WtP (we can only use benchmarked tools, as they were "tested" and accepted by reviewers 👍🏽 ). The F1 score is defined as the harmonic mean of precision and recall (check here for a better explanation).

As I understand it, it tells you how "reliable/trustworthy" the phage prediction tools in use are.
The F1 scores have nothing to do with the prediction values that the tools generate.
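To make the definition concrete: the harmonic mean of precision P and recall R is 2 * P * R / (P + R). The small helper below only illustrates the arithmetic; `f1_score` is a made-up name and the input values are invented, not taken from Ho et al.

```shell
# F1 = 2 * precision * recall / (precision + recall)
# $1 = precision, $2 = recall (both between 0 and 1, not both 0)
f1_score() {
  awk -v p="$1" -v r="$2" 'BEGIN { printf "%.2f\n", 2 * p * r / (p + r) }'
}

f1_score 0.9 0.6   # prints 0.72
```

Because it is a harmonic mean, the score is pulled towards the lower of the two values: a tool with high precision but poor recall still ends up with a modest F1.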

deminatanja · 2023-01-26T06:13:50Z

Thanks for the explanation! Would you consider prediction values > 0.7 as "positive"? What was the threshold for generating all phage-positive contigs file in the previous version or was it just all above 0?

mult1fractal · 2023-01-26T08:45:56Z

At the time I set a filter (or the user was able to define a filter and set a value) above 0.5 and the contigs that were above this value were collected in the phage positive contig file.

Today I would recommend to:

filter the phage prediction by contig table (last column > 0.75)
then check the CheckV output table for completeness and other phage indicators
check the chromomap HTML (which phage genes were found on the contig; this is not in the final-result HTML)
extract the contigs of interest
further validate these contigs with other methods
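The first step can also be done on the command line instead of in the browser. The sketch below assumes the unfiltered table was downloaded as CSV with every field in double quotes, a header in the first line, the contig ID in column 2 and the summary score in the last column; `ids_above` is a hypothetical helper name.

```shell
# $1 = minimum score (exclusive), $2 = downloaded CSV table
# Assumes quoted, comma-separated fields, contig ID in column 2 and
# the score to filter on in the last column.
ids_above() {
  tr -d '"' < "$2" |
  awk -F ',' -v min="$1" 'NR > 1 && $NF + 0 > min + 0 { print $2 }'
}
```

For example, `ids_above 0.75 final_report.utf8.csv > contig_IDs_of_interest.txt` would write the IDs of all contigs scoring above 0.75. The remaining steps (CheckV, chromomap, further validation) still need a manual look.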

deminatanja · 2023-01-26T10:50:38Z

Thanks a lot for all the help!

I have managed to make a fasta file with all positive contigs (p > 0.75), following your instructions, but using seqtk in the end, as I had that already installed.
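The exact seqtk call is not shown in this thread; it was presumably along the lines of `seqtk subseq your_input_fasta.fa.gz contig_IDs_of_interest.txt > contigs_of_interest.fa`. If neither seqkit nor seqtk is at hand, a plain awk sketch like the one below could do the same job on an uncompressed fasta file. This is a swapped-in alternative, not what either tool does internally, and `extract_fasta` is an invented name.

```shell
# $1 = file with one contig ID per line, $2 = uncompressed fasta file
extract_fasta() {
  awk -v ids="$1" '
    BEGIN {                              # load the wanted IDs into a lookup table
      while ((getline line < ids) > 0) want[line] = 1
    }
    /^>/ {                               # header line of a fasta record
      id = substr($1, 2)                 # ID = text after ">" up to the first blank
      keep = (id in want)
    }
    keep                                 # print header and sequence lines of kept records
  ' "$2"
}
```

IDs are matched exactly, so `contig_2` does not also pull in `contig_22`. A gzipped input would need to be decompressed first, e.g. with `gzip -dc your_input_fasta.fa.gz > your_input_fasta.fa`.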

I think there is a typo in lines

# Copy the downloaded Phage prediction by contig table to the contig_IDs_of_interest -folder  
# Copy the input_fasta to the contig_IDs_of_interest -folder

should be contigs_of_interest -folder?

I will further explore the set I have to extract the contigs based on the criteria I mentioned above.

Btw, the chromomap file has never opened nicely for me; it has been impossible to scroll. It might just be too large, with hundreds of thousands of contigs.

mult1fractal · 2023-01-26T10:56:34Z

It's just an example name for the folder where you run the commands and then use seqkit.
I named it contig_IDs_of_interest... how you name it is up to you 👍🏽

OK, thanks.
I will add this to my fixing list.

deminatanja · 2023-01-26T11:01:46Z

Yeah, of course, it can be named whatever, just in this example script

mkdir contigs_of_interest 
cd  contigs_of_interest  
# Copy the downloaded Phage prediction by contig table to the contig_IDs_of_interest -folder  
# Copy the input_fasta to the contig_IDs_of_interest -folder  
cp WtP_results/your_sample/Input_fasta/your_input_fasta.fa.gz /foo/bar/contigs_of_interest

the folder is originally called contigs_of_interest, which is also the name used in the actual cp command, but the preceding comments use a slightly different name; that's why it caught my eye :)

mult1fractal · 2023-01-26T11:08:21Z

Ahh, now I got it. Thanks for the clarification. Now it should be correct 👍🏽

deminatanja · 2023-01-26T11:22:32Z

No problem! Thanks for all the efforts, WtP is great! ;)

mult1fractal · 2023-01-27T08:43:21Z

Thanks for using the tool :)

mult1fractal self-assigned this Jan 25, 2023

mult1fractal added the question (Further information is requested) label Jan 25, 2023

mult1fractal closed this as completed Mar 3, 2023
