
Very very slow processing of samples #537

Closed
shalercr opened this issue Jun 16, 2020 · 12 comments

@shalercr

shalercr commented Jun 16, 2020

Is the bug primarily related to salmon (bulk mode) or alevin (single-cell mode)?
salmon
Describe the bug
salmon takes days to process through 16 samples

To Reproduce
Tried running on a different computer, same issue.

Specifically, please provide at least the following information:

  • Which version of salmon was used?
    1.2.1
  • How was salmon installed (compiled, downloaded executable, through bioconda)?
    bioconda
  • Which reference (e.g. transcriptome) was used?
    mouse Mus_musculus.GRCm38.cdna.all.fa
  • Which read files were used?
    fastq
  • Which program options were used?
    Various; --hitFilterPolicy BOTH was added after reading another user's report (Much slower than previous versions #527) to see if it would help.
### salmon (mapping-based) v1.2.1
### [ program ] => salmon 
### [ command ] => quant 
### [ index ] => { mouse_index1 }
### [ libType ] => { IU }
### [ mates1 ] => { /Volumes/FIle_backup/Genewiz_Fastq_June_2020/qc/13_1.trimmed.fastq.gz }
### [ mates2 ] => { /Volumes/FIle_backup/Genewiz_Fastq_June_2020/qc/13_2.trimmed.fastq.gz }
### [ threads ] => { 4 }
### [ hitFilterPolicy ] => { BOTH }
### [ biasSpeedSamp ] => { 10 }
### [ output ] => { quants/13_quant }
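
For reference, the parameter block above corresponds roughly to an invocation like the sketch below; this is a reconstruction from the logged values (index name, paths, and options), not necessarily the exact command that was run:

salmon quant -i mouse_index1 -l IU \
  -1 /Volumes/FIle_backup/Genewiz_Fastq_June_2020/qc/13_1.trimmed.fastq.gz \
  -2 /Volumes/FIle_backup/Genewiz_Fastq_June_2020/qc/13_2.trimmed.fastq.gz \
  -p 4 --hitFilterPolicy BOTH --biasSpeedSamp 10 \
  -o quants/13_quant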

Expected behavior
Not to be so slow. I've used salmon previously with some SRR datasets and it was very fast; this seems very strange to me.

Screenshots

[2020-06-13 02:34:43.686] [jointLog] [info] setting maxHashResizeThreads to 4
[2020-06-13 02:34:43.686] [jointLog] [info] Fragment incompatibility prior below threshold.  Incompatible fragments will be ignored.
[2020-06-13 02:34:43.686] [jointLog] [info] Usage of --validateMappings implies use of minScoreFraction. Since not explicitly specified, it is being set to 0.65
[2020-06-13 02:34:43.686] [jointLog] [info] Usage of --validateMappings implies a default consensus slack of 0.2. Setting consensusSlack to 0.35.
[2020-06-13 02:34:43.686] [jointLog] [info] parsing read library format
[2020-06-13 02:34:43.686] [jointLog] [info] There is 1 library.
[2020-06-13 02:34:43.738] [jointLog] [info] Loading pufferfish index
[2020-06-13 02:34:43.738] [jointLog] [info] Loading dense pufferfish index.
[2020-06-13 02:34:45.327] [jointLog] [info] done
[2020-06-13 02:34:45.327] [jointLog] [info] Index contained 117,135 targets
[2020-06-13 02:34:45.346] [jointLog] [info] Number of decoys : 0
[2020-06-13 02:35:35.911] [jointLog] [info] Automatically detected most likely library type as IU
[2020-06-13 06:56:12.646] [fileLog] [info] 
At end of round 0
==================
Observed 28512328 total fragments (28512328 in most recent round)

[2020-06-13 06:56:12.645] [jointLog] [info] Computed 348,024 rich equivalence classes for further processing
[2020-06-13 06:56:12.645] [jointLog] [info] Counted 12,990,838 total reads in the equivalence classes 
[2020-06-13 06:56:12.989] [jointLog] [warning] 0.0736383% of fragments were shorter than the k used to build the index.
If this fraction is too large, consider re-building the index with a smaller k.
The minimum read size found was 1.


[2020-06-13 06:56:12.989] [jointLog] [info] Number of mappings discarded because of alignment score : 19,645,245,772
[2020-06-13 06:56:12.989] [jointLog] [info] Number of fragments entirely discarded because of alignment score : 2,436,564
[2020-06-13 06:56:12.989] [jointLog] [info] Number of fragments discarded because they are best-mapped to decoys : 0
[2020-06-13 06:56:12.989] [jointLog] [info] Number of fragments discarded because they have only dovetail (discordant) mappings to valid targets : 1,448,149
[2020-06-13 06:56:12.989] [jointLog] [info] Mapping rate = 45.5622%

[2020-06-13 06:56:12.989] [jointLog] [info] finished quantifyLibrary()
[2020-06-13 06:56:12.991] [jointLog] [info] Starting optimizer
[2020-06-13 06:56:13.091] [jointLog] [info] Marked 0 weighted equivalence classes as degenerate
[2020-06-13 06:56:13.106] [jointLog] [info] iteration = 0 | max rel diff. = 8178.65
[2020-06-13 06:56:14.511] [jointLog] [info] iteration = 100 | max rel diff. = 17.6849
[2020-06-13 06:56:16.028] [jointLog] [info] iteration = 200 | max rel diff. = 6.46204
[2020-06-13 06:56:17.541] [jointLog] [info] iteration = 300 | max rel diff. = 1.8111
[2020-06-13 06:56:19.027] [jointLog] [info] iteration = 400 | max rel diff. = 12.2108
[2020-06-13 06:56:20.501] [jointLog] [info] iteration = 500 | max rel diff. = 0.616929
[2020-06-13 06:56:21.954] [jointLog] [info] iteration = 600 | max rel diff. = 0.218435
[2020-06-13 06:56:23.439] [jointLog] [info] iteration = 700 | max rel diff. = 0.068711
[2020-06-13 06:56:24.945] [jointLog] [info] iteration = 800 | max rel diff. = 0.044637
[2020-06-13 06:56:26.398] [jointLog] [info] iteration = 900 | max rel diff. = 0.0340291
[2020-06-13 06:56:27.811] [jointLog] [info] iteration = 1,000 | max rel diff. = 0.237808
[2020-06-13 06:56:29.235] [jointLog] [info] iteration = 1,100 | max rel diff. = 0.0764161
[2020-06-13 06:56:30.717] [jointLog] [info] iteration = 1,200 | max rel diff. = 0.0683725
[2020-06-13 06:56:32.253] [jointLog] [info] iteration = 1,300 | max rel diff. = 0.0990377
[2020-06-13 06:56:33.509] [jointLog] [info] iteration = 1,389 | max rel diff. = 0.00998936
[2020-06-13 06:56:33.511] [jointLog] [info] Finished optimizer
[2020-06-13 06:56:33.511] [jointLog] [info] writing output 

Desktop (please complete the following information):

  • OS: Mac OS X (OSX)
  • Version (output of sw_vers):
    ProductName: Mac OS X
    ProductVersion: 10.15.5
    BuildVersion: 19F101

@rob-p
Collaborator

rob-p commented Jun 16, 2020

Thank you for the report. Can you share one of the samples where you see this issue? Also, out of curiosity, was the other machine you tried on also OSX, or was it a linux machine?

P.S. One other thing that does seem very strange to me is this line:

Number of mappings discarded because of alignment score : 19,645,245,772

That is 19.6 billion alignments that were scored and discarded.
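
(For scale: against the 28,512,328 fragments reported at the end of round 0, that works out to roughly 690 scored-and-discarded alignments per fragment on average.)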

@shalercr
Author

shalercr commented Jun 16, 2020 via email

@rob-p
Collaborator

rob-p commented Jun 16, 2020

Thanks @shalercr,

I'll grab those files once they are finished uploading. I don't know if all samples show similar behavior, but these are called 31_1 and 31_2, while in your logs you had 13_1 and 13_2.

Regarding OSX vs. linux, it should not really matter (salmon should work well under both); I'm just curious, since this is clearly atypical and unexpected behavior.

--Rob

@rob-p
Collaborator

rob-p commented Jun 16, 2020 via email

@shalercr
Author

shalercr commented Jun 16, 2020 via email

@rob-p
Collaborator

rob-p commented Jun 16, 2020

Hi @shalercr,

I agree that if each sample runs as quickly as you see here, it's not a big deal to just re-run them all. Note that, while my run avoids ~116 times more (eventually-discarded) alignments than yours did, the difference in the number of discarded fragments (i.e. fragments that had no mapping) is very small: ~7,000, or 0.023% of the fragments. Thus, despite the huge speed-up, I wouldn't expect many differences in the quantifications. However, I couldn't argue against re-quantifying everything with the new version just for good measure :). When you have a chance, please do let me know if you see this same performance difference using the pre-compiled (linux) binary here. Of course, once we finalize this release (soon), we will push to bioconda so we will also have OSX binaries available. However, I just want to make sure this fixes things on your end as well before the release.
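
As a rough sketch of what grabbing and testing a pre-compiled linux binary looks like (the release tag, archive name, and extracted directory name below are assumptions; check the releases page for the actual asset names):

# assumed tag and archive name; adjust to the release linked above
wget https://github.com/COMBINE-lab/salmon/releases/download/v1.3.0/salmon-1.3.0_linux_x86_64.tar.gz
tar xzf salmon-1.3.0_linux_x86_64.tar.gz
# the extracted directory name can vary between releases, hence the glob
./salmon-*_linux_x86_64/bin/salmon --version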

@shalercr
Author

shalercr commented Jun 17, 2020 via email

@rob-p
Collaborator

rob-p commented Jun 17, 2020

Hi @shalercr,

Thanks for reporting back! I agree that there are some challenging reads in these samples that are likely at the root of the slightly-longer-than-normal runtime. If you could upload the quant dir for this sample (that contains the logs), that would be useful. We (specifically, my student @mohsenzakeri, who is one of the main developers of the new selective-alignment algorithm) can poke around a bit to see if there is anything strange going on that can be characterized, but it might just be an inherent property of samples with very repetitive reads. Regardless, we'll be happy to take a look. Thanks!

--Rob

@rob-p
Collaborator

rob-p commented Jun 17, 2020

P.S. @shalercr,

I also note that layering --hitFilterPolicy BOTH on top of the new version cuts down the time by another factor of 2 for me:

2163.65user 12.72system 4:21.57elapsed 832%CPU (0avgtext+0avgdata 1221856maxresident)k

and the number of mappings discarded due to alignment score comes down by another factor of ~6x. It might be worth seeing what you get with that option as well.
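
A minimal sketch of timing the two configurations side by side, using the same inputs as in the original log (GNU time via /usr/bin/time is assumed to be available):

# with the stricter hit-filter policy
/usr/bin/time salmon quant -i mouse_index1 -l IU \
  -1 13_1.trimmed.fastq.gz -2 13_2.trimmed.fastq.gz \
  -p 4 --hitFilterPolicy BOTH -o quants/13_quant_hfp_both
# with the default policy, for comparison
/usr/bin/time salmon quant -i mouse_index1 -l IU \
  -1 13_1.trimmed.fastq.gz -2 13_2.trimmed.fastq.gz \
  -p 4 -o quants/13_quant_default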

@shalercr
Author

shalercr commented Jun 18, 2020 via email

@shalercr
Author

shalercr commented Jun 22, 2020 via email

@rob-p
Collaborator

rob-p commented Jul 4, 2020

1.3.0 has been released, with these improvements and a few others you can read about in the release notes. Let us know if you have any further issues!
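
For a bioconda install like the one described in the report, updating to the released version would look roughly like this (the channel order is an assumption about the local conda setup):

# pull the 1.3.0 build from bioconda; conda-forge supplies dependencies
conda install -c conda-forge -c bioconda salmon=1.3.0
salmon --version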

rob-p closed this as completed Jul 4, 2020