How extend maximum intron length #941

Closed
francicco opened this issue Jun 11, 2020 · 14 comments

@francicco

Hi,

I'm annotating a new genome and I have the impression that long splice junctions (introns > 50 kb) are not detected. Could you give me any tips on how to set STAR to look for long introns?

Thanks a lot
Francesco

@alexdobin
Owner

Hi Francesco,

what parameters are you using?
With default parameters, you should be able to detect ~600kb introns.

Cheers
Alex

@francicco
Author

Hi Alex,

With the defaults, I reach 20 kb. Now I'm using these settings:

--outSJfilterIntronMaxVsReadN 80 100 500 1000 2000 5000 20000
--alignIntronMax 450000 --alignSJoverhangMin 10

It seems I can reach longer splice junctions, but I'm worried about nonspecific splice sites.

What do you think?
Thanks
F

@alexdobin
Owner

You are right - the --outSJfilterIntronMaxVsReadN controls the output of splice junctions to the SJ.out.tab.
The default values --outSJfilterIntronMaxVsReadN 50000 100000 200000 require >3 reads for introns >200k. With your parameters, it's more stringent - you require >7 reads for introns > 20k.
If you want to remove this filter, you can use --outSJfilterIntronMaxVsReadN 1000000000
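
As a rough illustration of the rule described above, here is a small shell sketch. It assumes that the N-th value of --outSJfilterIntronMaxVsReadN is the largest gap allowed for a junction supported by N reads, and that junctions with more reads than there are values are not limited by this filter. The function name `min_reads_for_gap` is made up for this example and is not part of STAR.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not a STAR command.
# Usage: min_reads_for_gap GAP T1 T2 ... TN
# Prints the smallest number of supporting reads for which a junction with
# the given gap would pass the filter, assuming the N-th threshold is the
# maximum gap allowed for junctions supported by N reads.
min_reads_for_gap() {
    local gap=$1
    shift
    local reads=1
    local max_gap
    for max_gap in "$@"; do
        if [ "$gap" -le "$max_gap" ]; then
            echo "$reads"
            return 0
        fi
        reads=$((reads + 1))
    done
    # The gap exceeds every threshold: one more read than the list length.
    echo "$reads"
}

# Default thresholds, 250 kb gap: more than 3 reads are needed.
min_reads_for_gap 250000 50000 100000 200000              # prints 4
# Seven thresholds ending at 20000, 25 kb gap: more than 7 reads are needed.
min_reads_for_gap 25000 80 100 500 1000 2000 5000 20000   # prints 8
```

Whether the comparison at each threshold is inclusive is an assumption here; the manual entry for this parameter is the place to check.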

Cheers
Alex

@francicco
Author

I'm now running some tests to understand the behavior.

  • Blue: Default settings generated a total of 96.6k splice junctions
  • Red: --outSJfilterIntronMaxVsReadN 80 100 500 1000 2000 5000 20000 --alignIntronMax 450000. 108.1k sj
  • Green: --outSJfilterIntronMaxVsReadN 80 100 500 1000 2000 5000 20000 50000 100000 --alignIntronMax 450000. 107.5k sj

This is the distribution. The Y-axis is the percentage of the total (log2 scale). The X-axis is the splice junction length in kb.

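[Figure SsFrq2: distribution of splice junction lengths for the Blue, Red and Green runs; X-axis: length in kb, Y-axis: percentage of total (log2 scale)]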

It seems like the default is underperforming and more stringent, the opposite of what you say. The green line seems to be a bit more stringent because, according to what you are saying, it requires >7 reads for SJs > 100 kb.

Is that right?

@alexdobin
Owner

Hi Francesco,

no, this is not supposed to happen. Could you list all parameters you are using for each of the runs? Also, let's count how many junctions with the gap>20k you have in each case, i.e.
awk '$3-$2>=20000' SJ.out.tab | wc
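
The same count can be wrapped in a small function and extended to a coarse length histogram, which makes runs easier to compare. This is only a sketch: the function names and the three demo junctions are invented for illustration, and it relies on columns 2 and 3 of SJ.out.tab being the first and last base of the intron.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of STAR.

# count_long_junctions FILE MIN_GAP: number of junctions with gap >= MIN_GAP.
count_long_junctions() {
    awk -v min_gap="$2" '$3 - $2 >= min_gap { n++ } END { print n + 0 }' "$1"
}

# gap_histogram FILE BIN_SIZE: prints "bin_start<TAB>count", sorted by bin.
gap_histogram() {
    awk -v bin="$2" '{ b = int(($3 - $2) / bin) * bin; count[b]++ }
        END { for (b in count) printf "%d\t%d\n", b, count[b] }' "$1" | sort -n
}

# Made-up junctions in SJ.out.tab layout (gaps of 500, 25000 and 125000).
demo_sj=$(mktemp)
trap 'rm -f "$demo_sj"' EXIT
printf 'chr1\t1000\t1500\t1\t1\t0\t12\t0\t40\n'  >  "$demo_sj"
printf 'chr1\t2000\t27000\t1\t1\t0\t9\t1\t35\n'  >> "$demo_sj"
printf 'chr2\t5000\t130000\t2\t2\t0\t8\t0\t30\n' >> "$demo_sj"

count_long_junctions "$demo_sj" 20000   # prints 2
gap_histogram "$demo_sj" 50000          # prints two bins: 0 (2 junctions), 100000 (1 junction)
```

For a real run, replace `"$demo_sj"` with the SJ.out.tab of each test.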

Cheers
Alex

@francicco
Author

Hi Alex,

This is the default (blue) command line:

STAR --genomeDir Eisa.STAR.Index --limitBAMsortRAM 20000000000 --twopassMode Basic --readFilesIn /mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R1.qf.fastq.gz /mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R2.qf.fastq.gz --outFilterType BySJout --outSAMattributes All --outSAMtype BAM SortedByCoordinate --runThreadN 16 --alignEndsType Local --outStd Log --readFilesCommand zcat --outFileNamePrefix Eisa.RNA.

This is for the Red line:

STAR --genomeDir Eisa.STAR.Index --limitBAMsortRAM 20000000000 --twopassMode Basic --readFilesIn /mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R1.qf.fastq.gz /mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R2.qf.fastq.gz --outFilterType BySJout --outSAMattributes All --outSAMtype BAM SortedByCoordinate --runThreadN 16 --alignEndsType Local --outStd Log --readFilesCommand zcat --outSJfilterIntronMaxVsReadN 80 100 500 1000 2000 5000 20000 --alignIntronMax 450000 --alignSJoverhangMin 10 --outFileNamePrefix Eisa.RNA.intronTest.

And this is for the Green line:

STAR --genomeDir Eisa.STAR.Index --limitBAMsortRAM 20000000000 --twopassMode Basic --readFilesIn /work/tk19812/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R1.qf.fastq.gz /work/tk19812/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R2.qf.fastq.gz --outFilterType BySJout --outSAMattributes All --outSAMtype BAM SortedByCoordinate --runThreadN 16 --alignEndsType Local --outStd Log --readFilesCommand zcat --outSJfilterIntronMaxVsReadN 80 100 500 1000 2000 5000 20000 50000 100000 --alignIntronMax 450000 --alignSJoverhangMin 10 --outFileNamePrefix Eisa.RNA.intronTest2.

About your awk line, this is what I got:
Blue: 2 18 76
Red: 5908 53172 234162
Green: 5301 47709 210540

Cheers
F

@alexdobin
Owner

Thanks!
Could you please add --alignIntronMax 450000 --alignSJoverhangMin 10 to the 1st run parameters and re-map? We want to compare cases where the only change in parameters is --outSJfilterIntronMaxVsReadN.

Also, please send me the Log.out file.

@francicco
Author

I was running that already:

STAR --genomeDir Eisa.STAR.Index --limitBAMsortRAM 20000000000 --twopassMode Basic --readFilesIn /mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R1.qf.fastq.gz /mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R2.qf.fastq.gz --outFilterType BySJout --outSAMattributes All --outSAMtype BAM SortedByCoordinate --runThreadN 16 --alignEndsType Local --outStd Log --readFilesCommand zcat --alignIntronMax 450000 --alignSJoverhangMin 10 --outFileNamePrefix Eisa.RNA.intronTest4.

F

@alexdobin
Owner

This is different from the "Blue line" command in the previous post.

@francicco
Author

I was running that already:

STAR --genomeDir Eisa.STAR.Index --limitBAMsortRAM 20000000000 --twopassMode Basic --readFilesIn /mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R1.qf.fastq.gz /mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R2.qf.fastq.gz --outFilterType BySJout --outSAMattributes All --outSAMtype BAM SortedByCoordinate --runThreadN 16 --alignEndsType Local --outStd Log --readFilesCommand zcat --alignIntronMax 450000 --alignSJoverhangMin 10 --outFileNamePrefix Eisa.RNA.intronTest4.

This is what you want to test, right?
F

@alexdobin
Owner

Yes, these are the parameters we want - but these are not the parameters that you used to plot the Blue line, according to your post #941 (comment)

With these new parameters, what is the output of awk '$3-$2>=20000' SJ.out.tab | wc ?

@francicco
Author

No, those are not the parameters of the blue line. The new run has just finished.
The awk command gave this:

10538 94842 412679

[Image: Screen Shot 2020-06-13 at 21 44 26]

So, apparently the --alignIntronMax was causing the "problem".

What's a suggested value for an insect genome?

Thanks a lot
F

and this is the log:

STAR version=2.7.2c
STAR compilation time,server,dir=Wed Oct 2 10:38:59 EDT 2019 vega:/home/dobin/data/STAR/STARcode/STAR.master/source
##### Command Line:
STAR --genomeDir Eisa.STAR.Index --limitBAMsortRAM 20000000000 --twopassMode Basic --readFilesIn /mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R1.qf.fastq.gz /mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R2.qf.fastq.gz --outFilterType BySJout --outSAMattributes All --outSAMtype BAM SortedByCoordinate --runThreadN 16 --alignEndsType Local --outStd Log --readFilesCommand zcat --alignIntronMax 450000 --alignSJoverhangMin 10 --outFileNamePrefix Eisa.RNA.intronTest4.
##### Initial USER parameters from Command Line:
outFileNamePrefix                 Eisa.RNA.intronTest4.
outStd                            Log
###### All USER parameters from Command Line:
genomeDir                     Eisa.STAR.Index     ~RE-DEFINED
limitBAMsortRAM               20000000000     ~RE-DEFINED
twopassMode                   Basic     ~RE-DEFINED
readFilesIn                   /mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R1.qf.fastq.gz   /mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R2.qf.fastq.gz        ~RE-DEFINED
outFilterType                 BySJout     ~RE-DEFINED
outSAMattributes              All        ~RE-DEFINED
outSAMtype                    BAM   SortedByCoordinate        ~RE-DEFINED
runThreadN                    16     ~RE-DEFINED
alignEndsType                 Local     ~RE-DEFINED
outStd                        Log     ~RE-DEFINED
readFilesCommand              zcat        ~RE-DEFINED
alignIntronMax                450000     ~RE-DEFINED
alignSJoverhangMin            10     ~RE-DEFINED
outFileNamePrefix             Eisa.RNA.intronTest4.     ~RE-DEFINED
##### Finished reading parameters from all sources

##### Final user re-defined parameters-----------------:
runThreadN                        16
genomeDir                         Eisa.STAR.Index
readFilesIn                       /mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R1.qf.fastq.gz   /mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R2.qf.fastq.gz   
readFilesCommand                  zcat   
limitBAMsortRAM                   20000000000
outFileNamePrefix                 Eisa.RNA.intronTest4.
outStd                            Log
outSAMtype                        BAM   SortedByCoordinate   
outSAMattributes                  All   
outFilterType                     BySJout
alignIntronMax                    450000
alignSJoverhangMin                10
alignEndsType                     Local
twopassMode                       Basic

-------------------------------
##### Final effective command line:
STAR   --runThreadN 16   --genomeDir Eisa.STAR.Index   --readFilesIn /mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R1.qf.fastq.gz   /mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R2.qf.fastq.gz      --readFilesCommand zcat      --limitBAMsortRAM 20000000000   --outFileNamePrefix Eisa.RNA.intronTest4.   --outStd Log   --outSAMtype BAM   SortedByCoordinate      --outSAMattributes All      --outFilterType BySJout   --alignIntronMax 450000   --alignSJoverhangMin 10   --alignEndsType Local   --twopassMode Basic
----------------------------------------


   Input read files for mate 1, from input string /mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R1.qf.fastq.gz
-rw-r--r-- 1 tk19812 bisc 5420609432 Apr 29 21:06 /mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R1.qf.fastq.gz

   readsCommandsFile:
exec > "Eisa.RNA.intronTest4._STARtmp/tmp.fifo.read1"
echo FILE 0
zcat      "/mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R1.qf.fastq.gz"


   Input read files for mate 2, from input string /mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R2.qf.fastq.gz
-rw-r--r-- 1 tk19812 bisc 5340939090 Apr 29 21:06 /mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R2.qf.fastq.gz

   readsCommandsFile:
exec > "Eisa.RNA.intronTest4._STARtmp/tmp.fifo.read2"
echo FILE 0
zcat      "/mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R2.qf.fastq.gz"

Finished loading and checking parameters
Reading genome generation parameters:
### STAR   --runMode genomeGenerate   --runThreadN 28   --genomeDir Eisa.STAR.Index   --genomeFastaFiles Eisa.assembly.v1.1.fasta      --genomeSAindexNbases 13   --genomeChrBinNbits 8
### GstrandBit=32
versionGenome                 2.7.1a     ~RE-DEFINED
genomeFastaFiles              Eisa.assembly.v1.1.fasta        ~RE-DEFINED
genomeSAindexNbases           13     ~RE-DEFINED
genomeChrBinNbits             8     ~RE-DEFINED
genomeSAsparseD               1     ~RE-DEFINED
sjdbOverhang                  0     ~RE-DEFINED
sjdbFileChrStartEnd           -        ~RE-DEFINED
sjdbGTFfile                   -     ~RE-DEFINED
sjdbGTFchrPrefix              -     ~RE-DEFINED
sjdbGTFfeatureExon            exon     ~RE-DEFINED
sjdbGTFtagExonParentTranscripttranscript_id     ~RE-DEFINED
sjdbGTFtagExonParentGene      gene_id     ~RE-DEFINED
sjdbInsertSave                Basic     ~RE-DEFINED
genomeFileSizes               440171264   3630805665        ~RE-DEFINED
Genome version is compatible with current STAR
Number of real (reference) chromosomes= 147
1	Eisa1Z00	16680516	0
2	Eisa0200	17065634	16680704
3	Eisa0300	17883974	33746432
4	Eisa0400	16449752	51630592
5	Eisa0500	16162363	68080384
6	Eisa0600	16736798	84242944
7	Eisa0700	14722350	100979968
8	Eisa0800	16712451	115702528
9	Eisa0900	16638819	132415232
10	Eisa1000	16265376	149054208
11	Eisa1100	17330994	165319680
12	Eisa1200	14166302	182650880
13	Eisa1300	14891182	196817408
14	Eisa1400	15892170	211708672
15	Eisa1500	18488317	227600896
16	Eisa1600	14606391	246089216
17	Eisa1700	13996447	260695808
18	Eisa1800	13106782	274692352
19	Eisa1900	14466405	287799296
20	Eisa2000	13147892	302265856
21	Eisa2100	12710976	315413760
22	Eisa2200	11654559	328124928
23	Eisa2300	12302664	339779584
24	Eisa2400	11131083	352082432
25	Eisa2500	11939010	363213568
26	Eisa2600	11920680	375152640
27	Eisa2700	11777604	387073536
28	Eisa2800	9050530	398851328
29	Eisa2900	9445216	407901952
30	Eisa3000	6790307	417347328
31	Eisa3100	10454291	424137728
32	Eisa00001	444395	434592256
33	Eisa00002	260723	435036672
34	Eisa00003	260116	435297536
35	Eisa00004	239626	435557888
36	Eisa00005	224207	435797760
37	Eisa00006	176694	436022016
38	Eisa00007	169523	436198912
39	Eisa00008	152637	436368640
40	Eisa00009	148470	436521472
41	Eisa00010	133169	436669952
42	Eisa00011	125011	436803328
43	Eisa00012	116263	436928512
44	Eisa00013	104234	437044992
45	Eisa00014	90712	437149440
46	Eisa00015	81287	437240320
47	Eisa00016	76928	437321728
48	Eisa00017	76853	437398784
49	Eisa00018	73428	437475840
50	Eisa00019	71328	437549312
51	Eisa00020	70475	437620736
52	Eisa00021	68354	437691392
53	Eisa00022	61272	437760000
54	Eisa00023	59701	437821440
55	Eisa00024	57977	437881344
56	Eisa00025	55203	437939456
57	Eisa00026	53491	437994752
58	Eisa00027	53356	438048256
59	Eisa00028	53219	438101760
60	Eisa00029	52117	438155008
61	Eisa00030	51941	438207232
62	Eisa00031	51806	438259200
63	Eisa00032	48925	438311168
64	Eisa00033	48518	438360320
65	Eisa00034	46322	438408960
66	Eisa00035	45512	438455296
67	Eisa00036	44957	438500864
68	Eisa00037	43721	438545920
69	Eisa00038	39705	438589696
70	Eisa00039	39255	438629632
71	Eisa00040	38421	438669056
72	Eisa00041	37827	438707712
73	Eisa00042	37591	438745600
74	Eisa00043	37094	438783232
75	Eisa00044	36941	438820352
76	Eisa00045	36467	438857472
77	Eisa00046	36338	438894080
78	Eisa00047	35261	438930432
79	Eisa00048	34621	438965760
80	Eisa00049	34531	439000576
81	Eisa00050	33740	439035136
82	Eisa00051	32069	439068928
83	Eisa00052	31886	439101184
84	Eisa00053	31769	439133184
85	Eisa00054	31314	439165184
86	Eisa00055	30498	439196672
87	Eisa00056	29635	439227392
88	Eisa00057	29633	439257088
89	Eisa00058	28349	439286784
90	Eisa00059	27573	439315200
91	Eisa00060	26751	439342848
92	Eisa00061	26709	439369728
93	Eisa00062	26621	439396608
94	Eisa00063	24906	439423232
95	Eisa00064	24830	439448320
96	Eisa00065	24205	439473152
97	Eisa00066	24133	439497472
98	Eisa00067	23996	439521792
99	Eisa00068	23358	439545856
100	Eisa00069	22772	439569408
101	Eisa00070	22490	439592192
102	Eisa00071	21799	439614720
103	Eisa00072	21494	439636736
104	Eisa00073	21198	439658240
105	Eisa00074	21170	439679488
106	Eisa00075	20932	439700736
107	Eisa00076	20911	439721728
108	Eisa00077	20742	439742720
109	Eisa00078	20147	439763712
110	Eisa00079	19813	439783936
111	Eisa00080	19428	439803904
112	Eisa00081	19215	439823360
113	Eisa00082	18709	439842816
114	Eisa00083	18589	439861760
115	Eisa00084	18352	439880448
116	Eisa00085	17799	439898880
117	Eisa00086	17297	439916800
118	Eisa00087	17001	439934208
119	Eisa00088	15971	439951360
120	Eisa00089	15932	439967488
121	Eisa00090	15273	439983616
122	Eisa00091	15032	439998976
123	Eisa00092	14133	440014080
124	Eisa00093	14109	440028416
125	Eisa00094	13487	440042752
126	Eisa00095	11178	440056320
127	Eisa00096	11119	440067584
128	Eisa00097	9940	440078848
129	Eisa00098	9711	440088832
130	Eisa00099	8270	440098560
131	Eisa00100	8002	440107008
132	Eisa00101	7674	440115200
133	Eisa00102	6637	440122880
134	Eisa00103	5759	440129536
135	Eisa00104	5487	440135424
136	Eisa00105	4600	440141056
137	Eisa00106	3179	440145664
138	Eisa00107	2974	440148992
139	Eisa00108	2727	440152064
140	Eisa00109	2566	440154880
141	Eisa00110	2557	440157696
142	Eisa00111	2175	440160256
143	Eisa00112	2054	440162560
144	Eisa00113	1875	440164864
145	Eisa00114	1552	440166912
146	Eisa00115	1214	440168704
147	Eisa00116	1121	440169984
Started loading the genome: Sat Jun 13 13:36:54 2020

Genome: size given as a parameter = 440171264
SA: size given as a parameter = 3630805665
SAindex: size given as a parameter = 1
Read from SAindex: pGe.gSAindexNbases=13  nSAi=89478484
nGenome=440171264;  nSAbyte=3630805665
GstrandBit=32   SA number of indices=880195312
Shared memory is not used for genomes. Allocated a private copy of the genome.
Genome file size: 440171264 bytes; state: good=1 eof=0 fail=0 bad=0
Loading Genome ... done! state: good=1 eof=0 fail=0 bad=0; loaded 440171264 bytes
SA file size: 3630805665 bytes; state: good=1 eof=0 fail=0 bad=0
Loading SA ... done! state: good=1 eof=0 fail=0 bad=0; loaded 3630805665 bytes
Loading SAindex ... done: 391468491 bytes
Finished loading the genome: Sat Jun 13 13:36:56 2020

To accommodate alignIntronMax=450000 redefined winBinNbits=17
winBinNbits=17 > pGe.gChrBinNbits=8   redefining:
winBinNbits=8
To accommodate alignIntronMax=450000 and alignMatesGapMax=0, redefined winFlankNbins=1758 and winAnchorDistNbins=3516
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread0 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread0 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread1 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread1 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread2 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread2 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread3 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread3 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread4 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread4 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread5 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread5 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread6 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread6 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread7 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread7 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread8 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread8 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread9 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread9 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread10 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread10 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread11 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread11 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread12 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread12 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread13 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread13 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread14 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread14 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread15 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread15 ... ok
Created thread # 1
Created thread # 2
Created thread # 3
Created thread # 4
Created thread # 5
Created thread # 6
Starting to map file # 0
mate 1:   /mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R1.qf.fastq.gz
mate 2:   /mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R2.qf.fastq.gz
Created thread # 7
Created thread # 8
Created thread # 9
Created thread # 10
Created thread # 11
Created thread # 12
Created thread # 13
Created thread # 14
Created thread # 15
Thread #3 end of input stream, nextChar=-1
Completed: thread #14
Completed: thread #0
Completed: thread #12
Completed: thread #3
Completed: thread #10
Completed: thread #9
Completed: thread #5
Completed: thread #4
Completed: thread #7
Completed: thread #6
Completed: thread #8
Completed: thread #15
Completed: thread #2
Completed: thread #13
Completed: thread #11
Completed: thread #1
Joined thread # 1
Joined thread # 2
Joined thread # 3
Joined thread # 4
Joined thread # 5
Joined thread # 6
Joined thread # 7
Joined thread # 8
Joined thread # 9
Joined thread # 10
Joined thread # 11
Joined thread # 12
Joined thread # 13
Joined thread # 14
Joined thread # 15
Jun 13 17:17:36   Loaded database junctions from the 1st pass file: Eisa.RNA.intronTest4._STARpass1//SJ.out.tab: 127642 total junctions

Jun 13 17:17:37   Finished preparing junctions
Jun 13 17:17:37 ..... inserting junctions into the genome indices
Jun 13 17:17:46   Finished SA search: number of new junctions=127624, old junctions=0
Jun 13 17:18:02   Finished sorting SA indicesL nInd=51048132
Genome size with junctions=465823688  440171264   25652424
GstrandBit1=32   GstrandBit=32
Jun 13 17:18:17   Finished inserting junction indices
Jun 13 17:18:25   Finished SAi
Jun 13 17:18:25 ..... finished inserting junctions into genome

   Input read files for mate 1, from input string /mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R1.qf.fastq.gz
-rw-r--r-- 1 tk19812 bisc 5420609432 Apr 29 21:06 /mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R1.qf.fastq.gz

   readsCommandsFile:
exec > "Eisa.RNA.intronTest4._STARtmp/tmp.fifo.read1"
echo FILE 0
zcat      "/mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R1.qf.fastq.gz"


   Input read files for mate 2, from input string /mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R2.qf.fastq.gz
-rw-r--r-- 1 tk19812 bisc 5340939090 Apr 29 21:06 /mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R2.qf.fastq.gz

   readsCommandsFile:
exec > "Eisa.RNA.intronTest4._STARtmp/tmp.fifo.read2"
echo FILE 0
zcat      "/mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R2.qf.fastq.gz"

Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread0 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread0 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread1 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread1 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread2 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread2 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread3 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread3 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread4 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread4 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread5 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread5 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread6 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread6 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread7 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread7 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread8 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread8 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread9 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread9 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread10 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread10 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread11 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread11 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread12 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread12 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread13 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread13 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread14 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread14 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate1.thread15 ... ok
Opening the file: Eisa.RNA.intronTest4._STARtmp//FilterBySJoutFiles.mate2.thread15 ... ok
Created thread # 1
Created thread # 2
Created thread # 3
Starting to map file # 0
mate 1:   /mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R1.qf.fastq.gz
mate 2:   /mnt/storage/home/tk19812/scratch/HeliconiniiProject/RNAseq/Eisa/Eisa_2.R2.qf.fastq.gz
Created thread # 4
Created thread # 5
Created thread # 6
Created thread # 7
Created thread # 8
Created thread # 9
Created thread # 10
Created thread # 11
Created thread # 12
Created thread # 13
Created thread # 14
Created thread # 15
BAM sorting: 176289 mapped reads
BAM sorting bins genomic start loci:
1	0	12957030
2	1	2834080
3	1	9210209
4	2	3466782
5	2	8939856
6	2	13631560
7	3	7137410
8	4	4030022
9	4	14161639
10	5	5315111
11	5	12195333
12	6	7749257
13	6	14183809
14	7	3851070
15	7	9609159
16	7	13129468
17	8	5584680
18	8	13889558
19	9	11171425
20	10	3293041
21	10	8983057
22	10	13000550
23	11	6228525
24	11	6995085
25	12	2257938
26	12	7768565
27	13	663303
28	13	12886094
29	14	8544704
30	14	15127468
31	15	7290329
32	16	2680820
33	16	9679259
34	17	5482949
35	17	11954754
36	18	10252010
37	19	5994864
38	20	4905555
39	20	12473866
40	21	614294
41	22	6332048
42	23	8946910
43	24	10154981
44	26	3221694
45	27	6154948
46	27	6226975
47	28	7287708
48	30	5103589
Thread #0 end of input stream, nextChar=-1
Completed: thread #5
Completed: thread #2
Completed: thread #14
Completed: thread #0
Completed: thread #11
Completed: thread #4
Completed: thread #15
Completed: thread #12
Completed: thread #1
Joined thread # 1
Joined thread # 2
Completed: thread #7
Completed: thread #8
Completed: thread #9
Completed: thread #6
Completed: thread #3
Joined thread # 3
Joined thread # 4
Joined thread # 5
Joined thread # 6
Joined thread # 7
Joined thread # 8
Joined thread # 9
Completed: thread #10
Joined thread # 10
Joined thread # 11
Joined thread # 12
Completed: thread #13
Joined thread # 13
Joined thread # 14
Joined thread # 15
Completed stage 1 mapping of outFilterBySJout mapping
Detected 1602 novel junctions that passed filtering, will proceed to filter reads that contained unannotated junctions
Created thread # 1
Created thread # 2
Created thread # 3
Created thread # 4
Created thread # 5
Created thread # 6
Created thread # 7
Created thread # 8
Created thread # 9
Created thread # 10
Created thread # 11
Created thread # 12
Created thread # 13
Created thread # 14
Created thread # 15
Completed: thread #0
Completed: thread #5
Completed: thread #9
Completed: thread #11
Completed: thread #12
Completed: thread #7
Completed: thread #2
Completed: thread #14
Completed: thread #15
Completed: thread #8
Completed: thread #1
Joined thread # 1
Joined thread # 2
Completed: thread #4
Completed: thread #10
Completed: thread #6
Completed: thread #13
Completed: thread #3
Joined thread # 3
Joined thread # 4
Joined thread # 5
Joined thread # 6
Joined thread # 7
Joined thread # 8
Joined thread # 9
Joined thread # 10
Joined thread # 11
Joined thread # 12
Joined thread # 13
Joined thread # 14
Joined thread # 15
Jun 13 21:28:00 ..... started sorting BAM
Max memory needed for sorting = 1219146012
ALL DONE!

@alexdobin
Owner

Hi Francesco,

great, thanks!

I think I understand now what happened. I need to make it clear in the manual.
If you do not specify --alignIntronMax it does not default to a constant value, but rather is - for most alignments - defined by the "alignment window" size 2^(min(winBinNbits,genomeChrBinNbits))*winAnchorDistNbins
By default, winBinNbits=16, genomeChrBinNbits=18 and winAnchorDistNbins=9
so alignment window = 589824. However, you used genomeChrBinNbits=8, so alignment window=2304, and you see almost no junctions longer than that. You need to specify --alignIntronMax to override this small alignment window size.
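
The arithmetic above can be checked with a few lines of shell. This is just the formula from this comment evaluated for the values mentioned in the thread; the function name `alignment_window` is invented for the example and does not exist in STAR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: evaluates
#   2^(min(winBinNbits, genomeChrBinNbits)) * winAnchorDistNbins
# Usage: alignment_window WIN_BIN_NBITS GENOME_CHR_BIN_NBITS WIN_ANCHOR_DIST_NBINS
alignment_window() {
    local win_bin_nbits=$1
    local chr_bin_nbits=$2
    local anchor_dist_nbins=$3
    local nbits=$win_bin_nbits
    # Take the smaller of the two bit counts.
    if [ "$chr_bin_nbits" -lt "$nbits" ]; then
        nbits=$chr_bin_nbits
    fi
    echo $(( (1 << nbits) * anchor_dist_nbins ))
}

alignment_window 16 18 9   # defaults: prints 589824
alignment_window 16 8 9    # genomeChrBinNbits=8: prints 2304
```

The second call reproduces the small window that results from generating the genome with --genomeChrBinNbits 8 and leaving --alignIntronMax unset.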

The actual value of --alignIntronMax for a particular species is determined by what your expected maximum intron size is. This expectation can come from the annotated junctions - for instance, in human, there are annotated junctions with gaps ~1M bases, so I recommend --alignIntronMax 1000000. For smaller genomes the max gaps are usually smaller, so you would need to choose this value accordingly. Of course, you can always filter the gaps that are too long after mapping, so using a large value is OK.
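
A minimal sketch of filtering overly long gaps after mapping, applied to the junction table rather than to the alignments. The function name, the 100000 cutoff and the demo junctions are assumptions chosen for illustration; pick the cutoff from your own expected maximum intron size.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of STAR.
# filter_sj_by_gap FILE MAX_GAP: prints the junctions whose gap
# (column 3 minus column 2 of SJ.out.tab) is at most MAX_GAP.
filter_sj_by_gap() {
    awk -v max_gap="$2" '$3 - $2 <= max_gap' "$1"
}

# Made-up junctions in SJ.out.tab layout (gaps of 500, 25000 and 425000).
demo_sj=$(mktemp)
trap 'rm -f "$demo_sj"' EXIT
printf 'chr1\t1000\t1500\t1\t1\t0\t12\t0\t40\n'  >  "$demo_sj"
printf 'chr1\t2000\t27000\t1\t1\t0\t9\t1\t35\n'  >> "$demo_sj"
printf 'chr2\t5000\t430000\t2\t2\t0\t1\t0\t12\n' >> "$demo_sj"

# Keeps the first two junctions and drops the 425 kb one.
filter_sj_by_gap "$demo_sj" 100000
```

Reads in the BAM file that span the removed junctions are not touched by this; filtering those would need a separate step.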

Cheers
Alex

@francicco
Author

Great! Thanks a lot Alex
Always helpful
F
