Added extras/scripts/calcUMIperCell.awk: a script to calculate total number of UMIs per cell and filtering status.
alexdobin committed Mar 1, 2021
1 parent 3ae0966 commit 7707e77
Showing 2 changed files with 28 additions and 2 deletions.
3 changes: 1 addition & 2 deletions docs/STARsolo.md
@@ -199,7 +199,6 @@ CellRanger 3.0.0 use advanced filtering based on the EmptyDrop algorithm develop
```
It can be followed by 10 numeric parameters: nExpectedCells (3000), maxPercentile (0.99), maxMinRatio (10), indMin (45000), indMax (90000), umiMin (500), umiMinFracMedian (0.01), candMaxN (20000), FDR (0.01), simN (10000).


#### Cell filtering of previously generated raw matrix
It is possible to run only the filtering algorithm (without the need to re-map) inputting the previously generated **raw** matrix:
```
@@ -259,7 +258,7 @@ In case of BAM files, use ```samtools view``` command to convert to BAM:
```
--readFilesCommand samtools view -F 0x100
```
-The file should contain one line for each read. For previously mapped file it can be achieved by filtering out non-primary alignments as show above. Note that unmapped reads have to be included in the file to be remapped.
+The file should contain one line for each read. For previously mapped file it can be achieved by filtering out non-primary alignments as shown above. Note that unmapped reads have to be included in the file to be remapped.
We need to specify which SAM attributes correspond to seqeunces/qualities of cell barcodes (CR/CY) and UMIs (UR/UY):
```
--soloInputSAMattrBarcodeSeq CR UR --soloInputSAMattrBarcodeQual CY UY
27 changes: 27 additions & 0 deletions extras/scripts/calcUMIperCell.awk
@@ -0,0 +1,27 @@
# usage: awk -f calcUMIperCell.awk raw/matrix.mtx raw/barcodes.tsv filtered/barcodes.tsv | sort -k1,1rn > UMIperCell.txt
# output: column1 = total UMIs per cell
# column2 = 1 for cell that passed filtering, 0 otherwise

BEGIN {
    OFS="\t";
}

{
    if (ARGIND==1) {
        if (FNR<4)
            next; #skip header

        umiCount[$2]+=$3;

    } else if (ARGIND==2) {
        rawCB[$1]=FNR;
    } else if (ARGIND==3) {
        filtCB[rawCB[$1]]=FNR;
    }

}

END {
    for (ii in umiCount)
        print umiCount[ii], (ii in filtCB);
}
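
Note that `ARGIND` is a GNU awk (gawk) extension, so the script above needs gawk rather than a strictly POSIX awk. For readers who want to check the logic without gawk, here is a minimal Python sketch of the same three-pass idea. It is not part of this commit: the function name `umi_per_cell` and the in-memory line lists are illustrative assumptions, and it mirrors the script's assumption that `matrix.mtx` starts with three header lines and stores the barcode index in column 2.

```python
from collections import defaultdict


def umi_per_cell(matrix_lines, raw_barcodes, filtered_barcodes):
    """Hypothetical helper mirroring calcUMIperCell.awk.

    Returns (total_umis, passed_filter) tuples sorted by total, descending.
    """
    # Pass 1: sum the counts (column 3) per barcode index (column 2),
    # skipping the first three lines like the awk script's FNR<4 check.
    umi_count = defaultdict(float)
    for line_no, line in enumerate(matrix_lines, start=1):
        if line_no < 4:
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore blank or malformed lines
        umi_count[int(fields[1])] += float(fields[2])

    # Pass 2: a raw barcode's 1-based line number is its matrix column index.
    raw_index = {}
    for line_no, line in enumerate(raw_barcodes, start=1):
        fields = line.split()
        if fields:
            raw_index[fields[0]] = line_no

    # Pass 3: collect the raw indices of barcodes that passed filtering.
    passed = set()
    for line in filtered_barcodes:
        fields = line.split()
        if fields and fields[0] in raw_index:
            passed.add(raw_index[fields[0]])

    rows = [(total, int(cell in passed)) for cell, total in umi_count.items()]
    rows.sort(key=lambda row: row[0], reverse=True)  # like sort -k1,1rn
    return rows


if __name__ == "__main__":
    # Tiny made-up example: 3 genes, 3 barcodes, 4 non-zero entries.
    demo_matrix = [
        "%%MatrixMarket matrix coordinate integer general",
        "%",
        "3 3 4",
        "1 1 2",
        "2 1 5",
        "1 2 3",
        "3 3 1",
    ]
    for total, status in umi_per_cell(
        demo_matrix, ["AAAC", "CCCG", "GGGT"], ["AAAC", "GGGT"]
    ):
        print(f"{total:g}\t{status}")
```

As in the awk script, barcodes with no entries in the matrix produce no output row, and the sort replaces the `sort -k1,1rn` step from the usage line.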
