Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can DNA alignment traceback result comply with HGVS 3' rule? #104

Open
GooJan opened this issue Nov 15, 2023 · 7 comments
Open

Can DNA alignment traceback result comply with HGVS 3' rule? #104

GooJan opened this issue Nov 15, 2023 · 7 comments

Comments

@GooJan
Copy link

GooJan commented Nov 15, 2023

Hi, jeffdaily, thanks for your great work!
Let's start with an exmple. When I do a global alignment for these two sequences:

seq1: CAGGACGAGGAGCAGAGGACAAGGAGGAT
seq2: CAGGACGAGGAGCAGAGGCTTAAGGAGGAGGAAGAAGACAAGAAACGCAAAGAGGAGGAGGAGGCAGAGGACAAGGAGGAT

The traceback result is:

[Alignment 1]
seq1 1  CAGGACGAGG A--------- ---------- ---------- ---------- ---------- ---GCAGAGG ACAAGGAGGA T  29
        |||||||||| |********* ********** ********** ********** ********** ***||||||| |||||||||| |    
seq2 1  CAGGACGAGG AGCAGAGGCT TAAGGAGGAG GAAGAAGACA AGAAACGCAA AGAGGAGGAG GAGGCAGAGG ACAAGGAGGA T  81

But, for my scenarios, the prefering reslult is:

[Alignment 2]
seq1 1  CAGGACGAGG AGCAGAGG-- ---------- ---------- ---------- ---------- ---------- ACAAGGAGGA T  29
        |||||||||| ||||||||** ********** ********** ********** ********** ********** |||||||||| |    
seq2 1  CAGGACGAGG AGCAGAGGCT TAAGGAGGAG GAAGAAGACA AGAAACGCAA AGAGGAGGAG GAGGCAGAGG ACAAGGAGGA T  81

For DNA variant naming, [Alignment 2] but not [Alignment 1] is complied with HGVS 3' rule (https://varnomen.hgvs.org/recommendations/general/)
According to my understanding, when there are same score alignment end points, parasail will choose the most left one (I want this), but for traceback this most left alignment, parasail will output the most right one (I don't want this).
Can I get a traceback option to output the most left side traceback of the most left side alignment?

@GooJan
Copy link
Author

GooJan commented Nov 15, 2023

I guess that outputing the most right side traceback is the consequence of internal traceback strategy.
At now, I have try some tricks to get the most left side alignment.
That is: Antiparallele seq1 and seq2, then do alignment, and then antiparallele the alignment reslult.
By this effort, I can get [Alignment 2].
But, this trick rely on another alignment option: When there are same score alignment end points, choose the most right one.
Can I get this option?

@jeffdaily
Copy link
Owner

Which alignment function are you using? local, global, semi-global?

When there are same score alignment end points, choose the most right one.

There was a recent request for allowing all same-score alignments to be returned. Would that satisfy your request, as well?

@GooJan
Copy link
Author

GooJan commented Nov 16, 2023

parasail.sg_qx_trace(seq1, seq2, open=6, extend=1, matrix=parasail.Matrix('nuc44')) output [Alignment 1],
parasail.nw_trace(seq1, seq2, open=6, extend=1, matrix=parasail.Matrix('nuc44')) also output [Alignment 1].
parasail.sw_trace(seq1, seq2, open=6, extend=1, matrix=parasail.Matrix('nuc44')) output in this manner, too.

@GooJan
Copy link
Author

GooJan commented Nov 16, 2023

I noticed the recent request for allowing all same-score alignments to be returned. I don't need this feature, partially becase it maybe hard to choose the most left-side traceback from these returns.
Although many many align tools output the most right-side traceback like parasail do (maybe consequence of DP matrix traceback strategy), it's the most left-side traceback which show better biologic significance.
Let me explain more about this:
For years, naming of a gene sequence change is varying. This is partially due to same score alignments exists.
If we choose the most right-side traceback like many align tools do, the variant point will goes up to 5' side of DNA (see [Alignment 1] above). If we choose the most left-side traceback, the variant point will goes down to 3' side of DNA (see [Alignment 2] above). Since mRNA transcript direction is from 5' to 3', the most left-side traceback is more natually in really biologic world. This is what gene variant naming committee likes HGVS recommended. They publish rules to unify naming of a gene sequence change. To achieve this goal, DNA align tool which output the most left-side traceback helps great.

@GooJan
Copy link
Author

GooJan commented Nov 16, 2023

In a word, at now, parasail output the most right-side traceback of the most left-side alignment.
My propersal is any of the following:
(1) output the most left-side traceback of the most left-side alignment.
(2) output the most right-side traceback of the most right-side alignment. And it is the user's duty to convert the traceback result to the most left-side traceback of the most left-side alignment.
I think, propersal (2) is easier to implement :)

@GooJan
Copy link
Author

GooJan commented Nov 16, 2023

1
2

@palakpsheth
Copy link

We also would find this feature very useful to output either left most or right most alignment options

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants