
Variant method to return actual states #739

Open
hyanwong opened this issue Oct 14, 2022 · 4 comments

Comments

@hyanwong
Member

hyanwong commented Oct 14, 2022

@szhan and others are finding it pretty inconvenient to deal with mismatches between the underlying integer encodings in the genotypes arrays returned by sample_data.variants() and ts.variants(). How about providing a method on the two variant classes that returns the genotypes decoded into a numpy array of allele strings? It would be inefficient for large-scale work, but I think it could prevent many errors in smaller-scale testing, etc. Something like the following would probably work for SampleData instances, and an equivalent function could be created for tskit variants. Making it a method rather than a property would hopefully make it clear to the user that a potentially inefficient calculation is going on under the hood.

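```python
import attr
import numpy as np


@attr.s
class Variant:
    """
    A single variant. Mirrors the definition in tskit.
    """

    site = attr.ib()
    genotypes = attr.ib()
    alleles = attr.ib()

    def genotypes_as_strings(self):
        """
        Returns the genotypes at this site as an array of allele strings. Note,
        however, that it is much more efficient to work with the underlying integer
        representation as returned by the ``.genotypes`` property.
        """
        # Use each integer genotype as an index into the allele list. This assumes
        # no missing data: a genotype of -1 would silently pick the last allele.
        return np.array(self.alleles)[self.genotypes]
```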
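To illustrate why comparing the raw integers trips people up, here is a minimal, self-contained sketch using only numpy. The helper name `decode` and the example data are hypothetical and not taken from either library, and it assumes no missing data. The same site is encoded with two different allele orderings, so the integer arrays disagree even though the underlying states are identical.

```python
import numpy as np


def decode(genotypes, alleles):
    """Hypothetical helper: map integer genotypes to allele strings (no missing data)."""
    return np.asarray(alleles)[np.asarray(genotypes)]


# The same four sampled states at one site, encoded with two different allele orders.
genotypes_1, alleles_1 = np.array([0, 1, 1, 0]), ["G", "A"]
genotypes_2, alleles_2 = np.array([1, 0, 0, 1]), ["A", "G"]

# Comparing the raw integers reports a spurious mismatch...
print(np.array_equal(genotypes_1, genotypes_2))  # False
# ...whereas comparing the decoded allele strings shows that the data agree.
print(np.array_equal(decode(genotypes_1, alleles_1), decode(genotypes_2, alleles_2)))  # True
```

Decoding to strings costs an extra array allocation per site, which is why it suits tests and small-scale checks rather than bulk processing.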
@jeromekelleher
Member

This is useful, all right; we use something like this in a bunch of places.

@hyanwong
Member Author

Adding this to 0.3.1, as it is a trivial but useful addition.

hyanwong added this to the Release 0.3.1 milestone Oct 26, 2022
hyanwong added 7 commits to hyanwong/tskit that referenced this issue (Oct 30, 2022 to Jan 7, 2023)
@benjeffery
Member

+1 on this as I got very confused writing the sgkit ancestral allele tests.

@hyanwong
Member Author

Over in tskit-dev/tskit#2617 @jeromekelleher suggested we call this method .states()
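Following the suggested `.states()` name, a standalone sketch might also deal with missing data, which the snippet at the top of the issue does not. This assumes missing genotypes are encoded as -1 (as in tskit); the `missing_data_string` parameter and its default of "N" are hypothetical choices for illustration, not a description of what tskit or tsinfer actually implemented.

```python
import numpy as np

# Assumption: missing genotypes are encoded as -1, as tskit does.
MISSING_DATA = -1


def states(genotypes, alleles, missing_data_string="N"):
    """
    Hypothetical sketch: return the allelic states at a site as a numpy array,
    with missing genotypes replaced by ``missing_data_string``.
    """
    # Integer indices, even for an empty input (which would otherwise be float).
    genotypes = np.asarray(genotypes, dtype=np.int64)
    # Append the placeholder so it has its own slot at index len(alleles).
    lookup = np.array(list(alleles) + [missing_data_string])
    # Redirect missing genotypes (-1) to that slot instead of wrapping around
    # to the last allele, then decode everything in one indexing step.
    index = np.where(genotypes == MISSING_DATA, len(alleles), genotypes)
    return lookup[index]


print(states([0, -1, 1, 0], ["G", "A"]))  # ['G' 'N' 'A' 'G']
```

Appending the placeholder to the lookup table and remapping -1 to its index keeps the decode as a single vectorised indexing step, rather than decoding first and patching missing entries afterwards.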
