Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Variant.__repr__ #2694

Closed
jeromekelleher opened this issue Jan 14, 2023 · 7 comments · Fixed by #2695
Closed

Implement Variant.__repr__ #2694

jeromekelleher opened this issue Jan 14, 2023 · 7 comments · Fixed by #2695
Labels
enhancement New feature or request good first issue Good for newcomers Python API Issue is about the Python API

Comments

@jeromekelleher
Copy link
Member

The __str__ implementation of Variant looks nice but is quite unhelpful when you want to quickly inspect the actual values. We should implement a simple __repr__ that gives a string version of the raw values.

@jeromekelleher jeromekelleher added enhancement New feature or request good first issue Good for newcomers Python API Issue is about the Python API labels Jan 14, 2023
@charandeepsinghb
Copy link

I want to work on this issue. Please give me one chance. I have understood where str is used in Variant.

I am very keen to work in open source.
I want to learn and continue to work on tskit.
I'd really appreciate any kind of response to this comment.

Thank you

@benjeffery
Copy link
Member

Hi @charandeepsinghb and thanks for your interest in tskit!

The aim is to add a __repr__ function to Variant that shows the genotypes. Something like:
return ','.join(self.alleles[gt] for gt in self.genotypes)

There is a development guide here: https://tskit.dev/tskit/docs/latest/development.html

@jeromekelleher
Copy link
Member Author

Thanks for your interest in contributing to tskit @charandeepsinghb ❤️

I think we want something simpler than that @benjeffery, just something that shows the raw data held in the variant object literally, like say:

def __repr__(self):
     d = {
         "isolated_as_missing": self.isolated_as_missing,
         "site": self.site,
         "samples": self.samples,
         "alleles": self.alleles,
         "genotypes": self.genotypes,
    } 
    return f"Variant({repr(d)})"

I think this will do it? Hopefully this will give enough information for debugging without spewing out megabytes of text when the arrays are big, since:

>>> import numpy as np
>>> a = np.zeros(10000)
>>> a
array([0., 0., 0., ..., 0., 0., 0.])
>>> repr(a)
'array([0., 0., 0., ..., 0., 0., 0.])'

The Python docs for repr say:

Return a string containing a printable representation of an object. For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval(), otherwise the representation is a string enclosed in angle brackets that contains the name of the type of the object together with additional information often including the name and address of the object. A class can control what this function returns for its instances by defining a repr() method.

So I think this is roughly OK?

@charandeepsinghb
Copy link

I'm really am sorry to say that I couldn't setup the project in the time available to me, maybe, also I can't set it up in my current system (non root environment to install and setup conda and stuff). I'm really sorry to disappoint you guys. It really meant a lot to me to hear anything from you. I'll for sure try to set it up in future and make some development if it will be possible. For now I can't work on this. although you explained the issue pretty well to me and I understand that how one would need to add a repr for quick inspection while debugging (because you guys explained so well). Thank you for your support.

@jeromekelleher
Copy link
Member Author

No problem @charandeepsinghb, totally understood. Thanks for your interest anyway!

@chriscrsmith
Copy link
Contributor

I'd like to work on this issue!

@chriscrsmith
Copy link
Contributor

chriscrsmith commented Jan 16, 2023

Opened PR #2695

I noticed if a site has experienced many mutations the repr() output is still quite large. Do you think leave it as is, or do something to abbreviate the number of mutations?

@mergify mergify bot closed this as completed in #2695 Jan 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers Python API Issue is about the Python API
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants