Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add unweighted_rf_distance (symmetric_distance) function and tests #2643

Merged
merged 1 commit into from
Sep 24, 2024

Conversation

Billyzhang1229
Copy link
Contributor

Add Unweighted Robinson–Foulds Distance (symmetric difference of two trees) function with tests

See also:

@codecov
Copy link

codecov bot commented Nov 21, 2022

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.96%. Comparing base (8a958e9) to head (3a11d75).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2643      +/-   ##
==========================================
- Coverage   89.72%   86.96%   -2.77%     
==========================================
  Files          29       11      -18     
  Lines       31581    24358    -7223     
  Branches     6118     4501    -1617     
==========================================
- Hits        28336    21182    -7154     
+ Misses       1853     1817      -36     
+ Partials     1392     1359      -33     
Flag Coverage Δ
c-tests 86.55% <ø> (ø)
lwt-tests 80.78% <ø> (ø)
python-c-tests 88.98% <ø> (ø)
python-tests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

see 18 files with indirect coverage changes

@benjeffery
Copy link
Member

benjeffery commented Jan 23, 2023

@Billyzhang1229 The docs error here is fixed in main you just need to rebase. I can do that if needed - just ask.

@Billyzhang1229
Copy link
Contributor Author

@Billyzhang1229 The docs error here is fixed in main you just need to rebase. I can do that if needed - just ask.

@benjeffery Thank you so much! I think I could do the rebase part. But I have question about the coverage report, the report shows some decrease for the files that I didn't change, do I need to worry about it?

@benjeffery
Copy link
Member

@Billyzhang1229 The docs error here is fixed in main you just need to rebase. I can do that if needed - just ask.

@benjeffery Thank you so much! I think I could do the rebase part. But I have question about the coverage report, the report shows some decrease for the files that I didn't change, do I need to worry about it?

codecov has been pretty unreliable in reporting numbers. The comments on the PR are usually good. I'll double check this before merge, so don't worry about it.

@Billyzhang1229 Billyzhang1229 marked this pull request as ready for review January 23, 2023 22:53
@jeromekelleher
Copy link
Member

This looks great @Billyzhang1229, thanks!

I think the main thing we need here is a bit more testing to reassure ourselves that what we're computing is the same as what'd get using existing methods. How about a small simulation based test like:

@pytest.mark.parametrize("n", [2, 3, 5, 10, 20])
def test_rf_distance_against_dendropy(n):
      trees = []
      for seed in [42, 43]:
           ts = msprime.sim_ancestry(n, ploidy=1, random_seed=seed)
           trees.append(ts.first())
       rf1 = trees[0].rf_distance(trees[1])
       rf2 = dendropy_rf_distance(*trees)
       assert rf1 == rf2

Other than that, we just need to think about the naming and be a bit more precise about the documentation - but I think myself and @hyanwong can help with that. @hyanwong - thoughts about function naming here?

@Billyzhang1229
Copy link
Contributor Author

@jeromekelleher Thanks for the feedback! I agree that additional testing would be beneficial to ensure the accuracy of our computations.

Naming

I am currently using symmetric_distance() and unweighted_rf_distance() for function names, the proper name should be unweighted_rf_distance. The term symmetric_distance is a made-up term used to represent the symmetric difference between trees.

I'm not sure which one is better for naming. So I just wrapped the symmetric_distance() with unweighted_rf_distance().

def unweighted_rf_distance(self, other):
    return self.symmetric_distance(other)

In the future, we may also want to consider adding a weighted_rf_distance() function or simply rf_distance()?


If there is anything else you would like me to add or change, please let me know.

@jeromekelleher
Copy link
Member

I think rf_distance is a good because it aligns with our existing Tree.kc_distance method. We can document that it's the unweighted distance, and if we like later can allow for the weighted version via a weighted argument.

I don't think symmetric_distance as a synonym helps much and I would drop it, as people are familiar with RF distance.

We will need to be quite precise about what we're computing here in the documentation, but I can do that in a follow-on commit.

@hyanwong
Copy link
Member

I agree with Jerome. rf_distance allows us to implement rf_distance(weighted=True) later. Everyone calls it the RF distance. We can mention the term "symmetric_distance" in the docstring to make it searchable.

@hyanwong
Copy link
Member

In fact, if we want to, we could allow the weighted parameter and simply raise a NotImplementedError if it's True, simply to give us a reminder to implement it every time we use it! But that's probably an abuse of the API :)

@jeromekelleher
Copy link
Member

In fact, if we want to, we could allow the weighted parameter and simply raise a NotImplementedError if it's True, simply to give us a reminder to implement it every time we use it! But that's probably an abuse of the API :)

I don't see much point in this - it's forward compatible by not having the argument and we don't need it for this PR.

@hyanwong
Copy link
Member

Sorry that we haven't got around to merging this yet @Billyzhang1229. It would be great to get this in, and I see you've added extra tests. Do you know how this performs in the case of trees with polytomies? There's an interesting publication at https://onlinelibrary.wiley.com/doi/full/10.1111/cla.12540 that sounds like it might give some useful pointers in this case.

@benjeffery benjeffery self-assigned this Sep 23, 2024
@benjeffery
Copy link
Member

@mergify rebase

Copy link
Contributor

mergify bot commented Sep 23, 2024

rebase

✅ Branch has been successfully rebased

@benjeffery benjeffery added the AUTOMERGE-REQUESTED Ask Mergify to merge this PR label Sep 24, 2024
@benjeffery
Copy link
Member

@Mergifyio rebase

Add more tests using other existing package
Copy link
Contributor

mergify bot commented Sep 24, 2024

rebase

✅ Branch has been successfully rebased

@benjeffery benjeffery merged commit 860a9c5 into tskit-dev:main Sep 24, 2024
12 of 13 checks passed
@mergify mergify bot removed the AUTOMERGE-REQUESTED Ask Mergify to merge this PR label Sep 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants