Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add distopia! #3914

Merged
merged 50 commits into from
Feb 15, 2023
Merged

Add distopia! #3914

merged 50 commits into from
Feb 15, 2023

Conversation

hmacdope
Copy link
Member

@hmacdope hmacdope commented Nov 10, 2022

Fixes #3783

Following bump of conda-forge feedstock of distopia to 0.2.0, including the inplace API we can now integrate distopia into MDAnalysis.

  • Note that currently only calc-bonds is implemented.
  • Note that a kludge is required to get the input array to the right datatype.

Changes made in this Pull Request:

  • Adds distopia to MDA distances library,

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@hmacdope hmacdope self-assigned this Nov 10, 2022
@hmacdope hmacdope added the CZI-performance performance track of CZIEOSS4 grant label Nov 10, 2022
@codecov
Copy link

codecov bot commented Nov 10, 2022

Codecov Report

Base: 93.52% // Head: 93.53% // Increases project coverage by +0.00% 🎉

Coverage data is based on head (04bb71b) compared to base (94904b5).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #3914   +/-   ##
========================================
  Coverage    93.52%   93.53%           
========================================
  Files          190      191    +1     
  Lines        25037    25062   +25     
  Branches      3543     3547    +4     
========================================
+ Hits         23417    23442   +25     
  Misses        1099     1099           
  Partials       521      521           
Impacted Files Coverage Δ
package/MDAnalysis/lib/_distopia.py 100.00% <100.00%> (ø)
package/MDAnalysis/lib/distances.py 97.68% <100.00%> (+0.06%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@orbeckst
Copy link
Member

Do we have ASV benchmarks that could show the before/after?

@hmacdope
Copy link
Member Author

Do we have ASV benchmarks that could show the before/after?

I'll run them today. :)

@orbeckst
Copy link
Member

I was more thinking about having a benchmark for calc_bonds and friends added to benchmarks in a separate PR. This this PR can extend them.

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is in the works since it's WIP, but since discussion is happening:

Needs:

  • A CI runner (can just use extra conda deps argument to add a py3.10 runner with distopia)
    • My view is that we don't need to do a full sweep here, distopia already tests relevant OS/Python versions, so it shouldn't be needed here.
  • Docs

.github/actions/setup-deps/action.yaml Show resolved Hide resolved
azure-pipelines.yml Outdated Show resolved Hide resolved
maintainer/conda/environment.yml Outdated Show resolved Hide resolved
@hmacdope
Copy link
Member Author

I was more thinking about having a benchmark for calc_bonds and friends added to benchmarks in a separate PR. This this PR can extend them.

@orbeckst there is a benchmark for calc_bonds, but only handles the no-box case. I think there is an open PR to add PBC based distance benchmarks in #3475.

@hmacdope
Copy link
Member Author

See discussion on the distopia issue about whether it should be drop in replacement or selectable backend.

@hmacdope
Copy link
Member Author

@richardjgowers, @orbeckst I have run some timings comparing distopia, MDA bindings to distopia and also to MDTraj/freud.

First thing to clear up is Richard and I had a discussion about checking the overhead of the _run() backend selector function and it had almost no effect from my benchmarks, so the rest of the benchmarks were run this way.

TLDR of this whole thing:

  1. Our C++ layer is inherently slow probably due to problem 2
  2. Our use of mixed precision and various casting that needs to take place makes everything sluggish.

Here is a comparison showing the time required to calculate N distances (x axis), fully complying to our API which takes float coordinates and returns double. As distopia doesn't allow mixed precision, this requires we cast the input datatype to float32 and then upcast to the result to float64 with something that looks like this

                    # need explicit branch on backend to prepare input types correctly
                    # must assign to result to get correct reference on memview
                    bondlengths = _run("calc_bonds_ortho_float",
                         args=(coords1, coords2, box[:3]),
                         kwargs={'results': bondlengths.astype(np.float32)},
                         backend=backend)
                    # upcast is currently required
                    bondlengths = bondlengths.astype(np.float64)

compare_dists-incast-outcast

we can see that the MDA distopia bindings (purple) are far off its theoretical peak (blue) primarily due to the input and output casting required.

What happens if we remove the output cast which is an API break for MDA (guarantees double output)

compare_dists-incast-only

We can see that the performance is a lot better.

While we can remove the input cast due to distopia requiring consistent precision, we can subtract its approximate overhead
compare_dists-simulated_zerocast

With overhead of casting approximately removed (brown line) we are very close to the distopia-only maximum speed.

Hopefully this all makes sense.

My main proposal is we remove the guarantee that results are returned as doubles in 3.0. If people are amenable to this we can continue to dicsuss in a new issue?

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Looking forward to going single precision in 3.0 for these reasons

@hmacdope
Copy link
Member Author

LGTM. Looking forward to going single precision in 3.0 for these reasons

Issue raised #3927

@hmacdope hmacdope requested a review from IAlibay January 8, 2023 01:47
@hmacdope
Copy link
Member Author

hmacdope commented Jan 8, 2023

Hang on some issues with passing through args and returning the right thing

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments.

Btw, we don't have to follow the black part of darken, it's more of an aid than anything else.

package/MDAnalysis/lib/distances.py Outdated Show resolved Hide resolved
package/MDAnalysis/lib/distances.py Outdated Show resolved Hide resolved
package/MDAnalysis/lib/distances.py Outdated Show resolved Hide resolved
@hmacdope hmacdope requested a review from IAlibay January 16, 2023 01:57
@hmacdope
Copy link
Member Author

I think this should be good for re-review

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aside from the one thing that I'm not sure if it got resolved - lgtm! (sorry about the delay here)

@hmacdope hmacdope merged commit d27a32a into MDAnalysis:develop Feb 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component-lib CZI-performance performance track of CZIEOSS4 grant enhancement
Projects
Development

Successfully merging this pull request may close these issues.

Integrate the result of the Distopia project into MDAnalysis
4 participants