-
Notifications
You must be signed in to change notification settings - Fork 647
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use of compounds involves excessive sorting and fancy-indexing of AtomGroups and compound indices. #3000
Comments
@zemanj, I see from the discussion in #2376 that you already have a headstart in unwrap optimization. What do you think of how this refactor could impact it? Could it be a base from where to merge your optimization work? |
So, I went ahead and implemented most of what I put forth above in the draft PR #3005. Besides a cleaner code, performance is improved. I timed the Legend:
The
Only in one case was there a slowdown, and the small 2% difference is likely not significant. All others were speedups, with a maximum uncached speedup of 2.66-fold and cached of 7.47-fold. I'm continuing this discussion in the draft PR #3005. Pointers are still needed, especially on cache management. |
Hi,
I realized that there is potential for optimization of most
AtomGroup
methods that implement compounds. Currently, compounds are implemented by cleverly splitting into compound groups of the same size, to allow vectorization at least on a per-size basis. This was a great idea! (git blame
tells me kudos go to @kain88-de and @richardjgowers).Current improvable aspects:
compound_indices
are always being sorted, to be robust to cases where compounds in anAtomGroup
are not contiguous. This is slow and unneeded for the most common use case of contiguous compounds. A contiguity check is cheap (this was already implemented forAtomGroup.center
only, in the resolution of AtomGroup.center fails for compounds+wrapping #2984 and Fixes AtomGroup.center when using compounds + unwrapping #2992);AtomGroup
and could be cached. This would positively impact a number of methods, especially when iterating over frames (with due care to invalidate such caches if the user changes theAtomGroup
's compound indices). Even within a single calculation this can already be impactful: when usingAtomGroup.center
withunwrap=True
indices are split once to do the unwrapping and then again to do the center calculation.I'd like to work on such refactoring, including some benchmarking for performance, but I need input so I don't set off on an effort that will either run against current code philosophy or is too large for anyone to review properly.
Currently needed pointers:
AtomGroup
due to caching. I expect about 1x-to-2x the size of._ix
per cached compound-type splitting. Is this acceptable?AtomGroups
with many compounds, but I wouldn't want to harm the small-case performance. I thought about systems with a combination of the following characteristics: number of atoms(100, 10k, 100k)
, number of compounds(1, 10, 100)
, heterogeneity of compound sizes(all_same_size, all_different)
, contiguity(ordered, unordered)
;AtomGroup.unwrap
(Implements compact wrapping and nojump, and updates associated transforms. #2982), and there areunwrap
-specific aspects that can be tweaked to take advantage of a centralized splitting solution. Should those be left out for anunwrap
-specific effort?The text was updated successfully, but these errors were encountered: