
Use of compounds involves excessive sorting and fancy-indexing of AtomGroups and compound indices. #3000

Closed
mnmelo opened this issue Oct 20, 2020 · 2 comments · Fixed by #3005


mnmelo commented Oct 20, 2020

Hi,
I realized that there is potential for optimization of most AtomGroup methods that implement compounds. Currently, compounds are implemented by cleverly splitting into compound groups of the same size, to allow vectorization at least on a per-size basis. This was a great idea! (git blame tells me kudos go to @kain88-de and @richardjgowers).

Current improvable aspects:

  • By default, compound_indices are always being sorted, to be robust to cases where compounds in an AtomGroup are not contiguous. This is slow and unneeded for the most common use case of contiguous compounds. A contiguity check is cheap (this was already implemented for AtomGroup.center only, in the resolution of AtomGroup.center fails for compounds+wrapping #2984 and Fixes AtomGroup.center when using compounds + unwrapping #2992);
  • Most compound-using methods currently do their own splitting. This duplicates code and hurts readability; the splitting logic should be concentrated in a dedicated method;
  • Splitting masks for a given compound type need to be calculated only once per AtomGroup and could be cached. This would positively impact a number of methods, especially when iterating over frames (with due care to invalidate such caches if the user changes the AtomGroup's compound indices). Even within a single calculation this can already be impactful: when using AtomGroup.center with unwrap=True, indices are split once to do the unwrapping and then again to do the center calculation.
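To make the discussion concrete, here is a minimal NumPy sketch of the two ideas above: skipping the argsort when compound indices are already monotonic, and fancy-indexing atoms into one 2-D block per compound size so reductions vectorize per size class. The helper name `split_by_compound` is hypothetical, not MDAnalysis API, and this is a simplification of what the actual refactor would do.

```python
import numpy as np

def split_by_compound(compound_indices):
    """Group atoms by compound, one 2-D index block per compound size.

    compound_indices maps each atom to its compound (e.g. residue)
    index. Returns a list of (n_compounds, size) integer arrays so
    that per-compound reductions become a single vectorized call per
    size class. (Hypothetical helper; not MDAnalysis API.)
    """
    compound_indices = np.asarray(compound_indices)
    # Cheap monotonicity check: the argsort can be skipped when the
    # indices are already in non-decreasing order (the common case).
    if np.all(compound_indices[1:] >= compound_indices[:-1]):
        order = np.arange(compound_indices.size)
    else:
        order = np.argsort(compound_indices, kind="stable")
    sorted_ix = compound_indices[order]
    # Per-compound sizes via the run lengths of the sorted indices.
    _, counts = np.unique(sorted_ix, return_counts=True)
    offsets = np.concatenate(([0], np.cumsum(counts)))
    blocks = []
    for size in np.unique(counts):
        starts = offsets[:-1][counts == size]
        # Fancy-index the atom order into an (n_compounds, size) block.
        blocks.append(order[starts[:, None] + np.arange(size)])
    return blocks
```

With positions of shape (n_atoms, 3), a per-compound center then reduces to `positions[block].mean(axis=1)` for each returned block.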

I'd like to work on such refactoring, including some benchmarking for performance, but I need input so I don't set off on an effort that will either run against current code philosophy or is too large for anyone to review properly.

Currently needed pointers:

  • Should this be a project?
  • What is the recommended way of implementing caching?
  • This will enlarge the memory footprint of an AtomGroup due to caching. I expect roughly 1–2× the size of ._ix per cached compound-type splitting. Is this acceptable?
  • What test cases should be used for performance benchmarking? This will of course be more visible for large AtomGroups with many compounds, but I wouldn't want to harm the small-case performance. I thought about systems with a combination of the following characteristics: number of atoms (100, 10k, 100k), number of compounds (1, 10, 100), heterogeneity of compound sizes (all_same_size, all_different), contiguity (ordered, unordered);
  • What should be the scope of the enhancements? For instance, I stumbled upon this while trying to speed up AtomGroup.unwrap (Implements compact wrapping and nojump, and updates associated transforms. #2982), and there are unwrap-specific aspects that can be tweaked to take advantage of a centralized splitting solution. Should those be left out for an unwrap-specific effort?
  • Following from the previous point: what other aspects do you think could be optimized along with centralized splitting?
  • Who wants to join in on the fun? 😀
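Regarding the caching question above: independently of whichever mechanism MDAnalysis recommends, the pattern being proposed boils down to a per-AtomGroup store keyed by compound type, with explicit invalidation when compound indices change. A dict-based sketch (all names hypothetical):

```python
class CompoundSplitCache:
    """Sketch of per-AtomGroup caching of compound splitting masks.

    Maps a compound type ("residues", "molecules", ...) to masks
    computed once; must be invalidated whenever the group's compound
    indices change. Hypothetical; MDAnalysis's own caching utilities
    may dictate a different shape for the real implementation.
    """

    def __init__(self):
        self._cache = {}

    def get(self, compound, compute):
        # `compute` is a zero-argument callable producing the masks;
        # it only runs on a cache miss.
        if compound not in self._cache:
            self._cache[compound] = compute()
        return self._cache[compound]

    def invalidate(self):
        # Call when the user modifies the compound indices.
        self._cache.clear()
```

The memory-footprint question above then becomes: how many compound types per AtomGroup end up populated in such a cache at once.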

mnmelo commented Oct 20, 2020

@zemanj, I see from the discussion in #2376 that you already have a headstart in unwrap optimization. What do you think of how this refactor could impact it? Could it be a base from where to merge your optimization work?


mnmelo commented Oct 23, 2020

So, I went ahead and implemented most of what I put forth above in the draft PR #3005.

Besides cleaner code, performance is improved. I timed the AtomGroup.center method (best time of 1000 repeats) for several combinations of conditions; almost all cases become faster, some by more than twofold:

Legend:

  • n_atoms: total number of atoms in the system;
  • n_comp.: total number of compounds in the system;
  • heter.: (bool) whether compound sizes are heterogeneous (i.e., not all the same);
  • contig.: (bool) whether compound indices are monotonically increasing;
  • current: timing (in seconds) of current implementation in develop (beb5232);
  • speedup: ratio of timings between current and the optimized code, with caching disabled (higher is better);
  • speedup_cached: ratio of timings between current and the optimized code, with caching (higher is better).

The heter. and contig. flags were ignored for the combinations that don't make sense (like having a single compound and heter.=True or contig.=False).

n_atoms   n_comp.  heter.  contig.  current   speedup  speedup_cached
100       1        1       1        1.02e-04  1.36     3.21     
100       1        1       0        1.01e-04  1.22     3.05     
100       1        0       1        1.07e-04  1.24     3.43     
100       1        0       0        1.01e-04  1.37     3.18     
100       10       1       1        1.29e-04  1.63     4.46     
100       10       1       0        1.45e-04  1.79     4.97     
100       10       0       1        2.96e-04  1.48     3.32     
100       10       0       0        3.73e-04  1.50     2.76     
100       100      1       1        1.53e-04  1.86     4.68     
100       100      1       0        1.64e-04  1.91     5.76     
100       100      0       1        1.50e-04  1.76     4.67     
100       100      0       0        1.71e-04  1.87     5.97     
10000     1        1       1        7.90e-04  1.14     1.53     
10000     1        1       0        7.82e-04  1.13     1.49     
10000     1        0       1        7.84e-04  1.16     1.45     
10000     1        0       0        7.75e-04  1.12     1.44     
10000     10       1       1        8.38e-04  1.22     1.56     
10000     10       1       0        1.34e-03  1.38     2.56     
10000     10       0       1        1.36e-03  1.42     2.17     
10000     10       0       0        1.83e-03  1.31     2.71     
10000     100      1       1        1.08e-03  1.65     1.98     
10000     100      1       0        1.69e-03  1.51     2.94     
10000     100      0       1        3.43e-03  2.13     3.47     
10000     100      0       0        4.02e-03  1.75     3.94     
1000000   1        1       1        7.93e-02  1.31     1.46     
1000000   1        1       0        7.75e-02  1.18     1.43     
1000000   1        0       1        7.90e-02  1.25     1.40     
1000000   1        0       0        7.88e-02  1.24     1.39     
1000000   10       1       1        8.41e-02  1.33     1.50     
1000000   10       1       0        1.93e-01  1.12     1.77     
1000000   10       0       1        1.09e-01  1.51     2.18     
1000000   10       0       0        2.23e-01  0.98     1.56     
1000000   100      1       1        1.07e-01  1.71     1.98     
1000000   100      1       0        2.66e-01  1.24     2.01     
1000000   100      0       1        3.87e-01  2.66     7.47     
1000000   100      0       0        5.63e-01  1.53     4.10

Only in one case was there a slowdown, and the small 2% difference is likely not significant. All others were speedups, with a maximum uncached speedup of 2.66-fold and cached of 7.47-fold.
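For anyone wanting to reproduce these numbers, the "best time of N repeats" protocol can be sketched along these lines. The system setup mirrors one table row (10k atoms, 100 compounds), and the naive per-compound centering is a deliberately simple stand-in, not the actual develop implementation:

```python
import timeit

import numpy as np

def best_of(func, repeats=1000):
    # Best-of-N timing: timeit.repeat returns one total per run, and
    # taking the minimum suppresses scheduling noise.
    return min(timeit.repeat(func, repeat=repeats, number=1))

# Hypothetical setup mirroring one table row (10k atoms, 100 compounds).
rng = np.random.default_rng(42)
positions = rng.random((10_000, 3), dtype=np.float32)
compound_indices = np.repeat(np.arange(100), 100)

def center_per_compound():
    # Naive baseline: one boolean mask per compound (what the
    # vectorized splitting avoids).
    return np.array([positions[compound_indices == i].mean(axis=0)
                     for i in np.unique(compound_indices)])

t = best_of(center_per_compound, repeats=10)
```

The speedup columns above are then simply the ratio of two such `best_of` timings (develop vs. the PR branch, with and without caching).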

I'm continuing this discussion in the draft PR #3005. Pointers are still needed, especially on cache management.
