Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong results from capped_distance() with "nsgrid" search and atom in the center of the box #2919

Closed
a-anik opened this issue Aug 26, 2020 · 12 comments

Comments

@a-anik
Copy link

a-anik commented Aug 26, 2020

Expected behavior

capped_distance() function should always give the correct result regardless of the employed search method and atomic positions.
In the example below the expected result for cutoff=3.2 is:

  • cutoff=3.2
    bruteforce: 2497 pairs
    pkdtree: 2497 pairs
    nsgrid: 2497 pairs

Actual behavior

When one of the atoms of the reference group is positioned exactly in the center of the box, capped_distance() sometimes returns erroneous results for "nsgrid" search method.

  • cutoff=2.8
    bruteforce: 1115 pairs
    pkdtree: 1115 pairs
    nsgrid: 1115 pairs
  • cutoff=3.2
    bruteforce: 2497 pairs
    pkdtree: 2497 pairs
    nsgrid: 2510 pairs

Code to reproduce the behavior

import MDAnalysis as mda
from MDAnalysis.lib.distances import capped_distance
from MDAnalysis.transformations.translate import center_in_box
from MDAnalysis.tests.datafiles import PDB_xvf

u = mda.Universe(PDB_xvf)
ag = u.select_atoms('index 0')
u.trajectory.ts = center_in_box(ag)(u.trajectory.ts)

box = u.dimensions
reference = u.select_atoms('protein')
configuration = u.select_atoms('not protein')

for cutoff in [2.8, 3.2]:
    print(f"* cutoff={cutoff}")
    for method in ['bruteforce', 'pkdtree', 'nsgrid']:
        pairs, distances = capped_distance(
            reference.positions,
            configuration.positions,
            max_cutoff=cutoff,
            box=box,
            method=method,
            return_distances=True,
            )
        print(f"{method}: {len(pairs)} pairs")

Current version of MDAnalysis

  • MDAnalysis 1.0.0 (conda-forge)
  • Python 3.7.7
  • Ubuntu 18.04.5 LTS
@zemanj
Copy link
Member

zemanj commented Aug 26, 2020

Hi @a-anik,

thank you for your detailed bug report. It looks like this is a duplicate of #2345 (see especially my comment therein).

@zemanj
Copy link
Member

zemanj commented Aug 26, 2020

Ah, sorry, I posted the wrong bug report. This one sounds rather like a duplicate of #2229 (see my comment for an explanation). Let me know if you think that this is not the case!

@a-anik
Copy link
Author

a-anik commented Aug 26, 2020

Yes, it looks like a part of that ongoing issue with the grid-based neighbor searches. There was a discussion in #2345 to make it opt-in. I wonder why it made the way into the latest release as a default choice for some systems. Periodic KD-trees would seem to be a better option. I found this bug while using the new hbond_analysis code. There is no way to control search parameters (and PBC behavior for that matter) in that module, so I'm stuck with off-tree patches for now.

@zemanj
Copy link
Member

zemanj commented Aug 26, 2020

You are right, this shouldn't even be part of MDAnalysis at the moment. Taken seriously, the bugs in NSGrid have the potential to make almost any result unreliable as NSGrid is used by capped_distance, which, in turn, is used all over the place in the entire code base...
@MDAnalysis/coredevs Could you please comment as this seems serious.

@richardjgowers
Copy link
Member

Yeah I'm looking at this

@richardjgowers richardjgowers added this to the 1.0.x milestone Aug 27, 2020
@IAlibay
Copy link
Member

IAlibay commented Sep 2, 2020

Looks like we'll want to get 1.0.1 done soon-ish (see #2798), would it make sense to temporarily take the performance hit and just disable nsgrid (or at least make _determine_method return pkdtree isntead of nsgrid)?

@zemanj
Copy link
Member

zemanj commented Sep 2, 2020 via email

@orbeckst
Copy link
Member

orbeckst commented Sep 2, 2020

I have to agree with @zemanj – scientific software has to be correct first and anything else comes second.

@orbeckst
Copy link
Member

orbeckst commented Sep 2, 2020

I raised #2930 — if @richardjgowers is faster then we use the correct solution, otherwise we should disable nsgrid-use for right now, at a minimum for 1.0.1.

@richardjgowers
Copy link
Member

richardjgowers commented Sep 3, 2020 via email

@richardjgowers richardjgowers mentioned this issue Sep 7, 2020
4 tasks
@orbeckst orbeckst modified the milestones: 1.0.1, 1.0.2 Oct 24, 2020
@orbeckst
Copy link
Member

For 1.0.1 we will disable nsgrid (see #2930 and PR #3004 ).

richardjgowers added a commit that referenced this issue Feb 28, 2021
Rewrote lib.nsgrid module

Issues #2919 #2345 #2229
@richardjgowers richardjgowers mentioned this issue Feb 28, 2021
4 tasks
richardjgowers added a commit that referenced this issue Mar 7, 2021
* Fixed NSGrid

Rewrote lib.nsgrid module

Issues #2919 #2345 #2229

* enabled nsgrid again

* bumped version to 1.0.2-dev

* remove nsgrid warnings

* regenerated nsgrid C files

* Revert removal of nsgrid from test_distances

Co-authored-by: IAlibay <irfan.alibay@gmail.com>
@orbeckst
Copy link
Member

Give me 48h, I think there’s a fix.

These were long 2 days ;-).

I am glad that this one is done!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants