Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: add two point distance function #1512

Closed
wants to merge 1 commit into from
Closed

Conversation

kain88-de
Copy link
Member

Fixes #1262

Changes made in this Pull Request:

TODO

  • type checking of input a,b
  • type checking of input box including the format that is used.
  • correct distance for cubic boxes
  • correct distance for rhombic boxes

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@richardjgowers
Copy link
Member

I'm not really a fan of this method. I know it's not meant to be high performance, but all those linalg calls are just slower.

I also think we'll need some sort of minimum image function anyway, (ie given a vector and a box, return a vector which is the minimum vector), so I'd rather this function use that.

@kain88-de
Copy link
Member Author

Returning a vector isn't difficult. But for correctness tests this implementation might still be useful. I have tested it on a simulation with a cubic box and the algorithm worked perfect. But if someone has a triclinic cell example that would be great.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if you want to merge this, I would be in favor. Main issue is the docs and no tests.

sa = np.dot(ibox, a)
sb = np.dot(ibox, b)
s = sa - sb
s -= np.round(s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is np.round() different from np.rint()? – the latter would be the canonical function for this problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they are equivalent

return np.linalg.norm(a - b)

box = triclinic_vectors(box)
ibox = np.linalg.inv(box)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can explicitly calculate the inverse as in #1475.

This is marginally faster in numpy but might be a lot faster in cython.

import numpy as np

def make_box():
    a = [np.random.random(), 0, 0]
    b = np.zeros(3)
    b[:2] = np.random.random(2)
    c = np.random.random(3)
    return 20 * np.array([a, b, c])

def ibox(box):
    a,b,c = box
    return np.array([[1/a[0], 0, 0], [-b[0]/(a[0]*b[1]), 1/b[1], 0], [(b[0]*c[1] - b[1]*c[0])/(a[0]*b[1]*c[2]), -c[1]/(b[1]*c[2]), 1/c[2]]])

Timing:

box = make_box()
%timeit ibox(box)
# 7.33 µs ± 108 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.linalg.inv(box)
#9.98 µs ± 247 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Correctness

In [42]: np.dot(box, ibox(box))
Out[42]:
array([[  1.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       [  1.90479818e-18,   1.00000000e+00,   0.00000000e+00],
       [  3.40613054e-17,  -7.40127872e-17,   1.00000000e+00]])

(Closed form from sympy...).

from sympy.interactive.printing import init_printing
init_printing(use_unicode=False, wrap_line=False, no_global=True)
a0, b0, b1, c0, c1, c2 = sym.symbols("a0, b0, b1, c0, c1, c2")
h = sym.Matrix([[a0, b0, c0], [0, b1, c1], [0, 0, c2]])
h.inv()

In [35]: h
Out[35]:
[a0  b0  c0]
[          ]
[0   b1  c1]
[          ]
[0   0   c2]

In [37]: h.inv()
Out[37]:
[1    -b0   b0*c1 - b1*c0]
[--  -----  -------------]
[a0  a0*b1     a0*b1*c2  ]
[                        ]
[     1          -c1     ]
[0    --        -----    ]
[     b1        b1*c2    ]
[                        ]
[                1       ]
[0     0         --      ]
[                c2      ]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes with cython this can be speed up. But I have to rewrite to use less variables to get the full speedup and not keep calling python functions.

from numba import jit
import numpy as np

@jit
def jit_ibox(box):
    inv = np.zeros(box.shape)
    inv[0, 0] = 1 / box[0, 0]
    inv[1, 0] = -box[1,0] / (box[0,0]*box[1,1])
    inv[1, 1] = 1 / box[1,1]
    inv[2, 0] = (box[1,0]*box[2,1] - box[1,1]*box[2,0])/(box[0,0]*box[1,1]*box[2,2])
    inv[2, 1] = -box[2,1]/(box[1,1]*box[2,2])
    inv[2, 2] = 1 / box[2,2]
    return inv

note: I tested this with numba because it has similar performance but much shorter test cycles

%timeit jit_ibox(box)
656 ns ± 18.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks promising; what is the performance of np.linalg.inv() on your machine?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sour don't what you report. It might not help much though because of the dot products that are used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry autocorrect. My performance of Linalg.inv is comparable to yours.

Copy link
Member

@orbeckst orbeckst Aug 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. And yes, you're right, of course, the matrix-vector multiplications are bad (although I don't know if np.dot uses any optimized BLAS/LAPACK)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it does but the python function call as well as the input verification that numpy does is expensive. Maybe we can unpack that more using explicit formulas calculated with sympy.

@@ -208,6 +208,23 @@ def _check_lengths_match(*arrays):
raise ValueError("Input arrays must all be same shape"
"Got {0}".format([a.shape for a in arrays]))


def distance(a, b, box=None):
"""calc distances between two points. Uses algorithm from Tuckerman
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc string should state that this takes any PBC into account. I would also reference Tuckerman properly (and possibly even write out the math because we would want to use it as a test case for correctness and in this case, one cannot be explicit enough.)

@orbeckst
Copy link
Member

orbeckst commented Aug 2, 2017

But if someone has a triclinic cell example that would be great.

In [40]: tric = mda.Universe(MDAnalysis.tests.datafiles.PSF_TRICLINIC, MDAnalysis.tests.datafiles.DCD_TRICLINIC)

In [41]: tric.dimensions
Out[41]:
array([ 35.44603729,  35.06156158,  34.15850449,  91.32802582,
        61.7352066 ,  44.4070282 ], dtype=float32)

@kain88-de
Copy link
Member Author

@orbeckst do you also have a testcase that doesn't require our test data? I would like to be able to place a bead as pure numpy arrays and calculate the minimal image by hand. These super basic methods shouldn't depend on our test files

@jbarnoud
Copy link
Contributor

jbarnoud commented Aug 2, 2017

@kain88-de You can copy the box and the coordinates of two points from one frame of PSF_TRICLINIC.

@kain88-de
Copy link
Member Author

@kain88-de You can copy the box and the coordinates of two points from one frame of PSF_TRICLINIC

I would like to have points for cell types that I can also check by hand. If you want to provide some I'm happy so use them. I think it will take me some time to generate them myself.

@kain88-de kain88-de closed this Sep 21, 2017
@kain88-de kain88-de deleted the single-atom-distance branch September 21, 2017 20:43
@kain88-de
Copy link
Member Author

See #1475 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants