Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

triclinic PBC treatment is not always correct #1475

Closed
orbeckst opened this issue Jul 11, 2017 · 15 comments
Closed

triclinic PBC treatment is not always correct #1475

orbeckst opened this issue Jul 11, 2017 · 15 comments

Comments

@orbeckst
Copy link
Member

orbeckst commented Jul 11, 2017

I recently looked at lib/include/include/calc_distances.h and I wonder if we are treating triclinic boxes correctly: In calc_distances._calc_distance_array_triclinic() it looks as if the box inverse is calculated in the same way as for orthorhombic boxes:

box_inverse[0] = 1.0 / box[0][0];
box_inverse[1] = 1.0 / box[1][1];
box_inverse[2] = 1.0 / box[2][2];

even though a triclinic box contains three vectors where some have off-diagonal non-zero entries:

x0  x1   x2   
0   y1   y2 
0    0   z3

The inverse of this matrix is something else:

In [16]: from sympy.interactive.printing import init_printing
In [17]: init_printing(use_unicode=False, wrap_line=False, no_global=True)
In [18]: import sympy as sym

In [19]: x0, x1, y1, x2, y2, z2 = sym.symbols("x0, x1, y1, x2, y2, z2")
In [20]: h = sym.Matrix([[x0, x1, x2], [0, y1, y2], [0, 0, z2]])
In [21]: h.inv()
Out[21]:
[1    -x1   x1*y2 - x2*y1]
[--  -----  -------------]
[x0  x0*y1     x0*y1*z2  ]
[                        ]
[     1          -y2     ]
[0    --        -----    ]
[     y1        y1*z2    ]
[                        ]
[                1       ]
[0     0         --      ]
[                z2      ]

In [22]: h
Out[22]:
[x0  x1  x2]
[          ]
[0   y1  y2]
[          ]
[0   0   z2]

Admittedly, I might not quite understand the algorithm that is used in _triclinic_pbc() where only diagonal elements of the inverse matrix are used.

How does this algorithm compare to the general algorithm for remapping into a triclinic unitcell proposed by Waasenaar [1] and later by Tuckerman [2] (see #136)? Tuckerman [2] in Appendix B, Eq B.9 provides the general algorithm for the minimum image convention:

  h := [a, b, c], a=(a1,a2,a3), ... (the matrix of box vectors)
  r_ij := r_i - r_j                 (difference vector)

  s_i = h^{-1} r_i  
  s_ij = s_i - s_j
  s_ij <-- s_ij - NINT(s_ij)        (general minimum image convention)
  r_ij = h s_ij

NINT(x) is the nearest integer function (e.g. numpy.rint()), h^{-1} is the
matrix-inverse of h.

Or is the approach taken with minimum_image_triclinic() completely different from the Tuckerman approach?

@richardjgowers can you explain?


NOTE: see the discussion below on the limitations of the "general image convention" (aka Waassenaar algorithm) as discussed in [1] Ch 3, Appendix C.


[1] T. A. Wassenaar. Molecular dynamics of sense and sensibility in processing and analysis of data. PhD thesis, University of Groningen, Groningen, The Netherlands, 2006. http://hdl.handle.net/11370/b0c3a19b-9f60-4911-ab23-d9725a2d45a2

[2] M. E. Tuckerman. Statistical Mechanics: Theory and Molecular Simulation.
Oxford University Press, Oxford, UK, 2010.

@jbarnoud
Copy link
Contributor

I did not look at the code, but I am currently calculating minimum distances between proteins in triclinic boxes. I just compared my MDAnalysis output for one of my systems with an output from gmx mindist and the result is the same.

@kain88-de
Copy link
Member

We noticed problems in my group as well. I'll see that I get an example we can post publicly

@richardjgowers
Copy link
Member

So the box_inverse isn't an inverse (great naming @richardjgowers!), but is only used to calculate the multiple of box vectors to take away from a given separation.

The algorithm I'm using is copied from domain.cpp which used to assume that the maximum separation was one box length, but it seems they've changed it to cater for multiple box separations.

There's probably a fancy mathematical term for this, but the way minimum image convention work is, it takes away multiples of the x vector of the box (which include y and z components) from the separation vector until the x distance is minimised. The number of multiples is nint(vector[0], box_inverse[0]). Then the x distance in the separation vector is fixed. You can then do the same for y which won't affect the x components as the box matrix is 0 for that dimension.

@kain88-de
Copy link
Member

Just a FYI I found a nice document explaining what boxes to choose in a simulation with graphics that show how the mapping to a triclinic unit cell works.

@jbarnoud
Copy link
Contributor

jbarnoud commented Aug 3, 2017

@Tsjerk's thesis is indeed my go-to document for anything unitcell related.

@kain88-de
Copy link
Member

I also now looked for test data. As far as I can see every MD code out there either doesn't test it or uses historical data as a comparison

@kain88-de
Copy link
Member

After looking through the mic implementations in various MD codes and into different papers (mostly the thesis of @Tsjerk) I now have a good understanding how to calculate distances in triclinic unit cells.
The Tuckerman algorithm which is implemented in a slightly different form in MDAnalysis is also used in major MD engines (openMM, lammps, GROMACS). That algorithm though is only an approximation and yields correct distances up to half the shortest box length. Here is a notebook illustrating that.

This might sound scary but even in a uniform filled box only maximum 25% of all distances possible are affected. In real simulations comparing proteins the number is likely lower given the simulation box is large enough.

But we should implement the distance calculation using a algorithm that looks into all neighboring images. Unfortunately this algorithm is a lot slower and not easily optimized by the compiler. I'm currently working on a specialized SIMD implementation.

@Tsjerk
Copy link

Tsjerk commented Sep 21, 2017 via email

@kain88-de
Copy link
Member

It seems my thesis was printed 4 years before the book from Tuckerman. It should be called the Wassenaar algorithm :)

Maybe yeah. I went with olivers nomenclature. But I'll remember it for the future.

The error will only affect larger distances which are not giving contacts anyway,

Calculating distances is not only interesting for a contact analysis. For example to calculate SAXS profiles one needs radial distribution functions between particles in the whole simulation box.

The second case is more tricky, but as you mention you can first do the trick with the box matrix and then see which distances need to be corrected, by investigating other images
Now it will be possible to also avoid that to some extent, as you also know where the distance vector points in the box, from which you can deduce how to correct it. Not entirely sure how to do that, without spending some time with pen and paper, but I'm confident it's possible (and much quicker than checking all images).

I thought that too. My first implementation actually use that idea and had good performance. But I went a little bit obsessive with performance for two days and in my final implementation using manual loop unrolling with SIMD doing the image search was the best option and only about 2 times slower then the Wassenaar algorithm. Due to the loop unrolling a implementation that first uses Wassenaar and corrects the distance if found to be wrong you actually get worse performance. I'm not entirely sure what the reason for that is but modern CPU's are complicated and give unpredictable behavior when writing high performance code.

@orbeckst
Copy link
Member Author

orbeckst commented Sep 23, 2017

I'll be happy to call it the Waassenaar algorithm henceforth and refer to Ch 3, Appendix C of [1], especially with its discussion of its range of applicability.

(I edited the top of the thread.)

[1] T. A. Wassenaar. Molecular dynamics of sense and sensibility in processing and analysis of data. PhD thesis, University of Groningen, Groningen, The Netherlands, 2006. http://hdl.handle.net/11370/b0c3a19b-9f60-4911-ab23-d9725a2d45a2

@orbeckst orbeckst changed the title review triclinic PBC treatment triclinic PBC treatment is not always correct Sep 23, 2017
@richardjgowers
Copy link
Member

Is this blocking for 0.17? Ie could we put in the slow method for triclinic pbc so it's correct, then hopefully optimise later?

@arm61
Copy link
Contributor

arm61 commented Sep 20, 2024

Am I being daft, or can the positions of particles in a triclinic cell be transformed into positions in an orthorhombic cell and, therefore, get around many of these problems (see image)? It is very possible that I am overlooking something.

@Tsjerk
Copy link

Tsjerk commented Sep 20, 2024

No, because the problem is about distances between particles in adjacent cells. After putting everything in a brick, the neighbours still need to be determined through the lattice vectors. It's not the shape of the cell that matters, but the lattice.

@arm61
Copy link
Contributor

arm61 commented Sep 20, 2024

Ah, yes, for the calculation of distances, you are right. I was thinking about the positions of the particles alone.

@richardjgowers
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants