Add support for HALMD files (H5MD) #762

Closed
kain88-de opened this issue Mar 10, 2016 · 27 comments · Fixed by #2787
Labels
Component-Readers enhancement NSF REU NSF Research Experience for Undergraduates project parallelization

Comments

@kain88-de
Member

HALMD is an MD code that uses HDF5 for its file format. The C++ wrapper they have written for it is modeled on h5py, so a reader using h5py should be relatively easy to write.

@orbeckst
Member

Can we use the wrapper directly (code-wise and license-wise)?
You can probably take off the proposal tag and just add help wanted --- I think that we generally encourage adding new formats, especially some that might not be ubiquitous (yet). Do you have test trajectories and/or access to the HALMD?

@kain88-de
Member Author

I would write our own wrapper, since the C++ code relies a lot on templates. I should be able to organize a short trajectory.

@orbeckst orbeckst removed the proposal label Mar 25, 2016
@orbeckst
Member

We just need someone to do it ;-)

@KaiSzuttor

In that context I would prefer to implement an H5MD reader (and writer). H5MD is a format convention built on HDF5 (used by LAMMPS and ESPResSo), so the h5py module could be used to implement such a reader. Is anybody working on such a reader, or could anybody give me some hints on how best to start implementing one?

@richardjgowers
Member

Any new format is always a good addition, nobody is looking at h5md yet.

I should really write a wiki page on writing a Reader, but here goes...

All trajectory information is stored inside a Timestep (ts) object, (mda.coordinates.base.Timestep). Atoms etc then point to the arrays contained inside the ts.

The Reader is responsible for filling the contents of ts. It should inherit from mda.coordinates.base.Reader then define:

  • __init__ to prepare things, allocate a ts and read the 0th frame
  • _read_next_frame read the next frame (filling ts and returning it)
  • _read_frame skip to frame i and read it (by filling ts) and then return ts

Most other stuff (iteration etc) is defined in the base class, so you just need to define those hooks. I think PDB is a fairly clear implementation of this:
https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/coordinates/PDB.py
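
The division of labour described above can be illustrated without MDAnalysis at all. What follows is only a minimal sketch, not the real mda.coordinates.base.Reader: ToyTimestep, ToyReaderBase and InMemoryReader are hypothetical names, and only the hook names _read_next_frame and _read_frame are taken from the list above. The base class supplies iteration and indexing; the subclass only fills ts.

```python
class ToyTimestep:
    """Hypothetical stand-in for a Timestep: holds the current frame's data."""

    def __init__(self, n_atoms):
        self.frame = -1
        self.positions = [(0.0, 0.0, 0.0)] * n_atoms


class ToyReaderBase:
    """Hypothetical base class: iteration and indexing built on the hooks."""

    n_frames = 0

    def __iter__(self):
        # Rewind so that the next _read_next_frame() call returns frame 0.
        self.ts.frame = -1
        for _ in range(self.n_frames):
            yield self._read_next_frame()

    def __getitem__(self, i):
        if not 0 <= i < self.n_frames:
            raise IndexError(i)
        return self._read_frame(i)


class InMemoryReader(ToyReaderBase):
    """Reads 'frames' from a list of coordinate lists instead of a file."""

    def __init__(self, frames):
        self._frames = frames
        self.n_frames = len(frames)
        self.ts = ToyTimestep(n_atoms=len(frames[0]))
        self._read_frame(0)  # read the 0th frame on construction

    def _read_frame(self, i):
        # Jump to frame i, fill ts, and return it.
        self.ts.frame = i
        self.ts.positions = list(self._frames[i])
        return self.ts

    def _read_next_frame(self):
        # Advance by one frame relative to the current one.
        nxt = self.ts.frame + 1
        if nxt >= self.n_frames:
            raise IOError("end of trajectory")
        return self._read_frame(nxt)


frames = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)]]
reader = InMemoryReader(frames)
visited = [ts.frame for ts in reader]
```

Note that the same ts object is reused and overwritten on every frame, which is why anything pointing at its arrays sees the current frame's data.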

Readers are added automatically into the list of known readers once you define the format attribute in their class, so format='H5MD' would make any files with a .h5md suffix get passed to that Reader. So once you've imported the class somewhere (either inside the package or outside), you could load a mda.Universe like this:

import MDAnalysis as mda
u = mda.Universe('myfile.pdb','myfile.h5md')

So the signature is (topology, trajectory), you can use xyz/gro/pdb/etc as a topology, not sure what you've got.
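
To make the "define format and the class is registered" idea concrete, here is a self-contained sketch of one way such a registry can work. It is an illustration under assumptions, not MDAnalysis's actual implementation: READERS, AutoRegisteredReader, H5MDToyReader and guess_reader are hypothetical names, and the real mechanism may differ.

```python
import os

READERS = {}  # hypothetical registry: upper-case format name -> reader class


class AutoRegisteredReader:
    """Hypothetical base class that registers subclasses by their format."""

    format = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Only register classes that declare their own format attribute.
        fmt = cls.__dict__.get("format")
        if fmt is not None:
            READERS[fmt.upper()] = cls


class H5MDToyReader(AutoRegisteredReader):
    format = "H5MD"  # defining this is all that is needed to register


def guess_reader(filename):
    """Pick a reader class from the file suffix (case-insensitive)."""
    ext = os.path.splitext(filename)[1].lstrip(".").upper()
    try:
        return READERS[ext]
    except KeyError:
        raise ValueError(f"no reader registered for extension {ext!r}") from None


chosen = guess_reader("myfile.h5md")
```

Because registration happens when the class body is executed, the module defining the reader has to be imported before the lookup, which matches the "once you've imported the class somewhere" remark above.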

There is already a python interface to h5md, maybe this would be useful as a start:
https://github.com/pdebuyl/pyh5md

Let me know if you need any more info!

@KaiSzuttor

Thanks a lot for your comments, that is a good starting point. In my opinion the h5md Python package is not actually necessary, since it only enforces the format convention for the trajectories. I think we should stick to the more generic h5py package: its development is more active, and we will depend on h5py either way (pyh5md would add one more dependency, since it depends on h5py itself). What's your opinion on that?

@richardjgowers
Member

Yep, if you don't need it don't use it.


@orbeckst
Member

Nice intro, @richardjgowers! Why don't you just copy it to the wiki? ;-)


@KaiSzuttor

Can somebody quickly give me a hint on where I have to add new files to the build system so that they are recognized as part of the package?

@kain88-de
Member Author

A Python file? You can place it in `MDAnalysis/coordinates` along with the other files. You can then include it in `MDAnalysis/coordinates/__init__.py` to load it automatically into the namespace with the other packages.

@jdetle
Contributor

jdetle commented May 24, 2016

Hi, in addition to what @kain88-de said, when you ran setup.py did you do a
developer installation?


Have a wonderful day,
John J. Detlefs

@KaiSzuttor

Thanks @jdetle and @kain88-de. I have not built a developer installation. What's the difference?
Actually I managed to get an installation of my new module. Is there any documentation on how best to debug new implementations within the setup.py framework?
Sorry for the newbie questions.

@jdetle
Contributor

jdetle commented May 24, 2016

Hi Kai,
Updated: Here is the link to the setup dev environment info

I asked about this not too long ago and received some helpful advice on developer installation and virtual environments:

Also, setup tools explanation
I think my install on a linux environment was roughly :
(updated, @kain88-de's explanation in setup develop environment was better)

sudo apt-get install git
cd 'path/to/directory'
git clone https://github.com/MDAnalysis/mdanalysis
cd mdanalysis/package
python setup.py develop --user -e 

I am also a newbie but my understanding is that `--user` installs the packages in your user directory and `-e` ensures that the packages are editable by you.

@KaiSzuttor

Ah thanks, that helps a lot. What I have always done with my own Python modules is use the pip tool.
I thought that this won't work properly due to the C dependencies of MDAnalysis. With pip you can update the installation in /home/<user>/.local by calling the following pip command in the package directory:

pip install . --user --upgrade

@jdetle
Contributor

jdetle commented May 24, 2016

Hi Kai,
I updated the comment to link to the actual MDAnalysis page on a developer installation and it too uses pip. I think all you needed was the -e to ensure packages were editable.

@KaiSzuttor

Thanks @jdetle.
Are any of the developers aware of anyone using H5MD as the trajectory format?
I just realized that I also have to write a parser for that format since it doesn't use any of the topologies implemented so far.

@kain88-de
Member Author

pip uses setuptools, so pip install . and python setup.py install are equivalent. We have information on setting up a dev environment in the wiki. If you still have problems please open a new issue and we can discuss it there. Having 'newbie' questions to point us at weak points in the documentation is good, so don't worry about it.

@jdetle
Contributor

jdetle commented May 24, 2016

I just opened #861 to address this issue. It's kind of hard to find the guide for developers as a new user.

@KaiSzuttor

I really like the active discussions of this project, that's how it should be!
After looking at the implementation of the parser function, another question came to mind: is it possible to have something like a time-dependent topology, e.g. for grand canonical MD or if reactions are involved?

@kain88-de
Member Author

@KaiSzuttor please open another issue for that and/or write to the mailing list. Our current GSoC student @fiona-naughton is working on supporting REMC simulations, which might overlap with your suggestion.

@dotsdl
Member

dotsdl commented May 30, 2016

@KaiSzuttor definitely open an issue for this so discussion can begin. I think we could make this work under the new topology system, probably by having the Reader make changes to the Universe's Topology as the frame changes. There are probably many little issues with that idea, but we can start discussing it at the issue.

@KaiSzuttor

All right, started a new issue #864. @dotsdl maybe you want to start pushing your ideas to that issue dialog.

@orbeckst
Member

There is also a Python package (pdebuyl/pyh5md) by @pdebuyl that implements h5md.

This looks like a good format if we are looking for anything based on HDF5.

@orbeckst orbeckst changed the title Add support for HALMD files Add support for HALMD files (H5MD) Nov 28, 2018
@pdebuyl

pdebuyl commented Nov 28, 2018

Hi,

I was automatically subscribed via the tagging mechanism. I am happy to provide info on H5MD if necessary.

In the current version of pyh5md, I have tried to do a lightweight wrapping of h5py via subclassing so that the interface is very close to h5py.

@orbeckst
Member

orbeckst commented Nov 28, 2018

@pdebuyl many thanks, great to hear that we can ping you if questions arise.

I have one immediate question: Can pyh5md make use of parallel I/O (using PHDF5 or probably just h5py parallel pympi-based)?

EDIT: link to parallel hdf5 in h5py

@pdebuyl

pdebuyl commented Nov 28, 2018

Yes, there is support for parallel I/O. pyh5md.File is basically a h5py.File object. You can pass the driver argument on opening. All arguments apart from author and creator are passed to the constructor of h5py.File.

The example https://github.com/pdebuyl/pyh5md/blob/master/examples/run_parallel.py illustrates parallel writing (regions in the dataset are selected by the region argument, a tuple giving start and end indices for the particle group).
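
As a rough illustration of how per-process (start, end) tuples like the ones mentioned above could be computed, here is a small sketch that splits a range of particles as evenly as possible across MPI ranks. It is not taken from pyh5md: rank_region is a hypothetical helper, and it assumes half-open ranges (start inclusive, end exclusive), which the thread does not confirm for the region argument. In a real MPI program the rank and the number of ranks would come from the MPI communicator; here they are plain integers so the sketch runs on its own.

```python
def rank_region(n_particles, n_ranks, rank):
    """Return a half-open (start, end) slice of particles for one rank.

    Hypothetical helper: the first `n_particles % n_ranks` ranks each get
    one extra particle so that the split is as even as possible.
    """
    if not 0 <= rank < n_ranks:
        raise ValueError("rank must satisfy 0 <= rank < n_ranks")
    base, extra = divmod(n_particles, n_ranks)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return (start, end)


# Example: 10 particles shared among 4 ranks.
regions = [rank_region(10, 4, r) for r in range(4)]
# regions == [(0, 3), (3, 6), (6, 8), (8, 10)]
```

The regions are contiguous and non-overlapping, so each process touches a disjoint part of the dataset.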

@edisj edisj mentioned this issue Jun 23, 2020
4 tasks
@orbeckst orbeckst added the NSF REU NSF Research Experience for Undergraduates project label Aug 7, 2020
orbeckst pushed a commit that referenced this issue Aug 7, 2020
* Fixes #762
* add H5MD coordinate reader (supports parallel MPI in principle but is not well tested at the moment, see #2865)
* added test h5md datafiles:  real example (derived from cobrotoxin.trr) and synthetic example for MultiframeReaderTest reader tests
* add tests (MultiframeReaderTest and custom)
* add documentation (example and parallel MPI)
* added h5py into conda dependencies and pyh5md into pip dependencies
* update CHANGELOG
* update AUTHORS
@orbeckst
Member

orbeckst commented Aug 7, 2020

The H5MD Writer is issue #2866

PicoCentauri pushed a commit to PicoCentauri/mdanalysis that referenced this issue Mar 30, 2021

7 participants