Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make CRD (CHARMM card file) EXT format optional for smaller systems. #3605

Closed
mdpoleto opened this issue Apr 6, 2022 · 15 comments · Fixed by #3635
Closed

Make CRD (CHARMM card file) EXT format optional for smaller systems. #3605

mdpoleto opened this issue Apr 6, 2022 · 15 comments · Fixed by #3635
Labels
enhancement Format-CHARMM/NAMD CHARMM and NAMD specific formats
Milestone

Comments

@mdpoleto
Copy link
Contributor

mdpoleto commented Apr 6, 2022

Hi,

When writing CRD (CHARMM card) files, it would be useful to allow users to force the EXT format even for systems smaller than 99,999 particles. An optional argument could be quick and simple solution. Any thoughts?

@orbeckst orbeckst added enhancement Format-CHARMM/NAMD CHARMM and NAMD specific formats labels Apr 6, 2022
@orbeckst
Copy link
Member

orbeckst commented Apr 6, 2022

Sounds sensible, do you want to make a PR?

@mdpoleto
Copy link
Contributor Author

mdpoleto commented Apr 6, 2022

I would love to. I will work on this in the next days and I will make a PR.

@orbeckst
Copy link
Member

orbeckst commented Apr 6, 2022

Sorry to ask @mdpoleto , do you intend to apply for GSOC with MDAnalysis (or Outreachy)?

@mdpoleto
Copy link
Contributor Author

mdpoleto commented Apr 6, 2022

No problem @orbeckst . I am not sure I am able to apply since I am J-1 visa holder in the US. Still, it sounds a very interesting opportunity.

@orbeckst
Copy link
Member

orbeckst commented Apr 6, 2022

Have a look at https://summerofcode.withgoogle.com/ for the program conditions. GSOC is open to everyone who is "new to open source software" (in terms of contributing).

@mdpoleto
Copy link
Contributor Author

Thanks @orbeckst . I believe I have a fix for forcing EXT format if requested, but while I was tinkering with it, I realized that requesting for a specific frame is not working (CRD.py, line 167).

I am testing it using:

u = mda.Universe("top.psf", "traj.dcd")
system = u.select_atoms("all")
system.write("system.crd", frame=10)

and

with mda.Writer("protein.crd", system.n_atoms, frame=10) as W:
        W.write(system)

but my .crd file is still frame 0. Could you confirm that behavior? If so, I can try to fix for that too.

@orbeckst
Copy link
Member

We always treat CHARMM CRD files to only contain a single frame. Can you point to a file format specification that indicates that these files can contain multiple frames?

We try to be standard compliant so if CRDs are not supposed to contain multiple frames then we will likely not want to add this capability. It’s still possible for you to write your own reader for these files and use it like other readers (without having to change the source code), but we can have a separate discussion on how to do this if necessary.

@mdpoleto
Copy link
Contributor Author

CHARMM CRD files contain only just a single frame. However, CRD.py seems to have a function that allows users to define which frame they want to output:

lines 167 to 186 in CRD.py

def write(self, selection, frame=None):
    """Write selection at current trajectory frame to file.

    Parameters
    ----------
    selection : AtomGroup
         group of atoms to be written
    frame : int (optional)
         Move the trajectory to frame `frame`; by default, write
         the current frame.

    """
    u = selection.universe
    if frame is not None:
        u.trajectory[frame]  # advance to frame
    else:
        try:
            frame = u.trajectory.ts.frame
        except AttributeError:
            frame = 0  # should catch cases when we are analyzing a single PDB (?)

If directly selecting which single frame is written was indeed something previously intended, I can not reproduce this feature. This is what I would like to confirm before making any changes.

@orbeckst
Copy link
Member

The write() code allows you to write a frame of another universe to a CRD. selection is an input to the function.

@mdpoleto
Copy link
Contributor Author

mdpoleto commented Apr 13, 2022

Right. While using the current stable version of the write() function in CRD.py, I am failing to make use of the input frame when I use the following:

u = mda.Universe("top.psf", "traj.dcd")
system = u.select_atoms("all")
system.write("system.crd", frame=10) 

The frame written in the file system.crd is the first one (0):

* FRAME 0 FROM traj.dcd
*

I believe I have a fix for: 1) make the EXT format optional and 2) properly use frame in the write() function. But before submitting a PR request, I would like to confirm the bug when using frame exists or if I am just using the function in a wrong way.

@orbeckst
Copy link
Member

orbeckst commented Apr 13, 2022

It should write frame 10. Can you check that (1) it's not just the title that this written incorrectly but that the coordinates are really frame 0 instead of 10, (2) that the same problem occurs with the PDB and the GRO output format?

EDIT: According to the docs for MDAnalysis.coordinates.CRD.CRDWriter.write() it needs to write frame 10. If it doesn't do it, it's a bug and an issue should be raised.

@orbeckst
Copy link
Member

Btw, if you can make your examples work with the test files then other developers can easily reproduce them and we can easily turn them into tests, e.g.

import MDAnalysis as mda
from MDAnalysisTests import datafiles as data
u = mda.Universe(data.PSF, data.DCD)
...

@mdpoleto
Copy link
Contributor Author

The problem of CRDWriter.write() not writing the requested frame persists with the test files and the current stable version.

In addition, the GRO output format does write the requested frame (it matches the coordinates of a .pdb file generated by catdcd), but the PDB output format did not recognize frame=10 as an input to the function write().

Traceback (most recent call last):
  File "testcrd.py", line 28, in <module>
    system.write("frame_10.pdb", frame=10 )
  File "/home/mdpoleto/Documents/Softwares/anaconda3/lib/python3.7/site-packages/MDAnalysis/core/groups.py", line 3498, in write
    with writer(filename, n_atoms=self.n_atoms, **kwargs) as w:
TypeError: __init__() got an unexpected keyword argument 'frame'

Should I open a new issue for this frame bug? Also, should I make separated PR requests for each fix?

@orbeckst
Copy link
Member

Please open separate issues. Then submitting separate PRs is best as it keeps the scope smaller and so code review can be faster. Of course, if it turns out that there's a single fix (e.g., in a base class) that closes multiple issues then that's good, too, but as a rule we try to do 1 issue/1 PR.

Thanks for looking into these API inconsistencies.

orbeckst pushed a commit that referenced this issue Apr 18, 2022
* Fixes issue #3605
* Inclusion of an boolean kwarg extension=False|True for the CRDWriter.write() function to
   force use of the EXTENDED output format
* add tests for EXT writing
* update CHANGELOG
* update AUTHORS
@orbeckst orbeckst added this to the 2.2.0 milestone Apr 18, 2022
@orbeckst orbeckst linked a pull request Apr 18, 2022 that will close this issue
4 tasks
@orbeckst
Copy link
Member

closed in #3635

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Format-CHARMM/NAMD CHARMM and NAMD specific formats
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants