Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

occupancy and b-factors not updated when importing multi-frame PBD #3825

Closed
AntonJansen96 opened this issue Sep 12, 2022 · 10 comments · Fixed by #3988
Closed

occupancy and b-factors not updated when importing multi-frame PBD #3825

AntonJansen96 opened this issue Sep 12, 2022 · 10 comments · Fixed by #3988

Comments

@AntonJansen96
Copy link

MDAnalysis uses the occupancies/b-factors of the first frame for all subsequent frames when importing a multi-frame PDB.

Code to reproduce the behavior

import MDAnalysis

u   = MDAnalysis.Universe('traj.pdb')
sel = u.select_atoms("resid 178 and segid A and name HD2")

for _ in u.trajectory:
   print(sel.positions)

for _ in u.trajectory:
   print(list(sel.occupancies))

for _ in u.trajectory:
   print(list(sel.tempfactors))

Also,

grep 'HD2 ASPTA 178' traj.pdb                                
ATOM   2808  HD2 ASPTA 178      81.480  35.990  49.120  0.00  0.00           H
ATOM   2808  HD2 ASPTA 178      83.700  34.570  50.290  1.00  1.00           H
ATOM   2808  HD2 ASPTA 178      80.540  40.020  49.060  0.06  0.06           H
ATOM   2808  HD2 ASPTA 178      81.100  37.920  49.720  0.20  0.20           H
ATOM   2808  HD2 ASPTA 178      84.130  38.260  50.810  0.09  0.09           H
ATOM   2808  HD2 ASPTA 178      81.790  39.980  49.520  0.04  0.04           H

And link to traj.pdb.

Expected behavior

[[81.48 35.99 49.12]]
[[83.7  34.57 50.29]]
[[80.54 40.02 49.06]]
[[81.1  37.92 49.72]]
[[84.13 38.26 50.81]]
[[81.79 39.98 49.52]]
[0.00]
[1.00]
[0.06]
[0.20]
[0.09]
[0.04]
[0.00]
[1.00]
[0.06]
[0.20]
[0.09]
[0.04]

Actual behavior

[[81.48 35.99 49.12]]
[[83.7  34.57 50.29]]
[[80.54 40.02 49.06]]
[[81.1  37.92 49.72]]
[[84.13 38.26 50.81]]
[[81.79 39.98 49.52]]
[0.0]
[0.0]
[0.0]
[0.0]
[0.0]
[0.0]
[0.0]
[0.0]
[0.0]
[0.0]
[0.0]
[0.0]

Current version of MDAnalysis

  • Which version are you using? 2.3.0
  • Which version of Python ? 3.10.6
  • Which operating system? MacOS
@IAlibay
Copy link
Member

IAlibay commented Sep 12, 2022

This is an interesting one. We treat things like the occupancy and bfactors main attributes as a "topology" things, so they don't don't get updated as you read a new frame.

However, specifically for occupancy, we do actually store that time dependent data under ts.data.

So for example if you did:

import MDAnalysis as mda

u = mda.Universe('traj.pdb')

for ts in u.trajectory:
    print(ts.data['occupancy'])

You should be able to access the time dependent occupancy data.

Adding tempfactors to this isn't a very difficult task, and would only require adding similar code bits to what we do with occupancy in this method:

def _read_frame(self, frame):

This offers a quick fix, but doesn't remove the fact that we have two competing attributes and asking folks to look in ts.data over the topology attribute isn't very intuitive / error prone. Hooking back the ts.data values back into the topology attribute might require a bit more work / extra thought. @orbeckst had one option in #3504 which could maybe work (#3504 (comment)) but probably changes the expected behaviour of some attributes.

@AntonJansen96
Copy link
Author

Hi, thanks for getting back to me so quickly. I wasn't aware one is able to access the time-depedent data through ts.data. This is very helpful in my particular case.

@BradyAJohnston
Copy link

BradyAJohnston commented Jan 2, 2023

I can second the want for this. Lots of people store frame-specific data in the b_factor column for visualisation purposes, and it would be really useful to have easy access to that value. I've just tested out the use of the occupancy column and works well (BradyAJohnston/MolecularNodes#128 (comment)) but having it available for tempfactors would be great.

@IAlibay
Copy link
Member

IAlibay commented Jan 2, 2023

@BradyAJohnston for arbitrary frame specific data we do have the auxiliary system, that probably would be the best place to fetch & store such data if looking to interfacing formally with the MDAnalysis API.

@BradyAJohnston
Copy link

@IAlibay Thanks but the use case I have I use MDAnalysis under the hood so there is no user interfacing with it. The data goes straight from topology / trajectory to 3D model inside of Blender so the arbitrary data needs to already exist inside of the trajectory (in this case it would be multi-model .pdb files usually).

@IAlibay
Copy link
Member

IAlibay commented Jan 2, 2023

Dynamically updating a topology is a behaviour change I doubt we could allow any time soon (it's a pretty big overhaul of how we expect things to work), is there any way users could pass in a separate file with timeseries data that could be read alongside? I.e. like a CSV, XVG, or EDR file or maybe some kind of serialised xarray input?

@richardjgowers
Copy link
Member

So one way to (ab)use the current tooling would be via a Transformation... these are in some ways a callback that is triggered on each frame read, and they could have vision of an AtomGroup (even of all atoms). They are intended to modify the Timestep (positions) based on AtomGroups (i.e. unwrapping coords) but you could flip this relationship

class BFactorUpdater(TransformationBase):
    def __init__(self, ag):
        self._ag = ag

    def _transform(self, ts):
        self._ag.bfactors = ts.data['occupancy']  # assumes this is populated by the Reader, true for PDB
        return ts

u = mda.Universe()

bfactor_updater = BFactorUpdater(u.atoms)
u.trajectory.add_transformations([bfactor_updater])

that's some scrap code off the top of my head, but it should be possible along those lines

@BradyAJohnston
Copy link

For my use case at least, the accessing via:

for ts in univ.traj:
    ts.data['tempfactors']

With the same setup as the occupancies would work totally fine. I don't need it to be topology related, just a frame-specific data access.

There is potential for reading .csv or other formats alongside the data and adding it via index, but as it's already common practice for a lot of people to use the (now deprecated) .pdb format for storing arbitrary data in the B-factor column, I'm hoping to support that as best I can. For now at least getting them to store the data inside of the occupancy instead will achieve the same result, but having access to ts.data['tempfactor'] would be ideal.

@IAlibay
Copy link
Member

IAlibay commented Jan 2, 2023

 but as it's already common practice for a lot of people to use the (now deprecated) .pdb format for storing arbitrary data in the B-factor column,

Herein lies the issue. It's a non-standard use case for a deprecated format. I myself am guilty of abusing the PDB format this way so I'm happy to open a PR with the relevant fix as a temporary stopgap, but I would very much like to open the discussion about how this should be dealt with going forward.

Having these arbitrary changes to readers are both user unfriendly and end up having a reasonably large maintenance cost in the long run. I'd like to be able to offer something that suits the downstream needs that would be the canonical way we encourage folks to use things.

@BradyAJohnston
Copy link

I would certainly be very appreciative of any direction you wish to take, both in short-term solutions and longer-term design plans.

richardjgowers added a commit that referenced this issue Jan 4, 2023
richardjgowers added a commit that referenced this issue Feb 19, 2023
@IAlibay IAlibay added this to the 2.5.0 milestone Apr 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants