Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Formal charge 1 is unrecognized #4027

Closed
BradyAJohnston opened this issue Feb 15, 2023 · 7 comments · Fixed by #4133
Closed

Formal charge 1 is unrecognized #4027

BradyAJohnston opened this issue Feb 15, 2023 · 7 comments · Fixed by #4133
Assignees
Milestone

Comments

@BradyAJohnston
Copy link

The charge from some PDB files is not recognised. Testing with PDB files from CHARMM-GUI that have their charges specified in a slightly different format cannot be parsed.

While I believe the 'official' way to specify charges is 1+ or 1- (which MDAnalysis reads), these PDB files have either 1 or -1 which gets the error :

ValueError: Formal charge 1 is unrecognized

Minimum file that DOESN'T work:

ATOM      1  N   MET A   1     173.150 193.330  29.890  0.00  0.00           N 1
ATOM      2  HT1 MET A   1     172.490 193.030  29.140  0.00  0.00           H-1

Minimum file that DOES work:

ATOM      1  N   MET A   1     173.150 193.330  29.890  0.00  0.00           N1+
ATOM      2  HT1 MET A   1     172.490 193.030  29.140  0.00  0.00           H1-

This breaks at the parsing stage for mda.Universe(file).

Both of the files can be parsed successfully in version 2.2.0, but 2.3.0+ introduced the formal charge parsing for the .pdb files which now breaks the import.

I am unsure if this is something that MDAnalysis would formally support, but if not, any suggestions for a fix that I could include in Molecular Nodes so I can bump the MDAnalysis version from 2.2.0 would be very much appreciated.

@IAlibay
Copy link
Member

IAlibay commented Feb 15, 2023

We can probably loosen up the checks to not throw a value error but just not recognise the charge for that entry and throw a warning.

I would suggest we don't try to add a check for this formal charge case and bend reading so that it supports nonstandard PDB formats just because charmm-gui couldn't get it right though.

@BradyAJohnston
Copy link
Author

BradyAJohnston commented Feb 15, 2023

I'm not really after the charge information (and agree they should be instead stick to the standards, which exist for a reason).

A more relaxed check that just says 0 or nothing at all instead of an error would be a big win for me 🙏

@IAlibay IAlibay added this to the 2.5.0 milestone Feb 18, 2023
@orbeckst
Copy link
Member

orbeckst commented Mar 1, 2023

@BradyAJohnston it would also be good to tell CHARMM-GUI that their PDB format violates PDB standard for ATOM records

Columns 79 - 80 indicate any charge on the atom, e.g., 2+, 1-. In most cases, these are blank.

Perhaps @wonpil can be convinced to look into fixing it there?

@nk53
Copy link

nk53 commented Mar 2, 2023

On behalf of CHARMM-GUI, I am looking into this. Since we use CHARMM to write all of our PDB files, I will probably just have to forward the case to the CHARMM developers, but perhaps I can at least fix PDB files in our archives, if you tell me which ones you found that provide element + charge info (so far, all of the ones I've looked at use the format described below).

I would generally caution against reading topology info from CHARMM-GUI's PDB files, because the PDB format does not allow enough space for many of our residue names. Instead, we always read either PSF+PDB or PSF+CRD.

Almost all PDB files provided by CHARMM-GUI use CHARMM PDB format, not the official format, in which the most obvious changes are:

  • the chain field is ignored to provide more space for the resname field
  • columns 73+ are reserved for the segment ID, which in CHARMM can be up to 8 chars

CHARMM supposedly allows writing "official" PDB format, but in a quick test I performed just now, it seems that the option merely adds back the chain column, and still writes the segment ID in columns 73+. I'll let the CHARMM developers know that and hopefully get a fixed version soon.

@orbeckst
Copy link
Member

orbeckst commented Mar 2, 2023

Thanks for the explanation @nk53 . In particular the reminder that not everything that says “PDB” is a RCSB PDB.

@richardjgowers
Copy link
Member

@nk53 the description of the format is very useful. Is there a full description anywhere? This would help a lot in supporting this format.

@nk53
Copy link

nk53 commented Mar 2, 2023

Unfortunately, the CHARMM documentation is often incomplete and outdated. The only reliable definitions are in the (fortran) source code. If you have a copy of it (free for academic use), you can find the relevant code in source/io/coorio.F90. Here's what the format for ATOM could look like if it were posted on wwpdb.org:

COLUMNS        DATA  TYPE    FIELD        DEFINITION
-------------------------------------------------------------------------------------
 1 -  6        Record name   "ATOM  "
 7 - 11        Integer       I            Atom index.
13 - 16        Atom          ATYPE(I)     CHARMM atom type (i.e., atom name).
18 - 21        Residue name  REN          Residue name.
23 - 27        Integer       ARID         CHARMM residue ID (i.e., resSeq).
31 - 38        Real(8.3)     X(I)         Orthogonal coordinates for X in Angstroms.
39 - 46        Real(8.3)     Y(I)         Orthogonal coordinates for Y in Angstroms.
47 - 54        Real(8.3)     Z(I)         Orthogonal coordinates for Z in Angstroms.
55 - 60        Real(6.2)     1.00         Occupancy, hard-coded.
61 - 66        Real(6.2)     WMAIN(I)     Main weight of this atom (see below).
73 - 76        Segment ID    SID          CHARMM segid.

WMAIN is the "main coordinate weights" array, which is affected by SCALar commands. Many other commands affect the values in this array so I would not rely on it to contain anything meaningful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants