Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

basic guesser features #3753

Open
wants to merge 292 commits into
base: develop
Choose a base branch
from

Conversation

aya9aladdin
Copy link
Contributor

@aya9aladdin aya9aladdin commented Jul 15, 2022

Fixes #

Changes made in this Pull Request:

  • I have created a guesser package that will contain all upcoming the context aware guessers. it contain the following so far:

a core module that only contain one method that will be used by the guess_topologyAttribute method of the universe
a base module that contains the BaseGusser class that will be inherited by every guesser
a DefaultGuesser module which will handle the currently existing guess methods (for now, I just added the mass guessing to test with one attribute guessing method)

  • I made changes to the universe class to work with the new methodology
  • I removed the mass guessing from the PDBParser to test guessing it at the universe level

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@pep8speaks
Copy link

pep8speaks commented Jul 15, 2022

Hello @aya9aladdin! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 184:80: E501 line too long (101 > 79 characters)
Line 185:80: E501 line too long (118 > 79 characters)
Line 197:80: E501 line too long (107 > 79 characters)
Line 200:80: E501 line too long (90 > 79 characters)
Line 202:1: W293 blank line contains whitespace

Line 3298:1: W293 blank line contains whitespace

Line 387:80: E501 line too long (90 > 79 characters)
Line 589:80: E501 line too long (85 > 79 characters)
Line 1464:80: E501 line too long (92 > 79 characters)
Line 1474:80: E501 line too long (81 > 79 characters)
Line 1474:82: W291 trailing whitespace
Line 1476:1: W293 blank line contains whitespace
Line 1480:80: E501 line too long (87 > 79 characters)
Line 1491:1: W293 blank line contains whitespace
Line 1492:79: W291 trailing whitespace
Line 1498:1: W293 blank line contains whitespace
Line 1507:80: E501 line too long (126 > 79 characters)
Line 1510:52: W291 trailing whitespace
Line 1522:80: E501 line too long (83 > 79 characters)
Line 1525:13: E303 too many blank lines (2)
Line 1528:35: E127 continuation line over-indented for visual indent
Line 1528:52: W291 trailing whitespace
Line 1532:30: E127 continuation line over-indented for visual indent

Line 48:80: E501 line too long (98 > 79 characters)
Line 59:1: W293 blank line contains whitespace
Line 61:1: W293 blank line contains whitespace
Line 69:80: E501 line too long (89 > 79 characters)
Line 70:80: E501 line too long (104 > 79 characters)
Line 71:80: E501 line too long (92 > 79 characters)
Line 71:93: W291 trailing whitespace
Line 73:1: W293 blank line contains whitespace
Line 85:1: W293 blank line contains whitespace
Line 117:1: W293 blank line contains whitespace
Line 133:32: W291 trailing whitespace
Line 141:1: W293 blank line contains whitespace

Line 32:1: E302 expected 2 blank lines, found 1
Line 34:80: E501 line too long (115 > 79 characters)
Line 34:116: W291 trailing whitespace
Line 35:80: E501 line too long (104 > 79 characters)
Line 37:14: W291 trailing whitespace
Line 45:1: W293 blank line contains whitespace
Line 46:80: E501 line too long (86 > 79 characters)
Line 52:1: W293 blank line contains whitespace
Line 57:1: W293 blank line contains whitespace
Line 65:34: E203 whitespace before ':'
Line 88:1: W293 blank line contains whitespace
Line 95:80: E501 line too long (84 > 79 characters)
Line 120:80: E501 line too long (112 > 79 characters)
Line 156:14: E225 missing whitespace around operator
Line 159:14: W291 trailing whitespace
Line 232:80: E501 line too long (80 > 79 characters)
Line 234:80: E501 line too long (80 > 79 characters)
Line 236:80: E501 line too long (80 > 79 characters)
Line 240:80: E501 line too long (82 > 79 characters)
Line 241:80: E501 line too long (80 > 79 characters)
Line 242:80: E501 line too long (83 > 79 characters)
Line 245:80: E501 line too long (80 > 79 characters)
Line 246:80: E501 line too long (82 > 79 characters)
Line 261:80: E501 line too long (80 > 79 characters)
Line 282:80: E501 line too long (81 > 79 characters)
Line 292:42: E713 test for membership should be 'not in'
Line 319:5: E303 too many blank lines (2)
Line 322:80: E501 line too long (93 > 79 characters)
Line 324:80: E501 line too long (85 > 79 characters)
Line 325:27: E128 continuation line under-indented for visual indent
Line 325:80: E501 line too long (81 > 79 characters)
Line 326:1: W293 blank line contains whitespace
Line 355:80: E501 line too long (80 > 79 characters)
Line 356:80: E501 line too long (83 > 79 characters)
Line 365:80: E501 line too long (93 > 79 characters)
Line 405:80: E501 line too long (94 > 79 characters)
Line 406:1: W293 blank line contains whitespace
Line 410:80: E501 line too long (81 > 79 characters)
Line 437:20: E713 test for membership should be 'not in'
Line 446:5: E303 too many blank lines (2)
Line 455:5: E303 too many blank lines (2)
Line 488:1: W293 blank line contains whitespace

Line 334:80: E501 line too long (103 > 79 characters)
Line 341:80: E501 line too long (108 > 79 characters)

Line 293:80: E501 line too long (100 > 79 characters)

Line 122:16: E111 indentation is not a multiple of four

Line 618:80: E501 line too long (90 > 79 characters)
Line 622:80: E501 line too long (105 > 79 characters)
Line 624:1: W293 blank line contains whitespace
Line 629:1: W293 blank line contains whitespace
Line 903:9: E741 ambiguous variable name 'l'
Line 904:80: E501 line too long (97 > 79 characters)
Line 905:80: E501 line too long (91 > 79 characters)

Line 177:23: E127 continuation line over-indented for visual indent
Line 179:80: E501 line too long (89 > 79 characters)
Line 180:80: E501 line too long (106 > 79 characters)
Line 205:80: E501 line too long (95 > 79 characters)

Line 148:80: E501 line too long (83 > 79 characters)

Line 383:1: E302 expected 2 blank lines, found 1
Line 393:1: W293 blank line contains whitespace
Line 400:13: E117 over-indented
Line 403:1: W293 blank line contains whitespace
Line 410:1: W293 blank line contains whitespace
Line 518:5: E301 expected 1 blank line, found 0
Line 755:80: E501 line too long (86 > 79 characters)
Line 1370:80: E501 line too long (86 > 79 characters)
Line 1374:9: E303 too many blank lines (2)

Line 29:1: E302 expected 2 blank lines, found 1
Line 30:1: W293 blank line contains whitespace
Line 35:29: W291 trailing whitespace
Line 38:1: W293 blank line contains whitespace
Line 39:1: W391 blank line at end of file

Line 53:1: E302 expected 2 blank lines, found 1
Line 60:80: E501 line too long (93 > 79 characters)
Line 63:5: E303 too many blank lines (2)
Line 66:25: E201 whitespace after '('
Line 66:80: E501 line too long (91 > 79 characters)
Line 73:80: E501 line too long (121 > 79 characters)
Line 79:80: E501 line too long (80 > 79 characters)
Line 90:80: E501 line too long (84 > 79 characters)
Line 150:51: E231 missing whitespace after ','
Line 150:53: E231 missing whitespace after ','
Line 174:1: E302 expected 2 blank lines, found 1
Line 176:80: E501 line too long (92 > 79 characters)
Line 178:27: E127 continuation line over-indented for visual indent
Line 182:1: E302 expected 2 blank lines, found 1
Line 188:19: E127 continuation line over-indented for visual indent
Line 190:1: E302 expected 2 blank lines, found 1
Line 196:19: E127 continuation line over-indented for visual indent
Line 232:27: E127 continuation line over-indented for visual indent

Line 29:1: W293 blank line contains whitespace

Line 249:1: W293 blank line contains whitespace

Line 259:1: W293 blank line contains whitespace

Comment last updated at 2022-09-12 19:40:16 UTC

@aya9aladdin aya9aladdin marked this pull request as ready for review July 15, 2022 01:58
Copy link
Member

@lilyminium lilyminium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @aya9aladdin, this is a neat start! I only had time to skim but left a couple comments. The tests are failing because _GUESSERS can't be imported -- it's not in the top level __init__.py yet.

@@ -1436,6 +1439,19 @@ def from_smiles(cls, smiles, sanitize=True, addHs=True,

return cls(mol, **kwargs)

def guess_topoloyAttribute(self, context, to_guess):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def guess_topoloyAttribute(self, context, to_guess):
def guess_TopologyAttr(self, context, to_guess):

There's already a method called add_TopologyAttr, and the class is called a TopologyAttr. Going with the principal of least astonishment that advises a consistent API, could you please rename this method to something that users could easily guess (😛) themselves?

context = 'default'

def __init__(self):
self._guess = {'mass': self.guess_masses}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Instead of this, would it be tidier to have a class-level _guess dictionary like TopologyAttr.transplants? I'm not sure it makes sense for instances of guesser classes to differ.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check this and see how I could integrate the transplant idea

.format(self.context, a))
return True

def guessTopologyAttribute(self, to_guess):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the absence of type hinting, could you please change the name of the to_guess variables to be clearer? here it's one string, but in is_guessed it's a list of strings -- that is quite confusing. Also, could you please name your methods with conventional snake_case and use TopologyAttr instead of TopologyAttribute?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you plan to have an overall method for guessing all the attributes at once?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I'm planning to have a method for guessing all the attributes at once, this was just a test for guessing one attribute

values = self._guess[to_guess]()
return values

def setAtoms(self, atoms):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this method need to exist? Could __init__ just take atoms instead, like a Writer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember why I did it this way tbh, so I'll remove it

package/MDAnalysis/guesser/base.py Outdated Show resolved Hide resolved
package/MDAnalysis/core/universe.py Outdated Show resolved Hide resolved
@jbarnoud
Copy link
Contributor

@aya9aladdin I recommend that you push your changes often. Do not wait for the code to be finished. By pushing often, we get to see your progress, we get to comment on your latest version instead of a version that may already be obsolete, and we may catch issues earlier, reducing the amount of work you will have to redo.

@aya9aladdin
Copy link
Contributor Author

@aya9aladdin I recommend that you push your changes often. Do not wait for the code to be finished. By pushing often, we get to see your progress, we get to comment on your latest version instead of a version that may already be obsolete, and we may catch issues earlier, reducing the amount of work you will have to redo.

I was planning to push everything when I finish indeed. I don't know if I have to make a pull request at this unmatured level or not?

@jbarnoud
Copy link
Contributor

You already have a pull request, no need to open another one. Instead, update this pull request by pushing your new commits.

@aya9aladdin
Copy link
Contributor Author

I have made some updates:
1- I removed all mass and type guessing from parsers and transferred it to take place inside the universe initiation, so mass and atom types will still be guessed automatically until we are ready to remove it

2- some attributes depend on other attributes to be guessed. if those parent attributes also need to be guessed, this can raise unnecessary errors if we attempted to guess the child attribute before the parent one. So, I added a rank dictionary to the guesser class which will rank each attribute based on its dependency on parent attributes to be guessed. For example ,AtomType depend on atom name then it will have rank 1. While masses depends on atom type which depends on atom name then it will have rank 2. By using this rank atom types will be guessed then masses to avoid errors.
(I hard coded it for now in the DefaultGuesser for testing)

next update will have:
1-finishing the BaseGuesser
2- add guesser metaclass for registration of guessers and other required dictionaries
3- add AtomType attributes to be equal to Element for PDBParser, FHIAIMSParser, TXYZParser, and XYZParser parsers

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a partial look through, I hope the minor comments help.

package/MDAnalysis/core/universe.py Outdated Show resolved Hide resolved
package/MDAnalysis/core/universe.py Outdated Show resolved Hide resolved
package/MDAnalysis/core/universe.py Outdated Show resolved Hide resolved
package/MDAnalysis/core/universe.py Outdated Show resolved Hide resolved
package/MDAnalysis/core/universe.py Outdated Show resolved Hide resolved
package/MDAnalysis/guesser/DefaultGuesser.py Outdated Show resolved Hide resolved
package/MDAnalysis/guesser/DefaultGuesser.py Outdated Show resolved Hide resolved
package/MDAnalysis/guesser/DefaultGuesser.py Outdated Show resolved Hide resolved
package/MDAnalysis/guesser/DefaultGuesser.py Outdated Show resolved Hide resolved
package/MDAnalysis/guesser/DefaultGuesser.py Outdated Show resolved Hide resolved
@aya9aladdin
Copy link
Contributor Author

I'm just more concerned with the logic than the documentation at the moment that's why it's not very accurate

@aya9aladdin
Copy link
Contributor Author

I think the MR is ok now..... you can review and give me feedback @lilyminium @IAlibay ...... sorry for the late reply, it was a tough month for me :)

@IAlibay
Copy link
Member

IAlibay commented Jun 16, 2024

Thanks @aya9aladdin and I'm sorry to hear about the difficult month :(

I'll get a review in as soon as possible - there's a few things I need to check with @lilyminium first.

@lilyminium
Copy link
Member

Just a heads up @aya9aladdin we haven't forgotten about this, it's just been a busy time all around -- but @IAlibay and I are meeting soon!

@aya9aladdin
Copy link
Contributor Author

Just a heads up @aya9aladdin we haven't forgotten about this, it's just been a busy time all around -- but @IAlibay and I are meeting soon!

Thank you for the reminder! Let me know if u need any source of help or discussion

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of things, may follow-up with more after tomorrow's call.

package/CHANGELOG Outdated Show resolved Hide resolved
package/CHANGELOG Outdated Show resolved Hide resolved
package/MDAnalysis/converters/OpenMMParser.py Outdated Show resolved Hide resolved
package/MDAnalysis/converters/OpenMMParser.py Outdated Show resolved Hide resolved
package/MDAnalysis/core/universe.py Outdated Show resolved Hide resolved
package/MDAnalysis/guesser/base.py Outdated Show resolved Hide resolved
package/MDAnalysis/guesser/default_guesser.py Outdated Show resolved Hide resolved
package/MDAnalysis/guesser/default_guesser.py Outdated Show resolved Hide resolved
package/MDAnalysis/guesser/default_guesser.py Outdated Show resolved Hide resolved
----------
context: str or :mod:`Guesser<MDAanalysis.guesser>` class
For calling a matching guesser class for this specific context
to_guess: list[str] (optional, default ``['types', 'masses']``)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things here:

  1. Optionals should really be None rather than (), it's a nit-pick but only in the sense that Optional[List] means Union[None, List] in Python
  2. From the signature, the default is () not ['types', 'masses'].

@@ -303,8 +306,7 @@ def parse(self, **kwargs):
if atomtypes:
attrs.append(Atomtypes(np.array(atomtypes, dtype=object)))
else:
atomtypes = guessers.guess_types(names)
attrs.append(Atomtypes(atomtypes, guessed=True))
atomtypes = elements
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be upper case, since atomtypes are always upper case when guessed.

@IAlibay
Copy link
Member

IAlibay commented Aug 31, 2024

@aya9aladdin apologies for taking so long here again. @lilyminium and I finally managed to discuss this and came up with a plan forward here.

As far as we think, if you can resolve the remaining issues / comments that have been raised, then we should be good to merge this. Please do let us know what you think!

@aya9aladdin
Copy link
Contributor Author

aya9aladdin commented Sep 3, 2024

@aya9aladdin apologies for taking so long here again. @lilyminium and I finally managed to discuss this and came up with a plan forward here.

As far as we think, if you can resolve the remaining issues / comments that have been raised, then we should be good to merge this. Please do let us know what you think!

happy to hear that! will work on the recently raised issues/reviews at the weekend

@lilyminium
Copy link
Member

Hi @aya9aladdin, I hope you're doing well! I just wanted to check in and see how things are going with the PR. I know the cycle of review has gone on for quite a while, so if there's anything we can do to help, please don’t hesitate to let us know. We'd be more than happy to help out, e.g by opening additional PRs into this one. Thanks so much for all your efforts on this!

@aya9aladdin
Copy link
Contributor Author

Hi @aya9aladdin, I hope you're doing well! I just wanted to check in and see how things are going with the PR. I know the cycle of review has gone on for quite a while, so if there's anything we can do to help, please don’t hesitate to let us know. We'd be more than happy to help out, e.g by opening additional PRs into this one. Thanks so much for all your efforts on this!

Hi @lilyminium sorry for the late response! it was quite messy for me last week. Will check it today or tomorrow maximum and push needed updates

Co-authored-by: Irfan Alibay <IAlibay@users.noreply.github.com>
aya9aladdin and others added 4 commits September 17, 2024 19:23
- updated guess_TopologyAttrs docs
- fixed some tests
- capitalize atomtypes from elements in RDKitParser
@aya9aladdin
Copy link
Contributor Author

@lilyminium @IAlibay I pushed last updates, let me know if any other updates is needed

@lilyminium
Copy link
Member

Thanks @aya9aladdin!! Will aim to get my review in by this weekend.

@lilyminium
Copy link
Member

Looks like I'd already approved the PR! I did just look over your changes and they seem to address the remaining concerns @IAlibay and I discussed, but I'll also wait on @IAlibay to re-review.

@IAlibay
Copy link
Member

IAlibay commented Sep 22, 2024

I won't be able to re-review for a little while unfortunately, probably end of the week or next weekend.

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the delay, I have two remaining questions, but happy to approve.

@lilyminium would it be ok if you took care of shepherding the remaining things?

"""
def __init__(self, topology=None, *coordinates, all_coordinates=False,
format=None, topology_format=None, transformations=None,
guess_bonds=False, vdwradii=None, fudge_factor=0.55,
lower_bound=0.1, in_memory=False,
lower_bound=0.1, in_memory=False, context='default',
to_guess=('types', 'masses'), force_guess=(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
to_guess=('types', 'masses'), force_guess=(),
to_guess=('types', 'masses'), force_guess=None,

Could you confirm that keeping () instead of None was intentional?
See: #3753

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lilyminium feel free to turn this into an issue, we can fix this in post.

@@ -303,8 +306,7 @@ def parse(self, **kwargs):
if atomtypes:
attrs.append(Atomtypes(np.array(atomtypes, dtype=object)))
else:
atomtypes = guessers.guess_types(names)
attrs.append(Atomtypes(atomtypes, guessed=True))
atomtypes = np.char.upper(elements)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aya9aladdin @lilyminium - this is assigned but nothing else is done with it afterwards it seems? I feel like I'm missing something, could you please just double check?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
GSoC GSoC project
Projects
None yet
10 participants