Shorthand for 'or' / 'and' in selections #345

Closed
mnmelo opened this issue Jul 11, 2015 · 7 comments
@mnmelo
Member

mnmelo commented Jul 11, 2015

I'm sure you've had the case where you end up typing something like:

apolar = u.selectAtoms("resname LEU or resname ILE or resname ALA or resname VAL or resname PHE")

VMD syntax, in contrast, takes a list of values as an implicit 'or':

resname LEU ILE ALA VAL PHE

Having something like this in MDAnalysis would save a lot of typing, and arguably improve readability.
I am against doing it implicitly like VMD because I think it'll land us in trouble sooner or later. What I propose are dedicated delimiters, like this:

apolar = u.selectAtoms("resname [LEU, ILE, ALA, VAL, PHE]")

I've actually implemented this and it works fine. I took the easier approach: at the preprocessing stage the brackets and commas are removed, each item is prefixed with the preceding keyword, and the items are joined with or; the whole thing gets wrapped in parentheses:

# The previous selection gets expanded to:
apolar = u.selectAtoms("(resname LEU or resname ILE or resname ALA or resname VAL or resname PHE)")

This is purely syntactic sugar, with no checking for whatever is before or inside the brackets.

I went ahead and implemented {} as the and shorthand too (though I can see it won't get as much use). At first I thought the curly brackets might conflict with string formatting, but it's just a matter of escaping them properly.
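
A minimal sketch of what such a preprocessing step might look like. This is not the implementation mentioned above: the function name expand_shorthand and the regular expression are assumptions made for illustration. Like the approach described, it does not validate the keyword or the items; it only requires matching delimiters and at least one item.

```python
import re

# Hypothetical pattern: a keyword, then [..] (or-list) or {..} (and-list).
_GROUP = re.compile(r"(\w+)\s*([\[{])([^\[\]{}]*)([\]}])")
_JOINER = {"[": " or ", "{": " and "}
_CLOSER = {"[": "]", "{": "}"}


def expand_shorthand(selection):
    """Expand 'kw [a, b]' to '(kw a or kw b)' and 'kw {a, b}' to '(kw a and kw b)'."""

    def _expand(match):
        keyword, opener, body, closer = match.groups()
        if _CLOSER[opener] != closer:
            raise ValueError(f"mismatched delimiters in {match.group(0)!r}")
        # Items are comma-separated, so an item may itself contain spaces.
        items = [item.strip() for item in body.split(",") if item.strip()]
        if not items:
            raise ValueError(f"empty list in {match.group(0)!r}")
        terms = [f"{keyword} {item}" for item in items]
        return "(" + _JOINER[opener].join(terms) + ")"

    return _GROUP.sub(_expand, selection)


print(expand_shorthand("resname [LEU, ILE, ALA, VAL, PHE]"))
print(expand_shorthand("prop {x < 10, y < 10, z < 10}"))
```

Because the items are split on commas rather than whitespace, multi-token arguments such as "x < 10" pass through unchanged.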

Feedback appreciated.

@mnmelo mnmelo self-assigned this Jul 11, 2015
@richardjgowers
Member

I think this is a good shorthand. I'm assuming it can be implemented for all operations? So I could do

"name [CH2 CH3 Ca]"

If VMD doesn't use brackets, it might be a good idea(tm) to try and mimic this if possible? I have no idea how the selection parsing works... but can't we split around the keywords and then assume that multiple selections are space delimited?

"resname LEU ALA VAL"

"resname", "LEU ALA VAL"

"resname", ["LEU", "ALA", "VAL"]

Or more abstractly

"KEYWORD TEXT KEYWORD TEXT"

"KEYWORD", "TEXT", "KEYWORD", "TEXT"

And then we try split() on the TEXT bits? I do agree that the brackets can improve readability though, so maybe allow these optionally?
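
A rough sketch of the splitting idea, under stated assumptions: the KEYWORDS set is a made-up subset, group_by_keyword is a hypothetical name, and the sketch only handles keywords that take plain value lists (no boolean operators or parentheses).

```python
# Hypothetical subset of selection keywords, for illustration only.
KEYWORDS = {"resname", "name", "segid"}


def group_by_keyword(selection):
    """Split a selection into (keyword, [values]) pairs.

    Every token that is not a known keyword is attached to the most
    recent keyword. A value that happens to equal a keyword would be
    misread as a keyword by this naive approach.
    """
    groups = []
    for token in selection.split():
        if token in KEYWORDS:
            groups.append((token, []))
        elif groups:
            groups[-1][1].append(token)
        else:
            raise ValueError(f"value {token!r} appears before any keyword")
    return groups


print(group_by_keyword("resname LEU ALA VAL"))
print(group_by_keyword("resname LEU ALA name CA CB"))
```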

With the curly bracket idea, won't these selections always be empty? If I'm expanding around a given definition using AND, and each item only has one definition, then isn't it impossible for any item to have both definitions?

I.e.

selectAtoms("resname {LEU ALA}")

selectAtoms("PROP {A, B}")

will always return nothing, won't it?
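
The point can be illustrated with plain Python sets standing in for atom selections. The residue names below are made up, and select is a hypothetical helper, not an MDAnalysis function.

```python
def select(resnames, wanted):
    """Return the set of atom indices whose residue name equals `wanted`."""
    return {i for i, resname in enumerate(resnames) if resname == wanted}


# Made-up per-atom residue names; each atom has exactly one.
resnames = ["LEU", "LEU", "ALA", "VAL", "ALA"]

# 'resname {LEU, ALA}' would mean: resname LEU and resname ALA
leu_and_ala = select(resnames, "LEU") & select(resnames, "ALA")

# 'resname [LEU, ALA]' would mean: resname LEU or resname ALA
leu_or_ala = select(resnames, "LEU") | select(resnames, "ALA")

print(leu_and_ala)         # set()
print(sorted(leu_or_ala))  # [0, 1, 2, 4]
```

Since every atom carries a single value for the attribute, the intersection is empty whenever the listed values differ.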

@mnmelo
Member Author

mnmelo commented Jul 11, 2015

So, parsing is done token by token. Each keyword consumes a defined number of subsequent tokens (resname, for instance, consumes a single token, around consumes two, protein consumes zero, etc.).

This has the advantage that you always know what's a keyword token and what's an argument token. I think you can even have an atom named 'and' and select for its name and it would still work.

Implicitly or'ing argument tokens, like VMD does, gets trickier there because now we have to tell keywords and arguments apart. That's why I stated it might land us in hot water, though I actually prefer VMD's cleaner syntax over my brackets.

How I see it happening: we can always start by assuming a keyword only consumes its default number of arguments. When we then look at the following token, one of two things can happen:

  • it's a token that matches our keyword list, and we process it as such;
  • it doesn't match a known keyword and we therefore couple it with or to the running keyword selection. (This must be a special high-precedence or since we want not resname ALA LEU to behave like not (resname ALA or resname LEU) instead of the naïve not resname ALA or resname LEU).
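
The two cases above can be sketched as a rewrite of the token stream into an explicit selection string. This is only an illustration under assumptions: KEYWORD_ARGS and OPERATORS are made-up tables, make_or_explicit is a hypothetical name, only single-argument keywords get the implicit or, and parentheses must be separated by whitespace. Wrapping the or-group in parentheses is one simple way to get the high-precedence behaviour.

```python
# Hypothetical keyword table: keyword -> number of argument tokens it consumes.
KEYWORD_ARGS = {"resname": 1, "name": 1, "segid": 1, "protein": 0}
OPERATORS = {"and", "or", "not", "(", ")"}


def make_or_explicit(selection):
    """Rewrite 'kw a b c' as '(kw a or kw b or kw c)' for one-argument keywords."""
    tokens = selection.split()
    out = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        i += 1
        if KEYWORD_ARGS.get(token) != 1:
            # Operators and other keywords pass through untouched.
            out.append(token)
            continue
        if i == len(tokens):
            raise ValueError(f"keyword {token!r} is missing its argument")
        # The default argument is consumed unconditionally, so even an
        # atom named 'and' can still be selected by name.
        values = [tokens[i]]
        i += 1
        # Any further token that is not a keyword or operator is or'ed in.
        while (i < len(tokens)
               and tokens[i] not in KEYWORD_ARGS
               and tokens[i] not in OPERATORS):
            values.append(tokens[i])
            i += 1
        terms = " or ".join(f"{token} {value}" for value in values)
        # Parentheses keep the group bound tighter than a preceding 'not'.
        out.append(f"({terms})" if len(values) > 1 else terms)
    return " ".join(out)


print(make_or_explicit("not resname ALA LEU"))
print(make_or_explicit("name N and resname GLY LEU"))
```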

I'll try to see what I can do.

In this scenario there's no room for an implicit 'and'. As you pointed out it's less useful, but with the (dumb preprocessing) curly bracket syntactic sugar it can still return something other than an empty set:

corner = u.selectAtoms("prop {x < 10, y < 10, z < 10}")

which gets expanded to

corner = u.selectAtoms("(prop x < 10 and prop y < 10 and prop z < 10)")

I'll report if I get something on the implicit or front. More ideas are always welcome.

@mnmelo
Member Author

mnmelo commented Jul 11, 2015

A third possibility would be to have both syntaxes: an explicit one with braces, which does both 'and' and 'or', and an implicit VMD-style one, which does only 'or'.

Since the brace syntax is really just a preprocessing step, there should be no trouble getting them to play along. Do note that for this to work cleanly the brace syntax should always have some sort of divider (commas, in my example).

@richardjgowers
Member

We could try to reuse Python syntax more: & for and, and | for or.

corner = u.selectAtoms('prop x < 10 & y < 10')

residues = u.selectAtoms('resname ALA | LEU')  # same as u.selectAtoms('resname ALA LEU')

So a missing separator defaults to |.

@orbeckst
Member

As a side note, it might be worthwhile to look at pyparsing in case @nmichaud's hand-crafted parser becomes difficult to extend. (ProDy uses it, for example.) It might also give us the flexibility to generate multiple selection languages for MDAnalysis, e.g. the user could choose whether she wants "MDA native", "VMD", "CHARMM", "PyMOL", etc. (Truth be told, I don't know how difficult it would be to describe these different selection languages, but at least with an abstract framework like pyparsing it might be possible.)

@nmichaud
Contributor

Good idea! The parser was hacked together in an afternoon and is definitely not optimal in speed because all intermediate subexpressions are fully evaluated on the entire universe.

@mnmelo
Member Author

mnmelo commented Jul 13, 2015

Ah, pyparsing does seem the way to go. I'll read up on it and let you guys know if I find it feasible.

richardjgowers added a commit that referenced this issue Jan 18, 2016
Allows selections such as:
u.select_atoms('name C N') == u.select_atoms('name C or name N')
u.select_atoms('name N and resname GLY LEU') == u.select_atoms('name N and (resname GLY or resname LEU)')

Defines how to detect keywords in selection strings (`is_keyword`)
Issue #347
richardjgowers added a commit that referenced this issue Jan 21, 2016
dotsdl added a commit that referenced this issue Jan 21, 2016
Implicit or in selections (Issue #345)

Allows selections such as:
``` python

u.select_atoms('name C* N* O*')
# == u.select_atoms('name C* or name N* or name O*')

u.select_atoms('name N and resname GLY LEU')
# == 'name N and (resname GLY or resname LEU)'

u.select_atoms('resid 1:10 14 15')

u.select_atoms('resid 1:100 200-300 and not resname MET GLY')
```
Defines how to detect keywords in selection strings with `is_keyword` (Issue #347).
@richardjgowers richardjgowers added this to the 0.14 milestone Jan 21, 2016