Shorthand for 'or' / 'and' in selections #345

mnmelo · 2015-07-11T02:08:24Z

I'm sure you've had the case where you end up typing something like:

apolar = u.selectAtoms("resname LEU or resname ILE or resname ALA or resname VAL or resname PHE")

VMD syntax, in contrast, takes a list of values as an implicit 'or':

resname LEU ILE ALA VAL PHE

Having something like this in MDAnalysis would save a lot of typing, and arguably improve readability.
I am against doing it implicitly like VMD because I think it'll land us in trouble sooner or later. What I propose are dedicated delimiters, like this:

apolar = u.selectAtoms("resname [LEU, ILE, ALA, VAL, PHE]")

I've actually implemented this and it works fine. I took the easier approach where the brackets and dividing commas are replaced at the preprocessing stage by the preceding keyword and concatenated with or; the whole thing gets wrapped in parentheses:

#The previous gets expanded to:
apolar = u.selectAtoms("(resname LEU or resname ILE or resname ALA or resname VAL or resname PHE)")

This is purely syntactic sugar, with no checking for whatever is before or inside the brackets.

I went ahead and implemented `{}` for the `and` shorthand too (though I can see it won't get as much use). At first I thought the curly brackets might conflict with string formatting, but it's just a matter of escaping them properly.
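For illustration, here is a minimal sketch of what such a preprocessing step could look like. It is not the implementation mentioned above: the helper name `expand_shorthand` and the regular expression are assumptions, and, like the description, it does no checking of what comes before or inside the brackets.

```python
import re

# Hypothetical pattern: a keyword followed by either [a, b, ...] or {a, b, ...}.
_SHORTHAND = re.compile(r"(\w+)\s*(?:\[([^\[\]]*)\]|\{([^{}]*)\})")


def expand_shorthand(selection):
    """Expand 'kw [a, b]' to '(kw a or kw b)' and 'kw {a, b}' to '(kw a and kw b)'."""

    def _expand(match):
        keyword, or_items, and_items = match.groups()
        # Square brackets mean 'or', curly brackets mean 'and'.
        if or_items is not None:
            joiner, items = " or ", or_items
        else:
            joiner, items = " and ", and_items
        # Repeat the preceding keyword in front of every comma-separated item.
        terms = [f"{keyword} {item.strip()}" for item in items.split(",") if item.strip()]
        return "(" + joiner.join(terms) + ")"

    return _SHORTHAND.sub(_expand, selection)


print(expand_shorthand("resname [LEU, ILE, ALA, VAL, PHE]"))
print(expand_shorthand("prop {x < 10, y < 10, z < 10}"))
```

Because the expansion is purely textual, the result is an ordinary selection string that can be handed to the existing parser unchanged.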

Feedback appreciated.


richardjgowers · 2015-07-11T09:00:23Z

I think this is a good shorthand. I'm assuming it can be implemented for all operations? So I could do

"name [CH2 CH3 Ca]"

If VMD doesn't use brackets, it might be a good idea(tm) to try and mimic this if possible? I have no idea how the selection parsing works... but can't we split around the keywords and then assume that multiple selections are space delimited?

"resname LEU ALA VAL"

"resname", "LEU ALA VAL"

"resname", ["LEU", "ALA", "VAL"]

Or more abstractly

"KEYWORD TEXT KEYWORD TEXT"

"KEYWORD", "TEXT", "KEYWORD", "TEXT"

And then we try split() on the TEXT bits? I do agree that the brackets can improve readability though, so maybe allow these optionally?
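A rough sketch of that splitting idea, under the assumption that keywords can be recognised by simple membership in a known set. `KEYWORDS` and `split_on_keywords` are made-up names for illustration, boolean operators are not handled, and a value that happens to match a keyword would be misread.

```python
# Hypothetical keyword list for illustration; a real parser would know many more.
KEYWORDS = {"resname", "name", "resid"}


def split_on_keywords(selection):
    """Group whitespace-separated tokens under the keyword that precedes them."""
    groups = []
    for token in selection.split():
        if token in KEYWORDS:
            # Start a new (keyword, values) group.
            groups.append((token, []))
        elif groups:
            # Any other token is treated as a value of the current keyword.
            groups[-1][1].append(token)
        else:
            raise ValueError(f"expected a keyword before {token!r}")
    return groups


print(split_on_keywords("resname LEU ALA VAL"))
# [('resname', ['LEU', 'ALA', 'VAL'])]
```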

With the curly bracket idea, won't these selections always be empty? If I'm expanding around a given definition using AND, and each item only has one definition, then isn't it impossible for any item to have both definitions?

I.e.

selectAtoms("resname {LEU ALA}")

selectAtoms("PROP {A, B}")

Will always return nothing, won't it?

mnmelo · 2015-07-11T14:19:52Z

So, parsing is done token by token. Each keyword consumes a defined number of subsequent tokens (resname, for instance, consumes a single token, around consumes two, protein consumes zero, etc.)

This has the advantage that you always know what's a keyword token and what's an argument token. I think you can even have an atom named 'and' and select for its name and it would still work.

Implicitly or'ing argument tokens, like VMD does, gets trickier there because now we have to tell keywords and arguments apart. That's why I stated it might land us in hot water, though I actually prefer VMD's cleaner syntax over my brackets.

How I see it happening: we can always start by assuming a keyword only consumes its default number of arguments. When we then try the following token either of these 2 can happen:

1. It's a token that matches our keyword list, and we process it as such.
2. It doesn't match a known keyword, so we couple it with `or` to the running keyword selection. (This must be a special high-precedence `or`, since we want `not resname ALA LEU` to behave like `not (resname ALA or resname LEU)` instead of the naïve `not resname ALA or resname LEU`.)
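The two cases above can be sketched as a token-level rewrite. This is only an illustration under stated assumptions, not the MDAnalysis parser: `KEYWORD_NARGS`, `RESERVED` and `expand_implicit_or` are hypothetical names, only single-argument keywords get the implicit `or`, and tokens are assumed to be whitespace-separated. The high precedence is emulated by wrapping the or'ed terms in parentheses.

```python
# Hypothetical table: keyword -> number of argument tokens it consumes by default.
KEYWORD_NARGS = {"resname": 1, "name": 1, "resid": 1, "protein": 0}
# Tokens that end a run of implicitly or'ed values.
RESERVED = set(KEYWORD_NARGS) | {"and", "or", "not"}


def expand_implicit_or(selection):
    """Rewrite 'kw A B' as '(kw A or kw B)' for single-argument keywords."""
    tokens = selection.split()
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        i += 1
        if KEYWORD_NARGS.get(tok, 0) != 1:
            # Operators, zero-argument keywords and anything else pass through.
            out.append(tok)
            continue
        if i >= len(tokens):
            raise ValueError(f"keyword {tok!r} is missing its argument")
        # The default argument is consumed unconditionally, so a value
        # that looks like a keyword (an atom named 'and') still works.
        values = [tokens[i]]
        i += 1
        # Case 2: further non-keyword tokens are or'ed onto the running selection.
        while i < len(tokens) and tokens[i] not in RESERVED:
            values.append(tokens[i])
            i += 1
        terms = [f"{tok} {value}" for value in values]
        # Parentheses give the implicit 'or' its high precedence.
        out.append(terms[0] if len(terms) == 1 else "(" + " or ".join(terms) + ")")
    return " ".join(out)


print(expand_implicit_or("not resname ALA LEU"))
# not (resname ALA or resname LEU)
```

A real implementation would more likely build this into the parser than rewrite strings, but the grouping rule is the same.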

I'll try to see what I can do.

In this scenario there's no room for an implicit 'and'. As you pointed out it's less useful, but with the (dumb preprocessing) curly bracket syntactic sugar it can still return something other than an empty set:

corner = u.selectAtoms("prop {x < 10, y < 10, z < 10}")

which gets expanded to

corner = u.selectAtoms("(prop x < 10 and prop y < 10 and prop z < 10)")

I'll report if I get something on the implicit or front. More ideas are always welcome.

mnmelo · 2015-07-11T15:50:33Z

A third possibility would be to have both syntaxes: explicit with braces, that does both 'and' and 'or', and implicit (only 'or'), VMD-style.

Since the brace syntax is really just a preprocessing step, there should be no trouble getting the two to play along. Do note that for this to work cleanly the brace syntax should always have some sort of divider (commas, in my example).

richardjgowers · 2015-07-11T16:15:47Z

We could try to reuse Python syntax more: `&` for and, and `|` for or.

corner = u.selectAtoms('prop x < 10 & y < 10')

residues = u.selectAtoms('resname ALA | LEU')  # same as u.selectAtoms('resname ALA LEU')

So the lack of separator defaults to |

orbeckst · 2015-07-13T17:04:23Z

As a side note, it might be worthwhile to look at pyparsing in case @nmichaud's hand-crafted parser becomes difficult to extend. (ProDy uses it, for example.) It might also give us the flexibility to generate multiple selection languages for MDAnalysis, e.g. the user could select if she wants "MDA native", "VMD", "CHARMM", "PyMOL", etc. (Truth be told, I don't know how difficult it would be to describe these different selection languages but at least with an abstract framework like pyparsing it might be possible.)

nmichaud · 2015-07-13T17:23:36Z

Good idea! The parser was hacked together in an afternoon and is definitely not optimal in speed because all intermediate subexpressions are fully evaluated on the entire universe.

mnmelo · 2015-07-13T18:08:07Z

Ah, pyparsing does seem the way to go. I'll read up on it and let you guys know if I find it feasible.

Allows selections such as:

u.select_atoms('name C N') == u.select_atoms('name C or name N')
u.select_atoms('name N and resname GLY LEU') == 'name N and (resname GLY or resname LEU)'

Defines how to detect keywords in selection strings (`is_keyword`). Issue #347

Implicit or in selections (Issue #345)

Allows selections such as:

``` python
u.select_atoms('name C* N* O*')  # == u.select_atoms('name C* or name N* or name O*')
u.select_atoms('name N and resname GLY LEU')  # == 'name N and (resname GLY or resname LEU)'
u.select_atoms('resid 1:10 14 15')
u.select_atoms('resid 1:100 200-300 and not resname MET GLY')
```

Defines how to detect keywords in selection strings with `is_keyword` (Issue #347).

mnmelo added the enhancement label Jul 11, 2015

mnmelo self-assigned this Jul 11, 2015

richardjgowers added the Component-Selections label Jul 11, 2015

mnmelo mentioned this issue Jul 12, 2015

Reserved selection keywords #347

Closed

mnmelo mentioned this issue Jul 13, 2015

Shell selections and common syntax for distance selections #348

Open

mnmelo mentioned this issue Jul 27, 2015

Selection parsing overhaul. New syntax features. #371

Open

richardjgowers assigned richardjgowers and unassigned mnmelo Jan 18, 2016

richardjgowers added a commit that referenced this issue Jan 21, 2016

Changes for #345 following review

565ac1a

richardjgowers closed this as completed Jan 21, 2016

richardjgowers added this to the 0.14 milestone Jan 21, 2016

orbeckst mentioned this issue Jan 23, 2020

Allow for more flexibility with wildcard in selections #2436

Closed
