Shorthand for 'or' / 'and' in selections #345
Comments
I think this is a good shorthand. I'm assuming this can be implemented for all operations? So I could do "name [CH2 CH3 Ca]".

If VMD doesn't use brackets, it might be a good idea(tm) to try and mimic this if possible? I have no idea how the selection parsing works... but can't we split around the keywords and then assume that multiple selections are space delimited?

"resname LEU ALA VAL"
"resname", "LEU ALA VAL"
"resname", ["LEU", "ALA", "VAL"]

Or more abstractly:

"KEYWORD TEXT KEYWORD TEXT"
"KEYWORD", "TEXT", "KEYWORD", "TEXT"

And then we try

With the curly bracket idea, won't these selections always be empty? If I'm expanding around a given definition using

Ie selectAtoms("resname {LEU ALA}")
selectAtoms("PROP {A, B}")

Will always return nothing, won't it?
So, parsing is done token by token. Each keyword consumes a defined number of subsequent tokens. This has the advantage that you always know what's a keyword token and what's an argument token. I think you can even have an atom named 'and' and select for its name and it would still work.

Implicitly or'ing argument tokens, like VMD does, gets trickier because now we have to tell keywords and arguments apart. That's why I said it might land us in hot water, though I actually prefer VMD's cleaner syntax over my brackets.

How I see it happening: we can always start by assuming a keyword only consumes its default number of arguments. When we then try the following token, either of these 2 can happen:
I'll try to see what I can do.

In this scenario there's no room for an implicit 'and'. As you pointed out it's less useful, but with the (dumb preprocessing) curly bracket syntactic sugar it can still return something other than an empty set:

corner = u.selectAtoms("prop {x < 10, y < 10, z < 10}")

which gets expanded to

corner = u.selectAtoms("(prop x < 10 and prop y < 10 and prop z < 10)")

I'll report if I get something on the implicit
A third possibility would be to have both syntaxes: explicit with braces, which does both 'and' and 'or', and implicit (only 'or'), VMD-style. Since the brace syntax is really just a preprocessing step, there should be no trouble getting them to play along. Do note that for this to work cleanly the brace syntax should always have some sort of divider (commas, in my example).
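The brace preprocessing discussed here can be sketched roughly as below. This is only an illustration, not the code from the actual implementation: the function name `expand_shorthand` is made up, and it assumes square brackets mean 'or', curly brackets mean 'and', and commas divide the values. Like the approach described in the thread, it is plain text substitution with no checking of what comes before or inside the brackets.

```python
import re

# Assumed syntax (hypothetical): "kw [a, b]" means or, "kw {a, b}" means and.
# A single-word keyword, then a bracketed, comma-separated list of values.
_SHORTHAND = re.compile(r"(\w+)\s*(?:\[([^\[\]]*)\]|\{([^{}]*)\})")


def expand_shorthand(selection):
    """Expand bracket shorthands into explicit, parenthesized or/and chains."""

    def _expand(match):
        keyword = match.group(1)
        if match.group(2) is not None:
            body, joiner = match.group(2), "or"   # square brackets
        else:
            body, joiner = match.group(3), "and"  # curly brackets
        values = [v.strip() for v in body.split(",") if v.strip()]
        # Repeat the preceding keyword in front of every value.
        terms = ["{} {}".format(keyword, v) for v in values]
        return "(" + " {} ".format(joiner).join(terms) + ")"

    return _SHORTHAND.sub(_expand, selection)


print(expand_shorthand("prop {x < 10, y < 10, z < 10}"))
print(expand_shorthand("name N and resname [GLY, LEU]"))
```

Because the expansion happens before parsing, the result is an ordinary selection string, so it should coexist with an implicit VMD-style 'or' handled later by the parser.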
We could try and reuse Python syntax more:

corner = u.selectAtoms('prop x < 10 & y < 10')
residues = u.selectAtoms('resname ALA | LEU') = u.selectAtoms('resname ALA LEU')

So the lack of separator defaults to |
As a sidenote, it might be worthwhile to look at pyparsing in case @nmichaud's hand-crafted parser becomes difficult to extend. (ProDy uses it, for example.) It might also give us the flexibility to generate multiple selection languages for MDAnalysis, e.g. the user could choose whether she wants "MDA native", "VMD", "CHARMM", "PyMOL", etc. (Truth be told, I don't know how difficult it would be to describe these different selection languages but at least with an abstract framework like
Good idea! The parser was hacked together in an afternoon and is definitely
Ah, pyparsing does seem the way to go. I'll read up on it and let you guys know if I find it feasible.
Allows selections such as:

u.select_atoms('name C N') == u.select_atoms('name C or name N')
u.select_atoms('name N and resname GLY LEU') == 'name N and (resname GLY or resname LEU)'

Defines how to detect keywords in selection strings (`is_keyword`)

Issue #347
Implicit or in selections (Issue #345)

Allows selections such as:

```python
u.select_atoms('name C* N* O*')
# == u.select_atoms('name C* or name N* or name O*')
u.select_atoms('name N and resname GLY LEU')
# == 'name N and (resname GLY or resname LEU)'
u.select_atoms('resid 1:10 14 15')
u.select_atoms('resid 1:100 200-300 and not resname MET GLY')
```

Defines how to detect keywords in selection strings with `is_keyword` (Issue #347).
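To make the implicit 'or' concrete, here is a rough sketch of the idea from the discussion: a keyword always consumes its first argument, then keeps consuming tokens until the next keyword shows up. It is not the code from the commit. It rewrites the selection string instead of hooking into the parser, the keyword and operator sets are small made-up samples, and only the name `is_keyword` is borrowed from the commit message (its body here is an assumption).

```python
# Hypothetical, incomplete token sets for illustration only; the real
# parser's keyword table is not shown in this thread.
VALUE_KEYWORDS = {"name", "resname", "resid", "segid", "type"}
OPERATORS = {"and", "or", "not", "(", ")"}


def is_keyword(token):
    """Return True if the token ends a run of argument values."""
    return token in VALUE_KEYWORDS or token in OPERATORS


def expand_implicit_or(selection):
    """Rewrite 'kw a b c' as '(kw a or kw b or kw c)'."""
    tokens = selection.replace("(", " ( ").replace(")", " ) ").split()
    out = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token not in VALUE_KEYWORDS:
            out.append(token)
            i += 1
            continue
        # The keyword always consumes its first argument, even if that
        # argument looks like a keyword (e.g. an atom named 'and').
        values = tokens[i + 1:i + 2]
        i += 1 + len(values)
        # Further tokens are extra values until the next keyword/operator.
        while i < len(tokens) and not is_keyword(tokens[i]):
            values.append(tokens[i])
            i += 1
        terms = ["{} {}".format(token, v) for v in values]
        if len(terms) > 1:
            out.append("(" + " or ".join(terms) + ")")
        elif terms:
            out.append(terms[0])
        else:
            out.append(token)  # keyword with no argument: leave untouched
    return " ".join(out)


print(expand_implicit_or("name N and resname GLY LEU"))
print(expand_implicit_or("resid 1:100 200-300 and not resname MET GLY"))
```

The price of this scheme is that only the first argument may collide with a keyword: in 'name C and', the trailing 'and' is read as an operator, not as a second atom name.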
I'm sure you've had the case where you end up typing something like:
VMD syntax, in contrast, takes a list of values as an implicit 'or':
Having something like this in MDAnalysis would save a lot of typing, and arguably improve readability.
I am against doing it implicitly like VMD because I think it'll land us in trouble sooner or later. What I propose are dedicated delimiters, like this:
I've actually implemented this and it works fine. I took the easier approach where the brackets and dividing commas are replaced at the preprocessing stage by the preceding keyword and concatenated with `or`; the whole thing gets wrapped in parentheses:

This is purely syntactic sugar, with no checking for whatever is before or inside the brackets.

I went ahead and implemented `{}` for the `and` shorthand too. (Though I can see it won't have so much use). At first I thought the curly brackets might conflict with formatting, but it's just a matter of escaping them properly.

Feedback appreciated.