-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stricter formula syntax #986
Stricter formula syntax #986
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this @merkys! The test failures are just that the proportion "1" was previously being caught by the Python validators and not the regexp, so we will just need to update the expected error message in the tests. This should allow us to remove some other code too, which I can commit on top of your PR (if that is okay with you!)
The regexp also needs to be updated in the OpenAPI spec which is the cause of the other failure in the openapi
and pre-commit
checks.
optimade/models/utils.py
Outdated
@@ -202,7 +202,7 @@ def anonymous_element_generator(): | |||
ANONYMOUS_ELEMENTS = tuple(itertools.islice(anonymous_element_generator(), 150)) | |||
""" Returns the first 150 values of the anonymous element generator. """ | |||
|
|||
CHEMICAL_FORMULA_REGEXP = r"^([A-Z][a-z]?\d*)*$" | |||
CHEMICAL_FORMULA_REGEXP = r"^([A-Z][a-z]?([02-9]|[1-9]\d+)?)+$" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also exclude "0"-proportioned elements right? e.g.
CHEMICAL_FORMULA_REGEXP = r"^([A-Z][a-z]?([02-9]|[1-9]\d+)?)+$" | |
CHEMICAL_FORMULA_REGEXP = r"^([A-Z][a-z]?([2-9]|[1-9]\d+)?)+$" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I was not sure about 0-proportioned elements as per specification, but I do not think they make any sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If 0 is not allowed, we should mention this in the OPTIMADE standard.
Now it says that the API implementation can choose how to do the rounding. A compound like LaO0.7F0.3FeAs could thus be written as LaOF0FeAs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not quite sure that having proportion of 0 is a reasonable approximation. In your example I would multiply all the proportions by 10 and thus preserve the original proportion. Rounding 0.3 to 0 loses a lot of information. But you are right about the specification allowing one to choose whichever approximation one wants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would indeed be more informative to multiply by 10. Rounding to the nearest integer can however be a very simple way to generate the chemical_formula_reduced field from the chemical_formula_descriptive. So I would not exclude the possibility at least some implementers have tried this.
It seems I can't push to your fork @merkys, would you mind giving me collaborator access? I have updated the test cases and the regexp to disallow 0 too |
I have just added you as a collaborator - could you please try pushing again? |
That worked, thanks! I'll let @CasperWA or @JPBergsma have a look over this too, since I have also made edits... |
@ml-evs do you by the way know why |
Codecov Report
@@ Coverage Diff @@
## master #986 +/- ##
==========================================
- Coverage 92.92% 92.88% -0.04%
==========================================
Files 67 67
Lines 3787 3782 -5
==========================================
- Hits 3519 3513 -6
- Misses 268 269 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
This is a bit of an awkard quirk between OpenAPI, pydantic and the way we define "SHOULD"/"OPTIONAL" fields. SHOULD fields are listed under Perhaps it would be clearer if we additionally set all optional fields to be nullable too. |
@@ -180,7 +180,7 @@ def test_bad_structures(bad_structures, mapper): | |||
), | |||
( | |||
{"chemical_formula_anonymous": "A44B15C9D4E3F2GHI0J0K0L0"}, | |||
"chemical_formula_anonymous 'A44B15C9D4E3F2GHI0J0K0L0' cannot contain chemical proportion of 0.", | |||
"string does not match regex", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would have been nice if we could keep the old error messages. They are much more informative and easier to reas than a regular expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, this is what we lose with this approach. One workaround could be to keep the pydantic validators and apply them before the regexp with pre=True
but this adds some complexity and overhead. Any thoughts @CasperWA?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps you could use the pydantic validators after the check with the regular expression has found an error.
There is no use in checking the same thing twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be fine. While it's less informative, it's still as informative as it needs to be. One can, technically, copy paste our regular expression and try it out in any of the many online regex testers - or just do it offline locally to try and find the culprit sub-string.
In essence, while it would be nice to be extra helpful, I think the hassle of re-implementing that is much worse than the nice "cleanup" we're getting here.
72f5ce5
to
89e0d3f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great to me.
Just to reiterate what I'm seeing happening (mainly):
- You're elegantly moving the tests for
0
and1
stoichiometries/"concentrations", whatever... to a regular expression. Neat! - Through the regular expression you're changing/removing the possibility for the values to be empty strings.
Is this technically correct? I could probably argue for it being correct, since in that case the specification probably intends for the value to benull
? I guess this is your interpretation as well.
@@ -180,7 +180,7 @@ def test_bad_structures(bad_structures, mapper): | |||
), | |||
( | |||
{"chemical_formula_anonymous": "A44B15C9D4E3F2GHI0J0K0L0"}, | |||
"chemical_formula_anonymous 'A44B15C9D4E3F2GHI0J0K0L0' cannot contain chemical proportion of 0.", | |||
"string does not match regex", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be fine. While it's less informative, it's still as informative as it needs to be. One can, technically, copy paste our regular expression and try it out in any of the many online regex testers - or just do it offline locally to try and find the culprit sub-string.
In essence, while it would be nice to be extra helpful, I think the hassle of re-implementing that is much worse than the nice "cleanup" we're getting here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other added benefit is that external OpenAPI-based tools will now do a better job of validating responses outside of this package! Thanks @merkys!
- Formulas are unlikely to be empty strings. - Integer proportions cannot be equal to 1.
Co-authored-by: Johan Bergsma <JPBergsma@users.noreply.github.com>
89e0d3f
to
14cf5e2
Compare
I have rebased into two commits in case we want to revert to the old pydantic validators at a later date. |
I mixed up requirements arising from the specification and from "common sense" (mine of course). Thus:
It should be much better to forward these "common sense" restrictions for the discussion and possible enshrinement in the specification. I will do that. |
Making formula syntax closer follow the specification by forbidding integer proportions of elements equal to 1 (can they be equal to 0?). Forbidding empty string formula as well, as they do not make much sense.