Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stricter formula syntax #986

Merged
merged 2 commits into from
Oct 26, 2021

Conversation

merkys
Copy link
Member

@merkys merkys commented Oct 25, 2021

Making formula syntax closer follow the specification by forbidding integer proportions of elements equal to 1 (can they be equal to 0?). Forbidding empty string formula as well, as they do not make much sense.

Copy link
Member

@ml-evs ml-evs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @merkys! The test failures are just that the proportion "1" was previously being caught by the Python validators and not the regexp, so we will just need to update the expected error message in the tests. This should allow us to remove some other code too, which I can commit on top of your PR (if that is okay with you!)

The regexp also needs to be updated in the OpenAPI spec which is the cause of the other failure in the openapi and pre-commit checks.

@@ -202,7 +202,7 @@ def anonymous_element_generator():
ANONYMOUS_ELEMENTS = tuple(itertools.islice(anonymous_element_generator(), 150))
""" Returns the first 150 values of the anonymous element generator. """

CHEMICAL_FORMULA_REGEXP = r"^([A-Z][a-z]?\d*)*$"
CHEMICAL_FORMULA_REGEXP = r"^([A-Z][a-z]?([02-9]|[1-9]\d+)?)+$"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also exclude "0"-proportioned elements right? e.g.

Suggested change
CHEMICAL_FORMULA_REGEXP = r"^([A-Z][a-z]?([02-9]|[1-9]\d+)?)+$"
CHEMICAL_FORMULA_REGEXP = r"^([A-Z][a-z]?([2-9]|[1-9]\d+)?)+$"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I was not sure about 0-proportioned elements as per specification, but I do not think they make any sense.

Copy link
Contributor

@JPBergsma JPBergsma Oct 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If 0 is not allowed, we should mention this in the OPTIMADE standard.
Now it says that the API implementation can choose how to do the rounding. A compound like LaO0.7F0.3FeAs could thus be written as LaOF0FeAs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not quite sure that having proportion of 0 is a reasonable approximation. In your example I would multiply all the proportions by 10 and thus preserve the original proportion. Rounding 0.3 to 0 loses a lot of information. But you are right about the specification allowing one to choose whichever approximation one wants.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would indeed be more informative to multiply by 10. Rounding to the nearest integer can however be a very simple way to generate the chemical_formula_reduced field from the chemical_formula_descriptive. So I would not exclude the possibility at least some implementers have tried this.

@ml-evs
Copy link
Member

ml-evs commented Oct 25, 2021

It seems I can't push to your fork @merkys, would you mind giving me collaborator access? I have updated the test cases and the regexp to disallow 0 too

@merkys
Copy link
Member Author

merkys commented Oct 25, 2021

It seems I can't push to your fork @merkys, would you mind giving me collaborator access? I have updated the test cases and the regexp to disallow 0 too

I have just added you as a collaborator - could you please try pushing again?

@ml-evs
Copy link
Member

ml-evs commented Oct 25, 2021

It seems I can't push to your fork @merkys, would you mind giving me collaborator access? I have updated the test cases and the regexp to disallow 0 too

I have just added you as a collaborator - could you please try pushing again?

That worked, thanks! I'll let @CasperWA or @JPBergsma have a look over this too, since I have also made edits...

@merkys
Copy link
Member Author

merkys commented Oct 25, 2021

@ml-evs do you by the way know why chemical_formula_hill is not nullable while specification says that it "MAY be null"? Maybe I just do not understand the meaning of nullable...

@codecov
Copy link

codecov bot commented Oct 25, 2021

Codecov Report

Merging #986 (d8470c9) into master (8d94df0) will decrease coverage by 0.03%.
The diff coverage is 100.00%.

❗ Current head d8470c9 differs from pull request most recent head 14cf5e2. Consider uploading reports for the commit 14cf5e2 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master     #986      +/-   ##
==========================================
- Coverage   92.92%   92.88%   -0.04%     
==========================================
  Files          67       67              
  Lines        3787     3782       -5     
==========================================
- Hits         3519     3513       -6     
- Misses        268      269       +1     
Flag Coverage Δ
project 92.88% <100.00%> (-0.04%) ⬇️
validator 92.88% <100.00%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
optimade/models/structures.py 95.38% <ø> (-0.47%) ⬇️
optimade/models/utils.py 91.56% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8d94df0...14cf5e2. Read the comment docs.

@ml-evs
Copy link
Member

ml-evs commented Oct 25, 2021

@ml-evs do you by the way know why chemical_formula_hill is not nullable while specification says that it "MAY be null"? Maybe I just do not understand the meaning of nullable...

This is a bit of an awkard quirk between OpenAPI, pydantic and the way we define "SHOULD"/"OPTIONAL" fields. SHOULD fields are listed under required but have nullable: true, whereas OPTIONAL fields are not listed under required but do not have nullable: true.

Perhaps it would be clearer if we additionally set all optional fields to be nullable too.

@@ -180,7 +180,7 @@ def test_bad_structures(bad_structures, mapper):
),
(
{"chemical_formula_anonymous": "A44B15C9D4E3F2GHI0J0K0L0"},
"chemical_formula_anonymous 'A44B15C9D4E3F2GHI0J0K0L0' cannot contain chemical proportion of 0.",
"string does not match regex",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would have been nice if we could keep the old error messages. They are much more informative and easier to reas than a regular expression.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, this is what we lose with this approach. One workaround could be to keep the pydantic validators and apply them before the regexp with pre=True but this adds some complexity and overhead. Any thoughts @CasperWA?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps you could use the pydantic validators after the check with the regular expression has found an error.
There is no use in checking the same thing twice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be fine. While it's less informative, it's still as informative as it needs to be. One can, technically, copy paste our regular expression and try it out in any of the many online regex testers - or just do it offline locally to try and find the culprit sub-string.

In essence, while it would be nice to be extra helpful, I think the hassle of re-implementing that is much worse than the nice "cleanup" we're getting here.

CasperWA
CasperWA previously approved these changes Oct 26, 2021
Copy link
Member

@CasperWA CasperWA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me.

Just to reiterate what I'm seeing happening (mainly):

  1. You're elegantly moving the tests for 0 and 1 stoichiometries/"concentrations", whatever... to a regular expression. Neat!
  2. Through the regular expression you're changing/removing the possibility for the values to be empty strings.
    Is this technically correct? I could probably argue for it being correct, since in that case the specification probably intends for the value to be null? I guess this is your interpretation as well.

@@ -180,7 +180,7 @@ def test_bad_structures(bad_structures, mapper):
),
(
{"chemical_formula_anonymous": "A44B15C9D4E3F2GHI0J0K0L0"},
"chemical_formula_anonymous 'A44B15C9D4E3F2GHI0J0K0L0' cannot contain chemical proportion of 0.",
"string does not match regex",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be fine. While it's less informative, it's still as informative as it needs to be. One can, technically, copy paste our regular expression and try it out in any of the many online regex testers - or just do it offline locally to try and find the culprit sub-string.

In essence, while it would be nice to be extra helpful, I think the hassle of re-implementing that is much worse than the nice "cleanup" we're getting here.

ml-evs
ml-evs previously approved these changes Oct 26, 2021
Copy link
Member

@ml-evs ml-evs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other added benefit is that external OpenAPI-based tools will now do a better job of validating responses outside of this package! Thanks @merkys!

merkys and others added 2 commits October 26, 2021 15:33
- Formulas are unlikely to be empty strings.
- Integer proportions cannot be equal to 1.
Co-authored-by: Johan Bergsma <JPBergsma@users.noreply.github.com>
@ml-evs ml-evs dismissed stale reviews from CasperWA and themself via 14cf5e2 October 26, 2021 14:33
@ml-evs
Copy link
Member

ml-evs commented Oct 26, 2021

I have rebased into two commits in case we want to revert to the old pydantic validators at a later date.

@ml-evs ml-evs enabled auto-merge (rebase) October 26, 2021 14:34
@merkys
Copy link
Member Author

merkys commented Oct 26, 2021

I mixed up requirements arising from the specification and from "common sense" (mine of course). Thus:

  • To stay true to the spec, only proportions of 1 should be forbidden via regular expressions;
  • To stay true to "common sense" (mine), empty formulas and proportions of 0 should be forbidden.

It should be much better to forward these "common sense" restrictions for the discussion and possible enshrinement in the specification. I will do that.

@ml-evs ml-evs merged commit 32d2cdf into Materials-Consortia:master Oct 26, 2021
@merkys merkys deleted the stricter-formula-syntax branch October 26, 2021 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants