Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add model validators and regexp for chemical formulae fields #547

Merged
merged 3 commits into from
Oct 30, 2020

Conversation

ml-evs
Copy link
Member

@ml-evs ml-evs commented Oct 8, 2020

Closes #546 by adding validators and regexps to the chemical_formula_* OPTIMADE fields.

  • Adds a test harness for modifying a single 'good' structure, to more easily test validator features beyond the bulk good/bad tests
  • Does not ensure consistency between various formulae and species fields, which maybe it should... raises the question of how heavyweight our validators should be, and whether pydantic validation can/should be disabled by an implementation?

@ml-evs ml-evs added priority/medium Issue or PR with a consensus of medium priority python Pull requests that update Python code models For issues related to the pydantic models directly labels Oct 8, 2020
@ml-evs ml-evs requested review from shyamd and CasperWA October 8, 2020 15:53
@ml-evs
Copy link
Member Author

ml-evs commented Oct 8, 2020

Have just discovered that all of our test data has the wrong element ordering for chemical_formula_reduced! Will fix this evening.

@codecov
Copy link

codecov bot commented Oct 9, 2020

Codecov Report

Merging #547 into master will increase coverage by 0.10%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #547      +/-   ##
==========================================
+ Coverage   91.67%   91.77%   +0.10%     
==========================================
  Files          62       62              
  Lines        3182     3221      +39     
==========================================
+ Hits         2917     2956      +39     
  Misses        265      265              
Flag Coverage Δ
#project 91.77% <100.00%> (+0.10%) ⬆️
#validator 64.63% <75.00%> (+0.11%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
optimade/models/structures.py 95.79% <100.00%> (+0.65%) ⬆️
optimade/models/utils.py 91.56% <100.00%> (+1.15%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6f34399...0ed2f49. Read the comment docs.

@ml-evs
Copy link
Member Author

ml-evs commented Oct 9, 2020

I think that's caught them all now...

optimade/models/structures.py Outdated Show resolved Hide resolved
optimade/models/utils.py Outdated Show resolved Hide resolved
@ml-evs
Copy link
Member Author

ml-evs commented Oct 16, 2020

Penny for anyone's thoughts? @CasperWA @shyamd

Copy link
Member

@CasperWA CasperWA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @ml-evs !
This is some of the more intricate stuff that hasn't been fully specified in the specification, only in a human-readable way, so we might want to contribute some of this back to the specification. I'm thinking specifically of the regex.

I've put in some requested changes, suggested changes, observations, and other comments :)

optimade/models/structures.py Outdated Show resolved Hide resolved
optimade/models/structures.py Outdated Show resolved Hide resolved
optimade/models/structures.py Outdated Show resolved Hide resolved
optimade/models/structures.py Show resolved Hide resolved
optimade/models/structures.py Show resolved Hide resolved
optimade/models/utils.py Show resolved Hide resolved
tests/models/conftest.py Outdated Show resolved Hide resolved
tests/models/conftest.py Show resolved Hide resolved
tests/models/test_structures.py Outdated Show resolved Hide resolved
tests/models/test_utils.py Show resolved Hide resolved
@ml-evs
Copy link
Member Author

ml-evs commented Oct 19, 2020

From a resolved suggestion above:

Since fields are not cross-validated, I've added an extra test case with both C and H in them as "deformities". Do you have any opinions on my second point above? Should we add pairwise validators for all correlated fields?

@ml-evs
Copy link
Member Author

ml-evs commented Oct 19, 2020

I think that's dealt with everything, except my question above (which is for another PR).

Are we happy to add the regexp to our OpenAPI specification for now, but potentially demote it down to a validator property pending the meeting next week?

Copy link
Member

@CasperWA CasperWA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only minor changes. Thanks @ml-evs.

optimade/models/structures.py Outdated Show resolved Hide resolved
tests/models/test_structures.py Outdated Show resolved Hide resolved
@CasperWA
Copy link
Member

Also, the changes in this PR introduces a lot of ValueError raises that are not tested. Do you think it's reasonable to add tests for these or simply leave it be? In an ideal world it would be nice to know that the raises are valid exits for the validators and it's not a code block that will never actually be reached.

@ml-evs
Copy link
Member Author

ml-evs commented Oct 27, 2020

Also, the changes in this PR introduces a lot of ValueError raises that are not tested. Do you think it's reasonable to add tests for these or simply leave it be? In an ideal world it would be nice to know that the raises are valid exits for the validators and it's not a code block that will never actually be reached.

ValueErrors get turned into the pydantic ValidationErrors that we are testing for with all our entry tests right? At the very least we have a match on the message that it returns an error corresponding to the field we think is incorrect. Everything I've added here should be tested in the new tests that use the good_structure fixture, plus a lot of things we weren't testing directly before

@ml-evs ml-evs requested a review from CasperWA October 27, 2020 12:00
@CasperWA
Copy link
Member

Also, the changes in this PR introduces a lot of ValueError raises that are not tested. Do you think it's reasonable to add tests for these or simply leave it be? In an ideal world it would be nice to know that the raises are valid exits for the validators and it's not a code block that will never actually be reached.

ValueErrors get turned into the pydantic ValidationErrors that we are testing for with all our entry tests right? At the very least we have a match on the message that it returns an error corresponding to the field we think is incorrect. Everything I've added here should be tested in the new tests that use the good_structure fixture, plus a lot of things we weren't testing directly before

Right. It just seems 6 new lines are missed for the Structure model validators, possibly not being tested? See here.

@ml-evs
Copy link
Member Author

ml-evs commented Oct 27, 2020

Right. It just seems 6 new lines are missed for the Structure model validators, possibly not being tested? See here.

Ah, those are the checks that aren't triggered as they are already caught by the overall field regexp. How much do we want to placate the coverage gods?

@CasperWA
Copy link
Member

Right. It just seems 6 new lines are missed for the Structure model validators, possibly not being tested? See here.

Ah, those are the checks that aren't triggered as they are already caught by the overall field regexp. How much do we want to placate the coverage gods?

So what you're saying is that they are now never reached?... :)

@ml-evs
Copy link
Member Author

ml-evs commented Oct 27, 2020

Right. It just seems 6 new lines are missed for the Structure model validators, possibly not being tested? See here.

Ah, those are the checks that aren't triggered as they are already caught by the overall field regexp. How much do we want to placate the coverage gods?

So what you're saying is that they are now never reached?... :)

Well, the checks themselves are, if they raise an error then we can diagnose any future problems that we have introduced into the regexp. Equally, if we decide tomorrow that we shouldn't include the regexp in the schema, then we'll have to rely on these checks again (or at least stop fastapi from putting the regexp in the schema itself)

@CasperWA
Copy link
Member

So what you're saying is that they are now never reached?... :)

Well, the checks themselves are, if they raise an error then we can diagnose any future problems that we have introduced into the regexp. Equally, if we decide tomorrow that we shouldn't include the regexp in the schema, then we'll have to rely on these checks again (or at least stop fastapi from putting the regexp in the schema itself)

Personally, I'd prefer the cleaner option; to cut away the fat/non-used code.
This will diminish confusion and if we wish to revert, we can always do this from the git history.

@ml-evs
Copy link
Member Author

ml-evs commented Oct 27, 2020

So what you're saying is that they are now never reached?... :)

Well, the checks themselves are, if they raise an error then we can diagnose any future problems that we have introduced into the regexp. Equally, if we decide tomorrow that we shouldn't include the regexp in the schema, then we'll have to rely on these checks again (or at least stop fastapi from putting the regexp in the schema itself)

Personally, I'd prefer the cleaner option; to cut away the fat/non-used code.
This will diminish confusion and if we wish to revert, we can always do this from the git history.

Understood; let's park this until the meeting tomorrow, in case they don't like the regexp.

  • If we can adopt the regexp in the spec, then the extra checks can be removed.
  • Alternatively, w could run the specific field validators with pre=True or always=True so they are triggered even when the regexp doesn't match, to provide a more informative error

@ml-evs ml-evs added the on-hold For PRs/issues that are on-hold for an unspecified time label Oct 27, 2020
@ml-evs ml-evs mentioned this pull request Oct 28, 2020
@ml-evs ml-evs removed the on-hold For PRs/issues that are on-hold for an unspecified time label Oct 28, 2020
@ml-evs
Copy link
Member Author

ml-evs commented Oct 28, 2020

In the OPTIMADE meeting today it was agreed that putting the chemical formula regexp into the schema is good, so we can proceed here and remove the extra validators that never fail. The discussion also included the possibility of expanding the regexp to also check for element symbols.

@ml-evs ml-evs added the schema Concerns the schema models label Oct 29, 2020
@ml-evs ml-evs force-pushed the ml-evs/validate_formulae branch 2 times, most recently from 6df223c to b491698 Compare October 29, 2020 01:04
@ml-evs
Copy link
Member Author

ml-evs commented Oct 29, 2020

In the OPTIMADE meeting today it was agreed that putting the chemical formula regexp into the schema is good, so we can proceed here and remove the extra validators that never fail. The discussion also included the possibility of expanding the regexp to also check for element symbols.

Regexp has been left in, any validators added by this PR that were not being hit were either removed or tests were added to hit them. Should be good to go.

CasperWA
CasperWA previously approved these changes Oct 30, 2020
Copy link
Member

@CasperWA CasperWA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good on my part.
I still did a few suggestions, but they are not worth not approving for :)

optimade/models/structures.py Show resolved Hide resolved
optimade/models/structures.py Outdated Show resolved Hide resolved
@ml-evs ml-evs merged commit 2212107 into master Oct 30, 2020
@ml-evs ml-evs deleted the ml-evs/validate_formulae branch October 30, 2020 11:58
@ml-evs ml-evs added the enhancement New feature or request label Oct 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request models For issues related to the pydantic models directly priority/medium Issue or PR with a consensus of medium priority python Pull requests that update Python code schema Concerns the schema models
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Chemical formulae are not properly validated on model creation
2 participants