Add model validators and regexp for chemical formulae fields #547

ml-evs · 2020-10-08T15:53:16Z

Closes #546 by adding validators and regexps to the chemical_formula_* OPTIMADE fields.

Adds a test harness for modifying a single 'good' structure, to more easily test validator features beyond the bulk good/bad tests.
Does not ensure consistency between the various formulae and species fields, which maybe it should. This raises the question of how heavyweight our validators should be, and whether pydantic validation can/should be disabled by an implementation.
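
For readers unfamiliar with this kind of check, here is a minimal sketch of a pattern-based formula test. The regexp and the function name are assumptions written for this illustration; they are not necessarily what was merged into `optimade/models/utils.py`.

```python
import re

# Hypothetical pattern (an assumption for this sketch): one or more
# element-like tokens, i.e. a capital letter plus an optional lowercase
# letter, each optionally followed by an integer count of 2 or more.
FORMULA_PATTERN = re.compile(r"^([A-Z][a-z]?([2-9]|[1-9]\d+)?)+$")


def looks_like_formula(value: str) -> bool:
    """Return True if `value` is syntactically formula-like."""
    return FORMULA_PATTERN.fullmatch(value) is not None
```

Note that a purely syntactic pattern like this accepts tokens that are not real elements (for example `Xx2`), so it only guards the overall shape of the string.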

ml-evs · 2020-10-08T16:09:53Z

Have just discovered that all of our test data has the wrong element ordering for chemical_formula_reduced! Will fix this evening.
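
A rough sketch of the ordering problem mentioned here, assuming that the element symbols in chemical_formula_reduced are expected to appear in alphabetical order. The helper names are hypothetical and are not taken from the PR.

```python
import re


def symbols_in_formula(formula: str) -> list:
    """Extract element-like tokens (capital + optional lowercase letter)."""
    return re.findall(r"[A-Z][a-z]?", formula)


def is_alphabetically_ordered(formula: str) -> bool:
    """Return True if the symbols appear in sorted order."""
    symbols = symbols_in_formula(formula)
    return symbols == sorted(symbols)
```

Under this check `O2Si` passes while `SiO2` does not, which is the sort of mismatch that can hide in hand-written test data.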

codecov · 2020-10-09T10:43:28Z

Codecov Report

Merging #547 into master will increase coverage by 0.10%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #547      +/-   ##
==========================================
+ Coverage   91.67%   91.77%   +0.10%     
==========================================
  Files          62       62              
  Lines        3182     3221      +39     
==========================================
+ Hits         2917     2956      +39     
  Misses        265      265

Flag	Coverage Δ
#project	`91.77% <100.00%> (+0.10%)`	⬆️
#validator	`64.63% <75.00%> (+0.11%)`	⬆️

Impacted Files	Coverage Δ
optimade/models/structures.py	`95.79% <100.00%> (+0.65%)`	⬆️
optimade/models/utils.py	`91.56% <100.00%> (+1.15%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6f34399...0ed2f49.

ml-evs · 2020-10-09T10:56:06Z

I think that's caught them all now...

optimade/models/structures.py

optimade/models/utils.py

ml-evs · 2020-10-16T14:39:02Z

Penny for anyone's thoughts? @CasperWA @shyamd

CasperWA

Great work @ml-evs !
This is some of the more intricate stuff that hasn't been fully specified in the specification, only in a human-readable way, so we might want to contribute some of this back to the specification. I'm thinking specifically of the regex.

I've put in some requested changes, suggested changes, observations, and other comments :)

optimade/models/structures.py

optimade/models/utils.py

tests/models/conftest.py

tests/models/test_structures.py

tests/models/test_utils.py

ml-evs · 2020-10-19T13:32:19Z

From a resolved suggestion above:

Since fields are not cross-validated, I've added an extra test case with both C and H in them as "deformities". Do you have any opinions on my second point above? Should we add pairwise validators for all correlated fields?
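
To make the question about pairwise validators concrete, here is one possible shape for such a check. It is a sketch only: it assumes the structure is a plain dict with an `elements` list and a `chemical_formula_reduced` string, and the function name is hypothetical.

```python
import re


def formula_matches_elements(structure: dict) -> bool:
    """Check that the symbols in the reduced formula are exactly `elements`."""
    formula_symbols = re.findall(
        r"[A-Z][a-z]?", structure["chemical_formula_reduced"]
    )
    return sorted(set(formula_symbols)) == sorted(structure["elements"])
```

A structure with `elements` set to `["C", "H"]` but a formula of `H2O` would fail this check, even though each field is valid on its own.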

ml-evs · 2020-10-19T13:54:43Z

I think that's dealt with everything, except my question above (which is for another PR).

Are we happy to add the regexp to our OpenAPI specification for now, but potentially demote it down to a validator property pending the meeting next week?

CasperWA

Only minor changes. Thanks @ml-evs.

optimade/models/structures.py

tests/models/test_structures.py

CasperWA · 2020-10-27T11:20:05Z

Also, the changes in this PR introduce a lot of ValueError raises that are not tested. Do you think it's reasonable to add tests for these or simply leave it be? In an ideal world it would be nice to know that the raises are valid exits for the validators and it's not a code block that will never actually be reached.

ml-evs · 2020-10-27T11:46:16Z

> Also, the changes in this PR introduce a lot of ValueError raises that are not tested. Do you think it's reasonable to add tests for these or simply leave it be? In an ideal world it would be nice to know that the raises are valid exits for the validators and it's not a code block that will never actually be reached.

ValueErrors get turned into the pydantic ValidationErrors that we are testing for with all our entry tests, right? At the very least we have a match on the message that it returns an error corresponding to the field we think is incorrect. Everything I've added here should be tested in the new tests that use the good_structure fixture, plus a lot of things we weren't testing directly before.
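
The "one good structure plus a deformity" testing pattern can be sketched without pydantic as follows. Everything here is an assumption made for illustration: `GOOD_STRUCTURE`, `validate` and `error_for` are hypothetical names, and the real tests rely on the model's own validation rather than a hand-written function.

```python
import copy
import re

# Hypothetical minimal "good" structure used as the starting point.
GOOD_STRUCTURE = {"chemical_formula_reduced": "H2O", "elements": ["H", "O"]}


def validate(structure: dict) -> None:
    """Stand-in validator: raise ValueError naming the offending field."""
    formula = structure.get("chemical_formula_reduced")
    if not isinstance(formula, str) or not re.fullmatch(
        r"([A-Z][a-z]?\d*)+", formula
    ):
        raise ValueError(f"chemical_formula_reduced: invalid value {formula!r}")


def error_for(deformity: dict):
    """Apply a deformity to a copy of the good structure; return the error text."""
    bad = copy.deepcopy(GOOD_STRUCTURE)
    bad.update(deformity)
    try:
        validate(bad)
    except ValueError as exc:
        return str(exc)
    return None
```

A test can then assert that the returned message mentions the field that was deliberately broken, which is the same idea as matching on the error message in the entry tests.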

CasperWA · 2020-10-27T12:14:08Z

> > Also, the changes in this PR introduce a lot of ValueError raises that are not tested. Do you think it's reasonable to add tests for these or simply leave it be? In an ideal world it would be nice to know that the raises are valid exits for the validators and it's not a code block that will never actually be reached.
>
> ValueErrors get turned into the pydantic ValidationErrors that we are testing for with all our entry tests, right? At the very least we have a match on the message that it returns an error corresponding to the field we think is incorrect. Everything I've added here should be tested in the new tests that use the good_structure fixture, plus a lot of things we weren't testing directly before.

Right. It just seems 6 new lines are missed for the Structure model validators, possibly not being tested? See here.

ml-evs · 2020-10-27T12:26:30Z

> Right. It just seems 6 new lines are missed for the Structure model validators, possibly not being tested? See here.

Ah, those are the checks that aren't triggered as they are already caught by the overall field regexp. How much do we want to placate the coverage gods?

CasperWA · 2020-10-27T12:28:18Z

> > Right. It just seems 6 new lines are missed for the Structure model validators, possibly not being tested? See here.
>
> Ah, those are the checks that aren't triggered as they are already caught by the overall field regexp. How much do we want to placate the coverage gods?

So what you're saying is that they are now never reached?... :)

ml-evs · 2020-10-27T12:41:46Z

> > > Right. It just seems 6 new lines are missed for the Structure model validators, possibly not being tested? See here.
>
> > Ah, those are the checks that aren't triggered as they are already caught by the overall field regexp. How much do we want to placate the coverage gods?
>
> So what you're saying is that they are now never reached?... :)

Well, the checks themselves are, if they raise an error then we can diagnose any future problems that we have introduced into the regexp. Equally, if we decide tomorrow that we shouldn't include the regexp in the schema, then we'll have to rely on these checks again (or at least stop fastapi from putting the regexp in the schema itself)

CasperWA · 2020-10-27T13:13:25Z

> > So what you're saying is that they are now never reached?... :)
>
> Well, the checks themselves are, if they raise an error then we can diagnose any future problems that we have introduced into the regexp. Equally, if we decide tomorrow that we shouldn't include the regexp in the schema, then we'll have to rely on these checks again (or at least stop fastapi from putting the regexp in the schema itself)

Personally, I'd prefer the cleaner option; to cut away the fat/non-used code.
This will diminish confusion and if we wish to revert, we can always do this from the git history.

ml-evs · 2020-10-27T13:21:21Z

> > > So what you're saying is that they are now never reached?... :)
>
> > Well, the checks themselves are, if they raise an error then we can diagnose any future problems that we have introduced into the regexp. Equally, if we decide tomorrow that we shouldn't include the regexp in the schema, then we'll have to rely on these checks again (or at least stop fastapi from putting the regexp in the schema itself)
>
> Personally, I'd prefer the cleaner option; to cut away the fat/non-used code.
> This will diminish confusion and if we wish to revert, we can always do this from the git history.

Understood; let's park this until the meeting tomorrow, in case they don't like the regexp.

If we can adopt the regexp in the spec, then the extra checks can be removed.
Alternatively, we could run the specific field validators with pre=True or always=True so they are triggered even when the regexp doesn't match, to provide a more informative error.
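
The intent of that alternative is that specific checks get a chance to report before the catch-all pattern rejects the value. The plain-Python sketch below illustrates only that ordering; it does not use pydantic, and the pattern, messages and function name are assumptions made for this example.

```python
import re

# Hypothetical catch-all pattern, as in a schema-level regexp.
FORMULA_PATTERN = re.compile(r"([A-Z][a-z]?([2-9]|[1-9]\d+)?)+")


def diagnose_formula(value: str):
    """Return a specific error message, or None if no problem is found."""
    # Specific check 1: a standalone "1" (not part of a larger number).
    if re.search(r"(?<!\d)1(?!\d)", value):
        return "a proportion of 1 must be omitted"
    # Specific check 2: element-like tokens must be in sorted order.
    symbols = re.findall(r"[A-Z][a-z]?", value)
    if symbols != sorted(symbols):
        return "elements must be in alphabetical order"
    # Generic fallback: the catch-all pattern, with a less helpful message.
    if not FORMULA_PATTERN.fullmatch(value):
        return "value does not match the formula pattern"
    return None
```

With this ordering, `H1O` and `OH2` each produce a targeted message, while something like `h2o` falls through to the generic one.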

ml-evs · 2020-10-28T23:50:44Z

In the OPTIMADE meeting today it was agreed that putting the chemical formula regexp into the schema is good, so we can proceed here and remove the extra validators that never fail. The discussion also included the possibility of expanding the regexp to also check for element symbols.
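
One way the regexp could be expanded to check element symbols is to build the alternation from a list of symbols. This is a sketch under stated assumptions: the symbol list is deliberately truncated for brevity, and the names `SYMBOLS` and `build_formula_pattern` are hypothetical.

```python
import re

# Deliberately truncated subset of element symbols, for illustration only.
SYMBOLS = ["H", "He", "C", "N", "O", "Si", "Ge"]


def build_formula_pattern(symbols):
    """Compile a pattern that only accepts the given element symbols."""
    # Try longer symbols first so "He" is attempted before "H"
    # (backtracking would also find it; this just keeps matching direct).
    alternation = "|".join(sorted(symbols, key=len, reverse=True))
    return re.compile(rf"(({alternation})([2-9]|[1-9]\d+)?)+")


PATTERN = build_formula_pattern(SYMBOLS)
```

Unlike a purely syntactic pattern, `PATTERN.fullmatch("Xx2")` returns `None` here, at the cost of a much longer regexp once all symbols are included.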

ml-evs · 2020-10-29T11:01:48Z

> In the OPTIMADE meeting today it was agreed that putting the chemical formula regexp into the schema is good, so we can proceed here and remove the extra validators that never fail. The discussion also included the possibility of expanding the regexp to also check for element symbols.

Regexp has been left in; any validators added by this PR that were not being hit were either removed or had tests added to hit them. Should be good to go.

CasperWA

All good on my part.
I still made a few suggestions, but they are not worth withholding approval for :)

optimade/models/structures.py

ml-evs added priority/medium Issue or PR with a consensus of medium priority python Pull requests that update Python code models For issues related to the pydantic models directly labels Oct 8, 2020

ml-evs requested review from shyamd and CasperWA October 8, 2020 15:53

ml-evs force-pushed the ml-evs/validate_formulae branch from 4326a16 to 235271d Compare October 9, 2020 10:38

ml-evs force-pushed the ml-evs/validate_formulae branch from 235271d to 0d13b74 Compare October 9, 2020 10:44

ml-evs force-pushed the ml-evs/validate_formulae branch from 0d13b74 to 1e9e387 Compare October 9, 2020 13:58

CasperWA requested changes Oct 16, 2020

ml-evs force-pushed the ml-evs/validate_formulae branch from d622806 to 50c7913 Compare October 16, 2020 17:21

ml-evs force-pushed the ml-evs/validate_formulae branch from f158311 to 3dcc5bc Compare October 19, 2020 13:53

ml-evs requested a review from CasperWA October 19, 2020 13:53

CasperWA requested changes Oct 27, 2020

ml-evs requested a review from CasperWA October 27, 2020 12:00

ml-evs added the on-hold For PRs/issues that are on-hold for an unspecified time label Oct 27, 2020

ml-evs mentioned this pull request Oct 28, 2020

Chemical symbols D and T #570

Closed

ml-evs removed the on-hold For PRs/issues that are on-hold for an unspecified time label Oct 28, 2020

ml-evs force-pushed the ml-evs/validate_formulae branch from 824e2ba to e011f28 Compare October 29, 2020 00:06

Fixed all incorrect formulae in tests and test data

9a65465

ml-evs force-pushed the ml-evs/validate_formulae branch from 408b8a6 to 87a1da5 Compare October 29, 2020 00:13

ml-evs added the schema Concerns the schema models label Oct 29, 2020

ml-evs force-pushed the ml-evs/validate_formulae branch 2 times, most recently from 6df223c to b491698 Compare October 29, 2020 01:04

CasperWA previously approved these changes Oct 30, 2020

ml-evs added 2 commits October 30, 2020 11:20

Added regexp and extra validators for formula fields, plus associated tests

552492f

Pin pydantic version to 1.6.1 (see #578)

0ed2f49

ml-evs dismissed CasperWA’s stale review via 0ed2f49 October 30, 2020 11:23

ml-evs force-pushed the ml-evs/validate_formulae branch from 8cd476a to 0ed2f49 Compare October 30, 2020 11:23

CasperWA approved these changes Oct 30, 2020

ml-evs merged commit 2212107 into master Oct 30, 2020

ml-evs deleted the ml-evs/validate_formulae branch October 30, 2020 11:58

This was referenced Oct 30, 2020

Relax models to allow for all SHOULD fields to be None #560

Merged

Update dependencies #578

Merged

ml-evs added the enhancement New feature or request label Oct 31, 2020

ml-evs mentioned this pull request Feb 17, 2021

Stricter validation of chemical formulas in OpenAPI schema #708

Closed

3 tasks
