Relax models to allow for all SHOULD fields to be None #560

ml-evs · 2020-10-19T21:29:37Z

This PR tracks the relaxation of our models to allow for None values for almost any field. The current approach for this is to make non-should fields have the type pydantic-type Optional but a default value of ..., which is interpreted as a required OpenAPI field that is allowed to be None.

This PR then also adds the OpenAPI nullable property to each of these fields, indicating that they can be null.

To-do:

Change Structure model field types to Optional[...] with default values of ...
Update model validators to allow them to be robust to missing data
Add combinatorial tests for missing data in structures
Only run server tests under Mongo in CI (rather than all tests)
Add nullable to schema

codecov · 2020-10-19T21:34:11Z

Codecov Report

Merging #560 (5fe03e5) into master (d207200) will increase coverage by 0.08%.
The diff coverage is 96.70%.

@@            Coverage Diff             @@
##           master     #560      +/-   ##
==========================================
+ Coverage   92.11%   92.20%   +0.08%     
==========================================
  Files          61       61              
  Lines        3210     3246      +36     
==========================================
+ Hits         2957     2993      +36     
  Misses        253      253

Flag	Coverage Δ
project	`92.20% <96.70%> (+0.08%)`	⬆️
validator	`65.89% <78.02%> (+0.94%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
optimade/models/structures.py	`96.35% <96.62%> (+0.60%)`	⬆️
optimade/models/entries.py	`97.50% <100.00%> (ø)`
optimade/server/warnings.py	`86.66% <100.00%> (+0.95%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d207200...5fe03e5. Read the comment docs.

ml-evs · 2020-10-28T17:52:03Z

Following discussions in the meeting today, it was agreed that we would go down the "required optional" key route for OPTIMADE should fields, so the changes in this PR should (and do) preserve the OpenAPI schema as it is. This has the side effect that some valid OPTIMADE responses will not be able to validated against the OpenAPI schema (which is the case at the moment anyway). This is mostly due to a mismatch between our definitions and OpenAPI, which we can consider tidying in the specification text.

We could consider patching the OpenAPI schema for these SHOULD field models with the OpenAPI property nullable, in reading https://pydantic-docs.helpmanual.io/usage/models/#required-optional-fields, it almost seems like this should be the case already...

optimade/models/structures.py

tests/models/test_data/test_bad_structures.json

optimade/models/structures.py

CasperWA · 2020-10-30T13:51:49Z

What's the motivation behind only testing the server for the real MongoDB in the CI?

ml-evs · 2020-10-30T14:00:20Z

What's the motivation behind only testing the server for the real MongoDB in the CI?

The server tests are the only bits that use the backend, so testing everything twice is redundant. When I first added the tests in this PR on all of our structures, it was taking 4-5 mins per backend to run. You could argue that these tests are overkill anyway (testing all possible combinations of 1-N fields being null in the response), but I think it's worth doing for at least a few structures. I reduced this to about ~1 minute by only testing a few structures, but it doesn't make sense to me to run e.g. the model tests with Mongo too.

ml-evs · 2020-10-30T14:40:24Z

What's the motivation behind only testing the server for the real MongoDB in the CI?

The server tests are the only bits that use the backend, so testing everything twice is redundant. When I first added the tests in this PR on all of our structures, it was taking 4-5 mins per backend to run. You could argue that these tests are overkill anyway (testing all possible combinations of 1-N fields being null in the response), but I think it's worth doing for at least a few structures. I reduced this to about ~1 minute by only testing a few structures, but it doesn't make sense to me to run e.g. the model tests with Mongo too.

Going to revisit this, I think coverage might actually improve if testing was performed on only the awkward edge-case structure, which will speed things up massively. Then we can decide whether we want to keep running all tests with Mongo for extra redudancy

CasperWA · 2020-10-30T15:08:45Z

What's the motivation behind only testing the server for the real MongoDB in the CI?

The server tests are the only bits that use the backend, so testing everything twice is redundant. When I first added the tests in this PR on all of our structures, it was taking 4-5 mins per backend to run. You could argue that these tests are overkill anyway (testing all possible combinations of 1-N fields being null in the response), but I think it's worth doing for at least a few structures. I reduced this to about ~1 minute by only testing a few structures, but it doesn't make sense to me to run e.g. the model tests with Mongo too.

Great. And sounds completely reasonable 👍

CasperWA

Looks alright, I'm just most concerned with the comment I've made below.
I don't see why these new SHOULD/MUST levels should ever dictate changing the inter-requirements of the fields?

CasperWA · 2020-11-02T18:50:50Z

optimade/models/structures.py

-            raise ValueError(
-                "nperiodic_dimensions is REQUIRED, since dimension_types was provided."
-            )


Even though some values can be null, I would deem it understood that if some specific fields are provided in a resource, then that will automatically require related fields to also be present, i.e., inherently raising them from SHOULD to MUST?

This could be a bit of a sticking point, yeah. I guess the distinction is between null in the response and null in the implementation, but the specification doesn't really make that distinction? If e.g. you hadn't precomputed nperiodic_dimensions because you could say, well, all my crystals are 3D so this is redundant, would that not be within the bounds of a "SHOULD" interpretation?

Going with the current approach makes testing the validation significantly simpler, but I'm aware that isn't the best argument... I don't think I was in favour of the near-blanket "SHOULD"-ening of the entire specification, but I think we have to interpret it quite loosely to be consistent with it... We could discuss this tomorrow if we have time

Response vs. implementation. That's of course the two central use cases for these models, indeed.

Hmm, well there are still MUST level requirements of inter-field dependencies. This is what I am now seeing being removed. Which to me seems like we're moving away from the specification.

While this is much more easily obvious for the response use case, it should be equally as valid for the implementation, since it by virtue of the specification shouldn't be allowed to serve a resource that doesn't comply with the requirements of the specification.

Hence, there should be two checks for some fields, as there was originally for the highlighted validation, namely both a check of whether the current value is None and/or if an inter-field requirement is respected.

I understand the SHOULD level of all fields and their allowance to be None/null to exist only to the limit of it not violating these inter-field dependencies/requirements or similar. I.e., if you NEED a particular field to be None, then it's up to you as the creator of this resource to ensure that all fields relying on this field on a MUST level are also indeed None. Since this is the only way all requirements can be fulfilled.
If you wish to save something "wrong" in your private database that's fine, but then it doesn't comply with the OPTIMADE specification and hence it should fail the validation.

ml-evs · 2020-11-04T21:51:27Z

Have added the root validator directly into this PR to see how it interacts with the model changes, but it could also be pulled out. As we're only raising warnings (for now) on null, the if v is None bits of the field validators should still stay in.

One important think to check is how well CORRELATED_STRUCTURE_FIELDS matches with the specification, and if there are any better ways we could include it (instead of a module level constant).

CasperWA

Let's get this in now and optimize afterwards.
This is blocking some things in my implementation as well.

Thanks @ml-evs ! 💪

ml-evs · 2020-11-16T11:48:30Z

Let's get this in now and optimize afterwards.
This is blocking some things in my implementation as well.

Thanks @ml-evs ! muscle

Could I be slightly awkward and request that we merge #591 (tiny changes but removes a false negative for validation), release a very minor 0.12.4, and then merge this, ready to release 0.13 after the other bigger PRs are finished?

- Allowed None values for several fields without updating validators - Relaxed pydantic validators for linked properties with allowed null values - Restructured structure validators to explicitly return early on None

- Test fewer structures with missing data - Only run server tests under mongo - Use single 'worst case' document for combinatorial null validation - Added exception for validation of assemblies

…ciated tests

ml-evs added schema Concerns the schema models priority/medium Issue or PR with a consensus of medium priority models For issues related to the pydantic models directly labels Oct 19, 2020

This was referenced Oct 26, 2020

Fix Species validators and error messages #561

Merged

Incomplete output of optimade-validator #568

Closed

ml-evs force-pushed the ml-evs/relax_models branch 4 times, most recently from f445e2e to c3c29de Compare October 28, 2020 17:45

ml-evs force-pushed the ml-evs/relax_models branch from c3c29de to fb40824 Compare October 28, 2020 21:05

ml-evs commented Oct 28, 2020

View reviewed changes

optimade/models/structures.py Show resolved Hide resolved

ml-evs force-pushed the ml-evs/relax_models branch from fb40824 to 76e812b Compare October 28, 2020 21:24

ml-evs commented Oct 28, 2020

View reviewed changes

tests/models/test_data/test_bad_structures.json Show resolved Hide resolved

ml-evs commented Oct 28, 2020

View reviewed changes

tests/models/test_data/test_bad_structures.json Outdated Show resolved Hide resolved

ml-evs force-pushed the ml-evs/relax_models branch 3 times, most recently from 69ad52b to 843cbfe Compare October 28, 2020 23:28

ml-evs marked this pull request as ready for review October 28, 2020 23:30

ml-evs force-pushed the ml-evs/relax_models branch from 843cbfe to 6999db9 Compare October 28, 2020 23:35

ml-evs requested a review from CasperWA October 28, 2020 23:48

ml-evs force-pushed the ml-evs/relax_models branch 2 times, most recently from 0f62cb5 to 19591f3 Compare October 30, 2020 13:27

ml-evs commented Oct 30, 2020

View reviewed changes

optimade/models/structures.py Outdated Show resolved Hide resolved

ml-evs force-pushed the ml-evs/relax_models branch 2 times, most recently from 9cb0527 to cfac654 Compare October 30, 2020 15:07

ml-evs force-pushed the ml-evs/relax_models branch 2 times, most recently from 0545299 to c3586db Compare October 31, 2020 22:31

CasperWA reviewed Nov 2, 2020

View reviewed changes

ml-evs mentioned this pull request Nov 4, 2020

Should atomic properties be REQUIRED? Materials-Consortia/OPTIMADE#285

Open

ml-evs force-pushed the ml-evs/relax_models branch from fc0113c to 1ec531d Compare November 4, 2020 21:49

ml-evs force-pushed the ml-evs/relax_models branch from 1ec531d to 220e033 Compare November 4, 2020 21:54

CasperWA previously approved these changes Nov 16, 2020

View reviewed changes

CasperWA mentioned this pull request Nov 16, 2020

Use aliases when handling response_fields aiidateam/aiida-optimade#159

Merged

ml-evs dismissed CasperWA’s stale review via 4541989 November 16, 2020 13:19

ml-evs force-pushed the ml-evs/relax_models branch from 220e033 to 4541989 Compare November 16, 2020 13:19

ml-evs requested a review from CasperWA November 16, 2020 13:23

ml-evs added 4 commits November 16, 2020 13:54

Handle OPTIMADE SHOULD fields as required optional fields:

3dae960

- Allowed None values for several fields without updating validators - Relaxed pydantic validators for linked properties with allowed null values - Restructured structure validators to explicitly return early on None

Add tests for deserialization of combinations of null data in structures

428d9c6

- Test fewer structures with missing data - Only run server tests under mongo - Use single 'worst case' document for combinatorial null validation - Added exception for validation of assemblies

Add OpenAPI property 'nullable' to the SHOULD fields

f98e229

Added root validator that warns on missing expected fields, plus asso…

5fe03e5

…ciated tests

ml-evs force-pushed the ml-evs/relax_models branch from 4541989 to 5fe03e5 Compare November 16, 2020 13:54

CasperWA approved these changes Nov 16, 2020

View reviewed changes

ml-evs merged commit 4f04832 into master Nov 16, 2020

ml-evs deleted the ml-evs/relax_models branch November 16, 2020 15:18

This was referenced Dec 14, 2020

Review the required properties of StructureResourceAttributes in openapi.json #198

Closed

Our models and validator are too strict #399

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Relax models to allow for all SHOULD fields to be None #560

Relax models to allow for all SHOULD fields to be None #560

ml-evs commented Oct 19, 2020 •

edited

Loading

codecov bot commented Oct 19, 2020 •

edited

Loading

ml-evs commented Oct 28, 2020 •

edited

Loading

CasperWA commented Oct 30, 2020

ml-evs commented Oct 30, 2020

ml-evs commented Oct 30, 2020

CasperWA commented Oct 30, 2020

CasperWA left a comment

CasperWA Nov 2, 2020

ml-evs Nov 2, 2020

CasperWA Nov 2, 2020

ml-evs commented Nov 4, 2020

CasperWA left a comment

ml-evs commented Nov 16, 2020

Relax models to allow for all SHOULD fields to be None #560

Relax models to allow for all SHOULD fields to be None #560

Conversation

ml-evs commented Oct 19, 2020 • edited Loading

codecov bot commented Oct 19, 2020 • edited Loading

Codecov Report

ml-evs commented Oct 28, 2020 • edited Loading

CasperWA commented Oct 30, 2020

ml-evs commented Oct 30, 2020

ml-evs commented Oct 30, 2020

CasperWA commented Oct 30, 2020

CasperWA left a comment

Choose a reason for hiding this comment

CasperWA Nov 2, 2020

Choose a reason for hiding this comment

ml-evs Nov 2, 2020

Choose a reason for hiding this comment

CasperWA Nov 2, 2020

Choose a reason for hiding this comment

ml-evs commented Nov 4, 2020

CasperWA left a comment

Choose a reason for hiding this comment

ml-evs commented Nov 16, 2020

ml-evs commented Oct 19, 2020 •

edited

Loading

codecov bot commented Oct 19, 2020 •

edited

Loading

ml-evs commented Oct 28, 2020 •

edited

Loading