Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relax models to allow for all SHOULD fields to be None #560

Merged
merged 4 commits into from
Nov 16, 2020

Conversation

ml-evs
Copy link
Member

@ml-evs ml-evs commented Oct 19, 2020

This PR tracks the relaxation of our models to allow for None values for almost any field. The current approach for this is to make non-should fields have the type pydantic-type Optional but a default value of ..., which is interpreted as a required OpenAPI field that is allowed to be None.

This PR then also adds the OpenAPI nullable property to each of these fields, indicating that they can be null.

To-do:

  • Change Structure model field types to Optional[...] with default values of ...
  • Update model validators to allow them to be robust to missing data
  • Add combinatorial tests for missing data in structures
  • Only run server tests under Mongo in CI (rather than all tests)
  • Add nullable to schema

@ml-evs ml-evs added schema Concerns the schema models priority/medium Issue or PR with a consensus of medium priority models For issues related to the pydantic models directly labels Oct 19, 2020
@codecov
Copy link

codecov bot commented Oct 19, 2020

Codecov Report

Merging #560 (5fe03e5) into master (d207200) will increase coverage by 0.08%.
The diff coverage is 96.70%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #560      +/-   ##
==========================================
+ Coverage   92.11%   92.20%   +0.08%     
==========================================
  Files          61       61              
  Lines        3210     3246      +36     
==========================================
+ Hits         2957     2993      +36     
  Misses        253      253              
Flag Coverage Δ
project 92.20% <96.70%> (+0.08%) ⬆️
validator 65.89% <78.02%> (+0.94%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
optimade/models/structures.py 96.35% <96.62%> (+0.60%) ⬆️
optimade/models/entries.py 97.50% <100.00%> (ø)
optimade/server/warnings.py 86.66% <100.00%> (+0.95%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d207200...5fe03e5. Read the comment docs.

@ml-evs
Copy link
Member Author

ml-evs commented Oct 28, 2020

Following discussions in the meeting today, it was agreed that we would go down the "required optional" key route for OPTIMADE should fields, so the changes in this PR should (and do) preserve the OpenAPI schema as it is. This has the side effect that some valid OPTIMADE responses will not be able to validated against the OpenAPI schema (which is the case at the moment anyway). This is mostly due to a mismatch between our definitions and OpenAPI, which we can consider tidying in the specification text.

We could consider patching the OpenAPI schema for these SHOULD field models with the OpenAPI property nullable, in reading https://pydantic-docs.helpmanual.io/usage/models/#required-optional-fields, it almost seems like this should be the case already...

@ml-evs ml-evs force-pushed the ml-evs/relax_models branch 3 times, most recently from 69ad52b to 843cbfe Compare October 28, 2020 23:28
@ml-evs ml-evs marked this pull request as ready for review October 28, 2020 23:30
@ml-evs ml-evs requested a review from CasperWA October 28, 2020 23:48
@ml-evs ml-evs force-pushed the ml-evs/relax_models branch 2 times, most recently from 0f62cb5 to 19591f3 Compare October 30, 2020 13:27
optimade/models/structures.py Outdated Show resolved Hide resolved
@CasperWA
Copy link
Member

What's the motivation behind only testing the server for the real MongoDB in the CI?

@ml-evs
Copy link
Member Author

ml-evs commented Oct 30, 2020

What's the motivation behind only testing the server for the real MongoDB in the CI?

The server tests are the only bits that use the backend, so testing everything twice is redundant. When I first added the tests in this PR on all of our structures, it was taking 4-5 mins per backend to run. You could argue that these tests are overkill anyway (testing all possible combinations of 1-N fields being null in the response), but I think it's worth doing for at least a few structures. I reduced this to about ~1 minute by only testing a few structures, but it doesn't make sense to me to run e.g. the model tests with Mongo too.

@ml-evs
Copy link
Member Author

ml-evs commented Oct 30, 2020

What's the motivation behind only testing the server for the real MongoDB in the CI?

The server tests are the only bits that use the backend, so testing everything twice is redundant. When I first added the tests in this PR on all of our structures, it was taking 4-5 mins per backend to run. You could argue that these tests are overkill anyway (testing all possible combinations of 1-N fields being null in the response), but I think it's worth doing for at least a few structures. I reduced this to about ~1 minute by only testing a few structures, but it doesn't make sense to me to run e.g. the model tests with Mongo too.

Going to revisit this, I think coverage might actually improve if testing was performed on only the awkward edge-case structure, which will speed things up massively. Then we can decide whether we want to keep running all tests with Mongo for extra redudancy

@ml-evs ml-evs force-pushed the ml-evs/relax_models branch 2 times, most recently from 9cb0527 to cfac654 Compare October 30, 2020 15:07
@CasperWA
Copy link
Member

What's the motivation behind only testing the server for the real MongoDB in the CI?

The server tests are the only bits that use the backend, so testing everything twice is redundant. When I first added the tests in this PR on all of our structures, it was taking 4-5 mins per backend to run. You could argue that these tests are overkill anyway (testing all possible combinations of 1-N fields being null in the response), but I think it's worth doing for at least a few structures. I reduced this to about ~1 minute by only testing a few structures, but it doesn't make sense to me to run e.g. the model tests with Mongo too.

Great. And sounds completely reasonable 👍

@ml-evs ml-evs force-pushed the ml-evs/relax_models branch 2 times, most recently from 0545299 to c3586db Compare October 31, 2020 22:31
Copy link
Member

@CasperWA CasperWA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks alright, I'm just most concerned with the comment I've made below.
I don't see why these new SHOULD/MUST levels should ever dictate changing the inter-requirements of the fields?

Comment on lines -846 to -850
raise ValueError(
"nperiodic_dimensions is REQUIRED, since dimension_types was provided."
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though some values can be null, I would deem it understood that if some specific fields are provided in a resource, then that will automatically require related fields to also be present, i.e., inherently raising them from SHOULD to MUST?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a bit of a sticking point, yeah. I guess the distinction is between null in the response and null in the implementation, but the specification doesn't really make that distinction? If e.g. you hadn't precomputed nperiodic_dimensions because you could say, well, all my crystals are 3D so this is redundant, would that not be within the bounds of a "SHOULD" interpretation?

Going with the current approach makes testing the validation significantly simpler, but I'm aware that isn't the best argument... I don't think I was in favour of the near-blanket "SHOULD"-ening of the entire specification, but I think we have to interpret it quite loosely to be consistent with it... We could discuss this tomorrow if we have time

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Response vs. implementation. That's of course the two central use cases for these models, indeed.

Hmm, well there are still MUST level requirements of inter-field dependencies. This is what I am now seeing being removed. Which to me seems like we're moving away from the specification.

While this is much more easily obvious for the response use case, it should be equally as valid for the implementation, since it by virtue of the specification shouldn't be allowed to serve a resource that doesn't comply with the requirements of the specification.

Hence, there should be two checks for some fields, as there was originally for the highlighted validation, namely both a check of whether the current value is None and/or if an inter-field requirement is respected.

I understand the SHOULD level of all fields and their allowance to be None/null to exist only to the limit of it not violating these inter-field dependencies/requirements or similar. I.e., if you NEED a particular field to be None, then it's up to you as the creator of this resource to ensure that all fields relying on this field on a MUST level are also indeed None. Since this is the only way all requirements can be fulfilled.
If you wish to save something "wrong" in your private database that's fine, but then it doesn't comply with the OPTIMADE specification and hence it should fail the validation.

@ml-evs
Copy link
Member Author

ml-evs commented Nov 4, 2020

Have added the root validator directly into this PR to see how it interacts with the model changes, but it could also be pulled out. As we're only raising warnings (for now) on null, the if v is None bits of the field validators should still stay in.

One important think to check is how well CORRELATED_STRUCTURE_FIELDS matches with the specification, and if there are any better ways we could include it (instead of a module level constant).

CasperWA
CasperWA previously approved these changes Nov 16, 2020
Copy link
Member

@CasperWA CasperWA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get this in now and optimize afterwards.
This is blocking some things in my implementation as well.

Thanks @ml-evs ! 💪

@ml-evs
Copy link
Member Author

ml-evs commented Nov 16, 2020

Let's get this in now and optimize afterwards.
This is blocking some things in my implementation as well.

Thanks @ml-evs ! muscle

Could I be slightly awkward and request that we merge #591 (tiny changes but removes a false negative for validation), release a very minor 0.12.4, and then merge this, ready to release 0.13 after the other bigger PRs are finished?

- Allowed None values for several fields without updating validators
- Relaxed pydantic validators for linked properties with allowed null values
- Restructured structure validators to explicitly return early on None
- Test fewer structures with missing data
- Only run server tests under mongo
- Use single 'worst case' document for combinatorial null validation
- Added exception for validation of assemblies
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
models For issues related to the pydantic models directly priority/medium Issue or PR with a consensus of medium priority schema Concerns the schema models
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants