Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sorting on unknown properties: returning Bad Request when appropriate #424

Merged
merged 15 commits into from
Jul 30, 2020

Conversation

ml-evs
Copy link
Member

@ml-evs ml-evs commented Jul 22, 2020

This PR enables 400: Bad Requests to be returned when requesting unknown fields. The specification provides a little bit of flexibility here, so I'll illustrate my approach with a few cases, with three classes of fields: known, unknown and "other provider" (those prefixed by _ with no known alias):

  1. User requests sorting on exclusively unknown fields -> Bad Request. e.g. sort=foo,bar.
  2. User requests sorting on exclusively "other provider" fields -> Warning, e.g. sort=_exmpl_foo,_exmpl_bar.
  3. User requests sorting on known field and unknown field -> Bad Request, e.g. sort=nelements,foo.
  4. User requests sorting on known field and "other provider" field -> sort on known field only. e.g. sort=nelements,_exmpl_foo.

Closes #276.

To-do:

  • Turn case 2/ above into a warning

@codecov
Copy link

codecov bot commented Jul 22, 2020

Codecov Report

Merging #424 into master will increase coverage by 0.44%.
The diff coverage is 98.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #424      +/-   ##
==========================================
+ Coverage   90.95%   91.40%   +0.44%     
==========================================
  Files          56       56              
  Lines        2632     2653      +21     
==========================================
+ Hits         2394     2425      +31     
+ Misses        238      228      -10     
Flag Coverage Δ
#unittests 91.40% <98.66%> (+0.44%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...made/server/entry_collections/entry_collections.py 98.79% <98.38%> (+9.90%) ⬆️
optimade/server/entry_collections/mongo.py 96.82% <100.00%> (+0.90%) ⬆️
optimade/server/exceptions.py 100.00% <100.00%> (ø)
optimade/server/mappers/entries.py 98.36% <100.00%> (+0.11%) ⬆️
optimade/server/warnings.py 85.71% <100.00%> (+1.09%) ⬆️
optimade/server/middleware.py 92.85% <0.00%> (+4.46%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b2e8847...9e7d2d5. Read the comment docs.

@ml-evs ml-evs force-pushed the ml-evs/sorting_on_unknown_properties branch from 1fe2861 to 419341c Compare July 22, 2020 16:18
@CasperWA CasperWA self-requested a review July 22, 2020 23:32
@ml-evs ml-evs force-pushed the ml-evs/sorting_on_unknown_properties branch from c6f5fc4 to a8163ed Compare July 23, 2020 10:02
@CasperWA
Copy link
Member

@ml-evs Concerning 1411a13 you need to also add the structures parameter to the test function, since it's a fixture you need to invoke, and then use (as you do).

@ml-evs ml-evs force-pushed the ml-evs/sorting_on_unknown_properties branch from 1411a13 to ad2aa50 Compare July 23, 2020 10:29
@CasperWA
Copy link
Member

This PR enables 400: Bad Requests to be returned when requesting unknown fields. The specification provides a little bit of flexibility here, so I'll illustrate my approach with a few cases, with three classes of fields: known, unknown and "other provider" (those prefixed by _ with no known alias):

  1. User requests sorting on exclusively unknown fields -> Bad Request. e.g. sort=foo,bar.
  2. User requests sorting on exclusively "other provider" fields -> Bad Request, e.g. sort=_exmpl_foo,_exmpl_bar.
  3. User requests sorting on known field and unknown field -> Bad Request, e.g. sort=nelements,foo.
  4. User requests sorting on known field and "other provider" field -> sort on known field only. e.g. sort=nelements,_exmpl_foo.

Just to be certain, your definitions of the three classes are something like the following:

  • Known: Fields standardized by the OPTIMADE API specification including implementation-specific fields for the current provider, i.e., this list should/can be found by visiting /info/<entry>.
  • Other provider: Implementation-specific fields for a non-current provider.
  • Unknown: Everything else not covered by the first two classes.

@CasperWA
Copy link
Member

This PR enables 400: Bad Requests to be returned when requesting unknown fields. The specification provides a little bit of flexibility here, so I'll illustrate my approach with a few cases, with three classes of fields: known, unknown and "other provider" (those prefixed by _ with no known alias):

  1. User requests sorting on exclusively unknown fields -> Bad Request. e.g. sort=foo,bar.
  2. User requests sorting on exclusively "other provider" fields -> Bad Request, e.g. sort=_exmpl_foo,_exmpl_bar.

I don't think I agree with this response, as it represents a well-formed and correctly valued request from the point of view of the specification, but will result in the return of different response codes depending on the implementation that is queried.
While that may be true of some filter queries as well, I consider this on the same level as a filter query where one uses fields that are on the SHOULD level of being filterable...
I.e., here I would maybe expect a 200 OK response with an added warning in the output that while these values are not supported by the current implementation they are (possibly) still valid in another implementation.

  1. User requests sorting on known field and unknown field -> Bad Request, e.g. sort=nelements,foo.

Yes, for any unknown (not "other provider" field) a 400 Bad Request should definitely be returned :)

  1. User requests sorting on known field and "other provider" field -> sort on known field only. e.g. sort=nelements,_exmpl_foo.

As with my suggestion for 2. above, I think this should return exactly what you have intended here, but also include a warning that _empl_foo, while (possibly) being a valid field in another OPTIMADE API implementation, it is not valid in the current implementation.

I think I should really start working on that warnings addition system ...

@ml-evs
Copy link
Member Author

ml-evs commented Jul 27, 2020

  1. User requests sorting on exclusively "other provider" fields -> Bad Request, e.g. sort=_exmpl_foo,_exmpl_bar.

I don't think I agree with this response, as it represents a well-formed and correctly valued request from the point of view of the specification, but will result in the return of different response codes depending on the implementation that is queried.

Agree this one can be interpreted many different ways. Thinking about use cases, if sorting is requested, I agree that at least a warning should be returned, but until we have that in, I would rather return a bad request. This is perhaps something that should be clarified in 1.0.1 of the spec.

As with my suggestion for 2. above, I think this should return exactly what you have intended here, but also include a warning that _empl_foo, while (possibly) being a valid field in another OPTIMADE API implementation, it is not valid in the current implementation.

I think I should really start working on that warnings addition system ...

Agreed and agreed 😁

@CasperWA
Copy link
Member

  1. User requests sorting on exclusively "other provider" fields -> Bad Request, e.g. sort=_exmpl_foo,_exmpl_bar.

I don't think I agree with this response, as it represents a well-formed and correctly valued request from the point of view of the specification, but will result in the return of different response codes depending on the implementation that is queried.

Agree this one can be interpreted many different ways. Thinking about use cases, if sorting is requested, I agree that at least a warning should be returned, but until we have that in, I would rather return a bad request. This is perhaps something that should be clarified in 1.0.1 of the spec.

Warnings will be implemented by #431 :)
And even if not, think of a case where one does aggregated searches. Should this return a bad request for any of the searched implementations? I think that's a bit silly. Instead, I think implementations should merely ignore otherwise valid and well-formed values to sort.

As with my suggestion for 2. above, I think this should return exactly what you have intended here, but also include a warning that _empl_foo, while (possibly) being a valid field in another OPTIMADE API implementation, it is not valid in the current implementation.
I think I should really start working on that warnings addition system ...

Agreed and agreed grin

See #431 ! 😄

Copy link
Member

@CasperWA CasperWA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank @ml-evs !

So I've digged quite hard into the doc strings in this one.
I think I would prefer American diction and syntax as this was also the choice for the specification. If you strongly feel for the British syntax, then that's also fine by me, we just need to do one or the other.
After colons, I think starting with a capital letter looks nicer in the docs and makes more sense as well syntactically, no?

Concerning the move to an abstract class that's fine. However, I think you've moved over a bit more than what is needed? As soon as something is pymongo-specific, it should probably be kept in the MongoCollection? Or what is your rationale here?
I see that for much of it, it could actually serve as a general method, however, since it's not compeltely severed, I would prefer either moving it back or doing a mix of the two and properly splitting up the backend-specific stuff from the -agnostic.

optimade/server/entry_collections/entry_collections.py Outdated Show resolved Hide resolved
optimade/server/entry_collections/entry_collections.py Outdated Show resolved Hide resolved
optimade/server/entry_collections/entry_collections.py Outdated Show resolved Hide resolved
optimade/server/entry_collections/entry_collections.py Outdated Show resolved Hide resolved
optimade/server/entry_collections/mongo.py Outdated Show resolved Hide resolved
optimade/server/entry_collections/mongo.py Show resolved Hide resolved
optimade/server/entry_collections/mongo.py Outdated Show resolved Hide resolved
optimade/server/mappers/entries.py Outdated Show resolved Hide resolved
optimade/server/mappers/entries.py Show resolved Hide resolved
@ml-evs
Copy link
Member Author

ml-evs commented Jul 28, 2020

So I've digged quite hard into the doc strings in this one.
I think I would prefer American diction and syntax as this was also the choice for the specification. If you strongly feel for the British syntax, then that's also fine by me, we just need to do one or the other.
After colons, I think starting with a capital letter looks nicer in the docs and makes more sense as well syntactically, no?

I can't say I'd given these much thought, so happy either way.

Concerning the move to an abstract class that's fine. However, I think you've moved over a bit more than what is needed? As soon as something is pymongo-specific, it should probably be kept in the MongoCollection? Or what is your rationale here?
I see that for much of it, it could actually serve as a general method, however, since it's not compeltely severed, I would prefer either moving it back or doing a mix of the two and properly splitting up the backend-specific stuff from the -agnostic.

My rationale was that the "pymongo-specific" handling is probably already a useful start point for other collections, and I wanted to avoid spawning loads of methods for every query parameter. I'll see how tasty the mixed approach gets, as I update this PR to include warnings for case 2. discussed in the OP.

@ml-evs ml-evs force-pushed the ml-evs/sorting_on_unknown_properties branch from ad2aa50 to 454b271 Compare July 29, 2020 16:13
@ml-evs ml-evs force-pushed the ml-evs/sorting_on_unknown_properties branch from e5ab87a to 0911b18 Compare July 29, 2020 17:55
@CasperWA
Copy link
Member

To-do:

  • Turn case 2/ above into a warning

I would more say to add a warning and use the known sort values.

And the same for case 4 above. But here, since there are no valid sort values (for this particular implementation), it just ignore the sort parameter all-together. But of course adds a warning. Maybe even for each unknown (but generally valid) sort parameter? :)

@ml-evs
Copy link
Member Author

ml-evs commented Jul 29, 2020

To-do:

  • Turn case 2/ above into a warning

I would more say to add a warning and use the known sort values.

And the same for case 4 above. But here, since there are no valid sort values (for this particular implementation), it just ignore the sort parameter all-together. But of course adds a warning. Maybe even for each unknown (but generally valid) sort parameter? :)

That's what I've implemented right?

Copy link
Member

@CasperWA CasperWA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a regex here should be a bit better?

optimade/server/entry_collections/entry_collections.py Outdated Show resolved Hide resolved
@CasperWA
Copy link
Member

To-do:

  • Turn case 2/ above into a warning

I would more say to add a warning and use the known sort values.
And the same for case 4 above. But here, since there are no valid sort values (for this particular implementation), it just ignore the sort parameter all-together. But of course adds a warning. Maybe even for each unknown (but generally valid) sort parameter? :)

That's what I've implemented right?

Yes you have. Sorry :)

@ml-evs ml-evs force-pushed the ml-evs/sorting_on_unknown_properties branch from 120c508 to ef8948d Compare July 30, 2020 12:24
ml-evs and others added 10 commits July 30, 2020 13:24
- Moved many generic methods from `MongoCollection` into base class
- Moved and renamed `MongoCollection._parse_params(...)` to base class
`EntryCollection.handle_query_params(...).
- Refactored query params parsing into multiple methods.
- Cache list of field names with property `.all_fields`.
- `EntryCollection` no longer inherits from typing.Collection.
- Removed dunder methods required to inherit from typing.Collection.
- Improved docstrings
Co-authored-by: Casper Welzel Andersen <43357585+CasperWA@users.noreply.github.com>
Co-authored-by: Casper Welzel Andersen <43357585+CasperWA@users.noreply.github.com>
@ml-evs ml-evs force-pushed the ml-evs/sorting_on_unknown_properties branch from ef8948d to 0210427 Compare July 30, 2020 12:24
@ml-evs
Copy link
Member Author

ml-evs commented Jul 30, 2020

@CasperWA think this is ready for a final review? I've made your original docstring suggestions, think we'll need to do another pass once #430 is merged though.

@ml-evs ml-evs requested a review from CasperWA July 30, 2020 12:24
Copy link
Member

@CasperWA CasperWA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would approve, if it wasn't for an unnecessary use of the structures fixture in one of the tests, which should definitely be removed.
Then it's essentially good to go.

All the other comments are merely suggestions.

optimade/server/entry_collections/mongo.py Show resolved Hide resolved
optimade/server/mappers/entries.py Outdated Show resolved Hide resolved
tests/server/query_params/test_sort.py Outdated Show resolved Hide resolved
tests/server/query_params/test_sort.py Outdated Show resolved Hide resolved
tests/server/query_params/test_sort.py Outdated Show resolved Hide resolved
Co-authored-by: Casper Welzel Andersen <CasperWA@users.noreply.github.com>
Copy link
Member

@CasperWA CasperWA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good to go from my side now, thanks for the hard work @ml-evs ! 👍 💪

@ml-evs ml-evs merged commit 59bba9e into master Jul 30, 2020
@ml-evs ml-evs deleted the ml-evs/sorting_on_unknown_properties branch July 30, 2020 15:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve handling of sorting in MongoDB backend
2 participants