Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Searching by ID / Collection ID #383

Closed
matthewhanson opened this issue Feb 4, 2019 · 23 comments
Closed

Searching by ID / Collection ID #383

matthewhanson opened this issue Feb 4, 2019 · 23 comments
Assignees
Labels
prio: should-have would be very good to have in the release
Milestone

Comments

@matthewhanson
Copy link
Collaborator

Because id is a top level field, it is not queryable with the query parameter. However there is also no other parameter in the search body for ID (only bbox, time, and intersects).

http://sat-utils.github.io/sat-api/#tocssearchbody

This means that there's no way to query by ID, you can only form the complete self link which requires the Collection ID, e.g., http://landsat-stac.s3.amazonaws.com/landsat-8-l1/151/018/2015-09-11/LC81510182015254LGN00.json

Which does seem a little strange that you can't search by Item ID and can only look it up if you know the collection.

I'd propose a new search field, ids, which allows a list of ids to be provided and a FeatureCollection returned of those Items. The one possible drawback here is that the Item ID's, while unique across a collection, may not be unique across a catalog. But I think that's fine, this would return a FeatureCollection so if there is more than one ID match then they would all be returned (note that in sat-api ids are required to be unique, but this could be different in other implementations).

@m-mohr
Copy link
Collaborator

m-mohr commented Feb 4, 2019

In the actual OpenAPI files there are fragments for featureId and collectionId, but they are not referenced at all. So it seems somebody had the idea, but at some point it got lost (or I am missing something here).

@joshfix
Copy link
Contributor

joshfix commented Feb 4, 2019

Back when we still had the /items endpoint, we also had an /items/{id} endpoint. This seems to have gotten lost, maybe when we moved to the /search/stac endpoint.

'/items/{id}':

I actually did not realize this was dropped and still support it in my implementation but under /search/stac/{id}:

https://stac.boundlessgeo.io/search/stac/LC08_L1TP_058017_20180721_20180721_01_RT

@joshfix
Copy link
Contributor

joshfix commented Feb 5, 2019

update... something else I missed... apparently /search/stac has now become /stac/search

https://stac.boundlessgeo.io/stac/search/LC08_L1TP_058017_20180721_20180721_01_RT

@cholmes
Copy link
Contributor

cholmes commented Feb 12, 2019

Definitely +1 on adding an id search. Curious about the Harris implementation - cc @hgs-msmith and @hgs-truthe01 - if there was something in there for id search.

I'm happy with whatever @matthewhanson @joshfix and the Harris folk agree on, since I think those are our three main api implementations.

I think /items/{id} is part of the WFS 3 specification, so I think it'd be useful to still try to use that convention. Which sat-api does, like https://sat-api.developmentseed.org/collections/landsat-8-l1/items/LC81810982019043LGN00 But I agree the search endpoint should have the ability to respond to a set of ID's.

Happy for a PR and to get this into 0.7.0

@joshfix
Copy link
Contributor

joshfix commented Feb 13, 2019

I agree with @matthewhanson that it's fine if the same ID is used cross-collection; the response should simply be a FeatureCollection containing all items with that ID. If you didn't want that and just wanted to search a single collection, you could obviously use the appropriate collections endpoint.

I'm totally fine with either introducing an ids parameter as it would allow you to provide a list vs using path variables, eg: /stac/search/{id}.

I think I'm a bit behind on my implementation, as I still use a filter parameter instead of a query parameter. However I still support building queries against root-level fields. Is there a specific reason to disallow this and only make properties searchable via the query parameter?

https://stac.boundlessgeo.io/stac/search?filter=id=LC08_L1TP_058012_20180721_20180721_01_RT%20or%20id=LC08_L1TP_058011_20180721_20180721_01_RT

@cholmes
Copy link
Contributor

cholmes commented Feb 14, 2019

I don't think there's a reason to disallow queries against root-level fields, unless I'm misunderstanding.

@cholmes cholmes added this to the 0.7.0 milestone Feb 14, 2019
@cholmes cholmes added the prio: should-have would be very good to have in the release label Feb 14, 2019
@hgs-msmith
Copy link
Contributor

@cholmes, I agree there is no reason to disallow queries again root-level fields, including ID. Although, that is exactly what we did in our current implementation.
However, we are implementing WFS3 core, which supports access via ID:
/collections/{collectionId}/items/{itemId}

@matthewhanson
Copy link
Collaborator Author

If we understood the current spec properly, the queryFilter currently only allows querying of properties, so not root-level fields. To do root-level we either need to change the queryFilter to allow (in which case properties will be need to be specified as e.g., "properties.eo:cloud_cover), or add another filter to allow for it.

@hgs-msmith The spec does still allow access via ID like WFS, and we do so in sat-api (e.g., https://sat-api.developmentseed.org/collections/landsat-8-l1/items/LC80840242018240LGN00), but the problem I have with it is that you need to know the collection as well and I'd like to see a way to query by ID without the collection.

Above I proposed an "ids" filter, as this allows to query multiple IDs and get back a FeatureCollection.

@joshfix
Copy link
Contributor

joshfix commented Feb 16, 2019

@matthewhanson that is also how my implementation behaves. Not advocating for it, I'll change mine to whatever, but just wanted to provide an example.

http://stac.boundlessgeo.io/stac/search?filter=properties.landsat:wrs_row=11

@m-mohr
Copy link
Collaborator

m-mohr commented Feb 25, 2019

I don't like the properties as prefix for the fields to be used. The fields should be directly queryable. The bbox/geometry is handled with a special query parameter anyway, the same could be done for ID(s) (+ maybe collection, see #384) and then you don't need to prefix the fields in properties. I assume you'd never search for type, links or assets without a specially defined query parameter.

@ghost
Copy link

ghost commented Feb 25, 2019

I would agree with @m-mohr. We should really encourage everything searchable going into properties. I think a special query field for ID is reasonable due to it being a special case.

@joshfix
Copy link
Contributor

joshfix commented Mar 1, 2019

I do feel like it is an unnecessarily imposed restriction. We will need a new id query parameter and possibly a collection query parameter, and a special parameter for any other root-level field that may be added in the future. Letting the filter operate against the entire document for any type of special or crazy edge case that may exist feels like an enabler to me and I'm not sure I see the benefit of restricting it.

However, I'm not going to push back on this one if I'm flying solo and there's no reason to drag it out. I'm fine with adding special query parameters for root level fields and restricting the search filter to properties fields.

@m-mohr
Copy link
Collaborator

m-mohr commented Mar 1, 2019

Oh wait, I did not say we restrict it to properties. What I wanted to say is that I don't like the properties prefix for the search fields and therefore I thought of something like a two level approach:

  1. Check if field name (with a primitive value) is on the root level. If yes, use it.
  2. If nothing found on the root level, search in properties.

This way you can query for ?id= and ?eo:epsg= without the ugly ?properties.eo:epsg=

@fredliporace
Copy link
Contributor

My suggestion is that any solution we adopt should explicitly define which fields may be queried and, as such, should be indexed by a compliant implementation. For huge datasets it might be important for the developer to explicitly limit indexes on some fields/objects for indexing performance.

@m-mohr I'm not a fan of the approach of double queries, this increases the load on the searching service for something that IMHO should be defined by the standard. This may also be a problem for huge datasets with heavy searching loads.

I have no problem with @joshfix approach of prefixing. Since the query extension currently imply the properties prefix we could use something like r.id for root fields for which search is allowed.

@m-mohr
Copy link
Collaborator

m-mohr commented Mar 1, 2019

@fredliporace Isn't that an implementation detail? I guess you can avoid double queries and just capture id and collection to only do the first query and all the others do the second query.

I'm only speaking about the user/API view here. Indeed, my description was not very good, I did not meant the implementation, but how the API fields should be mapped. You can specify this in general for the API as a rule.

@matthewhanson
Copy link
Collaborator Author

I don't think an open-ended search where the query might be in properties but might be at the top level is a good idea. That lends itself to potential conflicts for one.

I think we either need to:

  • explicitly provide the whole key (e.g., properties.eo:cloud_cover)
    or
  • only support properties with queries, then have different filters for top level fields.

@m-mohr
Copy link
Collaborator

m-mohr commented Mar 1, 2019

I don't think conflicts are very likely, but out of the two options I'd go with the second option then.

@fredliporace
Copy link
Contributor

@fredliporace Isn't that an implementation detail? I guess you can avoid double queries and just capture id and collection to only do the first query and all the others do the second query.

I don't know, it doesn't feel right to have this kind of ambiguity, this is something I'd expect to be solved at spec level.

@fredliporace
Copy link
Contributor

I don't think an open-ended search where the query might be in properties but might be at the top level is a good idea. That lends itself to potential conflicts for one.

I think we either need to:

  • explicitly provide the whole key (e.g., properties.eo:cloud_cover)
    or
  • only support properties with queries, then have different filters for top level fields.

I'm OK with the second option, it leads to more work in maintaining the spec/documentation since it requires new filters but it would be clear what is searchable and what is not. I'm assuming that a compliant query extension implementation should allow searching on every fields in properties.

@cholmes
Copy link
Contributor

cholmes commented Apr 20, 2019

I think we're at consensus on this. Matt, would have you have time to make a PR for this? If needed I could try to pair on it to help out.

@matthewhanson
Copy link
Collaborator Author

Yes, will issue PR for this once I've got a final #432

@cholmes cholmes closed this as completed May 2, 2019
@m-mohr m-mohr reopened this May 2, 2019
@m-mohr
Copy link
Collaborator

m-mohr commented May 2, 2019

I think we still need to add search by ID and collection ID.

@m-mohr m-mohr changed the title Searching by ID Searching by ID / Collection ID May 2, 2019
@cholmes cholmes closed this as completed May 2, 2019
@cholmes cholmes reopened this May 2, 2019
@cholmes
Copy link
Contributor

cholmes commented May 2, 2019

closed with #455

@cholmes cholmes closed this as completed May 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
prio: should-have would be very good to have in the release
Projects
None yet
Development

No branches or pull requests

6 participants