Expand the Facets proposal with more details and new requirements #320

Open
wants to merge 7 commits into
base: master
from

Conversation

@jahow jahow commented Oct 31, 2023

This PR builds on top of the work by @pvgenuchten to offer a more detailed proposal, including two new endpoints which should let API consumers build a robust user experience.

The YAML schemas have not been updated yet, and there are several questions which still have to be answered:

  • Should we stick to the "aggregation" word, or use "facet" everywhere instead?
  • Should we let users choose explicitly which aggregations will appear in the overview alongside the /items response? (instead of just letting them say yes/no)
  • Do we agree that users should have very little control over what the aggregations will actually give them, i.e. not being able to change the property, sort criteria, etc.?
  • Is it reasonable to define a new endpoint (/aggregation/{aggregationId}) which will carry over many of the query params used in /items so as to end up matching the same records as a search on /items would?

Looking forward to discussing all these topics further before the end of the sprint! Please leave comments/suggestions as needed.

@pvgenuchten
Contributor

Quick reply before checking;

  • I like the idea of renaming to facets, it seems an accepted term
  • Why are you splitting this into a separate method? It means 2 queries are required instead of 1. Although it also makes sense to be more flexible, and certain backends require an additional query anyway.

@jahow
Author

jahow commented Nov 1, 2023

The second "drill-down" query mirrors the typical behavior of a user when using facets:

  1. Look at an overview of all facets next to the search results, with maybe 20-30 buckets max per facet: this is done with the /items query
  2. Click on "show more" on a specific facet to show more buckets, and filter values with a text field: this is done with the /aggregation/{aggregationId} query

Alternatively a user could decide to zoom in on a histogram, which would also trigger the drill-down query.
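
As a rough illustration of the two-step pattern described above, here is a minimal sketch of how a client might build both requests. The server URL, the collection name, the placement of /aggregation/{aggregationId} under the collection, and the `limit` parameter are all assumptions made for the example; the proposal itself only names the two endpoints.

```python
from urllib.parse import urlencode

# Hypothetical server and collection, used only for illustration.
BASE = "https://example.org/collections/my-catalog"


def items_url(search_params):
    """Step 1: search results plus the facet overview, in one /items request."""
    return f"{BASE}/items?{urlencode(search_params)}"


def drill_down_url(aggregation_id, search_params, extra=None):
    """Step 2: more buckets for a single facet.

    The search parameters are carried over unchanged so that the drill-down
    matches the same records as the /items search would.
    """
    params = dict(search_params)
    params.update(extra or {})  # e.g. a bucket limit (assumed parameter name)
    return f"{BASE}/aggregation/{aggregation_id}?{urlencode(params)}"


search = {"q": "water", "bbox": "-10,35,30,60"}
overview = items_url(search)
more_keywords = drill_down_url("keywords", search, {"limit": 100})
```

The point of the sketch is only that the second request reuses the first request's filters, which is what keeps the bucket counts consistent with the result list.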

@cnreediii

WRT the term "facets": I did a bit of research. First, after checking numerous OGC Standards, this term is not used in them. This would be the first use in an OGC Standard. Therefore, a very clear definition of what is meant by the term "facet" is required. Second, I also researched the use of the term in the broader Catalog context. "Facet search" is in Wikipedia (https://en.wikipedia.org/wiki/Faceted_search), but if one digs more deeply, in Library Science there are pre-facet and post-facet searches. In another context, facets are shown as panels of aggregated results on a web page (e.g. HCL Software). Finally, the use of the term facet in GIS goes back to 1965, although in a completely different context: a unit of storage.

As such, if the term facet/facets is to be used in the API - Records standard, then the term needs to be very clearly defined and formally agreed to.

Oh, and to just make life interesting: Forecasting a Continuum of Environmental Threats (FACETs)

@jahow
Author

jahow commented Nov 2, 2023

Thanks for the analysis @cnreediii. I feel like the term "aggregation" is closer to what is really happening, and would also better apply to the Features API (which is our intention eventually) where the concept of "faceted search" would sound even more unusual.

@pvgenuchten
Contributor

pvgenuchten commented Nov 9, 2023

Hi team, great work here. I'm wondering why we are using a dictionary for the facets, with the facet property as key. Wouldn't it be more logical to have an array of facets (similar to an array of features)?

{"facets": [{ 
   "id": "keywords",
   "type": "",
   "buckets":[]
}]}

it could have been me suggesting this at one point, but now while implementing, i have my doubts

@jahow
Author

jahow commented Nov 10, 2023

> Hi team, great work here. I'm wondering why we are using a dictionary for the facets, with the facet property as key. Wouldn't it be more logical to have an array of facets (similar to an array of features)?
>
> {"facets": [{
>    "id": "keywords",
>    "type": "",
>    "buckets":[]
> }]}
>
> it could have been me suggesting this at one point, but now while implementing, i have my doubts

No, I think I came up with the dict structure, and indeed it is useless if we end up repeating the id in the facet object. I also liked having an id field for each facet, to make it clear that this is the id that the user can then use in other queries; conversely, using the facet id as the dictionary key isn't all that clear IMO.
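
To make the two shapes under discussion concrete, the following sketch converts between a dictionary keyed by facet id and an array of facet objects that each carry their own id. The function names are hypothetical and the `type` value is left empty, as in the snippet above; neither shape is presented here as the agreed schema.

```python
def facets_dict_to_list(facets_by_id):
    """Turn {"keywords": {...}} into [{"id": "keywords", ...}]."""
    return [{"id": facet_id, **facet} for facet_id, facet in facets_by_id.items()]


def facets_list_to_dict(facets):
    """Inverse conversion: the id moves back out of the object and becomes the key."""
    return {
        facet["id"]: {key: value for key, value in facet.items() if key != "id"}
        for facet in facets
    }


as_dict = {"keywords": {"type": "", "buckets": []}}
as_list = facets_dict_to_list(as_dict)
```

In the array form the id is an explicit field that a client can copy into a later query, which is the readability argument made in the comment above.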

@pvgenuchten
Contributor

pvgenuchten commented Nov 11, 2023

Another thought here: which properties to expose as facets should be a configuration, not something defined in the API schema. This allows adjusting the configured facets without redefining the API definition.

Alternatively, it can be argued that by hardcoding the facets in the API schema, it will be clear to users beforehand which facets will be made available in the search result.


#### Examples

The facet `hasDownloads` will return the number of records which have at least 1 distribution of a download type (CSV, Excel...):
Contributor

There is only one facet here "Available by"

  • download service
  • view service

@jahow jahow changed the title from "Expand the Aggregations proposal with more details and new requirements" to "Expand the Facets proposal with more details and new requirements" on Jan 11, 2024
* `hasMaps` : 11495 records
The facet `is available by` will provide 2 sub-queries:
- `Download service` returns the number of records which have at least 1 distribution of a download type (CSV, Excel...)
- `Visualization service` returns the number of records which have at least 1 distribution of a visualization type (WMS, WMTS...)
Contributor

Since we're in the OGC API realm, it would be good to add OGC API types as examples, e.g. OGCAPI:Maps, OGCAPI:tiles.
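
A small sketch of how the two buckets of such an "available by" facet could be counted. The record shape (`distribution_types`) and the grouping of types into the two sets are assumptions made for this example; only the type names themselves come from the thread.

```python
# Assumed grouping of distribution types into the two buckets.
DOWNLOAD_TYPES = {"CSV", "Excel"}
VISUALIZATION_TYPES = {"WMS", "WMTS", "OGCAPI:Maps", "OGCAPI:tiles"}


def available_by_buckets(records):
    """Count records that have at least one distribution of each kind."""
    counts = {"Download service": 0, "Visualization service": 0}
    for record in records:
        # "distribution_types" is a hypothetical field name.
        types = set(record.get("distribution_types", []))
        if types & DOWNLOAD_TYPES:
            counts["Download service"] += 1
        if types & VISUALIZATION_TYPES:
            counts["Visualization service"] += 1
    return counts


records = [
    {"distribution_types": ["CSV", "WMS"]},
    {"distribution_types": ["OGCAPI:Maps"]},
    {"distribution_types": []},
]
buckets = available_by_buckets(records)
```

A record with both kinds of distribution is counted in both buckets, so the bucket counts do not have to sum to the number of records.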

@tomkralidis
Contributor

tomkralidis commented Feb 27, 2024

Discussion/refresh during 2024 Joint OGC-ASF-OSGeo Code Sprint:

Two Requirements Classes: Simple and Advanced

Simple

  • advertisement options:
    1. add /collections/{collectionId}/facetables
    2. extend /collections/{collectionId}/queryables with a facet object definition for each facet-supported queryable
  • .../items query default behaviour (to enable faceted-search or not)
    • user specified
      • some requests may not need faceted results
    • facets API parameter. Options:
      • all: all server configured facets
      • facet_name:num_buckets

Advanced

  • POST payload
  • how to resolve in a CQL JSON? Additional facets property?
  • does this break compat with OGC API - Features - Part 3
  • perhaps /collections/{collectionId}/facets for these types of queries
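
For the Simple class, the two options listed for the facets parameter (`all`, or `facet_name:num_buckets`) could be parsed along these lines. This is only a sketch: the comma as separator between several facets, the default bucket count, and the error handling are assumptions that the notes above do not settle.

```python
def parse_facets_param(value, configured, default_buckets=10):
    """Map each requested facet name to a bucket count.

    value:       raw parameter value, e.g. "all" or "keywords:20"
    configured:  facet names the server is configured to offer
    """
    if value == "all":
        return {name: default_buckets for name in configured}

    requested = {}
    for item in value.split(","):  # comma-separated list is an assumption
        name, _, count = item.strip().partition(":")
        if name not in configured:
            raise ValueError(f"unknown facet: {name}")
        requested[name] = int(count) if count else default_buckets
    return requested


configured_facets = ["keywords", "type"]
all_facets = parse_facets_param("all", configured_facets)
one_facet = parse_facets_param("keywords:20", configured_facets)
```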

@tomkralidis
Contributor

Discussion with @cportele:

  • advanced facets
  • use a different endpoint
  • use the HTTP QUERY method
    • lack of tooling

Decision:

  • two endpoints

Simple

  • add to /collections/{collectionId}/queryables (part 5)
    • .../items
      • facets=true or facets.propertyname
    • add to queryables
    • extend queryable property
      • x-ogc-facet

Advanced

  • /collections/{collectionId}/facets/{facetId} - drill down (advanced)
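
As an illustration of the "extend queryable property" idea, the sketch below scans a queryables document for properties carrying an `x-ogc-facet` keyword. The keyword name comes from the decision above, but its contents (`{"type": "term"}` here) and the sample properties are purely hypothetical, since the extension draft had not been written at this point.

```python
# Hypothetical queryables document; only the "x-ogc-facet" keyword name
# is taken from the discussion, its value is invented for the example.
queryables = {
    "type": "object",
    "properties": {
        "keywords": {"type": "string", "x-ogc-facet": {"type": "term"}},
        "title": {"type": "string"},
    },
}


def facetable_properties(queryables_doc):
    """Return the names of queryables that advertise facet support."""
    properties = queryables_doc.get("properties", {})
    return [name for name, schema in properties.items() if "x-ogc-facet" in schema]


facetable = facetable_properties(queryables)
```

A client could use such a list to decide which facets it may request on .../items, without needing a separate advertisement endpoint.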

Next steps are to close this PR and start the extension draft.

@pvgenuchten
Contributor

pvgenuchten commented Feb 27, 2024

About the queryable configuration: can it have 3 options?

  • off (never returned, even if requested via facets parameter)
  • default (returned with every search, unless a facets parameter is given that does not list it)
  • optional (only returned if requested via facets parameter)

facets parameter can then have 3 values:

  • "" or no facets parameter (returns default facets)
  • off (no facets returned)
  • {facet1},{facet2} (returns list of requested facets only)

I most appreciate the behavior of some facets being returned with every search result, without the need to configure a facets parameter
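
The interplay between the three configuration options and the three parameter values can be sketched as a small resolution function. The function name and the shape of the configuration (a plain mapping from facet name to mode) are assumptions for illustration, and the sketch ignores the corner case of a facet that is itself named "off".

```python
def resolve_facets(config, facets_param=None):
    """Decide which facets to return for one search request.

    config:        maps facet name -> "off" | "default" | "optional"
    facets_param:  None or "" (use defaults), "off", or "facet1,facet2"
    """
    if not facets_param:
        # No parameter or an empty one: only the facets configured as default.
        return [name for name, mode in config.items() if mode == "default"]
    if facets_param == "off":
        return []
    requested = [name.strip() for name in facets_param.split(",") if name.strip()]
    # Facets configured as "off" (or unknown) are never returned, even if requested.
    return [name for name in requested if config.get(name, "off") != "off"]


config = {"keywords": "default", "type": "optional", "internal": "off"}
defaults = resolve_facets(config)
explicit = resolve_facets(config, "type,internal")
```

With this reading, `keywords` comes back on every plain search, `type` only when asked for, and `internal` never.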

@pvgenuchten
Contributor

I want to refer to opengeospatial/ogcapi-features#681; the facet functionality could offer what is requested there.

@cnreediii

Just to be sure I understand. Facets, also called smart filters, are a type of search filter that customers use to narrow down their search results quickly. Facets can be static (set up for every query) or dynamic (they can change depending on the context of the search query). Am I correct in my understanding? There is a fairly wide scope of definitions by different vendors. Thanks

@tomkralidis
Contributor

FYI as decided in #320 (comment), PR #372 adds OGC API - Records - Part 2: Facets. Once this PR is merged we can move the contents of this PR into the appropriate areas of the extension.

6 participants