Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aggs and constant_keyword #94637

Closed
nik9000 opened this issue Mar 22, 2023 · 13 comments · Fixed by #98295
Closed

Aggs and constant_keyword #94637

nik9000 opened this issue Mar 22, 2023 · 13 comments · Fixed by #98295
Assignees
Labels
:Analytics/Aggregations Aggregations >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@nik9000
Copy link
Member

nik9000 commented Mar 22, 2023

Description

It'd be lovely is a filters or filter agg that touches a constant_keyword would properly rewrite itself to match_all and match_none and "go fast", at least when it's at the root of the aggs tree. Theoretically it should "just work" but it doesn't seem to. Also! The terms agg should be able to act on there being only a single possible value. Maybe it can even act on the match_all/match_none nature of the constant_keyword. I'm not sure how, but it does feel like there are optimizations in there. Particularly when terms is the top level agg.

@nik9000 nik9000 added >enhancement needs:triage Requires assignment of a team area label :Analytics/Aggregations Aggregations and removed needs:triage Requires assignment of a team area label labels Mar 22, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Mar 22, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@nik9000
Copy link
Member Author

nik9000 commented Mar 22, 2023

We should also have a look at filters when the constant_keyword is deep in a bool query inside the filter. It should get rewritten and then merged into a simpler query.

@AkisPanagiotopoulos
Copy link

Hello @nik9000 , can i take this issue as a first time contributor?

@wchaparro wchaparro added the good first issue low hanging fruit label May 4, 2023
@kkrik-es kkrik-es self-assigned this May 4, 2023
@utkarsh-shrivastav77
Copy link

Hi is this issue still open

@afn7081
Copy link

afn7081 commented Jun 11, 2023

Can i take this up if its still open?

@kkrik-es
Copy link
Contributor

This issue is fairly involved and has some weird interactions with other ongoing efforts in aggregations. We prefer to keep this assigned as is, hope to make progress in the coming weeks.

@atript8
Copy link

atript8 commented Jun 18, 2023

@kkrik-es please remove the good first issue tag then.

@kkrik-es kkrik-es removed the good first issue low hanging fruit label Jun 18, 2023
@kkrik-es
Copy link
Contributor

kkrik-es commented Aug 8, 2023

A single setup with one table containing a const_keywork shows that the query gets actually rewritten to match_all or match_none, e.g. (YAML)

  - do:
      indices.create:
        index:  bicycles
        body:
          settings:
            number_of_shards: 1
          mappings:
            properties:
              cycle_type:
                type: constant_keyword
                value: bicycle
              id:
                type: integer
              status:
                type: keyword
              price:
                type: integer

  - do:
      bulk:
        refresh: true
        body:
          - index:
              _index: bicycles
              _id: 1
          - id: 1
            status: up
            price: 100
          - index:
              _index: bicycles
              _id: 2
          - id: 2
            status: up
            price: 150
          - index:
              _index: bicycles
              _id: 3
          - id: 3
            status: down
            price: 200

  - do:
      search:
        index: bicycles
        body:
          size: 0
          aggs:
            agg:
              histogram:
                field: price
                interval: 50
              aggs:
                0-bucket:
                  filter:
                    bool:
                      filter:
                        - bool:
                            filter:
                              - bool:
                                  filter:
                                    - bool:
                                        should:
                                          - term:
                                              id: 1
                                        minimum_should_match: 1
                                    - bool:
                                        should:
                                          - term:
                                              # Ignored.  Setting to 'bicycles' turns the filter to MatchNone.
                                              cycle_type: bicycle   
                                        minimum_should_match: 1
                              - bool:
                                  should:
                                    - term:
                                        status: up
                                  minimum_should_match: 1

I wonder if the synthetic source or the use of timestamp filtering over many indexes contribute to the root cause.

@salvatore-campagna
Copy link
Contributor

salvatore-campagna commented Aug 8, 2023

I think that behaviour might be a result of this change #96161.

It is not clear to me if you are trying to test this avoiding the rewrite. In case you would like to skip rewriting the query to test the behaviour without rewriting, you can act on the pre-filter shard size.

See SearchRequest::setPreFilterShardSize.

Basically by setting a pre-filter shard size that is large enough you will skip rewriting of the query.

@kkrik-es
Copy link
Contributor

kkrik-es commented Aug 8, 2023

After further discussion with the team, this is actually a legit repro: the filter is converted to a MatchNone but we still process every doc. We need to investigate if we can either move the filter into the query, or just skip doc processing in there's no possible match.

kkrik-es added a commit to kkrik-es/elasticsearch that referenced this issue Aug 8, 2023
When a query of a filter gets rewritten to MatchNoDocsQuery, segments
will not produce any results. We can therefore skip processing such
segments.

This applies to FilterByFilterAggregator and FiltersAggregator, as well
as to TermsAggregator when it uses StringTermsAggregatorFromFilters
internally; the latter is an adapter aggregator to
FilterByFilterAggregator.

Fixes elastic#94637
elasticsearchmachine pushed a commit that referenced this issue Aug 10, 2023
When a query of a filter gets rewritten to `MatchNoDocsQuery`, segments
will not produce any results. We can therefore skip processing such
segments.

This applies to `FilterByFilterAggregator` and `FiltersAggregator`, as
well as to `TermsAggregator` when it uses
`StringTermsAggregatorFromFilters` internally; the latter is an adapter
aggregator wrapping `FilterByFilterAggregator`.

Fixes #94637
@martijnvg
Copy link
Member

I think the terms agg part hasn't been implemented yet of this issue?

@martijnvg martijnvg reopened this Aug 11, 2023
@kkrik-es
Copy link
Contributor

kkrik-es commented Aug 11, 2023

I checked simple term aggregations on const_keyword, the TermsAggregator uses StringTermsAggregatorFromFilters that is adapter agg wrapping FilterByFilterAggregator. In this case, the submitted fix should also apply - though I'm not sure how this can get accelerated since the query gets rewritten to MatchAllDocsQuery.

Do we have a good use case here, where the terms aggregation can be further improved?

@kkrik-es
Copy link
Contributor

Closing for now, we can reopen if we have good use cases to cover.

kkrik-es added a commit that referenced this issue Aug 23, 2023
* Skip segment for MatchNoDocsQuery filters.

When a query of a filter gets rewritten to MatchNoDocsQuery, segments
will not produce any results. We can therefore skip processing such
segments.

This applies to FilterByFilterAggregator and FiltersAggregator, as well
as to TermsAggregator when it uses StringTermsAggregatorFromFilters
internally; the latter is an adapter aggregator to
FilterByFilterAggregator.

Fixes #94637

* Update docs/changelog/98295.yaml

* Check all filters for `MatchNoDocsQuery`.

* Skip optimization when 'other' bucket is requested.

* Revert "Set default index mode for TimeSeries to `null` (#98586)"

This reverts commit 56abb86.
kkrik-es added a commit to kkrik-es/elasticsearch that referenced this issue Aug 23, 2023
* Skip segment for MatchNoDocsQuery filters.

When a query of a filter gets rewritten to MatchNoDocsQuery, segments
will not produce any results. We can therefore skip processing such
segments.

This applies to FilterByFilterAggregator and FiltersAggregator, as well
as to TermsAggregator when it uses StringTermsAggregatorFromFilters
internally; the latter is an adapter aggregator to
FilterByFilterAggregator.

Fixes elastic#94637

* Update docs/changelog/98295.yaml

* Check all filters for `MatchNoDocsQuery`.

* Skip optimization when 'other' bucket is requested.

* Revert "Set default index mode for TimeSeries to `null` (elastic#98586)"

This reverts commit 56abb86.
elasticsearchmachine pushed a commit that referenced this issue Aug 23, 2023
* Skip segment for MatchNoDocsQuery filters.

When a query of a filter gets rewritten to MatchNoDocsQuery, segments
will not produce any results. We can therefore skip processing such
segments.

This applies to FilterByFilterAggregator and FiltersAggregator, as well
as to TermsAggregator when it uses StringTermsAggregatorFromFilters
internally; the latter is an adapter aggregator to
FilterByFilterAggregator.

Fixes #94637

* Update docs/changelog/98295.yaml

* Check all filters for `MatchNoDocsQuery`.

* Skip optimization when 'other' bucket is requested.

* Revert "Set default index mode for TimeSeries to `null` (#98586)"

This reverts commit 56abb86.
kkrik-es added a commit that referenced this issue Aug 25, 2023
* Skip segment for MatchNoDocsQuery filters.

When a query of a filter gets rewritten to MatchNoDocsQuery, segments
will not produce any results. We can therefore skip processing such
segments.

This applies to FilterByFilterAggregator and FiltersAggregator, as well
as to TermsAggregator when it uses StringTermsAggregatorFromFilters
internally; the latter is an adapter aggregator to
FilterByFilterAggregator.

Fixes #94637

* Update docs/changelog/98295.yaml

* Check all filters for `MatchNoDocsQuery`.

* Skip optimization when 'other' bucket is requested.

* Revert "Set default index mode for TimeSeries to `null` (#98586)"

This reverts commit 56abb86.

* Revert "Rollback of #98586 (#98805)"

This reverts commit e370194.

* Skip updating source when missing synthetic mode

* Update docs/changelog/98808.yaml

* Skip matching assert in MapperService too

* Refine the assert

* Extend versions before 8.6, when TS had no synthetic source

* Add source field mapping for non-synthetic TSDB

* Delete 98586.yaml

Duplicate changelog

* Add comment to TSDB_NO_SYNTHETIC mapping

* Spotless fix

* Add yaml test

* Fix version skip in yaml test
kkrik-es added a commit to kkrik-es/elasticsearch that referenced this issue Aug 25, 2023
* Skip segment for MatchNoDocsQuery filters.

When a query of a filter gets rewritten to MatchNoDocsQuery, segments
will not produce any results. We can therefore skip processing such
segments.

This applies to FilterByFilterAggregator and FiltersAggregator, as well
as to TermsAggregator when it uses StringTermsAggregatorFromFilters
internally; the latter is an adapter aggregator to
FilterByFilterAggregator.

Fixes elastic#94637

* Update docs/changelog/98295.yaml

* Check all filters for `MatchNoDocsQuery`.

* Skip optimization when 'other' bucket is requested.

* Revert "Set default index mode for TimeSeries to `null` (elastic#98586)"

This reverts commit 56abb86.

* Revert "Rollback of elastic#98586 (elastic#98805)"

This reverts commit e370194.

* Skip updating source when missing synthetic mode

* Update docs/changelog/98808.yaml

* Skip matching assert in MapperService too

* Refine the assert

* Extend versions before 8.6, when TS had no synthetic source

* Add source field mapping for non-synthetic TSDB

* Delete 98586.yaml

Duplicate changelog

* Add comment to TSDB_NO_SYNTHETIC mapping

* Spotless fix

* Add yaml test

* Fix version skip in yaml test
elasticsearchmachine pushed a commit that referenced this issue Aug 25, 2023
* Skip segment for MatchNoDocsQuery filters.

When a query of a filter gets rewritten to MatchNoDocsQuery, segments
will not produce any results. We can therefore skip processing such
segments.

This applies to FilterByFilterAggregator and FiltersAggregator, as well
as to TermsAggregator when it uses StringTermsAggregatorFromFilters
internally; the latter is an adapter aggregator to
FilterByFilterAggregator.

Fixes #94637

* Update docs/changelog/98295.yaml

* Check all filters for `MatchNoDocsQuery`.

* Skip optimization when 'other' bucket is requested.

* Revert "Set default index mode for TimeSeries to `null` (#98586)"

This reverts commit 56abb86.

* Revert "Rollback of #98586 (#98805)"

This reverts commit e370194.

* Skip updating source when missing synthetic mode

* Update docs/changelog/98808.yaml

* Skip matching assert in MapperService too

* Refine the assert

* Extend versions before 8.6, when TS had no synthetic source

* Add source field mapping for non-synthetic TSDB

* Delete 98586.yaml

Duplicate changelog

* Add comment to TSDB_NO_SYNTHETIC mapping

* Spotless fix

* Add yaml test

* Fix version skip in yaml test
bpintea pushed a commit to bpintea/elasticsearch that referenced this issue Aug 25, 2023
* Skip segment for MatchNoDocsQuery filters.

When a query of a filter gets rewritten to MatchNoDocsQuery, segments
will not produce any results. We can therefore skip processing such
segments.

This applies to FilterByFilterAggregator and FiltersAggregator, as well
as to TermsAggregator when it uses StringTermsAggregatorFromFilters
internally; the latter is an adapter aggregator to
FilterByFilterAggregator.

Fixes elastic#94637

* Update docs/changelog/98295.yaml

* Check all filters for `MatchNoDocsQuery`.

* Skip optimization when 'other' bucket is requested.

* Revert "Set default index mode for TimeSeries to `null` (elastic#98586)"

This reverts commit 56abb86.

* Revert "Rollback of elastic#98586 (elastic#98805)"

This reverts commit e370194.

* Skip updating source when missing synthetic mode

* Update docs/changelog/98808.yaml

* Skip matching assert in MapperService too

* Refine the assert

* Extend versions before 8.6, when TS had no synthetic source

* Add source field mapping for non-synthetic TSDB

* Delete 98586.yaml

Duplicate changelog

* Add comment to TSDB_NO_SYNTHETIC mapping

* Spotless fix

* Add yaml test

* Fix version skip in yaml test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants