Potentially wrong behaviour on date_range query with overlapping lte values #111484

Closed
randomknowledge opened this issue Jul 31, 2024 · 6 comments
Labels: >bug, :Search Relevance/Search, Team:Search Relevance

@randomknowledge

Elasticsearch Version

8.14.3

Installed Plugins

No response

Java Version

bundled

OS Version

elastic cloud or Linux 032a394ea0c9 6.6.32-linuxkit #1 SMP Thu Jun 13 14:13:01 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux.

Problem Description

In a range query I am not getting the results I would expect. Please see below for mapping, docs and query details.

Please note: For the sake of simplicity I will leave out data that is irrelevant here.

Steps to Reproduce

I am doing a range query on a field opening_dates that has this mapping:

{
    "opening_dates": {
        "type": "date_range",
        "format": "yyyy-MM-dd"
    }
}

I have exactly one document in the index, with this data:

{
    "opening_dates": [
        {
            "gte": "1900-03-01",
            "lte": "1900-10-31"
        }
    ]
}

This is the _search query I am doing:

{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "opening_dates": {
              "gte": "1900-03-01",
              "lte": "1900-10-31",
              "relation": "contains",
              "format": "yyyy-MM-dd"
            }
          }
        }
      ]
    }
  }
}

Expected result: return the indexed document
Actual result: return no document

The document is found if I change lte to lt in the query. The same problem does not occur for the gte value.
Am I missing something or is this a bug?

Logs (if relevant)

No response

@randomknowledge randomknowledge added >bug needs:triage Requires assignment of a team area label labels Jul 31, 2024
@saikatsarkar056 saikatsarkar056 added the :Search/Search Search-related issues that do not fall into other categories label Jul 31, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Jul 31, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine elasticsearchmachine removed the needs:triage Requires assignment of a team area label label Jul 31, 2024
@benwtrent benwtrent added :Search Relevance/Search Catch all for Search Relevance and removed :Search/Search Search-related issues that do not fall into other categories labels Aug 1, 2024
@elasticsearchmachine elasticsearchmachine added Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch and removed Team:Search Meta label for search team labels Aug 1, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@randomknowledge
Author

/ping

@john-wagster john-wagster self-assigned this Aug 19, 2024
@john-wagster
Contributor

john-wagster commented Aug 19, 2024

I took a quick look at this. During parsing, the lte part of the search query is rounded up to 1900-10-31 23:59:59.999999, presumably so that the query captures the appropriate range in some scenarios. In this case, where only a date is provided, the rounded time component causes the equality check to fail. Explicitly formatting the date in the query itself (independent of how it's indexed) appears to mitigate the problem by forcing the lte check against 1900-10-31 00:00:00.000000, which is probably the correct behavior here (date math is hard):

GET test_index1/_search
{
  "explain": true, 
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "opening_dates": {
              "gte": "1900-03-01 00:00:00.000000",
              "lte": "1900-10-31 00:00:00.000000",
              "relation": "contains",
              "format": "yyyy-MM-dd HH:mm:ss.SSSSSS"
            }
          }
        }
      ]
    }
  }
}

I'm not sure why this was being rounded to begin with (but I'm sure there's a good reason). Naively removing the rounding fixes the bug. I'll see if I can determine the use cases for this rounding and make the appropriate fix. I'll update when I have a fix available.
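
To make the mismatch described above concrete, here is a minimal Python sketch of the comparison. It is only a model and not Elasticsearch's implementation: the helper names are hypothetical, and it assumes that the indexed upper bound was stored at the start of the day while the query's lte was rounded up to the end of the day. The lt case (upper bound just before the start of the day) is likewise an assumption chosen to match the behaviour reported in the issue.

```python
from datetime import date, datetime, time, timedelta


def start_of_day(d: date) -> datetime:
    """Date with the missing time filled in as 00:00:00.000000."""
    return datetime.combine(d, time.min)


def end_of_day(d: date) -> datetime:
    """Date with the missing time rounded up to 23:59:59.999999."""
    return datetime.combine(d, time.max)


def doc_contains_query(doc, query) -> bool:
    """Model of relation=contains: the indexed range must fully cover the query range."""
    doc_lo, doc_hi = doc
    q_lo, q_hi = query
    return doc_lo <= q_lo and q_hi <= doc_hi


gte, lte = date(1900, 3, 1), date(1900, 10, 31)

# Assumption: the indexed upper bound was NOT rounded up (start of day).
doc_unrounded = (start_of_day(gte), start_of_day(lte))
# Assumption: after the fix, the indexed upper bound is rounded like the query.
doc_rounded = (start_of_day(gte), end_of_day(lte))

# Query with lte rounded up to the end of the day, as described in the comment.
query_lte = (start_of_day(gte), end_of_day(lte))
# Query with explicit 00:00:00.000000 times (the workaround shown above).
query_explicit = (start_of_day(gte), start_of_day(lte))
# Assumption: lt excludes the start of the day, so the bound sits just before it.
query_lt = (start_of_day(gte), start_of_day(lte) - timedelta(microseconds=1))

print(doc_contains_query(doc_unrounded, query_lte))       # False: the reported bug
print(doc_contains_query(doc_unrounded, query_explicit))  # True: explicit format workaround
print(doc_contains_query(doc_unrounded, query_lt))        # True: lt instead of lte
print(doc_contains_query(doc_rounded, query_lte))         # True: consistent rounding
```

Under these assumptions the query's upper bound overshoots the indexed one by almost a full day, which is enough for contains to reject the document.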

@john-wagster
Contributor

john-wagster commented Oct 1, 2024

@randomknowledge this turned out to be an issue with how date range information is indexed. After several discussions around this, here is where we wound up.

The fix is now in ES 9.x (main) and has been backported to 8.x, starting with 8.15.3. The query you called out will now work as expected given the indexed data in this issue.

The fix changes how data is indexed, so after upgrading to a version of ES that has the fix, newly indexed data will be indexed correctly right away. However, data that is already in an index will NOT be re-indexed automatically, as that could incur large, unexpected slowdowns. Anyone impacted by the bug will instead need to re-index any data that was indexed prior to 8.15.3 for the query called out originally in this issue to match.
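
One possible way to re-index existing data is the `_reindex` API, copying documents into a fresh index created on a fixed version. The sketch below only builds and prints such a request and does not send it. The source index name `test_index1` is taken from the query earlier in this thread; the destination name is hypothetical, and the destination index is assumed to already exist with the same date_range mapping.

```python
import json


def build_reindex_request(source_index: str, dest_index: str):
    """Return (method, path, body) for a basic _reindex call."""
    body = {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }
    return "POST", "/_reindex", body


# "test_index1_reindexed" is a made-up destination name for illustration.
method, path, body = build_reindex_request("test_index1", "test_index1_reindexed")

print(method, path)
print(json.dumps(body, indent=2))
```

The printed request can be pasted into a console or sent with any HTTP client once the destination index has been created.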

Let me know if there are any questions or concerns about this. And otherwise I'll close the issue out shortly here and we can always re-open or create new issues as needed.

@randomknowledge
Author

randomknowledge commented Oct 1, 2024

Thanks a lot @john-wagster,
I will test this asap (most probably next week).

EDIT: I will test this as soon as 8.15.3 is available on elastic cloud, but I guess you do not need my feedback anyways, so again thanks a lot and feel free to close this issue.
