Error: failed to parse field [...] of type [date] when painless script updates unrelated field #108977

Open
pmishev opened this issue May 23, 2024 · 5 comments
Labels: >bug, :Core/Infra/Scripting, Team:Core/Infra

Comments

pmishev commented May 23, 2024

Elasticsearch Version

7.17.12

Installed Plugins

No response

Java Version

bundled

OS Version

Linux aa933ae49f18 5.15.49-linuxkit #1 SMP Tue Sep 13 07:51:46 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Problem Description

When I use _update_by_query to remove a field from the documents of an index that also has an epoch_second date field holding a decimal value, the request fails with a parse error on the date field and the removal doesn't happen.

It seems that, in this scenario, ES does not recognise decimal numbers in the scientific notation that it itself uses when writing the data back.

Steps to Reproduce

PUT /test_ts
{
  "mappings": {
    "properties": {
      "update_datetime" : {
        "type" : "date",
        "format" : "epoch_second"
      },
      "is_private" : {
        "type" : "boolean"
      }
    }
  }
}
POST test_ts/_doc/1
{
  "update_datetime": 1716462600.37034
}
POST test_ts/_update_by_query
{
  "script": {
    "source": "ctx._source.remove('is_private');",
    "lang": "painless"
  }
}

Results in:

failed to parse field [update_datetime] of type [date] in document with id '1'. Preview of field's value: '1.71646260037034E9'

Strangely, reindexing works fine with no errors:

POST _reindex
{
  "source": {
    "index": "test_ts"
  },
  "dest": {
    "index": "test_ts_1"
  }
}

Logs (if relevant)

No response

pmishev added the >bug and needs:triage labels on May 23, 2024
henningandersen added the :Core/Infra/Scripting label and removed the needs:triage label on May 31, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

elasticsearchmachine added the Team:Core/Infra label on May 31, 2024
Member

rjernst commented May 31, 2024

Can you please add error_trace=true to your test_ts/_update_by_query request? For example:

POST test_ts/_update_by_query?error_trace=true

That should give more details about where the error is actually occurring.

Author

pmishev commented Jun 4, 2024

{
  "took" : 5,
  "timed_out" : false,
  "total" : 1,
  "updated" : 0,
  "deleted" : 0,
  "batches" : 1,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [
    {
      "index" : "test_ts",
      "type" : "_doc",
      "id" : "1",
      "cause" : {
        "type" : "mapper_parsing_exception",
        "reason" : "failed to parse field [update_datetime] of type [date] in document with id '1'. Preview of field's value: '1.71646260037034E9'",
        "caused_by" : {
          "type" : "illegal_argument_exception",
          "reason" : "failed to parse date field [1.71646260037034E9] with format [epoch_second]",
          "caused_by" : {
            "type" : "date_time_parse_exception",
            "reason" : "Failed to parse with all enclosed parsers"
          }
        }
      },
      "status" : 400
    }
  ]
}

Member

rjernst commented Jun 18, 2024

Thanks for the info, I see what is happening.

Your update_datetime is passed as a JSON number. When this is parsed in Java (as it is when reindexing), it is placed in a double type. When that double is serialized back out, it uses scientific notation. Yet the epoch_second date format can't handle scientific notation.
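
To illustrate the round trip described above, here is a minimal Java sketch. It uses only the plain JDK and is not Elasticsearch code; the class and method names are made up for illustration. It parses the decimal text into a double and renders it back with Double.toString, which switches to scientific notation for values of this magnitude.

```java
// Illustration only: plain JDK, not Elasticsearch code. Names are hypothetical.
final class DoubleRoundTrip {

    // Parse decimal text into a double, then render it back with Double.toString.
    static String roundTrip(String jsonNumber) {
        double parsed = Double.parseDouble(jsonNumber);
        return Double.toString(parsed);
    }

    public static void main(String[] args) {
        // The value indexed in the reproduction steps.
        String original = "1716462600.37034";
        // Prints 1.71646260037034E9, the value shown in the error message.
        System.out.println(roundTrip(original));
    }
}
```

The printed text is the same value that appears in the "Preview of field's value" part of the error, which the epoch_second format then fails to parse.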

While this is understandably confusing, I think fixing it would be difficult. When reindexing, we don't know about the mapped types when parsing the source; it's just a JSON object. It might be possible to change reindexing to use the original source bytes, but not without a bit of rework.

One workaround that should work is to use a string. So when indexing your original document, try this:

POST test_ts/_doc/1
{
  "update_datetime": "1716462600.37034"
}

That should retain the original formatting when the source is parsed as JSON and then serialized again as a string to be reindexed.

Author

pmishev commented Jun 24, 2024

Thanks for the workaround. So far it seems to work, after fixing my existing data with this script:

POST test_ts/_update_by_query
{
  "script": {
    "source": """
      if (ctx._source.update_datetime instanceof Double) {
        double updateDatetime = ctx._source.update_datetime;
        // Convert double to String
        String updateDateTimeString = updateDatetime + "";
        // Remove the E9 suffix
        updateDateTimeString = updateDateTimeString.splitOnToken('E')[0];
        // Remove decimal point
        String[] splitString = updateDateTimeString.splitOnToken('.');
        updateDateTimeString = splitString[0] + splitString[1];
        // Insert the decimal point in the correct place
        String part1 = updateDateTimeString.substring(0, 10);
        String part2 = updateDateTimeString.substring(10);
        ctx._source.update_datetime = part1 + "." + part2;
      }
    """
  }
}
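
One caveat on the script above: it assumes that the double's text form always contains both a decimal point and an exponent, and that the integer part is exactly 10 digits long. A whole-second value such as 1716462600.0 is rendered as 1.7164626E9, which leaves only 8 digits, so the substring(0, 10) call would fail. A minimal sketch of a conversion that avoids those assumptions is shown below in plain Java, using java.math.BigDecimal. The class and method names are hypothetical, and whether BigDecimal can be used the same way inside a Painless script on a given Elasticsearch version is an assumption to verify before relying on it.

```java
import java.math.BigDecimal;

// Illustration only: plain JDK, not Elasticsearch or Painless code. Names are hypothetical.
final class EpochSecondText {

    // Render a double as plain decimal text, without an exponent.
    // Going through Double.toString keeps the short decimal form of the value
    // rather than the full binary expansion that new BigDecimal(double) would give.
    static String toPlainDecimal(double value) {
        return new BigDecimal(Double.toString(value)).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(toPlainDecimal(1716462600.37034)); // 1716462600.37034
        System.out.println(toPlainDecimal(1716462600.0));     // 1716462600
    }
}
```

Stored as a string, either result keeps its plain decimal form when the document source is rewritten, which is what the workaround relies on.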
