Synthetic source is on by default for TSDS but does not appear in mappings #97429

Closed
nerophon opened this issue Jul 6, 2023 · 3 comments · Fixed by #98586 or #98808
Labels: >bug, :StorageEngine/TSDB, Team:Analytics


nerophon commented Jul 6, 2023

Elasticsearch Version

8.7

Installed Plugins

No response

Java Version

bundled

OS Version

any

Problem Description

As of 8.7, the default _source behaviour for index.mode: time_series changed. Prior to 8.7, _source was stored explicitly; from 8.7 onward it is synthetic.
This change is not easily visible to users, because the corresponding setting ("_source": {"mode": "synthetic"}) doesn't appear in the resulting indices.
Moreover, if a user sets this by hand in a template, indices created from that template don't show the setting. It is as if Elasticsearch ignores the setting on index creation. This is extremely confusing for users.

In general, this behaviour is misleading, since there is no clear indication to the user that the _source isn't fully stored. Some customers also explicitly require their _source not to be tampered with, for legal and auditing reasons.

This behaviour (if intended) needs to be clearly documented in the TSDS or _source documentation page.
We further propose that any mapping or setting explicitly written in a template should be explicitly written in any index that is instantiated from it. Anything else is too misleading.
It is also somewhat of a concern that there now appear to be two separate sources of truth for "mode": "synthetic": one in mappings and another hidden somewhere in our code.

Steps to Reproduce

This can be reproduced by running the following (notice the change of behaviour pre/post 8.7):

PUT /_index_template/test
{
  "index_patterns": [
    "foobar*"
  ],
  "template": {
    "settings": {
      "index": {
        "mode": "time_series",
        "routing_path": [
          "host.name"
        ]
      }
    },
    "mappings": {
      "_source": {
        "mode": "synthetic"
      }
    }
  },
  "composed_of": [
  ],
  "priority": 500,
  "data_stream": {
  }
}

POST /_index_template/_simulate/test

Gives the following on 8.6.x:

    "mappings": {
      "_source": {
        "mode": "synthetic"
      }
    },

This very section disappears on 8.7.x, while synthetic source remains activated.
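
Until the mapping output reflects the real behaviour, a client-side check along the following lines could make the effective mode visible. This is only a sketch under the assumptions stated in this issue (time_series implies synthetic source from 8.7 onward); the function name `effective_source_mode` and the caller-supplied `version` tuple are hypothetical and not part of any Elasticsearch client.

```python
from typing import Optional


def effective_source_mode(index_body: dict, version: tuple) -> Optional[str]:
    """Guess the _source mode of an index from its GET /<index> body.

    index_body: the per-index object containing "settings" and "mappings".
    version: (major, minor) of the cluster, supplied by the caller.
    Returns the explicit mapping value if present, "synthetic" when it is
    implied by index.mode (assumption taken from this issue), else None.
    """
    explicit = index_body.get("mappings", {}).get("_source", {}).get("mode")
    if explicit is not None:
        return explicit

    settings = index_body.get("settings", {})
    # Settings may be nested ("index": {"mode": ...}) or flat ("index.mode").
    index_mode = settings.get("index", {}).get("mode") or settings.get("index.mode")

    # Assumption from the issue text: from 8.7 onward, time_series indices
    # use synthetic source even when the mapping does not say so.
    if index_mode == "time_series" and version >= (8, 7):
        return "synthetic"
    return None
```

For example, a body with `"index": {"mode": "time_series"}` and empty mappings yields `"synthetic"` for version `(8, 7)` and `None` for `(8, 6)`.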

nerophon added the >bug and needs:triage labels Jul 6, 2023
dliappis added the :StorageEngine/TSDB label Jul 10, 2023
elasticsearchmachine added the Team:Analytics label and removed the needs:triage label Jul 10, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@lucabelluccini
Contributor

lucabelluccini commented Jul 24, 2023

I'm raising this because there are other consequences as well.
Generally, an index using synthetic source cannot be used with runtime fields that access _source.
TSDS implicitly uses synthetic source, but because of this bug it's hard for an end user to spot what's going on.
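
One way to spot affected runtime fields ahead of time is to scan the mappings for scripts that mention `params._source`. The sketch below is a rough text match, not a Painless parser, and the function name `runtime_fields_reading_source` is hypothetical; it only assumes the usual mapping layout with a top-level `runtime` section.

```python
import re

# Matches params._source and params['_source'] / params["_source"].
_SOURCE_ACCESS = re.compile(
    r"params\s*\.\s*_source|params\s*\[\s*['\"]_source['\"]\s*\]"
)


def runtime_fields_reading_source(mappings: dict) -> list:
    """Return the names of runtime fields whose script mentions params._source."""
    flagged = []
    for name, field in mappings.get("runtime", {}).items():
        script = field.get("script", {})
        # A script may be a bare string or an object such as {"source": "..."}.
        source = script if isinstance(script, str) else script.get("source", "")
        if _SOURCE_ACCESS.search(source):
            flagged.append(name)
    return sorted(flagged)
```

Any field it reports is a candidate for failing at search time on an index that uses synthetic source.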

Testing on 8.9.0.

# GET .ds-metrics-apm.internal-default-2023.07.20-000001/_mapping?filter_path=**._source 200 OK
{
  ".ds-metrics-apm.internal-default-2023.07.20-000001": {
    "mappings": {
      "_source": {
        "mode": "synthetic"
      }
    }
  }
}
# GET .ds-metrics-apm.internal-default-2023.07.20-000001/_search 400 Bad Request
{
  "error": {
    "root_cause": [
      {
        "type": "script_exception",
        "reason": "runtime error",
        "script_stack": [
          "org.elasticsearch.server@8.9.0/org.elasticsearch.index.query.SearchExecutionContext.lambda$lookup$2(SearchExecutionContext.java:440)",
          "org.elasticsearch.server@8.9.0/org.elasticsearch.search.lookup.LeafSearchLookup.lambda$new$0(LeafSearchLookup.java:39)",
          "org.elasticsearch.server@8.9.0/org.elasticsearch.script.AbstractFieldScript.lambda$static$0(AbstractFieldScript.java:69)",
          "org.elasticsearch.server@8.9.0/org.elasticsearch.script.DynamicMap.get(DynamicMap.java:58)",
          "emit(params._source['@timestamp'])",
          "           ^---- HERE"
        ],
        "script": "emit(params._source['@timestamp'])",
        "lang": "painless",
        "position": {
          "offset": 11,
          "start": 0,
          "end": 34
        }
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": ".ds-metrics-apm.internal-default-2023.07.20-000001",
        "node": "YvpHp5HiRhSm0mX1yD8Owg",
        "reason": {
          "type": "script_exception",
          "reason": "runtime error",
          "script_stack": [
            "org.elasticsearch.server@8.9.0/org.elasticsearch.index.query.SearchExecutionContext.lambda$lookup$2(SearchExecutionContext.java:440)",
            "org.elasticsearch.server@8.9.0/org.elasticsearch.search.lookup.LeafSearchLookup.lambda$new$0(LeafSearchLookup.java:39)",
            "org.elasticsearch.server@8.9.0/org.elasticsearch.script.AbstractFieldScript.lambda$static$0(AbstractFieldScript.java:69)",
            "org.elasticsearch.server@8.9.0/org.elasticsearch.script.DynamicMap.get(DynamicMap.java:58)",
            "emit(params._source['@timestamp'])",
            "           ^---- HERE"
          ],
          "script": "emit(params._source['@timestamp'])",
          "lang": "painless",
          "position": {
            "offset": 11,
            "start": 0,
            "end": 34
          },
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "Cannot access source from scripts in synthetic mode"
          }
        }
      }
    ]
  },
  "status": 400
}
# GET .ds-metrics-apm.internal-default-2023.07.20-000001?filter_path=**.mode 200 OK
{
  ".ds-metrics-apm.internal-default-2023.07.20-000001": {
    "mappings": {
      "_source": {
        "mode": "synthetic"
      }
    }
  }
}

@Danouchka

+1 for 8.9.0: we don't see synthetic mode in the mappings.

kkrik-es added a commit to kkrik-es/elasticsearch that referenced this issue Aug 17, 2023
If the default index mode matches the specified, we skip printing the
synthetic source info in the mappings printing. This leads to confusion
as it's not immediately visible (or well known) that time series indices
use synthetic source by default.

Leaving the default index mode to null does the trick here. We do pass
the right value for time series indexes while building the mapping so
there's no functional impact here.

Fixes elastic#97429
kkrik-es added a commit that referenced this issue Aug 23, 2023
* Default index mode null for TimeSeries

If the default index mode matches the specified, we skip printing the
synthetic source info in the mappings printing. This leads to confusion
as it's not immediately visible (or well known) that time series indices
use synthetic source by default.

Leaving the default index mode to null does the trick here. We do pass
the right value for time series indexes while building the mapping so
there's no functional impact here.

Fixes #97429

* Update docs/changelog/98586.yaml

* Restore other error messages.

* Update source in DocumentMapper to include synthetic source.

* Add version check for skipping assert
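
The mechanism described in the commit message can be modelled with a few lines. This is a simplified illustration in Python, not the Elasticsearch Java code; the function `serialize_source_mode` and its parameters are hypothetical names chosen for the sketch.

```python
from typing import Optional


def serialize_source_mode(mode: Optional[str], default_mode: Optional[str]) -> dict:
    """Emit the _source mapping section, omitting values equal to the default.

    When default_mode equals the actual mode, nothing is printed, which is
    the confusing case described above. With default_mode=None, an explicit
    mode is always printed.
    """
    if mode is None or mode == default_mode:
        return {}
    return {"_source": {"mode": mode}}
```

With `default_mode="synthetic"` the call `serialize_source_mode("synthetic", "synthetic")` returns an empty dict, while leaving the default as `None` returns `{"_source": {"mode": "synthetic"}}`, matching the "default index mode null" approach in the commit.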
kkrik-es added a commit to kkrik-es/elasticsearch that referenced this issue Aug 23, 2023
elasticsearchmachine pushed a commit that referenced this issue Aug 23, 2023