Add support for runtime fields #59332

Closed
30 tasks done
javanna opened this issue Jul 9, 2020 · 4 comments
Labels: >feature, Meta, release highlight, :Search/Search, Team:Search

javanna (Member) commented Jul 9, 2020

Runtime fields

We would like to increase the flexibility of the search API by introducing support for runtime fields.

Runtime fields are not indexed and do not have doc_values, meaning Lucene is completely unaware of them, but they are consumed through the field capabilities API and the search API like any ordinary field. It is possible to retrieve them as well as query them, aggregate and sort on them.

Runtime fields make searches slower, because their values have to be computed for every document that may match the query, and the cost of that depends on how they are calculated. It is highly recommended that searches using runtime fields are run through the Async Search API.

One limitation of runtime fields compared to ordinary fields is that they don’t support scoring as they are not indexed and we are not going to compute the document frequency for them, which is required for scoring.

Runtime fields are not part of the _source, hence they are not returned by default as part of the search hits. They can be specifically requested through the field retrieval API (#55363).
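
As a rough sketch of what such a request could look like, the Python snippet below builds a search body that asks for a runtime field through the `fields` parameter. The field name `day_of_week` and the helper `build_fields_request` are illustrative assumptions, and nothing here talks to a cluster; it only assembles the JSON body.

```python
import json


def build_fields_request(field_names, include_source=False):
    """Build a search body that retrieves values via the `fields` parameter.

    Hypothetical helper for illustration; it only assembles a dict.
    """
    return {
        "query": {"match_all": {}},
        # Runtime fields are not part of _source, so they are requested explicitly.
        "_source": include_source,
        "fields": list(field_names),
    }


body = build_fields_request(["day_of_week"])
print(json.dumps(body, indent=2))
```

The resulting body can be sent to the `_search` endpoint with any HTTP client.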

A runtime field is defined by its data type and the script that computes its values. As of today, each search section already supports scripting, but the contexts are different, as well as the required syntax. We want to unify this to a single place where a script can be specified. Such a script always has access to _source, any other stored fields as well as doc_values.

A runtime field can be defined in the mappings by adding its definition to a new runtime section at the same level as properties, where fields that exist in _source are defined:

PUT /my-index/_mappings
{
    "runtime" : {
        "day_of_week" : {
            "type" : "keyword",
            "script" : {
                "source" : "emit(doc['timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT))"
            }
        }
    }
}

The data types supported for runtime fields are initially keyword, long, double, date, ip, boolean, geo_point. In the example above, we extract the day of week (e.g. Monday) from another field called timestamp which is defined as a date. The script can refer to other fields, including other runtime fields: we need to implement a mechanism to resolve field dependencies in the correct order, and prevent cyclic dependencies.
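
One way to picture the dependency resolution mentioned above is a depth-first topological sort that fails when it meets a field already on the current path. This is only a sketch under assumptions: the `dependencies` dict, the field names `is_weekend` and `weekend_label`, and the function `resolve_order` are hypothetical, and this is not how Elasticsearch implements it.

```python
def resolve_order(dependencies):
    """Return field names so that each field comes after the fields it reads.

    `dependencies` maps a runtime field name to the names of the fields its
    script refers to (a hypothetical representation). Names that are not keys
    of the dict are treated as leaves with no dependencies of their own.
    Raises ValueError when the definitions form a cycle.
    """
    order = []
    done = set()
    path = []  # fields on the current depth-first path

    def visit(name):
        if name in done:
            return
        if name in path:
            cycle = path[path.index(name):] + [name]
            raise ValueError("cyclic dependency: " + " -> ".join(cycle))
        path.append(name)
        for dep in dependencies.get(name, ()):
            visit(dep)
        path.pop()
        done.add(name)
        order.append(name)

    for name in dependencies:
        visit(name)
    return order


fields = {
    "day_of_week": [],
    "is_weekend": ["day_of_week"],
    "weekend_label": ["is_weekend"],
}
print(resolve_order(fields))  # ['day_of_week', 'is_weekend', 'weekend_label']
```

Calling `resolve_order({"a": ["b"], "b": ["a"]})` raises a `ValueError` that names the loop, which is the kind of error a user would need in order to fix the mapping.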

The defined field can then be used like any other field in the different sections of the search API:

GET /my-index/_search
{
    "aggs" : {
        "days_of_week" : {
            "terms" : {
                "field" : "day_of_week"
            }
        }
    }
}

Each runtime field type will consist of a MappedFieldType that exposes a runtime fielddata implementation that generates doc_values on the fly for the needed data type. Additionally, all the basic Lucene queries for each runtime field type need to be written to query the corresponding fielddata/doc_values implementation.
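
The idea of generating values on the fly and querying them can be modelled with a toy example. The sketch below is an assumption-heavy simplification in Python: the class `RuntimeKeywordField`, its methods, and the sample documents are all hypothetical and only mirror the shape of the design (a script that emits values per document, and a query that checks those values), not the actual Lucene or Elasticsearch classes.

```python
class RuntimeKeywordField:
    """Toy model of a keyword runtime field backed by a script."""

    def __init__(self, script):
        # `script` takes (doc, emit) and calls emit(value) zero or more times.
        self.script = script

    def doc_values(self, doc):
        """Compute the field's values for one document on the fly."""
        values = []
        self.script(doc, values.append)
        return sorted(values)

    def term_query(self, value, docs):
        """Return the positions of documents whose computed values contain `value`.

        Every candidate document runs the script, which is why such queries
        are slower than queries on indexed fields.
        """
        return [i for i, doc in enumerate(docs) if value in self.doc_values(doc)]


def upper_animal(doc, emit):
    # Emit the upper-cased value of the (hypothetical) `animal` field.
    emit(str(doc["animal"]).upper())


docs = [{"animal": "cat"}, {"animal": "dog"}, {"animal": "snake"}]
field = RuntimeKeywordField(upper_animal)
print(field.term_query("DOG", docs))  # [1]
```

Nothing is stored ahead of time in this model: the same `doc_values` method would serve sorting and aggregations as well as the query.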

Support for runtime fields in Elasticsearch will be released under the Elastic license.

The following is a high-level list of tasks required to develop the initial support for runtime fields, which will lay the foundations for the next phases:

Mappers and field types

API

Scripting

Infrastructure

Security

Telemetry

Docs

  • Document runtime section and corresponding field types ([DOCS] Add docs for runtime fields #62653)
    • inconsistencies caused by updating a script while queries that rely on it are running
    • existing queries / visualizations may break because runtime fields can be updated
    • queries against runtime fields are deemed expensive and rejected when expensive queries are disallowed
  • Document how to define runtime fields in a search request
  • Document the ability to omit the script from the definition of a runtime field
  • Document the ability to shadow existing fields with a runtime field
  • Document dynamic runtime mode ([DOCS] Add dynamic runtime fields to docs (#66194) #66304)
javanna added the >feature, :Search/Search, and Meta labels Jul 9, 2020
elasticmachine (Collaborator) commented:

Pinging @elastic/es-search (:Search/Search)

elasticmachine added the Team:Search label Jul 9, 2020
javanna added a commit that referenced this issue Jul 14, 2020
javanna added a commit that referenced this issue Jul 17, 2020
This addresses a TODO around using the script params, which are now parsed from the mappings. It also expands existing tests to verify that params are carried around and accessible in scripts for both fielddata and queries.

Relates to #59332
nik9000 added a commit that referenced this issue Sep 15, 2020
We were checking for loops in queries before, but we had an "off by one"
error where we wouldn't notice the "top level" runtime field when
detecting a loop. So the error message would be wrong.

I also caught a few bugs with query generation caused by missing
`@Override` annotations and fixed a few of them. There is a bug with
`regexp` queries with match options that I'm not fixing in this PR but
will get to later.

Relates to #59332
nik9000 added a commit that referenced this issue Sep 16, 2020
This implements the `fields` API in `_search` for runtime fields using
doc values. Most of that implementation is stolen from the
`docvalue_fields` fetch sub-phase, just moved into the same API that the
`fields` API uses. At this point the `docvalue_fields` fetch phase looks
like a special case of the `fields` API.

While I was at it I moved the "which doc values sub-implementation
should I use for fetching?" question from a bunch of `instanceof`s to a
method on `LeafFieldData` so we can be much more flexible with what is
returned and we're not forced to extend certain classes just to make the
fetch phase happy.

Relates to #59332
nik9000 added a commit that referenced this issue Sep 16, 2020
nik9000 added a commit that referenced this issue Nov 10, 2020
This adds a way to specify the `runtime_mappings` on a search request
which are always "runtime" fields. It looks like:
```
curl -XDELETE -uelastic:password -HContent-Type:application/json localhost:9200/test
curl -XPOST -uelastic:password -HContent-Type:application/json 'localhost:9200/test/_bulk?pretty&refresh' -d'
{"index": {}}
{"animal": "cat", "sound": "meow"}
{"index": {}}
{"animal": "dog", "sound": "woof"}
{"index": {}}
{"animal": "snake", "sound": "hisssssssssssssssss"}
'

curl -XPOST -uelastic:password -HContent-Type:application/json localhost:9200/test/_search?pretty -d'
{
  "runtime_mappings": {
    "animal.upper": {
      "type": "keyword",
      "script": "for (String s : doc[\"animal.keyword\"]) {emit(s.toUpperCase())}"
    }
  },
  "query": {
    "match": {
      "animal.upper": "DOG"
    }
  }
}'
```

NOTE:
If we have to send a search request with runtime mappings to a node that
doesn't support runtime mappings at all then we'll fail the search
request entirely. The alternative would be to not send those runtime
mappings and let the node fail the search request with an "unknown field"
error. I believe that would be surprising because you defined
the field in the search request.

NOTE:
It isn't obvious but you can also use `runtime_mappings` to override fields
inside objects by naming the runtime fields with `.` in them. Like this:
```
curl -XDELETE -uelastic:password -HContent-Type:application/json localhost:9200/test
curl -uelastic:password -XPOST -HContent-Type:application/json localhost:9200/test/_bulk?refresh -d'
{"index":{}}
{"name": {"first": "Andrew", "last": "Wiggin"}}
{"index":{}}
{"name": {"first": "Julian", "last": "Delphiki", "suffix": "II"}}
'

curl -uelastic:password -XPOST -HContent-Type:application/json localhost:9200/test/_search?pretty -d'{
  "runtime_mappings": {
    "name.first": {
      "type": "keyword",
      "script": "if (\"Wiggin\".equals(doc[\"name.last.keyword\"].value)) {emit(\"Ender\");} else if (\"Delphiki\".equals(doc[\"name.last.keyword\"].value)) {emit(\"Bean\");}"
    }
  },
  "query": {
    "match": {
      "name.first": "Bean"
    }
  }
}'
```

Relates to #59332
nik9000 added a commit to nik9000/elasticsearch that referenced this issue Nov 10, 2020
nik9000 added a commit that referenced this issue Nov 11, 2020
* Add `runtime_mappings` to search request (backport of #64374)

javanna added a commit that referenced this issue Nov 12, 2020
The runtime section is at the same level as the existing properties section. Its purpose is to hold runtime fields only. With the introduction of the runtime section, a runtime field can be defined by specifying its type (previously called runtime_type) and script.

```
PUT /my-index/_mappings
{
    "runtime" : {
        "day_of_week" : {
            "type" : "keyword",
            "script" : {
                "source" : "emit(doc['timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT))"
            }
        }
    },
    "properties" : {
        "timestamp" : {
            "type" : "date"
        }
    }
}
```

Fields defined in the runtime section can be updated at any time as they are not present in the Lucene index. They get replaced entirely when they get updated.

Thanks to the introduction of the runtime section, runtime fields override existing mapped fields defined with the same name, similarly to runtime fields defined in the search request.

Relates to #59332
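
The replace-on-update and shadowing behaviour described in the commit message above can be illustrated with a small model. This is a sketch under assumptions rather than the real mapping code: the functions `update_runtime` and `resolve_field` are hypothetical, the mapping is a plain dict, and the lookup order (search request first, then the runtime section, then properties) is an assumption of this sketch.

```python
def update_runtime(mapping, runtime_update):
    """Return a new mapping in which each updated runtime field is replaced whole."""
    merged = dict(mapping.get("runtime", {}))
    for name, definition in runtime_update.items():
        # The old definition is dropped entirely, not merged key by key.
        merged[name] = dict(definition)
    return {**mapping, "runtime": merged}


def resolve_field(mapping, name, search_runtime=None):
    """Look a field up: search request, then runtime section, then properties."""
    sections = (
        search_runtime or {},
        mapping.get("runtime", {}),
        mapping.get("properties", {}),
    )
    for section in sections:
        if name in section:
            return section[name]
    raise KeyError(name)


mapping = {
    "properties": {
        "timestamp": {"type": "date"},
        "day_of_week": {"type": "long"},
    },
    "runtime": {
        "day_of_week": {"type": "keyword", "script": {"source": "emit('Monday')"}},
    },
}

# The runtime definition shadows the mapped field with the same name.
print(resolve_field(mapping, "day_of_week")["type"])  # keyword

# Updating replaces the whole definition, so the old script is gone.
updated = update_runtime(mapping, {"day_of_week": {"type": "keyword"}})
print(updated["runtime"]["day_of_week"])  # {'type': 'keyword'}
```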
javanna added a commit to javanna/elasticsearch that referenced this issue Nov 12, 2020
javanna added a commit that referenced this issue Nov 13, 2020
javanna added a commit to javanna/elasticsearch that referenced this issue Dec 8, 2020
javanna (Member, Author) commented Dec 18, 2020

All of the tasks for the first phase of Runtime Fields are completed. Runtime fields will ship with 7.11. Closing this.
