Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more contexts to painless execute api #30511

Merged

Conversation

martijnvg
Copy link
Member

@martijnvg martijnvg commented May 10, 2018

This change adds two contexts the execute scripts against:

  • SCORE_SCRIPT: Allows to run scripts in a score script context.
    This context is used in function_score query's script function,

  • FILTER_SCRIPT: Allows to run scripts in a filter script context.
    This context is used in the script query.

In both contexts a index name needs to be specified and a sample document.
The document is needed to create an in-memory index that the script can
access via the doc[...] and other notations. The index name is needed
because a mapping is needed to index the document.

Examples:

POST /_scripts/painless/_execute
{
  "script": {
    "source": "doc['field'].value.length()"
  },
  "execute_context" : {
    "score": {
      "document": {
        "field": "four"
      },
      "index": "my-index"
    }
  }
}

Returns:

{
  "result": 4
}
POST /_scripts/painless/_execute
{
  "script": {
    "source": "doc['field'].value.length() <= params.max_length",
    "params": {
      "max_length": 4
    }
  },
  "execute_context" : {
    "filter": {
      "document": {
        "field": "four"
      },
      "index": "my-index"
    }
  }
}

Returns:

{
  "result": true
}

@martijnvg martijnvg added WIP :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache labels May 10, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@martijnvg martijnvg force-pushed the painless_execute_api_add_more_contexts branch from e8a072f to afd5691 Compare May 17, 2018 12:08
@rjernst
Copy link
Member

rjernst commented May 18, 2018

SEARCH_SCRIPT: Allows to run scripts in a search script context.
This context is used in function_score query's script function,
script fields, script sorting and terms_set query.

I think we should have separate context names for script fields and script sorting before this PR. This should be pretty quick to do (see SearchScript.AGGS_CONTEXT as an example) and I'm happy to do the work myself if you don't feel comfortable. All of these cases will eventually change to have their own Script classes, and I want there to be less pain in transitioning to those new contexts in the code.

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some general comments.

token = p.nextToken();
assert token == XContentParser.Token.END_OBJECT;
} else if (supportedContext == SupportedContext.FILTER_SCRIPT || supportedContext == SupportedContext.SEARCH_SCRIPT) {
for (token = p.nextToken(); token != XContentParser.Token.END_OBJECT; token = p.nextToken()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an example, preferably in the docs for this, of what the input will look like here? It is difficult to figure out from the code. I also wonder if it would be easier to use object parser internally here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still need to add docs, but it going to look like the examples I provided in the PR description:

"search_script": {
      "document": {
        "field": "four"
      },
      "index": "my-index"
    }

In this context case, a document and an index name needs to be provided.

return PARSER.parse(parser, null);
ParseContext parseContext = new ParseContext();
Request request = PARSER.parse(parser, parseContext);
request.setIndex(parseContext.index);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the idea of using ObjectParser so the object returned from the parser is completely constructed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is true, but that couldn't be done in this case. In the json the document / index name are provided under the context field, which maps the context field in the request class, but that is an enum. So via this way we attach the index name and document to the parse context, so that we later apply them to the document and index fields in the request class.

@@ -193,6 +318,10 @@ public static SupportedContext fromId(byte id) {
switch (id) {
case 0:
return PAINLESS_TEST;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have a unique identifier for each context (the name). Could we use that instead of creating an enumeration here that will need to be updated for every new context?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, let me try that.

SearchScript.LeafFactory leafFactory =
factory.newFactory(request.getScript().getParams(), context.lookup());
SearchScript searchScript = leafFactory.newInstance(leafReaderContext);
Object result = searchScript.run();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another reason I would like script fields and sort scripts separated out. Actual scoring scripts should call the method expecting a double here.

@martijnvg
Copy link
Member Author

Thanks for taking a look @rjernst.

I think we should have separate context names for script fields and script sorting before this PR. This should be pretty quick to do (see SearchScript.AGGS_CONTEXT as an example) and I'm happy to do the work myself if you don't feel comfortable. All of these cases will eventually change to have their own Script classes, and I want there to be less pain in transitioning to those new contexts in the code.

I see, makes sense. I'm happy to try it out myself in separate PRs.

martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request May 20, 2018
Added dedicated script contexts for:
* script function score
* script sorting
* terms_set query

Scripts for these contexts will either have a specific return value or
use scoring and therefor in the future will need their own scripting classes.

Relates to elastic#30511
martijnvg added a commit that referenced this pull request May 20, 2018
Added dedicated script contexts for:
* script function score
* script sorting
* terms_set query

Scripts for these contexts will either have a specific return value or
use scoring and therefor in the future will need their own scripting classes.

Relates to #30511
martijnvg added a commit that referenced this pull request May 20, 2018
Added dedicated script contexts for:
* script function score
* script sorting
* terms_set query

Scripts for these contexts will either have a specific return value or
use scoring and therefor in the future will need their own scripting classes.

Relates to #30511
This change adds two contexts the execute scripts against:

* SEARCH_SCRIPT: Allows to run scripts in a search script context.
This context is used in `function_score` query's script function,
script fields, script sorting and `terms_set` query.

* FILTER_SCRIPT: Allows to run scripts in a filter script context.
This context is used in the `script` query.

In both contexts a index name needs to be specified and a sample document.
The document is needed to create an in-memory index that the script can
access via the `doc[...]` and other notations. The index name is needed
because a mapping is needed to index the document.

Examples:

```
POST /_scripts/painless/_execute
{
  "script": {
    "source": "doc['field'].value.length()"
  },
  "context" : {
    "search_script": {
      "document": {
        "field": "four"
      },
      "index": "my-index"
    }
  }
}
```

Returns:

```
{
  "result": 4
}
```

POST /_scripts/painless/_execute
{
  "script": {
    "source": "doc['field'].value.length() <= params.max_length",
    "params": {
      "max_length": 4
    }
  },
  "context" : {
    "filter_script": {
      "document": {
        "field": "four"
      },
      "index": "my-index"
    }
  }
}

Returns:

```
{
  "result": true
}
```
@martijnvg martijnvg force-pushed the painless_execute_api_add_more_contexts branch from afd5691 to e86764a Compare May 23, 2018 12:48
…hardAction

instead of HandledAction, because now in case score or filter contexts are used
the request needs to be redirected to a node that has an active IndexService
for the index being referenced (a node with a shard copy for that index).
@martijnvg
Copy link
Member Author

@rjernst I've updated the PR.

Besides addressing your comment, I also made the following changes:

  • After Add more script contexts #30721 got merged, I exposed SCORE_SCRIPT context instead of SEARCH_SCRIPT context.
  • Execute on a management thread instead of a network thread. Not sure if management is the right threadpool, but executing on a network thread is not good anymore, because now also documents get indexed into an in-memory index and I don't consider that a light weight operation and therefor imo can't execute on a network thread.
  • Changed the painless execute api's transport action to extend from TransportSingleShardAction instead of HandledTransportAction. In the case that script or filter script contexts are used then an IndexService instance is needed for indexing. Not all nodes may have an instance for the index referenced in request (only nodes with a shard copy for that index). By extending from TransportSingleShardAction requests will be redirected to a node that have an IndexService instance.

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few more requests

"max_rank": 4
}
"context": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is quite confusing using "context" here, since for scripts, context is something different. It would be really good if we had a different term for this environment the script runs in...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I will rename it to execute_script_context. (after the ExecuteScriptContext class)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed it to execute_context (the script part sounded redundant to me): 539abde

return validationException;
}

@Override
public void readFrom(StreamInput in) throws IOException {
super.readFrom(in);
script = new Script(in);
context = SupportedContext.fromId(in.readByte());
if (in.getVersion().onOrBefore(Version.V_6_4_0)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the execute action only in 6.x right now? Why would wire compat be necessary when no release includes it yet?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is also in the 6.3 branch. Unless we want to get this change into 6.3 branch too (while 6.3.0 has not been released) then I think we should have this wire backwards compatibility logic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API is marked as experimental, which allows us to make changes to request / response format. I don't know what we exactly guarantee for experimental APIs in a cluster with mixed nodes, but if we don't have any guarantees for that then the wire backwards compatibility logic can be removed.

@@ -23,3 +38,53 @@
context:
painless_test: {}
- match: { result: "-90" }

---
"Execute with filter context":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way these could be pure unit tests? At least single node tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will look into this.

@martijnvg
Copy link
Member Author

@rjernst I've updated the PR.

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is close, but I would like to continue iterating on the naming so as so avoid as much confusion as possible.

|======

==== Contexts
==== Execute contexts
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should still be called Contexts? While the parameter is execute_context, the concept is still just "contexts"

// TESTRESPONSE

===== Filter execute context
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still call this section "Filter context". Ditto for the rest of this file on naming.


private static final ParseField SCRIPT_FIELD = new ParseField("script");
private static final ParseField CONTEXT_FIELD = new ParseField("context");
private static final ParseField EXECUTE_CONTEXT_FIELD = new ParseField("execute_context");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think think this name doesn't convey the meaning. Execute context could easily be a normal script context. But this is really using a script context, but providing some arguments to it. I think perhaps just context is better, but maybe the context name could be separated from a separate argument like context_setup that is parsed based on which context is chosen?

throw new IllegalArgumentException("unknown context [" + id + "]");
}
static boolean needDocumentAndIndex(ScriptContext<?> scriptContext) {
if (scriptContext == FilterScript.CONTEXT) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can just be:

return scriptContext == FilterScript.CONTEXT || scriptContext == ScoreScript.CONTEXT;

@martijnvg
Copy link
Member Author

Thanks @rjernst! I've updated the PR. I like your idea of introducing a context_setup parameter, so I've done that and made the context parameter just a string parameter.

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @martijnvg

@martijnvg martijnvg merged commit 1924f5d into elastic:master Jul 18, 2018
martijnvg added a commit that referenced this pull request Jul 18, 2018
This change adds two contexts the execute scripts against:

* SEARCH_SCRIPT: Allows to run scripts in a search script context.
This context is used in `function_score` query's script function,
script fields, script sorting and `terms_set` query.

* FILTER_SCRIPT: Allows to run scripts in a filter script context.
This context is used in the `script` query.

In both contexts a index name needs to be specified and a sample document.
The document is needed to create an in-memory index that the script can
access via the `doc[...]` and other notations. The index name is needed
because a mapping is needed to index the document.

Examples:

```
POST /_scripts/painless/_execute
{
  "script": {
    "source": "doc['field'].value.length()"
  },
  "context" : {
    "search_script": {
      "document": {
        "field": "four"
      },
      "index": "my-index"
    }
  }
}
```

Returns:

```
{
  "result": 4
}
```

POST /_scripts/painless/_execute
{
  "script": {
    "source": "doc['field'].value.length() <= params.max_length",
    "params": {
      "max_length": 4
    }
  },
  "context" : {
    "filter_script": {
      "document": {
        "field": "four"
      },
      "index": "my-index"
    }
  }
}

Returns:

```
{
  "result": true
}
```

Also changed PainlessExecuteAction.TransportAction to use TransportSingleShardAction
instead of HandledAction, because now in case score or filter contexts are used
the request needs to be redirected to a node that has an active IndexService
for the index being referenced (a node with a shard copy for that index).
dnhatn added a commit that referenced this pull request Jul 19, 2018
* 6.x:
  Fix rollup on date fields that don't support epoch_millis (#31890)
  Revert "Introduce a Hashing Processor (#31087)" (#32179)
  [test] use randomized runner in packaging tests (#32109)
  Painless: Fix caching bug and clean up addPainlessClass. (#32142)
  Fix BwC Tests looking for UUID Pre 6.4 (#32158) (#32169)
  Call setReferences() on custom referring tokenfilters in _analyze (#32157)
  Add more contexts to painless execute api (#30511)
  Add EC2 credential test for repository-s3 (#31918)
  Fix CP for namingConventions when gradle home has spaces (#31914)
  Convert Version to Java - clusterformation part1 (#32009)
  Fix Java 11 javadoc compile problem
  Improve docs for search preferences (#32098)
  Configurable password hashing algorithm/cost(#31234) (#32092)
  [DOCS] Update TLS on Docker for 6.3
  ESIndexLevelReplicationTestCase doesn't support replicated failures but it's good to know what they are
  Switch distribution to new style Requests (#30595)
  Build: Skip jar tests if jar disabled
  Build: Move shadow customizations into common code (#32014)
  Painless: Add PainlessClassBuilder (#32141)
  Fix accidental duplication of bwc test for script behavior
  Handle missing values in painless (#30975) (#31903)
  Build: Make additional test deps of check (#32015)
  Painless: Fix Bug with Duplicate PainlessClasses (#32110)
  Adjust translog after versionType removed in 7.0 (#32020)
  Disable C2 from using AVX-512 on JDK 10 (#32138)
  [Rollup] Add new capabilities endpoint for concrete rollup indices (#32111)
  Mute :qa:mixed-cluster indices.stats/10_index/Index - all’
  [ML] Wait for aliases in multi-node tests (#32086)
  Ensure to release translog snapshot in primary-replica resync (#32045)
  Docs: Fix missing example script quote (#32010)
  Add Index UUID to `/_stats` Response (#31871) (#32113)
  [ML] Move analyzer dependencies out of categorization config (#32123)
  [ML][DOCS] Add missing 6.3.0 release notes (#32099)
  Updates the build to gradle 4.9 (#32087)
  Update monitoring template version to 6040099 (#32088)
  Fix put mappings java API documentation (#31955)
  Add exclusion option to `keep_types` token filter (#32012)
dnhatn added a commit that referenced this pull request Jul 20, 2018
* master:
  Painless: Simplify Naming in Lookup Package (#32177)
  Handle missing values in painless (#32207)
  add support for write index resolution when creating/updating documents (#31520)
  ECS Task IAM profile credentials ignored in repository-s3 plugin (#31864)
  Remove indication of future multi-homing support (#32187)
  Rest test - allow for snapshots to take 0 milliseconds
  Make x-pack-core generate a pom file
  Rest HL client: Add put watch action (#32026)
  Build: Remove pom generation for plugin zip files (#32180)
  Fix comments causing errors with Java 11
  Fix rollup on date fields that don't support epoch_millis (#31890)
  Detect and prevent configuration that triggers a Gradle bug (#31912)
  [test] port linux package packaging tests (#31943)
  Revert "Introduce a Hashing Processor (#31087)" (#32178)
  Remove empty @return from JavaDoc
  Adjust SSLDriver behavior for JDK11 changes (#32145)
  [test] use randomized runner in packaging tests (#32109)
  Add support for field aliases. (#32172)
  Painless: Fix caching bug and clean up addPainlessClass. (#32142)
  Call setReferences() on custom referring tokenfilters in _analyze (#32157)
  Fix BwC Tests looking for UUID Pre 6.4 (#32158)
  Improve docs for search preferences (#32159)
  use before instead of onOrBefore
  Add more contexts to painless execute api (#30511)
  Add EC2 credential test for repository-s3 (#31918)
  A replica can be promoted and started in one cluster state update (#32042)
  Fix Java 11 javadoc compile problem
  Fix CP for namingConventions when gradle home has spaces (#31914)
  Fix `range` queries on `_type` field for singe type indices (#31756)
  [DOCS] Update TLS on Docker for 6.3 (#32114)
  ESIndexLevelReplicationTestCase doesn't support replicated failures but it's good to know what they are
  Remove versionType from translog (#31945)
  Switch distribution to new style Requests (#30595)
  Build: Skip jar tests if jar disabled
  Painless: Add PainlessClassBuilder (#32141)
  Build: Make additional test deps of check (#32015)
  Disable C2 from using AVX-512 on JDK 10 (#32138)
  Build: Move shadow customizations into common code (#32014)
  Painless: Fix Bug with Duplicate PainlessClasses (#32110)
  Remove empty @param from Javadoc
  Re-disable packaging tests on suse boxes
  Docs: Fix missing example script quote (#32010)
  [ML] Wait for aliases in multi-node tests (#32086)
  [ML] Move analyzer dependencies out of categorization config (#32123)
  Ensure to release translog snapshot in primary-replica resync (#32045)
  Handle TokenizerFactory  TODOs (#32063)
  Relax TermVectors API to work with textual fields other than TextFieldType (#31915)
  Updates the build to gradle 4.9 (#32087)
  Mute :qa:mixed-cluster indices.stats/10_index/Index - all’
  Check that client methods match API defined in the REST spec (#31825)
  Enable testing in FIPS140 JVM (#31666)
  Fix put mappings java API documentation (#31955)
  Add exclusion option to `keep_types` token filter (#32012)
  [Test] Modify assert statement for ssl handshake (#32072)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache >feature v6.4.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants