Initial search pipelines implementation #6587

Merged: 18 commits merged on Apr 10, 2023


@msfroh (Collaborator) commented Mar 8, 2023

Description

This commit includes the basic features of search pipelines (see opensearch-project/search-processor#80).

Search pipelines are modeled after ingest pipelines and provide a simple, clean API for components to modify search requests and responses.

With this commit we can:

  1. Create, retrieve, update, and delete search pipelines.
  2. Transform search requests and responses by explicitly referencing a pipeline.
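As a rough mental model only: because search pipelines are modeled after ingest pipelines, a pipeline can be pictured as an ordered list of request processors applied before the search executes, plus an ordered list of response processors applied to its result. The sketch below assumes hypothetical `Request`, `Response`, `RequestProcessor`, `ResponseProcessor`, and `Pipeline` types; they are not the OpenSearch classes, and their signatures are illustrative.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-ins for a search request and response (not OpenSearch types).
record Request(String query, int size) {}
record Response(List<String> hits) {}

// A request processor rewrites a request; a response processor rewrites a response.
interface RequestProcessor { Request process(Request request); }
interface ResponseProcessor { Response process(Request request, Response response); }

// Illustrative pipeline: run request processors in order, execute the search,
// then run response processors in order on the result.
final class Pipeline {
    private final List<RequestProcessor> requestProcessors;
    private final List<ResponseProcessor> responseProcessors;

    Pipeline(List<RequestProcessor> requestProcessors, List<ResponseProcessor> responseProcessors) {
        this.requestProcessors = List.copyOf(requestProcessors);
        this.responseProcessors = List.copyOf(responseProcessors);
    }

    Response search(Request request, Function<Request, Response> executor) {
        Request transformed = request;
        for (RequestProcessor p : requestProcessors) {
            transformed = p.process(transformed);
        }
        Response response = executor.apply(transformed);
        for (ResponseProcessor p : responseProcessors) {
            response = p.process(transformed, response);
        }
        return response;
    }
}
```

In this sketch both processor lists are copied into immutable lists, so a constructed pipeline can be shared safely; "updating" a pipeline means replacing the whole object.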

Later work will include:

  1. Adding an index setting to specify a default search pipeline.
  2. Allowing search pipelines to be defined within a search request (for development/testing purposes, akin to simulating an ingest pipeline).
  3. Adding a collection of search pipeline processors to support common useful transformations. (Suggestions welcome!)

Issues Resolved

opensearch-project/search-processor#97

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions bot commented Mar 8, 2023

Gradle Check (Jenkins) Run Completed with:

github-actions bot commented Apr 4, 2023

Gradle Check (Jenkins) Run Completed with:

public class SearchPipelineServiceTests extends OpenSearchTestCase {
@BeforeClass
public static void enableFeature() {

Collaborator

Gradle runs tests in parallel, but system properties changed from tests introduce instability, since they affect the whole forked JVM, which could be running several test suites simultaneously. Could we refactor the test case to use settings instead? Thank you.

Collaborator Author

Do you know if the parallel tests run in their own classloader?

Looking at how we've implemented FeatureFlags, the settings appear to be held in a static variable, so using FeatureFlags.initializeSettings would similarly pollute state across all tests running in the same classloader.
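The concern in this thread can be reproduced in miniature: any flag held in a static field (like a system property) is shared by every test running in the same JVM and classloader, so one suite enabling it silently changes what another suite sees. `StaticFlags`, `SuiteThatEnablesFlag`, and `UnrelatedSuite` below are hypothetical names used only for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical holder that mimics a flag kept in a static field:
// every caller in the same classloader sees the same value.
final class StaticFlags {
    private static final AtomicBoolean SEARCH_PIPELINE = new AtomicBoolean(false);

    static void enable() { SEARCH_PIPELINE.set(true); }
    static void reset() { SEARCH_PIPELINE.set(false); }
    static boolean isEnabled() { return SEARCH_PIPELINE.get(); }
}

// The first "suite" flips the flag for its own purposes...
final class SuiteThatEnablesFlag {
    static boolean run() {
        StaticFlags.enable();
        return StaticFlags.isEnabled();
    }
}

// ...and a second "suite" that never asked for it observes the change anyway.
final class UnrelatedSuite {
    static boolean observesFlag() {
        return StaticFlags.isEnabled();
    }
}
```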

Collaborator Author

For testing purposes (since it's just this test), I might add a boolean field to SearchPipelineService that bypasses the FeatureFlags check for that specific instance.

@reta (Collaborator), Apr 5, 2023

> Do you know if the parallel tests run in their own classloader?

I think we definitely fork the process, but I'm not sure how many forks we create.

> For testing purposes (since it's just this test), I think I might add a boolean field to SearchPipelineService that bypasses the FeatureFlags for that specific instance.

That's probably cleaner (and safer), since the feature flag won't spread around. Thank you.

- Don't use system properties for SearchPipelineServiceTests.
- Enable feature flag for multinode smoke tests.

Signed-off-by: Michael Froh <froh@amazon.com>
@@ -426,4 +428,12 @@ static class PipelineHolder {
this.pipeline = Objects.requireNonNull(pipeline);
}
}

private boolean isFeatureEnabled() {

@reta (Collaborator), Apr 5, 2023

@msfroh, since feature flags are set at the JVM level (they are not dynamic), I think we could just use a simple enabled property passed to the constructor (as FeatureFlags.isEnabled(FeatureFlags.SEARCH_PIPELINE) from core, or set manually during testing). WDYT?
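A minimal sketch of the constructor-injection approach suggested here, assuming a hypothetical `PipelineService` rather than the real `SearchPipelineService`: the flag is decided once by the caller and stored in a final field, so tests can construct enabled and disabled instances side by side without touching JVM-wide state.

```java
// Hypothetical service mirroring the shape of the suggestion:
// the enabled state is a constructor argument, not a global lookup.
final class PipelineService {
    private final boolean isEnabled;

    PipelineService(boolean isEnabled) {
        this.isEnabled = isEnabled;
    }

    boolean isFeatureEnabled() {
        return isEnabled;
    }

    // Returns the requested pipeline id, or null if none was requested.
    // Explicit pipeline references are rejected when the feature is off.
    String resolvePipeline(String pipelineId) {
        if (pipelineId == null) {
            return null;
        }
        if (!isEnabled) {
            throw new IllegalArgumentException("search pipelines are not enabled");
        }
        return pipelineId;
    }
}
```

In production code the argument would come from the flag lookup quoted above (`FeatureFlags.isEnabled(FeatureFlags.SEARCH_PIPELINE)`); in a test it can simply be `true` or `false`.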

github-actions bot commented Apr 5, 2023

Gradle Check (Jenkins) Run Completed with:

Thanks for the suggestion, @reta!

Signed-off-by: Michael Froh <froh@amazon.com>
@@ -979,7 +980,8 @@ protected Node(
xContentRegistry,
namedWriteableRegistry,
pluginsService.filterPlugins(SearchPipelinePlugin.class),
client
client,
FeatureFlags.isEnabled(SEARCH_PIPELINE)

Collaborator

👍

Comment on lines 934 to 936
registerHandler.accept(new RestPutSearchPipelineAction());
registerHandler.accept(new RestGetSearchPipelineAction());
registerHandler.accept(new RestDeleteSearchPipelineAction());

Member

How about guarding these registrations with the feature flag, similar to how REMOTE_STORE does it below?
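A sketch of the suggested guard, assuming a hypothetical `Handler` record in place of the real REST action classes and a boolean passed in the same way as the constructor flag discussed earlier. The handler names are made up; only the shape (register the pipeline CRUD handlers inside an `if` on the flag) reflects the suggestion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical handler type standing in for a REST action.
record Handler(String name) {}

final class HandlerRegistration {
    // Registers an always-on handler, then the pipeline handlers only when the
    // flag is on, so nodes with the feature disabled expose no new endpoints.
    static void registerHandlers(Consumer<Handler> registerHandler, boolean searchPipelinesEnabled) {
        registerHandler.accept(new Handler("search"));
        if (searchPipelinesEnabled) {
            registerHandler.accept(new Handler("put_search_pipeline"));
            registerHandler.accept(new Handler("get_search_pipeline"));
            registerHandler.accept(new Handler("delete_search_pipeline"));
        }
    }

    // Convenience for inspecting what would be registered.
    static List<String> registeredNames(boolean enabled) {
        List<String> names = new ArrayList<>();
        registerHandlers(h -> names.add(h.name()), enabled);
        return names;
    }
}
```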

Signed-off-by: Michael Froh <froh@amazon.com>

github-actions bot commented Apr 5, 2023

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.search.SearchWeightedRoutingIT.testSearchAggregationWithNetworkDisruption_FailOpenEnabled
      1 org.opensearch.indices.replication.SegmentReplicationIT.testScrollWithOngoingSegmentReplication

@navneet1v (Contributor)

@andrross @msfroh, can someone merge this? It has been approved.

@noCharger (Contributor)

> @andrross @msfroh can someone merge this code as it is approved

+1

@andrross added the backport 2.x label (Backport to 2.x branch) Apr 10, 2023
@andrross merged commit ee990bd into opensearch-project:main Apr 10, 2023
@opensearch-trigger-bot (Contributor)

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-6587-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ee990bd40ceacbf9ebd6ddea0874aa98c48ece47
# Push it to GitHub
git push --set-upstream origin backport/backport-6587-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-6587-to-2.x.

@andrross (Member)

@msfroh, can you look into creating the backport, since it couldn't be automatically cherry-picked?

@msfroh (Collaborator Author) commented Apr 10, 2023

> Can you look into creating the backport since it couldn't be automatically cherry-picked?

On it, thanks! 👍

@msfroh deleted the search_pipelines branch April 10, 2023 20:16
msfroh added a commit to msfroh/OpenSearch that referenced this pull request Apr 10, 2023
* Initial search pipelines implementation

This commit includes the basic features of search pipelines
(see opensearch-project/search-processor#80).

Search pipelines are modeled after ingest pipelines and provide a
simple, clean API for components to modify search requests and
responses.

With this commit we can:

1. Create, retrieve, update, and delete search pipelines.
2. Transform search requests and responses by explicitly referencing a
   pipeline.

Later work will include:

1. Adding an index setting to specify a default search pipeline.
2. Allowing search pipelines to be defined within a search request (for
   development/testing purposes, akin to simulating an ingest
   pipeline).
3. Adding a collection of search pipeline processors to support common
   useful transformations. (Suggestions welcome!)

Signed-off-by: Michael Froh <froh@amazon.com>

* Incorporate feedback from @reta and @navneet1v

1. SearchPipelinesClient: JavaDoc fix
2. SearchRequest: Check versions when (de)serializing new "pipeline"
   property.
3. Rename SearchPipelinesPlugin -> SearchPipelinePlugin.
4. Pipeline: Change visibility to package private
5. SearchPipelineProcessingException: New exception type to wrap
   exceptions thrown when executing a pipeline.

Bonus: Added an integration test for filter_query request processor.

Signed-off-by: Michael Froh <froh@amazon.com>
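Item 2 above (version checks when (de)serializing the new "pipeline" property) follows a common wire-compatibility pattern: a new optional field is written and read only when the peer's version is new enough, so older nodes never receive bytes they cannot parse. The sketch below uses plain `java.io` data streams and an integer version; `VersionedRequest` and `PIPELINE_VERSION` are hypothetical, and this is not OpenSearch's `StreamInput`/`StreamOutput` API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical wire format: the optional "pipeline" field is only written
// (and read) when the peer's version is at least PIPELINE_VERSION.
final class VersionedRequest {
    static final int PIPELINE_VERSION = 2; // assumed version that introduced the field

    final String query;
    final String pipeline; // may be null

    VersionedRequest(String query, String pipeline) {
        this.query = query;
        this.pipeline = pipeline;
    }

    byte[] write(int peerVersion) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(query);
            if (peerVersion >= PIPELINE_VERSION) {
                out.writeBoolean(pipeline != null); // presence marker
                if (pipeline != null) out.writeUTF(pipeline);
            }
        }
        return bytes.toByteArray();
    }

    static VersionedRequest read(byte[] data, int peerVersion) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String query = in.readUTF();
            String pipeline = null;
            // Only consume the marker when the peer would have written it.
            if (peerVersion >= PIPELINE_VERSION && in.readBoolean()) {
                pipeline = in.readUTF();
            }
            return new VersionedRequest(query, pipeline);
        }
    }
}
```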

* Register SearchPipelineProcessingException

Also added more useful messages to unit tests to explicitly explain
what hoops need to be jumped through in order to add a new serializable
exception.

Signed-off-by: Michael Froh <froh@amazon.com>
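The SearchPipelineProcessingException mentioned above wraps exceptions thrown while a pipeline executes. A minimal sketch of that wrapping idea, with a hypothetical `PipelineProcessingException` and `WrappingPipeline` (the registration step that makes the real exception serializable across nodes is not modeled here):

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical exception type marking failures that came from pipeline execution.
final class PipelineProcessingException extends RuntimeException {
    PipelineProcessingException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class WrappingPipeline {
    // Runs each step in order; any runtime exception from a step is wrapped so
    // callers can tell pipeline failures apart and still inspect the original cause.
    static String run(String input, List<UnaryOperator<String>> steps) {
        String value = input;
        for (int i = 0; i < steps.size(); i++) {
            try {
                value = steps.get(i).apply(value);
            } catch (RuntimeException e) {
                throw new PipelineProcessingException("processor [" + i + "] failed", e);
            }
        }
        return value;
    }
}
```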

* Remove unneeded dependencies from search-pipeline-common

I had copied some dependencies from ingest-common, but they are not used
by search-pipeline-common (yet).

Signed-off-by: Michael Froh <froh@amazon.com>

* Avoid cloning SearchRequest if no SearchRequestProcessors

Also, add tests to confirm that a pipeline with no processors works
fine (as a no-op).

Signed-off-by: Michael Froh <froh@amazon.com>
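A sketch of the "avoid cloning" optimization, under the assumption that copying a request is expensive: when there are no request processors the caller's object is returned unchanged; otherwise processors operate on a copy so the original request is never mutated. `MutableRequest` and `RequestTransformer` are hypothetical, and the null-or-empty check stands in for a collection-emptiness utility.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical mutable request; copy() stands in for an expensive deep clone.
final class MutableRequest {
    String query;

    MutableRequest(String query) { this.query = query; }

    MutableRequest copy() { return new MutableRequest(query); }
}

final class RequestTransformer {
    // No processors: return the caller's object untouched and skip the copy.
    // Otherwise: work on a copy so the original is never mutated.
    static MutableRequest transform(MutableRequest original, List<UnaryOperator<MutableRequest>> processors) {
        if (processors == null || processors.isEmpty()) {
            return original;
        }
        MutableRequest request = original.copy();
        for (UnaryOperator<MutableRequest> processor : processors) {
            request = processor.apply(request);
        }
        return request;
    }
}
```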

* Use NamedWritableRegistry to deserialize SearchRequest

Queries are serialized as NamedWritables, so we need to use a
NamedWritableRegistry to deserialize.

Signed-off-by: Michael Froh <froh@amazon.com>
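Why a registry is needed, sketched with hypothetical types: a polymorphic value such as a query is written as its name followed by its payload, and the reading side cannot know the concrete class from the bytes alone, so it reads the name and looks up a registered reader. `NamedValue`, `PayloadReader`, `TermQuery`, and `Registry` are illustrative, not the OpenSearch classes.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical polymorphic value, serialized as (name, payload).
interface NamedValue {
    String name();
    void writePayload(DataOutputStream out) throws IOException;
}

@FunctionalInterface
interface PayloadReader {
    NamedValue read(DataInputStream in) throws IOException;
}

record TermQuery(String field, String value) implements NamedValue {
    public String name() { return "term"; }

    public void writePayload(DataOutputStream out) throws IOException {
        out.writeUTF(field);
        out.writeUTF(value);
    }
}

// Simplified registry mapping names to readers.
final class Registry {
    private final Map<String, PayloadReader> readers = new HashMap<>();

    void register(String name, PayloadReader reader) { readers.put(name, reader); }

    static void write(NamedValue value, DataOutputStream out) throws IOException {
        out.writeUTF(value.name()); // the name tells the reader which class to build
        value.writePayload(out);
    }

    NamedValue read(DataInputStream in) throws IOException {
        String name = in.readUTF();
        PayloadReader reader = readers.get(name);
        if (reader == null) {
            throw new IOException("unknown named value [" + name + "]");
        }
        return reader.read(in);
    }
}
```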

* Check for empty pipeline with CollectionUtils.isEmpty

Signed-off-by: Michael Froh <froh@amazon.com>

* Update server/src/main/java/org/opensearch/search/pipeline/SearchPipelineService.java

Co-authored-by: Navneet Verma <vermanavneet003@gmail.com>
Signed-off-by: Michael Froh <froh@amazon.com>

* Incorporate feedback from @noCharger

Signed-off-by: Michael Froh <froh@amazon.com>

* Incorporate feedback from @reta

- Renamed various classes from "SearchPipelinesSomething" to
  "SearchPipelineSomething" to be consistent.
- Refactored NodeInfo construction in NodeService to avoid ternary
  operator and improved readability.

Signed-off-by: Michael Froh <froh@amazon.com>

* Gate search pipelines behind a feature flag

Also renamed SearchPipelinesRequestConverters.

Signed-off-by: Michael Froh <froh@amazon.com>

* More feature flag fixes for search pipeline testing

- Don't use system properties for SearchPipelineServiceTests.
- Enable feature flag for multinode smoke tests.

Signed-off-by: Michael Froh <froh@amazon.com>

* Move feature flag into constructor parameter

Thanks for the suggestion, @reta!

Signed-off-by: Michael Froh <froh@amazon.com>

* Move REST handlers behind feature flag

Signed-off-by: Michael Froh <froh@amazon.com>

---------

Signed-off-by: Michael Froh <froh@amazon.com>
Co-authored-by: Navneet Verma <vermanavneet003@gmail.com>
(cherry picked from commit ee990bd)
reta pushed a commit that referenced this pull request Apr 13, 2023
* Initial search pipelines implementation (#6587)

* Resolve various backporting issues

1. Can't reference version 3.0.0.
2. Bad merges of adjacent version checks.
3. Use of Apache HTTP client 4 (vs 5).
4. Use of old cluster manager naming in REST params.
5. CollectionUtils didn't have isEmpty for collections.

Signed-off-by: Michael Froh <froh@amazon.com>

* Support deprecated master_timeout parameter

Signed-off-by: Michael Froh <froh@amazon.com>

---------

Signed-off-by: Michael Froh <froh@amazon.com>
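Supporting the deprecated `master_timeout` parameter typically means accepting both the new and the old parameter name. A hypothetical helper, assuming the newer name is `cluster_manager_timeout` and that setting both at once is an error (the actual 2.x behavior is not shown in this PR):

```java
import java.util.Map;

final class TimeoutParams {
    // Hypothetical helper: prefer the new parameter name, fall back to the
    // deprecated alias, and reject requests that set both.
    static String clusterManagerTimeout(Map<String, String> params, String defaultValue) {
        String current = params.get("cluster_manager_timeout");
        String deprecated = params.get("master_timeout");
        if (current != null && deprecated != null) {
            throw new IllegalArgumentException(
                "[master_timeout] and [cluster_manager_timeout] may not be used together");
        }
        if (current != null) {
            return current;
        }
        if (deprecated != null) {
            return deprecated; // a real handler would likely also emit a deprecation warning
        }
        return defaultValue;
    }
}
```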
austintlee pushed a commit to austintlee/OpenSearch that referenced this pull request Apr 28, 2023
Labels
backport 2.x Backport to 2.x branch

7 participants