Prototype search pipelines #97

msfroh · 2023-02-06T19:00:13Z

I'm going to put together a scrappy first implementation of search pipelines.

This first implementation will largely be a copy/paste from ingest pipelines.

I think it should be a good conversation-starter about whether/how to share implementation with ingest pipelines. Depending on where I get with this task, it may be throwaway learning code or it may be the first draft of what we eventually want to merge.

This task should be moved to the OpenSearch core project, but I'm creating it here as a placeholder.

The goals for my prototype include:

Should be able to CRUD search pipelines, with persistence in cluster state.
Should be able to invoke a named search pipeline from a search request. (The named search pipeline might include a dummy "hello world" processor that e.g. adds a field to the first hit in the response or adds a filter to an incoming query.)
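To make the two prototype goals concrete, here is a minimal sketch in plain Java. It is not OpenSearch code: `SearchPipelinePrototype`, `ResponseProcessor`, and `applyPipeline` are hypothetical names, an in-memory map stands in for cluster state, and hits are modeled as simple field maps. It only illustrates storing a pipeline under a name, invoking it by that name, and a "hello world" processor that adds a field to the first hit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SearchPipelinePrototype {
    /** Hypothetical processor type: takes the list of hits and returns a (possibly modified) list. */
    @FunctionalInterface
    public interface ResponseProcessor {
        List<Map<String, Object>> process(List<Map<String, Object>> hits);
    }

    // In-memory stand-in for pipeline definitions persisted in cluster state.
    private final Map<String, List<ResponseProcessor>> pipelines = new HashMap<>();

    /** Create or update the pipeline stored under the given id. */
    public void put(String id, List<ResponseProcessor> processors) {
        pipelines.put(id, List.copyOf(processors));
    }

    public boolean exists(String id) {
        return pipelines.containsKey(id);
    }

    public void delete(String id) {
        pipelines.remove(id);
    }

    /** Run the named pipeline's processors, in order, over the hits of a response. */
    public List<Map<String, Object>> applyPipeline(String id, List<Map<String, Object>> hits) {
        List<ResponseProcessor> processors = pipelines.get(id);
        if (processors == null) {
            throw new IllegalArgumentException("No such pipeline: " + id);
        }
        List<Map<String, Object>> current = hits;
        for (ResponseProcessor processor : processors) {
            current = processor.process(current);
        }
        return current;
    }

    /** Dummy "hello world" processor: adds a field to the first hit, leaving the input untouched. */
    public static List<Map<String, Object>> helloWorld(List<Map<String, Object>> hits) {
        if (hits.isEmpty()) {
            return hits;
        }
        List<Map<String, Object>> out = new ArrayList<>(hits);
        Map<String, Object> first = new HashMap<>(hits.get(0));
        first.put("greeting", "hello world");
        out.set(0, first);
        return out;
    }

    public static void main(String[] args) {
        SearchPipelinePrototype registry = new SearchPipelinePrototype();
        ResponseProcessor hello = SearchPipelinePrototype::helloWorld;
        registry.put("hello", List.of(hello));
        List<Map<String, Object>> hits = List.of(Map.of("title", "first"), Map.of("title", "second"));
        System.out.println(registry.applyPipeline("hello", hits));
    }
}
```

In a real cluster the registry would have to be replicated and versioned along with the rest of cluster state; the map here only shows the shape of the CRUD surface.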

Features that can come later (but before "release") include:

Setting a default search pipeline for an index.
Specifying an ad hoc search pipeline as part of a search request.
Availability of a "standard" set of search pipeline processors (similar to ingest-common).
Support for BracketProcessor (processors that modify both request and response, with state carried from request time to response time).
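As a rough illustration of the last item, a processor that touches both the request and the response needs somewhere to keep state between the two calls. The sketch below is one possible shape, not a proposed API: the `BracketProcessor` interface, the `Request` and `RequestAndState` records, and the over-fetch example are all assumptions made for illustration (records require Java 16 or later).

```java
import java.util.List;
import java.util.function.Function;

public class BracketProcessorSketch {
    /** Hypothetical simplified request: a query string and the number of hits wanted. */
    public record Request(String query, int size) {}

    /** The rewritten request plus whatever state the processor wants back at response time. */
    public record RequestAndState<S>(Request request, S state) {}

    /** Hypothetical interface: S is the state carried from request time to response time. */
    public interface BracketProcessor<S> {
        RequestAndState<S> processRequest(Request request);

        List<String> processResponse(List<String> hits, S state);
    }

    /** Example: ask the index for more hits than requested, then trim back to the original size. */
    public static final class OverfetchProcessor implements BracketProcessor<Integer> {
        private final int factor;

        public OverfetchProcessor(int factor) {
            this.factor = factor;
        }

        @Override
        public RequestAndState<Integer> processRequest(Request request) {
            Request widened = new Request(request.query(), request.size() * factor);
            // Remember the size the caller originally asked for.
            return new RequestAndState<>(widened, request.size());
        }

        @Override
        public List<String> processResponse(List<String> hits, Integer originalSize) {
            return List.copyOf(hits.subList(0, Math.min(originalSize, hits.size())));
        }
    }

    /** Runs request processing, the search itself, then response processing with the saved state. */
    public static <S> List<String> run(BracketProcessor<S> processor,
                                       Request request,
                                       Function<Request, List<String>> search) {
        RequestAndState<S> prepared = processor.processRequest(request);
        List<String> hits = search.apply(prepared.request());
        return processor.processResponse(hits, prepared.state());
    }
}
```

Returning the state alongside the rewritten request keeps the processor instance itself stateless, so a single instance could be shared across concurrent searches.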


Jeevananthan-23 · 2023-02-22T14:58:08Z

Hi @msfroh, I would like to understand the concept behind the pipelines. Is it similar to a Redis pipeline, which optimizes round-trip times by batching requests on the client-side socket, sending them to the server without waiting for the replies, and finally reading all the replies in a single step?

Is this the design model of the ingest and search pipelines?

Thanks in advance!

msfroh · 2023-02-23T06:28:01Z

Hi @Jeevananthan-23, the motivation is about providing a (relatively) lightweight way to modify behavior of searches at the cluster level, since that may make more sense than modifying behavior at the application layer.

For example, ingest pipelines provide a way of manipulating incoming documents on the cluster before they're sent for indexing. You could just modify the documents before sending them to the cluster in the first place, but maybe that's not convenient (like maybe you have multiple applications sending documents). Also, you get the open-source benefit where one person can write a useful ingest pipeline processor and share it with other OpenSearch users, who don't need to modify any of their applications' indexing code.

On the search side, the specific thing we've been trying to tackle is final-stage rerankers (which is what we've been covering in https://github.com/opensearch-project/search-processor), where you want to run the collated search results through an external reranker to get more relevant results than you could get through term frequency-based relevance alone. You could send the results you get back from OpenSearch to the external reranker, but by letting the cluster drive the transformation you don't need to modify your search application. More importantly, one person can build and release a search pipeline processor that integrates with an external reranker, and many users can benefit without each having to modify their search application.

Inspired by ingest pipelines, we realized that the "functional operator" model (where an ingest pipeline processor is effectively a function that takes an IngestDocument and returns an IngestDocument) is pretty powerful. We can similarly define a couple of interfaces that operate on SearchRequest and SearchResponse.
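To illustrate the idea (and only the idea), here is a small Java sketch of processors as functions that compose. `SimpleRequest`, `SimpleResponse`, `RequestProcessor`, and `ResponseProcessor` are simplified stand-ins invented for this example; they are not the real OpenSearch classes, and the actual interfaces may look quite different. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class FunctionalOperatorSketch {
    // Simplified stand-ins, NOT the real OpenSearch SearchRequest/SearchResponse classes.
    public record SimpleRequest(String query, List<String> filters) {}

    public record SimpleResponse(List<String> hits) {}

    /** A request processor is a function from request to request. */
    @FunctionalInterface
    public interface RequestProcessor {
        SimpleRequest process(SimpleRequest request);

        /** Chains two processors: this one runs first, then {@code next}. */
        default RequestProcessor andThen(RequestProcessor next) {
            return request -> next.process(process(request));
        }
    }

    /** A response processor is a function from response to response. */
    @FunctionalInterface
    public interface ResponseProcessor {
        SimpleResponse process(SimpleResponse response);
    }

    /** Builds a processor that appends one filter to the incoming request. */
    public static RequestProcessor addFilter(String filter) {
        return request -> {
            List<String> filters = new ArrayList<>(request.filters());
            filters.add(filter);
            return new SimpleRequest(request.query(), List.copyOf(filters));
        };
    }

    /** Drops duplicate hits while keeping first-seen order. */
    public static final ResponseProcessor DEDUPE =
        response -> new SimpleResponse(List.copyOf(new LinkedHashSet<>(response.hits())));

    public static void main(String[] args) {
        RequestProcessor pipeline = addFilter("lang:en").andThen(addFilter("year:2023"));
        SimpleRequest rewritten = pipeline.process(new SimpleRequest("opensearch", List.of()));
        System.out.println(rewritten.filters());
        System.out.println(DEDUPE.process(new SimpleResponse(List.of("a", "b", "a"))).hits());
    }
}
```

Because each processor is a pure function over an immutable value, a pipeline is just the composition of its processors, which is what makes individual processors easy to share and reorder.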

The linked RFC goes into much more detail, but I hope the above is a useful summary.

msfroh added the "feature" label (introduce a net new unit of functionality of a software system that satisfies a requirement) Feb 6, 2023

msfroh self-assigned this Feb 6, 2023

github-actions bot added the untriaged label Feb 6, 2023

msfroh mentioned this issue Feb 6, 2023

[RFC] Search pipelines #80

Closed

1 task

macohen removed the untriaged label Feb 6, 2023

macohen mentioned this issue Feb 9, 2023

[META] Search Pipelines - GA opensearch-project/OpenSearch#6278

Closed

10 tasks

macohen added this to the 2.7.0 Release milestone Mar 3, 2023

msfroh mentioned this issue Mar 8, 2023

Initial search pipelines implementation opensearch-project/OpenSearch#6587

Merged

6 tasks

macohen added the Search label Mar 10, 2023

msfroh mentioned this issue Apr 13, 2023

[SearchPipelines] Lower compatibility version to 2.7 opensearch-project/OpenSearch#7135

Merged

6 tasks

andrross closed this as completed in opensearch-project/OpenSearch#7135 Apr 13, 2023
