[POC] Streaming Indexing API #5001

Closed
4 tasks
adnapibar opened this issue Oct 31, 2022 · 1 comment
Labels
enhancement Enhancement or improvement to existing feature or request Indexing & Search


adnapibar commented Oct 31, 2022

Problem

The current _bulk indexing API places a high configuration burden on users who want to avoid RejectedExecutionException caused by TOO_MANY_REQUESTS. Users are forced to "experiment" with bulk block sizes, multi-threading, refresh intervals, etc.
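
To make that burden concrete, here is a minimal client-side sketch of the tuning loop users end up writing today. It is not taken from any OpenSearch client: `send` is a hypothetical transport callable that returns an HTTP status code, and the batch size, retry count and delays are exactly the kind of values that have to be found by experiment.

```python
import time
from typing import Callable, Iterable, List, Sequence


def batches(docs: Sequence[dict], size: int) -> Iterable[List[dict]]:
    """Split docs into fixed-size batches; `size` is a knob the user must tune."""
    for start in range(0, len(docs), size):
        yield list(docs[start:start + size])


def send_with_backoff(
    send: Callable[[List[dict]], int],  # hypothetical transport, returns an HTTP status
    batch: List[dict],
    max_retries: int = 5,               # assumed value, tuned by trial and error
    base_delay: float = 0.1,            # assumed value, tuned by trial and error
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Resend a batch while the server answers 429, doubling the delay each time."""
    for attempt in range(max_retries + 1):
        status = send(batch)
        if status != 429:
            return status
        if attempt < max_retries:
            sleep(base_delay * (2 ** attempt))
    return 429  # still rejected after all retries
```

With streaming and server-driven backpressure, most of these parameters would no longer need to be chosen by the client.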

Using HTTP streaming for _bulk indexing would:

  • improve API usability: streams for request and response
  • improve resource utilization: the coordinators may funnel the streams from multiple clients
  • improve overall stability: the coordinators may use backpressure to slow down the clients and apply the optimal batching strategy taking into account resource availability (heap / CPU / ...)
  • improve durability: the coordinators may start processing as soon as the first bulk item is received (using translog / other means to deal with crashes / restarts / disconnects)
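
The backpressure idea from the list above can be illustrated with a bounded queue. This is only an in-process analogy, assuming a single client and a single coordinator; the names `client`, `coordinator`, `run` and the `capacity` parameter are made up for the example and do not correspond to OpenSearch code.

```python
import asyncio
from typing import List


async def client(queue: asyncio.Queue, items: List[str], log: List[str]) -> None:
    """Producer: `put` suspends whenever the bounded queue is full."""
    for item in items:
        await queue.put(item)
        log.append(f"sent {item}")
    await queue.put(None)  # end-of-stream marker


async def coordinator(queue: asyncio.Queue, log: List[str]) -> None:
    """Consumer: takes items one at a time, which frees space for the client."""
    while True:
        item = await queue.get()
        if item is None:
            break
        await asyncio.sleep(0)  # stand-in for the actual indexing work
        log.append(f"indexed {item}")


async def run(items: List[str], capacity: int = 2) -> List[str]:
    """Run both sides; the client can never get more than `capacity` items ahead in the queue."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=capacity)
    log: List[str] = []
    await asyncio.gather(client(queue, items, log), coordinator(queue, log))
    return log
```

Calling `asyncio.run(run(["a", "b", "c"]))` returns the interleaved log. The client is slowed to the coordinator's pace without any client-side tuning.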

Please see [RFC] Streaming Index API.

Implementation Options

Whichever option is chosen, _bulk should continue to use the HTTP protocol; however, there are a few options to consider.

Chunked Transfer Encoding

More details are in #3000 (comment). This is more or less the only option available for HTTP/1.1. The benefit of this implementation is that it would work for both the 2.x and 3.x releases.
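
For reference, the wire format of chunked transfer encoding (RFC 7230, Section 4.1) is small enough to sketch. The helpers below are illustrative only and are not part of OpenSearch or Netty; the decoder assumes the complete body is already in memory and ignores chunk extensions and trailers.

```python
from typing import Iterable, Iterator


def encode_chunked(parts: Iterable[bytes]) -> Iterator[bytes]:
    """Frame each non-empty part as hex size, CRLF, data, CRLF, then the last-chunk."""
    for part in parts:
        if part:  # a zero-length chunk would terminate the body early
            yield b"%x\r\n%s\r\n" % (len(part), part)
    yield b"0\r\n\r\n"  # last-chunk with no trailer fields


def decode_chunked(body: bytes) -> bytes:
    """Decode a complete chunked body (chunk extensions and trailers are ignored)."""
    out = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)
        size = int(body[pos:eol].split(b";")[0], 16)
        pos = eol + 2
        if size == 0:
            return bytes(out)
        out += body[pos:pos + size]
        pos += size + 2  # skip the CRLF that follows the chunk data
```

Each chunk could carry one or more bulk items, so the sender never needs to know the total request size up front.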

HTTP/2

HTTP/2 offers an optimized transport for HTTP semantics, including superior streaming capabilities; please see Streams and Multiplexing for more details.

HTTP/2 uses DATA frames to carry message payloads. The "chunked"
transfer encoding defined in Section 4.1 of [RFC7230] MUST NOT be
used in HTTP/2.

HTTP/2 is supported only by the 3.x release line (both clients and servers).
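
As a rough illustration of what replaces chunks in HTTP/2, the sketch below builds and parses a single DATA frame following the frame layout in RFC 7540 (9-byte header: 24-bit length, 8-bit type, 8-bit flags, one reserved bit plus a 31-bit stream identifier). The function names are made up for this example, and padding is not handled.

```python
import struct
from typing import Tuple

DATA = 0x0        # frame type for DATA (RFC 7540, Section 6.1)
END_STREAM = 0x1  # flag marking the last frame the sender emits on a stream


def data_frame(stream_id: int, payload: bytes, end_stream: bool = False) -> bytes:
    """Build one unpadded DATA frame: 9-byte header followed by the payload."""
    if not 0 < stream_id < 2 ** 31:
        raise ValueError("DATA frames need a non-zero 31-bit stream id")
    length = len(payload)
    if length >= 2 ** 24:
        raise ValueError("payload does not fit the 24-bit length field")
    flags = END_STREAM if end_stream else 0
    header = struct.pack(">BHBBI", length >> 16, length & 0xFFFF, DATA, flags, stream_id)
    return header + payload


def parse_frame(frame: bytes) -> Tuple[int, int, int, bytes]:
    """Return (type, flags, stream id, payload) for a single complete frame."""
    high, low, frame_type, flags, stream_id = struct.unpack(">BHBBI", frame[:9])
    length = (high << 16) | low
    return frame_type, flags, stream_id & 0x7FFFFFFF, frame[9:9 + length]
```

A streamed _bulk request would then be a sequence of such frames on one stream, with END_STREAM set on the last one.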

Websockets

WebSockets would offer a bidirectional stream, similar to HTTP/2, but from an implementation perspective they would be easier to integrate (in theory): this is a new protocol that would not touch the existing OpenSearch HTTP layer.

Implementation Notes

OpenSearch supports both HTTP/1.1 and HTTP/2 (including H2C). However, the OpenSearch HTTP server model neither supports chunked transfer encoding nor exposes HTTP/2 streams (especially DATA frames):

  • the OpenSearch HTTP layer always expects complete requests (and sends complete responses)
  • the OpenSearch HTTP layer is based on Netty's HTTP/1.1 abstractions
  • the OpenSearch HTTP/2 support uses Netty's conversions (e.g. Http2StreamFrameToHttpObjectCodec, ...) to convert to HTTP/1.1 abstractions
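
Since the HTTP layer currently expects complete requests, a streaming variant would need to recognize bulk items as partial content arrives. The following is a minimal sketch of that idea, not the OpenSearch implementation: `BulkStreamParser` is a hypothetical class that accepts arbitrary byte chunks of newline-delimited JSON and yields each (action, source) pair as soon as it is complete, assuming that only delete actions have no source line.

```python
import json
from typing import Iterator, Optional, Tuple


class BulkStreamParser:
    """Accumulates raw chunks and yields bulk items as soon as they are complete."""

    def __init__(self) -> None:
        self._buffer = b""                    # bytes of a not yet terminated line
        self._action: Optional[dict] = None   # action line waiting for its source

    def feed(self, chunk: bytes) -> Iterator[Tuple[dict, Optional[dict]]]:
        """Consume one chunk; chunk boundaries may fall anywhere, even mid-line."""
        self._buffer += chunk
        while b"\n" in self._buffer:
            line, self._buffer = self._buffer.split(b"\n", 1)
            if not line.strip():
                continue
            obj = json.loads(line)
            if self._action is None:
                if "delete" in obj:   # delete actions carry no source line
                    yield obj, None
                else:
                    self._action = obj
            else:
                yield self._action, obj
                self._action = None
```

Because `feed` is a generator, the caller has to iterate over its result for each chunk, for example with `for action, source in parser.feed(chunk): ...`.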

The suggested direction for the POC:

  • prototype streaming within OpenSearch HTTP layer (Chunked Transfer Encoding first), both client and server
  • understand Netty's conversions between HTTP/1.1 and HTTP/2 when chunked transfer encoding is used (if any)
  • understand whether streaming support requires explicit handling of HTTP/2 data streams
  • conclude with the implementation to move forward with

At this moment, the POC focuses only on the first step: understanding the scope of changes needed to support HTTP streaming on the OpenSearch server and client sides.

@adnapibar adnapibar added the enhancement Enhancement or improvement to existing feature or request label Oct 31, 2022
@adnapibar adnapibar removed their assignment Mar 11, 2023
@reta reta self-assigned this Apr 13, 2023
@reta reta mentioned this issue May 1, 2023

reta commented Aug 2, 2023

Closing the POC: the proof of concept has been developed and the implementation path has been cleared.

@reta reta closed this as completed Aug 2, 2023