
[Feature Proposal] Enhancement of Repository Plugin #6354

Closed

ashking94 opened this issue Feb 17, 2023 · 11 comments

@ashking94
Member

The purpose of this issue is to gather community feedback on a proposal to enhance the repository plugin.

The OpenSearch repository plugin today provides transfer capabilities via streams to a remote store. The plugin allows users to store index data in off-cluster external repositories such as Amazon S3, Google Cloud Storage, or a shared filesystem, in addition to the local disk of the OpenSearch cluster. By using the repository plugin, OpenSearch users can take advantage of the Snapshot feature to back up and restore data, protecting against data loss, enabling disaster recovery, and creating replicas of their data for testing and development purposes. With remote-backed storage, users can now protect against data loss by automatically creating continuous backups of all index transactions and sending them to remote storage. OpenSearch users can achieve request-level durability using remote-backed storage.

Problem Statement

The OpenSearch repository plugin today provides interfaces such as writeBlob to facilitate the transfer of a file using a single InputStream. This means the file referenced by the InputStream must be processed serially: the underlying plugin can only transfer buffered content of a file sequentially, and only after the first buffer has been processed successfully is the next buffer of content read and transferred. Parallel processing of multiple parts of a file is therefore not possible, so use cases such as downloading or uploading multiple parts of a file in parallel cannot be supported.

The S3 repository plugin, for instance, supports multipart upload, but due to the single-InputStream restriction of the base plugin, each part is uploaded serially, even though S3 supports parallel upload of the individual parts of a file.
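For context, the single-stream write contract in question has roughly the following shape (a simplified sketch of the existing BlobContainer interface; the exact signature may differ by version):

```java
import java.io.IOException;
import java.io.InputStream;

// Simplified sketch of the existing single-stream contract, not the exact source.
public interface BlobContainer {

    /**
     * Writes a blob by consuming a single InputStream from start to finish,
     * which forces the repository implementation to process the file serially.
     */
    void writeBlob(String blobName, InputStream inputStream, long blobSize, boolean failIfAlreadyExists)
        throws IOException;
}
```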

Proposed Solution

We propose to enhance the existing repository plugin to support multiple stream suppliers, so that underlying vendor plugins can optionally provide multi-stream based implementations for remote transfers. Provisioning multiple streams, instead of abstracting behind a file-based transfer, gives core OpenSearch code control to pre-process buffered content with multiple stream wrappers after the content is read and before the transfer takes place. Using stream suppliers instead of concrete streams further allows stream creation to be deferred until the remote transfer is started. The following are some of the abstractions we propose to provide the required support (a rough interface sketch follows this list):

  1. A check for whether multi-stream blob upload is supported.
  2. An upload-blob method supplied with an upload context, which can consist of an ordered collection of stream suppliers along with metadata for each stream, such as length, headers, etc.
  3. A check for whether multi-stream blob download is supported.
  4. A download-blob method supplied with a download context, which can consist of an ordered collection of stream appliers to be applied on top of the SDK input stream before persisting data on disk. Each applier can carry the metadata needed to apply it.
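A minimal sketch of what such an extension to the existing BlobContainer contract could look like. All type and method names below (MultiStreamBlobContainer, UploadContext, DownloadContext, and so on) are illustrative placeholders, not a final API:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

/** Illustrative placeholder: one upload stream supplier per part, in order, plus per-stream metadata. */
interface UploadContext {
    List<Supplier<InputStream>> streamSuppliers();   // stream creation deferred until transfer starts
    List<Long> streamLengths();                      // per-stream metadata: lengths, etc.
    Map<String, String> headers();
}

/** Illustrative placeholder: wrappers applied to each downloaded SDK stream before it is persisted. */
interface DownloadContext {
    List<UnaryOperator<InputStream>> streamAppliers();
}

/** Illustrative extension of the existing BlobContainer contract; names are placeholders. */
interface MultiStreamBlobContainer extends BlobContainer {

    /** 1. Whether multi-stream (e.g. multi-part) upload is supported by this repository. */
    boolean isMultiStreamUploadSupported();

    /** 2. Upload a blob from an ordered collection of stream suppliers plus per-stream metadata. */
    void writeBlobMultiStream(String blobName, UploadContext context) throws IOException;

    /** 3. Whether multi-stream download is supported by this repository. */
    boolean isMultiStreamDownloadSupported();

    /** 4. Download a blob, applying the configured stream appliers to each part before persisting it. */
    void readBlobMultiStream(String blobName, DownloadContext context) throws IOException;
}
```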

Credits - @vikasvb90, @ashking94

@ashking94 ashking94 added enhancement Enhancement or improvement to existing feature or request discuss Issues intended to help drive brainstorming and decision making feature New feature or request labels Feb 17, 2023
@ashking94 ashking94 self-assigned this Feb 17, 2023
@ashking94
Member Author

ashking94 commented Feb 17, 2023

Tagging @elfisher @muralikpbhat @reta @mch2 @dreamer-89 @andrross @Bukhtawar @sachinpkale @itiyamas for feedback. Please do tag others who can review this.

@andrross
Member

This proposal looks like a good idea to me. I suggest putting together a quick proof-of-concept of the interface changes you have in mind, as I suspect it'll be easier for folks to provide feedback on a more concrete proposal.

It would also be great to have at least rough implementations from multiple object stores to ensure the API has the right abstractions (though what you're proposing does look pretty generic).

@saratvemulapalli
Member

saratvemulapalli commented Mar 2, 2023

This proposal looks great. My rule of thumb is always: if we are making the product better, it's a go.

From what I understand, we are improving upload (and download) to the remote store using multipart transfers [1]. It absolutely makes sense, as it improves performance and reliability dramatically.

The trickier part is converting the single InputStream into multiple parts, which I believe the proposal aims to solve. Let OpenSearch manage this and provide an interface to split a file into multiple stream parts.
From a quick search on the internet, it looks like Java doesn't have something for this out of the box, and we would need to write these utilities for every BlobContainer [2] implementation.

Also, as an optimization: multipart upload/download only makes sense for large files (>100 MB recommended for S3), so we should gate this and make sure we don't add overhead for small files.

Overall it makes sense. I second @andrross's idea: let's implement this for one blob store, and we can converge on the right set of abstractions.

[1] https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html
[2] https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/common/blobstore/BlobContainer.java

@vikasvb90
Contributor

vikasvb90 commented Mar 3, 2023

@saratvemulapalli Your overall summary of using multiple streams derived from a file for the transfer, instead of a single stream, is correct. There are a few corrections in the details, though:

  1. Since InputStream is an interface and the underlying concrete streams define how content is processed, it cannot be used to split the stream. The idea is to treat the file as a composition of multiple parts, with each part referenced by its own stream.
  2. The repository plugin will only handle the transfer via the provided multiple streams. Splitting the file into multiple streams will still be the responsibility of the caller (OpenSearch in this case). Keeping this in the caller allows it to add further processing wrapper streams, such as encrypting streams (see the sketch after this list).
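As a rough caller-side sketch of that split (names here are hypothetical, not the proposed API): the caller computes part boundaries, hands the repository plugin one stream supplier per part, and can layer processing wrappers such as an encrypting stream on top of each part before the transfer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

// Hypothetical caller-side split. PartStreamOpener is assumed to return an independent
// InputStream over exactly [offset, offset + length) of the file, and 'wrapper' stands in
// for any processing stream OpenSearch layers on top, e.g. an encrypting stream.
final class PartSupplierBuilder {

    /** Hypothetical hook for opening one stream over a single portion of the file. */
    interface PartStreamOpener {
        InputStream open(Path file, long offset, long length) throws IOException;
    }

    static List<Supplier<InputStream>> buildPartSuppliers(Path file, long fileSize, long partSize,
                                                          UnaryOperator<InputStream> wrapper,
                                                          PartStreamOpener opener) {
        List<Supplier<InputStream>> suppliers = new ArrayList<>();
        for (long offset = 0; offset < fileSize; offset += partSize) {
            final long partOffset = offset;
            final long partLength = Math.min(partSize, fileSize - offset);
            // Stream creation is deferred until the repository plugin actually starts the transfer.
            suppliers.add(() -> {
                try {
                    return wrapper.apply(opener.open(file, partOffset, partLength));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        return suppliers;
    }
}
```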

We will add details in the design doc.

@saratvemulapalli
Member

@vikasvb90 thank you. That makes sense.

@reta
Collaborator

reta commented Mar 6, 2023

I think the proposal makes sense, but I would love to see the API sketched out, as @andrross suggested. The repository plugin is pretty generic in nature; there are feature disparities between S3 / Azure / HDFS / GCS / OCS / whatever comes next, and this needs to be baked into the API concisely.

@dblock
Member

dblock commented Mar 8, 2023

This is a great proposal!

  • Is there a world where we don't want to do anything other than multipart/parallel uploads when it comes to S3?
  • Is there a way to support multipart upload with no API changes at all, just change the implementation of the S3 repository plugin (e.g. spawn N threads, each writes parts), even if only POC?
  • Is it possible/does it make sense to write parts out of order?

@andrross
Member

andrross commented Mar 9, 2023

Is there a way to support multipart upload with no API changes at all, just change the implementation of the S3 repository plugin (e.g. spawn N threads, each writes parts), even if only POC?

The limiting factor, as I understand it, is that the interface passes in a single InputStream for one file. Given that an InputStream only gives sequential access to a file, the only way to do concurrent multipart uploads without changing the interface API would be to buffer the parts into memory from that InputStream. That could be done for a proof-of-concept, but I'm not sure how valuable that would be given that the memory requirements would likely be unacceptable in the general case. I think a proof-of-concept including the interface changes would be pretty straightforward.
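For concreteness, a proof-of-concept without interface changes would have to look roughly like the sketch below (the PartUploader hook is hypothetical): every in-flight part is buffered as a full copy on the heap, which is why the memory cost is likely unacceptable in the general case.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Sketch of the only option without interface changes: slice each part off the single
// sequential InputStream into memory, then upload the buffered parts concurrently.
final class BufferedPartsPoC {

    /** Hypothetical callback standing in for the store-specific per-part upload. */
    interface PartUploader {
        void uploadPart(int partNumber, InputStream part);
    }

    static List<Future<?>> uploadBufferedParts(InputStream blob, long blobSize, int partSize,
                                               ExecutorService pool, PartUploader uploader) throws IOException {
        List<Future<?>> inFlight = new ArrayList<>();
        long remaining = blobSize;
        int partNumber = 1;
        while (remaining > 0) {
            int toRead = (int) Math.min(partSize, remaining);
            byte[] buffer = blob.readNBytes(toRead);   // sequential read: parts must still be sliced in order
            if (buffer.length == 0) {
                break;                                 // stream ended earlier than expected
            }
            remaining -= buffer.length;
            final int pn = partNumber++;
            // Each buffered part is a full in-memory copy until its upload completes.
            inFlight.add(pool.submit(() -> uploader.uploadPart(pn, new ByteArrayInputStream(buffer))));
        }
        return inFlight;
    }
}
```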

@vikasvb90
Contributor

vikasvb90 commented Mar 9, 2023

@dblock

  1. There are other operations, but I am assuming you are referring only to uploads. For uploads, multipart parallel uploads are certainly more efficient than a serial upload, but there are factors, such as additional CPU overhead, the size of the file, etc., based on which we would want to limit concurrent multipart uploads or switch to a serial upload. These decisions can be left to the vendor plugin.
  2. The existing interface only allows a single InputStream to be passed, and converting a single stream into multiple streams would require reading the content twice, which would be inefficient.
  3. Parts are written in isolation from each other and can therefore be written out of order; it is the responsibility of the cloud platform to support combining all parts into a single file. For instance, S3 automatically assembles all parts into one file after the upload, using the part number attached to each part (see the sketch after this list). If such support is not present in the cloud provider, the vendor plugin may also choose to keep parts as separate files on the remote store and adjust its read and write flows accordingly.
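To illustrate the out-of-order point, here is a rough sketch against the AWS SDK for Java v1 (this is not code from the S3 repository plugin; the bucket, key, file path, part size, and thread count are illustrative, and error handling and abort-on-failure are omitted). Parts are uploaded concurrently and may complete in any order; S3 reassembles the object by part number at completion time.

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.*;

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelMultipartUploadSketch {
    public static void main(String[] args) throws Exception {
        String bucket = "my-bucket", key = "my-object";   // illustrative
        File file = new File("/path/to/local/file");      // illustrative
        long partSize = 16L * 1024 * 1024;                // 16 MB parts (S3 minimum is 5 MB, except the last part)

        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        String uploadId = s3.initiateMultipartUpload(new InitiateMultipartUploadRequest(bucket, key)).getUploadId();

        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<PartETag>> futures = new ArrayList<>();
        int partNumber = 1;
        for (long offset = 0; offset < file.length(); offset += partSize, partNumber++) {
            final int pn = partNumber;
            final long off = offset;
            final long size = Math.min(partSize, file.length() - offset);
            // Each part is an isolated request bound to its own portion of the file;
            // the parts may finish in any order.
            futures.add(pool.submit(() -> s3.uploadPart(new UploadPartRequest()
                    .withBucketName(bucket).withKey(key).withUploadId(uploadId)
                    .withPartNumber(pn).withFile(file).withFileOffset(off).withPartSize(size))
                .getPartETag()));
        }

        List<PartETag> partETags = new ArrayList<>();
        for (Future<PartETag> f : futures) {
            partETags.add(f.get());
        }
        pool.shutdown();

        // S3 stitches the parts back together by part number, regardless of completion order.
        s3.completeMultipartUpload(new CompleteMultipartUploadRequest(bucket, key, uploadId, partETags));
    }
}
```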

@dblock
Member

dblock commented Mar 13, 2023

Even with a single InputStream, is the limiting factor the reading of the stream or the writing of the data? Writing the contents of the single input stream concurrently to S3 may be a worthy experiment without any interface changes.

@vikasvb90
Contributor

@dblock
In the case of S3, the minimum part size is 5 MB. That's why a part is not uploaded as a single request; instead, multiple HTTP requests with a smaller buffer size (~10 KB by default) are created and executed. This means that if we enable concurrent writes from a single stream, a part may end up containing arbitrary data that does not belong to it. On completion of the upload, when S3 combines all parts sequentially, the resulting file would contain shuffled data.

I am not sure whether this is also the case for other remote stores, but for S3 it will certainly not work. This means we need one stream per part, each referring to a particular portion of the file and responsible for uploading only that portion; see the sketch below.
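One way the "one stream per portion of the file" idea can be realized, as a rough sketch (not code from the proposal): each part opens its own channel positioned at its offset and is bounded to its own length, so concurrent part uploads never read each other's bytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class FilePartStreams {

    /**
     * Opens an independent stream over [offset, offset + length) of the file.
     * Each part gets its own FileChannel, so concurrent readers never share a position.
     */
    static InputStream partStream(Path file, long offset, long length) throws IOException {
        FileChannel channel = FileChannel.open(file, StandardOpenOption.READ);
        channel.position(offset);
        return new LimitedInputStream(Channels.newInputStream(channel), length);
    }

    /** Minimal bounding wrapper so a part stream cannot read past its own portion. */
    private static final class LimitedInputStream extends InputStream {
        private final InputStream in;
        private long remaining;

        LimitedInputStream(InputStream in, long limit) {
            this.in = in;
            this.remaining = limit;
        }

        @Override
        public int read() throws IOException {
            if (remaining <= 0) {
                return -1;
            }
            int b = in.read();
            if (b >= 0) {
                remaining--;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            if (remaining <= 0) {
                return -1;
            }
            int n = in.read(buf, off, (int) Math.min(len, remaining));
            if (n > 0) {
                remaining -= n;
            }
            return n;
        }

        @Override
        public void close() throws IOException {
            in.close();   // also closes the underlying FileChannel
        }
    }
}
```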
