
[PROPOSAL] Move SDK to OpenSearch repo #616

Closed
saratvemulapalli opened this issue Mar 27, 2023 · 18 comments

@saratvemulapalli

saratvemulapalli commented Mar 27, 2023

What/Why

What are you proposing?

Coming from: opensearch-project/OpenSearch#6470 (comment)
Move the OpenSearch SDK into the OpenSearch core repo, to be the de facto interface for all dependencies.
There are obviously upsides and downsides:

Upside:

  • Every new feature/extension point is exposed via the SDK, which becomes the only dependency for plugins, extensions, clients, etc.
  • OpenSearch server has only one way to enable extending features
  • Simplifies testing OpenSearch/SDK within the same CI
  • Simplifies breaking changes, since the SDK dictates the contracts with the server.

Downsides:

  • Releasing the SDK independently becomes harder
  • Couples the SDK with OpenSearch development (adding additional overhead to gradle check)

Maintainers, I would like to hear your thoughts; please vote 👍🏻 👎🏻
cc: @nknize

@saratvemulapalli

saratvemulapalli commented Mar 27, 2023

@dblock @opensearch-project/opensearch-sdk-java would love your thoughts. I might not have thought of all the upsides/downsides, so feel free to chip in.

@dbwiddis

Downside: dependency management. The Java SDK will have `api` dependencies on OpenSearchClient (the java client) and OpenSearchAsyncClient. Moving to OpenSearch would require OpenSearch to take that same dependency... but opensearch-java depends on OpenSearch. There are ways to work around this with SPI, etc.
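[Editor's sketch] The SPI-style workaround mentioned above can be illustrated language-neutrally: the upstream defines only an interface, and a concrete client is discovered at runtime, so there is no compile-time dependency from upstream to downstream. A minimal Python analogue (the registry here plays the role Java's `ServiceLoader`/`META-INF/services` plays; all names are hypothetical):

```python
from abc import ABC, abstractmethod

# Upstream side: defines only an interface; it never imports a
# concrete (downstream) client at compile time.
class ClientProvider(ABC):
    @abstractmethod
    def execute(self, request: str) -> str: ...

_registry = []  # stand-in for ServiceLoader's provider registry

def register(provider_cls):
    """Downstream implementations register themselves when loaded."""
    _registry.append(provider_cls)
    return provider_cls

def load_provider() -> ClientProvider:
    """Upstream resolves a concrete client only at runtime."""
    if not _registry:
        raise RuntimeError("no ClientProvider implementation available")
    return _registry[0]()

# Downstream side: in reality this class would live in a separate artifact.
@register
class RestClientAdapter(ClientProvider):
    def execute(self, request: str) -> str:
        return f"executed {request}"

provider = load_provider()
```

The cycle breaks because the arrow from upstream to the concrete client exists only at runtime, never in the build graph.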

@dbwiddis

Neutral: I would love to see a diagram envisioning how an SDK module (ala JPMS) would interact with other OpenSearch modules in the modular future.

@nknize

nknize commented Mar 27, 2023

Java SDK will have api dependencies on OpenSearchClient (java client) and OpenSearchAsyncClient.

I'm curious why the upstream SDK is planned to take a dependency on the downstream client? I think we should strive for no downstream inheritance in the upstream.

@dbwiddis

I'm curious why the upstream SDK is planned to take a dependency on the downstream client? I think we should strive for no downstream inheritance in the upstream.

Good point, @nknize; at this point it's mostly for convenience. I suppose we could remove them and make everyone instantiate and configure their own. We also have getActions() integrated with client.execute(), acting the same on all the clients; removing the dependency would have users passing two clients around: an sdkClient for the actions and their own client for REST calls.

@dbwiddis

Downside: it makes the opensearch-sdk-java implementation look completely different from opensearch-sdk-go, opensearch-sdk-python, or opensearch-sdk-insertlanguagehere. All of those will necessarily be separate repos. Why should Java have special status?

Simplifies testing OpenSearch/SDK within the same CI

Only for the Java SDK. Isn't one of the primary reasons for moving to clients to have generated code and integration tests that work cross-language? We need to design integration testing from the start to recognize the connection.

@nknize

nknize commented Mar 27, 2023

Why should Java have special status?

I think we're talking about different mechanisms? The client SDK/API for communicating with the OpenSearch cluster vs the extensions API for building new features.

IMHO we should restrict extensions (new features) to be Java-only, as other languages (e.g., Kotlin) have a tendency to "compile" into unexpected JVM behaviors. So in that regard, Java is special.

For the client SDK, I don't think we want any language to be special. However, the core is written in Java, so at some point you may need a multi-language SDK generator here. Probably that's in Java? Maybe not? (Protobuf may give us everything we need here?)

What we don't want is what we have today: the urge/need to monitor downstream repos whenever breaking changes are made to core (public or internal API). Core should be isolated.

Maybe it's best to start by identifying the needs here (so we can establish a common foundation of requirements) and then discuss the implementation details?

I'll start by throwing some thoughts here:

  1. OpenSearch breaking changes shall be detected early in the core without monitoring downstream repos.
  2. Plugins shall only depend on the SDK (and any transitive SDK dependencies)
  3. Downstream repos shall never take a dependency on the concrete :server jar (currently the fat opensearch-M.m.x jar. This is an awful requirement today.)
  4. Client and Extension APIs shall be versioned. (maybe independently? e.g., a 3.4 client SDK supports 2.0 transport API calls)
  5. Internal classes shall be isolated (free to break w/o affecting downstream)
  6. LTS shall be provided across APIs.
  7. Public SDKs shall support a path to deprecate and remove stale APIs w/ deprecation logging.

@dbwiddis

What we don't want is what we have today, the urge to monitor downstream repos when breaking changes are made to core. It should be isolated.

Agreed but possibly with multiple meanings of "isolated". But it should not be the responsibility of OpenSearch to monitor downstream repos. OpenSearch should follow semver. And we do. We all know main can break at any time and that is the risk we take developing against a snapshot. If we don't want to break, we can develop against a release.

The same work needs to be done to ensure compatibility, the only question is which team is doing the work.

What I think we want (and are getting) is a heads-up of major changes on main, but it should be our responsibility to monitor OpenSearch, not the other way around.

With regard to your points: 1 - Big yes. 2 - Yes (extensions, not plugins!). 4 - Yes. 5 - Yes (part of SemVer is defining what is API.) 6 - yes but needs a bit more specificity. 7 - yes.

  3. Yes, but we need some clear specifics on just what is being replaced by what, and when. Specifically, Extensions make heavy use of Request/Response classes. There are at least 3 versions of most of those classes floating around: one in the Clients, and two different ones in OpenSearch in different packages. And there are subtle differences between them. This is crazy. But we do have some Request/Response classes we currently expose in the API in REST handling (perhaps we need to add another layer of copying so those become internal?). So if we're going to define "never a dependency on server" I'd really like to know where to put these.

Adding one to your list:

  8. We need automated CI to detect breaking changes at the earliest possible point, preferably when a PR is submitted. Downstream projects need a way to CI-test against "what will main look like when this is merged".
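[Editor's sketch] A toy illustration of the kind of check meant here (hypothetical, not an existing OpenSearch tool): snapshot the public surface of two library versions and flag removed symbols or changed signatures as breaking, which a PR workflow could run automatically.

```python
import inspect
from types import SimpleNamespace

def public_api(module):
    """Map a module's public callables to their signature strings."""
    return {
        name: str(inspect.signature(obj))
        for name, obj in vars(module).items()
        if not name.startswith("_") and callable(obj)
    }

def breaking_changes(old_api, new_api):
    """Removed symbols or changed signatures are breaking; additions are not."""
    breaks = []
    for name, sig in old_api.items():
        if name not in new_api:
            breaks.append(f"removed: {name}{sig}")
        elif new_api[name] != sig:
            breaks.append(f"changed: {name}{sig} -> {name}{new_api[name]}")
    return breaks

# Two stand-in snapshots of a library's public surface.
def search(index, query): ...
def delete(index): ...
v_old = SimpleNamespace(search=search, delete=delete)

def search2(index, query, timeout=None): ...
def bulk(ops): ...  # an addition: not breaking
v_new = SimpleNamespace(search=search2, bulk=bulk)

report = breaking_changes(public_api(v_old), public_api(v_new))
```

Real Java tooling in this space (e.g., binary-compatibility checkers) works on compiled class files rather than signature strings, but the gating idea is the same.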

More thoughts:

There are a few things we do that somewhat relate to this but not "together":

  • We have a "code freeze" before a release, and we should logically expect major changes to happen earlier in the release cycle to allow time to adapt before the next code freeze
  • We have release notes/change log saying what's changed but they don't often distinguish between benign changes and "OMG we just moved a whole package, so everything will break, but don't worry, just search and replace Foo with Bar."

Final observations:

We've been developing the SDK for months and have an integrated "hello world" extension that has given us early warning when we broke things. However, this weekend I tried to build an extension from scratch in a completely different repo and ran into a few user-impacting issues (we didn't declare `api` transitive dependencies, and didn't include source and javadoc jars in our releases) that we simply wouldn't have (and haven't) noticed. When you're too tightly integrated, you miss things because you accidentally connect pieces that aren't supposed to be connected.

@nknize

nknize commented Mar 27, 2023

  3. Yes, but we need some clear specifics on just what is being replaced by what, when. Extensions make heavy use of Request/Response classes.

Action and Transport are being refactored out of :server into its own library. Likely by 2.8.

We have a "code freeze" before a release, and we should logically expect major changes to happen earlier in the release cycle.

Meh. Not really. Code freeze is there for a reason. As long as changes are made by code freeze, the time after freeze is supposed to be "bake and fix" time. Imposing artificial rules like "get your BIG changes in early" creates a slippery slope we don't want to go down.

When you're too tightly integrated you miss things that you accidentally connect when they aren't supposed to be connected.

💯. Exactly why we don't want tight integration with the core. The min distribution should be standalone. Imagine if Lucene had to monitor every downstream (Solr, Elasticsearch, OpenSearch, etc.) whenever we make an API change. We're not trailblazing here.

@nknize

nknize commented Mar 27, 2023

OpenSearch should follow semver. And we do.

Kind of, but not really. Following semver would mean downstream repos could independently version and specify their OpenSearch dependencies using true semver semantics, as in an npm-style package.json:

"dependencies": {
  "opensearch-sdk": "^2.0.0",
  "opensearch-knn": "~2.2.0"
},

That's where we should be moving towards. Similar to npm, k-NN should be free to release 2.9 tomorrow even if core is only releasing 2.8 in the coming weeks. But we're still working on this whole "bundle" concept that, quite frankly, just doesn't scale and perpetuates this downstream/upstream issue.
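[Editor's sketch] For readers unfamiliar with the npm-style range operators in the snippet above: `^` accepts any compatible same-major upgrade, while `~` pins the minor version and accepts only patch upgrades. A minimal sketch of those semantics (simplified; real resolvers also handle pre-release tags, 0.x special cases, and compound ranges):

```python
def satisfies(version: str, range_spec: str) -> bool:
    """Minimal npm-style range check: '^M.m.p' accepts any same-major
    version >= the base; '~M.m.p' accepts any same-major.minor version
    >= the base. Pre-release and 0.x special cases are omitted."""
    op, base = range_spec[0], range_spec[1:]
    v = tuple(int(x) for x in version.split("."))
    b = tuple(int(x) for x in base.split("."))
    if v < b:
        return False          # below the stated floor
    if op == "^":
        return v[0] == b[0]   # major must match
    if op == "~":
        return v[:2] == b[:2] # major.minor must match
    raise ValueError(f"unsupported operator: {op!r}")

# "opensearch-sdk": "^2.0.0"  -> 2.9.0 is acceptable, 3.0.0 is not
# "opensearch-knn": "~2.2.0"  -> 2.2.5 is acceptable, 2.3.0 is not
```

This is what lets k-NN declare `^2.0.0` against the SDK and ship 2.9 before core ships 2.8: any same-major SDK satisfies the range.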

We should be working toward a world where OpenSearch consists of a core API and SDK (like lucene) and the concrete "distributed system" (like Solr) is just one implementation of those APIs.

Case in point, Solr only just released 9.2 today, and Lucene is close to releasing 9.6.

@dbwiddis

Took a day to think about this and summarize my thoughts:

  • In the not too distant future (3.x) it would be ideal for SDK to not depend on OpenSearch at all. The Request/Response classes I complained about in a previous comment can be specified in text files that auto-generate the necessary transport code (e.g., protobuf).
  • That puts us much in the same category as the clients with auto-generated code based on the REST spec
  • The only remaining "dependency" we have, which is a bigger nut to crack, is the core XContent (and in theory NamedWriteables although I would hope protobuf solves those somehow) which is really just some object description that can be exported to text (e.g., YAML) and also auto-generated (although I'm not sure how yet).
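[Editor's sketch] The spec-driven idea in the bullets above can be illustrated with a toy generator (the spec format and names here are hypothetical, not actual OpenSearch artifacts): describe a Request/Response shape once in plain data, and generate a class from it in each target language.

```python
from dataclasses import make_dataclass, field

# Hypothetical language-neutral spec, as it might be written in a YAML file.
SPEC = {
    "name": "SearchRequest",
    "fields": [
        {"name": "index", "type": str},
        {"name": "query", "type": str},
        {"name": "size", "type": int, "default": 10},
    ],
}

def generate(spec):
    """Generate a Python class from the spec; a Go or Java generator would
    consume the same spec file and emit equivalent code for its language."""
    fields = []
    for f in spec["fields"]:
        if "default" in f:
            fields.append((f["name"], f["type"], field(default=f["default"])))
        else:
            fields.append((f["name"], f["type"]))
    return make_dataclass(spec["name"], fields)

SearchRequest = generate(SPEC)
req = SearchRequest(index="logs", query="error")  # size defaults to 10
```

Protobuf does essentially this from `.proto` files, with the wire format thrown in; the SDK and :server could then both depend on the generated artifacts rather than on each other.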

So moving to core will help us not break in the near term (2.x) but I don't think it's the best plan for the longer term. Better for us to establish rigorous CI that runs daily (or more often) to test for breakage... and to have a central communication channel (perhaps just a tag on PRs we can subscribe to) for potential breaking changes.

@nknize

nknize commented Mar 29, 2023

In the not too distant future (3.x) it would be ideal for SDK to not depend on OpenSearch at all.

I'm curious about this in the context of the aggregations scenario in #5910. e.g., Defining a new values source, new aggregator, aggregation builder, etc. If you're suggesting we generate all of that from an IDL of some sort (hello CORBA) and have the concrete :server module and SDK depend on that spec, then where does the IDL live? A dependency repo 🤮? I'd advocate that spec live in the core so we can decouple repos and have one core that builds, versions, and releases independently. Downstream repos then also independently version taking opensearch dependencies (using true semver semantics) that are built and released from that one repo.

Let's simplify. One repo to build the core and validate BWC, not a bunch of separate repos. Orchestrating CI scaffolding across multiple repos is a fun project for some folks, but ultimately it's an open-source nightmare where easy projects win over complicated ones. In that vein, the whole idea of a "bundle" needs to go away. With downstreams independently versioning, we enable those repos to release on their own terms. Those downstreams can then exist in their foundation of choice without tightly coupling to the OpenSearch project at all. Extension installation should leverage the OSS community mechanisms that exist today: maven repos, arch package repos, Docker Hub, Charmhub.

This is the point of open source: the project itself shouldn't own building it all, it should focus on enabling the community to build what they want without requiring heavy scaffolding. It should be as easy as: git clone opensearch_url, open [IntelliJ, Eclipse, VSCode, ...whatever], ./gradlew build (or sbt or whatever build systems the community wants to add support for).

@dbwiddis

In the not too distant future (3.x) it would be ideal for SDK to not depend on OpenSearch at all.

I'm curious about this in the context of the aggregations scenario in #5910. e.g., Defining a new values source, new aggregator, aggregation builder, etc. If you're suggesting we generate all of that from an IDL of some sort (hello CORBA) and have the concrete :server module and SDK depend on that spec, then where does the IDL live? A dependency repo 🤮?

That's a small problem. It's fine in the repo, we just need a consistent way of copying over the text files into <insert language of choice code generation> workflow.

How are we doing it for the spec-based clients? I'm envisioning something similar, whatever that looks like.

Let's simplify. One repo to build the core and validate BWC, not a bunch of separate repos.

That's fair. But a Python SDK doesn't need the whole (Java) JAR, it just needs a copy of the spec files. If those are easily extracted from a jar (which is really just a zip) that's totally fine.

This is the point of open source, the project itself shouldn't own building it all, it should focus on enabling the community to build what they want without requiring heavy scaffolding. It should be as easy as, git clone opensearch_url, open [IntelliJ, Eclipse, VSCode, ....whatever], ./gradle build (or sbt or whatever build systems the community wants to add support for).

Sure, I'm not opposed to keeping stuff simple for the Java ecosystem. But the design needs to make it easy for non-Java too.

@owaiskazi19

owaiskazi19 commented Mar 31, 2023

Thanks @saratvemulapalli for writing this up.

Upside: we can detect breaking changes in OpenSearch, fix them in the SDK, and then extensions can consume the result.

Downside (which has been called out by @dbwiddis): if we are planning to support multi-language SDKs, where will the other SDKs live?

The SDK should be treated as a library which any extension can import to utilize the extension-point APIs and hook into OpenSearch. Keeping it in a separate repository also brings the overhead of adding the SDK's gradle build to OpenSearch's gradle check, to verify that new OpenSearch changes don't break anything in the SDK.

If we are planning to have an SDK just for Java then this makes sense for sure, but it still brings up the question of where the other languages' SDKs will live.

@dblock

dblock commented Apr 25, 2023

Sorry for the late reply. I feel pretty strongly that the SDK should not depend on OpenSearch core at all. One of the desired features is compatibility with multiple versions of OpenSearch, which is difficult to achieve within a specific version of the upstream for minor versions, and impossible to achieve across major ones.

I suggest closing the issue.

@saratvemulapalli

I was chatting with @dbwiddis. To get the benefits of both sides, we could build a thin layer (DSL) that lives in OpenSearch outside of server and serves as the contract for all interactions with OpenSearch.
The SDK would still be independent and would rely on this layer, just like clients, ingestion tools, plugins, etc. do.
WDYT? @dblock @nknize @dbwiddis @owaiskazi19

@dblock

dblock commented Apr 25, 2023

Let's address the original issue as written, which suggests merging the SDK into OpenSearch. That sends us back to a tightly coupled, if better organized, core; we would then need to rely on discipline to ensure the SDK doesn't accidentally reuse something from core and bring the kitchen sink in. So I still think we need to close this issue.

Now, what problem do we need to solve? How can the SDK code call the server's REST endpoints?

@saratvemulapalli

Thanks @dblock and maintainers who chimed in.
I see most votes are not inclined toward the move. Closing this issue.
