Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] [Gradle Checks] Allow annotation of InternalApi on classes not meant to be consumed outside of OpenSearch #13308

Closed
gbbafna opened this issue Apr 19, 2024 · 36 comments · Fixed by #14597
Assignees
Labels
Build Build Tasks/Gradle Plugin, groovy scripts, build tools, Javadoc enforcement. enhancement Enhancement or improvement to existing feature or request v2.16.0 Issues and PRs related to version 2.16.0 v3.0.0 Issues and PRs related to version 3.0.0

Comments

@gbbafna
Copy link
Collaborator

gbbafna commented Apr 19, 2024

Is your feature request related to a problem? Please describe

Coming from #13275 (comment)

RemoteStorePathStrategy is an internal API and is not supposed by to used/extended by plugins . However we can't mark it internal api .

error: The element org.opensearch.index.remote.RemoteStorePathStrategy is part of the public APIs but is marked as @InternalApi (referenced by org.opensearch.index.IndexSettings) 
error: The element org.opensearch.index.remote.RemoteStorePathStrategy is part of the public APIs but is marked as @InternalApi (referenced by org.opensearch.index.snapshots.blobstore.RemoteStoreShardShallowCopySnapshot) 

RemoteStorePathStrategy's constructor has to be public as RemoteStoreShardShallowCopySnapshot needs to create it .

Some more context : #13244

To get around these issues, all the classes we introduced had to marked as PublicAPI , even though they were not meant to be consumed outside of OpenSearch. Whenever we make changes to these classes, the detect breaking changes workflow will fail , even though these are not consumed outside of OpenSearch core.

Describe the solution you'd like

In cases like this, we should allow the class/function etc to be marked as InternalApi.

Related component

Build

Describe alternatives you've considered

Do nothing . In such cases such classes will be PublicApi and we will get a lot of breaking changes, even they are not actually.

Additional context

No response

@gbbafna gbbafna added enhancement Enhancement or improvement to existing feature or request untriaged labels Apr 19, 2024
@github-actions github-actions bot added the Build Build Tasks/Gradle Plugin, groovy scripts, build tools, Javadoc enforcement. label Apr 19, 2024
@reta
Copy link
Collaborator

reta commented Apr 19, 2024

@gbbafna thanks, as we discussed, the scope of the suggested change is to support the @InternalApi on constructors to highlight the fact that instantiation of that particular type is not supposed to be done by outside of the OpenSearch.

The @PublicApi / @ExperimentalApi / ... annotations are required to any type that is exposed through to the plugins or/and extensions. The check we have does exactly that - it starts from annotated plugin interfaces and traverse all affected types.

Allowing wide use of @InternalApi unrestrictedly completely defeats the purpose.

@gbbafna
Copy link
Collaborator Author

gbbafna commented Apr 19, 2024

Why do we need to restrict it to constructors ? In above PR , all the members , getters etc are not supposed to be accessed from outside ?

I do agree that we shouldn't allow wide use of @InternalApi , but the usecase we saw above won't get solved with that .

@reta
Copy link
Collaborator

reta commented Apr 19, 2024

I do agree that we shouldn't allow wide use of @InternalApi , but the usecase we saw above won't get solved with that .

This is the reason, it is impossible to restrict once open wide

@gbbafna
Copy link
Collaborator Author

gbbafna commented Apr 22, 2024

Got it . We can always start with support constructors the @internalapi on to highlight the fact that instantiation of that particular type is not supposed to be done by outside of the OpenSearch. Then we can see how it is shaping for us .

@peternied
Copy link
Member

[Triage - attendees 1 2 3 4 5 6 7]
@gbbafna Thanks for creating this issue; however, it isn't being accepted due to not having a clear reproduction and expectations. Please feel free to open a new issue after addressing the reason.

@gbbafna
Copy link
Collaborator Author

gbbafna commented May 20, 2024

Reopening this as we faced the exact same issue in #13663 . see draft PR

RemoteStorePathStrategy is an internal API and is not supposed by to used/extended by plugins . However we can't mark it internal api .

error: The element org.opensearch.index.remote.RemoteStorePathStrategy is part of the public APIs but is marked as @InternalApi (referenced by org.opensearch.index.IndexSettings) 
error: The element org.opensearch.index.remote.RemoteStorePathStrategy is part of the public APIs but is marked as @InternalApi (referenced by org.opensearch.index.snapshots.blobstore.RemoteStoreShardShallowCopySnapshot) 

RemoteStorePathStrategy's constructor has to be public as RemoteStoreShardShallowCopySnapshot needs to create it .

@gbbafna gbbafna removed the untriaged label May 20, 2024
@peternied
Copy link
Member

peternied commented May 20, 2024

@gbbafna This issue was closed because it isn't clear what the problem is. Please update the description to describe what the specific problem is to focus the discussion. I might recommend closing this issue and creating a fresh issue that is specifically oriented to the specific problem.

For example, if the problem is that there are items that were included as part of the public API and they shouldn't be public? Then the question is why should a breaking change be made? I think we need to do a very low level dive into the nature of problem to know how to move forward on a solution

@vikasvb90 Do you have more context from the conversation during of last week's triage?

@gbbafna
Copy link
Collaborator Author

gbbafna commented May 21, 2024

Please update the description to describe what the specific problem is to focus the discussion.

Updated it . I don't want to reopen another issue as the context will get lost in multiple issues. .

For example, if the problem is that there are items that were included as part of the public API and they shouldn't be public? Then the question is why should a breaking change be made?

The problem is that we are forced to mark these as public API even when it is not . Also if we don't break it , it will lead to lot of unnecessary complexities and dead code in the core, which will decrease its maintainability and readability.

@vikasvb90
Copy link
Contributor

100% agreed with @gbbafna! Maintenance of the dead code if not paid attention to can lead to runtime problems. Since the dead code would be exposed to plugins, they can start using it which is say might no longer be working.

As discussed in the open community maintainers channel, this is a serious issue and require an immediate attention. We don't know how much dead or breaking code has already been pushed in OpenSearch to add workarounds for the success of the breaking change workflow.

@peternied
Copy link
Member

@gbbafna @vikasvb90 It sounds like we annotated RemoteStorePathStrategy with @PublicAPI and released it, but now we need to remove it from @PublicAPI to avoid (?) unnecessary exposure and maintainability issues. As I understand it, customers can see and access RemoteStorePathStrategy as it was shipped in 2.14.

Here is our projects stance on breaking changes to frame where I am coming from:

OpenSearch follows semver, which means we will only release breaking changes in major versions. All minor versions are compatible with every other minor version for that major. For example, 1.2.0 will work with 1.3.2, 1.4.1, etc, but may not work with 2.0.


I see the following potential outcomes:

Maintain the Interface

We could refactor the class to use a facade. This would help encapsulate the internal API and make it easier to manage changes without exposing them publicly. The facade would act as an intermediary, exposing only the necessary parts of the API while hiding internal details.

Create OpenSearch 3.0.0:

Introduce breaking changes following semver guidelines. This could involve making necessary modifications to the class or removing the PublicAPI annotation entirely. This approach would be suitable for a major version release.

Release a Breaking Change Without Following Semver:

This is generally not advisable as it will burn trust with our users that rely on the current versioning scheme and compatibility promises.


Can you help me understand what the issues with RemoteStorePathStrategy that would require breaking changes and how that should impact the approach we take?

@vikasvb90
Copy link
Contributor

vikasvb90 commented May 21, 2024

@peternied

It sounds like we annotated RemoteStorePathStrategy with @publicapi and released it, but now we need to remove it from @publicapi to avoid (?) unnecessary exposure and maintainability issues.

This isn't the issue. Issue is that gradle forced us to annotate the class RemoteStorePathStrategy as public API even when it wasn't supposed to because it was indirectly referenced by components of repository plugin interface. And now with the new breaking change workflow, any change in the methods of this implementation would break the workflow. Gaurav just took an example of RemoteStorePathStrategy and there are many more such classes.

@andrross
Copy link
Member

The problem is that we are forced to mark these as public API even when it is not

@gbbafna @vikasvb90 It's worth diving into this a bit. The technical definition of a "public API" is that any API that is accessible via Java public methods exposed via anything deliberately exposed to plugins via a org.opensearch.plugins entry point is public. So we should be clear that when you say it is not a public API, it is in fact publicly accessible to plugins, it's just that these classes were never designed to be used to plugins and are not expected to be useful to them in any way.

Now, given the extremely wide range of dependencies exposed via these APIs and the limited flexibility of Java access modifiers (at least in the pre-JPMS world we live in), it is extremely hard not to expose things that are logically intended not to be part of the plugin API. It was exactly this problem that caused us to introduce the annotations and the related enforcement mechanism. The previous problem was that we were basically relying on core developers to know the difference between "publicly accessible but not intended to be used and not actually used by plugins" versus "actually used by plugins" and only make signature changes to the former case. This didn't work well so we introduced a hardline enforcement mechanism that says if a thing is publicly visible then it must be supported until the next major version. What you're asking for here is a mechanism to opt back into developers getting to choose that some publicly visible classes shouldn't be used by plugins so we should be free to change them. How do we do this without reverting back to the same problem we had before the enforcement mechanism was introduced?

@vikasvb90
Copy link
Contributor

What you're asking for here is a mechanism to opt back into developers getting to choose that some publicly visible classes shouldn't be used by plugins so we should be free to change them.

Nope, we are not asking this. Take the example of class RemoteSegmentStoreDirectory. Do you think it should be publicly exposed? It shouldn't be but we still had to make it public because gradle didn't allow making it non-public. To be clear, we are not asking a provision to allow making changes in public methods. Ask is to prevent the wrongful restriction of annotating some supposedly non-public classes with public API annotation.

@peternied
Copy link
Member

peternied commented May 21, 2024

@vikasvb90 I think we can agree IndexSettings which is public and shouldn't have breaking changes, no?

RemoteSegmentStoreDirectory requires a @PublicAPI or @ExperimentalAPI annotation because IndexSettings has @PublicAPI and has a property that returns a type of RemoteSegmentStoreDirectory. This is the design of these annotations and how they are to be used together.

@reta
Copy link
Collaborator

reta commented May 21, 2024

Nope, we are not asking this. Take the example of class RemoteSegmentStoreDirectory. Do you think it should be publicly exposed?

@vikasvb90 this is the question you have to ask yourself at design time and not trying to circumvent the guardrails. The IndexShard leaks RemoteSegmentStoreDirectory through tons of public methods (there are many other places), so this is not the question you should have been asking but "Why RemoteSegmentStoreDirectory leaks into public API while it probably should not?"

@vikasvb90
Copy link
Contributor

Design time was the time before public API annotations were introduced. Now, since IndexShard is already public, there is nothing much to be done.

@reta
Copy link
Collaborator

reta commented May 21, 2024

Design time was the time before public API annotations were introduced

Again, the annotation is irrelevant to design, it helps at lowest level when design is flushed into code, but that is about it.

Now, since IndexShard is already public, there is nothing much to be done.

This is the point: think about change and its impact, this is what design means.

@vikasvb90
Copy link
Contributor

think about change and its impact, this is what design means.

This is much more relevant to design and implementation plan of annotations than the design of remote directory abstractions I believe. And as @andrross pointed earlier that before annotations, every public construct was essentially exposed to plugins, so before bringing in something to denote constructs as truly public, there should been a plan to refactor or move things around which are not meant to be public. IndexShard is used in various internal classes as well so lets say if as part of a feature, a new method is required to be introduced in the class which returns an abstraction to be used specifically by only internal classes, so then you will make the abstraction as well as its implementation a public API, right? To me it doesn't make any sense.

This has been discussed earlier also in the open channel where we came to the conclusion of fixing this problem. I will let others share their opinion.
cc: @Bukhtawar @sachinpkale @ashking94

@andrross
Copy link
Member

Take the example of class RemoteSegmentStoreDirectory. Do you think it should be publicly exposed?

@vikasvb90 It is publicly exposed. The Java access modifier + the way it is exposed through other classes is what is making it a public API. The annotation is just documenting this fact. The annotations do not determine whether something is public or not. This choice was made at design time (deliberately or not). I do believe you're asking for the ability to change this after the initial design to expose the thing publicly has already been released.

I'm not completely opposed to implementing the ability to reign in the public API after the fact on a case-by-case basis. But I do worry that simply allowing developers to tag public things with @InternalApi would completely dilute the utility of the annotation/enforcement mechanisms and we would be back in the same state that existed before they were introduced.

@vikasvb90
Copy link
Contributor

vikasvb90 commented May 22, 2024

@andrross I understand that we shouldn't be making changes to existing public methods or constructs exposed in a class annotated with public API annotation and I am not proposing the same as well. We shouldn't be allowing changes to anything already exposed as public API. I am not contending this fact. Also, the references which are present in the issue are just for understanding purpose and we are not asking to build something which allows changing them. The damage of exposing them as well as many other internal constructs/methods as public API is already done and there is nothing much we can do about it before the next major version.

The point we are trying to highlight is the inability to mark a new abstraction as internal if it is a part of a public API. We need a provision to make a new abstraction and its implementation Internal if it is going to be a part of public API and if it doesn't make sense to expose the new abstraction as public. Today, you can't mark any new abstraction as internal if it is going to sit in a public API marked class. Again, none of us are proposing to make any change to the existing public API annotated constructs, it is only about handling future abstractions and implementations.

The annotation is just documenting this fact.

I completely understand what it is intended for.

@dblock
Copy link
Member

dblock commented May 22, 2024

The point we are trying to highlight is the inability to mark a new abstraction as internal if it is a part of a public API.

While marking the abstraction internal there will be no way to prevent its leaking and being used in the future via a public API other than the good intentions, no?

@vikasvb90
Copy link
Contributor

@dblock the use of public API annotation itself is a good intention and nothing more.

@andrross
Copy link
Member

it is only about handling future abstractions and implementations.

The answer for future things is to use Java access modifiers to make sure it is not exposed in the public API, no? I get this can be awkward given the limited flexibility but I do believe it should be possible in pretty much all cases.

the use of public API annotation itself is a good intention and nothing more.

Everything exposed to the plugin interface must be annotated with @PublicApi or @ExperimentalApi or else the build fails. This is a mechanism that forces you to think about the API surface area being exposed, or else refactor your code to not expose it.

@vikasvb90
Copy link
Contributor

The answer for future things is to use Java access modifiers to make sure it is not exposed in the public API, no? I get this can be awkward given the limited flexibility but I do believe it should be possible in pretty much all cases.

I don't think it should be possible in any way for accessing in a package outside the package of public API annotated class. So, most of the cases are not handled. :)

This is a mechanism that forces you to think about the API

@dblock Raised a point that a mechanism to mark a method as internal will leak this method to plugins and my point was that marking a class as internal or public is not preventing it in any way.

@Bukhtawar
Copy link
Collaborator

There are inconsistencies in how classes were annotated to start with and the enforcement locks down any good first-step refactoring in this regard. For example
BufferedChecksumStreamOutput is marked as public while BufferedChecksumStreamInput is marked as internal which seems wrong to me, it has the potential to break ser/de

Any suggestions @andrross on how do we fix this and move this libs/common and unblock the PR backport which I think is a step in the right direction. Or do you think we need to hold it until 3.0?
cc: @himshikha

/**
* Similar to Lucene's BufferedChecksumIndexOutput, however this wraps a
* {@link StreamOutput} so anything written will update the checksum
*
* @opensearch.api
*/
@PublicApi(since = "1.0.0")
public final class BufferedChecksumStreamOutput extends StreamOutput {
private final StreamOutput out;
private final Checksum digest;

/**
* Similar to Lucene's BufferedChecksumIndexInput, however this wraps a
* {@link StreamInput} so anything read will update the checksum
*
* @opensearch.internal
*/
public final class BufferedChecksumStreamInput extends FilterStreamInput {
private static final int SKIP_BUFFER_SIZE = 1024;

@andrross
Copy link
Member

andrross commented Jun 6, 2024

There are inconsistencies in how classes were annotated to start with

@Bukhtawar The annotations were applied by starting with all the official "plugin" classes (everything in org.opensearch.plugins) and finding everything exposed via a public modifier (this was done via a tool, not manually, so no judgement calls were made here). In the case of BufferedChecksumStreamOutput, it is exposed via org.opensearch.plugins.Plugin -> IndexModule -> TranslogFactory -> Translog. BufferedChecksumStreamInput must not be exposed via any public signature so it remains internal.

the enforcement locks down any good first-step refactoring in this regard

The enforcement as it is implemented does preclude us from overriding the check and making a breaking change because we're sure that while technically visible the thing is not actually being used by any plugin. We have to do work-arounds to ensure the same public signature remains visible, even if it results in dead code. I'm not opposed to implementing some sort of override mechanism, but we need to do it in a way that doesn't result in the previous state where we were frequently making changes the would break downstream consumers. I'm not sure exactly how to do that, though.

@reta
Copy link
Collaborator

reta commented Jun 6, 2024

@Bukhtawar The annotations were applied by starting with all the official "plugin" classes (everything in org.opensearch.plugins) and finding everything exposed via a public modifier (this was done via a tool, not manually, so no judgement calls were made here).

💯

@Bukhtawar we have only annotated a small portion of overall codebase, if you want to eliminate inconsistencies - please fix those by annotating the classes with correct annotations.

@bharath-techie
Copy link
Contributor

I'm facing the same issue in one of my PRs where we need to modify constructors of indexShard, indexService, indicesService , indexModule etc. So we are introducing a new constructor. So for every release , we might have a new constructor, which results in dead code.

Open PR - Star tree mapping changes - #14261
Merged PR - File cache PR - #12782

#5618 - One suggestion is to use builders similar to EngineConfig.
But there were concerns around how to enforce required parameters with builder.
cc : @sachinpkale @gbbafna

As seen in earlier comments of this issue, are we going ahead with support @internalapi annotation for constructors ?

@reta @andrross @Bukhtawar

@andrross
Copy link
Member

As seen in earlier comments of this issue, are we going ahead with support @internalapi annotation for constructors ?

@bharath-techie I think there was agreement to support that annotation for constructors. Would you like to put up a PR to implement that?

@vikasvb90
Copy link
Contributor

The problem is not specific to a constructor but to any other new method or a new class which is supposed to be internal but due to the enforcement by recursive publicAPI annotation, it isn't.

@andrross
Copy link
Member

new method or a new class which is supposed to be internal but due to the enforcement by recursive publicAPI annotation, it isn't.

But any new class can be made not publicly accessible? I'm still confused about this. The annotation isn't making the thing public, the design is.

@vikasvb90 If you have a solution that solves the problem as you see it and addresses the concerns expressed above, please do put out a PR so we can debate the specifics.

@vikasvb90
Copy link
Contributor

But any new class can be made not publicly accessible?

No, it can't be always! Try adding a new class and make it a part of a public class directly or indirectly. If something is public then it doesn't mean that everything in it should be public. A public entity can be composed of direct or indirect internal entities as well. The design is currently not addressing this point and all of the mentioned problems above revolve around this flaw. If you look at the existing opensearch code, you will find a number of classes marked as publicAPI which do not fit in a public category.

Sure, I can think of a PR. I believe it would need code diff comparison to prevent marking an existing public method as internal and so that only new entities are marked as internal.

@reta
Copy link
Collaborator

reta commented Jun 26, 2024

The design is currently not addressing this point and all of the mentioned problems above revolve around this flaw.

@vikasvb90 please address these points in design and use the Java's language visibility rules to restrict the method / class accessibility. if class leaks through Plugins APIs / Extensions APIs, it means it is public, annotating individual methods would lead to complete unmaintainable mess, we should not do that.

@vikasvb90
Copy link
Contributor

@reta There are many examples in this conversation which makes it pretty clear that mess is already there in the code because of the annotation framework. Take the case which @Bukhtawar mentioned and try fixing it by removing or adding annotations on classes or by using Java modifiers and let us know here if you have a solution.

@reta
Copy link
Collaborator

reta commented Jun 26, 2024

As seen in earlier comments of this issue, are we going ahead with support @internalapi annotation for constructors ?

@bharath-techie yes, I think we agreed that this would be a simple solution, it allows us to denote the public API elements that are not supposed to be instantiated outside of the core. I will try to submit this option shortly so we could skip unnecessary code changes.

@bharath-techie
Copy link
Contributor

Thanks @reta for making the changes.

@reta reta added the v2.16.0 Issues and PRs related to version 2.16.0 label Jun 28, 2024
@reta reta self-assigned this Jun 28, 2024
@reta reta added the v3.0.0 Issues and PRs related to version 3.0.0 label Jun 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Build Build Tasks/Gradle Plugin, groovy scripts, build tools, Javadoc enforcement. enhancement Enhancement or improvement to existing feature or request v2.16.0 Issues and PRs related to version 2.16.0 v3.0.0 Issues and PRs related to version 3.0.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants