
Implement proxy mode #860

Merged — 25 commits merged into elastic:main on Sep 6, 2022
Conversation

@mtojek (Contributor) commented Aug 4, 2022

Issue: #770

This PR implements proxy mode: the registry connects to EPR and combines remote resources with existing local packages.

Testing:

go run . --feature-proxy-mode --log-level debug

@mtojek mtojek self-assigned this Aug 4, 2022
@elasticmachine commented Aug 4, 2022

💚 Build Succeeded


Build stats

  • Start Time: 2022-09-06T08:24:40.458+0000

  • Duration: 5 min 1 sec

Test stats 🧪

  • Failed: 0
  • Passed: 213
  • Skipped: 0
  • Total: 213

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

dependabot[bot] and others added 9 commits on August 24, 2022 at 12:51
…ic#861)

Bumps [github.com/prometheus/client_golang](https://github.com/prometheus/client_golang) from 1.12.2 to 1.13.0.
- [Release notes](https://github.com/prometheus/client_golang/releases)
- [Changelog](https://github.com/prometheus/client_golang/blob/main/CHANGELOG.md)
- [Commits](prometheus/client_golang@v1.12.2...v1.13.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/client_golang
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [go.uber.org/zap](https://github.com/uber-go/zap) from 1.21.0 to 1.22.0.
- [Release notes](https://github.com/uber-go/zap/releases)
- [Changelog](https://github.com/uber-go/zap/blob/master/CHANGELOG.md)
- [Commits](uber-go/zap@v1.21.0...v1.22.0)

---
updated-dependencies:
- dependency-name: go.uber.org/zap
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Update favicon file with the new Elastic Package Registry logo.
A new "img" folder has also been added to include other
available formats of this new logo (png, svg and ico).
Bumps [cloud.google.com/go/storage](https://github.com/googleapis/google-cloud-go) from 1.24.0 to 1.25.0.
- [Release notes](https://github.com/googleapis/google-cloud-go/releases)
- [Changelog](https://github.com/googleapis/google-cloud-go/blob/main/CHANGES.md)
- [Commits](googleapis/google-cloud-go@pubsub/v1.24.0...spanner/v1.25.0)

---
updated-dependencies:
- dependency-name: cloud.google.com/go/storage
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…#866)

Bumps [github.com/fsouza/fake-gcs-server](https://github.com/fsouza/fake-gcs-server) from 1.38.3 to 1.38.4.
- [Release notes](https://github.com/fsouza/fake-gcs-server/releases)
- [Commits](fsouza/fake-gcs-server@v1.38.3...v1.38.4)

---
updated-dependencies:
- dependency-name: github.com/fsouza/fake-gcs-server
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
@mtojek (Contributor, Author) commented Aug 25, 2022

@dependabot merge upstream

@mtojek mtojek marked this pull request as ready for review August 25, 2022 16:13
@mtojek mtojek requested a review from a team August 26, 2022 11:50
@jsoriano (Member) left a comment

Looks good, added some comments.

My only concern is the need to add logic to all handlers for this. I wonder if we could implement it as an indexer; I added a comment about this. In any case, I'd also be OK if you prefer not to use indexers for this.

Review threads (resolved): proxymode/proxymode.go, search.go, main.go
  if err != nil {
  	logger.Error("getting package path failed", zap.Error(err))
  	http.Error(w, "internal server error", http.StatusInternalServerError)
  	return
  }
- if len(packages) == 0 {
+ if len(pkgs) == 0 && proxyMode.Enabled() {
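The per-handler fallback pattern under discussion — consult the remote registry only when proxy mode is on and nothing matched locally — can be sketched as follows. All types and names here are hypothetical stand-ins, not the PR's actual code:

```go
package main

// ProxyMode is a stand-in for the PR's proxymode type.
type ProxyMode struct {
	enabled bool
}

// Enabled reports whether proxy mode was switched on via the feature flag.
func (p *ProxyMode) Enabled() bool { return p.enabled }

// Search is a stand-in for a call to the remote registry's search endpoint.
func (p *ProxyMode) Search(query string) []string {
	return []string{"remote:" + query}
}

// resolve returns local matches when available and only consults the
// remote registry when proxy mode is enabled and nothing matched locally.
func resolve(local []string, query string, proxy *ProxyMode) []string {
	if len(local) == 0 && proxy.Enabled() {
		return proxy.Search(query)
	}
	return local
}
```

This is the shape each handler repeats in the PR, which is exactly the duplication the review thread below debates.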
@jsoriano (Member):

I don't like so much that we need to add logic to all handlers for the proxy mode.

I guess another option would be to implement it as an indexer, but as indexers work now, we would make unneeded calls to the remote registry even when a local package already matches. Is this the reason not to use indexers?

Maybe a way to overcome this would be to add a GetFirst() (*packages.Package, error) method to the indexer interface, which could be used to optimize the [several] cases where we only want the first package. Most indexers would implement GetFirst() by returning the first package in the result of Get(), but the CombinedIndexer could implement it by breaking the loop once it finds a package in one indexer.
Or, similarly, we could add a new MaxResults option to GetOptions to return only the first n packages found. But an option can be ignored or forgotten, while a method in the interface must be implemented.
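The proposed GetFirst() extension could look roughly like this. The types below are minimal stand-ins for the registry's actual indexer interfaces, which are richer than shown:

```go
package main

// Package is a stand-in for the registry's package metadata type.
type Package struct {
	Name    string
	Version string
}

// GetOptions is a stand-in for the indexers' query options.
type GetOptions struct{}

// Indexer sketches the proposed interface: GetFirst complements Get.
type Indexer interface {
	Get(opts GetOptions) ([]*Package, error)
	GetFirst(opts GetOptions) (*Package, error)
}

// CombinedIndexer queries several indexers in order.
type CombinedIndexer []Indexer

// GetFirst stops at the first indexer that yields a match, so a remote
// (proxy) indexer placed last is never consulted when a local package exists.
func (c CombinedIndexer) GetFirst(opts GetOptions) (*Package, error) {
	for _, indexer := range c {
		p, err := indexer.GetFirst(opts)
		if err != nil {
			return nil, err
		}
		if p != nil {
			return p, nil
		}
	}
	return nil, nil
}
```

With the proxy indexer appended last, local hits short-circuit the loop, which is the optimization the comment describes.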

@mtojek (Contributor, Author):

> I guess that another option would be to implement it as an indexer, but as indexers are now we would be making unneeded calls to the remote registry if there is already a local package that matches. Is this the reason not to use indexers?

I analyzed this case. If we want to implement this as an indexer, we will have to call the search endpoint (with all=true) on every API call. Responses would be heavy, and will get heavier in the future. I wouldn't like to implement an artificial cache that syncs periodically with the remote Package Registry, but I'm happy to discuss other options.

@jsoriano (Member):

> I analyzed this case. If we want to implement this as an indexer, we will have to call the search endpoint (with all=true) on every API call.

Would this still be true with the GetFirst() approach? The logic would be similar to what is done here in each handler, but it would be centralized in the CombinedIndexer: it would return the first package found without needing to consult all indexers, so it wouldn't call the proxy for packages available locally. And when it did call the proxy, it would do so with the parameters of the request, not with all=true.

Another advantage of using an indexer is that we can add it only when the feature is enabled, without needing to check on every call if the proxy is enabled.

@mtojek (Contributor, Author):

Don't get me wrong, I'd love to use a dedicated indexer here, but I'm not sure the concept applies.

The problem with an indexer is that it isn't aware of the calling context, so it can't return correct/full package metadata. Consider the following flows:

Search packages:

  1. Call indexer.Get(options):
    1.1 Marshal the filter to a query.
    1.2 Make a call to /search?all=true (heavy HTTP response).
    1.3 Apply the filter.
    1.4 Fill private fields (fsBuilder, resolver, semver) or skipped fields (BasePath).

Problem: Unfortunately, the returned packages don't contain all properties of policy templates (Package vs BasePackage), so the result returned by the indexer is always incorrect.

Package resources/signature:

  1. Call indexer.Get(options):
    1.1 Marshal the filter to a query — based on the filter, we would have to figure out which endpoint to proxy the request to.
    1.2 Make a call, for example: /package/apm/8.4.0/.
    1.3 Apply the filter.
    1.4 Fill private fields (fsBuilder, resolver, semver) or skipped fields (BasePath).

Categories:

  1. Call indexer.Get(options):
    1.1 Marshal the filter to a query.
    1.2 Make a call to /search?all=true (heavy HTTP response).
    ...

Problem: Now we need to pull all packages to map them into categories. We also need to visit the categories in policy templates, BUT the /search endpoint doesn't expose them (all categories are grouped together in categories: [ ... ]). For the sake of /categories, we could use those incomplete packages, but they would only work accidentally (we don't analyze policy templates).

I might have missed something while I tried to implement this approach. Feel free to share your thoughts!

@jsoriano (Member):

> I'd love to use a dedicated indexer here, but I'm not sure if it's possible to apply the concept here.

Yeah, using indexers for this goes a bit beyond their concept. More than using them specifically, I was wondering about a way to separate the proxy mode further, without needing to modify all the handlers, so that no additional code is executed when it is disabled.

To search packages, I don't think we would need to fetch everything and then filter: we could translate the options.Filter into an equivalent search query, so no filtering would be needed afterwards. From the result, the private fields would be filled as they are now in this PR. https://github.com/elastic/package-registry/pull/860/files#diff-d3d83ca5e1802bcf5d721013c2c366abb0d682c3e100e4307630c34f85cf6f89R93-R100
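The "translate options.Filter into an equivalent search query" idea could look roughly like this. The Filter fields and query parameter names below are illustrative assumptions, not the registry's actual types:

```go
package main

import "net/url"

// Filter is an illustrative stand-in for the indexers' filter options.
type Filter struct {
	PackageName string
	PackageType string
	Prerelease  bool
}

// searchQuery translates a filter into query parameters for the remote
// registry's search endpoint, instead of fetching everything with all=true
// and filtering locally.
func searchQuery(f Filter) url.Values {
	v := url.Values{}
	if f.PackageName != "" {
		v.Set("package", f.PackageName)
	}
	if f.PackageType != "" {
		v.Set("type", f.PackageType)
	}
	if f.Prerelease {
		v.Set("prerelease", "true")
	}
	return v
}
```

The point of the translation is that the remote registry does the filtering, so the proxy never has to pull the full (heavy) all=true response.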

Though for categories, it is true that we would need to search for all packages, and even then the result could be inaccurate, as you mention.

Two more ideas to separate the proxy mode:

  • Use a middleware in the router. This middleware would need to handle every API endpoint, capture the response from the actual handler in the response writer, interpret it, decide whether a request to the proxied registry is needed, and merge the results. It may be cumbersome, and I'm not sure it's always possible, but it would allow injecting the proxy only when enabled.
  • Follow an approach similar to the storage indexer: each registry exposes a new API with an "index" of the packages it serves. Proxy registries can periodically query this API and update their local index.

But we can continue with the current approach, every option seems to have its cons. Thanks for the discussion!

@mtojek mtojek requested a review from jsoriano August 29, 2022 10:14

@jsoriano (Member) left a comment

OK, let's continue with the current approach, unless my last comment rings a bell on how to do it without modifying the handlers 🙂 Thanks for the discussion!


@mtojek mtojek merged commit 21d510b into elastic:main Sep 6, 2022