Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[es] Add remote read clusters option for cross-cluster querying #2874

Merged

Conversation

dgrizzanti
Copy link
Contributor

Which problem is this PR solving?

Resolves #1883
Updates to PR #1892

Short description of the changes

The cross cluster query api functions by appending a pre-configured remote cluster to an index e.g. cluster_one:index_name. This change simply applies a prefix for each remote cluster and for each index. You will end up with total_indicies = indicies * remote_clusters. Also added a special case local which will not append a prefix and query the local cluster.

This works with both standard and rollover indices since it mostly comes down to a simple string manipulation.

@dgrizzanti dgrizzanti requested a review from a team as a code owner March 11, 2021 19:16
@codecov
Copy link

codecov bot commented Mar 11, 2021

Codecov Report

Merging #2874 (6557e5b) into master (78019b6) will decrease coverage by 0.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2874      +/-   ##
==========================================
- Coverage   95.92%   95.89%   -0.03%     
==========================================
  Files         223      223              
  Lines        9685     9704      +19     
==========================================
+ Hits         9290     9306      +16     
- Misses        327      328       +1     
- Partials       68       70       +2     
Impacted Files Coverage Δ
plugin/storage/es/factory.go 97.64% <100.00%> (+0.02%) ⬆️
plugin/storage/es/options.go 100.00% <100.00%> (ø)
plugin/storage/es/spanstore/reader.go 100.00% <100.00%> (ø)
cmd/query/app/server.go 95.62% <0.00%> (-1.46%) ⬇️
plugin/storage/badger/spanstore/reader.go 95.37% <0.00%> (-0.72%) ⬇️
plugin/storage/integration/integration.go 77.90% <0.00%> (+0.55%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 78019b6...6557e5b. Read the comment docs.

if archive {
var archiveSuffix string
if useReadWriteAliases {
archiveSuffix = archiveReadIndexSuffix
} else {
archiveSuffix = archiveIndexSuffix
}
return func(indexName, indexDateLayout string, startTime time.Time, endTime time.Time) []string {
return addRemoteReadClusters(func(indexName, indexDateLayout string, startTime time.Time, endTime time.Time) []string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: while you're at it, please rename indexName to indexPrefix.

}
if useReadWriteAliases {
return func(indices string, indexDateLayout string, startTime time.Time, endTime time.Time) []string {
return addRemoteReadClusters(func(indices string, indexDateLayout string, startTime time.Time, endTime time.Time) []string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: also while you're at it, for consistency and readability func(indexPrefix, indexDateLayout string, ... :)

// Add a remote cluster prefix for each cluster and for each index and add it to the list of original indices
func addRemoteReadClusters(fn timeRangeIndexFn, remoteReadClusters []string) timeRangeIndexFn {
return func(indexName string, indexDateLayout string, startTime time.Time, endTime time.Time) []string {
jaegerIndices := fn(indexName, indexDateLayout, startTime, endTime)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we could short-circuit and reduce nesting with:

if len(remoteReadCluster) == 0 {
    return jaegerIndices
}

plugin/storage/es/spanstore/reader_test.go Show resolved Hide resolved
Comment on lines 163 to 164
// Elasticsearch cross cluster query api example: GET /twitter,cluster_one:twitter,cluster_two:twitter/_search
// Add a remote cluster prefix for each cluster and for each index and add it to the list of original indices
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Please end comment sentences with a full-stop.

Same as above though I understand this convention is not followed in the adjacent functions below it.

@dgrizzanti
Copy link
Contributor Author

@albertteoh thanks for the comments and suggestions. I think I covered all of them if you can take a look again. Thanks!

@albertteoh
Copy link
Contributor

@albertteoh thanks for the comments and suggestions. I think I covered all of them if you can take a look again. Thanks!

One more :) Also, it seems another commit was pulled into this PR: 5105237.

@@ -36,7 +36,7 @@ import (
)

const (
iterations = 30
iterations = 100
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were these changes brought in from #2873?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did rebase so may be they got pulled in by accident? I flip it back so it doesn't show up in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I had more luck with git merge and did recall having similar problems when using git rebase. Thanks 🙏

…ter querying

Co-authored-by: allen13 <kex.allen13@gmail.com>
Signed-off-by: David Grizzanti <david_grizzanti@comcast.com>
LostLaser and others added 7 commits March 13, 2021 16:52
* Increasing test timeout to allow for trace storage settle

Signed-off-by: Jacob Goldsworthy <jtgolds6@gmail.com>

* Updating max storage test wait time to 10 seconds

Signed-off-by: Jacob Goldsworthy <jtgolds6@gmail.com>
Signed-off-by: David Grizzanti <david_grizzanti@comcast.com>
…when remoteReadClusters is emtpy

Signed-off-by: David Grizzanti <david_grizzanti@comcast.com>
…ReadWriteAliases: true

Signed-off-by: David Grizzanti <david_grizzanti@comcast.com>
Signed-off-by: David Grizzanti <david_grizzanti@comcast.com>
* Increasing test timeout to allow for trace storage settle

Signed-off-by: Jacob Goldsworthy <jtgolds6@gmail.com>

* Updating max storage test wait time to 10 seconds

Signed-off-by: Jacob Goldsworthy <jtgolds6@gmail.com>
Signed-off-by: David Grizzanti <david_grizzanti@comcast.com>
Signed-off-by: David Grizzanti <david_grizzanti@comcast.com>
Signed-off-by: David Grizzanti <david_grizzanti@comcast.com>
@@ -193,6 +195,9 @@ func (c *Configuration) NewClient(logger *zap.Logger, metricsFactory metrics.Fac

// ApplyDefaults copies settings from source unless its own value is non-zero.
func (c *Configuration) ApplyDefaults(source *Configuration) {
if len(c.RemoteReadClusters) == 0 {
c.RemoteReadClusters = source.RemoteReadClusters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this line ever be hit? I believe c.RemoteReadClusters defaults to []string{""} because strings.Split() should result in a non-empty slice overwriting the initial empty slice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @albertteoh - I didn't realize this nuance with strings.Split() but after looking at the docs, this makes sense. I didn't see a great (easy) way around this since the cli takes in strings, so I updated options.go to check the length and only split when necessary (otherwise default to []string{}). I added 2 tests cases around options as well to cover this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the test cases to cover the empty string. I think there's also the following case: --es.remote-read-clusters=cluster_one,,cluster_two, where the second item is empty but I think we can consider this a user error.

}

for _, jaegerIndex := range jaegerIndices {
for _, remoteCluster := range remoteReadClusters {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to the earlier comment, if remoteReadClusters defaults to []string{""}, the returned indices count would be doubled with the new indices looking like :<index-name>. Is that expected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be avoided now with the changes to plugin/storage/es/options.go

…sters are provided

Signed-off-by: David Grizzanti <david_grizzanti@comcast.com>
opts.InitFromViper(v)

primary := opts.GetPrimary()
assert.Equal(t, []string{}, primary.RemoteReadClusters)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the following would be more expressive/readable: assert.Empty(t, primary.RemoteReadClusters)

@@ -193,6 +195,9 @@ func (c *Configuration) NewClient(logger *zap.Logger, metricsFactory metrics.Fac

// ApplyDefaults copies settings from source unless its own value is non-zero.
func (c *Configuration) ApplyDefaults(source *Configuration) {
if len(c.RemoteReadClusters) == 0 {
c.RemoteReadClusters = source.RemoteReadClusters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the test cases to cover the empty string. I think there's also the following case: --es.remote-read-clusters=cluster_one,,cluster_two, where the second item is empty but I think we can consider this a user error.

Copy link
Contributor

@albertteoh albertteoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks @dgrizzanti.

@yurishkuro yurishkuro changed the title Add remote read clusters option to elasticsearch to enable cross-cluster querying [es] Add remote read clusters option for cross-cluster querying Mar 16, 2021
@yurishkuro yurishkuro merged commit 915d3af into jaegertracing:master Mar 16, 2021
@yurishkuro
Copy link
Member

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Elasticsearch Cross Cluster Query
5 participants