-
Notifications
You must be signed in to change notification settings - Fork 313
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exclude tasks based on serverless status #1760
Exclude tasks based on serverless status #1760
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The overall approach looks great. I did first round of testing and found one issue with custom operations (they are filtered, but shouldn't).
esrally/rally.py
Outdated
@@ -1083,6 +1088,7 @@ def dispatch_sub_command(arg_parser, args, cfg): | |||
cfg.add(config.Scope.applicationOverride, "system", "list.to_date", args.to_date) | |||
configure_mechanic_params(args, cfg, command_requires_car=False) | |||
configure_track_params(arg_parser, args, cfg, command_requires_track=False) | |||
configure_default_serverless_params(cfg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a benefit of calling configure_default_serverless_params
3 times in list
, delete
and info
instead of calling it once at the top of dispatch_sub_command
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, it felt weird to set those values for commands where it's not needed. Happy to change this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to keep current form as long as all sub-commands have been tested. Our test coverage is not complete hence the thought of moving it to the top.
Forgot to mention I also went through all operation types and compared with serverless protection status as of July 27th and found no discrepancies. |
This is not required as it's the whole tuple that needs to be unique, but it's less confusing that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am probably missing some background which is not covered in the PR description, so I may be barking at the wrong tree, but I had issues testing this PR.
I've tested this with the PMC track that has a challenge with a parallel operation by explicitly setting
serverless.operator
toFalse
:
It's unclear to me how to reproduce what you did here.
Do I have to set the corresponding setting in rally.ini
as supported since #1756?
Looking at https://github.com/elastic/rally/pull/1760/files#diff-9cd59b2fffb801ee46e1819d3dc79a407b1b5784a09d535b3ff68820f9c4102dR202 it appears it's not needed to be done explicitly any more.
But in any case with (see below) or without this setting
$ tail -4 ~/.rally/rally.ini
[driver]
serverless.operator=False
serverless.mode=True
and targeting a serverless ES using the pmc
track I see:
[INFO] Race id is [8aebdb5b-d810-4a82-aa50-b96b9590ef3f]
[INFO] Treating parallel task in challenge [indexing-querying] as public.
[INFO] Downloading track data (5.5 GB total size) [100.0%]
[INFO] Decompressing track data from [/home/esbench/.rally/benchmarks/data/pmc/documents.json.bz2] to [/home/esbench/.rally/benchmarks/data/pmc/documents.json] (resulting size: [21.66] GB) ... [OK]
[INFO] Preparing file offset table for [/home/esbench/.rally/benchmarks/data/pmc/documents.json] ... [OK]
[INFO] Racing on track [pmc], challenge [indexing-querying] and car ['external'] with version [8.10.0].
which differs from your output, namely doesn't show:
[INFO] Excluding [put-settings], [check-cluster-health], [force-merge], [wait-until-merges-finish] as challenge [indexing-querying] is run on serverless.
Do I also need a modification in the track and/or pass the track parameter run-on-serverless
that is introduced in this PR?
You do need to manually set |
Where, in |
Ok after explicitly setting
It's unclear to me why explicitly setting that value in |
I fixed reading from rally.ini. Please take another look! |
tests/track/loader_test.py
Outdated
} | ||
reader = loader.TrackSpecificationReader() | ||
full_track = reader("unittest", copy.deepcopy(track_specification), "/mappings") | ||
filtered = self.filter(full_track, serverless_mode=True, serverless_operator=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: can we use filtered_track
consistently between the tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! I really like how readable it is, given the complexities here.
I tested it using this invocation:
$ esrally race --pipeline=benchmark-only --client-options='{"default": { "use_ssl": true, "verify_certs": false, "basic_auth_user":"elastic","basic_auth_password":"changeme", "timeout": 60, "headers": { "X-Found-Cluster": "b1ec3121-f6d4-4f62-9e57-422581b3bb65.es"}}}' --target-hosts="https://my.elb.eu-west-1.amazonaws.com" --track-params="number_of_replicas:1,bulk_indexing_clients:4" --track nyc_taxis --challenge append-no-conflicts --kill-running-processes --telemetry=blob-store-stats --test-mode
With serverless.operator=True
:
[...]
[INFO] Race id is [8a0017a4-fb1a-4392-9ab2-dc9e505c403d]
[INFO] Racing on track [nyc_taxis], challenge [append-no-conflicts] and car ['external'] with version [8.10.0].
Running delete-index [100% done]
Running create-index [100% done]
Running check-cluster-health [100% done]
Running index [100% done]
Running refresh-after-index [100% done]
Running force-merge [100% done]
Running refresh-after-force-merge [100% done]
Running wait-until-merges-finish [100% done]
Running default_no_target [100% done]
Running default [100% done]
Running default_1k [100% done]
Running range [100% done]
Running distance_amount_agg [100% done]
Running autohisto_agg [100% done]
Running date_histogram_agg [100% done]
[...]
and with serverless.operator=False
:
[...]
[INFO] Race id is [66e95a15-78b9-4815-a710-ad6c0d298865]
[INFO] Excluding [check-cluster-health], [force-merge], [wait-until-merges-finish] as challenge [append-no-conflicts] is run on serverless.
[INFO] Racing on track [nyc_taxis], challenge [append-no-conflicts] and car ['external'] with version [8.10.0].
Running delete-index [100% done]
Running create-index [100% done]
Running index [100% done]
Running refresh-after-index [100% done]
Running refresh-after-force-merge [100% done]
Running default_no_target [100% done]
Running default [100% done]
Running default_1k [100% done]
Running range [100% done]
Running distance_amount_agg [100% done]
Running autohisto_agg [100% done]
Running date_histogram_agg [100% done]
[...]
When running Rally benchmarks on Elasticsearch Serverless, some operations do not make sense depending on the operator status, such as force merge. To make running benchmarks simpler, this pull requests skips some operations automatically based on the operator status of the user.
I've tested this with the PMC track that has a challenge with a parallel operation by explicitly setting
serverless.operator
toFalse
:It prints the following at the beginning of the race:
TODO:
Sub commands to test: