Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include a new downsampling operation #1574

Conversation

salvatore-campagna
Copy link
Contributor

@salvatore-campagna salvatore-campagna commented Sep 7, 2022

The new downsampling operation allows Rally to hit the Elasticsearch /<source-index>/_downsample/<target-index> endpoint to start a downsampling operation on the source index. Expected parameters include:

  • fixed_interval: the aggregation granularity using the same interval definition used by date histograms
  • source-index: the index to downsample applying the aggregation
  • target-index: the index created as a result of the aggregation on the timestamp field

Example: aggregate the index test-source-index on the timestamp field using a 1 minute interval and store the result in a new index called test-target-index.

{
  "name": "downsample",
  "operation": {
    "operation-type": "downsample",
    "fixed-interval": "1m",
    "source-index": "test-source-index",
    "target-index": "test-target-index"
  }
}

The new downsampling operation allows Rally to hit the Elasticsearch
/<source-index>/_rollup/<target-index> endpoint to start a downsampling
operation on the source index. Expected parameters include:

* fixed_interval: the aggregation granularity using the same interval
  definition used by date histograms
* source-index: the index to downsample applying the aggregation
* target-index: the index created as a result of the aggregation
  on the @timestamp field
@DJRickyB DJRickyB added enhancement Improves the status quo highlight A substantial improvement that is worth mentioning separately in release notes labels Sep 7, 2022
@DJRickyB DJRickyB added this to the 2.6.1 milestone Sep 7, 2022
docs/track.rst Outdated Show resolved Hide resolved
docs/track.rst Outdated Show resolved Hide resolved
esrally/driver/runner.py Outdated Show resolved Hide resolved
esrally/driver/runner.py Outdated Show resolved Hide resolved
@salvatore-campagna salvatore-campagna changed the title feature: include a new downsampling operation Include a new downsampling operation Sep 8, 2022
@salvatore-campagna salvatore-campagna requested review from DJRickyB and removed request for pquentin and michaelbaamonde September 8, 2022 13:11
Copy link
Contributor

@DJRickyB DJRickyB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some suggestions.

One optional thing to consider that might be cool/useful is to add a parameter source for this operation, that can provide defaults for source_index and target_index for ease-of-use. We have helper function params.get_target(track, params) for this purpose in param sources, and your target-index could just default to some fixed pattern like f"{source-index}_{fixed-interval}"

docs/track.rst Outdated Show resolved Hide resolved
docs/track.rst Outdated Show resolved Hide resolved
docs/track.rst Outdated Show resolved Hide resolved
docs/track.rst Outdated Show resolved Hide resolved
Co-authored-by: Rick Boyd <boyd.richardj@gmail.com>
Co-authored-by: Rick Boyd <boyd.richardj@gmail.com>
@salvatore-campagna
Copy link
Contributor Author

@elasticmachine run rally/rally-tracks-compat

@b-deam b-deam mentioned this pull request Sep 13, 2022
@salvatore-campagna
Copy link
Contributor Author

The CI is failing with
XFAIL (index deletion bug under investigation)

@pquentin
Copy link
Member

This means that test is now passing! It should be removed from elastic/rally-tracks@117f306 after merging this pull request. Or we could change that line in rally-tracks to @pytest.mark.xfail(reason="index deletion bug under investigation", strict=False).

See https://docs.pytest.org/en/7.1.x/how-to/skipping.html for details on that pytest fixture.

esrally/track/params.py Outdated Show resolved Hide resolved
@salvatore-campagna salvatore-campagna requested review from DJRickyB and removed request for dliappis September 15, 2022 10:19
docs/track.rst Outdated
""""""""""

* ``fixed_interval`` (optional, defaults to ``1h``): The aggregation interval key defined as in `https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html#fixed_intervals`.
* ``source-index`` (optional, defaults to ``tsdb-index``): The index containing data to aggregate which includes a ``@timestamp`` field. Note that this index should be marked read-only prior to the execution of this operation.
Copy link
Contributor

@DJRickyB DJRickyB Sep 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* ``source-index`` (optional, defaults to ``tsdb-index``): The index containing data to aggregate which includes a ``@timestamp`` field. Note that this index should be marked read-only prior to the execution of this operation.
* ``source-index`` (optional): The index containing data to aggregate which includes a ``@timestamp`` field. Note that this index should be marked read-only prior to the execution of this operation. If there is only one index defined in the ``indices`` of the track definition, that will be used as the default.

assert p["source-index"] == "tsdb"
assert p["target-index"] == "tsdb-1m"

def test_downsample_empty_params(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what this test buys us? Checked with a debugger and captured the value of p:

{'fixed-interval': '1h', 'source-index': None, 'target-index': 'None-1h', 'request-timeout': None, 'headers': None, 'opaque-id': None}

Might be better instead to have the Track constructor call include an indices param with value like [{"name": "test-source-index", "body": "index.json"}] and assert that source-index gets provided as test-source-index


p = source.params()

assert p["fixed-interval"] != ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be the documented default, and should need changing if the default changes

Copy link
Contributor

@DJRickyB DJRickyB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still have 3 "unresolved conversations" on this PR, relating to documentation update and hardening testing the defaults

p = source.params()

assert p["fixed-interval"] == "1h"
assert p["source-index"] != ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To put my above comment slightly differently, this assert only passes because the source-index is None instead of "", technically different but not a going concern. You've covered my larger concern in test_downsample_default_index_param, where you ensure there's a default when there is supposed to be a default, so perhaps this can just be deleted and the test is left to assert the default fixed-interval?

Copy link
Contributor

@DJRickyB DJRickyB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, thank you for iterating

@salvatore-campagna salvatore-campagna merged commit af0c464 into elastic:master Sep 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improves the status quo highlight A substantial improvement that is worth mentioning separately in release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants