Mitigate date histogram slowdowns with non-fixed timezones. #30534

Merged (5 commits) on May 16, 2018

Contributor

@jpountz jpountz commented May 11, 2018

Date histograms on non-fixed timezones such as Europe/Paris proved much slower
than histograms on fixed timezones in #28727. This change mitigates the issue by
using a fixed time zone instead when shard data doesn't cross a transition so
that all timestamps share the same fixed offset. This should be a common case
with daily indices.
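
To make the idea concrete, here is a minimal sketch of the check described above, not the code from this PR. It assumes the shard's minimum and maximum timestamps are already known, and it uses `java.time` for illustration; the class name `FixedOffsetRewrite` and the two-instant signature are hypothetical. It only looks at raw timestamps, so it does not cover rounded bucket keys.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.zone.ZoneOffsetTransition;
import java.time.zone.ZoneRules;

// Hypothetical helper for illustration only; not the class changed by this PR.
final class FixedOffsetRewrite {

    /**
     * Returns a fixed-offset zone when no offset transition falls inside
     * (minInstant, maxInstant]; otherwise returns the original zone unchanged.
     */
    static ZoneId rewriteTimeZone(ZoneId tz, Instant minInstant, Instant maxInstant) {
        ZoneRules rules = tz.getRules();
        if (rules.isFixedOffset()) {
            return tz; // already fixed, nothing to rewrite
        }
        // First transition strictly after minInstant, or null if there is none.
        ZoneOffsetTransition next = rules.nextTransition(minInstant);
        if (next == null || next.getInstant().isAfter(maxInstant)) {
            // Every timestamp in the range shares the offset in effect at minInstant.
            return rules.getOffset(minInstant);
        }
        return tz; // the data crosses a transition, keep the original zone
    }

    public static void main(String[] args) {
        ZoneId paris = ZoneId.of("Europe/Paris");
        // One day of data in May: no transition inside the range.
        System.out.println(rewriteTimeZone(paris,
            Instant.parse("2018-05-11T00:00:00Z"),
            Instant.parse("2018-05-11T23:59:59Z"))); // +02:00
        // A range spanning the 2018-03-25 switch to summer time.
        System.out.println(rewriteTimeZone(paris,
            Instant.parse("2018-03-24T00:00:00Z"),
            Instant.parse("2018-03-26T00:00:00Z"))); // Europe/Paris
    }
}
```

With daily indices, most shards would fall into the first case, which is why the rewrite is expected to apply often.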

NOTE: Rewriting the aggregation doesn't work since the timezone is then also
used on the coordinating node to create empty buckets, which might be out of the
range of data that exists on the shard.

NOTE: In order to be able to get a shard context in the tests, I reused code
from the base query test case by creating a new parent test case for both
queries and aggregations: AbstractBuilderTestCase.

Mitigates #28727

@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

if (ft != null && reader != null) {
    Long anyInstant = null;
    final IndexFieldData<?> fieldData = context.getForField(ft);
    if (fieldData instanceof IndexNumericFieldData) {
Contributor

Shouldn't this be guaranteed to be an IndexNumericFieldData? If so, maybe this should either be an assert, or we should throw an exception when the condition is false.

Contributor Author

Good point. I had initially implemented it in rewrite(), where it felt wrong since validation happens afterwards. Here the values source is already resolved, so an exception would already have been thrown if the histogram ran against a non-numeric field.

Contributor

@colings86 colings86 left a comment

I left a couple of comments

            assertSame(tz, builder.rewriteTimeZone(shardContextThatCrosses));
        }
    }
}
Contributor

Should we also have a test that makes sure fixed time zones are not changed?

Contributor Author

Don't the above assertions already check it?

Contributor

Sorry, I had missed that. You are right.

Contributor

@colings86 colings86 left a comment

LGTM

Contributor Author

jpountz commented May 14, 2018

@colings86 I had to make some changes because of test failures in the case where all values were between the same transitions but some rounded values were not. Could you have another look?

Contributor

@colings86 colings86 left a comment

@jpountz good catch on the case where the start date and transition are in the same bucket. The solution LGTM
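
The case discussed here can be reproduced with a small sketch. This is an illustration under assumptions rather than the PR's code: it uses `java.time`, a hypothetical `roundDownToDay` helper standing in for a daily histogram's rounding, and a single value taken a few hours after the 2018-03-25 switch to summer time in `Europe/Paris`. The value itself has offset +02:00, but the start of its day falls before the transition, so rounding with a fixed +02:00 zone yields a different bucket key.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Hypothetical demo class; the names are not taken from the PR.
final class RoundedValueCheck {

    /** Rounds an instant down to the start of its local day in the given zone. */
    static Instant roundDownToDay(Instant instant, ZoneId zone) {
        return instant.atZone(zone).truncatedTo(ChronoUnit.DAYS).toInstant();
    }

    public static void main(String[] args) {
        ZoneId paris = ZoneId.of("Europe/Paris");
        ZoneOffset fixed = ZoneOffset.ofHours(2); // offset in effect at the value itself
        // 10:00 local time on the transition day, after the switch at 01:00 UTC.
        Instant value = Instant.parse("2018-03-25T08:00:00Z");

        // Local midnight is still in winter time (+01:00) in Europe/Paris.
        System.out.println(roundDownToDay(value, paris)); // 2018-03-24T23:00:00Z
        // The fixed zone places midnight one hour earlier.
        System.out.println(roundDownToDay(value, fixed)); // 2018-03-24T22:00:00Z
    }
}
```

A check that only compares raw timestamps against the transitions would accept this shard, which is why the rounded values have to stay within the same pair of transitions as well.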

@jpountz jpountz merged commit 28d4685 into elastic:master May 16, 2018
@jpountz jpountz deleted the enhancement/time_zone_rounding branch May 16, 2018 15:06
jpountz added a commit that referenced this pull request May 16, 2018
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request May 16, 2018
martijnvg added a commit that referenced this pull request May 17, 2018
martijnvg added a commit that referenced this pull request May 17, 2018
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request May 17, 2018
ywelsch pushed a commit to ywelsch/elasticsearch that referenced this pull request May 23, 2018