Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance Parent circuit breaker error message #32056

Merged
merged 4 commits into from
Jul 20, 2018

Conversation

dakrone
Copy link
Member

@dakrone dakrone commented Jul 13, 2018

This adds information about either the current real usage (if tracking "real"
memory usage) or the child breaker usages to the exception message when the
parent circuit breaker trips.

The messages now look like:

[parent] Data too large, data for [my_request] would be [211288064/201.5mb], which is larger than the limit of [209715200/200mb], usages [request=157286400/150mb, fielddata=54001664/51.5mb, in_flight_requests=0/0b, accounting=0/0b]

Or when tracking real memory usage:

[parent] Data too large, data for [request] would be [251/251b], which is larger than the limit of [200/200b], real usage: [181/181b], new bytes reserved: [70/70b]

This adds information about either the current real usage (if tracking "real"
memory usage) or the child breaker usages to the exception message when the
parent circuit breaker trips.

The messages now look like:

```
[parent] Data too large, data for [my_request] would be [211288064/201.5mb], which is larger than the limit of [209715200/200mb], usages [request=157286400/150mb, fielddata=54001664/51.5mb, in_flight_requests=0/0b, accounting=0/0b]
```

Or when tracking real memory usage:

```
[parent] Data too large, data for [request] would be [251/251b], which is larger than the limit of [200/200b], real usage: [181/181b], new bytes reserved: [70/70b]
```
@dakrone dakrone added :Core/Infra/Circuit Breakers Track estimates of memory consumption to prevent overload v7.0.0 v6.4.0 labels Jul 13, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@dakrone
Copy link
Member Author

dakrone commented Jul 13, 2018

I noticed a few of these recently and thought having the message show the sub-usages could be useful.

@dakrone
Copy link
Member Author

dakrone commented Jul 13, 2018

@elasticmachine test this please

Copy link
Member

@danielmitterdorfer danielmitterdorfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the new error message is definitely better than before. Thanks! I left a comment / question around the calculation of the value.

throw new CircuitBreakingException(message, totalUsed, parentLimit);
parentLimit + "/" + new ByteSizeValue(parentLimit) + "]");
if (this.trackRealMemoryUsage) {
final long realUsage = currentMemoryUsage();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling currentMemoryUsage() again here is tricky because the value is likely different than the one that cause the breaker to trip. To get the correct value, we could calculate it as final long realUsage = totalUsed - newBytesReserved which would basically revert the calculation in #parentUsed(long) but that also doesn't feel entirely right. Alternatively, we could also return a more structured representation which returns the memory usage as a dedicated property. Wdyt?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this, but I thought that giving the real memory usage at exception construction time was the least-worst option (as part as plain-text is concerned).

For a structured error, I do like the idea of making this exception more structured, it's just a matter of coming up with the structure.

We currently have:

  • bytesWanted
  • bytesLimit

How about adding

  • currentUsage (we could always retrieve it even if "track real memory" is disabled, just for message purposes)
  • breakerUsages - a map of the current breakers to their usages

The only downside for this is that we already have logic for serializing the CircuitBreakingException, and to make this structured we'd need a new exception (ParentCircuitBreakingException) since it doesn't make sense to add this additional structure to every circuit breaker trip (or maybe it does? what do you think?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a structured error, I do like the idea of making this exception more structured, it's just a matter of coming up with the structure.

My point was not to change CircuitBreakingException but rather to have #parentUsed(long) return a more structured representation, something along the lines of:

private static class ParentMemUsage {
    private final long baseUsage;
    private final long reservation;
    private final long totalUsage;
    
    public ParentMemUsage(long baseUsage, long reservation, long totalUsage) {
        // ...
    }
    // ... getters here ...
}

private long parentUsed(long newBytesReserved) {
    if (this.trackRealMemoryUsage) {
        long current = currentMemoryUsage();
        return new ParentMemUsage(current, newBytesReserved,  current + newBytesReserved);
    } else {
        long parentEstimated = 0;
        for (CircuitBreaker breaker : this.breakers.values()) {
            parentEstimated += breaker.getUsed() * breaker.getOverhead();
        }
        return new ParentMemUsage(parentEstimated, newBytesReserved, parentEstimated);
    }
}

We could then just call ParentMemUsage#getBaseUsage() in order to retrieve the original value again in case an exception occurs. A downside of this is that we allocate an additional object but it might be possible that the compiler is able to optimize it away (by inlining #parentUsed(long) and then eliminating the allocation via escape analysis).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I gave this a shot, let me know what you think

Copy link
Member

@danielmitterdorfer danielmitterdorfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating! LGTM

@dakrone dakrone merged commit 74aa7b0 into elastic:master Jul 20, 2018
dakrone added a commit that referenced this pull request Jul 20, 2018
* Enhance Parent circuit breaker error message

This adds information about either the current real usage (if tracking "real"
memory usage) or the child breaker usages to the exception message when the
parent circuit breaker trips.

The messages now look like:

```
[parent] Data too large, data for [my_request] would be [211288064/201.5mb], which is larger than the limit of [209715200/200mb], usages [request=157286400/150mb, fielddata=54001664/51.5mb, in_flight_requests=0/0b, accounting=0/0b]
```

Or when tracking real memory usage:

```
[parent] Data too large, data for [request] would be [251/251b], which is larger than the limit of [200/200b], real usage: [181/181b], new bytes reserved: [70/70b]
```

* Only call currentMemoryUsage once by returning structured object
martijnvg added a commit that referenced this pull request Jul 21, 2018
* es/6.x: (24 commits)
  Fix broken backport
  Switch full-cluster-restart to new style Requests (#32140)
  Fix multi level nested sort (#32204)
  MINOR: Remove unused `IndexDynamicSettings` (#32237) (#32248)
  [Tests] Remove QueryStringQueryBuilderTests#toQuery class assertions (#32236)
  Switch rolling restart to new style Requests (#32147)
  Enhance Parent circuit breaker error message (#32056)
  [ML] Use default request durability for .ml-state index (#32233)
  Enable testing in FIPS140 JVM (#31666) (#32231)
  Remove indices stats timeout from monitoring docs
  TESTS: Check for Netty resource leaks (#31861) (#32225)
  Rename ranking evaluation response section (#32166)
  Dependencies: Upgrade to joda time 2.10 (#32160)
  Backport SSL context names (#30953) to 6.x (#32223)
  Require Gradle 4.9  as minimum version (#32200)
  Detect old trial licenses and mimic behaviour (#32209)
  Painless: Simplify Naming in Lookup Package (#32177)
  add support for write index resolution when creating/updating documents (#31520)
  A replica can be promoted and started in one cluster state update (#32042)
  Rest test - allow for snapshots to take 0 milliseconds
  ...
martijnvg added a commit that referenced this pull request Jul 21, 2018
* es/master: (23 commits)
  Switch full-cluster-restart to new style Requests (#32140)
  [DOCS] Clarified that you must remove X-Pack plugin when upgrading from pre-6.3. (#32016)
  Remove BouncyCastle dependency from runtime (#32193)
  INGEST: Extend KV Processor (#31789) (#32232)
  INGEST: Make a few Processors callable by Painless (#32170)
  Add region ISO code to GeoIP Ingest plugin (#31669)
  [Tests] Remove QueryStringQueryBuilderTests#toQuery class assertions (#32236)
  Make sure that field aliases count towards the total fields limit. (#32222)
  Switch rolling restart to new style Requests (#32147)
  muting failing test for internal auto date histogram to avoid failure before fix is merged
  MINOR: Remove unused `IndexDynamicSettings` (#32237)
  Fix multi level nested sort (#32204)
  Enhance Parent circuit breaker error message (#32056)
  [ML] Use default request durability for .ml-state index (#32233)
  Remove indices stats timeout from monitoring docs
  Rename ranking evaluation response section (#32166)
  Dependencies: Upgrade to joda time 2.10 (#32160)
  Remove aliases resolution limitations when security is enabled (#31952)
  Ensure that field aliases cannot be used in multi-fields. (#32219)
  TESTS: Check for Netty resource leaks (#31861)
  ...
@dakrone dakrone deleted the enhance-parent-breaker-message branch February 4, 2019 14:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Circuit Breakers Track estimates of memory consumption to prevent overload >enhancement v6.4.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants