
Open Request Cost Aggregation (ORCA) #6614

htuch opened this issue Apr 17, 2019 · 27 comments
Labels: enhancement (Feature requests. Not bugs or questions.), help wanted (Needs help!)

htuch (Member) commented Apr 17, 2019

Today in Envoy, simple load balancing decisions can be made by taking into account local or global knowledge of a backend’s load, for example CPU. More sophisticated load balancing decisions are possible with application specific knowledge, e.g. queue depth, or by combining multiple metrics.

This is useful for services that may be resource constrained along multiple dimensions (e.g. both CPU and memory may become bottlenecks depending on the applied load and execution environment, and it's not possible to tell which upfront) and where these dimensions do not slot into predefined categories (e.g. the resource may be "number of free threads in a pool", disk IOPS, etc.).

https://docs.google.com/document/d/1NSnK3346BkBo1JUU3I9I5NYYnaJZQPt8_Z_XCBCI3uA/edit# provides a design proposal for an Open Request Cost Aggregation (ORCA) standard for conveying this information between proxies like Envoy and upstreams. We propose that this become a standard part of UDPA and supported by Envoy.

The design document is in draft stage; from offline discussions I think the need for something like this is not very controversial, so we can iterate on aspects of the design here.

@htuch htuch added the design proposal Needs design doc/proposal before implementation label Apr 17, 2019
@htuch htuch self-assigned this Apr 17, 2019
@mattklein123 mattklein123 added this to the 1.11.0 milestone Apr 27, 2019
stale bot commented May 27, 2019

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label May 27, 2019
@htuch htuch added no stalebot Disables stalebot from closing an issue and removed stale stalebot believes this issue/PR has not been touched recently labels May 28, 2019
@mattklein123 mattklein123 modified the milestones: 1.11.0, 1.12.0 Jul 3, 2019
Mythra (Member) commented Aug 15, 2019

I've been talking with @htuch about implementing ORCA's in-band reporting, and adding its details to a particular stream.

However, when bringing it up, @htuch mentioned that the RFC had some debate around how in-band reporting should be implemented. Currently the RFC calls for stuffing JSON in x-endpoint-load-metrics. However, after talking with @PiotrSikora I think this is the incorrect choice, and would like to spur extra conversation here, since I shouldn't be the sole person to make this decision 😜


While the JSON reporter uses a standard encoding (JSON), not all proxies currently support JSON out of the box, nor should they. Most of the time the bytes are just moving through them, and they only need to know how to parse HTTP. (Some can be extended, see NGINX, but that always requires custom extensions/code.)

If we want ORCA to become a true standard, lowering the barrier to entry by not forcing proxies to add JSON parsing they don't need would increase adoption, which would help both the total number of users and its usefulness.

Instead, I recommend we implement parsing of both x-endpoint-load-metrics-bin (binary protobuf format) and x-endpoint-load-metrics. The one we should really focus on is x-endpoint-load-metrics (which maybe should even be called endpoint-load-metrics, since the IETF recommends against the x- prefix), using the structured-headers parameter list format. The reason for this is twofold:

  1. It's an RFC that's very close to completion, and already has other RFCs building on top of it, so its chances of dying out are low.
  2. Even headers that are widely supported today use something very close to a parameter list, e.g. Cache-Control: max-age=<seconds>. This is already the basis for the parameter list format.

x-endpoint-load-metrics-bin I think should be supported because it can be a nice optimization for those already integrating with the ORCA protobuf who don't want to juggle two separate encodings of what to send. I don't imagine it being a huge ask, so it seems worth the implementation effort.
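For illustration, here is how the same report might look in each format under discussion. The field names come from the ORCA proto, but the values and the exact structured-header syntax shown are assumptions for the sake of example:

```
# JSON stuffed into the header (current RFC draft):
x-endpoint-load-metrics: {"cpu_utilization": 0.3, "mem_utilization": 0.8}

# Structured-headers parameter list (draft-ietf-httpbis-header-structure):
x-endpoint-load-metrics: cpu_utilization=0.3, mem_utilization=0.8

# Base64-encoded serialized OrcaLoadReport protobuf:
x-endpoint-load-metrics-bin: <base64-encoded OrcaLoadReport>
```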


Happy to hear thoughts on this, and to come up with something official that isn't just my bemused thoughts 😄

htuch (Member, Author) commented Aug 15, 2019

@securityinsanity yes, https://tools.ietf.org/html/draft-ietf-httpbis-header-structure-11 is fine. Let's just make sure that this is a direct translation of the data model that exists in the proto, i.e. it should be equivalently expressive.

Mythra (Member) commented Aug 15, 2019

For sure 👍🏻. I think keeping the model is needed.

I’ll start working on a PR tomorrow. Can you update the doc? (I don’t think I have write access).

htuch (Member, Author) commented Aug 15, 2019

@securityinsanity sure. I think we should hash out the new representation here first. Looking at https://github.com/envoyproxy/envoy/blob/master/api/udpa/data/orca/v1/orca_load_report.proto, I think we may need multiple headers, e.g. x-endpoint-load-metrics, x-endpoint-load-metrics-cost, x-endpoint-load-metrics-utilization, to correctly distinguish the core fields and the distinct maps that exist in the data model. @PiotrSikora do you think this is correct?

FWIW, coincidentally I'm working on migrating the API tree in https://github.com/envoyproxy/envoy/tree/master/api/udpa (which includes the ORCA protos) to live in https://github.com/cncf/udpa-wg today. This shouldn't have a major impact on your work, but there might be some slight path or Bazel fixups needed once this lands.

Mythra (Member) commented Aug 15, 2019

@htuch That's good to know about the protos moving, thanks. Based on my understanding of the proto, yes, we'd need 3 headers (one for the core type, and two for the two maps).
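To make the three-header split concrete, a response carrying a full report might look roughly like this (header names as discussed above; the map keys and values are purely illustrative):

```
x-endpoint-load-metrics: cpu_utilization=0.3, mem_utilization=0.8, rps=100
x-endpoint-load-metrics-cost: database=0.5, cache=0.1
x-endpoint-load-metrics-utilization: free_threads=0.25, disk_iops=0.7
```

The first header carries the core scalar fields of OrcaLoadReport; the other two carry the request_cost and utilization maps respectively.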

Mythra (Member) commented Aug 28, 2019

After talking this over a bit, the current plan for implementing ORCA is:

  1. Create a series of stats for "static" ORCA metrics (cpu_utilization, mem_utilization, and rps).
  2. Give each worker thread a new thread-local map for custom app metrics (unsynchronized):
     • map<custom_metric_key, pair<total_count, avg>>
  3. When LRS starts its run, it:
     • Grabs the series of static metrics from stats.
     • Grabs a copy of the metric keys, and averages the averages.
  4. Builds the full response, and sends it out.

There are a couple of notes here:

  • The stats and local maps won't be fully synchronized.
    • This is considered a worthwhile tradeoff, since the alternative is to take a lock in worker threads, which would be worse.
  • An average of averages may be less precise than a plain average.
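The averaging scheme above can be sketched in Python (an illustrative sketch, not Envoy code; all names are hypothetical), including the average-of-averages precision caveat:

```python
from typing import Dict, List, Tuple

# Each worker thread keeps an unsynchronized map:
#   custom_metric_key -> (total_count, running_avg)
WorkerMetrics = Dict[str, Tuple[int, float]]

def record(metrics: WorkerMetrics, key: str, value: float) -> None:
    """Update a worker-local running average; no locking needed."""
    count, avg = metrics.get(key, (0, 0.0))
    count += 1
    metrics[key] = (count, avg + (value - avg) / count)

def merge(workers: List[WorkerMetrics]) -> Dict[str, float]:
    """LRS-side merge: average the per-worker averages.

    Note the caveat above: an unweighted average of averages differs
    from the true mean when workers saw different sample counts.
    """
    per_key: Dict[str, List[float]] = {}
    for w in workers:
        for key, (_, avg) in w.items():
            per_key.setdefault(key, []).append(avg)
    return {key: sum(avgs) / len(avgs) for key, avgs in per_key.items()}

# Worker 1 sees two samples (avg 15.0), worker 2 sees one (avg 30.0).
w1: WorkerMetrics = {}
w2: WorkerMetrics = {}
record(w1, "queue_depth", 10.0)
record(w1, "queue_depth", 20.0)
record(w2, "queue_depth", 30.0)
print(merge([w1, w2]))  # average of averages: 22.5 (true mean is 20.0)
```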

I'm posting here in case anyone has any comments/questions/concerns.

htuch (Member, Author) commented Aug 29, 2019

@securityinsanity sounds like a plan from my side. Looking forward to seeing ORCA support landing :)

@htuch htuch added enhancement Feature requests. Not bugs or questions. and removed design proposal Needs design doc/proposal before implementation labels Aug 29, 2019
@htuch htuch assigned Mythra and unassigned htuch Aug 29, 2019
htuch (Member, Author) commented Aug 29, 2019

@securityinsanity assigning issue to you for the ORCA implementation work planned. Feel free to assign back if there is any remaining future work once that lands.

@mattklein123 mattklein123 modified the milestones: 1.12.0, 1.13.0 Oct 10, 2019
@mattklein123 mattklein123 modified the milestones: 1.13.0, 1.14.0 Dec 5, 2019
CodingSinger commented:

Hello everyone, I have a question: is there any difference between orca_load_report and the original LRS? My understanding is that orca_load_report is the backend server passing load information to Envoy, while LRS passes information between Envoy and the management server?

Mythra (Member) commented Jan 19, 2020

Hey @CodingSinger,

ORCA for now is actually going to be integrated into the LRS when it is implemented. It will provide a richer set of information.

Right now the LRS only provides load info about the number of requests, who it's routing to, and when. ORCA complements that info by allowing services to report how much a request cost. For example, a service can say "processing this request took up 20% CPU".

There are two ways a service can report this back to envoy:

  • Through headers in the response.
  • Through a separate out of band reporting mechanism.

We’re targeting reading response headers first. Admittedly I’ve had a lot going on so this has slumped, however I hope to have something up in the coming weeks.

CodingSinger commented Jan 19, 2020

@securityinsanity
Thanks for your reply. But it seems this is now divided into LoadReportingService and OrcaLoadReport. According to your reply, do both ORCA and LRS act between the backend server and Envoy? But I found this comment in LoadReportingService:

  // Independently, Envoy will initiate a StreamLoadStats bidi stream with a
  // management server

Mythra (Member) commented Jan 19, 2020

@CodingSinger ,

ORCA metrics will be added to the LoadReportingService stats (not replacing them), since we believe they are useful there as well, but the actual ORCA stats are sent between Envoy and the upstream it is sending requests to.

CodingSinger commented:

@securityinsanity Thanks, I've got it.

@mattklein123 mattklein123 modified the milestones: 1.14.0, 1.15.0 Mar 10, 2020
@mattklein123 mattklein123 removed this from the 1.15.0 milestone May 11, 2020
erikbos (Contributor) commented May 13, 2020

Does anybody have any pointers to blogs/papers about considerations for multi-region or global load balancing algorithms? (Useful input for designing a system that would leverage functionality like ORCA.)

htuch (Member, Author) commented May 15, 2020

That's a great question @erikbos. @alexburnos @antoniovicente, are you folks aware of any public material that talks about how backend named costs would integrate with global LB?

alexburnos commented:
I don't know of anything public that would be specifically focused on LB algorithms, but maybe the chapter on managing load in the SRE book could give some high-level ideas.

erikbos (Contributor) commented May 17, 2020

Thanks for the reference to the SRE book; it's always a good read, but I was looking for the next level of depth. On Slack @snowp mentioned https://netflixtechblog.com/netflix-edge-load-balancing-695308b5548c which contains some of that 👍

holooooo commented:
It is an amazing feature. Is there any news? 🤙

htuch (Member, Author) commented Nov 21, 2022

gRPC has adopted ORCA (and its xDS definitions) as the basis of load reporting for gRPC-LB v2 (CC @markdroth). We still do not have any Envoy implementation though; very much open to any contribution PRs here.

markdroth (Contributor) commented:
Just for reference for anyone working on this, the ORCA support in gRPC is documented in gRFC A51: Custom Backend Metrics Support and gRFC A64: xDS LRS Custom Metrics Support.

See also gRFC A58: weighted_round_robin LB policy for how ORCA is used in load balancing.

soulxu (Member) commented Jun 18, 2024

@Mythra are you still working on this? If not, I'm a little interested in this issue.

htuch (Member, Author) commented Jun 18, 2024

@efimki is working on this from our side (Google). CC @markdroth @AndresGuedez

osswangxining commented:
> @efimki is working on this from our side (Google). CC @markdroth @AndresGuedez

Any detailed info about this? Looking forward to this feature. Anything we can do to help?

efimki (Contributor) commented Jul 12, 2024

Here is a draft outline of what we are trying to do:

  • Load Reports will be provided to the xDS control plane server via xDS LRS API.
  • Load reports will be used by a new Client Side Weighted Round Robin load balancing policy to dynamically calculate host weights on the client side. Inline reporting enables sub-second load balancing reaction times (depending on backend load measurement and reporting intervals), a critical requirement for customers with coordinated and spiky traffic workloads.
  • Using these load reports, Envoy proxies will be able to implement load balancing policies that vary endpoint load balancing weights according to backend load reports.

More details are here.
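To make the weight calculation concrete, here is a hypothetical Python sketch of turning per-endpoint ORCA reports into client-side weighted-round-robin weights. The formula (qps divided by utilization) is a simplification for illustration; the actual gRPC policy in gRFC A58 adds refinements such as error-rate penalties and weight expiration.

```python
def endpoint_weights(reports):
    """reports: endpoint -> {"qps": float, "cpu_utilization": float}.

    A less-utilized endpoint serving the same qps has more headroom,
    so it gets a proportionally larger weight.
    """
    raw = {}
    for ep, r in reports.items():
        util = max(r["cpu_utilization"], 1e-6)  # guard against division by zero
        raw[ep] = r["qps"] / util
    total = sum(raw.values())
    # Normalize so the weights sum to 1.0 for easy comparison.
    return {ep: w / total for ep, w in raw.items()}

reports = {
    "endpoint-a": {"qps": 100.0, "cpu_utilization": 0.5},  # 100 / 0.5 = 200
    "endpoint-b": {"qps": 100.0, "cpu_utilization": 0.8},  # 100 / 0.8 = 125
}
print(endpoint_weights(reports))  # endpoint-a gets 200/325 of the traffic
```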

wbpcode (Member) commented Jul 18, 2024

Basically, I think there are two different parts to this work:

  1. Bridge the ORCA report and LRS.
  2. Make the LB aware of ORCA.

I personally think we should first only provide the simplest support for the common metrics: cpu, mem, application_utilization, etc. These attributes could cover most cases.

The named_metrics, utilization, and request_cost may have more complex semantics and bring heavier overhead. So I would prefer to only provide the simplest implementation first, until our users ask for more.

efimki (Contributor) commented Jul 25, 2024

I agree with the two-part distinction.

We will start with using the ORCA report for LRS. I agree that the common ORCA metrics cover many cases; however, we want to provide our users with the flexibility of using named metrics if necessary. The additional complexity of handling named metrics on top of processing the ORCA report is not that high.

wbpcode pushed a commit that referenced this issue Oct 5, 2024
Commit Message: Add support for multiple formats of ORCA headers.
Additional Description: Add support for multiple formats of ORCA
headers. ORCA parsing introduced in
#35422
[Original Design
Proposal](#6614)
[Using ORCA load reports in
Envoy](https://docs.google.com/document/d/1gb_2pcNnEzTgo1EJ6w1Ol7O-EH-O_Ysu5o215N9MTAg/edit#heading=h.bi4e79pb39fe)
Risk Level: Low
Testing: See included unit tests.
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: JSON format unsupported on Mobile.

CC @efimki @adisuissa @wbpcode

---------

Signed-off-by: blake-snyder <blakesnyder@google.com>