
Open Request Cost Aggregation (ORCA) #6614

htuch opened this issue Apr 17, 2019 · 27 comments
Labels: enhancement (Feature requests. Not bugs or questions.), help wanted (Needs help!)

htuch (Member) commented Apr 17, 2019

Today in Envoy, simple load balancing decisions can be made by taking into account local or global knowledge of a backend’s load, for example CPU. More sophisticated load balancing decisions are possible with application specific knowledge, e.g. queue depth, or by combining multiple metrics.

This is useful for services that may be resource constrained along multiple dimensions (e.g. both CPU and memory may become bottlenecks depending on the applied load and execution environment, and it's not possible to tell which upfront) and where these dimensions do not slot into predefined categories (e.g. the resource may be "number of free threads in a pool", disk IOPS, etc.).

https://docs.google.com/document/d/1NSnK3346BkBo1JUU3I9I5NYYnaJZQPt8_Z_XCBCI3uA/edit# provides a design proposal for an Open Request Cost Aggregation (ORCA) standard for conveying this information between proxies like Envoy and upstreams. We propose that this become a standard part of UDPA and supported by Envoy.

The design document is in draft stage; from offline discussions I think the need for something like this is not very controversial, so we can iterate on aspects of the design here.

@htuch htuch added the design proposal Needs design doc/proposal before implementation label Apr 17, 2019
@htuch htuch self-assigned this Apr 17, 2019
@mattklein123 mattklein123 added this to the 1.11.0 milestone Apr 27, 2019
stale bot commented May 27, 2019

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label May 27, 2019
@htuch htuch added no stalebot Disables stalebot from closing an issue and removed stale stalebot believes this issue/PR has not been touched recently labels May 28, 2019
@mattklein123 mattklein123 modified the milestones: 1.11.0, 1.12.0 Jul 3, 2019
Mythra (Member) commented Aug 15, 2019

I've been talking with @htuch about implementing ORCA's in-band reporting, and adding its details to a particular stream.

However, when bringing it up, @htuch mentioned that the RFC had some debate around how in-band reporting should be implemented. Currently the RFC calls for stuffing JSON in x-endpoint-load-metrics. However, after talking with @PiotrSikora I think this is the incorrect choice, and would like to spur extra conversation here, since I shouldn't be the sole person to make this decision 😜


While the JSON reporter uses a standard encoding (JSON), not all proxies currently support JSON out of the box, nor should they. Most of the time the bytes are just moving through them, and they only need to know how to parse HTTP. (Some can be extended, see NGINX, but that always requires custom extensions/code.)

If we want ORCA to become a true standard, lowering the barrier to entry by not forcing proxies to add JSON parsing they don't need would increase adoption, which would help both the total number of users and its usefulness.

Instead, I recommend we implement parsing of both x-endpoint-load-metrics-bin (binary protobuf format) and x-endpoint-load-metrics. The one we should really focus on is x-endpoint-load-metrics (which maybe should even be called endpoint-load-metrics, since the IETF recommends against the x- prefix), using the structured-headers parameter list format. The reason for this is twofold:

  1. It's an RFC that's very close to completion, and already has other RFCs building on top of it, so its chances of dying out are low.
  2. Even headers that are widely supported today use something very close to a parameter list, e.g. Cache-Control: max-age=<seconds>. This is already the basis for the parameter list format.

x-endpoint-load-metrics-bin I think should be supported because it can be a nice optimization for those already integrating with the ORCA protobuf who don't want to juggle two separate encodings of what to send. I don't imagine it being a huge ask, so it seems worth the implementation effort.
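For illustration, here is how the same report might look in each format under discussion. The field names come from the ORCA proto, but the values and the exact structured-header syntax shown are assumptions for the sake of example:

```
# JSON stuffed into the header (current RFC draft):
x-endpoint-load-metrics: {"cpu_utilization": 0.3, "mem_utilization": 0.8}

# Structured-headers parameter list (draft-ietf-httpbis-header-structure):
x-endpoint-load-metrics: cpu_utilization=0.3, mem_utilization=0.8

# Base64-encoded serialized OrcaLoadReport protobuf:
x-endpoint-load-metrics-bin: <base64-encoded OrcaLoadReport>
```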


Happy to hear thoughts on this, and to come up with something official that isn't just my bemused thoughts 😄

htuch (Member, Author) commented Aug 15, 2019

@securityinsanity yes, https://tools.ietf.org/html/draft-ietf-httpbis-header-structure-11 is fine. Let's just make sure that this is a direct translation of the data model that exists in the proto, i.e. it should be equivalently expressive.

Mythra (Member) commented Aug 15, 2019

For sure 👍🏻. I think keeping the model is needed.

I’ll start working on a PR tomorrow. Can you update the doc? (I don’t think I have write access).

htuch (Member, Author) commented Aug 15, 2019

@securityinsanity sure. I think we should hash out the new representation here first. Looking at https://github.com/envoyproxy/envoy/blob/master/api/udpa/data/orca/v1/orca_load_report.proto, I think we may need multiple headers, e.g. x-endpoint-load-metrics, x-endpoint-load-metrics-cost, x-endpoint-load-metrics-utilization, to correctly distinguish the core fields and the distinct maps that exist in the data model. @PiotrSikora do you think this is correct?

FWIW, coincidentally I'm working on migrating the API tree in https://github.com/envoyproxy/envoy/tree/master/api/udpa (which includes the ORCA protos) to live in https://github.com/cncf/udpa-wg today. This shouldn't have a major impact on your work, but there might be some slight path or Bazel fixups needed once this lands.

Mythra (Member) commented Aug 15, 2019

@htuch That's good to know about the protos moving, thanks. Based on my understanding of the proto, yes, we'd need 3 headers (one for the core type, and two for the two maps).
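To make the three-header split concrete, a response carrying a full report might look roughly like this (header names as discussed above; the map keys and values are purely illustrative):

```
x-endpoint-load-metrics: cpu_utilization=0.3, mem_utilization=0.8, rps=100
x-endpoint-load-metrics-cost: database=0.5, cache=0.1
x-endpoint-load-metrics-utilization: free_threads=0.25, disk_iops=0.7
```

The first header carries the core scalar fields of OrcaLoadReport; the other two carry the request_cost and utilization maps respectively.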

Mythra (Member) commented Aug 28, 2019

After talking this over a bit, the current plan for implementing ORCA is:

  1. Create a series of stats for "static" ORCA metrics (cpu_utilization, mem_utilization, and rps).
  2. Give each worker thread a new thread-local map for custom app metrics (unsynchronized):
     • map<custom_metric_key, pair<total_count, avg>>
  3. When LRS starts its run, it:
     • Grabs the series of static metrics from stats.
     • Grabs a copy of the metric keys, and averages the averages.
  4. Builds the full response, and sends it out.

There are a couple of notes here:

  • The stats and local maps won't be fully synchronized.
    • This is considered a worthwhile tradeoff, since the alternative is to take a lock in worker threads, which would be worse.
  • An average of averages may be less precise than a plain average.
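The averaging scheme above can be sketched in Python (an illustrative sketch, not Envoy code; all names are hypothetical), including the average-of-averages precision caveat:

```python
from typing import Dict, List, Tuple

# Each worker thread keeps an unsynchronized map:
#   custom_metric_key -> (total_count, running_avg)
WorkerMetrics = Dict[str, Tuple[int, float]]

def record(metrics: WorkerMetrics, key: str, value: float) -> None:
    """Update a worker-local running average; no locking needed."""
    count, avg = metrics.get(key, (0, 0.0))
    count += 1
    metrics[key] = (count, avg + (value - avg) / count)

def merge(workers: List[WorkerMetrics]) -> Dict[str, float]:
    """LRS-side merge: average the per-worker averages.

    Note the caveat above: an unweighted average of averages differs
    from the true mean when workers saw different sample counts.
    """
    per_key: Dict[str, List[float]] = {}
    for w in workers:
        for key, (_, avg) in w.items():
            per_key.setdefault(key, []).append(avg)
    return {key: sum(avgs) / len(avgs) for key, avgs in per_key.items()}

# Worker 1 sees two samples (avg 15.0), worker 2 sees one (avg 30.0).
w1: WorkerMetrics = {}
w2: WorkerMetrics = {}
record(w1, "queue_depth", 10.0)
record(w1, "queue_depth", 20.0)
record(w2, "queue_depth", 30.0)
print(merge([w1, w2]))  # average of averages: 22.5 (true mean is 20.0)
```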

I'm posting here in case anyone has any comments/questions/concerns.

htuch (Member, Author) commented Aug 29, 2019

@securityinsanity sounds like a plan from my side. Looking forward to seeing ORCA support landing :)

@htuch htuch added enhancement Feature requests. Not bugs or questions. and removed design proposal Needs design doc/proposal before implementation labels Aug 29, 2019
@htuch htuch assigned Mythra and unassigned htuch Aug 29, 2019
htuch (Member, Author) commented Aug 29, 2019

@securityinsanity assigning issue to you for the ORCA implementation work planned. Feel free to assign back if there is any remaining future work once that lands.

@mattklein123 mattklein123 modified the milestones: 1.12.0, 1.13.0 Oct 10, 2019
@mattklein123 mattklein123 modified the milestones: 1.13.0, 1.14.0 Dec 5, 2019
CodingSinger commented:

Hello everyone, I have a question: is there any difference between orca_load_report and the original LRS? My understanding is that orca_load_report is the backend server passing load information to Envoy, while LRS passes information between Envoy and the management server?

Mythra (Member) commented Jan 19, 2020

Hey @CodingSinger,

ORCA for now is actually going to be integrated into the LRS when it is implemented. It will provide a richer set of information.

Right now the LRS only provides load info about the number of requests, who it's routing to, and when. ORCA complements that info by allowing services to report how much a request cost. For example, a service can say "processing this request took up 20% CPU".

There are two ways a service can report this back to envoy:

  • Through headers in the response.
  • Through a separate out of band reporting mechanism.

We’re targeting reading response headers first. Admittedly I’ve had a lot going on so this has slumped, however I hope to have something up in the coming weeks.

CodingSinger commented Jan 19, 2020

@securityinsanity
Thanks for your reply. But it seems this is now divided into LoadReportingService and OrcaLoadReport. According to your reply, do both ORCA and LRS act between the backend server and Envoy? But I found this comment in LoadReportingService:

  // Independently, Envoy will initiate a StreamLoadStats bidi stream with a
  // management server

Mythra (Member) commented Jan 19, 2020

@CodingSinger ,

ORCA metrics will be added to the LoadReportingService stats (not replacing them), since we believe they are useful there as well, but the actual ORCA stats are sent between Envoy and the upstream it is sending requests to.

CodingSinger commented:

@securityinsanity Thanks, I've got it.

@mattklein123 mattklein123 modified the milestones: 1.14.0, 1.15.0 Mar 10, 2020
@mattklein123 mattklein123 removed this from the 1.15.0 milestone May 11, 2020
erikbos (Contributor) commented May 13, 2020

Does anybody have any pointers to blogs/papers about considerations for multi-region or global load balancing algorithms? (Useful input for designing a system that would leverage functionality like ORCA.)

htuch (Member, Author) commented May 15, 2020

That's a great question @erikbos. @alexburnos @antoniovicente, are you folks aware of any public material that talks about how backend named costs would integrate with global LB?

alexburnos commented:
I don't know of anything public that would be specifically focused on LB algorithms, but maybe the chapter on managing load in the SRE book could give some high-level ideas.

erikbos (Contributor) commented May 17, 2020

Thanks for the reference to the SRE book; it's always a good read, but I was looking for the next level of depth. On Slack @snowp mentioned https://netflixtechblog.com/netflix-edge-load-balancing-695308b5548c which contains some of that 👍

holooooo commented:
It is an amazing feature. Is there any news? 🤙

htuch (Member, Author) commented Nov 21, 2022

gRPC has adopted ORCA (and its xDS definitions) as the basis of load reporting for gRPC-LB v2 (CC @markdroth). We still do not have any Envoy implementation though; very much open to any contribution PRs here.

markdroth (Contributor) commented:
Just for reference for anyone working on this, the ORCA support in gRPC is documented in gRFC A51: Custom Backend Metrics Support and gRFC A64: xDS LRS Custom Metrics Support.

See also gRFC A58: weighted_round_robin LB policy for how ORCA is used in load balancing.

soulxu (Member) commented Jun 18, 2024

@Mythra are you still working on this? If not, I'm a little interested in this issue.

htuch (Member, Author) commented Jun 18, 2024

@efimki is working on this from our side (Google). CC @markdroth @AndresGuedez

osswangxining commented:
> @efimki is working on this from our side (Google). CC @markdroth @AndresGuedez

Any detailed info about this? Looking forward to this feature. Anything we can do to help?

efimki (Contributor) commented Jul 12, 2024

Here is a draft outline of what we are trying to do:

  • Load Reports will be provided to the xDS control plane server via xDS LRS API.
  • Load reports will be used by a new Client Side Weighted Round Robin load balancing policy to dynamically calculate host weights on the client side. Inline reporting enables sub-second load balancing reaction times (depending on backend load measurement and reporting intervals), a critical requirement for customers with coordinated and spiky traffic workloads.
  • Using these load reports, Envoy proxies will be able to implement load balancing policies that vary endpoint load balancing weights according to backend load reports.

More details are here.
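To make the weight calculation concrete, here is a hypothetical Python sketch of turning per-endpoint ORCA reports into client-side weighted-round-robin weights. The formula (qps divided by utilization) is a simplification for illustration; the actual gRPC policy in gRFC A58 adds refinements such as error-rate penalties and weight expiration.

```python
def endpoint_weights(reports):
    """reports: endpoint -> {"qps": float, "cpu_utilization": float}.

    A less-utilized endpoint serving the same qps has more headroom,
    so it gets a proportionally larger weight.
    """
    raw = {}
    for ep, r in reports.items():
        util = max(r["cpu_utilization"], 1e-6)  # guard against division by zero
        raw[ep] = r["qps"] / util
    total = sum(raw.values())
    # Normalize so the weights sum to 1.0 for easy comparison.
    return {ep: w / total for ep, w in raw.items()}

reports = {
    "endpoint-a": {"qps": 100.0, "cpu_utilization": 0.5},  # 100 / 0.5 = 200
    "endpoint-b": {"qps": 100.0, "cpu_utilization": 0.8},  # 100 / 0.8 = 125
}
print(endpoint_weights(reports))  # endpoint-a gets 200/325 of the traffic
```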

wbpcode (Member) commented Jul 18, 2024

Basically, I think there are two different parts to this work:

  1. Bridge the ORCA report and LRS.
  2. Make the LB aware of ORCA.

I personally think we should first only provide the simplest support for the common metrics: cpu, mem, application_utilization, etc. These attributes could cover most cases.

The named_metrics, utilization, and request_cost may have more complex semantics and bring heavier overhead. So I would prefer to only provide the simplest implementation first, until our users ask for more.

efimki (Contributor) commented Jul 25, 2024

I agree with the two-part distinction.

We will start with using the ORCA report for LRS. I agree that the common ORCA metrics cover many cases; however, we want to provide our users with the flexibility of using named metrics if necessary. The additional complexity of handling named metrics on top of processing the ORCA report is not that high.

wbpcode pushed a commit that referenced this issue Oct 5, 2024
Commit Message: Add support for multiple formats of ORCA headers.
Additional Description: Add support for multiple formats of ORCA
headers. ORCA parsing introduced in
#35422
[Original Design
Proposal](#6614)
[Using ORCA load reports in
Envoy](https://docs.google.com/document/d/1gb_2pcNnEzTgo1EJ6w1Ol7O-EH-O_Ysu5o215N9MTAg/edit#heading=h.bi4e79pb39fe)
Risk Level: Low
Testing: See included unit tests.
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: JSON format unsupported on Mobile.

CC @efimki @adisuissa @wbpcode

---------

Signed-off-by: blake-snyder <blakesnyder@google.com>