Protect controller from becoming unschedulable #214
Conversation
force-pushed from 215c3e2 to df4a57a
force-pushed from ce6500d to 1310320
force-pushed from a116aea to f601ceb
models/src/constants.rs
Outdated
@@ -47,3 +47,6 @@ pub const CONTROLLER_DEPLOYMENT_NAME: &str = "brupop-controller-deployment";
pub const CONTROLLER_SERVICE_NAME: &str = "brupop-controller-server"; // The name for the `svc` fronting the controller.
pub const CONTROLLER_INTERNAL_PORT: i32 = 8080; // The internal port on which the controller service is hosted.
pub const CONTROLLER_SERVICE_PORT: i32 = 80; // The k8s service port hosting the controller service.
pub const BRUPOP_CONTROLLER_PRIORITY_CLASS: &str = "brupop-controller-high-priority";
pub const BRUPOP_CONTROLLER_PREEMPTION_POLICY: &str = "Never";
pub const BRUPOP_CONTROLLER_PRIORITY_VALUE: i32 = 1000000; // The controller priority class value
How is this priority value determined?
The value can be set between 0 and 1 billion. The reasoning:
- I assume the controller pod needs to come back sooner than other pods, except for critical ones like system-related pods. Going through the PriorityClass documentation, 1 million is commonly used to represent a high but not critical priority.
- We don't know what priority strategies customers use when setting up their k8s resources, and I don't want the controller to take precedence over customers' critical k8s resources. After looking at many PriorityClass examples on GitHub, high-priority and critical values are usually set above 10 million, so for safety I kept the value at 1 million here.
It might be good to elaborate a bit on the constant in a comment.
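One way to elaborate, sketched below. The doc comment wording is my own suggestion, not the actual source; only the constant name and value come from the diff above:

```rust
/// The priority value assigned to the brupop controller's PriorityClass.
/// Kubernetes allows user-defined priority values from 0 up to 1 billion;
/// larger values are reserved for system-critical pods. 1 million is a
/// common convention for "high but not critical": it lets the controller
/// reschedule ahead of ordinary workloads without preempting customers'
/// own critical classes, which are typically set above 10 million.
pub const BRUPOP_CONTROLLER_PRIORITY_VALUE: i32 = 1_000_000;

fn main() {
    // Sanity check: stays within the user-assignable range.
    assert!((0..=1_000_000_000).contains(&BRUPOP_CONTROLLER_PRIORITY_VALUE));
}
```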
controller/src/controller.rs
Outdated
- shadows.sort_by(|a, b| a.compare_crash_count(b));
- for brs in shadows {
+ sort_shadows(shadows.clone(), &get_associated_bottlerocketshadow_name()?)?;
This won't actually sort the vector, since we are cloning it as input rather than using a mutable reference.
Yeah, I remember changing it to `let sorted_shadows = sort_shadows(shadows.clone(), &get_associated_bottlerocketshadow_name()?)?;`. Maybe I got it wrong during the rebase. Thanks, I'll fix it.
I think it would be more idiomatic here to use a mutable reference as input to the function. You can just return an empty `Result<()>` to indicate whether or not something went wrong.
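The suggestion above could look roughly like this. A minimal, self-contained sketch: `Shadow` is a toy stand-in for `BottlerocketShadow`, and the error type is simplified to `String`:

```rust
#[derive(Debug)]
struct Shadow {
    name: String,
    crash_count: u32,
}

/// Sorts in place via a mutable reference instead of cloning; the caller's
/// vector is actually mutated, and the Result only signals success/failure.
fn sort_shadows(shadows: &mut Vec<Shadow>, associated_brs_name: &str) -> Result<(), String> {
    // Sort by crash count first.
    shadows.sort_by(|a, b| a.crash_count.cmp(&b.crash_count));
    // Move the shadow hosting the controller pod to the end.
    if let Some(pos) = shadows.iter().position(|s| s.name == associated_brs_name) {
        let brs = shadows.remove(pos);
        shadows.push(brs);
    }
    Ok(())
}

fn main() {
    let mut shadows = vec![
        Shadow { name: "node-a".into(), crash_count: 2 },
        Shadow { name: "node-b".into(), crash_count: 0 },
        Shadow { name: "node-c".into(), crash_count: 1 },
    ];
    sort_shadows(&mut shadows, "node-b").unwrap();
    // node-b goes last even though it has the lowest crash count.
    assert_eq!(shadows.last().unwrap().name, "node-b");
    assert_eq!(shadows[0].name, "node-c");
}
```

Because the function borrows mutably, the cloning bug noted above cannot recur: there is no second vector to accidentally discard.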
controller/src/controller.rs
Outdated
shadows.sort_by(|a, b| a.compare_crash_count(b));

// move the shadow which associated bottlerocketshadow node hosts controller pod to the last
for (position, brs) in shadows.iter().enumerate() {
nit: Consider using `shadows.iter().position()`. This would also allow you to use an `Option` rather than having to use `usize::MAX` as a sentinel value.
Oh, I just remembered why I didn't use `shadows.iter().position()`. When I tried it, the position function couldn't handle a `Result` value as input; specifically, it can't handle the `Result` produced by `brs.metadata.name.as_ref().context(error::MissingBottlerocketShadowName)?`:

`let associated_brs_node_position = shadows.iter().position(|brs| brs.metadata.name.as_ref().context(error::MissingBottlerocketShadowName)? == associated_brs_name).context("")?`
Since you have `kube::ResourceExt` imported, you could just do `brs.name()`, which technically panics if the name is missing. Not great, but there are likely bigger problems if the name is missing from objects.
Alternatively, if the name is missing you can probably skip the item rather than returning early:

let associated_brs_node_position = shadows
    .iter()
    .position(|brs| brs.metadata.name.as_ref() == Some(associated_brs_name))

I think it's best to lean on `Option`, which the compiler can check, rather than using a sentinel value, even though we can logically deduce that your sentinel would work.
force-pushed from f601ceb to 5820a09
force-pushed from f26ec8d to ba26dfd
- shadows.sort_by(|a, b| a.compare_crash_count(b));
- for brs in shadows {
+ sort_shadows(&mut shadows, &get_associated_bottlerocketshadow_name()?)?;
+ for brs in shadows.drain(..) {
Update to use drain
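For context, `drain(..)` removes and yields owned elements through a mutable borrow, so the loop gets ownership of each shadow while the `Vec` binding itself remains usable afterwards. A toy illustration with `String`s standing in for `BottlerocketShadow` values:

```rust
fn main() {
    let mut shadows = vec![String::from("node-a"), String::from("node-b")];

    // drain(..) yields owned items; the Vec is left empty but still valid.
    let mut processed = Vec::new();
    for brs in shadows.drain(..) {
        processed.push(brs); // `brs` is an owned String here
    }

    assert_eq!(processed, vec!["node-a", "node-b"]);
    assert!(shadows.is_empty()); // the binding can still be reused
}
```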
controller/src/controller.rs
Outdated
/// logic1: sort shadows by crash count
/// logic2: move the shadow which associated bottlerocketshadow node hosts controller pod to the last
#[instrument(skip())]
fn sort_shadows(shadows: &mut Vec<BottlerocketShadow>, associated_brs_name: &str) -> Result<()> {
nit: I think this function cannot return an error value. You can probably remove the return type.
As part of preventing the controller from becoming unschedulable, we redesigned the responsibilities of the controller and agent. Basically, the new design enables the agent to complete a node update even without the controller's signal.
As part of Protect controller from becoming unschedulable, this part makes brupop update the node which hosts the controller last. This reduces the impact of a dead controller pod as much as possible.
As part of Protect controller from becoming unschedulable, this approach gives the controller a high priority so that it may be scheduled sooner than pods with lower priority if its scheduling requirements are met.
force-pushed from ba26dfd to cb71fd3
@@ -37,4 +37,7 @@ pub enum Error {

    #[snafu(display("Controller failed due to internal assertion issue: '{}'", source))]
    Assertion { source: serde_plain::Error },

    #[snafu(display("Unable to get host controller pod node name: {}", source))]
    GetNodeName { source: std::env::VarError },
Just a note here: this `GetNodeName` didn't show up in the adjusted error module; you might want to reimplement this error when you merge the two PRs.
Yeah! I would like to merge this PR first and then resolve the conflict on the PR you mentioned above. Thanks!
Issue number:
#14
Description of changes:
Testing done:
The node which hosts the controller pod is ip-192-168-131-7.us-west-2.compute.internal. The priority class value is 1000000.
Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.