Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pod Scheduling Readiness #3658

Merged
merged 1 commit into from
Sep 11, 2024

Conversation

ykcai-daniel
Copy link
Contributor

Updated PR of #3612, a gated task no longer blocks the job. Do not use new state.

@volcano-sh-bot volcano-sh-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 7, 2024
@googs1025
Copy link
Member

googs1025 commented Aug 7, 2024

Does this change have the same purpose as this PR #3612 ?

@Monokaix
Copy link
Member

Monokaix commented Aug 8, 2024

Hi,I think there is no necessary to submit two PRs, just based the previous PR and update it OK,and please also update the design doc.

@volcano-sh-bot volcano-sh-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 12, 2024
@Monokaix
Copy link
Member

Please refer to kube-scheduler to add a featuregate to enable/disable this feature.

@volcano-sh-bot volcano-sh-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 15, 2024
@ykcai-daniel
Copy link
Contributor Author

@Monokaix PR updated. I will also update the design proposal #3581 to document the changes.

@ykcai-daniel
Copy link
Contributor Author

ykcai-daniel commented Aug 15, 2024

PR summary: Pod Scheduling gates is a new feature in K8S, which uses the .spec.schedulingGates field of Pod to signal that a Pod is not ready for scheduling (Pod will be in pending state when gated). This feature is often used to implement customized resource managers and we want to support it for compatibility. Main Changes involve:

  1. Scheduling Actions: Job with scheduling gated tasks mainly exist in the Inqueue state. In allocate action, if the pod of a task is scheduling gates, it will be skipped for allocation and not bound to a node. Consequently, if too many tasks are scheduling gates, the job will be gang-unschedulable
  2. Plugins: Since scheduling gated pods are not ready to be allocated, we don't want scheduling gated tasks to consume inqueue resources, making other jobs uninqueuable. Therefore, proportion, capacity and overcommit plugins have to be changed to deduct the resources of scheduling gated tasks
  3. Pod Conditions: When each session close, the Condition of scheduling gated pods will not be updated, showing the Schedulable False condition with reason ScheduleGated, which is created by api-server

Future Works that are not covered in this PR:

  1. Now, we do not support removing scheduling gates by modifying the template of a vcjob because supporting this feature involve major changes to controller and Pod-level gates removal is sufficient for most usecase. In the future, if there is a need for task template support, we can add it.
  2. Events and Correct Message: Now, if a Job is gang unschedulable due to scheduling gates, the reason in condition is "NoEnoughResources". We might need better reason and message for scheduling gates.

@ykcai-daniel
Copy link
Contributor Author

ykcai-daniel commented Aug 30, 2024

Commit Author: /assign @ykcai-daniel

@ykcai-daniel
Copy link
Contributor Author

ykcai-daniel commented Aug 30, 2024

/assign @Monokaix

@ykcai-daniel
Copy link
Contributor Author

/assign @googs1025

@Monokaix
Copy link
Member

/assign @lowang-bh @hwdef

@@ -228,6 +230,17 @@ func (ti *TaskInfo) ClearLastTxContext() {
ti.LastTransaction = nil
}

// Return if the pod of a task is scheduling gated by checking if length of sch gates
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments: if length of sch gates is zero?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the typo. I will fix the commit

@@ -228,6 +230,17 @@ func (ti *TaskInfo) ClearLastTxContext() {
ti.LastTransaction = nil
}

// Return if the pod of a task is scheduling gated by checking if length of sch gates
// When the Pod is not yet created or sch gates field not set, return false
func (ti *TaskInfo) SchedulingGated() bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about store this value in taskInfo, instead of a function calling?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a field schGated into taskInfo

@@ -142,7 +142,7 @@ func (cp *capacityPlugin) OnSessionOpen(ssn *framework.Session) {
}

if job.PodGroup.Status.Phase == scheduling.PodGroupInqueue {
attr.inqueue.Add(job.GetMinResources())
attr.inqueue.Add(job.DeductSchGatedResources(job.GetMinResources()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its better to do the calcalation in GetMinResources function. Let GetMinResources return the min resource to run the job.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add this function into GetMinresources have some downsides. The motivation of DeductSchGatedResources is to deduct the resources of scheduling gated tasks in a job when calculating inqueued resources so that it will not block other jobs from being inqueued. Therefore, this function is will only be used in the queue-related plugins (proporation, capacity, overcommit).

On the other hand, GetMinResources is a function that returns the minimum resource requirement of a Job, which should be an immutable quantity determined at the creation of the PodGroup. This quantity should not change regardless of scheduling gated tasks. Unlike DeductSchGatedResources, apart from inqueue resource calculation, GetMinResources is also used in other places, such as GetElasticResources

If we include DeductSchGatedResources into GetMinresources and tightly couple these two functions, this will bring some runtime information about the state of Pods into a function that should return immutable information, making it bug-prone when future developers use it.

@ykcai-daniel
Copy link
Contributor Author

@lowang-bh cc @Monokaix Pull request updated. Please review the changes. Thank you!


// ti.SchGated is initialized in NewTaskInfo
func (ti *TaskInfo) SchedulingGated() bool {
return ti.SchGated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this filed be exported?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this function, there is no need to export it. I will change it to schGated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to use this filed instead of a function calling.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an optional way, but seems there is no need to expose an extra filed of task.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's reasonable.

@@ -282,18 +282,18 @@ func (cp *capacityPlugin) OnSessionOpen(ssn *framework.Session) {
queue := ssn.Queues[queueID]
// If no capability is set, always enqueue the job.
if attr.realCapability == nil {
klog.V(4).Infof("Capability of queue <%s> was not set, allow job <%s/%s> to Inqueue.",
klog.V(3).Infof("Capability of queue <%s> was not set, allow job <%s/%s> to Inqueue.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change these log level?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this log level when debugging the capcacity plugin. Will change is back

// If resource is less than gated resources, return zero;
// Gated resources might be more than minResource when total rep of each task > minAvailable
func (ji *JobInfo) DeductSchGatedResources(res *Resource) *Resource {
result := res.Clone()
Copy link
Member

@Monokaix Monokaix Sep 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can check whether res is zero to have a shortcut here, because in most case task is not gated.

@ykcai-daniel
Copy link
Contributor Author

For more details about the design of the PR, see #3581 . Please also merge the design doc.

// If resource is less than gated resources, return zero;
// Note: The purpose of this functionis to deduct the resources of scheduling gated tasks
// in a job when calculating inqueued resources so that it will not block other jobs from being inqueued.
func (ji *JobInfo) DeductSchGatedResources(res *Resource) *Resource {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment should start with DeductSchGatedResources to be consistent with func name.

// Note: The purpose of this functionis to deduct the resources of scheduling gated tasks
// in a job when calculating inqueued resources so that it will not block other jobs from being inqueued.
func (ji *JobInfo) DeductSchGatedResources(res *Resource) *Resource {
result := res.Clone()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems there is no need to clone res because no one will also change it, and also we can change to:

schGatedResource := ji.GetSchGatedPodResources()
// Most jobs do not have any scheduling gated tasks, hence we add this short cut
if schGatedResource.IsEmpty() {
    return res
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, we don't clone res if scheduling gated resource is zero.

@volcano-sh-bot volcano-sh-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 4, 2024
@volcano-sh-bot volcano-sh-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 6, 2024
@ykcai-daniel ykcai-daniel force-pushed the sch-gates-update branch 2 times, most recently from 9aee58a to 64b16e1 Compare September 6, 2024 01:36
@volcano-sh-bot volcano-sh-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Sep 6, 2024
return CreateDeploymentGated(ctx, name, rep, img, req, []v1.PodSchedulingGate{})
}

// CreateDeployment creates a new deployment
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CreateDeployment -》 CreateDeploymentGated

@ykcai-daniel ykcai-daniel force-pushed the sch-gates-update branch 3 times, most recently from 82cef03 to 6ab5464 Compare September 10, 2024 04:24
Signed-off-by: ykcai-daniel <1155141377@link.cuhk.edu.hk>
@Monokaix
Copy link
Member

/lgtm
/approve

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Sep 11, 2024
@volcano-sh-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Monokaix

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 11, 2024
@volcano-sh-bot volcano-sh-bot merged commit 5d3106d into volcano-sh:master Sep 11, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants