
Pipeline a task if its resource request is less than the node's releasing resources during the allocate action #541

Merged
merged 1 commit into from
Dec 19, 2019


sivanzcw
Contributor

@volcano-sh-bot volcano-sh-bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Nov 20, 2019
@TravisBuddy

Hey @sivanzcw,
Your changes look good to me!

View build log

TravisBuddy Request Identifier: 3aa881f0-0b81-11ea-9cd6-8f216fa7db85

```diff
-if err := stmt.Pipeline(task, node.Name); err != nil {
-	glog.Errorf("Failed to pipeline Task %v on %v",
-		task.UID, node.Name)
+if err := ssn.Pipeline(task, node.Name); err != nil {
```
Member

Any more info on why this changed from stmt to ssn?

Contributor Author

Suppose the reclaim action is enabled after the allocate action, and the cluster is in the following state:

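| serial | node name | resource |
|---|---|---|
| 1 | node1 | 4c8g |
| 2 | node2 | 4c8g |

| serial | queue name | weight | quota | status |
|---|---|---|---|---|
| 1 | default | 1 | 0.8c 1.5M | overused |
| 2 | queue1 | 100000 | 7c 8g | active |

| serial | job name | pods | minAvailable | queue | status |
|---|---|---|---|---|---|
| 1 | joba | 7 | 1 | default | all running |
| 2 | jobb | 7 | 7 | queue1 | all pending |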

There are two jobs in the cluster: joba, in the default queue, has 7 pods running; jobb, in queue1, has 7 pods pending. The default queue is overused, so pods in queue1 will try to reclaim resources from the default queue.

  • In the reclaim action, podb-1 of jobb evicts poda-1 of joba, which was running on node1. The scheduling loop then ends.

  • In the allocate action of the next scheduling loop, podb-1 should be pipelined onto node1, whose resources are being released. However, jobb's gang restriction is not met (minAvailable is 7), so the pipeline operation recorded in the statement is discarded. As a result, no pod is pipelined in the allocate action even though the cluster has releasing resources.

  • In the reclaim action of the same scheduling loop, podb-1 tries to evict other pods of joba.

  • In the end, podb-1 evicts 6 pods from joba.

  • So when the cluster has releasing resources, a pod with higher priority should probably be pipelined onto the node regardless of whether its job's gang restriction is met; otherwise, the pod will evict other pods in subsequent actions.
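A minimal, self-contained sketch of the difference described above, under stated assumptions: `Resource`, `Session`, `Statement`, and `allocate` are hypothetical stand-ins, not Volcano's actual types or API, and the resource numbers are made up. It assumes, as the comment describes, that a pipeline recorded through the statement is dropped by `Discard` when the job's gang restriction is not met, while pipelining directly on the session takes effect immediately.

```go
package main

import "fmt"

// Resource is a hypothetical, simplified CPU/memory vector.
type Resource struct {
	MilliCPU float64
	Memory   float64
}

// LessEqual reports whether every dimension of r fits within other.
func (r Resource) LessEqual(other Resource) bool {
	return r.MilliCPU <= other.MilliCPU && r.Memory <= other.Memory
}

// Session (hypothetical) holds pipeline decisions that are already in effect.
type Session struct {
	Pipelined map[string]string // task name -> node name
}

// Pipeline records the decision directly; nothing rolls it back later.
func (s *Session) Pipeline(task, node string) {
	s.Pipelined[task] = node
}

// Statement (hypothetical) buffers operations and applies them only on Commit.
type Statement struct {
	ssn *Session
	ops [][2]string
}

func (st *Statement) Pipeline(task, node string) {
	st.ops = append(st.ops, [2]string{task, node})
}

// Commit applies all buffered operations to the session.
func (st *Statement) Commit() {
	for _, op := range st.ops {
		st.ssn.Pipeline(op[0], op[1])
	}
	st.ops = nil
}

// Discard drops all buffered operations.
func (st *Statement) Discard() { st.ops = nil }

// allocate pipelines task onto node when its request fits the node's
// releasing resources. viaStatement selects the stmt-based (before) or
// ssn-based (after) behavior; gangReady models the job's gang restriction.
func allocate(ssn *Session, task string, req, releasing Resource, node string, gangReady, viaStatement bool) {
	stmt := &Statement{ssn: ssn}
	if req.LessEqual(releasing) {
		if viaStatement {
			stmt.Pipeline(task, node)
		} else {
			ssn.Pipeline(task, node)
		}
	}
	if gangReady {
		stmt.Commit()
	} else {
		stmt.Discard()
	}
}

func main() {
	// Hypothetical numbers: podb-1's request fits the resources poda-1 is releasing.
	req := Resource{MilliCPU: 1000, Memory: 1024}
	releasing := Resource{MilliCPU: 1000, Memory: 1024}

	// jobb has minAvailable 7, so a single pipelined pod leaves the gang unready.
	before := &Session{Pipelined: map[string]string{}}
	allocate(before, "podb-1", req, releasing, "node1", false, true)
	fmt.Println("stmt.Pipeline, pipelined tasks:", len(before.Pipelined))

	after := &Session{Pipelined: map[string]string{}}
	allocate(after, "podb-1", req, releasing, "node1", false, false)
	fmt.Println("ssn.Pipeline, pipelined tasks:", len(after.Pipelined))
}
```

With the statement, the pipeline disappears on `Discard`, leaving podb-1 free to keep evicting pods in the reclaim action; with the session, podb-1 stays pipelined on node1.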

@k82cn
Member

k82cn commented Dec 16, 2019

/approve

@volcano-sh-bot volcano-sh-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 16, 2019
@TravisBuddy

Hey @sivanzcw,
Something went wrong with the build.

TravisCI finished with status errored, which means the build failed because of something unrelated to the tests, such as a problem with a dependency or the build process itself.

View build log

TravisBuddy Request Identifier: 53495a40-1fd6-11ea-ba47-7f442aed9c1e

@TravisBuddy

Hey @sivanzcw,
Something went wrong with the build.

TravisCI finished with status errored, which means the build failed because of something unrelated to the tests, such as a problem with a dependency or the build process itself.

View build log

TravisBuddy Request Identifier: 3fc95e60-2077-11ea-830b-038034041c48

@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: k82cn, sivanzcw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@TravisBuddy

Hey @sivanzcw,
Your changes look good to me!

View build log

TravisBuddy Request Identifier: b9b37c50-207d-11ea-830b-038034041c48

…urce of node during performing allocate action
@TravisBuddy

Hey @sivanzcw,
Something went wrong with the build.

TravisCI finished with status errored, which means the build failed because of something unrelated to the tests, such as a problem with a dependency or the build process itself.

View build log

TravisBuddy Request Identifier: 39863f60-2085-11ea-830b-038034041c48

@TravisBuddy

Hey @sivanzcw,
Your changes look good to me!

View build log

TravisBuddy Request Identifier: 03bebdc0-2086-11ea-830b-038034041c48

@k82cn
Member

k82cn commented Dec 19, 2019

/lgtm

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Dec 19, 2019
@volcano-sh-bot volcano-sh-bot merged commit ba6677b into volcano-sh:master Dec 19, 2019
4 participants