[feature] Add support for podGroup number limits for one queue #452

Closed
jiaxuanzhou opened this issue Oct 19, 2018 · 22 comments
Assignees
Labels
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
  • sig/apps: Categorizes an issue or PR as relevant to SIG Apps.
  • sig/scheduling: Categorizes an issue or PR as relevant to SIG Scheduling.
Milestone

Comments

@jiaxuanzhou
Contributor

Is this a BUG REPORT or FEATURE REQUEST?:

/kind feature

What happened:
In BigData and ML scenarios, batch jobs are often submitted to queues. With limited hardware resources, we need the ability to limit the number of running pod groups in a queue.
What you expected to happen:
Add a parameter in Queue API:

type QueueSpec struct {
	Weight int32 `json:"weight,omitempty" protobuf:"bytes,1,opt,name=weight"`

	// PodGroupNumber defines the maximum number of running PodGroups in one
	// queue; defaults to unlimited.
	PodGroupNumber int32 `json:"podGroupNumber,omitempty" protobuf:"varint,2,opt,name=podGroupNumber"`
}
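
For illustration, here is a minimal sketch of how a controller could enforce such a limit when deciding whether another PodGroup may start. The Queue type is simplified and canAdmit is a hypothetical helper, not actual kube-batch code:

package main

import "fmt"

// Queue is a simplified stand-in for the kube-batch Queue type; a
// PodGroupNumber of 0 is treated as "unlimited", matching the proposed default.
type Queue struct {
	Name           string
	PodGroupNumber int32
}

// canAdmit reports whether another PodGroup may start running in the queue.
// In a real controller, running would come from an informer cache.
func canAdmit(q Queue, running int32) bool {
	if q.PodGroupNumber == 0 {
		return true
	}
	return running < q.PodGroupNumber
}

func main() {
	q := Queue{Name: "ml-jobs", PodGroupNumber: 2}
	fmt.Println(canAdmit(q, 1)) // true: one slot left
	fmt.Println(canAdmit(q, 2)) // false: the queue is full
}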

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
@jiaxuanzhou
Contributor Author

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 19, 2018
@k82cn
Contributor

k82cn commented Oct 19, 2018

/sig scheduling
/milestone v0.3
/assign

@k8s-ci-robot k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Oct 19, 2018
@k82cn k82cn added this to the v0.3 milestone Oct 19, 2018
@k82cn k82cn modified the milestones: v0.3, v0.4 Dec 21, 2018
@k82cn k82cn modified the milestones: v0.4, v0.5 Jan 15, 2019
@hex108
Contributor

hex108 commented Mar 23, 2019

@k82cn I could start working on it using an admission webhook if you have not started.
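
For illustration, a minimal sketch of such a validating webhook; lookupQueueUsage is a hypothetical stand-in for an informer-backed count, and the queue name would really be parsed from the submitted PodGroup, so this is not the actual kube-batch webhook:

package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"

	admissionv1 "k8s.io/api/admission/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// lookupQueueUsage is a hypothetical helper; a real webhook would count the
// queue's PodGroups via an informer cache. It returns placeholder values.
func lookupQueueUsage(queue string) (running, limit int32) {
	return 2, 2
}

// serveValidate denies PodGroup creation once its queue's limit is reached.
func serveValidate(w http.ResponseWriter, r *http.Request) {
	var review admissionv1.AdmissionReview
	if err := json.NewDecoder(r.Body).Decode(&review); err != nil || review.Request == nil {
		http.Error(w, "malformed admission review", http.StatusBadRequest)
		return
	}
	// The queue name would come from review.Request.Object; hardcoded here.
	running, limit := lookupQueueUsage("default")
	resp := &admissionv1.AdmissionResponse{
		UID:     review.Request.UID,
		Allowed: limit == 0 || running < limit, // 0 means unlimited
	}
	if !resp.Allowed {
		resp.Result = &metav1.Status{
			Message: fmt.Sprintf("queue already runs %d of %d PodGroups", running, limit),
		}
	}
	review.Response = resp
	json.NewEncoder(w).Encode(review)
}

func main() {
	http.HandleFunc("/validate", serveValidate)
	// A real admission webhook must serve TLS; plain HTTP keeps the sketch short.
	log.Fatal(http.ListenAndServe(":8443", nil))
}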

@k82cn
Contributor

k82cn commented Mar 24, 2019

/assign hex108

I'm not working on that, please go ahead :)

@k82cn
Contributor

k82cn commented Apr 22, 2019

/sig apps

@k8s-ci-robot k8s-ci-robot added the sig/apps Categorizes an issue or PR as relevant to SIG Apps. label Apr 22, 2019
@jiaxuanzhou
Contributor Author

jiaxuanzhou commented Apr 22, 2019

@hex108 It would be perfect if you also considered a resource quota for one queue of PodGroups, defaulting to unlimited:

type QueueSpec struct {
	Weight int32 `json:"weight,omitempty" protobuf:"bytes,1,opt,name=weight"`

	// PodGroupNumber defines the maximum number of running PodGroups in one
	// queue; defaults to unlimited.
	PodGroupNumber int32 `json:"podGroupNumber,omitempty" protobuf:"varint,2,opt,name=podGroupNumber"`

	// ResourceQuota defines the maximum resources the queue may consume;
	// defaults to unlimited.
	ResourceQuota *Resource `json:"resourceQuota,omitempty" protobuf:"bytes,3,opt,name=resourceQuota"`
}

@hex108
Contributor

hex108 commented Apr 22, 2019

@jiaxuanzhou There is no resource quota on a PodGroup; is there a use case for it?

@jiaxuanzhou
Contributor Author

@jiaxuanzhou There is no resource quota on a PodGroup; is there a use case for it?

The resource quota is for the queue that holds the PodGroups. In production, the users are teams running jobs, and resources cannot be shared fairly even when we use namespace-based queues, so we want to restrict teams by manually declaring resource limits on their queues.

@hex108
Contributor

hex108 commented Apr 22, 2019

@jiaxuanzhou You might need something like the minResource mentioned at #813 (comment)?

@jiaxuanzhou
Contributor Author

@hex108 Actually, I need maxResource for one queue.

@hex108
Contributor

hex108 commented Apr 22, 2019

@hex108 Actually, I need maxResource for one queue.

Thanks, got it now! We'll add it. cc @k82cn

@jiaxuanzhou
Contributor Author

@hex108 thx, 👍

@jiaxuanzhou
Contributor Author

/reopen

@k8s-ci-robot k8s-ci-robot reopened this Apr 23, 2019
@k8s-ci-robot
Contributor

@jiaxuanzhou: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k82cn
Contributor

k82cn commented Apr 23, 2019

@jiaxuanzhou, what's the purpose of maxResource? Are you trying to avoid too many pending pods? If so, "delay pod creation" may be helpful.

@jiaxuanzhou
Contributor Author

@jiaxuanzhou, what's the purpose of maxResource? Are you trying to avoid too many pending pods? If so, "delay pod creation" may be helpful.

maxResource defines the resource threshold for one queue; once the threshold is reached, subsequent jobs stay in Pending even if the cluster's idle resources could satisfy them.
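
For illustration, a sketch of that gate, treating the queue's maxResource and its current allocation as plain Kubernetes resource lists; the names here are illustrative, not kube-batch internals:

package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// fits reports whether a job's request still fits under the queue's
// maxResource given what the queue has already allocated; if not, the
// job would stay Pending even when the cluster has idle capacity.
func fits(allocated, request, max v1.ResourceList) bool {
	for name, limit := range max {
		used := allocated[name].DeepCopy()
		used.Add(request[name])
		if used.Cmp(limit) > 0 {
			return false
		}
	}
	return true
}

func main() {
	max := v1.ResourceList{v1.ResourceCPU: resource.MustParse("10")}
	allocated := v1.ResourceList{v1.ResourceCPU: resource.MustParse("8")}
	request := v1.ResourceList{v1.ResourceCPU: resource.MustParse("4")}
	fmt.Println(fits(allocated, request, max)) // false: 8+4 exceeds the 10-CPU cap
}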

@k82cn
Contributor

k82cn commented Apr 30, 2019

maxResource defines the resource threshold for one queue; once the threshold is reached, subsequent jobs stay in Pending even if the cluster's idle resources could satisfy them.

Similar to Quota, but for Job and Queue, right? An admission controller plus a queue controller may be better :)

@jiaxuanzhou
Contributor Author

maxResource defines the resource threshold for one queue; once the threshold is reached, subsequent jobs stay in Pending even if the cluster's idle resources could satisfy them.

Similar to Quota, but for Job and Queue, right? An admission controller plus a queue controller may be better :)

Yeah, a Quota for the queue.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 29, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 28, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
