-
Notifications
You must be signed in to change notification settings - Fork 946
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vgpu feature for volcano #2724
Vgpu feature for volcano #2724
Conversation
docs/user-guide/how_to_use_vgpu.md
Outdated
- name: drf | ||
- name: predicates | ||
arguments: | ||
predicate.VGPU4pdEnable: true # enable vgpu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- If the
vgpu
andgpu share
cannot be enabled at the same time, a detection should be done when the scheduler starts. - I'm not sure if the name of a commercial company like
4pd
can be used directly in cncf projects. @Thor-wl @william-wang Do you have any suggestion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- If the
vgpu
andgpu share
cannot be enabled at the same time, a detection should be done when the scheduler starts.- I'm not sure if the name of a commercial company like
4pd
can be used directly in cncf projects. @Thor-wl @william-wang Do you have any suggestion
Yes, you are right. suggest use a common word like predicate.VGPUEnable
if !ok { | ||
return nil | ||
} | ||
handshake := node.Annotations[VolcanoVGPUHandshake] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the second return value missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
} | ||
|
||
func (gs *GPUDevices) FilterNode(pod *v1.Pod) (bool, error) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be an invalid blank line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
v, ok = pod.Spec.Containers[i].Resources.Limits[resourceMem] | ||
singledevice = true | ||
} | ||
if ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be cleaner to use else
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the logic won't be the same if I use (else) instead of (if ok)
pkg/scheduler/api/resource_info.go
Outdated
@@ -87,7 +87,7 @@ func NewResource(rl v1.ResourceList) *Resource { | |||
if v1helper.IsScalarResourceName(rName) { | |||
ignore := false | |||
for _, val := range IgnoredDevicesList { | |||
if strings.Compare(val, rName.String()) == 0 { | |||
if len(val) > 0 && strings.Contains(rName.String(), val) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have to change compare
to contains
for judgment? Are there any special considerations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, I thought using strings.Contains will save me the trouble of trimming string.
resources: | ||
limits: | ||
volcano.sh/vgpu-number: 2 # requesting 2 vGPUs | ||
volcano.sh/vgpu-memory: 2000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@archlitchi the vgpu-number
and vgpu-memory
are allowed to configured at the same time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, and currently, you have to set volcano.sh/vgpu-number in order for the vgpu to work.
@@ -0,0 +1,225 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please change the directory name from vgpu4pd
to vgpu
to make it more common.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
pkg/scheduler/api/node_info.go
Outdated
@@ -26,6 +26,7 @@ import ( | |||
|
|||
"volcano.sh/apis/pkg/apis/scheduling/v1beta1" | |||
"volcano.sh/volcano/pkg/scheduler/api/devices/nvidia/gpushare" | |||
"volcano.sh/volcano/pkg/scheduler/api/devices/nvidia/vgpu4pd" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please change the package name from vgpu4pd
to vgpu
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -103,7 +105,16 @@ func GetInqueueResource(job *api.JobInfo, allocated *api.Resource) *api.Resource | |||
if api.IsCountQuota(rName) || !v1helper.IsScalarResourceName(rName) { | |||
continue | |||
} | |||
|
|||
ignore := false | |||
for _, ignoredDevice := range api.IgnoredDevicesList { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the IgnoreDevicesList used for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a list of custom-defined resources, that vc-scheduler should ignore, instead of searching corresponding device-plugin to process it
/priority important-soon |
klog.Errorf("AllocateToPod failed %s", err.Error()) | ||
return | ||
for _, val := range api.RegisteredDevices { | ||
if nodeInfo.Others[val].(api.Devices).HasDeviceRequest(pod) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whether a type assertion is required here?
if err != nil { | ||
return err | ||
for _, val := range api.RegisteredDevices { | ||
fit, err = node.Others[val].(api.Devices).FilterNode(task.Pod) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This place is also similar, whether a type assertion is required?
Add vgpu feature to volcano
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: william-wang The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Implement vgpu feature from k8s-vgpu-scheduler(https://github.com/4paradigm/k8s-vgpu-scheduler) by using devices API, better device-sharing mechanism for vgpu, see docs/user-guide/how_to_use_vgpu.md for more detail.
device-plugin part, see volcano-sh/devices#33