Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix priority handling for same-pf VFgroups. #196

Merged
merged 1 commit into from
Feb 17, 2022

Conversation

mskrocki
Copy link
Contributor

This change fixes VFGroups assigned to nodes based
on the policies priorities: highest priority (lowest
value of priority) should be the only one present in the
SriovNetworkNodeState.spec.

Additionally, for same priority policies we discard
overlapping VF ranges, only the highest priority is
present.

FIXES: #194

@zshi-redhat
Copy link
Collaborator

/cc @SchSeba @pliurh

api/v1/helper.go Outdated
// configured with the #-notation in pfName and 2. SR-IOV policies have the same
// priority.
func (iface Interface) mergeConfigs(input *Interface, equalPriority bool) {
if !equalPriority {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we would still want to allow VF partition policies with different priority numbers, for example:

Policy 1:

spec:
  deviceType: netdevice
  resourceName: kernel-device
  numVfs: 8
  priority: 50
  nicSelector:
    pfNames:
    - "ens1f0#0-3"

Policy 2:

spec:
  deviceType: vfio-pci
  resourceName: dpdk-device
  numVfs: 8
  priority: 70
  nicSelector:
    pfNames:
    - "ens1f0#4-7"

With current logic, policy-2 won't be applied if I understand correctly, but we should allow the policy-2 vfGroup be created since it doesn't conflict with policy-1 configuration.

This means the conditional check !equalPriority shall not be applied to vfGroup merging in line 308.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From logical point of view I agree, these policies could be merged, but on the other hand wont that complicate the "priority's" field meaning? Now it is simple, merge only when same priority.
Why those policies cannot have same priority? If user explicitly defined the partitioning, the intention is to merge these, so in my opinion, they should have same priority.

Copy link
Collaborator

@zshi-redhat zshi-redhat Nov 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is it may break existing deployment if we change current logic. For example, if user already has VF partition + different priorities settings, upgrade of operator will result in unexpected result.

I agree with you on that priority field may not be necessary in most cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Existing users can be instructed via documentation (or release notes), we are not changing any APIs. They would only have to adjust their priorities.
Additionally, this repo docs do not specify how exactly the priority field works, probably smth I can add to this PR.

Copy link
Collaborator

@zshi-redhat zshi-redhat Nov 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Existing users can be instructed via documentation (or release notes), we are not changing any APIs.

User may not want to change their configuration in a production environment that is working, to me this is like a regression I'd like to avoid.

They would only have to adjust their priorities.

In my opinion, priority shall apply when there is conflict between policies (only apply high priority one over the other), but not when there is no conflict.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed this in Today's meeting.

In general, the priority field is not well documented (except of the one-line in the sriovnetworknodepolicy CRD)
more so, the interaction between VF partitioning and priority when merging configurations.

so we should definitely document this feature :)

AFAIU, today, if you use VF partitioning with multiple policies, if there is an overlap in VF group you might end up with an un-expected configuration (lower priority takes precedence)

I can think of additional edge cases, where you apply configuration on a per device basis and not per vf group like switchdev. e.g two sriovnetworkpolicies one with switchdev enabled the other without, same PF with different vf groups. (not sure if it would fail or not)

IMHO, if one wants to partition a PF's VFs to different resources he should do that in the same priority.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the code to be backwards compatible. Added README changes.

This change fixes VFGroups assigned to nodes based
on the policies priorities: highest priority (lowest
value of priority) should be the only one present in the
SriovNetworkNodeState.spec. Exception is made for policies
with non-overlapping VFGroups, which will be merged.

For same priority policies we discard overlapping VF ranges,
only the highest priority is present.

Added description of this behaviour to README.
@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

// - skip group with same ResourceName,
// - skip overlapping groups (use only highest priority)
for _, gr := range iface.VfGroups {
if gr.ResourceName == input.VfGroups[0].ResourceName || gr.isVFRangeOverlapping(input.VfGroups[0]) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't merge when a higher priority policy(input.vfGroup[0]) with same resourceName or overlapping VF group, but it should overwrite, right?

if gr.ResourceName == input.VfGroups[0].ResourceName ... {

    if !equalPriority {
        input.VfGroups[0] = gr
    }

}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mind what is used outside mergeConfigs, i.e. *input is the final object we use, not iface. So, input.VfGroups[0] already contains the highest priority group, and here we need to verify the other groups do not overlap or have same resourceName. If they dont we add them to the input.VfGroups list (line 311).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

input.VfGroups[0] already contains the highest priority group - right.
when the highest priority group overlaps or has the same resourceName as lower priority one (iface), should it override (because highest priority wins)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, because this is a conflicting situation: either we have duplicate resourceName, or there is VF range overlap. In such cases we will use the high priority one (lower value).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add that "override" in this "for" is done by just skipping given group and not adding it to the final "input" object.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, because this is a conflicting situation: either we have duplicate resourceName, or there is VF range overlap. In such cases we will use the high priority one (lower value).

Yes, this is a conflicting situation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any other doubts or questions about this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I‘d expect line 314 will also be removed based on the above change, will it?
I don't have other comments beside this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, line 314 is required as per my reply below. We cannot update mtu and numvfs blindly for every call of mergeConfigs i.e. for case where we did NOT merge (enter "continue", line 308, every time), but just override we need to preserve the original values.

input.VfGroups = append(input.VfGroups, gr)
}

if !equalPriority && !m {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may not needed as we would always want to update mtu and numvfs to highest number.
For example:

if equalPriority {
      // update mtu and numVfs 
      // because higher value from same priority policies is applied
}
if !equalPriority {
      // update mtu and numVfs 
      // because value from higher priority policy gets applied
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is, we do not want to merge mtu and numvfs when we just overwrite i.e. m=false.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add that as mentioned above, *input is the final object we use, not iface and it already contains the highest priority policy values for mtu and NumVfs.

@SchSeba
Copy link
Collaborator

SchSeba commented Jan 3, 2022

/lgtm

@github-actions github-actions bot added the lgtm label Jan 3, 2022
@mskrocki
Copy link
Contributor Author

ping, could we merge this?

Copy link
Collaborator

@zshi-redhat zshi-redhat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mskrocki sorry for late review.
got a follow-up questions in the above comment.

// - skip group with same ResourceName,
// - skip overlapping groups (use only highest priority)
for _, gr := range iface.VfGroups {
if gr.ResourceName == input.VfGroups[0].ResourceName || gr.isVFRangeOverlapping(input.VfGroups[0]) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

input.VfGroups[0] already contains the highest priority group - right.
when the highest priority group overlaps or has the same resourceName as lower priority one (iface), should it override (because highest priority wins)?

@mskrocki
Copy link
Contributor Author

/ping

@zshi-redhat
Copy link
Collaborator

/cc @pliurh @bn222 PTAL.

@pliurh pliurh merged commit c1f0ddc into k8snetworkplumbingwg:master Feb 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SriovNetworkNodeConfigPolicy priority inconsistent
5 participants