Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallel drain #555

Merged
merged 7 commits into from
Mar 15, 2024
Merged

Parallel drain #555

merged 7 commits into from
Mar 15, 2024

Conversation

SchSeba
Copy link
Collaborator

@SchSeba SchSeba commented Dec 5, 2023

The PR moves the draining from the daemon to the main controller it's based on the work from @e0ne and #552.

Also, the PR introduced a new feature to allow multiple nodes to be drained in parallel

base on #479

Copy link

github-actions bot commented Dec 5, 2023

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@SchSeba
Copy link
Collaborator Author

SchSeba commented Dec 5, 2023

please first review #553

@SchSeba SchSeba mentioned this pull request Dec 5, 2023
pkg/host/store.go Fixed Show fixed Hide fixed
Copy link

github-actions bot commented Dec 5, 2023

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@coveralls
Copy link

coveralls commented Dec 5, 2023

Pull Request Test Coverage Report for Build 8249687295

Details

  • 459 of 853 (53.81%) changed or added relevant lines in 14 files are covered.
  • 16 unchanged lines in 4 files lost coverage.
  • Overall coverage increased (+1.5%) to 37.422%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/webhook/validate.go 15 19 78.95%
pkg/utils/cluster.go 29 35 82.86%
cmd/sriov-network-config-daemon/start.go 0 7 0.0%
pkg/webhook/webhook.go 0 13 0.0%
api/v1/helper.go 12 28 42.86%
pkg/platforms/mock/mock_platforms.go 22 39 56.41%
controllers/helper.go 31 51 60.78%
pkg/daemon/daemon.go 46 74 62.16%
pkg/drain/drainer.go 70 108 64.81%
controllers/drain_controller.go 216 307 70.36%
Files with Coverage Reduction New Missed Lines %
pkg/platforms/mock/mock_platforms.go 1 68.18%
pkg/platforms/openshift/openshift.go 1 0.0%
controllers/sriovnetworkpoolconfig_controller.go 3 63.7%
pkg/daemon/daemon.go 11 48.01%
Totals Coverage Status
Change from base Build 8234792093: 1.5%
Covered Lines: 4799
Relevant Lines: 12824

💛 - Coveralls

Copy link

github-actions bot commented Dec 5, 2023

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

1 similar comment
Copy link

github-actions bot commented Dec 6, 2023

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link

github-actions bot commented Dec 7, 2023

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link

github-actions bot commented Dec 8, 2023

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link

github-actions bot commented Dec 8, 2023

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

1 similar comment
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Signed-off-by: Sebastian Sch <sebassch@gmail.com>
Signed-off-by: Sebastian Sch <sebassch@gmail.com>
Copy link

github-actions bot commented Mar 5, 2024

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@SchSeba
Copy link
Collaborator Author

SchSeba commented Mar 5, 2024

I think I answered all the comments. please take another look when you have time

// configure logs
reqLogger := log.FromContext(ctx)
reqLogger.Info("Reconciling Drain")
reqLogger.V(2).Info("node locked for drain controller", "nodeName", req.Name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can drop this log message.

}

func (dr *DrainReconcile) getObject(ctx context.Context, req ctrl.Request, object client.Object) error {
// configure logs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine to remove this comment :)

func (dr *DrainReconcile) getObject(ctx context.Context, req ctrl.Request, object client.Object) error {
// configure logs
reqLogger := log.FromContext(ctx)
reqLogger.Info("checkForNodeDrain():")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong function name in the log message

- apiGroups: [""]
resources: ["pods/eviction"]
verbs: ["create"]
- apiGroups: ["machineconfiguration.openshift.io"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to remove "machineconfiguration.openshift.io" from the role?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for nodes yes please don't need it anymore as it's part of the operator now

@@ -110,3 +111,53 @@ func openshiftControlPlaneTopologyStatus(c client.Client) (configv1.TopologyMode
}
return infra.Status.ControlPlaneTopology, nil
}

func ObjectHasAnnotationKey(obj metav1.Object, annoKey string) bool {
// Check if node already contains annotation
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this comment to the docstring?

nit: the function can be simplified

func ObjectHasAnnotationKey(obj metav1.Object, annoKey string) bool {
	_, hasKey := obj.GetAnnotations()[annoKey]
	return hasKey
}

}

func ObjectHasAnnotation(obj metav1.Object, annoKey string, value string) bool {
// Check if node already contains annotation
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this to the docstring?

}

if dn.nodeState.GetGeneration() == latest {
if dn.currentNodeState.GetGeneration() == latest && !utils.ObjectHasAnnotation(dn.desiredNodeState, consts.NodeStateDrainAnnotationCurrent, consts.DrainComplete) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move !utils.ObjectHasAnnotation(dn.desiredNodeState, consts.NodeStateDrainAnnotationCurrent, consts.DrainComplete) part to the helper function with a meaningful name, e.g. isDrainInProgress()

if !dn.disableDrain {
if err := drain.RunCordonOrUncordon(dn.drainer, dn.node, false); err != nil {
if reqReboot {
log.Log.Info("handleDrain(): apply 'Reboot_Required' label for node")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

annotation

pausePatch := []byte("{\"spec\":{\"paused\":false}}")
if _, err := dn.platformHelpers.GetMcClient().MachineconfigurationV1().MachineConfigPools().Patch(context.Background(), dn.mcpName, types.MergePatchType, pausePatch, metav1.PatchOptions{}); err != nil {
log.Log.Error(err, "completeDrain(): failed to resume MCP", "mcp-name", dn.mcpName)
log.Log.Info("handleDrain(): apply 'Reboot_Required' label for nodeState")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

annotation

log.Log.Info("handleDrain(): apply 'Reboot_Required' label for node")
err := utils.AnnotateNode(vars.NodeName, consts.NodeDrainAnnotation, consts.RebootRequired, dn.client)
if err != nil {
log.Log.Error(err, "applyDrainRequired(): Failed to annotate node")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong func name in the log message

}

if err := dn.annotateNode(vars.NodeName, annoIdle); err != nil {
log.Log.Error(err, "completeDrain(): failed to annotate node")
log.Log.Info("handleDrain(): apply 'Drain_Required' label for nodeState")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

annotation

return value, nil
}

func (dr *DrainReconcile) checkForNodeDrain(ctx context.Context, node *corev1.Node) (*reconcile.Result, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function name is a bit confusing for me. Rename to tryDrainNode?


return selectedNpcl[0], nodeList.Items, nil
} else {
logger.V(2).Info("using \"default\" sriovNetworkPool with MaxUnavailable of one", "pool", *defaultNpcl)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to mention default pool in the log message? Maybe we should change the message to say something like: node doesn't belong to any pool, using default drain configuration

Copy link

github-actions bot commented Mar 7, 2024

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@SchSeba
Copy link
Collaborator Author

SchSeba commented Mar 7, 2024

Done @ykulazhenkov with the comments thanks!

be aware that in the daemon I use the background context as we don't have context there.
that will change on my next PR where we change the daemon to use controller-runtime manager

Copy link

github-actions bot commented Mar 7, 2024

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@ykulazhenkov
Copy link
Collaborator

Done @ykulazhenkov with the comments thanks!

be aware that in the daemon I use the background context as we don't have context there. that will change on my next PR where we change the daemon to use controller-runtime manager

Thanks @SchSeba for addressing my comments.
Please, check the last one #555 (comment)

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@SchSeba
Copy link
Collaborator Author

SchSeba commented Mar 10, 2024

Hi @ykulazhenkov done thanks!

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@SchSeba
Copy link
Collaborator Author

SchSeba commented Mar 11, 2024

Hi @zeeke @e0ne @ykulazhenkov when you can please take another look on the PR I will like to try and get this one merge soon.

Thanks!

@ykulazhenkov
Copy link
Collaborator

Thanks, @SchSeba.
I think it is important to have agreement about this comment #555 (comment) before we merge.
Can you comment?

Signed-off-by: Sebastian Sch <sebassch@gmail.com>
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@SchSeba
Copy link
Collaborator Author

SchSeba commented Mar 12, 2024

Hi @ykulazhenkov I did the requested change for the empty node selector let me know if that works for you :)

Copy link
Collaborator

@ykulazhenkov ykulazhenkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @SchSeba,
LGTM

Copy link
Collaborator

@e0ne e0ne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@SchSeba
Copy link
Collaborator Author

SchSeba commented Mar 15, 2024

Thanks for all the help reviewing this complicated feature!

merging after 3 approvals

@SchSeba SchSeba merged commit 1fac942 into k8snetworkplumbingwg:master Mar 15, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants