Network Operator fails to update node labels #182

Closed
e0ne opened this issue May 14, 2021 · 0 comments · Fixed by #183
Labels
bug Something isn't working

Comments

@e0ne
Collaborator

e0ne commented May 14, 2021

What happened:

If the MOFED deployment fails and the pod is in CrashLoopBackOff, the Network Operator fails to update the node label.

What you expected to happen:

"network.nvidia.com/operator.mofed.wait" label should be updated without any errors in logs.

Logs:

  • Network Operator version: 0.5.0
  • Logs of Network Operator controller:
github.com/Mellanox/network-operator/controllers.(*NicClusterPolicyReconciler).updateNodeLabels(0xc0007e4040, 0xc0006bf200, 0x1773bf7, 0x8)
        /root/network-operator/controllers/nicclusterpolicy_controller.go:139 +0x974
github.com/Mellanox/network-operator/controllers.(*NicClusterPolicyReconciler).Reconcile(0xc0007e4040, 0x1973818, 0xc000513080, 0x0, 0x0, 0xc000552270, 0x12, 0xc000513080, 0xc000032000, 0x164f8e0, ...)
        /root/network-operator/controllers/nicclusterpolicy_controller.go:113 +0x81c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00079c500, 0x1973770, 0xc0005a6000, 0x1619ae0, 0xc00092cea0)
        /root/network-operator/.gopath/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.1/pkg/internal/controller/controller.go:297 +0x30d
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00079c500, 0x1973770, 0xc0005a6000, 0xc0006f3600)
        /root/network-operator/.gopath/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.1/pkg/internal/controller/controller.go:252 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2(0x1973770, 0xc0005a6000)
        /root/network-operator/.gopath/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.1/pkg/internal/controller/controller.go:215 +0x4a
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
        /root/network-operator/.gopath/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185 +0x37
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc0006f3750)
        /root/network-operator/.gopath/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc00090df50, 0x19440a0, 0xc00071a0c0, 0xc0005a6001, 0xc0003f4120)
        /root/network-operator/.gopath/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:156 +0x9b
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0006f3750, 0x3b9aca00, 0x0, 0x9ac901, 0xc0003f4120)
        /root/network-operator/.gopath/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext(0x1973770, 0xc0005a6000, 0xc0002ac080, 0x3b9aca00, 0x0, 0x1)
        /root/network-operator/.gopath/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185 +0xa6
k8s.io/apimachinery/pkg/util/wait.UntilWithContext(0x1973770, 0xc0005a6000, 0xc0002ac080, 0x3b9aca00)
        /root/network-operator/.gopath/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:99 +0x57
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
        /root/network-operator/.gopath/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.1/pkg/internal/controller/controller.go:212 +0x40d
panic: runtime error: index out of range [0] with length 0 [recovered]
        panic: runtime error: index out of range [0] with length 0


e0ne added the bug label May 14, 2021
e0ne added a commit to e0ne/network-operator that referenced this issue May 14, 2021

We don't need to check for container status if there are no running
containers inside the MOFED pod.

Closes Mellanox#182

Signed-off-by: Ivan Kolodiazhnyi <ikolodiazhny@nvidia.com>
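
As context for that fix, here is a minimal sketch of the guard the commit message describes, assuming the panic in the logs comes from indexing an empty Pod.Status.ContainerStatuses slice while the MOFED pod has no running containers. The helper name is hypothetical and this is not a copy of updateNodeLabels.

```go
// Sketch of guarding the container-status check so an empty status slice
// returns "not ready" instead of panicking.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// mofedContainerReady is a hypothetical helper name.
func mofedContainerReady(pod *corev1.Pod) bool {
	// Skip the container-status check entirely when no statuses exist;
	// indexing pod.Status.ContainerStatuses[0] on an empty slice is exactly
	// "index out of range [0] with length 0" from the stack trace above.
	if len(pod.Status.ContainerStatuses) == 0 {
		return false
	}
	return pod.Status.ContainerStatuses[0].Ready
}

func main() {
	crashLooping := &corev1.Pod{} // no container statuses populated yet
	fmt.Println(mofedContainerReady(crashLooping)) // prints "false", no panic
}
```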
@e0ne e0ne mentioned this issue May 14, 2021