Fix retry handling in case IP pool is empty
When a pod is created while the IP pool is empty,
the pod gets stuck forever in the creation state, even after IPs are released.

To simulate the bug:

1. Create a pool with just 1 available IP (others excluded; first and last reserved).
2. Create 2 pods:
   a. One pod will take the IP.
   b. The second pod will remain in "creation" state.
3. Wait around 1 minute and then delete the Running pod.
4. Bug: The pending pod stays pending forever, even though an IP is available.

Without the fix, the Pod add / update retry mechanism isn't triggered at all: the handlers only log
the error, and a retry is triggered only if a non-nil error is returned.
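
The difference can be illustrated with a minimal, self-contained Go sketch. It is not the controller's code: `pool`, `errPoolEmpty`, `addSwallow`, `addPropagate` and `needsRetry` are hypothetical names, and `needsRetry` only stands in for the assumption that the retry framework re-queues an object when its handler returns a non-nil error.

```go
package main

import (
	"errors"
	"fmt"
)

// errPoolEmpty is a hypothetical error standing in for "no free IP in the pool".
var errPoolEmpty = errors.New("IP pool is empty")

// pool is a toy IP pool; free is the number of addresses still available.
type pool struct{ free int }

// allocate hands out one address or fails when the pool is exhausted.
func (p *pool) allocate() error {
	if p.free == 0 {
		return errPoolEmpty
	}
	p.free--
	return nil
}

// addSwallow mimics the pre-fix handler: the failure is logged but not returned.
func addSwallow(p *pool) error {
	if err := p.allocate(); err != nil {
		fmt.Println("Pod add failed, will try again later:", err)
	}
	return nil
}

// addPropagate mimics the fixed handler: the failure is logged and returned.
func addPropagate(p *pool) error {
	if err := p.allocate(); err != nil {
		fmt.Println("Pod add failed, will try again later:", err)
		return err
	}
	return nil
}

// needsRetry is an assumed model of the retry framework's decision:
// an object is re-queued only when its handler returns a non-nil error.
func needsRetry(handler func(*pool) error, p *pool) bool {
	return handler(p) != nil
}

func main() {
	fmt.Println("pre-fix retry scheduled: ", needsRetry(addSwallow, &pool{}))
	fmt.Println("post-fix retry scheduled:", needsRetry(addPropagate, &pool{}))
}
```

With an empty pool, only the handler that returns the error is seen as failed, so only it gets another attempt once an IP is released.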

Note:
Even with the fix, the retry mechanism is capped at 15 retries, which total around 15m.
After 15m the pod will be pending forever, even if IPs are released, because its annotations
won't be updated anymore.

Signed-off-by: Or Shoval <oshoval@redhat.com>
oshoval committed Aug 9, 2024
1 parent 942bf1d commit ab8d89a
Showing 1 changed file with 2 additions and 0 deletions.
@@ -319,6 +319,7 @@ func (h *networkClusterControllerEventHandler) AddResource(obj interface{}, from
 		if err != nil {
 			klog.Infof("Pod add failed for %s/%s, will try again later: %v",
 				pod.Namespace, pod.Name, err)
+			return err
 		}
 	case factory.NodeType:
 		node, ok := obj.(*corev1.Node)
@@ -359,6 +360,7 @@ func (h *networkClusterControllerEventHandler) UpdateResource(oldObj, newObj int
 		if err != nil {
 			klog.Infof("Pod update failed for %s/%s, will try again later: %v",
 				new.Namespace, new.Name, err)
+			return err
 		}
 	case factory.NodeType:
 		node, ok := newObj.(*corev1.Node)