Wait for old pods to terminate before proceeding to Recreate #11917
Conversation
[test]
}
return deletionsNeeded == 0, nil
}
// TODO: Timeout should be config.Spec.Strategy.RecreateParams.TimeoutSeconds - (time.Now - deployerPodCreationTime)
Opened kubernetes/kubernetes#36813 for this. There is another occurrence in the deployer code where we need the creation time of the deployer pod (when we set activeDeadlineSeconds for hooks).
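For reference, a minimal sketch of the arithmetic the TODO asks for, assuming the configured TimeoutSeconds and the deployer pod's creation time are both available (the helper name and both parameters are hypothetical, not code from this PR):

```go
package main

import (
	"fmt"
	"time"
)

// remainingTimeout is a hypothetical helper illustrating the TODO above:
// the effective wait should be the configured TimeoutSeconds minus however
// long the deployer pod has already been running, clamped at zero.
func remainingTimeout(timeoutSeconds int64, deployerPodCreationTime time.Time) time.Duration {
	remaining := time.Duration(timeoutSeconds)*time.Second - time.Since(deployerPodCreationTime)
	if remaining < 0 {
		return 0
	}
	return remaining
}

func main() {
	// Pretend the deployer pod was created 30 seconds ago with a 120s budget.
	created := time.Now().Add(-30 * time.Second)
	fmt.Println(remainingTimeout(120, created)) // roughly 1m30s
}
```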
// Watch from the resource version of the list and wait for all pods to be deleted
// before proceeding with the Recreate strategy.
options.ResourceVersion = podList.ResourceVersion
w, err := s.podClient.Pods(from.Namespace).Watch(options)
The watch will handle just 1000 events, then it drops.
I guess that is OK here.
Unless you run 1000+ pods in one deployment: the deployer scales them down, the replication controller reaches rc.spec.replicas == rc.status.replicas == 0 (which means all of them have been marked for deletion), but more than 1000 pods are not actually deleted yet.
1000 watch events across all pods in all namespaces (or fewer if the watch cache is configured for a smaller number of events, or if the watch cache is disabled across all resources in all namespaces)... if dropping this watch fails the deployment, you need to handle re-establishment
This watch is started fresh each time, for a single deployment.
And it watches only the pods of that replication controller.
Doesn't matter; the 1000-event limit still applies across all pods, even when the watch is filtered down. What happens if the watch gets dropped here?
The rollout fails. I guess we can ignore watch errors (fall back to the old behavior).
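To illustrate the fallback being discussed, here is a small self-contained sketch (the event type and function are simplified stand-ins, not this PR's actual watch plumbing): a dropped watch or a timeout makes the caller proceed the old way instead of failing the deployment.

```go
package main

import (
	"fmt"
	"time"
)

// event is a simplified stand-in for a watch event; deleted mirrors the
// watch.Deleted event type from the snippet above.
type event struct{ deleted bool }

// waitForDeletions drains events until the expected number of deletions has
// been observed, the channel closes (a dropped watch), or the timeout fires.
// A dropped watch returns false instead of an error, so the caller can fall
// back to the old behavior rather than failing the rollout.
func waitForDeletions(events <-chan event, deletionsNeeded int, timeout time.Duration) bool {
	deadline := time.After(timeout)
	for deletionsNeeded > 0 {
		select {
		case e, ok := <-events:
			if !ok {
				return false // watch dropped
			}
			if e.deleted {
				deletionsNeeded--
			}
		case <-deadline:
			return false
		}
	}
	return true
}

func main() {
	events := make(chan event, 2)
	events <- event{deleted: true}
	events <- event{deleted: true}
	close(events) // simulate the watch being dropped after two deletions

	if !waitForDeletions(events, 3, time.Second) {
		fmt.Println("watch dropped or timed out; falling back to the old behavior")
	}
}
```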
return deletionsNeeded == 0, nil
}
// TODO: Timeout should be config.Spec.Strategy.RecreateParams.TimeoutSeconds - (time.Now - deployerPodCreationTime)
config, err := deployutil.DecodeDeploymentConfig(from, s.decoder)
Just pass the timeout into the function; no need to decode twice?
Agreed.
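The suggested shape, as a tiny sketch (all names here are hypothetical; the point is only that the caller decodes the config once and passes the timeout down):

```go
package main

import (
	"fmt"
	"time"
)

// decodedConfig is a stand-in for the decoded deployment config; in the PR
// the real value comes from deployutil.DecodeDeploymentConfig.
type decodedConfig struct {
	timeoutSeconds int64
}

// waitForTerminatedPods is a hypothetical shape for the helper under review:
// the caller threads the timeout through, so the helper never needs to call
// DecodeDeploymentConfig a second time.
func waitForTerminatedPods(timeout time.Duration) {
	fmt.Printf("waiting up to %s for old pods to terminate\n", timeout)
}

func main() {
	c := decodedConfig{timeoutSeconds: 600}
	waitForTerminatedPods(time.Duration(c.timeoutSeconds) * time.Second)
}
```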
eventClient: client.Core(),
getUpdateAcceptor: func(timeout time.Duration, minReadySeconds int32) strat.UpdateAcceptor {
	return stratsupport.NewAcceptNewlyObservedReadyPods(out, client.Core(), timeout, AcceptorInterval, minReadySeconds)
},
scaler: scaler,
decoder: decoder,
hookExecutor: stratsupport.NewHookExecutor(client.Core(), tagClient, client.Core(), os.Stdout, decoder),
// TODO: Should be config.Spec.Strategy.RecreateParams.TimeoutSeconds - (time.Now - deployerPodCreationTime)
Do we have a follow-up issue created for this?
defer w.Stop()
// Observe as many deletions as the remaining pods and then return.
deletionsNeeded := len(podList.Items)
condition := func(event watch.Event) (bool, error) {
Do we need to verify the object in the event?
I don't think so; we should always get pods back.
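If the object check were wanted anyway, it could look roughly like this (a sketch using today's k8s.io/api and k8s.io/apimachinery import paths rather than the packages this PR was written against; the deletion-counting shape follows the snippet above):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/watch"
)

func main() {
	deletionsNeeded := 1

	// Sketch of the condition from the snippet above, with the defensive
	// object check being discussed added in: reject non-pod objects instead
	// of assuming every event carries a pod.
	condition := func(event watch.Event) (bool, error) {
		if _, ok := event.Object.(*corev1.Pod); !ok {
			return false, fmt.Errorf("unexpected object in watch event: %T", event.Object)
		}
		if event.Type == watch.Deleted {
			deletionsNeeded--
		}
		return deletionsNeeded == 0, nil
	}

	done, err := condition(watch.Event{Type: watch.Deleted, Object: &corev1.Pod{}})
	fmt.Println(done, err) // true <nil>
}
```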
}
return deletionsNeeded == 0, nil
}
// TODO: Timeout should be timeout - (time.Now - deployerPodStartTime)
Issue please, and assign it to me :-)
Basically, can you handle both of these as part of #12154?
@Kargakis rebase and LGTM
#12157 [test]
flake is #12157
[merge]
yum [merge]
#8571 [merge]
#8571 [merge]
#8502 [merge]
#8502 [merge]
[test]
Evaluated for origin test up to 2a434d1
continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/12285/) (Base Commit: 9ee2ff6)
#10988 [merge]
Evaluated for origin merge up to 2a434d1
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/12318/) (Base Commit: eb304fd) (Image: devenv-rhel7_5531)
@mfojtik should fix https://bugzilla.redhat.com/show_bug.cgi?id=1369644
cc: @smarterclayton. Opted for this instead of having a TerminatingReplicas field in the replication controller.
Upstream equivalent PR is kubernetes/kubernetes#36748