Make the trigger controllers cooperate instead of race #8746
Conversation
I think that if we're going to the effort of a significant refactor, we should invest the time to switch to an informer/workqueue model.
Would you be okay if @Kargakis continued to chisel the overall shape of the system into something more closely resembling upstream before doing an informer/workqueue refactor? I can envision an informer refactor helping us get down to a single controller once these remaining peripheral refactors are addressed.
I'm not close to the area, but I would have expected the effort to look more like creating a new controller using an informer and workqueue, then picking pieces like this into it. Doing it like this seems like it would be going through the effort twice. Is it somehow easier to get there after combining our controllers using a different mechanism first?
Perhaps a quick sketch of what will help. Here's what I would expect our DC controllers to look like:
Each controller is small, none overlap, clean hand-offs, most races stopped due to pessimistic updates. Is this what we're shooting for, or are we looking for something else?
I could probably see a fifth to handle cancellation that would promote a spec request into a concrete: revert to exactly this config as the next deployment. In trying to think it all through, I'd probably make a really specific status field that tracked my destination ("make this"), and I could see a more specific request mechanism that allowed a detailed "I'd like you to do this" in cases more complicated than a straight "deploy again". I'm not sure that I see the solution as trying to combine all the function into a single controller. Is there a consensus reached elsewhere that the multi-controller system is harder to deal with? What is the hand-off between our controllers today?
Also, I want to be crystal clear. Our #1 priority is to get alignment with upstream without breaking existing deployments. Refactoring the controllers should be a secondary priority underneath that. Refactors not in pursuit of that delivery need a really high bar. I will be merciless with reviews until we can show that upgrade path MVP.
Higher than "because it's cleaner". "Because it's faster" is better, but not enough.
On May 4, 2016, at 7:58 PM, David Eads notifications@github.com wrote:
> Perhaps a quick sketch of what will help. Here's what I would expect our DC controllers to look like:
Moving to a different controller. Do this last.
> Each controller is small, none overlap, clean hand-offs, most races stopped. Is this what we're shooting for or are looking for something else?
Don't underestimate DC complexity. I don't want us chasing subtle races.
It wasn't meant as a TODO list, more an appeal for help understanding what the alternate vision is.
@ironcladlou after thinking about this more, we could possibly continue using the config change controller and just trash the generator. Unfortunately the semantics of latestVersion are wrong. Think about this for a sec: the main controller never updates latestVersion, it just acts based on it. Unfortunately oc (and probably the console, cc: @jwforres) are latestVersion-addicts, meaning they always expect that latestVersion is synced, so we need to continue supporting it without allowing them to update it directly (#7149).

@deads2k I think that's the main reason shared caches are not a top priority atm; we want latestVersion in etcd as soon as we can. Races between the trigger controllers will not be a thing once we remove the shared dependency (#8696).

I think the hardest part is the image change controller, as it needs to update both spec and status. @liggitt suggested having the image change controller update just the spec and the config change controller detect the template diff and update status. Then we can special-case the image change controller changes in the main controller instead of the registry (what I am doing in #7149 currently) and have the main controller set the status details. Thoughts?
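The split being discussed can be sketched roughly as below. All type and function names here are invented for illustration; they are not the real OpenShift API types or controller code. The point is just the division of labor: the image change controller writes only the spec (the resolved image), and the config change controller bumps latestVersion when it detects a template diff against the last deployed template.

```go
package main

import (
	"fmt"
	"reflect"
)

// PodTemplate and DeploymentConfig are minimal stand-ins for the real types.
type PodTemplate struct {
	Image string
}

type DeploymentConfig struct {
	LatestVersion int // the status-side field in the proposed split
	Template      PodTemplate
}

// imageChangeUpdate only touches the spec: it swaps in the resolved image,
// but never increments LatestVersion.
func imageChangeUpdate(config *DeploymentConfig, resolvedImage string) {
	config.Template.Image = resolvedImage
}

// configChangeSync detects a template diff against the last deployed
// template and bumps LatestVersion (the status update) when they differ.
func configChangeSync(config *DeploymentConfig, lastDeployed PodTemplate) bool {
	if reflect.DeepEqual(config.Template, lastDeployed) {
		return false
	}
	config.LatestVersion++
	return true
}

func main() {
	dc := &DeploymentConfig{LatestVersion: 1, Template: PodTemplate{Image: "app:v1"}}
	lastDeployed := dc.Template

	imageChangeUpdate(dc, "app:v2")                 // spec-only change
	triggered := configChangeSync(dc, lastDeployed) // status bump on diff

	fmt.Println(triggered, dc.LatestVersion)
}
```

With this shape, only the config change controller ever increments latestVersion, which is what removes the race between the two trigger controllers.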
@smarterclayton this is not influenced either by readability or performance-related issues. We need to transition to separate status updates and reduce the scope of what each controller is updating. Currently the trigger controllers are a mess because of the generator, and our client tools are a mess because of latestVersion.
I'm ok with the generator refactor - I'm just concerned about much larger refactors.
I'm fine with the suggestion on image change controller.
@ironcladlou @smarterclayton PTAL. No more generator and fewer hacks in the server 🎉 I didn't touch the generator code, just unplugged it. Since we expose it as an endpoint, I guess we will need to continue having it.
@liggitt this is what you expected.
// change controller did its job. The second return argument helps in separating between config
// change and image change causes.
//
// TODO: Handle multiple image change triggers
Kind of an important todo
Unless you just mean record them, in which case it's not that important
Well, the only thing remaining here is the annotation. Only one image is currently stored as lastTriggered. I am thinking of saving all images as part of the annotation, i.e. deployment.openshift.io/last-triggered-image: fullspec1,fullspec2,... for multiple triggers. Thoughts?
Serialize as a JSON array if you have to; using commas is pretty gross. Or add multiple keys. For the generic trigger controller we're going to have to use multiple annotations.
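The multiple-keys alternative could look something like the sketch below. The annotation prefix and value layout here are purely illustrative (this is not the format the PR ultimately used): one annotation per trigger, with the value JSON-encoded so that images containing commas or other separators stay unambiguous.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lastTriggeredPrefix is a hypothetical per-trigger annotation prefix.
const lastTriggeredPrefix = "deployment.openshift.io/last-triggered-image."

// lastTriggeredAnnotations builds one annotation key per trigger instead of
// comma-joining every image under a single key.
func lastTriggeredAnnotations(images map[string]string) (map[string]string, error) {
	annotations := map[string]string{}
	for trigger, image := range images {
		// JSON-encode the value so separator characters in image specs
		// can never be confused with a delimiter.
		value, err := json.Marshal(image)
		if err != nil {
			return nil, err
		}
		annotations[lastTriggeredPrefix+trigger] = string(value)
	}
	return annotations, nil
}

func main() {
	a, err := lastTriggeredAnnotations(map[string]string{
		"ruby": "172.30.1.1:5000/test/ruby@sha256:abc",
	})
	fmt.Println(a[lastTriggeredPrefix+"ruby"], err)
}
```

Reading a value back is a single json.Unmarshal per key, and adding or removing a trigger touches only its own annotation.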
I think I will add multiple keys for now. What's the generic trigger controller? Config change?
@smarterclayton do you have any idea why annotation keys need to be qualified names? I want to use namespace/name:tag for the annotation key but it is rejected by validation.
Switched to namespace.name.tag for now. I think this is ready for reviews, I just need to get to tests.
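For context on the validation failure: Kubernetes requires annotation keys to be qualified names, roughly an optional DNS-subdomain prefix, a single `/`, then a short name of alphanumerics plus `-`, `_`, and `.`. A `:` is never allowed in the name, which is why `namespace/name:tag` is rejected while `namespace.name.tag` passes. A rough approximation of that check (simplified; the real validation also fully validates the prefix as a DNS subdomain):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// nameRegexp approximates the Kubernetes qualified-name rule for the part
// after the optional prefix: alphanumerics plus '-', '_', '.', starting and
// ending with an alphanumeric.
var nameRegexp = regexp.MustCompile(`^[A-Za-z0-9]([-A-Za-z0-9_.]*[A-Za-z0-9])?$`)

// isQualifiedName is a simplified sketch of annotation-key validation.
func isQualifiedName(key string) bool {
	parts := strings.Split(key, "/")
	var name string
	switch len(parts) {
	case 1:
		name = parts[0]
	case 2:
		prefix := parts[0]
		if len(prefix) == 0 || len(prefix) > 253 {
			return false
		}
		name = parts[1]
	default:
		return false // at most one '/' is allowed
	}
	return len(name) > 0 && len(name) <= 63 && nameRegexp.MatchString(name)
}

func main() {
	fmt.Println(isQualifiedName("myns/ruby:latest")) // false: ':' is not allowed
	fmt.Println(isQualifiedName("myns.ruby.latest")) // true: dots are fine
}
```

So the dotted form works because `.` is in the allowed character set for the name, while `:` is not.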
Are we sure that the allowed tag values overlap with the allowed annotation keys?
Hm, not on the deployment side, but we do validate imagestreamtag names - alas, with looser validation.
continue
}

// Check if the image repo matches the trigger
if !triggerMatchesImage(config, params, imageRepo) {
// Resolve images for all initial deployments.
Comment seems inaccurate? Isn't it "Only detect changes when the trigger's enabled" or something along those lines?
It's not inaccurate - maybe ambiguous? All initial deployments will have their images resolved, even those with automatic==false (fixes #6934). Otherwise (latestVersion != 0), we are going to resolve only automatic==true triggers. I will update the comment clarifying this.
Actually, my first take on this was doing exactly that, but it would break a lot of people. The reason is that the deploymentconfig controller is syncing based on latestVersion. Say we remove the cc controller: when the deploymentconfig controller receives a latestVersion update, it will have to update the deploymentconfig, exit the reconciliation (otherwise clients that expect a specific latestVersion are broken), then wait for the next etcd event that would be triggered by this update, and resync the deploymentconfig as it does now, right?
My thought was that the config controller would update the latestVersion and then create the new deployment during the same handle call. The DC update would cause the DC to be re-queued and would no-op the 2nd pass through handle. Anyway, I agree (based on IRC convo) that it's not worth doing that sort of change right now.
This LGTM as long as all the integration tests still pass.
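That single-pass-then-no-op pattern can be sketched as below. Everything here is illustrative (the real controller tracks deployments as ReplicationControllers in etcd, not a map): the point is that handle is idempotent, so the re-queue caused by the DC update finds the deployment already created and does nothing.

```go
package main

import "fmt"

// config is a stand-in for a deployment config.
type config struct {
	latestVersion int
}

// controller tracks which versions already have a deployment, standing in
// for a lookup of existing deployments.
type controller struct {
	deployments map[int]bool // latestVersion -> deployment exists
}

// handle bumps latestVersion and creates the deployment in one pass.
// A second pass (triggered by the re-queue from the update) sees the
// deployment already exists and no-ops.
func (c *controller) handle(dc *config, templateChanged bool) string {
	if templateChanged {
		dc.latestVersion++
	}
	if c.deployments[dc.latestVersion] {
		return "no-op" // second pass after the re-queue
	}
	c.deployments[dc.latestVersion] = true
	return "deployed"
}

func main() {
	c := &controller{deployments: map[int]bool{}}
	dc := &config{}
	fmt.Println(c.handle(dc, true))  // first pass: creates the deployment
	fmt.Println(c.handle(dc, false)) // re-queued pass: no-op
}
```

Idempotent handlers like this are the standard way informer/workqueue controllers tolerate their own updates re-queuing the object.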
@smarterclayton @ironcladlou I just realized that we don't need to store any new annotation that serializes anything, we can just check the serialized config of the latest deployment (which is already deserialized for comparing templates). Last commit, PTAL |
if err != nil {
	return false, fatalError(fmt.Sprintf("error decoding deployment config from deployment %q for deployment config %s: %v", deployutil.LabelForDeployment(deployment), deployutil.LabelForDeploymentConfig(config), err))
}
return c.decodeConfig(deployment)
Note for future ref: I think this decoder abstraction has proven to be annoying/unnecessary and we can probably just replace it with a simple stateless utility.
I think this works, good idea.
one more [test]
known flakes [test]
new flake #8946 [test]
another new: #8947 gotta deflake em all! [test]
@ironcladlou @smarterclayton no other concerns from me. Can we merge this?
Hold on, I need to clean up
Ready
I have no other concerns [merge]
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_origin/5954/) (Image: devenv-rhel7_4245)
Evaluated for origin merge up to e3510a7
Evaluated for origin test up to e3510a7
continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/3939/)
The plan here is to remove the generator from the trigger controllers and make the image change controller update only the spec of the deploymentConfig, deferring to the config change controller for incrementing latestVersion.
Fixes #8937
Fixes #8696
Fixes #6934
Trello card: https://trello.com/c/XRA3RF5W/714-5-remove-shared-dependency-from-trigger-controllers
@ironcladlou @smarterclayton @deads2k @liggitt @pweil- @mfojtik