Enhance reconcile of taskrun to avoid extra pod creation #2022

Merged

vincent-pli
Member

Changes

Fix issue: #1976

The root cause of the issue is that the TaskRun reconcile hit this error:

Failed to update taskRun status, the object has been modified; please apply your changes to the latest version and try again
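For context, this message is the Kubernetes API server's optimistic-concurrency conflict: a write computed from a stale copy of the object (an older resourceVersion) is rejected. The Go sketch below imitates that check with an in-memory store; `taskRun`, `store`, `updateStatus`, and `errConflict` are hypothetical names for illustration, not Tekton or client-go APIs.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict is a hypothetical stand-in for the API server's 409 Conflict.
var errConflict = errors.New("the object has been modified; please apply your changes to the latest version and try again")

// taskRun is a minimal, hypothetical object carrying a resourceVersion.
type taskRun struct {
	ResourceVersion int
	PodName         string
}

// store accepts an update only if it was computed from the latest version.
type store struct{ current taskRun }

func (s *store) get() taskRun { return s.current }

func (s *store) updateStatus(tr taskRun) error {
	if tr.ResourceVersion != s.current.ResourceVersion {
		return fmt.Errorf("failed to update taskRun status: %w", errConflict)
	}
	tr.ResourceVersion++ // every accepted write bumps the version
	s.current = tr
	return nil
}

func main() {
	s := &store{current: taskRun{ResourceVersion: 1}}

	mine := s.get()           // the reconciler reads version 1
	other := s.get()          // another writer also reads version 1...
	_ = s.updateStatus(other) // ...and wins the race: version is now 2

	mine.PodName = "demo-pod-1"
	err := s.updateStatus(mine) // rejected: built from stale version 1
	fmt.Println(errors.Is(err, errConflict), s.get().PodName == "")
}
```

The rejected write is exactly the one that would have recorded the pod name, which is why the stored status still has no pod after the error.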

The conflict is benign and should simply be ignored, but the current reconcile logic cannot tolerate it. The logic is:

  • Check whether taskrun.Status.Podname is set. If it is, get that pod and move on; one follow-up step adds the READY annotation to the pod so that the pod runs its containers one by one.
  • If taskrun.Status.Podname is not set, create a new pod and move on.
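A minimal Go simulation of the two steps above, assuming heavily simplified types: `TaskRun`, `Pod`, `cluster`, and `reconcileOld` are hypothetical, and the `Ready` field stands in for the READY annotation. Making only the first status write fail is enough to reproduce the leak.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, heavily simplified stand-ins; these are not Tekton's real types.
type TaskRun struct {
	Name    string
	PodName string // plays the role of taskrun.Status.Podname
}

type Pod struct {
	Name  string
	Ready bool // plays the role of the READY annotation
}

var errConflict = errors.New("the object has been modified")

// cluster holds the stored TaskRun and every pod that has been created.
type cluster struct {
	taskRun TaskRun
	pods    []*Pod
}

// reconcileOld follows the two steps listed above. failUpdate simulates the
// status write losing an optimistic-concurrency race.
func reconcileOld(c *cluster, failUpdate bool) error {
	tr := c.taskRun // the reconciler works on a copy
	if tr.PodName != "" {
		// Pod already recorded: find it and add the READY annotation.
		for _, p := range c.pods {
			if p.Name == tr.PodName {
				p.Ready = true
			}
		}
	} else {
		// No pod recorded: create one.
		p := &Pod{Name: fmt.Sprintf("%s-pod-%d", tr.Name, len(c.pods)+1)}
		c.pods = append(c.pods, p)
		tr.PodName = p.Name
	}
	if failUpdate {
		return errConflict // the new PodName is never persisted
	}
	c.taskRun = tr
	return nil
}

func main() {
	c := &cluster{taskRun: TaskRun{Name: "demo"}}
	_ = reconcileOld(c, true)  // pod 1 created, status update conflicts
	_ = reconcileOld(c, false) // PodName still empty: pod 2 created
	_ = reconcileOld(c, false) // pod 2 gets READY
	for _, p := range c.pods {
		fmt.Printf("%s ready=%v\n", p.Name, p.Ready)
	}
}
```

Running it prints `demo-pod-1 ready=false` followed by `demo-pod-2 ready=true`: the first pod is orphaned and never receives READY.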

The problem is:
If the reconcile hits the "Failed to update taskRun status" error, the TaskRun stays in the workqueue and is reconciled again after an increasing delay. At that point Podname is still unset in the TaskRun's status (even though a pod really exists, since it was already created), so an extra pod is created, and the previous pod stays in Running forever because nothing ever adds the READY annotation to it.
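One way to make the reconcile tolerate a lost status update is to look for a pod that already belongs to the TaskRun before creating a new one. The sketch below does this with a label lookup; it is an illustration under assumptions (the `example.dev/taskRun` label key and all type and function names are hypothetical) and not necessarily the exact change in this PR's diff.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, simplified stand-ins; not Tekton's real types or labels.
type TaskRun struct {
	Name    string
	PodName string // plays the role of taskrun.Status.Podname
}

type Pod struct {
	Name   string
	Labels map[string]string
	Ready  bool // plays the role of the READY annotation
}

// ownerLabel is an assumed label key linking a pod to its TaskRun.
const ownerLabel = "example.dev/taskRun"

var errConflict = errors.New("the object has been modified")

type cluster struct {
	taskRun TaskRun
	pods    []*Pod
}

// findPodFor returns a pod already labelled as belonging to tr, if any.
func (c *cluster) findPodFor(tr TaskRun) *Pod {
	for _, p := range c.pods {
		if p.Labels[ownerLabel] == tr.Name {
			return p
		}
	}
	return nil
}

// reconcileNew creates a pod only when neither the status nor a label
// lookup finds one, so a lost status update cannot cause a second pod.
func reconcileNew(c *cluster, failUpdate bool) error {
	tr := c.taskRun
	var pod *Pod
	if tr.PodName != "" {
		for _, p := range c.pods {
			if p.Name == tr.PodName {
				pod = p
			}
		}
	} else {
		pod = c.findPodFor(tr) // status may be stale: check the cluster
	}
	if pod == nil {
		pod = &Pod{
			Name:   fmt.Sprintf("%s-pod-%d", tr.Name, len(c.pods)+1),
			Labels: map[string]string{ownerLabel: tr.Name},
		}
		c.pods = append(c.pods, pod)
	} else {
		pod.Ready = true // existing pod: add the READY annotation
	}
	tr.PodName = pod.Name
	if failUpdate {
		return errConflict
	}
	c.taskRun = tr
	return nil
}

func main() {
	c := &cluster{taskRun: TaskRun{Name: "demo"}}
	_ = reconcileNew(c, true)  // pod 1 created, status update conflicts
	_ = reconcileNew(c, false) // label lookup finds pod 1; no new pod
	fmt.Println(len(c.pods), c.pods[0].Ready, c.taskRun.PodName)
}
```

Here the second reconcile finds `demo-pod-1` through its label, so the program prints `1 true demo-pod-1`: one pod, marked ready, and recorded in the status.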

Submitter Checklist

These are the criteria that every PR should meet; please check them off as you review them:

See the contribution guide for more details.

Double check this list of stuff that's easy to miss:

Reviewer Notes

If API changes are included, additive changes must be approved by at least two OWNERS and backwards incompatible changes must be approved by more than 50% of the OWNERS, and they must first be added in a backwards compatible way.

Release Notes

Describe any user facing changes here, or delete this block.

Examples of user facing changes:
- API changes
- Bug fixes
- Any changes in behavior

googlebot added the cla: yes label (Trying to make the CLA bot happy with ppl from different companies work on one commit) Feb 8, 2020
tekton-robot added the size/M label (Denotes a PR that changes 30-99 lines, ignoring generated files.) Feb 8, 2020
@vincent-pli
Member Author

/test pull-tekton-pipeline-integration-tests

vincent-pli changed the title Enhance reconcole of taskrun to avoid extra pod creation to Enhance reconcile of taskrun to avoid extra pod creation Feb 8, 2020
@vincent-pli
Member Author

/assign @imjasonh

ghost commented Feb 10, 2020

Nice! Could we also change test/retry_test.go from t.Logf to t.Errorf for this now? Or are there still edge cases that result in too many pods?

ghost commented Feb 10, 2020

/approve

@tekton-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbwsg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

tekton-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Feb 10, 2020
@chmouel
Member

chmouel commented Feb 11, 2020

master has been failing on every run since last night:

[screenshot]

I just tried it with your PR and it worked! Thanks...

[screenshot]

/lgtm

tekton-robot added the lgtm label (Indicates that a PR is ready to be merged.) Feb 11, 2020
tekton-robot merged commit 85add00 into tektoncd:master Feb 11, 2020