Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add maxRetry in job controller to prevent endless loop #412

Merged
merged 2 commits into from
Sep 5, 2019

Conversation

hzxuzhonghu
Copy link
Collaborator

In case there is invalid job container template(like stated #409 (comment)), that cannot be rejected at volcano admission. The job controller can get into an endless loop error retrying.

@volcano-sh-bot volcano-sh-bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Aug 12, 2019
glog.V(2).Infof("Failed to handle Job <%s/%s>: %v",
jobInfo.Job.Namespace, jobInfo.Job.Name, err)
// If any error, requeue it.
queue.AddRateLimited(req)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should update job status and record events accordingly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

job status and event inside of the Execute function should be enough for this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's kind of event in Execute? How user identify whether it's done?

@TravisBuddy
Copy link

Travis tests have failed

Hey @hzxuzhonghu,
Please read the following log in order to understand the failure reason.
It'll be awesome if you fix what's wrong and commit the changes.

TravisBuddy Request Identifier: dc61bb10-bd70-11e9-af47-41b3464977ce

@k82cn
Copy link
Member

k82cn commented Sep 5, 2019

/lgtm
/approve

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Sep 5, 2019
@volcano-sh-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hzxuzhonghu, k82cn

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 5, 2019
@volcano-sh-bot volcano-sh-bot merged commit d6033eb into volcano-sh:master Sep 5, 2019
@hzxuzhonghu hzxuzhonghu deleted the max-retry branch September 6, 2019 00:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants