[SPARK-45498][CORE] Followup: Ignore task completion from old stage a… #43326

Closed

@mayurdb mayurdb commented Oct 11, 2023

What changes were proposed in this pull request?

With SPARK-45182, we added a fix that prevents laggard tasks from older attempts of an indeterminate stage from marking a partition as completed in the map output tracker.

When a task completes, the DAG scheduler also notifies all task sets of the stage that the partition has been completed. The task sets then skip scheduling tasks for that partition if they have not already been scheduled. This is incorrect for an indeterminate stage, because all of its tasks must be re-run on a re-attempt.
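
To make the scheduling problem concrete, here is a minimal toy model in Python. It is not Spark's code: `ToyTaskSet`, `handle_task_success`, `simulate`, and the `apply_fix` flag are hypothetical names invented for this sketch, and the model assumes the fix amounts to ignoring partition-completion notifications that come from an older attempt of an indeterminate stage.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTaskSet:
    """Toy stand-in for the task set of one stage attempt (not Spark's real class)."""
    attempt: int
    num_partitions: int
    completed: set = field(default_factory=set)

    def mark_partition_completed(self, partition: int) -> None:
        # Once marked, the task set will not launch a task for this partition.
        self.completed.add(partition)

    def partitions_to_schedule(self) -> list:
        return [p for p in range(self.num_partitions) if p not in self.completed]


def handle_task_success(task_attempt, partition, latest, indeterminate, apply_fix):
    """Decide whether a successful task should make `latest` skip `partition`."""
    from_old_attempt = task_attempt < latest.attempt
    if apply_fix and indeterminate and from_old_attempt:
        # Ignore the laggard: the new attempt must still run this partition.
        return
    latest.mark_partition_completed(partition)


def simulate(apply_fix: bool) -> list:
    # Attempt 1 of an indeterminate stage with 3 partitions has just started.
    latest = ToyTaskSet(attempt=1, num_partitions=3)
    # A laggard task from attempt 0 now finishes partition 2.
    handle_task_success(0, 2, latest, indeterminate=True, apply_fix=apply_fix)
    return latest.partitions_to_schedule()


print(simulate(apply_fix=False))  # [0, 1]
print(simulate(apply_fix=True))   # [0, 1, 2]
```

Without the assumed fix, the new attempt never schedules partition 2, even though the old attempt's output for it is not accepted. With the fix, the new attempt still schedules all three partitions.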

Why are the changes needed?

Because older attempts no longer mark the partition as completed, and the newer attempt never schedules a task for it, the stage has to be rescheduled to complete that partition. Since the stage is indeterminate, all of its partitions are then recomputed.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a check to an existing unit test.

Was this patch authored or co-authored using generative AI tooling?

No

mayurdb commented Oct 11, 2023

@cloud-fan Can you please take a look?

@github-actions github-actions bot added the CORE label Oct 11, 2023

@mridulm mridulm left a comment

This looks good to me, but I would like @cloud-fan to also take a look, given that he added this initially for handling CommitDeniedException.

@cloud-fan

thanks, merging to master/3.5!

@cloud-fan cloud-fan closed this in fb3b707 Oct 13, 2023
cloud-fan pushed a commit that referenced this pull request Oct 13, 2023
Closes #43326 from mayurdb/indeterminateFix.

Authored-by: mayurb <mayurb@uber.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit fb3b707)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>