Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Action on pull_request trigger is skipped even though base branch content changed #264

Open
fknorr opened this issue Sep 1, 2022 · 8 comments · May be fixed by #286
Open

Action on pull_request trigger is skipped even though base branch content changed #264

fknorr opened this issue Sep 1, 2022 · 8 comments · May be fixed by #286
Labels
bug Something isn't working documentation Improvements or additions to documentation

Comments

@fknorr
Copy link

fknorr commented Sep 1, 2022

I have the following workflow defined, which enables concurrent skipping and then simply echoes whether a run has been skipped or not.

name: CI

on:
  push:
  pull_request:

jobs:
  find-duplicate-workflows:
    runs-on: [ self-hosted ]
    steps:
      - id: skip-check
        uses: fkirc/skip-duplicate-actions@v4.0.0
        with:
          concurrent_skipping: "same_content_newer"
          skip_after_successful_duplicate: "true"
      - run: echo ${{ steps.skip-check.outputs.should_skip }}

Now I add a new file that only exists in the main branch, and an additional file that only exists in a pr branch. Now I create a pull request to merge pr into main. The push triggers for both branches are ran as expected.

The pull_request triggers must not be skipped however, since the merge will contain both files, so the checked out contents differ from both main and pr. skip-duplicate-actions however reports should-skip = true.

See our test repo for a reproduction. Runs 6 and 11 are from push events, and 12 is from a pull_request trigger concurrent with its sibling push.

@paescuj
Copy link
Collaborator

paescuj commented Sep 12, 2022

Thank you for the detailed report, @fknorr!

As you can see in the logs of the pull_request workflow run (#12), skip-duplicate-actions has detected that the exact same files (same tree hash) have already been checked in the push workflow run (#11) and therefore reports should-skip = true. Apparently, the push workflow run was performed a little before the pull_request workflow run.
Please note that both workflow runs, pull_request and push, are performed against the same ref (the merge commit).
So, this is working as intended.

However, once the pull request (celerity/test-skip-duplicate-actions#2) is merged into the main branch, skip-duplicate-actions should report should-skip = false because of the added commit in the main branch.

Hope this helps!

@fknorr
Copy link
Author

fknorr commented Sep 12, 2022

Thanks for taking a look at this @paescuj. I have updated the test setup to report in detail which revisions are checked out and what files are present (also I have deleted all old runs so there's no unexpected interactions).

The actions script now runs

git rev-parse HEAD
ls
echo ${{should_skip}}
  1. push trigger #19 runs for the main branch and reports
fb314dc3155556864a8d5d3fc063e0372cd0b582
file-on-main.txt
should_skip=false
  1. push trigger #20 runs for the un-merged PR branch and reports
59164db4e10d9f17a313bd5b98e0a5c3e19c42e0
file-in-pr.txt
should_skip=false
  1. pull_request trigger #21 runs for the implicit PR merge commit and reports
bd633cec460a9c1af3247d70fca45942903b2baa
file-in-pr.txt
file-on-main.txt
should_skip=true

The skip-duplicate-actions logs state that this run can be skipped since the same files have been checked in # 20, but that's not actually true. From my understanding push triggers always run on the unmerged branch, and pull_request triggers check GitHub's implicit merge commit.

@paescuj
Copy link
Collaborator

paescuj commented Sep 13, 2022

From my understanding push triggers always run on the unmerged branch, and pull_request triggers check GitHub's implicit merge commit.

Of course, you're totally right! I must have been a bit tired and got things mixed up... 😄🙈
Indeed, the behavior of skip-duplicate-actions seems to be slightly wrong in such scenarios. Thank you very much for recognizing this! I'll come up with a solution in the next days and report back here.

@paescuj paescuj added the bug Something isn't working label Sep 13, 2022
@paescuj paescuj mentioned this issue Sep 21, 2022
@paescuj paescuj added the documentation Improvements or additions to documentation label Sep 22, 2022
@paescuj
Copy link
Collaborator

paescuj commented Sep 22, 2022

Unfortunately, I don't see any way to get the tree SHA from the merge commit for all workflow runs that were triggered by the pull request event. (See #270 for an unsuccessful attempt)
So as a conclusion:

  • When I do find the time, I'll demand GitHub to include this information in the Workflow runs API.
  • Document that the current checks of skip-duplicate-actions are slightly wrong on pull_request events and when there are changes in the base branch.

@fknorr
Copy link
Author

fknorr commented Sep 22, 2022

Thanks for giving this a try!

I have looked into what actions/checkout does on pull requests, since it also needs to figure out the merge SHA on a pull_request trigger. Its Readme talks about $GITHUB_SHA and $GITHUB_REF, and indeed the workflow documentation suggests that this information is passed to jobs in environment variables for any event type.

From this it seems like the Action should not rely on the API at all for figuring out what commit to run on but do a getenv instead.

Edit: Oh well, this only helps to get the commit SHA in the current job of course, not the "remote ones" (at least not without some log/communication hackery).

@psalz
Copy link

psalz commented Sep 22, 2022

(at least not without some log/communication hackery).

One such hack could be to upload the SHA as a workflow artifact. Ugly but might just work..?

@paescuj
Copy link
Collaborator

paescuj commented Sep 22, 2022

Edit: Oh well, this only helps to get the commit SHA in the current job of course, not the "remote ones" (at least not without some log/communication hackery).

Exactly 😄

One such hack could be to upload the SHA as a workflow artifact. Ugly but might just work..?

Yeah, I was thinking about something like this. One "problem" with GitHub Actions is that you always need to keep an eye on the amount of API requests you're doing, otherwise you risk to run into the rate limit.
Possibly, we could upload a map with the SHAs from the last 100 runs in each workflow run and then just fetch it from the last run, respectively.
Another concern is the execution time of skip-duplicate-actions which I'd like to keep as low as possible.
I might give this a try when I find some time...

@fknorr
Copy link
Author

fknorr commented Sep 26, 2022

A conservative solution could be to identify the cases we definitely know are duplicates, and keep the jobs for which we cannot be sure.

  1. In the current run, we always know the true "checkout SHA" from the $GITHUB_SHA.
  2. For API queries on other concurrent / finished jobs, we only consider those that were not triggered on pull_request, because we know that for non-pull_requests the "API SHA" and "checkout SHA" will match.

That means we can actively skip a pull_request trigger if we find a matching push trigger, but not the other way round. I believe the remaining triggers would also behave like pushes in that sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants