Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition can cause big checklists #27123

Closed
roryabraham opened this issue Sep 11, 2023 · 19 comments · Fixed by #31458
Closed

Race condition can cause big checklists #27123

roryabraham opened this issue Sep 11, 2023 · 19 comments · Fixed by #31458
Assignees
Labels
Engineering Hourly KSv2 Reviewing Has a PR in review

Comments

@roryabraham
Copy link
Contributor

Problem

Sometimes we get big checklists like this one that include all the PRs from the previous checklist.

The root cause is a race condition between:

Both are triggered at the same time after updateStaging finishes. If deployStaging finishes (i.e: the tag for the new staging version is pushed to the remote) before the checklist is created, then it works. If not, then we get a huge checklist like this will all the PRs that were deployed in the previous version.

Solution

TBD – probably change the deploy system a bit such that staging deploys are triggered by the creation of a prerelease and production deploys by the creation of a "full" release.

That way we can push the tag as soon as it's created without triggering a deploy

@roryabraham
Copy link
Contributor Author

No update right now

@melvin-bot melvin-bot bot removed the Overdue label Sep 20, 2023
@melvin-bot melvin-bot bot added the Overdue label Sep 28, 2023
@roryabraham
Copy link
Contributor Author

No update

@melvin-bot melvin-bot bot removed the Overdue label Sep 29, 2023
@mountiny
Copy link
Contributor

mountiny commented Oct 3, 2023

Looking into this one now

Noting that the createNewDeployChecklist was moved to the platformDeploy.yml, which feels like the race condition is not happening anymore and the bug might be somewhere else.

When the updateStaging finished, it force pushes to staging branch and then the deploy.yml runs which pushes the tags and finally we trigger the platformDeploy.yml which would create the checklist but the correct tags should be available

Looking more into why this is failing

@mountiny
Copy link
Contributor

mountiny commented Oct 3, 2023

Created a PR which should hopefully help with this

@roryabraham
Copy link
Contributor Author

Ok, it seems like there's some way the git history can change in production that's different from what we have represented in the CIGitLogicTest.sh, just judging by running this command on staging w/ full git history:

image

Neither is correct, because the only version-bump that we expect to be included between 1.3.76-6 and 1.3.77-0 is Update version to 1.3.77-0

@roryabraham
Copy link
Contributor Author

It looks like the last checklist that was created without being affected by this error is this one, so we should try to figure out what was different between 1.3.74-3 and 1.3.75-0 vs 1.3.76-6 and 1.3.77-0

@roryabraham
Copy link
Contributor Author

Looks like the history is maybe the same?

image

@melvin-bot melvin-bot bot added Monthly KSv2 and removed Weekly KSv2 labels Oct 27, 2023
@melvin-bot
Copy link

melvin-bot bot commented Oct 27, 2023

This issue has not been updated in over 15 days. @roryabraham eroding to Monthly issue.

P.S. Is everyone reading this sure this is really a near-term priority? Be brave: if you disagree, go ahead and close it out. If someone disagrees, they'll reopen it, and if they don't: one less thing to do!

@Beamanator
Copy link
Contributor

Beamanator commented Oct 30, 2023

FYI I wrote a handy little JS script to mark off items in the new checklist that were on the old one: #30560 (comment) - just a temporary workaround

@roryabraham roryabraham added Weekly KSv2 Monthly KSv2 and removed Monthly KSv2 Reviewing Has a PR in review Weekly KSv2 labels Oct 30, 2023
@roryabraham
Copy link
Contributor Author

Moving this back to weekly since it's a PITA

@roryabraham roryabraham added the Weekly KSv2 label Oct 30, 2023
@roryabraham roryabraham removed the Monthly KSv2 label Oct 30, 2023
@melvin-bot melvin-bot bot added the Overdue label Nov 8, 2023
@roryabraham
Copy link
Contributor Author

Planning to investigate this further this week

@melvin-bot melvin-bot bot removed the Overdue label Nov 13, 2023
@roryabraham
Copy link
Contributor Author

I have a hunch that this has something to do with force-pushes.

@roryabraham
Copy link
Contributor Author

Spent several hours trying to dig into this, but so far no success in finding a minimal reproduction. I tried something like this:

  1. Create a PR A
  2. Create a PR B that conflicts with A
  3. Merge PR B
  4. Release app
  5. Merge main into pr-A branch and resolve conflicts
  6. Merge PR A

That worked as expected, so I think the next variation should be something like this:

  1. Create PR A
  2. Create PR B that conflicts with A
  3. Merge PR B
  4. Release app
  5. Rebase pr-A branch and resolve conflicts. Force-push to PR A
  6. Merge PR A

My thinking is that by rebasing onto a commit from the previous release then force-pushing we'll somehow include the commits from PR B in the new checklist

@roryabraham
Copy link
Contributor Author

roryabraham commented Nov 17, 2023

The other thing maybe worth looking into is that our tests always checkout 1 commit on main before fetching the other tags used in the git log command.

Not sure if that matters, but it's a difference between our local testing setup and the production environment (this may change if/when I create a new callable workflow to wrap createOrUpdateStagingDeploy so we can retry / refresh it at will)

@melvin-bot melvin-bot bot added Reviewing Has a PR in review Weekly KSv2 and removed Weekly KSv2 labels Nov 18, 2023
@roryabraham
Copy link
Contributor Author

roryabraham commented Nov 18, 2023

This should be fixed by #31458. I hope so 😅

@roryabraham roryabraham added the DeployBlockerCash This issue or pull request should block deployment label Nov 18, 2023
@roryabraham
Copy link
Contributor Author

That didn't work perfectly: https://github.com/Expensify/App/actions/runs/6910552568/job/18803863115. Going to retry

@roryabraham roryabraham added DeployBlockerCash This issue or pull request should block deployment and removed DeployBlockerCash This issue or pull request should block deployment labels Nov 18, 2023
@github-actions github-actions bot added Hourly KSv2 and removed Weekly KSv2 labels Nov 18, 2023
Copy link
Contributor

👋 Friendly reminder that deploy blockers are time-sensitive ⏱ issues! Check out the open `StagingDeployCash` deploy checklist to see the list of PRs included in this release, then work quickly to do one of the following:

  1. Identify the pull request that introduced this issue and revert it.
  2. Find someone who can quickly fix the issue.
  3. Fix the issue yourself.

Copy link

melvin-bot bot commented Nov 18, 2023

📣 @github-actions[bot]! 📣
Hey, it seems we don’t have your contributor details yet! You'll only have to do this once, and this is how we'll hire you on Upwork.
Please follow these steps:

  1. Make sure you've read and understood the contributing guidelines.
  2. Get the email address used to login to your Expensify account. If you don't already have an Expensify account, create one here. If you have multiple accounts (e.g. one for testing), please use your main account email.
  3. Get the link to your Upwork profile. It's necessary because we only pay via Upwork. You can access it by logging in, and then clicking on your name. It'll look like this. If you don't already have an account, sign up for one here.
  4. Copy the format below and paste it in a comment on this issue. Replace the placeholder text with your actual details.
    Screen Shot 2022-11-16 at 4 42 54 PM
    Format:
Contributor details
Your Expensify account email: <REPLACE EMAIL HERE>
Upwork Profile Link: <REPLACE LINK HERE>

@roryabraham
Copy link
Contributor Author

Nice, that worked as expected (well, maybe not the upwork bit, that's weird).

Anyways, for context I'm marking this as a deploy blocker not because we actually need to block deploy on it, but because I think we should merge #31458 before the next deploy, because it should fix this big checklist issue

@mountiny mountiny removed the DeployBlockerCash This issue or pull request should block deployment label Nov 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Engineering Hourly KSv2 Reviewing Has a PR in review
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants