Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: use buildjet for process replay #29277

Merged
merged 4 commits into from
Aug 26, 2023
Merged

CI: use buildjet for process replay #29277

merged 4 commits into from
Aug 26, 2023

Conversation

adeebshihadeh
Copy link
Contributor

@adeebshihadeh adeebshihadeh commented Aug 8, 2023

  • worth it?
  • only move jobs over that are faster
  • do forks get charged to us? no
  • does fork CI just work? runs-on supports expressions, so we can just fallback to the normal GitHub runners
    • need a test that enforces this, otherwise it'll break silently

@adeebshihadeh
Copy link
Contributor Author

@jnewb1 can you finish this up and evaluate how much faster it is?

@adeebshihadeh adeebshihadeh mentioned this pull request Aug 9, 2023
@jnewb1
Copy link
Contributor

jnewb1 commented Aug 10, 2023

ignoring setup time:

Runner Process Replay Time Unit Tests Time Static Analysis Car Models
Github (master) 17:05 20:03 1:37 ~11 mins
buildjet-4cpu 4:58 19:33 0:46 ~10 mins
buildjet-8cpu 2:46 20:26 0:48 ~6 mins

Perhaps if we split the unit tests across the 4 or 8 cpus it would be significantly faster for the unittests:
https://pytest-xdist.readthedocs.io/en/stable/

Seems worth it to put the replay test on buildjet at least

@adeebshihadeh
Copy link
Contributor Author

adeebshihadeh commented Aug 10, 2023

  • Definitely worth it for process replay
  • Unit tests need to be properly parallelized before we'll realize much benefit from this
  • Static analysis can stay on GH Actions or go, either seems fine
  • how's test car models? hoping that one is decently faster (it already supports parallel running with pytest -n 8

@jnewb1
Copy link
Contributor

jnewb1 commented Aug 10, 2023

how's test car models? hoping that one is decently faster

added to table, about 6 mins compared to 11 for 8vcpu

@jnewb1
Copy link
Contributor

jnewb1 commented Aug 10, 2023

My verdict would be:

  • run cars models and process replay on 8vcpu builtjet
  • run the rest on GH, until we can split up the unit tests to take advantage of multiple cpus

TODO:

  • figure out the caching issues on buildjet
    • Docker cache is pulled but not used?
    • scons_cache always gives a cache miss

@adeebshihadeh
Copy link
Contributor Author

Sounds good; goal is comfortably under 10mins for time-to-green-check on the average commit.

@jnewb1 jnewb1 changed the title try buildjet CI: use buildjet Aug 11, 2023
@jnewb1 jnewb1 changed the title CI: use buildjet CI: use buildjet where it's currently better Aug 12, 2023
@jnewb1 jnewb1 force-pushed the speeeed branch 3 times, most recently from f4ffa27 to 8f61041 Compare August 25, 2023 00:20
@jnewb1
Copy link
Contributor

jnewb1 commented Aug 25, 2023

Buildjet running on my fork, <6 mins for car and replay tests
https://github.com/jnewb1/openpilot/actions/runs/5973895185/job/16207030187?pr=5

Car tests are only like a minute of actual runtime on the 8cpu buildjet, maybe we combine them when running on buildjet, and instead split up the unittests?

@jnewb1
Copy link
Contributor

jnewb1 commented Aug 25, 2023

@adeebshihadeh
i found that only buildjet requires pinning the docker version, and since it's so much faster than github actions for those cases it shouldn't be an issue

@jnewb1
Copy link
Contributor

jnewb1 commented Aug 25, 2023

buildjet has a concurrency limit of 64 cpus, meaning only 8 jobs could run at once. We can also pay 300$ per month to increase that to 164

we might want to run the car tests on 4cpu instead of 8, 8 might be overkill since 2/3 of the time is in setup

@jnewb1 jnewb1 marked this pull request as ready for review August 25, 2023 09:34
@adeebshihadeh
Copy link
Contributor Author

From what I understand, process replay is still the only job that clearly needs a faster machine. The cars tests take ~7m on master, and I suspect you might even get a furhter speedup by using 3 or 4 jobs on the GitHub runners. As for the unit tests, it's not clear where all that time is going.

@jnewb1
Copy link
Contributor

jnewb1 commented Aug 25, 2023

From what I understand, process replay is still the only job that clearly needs a faster machine. The cars tests take ~7m on master, and I suspect you might even get a furhter speedup by using 3 or 4 jobs on the GitHub runners. As for the unit tests, it's not clear where all that time is going.

right, but we could combine the car tests into maybe 1 or 2 steps if we use buildjet. Tried using 4 jobs for the car tests on github actions and it was about the same on average.

@adeebshihadeh
Copy link
Contributor Author

Why would we want to do that? We want to use BuildJet as little as possible: it costs us money and it diverges our CI from external contributors'.

Copy link
Contributor Author

@adeebshihadeh adeebshihadeh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is ready, let's merge for process replay.

@@ -44,6 +49,12 @@ runs:
run: |
find . -type f -executable -not -perm 755 -exec chmod 755 {} \;
find . -type f -not -executable -not -perm 644 -exec chmod 644 {} \;
- id: setup-buildx-action
if: ${{ inputs.is_buildjet == 'true' }}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can't we detect this in the setup action?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@adeebshihadeh
Copy link
Contributor Author

adeebshihadeh commented Aug 25, 2023

"Post Cache test routes" has a bunch of permission issues. Is that a problem?

Also seems like it didn't find the cache for some reason?

@jnewb1
Copy link
Contributor

jnewb1 commented Aug 25, 2023

"Post Cache test routes" has a bunch of permission issues. Is that a problem?

Also seems like it didn't find the cache for some reason?

issue on master: #29636

@adeebshihadeh adeebshihadeh changed the title CI: use buildjet where it's currently better CI: use buildjet for process replay Aug 26, 2023
@adeebshihadeh adeebshihadeh merged commit d35beff into master Aug 26, 2023
20 checks passed
@adeebshihadeh adeebshihadeh deleted the speeeed branch August 26, 2023 01:50
@jnewb1 jnewb1 mentioned this pull request Aug 22, 2023
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants